KR102153491B1

KR102153491B1 - Apparatus and method for estimating the sound source arrival angle

Info

Publication number: KR102153491B1
Application number: KR1020200038931A
Authority: KR
Inventors: 전찬준; 전광명
Original assignee: 한국건설기술연구원; 인트플로우 주식회사
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2020-09-08

Abstract

Provided are a sound source arrival angle estimating device and a method thereof. When m (m is an integer greater than 1) audio clips including stereo sound signals for two channels are input, an intensity difference calculation unit calculates an intensity difference between the two channels for each audio clip. An arrival angle estimation unit estimates an arrival angle of the sound source of each audio clip by applying the K-means clustering method to m intensity differences calculated by the intensity difference calculation unit.

Description

Apparatus and method for estimating the sound source arrival angle

The present invention relates to an apparatus and method for estimating the arrival angle of a sound source and, more particularly, to an unsupervised-learning-based apparatus and method that estimate the direction from which a sound source originates in a stereo microphone environment using the K-means clustering technique.

FIG. 1 is a diagram showing the time difference and the intensity difference that occur at a stereo microphone.

In a stereo microphone environment, an inter-channel level difference (ILD) and an inter-channel time difference (ITD) arise according to the direction of the sound source, as shown in FIG. 1. The direction of the sound source can therefore be estimated by analyzing either the intensity difference or the time difference.

When the direction-dependent intensity difference is used, the intensity difference varies with the specifications of the microphone.

FIG. 2 is a diagram showing the directivity patterns (polar patterns) of various microphones.

As shown in FIG. 2, microphones have different patterns depending on their specifications, so even for a sound source in the same direction the intensity difference differs depending on which microphone is used. For this reason, most existing microphone-based arrival angle estimation techniques rely on the time difference rather than the intensity difference.

Techniques that estimate the arrival angle from the time difference fall broadly into Time Delay Estimation (TDE) methods and Steered Response Power (SRP) based methods.

Among TDE methods, Generalized Cross Correlation with Phase Transform (GCC-PHAT) is the most representative; it estimates the arrival angle from the cross-correlation, as in [Equation 1].

Sound signals are often analyzed in the frequency domain, and converting [Equation 1] to the frequency domain gives [Equation 2].
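[Equation 1] and [Equation 2] are not reproduced in this text, so the following is only a minimal Python sketch of the commonly used GCC-PHAT formulation, not the patent's own equations. The function names, the naive DFT, the default speed of sound of 343 m/s, and the convention that 90° is the front are all assumptions made for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat_delay(x1, x2, max_lag):
    """Return the lag (in samples) by which x2 is delayed relative to x1."""
    spec1, spec2 = dft(x1), dft(x2)
    # Cross-spectrum of the two channels.
    cross = [b * a.conjugate() for a, b in zip(spec1, spec2)]
    # PHAT weighting: discard the magnitude and keep only the phase of each bin.
    weighted = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = dft(weighted, inverse=True)
    n = len(corr)
    # Search circular lags in [-max_lag, max_lag] for the correlation peak.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n].real)


def delay_to_angle_deg(delay_samples, sample_rate_hz, spacing_m, speed_of_sound=343.0):
    """Convert a delay to an angle in degrees (assumed convention: 90 degrees is the front)."""
    cos_theta = speed_of_sound * delay_samples / sample_rate_hz / spacing_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp against rounding and noise
    return math.degrees(math.acos(cos_theta))
```

A zero delay maps to the front direction under this convention; the sign of the delay, and therefore which side counts as 0°, depends on which microphone is taken as the reference.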

Among SRP methods, Steered Response Power with Phase Transform (SRP-PHAT) is the most representative; as in [Equation 3], beams are formed in various directions and the arrival angle is found by locating the direction in which the energy is maximized.

However, even when the time difference is used, a narrow spacing between the stereo microphones produces only a small time difference, which makes the arrival angle hard to estimate, while a wide spacing causes spatial aliasing at high frequencies, which also makes the estimate difficult.
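The trade-off described above can be made concrete with two standard far-field relations: the largest possible time difference is the spacing divided by the speed of sound, and phase differences become ambiguous once the spacing exceeds half a wavelength. The sketch below is illustrative only; the function names and the speed of sound of 343 m/s are assumptions, not values from the patent.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature (assumed)


def max_time_difference(spacing_m):
    """Largest possible inter-channel time difference, for a source on the microphone axis."""
    return spacing_m / SPEED_OF_SOUND


def max_delay_in_samples(spacing_m, sample_rate_hz):
    """The same bound expressed in samples at the given sampling rate."""
    return max_time_difference(spacing_m) * sample_rate_hz


def aliasing_frequency(spacing_m):
    """Frequency above which the spacing exceeds half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)
```

For instance, with a hypothetical 2 cm spacing sampled at 16 kHz the maximum delay is less than one sample, while a hypothetical 50 cm spacing starts to alias at roughly 343 Hz.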

Therefore, a technique that can estimate the direction of a sound source more accurately in a stereo microphone environment is needed.

Korean Registered Patent No. 10-16316110 (2016.06.13)

To solve the problem described above, the technical object of the present invention is to provide an apparatus and method that estimate the arrival angle of a sound source by exploiting the fact that an intensity difference appears according to the direction (arrival angle) of the sound source and, in particular, the tendency of the intensity difference to grow as the sound direction moves away from the front.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

As a means of achieving the above technical object, according to an embodiment of the present invention, an apparatus for estimating the arrival angle of a sound source includes: an intensity difference calculation unit that, when m audio clips (m is an integer greater than 1) containing stereo sound signals for two channels are input, calculates the intensity difference between the two channels for each audio clip; and an arrival angle estimation unit that estimates the sound source arrival angle of each audio clip by applying K-means clustering to the m intensity differences calculated by the intensity difference calculation unit.

The intensity difference calculation unit calculates the intensity difference between the two channels for one audio clip using the following equation.

[Equation: not preserved in this text]

Here, the quantity being computed is the intensity difference between the two channels for one audio clip, n is the index of the samples (that is, signals) belonging to one audio clip (hereinafter, the 'sample index'), S is the maximum value of the sample index in that audio clip, and the two signal terms are the left and right samples of the stereo sound signal, respectively.
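Because the equation itself was an image and did not survive extraction, the following is a hedged reconstruction rather than the published formula. It assumes a ratio of summed squared samples with the right channel in the numerator, which is consistent with the definitions here and with the document's later description of the value as a power ratio that exceeds 1 when the right channel is stronger. The symbol names ILD, x_L(n), and x_R(n), and the sample index starting at 1, are assumptions.

```latex
\mathrm{ILD} = \frac{\sum_{n=1}^{S} x_R(n)^{2}}{\sum_{n=1}^{S} x_L(n)^{2}}
```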

The arrival angle estimation unit includes: a setting unit in which p arrival angles (p is an integer greater than 1) are set for estimating the sound source arrival angle of each of the m audio clips; a centroid calculation unit that applies K-means clustering to the m calculated intensity differences to obtain an intensity difference centroid for each of p clusters corresponding to the p arrival angles; and a sound source arrival angle estimation unit that takes the arrival angle corresponding to the intensity difference centroid calculated for each of the p clusters as the estimated sound source arrival angle of the audio clips in the cluster to which that centroid belongs.

The centroid calculation unit uses the number p of arrival angles set in the setting unit as the number of clusters, clusters the m intensity differences calculated by the intensity difference calculation unit into p clusters, and calculates an intensity difference centroid for each of the p clusters.

Further, as a means of achieving the above technical object, according to an embodiment of the present invention, a method of estimating the arrival angle of a sound source includes: (A) a step in which an electronic device, when m audio clips (m is an integer greater than 1) containing stereo sound signals for two channels are input, calculates the intensity difference between the two channels for each audio clip; and (B) a step in which the electronic device estimates the sound source arrival angle of each audio clip by applying K-means clustering to the m intensity differences calculated in step (A).

Step (A) calculates the intensity difference between the two channels for one audio clip using the following equation.

[Equation: not preserved in this text]

Step (B) includes: (B1) a step in which p arrival angles (p is an integer greater than 1) are set for estimating the sound source arrival angle of each of the m audio clips; (B2) a step of applying K-means clustering to the m calculated intensity differences to obtain an intensity difference centroid for each of p clusters corresponding to the p arrival angles; and (B3) a step of taking the arrival angle corresponding to the intensity difference centroid calculated for each of the p clusters as the estimated sound source arrival angle of the audio clips in the cluster to which that centroid belongs.

According to the present invention, the direction of a sound source is estimated by K-means clustering based on the tendency of the intensity difference to change as the sound source direction changes relative to the front, so that the direction of origin, that is, the sound source arrival angle, can be estimated accurately regardless of the performance of the stereo microphone.

In addition, because the sound source arrival angle can be estimated with a stereo microphone, the invention can be used in various fields such as AI speakers, sound event detection, and speech recognition, and can in the future be combined with various speech and audio applications to provide high-quality services.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram showing the time difference and the intensity difference that occur at a stereo microphone;
FIG. 2 is a diagram showing the directivity patterns of various microphones;
FIG. 3 is a block diagram showing a sound source arrival angle estimation apparatus according to an embodiment of the present invention;
FIG. 4 is a diagram showing the relationship between a sound source and a stereo microphone;
FIG. 5 is an exemplary diagram showing the intensity difference that appears according to the direction of a sound source;
FIG. 6 is a diagram showing the result of estimating the arrival angle for 50 audio clips located in the 0°, 60°, 120°, and 180° directions;
FIG. 7 is a flowchart showing a method by which an electronic device estimates a sound source arrival angle according to an embodiment of the present invention; and
FIG. 8 is a block diagram showing a computing system that executes a sound source arrival angle estimation method according to an embodiment of the present invention.

Before the specific details for carrying out the present invention are described, it should be noted that the terms and words used in this specification and the claims are to be interpreted with meanings and concepts consistent with the technical matters of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way.

In addition, detailed descriptions of well-known functions and configurations related to the present invention are omitted where they could unnecessarily obscure its subject matter.

Where an element, component, device, or system is said to include a component consisting of a program or software, it should be understood, even without explicit mention, to include the hardware (for example, memory and a CPU) and the other programs or software (for example, an operating system or the drivers needed to run the hardware) required for that program or software to execute or operate.

Likewise, unless otherwise specified, an element (or component) may be implemented in software, in hardware, or in any combination of the two.

The specific technical content of the present invention is described in detail below with reference to the accompanying drawings.

A person of ordinary skill in the technical field of the present invention will readily understand that the components of the apparatus shown in FIG. 3 are separated functionally and logically, and that this does not necessarily mean that each component is a separate physical device or is written as separate code.

FIG. 3 is a block diagram showing a sound source arrival angle estimation apparatus 300 according to an embodiment of the present invention.

The sound source arrival angle estimation apparatus 300 shown in FIG. 3 estimates the direction of a sound source, that is, its arrival angle, in a stereo microphone environment, and does so on an unsupervised-learning basis using K-means clustering.

FIG. 4 is a diagram showing the relationship between a sound source and a stereo microphone, and FIG. 5 is an exemplary diagram showing the intensity difference that appears according to the direction of the sound source.

Referring to FIG. 4, Mic1 and Mic2 are the right and left microphones, respectively, the sound source is the position from which sound is emitted, and θ is the arrival angle of the sound source. The front is the direction in which the angle between the sound source and the center of Mic1 and Mic2 is 90°; this 90° reference used to define the front can be changed.

Referring to FIG. 5, 0°, 60°, 120°, and 180° are directions (arrival angles) of the sound source, and the intensity difference in a stereo microphone environment differs according to that direction. The magnitude of the intensity difference may vary with the type or performance of the stereo microphone, but the tendency for the intensity difference to grow as the arrival angle moves away from the front (that is, as the sound source moves to either side of the front, as shown in FIG. 4) can be regarded as the same for all microphones. The embodiment of the present invention therefore uses this tendency to estimate the sound source arrival angle.

Referring again to FIG. 3, the sound source arrival angle estimation apparatus 300 according to an embodiment of the present invention may include an intensity difference calculation unit 310 and an arrival angle estimation unit 320.

When m audio clips (m is an integer greater than 1) containing stereo sound signals for two channels are input, the intensity difference calculation unit 310 may calculate the intensity difference between the two channels for each audio clip. An audio clip is an audio file generated from the stereo sound signal acquired by the stereo microphone. The two channels correspond to the right microphone and the left microphone.

The intensity difference calculation unit 310 may calculate the intensity difference between the two channels for one audio clip using [Equation 4]. If, for example, there are 50 audio clips for which the intensity difference is to be calculated, the intensity difference calculation unit 310 calculates 50 intensity differences, one for each audio clip index from 1 to 50.

[Equation 4: not preserved in this text]

In [Equation 4], the quantity being computed is the intensity difference (power ratio) between the two channels for one audio clip, n is the index of the samples (that is, signals) belonging to one audio clip (hereinafter, the 'sample index'), S is the maximum value of the sample index in that audio clip, and the two signal terms are the left and right samples of the stereo sound signal, respectively.

Referring to [Equation 4], the intensity difference calculation unit 310 squares the input stereo sound signals (the left and right samples) and then calculates the intensity difference. If the right-channel and left-channel signals are similar in magnitude, the intensity difference is close to 1; if the right-channel signal is larger, it is greater than 1; and if the left-channel signal is larger, it is less than 1.
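The behavior described above (close to 1 for balanced channels, above 1 when the right channel dominates, below 1 when the left channel dominates) can be sketched in Python as follows. This assumes that [Equation 4] is a right-to-left ratio of summed squared samples; the function name and the small `eps` guard against a silent left channel are illustrative assumptions, not part of the patent.

```python
def intensity_difference(left, right, eps=1e-12):
    """Right-to-left power ratio of one stereo audio clip (assumed form of Equation 4)."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    power_left = sum(x * x for x in left)    # summed squared left-channel samples
    power_right = sum(x * x for x in right)  # summed squared right-channel samples
    # eps is an assumed guard that avoids division by zero for a silent left channel.
    return power_right / (power_left + eps)


def intensity_differences(clips):
    """Apply the ratio to a list of (left, right) clips, giving one value per clip."""
    return [intensity_difference(left, right) for left, right in clips]
```

Calling `intensity_differences` on m clips yields the m scalar values that the clustering stage operates on.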

The arrival angle estimation unit 320 applies K-means clustering to the m intensity differences calculated by the intensity difference calculation unit 310 to obtain intensity difference centroids, and then uses those centroids to cluster the m intensity differences as shown in FIG. 6, thereby estimating the sound source arrival angle of each audio clip.

To this end, the arrival angle estimation unit 320 may include a setting unit 322, a centroid calculation unit 324, and a sound source arrival angle estimation unit 326.

The setting unit 322 sets the number of clusters (p) to be formed by K-means clustering and the value of each cluster. In other words, p is a parameter indicating how many classes the arrival angle is divided into. The setting unit 322 may thus set p arrival angles (p is an integer greater than 1) for estimating the sound source arrival angle of each of the m audio clips, and may assign an index to each arrival angle.

For example, if the user classifies the direction of the sound source into 0°, 60°, 120°, and 180°, the setting unit 322 sets the number of clusters to 4; conversely, if the number of clusters is set to 4, the direction (arrival angle) of the sound source is classified into 0°, 60°, 120°, and 180°. The index p=0 may then be assigned to 0°, p=1 to 60°, p=2 to 120°, and p=3 to 180°.

The centroid calculation unit 324 may apply the K-means clustering of [Equation 5] to the m calculated intensity differences to obtain an intensity difference centroid (g_p, p = 0, 1, 2, 3) for each of the p clusters corresponding to the p arrival angles.

[Equation 5: not preserved in this text]

In [Equation 5], m is the number of audio clips to be clustered by K-means, p is the number of clusters (or the index of a classified cluster), g_p is the intensity difference centroid of each cluster, and c_p(m) is a masking function that, when the data are divided into p clusters, indicates which cluster the intensity difference calculated by the intensity difference calculation unit 310 belongs to.

Referring to [Equation 5], the centroid calculation unit 324 may use the number p of arrival angles set in the setting unit 322 as the number of clusters, cluster the m intensity differences calculated by the intensity difference calculation unit 310 into p clusters, and calculate an intensity difference centroid for each of the p clusters.

That is, through K-means clustering as in [Equation 5], the centroid calculation unit 324 may classify and cluster the intensity differences of the audio clips by arrival angle, that is, by index p, and then calculate the centroid as the mean of the intensity differences belonging to the same cluster.
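[Equation 5] was an image and is not available in this text. One form consistent with the definitions above is the usual K-means pair of a nearest-centroid mask and a masked mean, written below as a hedged reconstruction. The symbol ILD_m for the intensity difference of the m-th clip and the exact shape of the mask are assumptions.

```latex
c_p(m) =
\begin{cases}
1, & p = \arg\min_{q} \left| \mathrm{ILD}_m - g_q \right| \\
0, & \text{otherwise}
\end{cases}
\qquad
g_p = \frac{\sum_{m} c_p(m)\, \mathrm{ILD}_m}{\sum_{m} c_p(m)}
```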

For example, the centroid calculation unit 324 may iterate over the intensity differences of all audio clips, assign each intensity difference to the cluster of its nearest centroid, and then recalculate each centroid so that it moves to the center of its cluster. Because a newly assigned intensity difference changes the centroid, the centroid calculation unit 324 may repeat the assignment according to the Euclidean distance between the updated centroids and the intensity differences. That is, it may repeat the recalculation and assignment until no intensity difference is newly assigned to a cluster (that is, until no intensity difference changes cluster because of an updated centroid).
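The assign-and-recompute loop described above can be sketched for scalar intensity differences as follows. This is a minimal illustration, not the patent's implementation: the function name, the evenly spaced initial centroids, the iteration cap, and the handling of empty clusters are all assumptions.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd-style K-means on scalar values; returns (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Assumed initialisation: spread the starting centroids evenly over the sorted values.
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest centroid
        # (in one dimension the Euclidean distance is the absolute difference).
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # no value changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned values.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return centroids, labels
```

The returned labels are cluster indices in the order of the initial centroids; relating them to physical angles is a separate step.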

The sound source arrival angle estimation unit 326 may take the arrival angle corresponding to the intensity difference centroid (g_p) calculated for each of the p clusters by the centroid calculation unit 324 as the estimated sound source arrival angle of the audio clips in the cluster to which that centroid belongs.
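The text does not spell out how a centroid is paired with one of the preset angles. One possible sketch, under the loudly stated assumption that the intensity difference varies monotonically with the arrival angle, is to sort the centroids and pair them with the sorted angles. The function name and the `ratio_increases_with_angle` flag are hypothetical.

```python
def angles_for_clips(centroids, labels, angles_deg, ratio_increases_with_angle=True):
    """Map each clip's cluster label to one of the preset arrival angles.

    Assumption (not from the patent): the intensity difference is monotonic in the
    arrival angle, so ranking the centroids ranks the angles as well.
    """
    if len(centroids) != len(angles_deg):
        raise ValueError("need exactly one preset angle per cluster")
    # Rank the clusters by centroid value, smallest first.
    cluster_order = sorted(range(len(centroids)), key=lambda c: centroids[c])
    # Pair them with the angles in ascending or descending order, as configured.
    angle_order = sorted(angles_deg, reverse=not ratio_increases_with_angle)
    cluster_to_angle = dict(zip(cluster_order, angle_order))
    return [cluster_to_angle[label] for label in labels]
```

Whether the ratio rises or falls with the angle depends on the microphone geometry and on which channel is in the numerator, so the flag has to be chosen to match the actual setup.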

FIG. 6 is a diagram showing the result of estimating the arrival angle for 50 audio clips located in the 0°, 60°, 120°, and 180° directions.

Referring to FIG. 6, the 0°, 60°, 120°, and 180° marked at the top are the actual sound source directions (ground truth) of the 50 audio clips initially input to the intensity difference calculation unit 310. The value shown for each clip is the intensity difference of the m-th audio clip (m = 1 to 50), and g₀, g₁, g₂, and g₃ are the intensity difference centroids of the clusters corresponding to 0°, 60°, 120°, and 180°, respectively. For example, the audio clips with m = 1 to 12 are grouped in the cluster of g₀, so their arrival angle is estimated to be 0°.

According to FIG. 6, the intensity differences are distributed in agreement with the actual (ground truth) directions of the sound sources. That is, the K-means clustering according to the embodiment of the present invention makes it possible to accurately estimate the arrival angle for all sound sources or all audio clips.

FIG. 7 is a flowchart showing a method by which an electronic device estimates a sound source arrival angle according to an embodiment of the present invention.

The electronic device that performs the sound source arrival angle estimation method of FIG. 7 may be the sound source arrival angle estimation apparatus 300 described with reference to FIGS. 3 to 6, or a computing system 800 that implements the apparatus 300 and the method.

Referring to FIG. 7, when m audio clips (m is an integer greater than 1) containing stereo sound signals for two channels are input, the electronic device may calculate the intensity difference between the two channels for each audio clip (S710). Step S710 may use [Equation 4].

The electronic device may estimate the sound source arrival angle of each audio clip by applying K-means clustering to the m intensity differences calculated in step S710 (S720).

In step S720, the electronic device may set p arrival angles (p is an integer greater than 1) for estimating the sound source arrival angle of each of the m audio clips (S722).

The electronic device may apply K-means clustering as in [Equation 5] to the m intensity differences calculated in step S710 to obtain an intensity difference centroid for each of the p clusters corresponding to the p arrival angles (S724).

The electronic device may take the arrival angle corresponding to each intensity difference centroid calculated per cluster in step S724 (g₀, g₁, g₂, and g₃ when p = 4) as the estimated sound source arrival angle of the audio clips in the cluster to which that centroid belongs (S726).

도 8은 본 발명의 일 실시 예에 따른 음원 도래각 추정 방법을 실행하는 컴퓨팅 시스템을 보여주는 블록도이다.8 is a block diagram illustrating a computing system that executes a method of estimating an angle of arrival of a sound source according to an embodiment of the present invention.

Referring to FIG. 8, the computing system 800 may include at least one processor 810, a memory 830, a user interface input device 840, a user interface output device 850, a storage 860, and a network interface 870, which are connected via a bus 820. The sound source arrival angle estimation apparatus 300 may be the computing system 800.

The processor 810 may be a central processing unit (CPU) or a semiconductor device that executes instructions stored in the memory 830 and/or the storage 860. The memory 830 and the storage 860 may include various types of volatile or nonvolatile storage media. For example, the memory 830 may include a read only memory (ROM) 831 and a random access memory (RAM) 832.

Accordingly, the steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by the processor 810, or in a combination of the two. The software module may reside in a storage medium (that is, the memory 830 and/or the storage 860) such as RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, or a CD-ROM. An exemplary storage medium is coupled to the processor 810, and the processor 810 can read information from and write information to the storage medium. Alternatively, the storage medium may be integral with the processor 810. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside in a user terminal as separate components.

Although all the components of the embodiments of the present invention have been described above as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as an independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be readily inferred by those skilled in the art. Such a computer program may be stored in computer-readable media and read and executed by a computer, thereby implementing the embodiments of the present invention.

Although the present invention has been described and illustrated above with reference to preferred embodiments that exemplify its technical idea, the present invention is not limited to the configuration and operation exactly as illustrated and described, and those skilled in the art will appreciate that numerous changes and modifications can be made to the present invention without departing from the scope of the technical idea. Accordingly, all such appropriate changes, modifications, and equivalents should be regarded as falling within the scope of the present invention. The true scope of technical protection of the present invention should therefore be determined by the technical idea of the appended claims.

300: sound source arrival angle estimation apparatus
310: intensity difference calculation unit
320: arrival angle estimation unit

Claims

A sound source arrival angle estimation apparatus comprising:
an intensity difference calculation unit configured to calculate, when m audio clips (m being an integer greater than 1) each including a stereo sound signal for two channels are input, an intensity difference between the two channels for each audio clip; and
an arrival angle estimation unit configured to estimate the sound source arrival angle of each audio clip by applying K-means clustering to the m intensity differences calculated by the intensity difference calculation unit,
wherein the intensity difference calculation unit calculates the intensity difference between the two channels for one audio clip using the following equation,

where

and

are the left and right samples of the stereo sound signal, respectively, and
wherein the arrival angle estimation unit comprises:
a setting unit configured to set p arrival angles (p being an integer greater than 1) for estimating the sound source arrival angle of each of the m audio clips;
a centroid calculation unit configured to calculate an intensity difference centroid for each of p clusters corresponding to the p arrival angles by applying K-means clustering to the calculated m intensity differences; and
a sound source arrival angle estimation unit configured to estimate the arrival angle corresponding to the intensity difference centroid calculated for each of the p clusters as the sound source arrival angle of the audio clips in the cluster to which that intensity difference centroid belongs.

(Deleted)

The apparatus of claim 1,
wherein the centroid calculation unit uses the number p of arrival angles set by the setting unit as the number of clusters, clusters the m intensity differences calculated by the intensity difference calculation unit into p clusters, and calculates an intensity difference centroid for each of the p clusters.

A sound source arrival angle estimation method comprising:
(A) calculating, by an electronic device, when m audio clips (m being an integer greater than 1) each including a stereo sound signal for two channels are input, an intensity difference between the two channels for each audio clip; and
(B) estimating, by the electronic device, the sound source arrival angle of each audio clip by applying K-means clustering to the m intensity differences calculated in step (A),
wherein, in step (A), the intensity difference between the two channels for one audio clip is calculated using the following equation,

where

and

are the left and right samples of the stereo sound signal, respectively, and
wherein step (B) comprises:
(B1) setting p arrival angles (p being an integer greater than 1) for estimating the sound source arrival angle of each of the m audio clips;
(B2) calculating an intensity difference centroid for each of p clusters corresponding to the p arrival angles by applying K-means clustering to the calculated m intensity differences; and
(B3) estimating the arrival angle corresponding to the intensity difference centroid calculated for each of the p clusters as the sound source arrival angle of the audio clips in the cluster to which that intensity difference centroid belongs.

(Deleted)