KR20140015893A

KR20140015893A - Apparatus and method for estimating location of sound source

Info

Publication number: KR20140015893A
Application number: KR1020120081970A
Authority: KR
Inventors: 김광윤; 최우현; 고한석
Original assignee: 삼성테크윈 주식회사; 고려대학교 산학협력단
Priority date: 2012-07-26
Filing date: 2012-07-26
Publication date: 2014-02-07
Also published as: KR101767925B1

Abstract

The present invention relates to an apparatus and a method for estimating a sound source location and, more specifically, to an apparatus and a method that improve the speed of sound source location estimation by reducing the amount of SRP-PHAT computation. The method comprises: building a database by generating a histogram from a plurality of abnormal sound frames of interest; applying a fast Fourier transform to a sound signal input over multiple channels; selecting a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the stored histogram that satisfy a predetermined condition; and calculating the SRP-PHAT for the signal corresponding to the selected frequency indices. Because the SRP-PHAT computation is restricted on the basis of the database storing the histogram generated from abnormal sound signals of interest, the amount of computation is reduced and the sound source location is estimated faster. [Reference numerals] (AA) Start; (BB) End; (S10) Generate a histogram from a plurality of abnormal sound frames of interest to build a database; (S20) Fast Fourier transform a sound signal input over multiple channels; (S30) Select a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the stored histogram that satisfy a predetermined condition; (S40) Calculate the SRP-PHAT for the signal corresponding to the selected frequency indices; (S50) Accumulate the calculated SRP-PHAT for each predetermined time period and estimate, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often

Description

Apparatus and method for estimating location of sound source

The present invention relates to an apparatus and method for estimating the location of a sound source, and more particularly, to an apparatus and method that improve the speed of sound source location estimation by reducing the amount of SRP-PHAT computation required for the estimation.

Sound source location estimation is a technology that identifies the positions of sound sources and speakers using acoustic sensors such as microphone arrays. It is applied in various engineering fields, such as the 3D virtual stereophonic sound industry, speaker recognition for humanoid intelligent robots, and military remote surveillance systems, where it creates considerable added value. Much research and development effort is devoted to building a microphone array by arranging multiple microphones in series or in parallel and analyzing the signals input to those microphones to determine the sound source location.

Such a sound source estimation algorithm assumes a difference in sound wave arrival time caused by the different propagation paths from an arbitrary sound source position to each microphone, compensates the input signal of each microphone accordingly, and then calculates the correlation between the signals. The angle is found from the arrival time difference that maximizes the correlation, and that angle is taken as the estimate of the sound source location.

Korean Unexamined Patent Publication No. 2009-0044314

The technical problem to be solved by the present invention is to provide a sound source location estimation apparatus and method that improve the speed of sound source location estimation by reducing the amount of SRP-PHAT computation, based on a database storing a histogram generated from abnormal sound signals of interest.

A sound source location estimation method according to an embodiment of the present invention for solving the above technical problem includes: building a database by generating a histogram from a plurality of abnormal sound frames of interest; applying a fast Fourier transform to a sound signal input over multiple channels; selecting a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the histogram stored in the database that satisfy a predetermined condition; and calculating the SRP-PHAT for the signal corresponding to the selected frequency indices.

In the present invention, the method further includes accumulating the calculated SRP-PHAT for each predetermined time period and estimating, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often.

In the present invention, building the database includes: collecting a plurality of abnormal sound signals such as screams, alarm sounds, the sound of breaking glass, or tire friction sounds; performing frame division and a fast Fourier transform on the collected abnormal sound signals; and generating a histogram from the fast Fourier transformed signal using frequency indices that satisfy a predetermined condition.

In the present invention, generating the histogram includes generating the histogram from the fast Fourier transformed signal using frequency indices whose energy is equal to or greater than a predetermined energy threshold.

In the present invention, generating the histogram includes: sorting the fast Fourier transformed signal by energy; and generating the histogram using the frequency indices of a predetermined number of top-ranked components after sorting.

In the present invention, selecting the frequency indices includes selecting as many frequency indices as can be processed by the system implementing the sound source location estimation method.

A sound source location estimation apparatus according to an embodiment of the present invention for solving the above technical problem includes: a database that stores a histogram generated from a plurality of abnormal sound frames of interest; a conversion unit that applies a fast Fourier transform to a sound signal input over multiple channels; a selection unit that selects a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the histogram stored in the database that satisfy a predetermined condition; and an SRP-PHAT calculation unit that calculates the SRP-PHAT for the signal corresponding to the selected frequency indices.

In the present invention, the apparatus further includes a sound source location estimation unit that accumulates the calculated SRP-PHAT for each predetermined time period and estimates, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often.

In the present invention, the database stores a histogram generated by collecting a plurality of abnormal sound signals such as screams, alarm sounds, the sound of breaking glass, or tire friction sounds, performing frame division and a fast Fourier transform on them, and then using the frequency indices that satisfy a predetermined condition.

In the present invention, the database stores a histogram generated from the fast Fourier transformed signal using frequency indices whose energy is equal to or greater than a predetermined energy threshold.

In the present invention, the database stores a histogram generated by sorting the fast Fourier transformed signal by energy and using the frequency indices of a predetermined number of top-ranked components after sorting.

In the present invention, the selection unit selects as many frequency indices as can be processed by the apparatus.

As described above, according to the present invention, the speed of sound source location estimation can be improved by reducing the amount of SRP-PHAT computation, based on a database storing a histogram generated from abnormal sound signals of interest.

In addition, because the SRP-PHAT operation is performed only on a predetermined number of selected frequency indices, erroneous sound source location estimates caused by noise can be reduced.

FIG. 1 is a block diagram showing the configuration of a sound source location estimation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing histograms stored in the database of FIG. 1.
FIG. 3 is a flowchart showing the operation of a sound source location estimation method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a block diagram showing the configuration of a sound source location estimation apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the sound source location estimation apparatus 10 includes a database 100, a conversion unit 200, a selection unit 300, an SRP-PHAT calculation unit 400, and a sound source location estimation unit 500.

The database 100 stores a histogram generated from a plurality of abnormal sound frames of interest. Here, the abnormal sound signals of interest may be sound signals that occur in dangerous situations, such as a man's or woman's scream, an alarm sound, the sound of breaking glass, or a tire friction sound. Because each type of abnormal sound signal has its own characteristic frequency content, a histogram is generated as a statistical analysis tool to capture these characteristics, and the database 100 is built from it.

To build the database 100, a plurality of abnormal sound signals such as screams, alarm sounds, the sound of breaking glass, or tire friction sounds are collected, the collected signals are divided into frames, and a fast Fourier transform is performed on each frame. A histogram is then generated from the fast Fourier transformed signal using the frequency indices that satisfy a predetermined condition, and the database 100 is built from it. A frequency index that satisfies the predetermined condition may be a frequency index whose energy in the fast Fourier transformed signal is equal to or greater than a predetermined energy threshold. Alternatively, it may be one of a predetermined number of top-ranked frequency indices after the fast Fourier transformed signal has been sorted by energy.
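The database-building procedure with the energy-threshold condition can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the function names, the frame length, the hop size, and the threshold value are assumptions, and a naive DFT stands in for the fast Fourier transform to keep the sketch free of external libraries.

```python
import cmath
import math
from collections import Counter


def split_frames(samples, frame_len, hop):
    """Split a 1-D sample list into frames of equal length (hypothetical helper)."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def dft_energy(frame):
    """Return |X[k]|^2 for k = 0 .. len(frame)//2 using a naive DFT (clarity over speed)."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(x_k) ** 2)
    return energies


def build_histogram(recordings, frame_len, hop, energy_threshold):
    """Count, per frequency index, how many frames reach the energy threshold."""
    histogram = Counter()
    for samples in recordings:
        for frame in split_frames(samples, frame_len, hop):
            for k, energy in enumerate(dft_energy(frame)):
                if energy >= energy_threshold:
                    histogram[k] += 1
    return histogram
```

For a synthetic tone that falls exactly on frequency index 4 of a 32-sample frame, every frame contributes one count to index 4 and none to the other indices, which is the kind of peaked histogram the text attributes to sounds with characteristic frequency content.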

FIG. 2A shows one type of histogram stored in the database 100: for a plurality of fast Fourier transformed abnormal sound signals, it plots the count of each frequency index whose energy is equal to or greater than the predetermined energy threshold. FIG. 2B shows another type of histogram stored in the database 100: the fast Fourier transformed abnormal sound signals are sorted by energy, and the count of each frequency index among the predetermined number of top-ranked indices is plotted. As an example, FIGS. 2A and 2B show histograms for an alarm sound, a woman's scream, a man's scream, and their average, but the invention is not limited to these.
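The second histogram type (FIG. 2B) counts how often each frequency index ranks among the highest-energy indices of a frame. A minimal sketch, assuming the per-frame energy spectra have already been computed; the function names and the parameter `k` are hypothetical:

```python
from collections import Counter


def top_k_indices(energies, k):
    """Return the k frequency indices with the largest energy in one frame."""
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return order[:k]


def build_topk_histogram(frame_energies, k):
    """Count how often each frequency index ranks in the top k across all frames."""
    histogram = Counter()
    for energies in frame_energies:
        histogram.update(top_k_indices(energies, k))
    return histogram
```

Unlike the threshold variant, this one contributes exactly `k` counts per frame regardless of the overall signal level, so quiet and loud recordings weigh equally in the histogram.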

The frequency indices stored as a histogram in the database 100 in this way are later used as a selection control signal when frequency indices are selected for an actual input sound signal.

The conversion unit 200 applies a fast Fourier transform to the multi-channel sound signal input through a plurality of microphones (not shown).

The selection unit 300 selects a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the histogram stored in the database 100 that satisfy a predetermined condition. The number of frequency indices selected by the selection unit 300 equals the number of indices that the system implementing the sound source location estimation method is able to process.
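One way to picture the selection step is shown below. The text does not state what the "predetermined condition" on the histogram is, so this sketch assumes a minimum count plus ranking by count; `max_indices` stands for the number of indices the system can process, and all names are hypothetical.

```python
def select_frequency_indices(histogram, max_indices, min_count=1):
    """Pick up to max_indices frequency indices, highest histogram count first.

    histogram: mapping of frequency index -> count (as stored in the database).
    """
    candidates = [(count, k) for k, count in histogram.items() if count >= min_count]
    # Sort by descending count; break ties by ascending index for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(k for _, k in candidates[:max_indices])
```

The returned list plays the role of the selection control signal: only the spectrum values at these indices are passed on to the SRP-PHAT calculation.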

The SRP-PHAT calculation unit 400 calculates the SRP-PHAT for the signal corresponding to the selected frequency indices.

A typical SRP-PHAT calculation proceeds as follows. A generalized cross correlation (GCC) is calculated for the sound acquired by each combination of two different microphones, that is, each microphone pair. The cross correlation here reflects the difference in arrival time of the sound received from the source at each of the two microphones. The GCC is given by Equation 1 below.

[Equation 1]
$$R_{ij}(\tau) = \int_{-\infty}^{\infty} \psi_{ij}(\omega)\, X_i(\omega)\, X_j^{*}(\omega)\, e^{j\omega\tau}\, d\omega$$

In Equation 1, $\psi_{ij}(\omega)$ denotes the weight function, $X_i(\omega)$ is the frequency-domain value of the signal input through the i-th microphone, and $X_j^{*}(\omega)$ is the complex conjugate of the frequency-domain value of the signal input through the j-th microphone. As Equation 1 shows, the GCC is calculated over all frequencies (-∞ to ∞) of the signals input to the microphone pair, so the amount of computation is large and the calculation is slow.

In contrast, in the present embodiment the GCC is computed only for the selected frequency indices, as given by Equation 2 below.

[Equation 2]
$$R_{ij}(\tau) = \sum_{\omega \in B_{sel}} \psi_{ij}(\omega)\, X_i(\omega)\, X_j^{*}(\omega)\, e^{j\omega\tau}$$

In Equation 2, $\psi_{ij}(\omega)$ denotes the weight function, $X_i(\omega)$ is the frequency-domain value of the signal input through the i-th microphone, and $X_j^{*}(\omega)$ is the complex conjugate of the frequency-domain value of the signal input through the j-th microphone. In particular, $B_{sel}$ in Equation 2 denotes the selected frequency indices. Comparing the two, the amount of GCC computation in Equation 2 is smaller than in Equation 1.
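The restricted GCC can be sketched for a discrete spectrum as follows. This is an illustration under stated assumptions rather than the patent's implementation: the PHAT weight is used as the weight function, delays are expressed in samples, only the real part of each term is kept because the sum runs over non-negative frequency indices only, and the function and parameter names are hypothetical.

```python
import cmath
import math


def gcc_phat_selected(spec_i, spec_j, selected, delay, n_fft, eps=1e-12):
    """PHAT-weighted GCC for one delay (in samples), summed over selected bins only.

    spec_i, spec_j: complex DFT values of microphones i and j, indexed by bin.
    """
    total = 0.0
    for k in selected:
        cross = spec_i[k] * spec_j[k].conjugate()
        weight = 1.0 / max(abs(cross), eps)  # PHAT: keep the phase, drop the magnitude
        omega = 2.0 * math.pi * k / n_fft    # angular frequency of bin k in rad/sample
        total += (weight * cross * cmath.exp(1j * omega * delay)).real
    return total
```

With the convention $X_i X_j^{*} e^{j\omega\tau}$, a signal that reaches microphone j three samples after microphone i produces a peak at a delay of -3, and the peak value equals the number of selected bins.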

For each microphone pair, the GCC is calculated only for the selected frequency indices, and the SRP over the plurality of microphone pairs is then calculated as given by Equation 3 below.

[Equation 3]
$$P(q) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} R_{ij}\big(\tau_{ij}(q)\big)$$

Here, $q$ is the coordinate of a candidate sound source position, $N$ is the number of microphones, $\tau_{ij}(q)$ is the difference in signal arrival time between the two microphones, and $R_{ij}$ is the GCC value described above. The SRP algorithm can be interpreted as a time-difference estimation technique that uses the GCC of all microphone combinations.
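The SRP summation over microphone pairs can be sketched as below. The sketch assumes that the pairwise delays for a candidate position have been precomputed from the array geometry and are passed in as `pair_delays` (a hypothetical parameter, in samples), and it uses the same PHAT-weighted, selected-bin GCC convention as Equation 2.

```python
import cmath
import math
from itertools import combinations


def pair_gcc_phat(spec_i, spec_j, selected, delay, n_fft):
    """PHAT-weighted GCC for one microphone pair, summed over the selected bins."""
    total = 0.0
    for k in selected:
        cross = spec_i[k] * spec_j[k].conjugate()
        if abs(cross) > 0.0:
            phase = cross / abs(cross)
            total += (phase * cmath.exp(2j * math.pi * k * delay / n_fft)).real
    return total


def srp_phat(spectra, selected, pair_delays, n_fft):
    """Sum the pairwise GCC values for one candidate position q.

    spectra: one spectrum per microphone, indexed by bin.
    pair_delays[(i, j)]: tau_ij(q) in samples for this candidate, with i < j.
    """
    return sum(
        pair_gcc_phat(spectra[i], spectra[j], selected, pair_delays[(i, j)], n_fft)
        for i, j in combinations(range(len(spectra)), 2)
    )
```

Evaluating `srp_phat` for every candidate position and taking the largest value gives the per-frame location estimate; the candidate whose pairwise delays match the true arrival-time differences reaches the maximum possible value, the number of pairs times the number of selected bins.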

In particular, when the weight function is set to

$$\psi_{ij}(\omega) = \frac{1}{\left| X_i(\omega)\, X_j^{*}(\omega) \right|}$$

the power of the input signals is normalized and only the phase difference is used; this weighting is called PHAT (phase transform). PHAT weighting is known to be robust to reverberation, and the SRP algorithm with this weighting applied is called SRP-PHAT.
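The effect of the PHAT weight can be made explicit with a short derivation. Writing each spectrum in polar form, $X_i(\omega) = |X_i(\omega)|\, e^{j\phi_i(\omega)}$ (the symbol $\phi_i$ is introduced here for illustration only), the weighted cross-spectrum term reduces to a unit-magnitude phasor:

```latex
\psi_{ij}(\omega)\, X_i(\omega)\, X_j^{*}(\omega)
  = \frac{|X_i(\omega)|\,|X_j(\omega)|\; e^{j(\phi_i(\omega) - \phi_j(\omega))}}
         {|X_i(\omega)|\,|X_j(\omega)|}
  = e^{j(\phi_i(\omega) - \phi_j(\omega))}
```

Every frequency therefore contributes with equal magnitude, and only the phase difference between the two microphones carries information about the arrival-time difference.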

The sound source location estimation unit 500 accumulates the SRP-PHAT calculated for each predetermined time period and estimates, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often. When there is one sound source, one such direction is obtained; when there are multiple sound sources, multiple such directions are obtained.
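The accumulation step can be read in more than one way. The sketch below assumes one interpretation: within a time window, each frame votes for the direction with its maximum SRP-PHAT, and the directions with the most votes are reported. The function name, the `n_sources` parameter, and the use of degrees for directions are assumptions.

```python
from collections import Counter


def estimate_directions(srp_per_frame, n_sources=1):
    """Return the n_sources directions that most often had the maximum SRP-PHAT.

    srp_per_frame: list of dicts mapping candidate direction -> SRP-PHAT value,
    one dict per frame in the accumulation window.
    """
    votes = Counter(max(frame, key=frame.get) for frame in srp_per_frame)
    return [direction for direction, _ in votes.most_common(n_sources)]
```

With `n_sources=1` the function returns a single direction, matching the single-source case in the text; a larger value returns several directions in descending order of votes.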

In this way, the speed of sound source location estimation can be improved by reducing the amount of SRP-PHAT computation, based on a database storing a histogram generated from abnormal sound signals of interest.

FIG. 3 is a flowchart showing the operation of a sound source location estimation method according to an embodiment of the present invention. In the following description, parts that overlap with the description of FIGS. 1 and 2 are omitted.

Referring to FIG. 3, the sound source location estimation apparatus 10 performs a step (S10) of building a database by generating a histogram from a plurality of abnormal sound frames of interest. To build the database, a plurality of abnormal sound signals such as screams, alarm sounds, the sound of breaking glass, or tire friction sounds are collected, the collected signals are divided into frames, and a fast Fourier transform is performed on each frame. A histogram is then generated from the fast Fourier transformed signal using the frequency indices that satisfy a predetermined condition, and the database is built from it. A frequency index that satisfies the predetermined condition may be a frequency index whose energy in the fast Fourier transformed signal is equal to or greater than a predetermined energy threshold. Alternatively, it may be one of a predetermined number of top-ranked frequency indices after the fast Fourier transformed signal has been sorted by energy.

Once the database has been built, the sound source location estimation apparatus 10 performs a step (S20) of applying a fast Fourier transform to the multi-channel sound signal input through a plurality of microphones (not shown).

Next, the sound source location estimation apparatus 10 performs a step (S30) of selecting a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the histogram stored in the database that satisfy a predetermined condition. The number of frequency indices selected equals the number of indices that the system implementing the sound source location estimation method is able to process.

Once the frequency indices have been selected, the sound source location estimation apparatus 10 performs a step (S40) of calculating the SRP-PHAT for the signal corresponding to the selected frequency indices. To calculate the SRP-PHAT, the GCC operation is performed for each microphone pair, followed by the SRP-PHAT operation. Because the GCC operation is performed only on the selected frequency indices, as in Equation 2, it requires less computation and runs faster than Equation 1, which performs the GCC operation over the entire frequency range.
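The size of the saving can be illustrated by counting the per-bin GCC terms evaluated for one frame. The figures used below (4 microphones, 257 bins from a 512-point transform, 32 selected bins, 360 candidate directions) are hypothetical and do not come from the patent.

```python
def gcc_term_count(n_mics, n_bins, n_candidates):
    """Number of per-bin GCC terms evaluated for one frame over all microphone pairs."""
    n_pairs = n_mics * (n_mics - 1) // 2
    return n_pairs * n_bins * n_candidates


full = gcc_term_count(n_mics=4, n_bins=257, n_candidates=360)     # all bins
reduced = gcc_term_count(n_mics=4, n_bins=32, n_candidates=360)   # selected bins only
```

Under these assumed figures the count drops from 555,120 to 69,120 terms per frame. The cost scales linearly with the number of bins, so the saving is simply the ratio of total bins to selected bins.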

Once the SRP-PHAT has been calculated, the sound source location estimation apparatus 10 performs a step (S50) of accumulating the SRP-PHAT calculated for each predetermined time period and estimating, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often.

The present invention has been described above with reference to preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of their equivalents should be construed as being included in the present invention.

100: database
200: conversion unit
300: selection unit
400: SRP-PHAT calculation unit
500: sound source location estimation unit

Claims

1. A sound source location estimation method comprising:
building a database by generating a histogram from a plurality of abnormal sound frames of interest;
applying a fast Fourier transform to a sound signal input over multiple channels;
selecting a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the histogram stored in the database that satisfy a predetermined condition; and
calculating an SRP-PHAT for the signal corresponding to the selected frequency indices.

2. The method of claim 1, further comprising:
accumulating the calculated SRP-PHAT for each predetermined time period, and estimating, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often.

3. The method of claim 1, wherein building the database comprises:
collecting a plurality of abnormal sound signals such as screams, alarm sounds, the sound of breaking glass, or tire friction sounds;
performing frame division and a fast Fourier transform on the collected abnormal sound signals; and
generating a histogram from the fast Fourier transformed signal using frequency indices that satisfy a predetermined condition.

4. The method of claim 3, wherein generating the histogram comprises:
generating the histogram from the fast Fourier transformed signal using frequency indices whose energy is equal to or greater than a predetermined energy threshold.

5. The method of claim 3, wherein generating the histogram comprises:
sorting the fast Fourier transformed signal by energy; and
generating the histogram using the frequency indices of a predetermined number of top-ranked components after sorting.

6. The method of claim 1, wherein selecting the frequency indices comprises:
selecting as many frequency indices as can be processed by a system implementing the sound source location estimation method.

7. A sound source location estimation apparatus comprising:
a database that stores a histogram generated from a plurality of abnormal sound frames of interest;
a conversion unit that applies a fast Fourier transform to a sound signal input over multiple channels;
a selection unit that selects a predetermined number of frequency indices from the fast Fourier transformed signal, using as a selection control signal the frequency indices in the histogram stored in the database that satisfy a predetermined condition; and
an SRP-PHAT calculation unit that calculates an SRP-PHAT for the signal corresponding to the selected frequency indices.

8. The apparatus of claim 7, further comprising:
a sound source location estimation unit that accumulates the calculated SRP-PHAT for each predetermined time period and estimates, as the sound source location, the direction in which the maximum SRP-PHAT is accumulated most often.

9. The apparatus of claim 7, wherein the database
stores a histogram generated by collecting a plurality of abnormal sound signals such as screams, alarm sounds, the sound of breaking glass, or tire friction sounds, performing frame division and a fast Fourier transform on them, and then using frequency indices that satisfy a predetermined condition.

10. The apparatus of claim 9, wherein the database
stores a histogram generated from the fast Fourier transformed signal using frequency indices whose energy is equal to or greater than a predetermined energy threshold.

11. The apparatus of claim 9, wherein the database
stores a histogram generated by sorting the fast Fourier transformed signal by energy and using the frequency indices of a predetermined number of top-ranked components after sorting.

12. The apparatus of claim 7, wherein the selection unit
selects as many frequency indices as can be processed by the apparatus.