KR20130047221A

KR20130047221A - Apparatus and method for estimating sound source

Info

Publication number: KR20130047221A
Application number: KR1020110112112A
Authority: KR
Inventors: 최종석; 강해용
Original assignee: 한국과학기술연구원
Priority date: 2011-10-31
Filing date: 2011-10-31
Publication date: 2013-05-08
Also published as: KR101269189B1

Abstract

PURPOSE: A sound source estimating device and a method thereof are provided to accurately derive the difference in arrival time of the sound received by sound receiving units and the direction angle of the sound source. CONSTITUTION: A sound source estimating device (100) comprises a cross-correlation coefficient generation unit (110), a cross-correlation coefficient correction unit (120), a cross-correlation coefficient filtering unit (130), a delay time calculation unit (140), a sound source direction calculation unit (150), a sound source direction filtering unit (160), and a summed sound source direction calculation unit (170). The cross-correlation coefficient generation unit generates the cross-correlation coefficient of a first sound signal received by a first sound receiving unit and a second sound signal received by a second sound receiving unit, among a plurality of sound receiving units. The cross-correlation coefficient correction unit adds time delays before and after the first sound signal and the second sound signal to correct the cross-correlation coefficient. The cross-correlation coefficient filtering unit filters the corrected cross-correlation function to remove external noise. The delay time calculation unit calculates the delay time from the first sound receiving unit and the second sound receiving unit to the sound source based on the filtered cross-correlation coefficient. The sound source direction calculation unit calculates direction information from a sound reception center to the sound source based on the delay time. [Reference numerals] (110) Cross-correlation coefficient generation unit; (120) Cross-correlation coefficient correction unit; (130) Cross-correlation coefficient filtering unit; (140) Delay time calculation unit; (150) Sound source direction calculation unit; (160) Sound source direction filtering unit; (170) Summed sound source direction calculation unit

Description

Apparatus and method for estimating sound source

Embodiments of the present invention relate to an apparatus and method for estimating a sound source, and more particularly, to an apparatus and method for accurately estimating a sound source through a plurality of sound receivers.

With the recent increase in late-night crime, security cameras have grown in importance, and demand for installing them is rising accordingly. However, existing security cameras watch a fixed direction, so there are blind spots that cameras alone cannot monitor, and the security function cannot be fully achieved. To make up for this drawback, research is under way on estimating the source of abnormal alarm sounds using a microphone, one example of a sound receiving device. An alarm-sound-source-estimating security camera is a system that, when an abnormal signal is detected, estimates that signal and monitors it. Korean Registered Patent No. 10-0958932 relates to an intrusion detection apparatus and method using three-dimensional sound source localization. When a breaking sound occurs during an intrusion into a guarded area, it measures the differences in arrival time of the sound at each 3D microphone, calculates the distance differences from the sound source, computes the three-dimensional position of the sound source from the distance differences by triangulation, and determines whether an intrusion has occurred by checking whether that position lies within the guarded area.

However, because an outdoor environment contains a very wide variety of noise, the performance of a sound source estimation device is degraded. Specifically, many errors arise both in accurately deriving the differences in arrival time of the sound reaching each sound receiver and in optimizing and predicting the direction angle of the sound source, which makes accurate sound source estimation difficult.

Korean Registered Patent No. 10-0958932

Jean-Marc Valin, Francois Michaud, Jean Rouat, Dominic Letourneau, "Robust Sound Source Localization Using a Microphone Array on a Mobile Robot", 2003, IEEE/RSJ

According to an aspect of the present invention, the difference in arrival time of the sound received by the sound receivers can be derived accurately.

According to an aspect of the present invention, the direction angle of the sound source can be derived accurately.

Thus, compared with the prior art, the noise ratio is minimized and the sound source can be estimated accurately.

According to an aspect of the present invention, there is provided a sound source estimation apparatus including: a cross-correlation coefficient generation unit that generates a cross-correlation coefficient between a first sound signal received by a first sound receiver and a second sound signal received by a second sound receiver, among a plurality of sound receivers; a delay time calculation unit that calculates a delay time from the first sound receiver and the second sound receiver to a sound source based on the generated cross-correlation coefficient; and a sound source direction calculation unit that calculates direction information from a sound reception center to the sound source based on the delay time.

According to another aspect of the present invention, there is provided a sound source estimation apparatus in which the cross-correlation coefficient generation unit further generates a cross-correlation coefficient for the sound signals received by each of one or more arbitrary pairs of sound receivers among the plurality of sound receivers, and the sound reception center is determined based on the positions of the plurality of sound receivers.

According to another aspect of the present invention, there is provided a sound source estimation apparatus further including a cross-correlation coefficient correction unit that corrects the cross-correlation function so as to remove cyclic noise of the sound signal.

According to another aspect of the present invention, there is provided a sound source estimation apparatus in which the cross-correlation coefficient correction unit corrects the cross-correlation coefficient by adding time delays before and after the first sound signal and the second sound signal.

According to another aspect of the present invention, there is provided a sound source estimation apparatus further including a cross-correlation coefficient filtering unit that filters the corrected cross-correlation function so as to remove external noise.

According to another aspect of the present invention, there is provided a sound source estimation apparatus in which the cross-correlation coefficient filtering unit applies window filtering to the cross-correlation coefficient.

According to another aspect of the present invention, there is provided a sound source estimation apparatus in which the window filtering is Kaiser window filtering.

According to another aspect of the present invention, there is provided a sound source estimation apparatus in which the cross-correlation coefficient generation unit generates a cross-correlation coefficient for each of a plurality of time frames.

According to another aspect of the present invention, there is provided a sound source estimation apparatus further including a summed sound source direction calculation unit that sums a plurality of pieces of direction information from the sound reception center to the sound source to calculate summed direction information from the sound reception center to the sound source.

According to another aspect of the present invention, there is provided a sound source estimation apparatus further including a sound source direction filtering unit that filters the plurality of pieces of direction information.

According to another aspect of the present invention, there is provided a sound source estimation apparatus in which the sound source direction filtering unit is a triangular window.

According to yet another aspect of the present invention, there is provided a sound source estimation method including: generating a cross-correlation coefficient between a first sound signal received by a first sound receiver and a second sound signal received by a second sound receiver, among a plurality of sound receivers; calculating a delay time from the first sound receiver and the second sound receiver to a sound source based on the generated cross-correlation coefficient; and calculating direction information from a sound reception center to the sound source based on the delay time.

According to another aspect of the present invention, there is provided a sound source estimation method in which generating the cross-correlation coefficient further includes generating a cross-correlation coefficient for the sound signals received by each of one or more arbitrary pairs of sound receivers among the plurality of sound receivers, and the sound reception center is determined based on the positions of the plurality of sound receivers.

According to another aspect of the present invention, there is provided a sound source estimation method in which generating the cross-correlation coefficient further includes correcting the cross-correlation function so as to remove cyclic noise of the sound signal.

According to another aspect of the present invention, there is provided a sound source estimation method in which correcting the cross-correlation function further includes correcting the cross-correlation coefficient by adding time delays before and after the first sound signal and the second sound signal.

According to another aspect of the present invention, there is provided a sound source estimation method in which correcting the cross-correlation function further includes filtering the corrected cross-correlation function so as to remove external noise.

According to another aspect of the present invention, there is provided a sound source estimation method in which filtering the cross-correlation function further includes applying window filtering to the cross-correlation coefficient.

According to another aspect of the present invention, there is provided a sound source estimation method in which applying window filtering to the cross-correlation coefficient includes Kaiser window filtering.

According to another aspect of the present invention, there is provided a sound source estimation method in which generating the cross-correlation coefficient further includes generating a cross-correlation coefficient for each of a plurality of time frames.

According to another aspect of the present invention, there is provided a sound source estimation method in which calculating the direction information from the sound reception center to the sound source further includes summing a plurality of pieces of direction information from the sound reception center to the sound source to calculate summed direction information from the sound reception center to the sound source.

According to another aspect of the present invention, there is provided a sound source estimation method in which calculating the direction information from the sound reception center to the sound source further includes filtering the plurality of pieces of direction information.

According to another aspect of the present invention, there is provided a sound source estimation method in which filtering the plurality of pieces of direction information includes filtering with a triangular window.

According to an aspect of the present invention, the various noises of an outdoor environment are removed and the direction of the sound source is estimated using variables attributable to the actual sound source.

In addition, errors arising from the cyclic influence of the signals received by the plurality of sound receivers can be removed. Thus, compared with the prior art, the sound source can be estimated accurately.

FIG. 1 is a diagram showing the internal configuration of a sound source estimation apparatus (100) according to an embodiment of the present invention.
FIG. 2 is a diagram showing a security camera fitted with sound receivers and a coordinate diagram of the sound receivers according to an embodiment of the present invention.
FIG. 3 is a diagram showing a coordinate system for estimating the direction of a sound source according to an embodiment of the present invention.
FIG. 4A is a diagram showing a cross-correlation coefficient graph according to an embodiment of the present invention.
FIG. 4B is a diagram showing a corrected cross-correlation coefficient graph according to an embodiment of the present invention.
FIG. 5A is a diagram showing a cross-correlation coefficient graph before window filtering according to an embodiment of the present invention.
FIG. 5B is a diagram showing a cross-correlation coefficient graph after window filtering according to an embodiment of the present invention.
FIG. 6 is a diagram showing a graph of a Kaiser window filtering function according to an embodiment of the present invention.
FIG. 7 is a diagram showing the operating principle of the sound source direction calculation unit according to an embodiment of the present invention.
FIG. 8 is a diagram showing the difference in the calculated summed direction information before and after applying the sound source direction filtering unit according to an embodiment of the present invention.
FIG. 9 is a flowchart of a sound source estimation method according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present invention, if properly described, is limited only by the appended claims, along with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several views.

FIG. 1 is a diagram showing the internal configuration of a sound source estimation apparatus according to an embodiment of the present invention. In one embodiment, the sound source estimation apparatus (100) may include a cross-correlation coefficient generation unit (110), a cross-correlation coefficient correction unit (120), a cross-correlation coefficient filtering unit (130), a delay time calculation unit (140), a sound source direction calculation unit (150), a sound source direction filtering unit (160), and a summed sound source direction calculation unit (170).

The cross-correlation coefficient generation unit (110) generates a cross-correlation coefficient between a first sound signal received by a first sound receiver and a second sound signal received by a second sound receiver, among a plurality of sound receivers. It may also generate a cross-correlation coefficient for the sound signals received by each of one or more arbitrary pairs of sound receivers among the plurality. The left part of FIG. 2 shows a security camera fitted with four sound receivers. Estimating a sound signal from a source in three dimensions requires a plurality of sound receivers. Generally three, or four or more, sound receivers are used, and each receives the sound signal generated by the source. Choosing any two of the four sound receivers yields six independent cross-correlation coefficients (₄C₂ = 6). A sound reception center can also be determined based on the positions of the plurality of sound receivers; it serves as the reference for later determining the direction of the sound source. In the right part of FIG. 2, the four sound receivers are placed in a three-dimensional coordinate system whose origin is taken as the sound reception center.
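As a small illustration of the pair count, the sketch below enumerates the unordered pairs of four receivers. The receiver labels are hypothetical and are not taken from the source.

```python
from itertools import combinations

# Hypothetical labels; the source only states that four sound receivers are used.
receivers = ["mic1", "mic2", "mic3", "mic4"]

# Each unordered pair of receivers yields one cross-correlation coefficient sequence.
pairs = list(combinations(receivers, 2))

for first, second in pairs:
    print(first, second)

print(len(pairs))  # 4C2 = 6
```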

The process of obtaining the cross-correlation coefficient is described below. To estimate the sound source, the difference in arrival time of the signals received by the sound receivers must be calculated. In one embodiment, signals of N (256) samples obtained from two microphones at a sampling frequency of 8 kHz are compared, and their time difference is calculated in real time while overlapping frames by 64 samples. To judge the similarity between the two signals, the most commonly used measure, the cross-correlation of Equation 1, is used.
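The framing and time-domain correlation described above can be sketched roughly as follows. This is a minimal illustration, not the patent's Equation 1: it assumes that "64 overlapped samples" means consecutive 256-sample frames share 64 samples, and it uses the textbook sum R(lag) = sum over n of x[n] * y[n - lag]. The function names and the synthetic test signal are hypothetical.

```python
import random

def split_frames(samples, frame_len=256, overlap=64):
    """Split samples into frames of frame_len that share `overlap` samples (assumed reading)."""
    hop = frame_len - overlap
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def cross_correlation(x, y, max_lag):
    """Direct time-domain sum R(lag) = sum_n x[n] * y[n - lag] for |lag| <= max_lag."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i - lag
            if 0 <= j < n:
                total += x[i] * y[j]
        result[lag] = total
    return result

# Synthetic check: the same noise burst reaches the second receiver 3 samples later.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(256)]
delayed = [0.0] * 3 + reference[:-3]

corr = cross_correlation(delayed, reference, max_lag=8)
best_lag = max(corr, key=corr.get)  # lag with the largest coefficient
```

The double loop makes the cost grow with the frame length times the number of lags, which is the complexity concern raised for the time-domain approach.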

However, the time-domain method of Equation 1 has the disadvantage of higher complexity than Equation 2, which uses the inverse Fourier transform of the cross spectrum.

Given that the system must detect abnormal signals in real time in an outdoor environment, cross-correlation in the time domain is not suitable. The correlation using the cross spectrum is given by Equation 2, which involves the Fourier transforms of the two microphone signals and their cross spectrum. The cross-correlation coefficient is the coefficient value obtained by taking the inverse Fourier transform of the cross spectrum.
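The frequency-domain route can be illustrated with a short sketch. It assumes the standard definition of the cross spectrum as the product of one transform with the complex conjugate of the other; the source's Equation 2 is not reproduced here, so this may differ from it in notation or scaling. A naive DFT is used only to keep the example free of dependencies; a real-time system would use an FFT.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2)); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_cross_correlation(x, y):
    """R[lag] = sum_n x[n] * y[(n - lag) mod N], computed from the cross spectrum X * conj(Y)."""
    spectrum_x, spectrum_y = dft(x), dft(y)
    cross_spectrum = [a * b.conjugate() for a, b in zip(spectrum_x, spectrum_y)]
    return [value.real for value in idft(cross_spectrum)]

# x is y delayed by two samples, so the coefficient peaks at lag index 2.
y = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
coefficients = circular_cross_correlation(x, y)
peak_index = max(range(len(coefficients)), key=coefficients.__getitem__)
```

Note that the result is circular: lags are taken modulo the frame length, which is the effect the correction unit described next is meant to remove.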

The cross-correlation coefficient correction unit (120) corrects the cross-correlation coefficient by adding time delays before and after the first sound signal and the second sound signal. The Fourier transform of a signal with N (256) samples treats the start and end points of the signal as continuous. If the cross spectrum of the two signals is computed in this state, the signals affect each other cyclically. To remove this effect, N (256) samples are appended before and after each of the two signals, and a cross-correlation function with time delays from -N+1 to N+1 is used. FIG. 4A shows a cross-correlation coefficient graph according to an embodiment of the present invention, and FIG. 4B shows the corrected cross-correlation coefficient graph. The corrected cross-correlation coefficient removes errors caused by the cyclic effect.
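The wrap-around problem and its removal can be seen on a toy frame. The sketch below assumes the appended samples are zeros, which is one plausible reading of the text and is not confirmed by the source; the 8-sample frame and the helper names are illustrative. The circular sum is written directly in the time domain, which gives the same values as the cross-spectrum route.

```python
def circular_correlation(x, y):
    """R[lag] = sum_n x[n] * y[(n - lag) mod N], the quantity the DFT route yields."""
    n = len(x)
    return [sum(x[i] * y[(i - lag) % n] for i in range(n)) for lag in range(n)]

def zero_pad(frame):
    """Place len(frame) zeros before and after the frame (assumed reading of the source)."""
    n = len(frame)
    return [0.0] * n + list(frame) + [0.0] * n

def signed_peak_lag(correlation):
    """Index of the largest coefficient, with indices above half the length read as negative lags."""
    length = len(correlation)
    index = max(range(length), key=correlation.__getitem__)
    return index if index <= length // 2 else index - length

x = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # pulse at sample 1
y = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # pulse at sample 6, so x leads y by 5 samples

wrapped_lag = signed_peak_lag(circular_correlation(x, y))                      # wraps to +3
padded_lag = signed_peak_lag(circular_correlation(zero_pad(x), zero_pad(y)))   # true lag, -5
```

Without padding, the lag of -5 is indistinguishable from +3 in an 8-sample circular correlation; with padding, every lag of magnitude below the frame length maps to a distinct index.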

The cross-correlation coefficient filtering unit (130) filters the corrected cross-correlation function to remove external noise. In one embodiment, assuming a sound propagation speed of 340 m/s at room temperature and a maximum microphone spacing of 120 mm, [expression not reproduced] can mostly be ignored. In addition, considering the countless noises in an outdoor environment, window filtering can be applied to the cross-correlation coefficients. Various window filtering methods may be used; in one embodiment, Kaiser window filtering is used. The Kaiser window filtering method is given by Equation 3.

In one embodiment, the Kaiser window filtering may use [value not reproduced] as its parameter. FIG. 6 shows a graph of the Kaiser window filtering function according to an embodiment of the present invention, and FIGS. 5A and 5B show cross-correlation coefficient graphs before and after window filtering. After filtering, the maximum of the time delay moves from 60 to 28.
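A rough sketch of Kaiser windowing applied to the correlation coefficients follows. It uses the textbook Kaiser window definition, which may differ from the source's Equation 3, and the shape parameter `beta=5.0` and the window half-width are placeholder assumptions because the source's value is not reproduced. The first lines also work out the largest physically possible lag for the stated spacing and sampling rate.

```python
import math

def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) / k
        total += term * term
    return total

def kaiser_window(length, beta):
    """Textbook Kaiser window: w[n] = I0(beta * sqrt(1 - (2n/(length-1) - 1)^2)) / I0(beta)."""
    denominator = bessel_i0(beta)
    window = []
    for n in range(length):
        r = 2.0 * n / (length - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denominator)
    return window

def apply_window(correlation_by_lag, half_width, beta):
    """Weight coefficients for lags -half_width..half_width with a window centred on lag 0."""
    window = kaiser_window(2 * half_width + 1, beta)
    return {lag: value * window[lag + half_width]
            for lag, value in correlation_by_lag.items()}

# Largest delay the geometry allows: 0.120 m / 340 m/s at 8 kHz is about 2.8 samples.
max_lag_samples = 0.120 / 340.0 * 8000.0

# Hypothetical flat correlation, only to show how the window de-emphasises large lags.
flat = {lag: 1.0 for lag in range(-8, 9)}
weighted = apply_window(flat, half_width=8, beta=5.0)  # beta is an assumed value
```

Because a true delay cannot exceed about three samples under these assumptions, a window centred on zero lag suppresses peaks at large lags, which can only come from noise.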

The delay time calculation unit (140) calculates the delay time from the first sound receiver and the second sound receiver to the sound source based on the generated, corrected, or filtered cross-correlation coefficient. The X-axis value at which the cross-correlation coefficient obtained through the Fourier transform reaches its maximum is taken as the time delay between the two signals. In one embodiment using four sound receivers, six independent cross-correlation coefficients are generated as described above, and six independent delay times are calculated accordingly, so a delay time vector TDOA can be calculated.
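Picking the delay for each pair can be sketched as below. The dictionary layout, the pair indices, and the coefficient values are made up for illustration; only the 8 kHz sampling rate comes from the text.

```python
def peak_lag(correlation_by_lag):
    """Return the lag (in samples) whose coefficient is largest."""
    return max(correlation_by_lag, key=correlation_by_lag.get)

def tdoa_vector(correlations, sample_rate_hz):
    """Map each receiver pair to its delay in seconds."""
    return {pair: peak_lag(corr) / sample_rate_hz
            for pair, corr in correlations.items()}

# Hypothetical coefficients for two of the six pairs (lag in samples -> coefficient).
correlations = {
    (0, 1): {-2: 0.1, -1: 0.3, 0: 0.4, 1: 0.6, 2: 0.9},
    (0, 2): {-2: 0.2, -1: 0.8, 0: 0.5, 1: 0.1, 2: 0.0},
}
tdoa = tdoa_vector(correlations, sample_rate_hz=8000.0)
```

With four receivers the same call would receive six such entries, giving a six-element delay vector.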

The sound source direction calculation unit (150) calculates direction information from the sound reception center to the sound source based on the delay time. FIG. 3 shows a coordinate system for estimating the direction of a sound source according to an embodiment of the present invention. In one embodiment, the direction of the sound source is estimated by two parameters: an azimuth angle θ ranging from -180 degrees to 180 degrees and an elevation angle Ψ ranging from -90 degrees to 90 degrees. The center of the coordinate system may be set to the sound reception center described above, which is determined by the positions of the plurality of sound receivers.

According to the prior art, the direction of a sound source signal can be estimated from the delay times through geometric calculation (Valin, "Robust Sound Source Localization Using a Microphone Array on a Mobile Robot", 2003). This method presupposes that the distance from the microphones to the sound source is much larger than the spacing between the microphones. The following explanation follows Valin's method. FIG. 7 shows the operating principle of the sound source direction calculation unit according to an embodiment of the present invention. By the law of cosines, FIG. 7 can be expressed as Equation 4.

Here, the two vectors in Equation 4 are the vector from microphone i to microphone j and a unit vector indicating the direction of the sound source. Equation 5 is also derived from trigonometric principles.

In addition, Equation 6 can be derived by combining Equations 4 and 5, and Equation 6 can be rewritten as Equation 7.
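For orientation, the chain from Equation 4 to Equation 7 can be written out in the notation of the cited Valin et al. paper. This is a sketch under that assumption; the source's own equation images are not reproduced, so its symbols may differ. Here $\vec{x}_{ij}$ is the vector from microphone $i$ to microphone $j$, $\vec{u} = (u, v, w)$ is the unit vector toward the source, $\varphi$ is the angle between them, $c$ is the speed of sound, $\Delta T_{ij}$ is the delay for the pair, and $(x_i, y_i, z_i)$ is the assumed position of microphone $i$.

```latex
\begin{align}
\cos\varphi &= \frac{\vec{u}\cdot\vec{x}_{ij}}{\lVert\vec{u}\rVert\,\lVert\vec{x}_{ij}\rVert} \\
\cos\varphi &= \frac{c\,\Delta T_{ij}}{\lVert\vec{x}_{ij}\rVert} \\
\vec{u}\cdot\vec{x}_{ij} &= c\,\Delta T_{ij} \\
u\,(x_j - x_i) + v\,(y_j - y_i) + w\,(z_j - z_i) &= c\,\Delta T_{ij}
\end{align}
```

The third line follows from the first two because $\lVert\vec{u}\rVert = 1$, and the fourth line expands the dot product component by component.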

In one embodiment, considering N sound receivers, N-1 equations are obtained as in Equation 8. When N is 4 or more, the linear system of Equation 8 is solved using the pseudo-inverse. In one embodiment, when the sound source estimation apparatus (100) uses four sound receivers, the direction of the sound source can be estimated with the pseudo-inverse of Equation 9.
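A minimal sketch of the pseudo-inverse step is given below. It assumes each pair contributes one linear equation of the form (x_j - x_i) . u = c * dT_ij, as in the cited Valin et al. paper, and solves the system through the normal equations, which gives the same result as the pseudo-inverse when the difference vectors span three dimensions. The microphone coordinates, the axis convention for the angles, and all function names are illustrative assumptions.

```python
import math

def solve_3x3(matrix, vector):
    """Gauss-Jordan elimination with partial pivoting for a 3x3 system."""
    m = [row[:] + [value] for row, value in zip(matrix, vector)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[r][3] / m[r][r] for r in range(3)]

def solve_direction(mic_positions, pair_delays, speed_of_sound=340.0):
    """Least-squares direction from (x_j - x_i) . u = c * dT_ij (assumed sign convention)."""
    rows, rhs = [], []
    for (i, j), delay in pair_delays.items():
        rows.append([mic_positions[j][k] - mic_positions[i][k] for k in range(3)])
        rhs.append(speed_of_sound * delay)
    # Normal equations (A^T A) u = A^T b.
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    atb = [sum(r[a] * v for r, v in zip(rows, rhs)) for a in range(3)]
    return solve_3x3(ata, atb)

def direction_angles(u):
    """Azimuth in (-180, 180] and elevation in [-90, 90] degrees; axis convention is assumed."""
    norm = math.sqrt(sum(c * c for c in u))
    azimuth = math.degrees(math.atan2(u[1], u[0]))
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, u[2] / norm))))
    return azimuth, elevation

# Hypothetical non-coplanar layout (metres) and a source at azimuth 30, elevation 20 degrees.
mics = [(0.06, 0.0, 0.0), (-0.06, 0.0, 0.0), (0.0, 0.06, 0.0), (0.0, 0.0, 0.06)]
az, el = math.radians(30.0), math.radians(20.0)
true_u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
delays = {(0, j): sum((mics[j][k] - mics[0][k]) * true_u[k] for k in range(3)) / 340.0
          for j in (1, 2, 3)}

estimate = solve_direction(mics, delays)
azimuth_deg, elevation_deg = direction_angles(estimate)
```

The example uses the three pairs that share microphone 0, mirroring the N-1 equations mentioned above; passing all six pairs would overdetermine the system and average out delay errors.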

The summed sound source direction calculation unit (170) accumulates direction information over a plurality of time frames to calculate summed direction information from the sound reception center to the sound source. The description so far estimates the direction of the sound source in a single frame, but accumulating the directions estimated over multiple frames gives a more accurate estimate. In one embodiment, when estimating the elevation angle among the direction information, a function ∑h_k that determines the elevation angle of a single frame is assumed; the same approach can also be applied to estimating the direction angle. Here h_k is a segment function that divides the range from -90 degrees to 90 degrees into 30 classes at 6-degree intervals. Superposing these segment functions linearly gives the function ∑h_k that determines the elevation angle. Because the directions estimated in multiple frames must be summed, assuming for example 12 frames, the elevation angle can be estimated with Equation 10. The result is plotted in FIG. 8A. Here VAD (Voice Activity Detection) denotes a frame in which a sound source was detected.
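One way to picture the accumulation is a histogram over the 30 six-degree classes. The sketch below is an illustration of that idea rather than the source's Equation 10; the per-frame elevation values are invented, and the function names are hypothetical.

```python
def elevation_bin(angle_deg, bin_width=6.0, low=-90.0, bins=30):
    """Index of the 6-degree class containing angle_deg (30 classes over -90..90 degrees)."""
    index = int((angle_deg - low) // bin_width)
    return min(max(index, 0), bins - 1)

def accumulate(frame_angles, bins=30):
    """Count how many frames fall into each class."""
    histogram = [0] * bins
    for angle in frame_angles:
        histogram[elevation_bin(angle)] += 1
    return histogram

def bin_center(index, bin_width=6.0, low=-90.0):
    """Centre angle of a class, in degrees."""
    return low + (index + 0.5) * bin_width

# Hypothetical elevation estimates from 12 frames in which sound was detected.
frame_elevations = [20.0, 22.0, 21.0, 50.0, 19.0, 23.0, 22.0, -10.0, 21.0, 20.0, 22.0, 18.0]
histogram = accumulate(frame_elevations)
peak_bin = max(range(len(histogram)), key=histogram.__getitem__)
summed_elevation = bin_center(peak_bin)  # 21.0 degrees for this data
```

Two outlier frames (50 and -10 degrees) land in other classes and are outvoted by the ten frames that agree.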

The sound source direction filtering unit 160 filters the plurality of pieces of direction information. Referring to FIG. 8A, experiments showed that the arithmetic summation of sound source directions in the sound source summation direction calculation unit 170 frequently produces overlapping direction estimates, so a filtering method was needed to improve estimation accuracy. Triangular filtering may be used to give higher probability to the center of the estimated direction information. The filter shape is not limited, however, as long as it gives higher probability to the center of the estimated direction information, as Gaussian filtering and various other filters do. In one embodiment, triangular filtering was applied through the segment function described above. Equations 11 and 12 form the triangular filter, and filtering that multiplies the center value by three was performed. The direction information re-estimated through filtering takes the form of Equation 13. FIG. 8B shows that the direction information estimated redundantly in FIG. 8A is resolved into a single piece of direction information, and experiments confirmed that this is the direction information closest to the actual direction.
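A minimal sketch of the triangular filtering idea follows. Equations 11 to 13 are not reproduced here, so the kernel (1, 3, 1), which weights the estimated class three times as heavily as its two neighbours, is an assumption, as are the function names.

```python
NUM_BINS = 30
BIN_WIDTH_DEG = 6
TRIANGLE = (1, 3, 1)  # assumed kernel: centre weighted three times the neighbours


def triangular_votes(frame_bins):
    """Spread each frame's class estimate over neighbouring classes with the kernel."""
    hist = [0] * NUM_BINS
    half = len(TRIANGLE) // 2
    for k in frame_bins:
        for offset, weight in enumerate(TRIANGLE):
            j = k + offset - half
            if 0 <= j < NUM_BINS:
                hist[j] += weight
    return hist


def peak_angle(hist):
    """Return the centre angle of the class with the largest filtered score."""
    k = max(range(NUM_BINS), key=lambda i: hist[i])
    return -90 + BIN_WIDTH_DEG * k + BIN_WIDTH_DEG / 2
```

With per-frame classes [17, 18, 18, 19, 19], a plain count ties classes 18 and 19 at two votes each. After filtering, class 18 also collects weight from both neighbours and becomes the single peak.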

FIG. 9 is a flowchart of a sound source estimation method according to an embodiment of the present invention. First, voice signals are received through the voice receivers (S901). There may be a plurality of voice receivers, and in one embodiment four voice receivers may be used. A cross-correlation coefficient is generated from the voice information received by the voice receivers (S902); the cross-correlation coefficient is estimated through computation in the frequency domain. The cross-correlation coefficient is then modified to remove the error that arises from the continuous function in the Fourier transform (S903). In one embodiment, N (256) samples are appended before and after each of the two signals, so that a cross-correlation function with time delays from -N+1 to N+1 can be used. Window filtering, which may be Kaiser window filtering, is also performed to remove noise from the external environment (S904). The direction information of one frame can then be calculated (S905). To obtain more accurate information, the direction information of each frame is window-filtered again, which may use a triangular window (S906). The window-filtered direction information of the frames is summed (S907), and the summation direction information of the sound source is calculated from this sum to finally estimate the direction of the sound source (S980).
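Steps S902 to S904 can be sketched in the time domain as follows. Padding each frame with N zeros before and after gives a linear cross-correlation, free of the wrap-around terms that a circular, DFT-based correlation would introduce. This sketch covers lags from -(N-1) to N-1. Weighting the correlation with a Kaiser window centred on zero lag is one assumed way of applying the window, and the parameter beta and all function names are likewise assumptions.

```python
import math


def cross_correlation(x, y):
    """Linear cross-correlation r[lag] = sum_n x[n + lag] * y[n], lag in -(N-1)..(N-1)."""
    n = len(x)
    pad = [0.0] * n
    xp = pad + list(x) + pad  # N zeros before and after the frame
    r = {}
    for lag in range(-(n - 1), n):
        r[lag] = sum(xp[n + i + lag] * y[i] for i in range(n))
    return r


def bessel_i0(x, terms=25):
    """Modified Bessel function of the first kind, order 0, by its power series."""
    return sum(((x / 2) ** k / math.factorial(k)) ** 2 for k in range(terms))


def kaiser_weight(lag, max_lag, beta=5.0):
    """Kaiser window value at a given lag, with the window centred on lag 0."""
    ratio = lag / max_lag
    return bessel_i0(beta * math.sqrt(1 - ratio * ratio)) / bessel_i0(beta)


def estimate_delay(x, y, beta=5.0):
    """Return the lag (in samples) at which the windowed cross-correlation peaks.

    A positive result means x lags behind y. Frames must hold at least two samples.
    """
    r = cross_correlation(x, y)
    max_lag = len(x) - 1
    weighted = {lag: v * kaiser_weight(lag, max_lag, beta) for lag, v in r.items()}
    return max(weighted, key=weighted.get)
```

The returned lag divided by the sampling rate gives the delay time for one receiver pair, which is the input to the direction calculation of step S905.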

Although the present invention has been described above with reference to specific matters such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the invention. The invention is not limited to the above embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations based on this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above. Not only the claims set out below but also everything that is equal or equivalent to those claims falls within the scope of the spirit of the invention.

100: sound source estimation apparatus
110: cross-correlation coefficient generation unit
120: cross-correlation coefficient correction unit
130: cross-correlation coefficient filtering unit
140: delay time calculation unit
150: sound source direction calculation unit
160: sound source direction filtering unit
170: sound source summation direction calculation unit

Claims

1. A sound source estimation apparatus comprising:
a cross-correlation coefficient generation unit configured to generate a cross-correlation coefficient between a first voice signal received by a first voice receiver and a second voice signal received by a second voice receiver;
a delay time calculation unit configured to calculate a delay time from the first voice receiver and the second voice receiver to a sound source based on the generated cross-correlation coefficient; and
a sound source direction calculation unit configured to calculate direction information from a voice reception center to the sound source based on the delay time.

2. The apparatus of claim 1,
wherein the cross-correlation coefficient generation unit further generates cross-correlation coefficients of the voice signals received by any one or more voice receivers of a plurality of voice receivers, and
the voice reception center is determined based on the positions of the plurality of voice receivers.

3. The apparatus of claim 1,
further comprising a cross-correlation coefficient correction unit configured to modify the cross-correlation function to remove cyclic noise of a voice signal.

4. The apparatus of claim 3,
wherein the cross-correlation coefficient correction unit modifies the cross-correlation coefficient by adding a time delay before and after the first voice signal and the second voice signal.

5. The apparatus of claim 1,
further comprising a cross-correlation coefficient filtering unit configured to filter the modified cross-correlation function to remove external noise.

6. The apparatus of claim 5,
wherein the cross-correlation coefficient filtering unit performs window filtering on the cross-correlation coefficient.

7. The apparatus of claim 6,
wherein the window filtering is Kaiser window filtering.

8. The apparatus of claim 1,
wherein the cross-correlation coefficient generation unit generates a cross-correlation coefficient for each of a plurality of time frames.

9. The apparatus of claim 8,
further comprising a sound source summation direction calculation unit configured to sum a plurality of pieces of direction information from the voice reception center to the sound source and to calculate summation direction information from the voice reception center to the sound source.

10. The apparatus of claim 9,
further comprising a sound source direction filtering unit configured to filter the plurality of pieces of direction information.

11. The apparatus of claim 10,
wherein the sound source direction filtering unit uses a triangular window.

12. A sound source estimation method comprising:
generating a cross-correlation coefficient between a first voice signal received by a first voice receiver and a second voice signal received by a second voice receiver, from among a plurality of voice receivers;
calculating a delay time from the first voice receiver and the second voice receiver to a sound source based on the generated cross-correlation coefficient; and
calculating direction information from a voice reception center to the sound source based on the delay time.

13. The method of claim 12,
wherein generating the cross-correlation coefficient further includes generating cross-correlation coefficients of the voice signals received by any one or more voice receivers of the plurality of voice receivers, and
the voice reception center is determined based on the positions of the plurality of voice receivers.

14. The method of claim 12,
wherein generating the cross-correlation coefficient further includes modifying the cross-correlation function to remove cyclic noise of a voice signal.

15. The method of claim 14,
wherein modifying the cross-correlation function further includes modifying the cross-correlation coefficient by adding a time delay before and after the first voice signal and the second voice signal.

16. The method of claim 12,
wherein modifying the cross-correlation function further includes filtering the modified cross-correlation function to remove external noise.

17. The method of claim 16,
wherein filtering the cross-correlation function further includes window filtering the cross-correlation coefficient.

18. The method of claim 17,
wherein the window filtering of the cross-correlation coefficient is Kaiser window filtering.

19. The method of claim 12,
wherein generating the cross-correlation coefficient further includes generating a cross-correlation coefficient for each of a plurality of time frames.

20. The method of claim 19,
wherein calculating the direction information from the voice reception center to the sound source further includes summing a plurality of pieces of direction information from the voice reception center to the sound source and calculating summation direction information from the voice reception center to the sound source.

21. The method of claim 20,
wherein calculating the direction information from the voice reception center to the sound source further includes filtering the plurality of pieces of direction information.

22. The method of claim 21,
wherein filtering the plurality of pieces of direction information includes filtering with a triangular window.