KR101812159B1

KR101812159B1 - Method and apparatus for localizing sound source using deep learning

Info

Publication number: KR101812159B1
Application number: KR1020160132070A
Authority: KR
Inventors: 고한석; 문성규
Original assignee: 고려대학교 산학협력단
Priority date: 2016-10-12
Filing date: 2016-10-12
Publication date: 2017-12-26

Abstract

The present invention relates to a method and an apparatus for estimating an acoustic direction using deep learning. The method includes: a step in which a model generation unit learns a plurality of pre-stored cross-correlation graphs based on deep learning and generates a graph regeneration model, which is a model that produces an emphasized cross-correlation graph in which the key component of a cross-correlation graph is emphasized; a step in which a graph generation unit generates a collected-data cross-correlation graph, which is a frequency-domain cross-correlation graph between a plurality of pieces of acoustic data that a plurality of microphones each collected from a specific sound source; a step in which a graph regeneration unit inputs the collected-data cross-correlation graph into the graph regeneration model to generate a collected-data emphasized cross-correlation graph, in which the key component of the collected-data cross-correlation graph is emphasized; and a step in which an acoustic direction estimation unit estimates the direction in which the specific sound source exists based on the collected-data emphasized cross-correlation graph.

Description

Method and apparatus for estimating an acoustic direction using deep learning

The present invention relates to a method and an apparatus for estimating an acoustic direction using deep learning, which use deep learning to estimate the direction in which the sound source that generated a sound is located.

Many people today are continuously exposed to all kinds of sounds. Among these, certain sounds that warn of danger, such as a vehicle horn, the emergency bell of a fire hydrant, or a baby's cry, demand particular attention.

However, in various situations, for example when the sound lasts only a short time or when the listener has poor hearing, users may have difficulty recognizing which direction the sound came from even though it occurred.

To solve this problem, a conventional method has been proposed in which the input signals from two microphones are each Fourier-transformed into the frequency domain, the cross-correlation between the input signals is calculated in the Fourier-transformed frequency domain, and the direction of the sound is estimated using the calculated cross-correlation.

However, because the cross-correlation in the conventional method is calculated from input signals that still contain all of their noise components, the accuracy of the acoustic direction estimation is poor.

Japanese Unexamined Patent Application Publication No. 2003-337164 (November 28, 2003)

An object of the present invention is to solve the above problem by generating a graph regeneration model through learning of a plurality of pre-stored cross-correlation graphs, regenerating the cross-correlation graph between collected acoustic data using the generated graph regeneration model, and thereby estimating the direction in which a specific sound source exists.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, a method for estimating an acoustic direction using deep learning according to an embodiment of the present invention includes: a step in which a model generation unit learns a plurality of pre-stored cross-correlation graphs based on deep learning and generates a graph regeneration model, which is a model that produces an emphasized cross-correlation graph, that is, a graph in which the key component of a cross-correlation graph is emphasized; a step in which a graph generation unit generates a collected-data cross-correlation graph, which is a frequency-domain cross-correlation graph between a plurality of pieces of collected acoustic data that originate from a specific sound source and are each collected by one of a plurality of microphones; a step in which a graph regeneration unit inputs the collected-data cross-correlation graph into the graph regeneration model to generate a collected-data emphasized cross-correlation graph, that is, a graph in which the key component of the collected-data cross-correlation graph is emphasized; and a step in which an acoustic direction estimation unit estimates, based on the collected-data emphasized cross-correlation graph, the direction in which the specific sound source exists.

For example, the plurality of pre-stored cross-correlation graphs include a plurality of cross-correlation graphs selected from among cross-correlation graphs for each of a plurality of sound source directions and cross-correlation graphs for each of a plurality of signal-to-noise ratios (SNRs).

According to one embodiment, the graph regeneration model is generated based on a Deep Belief Network-Deep Neural Network algorithm.

According to one embodiment, in the step of estimating the direction in which the specific sound source exists, the acoustic direction estimation unit estimates the angle corresponding to the maximum value of the collected-data emphasized cross-correlation graph as the direction in which the specific sound source exists.

For example, the method further includes, after the step of estimating the direction in which the specific sound source exists, a step in which a display unit outputs the direction in which the specific sound source exists.

To achieve the above object, an apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention includes: a model generation unit that learns a plurality of pre-stored cross-correlation graphs based on deep learning and generates a graph regeneration model, which is a model that produces an emphasized cross-correlation graph, that is, a graph in which the key component of a cross-correlation graph is emphasized; a graph generation unit that generates a collected-data cross-correlation graph, which is a frequency-domain cross-correlation graph between a plurality of pieces of collected acoustic data that originate from a specific sound source and are each collected by one of a plurality of microphones; a graph regeneration unit that inputs the collected-data cross-correlation graph into the graph regeneration model to generate a collected-data emphasized cross-correlation graph, that is, a graph in which the key component of the collected-data cross-correlation graph is emphasized; and an acoustic direction estimation unit that estimates, based on the collected-data emphasized cross-correlation graph, the direction in which the specific sound source exists.

According to an embodiment of the present invention, a graph regeneration model is generated through learning of a plurality of pre-stored cross-correlation graphs, and the cross-correlation graph between collected acoustic data is regenerated with the generated model so that its key component is emphasized. As a result, the direction in which a specific sound source exists can be estimated with the influence of noise reduced.

FIG. 1 is a block diagram illustrating an apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for estimating an acoustic direction using deep learning according to an embodiment of the present invention.
FIGS. 3A and 3B are diagrams illustrating an example of a cross-correlation graph in the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.
FIGS. 4A and 4B are diagrams illustrating the step of generating a collected-data cross-correlation graph in the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.
FIGS. 5A and 5B are diagrams illustrating the step of generating a collected-data emphasized cross-correlation graph in the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.

Hereinafter, the most preferred embodiment of the present invention is described with reference to the accompanying drawings, in enough detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. First, in assigning reference numerals to the elements of each drawing, note that the same elements are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, detailed descriptions of related known configurations or functions are omitted where they might obscure the gist of the invention.

Hereinafter, a method and an apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.

As shown in FIG. 1, the apparatus 100 for estimating an acoustic direction using deep learning according to an embodiment of the present invention includes a model generation unit 110, a graph generation unit 120, a graph regeneration unit 130, and an acoustic direction estimation unit 140.

Furthermore, as shown in FIG. 1, the apparatus 100 may further include a display unit 150, a database 160, and a plurality of microphones 171 and 172, although the apparatus 100 is not limited to this configuration.

Meanwhile, for convenience of description, FIG. 1 shows the apparatus 100 with two microphones, a first microphone 171 and a second microphone 172, but the apparatus 100 is not limited to any particular number of microphones.

The model generation unit 110 learns a plurality of pre-stored cross-correlation graphs based on deep learning and generates a graph regeneration model, which is a model that produces an emphasized cross-correlation graph, that is, a graph in which the key component of a cross-correlation graph is emphasized.

The graph generation unit 120 generates a collected-data cross-correlation graph, which is a frequency-domain cross-correlation graph between a plurality of pieces of collected acoustic data that originate from a specific sound source and are each collected by one of a plurality of microphones.

The graph regeneration unit 130 inputs the collected-data cross-correlation graph into the graph regeneration model to generate a collected-data emphasized cross-correlation graph, that is, a graph in which the key component of the collected-data cross-correlation graph is emphasized.

The acoustic direction estimation unit 140 estimates the direction in which the specific sound source exists based on the collected-data emphasized cross-correlation graph.

The display unit 150 can visually display the result of the acoustic direction estimation unit 140 estimating the direction in which the specific sound source exists. The display unit 150 may be any of various display devices, including the LCD screen of a smart terminal carried by the user, a display screen inside a vehicle, or a screen built specifically to show the direction of a sound.

So that the model generation unit 110 can generate the graph regeneration model, the database 160 may store various cross-correlation graphs in advance, including frequency-domain cross-correlation graphs between the acoustic data collected by two microphones for each sound source direction, and frequency-domain cross-correlation graphs between the acoustic data collected by two microphones for each signal-to-noise ratio (SNR).

For example, the database 160 may store cross-correlation graphs for various conditions, such as the frequency-domain cross-correlation graph between the acoustic data that two microphones each collected from the 30-degree direction and the corresponding graph for the 60-degree direction, so that the model generation unit 110 can generate the graph regeneration model by learning to regenerate cross-correlation graphs.
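
The source does not say how the per-direction and per-SNR training material in the database is produced. As one hypothetical way to obtain a recording at a chosen SNR before its cross-correlation graph is computed, the sketch below scales a noise recording so that it mixes with a clean signal at a target SNR. The function name `mix_at_snr` and its parameters are illustrative assumptions, not part of the patent.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: the patent only states that graphs are stored per SNR.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(signal_power / (scale**2 * noise_power)), solved for scale
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise
```

Applying the same procedure to each microphone channel at several SNR values would give one set of inputs per condition, under the assumption that the noise recordings are available separately.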

The plurality of microphones 171 and 172 may be placed a certain distance apart from each other, and each of the microphones 171 and 172 may collect the acoustic data generated by a specific sound source. For example, the spacing between the microphones 171 and 172 may be used as data when the graph generation unit 120 generates the frequency-domain cross-correlation graph between the pieces of collected acoustic data.
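
The source does not give the formula that links microphone spacing to direction. A common far-field model, used here purely as an assumption, is that a source at angle θ measured from the axis through the two microphones reaches the second microphone later than the first by d·cos(θ)/c, where d is the spacing and c is the speed of sound. The names below and the value of 343 m/s are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def angle_to_delay(angle_deg, spacing_m):
    """Far-field arrival-time difference (seconds) for a source at angle_deg.

    Assumption: the angle is measured from the axis through the two microphones.
    """
    return spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND

def delay_to_angle(delay_s, spacing_m):
    """Inverse mapping; returns an angle in [0, 180] degrees."""
    ratio = delay_s * SPEED_OF_SOUND / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(ratio))
```

Under this convention the angles θ and 360° − θ produce the same delay, so a two-microphone pair cannot tell them apart from the delay alone. That is one possible reading of why FIGS. 3A and 3B show high correlation at both 90 and 270 degrees, though the source does not state this.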

Each component of the apparatus 100 for estimating an acoustic direction using deep learning according to an embodiment of the present invention is described in more detail below with reference to FIGS. 2 to 5, and redundant description is omitted.

A method for estimating an acoustic direction using deep learning according to an embodiment of the present invention is now described with reference to FIG. 2.

FIG. 2 is a flowchart illustrating a method for estimating an acoustic direction using deep learning according to an embodiment of the present invention.

As shown in FIG. 2, the method for estimating an acoustic direction using deep learning according to an embodiment of the present invention includes a step of generating a graph regeneration model based on deep learning (S210), a step of generating a collected-data cross-correlation graph, which is the cross-correlation graph between a plurality of pieces of collected acoustic data (S230), a step of generating a collected-data emphasized cross-correlation graph, in which the key component of the collected-data cross-correlation graph is emphasized (S250), and a step of estimating the direction in which the sound source exists (S270).

Step S210 is now described with reference to FIGS. 2 and 3 together.

FIGS. 3A and 3B are diagrams illustrating an example of a cross-correlation graph in the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.

In step S210, the model generation unit 110 learns a plurality of pre-stored cross-correlation graphs based on deep learning and generates a graph regeneration model, which is a model that produces an emphasized cross-correlation graph, that is, a graph in which the key component of a cross-correlation graph is emphasized.

For example, the plurality of pre-stored cross-correlation graphs include a plurality of cross-correlation graphs selected from among cross-correlation graphs for each of a plurality of sound source directions and cross-correlation graphs for each of a plurality of signal-to-noise ratios (SNRs).

For example, each of the pre-stored cross-correlation graphs may refer to a graph obtained by having two microphones collect, under various conditions, the acoustic data generated by a single sound source, transforming the two resulting pieces of acoustic data into the frequency domain, and then plotting the magnitude of the cross-correlation between the two pieces of frequency-domain acoustic data with respect to time and angle.

For example, as shown in FIG. 3A, each of the cross-correlation graphs may be a graph with three axes: a time axis (Time frame), an angle axis (Angle), and a cross-correlation axis (Correlation).

For example, as shown in FIG. 3B, each of the cross-correlation graphs may be a graph with a time axis (Time frame) and an angle axis (Angle) in which the cross-correlation for each time and angle is shown as a color.

For example, each of the cross-correlation graphs used to generate the graph regeneration model in step S210 may be a graph in which the cross-correlation is high at a specific angle, as shown in FIGS. 3A and 3B, and the specific angle with high cross-correlation may be the angle estimated to be the direction in which the sound source exists.

For example, 90 degrees and 270 degrees, where the cross-correlation is high in the cross-correlation graphs of FIGS. 3A and 3B, may be the angles estimated to be the direction in which the sound source exists.

However, because the cross-correlation graphs of FIGS. 3A and 3B plot the frequency-domain cross-correlation between two noisy pieces of acoustic data directly, without any further processing, points of high and low cross-correlation may not be clearly distinguishable from each other.

Accordingly, in step S210, the model generation unit 110 may generate a graph regeneration model that produces an emphasized cross-correlation graph, which is a graph that emphasizes the key component, that is, the data of the high cross-correlation region at the angle estimated to be the direction of the sound source in the pre-stored cross-correlation graphs.

For example, the key component may refer to the data of the high cross-correlation region (the red region) shown in FIGS. 3A and 3B.

For example, in step S210, the model generation unit 110 may learn a plurality of cross-correlation graphs of the form shown in FIG. 3A or FIG. 3B and generate a graph regeneration model that produces an emphasized cross-correlation graph, which is a graph that emphasizes the high cross-correlation region (the red region) in FIG. 3A or FIG. 3B.

For example, the graph regeneration model generated in step S210 may be a model that, when a particular cross-correlation graph is input, produces an emphasized cross-correlation graph in which the key component of that input graph is emphasized.

For example, the graph regeneration model may be generated based on a Deep Belief Network-Deep Neural Network algorithm, which is a type of deep learning algorithm.

For example, the Deep Belief Network-Deep Neural Network algorithm may refer to an algorithm used to build a model that extracts the characteristic parts of an image and regenerates a new image in which those features are accentuated, for example by extracting the characteristic parts of a particular painter's works and drawing in that painter's style. A more detailed description of the Deep Belief Network-Deep Neural Network algorithm is omitted.
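
The source omits the details of the learning algorithm, so the following is only a simplified stand-in for the idea of learning a mapping from noisy cross-correlation data to an emphasized version. It is a plain one-hidden-layer network trained by backpropagation on synthetic data, not a Deep Belief Network-Deep Neural Network: there is no layer-wise pretraining, and the synthetic peak profiles, layer sizes, and all names are assumptions made for illustration.

```python
import numpy as np

def make_batch(rng, batch, n_angles, noise_std):
    """Synthetic (noisy, clean) correlation profiles over the angle axis."""
    peaks = rng.integers(n_angles, size=(batch, 1))
    idx = np.arange(n_angles)[None, :]
    clean = np.exp(-0.5 * ((idx - peaks) / 1.5) ** 2)  # one sharp peak per row
    noisy = clean + noise_std * rng.standard_normal(clean.shape)
    return noisy, clean

def train_regenerator(n_angles=36, hidden=64, steps=2000, lr=0.05,
                      noise_std=0.5, seed=0):
    """Train a small network to map noisy profiles to their clean targets."""
    rng = np.random.default_rng(seed)
    w1 = 0.1 * rng.standard_normal((n_angles, hidden))
    b1 = np.zeros(hidden)
    w2 = 0.1 * rng.standard_normal((hidden, n_angles))
    b2 = np.zeros(n_angles)
    for _ in range(steps):
        x, y = make_batch(rng, 32, n_angles, noise_std)
        h = np.tanh(x @ w1 + b1)                 # hidden activations
        pred = h @ w2 + b2                       # regenerated profile
        g_pred = 2.0 * (pred - y) / len(x)       # gradient of batch-mean squared error
        g_h = (g_pred @ w2.T) * (1.0 - h ** 2)   # backpropagate through tanh
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return w1, b1, w2, b2

def regenerate(params, graph):
    """Apply the trained network to rows of a (time frames x angles) graph."""
    w1, b1, w2, b2 = params
    return np.tanh(np.asarray(graph, dtype=float) @ w1 + b1) @ w2 + b2
```

In this sketch each time frame is processed independently as one row over the angle axis. Whether the patented model operates on single frames or on whole time-angle images is not stated in the source.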

Step S230 is now described with reference to FIGS. 2 and 4 together.

FIGS. 4A and 4B are diagrams illustrating the step of generating a collected-data cross-correlation graph in the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention.

In step S230, the graph generation unit 120 generates a collected-data cross-correlation graph, which is a frequency-domain cross-correlation graph between a plurality of pieces of collected acoustic data that originate from a specific sound source and are each collected by one of the microphones 171 and 172.

For example, the pieces of collected acoustic data that the microphones 171 and 172 each collect in step S230 may be represented as waveforms in the time domain, as shown in FIG. 4A.

In step S230, the graph generation unit 120 may transform each of the pieces of collected acoustic data from the specific sound source into the frequency domain, producing a plurality of pieces of frequency-domain collected acoustic data that represent the waveforms in the frequency domain.

The frequency transform in step S230 may be performed using any of various Fourier transform methods, including the Fast Fourier Transform (FFT).

In step S230, the graph generation unit 120 may compute the cross-correlation between the frequency components (FFT bins or spectra) of the plurality of pieces of frequency-domain collected acoustic data, and then generate the collected-data cross-correlation graph using the difference in arrival time between the pieces of collected acoustic data collected by the respective microphones 171 and 172. The collected-data cross-correlation graph may appear as shown in FIG. 4B.

For example, in step S230, the collected-data cross-correlation graph may be generated by various conventional methods that make use of time delay estimation, a delay matrix, and the like. A more detailed description of generating the collected-data cross-correlation graph in step S230 is omitted.
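
Because the source leaves the computation to conventional methods, the sketch below shows one such conventional approach under stated assumptions: each frame of the two channels is transformed with an FFT, the cross-spectrum is formed, and for every candidate angle the cross-spectrum is phase-compensated by the delay that angle would produce and then summed over frequency. The far-field delay model d·cos(θ)/c, the frame parameters, and the function name are assumptions, not details from the patent.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def cross_correlation_graph(x1, x2, fs, spacing_m,
                            frame_len=512, hop=256, angles_deg=None):
    """Return a (time frames x angles) array of cross-correlation values.

    Assumption: a source at angle theta (from the microphone axis) reaches
    microphone 2 later than microphone 1 by spacing_m * cos(theta) / c.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if angles_deg is None:
        angles_deg = np.arange(0, 360, 5)
    angles_deg = np.asarray(angles_deg, dtype=float)
    delays = spacing_m * np.cos(np.radians(angles_deg)) / SPEED_OF_SOUND
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # One row of phase compensation per candidate angle
    steering = np.exp(-2j * np.pi * np.outer(delays, freqs))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x1) - frame_len) // hop
    graph = np.empty((n_frames, len(angles_deg)))
    for t in range(n_frames):
        seg = slice(t * hop, t * hop + frame_len)
        spec1 = np.fft.rfft(window * x1[seg])
        spec2 = np.fft.rfft(window * x2[seg])
        cross = spec1 * np.conj(spec2)           # cross-spectrum per FFT bin
        graph[t] = np.real(steering @ cross)     # correlation at each candidate delay
    return graph
```

The rows and columns of the result correspond to the time-frame and angle axes described for FIG. 3B and FIG. 4B; plotting it as a color map would give a comparable picture, though the exact scaling used in the patent figures is unknown.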

Step S250 is now described with reference to FIGS. 2 and 5 together.

In step S250, the graph regeneration unit 130 inputs the collected-data cross-correlation graph into the graph regeneration model to generate a collected-data emphasized cross-correlation graph, that is, a graph in which the key component of the collected-data cross-correlation graph is emphasized.

For example, the graph regeneration model generated in step S210 is a model produced by learning to generate emphasized cross-correlation graphs, in which the key component is emphasized, from cross-correlation graphs between two pieces of acoustic data collected under various conditions. Therefore, when the collected-data cross-correlation graph generated in step S230 is input into the graph regeneration model, the graph regeneration unit 130 can, in step S250, generate the collected-data emphasized cross-correlation graph, in which the key component of the collected-data cross-correlation graph is emphasized.

For example, as shown in FIG. 5A, the collected-data cross-correlation graph may be a graph with a time axis (Time frame) and an angle axis (Angle) in which the cross-correlation for each time and angle is shown as a color. The region shown in red in FIG. 5A is a region of relatively high cross-correlation and may correspond to the key component of the collected-data cross-correlation graph.

For example, the collected-data emphasized cross-correlation graph produced in step S250 by emphasizing the key component of the collected-data cross-correlation graph of FIG. 5A may appear as shown in FIG. 5B.

Comparing FIG. 5A with FIG. 5B, the red region that is the key component appears relatively weak in the collected-data cross-correlation graph of FIG. 5A, whereas it appears relatively emphasized in the collected-data emphasized cross-correlation graph of FIG. 5B.

In other words, the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention generate the graph regeneration model by repeatedly learning to produce emphasized cross-correlation graphs, in which the key component is emphasized, from a plurality of cross-correlation graphs collected under various conditions. A particular collected-data cross-correlation graph is then input into the generated model to produce a collected-data emphasized cross-correlation graph in which its key component is emphasized. As a result, the component of the cross-correlation graph that corresponds to the direction of the specific sound source is strengthened relative to the noise component, which improves the accuracy of estimating the direction in which the specific sound source exists.

Step S270 will now be described with reference to FIG. 2 and FIG. 5B together.

In step S270, the acoustic direction estimating unit 140 estimates the direction in which the specific sound source exists, based on the collected data emphasis cross-correlation graph.

For example, in step S270, the acoustic direction estimating unit 140 may estimate the angle corresponding to the maximum value of the collected data emphasis cross-correlation graph as the direction in which the specific sound source exists.

For example, as shown in FIG. 5B, the direction in which the specific sound source is estimated to exist may correspond to the red region holding the maximum value on the collected data emphasis cross-correlation graph. As indicated by the color bar on the right side of FIG. 3B, redder colors in this graph represent higher cross-correlation and bluer colors represent lower cross-correlation.

According to one embodiment, in step S270, the acoustic direction estimating unit 140 may also estimate the direction in which the specific sound source exists by selecting the maximum value among the data values corresponding to each time and angle on the collected data emphasis cross-correlation graph.
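The maximum-value selection described for step S270 can be illustrated with a minimal sketch. It is not the implementation of the acoustic direction estimating unit 140. It assumes that the collected data emphasis cross-correlation graph is available as a two-dimensional array whose rows are time frames and whose columns are candidate angles; the function name `estimate_direction`, the angle grid, and the toy data are all hypothetical.

```python
import numpy as np

def estimate_direction(emphasis_graph, angles_deg):
    """Return (angle in degrees, frame index) at the global maximum of the graph.

    Assumption: emphasis_graph has shape (n_frames, n_angles) and
    angles_deg[j] is the angle represented by column j.
    """
    graph = np.asarray(emphasis_graph, dtype=float)
    angles = np.asarray(angles_deg, dtype=float)
    if graph.ndim != 2 or graph.shape[1] != angles.shape[0]:
        raise ValueError("expected a graph of shape (n_frames, n_angles)")
    # Locate the largest value over all time frames and angles, then
    # convert the flat index back into (frame, angle) indices.
    frame_idx, angle_idx = np.unravel_index(np.argmax(graph), graph.shape)
    return float(angles[angle_idx]), int(frame_idx)

# Toy data (hypothetical): 50 frames, angles from -90 to 90 degrees in 10-degree steps.
angles = np.linspace(-90.0, 90.0, 19)
rng = np.random.default_rng(0)
graph = rng.uniform(0.0, 0.3, size=(50, angles.size))  # weak background values
graph[:, 12] += 0.6                                    # emphasized ridge at 30 degrees

direction, frame = estimate_direction(graph, angles)
print(f"estimated direction: {direction:.1f} degrees (frame {frame})")
```

In this toy example the emphasized column dominates the background, so the global maximum falls on the 30-degree column. With a real graph, the resolution of the estimate is limited by the spacing of the angle grid.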

For example, although not shown in the drawings, after step S270 the display unit 150 may output the direction in which the specific sound source exists.

For example, because the display unit 150 outputs the direction in which the specific sound source exists, the user can obtain accurate information on the direction from which the sound was heard and can therefore pay closer attention to that sound source.

For example, in an experiment that estimated the direction of sound using three hours of acoustic data collected at a specific location, the conventional technique, which uses the collected data cross-correlation graph as is, achieved a direction estimation success rate of 92.5%.

In contrast, in an experiment that estimated the direction of sound using three hours of acoustic data collected at a specific location, using the collected data emphasis cross-correlation graph generated by the method and apparatus for estimating an acoustic direction using deep learning according to an embodiment of the present invention achieved a direction estimation success rate of 94%.

Although preferred embodiments of the present invention have been described above, various modifications are possible, and it will be understood that a person of ordinary skill in the art may practice various variations and modifications without departing from the scope of the claims of the present invention.

100: Acoustic direction estimation apparatus using deep learning
110: Model generating unit
120: Graph generating unit
130: Graph regenerating unit
140: Acoustic direction estimating unit
150: Display unit
160: Database
171, 172: Plurality of microphones

Claims

A method for estimating an acoustic direction using deep learning, the method comprising:
a step in which a model generating unit generates a graph regeneration model, which is a model for generating an emphasis cross-correlation graph that emphasizes a key component of a cross-correlation graph, by learning a plurality of pre-stored cross-correlation graphs based on deep learning;
a step in which a graph generating unit generates a collected data cross-correlation graph, which is a cross-correlation graph in the frequency domain between a plurality of collected acoustic data that originate from a specific sound source and are collected by a plurality of microphones, respectively;
a step in which a graph regenerating unit inputs the collected data cross-correlation graph to the graph regeneration model to generate a collected data emphasis cross-correlation graph, which is a graph that emphasizes the key components of the collected data cross-correlation graph; and
a step in which an acoustic direction estimating unit estimates the direction in which the specific sound source exists based on the collected data emphasis cross-correlation graph.

The method according to claim 1,
Wherein the plurality of pre-stored cross-correlation graphs are a plurality of cross-correlation graphs selected from among cross-correlation graphs for each of a plurality of sound source directions and cross-correlation graphs for each of a plurality of signal-to-noise ratios (SNRs).

The method according to claim 1,
Wherein the graph regeneration model is generated based on a Deep Belief Network-Deep Neural Network (DBN-DNN) algorithm.

The method according to claim 1,
Wherein, in the step of estimating the direction in which the specific sound source exists, the acoustic direction estimating unit estimates the angle corresponding to the maximum value of the collected data emphasis cross-correlation graph as the direction in which the specific sound source exists.

The method according to claim 1,
Further comprising, after the step of estimating the direction in which the specific sound source exists, a step in which a display unit outputs the direction in which the specific sound source exists.

An apparatus for estimating an acoustic direction using deep learning, the apparatus comprising:
a model generating unit that generates a graph regeneration model, which is a model for generating an emphasis cross-correlation graph that emphasizes a key component of a cross-correlation graph, by learning a plurality of pre-stored cross-correlation graphs based on deep learning;
a graph generating unit that generates a collected data cross-correlation graph, which is a cross-correlation graph in the frequency domain between a plurality of collected acoustic data that originate from a specific sound source and are collected by a plurality of microphones, respectively;
a graph regenerating unit that inputs the collected data cross-correlation graph to the graph regeneration model to generate a collected data emphasis cross-correlation graph, which is a graph in which the key components of the collected data cross-correlation graph are emphasized; and
an acoustic direction estimating unit that estimates the direction in which the specific sound source exists based on the collected data emphasis cross-correlation graph.