KR20200072242A

KR20200072242A - Apparatus and method for determining a dangerous situation by using a mobile device and a multi-channel microphone device

Info

Publication number: KR20200072242A
Application number: KR1020180160203A
Authority: KR
Inventors: 정치윤; 김무섭
Original assignee: 한국전자통신연구원
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2020-06-22

Abstract

The present invention relates to a method and an apparatus by which a terminal determines a dangerous situation. In the method, a control signal based on a collected first audio signal causes a second audio signal to be collected from a multi-channel microphone located in a second terminal separate from the terminal, and a dangerous situation is determined from the user's situation information based on the analysis data for the second audio signal and on the first audio signal.

Description

{Apparatus and method for determining a dangerous situation by using a mobile device and a multi-channel microphone device}

The present invention relates to a method and apparatus for determining a dangerous situation using a mobile device and a multi-channel microphone device, and more specifically to a method and apparatus that improve the accuracy of dangerous-situation determination by having the mobile device make effective use of the audio information of the multi-channel microphone device.

Sound is important information for recognizing dangers that lie outside the user's field of view or that go unnoticed when the user's attention is distracted. Much research has therefore been conducted on analyzing sound to warn the user of danger. The approaches divide into those that analyze an audio signal using the microphone of a mobile device and those that analyze an audio signal using a multi-channel microphone device.

A method that analyzes an audio signal using the microphone of a mobile device can draw on the device's high computing power, so a high-complexity algorithm can be used to improve the accuracy of audio event determination. However, users frequently carry a mobile device in a pocket or bag and its position is not fixed, so the collected audio signal may contain noise or it may be difficult to collect audio of consistent quality, which can degrade audio analysis performance. In addition, a mobile device is generally equipped with only one microphone, and a single microphone cannot provide detailed information such as the location and direction of a sound source, which is needed to determine a dangerous situation.

A method that analyzes an audio signal using a multi-channel microphone device can obtain an audio signal of relatively consistent quality, because the device is always at a fixed position. Using multiple microphones also makes it possible to analyze detailed information such as the location and direction of a sound source. However, a multi-channel microphone device generally has limited battery capacity, so its operating time is short, and because it must continuously monitor ambient sound, its power consumption is high. Furthermore, because it has less computing power than a mobile device, high-complexity algorithms cannot be applied, which limits the accuracy of audio event determination.

When notifying the user of danger based on audio signal analysis, the location and direction of the sound source are important information for analyzing the dangerous situation. Even for the same audio signal, whether there is danger can differ depending on the distance to the sound source and the direction in which the sound source is moving, and this greatly affects the accuracy of audio-based dangerous-situation determination.

Therefore, a method is needed that estimates information such as the location and direction of a sound source from the audio signal and uses a high-complexity algorithm to improve the accuracy of audio event determination, so that a dangerous situation can be determined more accurately and the user notified.

To solve the problems of the prior art, an object of the present invention is to provide a method and apparatus for determining a dangerous situation in which a mobile device and a multi-channel microphone device are effectively linked.

An object of the present invention is to reduce the power consumption of the multi-channel microphone device by effectively linking the mobile device and the multi-channel microphone device.

An object of the present invention is to provide an apparatus and method that can determine a dangerous situation more accurately and notify the user by improving the accuracy of audio event determination.

An object of the present invention is to reduce the amount of data transmitted in order to link the mobile device and the multi-channel microphone device effectively.

An object of the present invention is to perform audio detection for a plurality of sound sources.

An object of the present invention is to determine a dangerous situation from audio information using a high-complexity algorithm.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

According to an embodiment of the present invention, a method and an apparatus by which a terminal determines a dangerous situation may be provided. The apparatus for determining a dangerous situation may include a first terminal and a second terminal that communicates with the first terminal.

The method by which a terminal determines a dangerous situation may include: transmitting, by a first terminal, a control signal based on a collected first audio signal; receiving, by the first terminal, analysis data for a second audio signal collected based on the control signal; identifying, by the first terminal, audio event information from the analysis data for the second audio signal and audio situation information from the first audio signal; and determining, by the first terminal, whether a dangerous situation exists based on one or more of the audio event information, the audio situation information, and sensor information.

Here, the second audio signal is collected, based on the control signal, from a multi-channel microphone located in a second terminal separate from the first terminal, and the second audio signal may be a signal collected from one or more sound sources.

In an embodiment of the present invention, the second terminal includes an audio signal analysis unit that generates analysis data for the second audio signal, and the analysis data may include one or more of audio feature information, distance information, and direction information for each of the one or more sound sources.

In an embodiment of the present invention, the first terminal may analyze the collected first audio signal to determine whether a sound of interest is present and, when a sound of interest is present, transmit a control signal that activates the multi-channel microphone of the second terminal.

In an embodiment of the present invention, when the collected first audio signal is analyzed to determine whether the sound of interest is present, whether the second audio signal needs to be collected and analyzed may be determined based on the amount of change in the collected first audio signal.

In an embodiment of the present invention, when it is determined that no sound of interest is present, information on the user's surroundings may be identified based on the first audio signal, and whether the user is in a dangerous situation may be determined based on the information on the surroundings and the sensor information.

In an embodiment of the present invention, when the analysis data for the second audio signal is generated, the location of each of the one or more sound sources may be identified and beamformed audio may be generated.

In an embodiment of the present invention, when the location of each of the one or more sound sources is identified, the distance and direction of each sound source from the user may be determined based on one or more of: spacing information between the microphones of the multi-channel microphone, time-difference information of the sound signal arriving at the multi-channel microphone, and intensity-difference information.
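As a rough illustration of how microphone spacing and arrival-time difference can yield a direction, the sketch below uses the standard far-field two-microphone relation sin(theta) = c * tau / d. This is a minimal sketch under stated assumptions and not the method the patent specifies: the function name `arrival_angle_deg`, the two-microphone geometry, and the speed-of-sound constant are all illustrative choices, and distance estimation is not covered.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C (assumption)

def arrival_angle_deg(mic_spacing_m, time_diff_s):
    """Estimate the arrival angle of a far-field source for one microphone pair.

    The angle is measured from the broadside direction (perpendicular to the
    line joining the two microphones). Hypothetical helper for illustration.
    """
    # Path-length difference divided by spacing gives sin(theta).
    x = SPEED_OF_SOUND * time_diff_s / mic_spacing_m
    # Clamp to [-1, 1] so measurement noise cannot push asin out of its domain.
    x = max(-1.0, min(1.0, x))
    return math.degrees(math.asin(x))

# A source straight ahead reaches both microphones at the same time.
print(arrival_angle_deg(0.2, 0.0))
```

With more than two microphones, several such pairwise estimates could be combined, which is one reason a multi-channel device can localize sources that a single microphone cannot.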

In an embodiment of the present invention, when beamformed audio is generated for each of the one or more sound sources, the sound signal of the sound source at the identified location may be amplified and interference from sound sources at other locations may be removed.
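One common way to reinforce sound from a known location while attenuating other directions is delay-and-sum beamforming. The patent does not name a specific beamforming algorithm, so the following is only a sketch of that general technique; the function `delay_and_sum`, the integer sample delays, and the toy two-channel input are assumptions made for illustration.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its steering delay (in samples) and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-channel integer delay at which the target source arrives.
    Samples shifted beyond the buffer are treated as zero.
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for i in range(n):
            j = i + delay  # read ahead to undo this channel's arrival delay
            if 0 <= j < n:
                out[i] += samples[j]
    return [v / len(channels) for v in out]

# A pulse reaches microphone 0 at sample 2 and microphone 1 two samples later.
mics = [[0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
steered = delay_and_sum(mics, [0, 2])    # steered at the source
unsteered = delay_and_sum(mics, [0, 0])  # no steering
print(max(steered), max(unsteered))
```

When the delays match the source, the pulses add coherently and the peak keeps its full amplitude; with mismatched delays the pulses stay separated and the averaged peak is halved, which is the attenuation effect for off-target directions.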

In an embodiment of the present invention, the audio feature information may be extracted based on the beamformed audio for each of the one or more sound sources.

In an embodiment of the present invention, the audio feature information may be defined by combining one or more of a feature defined in the time domain, a feature defined in the frequency domain, and a feature defined in another domain for the second audio signal.
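To make the idea of combining time-domain and frequency-domain features concrete, the sketch below computes two well-known time-domain features (RMS level and zero-crossing rate) and one frequency-domain feature (spectral centroid) for a single frame. The patent does not list which features are used, so these three, and the names `extract_features`, `zero_crossing_rate`, and `spectral_centroid`, are illustrative assumptions. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math

def rms(x):
    """Time-domain feature: root-mean-square level of the frame."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def zero_crossing_rate(x):
    """Time-domain feature: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)

def spectral_centroid(x, sample_rate):
    """Frequency-domain feature: magnitude-weighted mean frequency in Hz."""
    n = len(x)
    num = den = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mag = abs(coeff)
        num += (k * sample_rate / n) * mag
        den += mag
    return num / den if den else 0.0

def extract_features(x, sample_rate):
    """Combine the individual features into one feature record."""
    return {
        "rms": rms(x),
        "zcr": zero_crossing_rate(x),
        "centroid_hz": spectral_centroid(x, sample_rate),
    }

# A pure 1000 Hz tone sampled at 8000 Hz, 64 samples (exactly 8 cycles).
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
feats = extract_features(tone, 8000)
print(feats)
```

For the pure tone, the spectral centroid comes out close to 1000 Hz, since nearly all spectral energy sits in one bin. A real system would likely compute such features per frame on each beamformed stream and accumulate them over time.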

In an embodiment of the present invention, the audio event information may refer to information about a sound generated by a single sound source.

In an embodiment of the present invention, the audio event information may be identified based on data in which the feature information for the second audio signal is accumulated over time, and whether an event has occurred around the user may be determined based on the audio event information.

In an embodiment of the present invention, the first terminal may determine whether the event has occurred by machine learning.

In an embodiment of the present invention, the audio situation information is identified based on the various types of sound sources included in the first audio signal, and may refer to information about the surrounding environment in which the user is currently located.

In an embodiment of the present invention, the first terminal may determine the audio situation information by machine learning.

In an embodiment of the present invention, when it is determined from the analyzed audio information whether a dangerous situation exists, whether each sound source is approaching and its approach speed may be determined based on the location information of the one or more sound sources.
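Given a sequence of distance estimates for one sound source, approach and approach speed can be derived from how the distance changes over time. The sketch below assumes evenly spaced distance estimates and a simple average over the observation window; the function name `approach_speed` and the sample values are hypothetical and not taken from the patent.

```python
def approach_speed(distances_m, interval_s):
    """Average closing speed in m/s over the observation window.

    distances_m: successive distance estimates to one sound source, in meters.
    interval_s:  time between consecutive estimates, in seconds.
    A positive result means the source is getting closer.
    """
    if len(distances_m) < 2:
        return 0.0  # not enough observations to estimate motion
    elapsed = interval_s * (len(distances_m) - 1)
    return (distances_m[0] - distances_m[-1]) / elapsed

def is_approaching(distances_m, interval_s):
    """True when the source is, on average, moving toward the user."""
    return approach_speed(distances_m, interval_s) > 0.0

# Source observed at 20 m, 15 m, then 10 m, with 0.5 s between estimates.
print(approach_speed([20.0, 15.0, 10.0], 0.5))  # 10.0
```

In practice the raw distance estimates would be noisy, so some smoothing before differencing would likely be needed; that step is omitted here.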

In an embodiment of the present invention, the first terminal has a single microphone, and the first audio signal may be collected from the single microphone.

By linking a mobile device with a multi-channel microphone device, the present invention makes it possible to use, at the same time, situation information that can be determined by analyzing ambient sound over a long period and the location and event information of a plurality of sound sources that can be obtained through multiple microphones when determining a dangerous situation.

Unlike existing methods in which a mobile device determines a dangerous situation using only its built-in microphone, the present invention uses the location information of multiple sound sources to estimate information such as whether a sound source is approaching and its approach speed, and uses situation information, so that a dangerous situation can be determined more accurately.

In the present invention, the multi-channel microphone device estimates the locations of a plurality of sound sources, generates a beamformed audio signal that amplifies the sound from the corresponding direction and removes interference from sound in other directions, and then extracts features for audio event detection. The mobile device, which has high computing power, then determines audio events using a high-complexity algorithm, which improves the accuracy of audio event determination.

Rather than keeping the multi-channel microphone device always active, the present invention has the mobile device analyze whether a sound of interest is present and activates the multi-channel microphone device only as needed, which reduces the power consumption of the multi-channel microphone device and extends its operating time.

When linking the mobile device and the multi-channel microphone device, the present invention does not transmit the audio signal itself, which has a large data volume. Instead, audio features are extracted in the multi-channel microphone device and passed to the mobile device. This reduces the amount of data transmitted over the network and makes audio event detection possible for a plurality of sound sources.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is an example of a method and apparatus for determining a dangerous situation using a mobile device and a multi-channel microphone device.
FIG. 2 shows the flow of the audio signal collection unit and the audio signal analysis unit, which collect and analyze audio signals from a multi-channel microphone.
FIG. 3 shows the flow of the audio information determination unit, which determines audio events and situation information using the audio information received from the audio signal monitoring unit and the feature information received from the audio signal analysis unit.
FIG. 4 is a flowchart of a method for determining a dangerous situation using a mobile device and a multi-channel microphone device.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein.

In describing the embodiments of the present invention, detailed descriptions of well-known configurations or functions are omitted where they might obscure the gist of the present invention. In the drawings, parts unrelated to the description of the present invention are omitted, and similar parts are given similar reference numerals.

In the present invention, when a component is said to be "connected", "coupled", or "linked" to another component, this includes not only a direct connection but also an indirect connection in which another component is present in between. Also, when a component is said to "include" or "have" another component, this means that, unless specifically stated otherwise, further components may be included rather than excluded.

In the present invention, terms such as first and second are used only to distinguish one component from another and, unless specifically stated, do not limit the order or importance of the components. Accordingly, within the scope of the present invention, a first component in one embodiment may be called a second component in another embodiment, and likewise a second component in one embodiment may be called a first component in another embodiment.

In the present invention, components are distinguished from one another in order to describe their respective features clearly, and this does not mean that the components are necessarily separate. That is, a plurality of components may be integrated into a single hardware or software unit, and a single component may be distributed across a plurality of hardware or software units. Accordingly, such integrated or distributed embodiments are included in the scope of the present invention even if not separately mentioned.

In the present invention, the components described in the various embodiments are not necessarily essential components, and some may be optional. Accordingly, an embodiment consisting of a subset of the components described in one embodiment is also included in the scope of the present invention. Embodiments that include other components in addition to those described in the various embodiments are also included in the scope of the present invention.

Hereinafter, a method and apparatus for determining a dangerous situation using a mobile device and a multi-channel microphone device according to an embodiment of the present invention are described with reference to the accompanying drawings. The description focuses on the parts needed to understand the operation and effect of the present invention.

FIG. 1 is a diagram showing the method and apparatus of the present invention for determining a dangerous situation using a mobile device and a multi-channel microphone device.

The apparatus of the present invention for determining a dangerous situation using a mobile device and a multi-channel microphone device may include an audio signal monitoring unit (110), an audio signal collection unit (120), an audio signal analysis unit (130), an audio information determination unit (140), a dangerous situation determination unit (150), and a network interface unit (160). The apparatus of the present invention may consist of a first terminal and a second terminal.

The first terminal is a device with relatively large battery capacity, high computing power, and the ability to collect various sensor information. As an embodiment of the present invention, it may be a portable electronic device with a communication function, such as a mobile phone, smartphone, notebook computer, tablet, or smart pad. The first terminal can process audio signals and various sensor information using its high computing power, and it is portable for the user. The first terminal may be a terminal in which sensors other than the audio sensor are located. The first terminal also has a single microphone, from which the first audio signal can be collected.

The second terminal has high-performance microphones, can be used at a fixed position while in use, and can estimate the location of a sound source. The second terminal may be a multi-channel microphone device and, as an embodiment of the present invention, may be a wearable device such as glasses or a neckband fitted with multiple microphones, or a portable device. A multi-channel microphone may be installed in the second terminal. As an embodiment of the present invention, the second terminal has one or more multi-channel microphones and limited computing power, but can perform basic audio signal processing. The second terminal is also portable for the user.

In the present invention, communication for transmitting and receiving information may also take place between the first terminal and the second terminal.

In the present invention, the audio signal collected by the first terminal is defined as the first audio signal. Audio situation information is identified based on the various types of sound sources included in the first audio signal, and may refer to information about the surrounding environment in which the user is currently located.

In the present invention, the audio signal that the second terminal collects upon receiving a control signal based on the first audio signal is defined as the second audio signal. Audio event information can be identified from the second audio signal.

The audio signal monitoring unit (110) may collect sound using the microphone mounted on the mobile device. The audio signal monitoring unit (110) may also analyze the amount of change in the collected audio signal to decide whether audio signal collection and analysis using the multi-channel microphone device is needed. As an embodiment of the present invention, this decision may use existing algorithms such as VAD (Voice Activity Detection) and SAD (Sound Activity Detection).
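The decision based on the amount of change in the collected signal can be pictured as a simple energy-based activity detector. The sketch below is a stand-in for that idea only: it is not a full VAD or SAD algorithm, and the function name `needs_multichannel`, the ratio threshold, and the background smoothing factor are assumptions chosen for illustration.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def needs_multichannel(frames, ratio_threshold=4.0, floor=1e-6):
    """Return True if any frame jumps well above the running background level.

    frames: iterable of sample lists from the single mobile-device microphone.
    ratio_threshold: how many times louder than background counts as a change
                     (illustrative value, not from the patent).
    """
    background = None
    for frame in frames:
        level = frame_rms(frame)
        if background is None:
            background = max(level, floor)  # first frame initializes the estimate
            continue
        if level / background >= ratio_threshold:
            return True  # large change: request multi-channel collection
        # Slowly track the background so gradual changes do not trigger.
        background = max(0.9 * background + 0.1 * level, floor)
    return False

quiet = [[0.01, -0.01, 0.01, -0.01]] * 5
loud = [[0.5, -0.5, 0.5, -0.5]]
print(needs_multichannel(quiet))         # False
print(needs_multichannel(quiet + loud))  # True
```

Only when such a check fires would the mobile device send the activation message, which is what lets the multi-channel device stay idle the rest of the time.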

When the audio signal monitoring unit (110) determines that audio signal collection by the multi-channel microphone device is needed, it may send a control message for activating audio signal collection to the multi-channel microphone device through the network interface unit (160).

The audio signal that the audio signal monitoring unit (110) collects using the microphone mounted on the mobile device may also be sent to the audio information determination unit (140).

The audio signal collection unit (120) collects audio signals from the multi-channel microphone and passes them to the audio signal analysis unit (130). The operation of the audio signal collection unit (120) is controlled by the control signal of the audio signal monitoring unit (110).

Existing multi-channel microphone equipment consumed a lot of battery power because it always had to collect signals from multiple microphones and then use an algorithm such as VAD or SAD to decide whether further audio analysis was needed. In the present invention, the audio signal collection function is activated only when a control message is received from the audio signal monitoring unit of the mobile device, which reduces battery consumption.

The audio signal analysis unit (130) analyzes the audio signal passed from the audio signal collection unit (120), estimates the location (distance, direction, and so on) of the main sound sources, and then generates beamformed audio for each such sound source. The location information of a sound source may include the distance between the sound source and the user or terminal and the direction of the sound source relative to the user or terminal.

The audio signal analysis unit (130) may also extract, from the beamformed audio for each sound source, the audio feature information needed by the audio information determination unit (140). The audio feature information may be defined by combining one or more of a feature defined in the time domain, a feature defined in the frequency domain, and a feature defined in another domain for the second audio signal.

The analyzed audio information is transmitted to the audio information determination unit 140 through the network interface unit 160. This information may include the location information of the sound sources and/or the acoustic feature information.

Because the present invention transmits the analyzed audio information (sound source locations, acoustic feature information, etc.) rather than the collected audio signal itself, it reduces the amount of data sent over the network. Moreover, when the collected audio signal is sent over the network, hardware and other constraints prevent several beamforming audio streams from being delivered at once. By transmitting analyzed audio information instead, the present invention can deliver information on a plurality of beamforming audio streams and thus detect audio events for multiple sound sources simultaneously.
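
As a rough, non-limiting illustration of the data reduction, the following sketch compares the byte rate of raw multi-channel audio with that of per-source feature vectors. Every number used (8 channels, 16 kHz, 16-bit samples, 3 sources, 13 feature values per frame, 100 frames per second, 4-byte values) is an assumption chosen for the example and does not come from the specification.

```python
def raw_audio_bytes_per_second(channels: int, sample_rate_hz: int,
                               bytes_per_sample: int) -> int:
    """Byte rate when every microphone channel is streamed unprocessed."""
    return channels * sample_rate_hz * bytes_per_sample


def feature_bytes_per_second(sources: int, features_per_frame: int,
                             frames_per_second: int, bytes_per_value: int) -> int:
    """Byte rate when only features plus distance and direction are sent."""
    values_per_frame = features_per_frame + 2  # +2 for distance and direction
    return sources * values_per_frame * frames_per_second * bytes_per_value


# Assumed example figures, for illustration only.
raw = raw_audio_bytes_per_second(channels=8, sample_rate_hz=16_000, bytes_per_sample=2)
feat = feature_bytes_per_second(sources=3, features_per_frame=13,
                                frames_per_second=100, bytes_per_value=4)
ratio = raw / feat
```

Under these assumed figures the raw stream is 256,000 bytes per second and the feature stream is 18,000 bytes per second; the actual saving depends on the array size, the feature set, and the frame rate.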

The audio information determination unit 140 receives the audio signal collected by the audio signal monitoring unit 110 and the audio information analyzed by the audio signal analysis unit 130, and makes determinations based on the received information. The information determined from the audio information is then transmitted to the risk situation determination unit 150.

More specifically, the audio information determination unit 140 determines audio events using the audio feature information received through the network interface unit 160. Here, an audio event is a sound produced by a single sound source, such as a car, a gunshot, or a siren. According to an embodiment of the present invention, audio event information is information from which it can be determined that a specific event has occurred around the user, and it can be determined from sound occurring over a short period.

The audio information determination unit 140 also determines the current situation information using the audio signal received from the audio signal monitoring unit 110. Audio situation information is information about the user's current surroundings that can be inferred from a variety of sound sources, and covers situations such as a bus stop, a large supermarket, or a restaurant. According to an embodiment of the present invention, audio situation information may mean information about the environment in which the user is currently located, and it can be determined from sound occurring over a long period.

Because the mobile device has high computing power, highly complex algorithms such as deep neural networks (DNNs) can be used to determine audio events and situation information, which improves the accuracy of the determination.

The risk situation determination unit 150 analyzes the determination information received from the audio information determination unit 140 and/or information collected from other sensors of the mobile device to finally determine whether a danger exists around the user. When it determines that the situation is dangerous, the risk situation determination unit 150 notifies the user of the dangerous situation. The determination information received from the audio information determination unit 140 may include one or more items of information such as audio events, situation information, and sound source locations.

Unlike existing techniques, which determine a risk situation using only the microphone mounted on a mobile device, the present invention uses the location information of multiple sound sources to estimate whether each sound source is approaching and at what speed, and can therefore determine a risk situation more accurately.
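
By way of a non-limiting illustration, whether a sound source is approaching, and how fast, could be estimated from successive distance estimates as sketched below. The function names, the `(time, distance)` track format, and the 0.5 m/s threshold are assumptions for this sketch only.

```python
from typing import List, Tuple

Track = List[Tuple[float, float]]  # (time in seconds, estimated distance in metres)


def approach_speed(track: Track) -> float:
    """Mean closing speed in m/s between the first and last observation.

    A positive value means the sound source is getting closer.
    """
    if len(track) < 2:
        raise ValueError("need at least two observations")
    (t0, d0), (t1, d1) = track[0], track[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (d0 - d1) / (t1 - t0)


def is_approaching(track: Track, min_speed_mps: float = 0.5) -> bool:
    # The threshold is an assumed tuning value, not taken from the patent.
    return approach_speed(track) >= min_speed_mps


track = [(0.0, 30.0), (1.0, 24.0), (2.0, 18.0)]
speed = approach_speed(track)  # (30 - 18) / 2 = 6.0 m/s
```

A practical system would likely smooth the distance estimates before differencing them, since individual estimates can be noisy.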

FIG. 2 shows the flow of the audio signal analysis unit, which collects and analyzes audio signals from the multi-channel microphone.

When the audio signal monitoring unit 110 transmits a control signal through the network interface unit 160 to activate audio signal collection in the multi-channel microphone device, the audio signal collection unit 120 collects audio signals from the multi-channel microphone and passes them to the audio signal analysis unit 130.

The audio signal analysis unit 130 may include a sound source position estimator 210, a beamforming audio generator 220, an audio feature information extractor 230, and the like.

The sound source position estimator 210 analyzes the audio signals collected from the multi-channel microphone to estimate the location (distance, direction, etc.) of each main sound source. The estimation uses the known spacing of the microphones together with information such as the differences in arrival time and intensity of the acoustic signal at each microphone, and yields a distance (Di) and a direction (Ai) for each of N sound sources (Si), where N is one or more.
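
As a non-limiting illustration of using microphone spacing and arrival-time difference, the sketch below estimates the direction of a single source from two microphones. It assumes a far-field source and uses the textbook relation sin(theta) = c * tau / d; it does not estimate distance, and the specification does not state which estimation algorithm is used. The function names and the speed-of-sound constant are assumptions.

```python
import math
from typing import Sequence

SPEED_OF_SOUND_MPS = 343.0  # assumed value for air at about 20 degrees C


def best_lag(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag in samples of `other` relative to `ref` that maximises cross-correlation.

    A positive lag means `other` receives the sound later than `ref`.
    """
    def corr(lag: int) -> float:
        total = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += x * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle_deg(lag_samples: int, sample_rate_hz: float,
                      mic_spacing_m: float) -> float:
    """Far-field angle from the array broadside, in degrees."""
    tau = lag_samples / sample_rate_hz
    s = SPEED_OF_SOUND_MPS * tau / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp values pushed out of range by noise
    return math.degrees(math.asin(s))


pulse = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]  # same pulse, two samples later
lag = best_lag(pulse, delayed, max_lag=3)
angle = arrival_angle_deg(lag, sample_rate_hz=16_000, mic_spacing_m=0.1)
```

With more than two microphones, pairwise estimates of this kind can be combined, and intensity differences can supply additional cues for distance.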

Using the distance (Di) and direction (Ai) estimated by the sound source position estimator 210 for each of the N sound sources (Si), the beamforming audio generator 220 generates beamforming audio (Bi) that amplifies the sound from that direction and removes interference from sound arriving from other directions.

More specifically, when generating beamforming audio for each of the one or more sound sources, the beamforming audio generator 220 can amplify the sound signal of the sound source at the identified location and remove interference from sound sources at other locations.
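
By way of a non-limiting illustration, one simple way to reinforce sound from a known direction is delay-and-sum beamforming, sketched below with integer sample delays. The specification does not name a particular beamforming algorithm, so this technique and the function name `delay_and_sum` are assumptions for the example.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each channel by its steering delay (in samples) and average.

    delays[k] is how many samples later channel k hears the target source.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    length = len(channels[0])
    out: List[float] = []
    for t in range(length):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d
            if 0 <= idx < len(ch):
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


mic0 = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # target arrives one sample later
steered = delay_and_sum([mic0, mic1], delays=[0, 1])    # aligned: peak is kept
unsteered = delay_and_sum([mic0, mic1], delays=[0, 0])  # misaligned: peak is smeared
```

Signals arriving from the steered direction add coherently, while signals from other directions are misaligned and partly cancel, which is the amplification and interference-removal effect described above.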

The audio feature information extractor 230 uses the beamforming audio (Bi) to extract the audio feature information (Fi) required by the audio information determination unit 140. The audio feature information may consist of features defined in the time domain (such as short-time energy (STE) and zero-crossing rate (ZCR)), features defined in the frequency domain (such as mel-frequency cepstral coefficients (MFCC) and the mel-scaled spectrogram), features defined in other domains, or a combination of these.
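
As a non-limiting illustration of the time-domain features named above, the sketch below computes short-time energy and zero-crossing rate for one frame. Definitions vary in the literature; here STE is taken as the mean squared amplitude and ZCR as the fraction of adjacent sample pairs whose sign differs, which are assumptions of this example.

```python
from typing import Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


frame = [1.0, -1.0, 1.0, -1.0]
ste = short_time_energy(frame)   # every sample has magnitude 1
zcr = zero_crossing_rate(frame)  # the sign flips between every pair
```

Frequency-domain features such as MFCC require a spectral transform and are usually computed with a signal-processing library rather than by hand.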

The audio feature information (Fi) extracted by the audio feature information extractor 230 for the N sound sources, together with the distance (Di) and direction (Ai) of each sound source, is transmitted to the audio information determination unit 140 through the network interface unit 160.

In other words, the analysis data for the second audio signal produced by the audio signal analysis unit 130 includes one or more of the audio feature information, the distance, and the direction of each of the one or more sound sources. The audio signal analysis unit 130 transmits this analysis data to the audio information determination unit 140.
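
By way of a non-limiting illustration, the per-source analysis data could be represented and serialized as sketched below. The record name `SourceAnalysis`, its field names, and the use of JSON as the wire format are assumptions for this example; the specification does not define an encoding.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SourceAnalysis:
    """Hypothetical record for one sound source Si."""
    source_id: int
    distance_m: float      # Di
    direction_deg: float   # Ai
    features: List[float]  # Fi


def encode(sources: List[SourceAnalysis]) -> bytes:
    """Serialize the analysis data for transmission (JSON is an assumed format)."""
    return json.dumps([asdict(s) for s in sources]).encode("utf-8")


def decode(payload: bytes) -> List[SourceAnalysis]:
    """Rebuild the records on the receiving terminal."""
    return [SourceAnalysis(**item) for item in json.loads(payload.decode("utf-8"))]


sent = [
    SourceAnalysis(1, 12.5, 40.0, [0.2, 0.7]),
    SourceAnalysis(2, 3.0, -15.0, [0.9, 0.1]),
]
received = decode(encode(sent))
```

A compact binary encoding would reduce the payload further; JSON is used here only because it keeps the sketch short.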

FIG. 3 shows the flow of the audio information determination unit, which determines audio events and situation information using the audio information received from the audio signal monitoring unit and the feature information received from the audio signal analysis unit.

The audio information determination unit 140 includes an audio situation information determiner 310, an audio event feature manager 320, and an audio event determiner 330.

The audio situation information determiner 310 determines the current situation information using the audio information received from the audio signal monitoring unit 110. To do so, it extracts audio feature information consisting of features defined in the time domain, features defined in the frequency domain, features defined in other domains, or a combination of these, and then determines the situation information with a machine learning method such as a support vector machine (SVM), k-nearest neighbors (KNN), or a DNN. Alternatively, the situation information may be determined by feeding the raw data directly into a machine learning algorithm without extracting features from the audio signal.
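
As a non-limiting illustration of one of the machine learning methods named above, the sketch below classifies a feature vector with a plain k-nearest-neighbors vote. The two-dimensional feature vectors and their labels are invented for the example and carry no acoustic meaning; the function name `knn_classify` is likewise an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, situation label)


def knn_classify(train: Sequence[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples nearest to `query`."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented training data, for illustration only.
train: List[Sample] = [
    ([0.90, 0.80], "bus stop"),
    ([0.80, 0.90], "bus stop"),
    ([0.85, 0.75], "bus stop"),
    ([0.10, 0.20], "restaurant"),
    ([0.20, 0.10], "restaurant"),
    ([0.15, 0.25], "restaurant"),
]
label = knn_classify(train, [0.82, 0.80])
```

An SVM or DNN would replace `knn_classify` with a trained model while keeping the same feature-vector-in, label-out interface.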

In general, an audio event is determined from a short span of data, on the order of seconds, whereas situation information is determined from a long span of data, on the order of minutes, and takes as input an audio signal in which various sound sources are mixed. It is therefore more effective to determine situation information on the mobile device, which holds the audio collected from a single microphone, than to analyze it on the multi-channel microphone device. The situation information determined by the audio situation information determiner 310 is transmitted to the risk situation determination unit 150.

The audio event feature manager 320 accumulates over time the audio feature information (Fi) for the N sound sources received through the network interface unit 160 and passes it to the audio event determiner 330. Audio feature information is extracted once per beamforming audio generation cycle, but the span of feature information needed to determine an audio event is longer than that cycle, so the feature information must be accumulated until enough is available for the determination.
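
By way of a non-limiting illustration, the accumulation of per-cycle features into a window long enough for event determination could be sketched as follows. The class name `EventFeatureManager`, the fixed `frames_needed` window, and the sliding-window behaviour are assumptions for this example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class EventFeatureManager:
    """Hypothetical per-source buffer that releases a window once it is full."""

    def __init__(self, frames_needed: int) -> None:
        self.frames_needed = frames_needed
        # One bounded buffer per sound source; old frames fall off the front.
        self._buffers: Dict[int, Deque[List[float]]] = defaultdict(
            lambda: deque(maxlen=frames_needed)
        )

    def push(self, source_id: int, features: List[float]) -> Optional[List[List[float]]]:
        """Add one cycle of features; return a full window when available."""
        buf = self._buffers[source_id]
        buf.append(features)
        if len(buf) == self.frames_needed:
            return list(buf)
        return None


manager = EventFeatureManager(frames_needed=3)
first = manager.push(1, [0.1])    # not enough data yet
second = manager.push(1, [0.2])   # still accumulating
window = manager.push(1, [0.3])   # window for source 1 is now complete
other = manager.push(2, [0.9])    # source 2 is buffered independently
```

Keeping a separate buffer per source lets events for several sources be determined in parallel, each as soon as its own window fills.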

The audio event determiner 330 determines the audio event of each sound source using the audio feature information accumulated by the audio event feature manager 320. The determination uses a machine learning method such as SVM, KNN, or DNN, and because the mobile device has high computational performance, a highly complex algorithm can be used to improve accuracy. In general, when audio signals collected by a multi-channel microphone device are transmitted as they are, a large amount of data is sent over the network in real time, and additional hardware is needed to transmit multiple audio signals at once. When, as in the present invention, the multi-channel microphone device extracts only the audio feature information needed for analysis and sends it over the network, events can be analyzed for multiple sound sources without consuming much network bandwidth. The event information and location information determined for the N sound sources are transmitted to the risk situation determination unit 150.

FIG. 4 is a flowchart of a method for determining a risk situation using a mobile device and a multi-channel microphone device. As shown in FIG. 4, the audio signal monitoring unit collects and analyzes the audio signal from the microphone of the mobile device (S402) and then determines whether a sound of interest is present (S404).

If a sound of interest is present, the multi-channel microphone device is activated through the network interface (S406). The audio signal collection unit 120 collects audio signals from the multi-channel microphone device and passes them to the audio signal analysis unit 130, which estimates the locations of a plurality of sound sources from the input audio signals (S408). For each sound source, the audio signal analysis unit 130 generates beamforming audio using the location information (S410), extracts the audio features predefined by the audio information determination unit (S412), and transmits them to the audio information determination unit over the network. The audio information determination unit determines situation information from the audio signal collected by the mobile device and detects an audio event for each sound source using the received audio features (S414). Finally, whether the user is in a dangerous situation is determined (S416) using the location and audio event information for each sound source, the situation information, and the information analyzed from other sensors of the mobile device.

If the audio signal monitoring unit determines that no sound of interest is present, the multi-channel microphone device is deactivated (S418), situation information is determined using only the audio signal collected by the mobile device (S420), and this is combined with the information analyzed from other sensors of the mobile device to determine whether the situation is dangerous (S422).
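
As a non-limiting illustration, the two branches of the flowchart (S402 to S416 when a sound of interest is present, S418 to S422 otherwise) can be traced with the sketch below. The function `run_cycle`, the label sets, and the final decision rule are placeholders assumed for the example; the specification does not define how the inputs are combined.

```python
from typing import Callable, Dict, List

DANGEROUS_EVENTS = {"car", "gunshot", "siren"}  # assumed event labels
RISKY_SITUATIONS = {"roadside"}                 # assumed situation label


def run_cycle(sound_of_interest: bool,
              detect_events: Callable[[], List[str]],
              situation: str,
              sensor_alert: bool) -> Dict[str, object]:
    """Trace one pass through the flowchart; the decision rule is a placeholder."""
    steps = ["S402", "S404"]
    events: List[str] = []
    if sound_of_interest:
        # Second terminal is activated, analyses the sources, and reports back.
        steps += ["S406", "S408", "S410", "S412"]
        events = detect_events()
        steps += ["S414", "S416"]
    else:
        # Second terminal stays off; only the mobile device's own data is used.
        steps += ["S418", "S420", "S422"]
    danger = (sensor_alert
              or situation in RISKY_SITUATIONS
              or any(e in DANGEROUS_EVENTS for e in events))
    return {"steps": steps, "events": events, "danger": danger}


with_sound = run_cycle(True, lambda: ["siren"], "bus stop", False)
without_sound = run_cycle(False, lambda: ["siren"], "bus stop", False)
```

Note that `detect_events` is never called on the second branch, mirroring the fact that the multi-channel microphone device is deactivated there.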

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments set forth herein and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform a person of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

110: audio signal monitoring unit
120: audio signal collection unit
130: audio signal analysis unit
140: audio information determination unit
150: risk situation determination unit
160: network interface unit
210: sound source position estimator
220: beamforming audio generator
230: audio feature information extractor
310: audio situation information determiner
320: audio event feature manager
330: audio event determiner

Claims

A method for determining a risk situation by a terminal, the method comprising:
transmitting, by a first terminal, a control signal based on a collected first audio signal;
receiving, by the first terminal, analysis data for a second audio signal collected based on the control signal;
identifying, by the first terminal, audio event information from the analysis data for the second audio signal, and audio situation information from the first audio signal; and
determining, by the first terminal, whether a dangerous situation exists based on one or more of the audio event information, the audio situation information, and sensor information,
wherein, based on the control signal, the second audio signal is collected from a multi-channel microphone located in a second terminal separate from the first terminal, and
wherein the second audio signal is a signal collected from one or more sound sources.

The method of claim 1,
wherein the analysis data for the second audio signal includes one or more of audio feature information, distance information, and direction information for each of the one or more sound sources.

The method of claim 2,
wherein the collected first audio signal is analyzed to determine whether a sound of interest is present, and
a control signal for activating the multi-channel microphone is transmitted when the sound of interest is present.

The method of claim 3,
wherein, when determining whether the sound of interest is present by analyzing the collected first audio signal,
whether the second audio signal needs to be collected and analyzed is determined based on the amount of change in the collected first audio signal.

The method of claim 3,
wherein, when it is determined that no sound of interest is present,
information about the user's surrounding situation is identified based on the first audio signal, and
whether the user is in a dangerous situation is determined based on the information about the surrounding situation and the sensor information.

The method of claim 2,
wherein, when the analysis data for the second audio signal is generated,
the location of each of the one or more sound sources is identified and beamforming audio is generated.

The method of claim 6,
wherein, when the location of each of the one or more sound sources is identified,
the distance and direction of each sound source from the user are identified
based on at least one of spacing information of the multi-channel microphones, and time difference information and intensity difference information of the acoustic signal reaching the multi-channel microphones.

The method of claim 6,
wherein, when beamforming audio is generated for each of the one or more sound sources,
the sound signal of the sound source at the identified location is amplified, and
interference from sound sources at other locations is removed.

The method of claim 6,
wherein the audio feature information is extracted based on the beamforming audio for each of the one or more sound sources.

The method of claim 9,
wherein the audio feature information
is defined by combining one or more of a feature defined in the time domain, a feature defined in the frequency domain, and a feature defined in another domain for the second audio signal.

The method of claim 2,
wherein the audio event information is information about a sound generated by a single sound source.

The method of claim 2,
wherein the audio event information is identified based on data in which the feature information for the second audio signal is accumulated over time, and
whether an event has occurred around the user is determined based on the audio event information.

The method of claim 2,
wherein the first terminal determines whether the event has occurred through machine learning.

The method of claim 2,
wherein the audio situation information is identified based on the various kinds of sound sources included in the first audio signal, and
is information about the environment in which the user is currently located.

The method of claim 2,
wherein the first terminal determines the audio situation information through machine learning.

The method of claim 2,
wherein, when whether the situation is dangerous is determined from the analyzed audio information,
whether each sound source is approaching and its approach speed are identified
based on the location information of the one or more sound sources.

The method of claim 2,
wherein a single microphone is present in the first terminal, and
the first audio signal is collected from the single microphone.

A device for determining a risk situation, the device comprising:
a first terminal; and
a second terminal that communicates with the first terminal,
wherein the first terminal
transmits a control signal based on a collected first audio signal,
receives analysis data for a second audio signal collected based on the control signal,
identifies audio event information from the analysis data for the second audio signal and audio situation information from the first audio signal, and
determines whether a dangerous situation exists based on one or more of the audio event information, the audio situation information, and sensor information,
wherein, based on the control signal, the second audio signal is collected from a multi-channel microphone located in the second terminal, which is separate from the first terminal, and
wherein the second audio signal is a signal collected from one or more sound sources.

The device of claim 18,
wherein the second terminal
includes an audio signal analysis unit, and the audio signal analysis unit generates the analysis data for the second audio signal, and
the analysis data for the second audio signal includes at least one of audio feature information, distance information, and direction information for each of the one or more sound sources.

The device of claim 18,
wherein the first terminal determines whether a sound of interest is present by analyzing the collected first audio signal, and
transmits a control signal for activating the multi-channel microphone of the second terminal when the sound of interest is present.