KR20190019726A

KR20190019726A - System and method for hidden Markov model based UAV sound recognition using MFCC technique in practical noisy environments

Info

Publication number: KR20190019726A
Application number: KR1020170105021A
Authority: KR
Inventors: 장경희; 석림; 하유경
Original assignee: 인하대학교 산학협력단
Priority date: 2017-08-18
Filing date: 2017-08-18
Publication date: 2019-02-27

Abstract

Disclosed are a hidden Markov model (HMM)-based method and system for recognizing unmanned aerial vehicle (UAV) sound using a mel frequency cepstral coefficients (MFCC) technique in a practical noisy environment. The method may comprise the steps of: extracting feature vectors related to the sound signal of a UAV based on an MFCC technique in an environment where noise exists; training an HMM on a training data set generated from the extracted feature vectors; and recognizing, for an input sound signal, the sound signal corresponding to the UAV based on the trained HMM.

Description

TECHNICAL FIELD

The present invention relates to an HMM-based UAV sound recognition method and system using an MFCC technique in a real noisy environment.

Embodiments of the present invention relate to a technique for identifying an unmanned aerial vehicle (UAV), such as a drone, by recognizing the acoustic signal corresponding to the UAV among various acoustic signals in a real environment where noise exists.

As the fields in which drones, a type of unmanned aerial vehicle (UAV), are used continue to grow, drones are widely used not only for military purposes but also for commercial and broadcasting purposes. Most of these UAVs are small, and because small UAVs operate in uncontrolled airspace, they can pose a serious threat to public safety and privacy. For example, a small UAV can carry chemical, biological, or nuclear weapons, and can be used to smuggle drugs or to carry out other illegal activities across borders.

Because small UAVs such as drones may thus be used to threaten human life or privacy, for illegal purposes such as drug smuggling, or to threaten public safety, technology for detecting and tracking drones is becoming increasingly important. Current drone detection and tracking technology is based on radar.

However, when the position of a drone in flight is detected and tracked by radar, a small drone flying close to the ground is difficult to detect or track, there are blind spots through which a drone can fly, and installing and operating the system is quite expensive.

Accordingly, there is a need for a technique that can detect and track the real-time position of a drone more accurately regardless of its size, whether small or medium. In other words, a drone position detection and tracking technique other than radar-based tracking is required.

Korean Patent Laid-Open Publication No. 10-2013-0120176 relates to a method for tracking the position of a UAV. It discloses a technique in which, when the position of the UAV cannot be determined because of GPS jamming or a GPS receiver failure, the ground control equipment steers the image sensor to capture a fixed target whose absolute coordinates are known in advance, and the position of the UAV is then traced back from the fixed target as a reference point, using the UAV's state information (pitch, roll, heading, altitude) and the image sensor's information (azimuth and elevation).

[1] I. Patel and Y. S. Rao, "Speech recognition using Hidden Markov Model with MFCC-Subband technique," in Proc. International Conference on Recent Trends in Information, Telecommunication and Computing, pp. 168-172, Mar. 2010.
[2] G. B. Singh and H. Song, "Using hidden Markov models in vehicular crash detection," IEEE Transactions on Vehicular Technology, vol. 58, no. 3, pp. 1119-1128, Mar. 2009.

The present invention relates to a technique that is not constrained by the size of the UAV (for example, it applies regardless of the size of the drone) and that, when various acoustic signals are input in a real environment where noise exists, accurately recognizes the acoustic signal corresponding to a drone among the input signals. In other words, it relates to a technique that enables a drone to be detected and tracked by recognizing the acoustic signal corresponding to the drone among the many acoustic signals present in a real environment.

A method for recognizing the sound of an unmanned aerial vehicle (UAV) may include: extracting feature vectors related to the acoustic signal of a UAV based on a mel frequency cepstral coefficients (MFCC) technique in an environment where noise exists; training a hidden Markov model (HMM) on a training data set generated from the extracted feature vectors; and recognizing, for an input acoustic signal, the acoustic signal corresponding to the UAV based on the trained HMM.

According to one aspect, the step of extracting the feature vectors may extract them based on a first MFCC and a second MFCC that are composed of different numbers of cepstral coefficients.

According to another aspect, the step of extracting the feature vectors may extract feature vectors related to the acoustic signal of each type of UAV based on the envelope in the cepstral domain over a predetermined reference time.

According to yet another aspect, the step of extracting the feature vectors may include: a pre-emphasis step of high-pass filtering the acoustic signal corresponding to the UAV; generating a spectrogram that represents the feature vectors of the UAV's acoustic signal by performing framing, windowing, and a fast Fourier transform (FFT) on the pre-emphasized signal; passing the FFT output through a mel-scale filter bank; computing cepstral coefficients by performing a discrete cosine transform (DCT) on the output of the mel-scale filter bank; and constructing the MFCC from the computed cepstral coefficients.

A system for recognizing the sound of an unmanned aerial vehicle (UAV) may include: a feature vector extraction unit that extracts feature vectors related to the acoustic signal of a UAV based on a mel frequency cepstral coefficients (MFCC) technique in an environment where noise exists; a model training unit that trains a hidden Markov model (HMM) on a training data set generated from the extracted feature vectors; and a sound recognition unit that recognizes, for an input acoustic signal, the acoustic signal corresponding to the UAV based on the trained HMM.

The present invention extracts feature vectors from the acoustic signal of a UAV based on the MFCC technique, trains an HMM on the extracted feature vectors, and uses the trained model to identify the sound emitted by a UAV's propellers among the acoustic signals input in a real environment. The UAV can therefore be recognized in a real, noisy environment without being constrained by its size (for example, regardless of the size of the drone).

FIG. 1 is a block diagram showing the configuration of a UAV sound recognition system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of a UAV sound recognition method according to an embodiment of the present invention.
FIG. 3 shows spectrograms of various acoustic signals according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of extracting feature vectors based on the MFCC technique according to an embodiment of the present invention.
FIG. 5 is a block diagram showing the detailed configuration of the feature vector extraction unit according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of training an HMM based on the Baum-Welch algorithm (BWA) according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the operation of recognizing a desired acoustic signal based on the trained HMM according to an embodiment of the present invention.
FIG. 8 is a graph showing the average recognition rate based on feature vectors extracted using the first and second MFCC schemes according to an embodiment of the present invention.
FIG. 9 is a graph showing recognition rate performance versus SNR according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The present embodiments relate to a technique for recognizing an unmanned aerial vehicle (UAV) such as a drone. In particular, they relate to a technique that extracts feature vectors, based on the MFCC (Mel Frequency Cepstral Coefficient) technique, from the acoustic signal emitted as the UAV's propellers are driven (that is, an acoustic array signal related to the propeller sound), trains an HMM (Hidden Markov Model) using the extracted feature vectors, and uses the resulting trained data set to recognize whether an acoustic signal input to the system corresponds to the propeller sound of a UAV such as a drone. In other words, in the present embodiments the HMM may be used as a classifier for recognizing the acoustic signal of a UAV such as a drone.

In the present embodiments, an 'unmanned aerial vehicle (UAV)' is an aircraft built to carry out a designated mission with no pilot on board, and may include a drone and the like.

In the present embodiments, an 'acoustic signal' denotes an audio signal. For example, the acoustic signal corresponding to a drone may denote the sound emitted as the drone's propellers rotate.

In the present embodiments, the 'second MFCC' is described as being composed of 36 cepstral coefficients, but this is only one embodiment; the 'second MFCC' may be composed of more than 36 cepstral coefficients.

FIG. 1 is a block diagram showing the configuration of a UAV sound recognition system according to an embodiment of the present invention, and FIG. 2 is a flowchart showing the operation of a UAV sound recognition method according to an embodiment of the present invention.

Referring to FIG. 1, the sound recognition system 100 may include a feature vector extraction unit 110, a model training unit 120, and a sound recognition unit 130. The steps of FIG. 2 (steps 210 to 230) may be performed by the feature vector extraction unit 110, the model training unit 120, and the sound recognition unit 130.

In step 210, the feature vector extraction unit 110 may extract feature vectors related to the acoustic signal of a UAV, based on the MFCC (Mel Frequency Cepstral Coefficients) technique, in a real environment in which noise and various acoustic signals are present.

Here, the feature vector extraction unit 110 may use two MFCC schemes to extract the feature vectors corresponding to the UAV from the acoustic signal emitted as the UAV's propellers rotate. That is, the feature vector extraction unit 110 may extract feature vectors related to the acoustic signal of each type of UAV based on the envelope in the cepstral domain over a predetermined reference time. The two MFCC schemes may comprise a first MFCC and a second MFCC composed of different numbers of cepstral coefficients: the first MFCC may be composed of 24 cepstral coefficients and the second MFCC of 36 cepstral coefficients. For example, the first and second MFCCs may be configured as shown in Table 1 below.

Table 1 above lists the two MFCC schemes used for feature vector extraction. According to Table 1, the first MFCC scheme consists of 12 standard MFCCs and 12 delta MFCCs, and the second MFCC scheme consists of 12 standard MFCCs and 24 delta MFCCs. The 24 delta MFCCs consist of 12 delta MFCCs computed according to Equation 11 and 12 delta MFCCs computed according to Equation 12, both described later.
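The composition of the two feature vectors can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, Equations 11 and 12 are not reproduced in this part of the description, so a simple central difference stands in for the delta computation, and the second set of 12 delta coefficients is assumed to be the delta of the first set.

```python
def delta(frames):
    """Central difference along time; edge frames are replicated.

    ASSUMPTION: this stands in for Equations 11 and 12, which are
    defined later in the description and may differ.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def build_features(static, scheme):
    """Stack 12 static MFCCs with deltas: scheme 1 gives 24 dims, scheme 2 gives 36."""
    d1 = delta(static)
    if scheme == 1:
        return [s + a for s, a in zip(static, d1)]
    if scheme == 2:
        d2 = delta(d1)  # assumed: second delta set is the delta of the first
        return [s + a + b for s, a, b in zip(static, d1, d2)]
    raise ValueError("scheme must be 1 or 2")


# Hypothetical input: 5 frames of 12 static MFCCs each.
static = [[float(t * k) for k in range(12)] for t in range(5)]
feats1 = build_features(static, 1)
feats2 = build_features(static, 2)
print(len(feats1[0]), len(feats2[0]))
```

Each frame keeps its 12 static coefficients at the front of the vector, so the first scheme's 24 values are a prefix of the second scheme's 36.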

In step 220, the model training unit 120 may train an HMM (Hidden Markov Model) on the training data set generated from the extracted feature vectors. The initial training data set may be specified in advance for each type of acoustic signal.

For example, the model training unit 120 may initialize the HMM parameters and then train the HMM using the feature vectors. That is, the HMM parameters are updated through training.

In step 230, the sound recognition unit 130 may recognize (or detect) the acoustic signal corresponding to a UAV among the various kinds of acoustic signals input to the system, based on the trained HMM. That is, the trained HMM is used as a classifier to recognize the acoustic signal corresponding to the UAV among the input signals.

For example, various kinds of acoustic signals present in a real environment, such as airplane, bird, car, rain, and drone sounds, may be input through the sound input sensor of the sound recognition unit 130. The sound recognition unit 130 may then recognize the acoustic signal corresponding to the UAV among these signals, based on the HMM trained on the feature vectors extracted for the UAV. Because the sound of the motor-driven propellers may differ from one type of UAV to another, feature vectors may be extracted and trained separately for each type. Accordingly, the sound recognition unit 130 can not only recognize that an input acoustic signal corresponds to a UAV, but also determine information about the recognized UAV (for example, the type, manufacturer, size, and affiliation of the drone).
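One common way to use HMMs as a classifier is to keep one trained model per sound class and pick the class whose model gives the input sequence the highest likelihood. The sketch below illustrates that idea with the scaled forward algorithm. It rests on several assumptions not stated in this part of the description: the feature vectors have already been quantized into discrete symbols, the decision rule is maximum likelihood, and the model parameters shown are made-up values rather than the output of Baum-Welch training.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-observation HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def classify(obs, models):
    """Return the label whose HMM assigns the highest log-likelihood to obs."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical two-state models over a two-symbol alphabet: (start, trans, emit).
models = {
    "drone": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "other": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.1, 0.9]]),
}
print(classify([0, 0, 0, 1, 0], models))
```

Training separate models per drone type, as the paragraph above describes, would simply add more entries to the `models` dictionary.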

The information about the recognized UAV may also be shown on a display (not shown) of the system 100. For example, the display may indicate whether the input acoustic signal corresponds to a drone and, if so, the drone's type, size, and affiliation (for example, a broadcaster, the military, or a company).

The operation of extracting feature vectors using the two MFCC schemes is described in more detail below with reference to Equations 1 to 12. Three training data sets may be set up with an increasing number of sound types, and the training data in each set (for example, the acoustic signals of each type) may be classified into several clusters in advance so that an HMM is trained on each single cluster.

A framework for recognizing the propeller sound of a UAV such as a drone (that is, a drone sound recognition framework) can be divided into two stages: extracting feature vectors from the acoustic signal, and classifying the feature vectors by type. An acoustic signal may exhibit its own characteristics in the time domain, owing to mobility, speed variation, and propagation conditions, or in the frequency domain, owing to its frequency components and other factors. To form a feature vector set, a spectrogram is first generated through the STFT (Short Time Fourier Transform) to reveal the acoustic characteristics in the time-frequency domain, and the MFCC is then constructed. The STFT is a form of Fourier transform: the signal is divided in time into shorter segments of equal length, and a Fourier transform is computed separately on each segment. In the continuous-time case, the STFT of an acoustic signal can be expressed as Equation 1 below.

[Equation 1]

$$\mathrm{STFT}\{x(t)\}(\tau, \omega) \equiv X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt$$

In Equation 1, $w(t)$ is the window function and $x(t)$ is the signal to be Fourier transformed (for example, the acoustic signal of a drone). $X(\tau, \omega)$ is the Fourier transform of $x(t)\, w(t - \tau)$, a complex function representing the phase and magnitude of the signal over time and frequency.

In the discrete-time case, the signal to be Fourier transformed (that is, the data corresponding to the acoustic signal) may be divided into frames, and the frames may overlap one another. When the signal is divided into frames, the frame length may be chosen based on the desired resolution and the total duration of the signal to be analyzed. In discrete time, the Fourier transform of each frame can be expressed as Equation 2.

[Equation 2]

$$\mathrm{STFT}\{x[n]\}(m, \omega) \equiv X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n - m]\, e^{-j\omega n}$$

In Equation 2, $x[n]$ is the signal to be transformed and $w[n]$ is the window function. When the STFT is computed with the FFT, both variables $m$ and $\omega$ in Equation 2 are discrete.

The spectrogram is the squared magnitude of the STFT, which can be expressed as Equation 3 below.

[Equation 3]

$$\mathrm{spectrogram}\{x(t)\}(\tau, \omega) \equiv \left| X(\tau, \omega) \right|^{2}$$
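The discrete STFT and the spectrogram can be illustrated with a short sketch. It uses only the standard library and a direct DFT for clarity; a practical implementation would normally use an FFT routine. The function names, the Hann window, the frame length of 32, and the hop of 16 are all assumptions chosen for the example. The phase is measured from the start of each frame rather than from the absolute sample index, which changes the phase of each bin but not the squared magnitude.

```python
import cmath
import math


def stft(x, window, hop):
    """Frame-wise DFT: X[m][k] = sum_n x[m*hop + n] * w[n] * exp(-2j*pi*k*n/N)."""
    n_fft = len(window)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)  # one-sided spectrum for a real signal
        ])
    return frames


def spectrogram(x, window, hop):
    """Squared magnitude of the STFT."""
    return [[abs(c) ** 2 for c in frame] for frame in stft(x, window, hop)]


# Hypothetical test tone that sits exactly on bin 4 of a 32-point DFT.
n_fft = 32
win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
sig = [math.sin(2 * math.pi * 4 * n / n_fft) for n in range(128)]
spec = spectrogram(sig, win, 16)
peak_bins = [max(range(len(f)), key=lambda k: f[k]) for f in spec]
print(len(spec), peak_bins)
```

Each row of `spec` is one time frame and each column one frequency bin, which is the layout a spectrogram plot such as FIG. 3 would display.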

FIG. 3 shows spectrograms of various acoustic signals according to an embodiment of the present invention.

FIG. 3 shows the spectrograms of a recorded drone signal 310 (that is, a recording of the sound emitted as the drone's propellers rotate) and of various other kinds of acoustic signals collected from databases on the Internet: an airplane 320, a car 330, and a bird 340.

The acoustic signals 320, 330, and 340 shown in FIG. 3, corresponding to airplanes, cars, and birdsong, can be major obstacles to recording and recognizing the acoustic signal of a drone's propellers. FIG. 3 shows that the characteristics of the different kinds of acoustic signals appear in both the time and frequency domains. For example, the drone signal shows strong harmonic components between 400 Hz and 8 kHz, whereas the harmonic components of the other signals 320, 330, and 340 are spread widely across the frequency range. Thus, based on the drone's spectrogram and features such as its harmonic components in the 400 Hz to 8 kHz band, it can be recognized whether the spectrogram (that is, the pattern) of an input acoustic signal corresponds to the drone.
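As a rough way to inspect the band behaviour described above, the share of a frame's spectral power that falls between 400 Hz and 8 kHz can be computed directly. This is only an exploratory sketch, not the recognition method of the embodiments (which relies on MFCC features and an HMM); the function name and the 32 kHz sample rate in the example are assumptions.

```python
def band_energy_ratio(power, sample_rate, f_lo=400.0, f_hi=8000.0):
    """Fraction of one frame's spectral power that lies in [f_lo, f_hi] Hz.

    power is the one-sided power spectrum of an N-point FFT (N/2 + 1 bins),
    so bin k corresponds to the frequency k * sample_rate / N.
    """
    n_fft = 2 * (len(power) - 1)
    total = sum(power)
    if total == 0.0:
        return 0.0
    in_band = sum(
        p for k, p in enumerate(power)
        if f_lo <= k * sample_rate / n_fft <= f_hi
    )
    return in_band / total


# Hypothetical frame: 33 bins at 32 kHz, so the bins are 500 Hz apart.
power = [0.0] * 33
power[4] = 2.0   # energy at 2 kHz, inside the band
power[0] = 2.0   # energy at 0 Hz, outside the band
print(band_energy_ratio(power, 32000))
```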

FIG. 4 is a flowchart showing the operation of extracting feature vectors based on the MFCC technique according to an embodiment of the present invention, and FIG. 5 is a block diagram showing the detailed configuration of the feature vector extraction unit according to an embodiment of the present invention.

Referring to FIG. 5, the feature vector extraction unit 500 may include a pre-emphasis processing unit 510, a framing unit 520, a windowing unit 530, an FFT processing unit 540, a mel-scale filter bank 550, a logarithm operation unit 560, a DCT (Discrete Cosine Transform) unit 570, and an MFCC construction unit 580. The steps of FIG. 4 (steps 410 to 440) may be performed by these components of FIG. 5: the pre-emphasis processing unit 510, the framing unit 520, the windowing unit 530, the FFT processing unit 540, the mel-scale filter bank 550, the logarithm operation unit 560, the DCT unit 570, and the MFCC construction unit 580.

First, to extract the feature vector (that is, the acoustic feature) of each acoustic signal efficiently and to generate a feature vector set for the signal, the feature vector extraction unit 500 may extract feature vectors in the cepstral domain using the MFCC technique, which extracts the envelope of the spectrum in the logarithmic domain. Non-patent document [1] above (I. Patel and Y. S. Rao, 2010) presents a configuration that extracts the spectral envelope in the logarithmic domain and then extracts feature vectors in the cepstral domain.

Various kinds of acoustic signals may be used for feature vector extraction, for example acoustic signals with the same attribute and acoustic signals with different attributes. Acoustic signals with the same attribute are the propeller sounds of different types of drones, and acoustic signals with different attributes are sounds such as birds, airplanes, and cars.

First, the feature vector extraction unit 500 may convert the acoustic signal into a digital signal, producing a digital signal that represents the signal level at every discrete time instant.

In step 410, the pre-emphasis processing unit 510 may perform pre-emphasis, that is, high-pass filtering, on the digital signal. For example, the pre-emphasis processing unit 510 may improve the overall signal-to-noise ratio (SNR) of the digital signal by boosting the energy at high frequencies, which is compressed to a certain level by the recording medium. The pre-emphasis processing unit 510 may be implemented as a first-order high-pass filter that increases the magnitude of the higher frequencies relative to the magnitude of the lower frequencies. The first-order high-pass filter can be expressed as Equation 4 below.

[Equation 4]

Equation 4 relates three quantities: the acoustic signal (i.e., the digital signal) input to the pre-emphasis processor 510, the output signal, and the filter coefficient.
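As a minimal sketch of the first-order high-pass pre-emphasis described above, the following Python function uses the commonly used form y[n] = x[n] - alpha * x[n-1]. The exact form of Equation 4, the function name `pre_emphasis`, and the coefficient value 0.97 are assumptions and are not taken from this document.

```python
def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a common textbook value, assumed here for illustration.
    """
    if not x:
        return []
    # The first sample has no predecessor, so it passes through unchanged.
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(x[n] - alpha * x[n - 1])
    return y
```

A constant input is attenuated after the first sample, while rapid sample-to-sample changes pass through almost unchanged, which is the high-frequency boost the text describes.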

In step 420, the feature vector extractor 500 may generate a spectrogram from the pre-emphasized signal through framing 520, windowing 530, and the FFT processor 540.

For example, in the framing 520 process, a frame shift may be performed, and the frame shift may correspond to half of the frame length. As the frame shift is performed, adjacent frames overlap; that is, framing 520 induces an overlap between adjacent frames. Windowing 530 may then be applied to each of the overlapping frames (the digitized acoustic signal consists of a plurality of frames). Through framing 520 and windowing 530, the signal value tapers toward zero at the window boundaries, which avoids discontinuities. The FFT processor 540 may then perform an FFT on the windowed signal to generate the spectrogram.
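The framing step with a frame shift of half the frame length can be sketched as follows. The function name `frame_signal` and the choice to drop trailing samples that do not fill a whole frame are assumptions made for illustration.

```python
def frame_signal(x, frame_len):
    """Split x into overlapping frames; the shift is half the frame length."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    shift = frame_len // 2  # 50% overlap between adjacent frames
    frames = []
    start = 0
    # Trailing samples that do not fill a whole frame are dropped (assumption).
    while start + frame_len <= len(x):
        frames.append(x[start:start + frame_len])
        start += shift
    return frames
```

With eight samples and a frame length of four, this yields three frames, each sharing two samples with its neighbor.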

That is, the FFT processor 540 may apply a fast Fourier transform (FFT) to the windowed signal, and the FFT output signal may be expressed as Equation 5 below.

[Equation 5]

Equation 5 is defined in terms of the sound data of the i-th frame and the window function applied to the i-th frame. I denotes the total number of frames, N denotes the frame length, and the output of Equation 5 is the windowed signal in the frequency domain. For example, a Hamming window may be used, and the N-point window function may be expressed as Equation 6 below.

[Equation 6]
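The windowing and transform steps can be illustrated with the sketch below. It assumes the standard Hamming definition w[n] = 0.54 - 0.46 cos(2πn/(N-1)), which may or may not match Equation 6 exactly, and it uses a direct DFT in place of an FFT for brevity (the two give the same values). The function names are hypothetical.

```python
import cmath
import math


def hamming(N):
    """Standard N-point Hamming window (assumed form of the window function)."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def windowed_spectrum(frame):
    """Magnitude spectrum of one Hamming-windowed frame via a direct DFT."""
    N = len(frame)
    w = hamming(N)
    xw = [frame[n] * w[n] for n in range(N)]
    spectrum = []
    for k in range(N // 2 + 1):  # non-negative frequency bins only
        acc = sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spectrum.append(abs(acc))
    return spectrum
```

Stacking the output of `windowed_spectrum` over all frames gives a spectrogram; a production implementation would normally call an FFT routine instead of the O(N²) loop used here.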

In step 430, the FFT-transformed signal may be passed through the mel-scale filter bank 550. The mel-scale filter bank 550 may receive the FFT-transformed signal from the FFT processor 540 and map its frequencies onto the mel scale. For example, the frequencies may be mapped onto the mel scale based on Equation 7 below.

[Equation 7]
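One widely used mapping between hertz and mel is mel = 2595 log10(1 + f/700). The sketch below assumes this form; Equation 7 of this document may use different constants, and the function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (common 2595/700 form, assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this mapping, 1000 Hz corresponds to roughly 1000 mel, and equal steps in mel correspond to progressively wider steps in Hz at higher frequencies.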

The equation for implementing the m-th filter of the mel-scale filter bank may be expressed as Equation 8 below.

[Equation 8]

In Equation 8, the frequency variable denotes the mel-scale frequency, and M denotes the total number of filters in the frequency band.
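A common way to realize such a filter bank is with M triangular filters whose edges are equally spaced on the mel scale. The sketch below follows that convention; the triangular shape, the bin-rounding rule, and all names are assumptions, since the exact form of Equation 8 is not reproduced here.

```python
import math


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Return num_filters triangular filters over n_fft // 2 + 1 FFT bins."""
    def hz_to_mel(f):
        return 2595.0 * math.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    edges = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]

    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        filt = [0.0] * n_bins
        for k in range(lo, mid):   # rising slope
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope, peak of 1.0 at k == mid
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank
```

Each filter is narrow at low frequencies and wide at high frequencies, which mirrors the mel-scale mapping of the preceding step.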

After the acoustic signal (i.e., the acoustic sequence) of each frame passes through the mel-scale filter bank 550, the weighted spectrum values are summed within each filter, yielding a single value per filter, and all of these coefficients together form the mel spectrum. The logarithm of the mel spectrum may then be calculated based on Equation 9 below.

[Equation 9]
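The per-filter summation and logarithm can be sketched as follows. The use of the natural logarithm and of a small floor value to avoid log(0) are assumptions, as are the function and parameter names.

```python
import math


def log_mel_energies(power_spectrum, bank, floor=1e-10):
    """Weight the spectrum by each filter, sum, and take the logarithm."""
    energies = []
    for filt in bank:
        # One value per filter: the weighted sum of the spectrum bins.
        e = sum(w * p for w, p in zip(filt, power_spectrum))
        energies.append(math.log(max(e, floor)))  # floor avoids log(0)
    return energies
```

Here `bank` is a list of filters, each a list of weights with one entry per spectrum bin.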

In step 440, the DCT (discrete cosine transform) unit 570 may apply a discrete cosine transform to the signal that has passed through the mel-scale filter bank 550 to calculate the cepstral coefficients. For example, the n-th cepstral coefficient may be calculated by performing the discrete cosine transform based on Equation 10 below.

[Equation 10]

In Equation 10, N may denote the total number of cepstral coefficients.
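A minimal sketch of the cepstral-coefficient computation is given below, assuming the unnormalized DCT-II form c_n = Σ_m S_m cos(π n (m - 0.5) / M) over M log filter-bank outputs S_m. The exact scaling in Equation 10 may differ; the default of 12 coefficients follows the number retained in the description.

```python
import math


def cepstral_coefficients(log_energies, num_coeffs=12):
    """DCT-II of the log mel energies, returning coefficients n = 1..num_coeffs."""
    M = len(log_energies)
    coeffs = []
    for n in range(1, num_coeffs + 1):
        # Index m runs from 0, so (m + 0.5) plays the role of (m - 0.5) for m = 1..M.
        c = sum(log_energies[m] * math.cos(math.pi * n * (m + 0.5) / M)
                for m in range(M))
        coeffs.append(c)
    return coeffs
```

Because the zeroth coefficient is skipped, a flat log spectrum produces coefficients that are all zero, which reflects that these coefficients describe spectral shape rather than overall level.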

In step 440, the MFCC constructor 580 may construct the MFCC from the calculated cepstral coefficients. For example, of the cepstral coefficients calculated with Equation 10 above, the first 12 may be retained to construct the MFCC.

An MFCC (i.e., a feature vector) built from the cepstral coefficients of Equation 10 represents only the power spectral envelope of a single frame, but delta coefficients may be added to represent dynamic features. For example, the delta coefficients used in constructing the MFCC may be calculated based on Equation 11 below.

[Equation 11]
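Delta coefficients are often computed with a regression over neighboring frames, d_t = Σ_k k (c_{t+k} - c_{t-k}) / (2 Σ_k k²). The sketch below assumes this form with a window of K = 2 and repeats the edge frames at the boundaries; whether Equation 11 uses the same formula and window is not confirmed by this document.

```python
def delta(features, K=2):
    """Regression-based delta coefficients over +/-K frames (assumed form).

    features: list of frames, each a list of coefficients.
    """
    T = len(features)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(features[t])):
            num = 0.0
            for k in range(1, K + 1):
                nxt = features[min(t + k, T - 1)][d]  # repeat last frame at the end
                prv = features[max(t - k, 0)][d]      # repeat first frame at the start
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out
```

For a coefficient that increases by one per frame, the delta in the interior of the sequence is 1.0, i.e., the local slope.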

That is, the first 12 cepstral coefficients may be calculated based on Equation 10 and the next 12 delta coefficients based on Equation 11; an MFCC composed of these 12 cepstral coefficients and 12 delta coefficients may correspond to the first MFCC. For the second MFCC, a further 12 delta coefficients may be calculated by a different method, and the second MFCC may be constructed from the coefficients of the first MFCC together with these 12 differently calculated delta coefficients. For example, the 12 delta coefficients that distinguish the second MFCC from the first may be calculated based on Equation 12 below.

[Equation 12]

In Equation 12, the delta operator may be used to calculate the difference between coefficients.

As such, two MFCC techniques (e.g., 24 MFCCs and 36 MFCCs) may be used for feature vector extraction. As shown in Table 1 above, the 24 MFCCs include 12 standard MFCCs and 12 delta MFCCs based on Equation 11, and the 36 MFCCs include 12 standard MFCCs, 12 delta MFCCs based on Equation 11, and 12 delta MFCCs based on Equation 12. Because this optimization is performed before the real-time recognition process, the complexity of feature extraction and training optimization may not be a significant concern.
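The composition of the 24- and 36-dimensional feature vectors can be sketched as a per-frame concatenation of coefficient streams. In the sketch below, `simple_difference` is only a stand-in for the difference-based deltas of Equation 12 (a plain frame-to-frame difference, which is an assumption), and `stack` is a hypothetical helper.

```python
def simple_difference(features):
    """Frame-to-frame difference c[t] - c[t-1]; the first frame gets zeros."""
    out = [[0.0] * len(features[0])]
    for t in range(1, len(features)):
        out.append([a - b for a, b in zip(features[t], features[t - 1])])
    return out


def stack(*streams):
    """Concatenate, frame by frame, the coefficient lists of several streams."""
    return [sum((list(s[t]) for s in streams), [])
            for t in range(len(streams[0]))]
```

Stacking 12 static coefficients with one 12-dimensional delta stream gives 24 values per frame; adding a second 12-dimensional delta stream gives 36.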

The following describes how an HMM is trained on the feature vectors extracted with the two MFCC techniques, e.g., the 24 MFCCs and the 36 MFCCs, to recognize an unmanned aerial vehicle (UAV) such as a drone. In the present embodiments, a multidimensional hidden Markov model (HMM) may be used.

An HMM may represent a statistical model for an ordered sequence of variables characterized by a parametric random process. In an HMM, the states are hidden, whereas the inputs are observable. The sequence of observation vectors may be expressed as Equation 13 below, and the HMM may be expressed as Equation 14 below.

[Equation 13]

[Equation 14]

In Equation 14, N denotes the number of hidden states, M denotes the number of observations per state, and A denotes the state transition matrix, whose elements give the probability of transitioning from state i to state j. B denotes the per-state emission probability distribution, whose elements give the probability of emitting observation k when the HMM is in state i. The remaining parameter is the initial state distribution, defined element-wise for each state i. In compact notation, the HMM may be expressed as Equation 15 below.

[Equation 15]
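The parameter set described above can be held in a small container such as the one sketched below. It assumes discrete observation symbols, as the definitions of N and M suggest; the class name `HMM`, the field name `pi` for the initial state distribution, and the `is_valid` check are illustrative choices rather than part of this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    A: List[List[float]]   # N x N state transition probabilities
    B: List[List[float]]   # N x M emission probabilities per state
    pi: List[float]        # length-N initial state distribution

    def is_valid(self, tol=1e-9):
        """Check shapes and that every probability row sums to one."""
        n = len(self.pi)
        rows = [self.pi] + self.A + self.B
        return (len(self.A) == n and len(self.B) == n
                and all(abs(sum(r) - 1.0) < tol for r in rows))
```

For example, `HMM(A=[[0.9, 0.1], [0.2, 0.8]], B=[[0.5, 0.5], [0.1, 0.9]], pi=[1.0, 0.0])` describes a model with two hidden states and two observation symbols.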

A training data set may be specified in advance to train the HMM. For example, an initial training data set may be specified in advance for each type of acoustic signal, and it may be defined in terms of clusters, each corresponding to one of the acoustic signals.

The cluster corresponding to each acoustic signal may be represented by an HMM. The HMMs for S clusters may be expressed as Equation 16 below.

[Equation 16]

In Equation 16, the s-th HMM corresponds to the s-th cluster. The HMM (i.e., the classifier) may be trained using the training data set generated from the feature vectors. The trained HMM may then be used to recognize whether an acoustic signal in the test data set (e.g., an input acoustic signal) corresponds to the propeller sound of a drone or to another sound such as a car or a bird.

The parameters of the HMM are determined by the training data in the training data set; the training data are the input to the HMM and may represent the feature vectors extracted by the feature vector extractor 110. The trained HMM may be used to find the most likely acoustic signal, i.e., the one most likely to match the desired signal. For example, it may be used to find, among the acoustic signals in the input test data set, those that are likely to be the acoustic signal of a drone.

The training data set may be expressed as Equation 17 below.

[Equation 17]

It is desirable to find the HMM parameters that are most likely to have produced the set of observation data O, and finding these parameters can be the key problem in training for HMM optimization. Here, the observation data O may represent the feature vectors extracted from the training data set of each cluster.

FIG. 6 is a flowchart illustrating an operation of training an HMM based on the Baum-Welch algorithm (BWA) according to an embodiment of the present invention.

FIG. 6 may show the HMM training process, which is solved using the BWA.

In FIG. 6, steps 610 to 630 may be performed by the model learning unit 120 of FIG. 1.

In step 610, after the initial training data set is specified, a training data set may be generated from the feature vectors, and the HMM may be initialized.

In step 620, the model learning unit 120 may calculate the forward likelihood and the backward likelihood for state i at time t based on the forward/backward algorithm. Non-Patent Document [2] above (G. B. Singh and H. Song, "Using hidden Markov models in vehicular crash detection," IEEE Transactions on Vehicular Technology, vol. 58, no. 3, pp. 1119-1128, Mar. 2009) presents a detailed procedure for calculating the likelihood for each state of an HMM.

Then, given the observed sequence O and the HMM parameters, the probability of being in state i at time t may be expressed, based on Bayes' theorem, as Equation 18 below.

[Equation 18]
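For a discrete-observation HMM, the forward and backward likelihoods and the resulting state posterior can be sketched as below. This is the textbook formulation (posterior proportional to the product of forward and backward likelihoods), assumed here to correspond to Equation 18; the function names are illustrative, and no scaling is applied, so long sequences would underflow.

```python
def forward(A, B, pi, obs):
    """alpha[t][i]: likelihood of obs[0..t] and being in state i at time t."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    """beta[t][i]: likelihood of obs[t+1..] given state i at time t."""
    N = len(A)
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta


def state_posteriors(A, B, pi, obs):
    """gamma[t][i]: probability of state i at time t given the whole sequence."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        total = sum(a * b for a, b in zip(a_t, b_t))
        gamma.append([a * b / total for a, b in zip(a_t, b_t)])
    return gamma
```

Each row of the returned posterior sums to one, and the sequence likelihood can be read either from the last forward row or from the first forward and backward rows combined.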

The transition likelihood from state i to state j may be expressed as Equation 19 below.

[Equation 19]
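In the standard Baum-Welch formulation, the transition likelihood between times t and t+1 combines the forward likelihood of state i, the transition and emission probabilities, and the backward likelihood of state j, normalized by the sequence likelihood. The sketch below assumes this form for Equation 19 and takes precomputed forward and backward tables (`alpha`, `beta`) as inputs; all names are illustrative.

```python
def transition_posteriors(A, B, obs, alpha, beta):
    """xi[t][i][j]: probability of state i at t and state j at t+1, given obs."""
    N = len(A)
    likelihood = sum(alpha[-1])  # P(O | model) from the last forward row
    xi = []
    for t in range(len(obs) - 1):
        xi.append([[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                    / likelihood
                    for j in range(N)] for i in range(N)])
    return xi
```

For each time step, the entries of the returned matrix sum to one over all state pairs.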

In step 630, the HMM parameters may be updated based on the forward and backward likelihoods.

For example, the parameters of the HMM may be re-estimated based on Equations 20 to 22 below, and the HMM parameters may be updated with the re-estimated values. Here, re-estimation may increase the likelihood of the training sequence under the new model, and the new model may be expressed as Equation 23 below.

[Equation 20]

[Equation 21]

[Equation 22]

[Equation 23]
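The re-estimation step can be sketched with the textbook Baum-Welch update rules for a single discrete observation sequence: the new initial distribution is the state posterior at the first frame, each new transition probability is the expected number of i-to-j transitions divided by the expected number of visits to i, and each new emission probability is the expected number of times symbol k is emitted from i divided by the expected number of visits to i. Whether Equations 20 to 22 take exactly this form is an assumption, and the function signature is hypothetical.

```python
def reestimate(gamma, xi, obs, num_symbols):
    """One Baum-Welch parameter update from state and transition posteriors.

    gamma[t][i] and xi[t][i][j] are assumed to be precomputed.
    A state with zero expected visits would cause a division by zero.
    """
    N = len(gamma[0])
    T = len(obs)
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(num_symbols)] for i in range(N)]
    return new_A, new_B, new_pi
```

Repeating the forward/backward pass and this update until the likelihood stops improving gives the iterative training loop of steps 620 and 630.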

FIG. 7 is a block diagram illustrating an operation of recognizing a desired acoustic signal based on a trained HMM according to an embodiment of the present invention.

In the HMM training process, a classifier that combines the HMMs of the S clusters may be completed. In the test process, the Viterbi algorithm may be used to find the state sequence that maximizes the probability of a given sequence under an HMM. In the recognition process, the input acoustic signal is represented as a sequence of feature vectors, and recognition amounts to finding, for the given sequence, the HMM with the highest probability, which may be expressed as Equation 24 below.

[Equation 24]
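The recognition rule, choosing the cluster whose HMM gives the highest probability for the input sequence, can be sketched as below. The sketch scores each model with the log probability of its best Viterbi path and assumes discrete observation symbols; the dictionary layout of `models` and the function names are illustrative.

```python
import math


def viterbi_log_score(A, B, pi, obs):
    """Log probability of the most likely state path for obs under one HMM."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    N = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(N)]
    for o in obs[1:]:
        delta = [max(delta[i] + log(A[i][j]) for i in range(N)) + log(B[j][o])
                 for j in range(N)]
    return max(delta)


def classify(models, obs):
    """models: dict mapping a cluster label to its (A, B, pi) tuple."""
    return max(models, key=lambda name: viterbi_log_score(*models[name], obs))
```

With one model per cluster (for example, entries labeled "drone" and "bird"), `classify` returns the label of the best-matching model for a given observation sequence.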

For example, the feature vectors 720 corresponding to an acoustic signal (i.e., a sequence) in the test data set 710 may be used as input to the HMMs, and by finding the HMM that matches the input feature vectors 720 with the highest probability, it may be recognized whether the input acoustic signal is a drone-related or an airplane-related acoustic signal. That is, each of the various input feature vectors 720 may be classified into the cluster whose HMM matches it with the highest probability (730), and this classification determines which sound the feature vector corresponds to (740).

The following describes the experimental results and performance evaluation for drone sound recognition. For the experiments, a database consisting of a training data set and a test data set is set up, and each data set may contain multiple clusters of acoustic signals. Feature vector extraction using the two MFCC techniques may be performed on the different acoustic signals of each cluster. The HMM is then initialized, and the feature vectors from each cluster of the training data set are applied to the initialized HMM to optimize the HMM parameters. In other words, the HMM may be trained by optimizing the HMM parameters based on the feature vectors and updating the HMM parameters with the optimized values. The input sounds (i.e., the various types of acoustic signals present in a real environment) are then set as the test data set, and the HMM is used as a classifier on the test data set to recognize which type of acoustic signal each signal in the test data set is.

A platform may be designed for recognizing the acoustic signal of a drone. The training data set and the test data set may each include five clusters (clusters 1 to 5, corresponding to the sounds of drones, aircraft at an airport, cars, birds, and rain, respectively). Table 2 below may show the acoustic signals included in the test data set, by cluster.

In Table 2, there are three training data sets, which differ in whether clusters 1, 2, 4, and 5 contain one, two, or three types of acoustic signals for training. For cluster 3, two, three, or five types of acoustic signals may be collected; because car sounds have a short duration, this keeps the length of acoustic data the same in every cluster. Feature vectors for each acoustic signal of the different clusters in the three training data sets may then be extracted for input to the HMMs.

The feature vectors may be extracted using the two MFCC techniques, and Table 3 below may show the parameters used for feature vector extraction.

Once the feature vectors are extracted with the parameters of Table 3, three classifiers (i.e., HMM models), one for each of the three training data sets, may be initialized in order to build the classifiers from the extracted feature vectors. Here, n may denote the classifier index, and within each classifier an HMM may be modeled for each of the five clusters. The parameters of each HMM are then trained on the feature vectors, and the input acoustic signal may be recognized using the three classifiers, each consisting of five HMMs.

As an example, when recognizing the acoustic signal of a drone based on the 24 MFCCs, 100 samples may be collected for each cluster in each training data set. Given the total number of input test samples, the number of samples successfully recognized by the classifier is counted, and the probability called the recognition rate is calculated from these two quantities. The recognition rate for a general acoustic signal may be expressed as Equation 25 below.

[Equation 25]
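The recognition rate described above, the share of test samples that the classifier labels correctly, can be computed as in the sketch below. Expressing it as a percentage and the function name `recognition_rate` are assumptions made for illustration.

```python
def recognition_rate(predicted, actual):
    """Percentage of test samples whose predicted label matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)
```

For instance, if 64 of 100 drone samples are labeled as drones, the function returns 64.0.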

Table 4 below may show the experimental results for acoustic signals recognized using the first MFCC technique.

Table 4 may show the recognition rate for each training data set using the 24 MFCCs, where the Drone 4 model is a DJI Phantom 3 Pro and Plane 4 is an Airbus A320.

According to Table 4, the recognition rate increases as the number of acoustic signal types in the training data set increases. When the training data set contains a single type of acoustic signal per cluster, the recognition rate for the drone acoustic signal is only 64%, but other input acoustic signals that do not belong to cluster 1 are not misclassified into that cluster.

When the number of acoustic signal types in the training data set increases to two per cluster, the maximum accuracy rises to 89%, and when it increases to three per cluster, the accuracy reaches 100%. That is, the training data set with three types per cluster yields considerably higher accuracy than the other two training data sets. The experimental data in Table 4 show that recognition performance improves as more training data are added to the training data set.

Likewise, when acoustic signals corresponding to airplanes at an airport, cars, birds, or rain are input, each can be classified into its own cluster model, and the recognition accuracy (i.e., the recognition rate) for each acoustic signal likewise increases as the training data related to that signal increase.

Table 5 below may show the experimental results for acoustic signals recognized using the second MFCC technique.

Table 5 may show the recognition rate for each training data set using the 36 MFCCs. Compared with Table 4, Table 5 shows that the recognition rate improves by 2% to 6% when the 36 MFCCs are used.

FIG. 8 is a graph showing the average recognition rate based on the feature vectors extracted using the first and second MFCC techniques according to an embodiment of the present invention.

Referring to FIG. 8, when the classifier trained on training set 1 is used, the second MFCC technique with 36 MFCCs improves the average recognition rate by 3% over the first MFCC technique with 24 MFCCs. With the classifier trained on training set 2, the improvement is 2.5%. With both MFCC techniques, the classifier trained on training set 3 achieves a 100% recognition rate for the drone acoustic signal, owing to the optimized classifier. The performance difference between the two MFCC techniques with the same classifier arises mainly because the longer feature vectors carry more detail about the acoustic signal, such as its dynamic information, so that more efficient and more accurate recognition can be achieved.

FIG. 9 is a graph showing the recognition rate performance versus SNR according to an embodiment of the present invention.

In FIG. 9, an additive white Gaussian noise (AWGN) environment is considered in order to test the robustness of the recognition rate, in a noisy real environment, of HMMs trained on feature vectors extracted with the first and second MFCC techniques.
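A robustness test of this kind needs test signals corrupted at a chosen SNR. The sketch below adds white Gaussian noise scaled so that the ratio of signal power to noise power matches a target SNR in dB; the function name `add_awgn` and the `seed` parameter are assumptions, and the document does not describe how its noisy test signals were generated.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise at the requested SNR (in dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

Sweeping `snr_db` over a range of values and measuring the recognition rate at each value would produce a curve of the kind shown in FIG. 9.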

According to FIG. 9, increasing the SNR is a reasonable way to improve recognition performance. Using the 36 MFCCs with training data set 3 gives the best performance, and using the 24 MFCCs with the same training data set (i.e., training data set 3) gives the second-best performance. The performance difference between the 36 MFCCs and the 24 MFCCs is 0.01% to 0.03%; the 36 MFCCs reach maximum accuracy at an SNR of 20 dB, and the 24 MFCCs at an SNR of 16 dB. Even when the SNR is as low as 5 dB, the recognition rate is about 80% or higher. Consequently, FIG. 9 shows that a classifier trained with the second MFCC technique using 36 MFCCs and an HMM with relatively more training data is suitable for recognizing the acoustic signals of unmanned aerial vehicles (UAVs) such as drones in a noisy real environment.

As described above, the system and method for recognizing UAV sound extract feature vectors for recognizing a drone from its propeller sound and apply a classification approach, so that the acoustic signal of a drone can be recognized more accurately in a noisy real environment. Training and test data sets for several clusters are set up with various types of acoustic signals, feature vectors are extracted using the 24 MFCCs and the 36 MFCCs, HMMs are trained on the extracted feature vectors, and the recognition rates of the resulting classifiers are examined for the three training data sets. The more types of acoustic signals the training data set contains, the better optimized the classifier becomes, and the average recognition rate for the drone acoustic signal approaches 100%. Because the 36 MFCCs take more dynamic information into account than the 24 MFCCs (i.e., they include 12 additional delta coefficients), their recognition rate may be 3% to 3.5% higher on average.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art may make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A method for recognizing the sound of an unmanned aerial vehicle (UAV), the method comprising:
extracting a feature vector related to an acoustic signal of the UAV based on a Mel Frequency Cepstral Coefficients (MFCC) technique in an environment where noise exists;
training a hidden Markov model (HMM) based on a training data set generated from the extracted feature vector; and
recognizing, for an input acoustic signal, the acoustic signal corresponding to the UAV based on the trained HMM.
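The recognizing step can be sketched as scoring an input feature sequence against one HMM per sound class and choosing the class with the highest likelihood. This is a minimal illustration only: it assumes diagonal-covariance Gaussian emissions and already-trained parameters (the training step, for example Baum-Welch, is omitted), and the names `log_gauss`, `log_likelihood`, and `classify` as well as the class labels in the usage note are hypothetical.

```python
import numpy as np

def log_gauss(x, means, variances):
    """Log density of each frame under each state's diagonal Gaussian.

    x: (T, D) feature frames; means, variances: (N, D) per-state parameters.
    Returns a (T, N) array.
    """
    diff = x[:, None, :] - means[None, :, :]
    terms = np.log(2.0 * np.pi * variances)[None] + diff ** 2 / variances[None]
    return -0.5 * np.sum(terms, axis=2)

def log_likelihood(x, start, trans, means, variances):
    """Forward algorithm in the log domain: returns log P(x | model).

    start: (N,) initial state probabilities (assumed non-zero here),
    trans: (N, N) transition probabilities, rows are source states.
    """
    log_b = log_gauss(x, means, variances)
    log_trans = np.log(trans)
    alpha = np.log(start) + log_b[0]
    for t in range(1, len(x)):
        # Sum over previous states i for each next state j, then emit frame t.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def classify(x, models):
    """Return the label whose HMM assigns the highest log-likelihood to x."""
    scores = {label: log_likelihood(x, *params) for label, params in models.items()}
    return max(scores, key=scores.get)
```

Here `models` would map a label such as `"uav"` or `"background"` to a tuple `(start, trans, means, variances)`; an input is recognized as a UAV when the UAV model scores highest.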

The method according to claim 1,
wherein the extracting of the feature vector comprises
extracting the feature vector based on a first MFCC and a second MFCC composed of different numbers of cepstral coefficients.

The method according to claim 1,
wherein the extracting of the feature vector comprises
extracting the feature vector of the acoustic signal of the UAV based on the envelope of the cepstral domain over a predetermined reference time.

The method according to claim 1,
wherein the extracting of the feature vector comprises:
a pre-emphasis step of performing high-pass filtering on the acoustic signal corresponding to the UAV;
generating a spectrogram representing a feature vector corresponding to the acoustic signal of the UAV by performing framing, windowing, and a fast Fourier transform (FFT) on the pre-emphasized signal;
passing the fast-Fourier-transformed signal through a mel-scale filter bank;
performing a discrete cosine transform (DCT) on the signal passed through the mel-scale filter bank to calculate cepstral coefficients; and
constructing the MFCC based on the calculated cepstral coefficients.
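The feature-extraction steps recited above can be sketched as follows. This is a minimal illustration under assumptions that the claim does not specify: a 16 kHz sample rate, 25 ms frames with a 10 ms hop, a Hamming window, a 512-point FFT, 26 triangular mel filters, a logarithm taken before the DCT, and 12 coefficients kept after dropping the zeroth one. The function names and all default parameter values are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale: (n_filters, n_fft//2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=12, preemph=0.97):
    """Return a (frames, n_ceps) array of MFCCs for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: first-order high-pass filter.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing and windowing.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # FFT power spectrum; the stacked rows form the spectrogram.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel-scale filter bank, followed by a log (assumed; small offset avoids log(0)).
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II (unnormalized) of the log filter-bank energies gives cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps + 1), n + 0.5) / n_filters)
    ceps = energies @ basis.T
    # Construct the MFCC vector from coefficients 1..n_ceps (dropping c0 is an assumption).
    return ceps[:, 1:]
```

For a one-second signal at the assumed 16 kHz rate, `mfcc(signal)` returns one 12-dimensional vector per 10 ms hop; delta coefficients can then be appended to obtain the larger feature vectors.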

A system for recognizing the sound of an unmanned aerial vehicle (UAV), the system comprising:
a feature vector extraction unit that extracts a feature vector related to an acoustic signal of the UAV based on a Mel Frequency Cepstral Coefficients (MFCC) technique in an environment where noise exists;
a model learning unit that trains a hidden Markov model (HMM) based on a training data set generated from the extracted feature vector; and
a sound recognition unit that recognizes the acoustic signal corresponding to the UAV based on the trained HMM.