KR102374144B1

KR102374144B1 - Abnormaly sound recognizing method and apparatus based on artificial intelligence and monitoring system using the same

Info

Publication number: KR102374144B1
Application number: KR1020200037299A
Authority: KR
Inventors: 배영훈; 김재훈
Original assignee: 아이브스 주식회사
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2022-03-15
Also published as: KR20210120508A

Abstract

The present invention relates to an artificial intelligence-based abnormal sound source recognition apparatus, a method therefor, and a monitoring system using the same. The abnormal sound source recognition apparatus according to an embodiment of the present invention includes: a model generation unit for generating an abnormal sound source model, used to classify abnormal sound sources, from the result of machine learning performed based on the data volume of an arbitrary sound source dataset; a sound source section extraction unit for extracting sound source sections by time division so that a sound source signal input from outside is recognized at staggered times; a sound source feature extraction unit for extracting sound source features of the sound source sections; an abnormal sound source classification unit for classifying the sound source features by predetermined abnormal sound source type using the abnormal sound source model; and an abnormal sound source recognition unit for reporting the abnormal sound source recognition result for the sound source signal according to the time-staggered classification results of the sound source features.

Description

AI-based abnormal sound source recognition apparatus, method therefor, and monitoring system using the same

The present invention relates to an artificial intelligence-based abnormal sound source recognition apparatus, a method therefor, and a monitoring system using the same. More particularly, it relates to an apparatus, method, and monitoring system that extract sound source sections from an arbitrary sound source signal by time division and then, based on artificial intelligence, extract and classify abnormal sound sources in those sections so that abnormal sound sources are recognized accurately.

An artificial intelligence (AI) system is a computer system that implements human-level intelligence. Unlike existing rule-based smart systems, it is a system in which the machine learns and makes judgments on its own and becomes smarter over time.

The more such an artificial intelligence system is used, the higher its recognition rate becomes and the more accurately it understands user preferences, so existing rule-based smart systems are gradually being replaced by deep learning-based artificial intelligence systems.

Artificial intelligence technology consists of machine learning (deep learning) and element technologies that make use of machine learning. Machine learning is an algorithmic technology that classifies and learns the features of input data on its own. Element technologies use machine learning algorithms such as deep learning and span technical fields such as linguistic understanding, visual understanding, inference and prediction, knowledge representation, and motion control.

Recently, a wide variety of technologies have emerged as artificial intelligence has advanced. In particular, in fields related to the recognition of audio data such as voice and sound, research on recognizing abnormal situations from audio data is being actively conducted. Such an approach goes beyond the limits of recognizing abnormal situations from video captured by surveillance cameras: by detecting an abnormal sound source in the audio data and notifying the observer, it can allow action suited to the situation to be taken more effectively.

Conventionally, abnormal sound sources have been classified using a support vector machine (SVM). In this case, the decision boundary may be formed so as to include unobserved regions, so there is a limitation in that new audio data is likely to be misclassified.

Moreover, because the amount of training data available for classifying abnormal sound sources is limited, a way of generating a model that classifies abnormal sound sources quickly and accurately is needed. The model should also be built with the acoustic characteristics of abnormal sound sources in mind, so as to reduce the misclassification that can result from analyzing unnecessary sections.

Therefore, to classify abnormal sound sources, it is necessary to implement a scheme that extracts and classifies abnormal sound sources based on artificial intelligence so that they can be recognized accurately.

Korean Registered Patent Publication No. 10-1718073 (registered March 14, 2017)

An object of the present invention is to provide an artificial intelligence-based abnormal sound source recognition apparatus, a method therefor, and a monitoring system using the same, which extract sound source sections from an arbitrary sound source signal by time division and then, based on artificial intelligence, extract and classify abnormal sound sources in those sections so that abnormal sound sources are recognized accurately.

An artificial intelligence-based abnormal sound source recognition apparatus according to an embodiment of the present invention may include: a model generation unit for generating an abnormal sound source model, used to classify abnormal sound sources, from the result of machine learning performed based on the data volume of an arbitrary sound source dataset; a sound source section extraction unit for extracting sound source sections by time division so that a sound source signal input from outside is recognized at staggered times; a sound source feature extraction unit for extracting sound source features of the sound source sections; an abnormal sound source classification unit for classifying the sound source features by predetermined abnormal sound source type using the abnormal sound source model; and an abnormal sound source recognition unit for reporting the abnormal sound source recognition result for the sound source signal according to the time-staggered classification results of the sound source features.

The sound source section extraction unit may extract, from the sound source signal, a first sound source section in which the first peak is identified and a second sound source section that immediately follows the first sound source section and lasts for a predetermined time.

The first sound source section may extend 1 second to either side of the first peak, and the second sound source section may be four times as long as the first sound source section.

The second sound source section may contain one or more peaks.
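The two-section extraction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not say how the "first peak" is detected, so the amplitude threshold, the local-maximum test, and the function names `find_first_peak` and `extract_sections` are all hypothetical.

```python
def find_first_peak(samples, threshold):
    """Return the index of the first local maximum of |x| at or above
    `threshold`, or None if there is none (assumed peak criterion)."""
    n = len(samples)
    for i in range(n):
        a = abs(samples[i])
        if a < threshold:
            continue
        left = abs(samples[i - 1]) if i > 0 else 0.0
        right = abs(samples[i + 1]) if i < n - 1 else 0.0
        if a >= left and a >= right:
            return i
    return None


def extract_sections(samples, sample_rate, threshold, half_width_s=1.0, ratio=4):
    """Return ((start1, end1), (start2, end2)) as sample index ranges.

    Section 1 spans `half_width_s` seconds on each side of the first peak.
    Section 2 follows section 1 directly and is `ratio` times its nominal
    length. Both ranges are clipped to the signal boundaries.
    """
    peak = find_first_peak(samples, threshold)
    if peak is None:
        return None
    half = int(half_width_s * sample_rate)
    start1 = max(0, peak - half)
    end1 = min(len(samples), peak + half)
    nominal_len = 2 * half  # nominal length of section 1 before clipping
    start2 = end1
    end2 = min(len(samples), start2 + ratio * nominal_len)
    return (start1, end1), (start2, end2)
```

With the defaults, the first section is 2 seconds long and the second is 8 seconds long, matching the 1-second and four-times figures given in the text.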

The abnormal sound source recognition unit may report a primary abnormal sound source recognition result for the sound source signal based on the first sound source section, and then report a secondary abnormal sound source recognition result for the sound source signal based on the second sound source section.

When a plurality of abnormal sound sources are recognized in the primary and secondary abnormal sound source recognition results, the abnormal sound source recognition unit may combine those recognition results and report event information according to an abnormal sound source scenario.

The abnormal sound source scenario may indicate, when a plurality of abnormal sound sources are recognized, the occurrence of a specific event corresponding to the order in which the individual abnormal sound sources occurred.
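One possible way to picture such an order-dependent scenario is a lookup keyed by the sequence of recognized sounds. The sketch below is hypothetical: the source gives no concrete scenarios, so the labels, the event names in `SCENARIOS`, and the function `match_scenario` are invented purely for illustration.

```python
# Hypothetical scenario table: ordered sound labels -> event name.
SCENARIOS = {
    ("sudden_braking", "vehicle_crash"): "traffic_accident",
    ("glass_breaking", "scream"): "break_in",
}


def match_scenario(first_result, second_results, scenarios=SCENARIOS):
    """Combine the primary result with the secondary results in order of
    occurrence and look up the matching event, if any."""
    ordered = [first_result, *second_results]
    # Collapse consecutive repeats so one long sound is counted once.
    collapsed = [
        label for i, label in enumerate(ordered)
        if i == 0 or label != ordered[i - 1]
    ]
    if len(collapsed) < 2:
        return None  # a single abnormal sound does not form a scenario
    return scenarios.get(tuple(collapsed))
```

Because the key is an ordered tuple, the same two sounds in the opposite order do not match the same event, which mirrors the order dependence stated in the text.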

The abnormal sound source model may be a model based on a convolutional neural network (CNN).

The sound source feature extraction unit may use a log mel spectrogram-based extraction technique: it obtains a spectrum that represents the sound source signal in the frequency domain, applies a filter bank based on the mel scale to the spectrum to derive a mel spectrum, and extracts log mel spectrogram-based sound source features through cepstrum analysis of the mel spectrum.

In addition, an artificial intelligence-based abnormal sound source recognition method according to an embodiment of the present invention may include: (a) generating an abnormal sound source model, used to classify abnormal sound sources, from the result of machine learning performed based on the data volume of an arbitrary sound source dataset; (b) extracting sound source sections by time division so that a sound source signal input from outside is recognized at staggered times; (c) extracting sound source features of the sound source sections; (d) classifying the sound source features by predetermined abnormal sound source type using the abnormal sound source model; and (e) reporting the abnormal sound source recognition result for the sound source signal according to the time-staggered classification results of the sound source features.

Step (b) may extract, from the sound source signal, a first sound source section in which the first peak is identified and a second sound source section that immediately follows the first sound source section and lasts for a predetermined time.

Step (e) may report a primary abnormal sound source recognition result for the sound source signal based on the first sound source section, and then report a secondary abnormal sound source recognition result for the sound source signal based on the second sound source section.

When a plurality of abnormal sound sources are recognized in the primary and secondary abnormal sound source recognition results, step (e) may combine those recognition results and report event information according to an abnormal sound source scenario.

Step (c) may use a log mel spectrogram-based extraction technique: it obtains a spectrum that represents the sound source signal in the frequency domain, applies a filter bank based on the mel scale to the spectrum to derive a mel spectrum, and extracts log mel spectrogram-based sound source features through cepstrum analysis of the mel spectrum.

In addition, a monitoring system according to an embodiment of the present invention is a monitoring system for monitoring a monitored area in which incidents or accidents are expected to occur, and includes: an abnormal sound source recognition apparatus for recognizing an abnormal sound source from a sound source signal generated in the monitored area; and a monitoring server for controlling intensive monitoring of the monitored area according to abnormal sound source recognition information delivered from the abnormal sound source recognition apparatus. The abnormal sound source recognition apparatus may include: a model generation unit for generating an abnormal sound source model, used to classify abnormal sound sources, from the result of machine learning performed based on the data volume of an arbitrary sound source dataset; a sound source section extraction unit for extracting sound source sections by time division so that the sound source signal is recognized at staggered times; a sound source feature extraction unit for extracting sound source features of the sound source sections; an abnormal sound source classification unit for classifying the sound source features by predetermined abnormal sound source type using the abnormal sound source model; and an abnormal sound source recognition unit for reporting the abnormal sound source recognition result for the sound source signal according to the time-staggered classification results of the sound source features.

The abnormal sound source recognition information may include location information of the monitored area and information on the direction from which the abnormal sound source originates.

The direction information may be detected by a sound source detector installed in the monitored area.

The monitoring server may control the camera direction of a CCTV installed in the monitored area according to the direction information.

According to an embodiment, the monitoring system may further include a monitoring terminal for playing back, in real time, the video captured by the CCTV controlled by the monitoring server.

The present invention can recognize abnormal sound sources accurately by extracting sound source sections from an arbitrary sound source signal by time division and then, based on artificial intelligence, extracting and classifying abnormal sound sources in those sections.

In addition, even when the data volume of the abnormal sound source dataset is limited, the present invention can speed up machine learning and further increase the accuracy of abnormal sound source classification results.

In addition, the present invention can construct a model for classifying abnormal sound sources that reflects both the acoustic characteristics of abnormal sound sources and the psychoacoustic characteristics of how humans perceive sound.

In addition, the present invention can minimize CCTV blind spots and allow the CCTV to be operated proactively.

FIG. 1 is a diagram showing an artificial intelligence-based abnormal sound source recognition apparatus according to an embodiment of the present invention;
FIG. 2 is a diagram showing the CNN model generation unit of FIG. 1;
FIG. 3 is a diagram showing abnormal sound source waveforms;
FIG. 4 is a diagram explaining the extraction of sound source sections that extend the abnormal sound source of FIG. 3;
FIG. 5 is a diagram explaining the results of comparing analysis of the entire sound source section with analysis split into the first and second sound source sections;
FIGS. 6 and 7 are diagrams showing an artificial intelligence-based abnormal sound source recognition method according to an embodiment of the present invention; and
FIG. 8 is a diagram showing a monitoring system using an artificial intelligence-based abnormal sound source recognition apparatus according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that could obscure the gist of the present invention are omitted. Note also that, throughout the drawings, the same components are denoted by the same reference numerals wherever possible.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. On the principle that an inventor may appropriately define terms in order to describe his or her invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent the whole of its technical idea. It should be understood that, as of the filing of this application, there may be various equivalents and modifications that could replace them.

In the accompanying drawings, some components are exaggerated, omitted, or shown schematically, and the size of each component does not fully reflect its actual size. The present invention is not limited by the relative sizes or spacings drawn in the accompanying drawings.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Also, when a part is said to be "connected" to another part, this covers not only the case in which it is "directly connected" but also the case in which it is "electrically connected" with another element in between.

A singular expression includes the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, the term "unit" as used in the specification means software or a hardware component such as an FPGA or ASIC, and a "unit" performs certain roles. However, "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or may be configured to run on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units".

Embodiments of the present invention are described below in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts of the drawings that are unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings.

FIG. 1 is a diagram showing an artificial intelligence-based abnormal sound source recognition apparatus according to an embodiment of the present invention, FIG. 2 is a diagram showing the CNN model generation unit of FIG. 1, FIG. 3 is a diagram showing abnormal sound source waveforms, FIG. 4 is a diagram explaining the extraction of sound source sections that extend the abnormal sound source of FIG. 3, and FIG. 5 is a diagram explaining the results of comparing analysis of the entire sound source section with analysis split into the first and second sound source sections.

As shown in FIG. 1, an artificial intelligence-based abnormal sound source recognition apparatus according to an embodiment of the present invention (hereinafter, "abnormal sound source recognition apparatus" 100) can recognize abnormal sound sources accurately by extracting sound source sections from an arbitrary sound source signal by time division and then, based on artificial intelligence, extracting and classifying abnormal sound sources in those sections.

Here, an abnormal sound source (abnormal sound) is an atypical sound that arises in a specific dangerous situation and, although it is not expressed in words, conveys that situation intuitively. Examples of the types include screams, breaking window glass, vehicle horns, sudden vehicle braking, vehicle collisions, explosions, and gunshots. Acoustically, such abnormal sound sources are characterized by a loudness at or above a certain decibel (dB) level and a short duration.
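The "loud and short" characteristic can be illustrated with a simple level-based scan. This is only a sketch and is not the recognition method of the source, which relies on a CNN classifier: the frame length, the dB threshold (taken here relative to full scale), the maximum duration, and both function names are assumptions.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level of each non-overlapping frame, in dB relative to an
    amplitude of 1.0 (assumed reference)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # avoid log(0)
    return levels


def find_loud_short_events(levels_db, threshold_db, max_frames):
    """Return (start, end) frame ranges where the level stays at or above
    `threshold_db` for no more than `max_frames` consecutive frames."""
    events = []
    run_start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, level in enumerate(list(levels_db) + [float("-inf")]):
        if level >= threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start <= max_frames:
                events.append((run_start, i))
            run_start = None
    return events
```

Runs that stay loud for longer than `max_frames` are skipped, which reflects the short-duration property; a sustained loud sound such as machinery noise would not be reported by this scan.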

That is, an abnormal sound source is an acoustic signal that is not recognized as a meaningful word or sentence but instead indicates a specific dangerous situation. Such an abnormal sound source is a weakly labeled (weak label) sound source: rather than belonging to a dataset in which each sound source section is labeled, it is labeled only at the file level with whether the sound source section is present.

Artificial intelligence technology consists of machine learning (deep learning) and element technologies that make use of machine learning. The description below covers the specific case in which a convolutional neural network (hereinafter "CNN") model is applied as the deep learning model for classifying abnormal sound sources.

The abnormal sound source recognition apparatus 100 may include a CNN model generation unit 110, a sound source section extraction unit 120, a sound source feature extraction unit 130, an abnormal sound source classification unit 140, an abnormal sound source recognition unit 150, and a sound source section storage unit 160.

First, the CNN model generation unit 110 generates a CNN model for classifying abnormal sound sources.

In this case, because the amount of data available for generating a CNN model by machine learning on the abnormal sound source dataset is limited, the CNN model generation unit 110 does not generate the CNN model for classifying abnormal sound sources directly from the abnormal sound source dataset.

That is, the CNN model generation unit 110 first generates a CNN model by performing machine learning on a sufficient amount of training data from an arbitrary sound source dataset, in which various types of sound sources are weakly labeled, and then uses that model to generate the CNN model for classifying abnormal sound sources.

For convenience of description, the CNN model for classifying abnormal sound sources is hereinafter called the "abnormal sound source CNN model", and the CNN model for classifying arbitrary sound sources is hereinafter called the "arbitrary sound source CNN model".

In this way, by building the abnormal sound source CNN model from a previously built arbitrary sound source CNN model, the CNN model generation unit 110 can speed up machine learning and further increase the accuracy of abnormal sound source classification results, even when the data volume of the abnormal sound source dataset is limited.
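The source does not spell out how the arbitrary sound source CNN model is reused. One common reading is transfer learning, in which the feature-extracting layers are carried over and only the output layer is replaced before training on the smaller abnormal sound source dataset. The sketch below shows just that initialization step under this assumption; the dictionary layout of the model and the function `build_abnormal_model` are hypothetical.

```python
import copy
import random


def build_abnormal_model(general_model, num_abnormal_classes, seed=0):
    """Start an abnormal-sound model from a pretrained general model.

    Assumed layout: general_model = {"conv_layers": [...],
    "classifier": {"weights": [[...], ...], "biases": [...]}}.
    The convolutional layers are copied as they are; the classifier is
    re-created with one output row per abnormal sound type.
    """
    rng = random.Random(seed)
    model = copy.deepcopy(general_model)  # leave the pretrained model intact
    feature_dim = len(general_model["classifier"]["weights"][0])
    model["classifier"] = {
        "weights": [
            [rng.uniform(-0.01, 0.01) for _ in range(feature_dim)]
            for _ in range(num_abnormal_classes)
        ],
        "biases": [0.0] * num_abnormal_classes,
    }
    return model
```

Training would then continue on the abnormal sound source dataset. Whether the copied layers are frozen or fine-tuned is a design choice the text does not address.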

Here, the arbitrary sound source CNN model can be used even if its classification accuracy for arbitrary sound sources falls short of the level generally required, as long as it was built from a sufficient amount of training data. The reason is that it is difficult to secure enough training data in the abnormal sound source dataset to generate a model, so the arbitrary sound source CNN model is used to compensate when generating the abnormal sound source CNN model.

Specifically, the CNN model generation unit 110 generates the arbitrary sound source CNN model as follows.

That is, the CNN model generation unit 110 divides each sound source signal of the arbitrary sound source dataset into frames and applies a short-time Fourier transform (STFT) to obtain a spectrum that represents each sound source signal in the frequency domain rather than the time domain.

Then, to reflect the characteristic of the human auditory system, which is more sensitive in low-frequency bands than in high-frequency bands, the CNN model generation unit 110 applies a filter bank to the spectrum and derives a mel spectrum. The filter bank is based on the mel scale, which represents the relationship between physical frequency and the frequency a person actually perceives.

Next, the CNN model generation unit 110 performs cepstrum analysis on the mel spectrum, which was obtained by reflecting the characteristics of the human auditory system, by taking the logarithm and applying an inverse Fourier transform.

Through this, the CNN model generation unit 110 extracts arbitrary sound source features as a log mel spectrogram for each sound source signal of the arbitrary sound source dataset (S1). This is a signal processing step that converts each sound source signal into image form so that it can be classified with a CNN model.
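The feature extraction described above (framing, STFT, mel filter bank, logarithm) can be sketched as follows. This is a minimal illustration and not the patent's implementation: the frame size, hop size, number of mel bands, the Hann window, and the common 2595·log10(1 + f/700) mel formula are all assumptions chosen for the example, and the inverse-transform step of the cepstrum analysis is omitted because a log mel spectrogram is simply the logarithm of the mel spectrum per frame.

```python
import cmath
import math

def hz_to_mel(hz):
    # Common mel-scale formula (an assumption; the source does not specify one).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT of one Hann-windowed frame; returns power for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log mel energies."""
    bank = mel_filter_bank(n_mels, n_fft, sample_rate)
    result = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        result.append([math.log(e + 1e-10) for e in mel])  # small offset avoids log(0)
    return result

def tone(freq_hz, sample_rate=8000, n=256):
    # Hypothetical test signal: a pure sine tone.
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

The resulting frames-by-bands grid is the "image" that the CNN receives. A low tone concentrates its energy in lower mel bands than a high tone does, which is the property the classifier relies on.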

In addition, the CNN model generation unit 110 performs CNN-based machine learning on the arbitrary sound source features to generate the CNN model for arbitrary sound sources (S2, S3). Here, the CNN model generation unit 110 may determine the weights and biases used for computation in the convolutional layer to extract classification features, and may use max-pooling in the pooling layer to classify the classification features by predetermined abnormal sound source type.

Meanwhile, the CNN model generation unit 110 generates the CNN model for abnormal sound sources as follows.

In this case, the CNN model generation unit 110 takes the CNN model for arbitrary sound sources, for which machine learning has already been completed, and fine-tunes it on the task of classifying abnormal sound sources, thereby generating the CNN model for abnormal sound sources.

That is, the CNN model generation unit 110 generates the CNN model for abnormal sound sources by fine-tuning, to suit abnormal sound sources, the weights and biases used for computation in the convolutional layer of the CNN model for arbitrary sound sources. Here, the CNN model generation unit 110 may perform fine-tuning only on the last fully connected layer of the CNN model for arbitrary sound sources.

Referring to FIG. 2, in the same way that it extracts arbitrary sound source features from the arbitrary sound source dataset, the CNN model generation unit 110 extracts abnormal sound source features as a log mel spectrogram for each sound source signal of the abnormal sound source dataset (S11).

Then, the CNN model generation unit 110 retrains the CNN model for arbitrary sound sources on the abnormal sound source features and, through this fine-tuning, generates a CNN model suited to abnormal sound sources (S12, S13). That is, by re-tuning the CNN model for arbitrary sound sources into the CNN model for abnormal sound sources, the CNN model generation unit 110 provides a model that extracts classification features for abnormal sound sources and classifies them by predetermined abnormal sound source type based on those features.
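The option of fine-tuning only the last fully connected layer can be illustrated with a small sketch. Everything here is an assumption made for illustration: `frozen_features` stands in for the pretrained convolution and pooling stack of the CNN model for arbitrary sound sources, the training rule is plain softmax regression with stochastic gradient descent, and the learning rate and epoch count are arbitrary.

```python
import math

def frozen_features(x, conv_weights):
    """Stand-in for the pretrained feature extractor; its weights are never updated."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in conv_weights]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fine_tune_last_layer(samples, labels, conv_weights, n_classes, epochs=200, lr=0.5):
    """Train only the final fully connected layer on the abnormal sound source data."""
    n_feat = len(conv_weights)
    fc_w = [[0.0] * n_feat for _ in range(n_classes)]
    fc_b = [0.0] * n_classes
    feats = [frozen_features(x, conv_weights) for x in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            scores = [sum(w * v for w, v in zip(row, f)) + b
                      for row, b in zip(fc_w, fc_b)]
            probs = softmax(scores)
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                fc_b[c] -= lr * grad
                for j in range(n_feat):
                    fc_w[c][j] -= lr * grad * f[j]
    return fc_w, fc_b

def predict(x, conv_weights, fc_w, fc_b):
    f = frozen_features(x, conv_weights)
    scores = [sum(w * v for w, v in zip(row, f)) + b for row, b in zip(fc_w, fc_b)]
    return scores.index(max(scores))
```

Because the feature extractor is left untouched, only a small number of parameters must be learned from the limited abnormal sound source dataset, which is the motivation the text gives for reusing the arbitrary sound source model.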

In this way, the CNN model generation unit 110 generates the CNN model for abnormal sound sources, which classifies abnormal sound sources, by using the results of machine learning performed on the basis of the amount of data in the arbitrary sound source dataset.

Next, when an arbitrary sound source signal is input from outside, the sound source section extraction unit 120 extracts from that signal a sound source section that reflects both the acoustic characteristics of abnormal sound sources and the psychoacoustic characteristics of how people perceive sound.

Here, the sound source section extraction unit 120 time-divides and extracts the externally input sound source signal so that it can be recognized at staggered times.

Here, as described above, the acoustic characteristic of an abnormal sound source is that it is loud, at or above a certain decibel (dB) level, and short in duration. That is, referring to FIG. 3, an abnormal sound source appears as a peak (peak audio) within the sound source section.

The psychoacoustic characteristics of how a person perceives sound may vary with the person's reflexes, sensitivity to sound, or prior learning, but the length of sound a person needs in order to recognize it is about 2 seconds on average.

Accordingly, the sound source section extraction unit 120 extracts, from an arbitrary sound source signal, predetermined intervals before and after the peak as the sound source section. As shown in FIG. 3, this section may be extracted in the time domain, in accordance with the psychoacoustic characteristics, as a total of 2 seconds: 1 second before and 1 second after the peak.

However, the analysis result for the 2-second sound source section alone may not be 100% accurate. Therefore, the sound source section extraction unit 120 performs a pre-recognition process on the 2-second sound source section and then performs a post-recognition process on the contiguous 8-second sound source section that follows, which is long enough to judge the context before and after the abnormal sound source.

Here, as shown in FIG. 4, the section on which the pre-recognition process is performed, in which the first peak to appear is identified, is hereinafter referred to as the 'first sound source section'. The section on which the post-recognition process is performed, which is observed for a predetermined time immediately following the first sound source section, is hereinafter referred to as the 'second sound source section'.

The first sound source section has a total length of 2 seconds, 1 second on either side of the first peak, and the second sound source section may have a total length of 8 seconds, a predetermined time immediately following the first sound source section. The lengths of the first and second sound source sections may be changed according to the characteristics of the sound.
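The section boundaries described above can be sketched as index arithmetic on a sampled signal. The peak detector below is a deliberately simple assumption (the first local maximum whose magnitude reaches a threshold); the source states only that the abnormal sound forms a peak above a certain level. The function names and the threshold parameter are hypothetical.

```python
def find_first_peak(samples, threshold):
    """Index of the first local maximum whose magnitude reaches the threshold, or None."""
    for i in range(1, len(samples) - 1):
        a = abs(samples[i])
        if a >= threshold and a >= abs(samples[i - 1]) and a >= abs(samples[i + 1]):
            return i
    return None

def split_sections(samples, sample_rate, threshold, half_s=1.0, second_s=8.0):
    """Return ((first_start, first_end), (second_start, second_end)) as sample indices.

    The first section spans half_s seconds on either side of the first peak;
    the second section is the second_s seconds immediately following it.
    """
    peak = find_first_peak(samples, threshold)
    if peak is None:
        return None
    half = int(half_s * sample_rate)
    first_start = max(0, peak - half)
    first_end = min(len(samples), peak + half)
    second_end = min(len(samples), first_end + int(second_s * sample_rate))
    return (first_start, first_end), (first_end, second_end)
```

Both lengths are parameters, which matches the statement that the section lengths may be changed according to the characteristics of the sound.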

One peak (the first peak, W1) appears in the first sound source section, whereas one or more peaks (the second peak W2, the third peak W3) may appear in the second sound source section. Here, the second sound source section may be four times as long as the first sound source section.

As such, when the second sound source section contains at least one abnormal sound source and a plurality of abnormal sound sources are recognized, the sequence can represent a scenario (hereinafter the 'abnormal sound source scenario') that indicates the occurrence of a specific event corresponding to the order in which the individual abnormal sound sources occurred.

For example, where abnormal sound source 1 is a 'horn sound', abnormal sound source 2 is a 'sudden vehicle braking sound', and abnormal sound source 3 is a 'vehicle collision sound', the abnormal sound source scenario may signify the specific event 'vehicle traffic accident' corresponding to the order of occurrence 1, 2, 3. Such abnormal sound source scenarios may be defined in advance with various cases in mind and stored in advance in the sound source section storage unit 160.
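The scenario lookup in this example amounts to a table keyed by the ordered sequence of recognized sound types. A minimal sketch, assuming hypothetical label strings and a plain dictionary as the stored scenario table:

```python
# Hypothetical scenario table: ordered sound types mapped to an event name.
SCENARIOS = {
    ("horn", "sudden_braking", "collision"): "vehicle traffic accident",
}

def match_scenario(recognized, scenarios=SCENARIOS):
    """Return the event for the sounds in their order of occurrence, or None."""
    return scenarios.get(tuple(recognized))
```

Because the key is an ordered tuple, the same three sounds in a different order do not match, reflecting that the scenario depends on the order of occurrence.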

Here, the case in which a plurality of abnormal sound sources are recognized covers both the case in which they are recognized within the second sound source section and the case in which they are recognized across the first and second sound source sections.

Meanwhile, as described above, an abnormal sound source is a sound that can intuitively signal a specific dangerous situation, so it is important that it be recognized quickly and without omission.

Referring to FIG. 5, however, recognizing the entire sound source section at once can recognize abnormal sound sources without omission, just as time-divided recognition of the first and second sound source sections can, but it may be unable to recognize them as quickly as time-divided recognition does.

That is, when the entire sound source section is recognized at once, recognition can be performed and the occurrence of an abnormal sound source reported only after the entire section has been extracted.

In contrast, with time-divided recognition of the first and second sound source sections, the pre-recognition process is performed on the first sound source section immediately after it is extracted, so a first notice that an abnormal sound source has occurred in the externally input signal can be given quickly. In addition, the post-recognition process is performed, separately from the first sound source section, on the second sound source section, which is shorter than the entire section, so a second, accurate notice can be given. Because the post-recognition process on the second sound source section excludes recognition of the first sound source section, the recognition time is shortened.

As FIG. 5 shows on the time axis, recognizing the entire sound source section at once and recognizing the first and second sound source sections by time division clearly differ in how quickly an abnormal sound source is recognized. This difference becomes more pronounced as the first and second sound source sections become longer.
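The timing difference can be made concrete with a small arithmetic sketch. It rests on an assumption that is not in the source: recognition time is taken to be proportional to the length of audio analyzed (`cost_per_audio_s` seconds of processing per second of audio), and recording of the second section is assumed to start as soon as the first section ends. All times are measured from the start of the first section.

```python
def notice_times(first_s, second_s, cost_per_audio_s):
    """Return (first notice, second notice, single-pass notice) in seconds."""
    # Time-divided: the first section is analyzed as soon as it has been captured.
    first_notice = first_s + cost_per_audio_s * first_s
    # Time-divided: only the second section is analyzed in the later pass.
    second_notice = first_s + second_s + cost_per_audio_s * second_s
    # Single pass: nothing can be analyzed until the whole span is captured.
    single_pass = first_s + second_s + cost_per_audio_s * (first_s + second_s)
    return first_notice, second_notice, single_pass
```

With the 2-second and 8-second sections of the text and an assumed cost of 0.1, the first notice arrives at about 2.2 s versus about 11.0 s for single-pass recognition, and the gap widens if both sections are lengthened.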

Referring again to FIG. 1, the sound source section extraction unit 120 extracts the first and second sound source sections. It passes the first sound source section to the sound source feature extraction unit 130 so that the pre-recognition process can be performed (①), and stores the second sound source section in the sound source section storage unit 160 so that the post-recognition process can be performed (ⓐ).

That is, while extracting the first sound source section and passing it to the sound source feature extraction unit 130, the sound source section extraction unit 120 extracts and records the second sound source section for the predetermined section length and stores it in the sound source section storage unit 160.

Next, the sound source feature extraction unit 130 extracts log mel spectrogram-based sound source features for the first sound source section of the arbitrary sound source signal extracted by the sound source section extraction unit 120 (②). A detailed description is omitted because it is the same as that given above.

Likewise, the sound source feature extraction unit 130 extracts log mel spectrogram-based sound source features for the second sound source section stored in the sound source section storage unit 160 (ⓑ, ⓓ). Here, the sound source feature extraction unit 130 receives the second sound source section from the sound source section storage unit 160 when there is a request from the abnormal sound source classification unit 140 (ⓒ).

Next, the abnormal sound source classification unit 140 uses the CNN model for abnormal sound sources generated by the CNN model generation unit 110 to extract classification features from the sound source features of the first sound source section extracted by the sound source feature extraction unit 130, and classifies those classification features by predetermined abnormal sound source type (③).

Likewise, the abnormal sound source classification unit 140 uses the CNN model for abnormal sound sources generated by the CNN model generation unit 110 to extract classification features from the sound source features of the second sound source section extracted by the sound source feature extraction unit 130, and classifies those classification features by predetermined abnormal sound source type (ⓔ).

In this way, the abnormal sound source classification unit 140 extracts classification features from the sound source features of the first or second sound source section in the convolutional layer of the CNN model for abnormal sound sources, sub-samples the feature points for classification in the pooling layer, and classifies the classification features by predetermined abnormal sound source type through the final fully connected layer.
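The three stages named above (convolution, pooling as sub-sampling, final fully connected layer) can be traced in a toy forward pass. This sketch assumes a single 2-D kernel, a ReLU activation, and 2x2 max-pooling; none of these specifics are stated in the source, and a real model would have many kernels and layers.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Single-channel 'valid' convolution followed by ReLU (activation is an assumption)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = bias + sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def max_pool2x2(fmap):
    """Sub-sample by keeping the maximum of each non-overlapping 2x2 block."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def fully_connected(vector, weights, biases):
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def classify(image, kernel, fc_weights, fc_biases, class_names):
    """Convolution, then pooling, then the final fully connected layer."""
    pooled = max_pool2x2(conv2d_valid(image, kernel))
    flat = [v for row in pooled for v in row]
    scores = fully_connected(flat, fc_weights, fc_biases)
    return class_names[scores.index(max(scores))]
```

Here `image` plays the role of the log mel spectrogram of a sound source section, and `class_names` the predetermined abnormal sound source types.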

Next, the abnormal sound source recognition unit 150 reports the abnormal sound source recognition result for the arbitrary sound source signal, using the classification results for each of the first and second sound source sections.

Specifically, when one abnormal sound source is recognized across the first and second sound source sections, the abnormal sound source recognition unit 150 reports the abnormal sound source classification result as a single abnormal sound source recognition result.

In addition, when a plurality of abnormal sound sources are recognized across the first and second sound source sections, the abnormal sound source recognition unit 150 may report the abnormal sound source classification result individually for each of the recognition results, or may combine the recognition results and report event information (e.g., a traffic accident event) according to an abnormal sound source scenario.

Here, except when an abnormal sound source is recognized only in the first sound source section, the abnormal sound source recognition unit 150 may report event information according to an abnormal sound source scenario when abnormal sound sources are recognized in the second sound source section or in both the first and second sound source sections.

Meanwhile, the abnormal sound source recognition apparatus 100 may further include an interface unit (not shown) for receiving an arbitrary sound source signal and outputting the abnormal sound source recognition result according to the classification result for that signal. For example, the interface unit may include a microphone or the like to receive the arbitrary sound source signal, and a display device, a speaker, or the like to output the abnormal sound source recognition result.

In addition, the abnormal sound source recognition apparatus 100 includes at least one processor and a memory for storing computer-readable instructions.

When executing the computer-readable instructions stored in the memory, the at least one processor performs the artificial intelligence-based abnormal sound source recognition method according to an embodiment of the present invention.

That is, when executing the computer-readable instructions stored in the memory, the processor can perform the functions of the CNN model generation unit 110, the sound source section extraction unit 120, the sound source feature extraction unit 130, the abnormal sound source classification unit 140, and the abnormal sound source recognition unit 150 included in the abnormal sound source recognition apparatus 100.

Here, the processor, being at least one processor, may also be called a controller, a microcontroller, a microprocessor, a microcomputer, or the like.

The processor may be implemented in hardware, firmware, software, or a combination thereof.

The memory may be a single storage device or a collective term for a plurality of storage elements. The computer-readable instructions stored in the memory may be executable program code, parameters, data, and the like.

The memory may include random access memory (RAM), or may include non-volatile memory (NVRAM) such as magnetic disk storage or flash memory. Here, the memory may perform the function of the sound source section storage unit 160.

FIGS. 6 and 7 are diagrams illustrating an artificial intelligence-based abnormal sound source recognition method according to an embodiment of the present invention.

As shown in FIG. 6, the abnormal sound source recognition apparatus 100 time-divides and extracts a first sound source section and a second sound source section in order to recognize an externally input sound source signal at staggered times.

First, the abnormal sound source recognition apparatus 100 extracts the first sound source section from the sound source signal (S201).

Then, the abnormal sound source recognition apparatus 100 extracts the sound source features of the first sound source section and classifies them by predetermined abnormal sound source type using the abnormal sound source model (S202).

Next, the abnormal sound source recognition apparatus 100 checks whether one abnormal sound source is recognized in the first sound source section (S203).

If no abnormal sound source is recognized in the first sound source section (S203), the abnormal sound source recognition apparatus 100 extracts the second sound source section from the sound source signal (S203-1), extracts the sound source features of the second sound source section, and classifies them by predetermined abnormal sound source type using the abnormal sound source model (S204).

Thereafter, the abnormal sound source recognition apparatus 100 checks whether one or more abnormal sound sources are recognized in the second sound source section (S205).

When a plurality of abnormal sound sources are recognized, the abnormal sound source recognition apparatus 100 combines them and checks whether they correspond to an abnormal sound source scenario (S206). That is, the abnormal sound source recognition apparatus 100 can look up the abnormal sound source scenario that corresponds to the order of occurrence of the individual abnormal sound sources making up the plurality. In step S206, only the recognition results from the second sound source section are considered when checking for an abnormal sound source scenario.

If the results correspond to an abnormal sound source scenario (S206), the abnormal sound source recognition apparatus 100 reports the specific event information according to that scenario (S207). In this case, the abnormal sound source recognition apparatus 100 has recognized a plurality of abnormal sound sources in the second sound source section.

Otherwise, the abnormal sound source recognition apparatus 100 reports the individual abnormal sound source recognition results (S208). That is, if it recognizes one abnormal sound source in the second sound source section, or recognizes a plurality of abnormal sound sources in the second sound source section that do not correspond to an abnormal sound source scenario, the abnormal sound source recognition apparatus 100 reports the individual recognition results.

Meanwhile, if the abnormal sound source recognition apparatus 100 recognizes one abnormal sound source in the first sound source section in step S203 of FIG. 6, it proceeds to the subsequent steps shown in FIG. 7.

FIG. 6 shows the case in which an abnormal sound source is recognized in the second sound source section. FIG. 7 shows the case in which an abnormal sound source is first recognized in the first sound source section, and then the case in which abnormal sound sources are recognized in both the first and second sound source sections.

Referring to FIG. 7, when the abnormal sound source recognition apparatus 100 recognizes one abnormal sound source in the first sound source section (S251), it reports that single recognition result (S252). In this way, by performing a preliminary analysis on the first sound source section, the abnormal sound source recognition apparatus 100 can quickly report that an abnormal sound source has occurred in the first sound source section.

At the same time, the abnormal sound source recognition apparatus 100 extracts the second sound source section from the sound source signal (S252), extracts the sound source features of the second sound source section, and classifies them by predetermined abnormal sound source type using the abnormal sound source model (S253).

Thereafter, the abnormal sound source recognition apparatus 100 checks whether one or more abnormal sound sources are recognized in the second sound source section (S254).

If one or more abnormal sound sources are recognized in the second sound source section (S254), the abnormal sound source recognition apparatus 100 checks the recognition result from the first sound source section (S255) and combines it with the one or more recognition results from the second sound source section to check whether they correspond to an abnormal sound source scenario (S256). In step S256, the single recognition result from the first sound source section and the one or more recognition results from the second sound source section are considered together when checking for an abnormal sound source scenario.

If the results correspond to an abnormal sound source scenario (S256), the abnormal sound source recognition apparatus 100 reports the specific event information according to that scenario (S257). In this case, the reported event information follows the abnormal sound source scenario formed from the single recognition result in the first sound source section and the one or more recognition results in the second sound source section.

Otherwise, the abnormal sound source recognition apparatus 100 reports the individual abnormal sound source recognition results (S258). That is, if it recognizes one abnormal sound source in the second sound source section, or recognizes a plurality of abnormal sound sources in the second sound source section that do not correspond to an abnormal sound source scenario, the abnormal sound source recognition apparatus 100 reports the individual recognition results. In step S258, the recognition result from the first sound source section is not reported again.
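The decision flow of FIGS. 6 and 7 can be condensed into one function, shown here as a sketch under stated assumptions: classification results are given as label strings (or `None` when nothing is recognized in the first section), the scenario table is a dictionary keyed by ordered tuples, and notices are returned as a list instead of being sent anywhere. The function and label names are hypothetical.

```python
def recognize(first_result, second_results, scenarios):
    """Return the notices produced for one input signal, in order.

    first_result:   sound type recognized in the first section, or None.
    second_results: sound types recognized in the second section, in order.
    """
    notices = []
    if first_result is not None:
        # FIG. 7 path: report the first-section result right away (S252).
        notices.append(("abnormal", first_result))
    if not second_results:
        return notices
    # FIG. 6 path considers only second-section results (S206);
    # FIG. 7 path combines first- and second-section results (S256).
    sequence = ([first_result] if first_result is not None else []) + list(second_results)
    event = scenarios.get(tuple(sequence)) if len(sequence) > 1 else None
    if event is not None:
        notices.append(("event", event))  # S207 / S257
    else:
        # S208 / S258: individual results; the first-section result is not repeated.
        notices.extend(("abnormal", t) for t in second_results)
    return notices
```

The early append for the first section is what gives the quick first notice, while the scenario check is deferred until the second section has been classified.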

FIG. 8 is a diagram illustrating a control system using the artificial intelligence-based abnormal sound source recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 8, the control system includes the abnormal sound source recognition apparatus 100, a control server 200, and a control terminal 300.

The control system monitors a plurality of control areas (that is, a first control area through an N-th control area) in which incidents or accidents are expected to be possible. A CCTV (10-1 to 10-N) and a sound source detector (20-1 to 20-N) are installed in each control area. The sound source detectors 20-1 to 20-N can detect the direction from which a sound originates; a detailed description is omitted because it would be readily understood by a person skilled in the art.

Meanwhile, the abnormal sound source recognition apparatus 100 is connected to the sound source detectors 20-1 to 20-N, receives the sound source signals they collect, and recognizes abnormal sound sources by extracting and classifying them. A detailed description is omitted because this can be understood from the description of FIGS. 1 to 3 above.

The abnormal sound source recognition apparatus 100 then transmits the abnormal sound source classification result for a specific control area to the control server 200. Along with the abnormal sound source recognition result for that control area, it also transmits the location information of the control area and the direction information of the sound detected by the sound source detectors 20-1 to 20-N.

Using the location information and sound direction information of the specific control area received from the abnormal sound source recognition apparatus 100, the control server 200 can then point the camera of the CCTV installed in that control area toward the location where the abnormal sound occurred, so that the location is monitored intensively. The CCTV is controlled by PTZ (pan/tilt/zoom).
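As a rough illustration of how direction information might be turned into a pan command, the sketch below computes the shortest signed rotation from the camera's current pan angle to the reported sound bearing. It assumes, beyond what the text states, that both angles are in degrees on a shared clockwise reference and that the camera and the sound source detector are effectively co-located; `pan_delta` is a hypothetical helper, not an API of any CCTV product.

```python
def pan_delta(current_pan_deg: float, sound_bearing_deg: float) -> float:
    """Shortest signed rotation in degrees, in the range [-180, 180).

    Positive values mean rotating clockwise, negative values counterclockwise
    (assumed convention, for illustration only).
    """
    return (sound_bearing_deg - current_pan_deg + 180.0) % 360.0 - 180.0
```

With this convention, a camera at 350 degrees that must face a sound at 10 degrees rotates 20 degrees clockwise rather than 340 degrees counterclockwise.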

In addition, the control terminal 300 plays back, in real time, the video captured by the CCTV controlled by the control server 200 so that an administrator can check it.

In this way, the control system can minimize CCTV blind spots and operate the CCTVs actively.

The method according to some embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although the above description has focused on the novel features of the present invention as applied to various embodiments, a person skilled in the art will understand that various deletions, substitutions, and changes in the form and details of the apparatus and method described above are possible without departing from the scope of the present invention. Accordingly, the scope of the present invention is defined by the appended claims rather than by the foregoing description. All modifications within the range of equivalents of the claims fall within the scope of the present invention.

10-1 to 10-N: CCTV; 20-1 to 20-N: sound source detector
110: CNN model generation unit; 120: sound source section extraction unit
130: sound source feature extraction unit; 140: abnormal sound source classification unit
150: abnormal sound source recognition unit; 160: sound source section storage unit
200: control server; 300: control terminal

Claims

An artificial intelligence-based abnormal sound source recognition apparatus comprising:
a model generation unit that generates an abnormal sound source model for classifying abnormal sound sources, using the result of machine learning performed based on the data amount of an arbitrary sound source dataset;
a sound source section extraction unit that extracts sound source sections by time division so that a sound source signal input from the outside is recognized with a time difference;
a sound source feature extraction unit that extracts sound source features of the sound source sections;
an abnormal sound source classification unit that classifies the sound source features by predetermined abnormal sound source type using the abnormal sound source model; and
an abnormal sound source recognition unit that notifies an abnormal sound source recognition result for the sound source signal according to the time-difference classification results of the sound source features,
wherein the sound source section extraction unit
extracts, from the sound source signal, a first sound source section in which a first peak is identified and a second sound source section that follows the first sound source section for a predetermined time,
wherein the first sound source section
spans 1 second on each side of the first peak,
wherein the second sound source section
is four times as long as the first sound source section, and
wherein one or more peaks appear in the second sound source section.
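The section lengths recited above (1 second on each side of the first peak, followed by a section four times as long) can be illustrated with a short sketch. It is only one reading of the claim: the function name `split_sections` is hypothetical, the peak index is taken as an input rather than detected, and clamping to the signal bounds is an added assumption.

```python
from typing import Tuple

Span = Tuple[int, int]  # half-open sample range [start, end)


def split_sections(num_samples: int, peak_index: int, sample_rate: int) -> Tuple[Span, Span]:
    """Return the sample ranges of the first and second sound source sections."""
    one_second = sample_rate
    # First section: 1 second on each side of the first peak (nominally 2 s).
    first_start = max(0, peak_index - one_second)
    first_end = min(num_samples, peak_index + one_second)
    # Second section: directly follows the first and is four times
    # the nominal first-section length (nominally 8 s).
    second_start = first_end
    second_end = min(num_samples, second_start + 4 * (2 * one_second))
    return (first_start, first_end), (second_start, second_end)
```

For a 20-second signal sampled at 16 kHz with the first peak at 3 seconds, the first section covers seconds 2 to 4 and the second section covers seconds 4 to 12.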

(Deleted)

The apparatus of claim 1,
wherein the abnormal sound source recognition unit
notifies a first abnormal sound source recognition result for the sound source signal through the first sound source section, and then notifies a second abnormal sound source recognition result for the sound source signal through the second sound source section.

The apparatus of claim 5,
wherein the abnormal sound source recognition unit,
when a plurality of abnormal sound sources are recognized from the first abnormal sound source recognition result and the second abnormal sound source recognition result, combines the plurality of abnormal sound source recognition results and notifies event information according to an abnormal sound source scenario.

The apparatus of claim 6,
wherein the abnormal sound source scenario,
when a plurality of abnormal sound sources are recognized, indicates the occurrence of a specific event corresponding to the order in which the individual abnormal sound sources occurred.

The apparatus of claim 1,
wherein the abnormal sound source model
is a convolutional neural network-based model.

The apparatus of claim 1,
wherein the sound source feature extraction unit
uses a log mel spectrogram-based extraction technique in which the sound source signal is obtained as a spectrum representing the frequency domain, a mel-scale filter bank is applied to the spectrum to derive a mel spectrum, and log mel spectrogram-based sound source features are extracted from the mel spectrum through cepstrum analysis.
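The feature pipeline recited above (spectrum, mel-scale filter bank, logarithm) can be sketched for a single frame in plain Python. This is an illustrative reading only: the function names, the number of filters, the naive DFT, and the common 2595·log10(1 + f/700) mel formula are assumptions, and the "cepstrum analysis" step is interpreted here as taking the logarithm of the mel spectrum without a further transform.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT power spectrum for bins 0..N/2 of one frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_filter_bank(n_mels: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose centers are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) * n_fft / sample_rate
             for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((b - lo) / (mid - lo), (hi - b) / (hi - mid)))
                     for b in range(n_bins)])
    return bank


def log_mel_frame(frame: Sequence[float], sample_rate: int, n_mels: int = 8) -> List[float]:
    """Spectrum -> mel spectrum -> log, for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filter_bank(n_mels, len(frame), sample_rate)
    mel = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    return [math.log(v + 1e-10) for v in mel]  # small offset avoids log(0)
```

Stacking the output of `log_mel_frame` over successive frames would give a log mel spectrogram of the kind a convolutional model could take as input; a production system would more likely use an FFT library than the naive DFT shown here.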

An artificial intelligence-based abnormal sound source recognition method comprising:
(a) generating an abnormal sound source model for classifying abnormal sound sources, using the result of machine learning performed based on the data amount of an arbitrary sound source dataset;
(b) extracting sound source sections by time division so that a sound source signal input from the outside is recognized with a time difference;
(c) extracting sound source features of the sound source sections;
(d) classifying the sound source features by predetermined abnormal sound source type using the abnormal sound source model; and
(e) notifying an abnormal sound source recognition result for the sound source signal according to the time-difference classification results of the sound source features,
wherein step (b)
extracts, from the sound source signal, a first sound source section in which a first peak is identified and a second sound source section that follows the first sound source section for a predetermined time,
wherein the first sound source section
spans 1 second on each side of the first peak,
wherein the second sound source section
is four times as long as the first sound source section, and
wherein one or more peaks appear in the second sound source section.

(Deleted)

The method of claim 10,
wherein step (e)
notifies a first abnormal sound source recognition result for the sound source signal through the first sound source section, and then notifies a second abnormal sound source recognition result for the sound source signal through the second sound source section.

The method of claim 14,
wherein step (e),
when a plurality of abnormal sound sources are recognized from the first abnormal sound source recognition result and the second abnormal sound source recognition result, combines the plurality of abnormal sound source recognition results and notifies event information according to an abnormal sound source scenario.

The method of claim 15,
wherein the abnormal sound source scenario,
when a plurality of abnormal sound sources are recognized, indicates the occurrence of a specific event corresponding to the order in which the individual abnormal sound sources occurred.

The method of claim 10,
wherein the abnormal sound source model
is a convolutional neural network-based model.

The method of claim 10,
wherein step (c)
uses a log mel spectrogram-based extraction technique in which the sound source signal is obtained as a spectrum representing the frequency domain, a mel-scale filter bank is applied to the spectrum to derive a mel spectrum, and log mel spectrogram-based sound source features are extracted from the mel spectrum through cepstrum analysis.

(Deleted)