KR102518615B1

KR102518615B1 - Apparatus and Method for complex monitoring to judge abnormal sound source

Info

Publication number: KR102518615B1
Application number: KR1020220029209A
Authority: KR
Inventors: 배영훈; 김재훈; 박장수
Original assignee: 아이브스 주식회사
Priority date: 2022-03-08
Filing date: 2022-03-08
Publication date: 2023-05-08

Abstract

Disclosed is a device that collects video and sound using an anomaly detection device and detects abnormal situations by jointly analyzing the collected information. The device includes an anomaly detector that collects video data and sound source data; an analysis unit that jointly analyzes the video data and the sound source data to determine an abnormal situation and generate abnormal situation information; and a storage and transmission unit that transmits the abnormal situation information.

Description

Apparatus and Method for complex monitoring to judge abnormal sound source

The present invention relates to technology for identifying abnormal sound sources, and more particularly to a device and method that collect video and sound using an integrated anomaly detection device and detect abnormal situations by jointly analyzing the collected information.

Artificial intelligence technology consists of machine learning (deep learning) and element technologies that make use of machine learning. Machine learning is an algorithmic technology that classifies and learns the features of input data on its own. Element technologies apply machine learning algorithms such as deep learning and span technical fields such as linguistic understanding, visual understanding, inference/prediction, knowledge representation, and motion control.

With recent advances in artificial intelligence, a wide range of technologies has emerged. In particular, in fields related to recognizing audio data such as speech and sound, research on recognizing abnormal situations from audio data is being actively pursued.

This approach goes beyond the limits of recognizing abnormal situations only from video data captured by CCTV (closed-circuit television): by detecting an abnormal sound source in the audio data and notifying the monitoring operator, it can allow more effective action to be taken for the situation at hand.

However, existing multi-sensing CCTV systems determine whether an abnormality has occurred on the basis of sound information. Specifically, a sensor matrix map for a plurality of multi-sensing CCTVs is generated from the installation location and installation direction of each CCTV, and when an abnormal sound occurs, its location is calculated from the sensor matrix map.

However, because such an anomaly detection device relies only on analysis of sound information, it is prone to false positives. In addition, because the video is not analyzed, it cannot detect abnormal situations that produce only a quiet sound.

In addition, because it cannot recognize speech such as a call for help, its ability to respond to an actual emergency is limited.

1. Korean Registered Patent No. 10-1543561 (registration date: August 5, 2015)

The present invention is proposed to solve the problems of the background art described above. Its object is to provide a device and method that collect video and sound using an integrated anomaly detection device and detect abnormal situations by jointly analyzing the collected information.

Another object of the present invention is to provide a device and method that can accurately detect an abnormal situation even when it produces only a quiet sound.

A further object of the present invention is to provide a device and method that accurately recognize speech such as a call for help and thereby improve the ability to respond to an actual emergency.

To achieve the objects set out above, the present invention provides a device that collects video and sound using an anomaly detection device and detects abnormal situations by jointly analyzing the collected information.

The device includes:

an anomaly detector that collects video data and sound source data;

an analysis unit that jointly analyzes the video data and the sound source data to determine an abnormal situation and generate abnormal situation information; and

a storage and transmission unit that transmits the abnormal situation information.

The anomaly detector includes a video collection unit that generates the video data, and a sound source collection unit that generates the sound source data.

The sound source collection unit includes two microphones that pick up sound and generate the sound source data, and a direction detection module that detects the direction of the sound source and generates sound source direction information.

The sound source collection unit also includes a body that houses the direction detection module, and sound collecting plates that are integrally arranged on the left and right of the body to gather sound, with the two microphones arranged at their edges.

The sound collecting plate has a paraboloid shape to enhance the sound collecting effect.

The two microphones are installed at the sound collecting point where the sound reflected by the sound collecting plate is focused, oriented to face the reflected sound.

The analysis unit includes a video analysis module that analyzes the video data to recognize an object and detects abnormal behavior based on information about the object to generate abnormal behavior information; a sound source analysis module that recognizes an abnormal sound source from the sound source data and generates sound source direction information for the abnormal sound source; and a complex analysis module that uses the abnormal behavior information and the sound source direction information to generate the abnormal situation information for the surroundings according to predefined event processing rules.

The video analysis module includes a preprocessing unit that preprocesses the video data to generate standardized standard video frames; a video analysis unit that applies a neural network classification model to the standard video frames to detect and classify the object; and an abnormal behavior detection unit that uses the recognized object to detect the abnormal behavior and generate the abnormal behavior information.

To increase confidence in the detected object, the analysis uses an ensemble of classifiers trained for the conditions at the installation site.

The abnormal sound source is either an abnormal sound, including a scream, an explosion, a collision, or a vehicle's sudden braking, or an abnormal voice containing specific words used to call for help.

When a first event, caused by the generation of the abnormal behavior information or the sound source direction information, corresponds to a priority condition event predefined in the event processing rules, the abnormal situation information indicating a level-1 alarm is generated.

When the second event exceeds a preset first weight, the abnormal situation information indicating a level-2 alarm is generated; the first weight is a value assigned to an event arising from sound analysis or speech analysis.

When the first event does not correspond to the priority condition event and a video event caused by the generation of the abnormal behavior information occurs, the first weight and a second weight are summed; if the sum exceeds a preset first set value, the abnormal situation information indicating the level-2 alarm is generated. The second weight is a value assigned to an event arising from video analysis.

When the first event does not correspond to the priority condition event and a video event caused by the generation of the abnormal behavior information occurs, the abnormal situation information indicating the level-2 alarm is generated if the second weight exceeds a preset second set value.

When the first event does not correspond to the priority condition event and no video event caused by the generation of the abnormal behavior information occurs, the abnormal situation information indicating the level-1 alarm is generated if the second weight exceeds a preset third set value; if the second weight does not exceed the third set value, a request to search other devices is sent to the operator terminal.
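
The alarm rules above can be read as a small decision procedure. The following Python sketch is one possible reading, not the patented implementation. The function name, parameter names, and `Outcome` labels are hypothetical. The rule about the "second event" exceeding the first weight is left out because the text does not say what the second event is. The `NO_ALARM` result for a video event that passes neither threshold is an assumption, since the text does not state that case.

```python
from enum import Enum


class Outcome(Enum):
    LEVEL_1_ALARM = "level-1 alarm"
    LEVEL_2_ALARM = "level-2 alarm"
    REQUEST_SEARCH = "request search of other devices"
    NO_ALARM = "no alarm"  # assumed fallback; not stated in the text


def evaluate(is_priority_event, video_event_occurred,
             first_weight, second_weight,
             first_set_value, second_set_value, third_set_value):
    """Apply the rules as literally worded in the text.

    first_weight  : weight of the sound/speech analysis event
    second_weight : weight of the video analysis event
    """
    # A predefined priority condition event raises a level-1 alarm directly.
    if is_priority_event:
        return Outcome.LEVEL_1_ALARM

    if video_event_occurred:
        # Sum of both weights compared against the first set value.
        if first_weight + second_weight > first_set_value:
            return Outcome.LEVEL_2_ALARM
        # Video weight alone compared against the second set value.
        if second_weight > second_set_value:
            return Outcome.LEVEL_2_ALARM
        return Outcome.NO_ALARM  # assumption, see lead-in

    # No video event: the text compares the second weight with the
    # third set value, and that wording is followed here unchanged.
    if second_weight > third_set_value:
        return Outcome.LEVEL_1_ALARM
    return Outcome.REQUEST_SEARCH
```

For example, `evaluate(False, True, 0.4, 0.5, 0.8, 0.7, 0.6)` yields a level-2 alarm because the weight sum 0.9 exceeds the first set value 0.8; all numeric values here are made up for illustration.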

In another embodiment, the present invention provides a complex monitoring method for identifying an abnormal sound source, the method including: (a) collecting, by an anomaly detector, video data and sound source data; (b) jointly analyzing, by an analysis unit, the video data and the sound source data to determine an abnormal situation and generate abnormal situation information; and (c) transmitting, by a storage and transmission unit, the abnormal situation information.

According to the present invention, video and sound can be collected using an integrated anomaly detection device, and abnormal situations can be detected by jointly analyzing the collected information.

Another effect of the present invention is that an abnormal situation can be detected accurately even when it produces only a quiet sound.

A further effect of the present invention is that speech such as a call for help can be recognized accurately, improving the ability to respond to an actual emergency.

FIG. 1 is a block diagram of a complex monitoring device according to an embodiment of the present invention.
FIG. 2 is an external perspective view of the sound source collection unit shown in FIG. 1.
FIG. 3 is a partially enlarged perspective view of the sound source collection unit shown in FIG. 2.
FIG. 4 is a perspective view of the microphone shown in FIG. 3.
FIG. 5 is a block diagram of the video collection unit shown in FIG. 1.
FIG. 6 is a detailed block diagram of the video analysis module shown in FIG. 1.
FIG. 7 is a flowchart showing a process of identifying an abnormal sound source according to an embodiment of the present invention.
FIG. 8 is a flowchart showing a process of generating an event result by applying weights according to an embodiment of the present invention.
FIG. 9 is a diagram showing the concept of 3-channel feature point extraction according to an embodiment of the present invention.
FIG. 10 is a flowchart showing a process of generating a classification model according to an embodiment of the present invention.

Because the present invention can be modified in various ways and can have many embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments; it should be understood to cover all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

In describing the drawings, like reference numerals are used for like elements.

Terms such as "first" and "second" may be used to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one element from another.

For example, a first element may be termed a second element, and similarly a second element may be termed a first element, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, a complex monitoring device and method for identifying an abnormal sound source according to an embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a complex monitoring device (100) according to an embodiment of the present invention. Referring to FIG. 1, the complex monitoring device (100) may include an anomaly detector (101) that collects video data and sound source data, an analysis unit (130) that jointly analyzes the video data and the sound source data to determine an abnormal situation and generate abnormal situation information, a storage and transmission unit (140) that transmits the abnormal situation information, and the like.

The anomaly detector (101) may include a video collection unit (110) that collects video data, a sound source collection unit (120) that collects sound source data, and the like.

The video collection unit (110) may include a CCTV (closed-circuit television) (111). Besides CCTV, it may of course be a CCD (charge-coupled device) camera, a CMOS (complementary metal-oxide-semiconductor) camera, an IP (Internet Protocol) camera, or the like.

The sound source collection unit (120) may include a sound collecting plate (121) that gathers sound, a microphone (122) that picks up the sound gathered by the sound collecting plate (121) and generates sound source data, a direction detection module (123) that detects the direction of the sound source and generates sound source direction information, and the like.

The microphone (122) may consist of two or more microphones, which may be installed side by side on the left and right.

The direction detection module (123) may include a microcomputer running an algorithm that calculates the difference in arrival time between the sound signals received by each of the two or more microphones and uses it to estimate the direction from which the sound originated. The direction detection module (123) may of course be implemented in other ways.
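
The arrival-time-difference approach can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the algorithm running on the microcomputer: it assumes a far-field source, exactly two microphones a known distance apart, and a speed of sound of about 343 m/s, and it finds the delay with a brute-force cross-correlation. The function names `estimate_delay` and `estimate_angle` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 °C (assumption)


def estimate_delay(left, right, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive lag means the right channel is delayed relative to the
    left channel, i.e. the sound reached the left microphone first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_angle(left, right, sample_rate, mic_distance):
    """Estimate the source bearing in degrees from the arrival time difference.

    0 degrees is straight ahead; positive angles point toward the left
    microphone. Uses the far-field relation sin(theta) = c * tdoa / d.
    """
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate))
    tdoa = estimate_delay(left, right, max_lag) / sample_rate
    sin_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * tdoa / mic_distance))
    return math.degrees(math.asin(sin_theta))
```

With two microphones, only a left/right bearing can be recovered this way; front/back ambiguity remains unless more microphones or other cues are used.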

The analysis unit (130) may include a video analysis module (131) that analyzes the video data to recognize an object and detects abnormal behavior based on information about the object to generate abnormal behavior information; a sound source analysis module (132) that recognizes an abnormal sound source from the sound source data and generates direction information for the abnormal sound source; a complex analysis module (133) that uses the abnormal behavior information and the direction information to generate abnormal situation information for the surroundings according to predefined complex analysis rules; and the like. Examples of abnormal sound sources include screams, explosions, collisions, and the sound of a vehicle braking suddenly.

The storage and transmission unit (140) streams the collected video data and sound source data (141) and stores information on events that occur in the event DB (142). Event processing rules are predefined and stored in the complex analysis rule DB (143). The storage and transmission unit (140) also generates alarm information according to the abnormal situation information and transmits it to the control system (20).

For implementing the databases (DB) and streaming, the storage and transmission unit (140) may include at least one type of storage medium among flash memory type, hard disk type, multimedia card micro type, card-type memory (for example, SD (Secure Digital) or XD (eXtreme Digital) memory), RAM (random access memory), SRAM (static random access memory), ROM (read-only memory), EEPROM (electrically erasable programmable read-only memory), PROM (programmable read-only memory), magnetic memory, magnetic disk, and optical disc.

The video analysis module (131), sound source analysis module (132), and complex analysis module (133) shown in FIG. 1 each denote a unit that processes at least one function or operation, and may be implemented in software and/or hardware. In a hardware implementation, they may be implemented as an ASIC (application-specific integrated circuit), DSP (digital signal processor), PLD (programmable logic device), FPGA (field-programmable gate array), processor, microprocessor, or other electronic unit designed to perform the functions described above, or as a combination of these.

A software implementation may include software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, data, databases, data structures, tables, arrays, and variables. Software, data, and the like may be stored in memory and are executed by a processor. The memory and processor may employ various means well known to those skilled in the art.

Through the operator terminal (10), an operator can define events, set event processing rules, and adjust them to suit the installation environment. The operator terminal (10) may be a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a notepad, a desktop computer, or the like.

Although the components shown in FIG. 1 are illustrated as separate, they may also be configured as a single integrated unit.

FIG. 2 is an external perspective view of the sound source collection unit (120) shown in FIG. 1. Referring to FIG. 2, the sound source collection unit (120) may include a circular body (210), sound collecting plates (121) integrally arranged on the left and right of the body (210) to gather sound, microphones (122) arranged at the edges of the sound collecting plates (121) to pick up sound, and the like. The direction detection module (123) may be housed inside the body (210).

FIG. 3 is a partially enlarged perspective view of the sound source collection unit (120) shown in FIG. 2. Referring to FIG. 3, the sound collecting plate (121) has a paraboloid shape to enhance the sound collecting effect. The microphone (122) is installed at the sound collecting point (310), where the sound reflected by the sound collecting plate (121) is focused, and is oriented to face the reflected sound.
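
The patent gives no dimensions for the sound collecting plate. As a rough illustration of where a focusing point lies, the sketch below assumes an ideal paraboloid of revolution described by z = r²/(4f), for which a dish of aperture diameter D and depth d has focal length f = D²/(16d). The function name and the example dimensions are hypothetical.

```python
def focal_length(aperture_diameter, depth):
    """Focal length of an ideal paraboloid z = r^2 / (4 f).

    At the rim, r = D / 2 and z = d, so d = D^2 / (16 f),
    which gives f = D^2 / (16 d). Units follow the inputs.
    """
    if aperture_diameter <= 0 or depth <= 0:
        raise ValueError("aperture_diameter and depth must be positive")
    return aperture_diameter ** 2 / (16.0 * depth)


# Hypothetical example: a 0.4 m wide, 0.1 m deep plate focuses
# on-axis sound about 0.1 m in front of its vertex.
example_focus_m = focal_length(0.4, 0.1)
```

A real plate that is only a section of a paraboloid, or that is mounted off-axis as in FIG. 3, would need its actual geometry to locate the focusing point.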

FIG. 4 is a perspective view of the microphone (122) shown in FIG. 3. Referring to FIG. 4, the microphone (122) is installed on the inner-side outer wall of the housing (410). A through hole (411) is formed in the housing (410) for fastening it to the sound collecting plate (121).

FIG. 5 is a block diagram of the video collection unit (110) shown in FIG. 1. Referring to FIG. 5, the video collection unit (110) may include a video receiving module (510) that receives a video stream from the camera, a video decoding module (520) that decodes the video stream to generate video frames, a video conversion module (530) that converts the video frames into video data in RGB (red, green, blue) format, and the like.

FIG. 6 is a detailed block diagram of the video analysis module (131) shown in FIG. 1. Referring to FIG. 6, the video analysis module (131) may include a preprocessing unit (610) that preprocesses the video data to generate standardized standard video frames, a video analysis unit (620) that applies a neural network classification model to the standard video frames to detect and classify objects, an abnormal behavior detection unit (630) that uses the recognized objects to detect abnormal behavior and generate abnormal behavior information, and the like.

The preprocessing unit (610) adjusts the precision of the video data to suit analysis: it normalizes the pixel values of the video and converts them to 32-bit floating-point data in the range [0, 1] so that computing resources can be used efficiently.
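
A minimal Python sketch of this normalization follows. It assumes 8-bit input pixels (0 to 255), which the text does not state, and it uses the standard-library `array` type code `"f"` (a C float, 32 bits on typical platforms) to hold the result. The function name `normalize_frame` is hypothetical.

```python
from array import array


def normalize_frame(pixels: bytes) -> array:
    """Map 8-bit pixel values (0..255) to single-precision floats in [0, 1].

    `pixels` is a flat byte buffer, e.g. interleaved RGB samples.
    Dividing by 255 is an assumed normalization for 8-bit input.
    """
    return array("f", (p / 255.0 for p in pixels))


# Three samples of one RGB pixel: black, mid-gray, and full-scale channels.
normalized = normalize_frame(bytes([0, 128, 255]))
```

In a NumPy-based pipeline, the same step is commonly written as `frame.astype(np.float32) / 255.0`.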

The video analysis unit (620) detects and classifies objects in the standard video frames produced by preprocessing, using a neural network classification model (for example, the EfficientDet model) to increase the accuracy of predictive classification. The neural network classification model recognizes objects through deep-learning-based machine learning.

To increase confidence in the detected objects, an ensemble of object classifiers trained for the conditions at the installation site is used in the analysis. EfficientDet can use a pretrained model with high classification accuracy, such as EfficientNet, as its backbone.

The object classifier is used to mitigate the loss of accuracy caused by false positives in the object detection process. The false-positive situations that occur frequently are as follows.

1) Localization error: an object is detected where there is none to detect.

2) Object classification error: the location is estimated correctly, but the object is classified incorrectly.

To address these, the object classifier takes the pixel values of the detected object region as input and performs a second-stage analysis, classifying the region as background or as one of the object types to be detected. For example, if the aim is to classify people and vehicles, the classifier distinguishes three classes in total: background, person, and vehicle. When a localization error occurs, the classifier therefore outputs background as its result; otherwise it outputs the object to be detected.

However, because the background has far more varied features than the objects to be detected, false positives remain possible. The ensemble method used to address this takes the results obtained from several classifiers trained under different conditions and outputs a final result by voting. Voting-based ensemble methods include majority voting (hard voting) and probability-based voting (soft voting); the present technique uses probability-based voting to minimize the uncertainty caused by the diversity of backgrounds.
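
The difference between the two voting schemes can be shown with a small Python sketch. It is illustrative only: the three classes follow the background/person/vehicle example in the text, while the function names and the probability values are made up, and each classifier is assumed to output one probability per class.

```python
CLASSES = ("background", "person", "vehicle")


def _argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(prob_rows):
    """Majority voting: each classifier casts one vote for its top class."""
    counts = [0] * len(CLASSES)
    for row in prob_rows:
        counts[_argmax(row)] += 1
    return CLASSES[_argmax(counts)]


def soft_vote(prob_rows):
    """Probability-based voting: average the class probabilities, then pick the top class."""
    n = len(prob_rows)
    mean = [sum(row[k] for row in prob_rows) / n for k in range(len(CLASSES))]
    return CLASSES[_argmax(mean)], mean


# Hypothetical outputs of three classifiers for one detected region.
rows = [
    [0.55, 0.45, 0.00],  # weakly prefers background
    [0.51, 0.49, 0.00],  # weakly prefers background
    [0.05, 0.95, 0.00],  # strongly prefers person
]
```

Here `hard_vote(rows)` returns background (two votes to one), while `soft_vote(rows)` returns person, because averaging keeps the information that two classifiers were nearly undecided and one was confident.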

The abnormal behavior detection unit 630 applies weights to events in which an object (such as a person or a vehicle) recognized by the image analysis unit 620 intrudes into or loiters in a predefined area, or to traffic accidents such as vehicle collisions, and transmits the event result.

The preprocessing unit 610, the image analysis unit 620, and the abnormal behavior detection unit 630 shown in FIG. 6 each denote a unit that processes at least one function or operation, and may be implemented in software and/or hardware. In a hardware implementation, they may be implemented as an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a microprocessor, another electronic unit designed to perform the functions described above, or a combination thereof.

In a software implementation, they may include software components (elements), object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, data, databases, data structures, tables, arrays, and variables. Software, data, and the like may be stored in a memory and executed by a processor. The memory and the processor may employ various means well known to those skilled in the art.

FIG. 7 is a flowchart showing a process of determining an abnormal sound source according to an embodiment of the present invention. Referring to FIG. 7, the analysis unit 130 analyzes the captured image data using deep learning and, at the same time, analyzes the acquired sound source data using deep learning (steps S710 and S720).

Next, image analysis is performed on the image data to detect abnormal situations (step S711). Specifically, objects are recognized through deep learning, and abnormal behavior is detected based on information about the recognized objects. An object may be, for example, a person or a vehicle, and abnormal behavior may be intrusion, loitering, collision, and the like.

Meanwhile, sound source recognition is performed on the sound source data (step S721). Specifically, sound sources may consist of abnormal sounds, such as screams, explosions, crashes, and sudden vehicle braking, and abnormal speech containing specific words used to call for help, such as "Save me", "Help me", and "Fire!". Of course, the sound sources may be defined differently.

Accordingly, abnormal sounds such as screams, explosions, crashes, and sudden vehicle braking can be recognized, along with the direction of the sound source. Likewise, speech can be analyzed to recognize specific words used to call for help, such as "Save me", "Help me", and "Fire!", along with the direction of the sound source.
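The claims state that the sound source direction is estimated from the arrival time difference between the signals received at the two microphones. A minimal sketch of that idea follows, assuming a far-field source, a known microphone spacing, and a speed of sound of 343 m/s; these assumptions and the function name `direction_from_tdoa` are illustrative and are not specified in the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def direction_from_tdoa(delay_s, mic_spacing_m):
    """Return the arrival angle in degrees (0 = straight ahead of the mic pair).

    delay_s is the arrival time difference between the two microphones;
    its sign tells which side the sound came from.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Clamp to the valid domain of asin to absorb measurement noise.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))

# Hypothetical example: microphones 0.2 m apart, 0.29 ms arrival difference.
print(round(direction_from_tdoa(0.00029, 0.2), 1))
```

A delay of zero corresponds to a source directly in front of the microphone pair, and the largest possible delay (spacing divided by the speed of sound) corresponds to a source directly to one side.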

Such sound source recognition generates events, and the analysis unit 130 judges the situation around the complex-sensing CCTV according to predefined event processing rules for these events (step S730). Event definitions can be configured from the operator terminal 10 and adjusted to suit the installation environment.

Next, the analysis unit 130 determines whether a priority condition event has occurred (step S740). That is, it checks, according to the predefined event processing rules, whether a priority condition event has occurred. An example of the judgment conditions under the predefined event processing rules is as follows.

Category          Event            Weight   Condition (priority or optional)
Image analysis    Person           -        o
Image analysis    Vehicle          -        o
Image analysis    Collision        50       o
Image analysis    Loitering        50       o
Sound analysis    Scream           50       o
Sound analysis    Explosion        100      o
Sound analysis    Crash sound      100      o
Sound analysis    Sudden braking   50       o
Speech analysis   Rescue request   200      o

Accordingly, if a first event generated by the occurrence of the abnormal behavior information or the sound source direction information corresponds to a priority condition event predefined by the event processing rules, the abnormal situation information indicating a first-stage alarm is generated and transmitted to the control system 20 (step S741).

Then, if the preset weight is exceeded, abnormal situation information indicating a second-stage alarm is generated and transmitted to the control system 20 (steps S760 and S770).

Compound events are judged according to the weights, and the compound analysis uses events that occur within a preset time (about 30 seconds). When a priority condition event occurs, a first-stage alarm (caution) is generated; when the maximum sum of weights exceeds 200, a second-stage alarm (warning) is generated. The alarm is transmitted to the control system.

Meanwhile, if it is determined in step S740 that no priority condition event has occurred, it is checked whether a video event has occurred (step S750).

If a video event has been detected in step S750 and the maximum sum of weights (that is, the sum of the first weight from sound or speech analysis and the second weight from image analysis) exceeds 200, a second-stage alarm (warning) is transmitted to the control system (steps S743 to S770).

If no video event has been detected and the weight exceeds 200, a first-stage alarm is transmitted to the control system, and the operator terminal is asked to search for other CCTV devices nearby.

The first weight is the value assigned to an event from sound analysis or speech analysis, and the second weight is the value assigned to an event from image analysis.
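The judgment flow of steps S740 to S770 can be sketched roughly as follows. The weights, the threshold of 200, and the 30-second window come from the description; everything else is an assumption made for illustration, including the names `Event` and `judge`, the return format, and the choice of which events count as priority conditions, which is passed in as a hypothetical parameter because the description leaves it configurable.

```python
from dataclasses import dataclass

# Weights from the example judgment conditions; key names are illustrative.
WEIGHTS = {
    "collision": 50, "loitering": 50,        # image analysis
    "scream": 50, "explosion": 100,          # sound analysis
    "crash_sound": 100, "sudden_braking": 50,
    "rescue_request": 200,                   # speech analysis
}
THRESHOLD = 200   # the weight sum must exceed this value
WINDOW_S = 30.0   # events are combined within about 30 seconds

@dataclass(frozen=True)
class Event:
    name: str
    source: str    # "video" or "audio"
    time_s: float  # time at which the event was generated

def judge(events, priority_names, now_s):
    """Return (alarm_stage, request_nearby_cctv_search); stage 0 means no alarm."""
    recent = [e for e in events if 0.0 <= now_s - e.time_s <= WINDOW_S]
    total = sum(WEIGHTS.get(e.name, 0) for e in recent)
    over = total > THRESHOLD
    if any(e.name in priority_names for e in recent):
        # Priority condition event: stage 1, raised to stage 2 if the sum is exceeded.
        return (2 if over else 1), False
    if any(e.source == "video" for e in recent):
        # Video event present: stage 2 only when the combined weight is exceeded.
        return (2 if over else 0), False
    # Audio only: stage 1 plus a request to search nearby CCTV devices.
    return (1 if over else 0), over

events = [Event("rescue_request", "audio", 0.0), Event("collision", "video", 10.0)]
print(judge(events, {"rescue_request"}, now_s=20.0))
```

Keeping the priority set and the weights as plain data mirrors the statement that event definitions are adjustable from the operator terminal to suit the installation environment.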

FIG. 8 is a flowchart showing a process of generating an event result by applying weights according to an embodiment of the present invention. Referring to FIG. 8, image data is acquired and standardized to generate a standard image frame (steps S810 and S820).

Next, to improve the accuracy of prediction and classification for the standard image frame, objects are detected and classified using a neural network classification model (for example, EfficientDet). To increase confidence in the detected objects, an ensemble of classifiers trained for the on-site conditions is used for the analysis (step S830).

Then, abnormal behavior is analyzed, and an event is generated with a weight applied (steps S840, S850, and S860).

FIG. 9 is a diagram showing the concept of three-channel feature extraction according to an embodiment of the present invention. Referring to FIG. 9, the acquired audio data (that is, the sound source data) 910 is split by channel division 920 into a raw component source, a harmonic component source, and a percussive component source 930. Each channel is peak-normalized to the range -1 to 1 (940), features 950 are extracted per channel from the peak-normalized data as a log-Mel spectrogram, and the per-channel features 950 are combined (960) to generate the channel features 970 for the classification model (Efficient-Net).
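The peak normalization step (940) can be sketched as below. Harmonic/percussive separation and the log-Mel spectrogram are omitted because they would normally come from a signal-processing library; the function names and the sample values are hypothetical.

```python
def peak_normalize(samples):
    """Scale samples so that the largest absolute value becomes 1."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty channel: nothing to scale
    return [s / peak for s in samples]

def normalize_channels(raw, harmonic, percussive):
    """Normalize each of the three channels independently."""
    return [peak_normalize(ch) for ch in (raw, harmonic, percussive)]

# Hypothetical short channels standing in for the three component sources.
channels = normalize_channels(
    [0.5, -2.0, 1.0],
    [0.1, 0.4, -0.2],
    [0.0, 0.0, 0.0],
)
print(channels[0])
```

Normalizing each channel on its own keeps a quiet harmonic or percussive component from being dwarfed by the raw signal before the per-channel features are combined.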

In the log-Mel spectrogram, the Mel scale is a scale conversion function that reflects the thresholds of human pitch perception. For example, Mel(f) = 2595 log10(1 + f/700), where f is the frequency in Hz. A spectrogram is a tool for visualizing sound or waves and combines the characteristics of a waveform and a spectrum.
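The Mel conversion can be written directly as a small function. This sketch assumes the base-10 logarithm, which is the usual convention for the 2595/700 form of the formula; the inverse function is included only for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

for f in (100.0, 1000.0, 8000.0):
    print(f, round(hz_to_mel(f), 1))
```

The scale is roughly linear at low frequencies and compresses high frequencies, so equal steps in Mel correspond more closely to equal perceived steps in pitch.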

FIG. 10 is a flowchart showing a process of generating a classification model according to an embodiment of the present invention. Referring to FIG. 10, once audio data is acquired, it is split into front audio and rear audio (S1010, S1011, S1020, S1030). The purpose is to achieve higher accuracy by checking the correlation between the sound occurring in the front part and the sound occurring in the rear part.

Next, per-channel features are extracted from the front audio and the rear audio (steps S1021 and S1031).

Next, the TimeDistributed approach is used to extract accurate weights for the correlation. TimeDistributed connects the output produced at each time step to the layer declared inside it.

For short sound sources, good results are obtained even without the TimeDistributed approach, but for long sound sources of 5 seconds or more, using the TimeDistributed approach gives better results.

Because the channel data extracted from the audio data can be handled in the same way as images, a CNN-based pretrained model with high image classification accuracy, such as Efficient-Net, can be used (S1040, S1050).

The channel features are passed through the pretrained model to extract weights, spatiotemporal information is added, and the weights are reconstructed using the image and spatiotemporal information (Auto-pool 1D) to generate the resulting model (steps S1041 to S1060).

This is described in more detail below.

From the model pretrained on image data, the last DNN layer is removed so that only the convolutional weights are used (steps S1040 and S1050).

Next, dropout is used to prevent overfitting (steps S1041 and S1051). Dropout is one of the methods for addressing model overfitting, in which some of the neurons in the neural network are omitted.

The matrix data is merged into a single channel using a Flatten layer (steps S1043 and S1053). The Flatten layer converts the extracted key features into one-dimensional data so that they can be passed to the fully connected layer; it flattens image-shaped data into an array.

Training proceeds using the finally merged weights (steps S1045 and S1055); that is, training is performed using a Dense layer. A Dense layer is a neural network layer that connects every input to every output. For example, with 4 input neurons and 8 output neurons, there are 4 x 8 = 32 connections in total. Each connection carries a weight, which represents the connection strength.

The trained result is output in sigmoid form (steps S1047 and S1057). The sigmoid is a function with a smooth S-shaped curve.
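A fully connected layer followed by a sigmoid can be sketched in a few lines. The 4-input, 8-output size follows the example in the text; the weight values, the input vector, and the function names are made up for illustration and do not represent a trained model.

```python
import math

def sigmoid(x):
    """Smooth S-shaped function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """Fully connected layer; weights[j][i] connects input i to output j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

n_in, n_out = 4, 8
# Arbitrary illustrative weights, one row per output neuron.
weights = [[0.1 * (j - i) for i in range(n_in)] for j in range(n_out)]
biases = [0.0] * n_out

outputs = dense([1.0, 0.5, -0.5, -1.0], weights, biases)
n_connections = sum(len(row) for row in weights)
print(n_connections, len(outputs))
```

Each entry of `weights` corresponds to one connection line, so the count matches the 4 x 8 figure given in the text.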

From the two sigmoid results, the optimal result (that is, Auto-pool 1D) is computed through max pooling and average pooling (step S1060).
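The patent does not define Auto-pool 1D further. One common formulation, assumed here, is a softmax-weighted pooling with a parameter alpha that interpolates between average pooling (alpha = 0) and max pooling (large alpha). The sketch below uses a fixed alpha for illustration, whereas in a trained model alpha would normally be learned; the function name and the input values are hypothetical.

```python
import math

def autopool(values, alpha):
    """Softmax-weighted pooling over a sequence of scores.

    alpha = 0 reduces to the mean; a large alpha approaches the maximum.
    """
    scaled = [alpha * v for v in values]
    top = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    norm = sum(exps)
    return sum(v * e / norm for v, e in zip(values, exps))

# Hypothetical sigmoid outputs for the front and rear audio segments.
scores = [0.2, 0.8]
print(autopool(scores, 0.0))     # behaves like average pooling
print(autopool(scores, 1000.0))  # behaves like max pooling
```

Intermediate values of alpha give a result between the average and the maximum, which is one way to read the description of combining max pooling and average pooling into a single output.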

The channel features are then applied to the pretrained model to extract weights, and the weights are applied to generate the classification model.

In addition, the steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in the form of program instructions executable by various computer means, such as a microprocessor, a processor, or a central processing unit (CPU), and recorded on a computer-readable medium. The computer-readable medium may include program (instruction) code, data files, data structures, and the like, alone or in combination.

The program (instruction) code recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM, DVD, and Blu-ray; and semiconductor memory devices specially configured to store and execute program (instruction) code, such as read-only memory (ROM), random access memory (RAM), and flash memory.

Here, examples of program (instruction) code include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

10: operator terminal
20: control system
100: complex monitoring device
101: anomaly detector
110: image collection unit
120: sound source collection unit
121: sound collecting plate 122: microphone 123: direction detection module
130: analysis unit
131: image analysis module 132: sound source analysis module 133: complex analysis module
140: storage and transmission unit
310: sound collecting point
510: image receiving module
520: image decoding module
530: image conversion module

Claims

an anomaly detector 101 that collects image data and sound source data;
an analyzer 130 that determines an abnormal situation by performing complex analysis of the image data and the sound source data and generates abnormal situation information; and
A storage and transmission unit 140 for transmitting the abnormal situation information; includes,
The anomaly detector 101,
an image collection unit 110 generating the image data; and
A sound source collection unit 120 generating the sound source data;
The sound source collection unit 120,
Two microphones 122 for collecting sound sources and generating sound source data; and
A direction detection module 123 for detecting the direction of the sound source and generating sound source direction information;
The sound source collection unit 120,
a body 210 in which the direction detection module 123 is configured; and
It includes; a sound collecting plate 121 integrally disposed on the left and right sides of the body 210 to collect sound sources and having two microphones 122 disposed at the edges,
It is composed of an integral type including the abnormal detector 101, the analyzer 130, and the storage and transmission unit 140,
Analysis of the video data and the sound source data is performed simultaneously,
The sound source direction information is estimated by calculating the arrival time difference of the sound signals received from each of the two microphones and using the arrival time difference,
The analysis unit 130,
an image analysis module 131 for recognizing an object by analyzing the image data, detecting an abnormal behavior based on information of the object, and generating abnormal behavior information;
a sound source analysis module 132 recognizing an abnormal sound source from the sound source data and generating sound source direction information of the abnormal sound source; and
A complex analysis module 133 generating the surrounding abnormal situation information according to a predefined event processing rule using the abnormal behavior information and the sound source direction information;
The sound source data is classified into abnormal sounds, including screams, explosions, crashes, and sudden vehicle stops, and abnormal voices containing specific words for a rescue request.

delete

According to claim 1,
The complex monitoring device for determining an abnormal sound source, characterized in that the sound collecting plate 121 is in the form of a paraboloid to enhance the sound collecting effect of the sound source.

According to claim 1,
The two microphones 122 are installed in a direction facing the reflected sound source at the sound collecting point 310 where the sound source reflected by the sound collecting plate 121 is focused.

delete

According to claim 1,
The image analysis module 131,
a preprocessing unit 610 for preprocessing the image data to generate a standardized standard image frame;
an image analysis unit 620 for detecting and classifying the object by applying a neural network classification model to the standard image frame; and
An abnormal behavior detection unit (630) for detecting the abnormal behavior using the recognized object and generating the abnormal behavior information;

According to claim 8,
A complex monitoring device for determining an abnormal sound source, characterized in that an ensemble of classifiers learned according to the field situation is analyzed to increase the reliability of the detected object.

delete

According to claim 1,
The complex monitoring device for determining an abnormal sound source, characterized in that the abnormal situation information indicating a first-stage alarm is generated when the first event caused by the occurrence of the abnormal behavior information or the sound source direction information corresponds to a priority condition event predefined according to the event processing rule.

According to claim 11,
The complex monitoring device for determining an abnormal sound source, characterized in that when the second event exceeds the preset first weight, the abnormal situation information indicating the second-stage alarm is generated, and the first weight is a value given to an event according to sound analysis or voice analysis.

According to claim 12,
When the first event does not correspond to the priority condition event and a video event due to the occurrence of the abnormal behavior information occurs, the sum of the first weight and the second weight exceeds a preset first set value. The complex monitoring device for determining an abnormal sound source, characterized in that the abnormal situation information indicating the second-stage alarm is generated, and the second weight is a value given to an event according to video analysis.

According to claim 13,
If the first event does not correspond to the priority condition event and a video event occurs due to the occurrence of the abnormal behavior information, if the second weight exceeds a preset second set value, the abnormality indicating the second-stage alarm A complex monitoring device for determining an abnormal sound source, characterized in that situation information is generated.

According to claim 13,
If the first event does not correspond to the priority condition event and no video event occurs due to the occurrence of the abnormal behavior information, and the first weight exceeds a preset third set value, the first-stage alarm is displayed. The composite monitoring device for determining an abnormal sound source, characterized in that when the abnormal situation information is generated and the first weight does not exceed a preset third set value, a search for another device is requested to an operator terminal.

(a) collecting image data and sound source data by the anomaly detector 101;
(b) generating abnormal situation information by determining an abnormal situation through complex analysis of the video data and the sound source data by the analysis unit 130; and
(c) transmitting, by the storage and transmission unit 140, the abnormal situation information;
The anomaly detector 101,
an image collection unit 110 generating the image data; and
A sound source collection unit 120 generating the sound source data;
The sound source collection unit 120,
Two microphones 122 for collecting sound sources and generating sound source data; and
A direction detection module 123 for detecting the direction of the sound source and generating sound source direction information;
The sound source collection unit 120,
a body 210 in which the direction detection module 123 is configured; and
It includes; a sound collecting plate 121 integrally disposed on the left and right sides of the body 210 to collect sound sources and having two microphones 122 disposed at the edges,
It is composed of an integral type including the abnormal detector 101, the analyzer 130, and the storage and transmission unit 140,
Analysis of the video data and the sound source data is performed simultaneously,
The sound source direction information is estimated by calculating the arrival time difference of the sound signals received from each of the two microphones and using the arrival time difference,
The analysis unit 130,
an image analysis module 131 for recognizing an object by analyzing the image data, detecting an abnormal behavior based on information of the object, and generating abnormal behavior information;
a sound source analysis module 132 recognizing an abnormal sound source from the sound source data and generating sound source direction information of the abnormal sound source; and
A complex analysis module 133 generating the surrounding abnormal situation information according to a predefined event processing rule using the abnormal behavior information and the sound source direction information;
The sound source data is classified into abnormal sounds, including screams, explosions, crashes, and sudden vehicle stops, and abnormal voices containing specific words for a rescue request.