KR20220060587A

KR20220060587A - Method, apparatus and system for detecting abnormal event

Info

Publication number: KR20220060587A
Application number: KR1020200145928A
Authority: KR
Inventors: 김동칠; 양창모
Original assignee: 한국전자기술연구원 (Korea Electronics Technology Institute)
Priority date: 2020-11-04
Filing date: 2020-11-04
Publication date: 2022-05-12
Also published as: WO2022097805A1; KR102484198B1

Abstract

The present invention relates to a method, apparatus, and system for detecting abnormal events quickly, even in a real-time environment. In one embodiment, the method detects an abnormal event involving an object in a real-time 3D depth image captured by a 3D depth camera of a video surveillance system. The method comprises: acquiring, in real time, a 3D depth image and Internet of Things (IoT) sensor data around a predetermined place; recognizing, from the acquired 3D depth image, whether an abnormal event involving an object in the image has occurred, using an event recognition model trained by a deep learning technique on training data that pairs input data (3D depth images) with result data (whether an abnormal event occurred, or its type); and, when an abnormal event is recognized, using the IoT sensor data to determine whether that abnormal event has actually occurred.

Description

Abnormal event detection method, device and system {METHOD, APPARATUS AND SYSTEM FOR DETECTING ABNORMAL EVENT}

The present invention relates to a method, apparatus, and system for detecting an abnormal event and, more particularly, to a method, apparatus, and system for detecting an abnormal event during video surveillance.

A video surveillance system protects life and property, for example through crime prevention and disaster monitoring, by analyzing images collected from CCTV cameras and similar sources. Such a system recognizes objects to be monitored, such as people and vehicles, in the collected images and analyzes their behavior patterns. When an abnormal event occurs (e.g., intrusion, fire, violence, theft, or illegal dumping), the system delivers information about it to a supervisor through various warning devices.

However, most conventional video surveillance systems (hereinafter, the “prior art”) detect abnormal events using only RGB images. As a result, the prior art has difficulty detecting abnormal events accurately and quickly in complex environments and in low-light environments such as at night.

In addition, the prior art must perform various pre-processing steps, such as object extraction through background removal, when analyzing images to detect abnormal events. This pre-processing inevitably delays detection, which makes the prior art difficult to apply in real-time environments.

Prior art document: KR 10-1690050 B

To solve the problems of the prior art described above, an object of the present invention is to provide a method, apparatus, and system capable of quickly and accurately detecting abnormal events during video surveillance in a real-time environment.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

An abnormal event detection method according to an embodiment of the present invention, devised to solve the above problems, is performed by an electronic device and detects an abnormal event involving an object in a real-time 3D depth image captured by a 3D depth camera of a video surveillance system. The method comprises: acquiring, in real time, a 3D depth image and Internet of Things (IoT) sensor data around a specific place; recognizing, based on the acquired 3D depth image, whether an abnormal event involving an object in the image has occurred, using an event recognition model pre-trained by a deep learning technique on training data that includes input data for 3D depth images and result data indicating whether an abnormal event occurred or the type of abnormal event; and, when an abnormal event is recognized, determining whether that abnormal event has actually occurred using the IoT sensor data.
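The three steps above can be summarized as a short control-flow sketch in Python. It is a minimal illustration, not the patented implementation: the `Detection` record and the `recognize` and `confirm` callables are hypothetical placeholders standing in for the pre-trained event recognition model and the IoT-based check, whose interfaces the source does not specify. What it shows is the ordering: the sensor data are consulted only after the model has recognized an event.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical types: a depth frame is a 2D grid of per-pixel depths,
# and a sensor reading maps a sensor name to its measured value.
DepthFrame = Sequence[Sequence[float]]
SensorReading = Dict[str, float]


@dataclass
class Detection:
    event_type: str   # label returned by the (hypothetical) recognition model
    confirmed: bool   # True only if the IoT sensor data support the event


def detect_abnormal_event(
    frames: Sequence[DepthFrame],
    sensor_data: SensorReading,
    recognize: Callable[[Sequence[DepthFrame]], Optional[str]],
    confirm: Callable[[str, SensorReading], bool],
) -> Optional[Detection]:
    """Two-stage check: depth-image recognition first, IoT confirmation second."""
    # Recognition step: the pre-trained model looks only at the depth frames.
    event_type = recognize(frames)
    if event_type is None:
        return None  # nothing recognized, so the sensor data are not consulted
    # Confirmation step: a recognized event is checked against the sensor data.
    return Detection(event_type, confirm(event_type, sensor_data))
```

Keeping the model and the confirmation rule as injected callables mirrors the split between the stored event recognition model and the sensor-based decision without committing to either one's internals.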

The event recognition model may include a convolutional layer containing both filters for recognizing abnormal events and filters for recognizing human joints.
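As an illustration only, the sketch below shows one way a single convolutional layer can hold two groups of filters: one group whose output channels feed abnormal-event recognition and one whose output channels feed human-joint recognition. The naive pure-Python convolution and the function names are assumptions chosen for clarity; in practice the filter weights would be learned and a deep learning framework would perform the convolution.

```python
from typing import Dict, List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Naive 'valid' 2D cross-correlation (what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image: Grid, event_filters: List[Grid], joint_filters: List[Grid]) -> Dict[str, List[Grid]]:
    """One layer whose filter bank is split into two groups of output channels.

    Hypothetical split: 'event' channels go to the abnormal-event classifier,
    'joint' channels go to the human-joint estimator.
    """
    return {
        "event": [conv2d_valid(image, k) for k in event_filters],
        "joint": [conv2d_valid(image, k) for k in joint_filters],
    }
```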

The event recognition model may be a model in which the weights of the convolutional layer have been updated, using human joint information, to reduce the error between the human joint positions in the current image frame and the human joint positions of the abnormal event.

The event recognition model may be a model in which the weights of the convolutional layer have been updated to minimize a total loss that combines the loss for abnormal event classification and the loss for joint recognition.
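A minimal sketch of such a combined objective follows, assuming softmax cross-entropy for the event classification loss and the mean squared Euclidean distance between predicted and reference joint positions (the error described in the preceding paragraph) for the joint loss. The specific loss functions, the `(x, y, depth)` joint layout, and the `joint_weight` factor (1.0 gives a plain sum) are assumptions; the source states only that a total loss of the two terms is minimized.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, depth) of one joint; hypothetical layout


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for the abnormal-event class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between predicted and reference joints."""
    return sum(
        sum((p - r) ** 2 for p, r in zip(pj, rj)) for pj, rj in zip(pred, ref)
    ) / len(pred)


def total_loss(
    logits: Sequence[float],
    target: int,
    pred_joints: Sequence[Point],
    ref_joints: Sequence[Point],
    joint_weight: float = 1.0,
) -> float:
    """Classification loss plus (weighted) joint-recognition loss."""
    return cross_entropy(logits, target) + joint_weight * joint_loss(pred_joints, ref_joints)
```

During training, the convolutional weights would be updated by gradient descent on `total_loss`, so that a single set of shared filters is pushed to serve both tasks.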

The IoT sensor data may be data on the state of a person in the image, measured by an IoT sensor carried by that person.

The IoT sensor may be included in a wearable device that is in contact with, attached to, worn on, or inserted into a part of the person's body.

The IoT sensor may be any one of a heart rate sensor, an electrocardiogram sensor, an oxygen sensor, a skin conductance sensor, or a skin temperature sensor.
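The confirmation step is not specified in detail here. One simple possibility, sketched below under clearly stated assumptions, is to treat a recognized event as real when enough wearable readings fall outside a normal range. The sensor names, the ranges in `NORMAL_RANGES`, and the `min_abnormal` rule are all hypothetical illustration values: they are not clinical thresholds and are not taken from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical "normal" ranges for illustration only; real thresholds would have
# to be chosen per person and per sensor. Sensors not listed here are ignored.
NORMAL_RANGES: Dict[str, Tuple[float, float]] = {
    "heart_rate_bpm": (50.0, 120.0),
    "spo2_percent": (94.0, 100.0),
    "skin_temp_c": (30.0, 37.5),
}


def out_of_range(readings: Dict[str, float], ranges: Dict[str, Tuple[float, float]] = NORMAL_RANGES) -> List[str]:
    """Return the (sorted) names of sensors whose reading lies outside its normal range."""
    return sorted(
        name
        for name, value in readings.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    )


def confirm_event(readings: Dict[str, float], min_abnormal: int = 1) -> bool:
    """Treat a recognized event as real if at least `min_abnormal` readings look abnormal."""
    return len(out_of_range(readings)) >= min_abnormal
```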

In the recognizing step, the continuous 3D depth images acquired in real time may be divided into segments of a specific time length using a sliding window, and whether an abnormal event has occurred may be recognized for each segment.

The abnormal event detection method according to an embodiment of the present invention may further include generating notification data when it is determined that the abnormal event has actually occurred.

An abnormal event detection apparatus according to an embodiment of the present invention detects an abnormal event involving an object in a real-time 3D depth image captured by a 3D depth camera of a video surveillance system, and includes: a communication unit that receives, in real time, a 3D depth image and Internet of Things (IoT) sensor data around a specific place; a storage unit that stores an event recognition model pre-trained by a deep learning technique on training data that includes input data for 3D depth images and result data indicating whether an abnormal event occurred or the type of abnormal event; and a control unit that controls the detection of abnormal events using the information received by the communication unit and the information stored in the storage unit.

The control unit may use the stored event recognition model to recognize, based on the received 3D depth image, whether an abnormal event involving an object in the image has occurred and, when an abnormal event is recognized, may use the received IoT sensor data to determine whether that abnormal event has actually occurred.

A video surveillance system according to an embodiment of the present invention includes: a 3D depth camera that captures 3D depth images; an Internet of Things (IoT) sensor, located near the shooting position of the 3D depth camera, that measures IoT sensor data; and a control unit that receives the 3D depth images and the sensor data in real time and controls the detection of abnormal events involving objects in the real-time 3D depth images.

The control unit may use an event recognition model, pre-trained by a deep learning technique on training data that includes input data for 3D depth images and result data indicating whether an abnormal event occurred or the type of abnormal event, to recognize, based on the received 3D depth image, whether an abnormal event involving an object in the image has occurred. When an abnormal event is recognized, the control unit may use the received sensor data to determine whether that abnormal event has actually occurred.

Because the present invention, configured as described above, analyzes abnormal events from 3D depth images without a pre-processing step, it can detect abnormal events quickly even in a real-time environment.

In addition, because the present invention considers human joint information and IoT sensor data along with the 3D depth image, it can detect abnormal events accurately even in low-light environments such as at night.

In addition, the present invention can be installed in security-vulnerable environments and linked with a video surveillance system to protect people's personal safety and property and to prevent crime.

The effects obtainable with the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

FIG. 1 is a block diagram of a video surveillance system 10 according to an embodiment of the present invention.
FIG. 2 shows an example of a general RGB image and a 3D depth image.
FIG. 3 is a block diagram of a detection apparatus 400 according to an embodiment of the present invention.
FIG. 4 is a block diagram of the control unit 450 in the detection apparatus 400 according to an embodiment of the present invention.
FIG. 5 shows the training process (S101 to S104) of the event recognition model.
FIG. 6 shows an example structure of an event recognition model trained on the basis of a deep neural network (DNN).
FIG. 7 shows an abnormal event detection process (S201 to S205) using the trained event recognition model.
FIG. 8 shows an example in which a sliding window is applied to an event recognition model trained on the basis of a deep neural network (DNN).

The above objects and means of the present invention and the resulting effects will become more apparent from the following detailed description with reference to the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In addition, where a detailed description of known technology related to the present invention might unnecessarily obscure the gist of the invention, that description is omitted.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms include plural forms unless the context clearly indicates otherwise. Terms such as “include”, “be equipped with”, “provide”, and “have” do not exclude the presence or addition of one or more components other than those mentioned.

In this specification, terms such as “or” and “at least one” may indicate one of the words listed together or a combination of two or more of them. For example, “A or B” and “at least one of A and B” may include only one of A and B, or both A and B.

In this specification, information presented with “for example” and similar expressions, such as recited properties, variables, or values, may not match exactly, and the embodiments of the present invention should not be limited by variations arising from tolerances, measurement errors, limits of measurement accuracy, and other commonly known factors.

In this specification, when a component is described as being 'connected' or 'coupled' to another component, it may be directly connected or coupled to that component, or other components may be present in between. In contrast, when a component is described as being 'directly connected' or 'directly coupled' to another component, it should be understood that no other component is present in between.

In this specification, when a component is described as being 'on' or 'in contact with' another component, it may be directly on, in contact with, or connected to that component, or another component may be present in between. In contrast, when a component is described as being 'directly on' or 'in direct contact with' another component, it may be understood that no other component is present in between. Other expressions describing relationships between components, such as 'between' and 'directly between', may be interpreted in the same way.

In this specification, terms such as 'first' and 'second' may be used to describe various components, but the components are not limited by these terms. These terms should not be construed as limiting the order of the components; they may be used to distinguish one component from another. For example, a 'first component' may be referred to as a 'second component', and similarly, a 'second component' may be referred to as a 'first component'.

Unless otherwise defined, all terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a video surveillance system 10 according to an embodiment of the present invention.

The video surveillance system 10 according to an embodiment of the present invention is a system for protecting life and property, for example through crime prevention and disaster monitoring. Specifically, based on the collected real-time 3D depth images and sensor data, the video surveillance system 10 detects abnormal events (i.e., situations that differ from normal, such as crises or emergencies) involving an object to be monitored in the 3D depth image (i.e., a person or thing appearing in the image), and, when such an abnormal event occurs, delivers a notification to a supervisor through the warning device 300 or the like. In particular, when the monitored object is a person, an abnormal event may be a situation in which the person behaves abnormally and the person's safety is at risk (e.g., a fight or a collapse).

As shown in FIG. 1, the video surveillance system 10 may include a 3D depth camera 100, an Internet of Things (IoT) sensor 200, a warning device 300, and a detection device 400.

FIG. 2 shows an example of a general RGB image and a 3D depth image.

The 3D depth camera 100 is installed near the object to be monitored, captures 3D depth images of that object, and transmits them to the detection device 400. Referring to FIG. 2, a 3D depth image contains depth information for each pixel of the captured scene, so it carries different image information from an RGB image obtained by a general optical camera, which contains information such as the RGB luminance of each pixel. Because the present invention uses 3D depth images, it can detect abnormal events accurately even in low-light environments such as at night, compared with using RGB images. For example, the 3D depth image may be obtained by a moiré technique, a stereo imaging technique, or laser measurement, but is not limited thereto.
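Because each pixel of a depth frame stores a distance rather than a brightness, the frame can be passed to a network after at most a simple rescaling, with no background removal. The sketch below shows such a rescaling. The `near`/`far` working range (in metres) and the convention that a non-positive depth marks a missing measurement are assumptions about a typical depth sensor, not details from the source.

```python
from typing import List

DepthFrame = List[List[float]]  # per-pixel distance from the camera, in metres (assumed unit)


def normalize_depth(frame: DepthFrame, near: float = 0.5, far: float = 4.5) -> DepthFrame:
    """Clip depths to [near, far] and rescale them to [0, 1] for use as model input.

    Pixels with no valid measurement (depth <= 0) are mapped to 0.0.
    The result depends only on distance, not on scene illumination.
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    span = far - near
    return [
        [0.0 if d <= 0 else (min(max(d, near), far) - near) / span for d in row]
        for row in frame
    ]
```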

The IoT sensor 200 is located near the 3D depth camera 100, senses various information about the monitored object or its surroundings, and generates sensor data that it transmits to the detection device 400. The IoT sensor 200 may be a sensor that senses information about the surroundings of the monitored object (hereinafter, a “first sensor”) or a sensor carried by a person in the image, such as a wearable device, that senses information about that person (hereinafter, a “second sensor”). Here, “carried” may mean being in contact with, attached to, worn on, or inserted into a part of the person's body.

For example, the wearable device may be an electronic glove, electronic glasses, a head-mounted device (HMD), electronic clothing, an electronic bracelet, an electronic necklace, an electronic appcessory, a smart watch, or smart glasses, but is not limited thereto.

For example, the first sensor may include an illuminance sensor, a humidity sensor, a leakage sensor, a vibration sensor, or a geomagnetic sensor, but is not limited thereto. The second sensor may include a heart rate sensor, an electrocardiogram (ECG) sensor, an oxygen sensor (measuring blood oxygen), a skin conductance sensor (measuring the skin's electrical response, which indicates how much the person is sweating), or a skin temperature sensor (measuring the user's temperature), but is not limited thereto.

When the detection device 400 finally determines that an abnormal event has occurred, the warning device 300 generates notification information according to the notification data transmitted from the detection device 400 and delivers it to a supervisor. For example, the notification information may be visual (a warning screen) or auditory (a warning sound), but is not limited thereto.
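The format of the notification data is not specified. A hypothetical payload, serialized as JSON for transmission by the communication unit, might look like the sketch below. Every field name is an illustrative assumption; the two flags mirror the visual and auditory notification examples above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class NotificationData:
    # Hypothetical fields; the source does not define the notification format.
    camera_id: str
    event_type: str
    timestamp: str               # ISO 8601, normalized to UTC
    visual_alert: bool = True    # e.g. show a warning screen
    audible_alert: bool = True   # e.g. sound a warning tone


def make_notification(camera_id: str, event_type: str, when: datetime) -> str:
    """Serialize notification data for the warning device as a JSON string."""
    data = NotificationData(camera_id, event_type, when.astimezone(timezone.utc).isoformat())
    return json.dumps(asdict(data))
```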

FIG. 3 is a block diagram of a detection apparatus 400 according to an embodiment of the present invention.

The detection device 400 detects whether an abnormal event has occurred for an object in the image based on information collected in real time from the 3D depth camera 100 and the IoT sensor 200, and may be any electronic device or computing network capable of computation.

For example, the electronic device may be a desktop personal computer (PC), a laptop PC, a tablet PC, a netbook computer, a workstation, a personal digital assistant (PDA), a smartphone, a smart pad, or a mobile phone, but is not limited thereto.

As shown in FIG. 3, the detection device 400 may include an input unit 410, a communication unit 420, a display 430, a memory 440, and a control unit 450.

The input unit 410 generates input data in response to inputs from users (e.g., a supervisor) and may include various input means. For example, the input unit 410 may include a keyboard, a keypad, a dome switch, a touch panel, a touch key, a touch pad, a mouse, or a menu button, but is not limited thereto.

The communication unit 420 communicates with other devices. For example, the communication unit 420 may perform wireless communication such as 5th generation (5G), Long Term Evolution-Advanced (LTE-A), Long Term Evolution (LTE), Bluetooth, Bluetooth Low Energy (BLE), near field communication (NFC), or Wi-Fi, or wired communication such as cable communication, but is not limited thereto. In particular, the communication unit 420 may receive real-time 3D depth images from the 3D depth camera 100 and real-time sensor data from the IoT sensor 200. The communication unit 420 may also transmit notification data to the warning device 300.

The display 430 displays various image data on a screen and may be a non-emissive or emissive panel. For example, the display 430 may include a liquid crystal display (LCD), a light emitting diode (LED) display, an organic LED (OLED) display, a micro-electro-mechanical systems (MEMS) display, or an electronic paper display, but is not limited thereto. The display 430 may also be combined with the input unit 410 and implemented as a touch screen or the like.

The memory 440 stores various information needed for the operation of the detection device 400. The stored information may include, but is not limited to, received 3D depth images, received sensor data, the event recognition model, and program information related to the abnormal event detection method described below. For example, depending on its type, the memory 440 may be of a hard disk type, a magnetic media type, a compact disc read-only memory (CD-ROM), an optical media type, a magneto-optical media type, a multimedia card micro type, a flash memory type, a read-only memory (ROM) type, or a random access memory (RAM) type, but is not limited thereto. Depending on its purpose or location, the memory 440 may also be a cache, a buffer, a main memory, an auxiliary memory, or a separately provided storage system, but is not limited thereto.

The control unit 450 may perform various control operations of the detection device 400. That is, the control unit 450 may control the execution of the abnormal event detection method described below, and may control the operations of the remaining components of the detection device 400, namely the input unit 410, the communication unit 420, the display 430, and the memory 440. For example, the control unit 450 may include a processor (hardware) or a process (software executed on that processor), but is not limited thereto.

FIG. 4 is a block diagram of the controller 450 in the detection device 400 according to an embodiment of the present invention. FIGS. 5 and 7 are flowcharts of an abnormal event detection method according to an embodiment of the present invention. FIG. 5 shows the training process (S101 to S104) of the event recognition model. FIG. 7 shows the abnormal event detection process (S201 to S205) that uses the trained event recognition model.

The controller 450 controls the execution of the abnormal event detection method according to an embodiment of the present invention. As shown in FIG. 4, it may include a learning unit 451, a model analysis unit 452, a sensor analysis unit 453, and a notification generation unit 454. For example, each of these units may be, but is not limited to, a hardware component of the controller 450 or a software process executed by the controller 450.

The event recognition model is a deep learning model trained with a deep learning technique on training data made up of input data (or pairs of input data and output data). For example, the deep learning technique may include, but is not limited to, a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), or deep Q-networks.

The event recognition model includes a plurality of layers and represents a function that maps input data to output data. That is, when input data is fed to the event recognition model, the model produces output data according to that function.

The event recognition model expresses the relationship between input data and output data through multiple layers, and these multiple representation layers are also referred to as a “neural network.” Each layer in the neural network consists of at least one filter, and each filter has a matrix of weights. That is, each element (pixel) of a filter's matrix may correspond to a weight value. A filter may have a two-dimensional or three-dimensional matrix. A two-dimensional matrix has a total of h'×w' weight elements (where h' and w' are natural numbers) and is also referred to as a “two-dimensional tensor.” A three-dimensional matrix has a total of h×w×d weight elements (where h, w, and d are natural numbers) and is also referred to as a “three-dimensional tensor.” Suppose one layer contains m filters with a three-dimensional matrix (h×w×d), where m is a natural number of 2 or more. The layer then has a total of h×w×d×m weight elements, which is also referred to as a “four-dimensional tensor.”
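The following Python sketch makes the element counts concrete by multiplying out the filter shapes described above. The specific sizes (3×3, 3×3×16, and 32 filters) are hypothetical and chosen only for illustration.

```python
import math

def weight_count(shape):
    """Number of weight elements in a filter or filter bank of the given shape."""
    return math.prod(shape)

# Hypothetical sizes, for illustration only.
two_d = (3, 3)            # h' x w'        -> two-dimensional tensor
three_d = (3, 3, 16)      # h x w x d      -> three-dimensional tensor
layer = (3, 3, 16, 32)    # h x w x d x m  -> four-dimensional tensor (m = 32 filters)

for shape in (two_d, three_d, layer):
    print(f"{len(shape)}-D tensor {shape}: {weight_count(shape)} weights")
# 2-D tensor (3, 3): 9 weights
# 3-D tensor (3, 3, 16): 144 weights
# 4-D tensor (3, 3, 16, 32): 4608 weights
```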

The abnormal event detection method according to the present invention is described in more detail below.

<Training process>

First, referring to FIG. 5, the training process according to the present invention may include steps S101 to S104 and may be controlled by the learning unit 451. That is, the learning unit 451 may perform a training process that generates the event recognition model. This is a deep learning model for detecting abnormal events, built from 3D depth images acquired from the 3D depth camera 100 and from human joint information.

In S101, a 3D depth image is acquired from the 3D depth camera 100, together with human joint information about a person appearing in that image. The human joint information is obtained by image processing of the 3D depth image. It describes the joint positions of the person on the screen and may take the form of image information.

For example, the human joint information may be acquired using any of various human joint extraction algorithms. Alternatively, it may be acquired using a deep learning model trained with a deep learning technique. In this case, the model is trained on data whose input is a 3D depth image containing a person and whose result is that person's human joint information in the image. A model trained in this way outputs the corresponding human joint information when given a 3D depth image.

In S102, the acquired 3D depth images and human joint information are classified by abnormal event, and ground-truth (GT) training data is generated. Each training pair consists of input data for a 3D depth image and result data indicating whether an abnormal event occurred or its type. The related human joint information may also be matched to each training pair.
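The structure of one ground-truth sample and the grouping by event can be sketched as below. This is a hypothetical Python layout. The class and field names, the (x, y, depth) joint format, and the "normal" label are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation of one joint: (x, y, depth).
Joint = Tuple[float, float, float]

@dataclass
class TrainingSample:
    depth_clip: Sequence      # input data: a 3D depth image (sequence of frames)
    event_label: str          # result data: "normal" or an abnormal event type
    joints: Optional[List[List[Joint]]] = None  # matched human joint info, per frame

def build_ground_truth(clips, labels, joint_infos):
    """Pair each depth clip with its label and the joint information matched to it."""
    if not (len(clips) == len(labels) == len(joint_infos)):
        raise ValueError("clips, labels and joint_infos must have the same length")
    return [TrainingSample(c, y, j) for c, y, j in zip(clips, labels, joint_infos)]

def group_by_event(samples):
    """Classify samples by abnormal event label, as described for S102."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample.event_label].append(sample)
    return dict(groups)

samples = build_ground_truth(
    clips=[["f1", "f2"], ["f3", "f4"], ["f5", "f6"]],
    labels=["fight", "normal", "fight"],
    joint_infos=[None, None, None],
)
print({k: len(v) for k, v in group_by_event(samples).items()})  # {'fight': 2, 'normal': 1}
```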

In S103, training according to the deep learning technique is performed using the generated training data. During training, the network weights are also updated using the human joint information matched to the training data. This further improves the accuracy of abnormal event recognition (classification).

In S104, the event recognition model, a deep learning model, may be generated as the result of training.

FIG. 6 shows an example of the structure of an event recognition model trained on the basis of a deep neural network (DNN).

For example, as shown in FIG. 6, the event recognition model may be a DNN-based deep learning model trained on 3D depth images and human joint information. That is, in S103, the 3D depth image (for spatio-temporal analysis, for example) and the corresponding human joint information image (for spatial analysis, for example) may be input to the DNN for training.

In particular, unlike conventional action recognition models, the DNN includes a convolution layer that performs joint recognition. This convolution layer includes two kinds of filter: a filter for recognizing (classifying) abnormal events (hereinafter, the “first filter”) and a filter for recognizing (classifying) human joints (hereinafter, the “second filter”).

To improve the accuracy of abnormal event recognition, two sets of network weights are updated during training. The weights of the first filter are updated to minimize the error between the ground-truth action (event) and the inferred action (event). The weights of the second filter are updated to minimize the error between the joint positions of the object (person) in the current image frame and the ground-truth joint positions. In other words, the human joint information is used to reduce the error between the human joint positions in the current image frame and the human joint positions of the abnormal event.

A 3D depth image V consists of a plurality of consecutive frames F, written V = {F_1, F_2, …, F_N}, where N is a natural number of 2 or more. In the event recognition model, the loss for abnormal event recognition, that is, the abnormal event classification loss L_C, can be expressed as Equation (1) below.

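$$L_C = \sum_{k} \Big[ q\, z_k \big(-\log \sigma(x_k)\big) + (1 - z_k)\big(-\log(1 - \sigma(x_k))\big) \Big] \tag{1}$$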

In Equation (1), x_k denotes the key-point logits, z_k the key-point targets, q a positive weight, and σ the sigmoid function.
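Equation (1) is stated in terms of x_k, z_k, q and σ. Assume it takes the standard weighted sigmoid cross-entropy form that these symbols describe, the same form documented for TensorFlow's `tf.nn.weighted_cross_entropy_with_logits`. It can then be sketched in plain Python as follows. Summing over k, rather than averaging, is also an assumption.

```python
import math

def softplus(x):
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def weighted_sigmoid_cross_entropy(logits, targets, q):
    """Sum over k of q*z_k*(-log sigmoid(x_k)) + (1 - z_k)*(-log(1 - sigmoid(x_k))).

    Uses the identities -log(sigmoid(x)) = softplus(-x) and
    -log(1 - sigmoid(x)) = softplus(x) for numerical stability.
    """
    return sum(q * z * softplus(-x) + (1.0 - z) * softplus(x)
               for x, z in zip(logits, targets))

# Example: one positive and one negative key-point, positive weight q = 2.
print(weighted_sigmoid_cross_entropy([2.0, -1.0], [1.0, 0.0], q=2.0))
```

A larger q penalizes missed positive targets (z_k = 1) more heavily than false positives.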

In addition, in the event recognition model, the loss for joint recognition (key-point loss) L_K can be expressed as Equation (2) below.

$$L_K = -\sum_{i} y_i' \log \frac{e^{y_i}}{\sum_{j} e^{y_j}} \tag{2}$$

In Equation (2), y_i' denotes the i-th element of the one-hot class ground-truth vector, and y_i denotes the i-th logit element output by the network.
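Equation (2) defines y_i' as a one-hot ground-truth element and y_i as a network logit. Read that way, it corresponds to the usual softmax cross-entropy. The sketch below assumes that form. It is illustrative and not taken from the original source.

```python
import math

def softmax_cross_entropy(logits, one_hot):
    """-sum_i y'_i * log(softmax(y)_i), using log-sum-exp for numerical stability."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(y - m) for y in logits))
    return -sum(t * (y - log_norm) for y, t in zip(logits, one_hot))

# Example: three classes, ground truth is class 0.
print(softmax_cross_entropy([2.0, 1.0, 0.1], [1.0, 0.0, 0.0]))
```

Subtracting the maximum logit before exponentiating leaves the result unchanged but avoids overflow for large logits.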

In the event recognition model, the total loss L combining L_C and L_K can be expressed as Equation (3) below.

$$L = w_c L_C + w_k L_K \tag{3}$$

In Equation (3), w_c denotes the weight of L_C (classification loss), and w_k denotes the weight of L_K (key-point loss).

That is, the event recognition model is optimized by updating the network weights of each filter in the convolution layer so that the total loss L is minimized. This total loss combines the abnormal event recognition loss L_C and the joint recognition loss L_K. The optimized event recognition model is stored in the memory 440 and can then be used in the abnormal event detection process.
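The role of the weights w_c and w_k in Equation (3) can be illustrated with a toy problem. In the hypothetical sketch below, a single shared parameter theta stands in for the convolution-layer weights, and two quadratics stand in for L_C and L_K. Gradient descent on the weighted sum converges to the weighted compromise (w_c·a + w_k·b)/(w_c + w_k). This only shows how the two loss terms trade off; it is not the actual network or loss.

```python
def minimize_total_loss(a, b, w_c, w_k, lr=0.1, steps=200):
    """Gradient descent on L(theta) = w_c*(theta - a)**2 + w_k*(theta - b)**2.

    The quadratics are hypothetical stand-ins for L_C and L_K. The iteration
    converges when 0 < lr * (w_c + w_k) < 1.
    """
    theta = 0.0
    for _ in range(steps):
        grad = 2 * w_c * (theta - a) + 2 * w_k * (theta - b)
        theta -= lr * grad
    return theta

def closed_form_minimum(a, b, w_c, w_k):
    """Exact minimizer of the same weighted sum, for comparison."""
    return (w_c * a + w_k * b) / (w_c + w_k)

print(minimize_total_loss(1.0, 3.0, w_c=1.0, w_k=1.0))  # ≈ 2.0
print(closed_form_minimum(1.0, 3.0, w_c=3.0, w_k=1.0))  # 1.5
```

Raising w_c pulls the optimum toward the minimizer of the classification term, and raising w_k pulls it toward the minimizer of the key-point term.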

However, the training process of S101 to S104 may also be performed by a device other than the detection device 400. In this case, the other device carries out S101 to S104 to train and optimize the event recognition model. The trained model may then be transmitted from the other device to the detection device 400 through the communication unit 420 and stored in the memory 440 of the detection device 400.

<Abnormal event detection process>

Next, referring to FIG. 7, the abnormal event detection process according to the present invention may include steps S201 to S205. It may be controlled by the model analysis unit 452, the sensor analysis unit 453, and the notification generation unit 454. In this process, the event recognition model produced by training is used to detect abnormal events from the 3D depth images and IoT sensor data generated in a real-time environment. The human joint information used to train the model is not needed here, because it is not part of the model's input. The input side of the training data consists of 3D depth images only. The human joint information for a specific abnormal event was used only to update the network weights during training. Accordingly, the network weights of the trained event recognition model already reflect the human joint information.

In S201, real-time 3D depth images and IoT sensor data are received as input.

In S202, the model analysis unit 452 detects abnormal behavior based on the event recognition model. The received 3D depth image is input to the event recognition model, and the model's output indicates whether an abnormal event has occurred.

In S203, when an abnormal event is recognized in S202, the sensor analysis unit 453 uses the IoT sensor data to check (determine) whether the abnormal event actually occurred. The IoT sensor data is therefore not used to detect the type of abnormal event. Its purpose is to verify whether the abnormal event recognized by the event recognition model actually occurred.

For example, an IoT sensor 200 such as a heart rate sensor carried by the person on screen may be used. Suppose a “fight” event is recognized in S202. If the heart rate sensor data shows a normal pattern, it is determined that the “fight” event did not actually occur. If the data shows an abnormal pattern, such as a sharp rise in heart rate, it may be determined that the “fight” event actually occurred.

Likewise, suppose a “fall” event is recognized in S202. If the heart rate sensor data shows a normal pattern, it is determined that the “fall” event did not actually occur. If the data shows an abnormal pattern, such as a sharp rise or drop in heart rate, it may be determined that the “fall” event actually occurred.
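The heart-rate checks in the two examples above can be sketched as simple rules. In this hypothetical Python sketch, the "normal pattern" is approximated by the person's baseline heart rate. The ±30% thresholds, the event names, and the function name are assumptions made for illustration. The source does not specify how a pattern is judged abnormal.

```python
from statistics import mean

RISE_RATIO = 1.3  # hypothetical: a "sharp rise" is >= 130% of baseline
DROP_RATIO = 0.7  # hypothetical: a "sharp drop" is <= 70% of baseline

def confirm_event(event, recent_bpm, baseline_bpm):
    """Return True if the heart-rate data supports the recognized event (S203)."""
    current = mean(recent_bpm)
    rose = current >= baseline_bpm * RISE_RATIO
    dropped = current <= baseline_bpm * DROP_RATIO
    if event == "fight":
        return rose              # e.g. a sharp rise in heart rate
    if event == "fall":
        return rose or dropped   # e.g. a sharp rise or drop in heart rate
    return False                 # no heart-rate rule defined for other events here

print(confirm_event("fight", [112, 118, 121], baseline_bpm=80))  # True
print(confirm_event("fall", [78, 81, 80], baseline_bpm=80))      # False
```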

Then, in S204, the final determination is made as to whether an abnormal event has occurred. If the event recognition model recognizes an abnormal event and the IoT sensor data confirms it, the abnormal event is deemed to have occurred, and S205 is performed. If the model recognizes no abnormal event, or the IoT sensor data does not confirm it, the process returns to S201. The above steps are then repeated for the next real-time image and IoT sensor data.
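The control flow of S201 to S205 can be sketched as a loop over incoming data. In this hypothetical Python sketch, `recognize`, `confirm` and `notify` are placeholder callables. They stand in for the model analysis unit 452, the sensor analysis unit 453, and the notification generation unit 454, and their names and signatures are assumptions.

```python
def run_detection(stream, recognize, confirm, notify):
    """Apply S201 to S205 to each (depth_clip, sensor_data) pair in `stream`.

    recognize(clip) -> event name or None    (S202, event recognition model)
    confirm(event, sensor_data) -> bool       (S203, IoT sensor check)
    notify(event)                             (S205, notification)
    Returns the list of events that were confirmed and notified.
    """
    notified = []
    for clip, sensor_data in stream:          # S201: receive real-time input
        event = recognize(clip)               # S202: model-based recognition
        if event is None:
            continue                          # S204: nothing recognized, next input
        if not confirm(event, sensor_data):   # S203: verify with IoT sensor data
            continue                          # S204: not confirmed, next input
        notify(event)                         # S205: generate notification
        notified.append(event)
    return notified

alerts = []
stream = [("clip-a", {"hr": 120}), ("clip-b", {"hr": 75}), ("clip-c", {"hr": 130})]
labels = {"clip-a": "fight", "clip-b": "fight", "clip-c": None}
result = run_detection(
    stream,
    recognize=labels.get,
    confirm=lambda event, data: data["hr"] > 100,
    notify=alerts.append,
)
print(result)  # ['fight']
```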

FIG. 8 shows an example in which a sliding window is applied to an event recognition model trained on the basis of a deep neural network (DNN).

In particular, as shown in FIG. 8, S201 to S204 may be combined with a sliding window technique. The sliding window is used to analyze the start and end of an abnormal event. The frames of the continuous 3D depth image are divided into specific time units, and abnormal events are detected for each unit. This can further improve search time and efficiency.
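The windowing step can be sketched as follows. This hypothetical Python sketch uses frame indices in place of depth frames. The window length, the stride, and the rule for merging per-window labels into event start and end frames are illustrative assumptions, not parameters given in the source.

```python
def sliding_windows(frames, window, stride):
    """Split consecutive frames F_1..F_N into fixed-length, possibly overlapping windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [frames[i:i + window] for i in range(0, len(frames) - window + 1, stride)]

def event_spans(window_labels, window, stride):
    """Merge overlapping or adjacent windows with the same label into (label, start, end) frame spans."""
    spans = []
    for idx, label in enumerate(window_labels):
        if label is None:
            continue
        start, end = idx * stride, idx * stride + window - 1
        if spans and spans[-1][0] == label and start <= spans[-1][2] + 1:
            spans[-1] = (label, spans[-1][1], end)   # extend the current event
        else:
            spans.append((label, start, end))        # start a new event
    return spans

frames = list(range(10))  # frame indices 0..9
print(sliding_windows(frames, window=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
print(event_spans([None, "fall", "fall", None], window=4, stride=2))
# [('fall', 2, 7)]
```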

In S205, when it is determined that an abnormal event has actually occurred, the notification generation unit 454 may generate notification data and transmit it to the warning device 300. Notification information based on this data may also be delivered to monitoring personnel through the display 430 or another output unit of the detection device 400 itself.
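The format of the notification data is not specified. As one hypothetical possibility, S205 could serialize the event as JSON before sending it to the warning device 300. The field names and the use of JSON below are assumptions.

```python
import json
from datetime import datetime, timezone

def make_notification(event, source_id, detected_at=None):
    """Serialize notification data for a warning device (field names are hypothetical)."""
    detected_at = detected_at or datetime.now(timezone.utc)
    return json.dumps({
        "event": event,
        "source": source_id,
        "detected_at": detected_at.isoformat(),
    })

print(make_notification("fall", "camera-1", datetime(2020, 11, 4, 9, 30, tzinfo=timezone.utc)))
# {"event": "fall", "source": "camera-1", "detected_at": "2020-11-04T09:30:00+00:00"}
```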

The present invention, configured as described above, uses an end-to-end (e2e) structure that needs no image preprocessing. It can therefore detect abnormal events more quickly and is applicable to real-time environments.

Conventional models trained with a machine learning technique on images alone consider only the images, so their abnormal event detection accuracy may be low. The present invention instead recognizes abnormal events from composite sensor data, which makes it more accurate than the prior art at night and in complex environments. Because it considers human joint information and IoT sensor data in addition to 3D depth images, it can accurately detect abnormal events even in low-light environments such as at night.

In particular, training on data based on 3D depth images updates the network weights of the first filter of the convolution layer. At the same time, the human joint information is used during training to update the network weights of the second filter. As a result, the trained event recognition model achieves further improved abnormal event recognition accuracy. Once trained, the model can produce an abnormal event recognition (classification) result from 3D depth images alone, without human joint information. That is, training requires 3D depth images and their human joint information, but using the trained model requires only 3D depth images. The present invention therefore enables fast and accurate detection of abnormal events during video surveillance in real-time environments. When installed in security-vulnerable environments and linked with a video surveillance system, it can detect abnormal events more accurately and quickly than before. This holds even in low-light environments such as at night and in complex event situations, which helps protect personal safety and property and prevent crime.

Although specific embodiments have been described in the detailed description of the present invention, various modifications are possible without departing from the scope of the present invention. Therefore, the scope of the present invention is not limited to the described embodiments and should be defined by the claims below and their equivalents.

100: 3D depth camera 200: IoT sensor
300: warning device 400: detection device
410: input unit 420: communication unit
430: display unit 440: memory
450: control unit 451: learning unit
452: model analysis unit 453: sensor analysis unit
454: notification generation unit

Claims

1. A method, performed in an electronic device, of detecting an abnormal event for an object in a real-time 3D depth image captured by a 3D depth camera of a video surveillance system, the method comprising:
acquiring a 3D depth image and Internet of Things (IoT) sensor data in real time around a specific place;
recognizing, based on the acquired 3D depth image, whether an abnormal event for the object in the image exists, using an event recognition model trained according to a deep learning technique on training data including input data for 3D depth images and result data on whether an abnormal event occurred or the type of the abnormal event; and
when an abnormal event is recognized, determining whether the abnormal event actually occurred using the IoT sensor data.

2. The method of claim 1,
wherein the event recognition model includes a convolution layer that includes a filter for abnormal event recognition and a filter for human joint recognition.

3. The method of claim 2,
wherein the event recognition model is a model in which the weights of the convolution layer are updated, using human joint information, to reduce an error between human joint positions in a current image frame and human joint positions of the abnormal event.

4. The method of claim 3,
wherein the event recognition model is a model in which the weights of the convolution layer are updated so that a total loss combining a loss for abnormal event classification and a loss for joint recognition is minimized.

5. The method of claim 1,
wherein the IoT sensor data is data on the state of a person in the image, measured by an IoT sensor carried by the person.

6. The method of claim 5,
wherein the IoT sensor is included in a wearable device that is in contact with, attached to, worn on, or inserted into a body part of the person.

7. The method of claim 5,
wherein the IoT sensor is any one of a heart rate sensor, an electrocardiogram sensor, an oxygen sensor, a skin conductance sensor, and a skin temperature sensor.

8. The method of claim 1,
wherein the recognizing comprises dividing continuous 3D depth images acquired in real time into specific time units according to a sliding window and recognizing whether an abnormal event exists.

9. The method of claim 1,
further comprising generating notification data when it is determined that the abnormal event actually occurred.

10. A device for detecting an abnormal event for an object in a real-time 3D depth image captured by a 3D depth camera of a video surveillance system, the device comprising:
a communication unit configured to receive, in real time, a 3D depth image and Internet of Things (IoT) sensor data around a specific place;
a storage unit configured to store an event recognition model previously trained according to a deep learning technique on training data including input data for 3D depth images and result data on whether an abnormal event occurred or the type of the abnormal event; and
a control unit configured to control detection of an abnormal event using the information received by the communication unit and the information stored in the storage unit,
wherein the control unit:
recognizes, using the stored event recognition model and based on the received 3D depth image, whether an abnormal event for the object in the image exists; and
when an abnormal event is recognized, determines whether the abnormal event actually occurred using the received IoT sensor data.

11. A video surveillance system comprising:
a 3D depth camera configured to capture a 3D depth image;
an IoT sensor located around the capturing position of the 3D depth camera and configured to measure Internet of Things (IoT) sensor data; and
a control unit configured to receive the 3D depth image and the sensor data in real time and to control detection of an abnormal event for an object in the real-time 3D depth image,
wherein the control unit:
recognizes, based on the received 3D depth image, whether an abnormal event for the object in the image exists, using an event recognition model trained according to a deep learning technique on training data including input data for 3D depth images and result data on whether an abnormal event occurred or the type of the abnormal event; and
when an abnormal event is recognized, determines whether the abnormal event actually occurred using the received sensor data.