KR101995107B1

KR101995107B1 - Method and system for artificial intelligence based video surveillance using deep learning

Info

Publication number: KR101995107B1
Application number: KR1020170036874A
Authority: KR
Inventors: 노용만; 김성태; 김형일; 장진혁
Original assignee: 한국과학기술원
Priority date: 2017-03-23
Filing date: 2017-03-23
Publication date: 2019-07-01
Also published as: KR20180107930A

Abstract

A method and system for artificial intelligence based video surveillance using deep learning are disclosed. An artificial intelligence based video surveillance system according to an embodiment of the present invention includes: a detection unit that detects at least one preset object in a surveillance camera image using a predefined detection deep learning network; a recognition unit that recognizes information about the detected object based on a predefined recognition deep learning network and the detection result for the object; and a tracking unit that tracks the detected object based on a predefined tracking deep learning network and the detection result for the detected object. The detection unit may detect the object using region proposal extraction that includes a convolutional neural network (CNN) based heat map generation method.

Description

Method and system for artificial intelligence based video surveillance using deep learning

The present invention relates to artificial intelligence based video surveillance technology for smart monitoring and control, and more specifically to an artificial intelligence based video surveillance method and system capable of detecting, recognizing, and tracking objects in input video captured by CCTV, based on artificial intelligence deep learning technology.

CCTV surveillance is increasingly used to prevent crimes and accidents, and its application is expanding into everyday living spaces, for example through surveillance systems that work with smartphones. As the coverage of CCTV surveillance expands, even a single surveillance system produces an enormous amount of video data. It is therefore becoming impossible for a single operator to detect dangers and analyze situations by monitoring such a volume of video. In the current market, smart surveillance functions, the core of fourth-generation CCTV, are being applied as the latest technology.

As smart functions, image recognition techniques such as person detection and vehicle detection are being incorporated into CCTV, but they are limited in analyzing vast amounts of video data. Accordingly, there is a need for a video surveillance method based on advanced image recognition technology that can intelligently analyze vast amounts of CCTV footage with human-like judgment.

When applied in real environments, existing CCTV image recognition technology suffers degraded recognition performance because of weather conditions, ambient lighting, and changes in the appearance of objects in the image, which raises questions about its practicality. Moreover, the image recognition technology built into existing CCTV is basic object detection; it helps manage a surveillance system but has limits. Analyzing vast amounts of video requires not only high-performance object detection and recognition but also intelligent image recognition such as situation recognition and prediction.

Therefore, there is a need for artificial intelligence CCTV video surveillance technology that provides intelligent image recognition, such as situation prediction, together with high-performance object detection and recognition.

Embodiments of the present invention provide an artificial intelligence based video surveillance method and system capable of detecting, recognizing, and tracking objects, for example a person, a face, a vehicle, or a fire, in input video captured by CCTV, based on artificial intelligence deep learning technology.

An artificial intelligence based video surveillance system according to an embodiment of the present invention includes: a detection unit that detects at least one preset object in a surveillance camera image using a predefined detection deep learning network; a recognition unit that recognizes information about the detected object based on a predefined recognition deep learning network and the detection result for the object; and a tracking unit that tracks the detected object based on a predefined tracking deep learning network and the detection result for the detected object.

The detection unit may detect the object using region proposal extraction that includes a convolutional neural network (CNN) based heat map generation method.

The detection unit may detect the object by extracting an object region and adjusting a bounding box based on a multi-tasking learning method that uses attributes of the object.

The detection unit may detect the object by performing region proposal extraction on the surveillance camera image and extracting the features of each region proposal from the extracted region proposals through CNN-based feature extraction.

The tracking unit may track the detected object by measuring the reliability of the previous and current tracking results for the detected object and updating the deep learning model online when the measured reliability is at or below a predetermined level.

Furthermore, the artificial intelligence based video surveillance system according to an embodiment of the present invention may further include a prediction unit that operates in conjunction with the detection unit and the recognition unit and predicts risk factors based on an accident prediction deep learning network trained in advance on accidents and dangers, the detected object, and the recognized object information.

An artificial intelligence based video surveillance method according to an embodiment of the present invention includes: detecting at least one preset object in a surveillance camera image using a predefined detection deep learning network; recognizing information about the detected object based on a predefined recognition deep learning network and the detection result for the object; and tracking the detected object based on a predefined tracking deep learning network and the detection result for the detected object.

The detecting step may detect the object using region proposal extraction that includes a convolutional neural network (CNN) based heat map generation method.

The detecting step may detect the object by extracting an object region and adjusting a bounding box based on a multi-tasking learning method that uses attributes of the object.

The detecting step may detect the object by performing region proposal extraction on the surveillance camera image and extracting the features of each region proposal from the extracted region proposals through CNN-based feature extraction.

The tracking step may track the detected object by measuring the reliability of the previous and current tracking results for the detected object and updating the deep learning model online when the measured reliability is at or below a predetermined level.

Furthermore, the artificial intelligence based video surveillance method according to an embodiment of the present invention may further include predicting risk factors based on an accident prediction deep learning network trained in advance on accidents and dangers, the detected object, and the recognized object information.

According to embodiments of the present invention, objects, for example a person, a face, a vehicle, or a fire, can be detected, recognized, and tracked in input video captured by CCTV, based on artificial intelligence deep learning technology.

According to embodiments of the present invention, high-performance video surveillance is possible, and by implementing high-level vision technology through a predictive deep network, core intelligent video surveillance technology can be secured in the surveillance system market and national industrial competitiveness can be raised.

In addition, implementing artificial intelligence prediction with deep learning, which was not possible with existing fourth-generation smart CCTV, can increase the technological impact.

Embodiments of the present invention can be applied to various surveillance camera environments such as CCTV, drone cameras, and body cameras.

FIG. 1 is a conceptual diagram of an artificial intelligence based video surveillance system according to an embodiment of the present invention.
FIG. 2 shows an example illustrating region proposal extraction through CNN-based heat map generation.
FIG. 3 shows an example of an artificial intelligence based person detection framework.
FIG. 4 shows an example of an artificial intelligence based face detection framework.
FIG. 5 is a conceptual diagram of an embodiment of an artificial intelligence based surveillance system that includes accident prediction.
FIG. 6 shows the configuration of an embodiment of an object recognition and accident prediction deep learning network.
FIG. 7 shows an example of training the accident prediction deep learning network.
FIG. 8 is an operation flowchart of an artificial intelligence based video surveillance method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not limited or restricted by the embodiments. The same reference numerals in the drawings denote the same elements.

Embodiments of the present invention relate to artificial intelligence based video surveillance technology for smart monitoring and control; their gist is to detect, recognize, and track objects in input video captured by CCTV based on artificial intelligence deep learning technology.

Here, the present invention may implement core video surveillance techniques such as object detection, recognition, tracking, and prediction as deep learning based deep networks.

FIG. 1 is a conceptual diagram of an artificial intelligence based video surveillance system according to an embodiment of the present invention.

As shown in FIG. 1, the artificial intelligence based video surveillance system according to an embodiment of the present invention includes a detection unit (DetectionNet), a recognition unit (RecognitionNet), and a tracking unit (TrackingNet).

The detection unit is a component that detects objects based on deep learning; it uses deep networks to perform person detection, face detection, vehicle detection, license plate detection, and the like.

Here, the detection unit may perform automatic object detection on the input video at regular time intervals and apply tracking with a deep learning network to the detected objects, which enables real-time processing; the deep networks may be designed and trained using a method that minimizes the region searched for objects.

The detection unit uses the detection deep network to automatically detect objects in surveillance camera images that contain identifiable objects.

Here, the automatically detected objects may include people, faces, vehicles, license plates, and the like.

Specifically, the detection unit detects person regions using a deep learning based network and then detects face regions within the detected person regions using a face detection deep network. Likewise, the detection unit detects vehicles in the image using a vehicle detection deep network and detects license plate regions within the detected vehicle images.

The detection unit may output the position information, size information, and detection confidence value of each detected object.
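The cascade described above (person, then face inside the person region) and the per-object outputs can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Detection` record, `detect_within`, and `stub_face_detector` are hypothetical names, and the stub stands in for a real face detection network.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Detection:
    label: str
    x: int             # left edge, in pixels
    y: int             # top edge, in pixels
    width: int
    height: int
    confidence: float  # detection confidence value


# A detector receives the search window (x, y, width, height) and returns
# detections in coordinates relative to that window.
Detector = Callable[[int, int, int, int], List[Detection]]


def detect_within(parent: Detection, child_detector: Detector) -> List[Detection]:
    """Run child_detector only inside the parent's box, then map the
    results back to full-frame coordinates."""
    children = child_detector(parent.x, parent.y, parent.width, parent.height)
    return [
        Detection(c.label, parent.x + c.x, parent.y + c.y,
                  c.width, c.height, c.confidence)
        for c in children
    ]


def stub_face_detector(x: int, y: int, w: int, h: int) -> List[Detection]:
    # Hypothetical stand-in for a face network: always reports one face
    # in the upper-middle part of the search window.
    return [Detection("face", w // 4, 0, w // 2, h // 4, 0.9)]


person = Detection("person", 100, 50, 40, 120, 0.95)
faces = detect_within(person, stub_face_detector)
print(faces[0])
```

The same helper would apply to the vehicle and license plate pair by swapping in a plate detector.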

Furthermore, because the size of an object is not known in advance, the detection unit may check for the presence of objects by searching for object regions at each of a set of preset scales in the image.
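One simple way to search at several preset scales is to enumerate square windows of different sizes over the image. The sketch below assumes a sliding-window search with a stride of half the window size; the patent does not specify the search strategy, and `windows_at_scales` is a hypothetical helper.

```python
from typing import Iterator, List, Tuple


def windows_at_scales(img_w: int, img_h: int, base: int,
                      scales: List[float],
                      stride_frac: float = 0.5) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, size) for square search windows at each preset scale.
    Scales whose window would not fit inside the image are skipped."""
    for scale in scales:
        size = int(round(base * scale))
        if size > img_w or size > img_h:
            continue
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size)


# An 8x8 image searched with a 4-pixel base window at three preset scales.
windows = list(windows_at_scales(8, 8, base=4, scales=[1.0, 2.0, 4.0]))
print(len(windows), windows[0], windows[-1])
```

Each window would then be passed to the detection network, which decides whether an object is present at that position and scale.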

In addition, the detection unit may use a region-of-interest based fast person detection network to automatically detect people in the input image and quickly extract person-related information such as position, posture, and direction of movement.

For person detection in real environments, detection speed is as important as detection accuracy. For example, in an autonomous vehicle, pedestrian detection speed is a key factor in whether an accident occurs, and in environments such as surveillance systems it is necessary not merely to detect people but also to provide various attributes of each detected person.

To this end, the detection unit may use region proposal extraction, which extracts candidate regions quickly, as shown in FIG. 2. As one example of region proposal extraction, a convolutional neural network (CNN) based heat map generation method may be used.

Here, as shown in FIG. 3, the CNN-based heat map generation method generates, through a CNN-based network, a heat map, that is, a two-dimensional map of probability values for object candidate regions, and uses it to extract region proposals.

The region proposal network in FIG. 2 consists of a total of five convolution layers, shown in yellow; the first and second convolution layers may each be followed by a max pooling layer, shown in green, and a normalization layer, shown in blue.

A probability distribution map is obtained from the fifth convolution layer, and regions with high probability values are extracted from it to finally obtain the region proposals shown at the far right. In the probability distribution map of FIG. 2, blue indicates a probability value of 0, and colors closer to red indicate higher probability values approaching 1.
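The step from a probability map to region proposals can be illustrated with thresholding followed by connected-component grouping. This is a sketch under assumptions: the threshold of 0.5, the 4-connectivity, and the function name `proposals_from_heatmap` are illustrative choices, not details given in the patent.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int, float]  # (x, y, width, height, peak probability)


def proposals_from_heatmap(heatmap: List[List[float]],
                           threshold: float = 0.5) -> List[Box]:
    """Group cells whose probability is at least `threshold` into
    4-connected components and return one bounding box per component,
    ordered by peak probability (highest first)."""
    rows, cols = len(heatmap), len(heatmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heatmap[r][c] < threshold:
                continue
            # Breadth-first search over the connected high-probability cells.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            peak = heatmap[r][c]
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                peak = max(peak, heatmap[cr][cc])
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and heatmap[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((left, top, right - left + 1, bottom - top + 1, peak))
    return sorted(boxes, key=lambda b: b[4], reverse=True)


heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.9, 0.0, 0.0],
    [0.0, 0.7, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6],
]
proposals = proposals_from_heatmap(heatmap)
print(proposals)
```

In a real system the box coordinates would be scaled from heat map cells back to image pixels according to the network's stride.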

In addition, to efficiently solve the problem of classifying whether a detected candidate region is actually a person, the detection unit may use a multi-tasking learning method that exploits attributes of the person, for example pedestrian size, pose, hat, and glasses.

Here, the multi-tasking learning method is the stage in which features are extracted from the extracted region proposals through a CNN-based network (CNN-based feature extraction) and final detection is performed. The CNN-based network is trained to jointly satisfy several tasks related to person detection, for example pedestrian size and pedestrian pose, and can thereby learn more effective features. That is, if the objective function for conventional person detection is as in Equation 1 below, the network may be trained to satisfy the objective function of Equation 2, which extends Equation 1 to T tasks.

[Equation 1]

[Equation 2]

Here, N denotes the number of training samples, and the remaining symbols in the equations denote, in order, the feature vector of the i-th training sample, the label value of the t-th task for the i-th training sample, and the loss function of the t-th task.
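The bodies of Equations 1 and 2 and their symbols do not survive in this text. One plausible form consistent with the definitions above is sketched below as an assumption, not as the patent's actual equations. The notation is introduced here for illustration only: $x_i$ for the feature vector, $y_i$ and $y_i^t$ for the labels, $\ell$ and $\ell_t$ for the loss functions, and $f(\cdot; W)$ for the network with parameters $W$.

```latex
% Assumed single-task objective (a possible reading of Equation 1):
\min_{W} \sum_{i=1}^{N} \ell\bigl(y_i,\, f(x_i; W)\bigr)

% Assumed extension to T tasks (a possible reading of Equation 2):
\min_{W} \sum_{t=1}^{T} \sum_{i=1}^{N} \ell_t\bigl(y_i^{t},\, f_t(x_i; W)\bigr)
```

Under this reading, the tasks share the feature extraction parameters, so minimizing the summed losses pushes the shared features to be useful for person detection and for the auxiliary attribute tasks at the same time.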

That is, as shown in FIG. 3, the artificial intelligence based person detection framework performs region proposal extraction through the network shown in FIG. 2 and performs final detection by extracting features, for example the features of each region proposal, from the extracted region proposals through CNN-based feature extraction.

Here, training the CNN used for feature extraction with the multi-tasking learning method, as shown in FIG. 3, can improve feature extraction performance.

The face detection module of the detection unit takes as input an image containing the person region obtained by the person detection module, detects the face region using a face detection deep network, and outputs position information, size information, and a detection confidence value for the detected face region.

Because the face detection module performs face detection only on the person region obtained by the person detection module, the reduced search region enables fast and efficient face detection.

Like the person detection network, the face detection module may extract face candidate regions by extracting a heat map, as shown in FIG. 4; to achieve high-performance detection of faces that appear in the image at various sizes, it extracts face maps at multiple scales.

Heat map extraction at a single scale may be performed by a network similar to that of FIG. 2.

The deep network that makes the final decision on whether a face candidate region is a face uses a multi-task learning method so that the bounding box marking the detected region is positioned more accurately.

That is, as shown in FIG. 4, the face detection module extracts a multi-scale heat map based on facial component information from the person region extracted by the person detection module, extracts face candidate regions from the multi-scale heat map, and provides the face detection result by extracting the final face region and adjusting the bounding box through the multi-task learning method applied to the face candidate regions.

The vehicle detection module and the license plate detection module of the detection unit may, in a manner similar to the person and face detection modules described above, detect vehicle regions in the input image through a deep learning network and detect license plate regions within the detected vehicle regions.

The recognition unit applies a deep learning network to the object detection results from the detection unit to recognize who is present, for example through a person's identification information (ID), and automatically recognizes the number on a vehicle's license plate.

Before performing recognition, the recognition unit may measure image quality so that recognition is performed with high reliability on images of good quality; image quality may be evaluated based on resolution, brightness, pose information, object alignment, and blur information.
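A quality gate of this kind can be sketched with three of the listed cues: resolution, brightness, and blur. The sketch below is an assumption-laden illustration: blur is approximated by the variance of a 4-neighbour Laplacian (a common sharpness heuristic, not a measure named in the patent), and all thresholds and function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of 0..255 values


def brightness(gray: Gray) -> float:
    """Mean pixel intensity."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def sharpness(gray: Gray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.
    Low values suggest a blurred or featureless image."""
    rows, cols = len(gray), len(gray[0])
    responses = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1]
        - 4 * gray[r][c]
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def good_enough(gray: Gray, min_side: int = 4,
                min_brightness: float = 40.0, max_brightness: float = 220.0,
                min_sharpness: float = 100.0) -> bool:
    """Return True if the crop passes the resolution, brightness,
    and sharpness checks (all thresholds are illustrative)."""
    rows, cols = len(gray), len(gray[0])
    return (min(rows, cols) >= min_side
            and min_brightness <= brightness(gray) <= max_brightness
            and sharpness(gray) >= min_sharpness)


flat = [[128] * 6 for _ in range(6)]  # no detail at all
checker = [[255 if (r + c) % 2 == 0 else 0 for c in range(6)] for r in range(6)]
print(good_enough(flat), good_enough(checker))
```

Pose and alignment cues would need additional models (for example a landmark detector) and are left out of this sketch.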

Based on the measured image quality, the recognition unit performs recognition with a deep learning architecture on objects that can be recognized; for a face it outputs whether the face is registered in the database, the person's ID, and a recognition confidence value, and for a license plate it outputs the recognized plate number.

Like the detection network, the recognition unit may perform automatic object recognition at regular intervals for real-time processing and may operate in conjunction with the tracking network; for high-performance recognition it may use a deep learning architecture based on multiple kinds of information available in the image, for example multiple features and color information.

The tracking unit uses the tracking network to track detected objects, or to track an object initialized by the user in the image.

By tracking an object, the tracking unit may output the object's position information, size information, and motion information; for highly reliable tracking, it may measure the reliability of the tracking result to minimize drift of the tracked object.

In addition, the tracking unit may use a relatively shallow deep learning architecture for real-time object tracking, and may use a deep learning architecture capable of online learning for highly reliable tracking.

Regarding online learning in the object tracking deep network: if tracking relies only on a previously trained model, the tracking result drifts when illumination changes, pose changes, or occlusion occur. The reliability of tracking is therefore measured by comparing the previous tracking result with the current one, and when the reliability falls to or below a certain level, the deep learning model may be updated online.
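The update trigger can be sketched as a comparison between consecutive tracking results. The patent does not say how the comparison is made; the sketch below assumes intersection-over-union (IoU) of the previous and current boxes as the reliability measure, and the 0.5 threshold and function names are hypothetical. An appearance-based similarity could be substituted in the same place.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def needs_online_update(prev_box: Box, curr_box: Box,
                        min_reliability: float = 0.5) -> bool:
    """Trigger an online model update when the reliability, measured here
    as the overlap between consecutive results, is at or below the threshold."""
    return iou(prev_box, curr_box) <= min_reliability


print(needs_online_update((0, 0, 10, 10), (1, 0, 10, 10)))  # small shift
print(needs_online_update((0, 0, 10, 10), (5, 0, 10, 10)))  # large jump
```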

Because updating all layers requires a large amount of computation time, the tracking unit may update the model by fine-tuning the last fully-connected layer of the CNN.

To update the model online, the tracking unit takes the sample at the current object position as a positive, takes the regions surrounding the positive as negatives, and then performs training; it accumulates positive and negative sample information over a certain number of frames until the next online update and uses it for online learning. In this way, the tracking unit can prevent the tracking result from drifting because of phenomena such as occlusion.
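The sample bookkeeping described above can be sketched with a bounded buffer. This is an illustration under assumptions: negatives are taken as four same-size boxes shifted by one box width or height around the positive, and `sample_boxes` and `SampleBuffer` are hypothetical names. The actual sampling pattern is not specified in the patent.

```python
from collections import deque
from typing import Deque, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sample_boxes(box: Box, shift_frac: float = 1.0) -> Tuple[Box, List[Box]]:
    """Return (positive, negatives): the tracked box itself, plus four
    same-size boxes shifted left, right, up, and down around it.
    Negatives may extend past the frame; a caller would clip or drop them."""
    x, y, w, h = box
    dx, dy = int(w * shift_frac), int(h * shift_frac)
    negatives = [(x - dx, y, w, h), (x + dx, y, w, h),
                 (x, y - dy, w, h), (x, y + dy, w, h)]
    return box, negatives


class SampleBuffer:
    """Accumulates samples over the most recent `max_frames` frames."""

    def __init__(self, max_frames: int) -> None:
        self.frames: Deque[Tuple[Box, List[Box]]] = deque(maxlen=max_frames)

    def add(self, box: Box) -> None:
        self.frames.append(sample_boxes(box))

    def training_set(self) -> Tuple[List[Box], List[Box]]:
        positives = [p for p, _ in self.frames]
        negatives = [n for _, ns in self.frames for n in ns]
        return positives, negatives


buffer = SampleBuffer(max_frames=3)
for tracked in [(10, 10, 4, 8), (11, 10, 4, 8), (12, 11, 4, 8), (13, 11, 4, 8)]:
    buffer.add(tracked)
positives, negatives = buffer.training_set()
print(len(positives), len(negatives))
```

When an update is triggered, the image patches at these boxes would be fed to the fine-tuning step for the last fully-connected layer.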

As described above, the artificial intelligence based video surveillance system according to an embodiment of the present invention builds the components for object detection, recognition, and tracking from deep learning based networks, thereby implementing high-level vision technology through a predictive deep network and securing core intelligent video surveillance technology for surveillance systems.

As one example of an operating scenario of the artificial intelligence based video surveillance system according to the present invention, when a suspicious person appears in footage captured by a CCTV surveillance camera, the detection unit (DetectionNet), an object detection network, detects the person and additionally detects the position of the face.

If the recognition network were run continuously on the detected person, processing would take a long time and low-quality images would degrade performance; to address this, the system may operate as follows.

The tracking unit (TrackingNet), a tracking network, continuously follows the path along which the detected person moves.

When the quality of the tracked face is good, for example when a moment without occlusion and with adequate lighting is captured, the recognition unit (RecognitionNet), a recognition network, performs recognition on the face and can determine the person's identity.

Based on the recognition result, the system determines whether the person is a pre-registered safe person or a suspicious person present in a criminal database, and can report to the surveillance headquarters when the person is judged to be suspicious.
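The scenario above (track continuously, recognize only when the face quality is good, report only suspicious identities) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Track` structure, the `process_frame` function, the scalar `face_quality` score, and the `quality_threshold` value are all hypothetical, and the recognition network is represented by a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Track:
    """State kept for one tracked person (hypothetical structure)."""
    track_id: int
    identity: Optional[str] = None  # filled in once recognition succeeds
    reported: bool = False


def process_frame(track: Track,
                  face_quality: float,
                  recognize: Callable[[], str],
                  watchlist: Set[str],
                  quality_threshold: float = 0.7) -> Optional[str]:
    """Handle one frame of an already tracked person.

    Recognition runs only while the identity is unknown and the face
    quality is at or above the (assumed) threshold. A report string is
    returned the first time a watch-listed identity is seen.
    """
    if track.identity is None and face_quality >= quality_threshold:
        track.identity = recognize()  # costly step, so it is gated
    if track.identity in watchlist and not track.reported:
        track.reported = True
        return f"ALERT track {track.track_id}: {track.identity}"
    return None
```

Gating recognition on quality means the expensive network runs at most once per track in this sketch, while tracking carries the identity forward in later frames.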

In addition, the artificial intelligence based surveillance system using deep learning according to the present invention may have an overall structure such as the example shown in FIG. 5. That is, for detected objects, the system can link the object recognition deep networks (deep net for object detection, deep net for object recognition) with an accident prediction deep learning network (deep net for accident prediction), and can be used for risk and accident prediction, crime prediction, traffic accident prediction, and the like. Beyond detecting and recognizing objects that appear in the video, a deep network trained on safety and accident data with artificial intelligence technology can predict risks and accidents, and the predictions can be visualized and used as a surveillance system.

As shown in FIG. 6, the accident prediction deep learning network uses information obtained through object detection, object recognition, and situation recognition to predict risks, accidents, and crimes that will occur in the future or within a certain subsequent time range. For risks and accidents, risk factors can be predicted through CCTV observation from incidents (for example, fire or explosion), large crowds, abnormal behavior, or information about criminals (for example, a criminal's face or a vehicle used in a crime).

The present invention uses deep learning to predict future crimes and accidents from the current number and flow of vehicles, crowd information, and behavior recognition.

Here, the system according to the present invention can extract correlation information across spatio-temporal changes in the surveillance video and perform prediction through a deep learning network from multi-object and camera information data.

In the present invention, to train the accident prediction deep network, as shown in FIG. 7, information extracted from CCTV (for example, criminals, crowds, vehicles, and situation information) is used together with government ministry data related to safety and accidents, so that a deep learning structure for predicting crimes, traffic accidents, risks, and accidents can be trained.

Here, the prediction network may use RNN, DAE, or DNN based data representation methods, as well as a structure for multi-modal deep learning network fusion.
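To make the RNN option concrete, the sketch below runs a small Elman-style recurrent network over a sequence of per-frame feature vectors and maps the final hidden state to a score between 0 and 1. It is only an illustration under stated assumptions: the function name `rnn_risk_score`, the choice of features (for example vehicle count and crowd density), and the weight matrices are hypothetical, and the source does not specify the network's size, inputs, or training procedure.

```python
import math


def rnn_risk_score(sequence, w_in, w_rec, w_out, bias_out=0.0):
    """Elman-style RNN forward pass (illustrative, untrained weights).

    h_t = tanh(W_in x_t + W_rec h_{t-1}), starting from h_0 = 0.
    The score is sigmoid(w_out . h_T + bias_out).

    sequence: list of feature vectors, one per time step
    w_in:     hidden_size x input_size weight matrix
    w_rec:    hidden_size x hidden_size weight matrix
    w_out:    hidden_size output weight vector
    """
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + bias_out
    return 1.0 / (1.0 + math.exp(-logit))
```

In a real system the weights would come from training on the CCTV-derived and government data described above; here they are placeholders supplied by the caller.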

FIG. 8 is an operational flowchart of an artificial intelligence based video surveillance method according to an embodiment of the present invention, showing the flow of operations in the system of FIGS. 1 to 7 described above.

Referring to FIG. 8, in the artificial intelligence based video surveillance method according to an embodiment of the present invention, the detection unit detects at least one preset object in a surveillance camera image using a predefined detection deep learning network (S810).

Here, step S810 may use a deep network to perform person detection, face detection, vehicle detection, license plate detection, and the like. It may detect objects in the surveillance camera image using region proposal extraction, which extracts candidate regions quickly, for example a heat map generation method based on a convolutional neural network (CNN).

Here, step S810 may detect the object by extracting the object region and adjusting the bounding box based on a multi-tasking learning method that uses the object's attributes. Specifically, step S810 may detect the object by performing region proposal extraction on the surveillance camera image and then extracting the features of each extracted region proposal through CNN-based feature extraction.
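The description does not say how a CNN heat map is turned into region proposals. One simple possibility, shown here purely as an assumption, is to threshold the heat map and take the bounding box of each connected hot region. The function name `proposals_from_heat_map`, the threshold value, and the 4-connectivity choice are hypothetical, and the heat map is assumed to be a non-empty 2D list of scores.

```python
def proposals_from_heat_map(heat_map, threshold=0.5):
    """Return (top, left, bottom, right) boxes of connected regions
    whose heat values are at or above the threshold.

    Boxes use inclusive cell indices and are listed in row-major
    order of the first cell found in each region.
    """
    rows, cols = len(heat_map), len(heat_map[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heat_map[r][c] < threshold:
                continue
            # Flood fill with 4-connectivity to collect one hot region.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and heat_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box would then be passed to the CNN-based feature extraction and bounding-box adjustment stages described above.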

The recognition unit recognizes information about the detected object based on a predefined recognition deep learning network and the detection result for the object (S820).

Here, step S820 may measure image quality before recognition so that highly reliable recognition is performed on images of good quality. The evaluated image quality may include at least one of resolution, brightness, pose information, object alignment, and blur information.
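As a rough sketch of such a quality check, the code below tests three of the listed criteria (resolution, brightness, and blur) on a grayscale crop given as a 2D list of 0 to 255 values. The function names, the thresholds, and the use of Laplacian variance as a sharpness measure are assumptions chosen for illustration; pose and alignment checks are not covered.

```python
def mean_brightness(gray):
    """Average pixel value of a grayscale image (2D list)."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values suggest a blurry image. Returns 0.0 when the image
    is too small to have interior pixels.
    """
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(gray[y - 1][x] + gray[y + 1][x]
                             + gray[y][x - 1] + gray[y][x + 1]
                             - 4 * gray[y][x])
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def is_recognizable(gray, min_size=32, brightness_range=(40, 220),
                    min_sharpness=100.0):
    """Return True only if the crop passes all three assumed checks."""
    height, width = len(gray), len(gray[0])
    if min(height, width) < min_size:  # resolution check
        return False
    low, high = brightness_range
    if not low <= mean_brightness(gray) <= high:  # brightness check
        return False
    return laplacian_variance(gray) >= min_sharpness  # blur check
```

A crop that fails any check would simply be skipped, leaving recognition for a later, better frame of the same tracked object.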

The tracking unit tracks the detected object based on a predefined tracking deep learning network and the detection result for the detected object (S830).

Here, step S830 may measure the reliability of the previous and current tracking results for the detected object and, when the measured reliability is at or below a certain level, update the deep learning model online to track the detected object. This enables highly reliable tracking.

Further, to update the model online, step S830 sets the sample at the current object position as positive and the region surrounding the positive as negative, and then performs training. It accumulates positive and negative sample information over a certain number of frames until the next online update and uses it for online learning, which can prevent the tracking result from drifting due to phenomena such as occlusion.
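The bookkeeping behind this update rule can be sketched as follows. The class name `OnlineUpdater`, the box format `(x, y, w, h)`, the way negatives are sampled (same-size boxes shifted by one box width or height), and the decision to skip sample collection on low-confidence frames are all assumptions; the actual training step is left as a placeholder because the source does not specify it.

```python
from collections import deque


class OnlineUpdater:
    """Accumulates per-frame samples and triggers an online update
    when tracking confidence drops (illustrative sketch)."""

    def __init__(self, confidence_threshold=0.5, buffer_frames=10):
        self.confidence_threshold = confidence_threshold
        # Sliding window: one (positive, negatives) entry per frame.
        self.buffer = deque(maxlen=buffer_frames)
        self.update_count = 0

    def observe(self, box, confidence):
        """Process one frame; return True if an update was triggered."""
        if confidence <= self.confidence_threshold:
            # Unreliable frame (possibly occluded): do not learn from it,
            # retrain on the samples accumulated from reliable frames.
            self.fine_tune(list(self.buffer))
            return True
        x, y, w, h = box
        negatives = [(x + dx * w, y + dy * h, w, h)
                     for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1))]
        self.buffer.append((box, negatives))
        return False

    def fine_tune(self, frames):
        """Placeholder for a training step on the tracking model."""
        self.update_count += 1
```

Training on samples gathered over several reliable frames, rather than on the single current frame, is what keeps a briefly occluded target from pulling the model toward the occluder.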

The prediction unit predicts risk factors based on an accident prediction deep learning network trained in advance on accidents and risks, together with the detected object and the recognized object information (S840).

That is, for detected objects, step S840 may link the object recognition deep networks (deep net for object detection, deep net for object recognition) with the accident prediction deep learning network (deep net for accident prediction) to perform accident and risk prediction such as risk and accident prediction, crime prediction, and traffic accident prediction.

Here, step S840 may predict risk factors through CCTV observation from incidents (for example, fire or explosion), large crowds, abnormal behavior, or information about criminals (for example, a criminal's face or a vehicle used in a crime).

Even where not described for the method of FIG. 8, it is obvious to those skilled in the art that the artificial intelligence based video surveillance method of FIG. 8 may include everything described above for the system of FIGS. 1 to 7.

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the systems, apparatuses, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will recognize that it may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

Claims

An artificial intelligence based video surveillance system comprising:
a detection unit that detects at least one preset object in a surveillance camera image using a predefined detection deep learning network;
a recognition unit that recognizes information about the detected object based on a predefined recognition deep learning network and the detection result for the object; and
a tracking unit that tracks the detected object based on a predefined tracking deep learning network and the detection result for the detected object,
wherein the tracking unit tracks the detected object by measuring the reliability of a previous tracking result and a current tracking result for the detected object and updating a deep learning model online when the measured reliability is at or below a certain level, and
wherein the tracking unit updates the deep learning model by setting a sample corresponding to the position of the detected object as positive and the region surrounding the positive as negative and then performing training, and accumulates positive sample information and negative sample information over a certain number of frames until the next online update.

The system according to claim 1,
wherein the detection unit detects the object using region proposal extraction that includes a convolutional neural network (CNN) based heat map generation method.

The system according to claim 1,
wherein the detection unit detects the object by extracting an object region and adjusting a bounding box based on a multi-tasking learning method that uses attributes of the object.

The system according to claim 1,
wherein the detection unit detects the object by performing region proposal extraction on the surveillance camera image and extracting features of each of the extracted region proposals through convolutional neural network based feature extraction.

delete

The system according to claim 1,
further comprising a prediction unit that, in conjunction with the detection unit and the recognition unit, predicts a risk factor based on an accident prediction deep learning network, the detected object, and the recognized object information.

An artificial intelligence based video surveillance method comprising:
detecting, in a detection unit, at least one preset object in a surveillance camera image using a predefined detection deep learning network;
recognizing, in a recognition unit, information about the detected object based on a predefined recognition deep learning network and the detection result for the object; and
tracking, in a tracking unit, the detected object based on a predefined tracking deep learning network and the detection result for the detected object,
wherein the tracking step tracks the detected object by measuring the reliability of a previous tracking result and a current tracking result for the detected object and updating a deep learning model online when the measured reliability is at or below a certain level, and
wherein the tracking step updates the deep learning model by setting a sample corresponding to the position of the detected object as positive and the region surrounding the positive as negative and then performing training, and accumulates positive sample information and negative sample information over a certain number of frames until the next online update.

The method according to claim 7,
wherein the detecting step detects the object using region proposal extraction that includes a convolutional neural network (CNN) based heat map generation method.

The method according to claim 7,
wherein the detecting step detects the object by extracting an object region and adjusting a bounding box based on a multi-tasking learning method that uses attributes of the object.

The method according to claim 7,
wherein the detecting step detects the object by performing region proposal extraction on the surveillance camera image and extracting features of each of the extracted region proposals through convolutional neural network based feature extraction.

delete

The method according to claim 7,
further comprising predicting a risk factor based on the detected object and the recognized object information.