KR20200019280A

KR20200019280A - The visual detecting system and visual detecting method for operating by the same

Info

Publication number: KR20200019280A
Application number: KR1020180091880A
Authority: KR
Inventors: 이준환; 김병준
Original assignee: 전북대학교산학협력단
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2020-02-24
Also published as: KR102104548B1; WO2020032506A1

Abstract

According to one embodiment of the present invention, provided is a vision detecting system which comprises: a feature extraction unit for generating a feature map of an image using a neural network; a spatial vision detecting unit for estimating a location and a name of an object in the image using the generated feature map; a temporal vision detecting unit for detecting the object in the image in accordance with a time sequence based on the generated feature map; and a convergence determination unit for determining the result of detecting the object in the image based on the result estimated from the spatial vision detecting unit and the result detected from the temporal vision detecting unit.

Description

VISUAL DETECTING SYSTEM AND VISUAL DETECTING METHOD FOR OPERATING BY THE SAME

The present invention relates to a visual detection system and a visual detection method using the same. More particularly, to raise the object recognition rate of deep learning object detection models (e.g., R-CNN, YOLO, SSD), it relates to a visual detection system, and a visual detection method using the same, that fuses spatial visual detection results with temporal visual detection results obtained by applying a deep learning model (e.g., MLP, LSTM, GRU) to temporal information, thereby reducing misrecognition in images and improving accuracy.

Deep learning is defined as a set of algorithms that attempt a high level of abstraction (summarizing the key content or functions in a large amount of data) through a combination of several nonlinear transformation techniques. It is applied in fields such as image-based object detection, recognition, and classification, as well as speech recognition and natural language processing. Object detection in particular has long been studied in computer vision, and as deep learning and machine learning methods have recently drawn attention, a wide range of research has pushed image recognition and detection performance toward the level of human vision. Nevertheless, despite the development of deep learning techniques such as convolutional neural networks, misrecognition still occurs, and techniques to reduce it are urgently needed.

FIG. 1 is an exemplary view showing objects in an image being recognized with a conventional visual detection method. As in FIG. 1(a), there are cases in which only one pedestrian is recognized even though two are present, and as in FIG. 1(b), when a vehicle license plate is to be detected, a region that is not a license plate may be wrongly recognized as one. In addition, as in FIG. 1(c), a fire that has broken out near the trees on the left side of the road may go entirely undetected while a different region is recognized instead.

Moreover, conventional visual detection methods have difficulty accounting for varied environments (e.g., weather, location), and they use only the spatial feature information within a single image; techniques for recognizing objects through analysis over time have not been developed.

1. Korean Registered Patent Publication No. 10-1415001, "Object detection and tracking apparatus and method, and intelligent surveillance system using the same" (registration date: 2013.01.31)

The present invention has been devised to solve the problems described above, and its object is to improve the accuracy of object recognition by performing spatial visual detection and temporal visual detection simultaneously.

A further object is to reduce the amount of computation and the execution time required for feature extraction by configuring spatial visual detection and temporal visual detection to share the feature extraction step.

The technical objects of the present invention are not limited to those mentioned above, and other technical problems not mentioned may be considered by a person of ordinary skill in the art to which the present invention pertains from the embodiments described below.

As an embodiment of the present invention, a visual detection system for detecting objects in an input image may be provided.

A visual detection system according to an embodiment of the present invention may include: a feature extraction unit in which a feature map of an image is generated using a neural network; a spatial visual detection unit for estimating the location and name of an object in the image using the generated feature map; a temporal visual detection unit for detecting the object in the image in time order based on the generated feature map; and a fusion determination unit that determines the object detection result for the image based on the result estimated by the spatial visual detection unit and the result detected by the temporal visual detection unit.

In the visual detection system according to an embodiment of the present invention, the neural network of the feature extraction unit may include a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

The spatial visual detection unit of the visual detection system according to an embodiment of the present invention may include a fully connected layer composed of at least one hidden layer.

The visual detection system according to an embodiment of the present invention may further include a storage unit in which the feature map generated by the feature extraction unit is stored. The temporal visual detection unit detects the object in the image over time using a recurrent neural network, and the recurrent neural network may be trained on the feature maps stored in the storage unit.

In the visual detection system according to an embodiment of the present invention, the neural network may be trained on input images to which flip, scaling, and rotation processing has been applied.

As an embodiment of the present invention, a visual detection method using the visual detection system may be provided.

A visual detection method according to an embodiment of the present invention may include: a step in which an image is input; a step in which a feature map of the image is generated using a neural network; a step in which the location and name of an object in the image are estimated using the feature map; a step in which the object in the image is detected in time order based on the feature map; and a step in which the object detection result for the image is determined based on the estimation result of the estimating step and the detection result of the detecting step.

In the visual detection method according to an embodiment of the present invention, the step in which the location and name of the object are estimated using the feature map and the step in which the object is detected in time order based on the feature map may be performed simultaneously.

In the visual detection method according to an embodiment of the present invention, the neural network may include a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

In the visual detection method according to an embodiment of the present invention, the step in which the location and name of the object are estimated using the feature map may be performed using a fully connected layer composed of at least one hidden layer.

The visual detection method according to an embodiment of the present invention may further include a step in which the feature map generated in the feature map generation step is stored in a storage unit of the visual detection system. The step in which the object is detected in time order based on the feature map is performed using a recurrent neural network, and the recurrent neural network may be trained on the feature maps stored in this way.

In the visual detection method according to an embodiment of the present invention, the neural network may be trained on input images to which flip, scaling, and rotation processing has been applied.

Meanwhile, as an embodiment of the present invention, a computer-readable recording medium on which a program for implementing the above method is recorded may be provided.

According to the present invention, performing spatial visual detection and temporal visual detection simultaneously can reduce misrecognition when recognizing objects in an image.

In addition, because spatial visual detection and temporal visual detection are configured to share the feature extraction step, the amount of computation and the execution time required for feature extraction can be reduced.

The effects obtainable from the embodiments of the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly derived and understood by a person of ordinary skill in the art to which the present invention pertains from the following description of the embodiments. That is, unintended effects of practicing the present invention may also be derived from the embodiments by a person of ordinary skill in the art.

FIG. 1 is an exemplary view showing objects in an image being recognized with a conventional visual detection method.
FIG. 2 is a block diagram of a visual detection system according to an embodiment of the present invention.
FIGS. 3 and 4 are exemplary views of a visual detection system according to an embodiment of the present invention.
FIG. 5 is an exemplary view showing fire detection results obtained with a visual detection system according to an embodiment of the present invention.
FIGS. 6A to 6C are exemplary views showing training data for a visual detection system according to an embodiment of the present invention.
FIGS. 7A and 7B are exemplary views showing forest fire detection results obtained with a visual detection system according to an embodiment of the present invention.
FIGS. 8 to 10 are flowcharts of a visual detection method using a visual detection system according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted so that the invention can be explained clearly, and similar parts are given similar reference numerals throughout the specification.

The terms used in this specification are explained briefly, and the present invention is then described in detail.

The terms used in the present invention have been chosen, as far as possible, from general terms in wide current use while taking the functions of the invention into account, but they may vary with the intent of practitioners in the field, with precedent, or with the emergence of new technology. In certain cases the applicant has also chosen terms arbitrarily, and in those cases their meaning is set out in detail in the corresponding part of the description. The terms used in the present invention should therefore be defined by their meaning and by the overall content of the invention, not simply by their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "... unit" and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two. Also, throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being connected "with another element in between".

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary view showing objects in an image 10 being recognized with a conventional visual detection method. As in FIG. 1, a variety of techniques for recognizing and classifying objects in an input image have been developed. In particular, techniques that recognize objects in images using deep learning algorithms (e.g., convolutional neural networks) are pushing detection performance toward the level of human vision. Even so, misrecognition still occurs despite these improvements, and techniques to reduce it are urgently needed.

Specifically, as in FIG. 1(a), there are cases in which only one pedestrian is recognized even though two are present, and as in FIG. 1(b), when a vehicle license plate is to be detected, a region that is not a license plate may be wrongly recognized as one. In addition, as in FIG. 1(c), a fire may go entirely undetected.

Moreover, conventional visual detection methods have difficulty accounting for varied environments (e.g., weather, location), and they use only the spatial feature information within a single image 10; techniques for recognizing objects through analysis over time have not been developed.

The following describes a visual detection system that, when recognizing objects in an image, takes into account not only spatial visual detection but also object recognition over time.

FIG. 2 is a block diagram of a visual detection system according to an embodiment of the present invention.

Referring to FIG. 2, a visual detection system according to an embodiment of the present invention may include: a feature extraction unit 100 in which a feature map of an image 10 is generated using a neural network; a spatial visual detection unit 200 for estimating the location and name of an object in the image 10 using the generated feature map; a temporal visual detection unit 300 for detecting the object in the image 10 in time order based on the generated feature map; and a fusion determination unit 400 that determines the object detection result for the image 10 based on the result estimated by the spatial visual detection unit 200 and the result detected by the temporal visual detection unit 300.

In the feature extraction unit 100 of the visual detection system according to an embodiment of the present invention, the feature map of the input image 10 may be generated using any of various neural network models. In particular, when a convolutional neural network (CNN) is used, the neural network may include a plurality of convolution layers for generating the feature map of the image 10 and a pooling layer for sampling the feature map.
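As a rough illustration of the convolution and pooling layers mentioned above, the following pure-Python sketch applies one 2D convolution (implemented as cross-correlation, as is usual in CNNs) followed by max pooling. The function names `conv2d_valid` and `max_pool`, the toy image, and the kernel are assumptions made for illustration; the specification does not fix kernel sizes, strides, or the pooling type, and a real feature extraction unit would stack many trained layers.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Downsample by keeping the maximum of each non-overlapping block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 3x3 "image" and a hypothetical 2x2 kernel (assumed values).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
feature_map = conv2d_valid(image, kernel)  # [[6, 8], [12, 14]]
pooled = max_pool(feature_map)             # [[14]]
```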

As described below, the feature map generated by the feature extraction unit 100 is shared by the spatial visual detection unit 200 and the temporal visual detection unit 300. That is, in the visual detection system of the present invention, estimation of objects in the image 10 by the spatial visual detection unit 200 and time-ordered detection of objects in the image 10 by the temporal visual detection unit 300 may be performed simultaneously. So that the two units can detect objects simultaneously, the step in which the feature extraction unit 100 generates the feature map may be shared by both the spatial visual detection unit 200 and the temporal visual detection unit 300. Because the two units share the feature map in this way, the amount of computation needed to extract features from the image 10 can be reduced, and the execution time can be shortened.
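The sharing described above can be sketched as follows: feature extraction runs once per frame, and both detection heads consume the same feature maps. Everything named here (`run_shared_pipeline`, the toy extractor, and the toy heads) is a hypothetical stand-in; in the described system these roles would be played by the trained networks of units 100, 200, and 300.

```python
from typing import Callable, List, Tuple

Frame = List[float]
FeatureMap = List[float]


def run_shared_pipeline(
    frames: List[Frame],
    extract: Callable[[Frame], FeatureMap],
    spatial_head: Callable[[FeatureMap], bool],
    temporal_head: Callable[[List[FeatureMap]], bool],
) -> Tuple[List[bool], bool]:
    # Feature extraction runs once per frame ...
    feature_maps = [extract(frame) for frame in frames]
    # ... and both heads reuse the same feature maps.
    spatial_results = [spatial_head(fm) for fm in feature_maps]
    temporal_result = temporal_head(feature_maps)
    return spatial_results, temporal_result


extract_calls = 0  # counts how often the (expensive) extractor runs


def toy_extract(frame: Frame) -> FeatureMap:
    """Hypothetical extractor: the mean brightness of the frame."""
    global extract_calls
    extract_calls += 1
    return [sum(frame) / len(frame)]


def toy_spatial_head(fm: FeatureMap) -> bool:
    # Assumed rule: a single bright frame counts as a detection.
    return fm[0] > 0.5


def toy_temporal_head(fms: List[FeatureMap]) -> bool:
    # Assumed rule: brightness rising from first to last frame counts.
    return len(fms) >= 2 and fms[-1][0] > fms[0][0]


spatial, temporal = run_shared_pipeline(
    [[0.1, 0.3], [0.6, 0.8]], toy_extract, toy_spatial_head, toy_temporal_head
)
```

With two input frames the extractor is called twice, not four times, which is the saving the shared feature map is meant to provide.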

In the spatial visual detection unit 200 of the visual detection system according to an embodiment of the present invention, the location and name of an object in the input image 10 may be estimated based on the feature map generated by the feature extraction unit 100.

Specifically, the spatial visual detection unit 200 of the present invention includes a fully connected layer composed of at least one hidden layer, so that it can form a convolutional neural network (CNN) together with the feature extraction unit 100 of the present invention.

That is, because the neural network of the feature extraction unit 100 is shared by the spatial visual detection unit 200 and the temporal visual detection unit 300 as described above, objects in a single input image 10 can be recognized by combining the fully connected layer of the spatial visual detection unit 200 with the convolution layers and pooling layer of the feature extraction unit 100.

However, the spatial visual detection unit 200 is not limited to being combined with the feature extraction unit 100 as a convolutional neural network (CNN) model. Spatial object detection models such as R-CNN (Region-based CNN), Faster R-CNN, SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once), as well as machine learning models, may be applied in various ways according to the user's purpose.

In addition, in the temporal visual detection unit 300 of the visual detection system according to an embodiment of the present invention, objects in the image 10 may be detected in time order based on the feature maps generated by the feature extraction unit 100. That is, the temporal visual detection unit 300 may detect objects in the image 10 by considering together the feature map of the image 10 input at a given time point (T_n) and the feature map of the image 10 input before that time point (T_{n-1}).
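One simple way to consider the feature maps at T_{n-1} and T_n together is to concatenate them into a single input vector for the temporal model. The sketch below does this with a sliding window; the helper name `temporal_inputs`, the window length of 2, and the flat-list feature maps are assumptions for illustration, not details taken from the specification.

```python
from typing import Iterator, List, Tuple

FeatureMap = List[float]


def temporal_inputs(
    feature_maps: List[FeatureMap], window: int = 2
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (n, features of frames n-window+1 .. n, concatenated)."""
    for n in range(window - 1, len(feature_maps)):
        stacked: List[float] = []
        for fm in feature_maps[n - window + 1 : n + 1]:
            stacked.extend(fm)
        yield n, stacked


# Three frames with one feature each; each output pairs T_{n-1} with T_n.
pairs = list(temporal_inputs([[1.0], [2.0], [3.0]]))
```

A recurrent model would instead consume the feature maps one step at a time and carry its own state; the concatenation shown here corresponds more closely to feeding an MLP.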

Object detection in the temporal visual detection unit 300 is performed independently of the detection result of the spatial visual detection unit 200 described above, so the temporal visual detection unit 300 can detect an object even when the spatial visual detection unit 200 fails to estimate the object's location or name. The spatial visual detection unit 200 and the temporal visual detection unit 300 thus operate complementarily to recognize objects in the input image 10.

Specifically, the temporal visual detection unit 300 of the present invention can detect objects in the image 10 in time order by considering the whole input sequence, and various temporal visual detection algorithms 310 may be used for this purpose. For example, the temporal visual detection algorithm 310 may include a multi-layer perceptron (MLP), a recurrent neural network (RNN), a long short-term memory (LSTM) network, and the like. The accuracy of object detection can be improved by combining the results classified by object detection classification models based on the RNN, LSTM, and the like using algorithms such as voting and ensembling.
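The voting and ensembling mentioned above can be sketched in a few lines. This is a minimal illustration assuming each temporal model outputs either a class label (hard voting) or a detection probability (averaging); the function names, the labels, and the 0.5 cutoff are hypothetical and not taken from the specification.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Hard voting: return the label predicted by the most models."""
    if not labels:
        raise ValueError("at least one label is required")
    return Counter(labels).most_common(1)[0][0]


def ensemble_average(probabilities: Sequence[float]) -> float:
    """Soft ensembling: average the per-model detection probabilities."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    return sum(probabilities) / len(probabilities)


# Hypothetical outputs of three temporal models (e.g., MLP, RNN, LSTM).
voted_label = majority_vote(["fire", "no_fire", "fire"])
mean_probability = ensemble_average([0.9, 0.4, 0.7])
detected = mean_probability >= 0.5  # assumed cutoff
```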

In the visual detection system according to an embodiment of the present invention, the feature map generated by the feature extraction unit 100 is shared by the spatial visual detection unit 200 and the temporal visual detection unit 300, each of which detects objects in the image 10. The fusion determination unit 400 may then make a comprehensive determination from the two detection results.

Specifically, the fusion determination unit 400 of the visual detection system according to an embodiment of the present invention determines the fusion result to be 'no alarm' when both the spatial visual detection unit 200 and the temporal visual detection unit 300 report no detection, and may determine it to be 'caution' when only one of the two reports a detection. When both the spatial visual detection unit 200 and the temporal visual detection unit 300 report a detection, the fusion determination unit 400 may determine the result to be 'alarm'. Among these results, 'no alarm' means that no object (e.g., a pedestrian or a fire) was detected in the input image 10. 'Caution' means that the spatial and temporal detection results differ, so that an object may be present in the input image 10, or that the case is ambiguous and the determination should be deferred or made again. 'Alarm' means that both the spatial and temporal detection results reported a detection, so that an object such as a pedestrian or a fire is judged to have been detected.

As in the example above, the way the fusion determination unit 400 of the present invention fuses the detection results of the spatial visual detection unit 200 and the temporal visual detection unit 300 to reach its determination can be summarized as in [Table 1] below.

[Table 1]
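The fusion rule described in the text can be written as a small function. This sketch encodes only the three outcomes stated above; the names `FusionResult` and `fuse` are chosen for illustration and do not appear in the specification.

```python
from enum import Enum


class FusionResult(Enum):
    NO_ALARM = "no alarm"
    CAUTION = "caution"
    ALARM = "alarm"


def fuse(spatial_detected: bool, temporal_detected: bool) -> FusionResult:
    """Combine the two detection results as described in the text."""
    if spatial_detected and temporal_detected:
        return FusionResult.ALARM      # both units detected the object
    if spatial_detected or temporal_detected:
        return FusionResult.CAUTION    # the two results disagree
    return FusionResult.NO_ALARM       # neither unit detected anything
```

For example, `fuse(False, True)` returns `FusionResult.CAUTION`, the case in which the determination may be deferred or made again.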

In addition, when the detection result of the spatial visual detection unit 200 is 'not detected' while the temporal detection result is 'detected', the fusion determination unit of the present invention may readjust the prediction threshold of the spatial visual detection unit 200 and have the determination made again. When the feature extraction unit 100 and the spatial visual detection unit 200 are combined and applied as a CNN, R-CNN, or the like, the prediction threshold may correspond to one of the various weight parameters of the nodes or hidden layers of the neural network.
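The re-determination step can be sketched under a simplifying assumption: the prediction threshold is treated here as a cutoff on the spatial detector's confidence score, which is a common reading but not the only one the text allows (it says the threshold may correspond to a network weight parameter). The function name, the score, and both threshold values are hypothetical.

```python
def spatial_detection_with_retry(
    spatial_score: float,
    temporal_detected: bool,
    threshold: float = 0.5,
    relaxed_threshold: float = 0.3,
) -> bool:
    """Return the spatial decision, retried once with a relaxed threshold.

    Assumption: the threshold is a confidence cutoff. The retry happens
    only when the spatial unit missed but the temporal unit detected.
    """
    spatial_detected = spatial_score >= threshold
    if not spatial_detected and temporal_detected:
        # Temporal evidence justifies a second look with a lower cutoff.
        spatial_detected = spatial_score >= relaxed_threshold
    return spatial_detected
```

A score of 0.4 is rejected at the default cutoff but accepted on retry when the temporal unit reports a detection; without temporal support the original decision stands.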

FIGS. 3 and 4 are exemplary views of a visual detection system according to an embodiment of the present invention. FIG. 3 shows the process by which a fire is detected from input images 10 when the visual detection system according to an embodiment of the present invention is applied to fire detection.

Specifically, as in FIG. 3, images 10 are input in time order, and a feature map may be generated for each image 10 by the feature extraction unit 100 of the present invention. The feature maps generated by the feature extraction unit 100 are shared with and input to both the spatial visual detection unit 200 and the temporal visual detection unit 300, so that object detection can take place in each unit. FIG. 3 shows a case in which a multi-layer perceptron (MLP) algorithm is used as the detection algorithm model in the temporal visual detection unit 300. In the fusion determination unit 400 of the present invention, as described above, the detection result may be determined based on the detection result of the spatial visual detection unit 200 and the detection result of the temporal visual detection unit 300.

In the visual sensing system according to an embodiment of the present invention, perception or recognition of an object in the input image 10 may be achieved, as described above, by forming a feature map through the neural network of the feature extraction unit 100. Object detection or recognition here may mean identifying a region of the given image 10 that is recognized as an object as one of a plurality of predefined classes. For example, in fire detection, sparks, flames, smoke, and the like in the input image 10 may be the target objects. Such object detection or recognition may be performed through machine learning or deep learning. When an object is detected or recognized by machine learning or deep learning, a classification model is first trained using datasets (train/validation/test datasets), and it is then determined to which of the plurality of classes the input image 10 belongs.

The image 10 input to the visual sensing system according to an embodiment of the present invention may be an image that has undergone flip, scaling, and rotation processing. That is, applying flip, scaling, and rotation processing to the input images takes a variety of conditions into account, which enables robust detection with the visual sensing system of the present invention.

Specifically, in the present invention, the dataset for training the neural network of the feature extraction unit 100 may additionally include images 10 pre-processed by flipping, scaling, rotation, or the like. Flipping is intended to diversify the training results by mirroring an image 10 of the existing dataset left-to-right or top-to-bottom. Scaling adjusts the size of an existing image 10, and the scale rate may be set to various values. Likewise, copies of an input image rotated by various angles may be added to the dataset.
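The three pre-processing operations can be illustrated on a small image stored as a nested list. The sketch is a simplification under stated assumptions: rotation is limited to 90 degrees clockwise, scaling uses nearest-neighbour sampling, and the function names and scale rates are hypothetical rather than taken from the disclosure.

```python
def flip_horizontal(img):
    """Mirror left-to-right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror top-to-bottom."""
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def scale_nearest(img, rate):
    """Resize by `rate` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * rate)), max(1, round(w * rate))
    return [[img[min(h - 1, int(y / rate))][min(w - 1, int(x / rate))]
             for x in range(new_w)]
            for y in range(new_h)]


def augment(img, rates=(0.5, 2.0)):
    """Return the original image plus its flipped, rotated and scaled copies."""
    copies = [img, flip_horizontal(img), flip_vertical(img), rotate90(img)]
    copies += [scale_nearest(img, r) for r in rates]
    return copies


dataset = augment([[1, 2], [3, 4]])
```

Each source image yields six training samples here; arbitrary rotation angles would need interpolation, which is omitted to keep the sketch short.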

Exemplary methods of generating input data for feature map generation in the feature extraction unit 100 by diversifying the dataset in this way may be summarized as in [Table 2] below.

[Table 2]

The visual sensing system according to an embodiment of the present invention may further include a storage unit in which the feature maps generated by the feature extraction unit 100 are stored. The temporal visual sensing unit 300 detects an object in the image 10 over time using a recurrent neural network, and the recurrent neural network may be trained on the feature maps stored in the storage unit.

Specifically, in the visual sensing system of the present invention, the feature maps learned through the neural network of the feature extraction unit 100 are stored in the storage unit, and the recurrent neural network of the temporal visual sensing unit 300 may be trained on these learned feature maps separately from the neural network of the feature extraction unit 100. That is, apart from the execution of the visual sensing system according to an embodiment of the present invention, as far as learning is concerned, the neural network structure used to train the feature extraction unit 100 may be shared with the recurrent neural network of the temporal visual sensing unit 300.
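A storage unit of this kind could be organised as in the following sketch, which keeps per-frame feature vectors and later cuts them into fixed-length sequences for training the temporal network separately. The class name, the per-frame labels, and the rule that a sequence takes the label of its last frame are assumptions made for illustration only.

```python
class FeatureStore:
    """Hypothetical storage unit: keeps per-frame features in arrival order."""

    def __init__(self):
        self._items = []  # list of (feature_vector, label)

    def add(self, feature_vector, label):
        self._items.append((list(feature_vector), label))

    def windows(self, length):
        """Yield (sequence, label) pairs; the label is that of the last frame."""
        for end in range(length, len(self._items) + 1):
            chunk = self._items[end - length:end]
            yield [features for features, _ in chunk], chunk[-1][1]


store = FeatureStore()
for features, label in [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]:
    store.add(features, label)  # features would come from the trained extractor

training_pairs = list(store.windows(3))
```

Because the stored features are reused, the feature extractor does not have to be run again while the temporal network is being trained.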

The visual sensing system of the present invention may be applied to various fields that involve detecting an object in an input image 10 (e.g., vehicle license plate detection, pedestrian detection, CCTV surveillance, defective product inspection, or fire detection).

Hereinafter, the case in which the visual sensing system according to an embodiment of the present invention is applied to fire detection is described as an example.

FIG. 5 is an exemplary view showing fire detection results obtained using the visual sensing system. FIG. 5(a) shows a state in which a fire is detected through temporal visual sensing alone, and FIG. 5(b) shows a state in which spatial visual sensing and temporal visual sensing are performed together by the visual sensing system according to an embodiment of the present invention.

Referring to FIG. 5(a), no sparks or flames are visible above the vehicle and only dark gray smoke is observed; the temporal visual sensing result 301, shown at the upper left, indicates that a fire was detected as a result of sensing the consecutive images 10 by temporal visual sensing. In contrast, FIG. 5(b) shows the vehicle ablaze, and the flames are marked with a blue box as the spatial visual sensing result 201, indicating that a fire is detected. As in FIG. 5(a), the temporal visual sensing result also indicates that a fire was detected. On this basis, the fusion determination unit 400 of the present invention may reach a 'fire alarm' determination for FIG. 5(b) based on the spatial and temporal visual sensing results.

FIGS. 6A to 6C are exemplary views showing data for training the visual sensing system according to an embodiment of the present invention. The visual sensing system of the present invention may be trained using the datasets described above (train/validation/test datasets); when it is applied to fire monitoring, various smoke or fire images 10 such as those shown in FIGS. 6A to 6C may be used as the datasets.

Referring to FIGS. 6A to 6C, so that the dataset consists of diverse fire images 10, the dataset for fire monitoring may include not only flame images 10 with a background, as in FIG. 6A, and smoke images 10 with a background, as in FIG. 6B, but also smoke images 10 without a background, as in FIG. 6C. The stage of the fire may also vary: images 10 spanning from the outbreak of a fire to the moment it is extinguished may be included in the dataset.

FIGS. 7A and 7B are exemplary views showing forest fire detection results obtained using the visual sensing system according to an embodiment of the present invention.

Referring to FIG. 7A, an image 10 in which cloud and mist appear on a mountainside is shown. The spatial visual sensing unit 200 of the present invention senses this as smoke, so in FIG. 7A a blue box 201 marks the region where a fire is deemed to have occurred. However, the temporal visual sensing unit 300 of the present invention, following the time sequence, determines the sensing result to be 'not detected', and FIG. 7A shows the fusion result 401 as not detected. Taking the sensing results of both the spatial visual sensing unit 200 and the temporal visual sensing unit 300 into account, the fusion determination unit 400 may determine 'caution'.

In contrast, referring to FIG. 7B, an image 10 in which a fire has broken out in a building in the mountains is shown. The spatial visual sensing unit 200 of the present invention senses fire and smoke, so in FIG. 7B a blue box marks the region where the fire occurred. The temporal visual sensing unit 300 of the present invention, following the time sequence, determines the sensing result to be 'detected', and FIG. 7B shows the fusion result 401 as detected. Accordingly, since both spatial and temporal visual sensing indicate detection, the fusion determination unit 400 may fuse these results and determine the sensing result to be 'fire alarm'.
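The fusion outcomes discussed for FIGS. 7A and 7B can be written as a small decision function. The sketch below is an illustration only: the 'caution' and 'fire alarm' labels follow the examples in the text, the re-evaluation branch follows the threshold readjustment described earlier in the description, and the label returned when neither branch detects anything is an assumption.

```python
def fuse(spatial_detected, temporal_detected):
    """Combine the two branch results into one decision label."""
    if spatial_detected and temporal_detected:
        return "fire alarm"            # both agree, as in FIG. 7B
    if spatial_detected:
        return "caution"               # spatial only, as in FIG. 7A (cloud and mist)
    if temporal_detected:
        return "re-evaluate spatial"   # readjust the spatial threshold and judge again
    return "no detection"              # assumed label, not stated in this passage


for spatial, temporal in [(True, True), (True, False), (False, True), (False, False)]:
    print(spatial, temporal, fuse(spatial, temporal))
```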

FIGS. 8 to 10 are flowcharts showing a visual sensing method using the visual sensing system according to an embodiment of the present invention.

Referring to FIG. 8, the visual sensing method according to an embodiment of the present invention may include: a step in which an image 10 is input; a step in which a feature map of the image 10 is generated using a neural network; a step in which the location and name of an object in the image 10 are estimated using the feature map; a step in which the object in the image 10 is detected in chronological order based on the feature map; and a step in which the object detection result for the image 10 is determined based on the estimation result of the estimating step and the detection result of the detecting step.
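The order of these steps can be sketched as a short loop in which the feature map is computed once per image and handed to both sensing steps. The function `run_visual_sensing` and the toy stand-ins below are hypothetical placeholders; real components would be neural networks rather than the simple rules used here.

```python
def run_visual_sensing(frames, extract, spatial, temporal, fuse):
    """Apply the five steps to each frame and collect the fused decisions."""
    decisions = []
    for frame in frames:                          # an image is input
        feature_map = extract(frame)              # feature map generated once
        spatial_result = spatial(feature_map)     # location/name estimation step
        temporal_result = temporal(feature_map)   # time-sequence detection step
        decisions.append(fuse(spatial_result, temporal_result))  # determination
    return decisions


# Toy stand-ins (assumptions), used only to make the sketch executable.
def toy_extract(frame):
    return sum(frame) / len(frame)       # mean intensity as a one-value "feature map"


def toy_spatial(feature):
    return feature > 0.5                 # bright frame counts as a detection


class ToyTemporal:
    """Detects when the feature value rises from one frame to the next."""

    def __init__(self):
        self.previous = None

    def __call__(self, feature):
        rising = self.previous is not None and feature > self.previous
        self.previous = feature
        return rising


frames = [[0.1, 0.2], [0.6, 0.8], [0.9, 0.9]]
decisions = run_visual_sensing(frames, toy_extract, toy_spatial, ToyTemporal(),
                               lambda s, t: s and t)
```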

Referring to FIG. 9, in the visual sensing method according to an embodiment of the present invention, the step of estimating the location and name of the object in the image 10 using the feature map and the step of detecting the object in the image 10 in chronological order based on the feature map may be performed simultaneously.

Specifically, according to an embodiment of the present invention, spatial visual sensing and temporal visual sensing are performed simultaneously in parallel, and the results obtained are combined and used for object detection. In addition, according to an embodiment of the present invention, applying the output of spatial visual sensing (e.g., features extracted from the image 10) to the temporal visual sensing process has the advantage that the amount of computation for feature extraction from the image 10 does not increase and the time required for object detection can be greatly reduced.
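Running the two sensing steps side by side on one shared feature map might look like the sketch below, which uses Python's standard `concurrent.futures` module. The function name and the stand-in detectors are hypothetical, and the use of threads is only one possible choice: in CPython, threads do not speed up CPU-bound pure-Python code, so a real system would more likely rely on a framework or hardware that executes both branches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_in_parallel(feature_map, spatial, temporal):
    """Submit both branches with the same feature map and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        spatial_future = pool.submit(spatial, feature_map)
        temporal_future = pool.submit(temporal, feature_map)
        return spatial_future.result(), temporal_future.result()


# Stand-in detectors (assumptions) that only inspect the shared features.
spatial_result, temporal_result = detect_in_parallel(
    [0.2, 0.9],
    spatial=lambda features: max(features) > 0.5,
    temporal=lambda features: features[-1] > features[0],
)
```

The feature map is computed before the call and passed to both branches, which reflects the point that feature extraction is not repeated for the temporal branch.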

The neural network of the visual sensing method according to an embodiment of the present invention may include a plurality of convolution layers for generating the feature map of the image 10 and a pooling layer for sampling the feature map.
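What a convolution layer followed by a pooling layer computes can be shown on a 4x4 image with plain lists. This is a didactic sketch with a single hand-written 2x2 kernel (an assumption); a trained network would use many learned kernels and several stacked layers.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is what deep learning frameworks usually call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def max_pool(fmap, size=2):
    """Down-sample by keeping the maximum of each non-overlapping size x size block."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
feature_map = conv2d_valid(image, [[0, 1], [1, 0]])   # 3x3 feature map
pooled = max_pool(feature_map)                        # 1x1 after 2x2 pooling
```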

In the visual sensing method according to an embodiment of the present invention, the step of estimating the location and name of the object in the image 10 using the feature map may be performed using a fully connected layer composed of at least one hidden layer.
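A fully connected head that returns both a name and a location could be organised as in the sketch below. It assumes a flattened feature vector as input, one ReLU hidden layer, a softmax output for the class name, and a four-value linear output for the box; the parameter names, class names, and weights are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def spatial_head(features, params, class_names):
    """Return (name, probability, box) from a flattened feature vector."""
    hidden = [max(0.0, h) for h in dense(features, params["w_hidden"], params["b_hidden"])]
    probs = softmax(dense(hidden, params["w_cls"], params["b_cls"]))
    box = dense(hidden, params["w_box"], params["b_box"])  # e.g. x, y, width, height
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best], box


params = {  # toy weights, assumptions for illustration
    "w_hidden": [[1.0, 0.0], [0.0, 1.0]], "b_hidden": [0.0, 0.0],
    "w_cls": [[1.0, 0.0], [0.0, 1.0]], "b_cls": [0.0, 0.0],
    "w_box": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    "b_box": [0.0, 0.0, 0.0, 0.0],
}
name, prob, box = spatial_head([1.0, 2.0], params, ["flame", "smoke"])
```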

The visual sensing method according to an embodiment of the present invention may further include a step in which the feature map generated in the step of generating the feature map of the image 10 using the neural network is stored in a storage unit of the visual sensing system. The step of detecting the object in the image 10 in chronological order based on the feature map is performed using a recurrent neural network, and the recurrent neural network may be trained on the stored feature map.
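How a recurrent network consumes a sequence of stored features can be illustrated with the forward pass of a plain Elman-style recurrent cell. This is a sketch under stated assumptions: the disclosure does not fix the cell type, training is omitted, and the function name and weights are hypothetical.

```python
import math


def rnn_forward(sequence, w_in, w_rec, bias, w_out, b_out):
    """Run a tanh recurrent cell over the sequence and return a detection
    probability computed from the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:  # one stored feature vector per time step
        hidden = [math.tanh(sum(w * v for w, v in zip(w_in[k], x))
                            + sum(w * h for w, h in zip(w_rec[k], hidden))
                            + bias[k])
                  for k in range(len(bias))]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


weights = dict(w_in=[[1.0]], w_rec=[[1.0]], bias=[0.0], w_out=[1.0], b_out=0.0)
quiet = rnn_forward([[0.0], [0.0], [0.0]], **weights)   # no activity in the features
active = rnn_forward([[1.0], [1.0]], **weights)         # sustained activity
```

The hidden state carries information from earlier frames forward, which is what lets the temporal step respond to how the features evolve rather than to a single frame.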

In the visual sensing method according to an embodiment of the present invention, the neural network may be trained with input images that have undergone flip, scaling, and rotation processing.

In addition, the method described above can be written as a computer-executable program and implemented on a general-purpose digital computer that runs the program using a computer-readable medium. The structure of the data used in the method can be recorded on a computer-readable medium by various means. A recording medium that records an executable computer program or code for performing the various methods of the present invention should not be understood to include transitory objects such as carrier waves or signals. The computer-readable medium may include storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk) and optically readable media (e.g., CD-ROM, DVD).

The foregoing description of the present invention is provided for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: image 100: feature extraction unit
200: spatial visual sensing unit 201: spatial visual sensing result
300: temporal visual sensing unit 301: temporal visual sensing result
310: temporal visual sensing algorithm 320: classification model
400: fusion determination unit 401: fusion result

Claims

A visual sensing system for detecting an object in an input image, comprising:
a feature extraction unit that generates a feature map of the image using a neural network;
a spatial visual sensing unit that estimates the location and name of the object in the image using the generated feature map;
a temporal visual sensing unit that detects the object in the image in chronological order based on the generated feature map; and
a fusion determination unit that determines an object detection result for the image based on the result estimated by the spatial visual sensing unit and the result detected by the temporal visual sensing unit.

The visual sensing system of claim 1,
wherein the neural network of the feature extraction unit includes a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

The visual sensing system of claim 1,
wherein the spatial visual sensing unit includes a fully connected layer composed of at least one hidden layer.

The visual sensing system of claim 1,
further comprising a storage unit that stores the feature map generated by the feature extraction unit,
wherein the temporal visual sensing unit detects the object in the image over time using a recurrent neural network, and
the recurrent neural network is trained based on the feature map stored in the storage unit.

The visual sensing system of claim 1,
wherein the neural network is trained with input images that have undergone flip, scaling, and rotation processing.

A visual sensing method using a visual sensing system, the method comprising:
(a) inputting an image;
(b) generating a feature map of the image using a neural network;
(c) estimating the location and name of an object in the image using the feature map;
(d) detecting the object in the image in chronological order based on the feature map; and
(e) determining an object detection result for the image based on the estimation result of step (c) and the detection result of step (d).

The method of claim 6,
wherein steps (c) and (d) are performed simultaneously.

The method of claim 6,
wherein the neural network includes a plurality of convolution layers for generating the feature map of the image and a pooling layer for sampling the feature map.

The method of claim 6,
wherein step (c) is performed using a fully connected layer composed of at least one hidden layer.

The method of claim 6,
further comprising storing the feature map generated in step (b) in a storage unit of the visual sensing system,
wherein step (d) is performed using a recurrent neural network, and
the recurrent neural network is trained based on the feature map stored in step (b).

The method of claim 6,
wherein the neural network is trained with input images that have undergone flip, scaling, and rotation processing.

A computer-readable recording medium having recorded thereon a program for implementing the method of claim 6.