KR102522399B1

KR102522399B1 - Method for detecting object and system therefor

Info

Publication number: KR102522399B1
Application number: KR1020190082471A
Authority: KR
Inventors: 전은솜; 문일현; 정재협
Original assignee: 주식회사 케이티
Priority date: 2019-07-09
Filing date: 2019-07-09
Publication date: 2023-04-14
Also published as: KR20210006627A

Abstract

The present invention relates to an object identification method that recognizes an object based on feature maps extracted using a convolutional neural network (CNN), and to a system therefor. A method for identifying an object in an image analysis system according to an embodiment of the present invention includes: extracting a plurality of reference feature maps of different sizes from an original image by performing a plurality of convolution operations on the original image; identifying, for each of the reference feature maps, a region in which an object appears, and setting a plurality of candidate regions of different shapes in the original image based on the identified appearance regions; calculating, for each of the candidate regions, a probability that the object appears, and calculating an overlap ratio between the candidate regions; and recognizing a region in which the object is located in the original image based on the calculated object appearance probabilities and overlap ratios.

Description

Method for detecting object and system therefor

The present invention relates to a method for identifying an object through image analysis, and more particularly to an object identification method that identifies an object based on feature maps extracted using a convolutional neural network (CNN), and to a system therefor.

Technologies that recognize objects by analyzing images received from a camera have been developed and are in use. Representative examples of such object recognition technologies include vehicle number recognition and user recognition.

Convolutional neural network (CNN) technology is used to recognize users accurately. A CNN builds a feature map for an image through a plurality of convolution and pooling steps and uses this feature map to recognize a desired object in the image.

Object recognition technology using such a convolutional neural network is mainly used in environments equipped with fixed cameras. That is, a camera is installed at a fixed position and height, and objects are recognized by analyzing images received from that camera at a fixed shooting angle. Because this approach analyzes only images captured within a limited range of shooting angles, its accuracy and recognition speed degrade when the shooting angle deviates from the usual one. In other words, when existing object recognition technology is applied in an environment where images are received from various angles, such as from mobile cameras and wearable cameras, the recognition results are not very accurate and detecting an object takes a long time.

The present invention has been proposed to solve these conventional problems, and its object is to provide an object identification method, and a system therefor, capable of identifying an object quickly and accurately even in an environment where a mobile camera is used.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To achieve the above object, a method for identifying an object in an image analysis system according to a first aspect of the present invention includes: extracting a plurality of reference feature maps of different sizes from an original image by performing a plurality of convolution operations on the original image; identifying, for each of the reference feature maps, a region in which an object appears, and setting a plurality of candidate regions of different shapes in the original image based on the identified appearance regions; calculating, for each of the candidate regions, a probability that the object appears, and calculating an overlap ratio between the candidate regions; and recognizing a region in which the object is located in the original image based on the calculated object appearance probabilities and overlap ratios.

The recognizing step may include, when the overlap ratio between candidate regions exceeds a preset threshold ratio, recognizing the region having the highest probability among the overlapping candidate regions as the region in which the object is located.

In the recognizing step, when a plurality of candidate regions have an object appearance probability exceeding a preset threshold value and the overlap ratio of each of those candidate regions is equal to or less than the preset threshold ratio, the plurality of candidate regions may each be recognized as a region in which an object is located.
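
As a non-limiting illustration, the two selection rules above may be sketched as follows. The specification does not define how the overlap ratio is computed at this point, so the sketch assumes intersection-over-union of axis-aligned boxes; the function names and the default thresholds of 0.5 are hypothetical, and applying the probability threshold before the overlap test is likewise an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def overlap_ratio(a: Box, b: Box) -> float:
    """Assumed overlap measure: intersection area divided by union area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_regions(
    candidates: List[Tuple[Box, float]],
    prob_threshold: float = 0.5,     # hypothetical preset threshold value
    overlap_threshold: float = 0.5,  # hypothetical preset threshold ratio
) -> List[Tuple[Box, float]]:
    """Keep the most probable region of each overlapping group.

    Candidates whose mutual overlap stays at or below the threshold ratio
    are all kept, so several separate objects can be recognized.
    """
    strong = [c for c in candidates if c[1] > prob_threshold]
    strong.sort(key=lambda c: c[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    for box, prob in strong:
        # A candidate survives only if it does not overlap too much with
        # any region that was already kept with a higher probability.
        if all(overlap_ratio(box, k[0]) <= overlap_threshold for k in kept):
            kept.append((box, prob))
    return kept
```

With two heavily overlapping candidates and one distant candidate, this sketch keeps the more probable of the overlapping pair together with the distant one.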

The step of setting the plurality of candidate regions in the original image may include: identifying, in the original image, a region that matches the region in which the object appears in the reference feature map, and calculating the center coordinates of the identified region; and setting a plurality of candidate regions of different shapes in the original image with reference to the center coordinates.
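
One possible way to set candidate regions of different shapes around a center coordinate is sketched below. The use of aspect ratios with a constant area, the function name `candidate_regions`, and the default ratios are assumptions made for illustration only; the specification states only that the candidate regions have different shapes.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def candidate_regions(
    cx: float,
    cy: float,
    base_size: float,
    aspect_ratios: Tuple[float, ...] = (0.5, 1.0, 2.0),  # hypothetical width/height ratios
    image_size: Optional[Tuple[float, float]] = None,    # (width, height) used for clipping
) -> List[Box]:
    """Return boxes of equal area and different shape centered on (cx, cy)."""
    boxes: List[Box] = []
    for ratio in aspect_ratios:
        width = base_size * ratio ** 0.5
        height = base_size / ratio ** 0.5
        x0, y0 = cx - width / 2, cy - height / 2
        x1, y1 = cx + width / 2, cy + height / 2
        if image_size is not None:
            # Clip the box so that it stays inside the original image.
            x0, y0 = max(0.0, x0), max(0.0, y0)
            x1, y1 = min(image_size[0], x1), min(image_size[1], y1)
        boxes.append((x0, y0, x1, y1))
    return boxes
```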

The method may further include, after the recognizing step: generating direction information indicating the direction from a reference position to the recognized object; and providing the generated direction information to a user.

The step of extracting the plurality of reference feature maps from the original image may include: extracting a reference feature map of minimum size from the original image by performing a plurality of convolution operations on the original image; upsampling the feature map that was used as the input to the most recently used convolution operation, and extracting a new feature map by concatenating the upsampled feature map with a previously extracted feature map; and extracting, from the original image, one or more reference feature maps larger than the minimum-size reference feature map by performing a convolution operation on the feature map obtained through the concatenation.

To achieve the above object, an image analysis system that analyzes an image to identify an object according to a second aspect of the present invention includes: an image receiving unit that receives an original image; a feature map extraction unit that extracts a plurality of reference feature maps of different sizes from the original image by performing a plurality of convolution operations on the original image; and an object recognition unit that identifies, for each of the reference feature maps, a region in which an object appears, sets a plurality of candidate regions of different shapes in the original image based on the identified appearance regions, calculates, for each of the candidate regions, a probability that the object appears, calculates an overlap ratio between the candidate regions, and then recognizes a region in which the object is located in the original image based on the calculated object appearance probabilities and overlap ratios.

The present invention has the advantage of analyzing images received from a mobile camera, detecting objects in the images quickly and accurately, and providing the user with services based on the identified objects in real time.

In addition, the present invention analyzes different reference feature maps to select a plurality of candidate regions of different shapes, and recognizes a single object or a plurality of objects in the original image according to the degree of overlap and the probabilities of the selected candidate regions, which has the advantage of accurately identifying small objects as well as large ones.

Furthermore, by providing the user with direction information for an object, the present invention makes it more convenient for the user to move to the object in real space.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to aid understanding of the technical idea of the invention; the invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a diagram illustrating an environment to which an image analysis system according to an embodiment of the present invention is applied.
FIG. 2 is a diagram showing the configuration of an image analysis system according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of generating reference feature maps in an image analysis system according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of identifying an object in an image analysis system according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a plurality of candidate regions.
FIG. 6 is a diagram illustrating a region in which an object has been recognized.
FIG. 7 is a diagram illustrating an image in which a person has been recognized indoors.
FIG. 8 is a diagram illustrating the image analysis system applied to a parking management service.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they could unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an environment to which an image analysis system according to an embodiment of the present invention is applied.

As shown in FIG. 1, the image analysis system 200 according to an embodiment of the present invention communicates with each of a plurality of cameras 120 and a user terminal 110 through a network 300. The network 300 includes a short-range wireless network such as Wi-Fi, a fourth-generation mobile communication network, a fifth-generation mobile communication network, and a wired communication network.

The user terminal 110 is a mobile communication device such as a smartphone or a tablet computer, and may receive direction information for an object from the image analysis system 200. The user terminal 110 may also capture an image with its built-in camera and transmit the image to the image analysis system 200.

The camera 120 is a movable image capture device, for example a network camera. The camera 120 may capture images of its surroundings and transmit them to the image analysis system 200. The camera 120 may be mounted on a wearable device worn by a user. Alternatively, the camera 120 may be a camera capable of 360-degree capture. The camera 120 may capture the entire image divided into a plurality of parts (for example, four equal parts) and transmit the divided images to the image analysis system 200.

The image analysis system 200 analyzes an image received from the user terminal 110 or the camera 120 to detect and recognize an object in the image. The image analysis system 200 may perform convolution operations on the received original image several times to generate a plurality of reference feature maps of different sizes, and may use the reference feature maps to recognize the region in which an object is located in the original image.

FIG. 2 is a diagram showing the configuration of an image analysis system according to an embodiment of the present invention.

As shown in FIG. 2, the image analysis system 200 includes an image receiving unit 210, a screen dividing unit 220, a feature map extraction unit 230, an object recognition unit 240, a direction notification unit 250, and a database 260. These components may be implemented in hardware, in software, or in a combination of hardware and software.

The image analysis system 200 may also include one or more processors and a memory, and the image receiving unit 210, the screen dividing unit 220, the feature map extraction unit 230, the object recognition unit 240, and the direction notification unit 250 may be loaded (stored) in the memory in the form of programs executed by the processor.

The database 260 is a mass storage means, such as a storage device, and stores various data such as image data. The database 260 may also store object recognition information and building drawings. The database 260 may store coordinate reference data that matches regions of each reference feature map to coordinate regions of the original image. Here, a reference feature map is a feature map that, among the feature maps, is used as a basis for object detection. The database 260 may also store object pattern information representing objects such as people, animals, and vehicle license plates.

The image receiving unit 210 receives image data from either the camera 120 or the user terminal 110. The image receiving unit 210 may store the received image in the database 260. The image receiving unit 210 may also receive already-divided images from the camera 120.

When an image is received from the camera 120 or the user terminal 110, the screen dividing unit 220 divides the image into a plurality of regions of a preset size. When already-divided images are received from the camera 120, the screen dividing unit 220 may skip dividing the screen itself. In that case, if any of the divided images does not satisfy a preset resolution, the screen dividing unit 220 may change the resolution of that image.
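
By way of illustration only, dividing a frame into regions may be sketched as below. The specification speaks of regions of a preset size; this sketch instead takes a grid of rows and columns, which is an assumption, and the function name `split_into_tiles` is hypothetical.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max values exclusive


def split_into_tiles(width: int, height: int, rows: int, cols: int) -> List[Tile]:
    """Divide a width x height frame into rows x cols tiles in row-major order."""
    tiles: List[Tile] = []
    for r in range(rows):
        for c in range(cols):
            # Integer arithmetic keeps adjacent tiles contiguous even when
            # the frame size is not an exact multiple of the grid.
            x0, x1 = c * width // cols, (c + 1) * width // cols
            y0, y1 = r * height // rows, (r + 1) * height // rows
            tiles.append((x0, y0, x1, y1))
    return tiles
```

For example, a 2 x 2 grid corresponds to the division into four equal parts mentioned for the camera 120.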

The feature map extraction unit 230 performs several stages of convolution operations on the divided original image to extract a plurality of feature maps, and selects, from among the extracted feature maps, a plurality of reference feature maps of different sizes. How the feature map extraction unit 230 selects the reference feature maps is described in detail below with reference to FIG. 3.

The object recognition unit 240 analyzes the reference feature maps of different sizes to recognize objects such as people, license plates, and animals in the original image. The object recognition unit 240 analyzes each reference feature map to set, in the original image, a plurality of candidate regions in which an object is predicted to be located, and identifies, from among the candidate regions, one or more regions in which an object is located based on the probability of each candidate region and the degree of overlap between them. Details are described below with reference to FIG. 4.

The direction notification unit 250 compares the current location of the user terminal 110 with the location of the object recognized by the object recognition unit 240, generates direction information indicating where the object is located relative to the user terminal 110, and transmits the generated direction information to the user terminal 110. At this time, the direction notification unit 250 may look up, in the database 260, the drawing of the building in which the user is located based on the location of the user terminal 110, and transmit the building drawing together with the direction information to the user terminal 110.
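
The specification does not state how the direction information is represented. The following minimal sketch assumes that both locations are given as two-dimensional coordinates on a floor plan whose positive y axis points "north", and reports a clockwise bearing together with an eight-point compass label; the function name and the label set are hypothetical.

```python
import math
from typing import Tuple

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # hypothetical label set


def direction_to_object(
    reference: Tuple[float, float], target: Tuple[float, float]
) -> Tuple[float, str]:
    """Return (bearing in degrees clockwise from +y, compass label)."""
    dx = target[0] - reference[0]
    dy = target[1] - reference[1]
    # atan2(dx, dy) measures the angle from the +y axis, clockwise.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    label = COMPASS[int((bearing + 22.5) // 45) % 8]
    return bearing, label
```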

FIG. 3 is a flowchart illustrating a method of generating reference feature maps in an image analysis system according to an embodiment of the present invention.

Referring to FIG. 3, the image receiving unit 210 receives an image captured by the camera 120 or the user terminal 110 (S301). When the image receiving unit 210 receives the image from the camera 120 or the user terminal 110, the screen dividing unit 220 may divide the image into parts of a preset size.

The feature map extraction unit 230 then sets the image divided by the screen dividing unit 220, or the image received by the image receiving unit 210, as the original image, and inputs this original image to a first convolution layer to extract a first feature map from the original image (S303). A preset filter (for example, a 3×3 filter) and a preset stride may be applied in the first convolution layer, and the feature map extraction unit 230 may use the first convolution layer and the stride, without pooling, to extract from the original image a first feature map that is reduced in size relative to the original image. The feature map extraction unit 230 may extract the first feature map by repeatedly performing convolution on the original image using a plurality of different first convolution layers. For example, in the first convolution operation the original image is input to a first convolution layer having a first filter, and from the second operation onward the output of the immediately preceding first convolution layer is used as the input. The filter of each first convolution layer may be the same as or different from the filter used immediately before.
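
The size reduction obtained from a strided convolution without pooling follows the standard output-size relation for convolution layers. A small sketch is given below; the input size of 416 and the stride and padding values used in the usage lines are hypothetical and do not come from the specification.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


# Hypothetical values: a 3x3 filter with stride 2 and padding 1 halves the size,
# whereas the same filter with stride 1 and padding 1 preserves it.
halved = conv_output_size(416, kernel=3, stride=2, padding=1)
preserved = conv_output_size(416, kernel=3, stride=1, padding=1)
```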

Next, the feature map extraction unit 230 performs a convolution operation with the extracted first feature map as the input to extract a second feature map (S305). That is, the feature map extraction unit 230 inputs the first feature map to a 2-1 convolution layer and a 2-2 convolution layer to extract a 2-1 feature map and a 2-2 feature map, respectively, and sums the extracted 2-1 feature map and 2-2 feature map to extract the second feature map. A 1×1 filter may be applied in the 2-1 convolution layer, and a 3×3 filter with a padding operation may be applied in the 2-2 convolution layer. The 2-1 convolution layer may be used repeatedly a preset number of times to extract the 2-1 feature map, and the 2-2 convolution layer may likewise be used repeatedly a preset number of times to extract the 2-2 feature map. When the 2-1 convolution layer is used repeatedly, the first feature map is input to the 2-1 convolution layer in the first operation, and from the second operation onward the result of the immediately preceding operation is input to the 2-1 convolution layer. Similarly, when the 2-2 convolution layer is applied repeatedly, the first feature map is input to the 2-2 convolution layer in the first operation, and from the second operation onward the result of the immediately preceding operation is input to the 2-2 convolution layer.
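
The reason the 3×3 branch uses padding can be seen in a small single-channel sketch: with padding of one cell the 3×3 branch produces the same spatial size as the 1×1 branch, so the two results can be summed element by element. The helper names, the single-channel simplification, and the padding value of 1 are assumptions; as in common CNN frameworks, the operation shown is cross-correlation.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, padding: int = 0) -> Matrix:
    """Single-channel, stride-1 convolution (cross-correlation) with zero padding."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for y in range(h):
        for x in range(w):
            padded[y + padding][x + padding] = image[y][x]
    out_h, out_w = h + 2 * padding - k + 1, w + 2 * padding - k + 1
    return [
        [
            sum(kernel[i][j] * padded[y + i][x + j] for i in range(k) for j in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def two_branch_sum(feature_map: Matrix, kernel_1x1: Matrix, kernel_3x3: Matrix) -> Matrix:
    """Sum of a 1x1 branch and a padded 3x3 branch applied to the same input."""
    branch_a = conv2d(feature_map, kernel_1x1, padding=0)
    branch_b = conv2d(feature_map, kernel_3x3, padding=1)  # padding keeps the size
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(branch_a, branch_b)]
```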

Next, the feature map extraction unit 230 pools the second feature map to extract a third feature map from the second feature map (S307). The feature map extraction unit 230 may use max pooling or mean pooling to reduce the second feature map and generate the third feature map. Because of the pooling, the third feature map is smaller than the second feature map.
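
For reference, max pooling and mean pooling over non-overlapping windows may be sketched as follows. The window size of 2 and the function name `pool2d` are assumptions; the specification does not state the pooling window.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, size: int = 2, mode: str = "max") -> Matrix:
    """Reduce a feature map with non-overlapping size x size windows."""
    pooled: Matrix = []
    for y in range(0, len(feature_map) - size + 1, size):
        row = []
        for x in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[y + i][x + j] for i in range(size) for j in range(size)]
            # "max" keeps the strongest response, any other mode averages the window.
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled
```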

The feature map extraction unit 230 then inputs the pooled, reduced third feature map to a third convolution layer to extract a fourth feature map, and selects this fourth feature map as a first reference feature map (S309). A 1×1 filter may be applied in the third convolution layer. The first reference feature map is the smallest of the reference feature maps.

Next, the feature map extraction unit 230 performs a first upsampling using the second feature map and the third feature map, the latter being the input used for the last convolution layer (that is, the third convolution layer), to extract a fifth feature map that is larger than the first reference feature map (S311).

Next, the feature map extraction unit 230 concatenates the feature map obtained by the first upsampling with the second feature map, and then performs a convolution operation on the concatenated result to select a second reference feature map (S313). That is, the feature map extraction unit 230 concatenates the fifth feature map and the second feature map to extract a 5-1 feature map, inputs the extracted 5-1 feature map to a fourth convolution layer to extract a sixth feature map, and selects this sixth feature map as the second reference feature map. A 1×1 filter or a 2×2 filter may be applied in the fourth convolution layer.
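
The upsampling and concatenation described above can be illustrated with nested lists laid out as channels × height × width. Nearest-neighbor interpolation, the factor of 2, and the helper names are assumptions; the specification does not state which upsampling method is used.

```python
from typing import List

Tensor = List[List[List[float]]]  # channels x height x width


def upsample_nearest(feature_map: Tensor, factor: int = 2) -> Tensor:
    """Enlarge each channel by repeating every cell factor times in both directions."""
    return [
        [[value for value in row for _ in range(factor)] for row in channel for _ in range(factor)]
        for channel in feature_map
    ]


def concat_channels(a: Tensor, b: Tensor) -> Tensor:
    """Concatenate two feature maps along the channel axis."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("feature maps must have the same height and width")
    return a + b
```

Concatenation requires both feature maps to have the same height and width, which is why the smaller map is upsampled first.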

Subsequently, the feature map extraction unit 230 may perform a second upsampling on the 5-1 feature map, which was used as the input to the last convolution layer (that is, the fourth convolution layer), to extract a seventh feature map that is larger than the second reference feature map (S315). The feature map extraction unit 230 then concatenates the seventh feature map obtained by the second upsampling with the first feature map, and performs a convolution operation on the concatenated result to select a third reference feature map (S317). That is, the feature map extraction unit 230 concatenates the first feature map and the seventh feature map to extract a 7-1 feature map, inputs the extracted 7-1 feature map to a fifth convolution layer to extract an eighth feature map, and selects this eighth feature map as the third reference feature map. A 1×1 filter or a 2×2 filter may be applied in the fifth convolution layer.

As described above, the feature map extraction unit 230 reduces the feature map through convolution or pooling to select the smallest, first reference feature map, and uses the feature maps obtained by the first and second upsampling to select the medium-sized second reference feature map and the largest, third reference feature map.

Meanwhile, although the above embodiment describes upsampling as being performed twice, the number of upsampling operations may be varied in proportion to the number of times the original image is reduced, and the number of reference feature maps may likewise be varied in proportion to the number of reductions of the original image.

FIG. 4 is a flowchart illustrating a method of identifying an object in an image analysis system according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating a plurality of candidate regions.

Referring to FIGS. 4 and 5, the object recognition unit 240 recognizes the region in which an object appears in each of a plurality of reference feature maps of different sizes (S401). At this time, the object recognition unit 240 compares the data of each reference feature map with object patterns stored in the database 260, determines for each reference feature map whether a region in which the object appears exists, and thereby identifies the region in which the object appears for each reference feature map.

Next, when it is confirmed that an object exists in a reference feature map, the object recognition unit 240 locates, in the original image, the display region of the object recognized in the reference feature map, and determines the center coordinates of that display region in the original image (S403). The object recognition unit 240 retrieves from the database 260 coordinate reference data that matches regions of the reference feature map to coordinate regions of the original image, identifies the region where the object is located in the original image based on the retrieved coordinate reference data, and calculates the center coordinates of this region. In other words, because the reference feature map and the actual original image differ in size, the object recognition unit 240 refers to the coordinate reference data in order to match the display region recognized in the reference feature map to the corresponding region of the original image.
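As a non-limiting illustration of step S403, the mapping from a feature map region to original image coordinates can be sketched as below. The form of the coordinate reference data is not specified in this description; the sketch assumes it reduces to a uniform scale factor between the feature map and the original image. The function names and the 13×13 and 416×416 sizes are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_image_box(fmap_box: Box,
                 fmap_size: Tuple[int, int],
                 image_size: Tuple[int, int]) -> Box:
    """Scale a box from feature map cells to original image pixels.

    Assumption: the coordinate reference data is a uniform scale per axis.
    Sizes are given as (width, height).
    """
    sx = image_size[0] / fmap_size[0]
    sy = image_size[1] / fmap_size[1]
    x0, y0, x1, y1 = fmap_box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


def center(box: Box) -> Tuple[float, float]:
    """Center coordinates of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# An object occupying cells x 2..4 and y 1..3 of a 13 x 13 reference feature map,
# mapped onto a 416 x 416 original image.
image_box = to_image_box((2, 1, 4, 3), (13, 13), (416, 416))
print(image_box, center(image_box))
```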

Subsequently, when an object is recognized in one or more reference feature maps, the object recognition unit 240 sets, in the original image, a plurality of candidate regions having different shapes around the center coordinates of the object recognized in the original image (S405). At this time, the object recognition unit 240 may determine the size of each candidate region based on the size of the reference feature map in which the object was recognized. That is, the object recognition unit 240 may set candidate regions of different sizes in the original image for each reference feature map on which the object recognition is based.

To elaborate, the success rate of object recognition may differ depending on the size of the reference feature map. When an object is recognized using the small first reference feature map, the success rate for objects that appear small in the original image drops, whereas the success rate for large objects rises. Conversely, when an object is recognized using the largest, third reference feature map, the success rate for objects that appear small in the original image may be high, but the success rate of recognizing the full extent of a large object may be relatively low. Because the reference feature maps differ in size and the recognition rate differs with object size, the candidate regions of an object recognized through the smallest, first reference feature map are set to the largest size in the original image, and the candidate regions of an object recognized through the largest, third reference feature map are set to the smallest size in the original image.
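As a non-limiting illustration of step S405, candidate regions of different shapes can be generated around a center coordinate as follows. The concrete base sizes and aspect ratios below are hypothetical values chosen for the sketch; this description only requires that the regions differ in shape and that a smaller reference feature map yields larger candidate regions.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical base sizes in pixels: reference map 1 (smallest map) gets the
# largest regions, reference map 3 (largest map) gets the smallest regions.
BASE_SIZE_BY_MAP: Dict[int, float] = {1: 256.0, 2: 128.0, 3: 64.0}


def candidate_regions(cx: float, cy: float, base_size: float,
                      aspect_ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Return one box per aspect ratio (width / height), all centered on (cx, cy).

    Every box has area base_size ** 2, so only the shape changes.
    """
    boxes = []
    for ratio in aspect_ratios:
        w = base_size * math.sqrt(ratio)
        h = base_size / math.sqrt(ratio)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


for map_index, base in BASE_SIZE_BY_MAP.items():
    print(map_index, candidate_regions(200.0, 150.0, base))
```

With three aspect ratios per reference feature map and three reference feature maps, this produces nine candidate regions, matching the count used in the example of FIG. 5.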

In FIG. 5, the center coordinates of an object (that is, a person) recognized through the first reference feature map are indicated by reference numeral 51, the center coordinates of an object recognized based on the second reference feature map by reference numeral 52, and the center coordinates of an object recognized based on the third reference feature map by reference numeral 53. Referring to FIG. 5, three candidate regions 51a, 51b, and 51c having the largest area are set on the screen around the center coordinates 51, and three candidate regions 52a, 52b, and 52c having a medium-sized area are set on the screen around the center coordinates 52. In addition, three candidate regions 53a, 53b, and 53c having the smallest area are set on the screen around the center coordinates 53. When the first and second reference feature maps are used, the largest object (that is, a person) can be recognized, but a small object may not be recognized. Furthermore, when the plurality of candidate regions 51a, 51b, 51c, 52a, 52b, 52c, 53a, 53b, and 53c set using the respective reference feature maps are placed in the original image, the extent to which the object appears in full may differ from one candidate region to another. That is, the object may appear in full in one candidate region while only part of it appears in another.

Referring back to FIG. 4, the object recognition unit 240 calculates, for each candidate region, the probability that the object is displayed in it (S407). This probability quantifies how likely it is that the object appears in full within the candidate region, and may take a value from 0 to 1. For example, if the object appears entirely within a given candidate region, the probability of that candidate region may be calculated as 1; if the object does not appear in the candidate region at all, the probability may be set to 0; and if only part of the object is displayed in the candidate region, the probability may be calculated as a value greater than 0 and less than 1 according to the displayed proportion. Taking the candidate regions of FIG. 5 as an example, the object recognition unit 240 may calculate a probability for each of the nine candidate regions.
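As a non-limiting illustration of step S407, the displayed proportion can be modeled geometrically, as sketched below. The sketch assumes that the extent of the object is available as a box and takes the probability to be the fraction of that box lying inside the candidate region. This description does not state how the probability is actually computed, and in practice it may be produced by the network itself; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection(a: Box, b: Box) -> Box:
    """Overlapping box of a and b (may be inverted when they do not overlap)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def presence_probability(object_box: Box, candidate: Box) -> float:
    """Fraction of the object that falls inside the candidate region (0 to 1)."""
    obj_area = area(object_box)
    if obj_area == 0.0:
        return 0.0
    return area(intersection(object_box, candidate)) / obj_area


obj = (0.0, 0.0, 10.0, 10.0)
print(presence_probability(obj, (0.0, 0.0, 20.0, 20.0)))    # object fully inside
print(presence_probability(obj, (5.0, 0.0, 20.0, 10.0)))    # half of the object inside
print(presence_probability(obj, (20.0, 20.0, 30.0, 30.0)))  # object outside
```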

The object recognition unit 240 filters the candidate regions by keeping in the original image those whose probability is equal to or greater than a preset threshold (for example, 0.5) and removing from the original image those whose probability is below the threshold (S409).

Next, when the filtering of the candidate regions is complete, the object recognition unit 240 determines whether any of the candidate regions overlap one another (S411) and, if overlapping candidate regions exist, determines whether their overlap ratio exceeds a preset threshold ratio (for example, 50%) (S413).

If there are a plurality of candidate regions whose overlap ratio exceeds the threshold ratio, the object recognition unit 240 sets these candidate regions as an overlap group. The object recognition unit 240 then recognizes the candidate region with the highest object display probability in the overlap group as the region where the object exists (hereinafter referred to as the 'object region'), and removes the remaining candidate regions from the original image (S415). FIG. 6 is a diagram illustrating a region in which an object has been recognized; comparing FIG. 5 with FIG. 6 shows that, among the overlapping candidate regions, the one with the highest probability is finally selected.

On the other hand, if the overlap ratio of the overlapping candidate regions does not exceed the threshold ratio, the object recognition unit 240 recognizes each of the candidate regions remaining on the screen as an object region (S419). In this case, the object recognition unit 240 may recognize a plurality of object regions in the original image, as shown in FIG. 6.
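As a non-limiting illustration, steps S409 to S419 can be sketched as a threshold filter followed by an overlap-based selection. Two assumptions are made that this description does not confirm: the overlap ratio is computed as intersection over union, and the overlap groups are formed greedily in descending order of probability, which is the conventional non-maximum suppression procedure substituted here for simplicity. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Candidate = Tuple[Box, float]            # (region, object display probability)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (assumed definition of overlap ratio)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_object_regions(candidates: List[Candidate],
                          prob_threshold: float = 0.5,
                          overlap_threshold: float = 0.5) -> List[Candidate]:
    """Filter by probability (S409), then keep the best region per overlap group."""
    survivors = sorted((c for c in candidates if c[1] >= prob_threshold),
                       key=lambda c: c[1], reverse=True)
    kept: List[Candidate] = []
    for box, prob in survivors:
        # A region is dropped when it overlaps an already kept, higher-probability
        # region by more than the threshold ratio (S413, S415); otherwise it is
        # recognized as its own object region (S419).
        if all(overlap_ratio(box, k[0]) <= overlap_threshold for k in kept):
            kept.append((box, prob))
    return kept


candidates = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),
    ((1.0, 0.0, 11.0, 10.0), 0.8),    # overlaps the first region heavily
    ((50.0, 50.0, 60.0, 60.0), 0.7),  # separate object
    ((0.0, 0.0, 10.0, 10.0), 0.3),    # below the probability threshold
]
print(select_object_regions(candidates))
```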

Once an object region is recognized, the object recognition unit 240 may analyze the object appearing in the object region to identify the state of the object (for example, standing upright, bending, or sitting), to verify the identity of the object, or to track the object.

In addition, when an object is recognized, the direction notification unit 250 may generate direction information indicating the direction of the recognized object from a reference location, and transmit this direction information to the user terminal (S417). Here, the reference location may be the location of the user terminal 110 requesting the location of the object. In one embodiment, the direction notification unit 250 may receive a message requesting the location of an object from the user terminal 110, determine the location of the user terminal 110 in cooperation with a known positioning system (not shown in the drawings), identify the direction in which the recognized object lies relative to the location of the user terminal 110, and transmit information about this direction to the user terminal 110. In this case, the direction notification unit 250 may extract from the database 260, based on the location of the user terminal 110, a floor plan of the place where the user terminal 110 is located, and transmit the direction information together with this floor plan to the user terminal 110. The direction notification unit 250 may additionally mark the location of the object on the floor plan.
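As a non-limiting illustration of step S417, direction information can be derived from two positions on a floor plan as sketched below. The sketch assumes planar floor plan coordinates with the x axis pointing east and the y axis pointing north, and expresses the direction as a bearing measured clockwise from north together with an eight-point compass label. The coordinate convention, the output format, and the function name are hypothetical and are not specified in this description.

```python
import math
from typing import Tuple

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def direction_to_object(user_xy: Tuple[float, float],
                        object_xy: Tuple[float, float]) -> Tuple[float, str]:
    """Bearing in degrees (clockwise from north) and compass label, user to object."""
    dx = object_xy[0] - user_xy[0]  # eastward offset
    dy = object_xy[1] - user_xy[1]  # northward offset
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Each compass sector spans 45 degrees and is centered on its label.
    label = COMPASS[int((bearing + 22.5) // 45) % 8]
    return bearing, label


# User terminal at the origin of the floor plan, recognized object 10 units east.
print(direction_to_object((0.0, 0.0), (10.0, 0.0)))
```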

FIG. 7 is a diagram illustrating an image in which people are recognized indoors. The image analysis system 200 according to the present invention can accurately recognize even a person captured far from the camera, and can identify the state of the person (for example, standing upright or sitting).

FIG. 8 is a diagram illustrating an example in which the image analysis system is applied to a parking management service.

Referring to FIG. 8, the image analysis system 200 may recognize a vehicle license plate as an object, and may also provide direction information 81 of the object relative to the user's location, together with an indoor floor plan, to the user terminal 110 of the user.

While this specification contains many features, such features should not be construed as limiting the scope of the invention or of the claims. Features described in separate embodiments in this specification may also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment may be implemented separately in multiple embodiments or in any suitable combination.

Although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all of the described operations be performed, to achieve a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The program components and systems described above may generally be integrated into a single software product or packaged into multiple software products.

The method of the present invention as described above may be implemented as a program and stored in a computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, and the like). Since this can be readily carried out by a person of ordinary skill in the art to which the present invention pertains, it is not described in further detail.

The present invention described above may be subject to various substitutions, modifications, and changes by a person of ordinary skill in the art to which the present invention pertains without departing from its technical spirit, and is therefore not limited by the foregoing embodiments and the accompanying drawings.

110: user terminal 120: camera
200: image analysis system 210: image receiving unit
220: screen division unit 230: feature map extractor
240: object recognition unit 250: direction notification unit
260: database 300: network

Claims

A method for identifying an object in an image analysis system, the method comprising:
extracting a plurality of reference feature maps having different sizes by performing a plurality of convolution operations in which an original image is used as an initial input value and an output value of a previous convolution operation is input as an input value of a next convolution operation;
identifying a region in which an object is displayed for each of the reference feature maps, and setting, for each reference feature map, a plurality of candidate regions having different shapes in the original image based on coordinates of the identified display region and coordinate reference data;
calculating, for each of the candidate regions, a probability that the object is displayed, and calculating an overlap ratio between the candidate regions; and
recognizing a region where the object is located in the original image based on the calculated object display probability and overlap ratio,
wherein the size of each candidate region set in the original image is determined according to the size of the reference feature map on which the object recognition is based, candidate regions based on the largest reference feature map being smaller than candidate regions based on the other reference feature maps, and candidate regions based on the smallest reference feature map being larger than candidate regions based on the other reference feature maps.

The method of claim 1,
wherein the recognizing comprises,
when the overlap ratio between the candidate regions exceeds a preset threshold ratio, recognizing the region having the highest probability among the overlapping candidate regions as the region where the object is located.

The method of claim 1,
wherein the recognizing comprises,
when there are a plurality of candidate regions having an object display probability exceeding a preset threshold and the overlap ratio of the candidate regions is equal to or less than a preset threshold ratio, recognizing the plurality of candidate regions as regions where objects are located.

The method of claim 1,
wherein the setting of the plurality of candidate regions in the original image comprises:
identifying a region in the original image that matches the region where the object is displayed in the reference feature map, and calculating center coordinates of the identified region; and
setting, for each reference feature map, a plurality of candidate regions having different shapes in the original image based on the center coordinates.

(Deleted)

The method of claim 1, further comprising,
after the recognizing:
generating direction information indicating a direction to the recognized object based on a location of a user terminal requesting the location of the object; and
providing the generated direction information to the user terminal.

The method of claim 1,
wherein the extracting of the plurality of reference feature maps comprises:
extracting a reference feature map of a minimum size by performing a plurality of convolution operations in which the original image is used as an initial input value and an output value of a previous convolution operation is input as an input value of a next convolution operation;
upsampling the feature map used as the input value of the most recently performed convolution operation, and extracting a new feature map by performing a concatenation operation between the upsampled feature map and a previously extracted feature map; and
extracting at least one reference feature map larger than the reference feature map of the minimum size by performing a convolution operation on the feature map extracted through the concatenation operation.

An image analysis system that analyzes an image to identify an object, the system comprising:
an image receiving unit that receives an original image;
a feature map extractor that extracts a plurality of reference feature maps of different sizes by performing a plurality of convolution operations in which the original image is used as an initial input value and an output value of a previous convolution operation is input as an input value of a next convolution operation; and
an object recognition unit that identifies a region in which an object is displayed for each reference feature map, sets, for each reference feature map, a plurality of candidate regions having different shapes in the original image based on coordinates of the identified display region and coordinate reference data, calculates for each candidate region a probability that the object is displayed, calculates an overlap ratio between the candidate regions, and then recognizes a region where the object is located in the original image based on the calculated object display probability and overlap ratio,
wherein the object recognition unit
determines the size of each candidate region set in the original image according to the size of the reference feature map on which the object recognition is based, setting candidate regions based on the largest reference feature map to be smaller than candidate regions based on the other reference feature maps, and setting candidate regions based on the smallest reference feature map to be larger than candidate regions based on the other reference feature maps.

The image analysis system of claim 8,
wherein the object recognition unit,
when the overlap ratio of the candidate regions exceeds a preset threshold ratio, recognizes the region having the highest probability among the overlapping candidate regions as the region where the object is located.

The image analysis system of claim 8,
wherein the object recognition unit,
when there are a plurality of candidate regions having an object display probability exceeding a preset threshold and the overlap ratio of the candidate regions is equal to or less than a preset threshold ratio, recognizes the plurality of candidate regions as regions where objects are located.

The image analysis system of claim 8,
wherein the object recognition unit
identifies a region in the original image that matches the region where the object is displayed in the reference feature map, calculates center coordinates of the identified region, and then sets, for each reference feature map, a plurality of candidate regions having different shapes in the original image based on the center coordinates.

The image analysis system of claim 8,
further comprising a direction notification unit that generates direction information indicating a direction to the recognized object based on a location of a user terminal requesting the location of the object, and provides the generated direction information to the user terminal.

The image analysis system of claim 8,
wherein the feature map extractor
extracts a reference feature map of a minimum size by performing a plurality of convolution operations in which the original image is used as an initial input value and an output value of a previous convolution operation is input as an input value of a next convolution operation, upsamples the feature map used as the input value of the most recently performed convolution operation, extracts a new feature map by performing a concatenation operation between the upsampled feature map and a previously extracted feature map, and then extracts one or more reference feature maps larger than the reference feature map of the minimum size by performing a convolution operation on the feature map extracted through the concatenation operation.