KR20180119013A

KR20180119013A - Method and apparatus for retrieving image using convolution neural network

Info

Publication number: KR20180119013A
Application number: KR1020170052427A
Authority: KR
Inventors: 백성욱; 칸 무하마드; 자밀 아마드; 이미영
Original assignee: 세종대학교산학협력단
Priority date: 2017-04-24
Filing date: 2017-04-24
Publication date: 2018-11-01
Also published as: KR101917369B1

Abstract

The present invention provides a method of operating an image retrieval apparatus that applies a convolution neural network, and the image retrieval apparatus itself. The method includes the steps of: obtaining an image frame; generating a plurality of color feature maps and a plurality of edge feature maps by convolving the image frame with a plurality of color sensitivity kernels and a plurality of edge sensitivity kernels, respectively; generating a maximum activity map for each of color and edge, based on the plurality of color feature maps and the plurality of edge feature maps, using the index of the color sensitivity kernel or the index of the edge sensitivity kernel that matches the maximum activity value at each pixel position; spatially pooling the maximum activity map for each of color and edge; and retrieving an object present in the image frame by comparing a previously stored feature value with the result of concatenating the spatially pooled values. Accordingly, the present invention can retrieve a specific object in image data more efficiently.

Description

The present invention relates to an image retrieval method and apparatus using a convolution neural network.

Many industrial settings can be made more efficient by using a machine or processor that can identify objects in a visual representation (e.g., an image). The field of computer vision seeks to provide algorithms that identify and detect objects in an image, where an object can be characterized by descriptors that identify one or more points (e.g., all pixel points, keypoints of interest, etc.). In general, object recognition may involve identifying points of interest in an image for the purpose of feature identification and/or object recognition. These points of interest need to be processed so that they are invariant to changes in image scale and/or rotation and provide robust matching across a substantial range of distortion, changes in viewpoint, and/or noise and changes in illumination. In particular, in the field of surveillance systems, which continuously collect large numbers of images, a specific object must be matched accurately and with high probability against a large database. Surveillance systems also require efficient, low-cost computation over large databases.

As one approach, the Scale Invariant Feature Transform (SIFT) algorithm reduced the cost of indexing images by classifying a queried object at an early stage based on the features of that object. However, extracting the features still requires a large amount of processing.

U.S. Patent Application Publication No. 2017/0076195 (title of invention: DISTRIBUTED NEURAL NETWORKS FOR SCALABLE REAL-TIME ANALYTICS)

The present invention is intended to solve the problems of the prior art described above. An object of one embodiment of the present invention is to provide an image retrieval system that enables efficient management and search/retrieval of image data by efficiently describing the image data using a convolution neural network. A further object of one embodiment of the present invention is to provide a high-efficiency, low-cost image retrieval system by reducing the processing load of the convolution neural network.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a first aspect of the present invention provides a method including the steps of: obtaining an image frame; generating a plurality of color feature maps and a plurality of edge feature maps by convolving the image frame with a plurality of color sensitivity kernels and a plurality of edge sensitivity kernels, respectively; generating a maximum activity map for each of color and edge, based on the plurality of color feature maps and the plurality of edge feature maps, using the index of the color sensitivity kernel or the index of the edge sensitivity kernel that matches the maximum activity value at each pixel position; spatially pooling the maximum activity map for each of color and edge; and retrieving an object present in the image frame by comparing the result of concatenating the spatially pooled values with a pre-stored feature value.

A second aspect of the present invention includes a memory storing a program for retrieving an image using a convolution neural network, and a processor that executes the program. As the program is executed, the processor obtains an image frame, inputs the obtained image frame to the convolution neural network, and retrieves an object present in the image frame by comparing the output value of the convolution neural network with a pre-stored feature value. The convolution neural network consists of: at least one convolution layer that generates a plurality of color feature maps and a plurality of edge feature maps by convolving the image frame with a plurality of color sensitivity kernels and a plurality of edge sensitivity kernels, respectively; at least one pooling layer that spatially pools the maximum activity map generated for each of color and edge, based on the plurality of color feature maps and the plurality of edge feature maps, using the index of the color sensitivity kernel or the index of the edge sensitivity kernel that matches the maximum activity value at each pixel position; and at least one fully connected layer that concatenates the pooled values.

A third aspect of the present invention provides a computer-readable recording medium on which a program for implementing the method of the first aspect is recorded.

According to the above means for solving the problems, one embodiment of the present invention describes the features of image data through a learnable convolution neural network, so that a specific object in the image data can be searched for and retrieved more efficiently. In addition, one embodiment of the present invention configures the pooling layer of the convolution neural network to perform spatial pooling using kernel indices, so that a specific object in the image data can be searched for and retrieved at low cost.

FIG. 1 is a schematic diagram showing an image retrieval system according to an embodiment of the present invention.
FIG. 2 is a diagram showing a convolution layer according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example in which a spatial maximum activity map is constructed from color sensitivity feature maps in a pooling layer according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of a color histogram according to an embodiment of the present invention.
FIG. 5 is a flowchart showing a method of operating the processor of FIG. 1 according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are denoted by similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of an image retrieval system 10 according to an embodiment of the present invention. Referring to FIG. 1, the image retrieval system 10 consists of surveillance cameras 11 (11a, 11b, ...) and an image retrieval apparatus 12. The surveillance cameras 11 and the image retrieval apparatus 12 can exchange data over a wired or wireless network.

The surveillance camera 11 processes image frames acquired through an image sensor (not shown) and provides them to the image retrieval apparatus 12. The surveillance camera 11 may be a device, such as a CCTV camera, for monitoring specific objects (e.g., people, vehicles) in various places.

The image retrieval apparatus 12 retrieves a specific object from the image frames collected by the surveillance camera 11. To this end, the image retrieval apparatus 12 includes a memory 121, a feature histogram storage unit 122, and a processor 123. The image retrieval apparatus 12 may further include a communication unit (not shown) for receiving image frames from the surveillance camera 11, a display unit (not shown) for outputting information processed by the image retrieval apparatus 12, and other components depending on the implementation.

Hereinafter, the configuration and operation of the image retrieval apparatus 12 are described in detail with reference to FIGS. 1 to 5.

First, the memory 121 may be implemented as a non-volatile memory, a volatile memory, a hard disk drive (HDD), a solid state drive (SSD), or the like, and stores a program with which the image retrieval apparatus 12 retrieves a specific object from image frames. The program may include a convolution neural network that generates an image descriptor according to an embodiment of the present invention. Here, a convolution neural network is a neural network designed to encode the characteristics of image data on the assumption that the input data is an image. Accordingly, the convolution neural network consists of a convolution layer, a pooling layer, and a fully connected layer implemented in the three dimensions of width, height, and depth (RGB channels), and each layer may be provided singly or in plurality depending on the implementation. The convolution neural network may be run under the control of the processor 123 when an image frame is input to the convolution layer. A convolution neural network according to an embodiment of the present invention is described in detail below with reference to FIGS. 2 to 4.

FIG. 2 is a diagram showing a convolution layer according to an embodiment of the present invention. The parameters of the convolution layer consist of a set 201 of learnable convolution kernels, which are convolved with the input data (i.e., the image frame) 211 to output feature maps 213. According to an embodiment of the present invention, kernels from known models such as AlexNet, ConvNet, and LeNet-5 may be used as the convolution kernel set. However, the invention is not limited thereto, and kernels used in various other convolution neural network models may be used. Depending on the embodiment, a kernel may also be referred to as a filter.

Here, according to an embodiment of the present invention, the convolution kernel set 201 is divided into color sensitivity kernels 202a and edge sensitivity kernels 202b according to the color sensitivity and edge sensitivity of each kernel. Because the standard deviation across channels reflects a kernel's sensitivity to color and the standard deviation across pixels reflects its sensitivity to edges, the color and edge sensitivity of each kernel can be calculated by Equation 1 and Equation 2 below.

In Equation 1, i (i = 1, 2, ..., I) is the index of the kernel and m is the width and height of the i-th kernel. The color sensitivity of the i-th kernel K is calculated as the sum of the standard deviations computed across the RGB channels for the kernel coefficients.

In Equation 2, the edge sensitivity of the i-th kernel K is calculated as the sum of the standard deviations computed between consecutive horizontal and vertical kernel coefficients in all color channels.
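As a hedged reconstruction based only on the verbal definitions of Equations 1 and 2, the two sensitivity scores might be written as follows. The symbols $S^{c}_{i}$, $S^{e}_{i}$, $\sigma$, and $K_i(u, v, c)$ are notation assumed here for illustration, and reading "consecutive" coefficients as horizontally and vertically adjacent pairs is likewise an assumption rather than something this text confirms.

```latex
% Color sensitivity: at each of the m x m positions, take the standard
% deviation of the three channel coefficients, then sum over positions.
S^{c}_{i} = \sum_{u=1}^{m} \sum_{v=1}^{m}
  \sigma\bigl( K_i(u,v,R),\; K_i(u,v,G),\; K_i(u,v,B) \bigr)

% Edge sensitivity: in each channel, take the standard deviation of every
% horizontally adjacent and every vertically adjacent coefficient pair.
S^{e}_{i} = \sum_{c \in \{R,G,B\}} \Bigl[
  \sum_{u=1}^{m} \sum_{v=1}^{m-1} \sigma\bigl( K_i(u,v,c),\; K_i(u,v+1,c) \bigr)
  + \sum_{u=1}^{m-1} \sum_{v=1}^{m} \sigma\bigl( K_i(u,v,c),\; K_i(u+1,v,c) \bigr)
\Bigr]
```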

For example, if the color sensitivity of a particular kernel is 2.0 or higher, the kernel may be classified as a color sensitivity kernel. Alternatively, a particular kernel may be classified as a color sensitivity kernel or an edge sensitivity kernel according to whether its color sensitivity or its edge sensitivity is relatively higher than that of the other kernels.
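The scoring and threshold rule described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, kernels are assumed to be arrays of shape (m, m, 3), "consecutive" coefficients are read as adjacent pairs, and a kernel that misses the color threshold is assumed to fall into the edge group.

```python
import numpy as np

def color_sensitivity(kernel):
    """Sum, over all m x m positions, of the std across the RGB channels."""
    # kernel has shape (m, m, 3); axis 2 holds the R, G, B coefficients.
    return float(np.std(kernel, axis=2).sum())

def edge_sensitivity(kernel):
    """Sum of the stds of horizontally and vertically adjacent coefficient pairs."""
    # The population std of a pair (a, b) equals |a - b| / 2.
    horizontal = np.abs(np.diff(kernel, axis=1)) / 2.0
    vertical = np.abs(np.diff(kernel, axis=0)) / 2.0
    return float(horizontal.sum() + vertical.sum())

def classify_kernels(kernels, color_threshold=2.0):
    """Return (color_indices, edge_indices) using the absolute-threshold rule."""
    color_idx, edge_idx = [], []
    for i, k in enumerate(kernels):
        if color_sensitivity(k) >= color_threshold:
            color_idx.append(i)
        else:
            # Assumption: everything below the threshold is treated as an edge kernel.
            edge_idx.append(i)
    return color_idx, edge_idx

# A spatially flat kernel that contrasts the red and blue channels.
color_like = np.zeros((3, 3, 3))
color_like[:, :, 0] = 1.0
color_like[:, :, 2] = -1.0

# An achromatic vertical-edge kernel (identical coefficients in every channel).
edge_like = np.zeros((3, 3, 3))
edge_like[:, 0, :] = -1.0
edge_like[:, 2, :] = 1.0

groups = classify_kernels([color_like, edge_like])
```

The relative rule mentioned in the text could be sketched similarly by comparing each kernel's two scores against those of the other kernels instead of against a fixed threshold.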

Thereafter, the input data (i.e., the input frame) 211 is convolved 203 with each convolution kernel 212a included in the color sensitivity kernels 202a to generate a plurality of color feature maps 213a, and is likewise convolved 203 with each convolution kernel 212b included in the edge sensitivity kernels 202b to generate a plurality of edge feature maps 213b. These operations may be performed in parallel or, depending on the implementation, sequentially.

The convolution layer outputs the generated color feature maps 213a and edge feature maps 213b.
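The convolution step can be sketched as follows. This is an illustrative sketch only: the function name `feature_maps` is hypothetical, the text does not specify stride, padding, bias, or activation, so none are applied, and the kernel is slid without flipping (cross-correlation), which is the usual CNN convention but an assumption here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def feature_maps(image, kernels):
    """Slide each (m, m, 3) kernel over an (H, W, 3) image in 'valid' mode.

    Returns an array of shape (num_kernels, H - m + 1, W - m + 1).
    """
    m = kernels.shape[1]
    # windows[y, x] is the (m, m, 3) patch whose top-left corner is (y, x).
    windows = sliding_window_view(image, (m, m, 3))[:, :, 0]
    # Multiply every patch with every kernel and sum the products.
    return np.einsum("yxijc,kijc->kyx", windows, kernels)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                      # stand-in for an image frame
color_kernels = rng.standard_normal((4, 3, 3, 3))  # stand-in color sensitivity kernels
edge_kernels = rng.standard_normal((5, 3, 3, 3))   # stand-in edge sensitivity kernels

color_maps = feature_maps(image, color_kernels)    # shape (4, 6, 6)
edge_maps = feature_maps(image, edge_kernels)      # shape (5, 6, 6)
```

The two calls are independent of each other, which is why they could run in parallel or sequentially.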

Next, instead of spatially pooling the maximum activity values from the color feature maps and the edge feature maps, the pooling layer constructs a maximum activity map using the kernel index corresponding to the maximum activity value at each pixel. That is, a maximum activity map according to an embodiment of the present invention is generated for each of color and edge, and each map consists of the indices of the color sensitivity kernels or of the edge sensitivity kernels, respectively. This is expressed by Equation 3 and Equation 4 below.

In Equation 3, the maximum activity value is taken at each pixel position (x, y), where x and y are natural numbers bounded by the width X and the height Y. The equation also involves the number of color sensitivity kernels and the color feature maps, and it yields the maximum activity map generated from the color feature maps.

In Equation 4, the corresponding quantities are the number of edge sensitivity kernels and the edge feature maps, and the result is the maximum activity map generated from the edge feature maps.
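As a hedged reconstruction from the verbal definitions of Equations 3 and 4, the maximum activity value and the maximum activity map might be written as follows. The symbols $A$, $M$, $F$, $N_c$, and $N_e$ are notation assumed here for illustration, standing for the maximum activity value, the maximum activity map, the feature maps, and the numbers of color and edge sensitivity kernels.

```latex
% Color: the largest response at (x, y) and the index of the kernel producing it.
A^{c}(x, y) = \max_{1 \le i \le N_c} F^{c}_{i}(x, y), \qquad
M^{c}(x, y) = \operatorname*{arg\,max}_{1 \le i \le N_c} F^{c}_{i}(x, y)

% Edge: the same construction over the edge feature maps.
A^{e}(x, y) = \max_{1 \le j \le N_e} F^{e}_{j}(x, y), \qquad
M^{e}(x, y) = \operatorname*{arg\,max}_{1 \le j \le N_e} F^{e}_{j}(x, y)
```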

FIG. 3 is a diagram showing an example in which the maximum activity map 320 is constructed from the color feature maps 310 in the pooling layer according to an embodiment of the present invention. Referring to FIG. 3, when the maximum activity value at pixel position (1,1) 311 of the color feature maps 310 corresponds to the third color sensitivity kernel, the value 3 is stored at pixel (1,1) of the maximum activity map 320. In this way, the pooling layer generates the maximum activity map for color from the color feature maps and the maximum activity map for edge from the edge feature maps.
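The FIG. 3 example can be mimicked with a small NumPy sketch. The function name and the toy values are hypothetical. Ties are broken toward the lowest kernel index because that is how `np.argmax` behaves; the text does not say how ties should be handled.

```python
import numpy as np

def max_activity_map(maps):
    """For each pixel, return the 1-based index of the kernel with the largest response."""
    # maps has shape (num_kernels, H, W); argmax runs over the kernel axis.
    return np.argmax(maps, axis=0) + 1

color_maps = np.array([
    [[0.1, 0.9], [0.2, 0.3]],   # responses of color sensitivity kernel 1
    [[0.4, 0.2], [0.8, 0.1]],   # responses of color sensitivity kernel 2
    [[0.7, 0.5], [0.6, 0.4]],   # responses of color sensitivity kernel 3
])

activity = max_activity_map(color_maps)
# The top-left pixel is won by kernel 3, so 3 is stored there, as in FIG. 3.
```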

Thereafter, the information in the maximum activity maps is collected into histograms through spatial pooling. Specifically, a color histogram is collected according to the frequency of each color sensitivity kernel in the maximum activity map for color, and an edge histogram is collected according to the frequency of each edge sensitivity kernel in the maximum activity map for edge. FIG. 4 is a diagram showing an example of a color histogram according to an embodiment of the present invention. As shown in FIG. 4, the color histogram can be expressed as a frequency value for each color sensitivity kernel k, where k ranges over the indices 1, 2, ... up to the number of color sensitivity kernels. In this way, the pooling layer according to an embodiment of the present invention performs simple pooling that discards spatial information, and can thereby extract low-dimensional features of the image.
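Collecting a maximum activity map into a per-kernel frequency histogram can be sketched as below. The function name is hypothetical and the map is assumed to hold 1-based kernel indices.

```python
import numpy as np

def kernel_histogram(activity_map, num_kernels):
    """Count how often each 1-based kernel index occurs in the map."""
    # Shift to 0-based bins; minlength keeps a bin for kernels that never win.
    return np.bincount(activity_map.ravel() - 1, minlength=num_kernels)

activity = np.array([[3, 1, 3],
                     [2, 3, 1]])

hist = kernel_histogram(activity, num_kernels=4)
```

The histogram length depends only on the number of kernels, not on the image size, which is what makes the resulting descriptor low-dimensional.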

Next, the fully connected layer concatenates the color histogram and the edge histogram. That is, the fully connected layer outputs a feature histogram as the result of fully connecting the color histogram and the edge histogram. This feature histogram functions as a descriptor that describes the features of the image.

Meanwhile, spatial pooling schemes other than the one in the above embodiment may be applied in the pooling layer. For example, depending on the implementation, the pooling layer may spatially pool the color feature maps and the edge feature maps by spatial pyramid matching, quadrant-based spatial pooling, or the like. In this case, a histogram is obtained for each region of the image frame as the output of the pooling layer, and these histograms are then concatenated to output the feature histogram.
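One possible reading of quadrant-based spatial pooling is sketched below: a separate kernel-index histogram is computed for each of the four image quadrants and the four histograms are concatenated. The function name is hypothetical, and applying the scheme to a maximum activity map of 1-based indices is an assumption, since the text does not spell out the details.

```python
import numpy as np

def quadrant_histograms(activity_map, num_kernels):
    """Concatenate one kernel-index histogram per image quadrant."""
    h, w = activity_map.shape
    rows = (slice(0, h // 2), slice(h // 2, h))
    cols = (slice(0, w // 2), slice(w // 2, w))
    # Order: top-left, top-right, bottom-left, bottom-right.
    parts = [
        np.bincount(activity_map[r, c].ravel() - 1, minlength=num_kernels)
        for r in rows for c in cols
    ]
    return np.concatenate(parts)

activity = np.array([[1, 1, 2, 2],
                     [1, 1, 2, 2],
                     [3, 3, 1, 2],
                     [3, 3, 2, 1]])

descriptor = quadrant_histograms(activity, num_kernels=3)
```

Compared with a single global histogram, this keeps coarse spatial layout at the cost of a descriptor four times as long.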

Referring again to FIG. 1, the feature histogram storage unit 122 stores, under the control of the processor 123, the feature histograms (i.e., feature values) of image frames obtained through the convolution neural network. Although FIG. 1 shows the feature histogram storage unit 122 and the memory 121 as separate components, the feature histogram storage unit 122 may be included in the memory 121. Alternatively, the feature histogram storage unit 122 may be a separate database device located outside the image retrieval apparatus 12.

The processor 123 controls the overall operation of the image retrieval apparatus 12. To this end, in addition to a central processing unit (CPU) for data processing, the processor 123 may further include a graphics processing unit (GPU) for graphics processing and a digital signal processor (DSP) for signal processing, and may be implemented as a system on chip (SoC) that integrates at least one of these.

Specifically, the processor 123 can control the memory 121 and the feature histogram storage unit 122 to detect/retrieve objects present in the image frames received from the surveillance camera 11.

Referring to FIG. 5, the processor 123 receives an image frame from the surveillance camera 11 (S510). The processor 123 then inputs the image frame to the convolution neural network described above (S520). The convolution neural network is thereby run, and the convolution layer (CONV), pooling layer (POOL), and fully connected layer (FC) described with reference to FIGS. 2 to 4 operate. Specifically, the convolution layer (CONV) convolves the image frame with the color sensitivity kernels and the edge sensitivity kernels, respectively, to generate a plurality of color feature maps and edge feature maps (S521). Next, the pooling layer (POOL) generates a maximum activity map for each of color and edge from the color feature maps and the edge feature maps, and then spatially pools each maximum activity map (S522). The information in each maximum activity map is thereby collected into a color histogram and an edge histogram.

Next, the fully connected layer (FC) concatenates the color histogram and the edge histogram and outputs the feature histogram (S523). That is, the fully connected layer outputs the feature histogram as the result of fully connecting the color histogram and the edge histogram.
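Steps S521 to S523 can be strung together in one compact sketch. This is an illustration under assumptions, not the patented implementation: the function name `describe` is hypothetical, the convolution is an unpadded, stride-1 cross-correlation, and random arrays stand in for a real image frame and trained kernels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def describe(image, color_kernels, edge_kernels):
    """Return the concatenated color and edge histogram for one (H, W, 3) frame."""
    parts = []
    for kernels in (color_kernels, edge_kernels):
        m = kernels.shape[1]
        windows = sliding_window_view(image, (m, m, 3))[:, :, 0]
        # S521: one feature map per kernel.
        maps = np.einsum("yxijc,kijc->kyx", windows, kernels)
        # S522: maximum activity map (0-based winning kernel index per pixel) ...
        winners = np.argmax(maps, axis=0)
        # ... followed by spatial pooling into a per-kernel frequency histogram.
        parts.append(np.bincount(winners.ravel(), minlength=len(kernels)))
    # S523: concatenate the color histogram and the edge histogram.
    return np.concatenate(parts)

rng = np.random.default_rng(1)
frame = rng.random((10, 10, 3))
color_kernels = rng.standard_normal((4, 3, 3, 3))
edge_kernels = rng.standard_normal((6, 3, 3, 3))

feature_histogram = describe(frame, color_kernels, edge_kernels)
```

The descriptor length equals the number of color kernels plus the number of edge kernels (10 here), and each half sums to the number of pixel positions in the feature maps.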

Thereafter, the processor 123 compares the feature histogram output from the fully connected layer with the feature histograms of image frames previously stored in the feature histogram storage unit 122, and can thereby detect/retrieve previously stored images that contain objects identical or similar to those in the image frame (S530). For example, the processor 123 may rank the previously stored image frames using a similarity or dissimilarity score between the feature histogram extracted from the image frame and the previously stored feature histograms (i.e., feature values).
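The ranking step can be sketched as follows. The text does not name a specific similarity measure, so histogram intersection on L1-normalised histograms is used here purely as an assumed example. The function name is hypothetical, and every histogram is assumed to be non-empty.

```python
import numpy as np

def rank_stored_frames(query, stored):
    """Return stored-frame indices ordered from most to least similar, with scores."""
    # L1-normalise so that frames of different sizes are comparable.
    q = query / query.sum()
    s = stored / stored.sum(axis=1, keepdims=True)
    # Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
    scores = np.minimum(q, s).sum(axis=1)
    order = np.argsort(-scores, kind="stable")
    return order, scores[order]

query = np.array([4, 0, 2, 2])
stored = np.array([
    [0, 8, 0, 0],   # no overlap with the query
    [4, 0, 2, 2],   # identical to the query
    [3, 1, 2, 2],   # close to the query
])

order, scores = rank_stored_frames(query, stored)
```

A dissimilarity score such as an L1 distance could be substituted by sorting in ascending order instead.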

Meanwhile, an embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium can be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is given by way of example, and those of ordinary skill in the art to which the invention pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: Image retrieval system
11: Camera
12: Image retrieval apparatus
121: Memory
122: Feature histogram storage unit
123: Processor

Claims

1. A method of operating an image retrieval apparatus to which a convolution neural network is applied, the method comprising:
obtaining an image frame;
generating a plurality of color feature maps and a plurality of edge feature maps by convolving the image frame with a plurality of color sensitivity kernels and a plurality of edge sensitivity kernels, respectively;
generating a maximum activity map for each of color and edge, based on the plurality of color feature maps and the plurality of edge feature maps, using the index of the color sensitivity kernel or the index of the edge sensitivity kernel that matches the maximum activity value at each pixel position;
spatially pooling the maximum activity map for each of color and edge; and
retrieving an object present in the image frame by comparing the result of concatenating the spatially pooled values with a pre-stored feature value.

2. The method of claim 1, wherein spatially pooling the maximum activity map comprises calculating a frequency for each color sensitivity kernel from the maximum activity map for color and collecting the frequencies into a color histogram, and calculating a frequency for each edge sensitivity kernel from the maximum activity map for edge and collecting the frequencies into an edge histogram.

3. The method of claim 2, wherein the concatenated result is a feature histogram obtained by fully connecting the color histogram and the edge histogram.

4. The method of claim 1, wherein the pre-stored feature value is a feature histogram of pre-stored image frames, and retrieving the object comprises ranking the pre-stored image frames according to a similarity or dissimilarity between the concatenated result and the feature histograms of the pre-stored image frames.

5. The method of claim 1, wherein the color sensitivity kernels and the edge sensitivity kernels are learnable convolution kernels classified as color sensitivity kernels or edge sensitivity kernels according to their sensitivity scores for color and edge.

6. The method of claim 5, wherein the sensitivity score for color is the sum of the standard deviations between the RGB color channels for the kernel coefficients of the convolution kernels, and the sensitivity score for edge is the sum of the standard deviations between consecutive horizontal and vertical kernel coefficients in the RGB color channels.

7. An image retrieval apparatus comprising:
a memory storing a program for retrieving an image using a convolution neural network; and
a processor for executing the program,
wherein the processor, as the program is executed, obtains an image frame, inputs the obtained image frame to the convolution neural network, and retrieves an object present in the image frame by comparing an output value of the convolution neural network with a pre-stored feature value, and
wherein the convolution neural network comprises: at least one convolution layer that convolves the image frame with a plurality of color sensitivity kernels and a plurality of edge sensitivity kernels, respectively, to generate a plurality of color feature maps and a plurality of edge feature maps; at least one pooling layer that spatially pools a maximum activity map generated for each of color and edge, based on the plurality of color feature maps and the plurality of edge feature maps, using the index of the color sensitivity kernel or the index of the edge sensitivity kernel that matches the maximum activity value at each pixel position; and at least one fully connected layer that concatenates the pooled values.

8. A computer-readable recording medium on which a program for implementing the method of any one of claims 1 to 6 is recorded.