KR101847874B1

KR101847874B1 - Image recognition method using convolution neural network and recording medium thereof

Info

Publication number: KR101847874B1
Application number: KR1020170081718A
Authority: KR
Inventors: 이광엽; 최세진; 김민철
Original assignee: 서경대학교 산학협력단
Priority date: 2017-06-28
Filing date: 2017-06-28
Publication date: 2018-05-25

Abstract

The present invention relates to an image recognition method using a convolutional neural network, and a recording medium thereof, which can increase the speed of computation for image classification by applying a hybrid form of a convolutional layer and a pooling layer. According to the present invention, the image recognition method recognizes an image displayed in N×N pixels using a convolutional neural network and includes a first process and a second process performed simultaneously in parallel with each other. The first process includes: step 1a of reading a first region consisting of n×n (here, n<N) pixels among the N×N pixels; step 1b of performing four convolution operations by applying an (n-1)×(n-1) kernel with 'Stride = 1' to the first region; step 1c of extracting a first feature by performing a pooling operation directly on the results of the convolution operations of step 1b, without storing those results in a shared memory; and step 1d of mapping the first feature to a convolution-pooling hybrid feature map.

Description

IMAGE RECOGNITION METHOD USING CONVOLUTION NEURAL NETWORK AND RECORDING MEDIUM THEREOF

The present invention relates to an image recognition method using a convolutional neural network and, more particularly, to an image recognition method using a convolutional neural network, and a recording medium therefor, that can improve the computation speed for image classification by applying a hybrid form in which a convolution layer and a pooling layer are combined.

Neural networks based on training data are already used in many fields; among them, the Convolution Neural Network (CNN) shows excellent performance in image recognition.

In embedded systems, however, resources are limited, so a convolutional neural network performs classification with pre-trained weights, yet its computation speed is still markedly slow. To improve this, GP-GPU (General-Purpose computing on Graphics Processing Units), which uses a GPU (Graphics Processing Unit) for general-purpose computation to enable parallel processing, can be used.

Because a CNN involves many simple, repetitive operations, it can be computed efficiently on a GP-GPU that processes them in parallel. Specifically, if threads and memory accesses are organized efficiently within the SIMT (Single Instruction Multiple Thread) structure of the GP-GPU, the desired result can be obtained faster than with the conventional CNN approach.

To improve CNN computation speed with a GP-GPU in this way, threads must be distributed appropriately and memory access time must be minimized. In a conventional image recognition method using a convolutional neural network, however, the entire area of the image to be recognized is read as a single input, the kernel is moved sequentially over this entire area while convolution operations are performed, the output values are mapped to a convolution feature map, and a pooling operation is then performed on that map; this process is repeated many times to finally produce an n×n (e.g., 4×4) pooling layer. Consequently, the conventional method had to store each convolution result in shared memory, and the number of memory accesses grew with the number of convolution feature maps, which limited how much the computation speed could be improved.

The present invention has been devised to solve the above problems. Its object is to provide an image recognition method using a convolutional neural network, and a recording medium therefor, that improve thread distribution and the memory access scheme within the SIMT structure of a GP-GPU, thereby reducing the number of memory accesses and minimizing the amount of thread computation compared with the conventional approach, and thus improving the computation speed for image classification.

To achieve the above object, the image recognition method according to the present invention is a method for recognizing an image displayed in N×N pixels using a convolutional neural network, and includes a first process and a second process that are performed simultaneously and in parallel with each other.

The first process includes: step 1a of reading, as input, a first region consisting of n×n (where n<N) pixels among the N×N pixels; step 1b of performing four convolution operations by applying an (n-1)×(n-1) kernel with 'Stride = 1' to the first region; step 1c of extracting a first feature by performing a pooling operation directly on the convolution results of step 1b, without storing those results in shared memory; and step 1d of mapping the first feature to a convolution-pooling hybrid feature map.

The second process includes: step 2a of reading, as input, a second region consisting of n×n pixels that is shifted by 'Stride = m' relative to the first region; step 2b of performing four convolution operations by applying an (n-1)×(n-1) kernel with 'Stride = 1' to the second region; step 2c of extracting a second feature by performing a pooling operation directly on the convolution results of step 2b, without storing those results in shared memory; and step 2d of mapping the second feature to the convolution-pooling hybrid feature map.

According to the image recognition method using a convolutional neural network of the present invention, each recognition target image divided from the initial input (i.e., the whole image) is further divided into a plurality of regions (i.e., first to M-th regions), and each of these regions is read as an independent input on which the convolution-pooling hybrid operation is performed in a parallel structure.

As a result, the step of storing the convolution feature map in memory, which the conventional processing scheme necessarily entails, can be omitted. The number of memory accesses is reduced by the number of conventional convolution feature maps, which minimizes memory access time, and threads can be distributed efficiently, which minimizes the amount of thread computation.

Ultimately, the desired result can be output faster than with the conventional image recognition method.

FIG. 1 is a processing flowchart showing the process of generating a convolution-pooling hybrid layer according to the present invention.
FIG. 2 is a diagram illustrating the process of extracting a first feature from a first region by applying the convolution-pooling hybrid operation to the first region.
FIG. 3 is a diagram illustrating the process of dividing a recognition target image into a plurality of regions and processing them in a parallel structure to generate a convolution-pooling hybrid feature map.
FIG. 4 is a diagram for explaining the nonlinear activation function used in the convolutional neural network of the present invention.
FIG. 5 is a diagram for explaining the pooling operation of the present invention.
FIG. 6 is a diagram showing the CNN structure of the image recognition method using a convolutional neural network according to the present invention.
FIG. 7 is a diagram showing the CNN structure of a conventional image recognition method using a convolutional neural network.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, in this specification, "on or above" means being located above or below the target portion, and does not necessarily mean being located on the upper side with respect to the direction of gravity. Furthermore, when a portion such as a region or plate is said to be "on or above" another portion, this includes not only the case where it is in contact with, or spaced apart from, the position "directly on or above" the other portion, but also the case where yet another portion lies in between.

In addition, in this specification, when one component is said to be "connected" or "coupled" to another component, the one component may be directly connected or directly coupled to the other component, but it should be understood that, unless specifically stated otherwise, it may also be connected or coupled via yet another component in between.

In addition, in this specification, terms such as first and second may be used to describe various components, but the components should not be limited by those terms. The terms are used only to distinguish one component from another.

Hereinafter, preferred embodiments, advantages, and features of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a processing flowchart showing the process of generating a convolution-pooling hybrid layer according to the present invention. Referring to FIG. 1, in the image recognition method using a Convolution Neural Network (CNN) according to the present invention, a plurality of regions allocated in the recognition target image according to a predetermined interval (stride) (S1a) are processed simultaneously in parallel (S1b), the extracted features (S1c) are mapped to a convolution-pooling hybrid feature map (S1d), and, by using a plurality of kernels in this process, a convolution-pooling hybrid layer consisting of a plurality of convolution-pooling hybrid feature maps is generated (S1e). Here, "processing a plurality of regions" means processing them with a scheme that combines the convolution operation and the pooling operation, which is hereinafter referred to as the 'convolution-pooling hybrid operation'.

The convolution-pooling hybrid feature map generated in this way is in turn divided into a plurality of regions allocated according to a predetermined interval (stride), and these regions are processed simultaneously in parallel to generate a higher-level convolution-pooling hybrid feature map. This process is repeated a number of times, and the last layer is configured to output a fully connected layer.

With the convolution-pooling hybrid layer scheme proposed in the present invention, the step of storing the convolution feature map in memory, which the conventional processing scheme necessarily entails, can be omitted. The number of memory accesses is thereby reduced by the number of conventional convolution feature maps, and threads can be distributed efficiently, so that the computation speed for image classification is greatly improved.

FIG. 2 illustrates the process of extracting a first feature from a first region by applying the convolution-pooling hybrid operation to the first region, and FIG. 3 illustrates the process of dividing the recognition target image into a plurality of regions and processing them in a parallel structure to generate a convolution-pooling hybrid feature map.

Referring to FIGS. 2 and 3, one convolution-pooling hybrid feature map (40) (hereinafter, the 'first convolution-pooling hybrid feature map') is generated through a first process, a second process, a third process, ..., and an M-th process that are performed simultaneously and in parallel with one another. The first to M-th processes all follow the same processing scheme and differ only in the input region they process.

The example of FIG. 2 explains the processing operation of any one of the first to M-th processes (assumed here to be the 'first process'), and shows the process of extracting, through the convolution-pooling hybrid operation on the recognition target image, the feature to be mapped to the convolution-pooling hybrid feature map (40).

Specifically, the first process is a process for generating the first convolution-pooling hybrid feature map (40) and includes a region input step (step 1a), a convolution step (step 1b), a feature extraction step (step 1c), and a feature map mapping step (step 1d).

Step 1a reads, as input, a first region (21) of n×n (where n<N) pixels from the recognition target image (10) (Input) of N×N pixels. In the example of FIG. 2, 'N×N' for the recognition target image (10) is '10×10' pixels, and 'n×n' for the first region (21) is the '6×6' pixels at the upper left of the recognition target image (10).

Step 1b performs convolution operations on the first region (21) by applying the (n-1)×(n-1) kernel (30) with 'Stride = 1'. In the example of FIG. 2, '(n-1)×(n-1)' for the kernel (30) (filter) is '5×5' since n = 6, and applying this '5×5' kernel (30) to the first region (21) with 'Stride = 1' results in a total of four convolution operations.
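The count of four follows from the usual output-size arithmetic for a sliding window without padding. The following is a minimal sketch of that arithmetic; the function name `conv_positions` is illustrative and does not appear in the patent.

```python
def conv_positions(region_size: int, kernel_size: int, stride: int = 1) -> int:
    """Number of kernel placements along one axis (no padding assumed)."""
    if kernel_size > region_size:
        raise ValueError("kernel must fit inside the region")
    return (region_size - kernel_size) // stride + 1


n = 6                                      # side of the first region in the FIG. 2 example
k = n - 1                                  # side of the (n-1) x (n-1) kernel
per_axis = conv_positions(n, k, stride=1)  # placements along one axis
total = per_axis * per_axis                # placements over the whole region
```

With n = 6 the kernel fits twice along each axis, which gives the 2 × 2 = 4 placements described in the text.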

For the first convolution operation (hereinafter, the first convolution), if, in signal-processing terms, the input is f(t) and the kernel is g(t), then g(t) is reversed and the overlapping portion of the input and the kernel is computed. That is, in the portion of the 6×6 first region (21) that overlaps the kernel positioned at its upper left, the values at each corresponding element position are multiplied and all the products are summed (Summation of Products, SOP), and the result is output.

The second convolution operation (hereinafter, the second convolution) moves the kernel (30) used in the first convolution one cell to the right, so that the kernel (30) is positioned at the upper right; in the portion where the first region (21) and the kernel (30) overlap, the values at each element position are multiplied and all the products are summed, and the result is output.

The third convolution operation (hereinafter, the third convolution) moves the kernel (30) used in the second convolution one cell down, so that the kernel is positioned at the lower right; in the portion where the first region (21) and the kernel (30) overlap, the values at each element position are multiplied and all the products are summed, and the result is output.

The fourth convolution operation (hereinafter, the fourth convolution) moves the kernel (30) used in the third convolution one cell to the left, so that the kernel is positioned at the lower left; in the portion where the first region (21) and the kernel (30) overlap, the values at each element position are multiplied and all the products are summed, and the result is output.
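The four sum-of-products operations can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names and the sample data are hypothetical, and the kernel is applied without reversal, as CNN implementations commonly do.

```python
def sum_of_products(region, kernel, top, left):
    """Multiply the kernel with the overlapping part of the region and sum."""
    k = len(kernel)
    return sum(
        region[top + i][left + j] * kernel[i][j]
        for i in range(k)
        for j in range(k)
    )


def four_convolutions(region, kernel):
    """Return the four outputs in the order the text visits the placements."""
    # (row offset, column offset): upper left, upper right, lower right, lower left
    order = [(0, 0), (0, 1), (1, 1), (1, 0)]
    return [sum_of_products(region, kernel, r, c) for r, c in order]


# Illustrative data only: a 6x6 region whose value is its row-major index,
# and a 5x5 kernel that picks out the element under its own top-left cell.
region = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1 if (i, j) == (0, 0) else 0 for j in range(5)] for i in range(5)]
outputs = four_convolutions(region, kernel)
```

Because the sample kernel has a single non-zero weight, each output is simply the region value under the kernel's top-left cell, which makes the placement order easy to check by hand.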

The data output by each of the convolution operations described above is then transformed by an activation function into nonlinear form. A nonlinear function is used because, without such an activation function, a neural network degenerates into a shallow one no matter how deep it is made. For example, assume a simple neural network with no bias; treating the hidden layer as a matrix computation gives "H=WX, Y=WH" (with W denoting the weight matrix of the respective layer). The output is thus the input multiplied by the input-to-hidden weights and then by the hidden-to-output weights, which is equivalent to multiplying the input by a single matrix. Consequently, even a deeper neural network has only the effect of a shallow one, as in FIG. 4(b). Adding many parameters would then only slow the network down without improving its performance; for this reason, the activation function is preferably a nonlinear function, so that the network does not reduce to a linear function.
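The collapse of stacked linear layers into a single matrix can be checked numerically. The sketch below is illustrative only: the weight values are arbitrary, and ReLU is used as one example of a nonlinear activation, since the text here does not name the specific function shown in FIG. 4.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def relu(m):
    """Element-wise max(0, x), used here as an example nonlinear activation."""
    return [[max(0, v) for v in row] for row in m]


w_hidden = [[1, -2], [3, 1]]   # input -> hidden weights (illustrative values)
w_output = [[2, 1], [-1, 1]]   # hidden -> output weights (illustrative values)
x = [[1], [2]]                 # input column vector

two_layers = matmul(w_output, matmul(w_hidden, x))        # Y = W_out (W_hid X)
one_layer = matmul(matmul(w_output, w_hidden), x)         # Y = (W_out W_hid) X
with_relu = matmul(w_output, relu(matmul(w_hidden, x)))   # nonlinearity in between
```

Here `two_layers` and `one_layer` agree, so the two-layer linear network is no more expressive than a single matrix, whereas `with_relu` differs because the activation prevents the two weight matrices from being merged.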

Step 1c extracts the first feature by performing a pooling operation directly on the convolution results of step 1b, without storing those results in shared memory.

The outputs obtained after the convolution operations of step 1b and the activation function are not much smaller than the input. To reduce their size, a pooling (or subsampling) operation is performed.

As shown in FIG. 5, the pooling operation takes four values and selects a value according to a particular criterion. In other words, after a higher level of abstracted information has been extracted through the convolution operation, pooling compresses and summarizes that information so that only its most important part remains.

Here, the pooling window is preferably 2×2, which compresses the data to 1/4 of its size as in FIG. 5, but it is not necessarily limited to this. The stride is preferably 'Stride = 2' as in FIG. 5, but is likewise not limited to this. The pooling operation preferably uses Max Pooling, which outputs the maximum value, but is not limited to this either; for example, Average Pooling, which outputs the mean value, may of course be applied.
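A minimal sketch of 2×2 pooling with 'Stride = 2', covering both the Max Pooling and Average Pooling variants mentioned above, might look as follows. The function name `pool2x2` and the input values are illustrative and are not taken from FIG. 5.

```python
def pool2x2(feature_map, mode="max"):
    """2x2 pooling with stride 2 over a square map with an even side length."""
    size = len(feature_map)
    pooled = []
    for r in range(0, size, 2):
        row = []
        for c in range(0, size, 2):
            window = [feature_map[r + i][c + j] for i in range(2) for j in range(2)]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / 4)
            else:
                raise ValueError("mode must be 'max' or 'average'")
        pooled.append(row)
    return pooled


# Illustrative 4x4 input.
feature_map = [
    [1, 3, 2, 0],
    [5, 4, 1, 1],
    [0, 2, 9, 6],
    [1, 1, 7, 8],
]
max_pooled = pool2x2(feature_map, "max")
avg_pooled = pool2x2(feature_map, "average")
```

Either mode reduces the 4×4 input to 2×2, that is, to 1/4 of the original number of values.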

Because the pooling operation does not multiply by kernel values, it has no parameters to learn; because it outputs as many channels as it receives, the number of channels does not change; and it reduces the number of values that must be computed subsequently.

Referring to FIG. 2, Max Pooling is applied to the outputs of the first, second, third, and fourth convolutions of step 1b to extract a feature (hereinafter, the 'first feature'). In the example of FIG. 2, the output of the fourth convolution (i.e., the '8' in the purple area) is the first feature. The first feature extracted in this way is stored in shared memory and mapped to the feature map (step 1d), forming part of the first convolution-pooling hybrid feature map (40) of the present invention.
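Steps 1b and 1c can be sketched together as a single fused routine that keeps only a running maximum. This is an illustration under assumptions: the function name is hypothetical, the activation function is omitted for brevity, and the sample values are chosen so that the lower-left placement wins with the value 8, loosely mirroring FIG. 2, whose actual pixel values are not given in the text.

```python
def fused_conv_maxpool(region, kernel):
    """Convolve at the four kernel placements and keep only the running maximum."""
    k = len(kernel)
    placements = len(region) - k + 1   # 2 per axis for a 6x6 region and 5x5 kernel
    best = None
    for top in range(placements):
        for left in range(placements):
            value = sum(
                region[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            # The convolution output lives only in a local variable; no
            # intermediate feature map is written before pooling.
            if best is None or value > best:
                best = value
    return best


# Illustrative data: only one cell of the 6x6 region is non-zero.
region = [[0] * 6 for _ in range(6)]
region[1][0] = 8
kernel = [[1 if (i, j) == (0, 0) else 0 for j in range(5)] for i in range(5)]
first_feature = fused_conv_maxpool(region, kernel)
```

Only `first_feature` would need to be written out to the feature map; the four convolution outputs themselves are discarded as soon as they have been compared.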

The second process of the present invention is likewise a process for generating the first convolution-pooling hybrid feature map (40). It includes a region input step (step 2a), a convolution step (step 2b), a feature extraction step (step 2c), and a feature map mapping step (step 2d), and is performed in a parallel structure independent of the first process described above.

Step 2a reads, as input, a second region (22) of n×n pixels that is shifted by 'Stride = m' relative to the first region (21) of the recognition target image (10) used in the first process. In the example of FIGS. 2 and 3, 'N×N' for the recognition target image (10) is '10×10' pixels, 'n×n' for the first region (21) is the '6×6' pixels at the upper left of the recognition target image (10), and the stride is 'Stride = 2'. The second region (22) is therefore the region shifted two cells to the right of the first region (21), that is, the '6×6' pixels at the top center of the recognition target image (10).

Step 2b performs convolution operations on the second region (22) by applying the same kernel as in the first process, that is, the (n-1)×(n-1) kernel (30) with 'Stride = 1'. In the example of FIG. 2, the '5×5' kernel (30) with 'Stride = 1' is applied to the second region (22), for a total of four convolution operations. The four convolution operations of the second process are carried out in the same way as those of the first process, so a detailed description is omitted.

Step 2c extracts another feature, different from the first feature (hereinafter, the 'second feature'), by performing a pooling operation directly on the convolution results of step 2b, without storing those results in shared memory. The pooling operation of step 2c is processed in the same way as that of step 1c of the first process, so a detailed description is omitted.

In the example of FIG. 3, the red box in FIG. 3(b) is the second region (22), and performing the second process described above on this second region (22) extracts the second feature (the '9' in the red area of FIG. 3). The second feature extracted in this way is mapped to the feature map (step 2d), forming part of the first convolution-pooling hybrid feature map (40) shown in FIG. 3(f).

The third process of the present invention is also a process for generating the first convolution-pooling hybrid feature map (40) and includes a region input step, a convolution step, a feature extraction step, and a feature map mapping step, each carried out in the same way as the corresponding step of the first process. The difference is that the third process reads, as input, a third region (23) of n×n pixels shifted by 'Stride = m' relative to the second region (22) of the second process, and extracts a feature from it through the convolution steps described above. In the example of FIG. 3, the stride is 'Stride = 2'. The third region (23) is therefore the region shifted two cells to the right of the second region (22), that is, the '6×6' pixels at the upper right of the recognition target image (10).

The (n-1)×(n-1) kernel (30) with 'Stride = 1' (e.g., the '5×5' kernel of FIG. 2) is applied to this third region (23) for a total of four convolution operations, and a pooling operation is then performed directly on the convolution results to extract another feature, different from the second feature (hereinafter, the 'third feature'). The third feature extracted in this way (the '5' in the green area of FIG. 3) is mapped to the feature map, forming part of the convolution-pooling hybrid feature map (40) shown in FIG. 3(f).

Like the third process, the fourth process of the present invention includes a region input step, a convolution step, a feature extraction step, and a feature map mapping step, each carried out in the same way as the corresponding step of the first process. The difference is that the fourth process reads, as input, a fourth region (24) of n×n pixels shifted by 'Stride = m' relative to the third region (23) of the third process, and then performs the convolution operations and feature extraction on this fourth region (24) using the same kernel (30) as the first process. The fourth feature extracted through the fourth process (the '6' in the purple area of FIG. 3) is mapped to the feature map, forming part of the first convolution-pooling hybrid feature map (40) shown in FIG. 3(f).

The M-th process of the present invention processes the last region (that is, the M-th region described below) used to generate the first convolution-pooling hybrid feature map (40). It is the same as the first to fourth processes described above, except that it reads as input an M-th region of n×n pixels shifted by 'Stride = m' relative to the (M-1)-th region, and then performs the convolution operation and feature extraction on this M-th region using the same kernel (30) as the first process. Here, the '(M-1)-th region' is the region assigned to the process immediately preceding the M-th process. For example, if the M-th process is the ninth process (M = 9), the (M-1)-th region is the region assigned to the eighth process, that is, the eighth region, and the M-th region is the ninth region.

That is, the M-th region is the last of the multiple processing regions into which the recognition target image (10) is divided according to 'Stride = m'. For example, for a 10×10 recognition target image (10) as in FIG. 3 and 'Stride = 2', it may be the lower-right region among the nine regions of the 10×10 recognition target image (10). In the example of FIG. 3, the M-th process is therefore the ninth process.
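The region count in the FIG. 3 example can be checked with a short sketch. The function name `region_origins` and the row-major ordering of regions (left to right, then top to bottom) are assumptions made for illustration; the text only fixes the first regions along the top row and the last region at the lower right.

```python
def region_origins(N, n, m):
    """Top-left corners of the n x n regions obtained by shifting by stride m."""
    steps = range(0, N - n + 1, m)
    # Row-major order is assumed here: shift right first, then move down.
    return [(top, left) for top in steps for left in steps]


# FIG. 3 example: 10x10 image, 6x6 regions, 'Stride = 2'.
origins = region_origins(10, 6, 2)
M = len(origins)           # number of regions, hence number of processes
third_region = origins[2]  # upper-right region under the assumed ordering
last_region = origins[-1]  # lower-right region, handled by the M-th process
```

Under these assumptions each side of the image yields (10 - 6) / 2 + 1 = 3 positions, so M = 3 × 3 = 9, which agrees with the ninth process named in the text.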

A conventional image recognition method using a convolutional neural network reads the entire area of the recognition target image (10) as a single input, moves the kernel (30) sequentially over this area at a predetermined interval (Stride = n) while performing convolution operations, and maps and stores the outputs in a convolution feature map. As a result, each convolution result had to be stored in shared memory, and the number of memory accesses grew with the number of convolution feature maps, which limited the computation speed.

In contrast to the conventional approach of reading the entire area of the recognition target image (10) as a single input, the image recognition method using a convolutional neural network of the present invention described above divides the recognition target image (10) into multiple regions (that is, the first to M-th regions described above), reads each divided region as an independent input, and performs convolution-pooling hybrid operations in a parallel structure.

In other words, the image recognition method using a convolutional neural network of the present invention applies the (n-1)×(n-1) kernel (30) of 'Stride = 1' to each of the multiple regions divided from the recognition target image (10), performs the convolution-pooling hybrid operations simultaneously and in parallel, and maps the features extracted almost simultaneously from the regions to a feature map, thereby generating a single convolution-pooling hybrid feature map (40).
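As a rough check of the scheme described above, the sketch below builds the hybrid feature map region by region and compares it with a conventional convolution followed by 2×2 max pooling. It assumes max pooling, 'Stride = 2' between regions, and no activation function; the names `conventional` and `hybrid`, the test image, and the kernel values are hypothetical. The regions are visited in a plain loop here, although each one is independent and could be handled in parallel.

```python
def conv_at(image, kernel, top, left):
    """Sum of element-wise products of the kernel and the window at (top, left)."""
    k = len(kernel)
    return sum(
        image[top + i][left + j] * kernel[i][j]
        for i in range(k)
        for j in range(k)
    )


def conventional(image, kernel):
    """Conventional path: build the full convolution map, then 2x2 max pooling."""
    side = len(image) - len(kernel) + 1
    # The whole convolution feature map is materialised before pooling.
    conv_map = [[conv_at(image, kernel, r, c) for c in range(side)]
                for r in range(side)]
    return [
        [max(conv_map[r][c], conv_map[r][c + 1],
             conv_map[r + 1][c], conv_map[r + 1][c + 1])
         for c in range(0, side - 1, 2)]
        for r in range(0, side - 1, 2)
    ]


def hybrid(image, kernel, m=2):
    """Hybrid path: per region, four convolutions pooled at once; no convolution map."""
    n = len(kernel) + 1  # region side for an (n-1) x (n-1) kernel
    steps = range(0, len(image) - n + 1, m)
    return [
        [max(conv_at(image, kernel, top + dr, left + dc)
             for dr in (0, 1) for dc in (0, 1))
         for left in steps]
        for top in steps
    ]


# Hypothetical 10x10 image and 5x5 kernel, for illustration only.
image = [[(7 * r + 3 * c) % 11 for c in range(10)] for r in range(10)]
kernel = [[i - j for j in range(5)] for i in range(5)]
same = hybrid(image, kernel) == conventional(image, kernel)
```

With these assumptions the two paths produce the same 3×3 map, because the four kernel positions inside each 6×6 region are exactly one 2×2 pooling window of the full convolution map.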

Accordingly, the step of storing the convolution feature map in memory, which is unavoidable in the conventional approach, can be omitted. The number of memory accesses is thereby reduced by the number of conventional convolution feature maps, which improves the computation speed for image classification.
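To give a sense of scale, the sketch below counts how many intermediate convolution values the conventional path would write and how many values the hybrid feature maps hold. It is a value count under stated assumptions (28×28 input, 5×5 kernel, 6×6 regions, 'Stride = 2', 8 kernels), not a measurement of memory accesses, and the function names are hypothetical.

```python
def conv_map_values(N, k, num_kernels):
    """Values written to intermediate convolution feature maps (conventional path)."""
    side = N - k + 1
    return num_kernels * side * side


def hybrid_map_values(N, n, m, num_kernels):
    """Values written to the hybrid feature maps, the only maps the hybrid path stores."""
    side = (N - n) // m + 1
    return num_kernels * side * side


skipped = conv_map_values(28, 5, 8)       # 8 maps of 24 x 24 values
written = hybrid_map_values(28, 6, 2, 8)  # 8 maps of 12 x 12 values
```

Under these assumptions the hybrid path avoids writing 4,608 intermediate values and writes only the 1,152 pooled features.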

Meanwhile, the first to M-th processes of the image recognition method using a convolutional neural network of the present invention can be processed in a parallel structure using a GP-GPU (General-Purpose computing on Graphics Processing Units).

In this case, in the SIMT (Single Instruction Multiple Thread) structure of the GP-GPU, the first process may be assigned to and run on one of multiple threads (hereinafter, the first thread), the second process may be assigned to and run on another thread distinct from the first thread (hereinafter, the second thread), and the M-th process may likewise be assigned to and run on yet another thread distinct from the preceding threads.
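The one-process-per-thread assignment can be pictured with a CPU-side analogy. The sketch below uses Python's `ThreadPoolExecutor` purely for illustration; it is not a GP-GPU kernel and does not model SIMT execution. The mapping from thread index to region in `origin_for_thread` assumes row-major ordering, and all names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def origin_for_thread(thread_id, regions_per_row, m):
    """Map a thread index to the top-left corner of its region (row-major, assumed)."""
    return (thread_id // regions_per_row) * m, (thread_id % regions_per_row) * m


def run_process(image, kernel, thread_id, regions_per_row, m):
    """One process: read a region, convolve four times, max-pool, return the feature."""
    top, left = origin_for_thread(thread_id, regions_per_row, m)
    k = len(kernel)
    results = [
        sum(image[top + dr + i][left + dc + j] * kernel[i][j]
            for i in range(k) for j in range(k))
        for dr in (0, 1) for dc in (0, 1)
    ]
    return max(results)


# FIG. 3 sizes with hypothetical pixel and kernel values.
N, n, m = 10, 6, 2
image = [[(r + c) % 4 for c in range(N)] for r in range(N)]
kernel = [[1] * (n - 1) for _ in range(n - 1)]
per_row = (N - n) // m + 1
M = per_row * per_row

# One worker per process; map() returns the features in thread-index order.
with ThreadPoolExecutor(max_workers=M) as pool:
    features = list(
        pool.map(lambda t: run_process(image, kernel, t, per_row, m), range(M))
    )

feature_map = [features[r * per_row:(r + 1) * per_row] for r in range(per_row)]
```

Because each process reads only its own region and writes only its own feature, the workers need no synchronization beyond collecting the results.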

Accordingly, in addition to the reduction in memory accesses described above (that is, minimized memory access time), a highly efficient distribution of threads can be achieved, so that the desired result can be output faster than with the conventional image recognition method.

In other words, by dividing the recognition target image (10) into multiple regions and applying a hybrid of the convolution layer and the pooling layer to each of these regions, the amount of computation per thread can be minimized.

FIG. 6 shows the CNN structure of the image recognition method using a convolutional neural network according to the present invention, illustrating how the fully connected layer is output on the basis of the convolution-pooling hybrid layers of the present invention.

Referring to the example of FIG. 6, the image recognition method using a convolutional neural network according to the present invention applies eight kernels to a 28×28 recognition target image (10) and, for each kernel (30), performs the parallel operations of the first to M-th processes described above, thereby generating a first convolution-pooling hybrid layer composed of 12×12 first convolution-pooling hybrid feature maps (40).

For each convolution-pooling hybrid feature map of the first convolution-pooling hybrid layer thus generated, the parallel operations of the first to M-th processes are performed again to extract new features, which are mapped to feature maps, thereby generating a second convolution-pooling hybrid layer composed of 4×4 second convolution-pooling hybrid feature maps.

Depending on the pixel size of the recognition target image (10), the same processing scheme as the first to M-th processes is repeated as many more times as needed, finally producing a convolution-pooling hybrid layer composed of 4×4 convolution-pooling hybrid feature maps. For reference, in the example of FIG. 6, the second convolution-pooling hybrid layer is this 4×4 convolution-pooling hybrid layer.

Once the 4×4 convolution-pooling hybrid layer is generated, a fully connected layer that is fully connected to all neurons of the convolution-pooling hybrid layer is output, and it is then determined which class the output value of the fully connected layer corresponds to. In the example of FIG. 6, the image is finally classified by training a neural network with 16×4×4 (= 256) inputs.
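The layer sizes in the FIG. 6 example follow from simple arithmetic, sketched below. The region size n = 6 and 'Stride = 2' are assumptions carried over from the FIG. 3 example (they reproduce the 28 → 12 → 4 sizes stated above); the count of 16 feature maps in the second layer is taken from the 16×4×4 figure in the text. The function name is hypothetical.

```python
def hybrid_output_size(N, n=6, m=2):
    """Side length of the hybrid feature map for an N x N input.

    n (region side) and m (stride between regions) are assumed values.
    """
    return (N - n) // m + 1


size1 = hybrid_output_size(28)     # first convolution-pooling hybrid layer
size2 = hybrid_output_size(size1)  # second convolution-pooling hybrid layer
fc_inputs = 16 * size2 * size2     # inputs to the fully connected layer
```

Under these assumptions the sides are 12 and 4, and the fully connected layer receives 256 inputs.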

FIG. 7 shows the CNN structure of a conventional image recognition method using a convolutional neural network, illustrating how the fully connected layer is output on the basis of conventional convolution layers and pooling layers.

Referring to FIG. 7, the conventional image recognition method using a convolutional neural network reads the entire area of the recognition target image (10) as a single input, moves the kernel (30) sequentially over this entire area while performing convolution operations, maps the outputs to a convolution feature map, and then performs a pooling operation on that map. This sequence is repeated several times to finally produce a 4×4 pooling layer.

Consequently, the conventional image recognition method using a convolutional neural network had to store each convolution result in shared memory, and the number of memory accesses ultimately grew with the number of convolution feature maps, which limited the computation speed.

Experimental Example 1 below compares the computation speed of a conventional CNN with that of a CNN based on the convolution-pooling hybrid layer of the present invention.

Experimental Example 1

In Experimental Example 1, the feed-forward computation speed was verified using only the feature extraction part of a CNN structure for handwriting recognition, with a single 28×28 MNIST [5] input image and pre-trained weights. For the experiment, a GP-GPU on a Xilinx VC707 FPGA, with 16 warps and 16 threads (256 threads in total), was used. For reference, the feature extraction part in Experimental Example 1 refers to the convolution layers and pooling layers in the case of the conventional CNN, and to the convolution-pooling hybrid layers in the case of the CNN of the present invention.

The test results of Experimental Example 1, namely the cumulative computation time (µs) of the convolutional neural network (CNN), are summarized in Table 1 below.

Table 1. Cumulative CNN computation time (µs)

Conventional feature extraction part | Conventional CNN | Feature extraction part of the present invention | CNN of the present invention
Convolution layer 1 | 1,180 | Convolution-pooling hybrid layer 1 | 1,426
Pooling layer 1 | 2,180 | |
Convolution layer 2 | 5,362 | Convolution-pooling hybrid layer 2 | 4,686
Pooling layer 2 | 5,610 | |

Each convolution-pooling hybrid layer spans the corresponding convolution layer and pooling layer rows.

As Table 1 shows, applying the convolution-pooling hybrid layer according to the present invention reduces the number of memory accesses by the number of feature maps of the conventional convolution layer and allows threads to be distributed efficiently, which improves the computation speed of the convolutional neural network. Specifically, according to Table 1 of Experimental Example 1, the cumulative computation time of the convolutional neural network according to the present invention is 924 µs shorter than that of the conventional network, a reduction of about 16.47%.
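The figures quoted above follow directly from the final cumulative times in Table 1, as the short check below shows. The variable names are illustrative only.

```python
# Final cumulative times from Table 1, in microseconds.
conventional_us = 5610  # after pooling layer 2
hybrid_us = 4686        # after convolution-pooling hybrid layer 2

saved_us = conventional_us - hybrid_us
reduction_pct = 100 * saved_us / conventional_us

print(f"{saved_us} us saved, {reduction_pct:.2f}% reduction")
```

The reduction is expressed relative to the conventional CNN's cumulative time.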

Meanwhile, the image recognition method using a convolutional neural network of the present invention described and illustrated above may be provided in the form of a recording medium, readable by an electric or electronic device such as a computer, on which is recorded a program for causing that device to execute each of the processes (the first to M-th processes and so on) and steps described above.

Although preferred embodiments of the present invention have been described and illustrated above using specific terms, those terms serve only to describe the present invention clearly, and it is evident that various changes and modifications may be made to the embodiments and the terms used without departing from the technical spirit and scope of the following claims. Such modified embodiments should not be understood separately from the spirit and scope of the present invention, and should be regarded as falling within the claims of the present invention.

10: recognition target image
21: first region
22: second region
23: third region
24: fourth region
25: fifth region
30: kernel
40: convolution-pooling hybrid feature map

Claims

A method for recognizing an image represented by N×N pixels using a convolutional neural network, the method comprising a first process and a second process that are performed simultaneously and in parallel with each other,
wherein the first process comprises:
step 1a of reading, as an input, a first region of n×n pixels (where n < N) among the N×N pixels;
step 1b of performing four convolution operations by applying an (n-1)×(n-1) kernel of 'Stride = 1' to the first region;
step 1c of extracting a first feature by performing a pooling operation immediately on the convolution results of step 1b, without storing those results in a shared memory; and
step 1d of mapping the first feature to a convolution-pooling hybrid feature map,
and wherein the second process comprises:
step 2a of reading, as an input, a second region of n×n pixels shifted by 'Stride = m' relative to the first region;
step 2b of performing four convolution operations by applying the (n-1)×(n-1) kernel of 'Stride = 1' to the second region;
step 2c of extracting a second feature by performing a pooling operation immediately on the convolution results of step 2b, without storing those results in the shared memory; and
step 2d of mapping the second feature to the convolution-pooling hybrid feature map.

The method according to claim 1,
further comprising a third process, a fourth process, ..., and an M-th process that are performed simultaneously and in parallel with one another,
wherein the third process reads, as an input, a third region of n×n pixels shifted by 'Stride = m' relative to the second region, the fourth process reads, as an input, a fourth region of n×n pixels shifted by 'Stride = m' relative to the third region, and the M-th process reads, as an input, an M-th region of n×n pixels shifted by 'Stride = m' relative to the (M-1)-th region,
and wherein the third process, the fourth process, ..., and the M-th process each comprise: performing four convolution operations by applying the (n-1)×(n-1) kernel of 'Stride = 1' to the respective one of the third region, the fourth region, ..., and the M-th region; extracting a third feature, a fourth feature, ..., and an M-th feature by performing the pooling operation immediately on the convolution results of each region, without storing those results in the shared memory; and mapping the third feature, the fourth feature, ..., and the M-th feature to the convolution-pooling hybrid feature map.

The method of claim 2,
further comprising extracting K new features by applying the same processing scheme as the first to M-th processes to the convolution-pooling hybrid feature map composed of the first to M-th features, and mapping each of the new features to another convolution-pooling hybrid feature map.

The method of claim 3,
further comprising generating a convolution-pooling hybrid layer composed of the convolution-pooling hybrid feature maps.

The method of claim 4,
further comprising outputting a fully connected layer that is fully connected to all neurons of the convolution-pooling hybrid layer.

The method of claim 5,
further comprising determining which class the output value of the fully connected layer corresponds to.

The method of claim 2,
wherein the first process is assigned to and performed by a first thread of a processing unit,
the second process is assigned to and performed by a second thread of the processing unit,
the third process is assigned to and performed by a third thread of the processing unit,
the fourth process is assigned to and performed by a fourth thread of the processing unit,
and the M-th process is assigned to and performed by an M-th thread of the processing unit.

The method of claim 7,
wherein the processing unit is a GP-GPU (General-Purpose computing on Graphics Processing Units).

The method according to claim 1,
wherein the convolutional neural network uses a non-linear function as an activation function.

The method according to claim 1,
wherein the pooling is max pooling or average pooling.

A recording medium readable by an electric or electronic device, on which is recorded a program for causing the electric or electronic device to execute the method of any one of claims 1 to 10.