KR20190022237A

KR20190022237A - Method and apparatus for performing convolution operation in neural network

Info

Publication number: KR20190022237A
Application number: KR1020170135246A
Authority: KR
Inventors: 이세환; 김이섭; 김현욱; 심재형; 최영재
Original assignee: 삼성전자주식회사; 한국과학기술원
Priority date: 2017-08-23
Filing date: 2017-10-18
Publication date: 2019-03-06
Also published as: KR102452951B1

Abstract

In a neural network device, a method of performing a convolution operation of a neural network obtains data of an input feature map and kernels from a memory, decomposes each of the kernels into a first type sub-kernel and a second type sub-kernel, performs the convolution operation using the input feature map and the first and second type sub-kernels, and obtains an output feature map by combining the results of the convolution operation.

Description

[0001] The present disclosure relates to a method and apparatus for performing a convolution operation between a feature map and a kernel in a neural network.

A neural network refers to a computational architecture that models a biological brain. With the recent development of neural network technology, research is actively under way on using neural networks in various kinds of electronic systems to analyze input data and extract valid information. A device that processes a neural network requires a large amount of computation on complex input data. Therefore, to analyze a large amount of input data in real time using a neural network and extract desired information, a technique capable of efficiently processing neural network operations is required.

Provided are a method and apparatus for performing a convolution operation of a neural network. The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

According to an aspect, a method of performing a convolution operation of a neural network includes: obtaining, from a memory, data of an input feature map and of kernels having binary weights, to be processed in a layer of the neural network; decomposing each of the kernels into a first type sub-kernel reconstructed with weights of the same sign and a second type sub-kernel for correcting the difference between the kernel and the first type sub-kernel; performing a convolution operation using the input feature map and the first type sub-kernel and second type sub-kernel decomposed from each of the kernels; and obtaining an output feature map by combining the results of the convolution operation.

According to another aspect, a neural network device includes: a memory in which at least one program is stored; and a processor that drives a neural network by executing the at least one program. The processor obtains, from the memory, data of an input feature map and of kernels having binary weights, to be processed in a layer of the neural network; decomposes each of the kernels into a first type sub-kernel reconstructed with weights of the same sign and a second type sub-kernel for correcting the difference between the kernel and the first type sub-kernel; performs a convolution operation using the input feature map and the first type sub-kernel and second type sub-kernel decomposed from each of the kernels; and obtains an output feature map by combining the results of the convolution operation.

According to still another aspect, a computer-readable recording medium may include a recording medium on which one or more programs including instructions for executing the above-described method are recorded.

FIG. 1 is a diagram for explaining the architecture of a neural network according to an embodiment.
FIG. 2 is a diagram for explaining the relationship between an input feature map and an output feature map in a neural network according to an embodiment.
FIG. 3 is a block diagram illustrating the hardware configuration of a neural network device according to an embodiment.
FIG. 4 is a diagram for explaining a convolution operation of a neural network.
FIG. 5 is a diagram for explaining kernel decomposition according to an embodiment.
FIG. 6 is a diagram for explaining kernel decomposition according to another embodiment.
FIG. 7 is a diagram for explaining a convolution operation between an input feature map and sub-kernels decomposed from an original kernel, according to an embodiment.
FIG. 8 is a diagram for explaining how pixel values of an output feature map are determined using a base output and a filtered output, according to an embodiment.
FIG. 9 is a diagram for explaining the generation of a plurality of output feature maps from one input feature map through a convolution operation, according to an embodiment.
FIG. 10 is a diagram for explaining the generation of a plurality of output feature maps from a plurality of input feature maps through a convolution operation, according to an embodiment.
FIG. 11 is a diagram illustrating a hardware design for performing a convolution operation of a neural network based on kernel decomposition, according to an embodiment.
FIGS. 12A and 12B are diagrams for explaining kernel decomposition of a ternary-weight kernel according to another embodiment.
FIG. 13 is a flowchart for explaining a process of performing a convolution operation of a neural network using kernel decomposition in a neural network device, according to an embodiment.
FIG. 14 is a flowchart of a method of performing a convolution operation of a neural network according to an embodiment.

The terms used in the present embodiments were selected, as far as possible, from general terms currently in wide use in consideration of their functions in the present embodiments, but they may vary depending on the intention of those skilled in the art, precedent, the emergence of new technology, and the like. In certain cases, some terms were selected arbitrarily, and in such cases their meanings are described in detail in the description of the corresponding embodiment. Therefore, the terms used in the present embodiments should be defined based on the meaning of each term and the overall content of the present embodiments, not simply on the name of the term.

In the descriptions of the embodiments, when a part is said to be connected to another part, this includes not only the case where it is directly connected but also the case where it is electrically connected with another component in between. Also, when a part is said to include a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

Terms such as "consists of" or "includes" used in the present embodiments should not be construed as necessarily including all of the various components or steps described in the specification; some of those components or steps may not be included, or additional components or steps may further be included.

The following description of the embodiments should not be construed as limiting the scope of rights, and what a person skilled in the art could easily infer from it should be construed as falling within the scope of rights of the embodiments. Hereinafter, embodiments provided for illustration only are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the architecture of a neural network according to an embodiment.

Referring to FIG. 1, the neural network 1 may have the architecture of a deep neural network (DNN) or an n-layer neural network. The DNN or n-layer neural network may correspond to a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network, a restricted Boltzmann machine, or the like. For example, the neural network 1 may be implemented as a CNN, but is not limited thereto. Although FIG. 1 shows some of the convolution layers of a CNN that is an example of the neural network 1, the CNN may further include a pooling layer, a fully connected layer, and the like in addition to the convolution layers shown.

The neural network 1 may be implemented as an architecture having a plurality of layers including an input image, feature maps, and an output. In the neural network 1, a convolution operation is performed between the input image and a filter called a kernel, and feature maps are output as a result. The output feature maps generated at this point then serve as input feature maps on which a convolution operation with a kernel is performed again, and new feature maps are output. As a result of repeating this convolution operation, a recognition result for the features of the input image may finally be output through the neural network 1.

For example, when an image of 24x24 pixels is input to the neural network 1 of FIG. 1, the input image may be output as 4-channel feature maps of size 20x20 through a convolution operation with a kernel. Thereafter, the 20x20 feature maps shrink through repeated convolution operations with kernels, and features of size 1x1 may finally be output. By repeatedly performing convolution operations and subsampling (or pooling) operations in several layers, the neural network 1 filters out and outputs robust features that can represent the entire image from the input image, and derives a recognition result for the input image from the final output features.
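The reduction from 24x24 to 20x20 is consistent with the usual output-size relation for a convolution without padding. The following Python sketch is illustrative only: the function name `conv_output_size` and the 5x5 kernel with stride 1 are assumptions, because the description does not state the kernel size used in FIG. 1. The second call uses the 6x6 input and 3x3 kernel sizes that the description gives for FIG. 4.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Assumed 5x5 kernel, stride 1, no padding: 24 -> 20.
size_fig1 = conv_output_size(24, 5)

# Sizes stated for FIG. 4: 6x6 input, 3x3 kernel -> 4x4 output.
size_fig4 = conv_output_size(6, 3)

print(size_fig1, size_fig4)
```

Other kernel size and stride combinations could also produce a 20x20 output, so the 5x5 choice should be read as one plausible configuration rather than the one used in the figure.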

FIG. 2 is a diagram for explaining the relationship between an input feature map and an output feature map in a neural network according to an embodiment.

Referring to FIG. 2, in a layer 2 of the neural network, the first feature map FM1 may correspond to an input feature map, and the second feature map FM2 may correspond to an output feature map. A feature map may mean a data set in which various features of the input data are expressed. The feature maps FM1 and FM2 may have elements of a two-dimensional matrix or of a three-dimensional matrix, and a pixel value may be defined for each element. The feature maps FM1 and FM2 have a width W (also called columns), a height H (also called rows), and a depth D. Here, the depth D may correspond to the number of channels.

A convolution operation may be performed on the first feature map FM1 and a weight map WM of a kernel, and the second feature map FM2 may be generated as a result. The weight map WM filters the features of the first feature map FM1 by performing a convolution operation with the first feature map FM1 using the weight defined for each element. The weight map WM performs the convolution operation with windows (also called tiles) of the first feature map FM1 while shifting over the first feature map FM1 in a sliding-window manner. During each shift, each of the weights in the weight map WM may be multiplied by the corresponding pixel value of the overlapped window in the first feature map FM1, and the products may be summed. As the first feature map FM1 and the weight map WM are convolved, one channel of the second feature map FM2 may be generated. Although a weight map WM for a single kernel is shown in FIG. 1, in practice the weight maps of a plurality of kernels may each be convolved with the first feature map FM1 to generate a second feature map FM2 having a plurality of channels.
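The sliding-window procedure, and the way several kernels yield several output channels, can be sketched as follows. This is a minimal illustration and not the device's implementation: it assumes a single-channel input, stride 1, and no padding, and the names `conv2d_single` and `conv2d_multi` as well as the sample values are hypothetical.

```python
def conv2d_single(feature_map, weight_map):
    """Slide one weight map over a 2D feature map (stride 1, no padding)
    and return one output channel."""
    kh, kw = len(weight_map), len(weight_map[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            # Multiply each weight by the overlapped pixel and accumulate.
            for i in range(kh):
                for j in range(kw):
                    acc += feature_map[r + i][c + j] * weight_map[i][j]
            row.append(acc)
        output.append(row)
    return output


def conv2d_multi(feature_map, weight_maps):
    """Convolve the same input with each kernel; one output channel per kernel."""
    return [conv2d_single(feature_map, wm) for wm in weight_maps]


fm1 = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
weight_maps = [
    [[1, 1], [1, 1]],    # hypothetical kernel 0
    [[1, -1], [-1, 1]],  # hypothetical kernel 1
]
fm2 = conv2d_multi(fm1, weight_maps)
print(fm2)  # [[[12, 16], [24, 28]], [[0, 0], [0, 0]]]
```

For a multi-channel input, each kernel would carry one weight map per input channel and the per-channel results would be summed; that case is omitted here for brevity.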

Meanwhile, the second feature map FM2 may correspond to the input feature map of the next layer. For example, the second feature map FM2 may be the input feature map of a pooling (or subsampling) layer.

For convenience of explanation, FIGS. 1 and 2 show only a schematic architecture of the neural network 1. However, those of ordinary skill in the art will understand that, unlike what is shown, the neural network 1 may be implemented with more or fewer layers, feature maps, kernels, and so on, and that their sizes may also be varied.

FIG. 3 is a block diagram illustrating the hardware configuration of a neural network device according to an embodiment.

The neural network device 10 may be implemented as various kinds of devices such as a personal computer (PC), a server device, a mobile device, or an embedded device. As specific examples, it may correspond to a smartphone, a tablet device, an augmented reality (AR) device, an Internet of Things (IoT) device, an autonomous vehicle, robotics, or a medical device that performs voice recognition, image recognition, image classification, or the like using a neural network, but is not limited thereto. Furthermore, the neural network device 10 may correspond to a dedicated hardware accelerator mounted on such a device, and may be a hardware accelerator such as a neural processing unit (NPU), a tensor processing unit (TPU), or a Neural Engine, which are dedicated modules for driving a neural network, but is not limited thereto.

Referring to FIG. 3, the neural network device 10 includes a processor 110 and a memory 120. Only the components related to the present embodiments are shown in the neural network device 10 of FIG. 3. Accordingly, it is apparent to those of ordinary skill in the art that the neural network device 10 may further include general-purpose components other than those shown in FIG. 3.

The processor 110 controls the overall functions for operating the neural network device 10. For example, the processor 110 controls the neural network device 10 as a whole by executing programs stored in the memory 120 in the neural network device 10. The processor 110 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), or the like provided in the neural network device 10, but is not limited thereto.

The memory 120 is hardware that stores various data processed in the neural network device 10. For example, the memory 120 may store data that has been processed and data to be processed in the neural network device 10. The memory 120 may also store applications, drivers, and the like to be run by the neural network device 10. The memory 120 may include random access memory (RAM) such as dynamic random access memory (DRAM) or static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, Blu-ray or other optical disk storage, a hard disk drive (HDD), a solid state drive (SSD), or flash memory.

The processor 110 reads and writes neural network data, for example image data, feature map data, and kernel data, from and to the memory 120, and executes the neural network using the read or written data. When the neural network is executed, the processor 110 repeatedly performs convolution operations between input feature maps and kernels to generate data for output feature maps. The amount of computation of the convolution operations may depend on various factors such as the number of channels of the input feature map, the number of channels of the kernel, the size of the input feature map, the size of the kernel, and the precision of the values. Unlike the neural network 1 shown in FIG. 1, an actual neural network driven on the neural network device 10 may be implemented with a more complex architecture. Accordingly, the processor 110 performs convolution operations with a very large operation count, ranging from hundreds of millions to tens of billions, and the frequency with which the processor 110 accesses the memory 120 for the convolution operations inevitably increases sharply as well. Because of this computational burden, neural network processing may not run smoothly on mobile devices with relatively low processing performance, such as smartphones, tablets, and wearable devices, or on embedded devices.
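To give a sense of how the listed factors combine, the sketch below counts multiply-accumulate (MAC) operations for one dense convolution layer. The counting formula is the standard one for a direct convolution and is not taken from the description; the function name `conv_layer_macs` and the dimensions of the second layer are hypothetical. Value precision affects the cost of each operation but not this count, so it is not modeled.

```python
def conv_layer_macs(in_h, in_w, in_channels, kernel_h, kernel_w,
                    out_channels, stride=1, padding=0):
    """Multiply-accumulate count of a direct (dense) convolution layer."""
    out_h = (in_h + 2 * padding - kernel_h) // stride + 1
    out_w = (in_w + 2 * padding - kernel_w) // stride + 1
    # One MAC per kernel element, per input channel, per output pixel,
    # per output channel.
    return out_h * out_w * out_channels * in_channels * kernel_h * kernel_w


# Sizes of the FIG. 4 example: 6x6 input, one 3x3 kernel, one channel.
macs_small = conv_layer_macs(6, 6, 1, 3, 3, 1)

# Hypothetical larger layer: 224x224 input, 64 -> 64 channels, 3x3, padding 1.
macs_large = conv_layer_macs(224, 224, 64, 3, 3, 64, padding=1)

print(macs_small)  # 144
print(macs_large)  # 1849688064
```

A single layer of the hypothetical larger size already needs on the order of billions of MACs, which is in line with the range of operation counts mentioned above.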

Meanwhile, in a neural network, a kernel may have floating-point weights or fixed-point weights, or may be a binary-weight kernel or a ternary-weight kernel. That is, a kernel in a neural network may be defined in various ways in consideration of factors such as the purpose of the neural network and the performance of the device. Here, unlike a kernel having floating-point or fixed-point weights, a binary-weight kernel may mean a kernel whose weight values are constrained to, for example, +1 or -1. A ternary-weight kernel may mean a kernel whose weight values are constrained to +1, 0, or -1.
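As a small illustration of these definitions, the following sketch labels a kernel by the set of weight values it contains. The helper name `kernel_weight_type` and the rule of treating any kernel with at most two distinct values from {-1, 0, +1} as binary are assumptions made for this example, not a classification defined in the description.

```python
def kernel_weight_type(kernel):
    """Label a 2D kernel as 'binary', 'ternary', or 'other' by its weight values."""
    values = {w for row in kernel for w in row}
    if not values <= {-1, 0, 1}:
        return "other"      # e.g. floating-point or fixed-point weights
    # All three levels present -> ternary; two or fewer -> binary.
    return "ternary" if len(values) == 3 else "binary"


print(kernel_weight_type([[1, -1], [-1, 1]]))      # binary
print(kernel_weight_type([[1, 0], [-1, 1]]))       # ternary
print(kernel_weight_type([[0.5, 1], [-0.25, 1]]))  # other
```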

In the following, the description assumes that the neural network executed by the processor 110 performs convolution operations using kernels whose weights are quantized to specific levels, such as binary-weight kernels or ternary-weight kernels. However, the present embodiments are not limited thereto and are also applicable to convolution operations using other kinds of kernels.

Even when the kernel is a binary-weight kernel or a ternary-weight kernel whose weights are quantized to specific levels, convolution operations still account for a high proportion of the total computation in neural network processing. Therefore, a processing scheme is required that sufficiently reduces the amount of convolution computation while minimizing the loss of accuracy.

In a binary-weight kernel, the weights are limited to two values (for example, -1 or +1, 0 or 1, or -1 or 0), so when two weights are arbitrarily selected from binary-weight kernels, the selected weights are likely to be equal. That is, compared with floating-point or fixed-point kernels, any two binary-weight kernels in a layer of the neural network are likely to be similar. Exploiting this likelihood, if the kernels of the neural network are decomposed into an approximate sub-kernel common to the kernels and a sub-kernel that corrects the error, and the convolution operation is performed with these sub-kernels, the amount of convolution computation can be reduced efficiently. The following description of the present embodiments explains in detail methods of performing the convolution operation by decomposing the kernels of the neural network in this way. The methods described below may be performed by the processor 110 and the memory 120 of the neural network device 10.

FIG. 4 is a diagram for explaining a convolution operation of a neural network.

In the example of FIG. 4, the input feature map 410 is assumed to be 6x6, the original kernel 420 to be 3x3, and the output feature map 430 to be 4x4. However, the embodiments are not limited thereto, and the neural network may be implemented with feature maps and kernels of various sizes. The values defined in the input feature map 410, the original kernel 420, and the output feature map 430 are all merely exemplary, and the present embodiments are not limited to them. The original kernel 420 corresponds to the binary-weight kernel described above.

The original kernel 420 performs the convolution operation while sliding over the input feature map 410 in units of 3x3 windows. The convolution operation multiplies each pixel value of a window of the input feature map 410 by the weight of the element at the corresponding position in the original kernel 420 and sums all the resulting values to obtain each pixel value of the output feature map 430. Specifically, the original kernel 420 first performs the convolution operation with the first window 411 of the input feature map 410. That is, the pixel values 1, 2, 3, 4, 5, 6, 7, 8, 9 of the first window 411 are multiplied by the weights -1, -1, +1, +1, -1, -1, -1, +1, +1 of the elements of the original kernel 420, respectively, giving -1, -2, 3, 4, -5, -6, -7, 8, 9. Next, these values are summed to give 3, and the pixel value 431 in row 1, column 1 of the output feature map 430 is determined to be 3. Here, the pixel value 431 in row 1, column 1 of the output feature map 430 corresponds to the first window 411. In the same way, the convolution operation between the second window 412 of the input feature map 410 and the original kernel 420 determines the pixel value 432 in row 1, column 2 of the output feature map 430 to be -3. Finally, the convolution operation between the sixteenth window 413, which is the last window of the input feature map 410, and the original kernel 420 determines the pixel value 433 in row 4, column 4 of the output feature map 430 to be -13.
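The arithmetic for the first window can be reproduced directly. The sketch below uses only the pixel values and weights quoted above; laying them out row by row in 3x3 lists is an assumption about how FIG. 4 arranges them, and the variable names are illustrative.

```python
# First window 411 and original kernel 420, arranged row by row (assumed layout).
window_411 = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
kernel_420 = [[-1, -1, +1],
              [+1, -1, -1],
              [-1, +1, +1]]

# Element-wise products of pixel values and weights.
products = [p * w
            for pixel_row, weight_row in zip(window_411, kernel_420)
            for p, w in zip(pixel_row, weight_row)]

# Sum of the products gives the output pixel at row 1, column 1.
pixel_431 = sum(products)

print(products)   # [-1, -2, 3, 4, -5, -6, -7, 8, 9]
print(pixel_431)  # 3
```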

That is, the convolution operation between one input feature map 410 and one original kernel 420 may be processed by repeatedly multiplying the values of corresponding elements in the input feature map 410 and the original kernel 420 and summing the products, and the output feature map 430 is generated as the result of the convolution operation.

However, the convolution operation between any one window of the input feature map 410 and the original kernel 420 necessarily requires as many multiplications as there are elements, plus the summation of the products, so the more elements there are, the greater the amount of computation. Furthermore, when the window slides many times over the input feature map, when the neural network has input feature maps with many channels, or when it has many layers, the amount of computation grows even more steeply. The convolution operation according to the present embodiments can reduce the amount of computation by decomposing the original kernel 420 into several sub-kernels.

FIG. 5 is a diagram for explaining kernel decomposition according to an embodiment.

Referring to FIG. 5, the original kernel 500 is a binary-weight kernel having weights of -1 or +1. Although this embodiment is described assuming that the weights are -1 or +1, it is not limited thereto; the binary weights may be +1 or 0, or may be -1 or 0.

The original kernel 500 may be decomposed into a base kernel 510 and a filtered kernel 520.

In the present embodiments, the base kernel 510 may be defined as a sub-kernel obtained by reconstructing the original kernel 500 with weights of the same sign, and may also be referred to as a first-type sub-kernel. In FIG. 5, the weights of all elements of the original kernel 500 are shown replaced with the same value, -1.

In the present embodiments, the filtered kernel 520 may be defined as a sub-kernel reconstructed so that elements whose weights in the original kernel 500 differ from those of the base kernel 510 keep the original weights of the original kernel 500, while no weight is defined for the remaining elements. It may also be referred to as a second-type sub-kernel. In FIG. 5, the elements that differ between the original kernel 500 and the base kernel 510 are the elements having a weight of +1 in the original kernel 500. Consequently, as shown in FIG. 5, the filtered kernel 520 is a sub-kernel in which +1 is defined for only some of the elements.

In this way, the original kernel 500 having binary weights can be decomposed into the base kernel 510, in which all elements are replaced with weights of -1, and the filtered kernel 520, in which weights of +1 are defined for only some elements.

FIG. 6 is a diagram for explaining kernel decomposition according to another embodiment.

Referring to FIG. 6, the original kernel 600 may be decomposed in a manner similar to the decomposition of the original kernel 500 of FIG. 5. Unlike FIG. 5, however, in the base kernel 610 decomposed from the original kernel 600 all elements are replaced with weights of +1, and in the filtered kernel 620 weights of -1 are defined for only some elements.

That is, a binary-weight kernel of the neural network according to the present embodiments can be decomposed using the kernel decomposition described with reference to FIG. 5 or FIG. 6.
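
The decomposition of FIGS. 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the function name `decompose`, the `base_sign` parameter, the use of `None` for an element with no defined weight, and the 3x3 kernel values are all assumptions introduced here.

```python
def decompose(kernel, base_sign=-1):
    """Split a binary-weight kernel (entries -1 or +1) into two sub-kernels.

    base_sign=-1 mirrors the FIG. 5 style, base_sign=+1 the FIG. 6 style.
    None marks an element for which no weight is defined (hypothetical encoding).
    """
    # Base kernel: every element replaced with the same sign.
    base = [[base_sign for _ in row] for row in kernel]
    # Filtered kernel: keep the original weight only where it differs from the base.
    filtered = [[w if w != base_sign else None for w in row] for row in kernel]
    return base, filtered


# Hypothetical kernel values, not taken from the figures.
kernel = [[ 1, -1, -1],
          [-1,  1, -1],
          [ 1, -1,  1]]

base, filtered = decompose(kernel)            # FIG. 5 style: base of -1, filtered holds +1
base_pos, filtered_neg = decompose(kernel, 1)  # FIG. 6 style: base of +1, filtered holds -1
```

With `base_sign=-1` the filtered kernel keeps only the +1 entries, and with `base_sign=+1` it keeps only the -1 entries, matching the two variants described above.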

Meanwhile, the values defined in the original kernel 500 and the original kernel 600 are merely exemplary, and the present embodiments are not limited thereto.

FIG. 7 is a diagram for explaining a convolution operation between an input feature map and sub-kernels decomposed from an original kernel, according to an embodiment.

Referring to FIG. 7, unlike the convolution scheme described with reference to FIG. 4, the input feature map 710 is convolved not with the original kernel 720 but separately with the base kernel 723 and the filtered kernel 725 decomposed from the original kernel 720.

First, the first window 711 of the input feature map 710 undergoes a base convolution operation (also referred to as a first convolution operation) with the base kernel 723 and a filtered convolution operation (also referred to as a second convolution operation) with the filtered kernel 725. The base convolution operation result 742 is -45, and the filtered convolution operation result 746 is 24. Next, the base convolution operation with the base kernel 723 and the filtered convolution operation with the filtered kernel 725 are likewise performed for each of the remaining windows of the input feature map 710, whereby all pixel values of the base output 741 and the filtered output 745 can be determined. The filtered convolution operation is performed using only the elements of the filtered kernel 725 for which a weight is defined, and multiplications for elements with no defined weight can be skipped. Accordingly, the number of multiplications performed by the processor 110 can be reduced to some extent.
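
As a rough sketch of the two per-window operations, the Python below computes a base result and a filtered result for one window. The window and filtered-kernel values are hypothetical, chosen only so that the results equal the -45 and 24 quoted above; they are not read from FIG. 7. The function names and the `None` encoding of an undefined weight are likewise assumptions.

```python
def base_conv(window, base_sign=-1):
    # Every base weight is the same, so one sum and one sign suffice.
    return base_sign * sum(x for row in window for x in row)


def filtered_conv(window, filtered_kernel):
    total = 0
    for window_row, kernel_row in zip(window, filtered_kernel):
        for x, w in zip(window_row, kernel_row):
            if w is None:
                continue  # no weight defined: the multiplication is skipped
            total += w * x
    return total


# Hypothetical 3x3 window and filtered kernel (illustrative values only).
window = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
filtered_kernel = [[None, None, None],
                   [None, None, None],
                   [1,    1,    1]]

base_result = base_conv(window)                           # -45
filtered_result = filtered_conv(window, filtered_kernel)  # 24
```

In this sketch only three of the nine window elements enter the filtered sum, which is where the reduction in multiplications comes from.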

Each pixel of the base output 741 and the filtered output 745 corresponds to a pixel of the output feature map 730. Each pixel value of the output feature map 730 may be determined using the corresponding pixel values of the base output 741 and the filtered output 745. This is described with reference to FIG. 8.

FIG. 8 is a diagram for explaining how a pixel value of an output feature map is determined using a base output and a filtered output, according to an embodiment.

Referring to FIG. 8, the pixel value 810 of the output feature map 800 may be determined based on the sum of the base convolution operation result 742 in the base output 741 of FIG. 7 and twice the filtered convolution operation result 746 in the filtered output 745. That is, 3, the sum of -45 (the base convolution operation result 742) and 48 (twice the filtered convolution operation result 746), is determined as the pixel value 810 of the output feature map 800. The base convolution operation result 742, the filtered convolution operation result 746, and the pixel value 810 are values at mutually corresponding positions.
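
The factor of two can be checked with a short derivation. The symbols below are introduced here for illustration and do not appear in the original text: $x_i$ are the window values, $w_i \in \{-1, +1\}$ the original weights, and $P = \{\, i : w_i = +1 \,\}$ the positions kept by the filtered kernel when the base kernel is all -1.

```latex
\begin{aligned}
\sum_i w_i x_i
  &= \sum_{i \in P} x_i \;-\; \sum_{i \notin P} x_i \\
  &= -\sum_i x_i \;+\; 2 \sum_{i \in P} x_i \\
  &= \underbrace{\Big(-\sum_i x_i\Big)}_{\text{base result}}
     \;+\; 2 \cdot \underbrace{\sum_{i \in P} x_i}_{\text{filtered result}}
\end{aligned}
```

With the values quoted above, the base result is -45 and the filtered result is 24, so the pixel value is $-45 + 2 \cdot 24 = 3$.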

The pixel value 810 of the output feature map 800 finally obtained in FIGS. 7 and 8 using kernel decomposition of the original kernel 720 is identical to the pixel value 431 of the output feature map 430 obtained in FIG. 4 without kernel decomposition. However, the amount of computation of the convolution operation can be reduced because of the empty elements in the filtered kernel 725. That is, performing the convolution operation using kernel decomposition of the original kernel 720 yields the same convolution result while reducing the amount of computation.
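
The equivalence stated above can be checked numerically. The following is a minimal sketch under assumed conditions: a stride-1 sliding window without padding, a hypothetical 4x4 feature map and 3x3 kernel, and a helper `conv2d` written for this illustration. None of these values come from the figures.

```python
def conv2d(fmap, kernel):
    """Stride-1, no-padding sliding-window sum of products; None weights are skipped."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(fmap) - kh + 1):
        row = []
        for c in range(len(fmap[0]) - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    w = kernel[i][j]
                    if w is not None:
                        acc += w * fmap[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical input feature map and binary-weight kernel.
fmap = [[1, 0, 2, 3],
        [4, 1, 0, 2],
        [3, 2, 1, 0],
        [0, 1, 2, 4]]
kernel = [[ 1, -1,  1],
          [-1,  1, -1],
          [ 1,  1, -1]]

base = [[-1] * 3 for _ in range(3)]                                  # all -1
filtered = [[w if w == 1 else None for w in row] for row in kernel]  # only +1 kept

direct = conv2d(fmap, kernel)
base_out = conv2d(fmap, base)
filtered_out = conv2d(fmap, filtered)

# Synthesis: base output plus twice the filtered output, pixel by pixel.
combined = [[b + 2 * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(base_out, filtered_out)]
```

Here `combined` equals `direct` for every output pixel, while the filtered pass touches only the five kernel positions that hold +1.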

Meanwhile, the kernel decomposition described with reference to FIGS. 7 and 8 uses the scheme of FIG. 5 (a base kernel replaced with -1), but the embodiments are not limited thereto. A person of ordinary skill in the art will understand that the same conclusion can be reached using the kernel decomposition of FIG. 6 (a base kernel replaced with +1). In addition, the values defined in the input feature map 710 and the original kernel 720 described with reference to FIGS. 7 and 8 are merely exemplary, and the present embodiments are not limited thereto.

Although FIGS. 7 and 8 describe an embodiment that uses one window 711 in the input feature map 710 and the sub-kernels 723 and 725 decomposed from one original kernel 720, the processor 110 may perform convolution operations by appropriately applying the schemes described above to the various input feature maps and various kernels included in each layer of the neural network.

FIG. 9 is a diagram for explaining the generation of a plurality of output feature maps from one input feature map through convolution operations, according to an embodiment.

Referring to FIG. 9, the processor 110 generates a plurality of output feature maps 931, 932, 933 by performing a convolution operation between one input feature map 910 and each of a plurality of kernels 920. For example, output feature map 1 (931) is generated through a convolution operation between the input feature map 910 and original kernel 1 (940), output feature map 2 (932) is generated through a convolution operation between the input feature map 910 and original kernel 2 (950), ..., and output feature map N (933) may be generated through a convolution operation between the input feature map 910 and original kernel N (960), where N is a natural number.

Original kernel 1 (940) is a kernel with which the input feature map 910 is convolved. According to the present embodiments, original kernel 1 (940) is decomposed into base kernel 1 (941) and filtered kernel 1 (942). Base output 1 (970) is obtained through base convolution operations between each window of the input feature map 910 and base kernel 1 (941), and filtered output 1 (981) is obtained through filtered convolution operations between each window of the input feature map 910 and filtered kernel 1 (942). Output feature map 1 (931) is generated by synthesizing base output 1 (970) and filtered output 1 (981) as described with reference to FIG. 8.

Next, original kernel 2 (950) is also a kernel with which the input feature map 910 is convolved. Original kernel 2 (950) can likewise be decomposed into base kernel 2 (951) and filtered kernel 2 (952). Here, base kernel 2 (951) is identical to the previously decomposed base kernel 1 (941), because base kernel 1 (941) and base kernel 2 (951) are both sub-kernels in which every element of original kernel 1 (940) and original kernel 2 (950), respectively, is replaced with a weight of the same sign (-1 or +1). For example, the elements of base kernel 1 (941) and base kernel 2 (951) may all be -1, or all be +1. Therefore, the result of the base convolution operation between each window of the input feature map 910 and base kernel 2 (951) is identical to base output 1 (970), the result of the base convolution operation between each window of the input feature map 910 and base kernel 1 (941). Accordingly, the base convolution operation between each window of the input feature map 910 and base kernel 2 (951) is skipped, and base output 1 (970) is reused as its result. That is, base output 1 (970) can be shared as the base convolution result of the other base kernels. Thus, once the processor 110 has performed the convolution operation between one input feature map and one base kernel, it can skip the convolution operations for the remaining base kernels, thereby reducing the amount of computation.

In FIG. 9, base kernel 1 (941) is the sub-kernel that performs the first base convolution operation, which generates base output 1 (970) to be shared as the remaining base outputs. In the present embodiments, original kernel 1 (940), from which base kernel 1 (941) is derived, is therefore referred to as the initial kernel, although it may also be named in various other ways.

For the remaining original kernel (original kernel N (960)), kernel decomposition into base kernel N (961) and filtered kernel N (962) is performed as described above, but the base convolution operation between the input feature map 910 and base kernel N (961) is skipped, and base output 1 (970) can be shared as the base convolution result of base kernel N (961).

Meanwhile, the filtered convolution operations between the input feature map 910 and each of filtered kernel 2 (952), ..., filtered kernel N (962) are performed individually, yielding filtered output 2 (982), ..., filtered output N (983). The remaining output feature maps 932, 933 are generated by synthesizing the shared base output 1 (970) with each of the filtered outputs 982, 983.

Consequently, when convolution operations are performed for a plurality of kernels, the processor 110 can use kernel decomposition to reduce the amount of computation both through sharing of the base output and through the empty elements in the filtered kernels.
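
The sharing of the base output across kernels might be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the helper `conv2d`, the 4x4 feature map, and the three 2x2 kernels are hypothetical, the base sign is taken to be -1, and the window slides with stride 1 and no padding.

```python
def conv2d(fmap, kernel):
    """Stride-1, no-padding sliding-window sum of products; None weights are skipped."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(fmap) - kh + 1):
        row = []
        for c in range(len(fmap[0]) - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    w = kernel[i][j]
                    if w is not None:
                        acc += w * fmap[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical input feature map and three binary-weight kernels.
fmap = [[2, 1, 0, 3],
        [1, 4, 2, 0],
        [0, 2, 5, 1],
        [3, 0, 1, 2]]
kernels = [
    [[ 1, -1], [-1,  1]],
    [[-1, -1], [ 1,  1]],
    [[ 1,  1], [ 1, -1]],
]

# The base kernel is the same for every original kernel, so it is convolved once.
base_kernel = [[-1, -1], [-1, -1]]
shared_base_out = conv2d(fmap, base_kernel)

outputs = []
for k in kernels:
    # Only the filtered convolution is repeated per kernel.
    filtered = [[w if w == 1 else None for w in row] for row in k]
    filtered_out = conv2d(fmap, filtered)
    outputs.append([[b + 2 * f for b, f in zip(b_row, f_row)]
                    for b_row, f_row in zip(shared_base_out, filtered_out)])
```

In this sketch one base pass serves all three kernels, and each entry of `outputs` matches the direct convolution with the corresponding original kernel.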

FIG. 10 is a diagram for explaining the generation of a plurality of output feature maps from a plurality of input feature maps through convolution operations, according to an embodiment.

In FIG. 10, the sub-kernel reference character K_NMb denotes the base kernel that is decomposed from original kernel K_N (1023) and used for the base convolution operation between input feature map M (1003) and original kernel K_N (1023). The sub-kernel reference character K_NMf denotes the filtered kernel that is decomposed from original kernel K_N (1023) and used for the filtered convolution operation between input feature map M (1003) and original kernel K_N (1023). For example, K_12b denotes the base kernel that is decomposed from original kernel K₁ (1021) and used for the base convolution operation between input feature map 2 (1002) and original kernel K₁ (1021), and K_21f denotes the filtered kernel that is decomposed from original kernel K₂ (1022) and used for the filtered convolution operation between input feature map 1 (1001) and original kernel K₂ (1022) (M, N, and L are natural numbers).

Meanwhile, input feature map 1 (1001) may be an input feature map having an odd channel index in a given layer of the neural network, and input feature map 2 (1002) may be an input feature map having an even channel index in that layer.

Referring to FIG. 10, the processor 110 generates a plurality of output feature maps 1031, 1032, 1033 by performing convolution operations between a plurality of input feature maps 1001, 1002, 1003 and each of a plurality of kernels 1021, 1022, 1023.

Specifically, for the convolution operations between each of input feature map 1 (1001) through input feature map M (1003) and original kernel K₁ (1021), original kernel K₁ (1021) is decomposed into base kernel K_11b and filtered kernel K_11f. The base convolution results between each of input feature map 1 (1001) through input feature map M (1003) and base kernel K_11b are combined through an accumulation operation to produce base output 1. Likewise, the filtered convolution results between each of input feature map 1 (1001) through input feature map M (1003) and filtered kernel K_11f are combined through an accumulation operation to produce filtered output 1. Output feature map 1 (1031) is generated from base output 1 and filtered output 1 by the synthesis described above.

According to an embodiment, in the case of multiple inputs (a plurality of input feature maps) and multiple outputs (a plurality of output feature maps), a base kernel decomposed from an original kernel that is to be convolved with an input feature map of an odd channel index is reconstructed by replacing the weights of all elements with the same value of a first sign. In contrast, a base kernel decomposed from an original kernel that is to be convolved with an input feature map of an even channel index is reconstructed by replacing the weights of all elements with the same value of a second sign. Here, when the value of the first sign is -1, the value of the second sign is +1, and when the value of the first sign is +1, the value of the second sign is -1.

As shown in FIG. 10, the base kernels K_11b, K_21b, ..., K_N1b to be convolved with input feature map 1 (1001), which has an odd channel index, are all sub-kernels replaced with -1. In contrast, the base kernels K_12b, K_22b, ..., K_N2b to be convolved with input feature map 2 (1002), which has an even channel index, are all sub-kernels replaced with +1.

The base kernels for odd channels and the base kernels for even channels are replaced with weights of opposite signs in order to keep the accumulated value of the accumulation operation as small as possible. In other words, if the base kernels for both odd and even channels were all replaced with weights of the same sign, the accumulated value in the base output could become very large, and the memory space for storing the base output might then be insufficient. According to another embodiment, however, in an environment where sufficient memory space for storing the base output can be secured, the base kernels for odd and even channels may all be replaced with weights of the same sign.
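
The effect of alternating the base sign across channels can be illustrated with a small sketch. Everything here is hypothetical: four input channels, one flattened 2x2 window per channel, made-up input values and weights, and a helper `run` written for this illustration. It accumulates the base and filtered results over the channels for one output pixel and compares a same-sign scheme with the alternating scheme.

```python
def dot(window, kernel):
    # Sum of products over one flattened window; None weights are skipped.
    return sum(w * x for x, w in zip(window, kernel) if w is not None)


# One flattened window per input channel (channels 1..4), hypothetical values.
windows = [[9, 8, 7, 6],
           [9, 7, 8, 6],
           [8, 9, 6, 7],
           [7, 8, 9, 6]]
# Binary weights applied to each channel for one output feature map.
kernels = [[ 1, -1,  1,  1],
           [-1, -1,  1, -1],
           [ 1,  1, -1,  1],
           [-1,  1,  1, -1]]


def run(base_signs):
    base_acc = 0
    filtered_acc = 0
    for x, k, s in zip(windows, kernels, base_signs):
        base_acc += s * sum(x)                          # base convolution for this channel
        filtered = [w if w != s else None for w in k]   # keep weights that differ from the base
        filtered_acc += dot(x, filtered)
    return base_acc, filtered_acc, base_acc + 2 * filtered_acc


same_sign = run([-1, -1, -1, -1])    # all channels use a base of -1
alternating = run([-1, 1, -1, 1])    # odd channels -1, even channels +1
direct = sum(dot(x, k) for x, k in zip(windows, kernels))
```

With these illustrative values the same-sign scheme accumulates a base value of -120, whereas the alternating scheme accumulates 0, and both schemes reproduce the direct result after synthesis.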

Next, output feature map 2 (1032), which results from the convolution operations between each of input feature map 1 (1001) through input feature map M (1003) and original kernel K₂ (1022), can also be generated by synthesizing base output 2 and filtered output 2 on the basis of kernel decomposition. Here, base output 2 is identical to base output 1, because each of the base kernels K_21b, K_22b, ..., K_2Mb is identical to the corresponding one of the base kernels K_11b, K_12b, ..., K_1Mb. Therefore, the base convolution operations using each of the base kernels K_21b, K_22b, ..., K_2Mb and the accumulation operation for base output 2 are skipped, and base output 1 is shared and reused as base output 2.

The remaining base convolution operations and accumulation operations are likewise skipped, and base output 1 is also reused for the other base outputs (base output N).

Consequently, in the case of multiple inputs (a plurality of input feature maps) and multiple outputs (a plurality of output feature maps) as well, the processor 110 can use kernel decomposition to reduce the amount of computation both through sharing of the base output and through the empty elements in the filtered kernels.

FIG. 11 is a diagram illustrating a hardware design for performing a convolution operation of a neural network based on kernel decomposition, according to an embodiment.

Referring to FIG. 11, the processor 110 of FIG. 3 may be implemented to include a controller 1100, filtered convolution operators 1101, 1102, 1103, a base convolution operator 1104, multipliers 1111, 1112, 1113, and shifters 1121, 1122, 1123. The memory 120 of FIG. 3 may be implemented to include an input feature map buffer 1151, a kernel buffer 1152, an output feature map buffer 1153, and a scaling factor buffer 1154.

The controller 1100 controls the overall operation and functions of the components shown in FIG. 11. For example, the controller 1100 may perform kernel decomposition on the original kernels stored in the kernel buffer 1152 and may schedule the operation of each component for the convolution operation of the neural network.

The input feature map buffer 1151 stores the input feature maps of the neural network, the kernel buffer 1152 stores the original kernels and the decomposed base kernels and filtered kernels, and the output feature map buffer 1153 stores the generated output feature maps.

The base convolution operator 1104 performs the base convolution operation between an input feature map provided from the input feature map buffer 1151 and a base kernel provided from the kernel buffer 1152. The base convolution operator 1104 operates only when the first base convolution operation is performed. Thereafter, while the convolution operations for the remaining windows proceed, it is clock-gated to reduce energy consumption.

The filtered convolution operators 1101, 1102, 1103 each perform a filtered convolution operation between an input feature map provided from the input feature map buffer 1151 and a filtered kernel provided from the kernel buffer 1152. In the hardware design of FIG. 11, a plurality of filtered convolution operators 1101, 1102, 1103 are shown, whereas only one base convolution operator 1104 is implemented. This is because the base convolution result produced by the base convolution operator 1104 is shared.

In the base convolution operator 1104 and the filtered convolution operators 1101, 1102, 1103, the array of operation elements may be implemented to correspond to the size of the decomposed sub-kernels (that is, the window size).

The base convolution operator 1104 and the filtered convolution operators 1101, 1102, 1103 may process convolution operations in parallel within the processor 110.

The multipliers 1111, 1112, 1113 and the shifters 1121, 1122, 1123 perform an accumulation operation on the base convolution result and the filtered convolution result, and the output feature map generated as the result of the accumulation operation is stored in the output feature map buffer 1153. Here, the shifters 1121, 1122, 1123 may perform a 1-bit left shift on the filtered output in order to obtain twice the filtered output, as described with reference to FIG. 8.
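
In software terms, the doubling performed by the shifters amounts to a single shift followed by an addition. The function name `synthesize` is hypothetical, and the sketch covers only the shift-and-add step, not the role of the multipliers.

```python
def synthesize(base_result, filtered_result):
    # A 1-bit left shift doubles the filtered result; no multiplier is needed for the factor of 2.
    return base_result + (filtered_result << 1)


pixel = synthesize(-45, 24)  # values quoted for FIGS. 7 and 8; gives 3
```

Python integers shift arithmetically, so the same expression also holds for a negative filtered result, as arises when the base kernel is +1 and the filtered kernel holds -1.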

FIGS. 12A and 12B are diagrams for explaining kernel decomposition of a ternary-weight kernel according to another embodiment.

Referring to FIG. 12A, the ternary-weight original kernel 1200 has weights of -1, 0, and 1. For example, the base kernel 1201 may be a sub-kernel in which all elements are replaced with -1 (or +1). In this case two filtered kernels are required: one is the "±1" filtered kernel 1202 and the other is the "0" filtered kernel 1203. The "±1" filtered kernel 1202 is a sub-kernel that defines only the +1 (or -1) weights of the original kernel 1200, and the "0" filtered kernel 1203 is a sub-kernel that defines only the 0 weights of the original kernel 1200.

The base convolution operation is performed using the base kernel 1201, and the filtered convolution operation is performed separately for each of the "±1" filtered kernel 1202 and the "0" filtered kernel 1203.

Referring to FIG. 12B, a pixel value of the output feature map 1230 may be determined by summing the base output value, the "0" filtered output value, and twice the "±1" filtered output value.
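
The ternary synthesis can be checked with a small sketch, but it rests on an interpretation that the text does not spell out: with a base kernel of -1, the "0" filtered output is assumed here to be the plain sum of the inputs at positions where the original weight is 0, so that it cancels the base contribution at those positions. The window and kernel values are hypothetical and flattened to one dimension.

```python
# Hypothetical flattened 3x3 window and ternary-weight kernel.
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1, 0, 1, -1, -1, 0, 1]

# Base output: every weight replaced with -1.
base_out = -sum(window)

# "±1" filtered output: inputs where the original weight is +1.
pm1_out = sum(x for x, w in zip(window, kernel) if w == 1)

# "0" filtered output: ASSUMPTION, inputs where the original weight is 0,
# summed with the sign opposite to the base so the base term cancels there.
zero_out = sum(x for x, w in zip(window, kernel) if w == 0)

# Synthesis as described for FIG. 12B.
pixel = base_out + zero_out + 2 * pm1_out

# Reference: direct sum of products with the original ternary weights.
direct = sum(w * x for x, w in zip(window, kernel))
```

Under this assumption each position contributes -x, 0, or +x for weights -1, 0, and +1 respectively, so `pixel` equals `direct`.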

A person of ordinary skill in the art will understand that, when the neural network includes ternary-weight kernels, kernel decomposition and convolution operations using the decomposed sub-kernels can be performed in a manner similar to that described above for binary-weight kernels.

Meanwhile, the values defined in the original kernel 1200 are merely exemplary, and the present embodiments are not limited thereto.

FIG. 13 is a flowchart for explaining a process of performing a convolution operation of a neural network using kernel decomposition in a neural network device, according to an embodiment.

In step 1301, the processor 110 loads initial data for the input feature maps and kernels from the memory 120.

In step 1302, once the start instruction has been fetched, the processor 110 starts the convolution operation on the first window of the loaded input feature maps.

In step 1303, the processor 110 initializes the input feature map index and the window index.

In step 1304, the processor 110 sign-extends the input feature map data and then broadcasts the sign-extended input feature map data to the filtered convolution operators 1101, 1102, and 1103 and the base convolution operator 1104.

In step 1305, the processor 110 performs the filtered convolution operation.

In step 1306, the processor 110 determines whether the current window index is the initial index (for example, '0'). If it is the initial index, the process proceeds to step 1307. If it is not the initial index, the process proceeds to step 1305. This is so that, for a given window, the base convolution operation is performed only once for all of the kernels.

In step 1307, the processor 110 performs the base convolution operation. However, the filtered convolution operation of step 1305 and the base convolution operation of step 1307 may be performed in parallel, regardless of their order in time.

In step 1308, the processor 110 increments the channel index of the input feature map by 1.

In step 1309, the processor 110 determines whether the incremented channel index is the last channel index. If it is the last channel index, the process proceeds to step 1310. If it is not the last channel index, the process returns to step 1304. That is, steps 1304 to 1308 are repeated until the channel index reaches the last channel.

In step 1310, the processor 110 generates an output feature map by synthesizing the base output and the filtered outputs, and stores the generated output feature map in the memory 120.
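The loop of steps 1303 to 1310 can be sketched in software for a single window. This is an illustrative sketch only, not the hardware flow of FIG. 13: it assumes an all -1 base kernel for every channel, flattened windows and kernels, and a loop that ends once all channels have been consumed. The function name `convolve_window` and the data layout are hypothetical.

```python
def convolve_window(windows, kernels):
    """windows[c]    : flattened current window of input channel c
    kernels[k][c] : flattened binary (+1/-1) kernel k for channel c
    Returns one output pixel per kernel."""
    num_kernels = len(kernels)
    base_acc = 0                    # base output, shared by every kernel
    filt_acc = [0] * num_kernels    # one filtered accumulator per kernel
    channel = 0                     # step 1303: initialise the index
    while True:
        x = windows[channel]        # step 1304: same data goes to all operators
        for k in range(num_kernels):
            # step 1305: filtered convolution, only where the weight is +1
            filt_acc[k] += sum(xi for xi, w in zip(x, kernels[k][channel]) if w == 1)
        # step 1307: base convolution, performed once and not once per kernel
        base_acc += -sum(x)
        channel += 1                # step 1308
        if channel == len(windows): # step 1309 (simplified end-of-channels test)
            break
    # step 1310: synthesise base output and filtered outputs
    return [base_acc + 2 * f for f in filt_acc]


windows = [[1, 2, 3, 4], [5, 6, 7, 8]]
kernels = [
    [[1, -1, 1, -1], [-1, -1, 1, 1]],
    [[-1, -1, -1, -1], [1, 1, 1, 1]],
]
outputs = convolve_window(windows, kernels)
```

In this sketch the base accumulation costs one pass over the window per channel regardless of how many kernels there are, while the per-kernel work is limited to the positions whose weight differs from the base weight.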

FIG. 14 is a flowchart of a method of performing a convolution operation of a neural network, according to an embodiment. Because the method shown in FIG. 14 relates to the embodiments described above with reference to the preceding drawings, the descriptions given above with reference to those drawings also apply to the method of FIG. 14, even where they are omitted below.

In step 1401, the processor 110 obtains, from the memory 120, data of an input feature map and kernels having binary weights, which are to be processed in a layer of the neural network.

In step 1402, the processor 110 decomposes each of the kernels into a first-type sub-kernel (base kernel) reconstructed with weights of the same sign, and a second-type sub-kernel (filtered kernel) for correcting the difference between the kernel and the first-type sub-kernel.

The base kernel is a sub-kernel reconstructed by replacing the weights of all elements of each of the kernels with the same value. When the input feature map has an odd channel index, the base kernel is reconstructed by replacing the weights of all of its elements with the same value of a first sign; when the input feature map has an even channel index, the base kernel is reconstructed by replacing the weights of all of its elements with the same value of a second sign. The base kernels decomposed from each of the kernels that are to be convolved with the input feature map are all identical.
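The channel-dependent choice of the base kernel sign can be illustrated as follows. This is a sketch under stated assumptions: the description does not say which sign is the "first sign", so the defaults below (-1 for odd channel indices, +1 for even ones) are hypothetical, as is the function name `base_kernel`.

```python
def base_kernel(shape, channel_index, first_sign=-1, second_sign=+1):
    """Build a base kernel whose elements all carry one sign.

    The sign depends only on the parity of the input channel index;
    the default signs are assumptions made for illustration.
    """
    rows, cols = shape
    sign = first_sign if channel_index % 2 == 1 else second_sign
    return [[sign] * cols for _ in range(rows)]


odd_base = base_kernel((2, 2), channel_index=1)
even_base = base_kernel((2, 2), channel_index=2)
```

Because the function takes no weights from any particular kernel, every kernel applied to the same input channel obtains the same base kernel, which matches the statement that the decomposed base kernels are all identical.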

Meanwhile, the filtered kernel is a sub-kernel reconstructed by defining the original weight of the kernel at each element whose weight differs from that of the base kernel, and by leaving the weights of the remaining elements undefined.
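The decomposition into a base kernel and a filtered kernel can be sketched as below. This is a minimal illustration and not the decomposition circuit of the embodiment: undefined weights are represented by `None`, the base sign is passed in as a parameter, and the names `decompose` and `recompose` are hypothetical.

```python
def decompose(kernel, base_sign=-1):
    """Split a binary (+1/-1) kernel into (base, filtered).

    base     : every element equals base_sign
    filtered : original weight where it differs from the base, None elsewhere
    """
    base = [[base_sign for _ in row] for row in kernel]
    filtered = [[w if w != base_sign else None for w in row] for row in kernel]
    return base, filtered


def recompose(base, filtered):
    """Rebuild the original kernel as base + 2 * filtered (undefined counts as 0)."""
    return [[b + 2 * (f if f is not None else 0) for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(base, filtered)]


kernel = [[1, -1], [-1, 1]]
base, filtered = decompose(kernel)
```

For binary weights the differing elements hold the opposite sign of the base, so adding twice the filtered weight to the base weight restores the original weight, whichever sign is chosen for the base.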

In step 1403, the processor 110 performs the convolution operation using the input feature map and the first-type sub-kernel (base kernel) and second-type sub-kernel (filtered kernel) decomposed from each of the kernels.

The processor 110 performs a first convolution operation (base convolution operation) between the current window of the input feature map and the base kernel decomposed from the initial kernel, and a second convolution operation (filtered convolution operation) between the current window and each of the filtered kernels decomposed from the kernels. Here, the processor 110 performs clock gating so that the base convolution operation between the current window and each of the base kernels decomposed from the kernels other than the initial kernel is skipped, and the result of the base convolution operation performed for the initial kernel is reused as the result of the base convolution operation for the remaining kernels.
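Clock gating is a hardware mechanism, but the reuse it enables has a simple software analogue: compute the base output once and share it. The sketch below assumes an all -1 base kernel by default; the function name `base_outputs` is hypothetical.

```python
def base_outputs(window, num_kernels, base_sign=-1):
    """Return the base convolution result for each of num_kernels kernels.

    The value is computed a single time (for the initial kernel) and the
    same value is handed to the remaining kernels instead of being recomputed.
    """
    shared = base_sign * sum(sum(row) for row in window)
    return [shared] * num_kernels


window = [[1, 2], [3, 4]]
results = base_outputs(window, num_kernels=3)
```

This works because the base kernel is identical for all kernels, so its convolution with a given window cannot differ from one kernel to the next.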

The filtered convolution operation is performed between each element for which a weight is defined in each of the filtered kernels and the corresponding pixel of the input feature map, and is skipped for matrix elements for which no weight is defined.
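The skipping behaviour can be illustrated with a short sketch in which undefined weights are represented by `None`. The representation, the function name `filtered_convolution`, and the operation counter are assumptions made for illustration.

```python
def filtered_convolution(window, filtered):
    """Accumulate only over elements whose weight is defined.

    Returns (result, operations) so the number of multiply-accumulate
    operations actually performed can be inspected.
    """
    total = 0
    operations = 0
    for x_row, f_row in zip(window, filtered):
        for x, f in zip(x_row, f_row):
            if f is None:        # weight not defined: skip this element
                continue
            total += x * f
            operations += 1
    return total, operations


window = [[1, 2], [3, 4]]
filtered = [[1, None], [None, 1]]
result, operations = filtered_convolution(window, filtered)
```

Here only two of the four window positions contribute, so the amount of work scales with the number of weights that differ from the base kernel rather than with the kernel size.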

In step 1404, the processor 110 obtains an output feature map by synthesizing the results of the convolution operations. The processor 110 may determine the output feature map by determining each pixel value of the output feature map based on a value obtained by adding twice the result of the filtered convolution operation between the filtered kernel and a window of the input feature map to the result of the base convolution operation between the base kernel and the window.
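The synthesis rule can be verified against an ordinary convolution for a single window. The sketch below assumes an all -1 base kernel, so the filtered kernel holds +1 at the differing positions; the function names and sample values are hypothetical.

```python
def synthesize(base_out, filtered_out):
    """Output pixel = base result + 2 * filtered result."""
    return base_out + 2 * filtered_out


def direct(window, kernel):
    """Reference: multiply-accumulate with the original binary kernel."""
    return sum(x * w for x_row, w_row in zip(window, kernel)
               for x, w in zip(x_row, w_row))


window = [[1, 2], [3, 4]]
kernel = [[1, -1], [-1, 1]]

# Base convolution with the assumed all -1 base kernel.
base_out = -sum(sum(row) for row in window)
# Filtered convolution over the positions where the original weight is +1.
filtered_out = sum(x for x_row, w_row in zip(window, kernel)
                   for x, w in zip(x_row, w_row) if w == 1)

pixel = synthesize(base_out, filtered_out)
assert pixel == direct(window, kernel)
```

The factor of two arises because moving a weight from -1 to +1 changes its contribution by twice the pixel value.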

Meanwhile, the above-described embodiments of the present invention may be written as a program executable on a computer, and may be implemented in a general-purpose digital computer that runs the program using a computer-readable recording medium. In addition, the structure of the data used in the above-described embodiments of the present invention may be recorded on a computer-readable recording medium through various means. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.) and optical reading media (e.g., CD-ROMs, DVDs, etc.).

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in a descriptive sense rather than a restrictive one. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

1. A method of performing a convolution operation of a neural network, the method comprising:
obtaining, from a memory, data of an input feature map and kernels having binary weights, which are to be processed in a layer of the neural network;
decomposing each of the kernels into a first-type sub-kernel reconstructed with weights of the same sign and a second-type sub-kernel for correcting a difference between the kernel and the first-type sub-kernel;
performing a convolution operation using the input feature map and the first-type sub-kernel and the second-type sub-kernel decomposed from each of the kernels; and
obtaining an output feature map by synthesizing results of the convolution operation.

2. The method of claim 1,
wherein the first-type sub-kernel is a sub-kernel reconstructed by replacing the weights of all elements of each of the kernels with the same value.

3. The method of claim 2,
wherein the first-type sub-kernel is
a sub-kernel reconstructed by replacing the weights of all elements of the first-type sub-kernel with the same value of a first sign when the input feature map has an odd channel index, and
a sub-kernel reconstructed by replacing the weights of all elements of the first-type sub-kernel with the same value of a second sign when the input feature map has an even channel index.

4. The method of claim 1,
wherein the first-type sub-kernels decomposed from each of the kernels that are to be convolved with the input feature map are all identical.

5. The method of claim 1,
wherein the second-type sub-kernel is a sub-kernel reconstructed by defining the original weight of each of the kernels at each element whose weight differs from that of the first-type sub-kernel, and by leaving the weights of the remaining elements undefined.

6. The method of claim 1,
wherein the performing of the convolution operation comprises
performing a first convolution operation between a current window of the input feature map and the first-type sub-kernel decomposed from an initial kernel, and a second convolution operation between the current window and each of the second-type sub-kernels decomposed from the kernels.

7. The method of claim 6,
wherein the performing of the convolution operation comprises performing clock gating so that the first convolution operation between the current window and each of the first-type sub-kernels decomposed from the kernels other than the initial kernel is skipped, and
wherein the result of the first convolution operation performed for the initial kernel is reused as the result of the first convolution operation for the remaining kernels.

8. The method of claim 6,
wherein the second convolution operation
is performed between each element for which a weight is defined in each of the second-type sub-kernels and the corresponding pixel of the input feature map, and is skipped for matrix elements for which no weight is defined.

9. The method of claim 1,
wherein the obtaining of the output feature map comprises
determining the output feature map by determining each pixel value of the output feature map based on a value obtained by adding twice a result of a second convolution operation between the second-type sub-kernel and a window of the input feature map to a result of a first convolution operation between the first-type sub-kernel and the window.

10. A computer-readable recording medium having recorded thereon one or more programs including instructions for executing the method of any one of claims 1 to 9.

11. A neural network device comprising:
a memory in which at least one program is stored; and
a processor configured to drive a neural network by executing the at least one program,
wherein the processor is configured to:
obtain, from the memory, data of an input feature map and kernels having binary weights, which are to be processed in a layer of the neural network,
decompose each of the kernels into a first-type sub-kernel reconstructed with weights of the same sign and a second-type sub-kernel for correcting a difference between the kernel and the first-type sub-kernel,
perform a convolution operation using the input feature map and the first-type sub-kernel and the second-type sub-kernel decomposed from each of the kernels, and
obtain an output feature map by synthesizing results of the convolution operation.

12. The neural network device of claim 11,
wherein the first-type sub-kernel is a sub-kernel reconstructed by replacing the weights of all elements of each of the kernels with the same value.

13. The neural network device of claim 12,
wherein the first-type sub-kernel is
a sub-kernel reconstructed by replacing the weights of all elements of the first-type sub-kernel with the same value of a first sign when the input feature map has an odd channel index, and
a sub-kernel reconstructed by replacing the weights of all elements of the first-type sub-kernel with the same value of a second sign when the input feature map has an even channel index.

14. The neural network device of claim 11,
wherein the first-type sub-kernels decomposed from each of the kernels that are to be convolved with the input feature map are all identical.

15. The neural network device of claim 11,
wherein the second-type sub-kernel is a sub-kernel reconstructed by defining the original weight of each of the kernels at each element whose weight differs from that of the first-type sub-kernel, and by leaving the weights of the remaining elements undefined.

16. The neural network device of claim 11,
wherein the processor is configured to
perform a first convolution operation between a current window of the input feature map and the first-type sub-kernel decomposed from an initial kernel, and a second convolution operation between the current window and each of the second-type sub-kernels decomposed from the kernels.

17. The neural network device of claim 16,
wherein the processor is configured to perform clock gating so that the first convolution operation between the current window and each of the first-type sub-kernels decomposed from the kernels other than the initial kernel is skipped, and
wherein the result of the first convolution operation performed for the initial kernel is reused as the result of the first convolution operation for the remaining kernels.

18. The neural network device of claim 16,
wherein the second convolution operation
is performed between each element for which a weight is defined in each of the second-type sub-kernels and the corresponding pixel of the input feature map, and is skipped for matrix elements for which no weight is defined.

19. The neural network device of claim 11,
wherein the processor is configured to
determine the output feature map by determining each pixel value of the output feature map based on a value obtained by adding twice a result of a second convolution operation between the second-type sub-kernel and a window of the input feature map to a result of a first convolution operation between the first-type sub-kernel and the window.