KR102042446B1

KR102042446B1 - Improved binarization apparatus and method of first layer of convolution neural network

Info

Publication number: KR102042446B1
Application number: KR1020180041588A
Authority: KR
Inventors: 김태환; 신지훈
Original assignee: 한국항공대학교산학협력단
Priority date: 2018-04-10
Filing date: 2018-04-10
Publication date: 2019-11-08
Also published as: KR20190118365A

Abstract

An improved binarization apparatus and method for the first layer of a convolutional neural network are disclosed. The apparatus according to an embodiment of the present application includes: a binarization parameter generator that binarizes initial input data for the first layer to generate binarized input data and binarizes the parameters of the first layer to generate binarized parameters; a convolution operation unit that scales the binarized parameters to generate scaled parameters and generates a first operation result by performing a convolution operation between the binarized input data and the scaled parameters; and a shortcut operation unit that calculates a second operation result by adding a shortcut to the first operation result, wherein the shortcut may correspond to the initial input data before binarization.

Description

IMPROVED BINARIZATION APPARATUS AND METHOD OF FIRST LAYER OF CONVOLUTION NEURAL NETWORK

The present application relates to an improved binarization apparatus and method for the first layer of a convolutional neural network.

Deep neural networks (DNNs) have shown outstanding performance in many fields, including computer vision and speech recognition. In computer vision in particular, the convolutional neural network (CNN), a type of DNN, has shown high image analysis performance. Improving the analysis performance of a CNN requires a large number of artificial neurons in the model, which in turn requires a very large amount of computation. In addition, a very large memory capacity is required to implement the connections among these many artificial neurons.

However, because of the high memory and computation requirements characteristic of CNNs, they run smoothly in environments equipped with expensive, high-performance, high-power graphics processing units (GPUs) but are difficult to run quickly on embedded platforms such as smartphones and smart wearable devices. Accordingly, there is a need for a method of reducing the computation and memory requirements of a CNN.

The background art of the present application is disclosed in Korean Patent Application Publication No. 2017-0023708.

The present application is intended to solve the problems of the prior art described above, and an object thereof is to provide an improved binarization apparatus and method for the first layer of a convolutional neural network that can minimize the loss of input data while binarizing both the convolution input data and the parameters.

A further object of the present application is to provide an improved binarization apparatus and method for the first layer of a convolutional neural network that can reduce the number of floating-point multiplications compared with the operation of a conventional convolutional neural network while limiting the degradation of analysis performance.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical objects, an improved binarization method for the first layer of a convolutional neural network according to an embodiment of the present application includes: (a) binarizing initial input data for the first layer to generate binarized input data, and binarizing the parameters of the first layer to generate binarized parameters; (b) scaling the binarized parameters to generate scaled parameters, performing a convolution operation between the binarized input data and the scaled parameters, and then applying scaling multiplication by the scaled parameters to the convolution operation to generate a first operation result; and (c) adding a shortcut to the first operation result to calculate a second operation result, wherein the shortcut may correspond to the initial input data before binarization.

According to an embodiment of the present application, the addition of the shortcut in step (c) allows the initial input data before the binarization of step (a) to be taken into account.

According to an embodiment of the present application, in step (b) the first operation result may be batch normalized, and in step (c) the shortcut may be batch normalized.

According to an embodiment of the present application, the second operation result may be calculated by Equation 1.

According to an embodiment of the present application, the improved binarization method for the first layer of a convolutional neural network may further include (d) performing pooling and batch normalization on the second operation result.

An improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application includes: a binarization parameter generator that binarizes initial input data for the first layer to generate binarized input data and binarizes the parameters of the first layer to generate binarized parameters; a convolution operation unit that scales the binarized parameters to generate scaled parameters, performs a convolution operation between the binarized input data and the scaled parameters, and then applies scaling multiplication by the scaled parameters to the convolution operation to generate a first operation result; and a shortcut operation unit that adds a shortcut to the first operation result to calculate a second operation result, wherein the shortcut may correspond to the initial input data before binarization.

According to an embodiment of the present application, the addition of the shortcut allows the initial input data before binarization to be taken into account.

According to an embodiment of the present application, the convolution operation unit may batch normalize the first operation result, and the shortcut operation unit may batch normalize the shortcut.

According to an embodiment of the present application, the shortcut operation unit may calculate the second operation result through Equation 2.

According to an embodiment of the present application, the improved binarization apparatus for the first layer of a convolutional neural network may further include a normalization operation unit that performs pooling and batch normalization on the second operation result.

The means for solving the problems described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, further embodiments may exist in the drawings and in the detailed description of the invention.

According to the means described above, it is possible to provide an improved binarization apparatus and method for the first layer of a convolutional neural network that binarizes both the convolution input data and the parameters and yet minimizes the loss of input data by adding a shortcut operation through which the initial input data bypasses binarization.

It is also possible to provide an improved binarization apparatus and method for the first layer of a convolutional neural network that binarizes the initial input data and the parameters of the first layer, thereby reducing the number of floating-point multiplications compared with the operation of a conventional convolutional neural network, and that limits the degradation of analysis performance through the shortcut operation.

FIG. 1 is a diagram illustrating the process of a conventional, general convolutional neural network.
FIG. 2 is a diagram illustrating the configuration of an improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application.
FIG. 3 is a diagram illustrating the flow of improved binarization of the first layer of a convolutional neural network according to an embodiment of the present application.
FIG. 4 is a diagram illustrating the reduction in analysis performance with the improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application.
FIG. 5 is a diagram comparing the amount of computation of the improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application with that of a conventional convolutional neural network.
FIG. 6 is a diagram comparing the analysis performance of the improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application with that of a conventional CNN.
FIG. 7 is a diagram illustrating the flow of an improved binarization method for the first layer of a convolutional neural network according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating the process of a conventional, general convolutional neural network.

A conventional, general convolutional neural network (hereinafter, a conventional CNN) consists broadly of a plurality of convolutional layers (hereinafter, CLs), which extract local characteristics of the input (for example, the features of an image), and a plurality of fully-connected layers (hereinafter, FCLs), which analyze the extracted characteristics. Information input to a conventional CNN can be classified by passing through multiple CLs and FCLs. For example, when the input is an image, the image can be classified into a category according to its feature points. The precise 32-bit floating-point values of a CNN's parameters and convolution inputs are known not to be essential for high image analysis performance, and research has accordingly been conducted on representing the parameters and convolution inputs with fewer bits. Referring to FIG. 1(a), a conventional CNN may include the processes of convolution (S11), pooling (S12), batch normalization (S13), and activation (S40). Convolution (S11) is an operation between the input data and a plurality of filters, and the parameters of each filter can be obtained through training. The result O of the convolution between an input A of size w×h×d and M filters W of size (2n-1)×(2n-1)×d can be expressed by Equation 1 below.

[Equation 1]

Here, O_{i,j,m} denotes the component of the result O at position (i, j, m). W_{m,x,y,z} is the parameter value at position (x, y, z) of the m-th filter W_m, and A_{x,y,z} is the component of A at position (x, y, z). As Equation 1 shows, computing one component of O in a conventional CNN requires (2n-1)·(2n-1)·d multiply-accumulate operations, and the size of O is (w-2n+2)×(h-2n+2)×M. Processing one CL therefore requires a total of (2n-1)·(2n-1)·d·(w-2n+2)·(h-2n+2)·M operations, as well as memory for storing (2n-1)·(2n-1)·d·M parameters. After the convolution operation, pooling is performed per channel (S20) and O is batch normalized (S30). Pooling reduces the size of the input data by selecting the average or the maximum of the values within a local region. The activation output, obtained by passing the result through a nonlinear activation operation, is then used as the input to the next layer. Here, a nonlinear activation operation means an operation using a nonlinear function, for example a sigmoid function or a rectified linear unit (ReLU), and the activation output is the result of that operation.
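As a rough illustration of the convolution and the cost figures above, the following Python sketch computes O directly and counts the operations. The function names, the nested-list layout, and the zero-based index convention A[i+x][j+y][z] are assumptions made for this example; they are not taken from Equation 1.

```python
def conv_valid(A, W):
    """Direct convolution without padding (illustrative sketch).

    A: input indexed A[x][y][z], size w x h x d.
    W: list of M filters indexed W[m][x][y][z], each k x k x d with k = 2n-1.
    Returns O indexed O[i][j][m], size (w-k+1) x (h-k+1) x M.
    """
    w, h, d = len(A), len(A[0]), len(A[0][0])
    M, k = len(W), len(W[0])
    out = []
    for i in range(w - k + 1):
        row = []
        for j in range(h - k + 1):
            # One output component costs k*k*d multiply-accumulates.
            row.append([
                sum(W[m][x][y][z] * A[i + x][j + y][z]
                    for x in range(k) for y in range(k) for z in range(d))
                for m in range(M)
            ])
        out.append(row)
    return out


def conv_layer_cost(w, h, d, n, M):
    """Multiply-accumulate count, parameter count, and output size of one CL."""
    k = 2 * n - 1
    out_w, out_h = w - 2 * n + 2, h - 2 * n + 2
    macs = k * k * d * out_w * out_h * M
    params = k * k * d * M
    return macs, params, (out_w, out_h, M)
```

For a 32×32×3 input with n = 2 and M = 16, conv_layer_cost returns 388,800 multiply-accumulates, 432 parameters, and an output size of 30×30×16.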

The FCL is a process that includes a fully connected convolution operation, batch normalization, and activation. The fully connected convolution performs the same operation as the convolution of the CL, except that each of the M filters has the same size as the input, so the output size is 1×1×M. Processing one FCL therefore requires w·h·d·M multiply-accumulate operations and memory for storing w·h·d·M parameters. As described above, implementing a conventional CNN requires a large amount of computation and a large memory capacity. Because of this, it is difficult to implement a conventional CNN on platforms with limited computing performance and memory capacity, such as embedded systems.

In addition, a method of representing both the parameters and the convolution inputs of a CNN in 8-bit floating point has been proposed, as has a method of representing the parameters with three values and the convolution inputs with 3 bits. A method that drastically reduces the computation and memory requirements of a CNN through binarization, that is, representation with a single bit in the extreme case (BCNN), has also been proposed, as shown in FIG. 1(b). Referring to FIG. 1(b), a conventional CNN operating on binarized inputs binarizes the input of a layer and may include the processes of convolution on the binarized input (S21), scaling (S22), pooling (S23), batch normalization (S24), and binarization (S25). The binarized convolution (S21) is an operation between the input and a plurality of filters, and the convolution input A^b, obtained from the activation result of the convolution layer before binarization, can be represented with a single bit as in Equation 2.

[Equation 2]

Here, l is the layer number, clip(·) denotes the operation max(-1, min(·, 1)), and sign(·) denotes the operation that extracts (returns) the sign. (A_l)^b_{x,y,z} denotes the binarized convolution input, and (A_l)_{x,y,z} denotes the convolution input before binarization. Unlike in a general CNN, the parameters of each filter can be expressed in the binarized form W^b rather than in the floating-point form W. The operation of binarizing the parameters of each filter can be expressed as Equation 3.
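The clip and sign operations defined above can be sketched in a few lines of Python. Composing them as sign(clip(·)) and mapping zero to +1 are assumptions made for this example, since the text defines the two operations but the exact form of Equation 2 is not shown here.

```python
def clip(v):
    """clip(v) = max(-1, min(v, 1)), as defined in the text."""
    return max(-1.0, min(v, 1.0))


def sign(v):
    """Return the sign as -1 or +1.

    Assumption: zero maps to +1 so the result always fits in one bit.
    """
    return 1 if v >= 0 else -1


def binarize(v):
    """Binarize one activation value to -1 or +1 (assumed composition)."""
    return sign(clip(v))
```

Because clipping never changes the sign of a value, the clip step does not affect the forward result in this sketch; it is kept only to mirror the operations named in the text.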

[Equation 3]

Here, (W_l)^b_{m,x,y,z} denotes the binarized parameter, and (W_l)_{m,x,y,z} denotes the parameter before binarization. The convolution between a binarized input and binarized parameters, which take only the values -1 or 1, can be replaced by bitwise XNOR and accumulation, as described below. The result of the binarized convolution between an input A^b of size w×h×d and M filters W^b of size (2n-1)×(2n-1)×d can be expressed as Equation 4.

[Equation 4]

Here, l is the layer number, and W_{m,x,y,z} and W^b_{m,x,y,z} are, respectively, the floating-point parameter value and the binarized parameter value at position (x, y, z) of the m-th filter W_m. The multiply-accumulate between W^b and A^b can be replaced, as in Equation 5, by bitwise XNOR and accumulation between the parameters and the convolution inputs mapped to single bits.

[Equation 5]

Here, XNOR(·,·) is the XNOR operation between two single-bit inputs; its output is -1 if the two inputs differ and 1 if they are the same. With zero-padding, only the number of channels changes, so the size of O in Equation 5 is w×h×M. Processing the binarized convolution of a CL therefore requires a total of (2n-1)·(2n-1)·d·w·h·M bit operations.
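The replacement of multiply-accumulate by XNOR and accumulation can be checked with a short Python sketch. The bit packing (+1 as bit 1, -1 as bit 0) and the helper names are assumptions made for this example. If `agree` is the number of positions where the two bit patterns match, the accumulated XNOR outputs equal agree − (k − agree) = 2·agree − k.

```python
def pack_bits(values):
    """Pack a sequence of -1/+1 values into an int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for idx, v in enumerate(values):
        if v == 1:
            word |= 1 << idx
    return word


def xnor_dot(a_word, w_word, k):
    """Accumulated XNOR of two k-bit words, with outputs counted as -1/+1."""
    mask = (1 << k) - 1
    # Bits set where the two inputs agree, limited to the k valid positions.
    agree = bin(~(a_word ^ w_word) & mask).count("1")
    return 2 * agree - k


def plain_dot(a, w):
    """Reference multiply-accumulate on the -1/+1 values themselves."""
    return sum(x * y for x, y in zip(a, w))
```

For any two equal-length sequences of -1/+1 values, xnor_dot on the packed words gives the same result as plain_dot on the values.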

After the binarized convolution (S21), scaling may be performed by multiplying by the parameter scaling coefficient e (S22). Scaling is performed to reduce the loss of analysis performance caused by binarizing the parameters: introducing the scaling coefficient e lets the binarized parameters approximate the parameters before binarization. As in Equation 6, the scaling coefficient may be defined, for each channel, as the mean of the absolute values of that channel's parameters, so as many scaling coefficients are generated as the layer has channels. By performing scaling as in Equation 7, the binarized parameter values can be made to approximate the parameter values before binarization.

[Equation 6]

Here, e_{l,m} denotes the scaling coefficient.
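A minimal Python sketch of the scaling coefficient, assuming it is computed per filter m as the mean absolute value of that filter's parameters, as the text describes. The function names and the nested-list layout W_m[x][y][z] are illustrative, and zero is mapped to +1 when taking the sign.

```python
def scaling_coefficient(filter_weights):
    """Mean of the absolute values of one filter's parameters."""
    flat = [abs(v) for plane in filter_weights for row in plane for v in row]
    return sum(flat) / len(flat)


def approximate_filter(filter_weights):
    """Approximate a filter as e * sign(W): one float plus one bit per weight."""
    e = scaling_coefficient(filter_weights)
    return [[[e * (1 if v >= 0 else -1) for v in row] for row in plane]
            for plane in filter_weights]
```

Only the single coefficient per filter remains in floating point; every individual weight is reduced to its sign.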

[Equation 7]

That is, scaling (S22) multiplies each component of the binarized convolution result O by e, which requires w·h·M floating-point multiplications; the subsequent pooling (S23), batch normalization (S24), and activation each consume the same number of floating-point operations. Pooling (S23) reduces the size of the feature map through max pooling after the binarized convolution and scaling (S22). Equation 8 expresses the batch normalization operation, which normalizes the input per channel during training. That is, the mean and the variance are recorded for each channel of the input, and the scale parameter γ_m and the bias parameter β_m are learned to normalize the input, which suppresses the covariate shift phenomenon.

[Equation 8]

Here, X_{m,x,y,z} denotes the input to batch normalization. The mean, the variance, the scaling parameter γ_m, and the bias parameter β_m are those of the m-th filter, and there are as many of each as there are filters, that is, input channels. In pooling, the average or the maximum of the values within a 2×2 or 3×3 local region is typically selected, reducing the size of the input to (w/2)×(h/2)×M. Binarization is then performed through Equation 2 above (S25), and the output can be used as the input to the next layer.
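The per-channel batch normalization described for Equation 8 can be sketched in Python as follows. This uses the commonly used form γ·(x − mean)/sqrt(var + ε) + β; the small constant `eps` and the function names are assumptions made for this example and are not stated in the text.

```python
import math


def channel_stats(values):
    """Mean and (population) variance of one channel's values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def batch_norm_channel(values, mean, var, gamma, beta, eps=1e-5):
    """Normalize one channel, then apply the learned scale and bias.

    eps is an assumed stabilizer that avoids division by zero.
    """
    scale = gamma / math.sqrt(var + eps)
    return [scale * (v - mean) + beta for v in values]
```

With gamma = 1 and beta = 0 the output of a channel has approximately zero mean and unit variance; the learned gamma and beta then rescale and shift it.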

Once the image data has passed through several CLs and its size has been greatly reduced by the repeated pooling operations, the final image analysis is performed by the FCLs. The operation structure of an FCL is the same as that of a CL, except that the operation corresponding to convolution is replaced by a linear combination of the input and the parameters. To this end, the three-dimensional CL output is converted into one-dimensional form and convolved with a plurality of one-dimensional filters. Equation 9 gives the m-th component of the output O produced by operating on a binarized input A^b of size k with M binarized filters W^b of the same size k.

[Equation 9]

Here, W_{m,x} denotes the x-th component of the m-th filter, and A_x denotes the x-th component of the input. Equation 9 consumes a total of k·M bit operations.

In a conventional BCNN, the input layer of the network, which receives the image data, is not binarized, because the data loss caused by binarization degrades performance. However, the convolution of a non-binarized CL requires a very large number of floating-point operations, exceeding the sum of the floating-point operations required for the batch normalization, pooling, and activation of every layer, so a more efficient approach in terms of computation is needed.

FIG. 2 is a diagram illustrating the configuration of an improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application, and FIG. 3 is a diagram illustrating the flow of improved binarization of the first layer of a convolutional neural network according to an embodiment of the present application.

Referring to FIG. 2, the improved binarization apparatus 100 for the first layer of a convolutional neural network may include a binarization parameter generator 110, a convolution operation unit 120, a shortcut operation unit 130, and a normalization operation unit 140. Referring to FIG. 5, the binarization parameter generator 110 may binarize the initial input data for the first layer to generate binarized input data. Because the first layer has no preceding layer, its convolution input does not take the values -1 or 1. The binarization parameter generator 110 may therefore binarize the initial input data, that is, the convolution input, through Equation 10.

[Equation 10]

Here, (A_l)^b_{x,y,z} denotes the binarized convolution input of the first layer, and (A_l)_{x,y,z} denotes the convolution input before binarization. Binarizing the first layer lowers analysis performance because input image data is lost, but replacing the convolution with a binarized convolution reduces both the memory required to store the parameters and the computational complexity.

또한, 이진화 파라미터 생성부(110)는 첫번째 레이어의 파라미터를 이진화하여 이진화 파라미터를 생성할 수 있다(S51). 예시적으로, 이진화 파라미터 생성부(110)는 수학식 11을 통해 첫번째 레이어의 파라미터를 이진화하여 이진화 파라미터를 생성할 수 있다. In addition, the binarization parameter generator 110 may generate a binarization parameter by binarizing the parameter of the first layer (S51). For example, the binarization parameter generator 110 may generate the binarization parameter by binarizing the parameter of the first layer through Equation (11).

[Equation 11]

Here, (W_1)^b_{m,x,y,z} denotes the binarized parameter of the first layer, and (W_1)_{m,x,y,z} denotes the parameter of the first layer before binarization.

The convolution operation unit 120 may scale the binarization parameters to generate scaled parameters (S52). For example, the convolution operation unit 120 may compute a scaling coefficient using Equation 12 and scale the binarization parameters using Equation 13 to generate the scaled parameters.

[Equation 12]

Here, e_1 denotes the scaling coefficient.

[Equation 13]

Here, (W_1)^b_{m,x,y,z} is the binarized parameter, and the product of the scaling coefficient and the binarized parameter is the value that approximates the pre-binarization parameter (W_1)_{m,x,y,z}.
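The scaling step can be illustrated with a short sketch. The specification's formula for the scaling coefficient is not reproduced here, so the code assumes one common choice, the mean absolute value of a filter's weights; that choice, and all function names, are assumptions for illustration only.

```python
def sign_binarize(value):
    """Map a real value to +1 or -1 (zero is sent to +1 by assumption)."""
    return 1 if value >= 0 else -1


def scaling_coefficient(filter_weights):
    """Mean absolute value of one filter's weights (an assumed form of the coefficient)."""
    return sum(abs(w) for w in filter_weights) / len(filter_weights)


def scale_binarized(filter_weights):
    """Return the coefficient, the binarized weights, and their scaled approximation."""
    e = scaling_coefficient(filter_weights)
    binary = [sign_binarize(w) for w in filter_weights]
    return e, binary, [e * b for b in binary]


# Hypothetical flattened weights of a single filter m.
weights = [0.5, -0.25, 0.75, -0.5]
e, binary, approx = scale_binarized(weights)
```

With these weights the coefficient is 0.5, so each binarized weight is restored to a magnitude of 0.5 while keeping its sign.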

The convolution operation unit 120 may generate a first operation result by performing a convolution operation between the binarized input data and the scaled parameters. The convolution operation unit 120 may generate the first operation result using Equation 14.

[Equation 14]

Here, (O_1)_{i,j,m} is the first operation result, and (A_0)^b denotes the binarized initial input data.
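Because both operands of this convolution take only the values +1 and -1, each multiply-accumulate can be evaluated with XNOR and a bit count, as the description notes when it discusses the amount of computation. The sketch below checks that equivalence for one output position; the bit-packing convention, the function names, and the scaling coefficient value are assumptions made for illustration.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int, using bit 1 for +1 and bit 0 for -1."""
    word = 0
    for position, value in enumerate(values):
        if value == 1:
            word |= 1 << position
    return word


def xnor_popcount_dot(a_bits, w_bits, length):
    """Dot product of two +1/-1 vectors computed from their packed bits."""
    mask = (1 << length) - 1
    agreements = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * agreements - length


inputs = [1, -1, -1, 1, 1]    # binarized input values under one filter window
weights = [1, 1, -1, -1, 1]   # binarized filter weights
scale = 0.5                   # hypothetical scaling coefficient for this filter

reference = sum(a * w for a, w in zip(inputs, weights))
fast = xnor_popcount_dot(pack_bits(inputs), pack_bits(weights), len(inputs))
first_result = scale * fast
```

The scaling coefficient is applied once per output value, after the bitwise accumulation.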

In addition, the convolution operation unit 120 may batch-normalize the first operation result (S53). For example, the convolution operation unit 120 may perform the batch normalization using Equation 8.

[Equation 8]

In the batch normalization performed by the convolution operation unit 120, X_{m,x,y,z} is the input of the batch normalization, that is, it may be the first operation result (O_1)_{i,j,m}. The remaining terms of Equation 8 denote the mean, variance, scaling parameter, and bias parameter of the m-th filter, respectively, and there is one of each per filter, that is, one per channel of the input.
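For orientation, the sketch below applies the standard form of batch normalization to the values of one channel, using the four quantities named above. It assumes Equation 8 follows that standard form. The small constant `eps`, the default scale and bias, and the use of statistics computed from the given values are assumptions for illustration.

```python
import math


def batch_normalize(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel: subtract the mean, divide by the standard deviation,
    then apply the scaling parameter gamma and the bias parameter beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


# Hypothetical first operation results for one filter m.
channel = [1.0, 3.0, 5.0, 7.0]
normalized = batch_normalize(channel)
```

After this step the channel has approximately zero mean and unit variance, regardless of the scale of the convolution output.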

Applying batch normalization to the first operation result gives the expression in Equation 15.

[Equation 15]

The shortcut operation unit 130 may compute a second operation result by adding a shortcut to the first operation result. As described above, binarizing the input data of a layer reduces the amount of computation, but it also loses data. Accordingly, a shortcut operation that routes the initial input data from before binarization around the convolution and adds it to the first operation result, which is the result of the convolution operation, can yield a result close to that of a convolution whose input is not binarized. That is, the shortcut may correspond to the initial input data before binarization, and the addition of the shortcut allows the initial input data before binarization to be taken into account.

The shortcut operation unit 130 may compute the second operation result using Equation 16.

[Equation 16]

Here, e_{1,m} is the coefficient of the m-th filter of the first layer, A_0 is the initial input data, and i, j, m denote the position components of A_0.

Equation 16 represents the addition of the first operation result without batch normalization (see Equation 14) and the initial input data, and it does not take suppression of the covariate shift phenomenon into account. Accordingly, the shortcut operation unit 130 may batch-normalize the first operation result as in Equation 15, normalize the initial input data (S54), and add the batch-normalized first operation result and the batch-normalized shortcut.

The shortcut operation unit 130 may compute the second operation result using Equation 17, which represents the addition of the batch-normalized first operation result and the batch-normalized shortcut.

[Equation 17]

According to Equation 17, the batch-normalized first operation result and the batch-normalized shortcut have been added together, so batch normalization needs to be performed again. The normalization operation unit 140 may perform pooling and batch normalization on the second operation result. The normalization operation unit 140 may batch-normalize the second operation result using Equation 8 (S55). The normalization operation unit 140 may also activate the second operation result (S56) and pool it (S57). Activation refers, for example, to an operation using a nonlinear function, such as a sigmoid function or a rectified linear unit (ReLU). Pooling refers to reducing the size of the input data by taking the average or the maximum of the values within a local region. Because activation and pooling are well-known concepts, a detailed description is omitted.
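The order of operations from the shortcut addition through pooling can be sketched for a single channel as follows. This is an illustration under simplifying assumptions: the scaling and bias parameters of batch normalization are fixed at 1 and 0, ReLU is chosen as the activation, and a 1-D maximum over adjacent pairs stands in for 2-D pooling. All names and sample values are hypothetical.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Normalize one channel to zero mean and unit variance (scale 1, bias 0 assumed)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool_pairs(values):
    """1-D stand-in for 2-D pooling: keep the maximum of each adjacent pair."""
    return [max(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]


first_result = [4.0, -2.0, 0.0, 6.0]   # hypothetical convolution outputs
shortcut = [0.2, 0.9, -0.4, 0.1]       # hypothetical non-binarized input values

# Normalize both branches separately, then add them.
second_result = [
    o + a for o, a in zip(batch_normalize(first_result), batch_normalize(shortcut))
]
# Normalize again (S55), activate (S56), and pool (S57).
renormalized = batch_normalize(second_result)
output = max_pool_pairs(relu(renormalized))
```

Normalizing each branch before the addition keeps either one from dominating the sum purely because of its scale.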

Like various conventional neural networks trained on the ImageNet dataset, ResNet-18 reduces the image data to 224×224×3 and then normalizes it in a preprocessing step. The first layer is a 7×7 convolution with a stride of 2 that uses 64 parameter filters of size 7×7×3 and produces an output of size 112×112×64. Accordingly, a conventional BCNN consumes 7×7×3×112×112×64 floating-point MAC operations in the convolution of the input layer, and 112×112×64 floating-point multiplication operations each in scaling, batch normalization, pooling, and activation.
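The operation counts quoted above can be worked out directly. In the sketch below, the padding of 3 used to check the 112×112 output size is an assumption (it is the usual setting for this layer in ResNet-18 but is not stated in the description).

```python
kernel_h, kernel_w, in_channels = 7, 7, 3
out_h, out_w, out_channels = 112, 112, 64

# Output size of a stride-2, 7x7 convolution on a 224-pixel side, assuming padding 3.
input_side, stride, padding = 224, 2, 3
computed_out_side = (input_side + 2 * padding - kernel_h) // stride + 1

# One MAC per filter weight per output value.
output_values = out_h * out_w * out_channels
conv_macs = kernel_h * kernel_w * in_channels * output_values

print(f"output side: {computed_out_side}")
print(f"output values (multiplications per element-wise stage): {output_values:,}")
print(f"convolution MACs: {conv_macs:,}")
```

The convolution therefore accounts for 118,013,952 MAC operations, against 802,816 multiplications for each of the element-wise stages.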

In contrast, the improved binarization apparatus 100 for the first layer of a convolutional neural network binarizes the normalized image data to 1 and -1 through a sign operation, performs a binarized convolution to compute the first operation result, and then computes the second operation result by adding the image data on which the sign operation has not been performed, that is, the shortcut, to the first operation result. For example, to match the size of the shortcut image data (224×224×3) to the size of the binarized convolution output (112×112×64), the image is subsampled by 2×2 and the three RGB channels are repeatedly concatenated to extend them to 64 channels. As a result, the 7×7×3×112×112×64 floating-point MAC operations are converted into XNOR bit operations and accumulation operations, which reduces the amount of computation compared with a conventional neural network.
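The shape matching of the shortcut can be sketched on a small image. The description does not say how each 2×2 block is reduced or how the 64 channels are filled when 64 is not a multiple of 3, so the code assumes the top-left pixel of each block is kept and that the RGB channels are repeated cyclically and cut off at 64. The function names are hypothetical.

```python
def subsample_2x2(image):
    """Keep the top-left pixel of every 2x2 block (one possible subsampling)."""
    return [row[::2] for row in image[::2]]


def tile_channels(image, target_channels):
    """Repeat each pixel's channels cyclically until target_channels are present."""
    return [
        [[pixel[c % len(pixel)] for c in range(target_channels)] for pixel in row]
        for row in image
    ]


# Hypothetical 4x4 RGB image standing in for the 224x224x3 input.
image = [
    [[r * 10 + c, 100 + r * 10 + c, 200 + r * 10 + c] for c in range(4)]
    for r in range(4)
]
shortcut = tile_channels(subsample_2x2(image), 64)
```

Applied to a 224×224×3 input, the same two steps would produce a 112×112×64 tensor that can be added element-wise to the convolution output.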

Hereinafter, the effectiveness of the apparatus is verified through experiments that compare the analysis performance, the number of floating-point operations, and the number of bit operations of a conventional binarized convolution with those of the convolution performed by the improved binarization apparatus 100 for the first layer of a convolutional neural network. To measure analysis performance, the ImageNet dataset ILSVRC2012 is used, a dataset of large color images averaging 469×387 pixels. The training data is downscaled with the aspect ratio preserved so that the smaller of the width and height of the input image becomes 256, subjected to simple color jitter, and cropped to 224×224 at a random position. The test data is downscaled in the same way and cropped to 224×224 at the exact center, without color jitter.
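The test-time geometry described above can be sketched for an image of the average size. The rounding of the resized dimensions and the function names are assumptions; the description gives only the target sizes.

```python
def resize_shorter_side(width, height, target=256):
    """Scale so the shorter side equals target while keeping the aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


resized = resize_shorter_side(469, 387)
box = center_crop_box(*resized)
```

For training, the same resize would be followed by a crop window placed at a random offset instead of the centered one.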

FIG. 4 is a diagram illustrating the reduction in analysis performance for the improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application.

Referring to FIG. 4, a BCNN with a binarized first layer shows a Top-1 analysis performance drop of about 17.5% relative to a conventional CNN. The convolution operation with the shortcut added by the improved binarization apparatus 100 for the first layer of a convolutional neural network shows, at an equivalent amount of computation, a Top-1 drop of 3.8%, which is 13.7 percentage points smaller.

FIG. 5 is a diagram comparing the amount of computation of the improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application with that of a conventional convolutional neural network.

FIG. 5 shows, per layer, the total number of floating-point multiplication operations and bit operations for convolution, batch normalization, activation, and pooling. Referring to FIG. 5, with the improved binarization apparatus 100 for the first layer of a convolutional neural network, 94.53% of the floating-point multiplication operations of the conventional BCNN are converted into bit operations.

In another experiment measuring the analysis performance of the improved binarization apparatus 100 for the first layer of a convolutional neural network, CIFAR10 was used as the dataset, and the CNN model was a ConvNet composed of six convolutional layers and three fully connected layers. CIFAR10 is a dataset of small 32×32 color images consisting of a training set of 50,000 images and a test set of 10,000 images across 10 object classes.

FIG. 6 is a diagram comparing the analysis performance of the improved binarization apparatus for the first layer of a convolutional neural network according to an embodiment of the present application with that of a conventional CNN.

Referring to FIG. 6, compared with a BCNN whose first layer is binarized, the convolution performed by the improved binarization apparatus 100 for the first layer of a convolutional neural network suppresses the degradation in analysis performance by 10%.

FIG. 7 is a flowchart of an improved binarization method for the first layer of a convolutional neural network according to an embodiment of the present application.

The improved binarization method for the first layer of a convolutional neural network shown in FIG. 7 may be performed by the improved binarization apparatus for the first layer of a convolutional neural network described above with reference to FIGS. 2 to 6. Therefore, even where omitted below, the description given of the apparatus with reference to FIGS. 2 to 6 applies equally to FIG. 7.

Referring to FIG. 7, in step S1010 the binarization parameter generator 110 may binarize the initial input data for the first layer of the convolutional neural network to generate binarized input data, and may binarize the parameters of the first layer to generate binarization parameters. For example, the binarization parameter generator 110 may binarize the initial input data, that is, the convolution input, using Equation 10.

[Equation 10]

Here, (A_l)^b_{x,y,z} denotes the binarized convolution input of the first layer, and (A_l)_{x,y,z} denotes the convolution input before binarization.

In addition, the binarization parameter generator 110 may binarize the parameters of the first layer using Equation 11 to generate the binarization parameters.

[Equation 11]

In step S1020, the convolution operation unit 120 may scale the binarization parameters to generate scaled parameters, and may generate a first operation result by performing a convolution operation between the binarized input data and the scaled parameters. For example, the convolution operation unit 120 may compute a scaling coefficient using Equation 12 and scale the binarization parameters using Equation 13 to generate the scaled parameters.

[Equation 12]

Here, e_{1,m} denotes the scaling coefficient.

[Equation 13]

Here, (W_1)^b_{m,x,y,z} is the binarized parameter, and e_{1,m}(W_1)^b_{m,x,y,z} is the value that approximates the pre-binarization parameter (W_1)_{m,x,y,z} by applying the scaling coefficient to the binarized parameter.

The convolution operation unit 120 may generate the first operation result using Equation 14.

[Equation 14]

In addition, the convolution operation unit 120 may batch-normalize the first operation result. For example, the convolution operation unit 120 may perform the batch normalization using Equation 8.

[Equation 8]

In step S1030, the shortcut operation unit 130 may compute a second operation result by adding a shortcut to the first operation result. For example, the shortcut may correspond to the initial input data before binarization, and the addition of the shortcut allows the initial input data before binarization to be taken into account.

[Equation 16]

Equation 16 represents the addition of the first operation result without batch normalization (see Equation 14) and the initial input data, and it does not take suppression of the covariate shift phenomenon into account. Accordingly, the shortcut operation unit 130 may batch-normalize the first operation result as in Equation 15, normalize the initial input data, and add the batch-normalized first operation result and the batch-normalized shortcut.

[Equation 17]

According to Equation 17, the batch-normalized first operation result and the batch-normalized shortcut have been added together, so batch normalization needs to be performed again. The normalization operation unit 140 may perform pooling and batch normalization on the second operation result. The normalization operation unit 140 may batch-normalize the second operation result using Equation 8. The normalization operation unit 140 may also activate and pool the second operation result.

The improved binarization method for the first layer of a convolutional neural network according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is for illustration, and a person of ordinary skill in the art to which the present application pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: Improved binarization apparatus for the first layer of a convolutional neural network
110: Binarization parameter generator
120: Convolution operation unit
130: Shortcut operation unit
200: Normalization operation unit

Claims

An improved binarization method for a first layer of a convolutional neural network, performed by an improved binarization apparatus for the first layer of the convolutional neural network, the method comprising:
(a) binarizing initial input data for the first layer to generate binarized input data, and binarizing parameters of the first layer to generate binarization parameters;
(b) scaling the binarization parameters to generate scaled parameters, performing a convolution operation between the binarized input data and the scaled parameters, and then performing a scaling multiplication with the scaled parameters on the convolution operation to generate a first operation result; and
(c) adding a shortcut to the first operation result to compute a second operation result,
wherein the shortcut corresponds to the initial input data before binarization.

The method of claim 1,
wherein the addition of the shortcut in step (c) causes the initial input data before binarization in step (a) to be taken into account.

The method of claim 1,
wherein in step (b) the first operation result is batch-normalized, and
in step (c) the shortcut is batch-normalized.

The method of claim 3,
wherein the second operation result is computed by Equation 1 below,
[Equation 1]

where e_{1,m} is the coefficient of the m-th filter, (W_1)^b_{m,x,y,z} is the binarization parameter of the first layer, and Ā_0 is the initial input data.

The method of claim 1, further comprising:
(d) performing pooling and batch normalization on the second operation result.

An improved binarization apparatus for a first layer of a convolutional neural network, the apparatus comprising:
a binarization parameter generator configured to binarize initial input data for the first layer to generate binarized input data, and to binarize parameters of the first layer to generate binarization parameters;
a convolution operation unit configured to scale the binarization parameters to generate scaled parameters, perform a convolution operation between the binarized input data and the scaled parameters, and then perform a scaling multiplication with the scaled parameters on the convolution operation to generate a first operation result; and
a shortcut operation unit configured to add a shortcut to the first operation result to compute a second operation result,
wherein the shortcut corresponds to the initial input data before binarization.

The apparatus of claim 6,
wherein the addition of the shortcut causes the initial input data before binarization to be taken into account.

The apparatus of claim 6,
wherein the convolution operation unit batch-normalizes the first operation result, and
the shortcut operation unit batch-normalizes the shortcut.

The apparatus of claim 8,
wherein the shortcut operation unit computes the second operation result by Equation 2 below,
[Equation 2]

where e_{1,m} is the coefficient of the m-th filter of the first layer, (W_1)^b_{m,x,y,z} is the binarization parameter of the first layer, and Ā_0 is the initial input data.

The apparatus of claim 6, further comprising:
a normalization operation unit configured to perform pooling and batch normalization on the second operation result.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 5.