KR102271254B1

KR102271254B1 - Device and method for enhancing deduction speed of binarized convolutional neural network

Info

Publication number: KR102271254B1
Application number: KR1020190024092A
Authority: KR
Inventors: 김태환; 신지훈
Original assignee: 한국항공대학교산학협력단
Priority date: 2019-02-28
Filing date: 2019-02-28
Publication date: 2021-06-29
Also published as: KR20200105220A

Abstract

An apparatus and method for improving the inference speed of a binarized convolutional neural network are disclosed. A method for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application may include: (a) binarizing input data for a convolutional layer to generate binarized input data, binarizing parameters of the convolutional layer to generate binarized parameters, and performing a binarized convolution operation between the binarized input data and the binarized parameters; (b) performing scaling on the binarized parameters to generate a scaling parameter, performing a scaling operation by multiplying the result of the binarized convolution operation by the scaling parameter, and performing batch normalization on the result of the scaling operation; (c) binarizing the batch-normalized result to calculate a binarized operation result; and (d) performing inference through max pooling that takes the binarized operation result as its input elements.

Description

DEVICE AND METHOD FOR ENHANCED DEDUCTION SPEED OF BINARIZED CONVOLUTIONAL NEURAL NETWORK

The present application relates to an apparatus and method for improving the inference speed of a binarized convolutional neural network.

A binarized convolutional neural network (BCNN) is a type of convolutional neural network in which each element of the weights and features is represented by a single bit. A BCNN can reduce the computation and memory requirements of a CNN.

Recently, BCNNs that narrow the accuracy gap with conventional CNNs while retaining the efficiency gained from binarization have been studied. For example, BinaryNet showed that acceptable accuracy can be obtained when classifying small data sets even with binarized weights and features, and other work improved classification accuracy by binarizing the weights and features with multiple binary bases.

However, research on techniques for improving inference speed with binarized convolutional neural networks remains insufficient, and neural network models that can be applied to various networks, including BCNNs, to improve inference speed are not yet well developed.

The background art of the present application is disclosed in Korean Patent Application Publication No. 10-2018-0138441.

The present application is intended to solve the problems of the prior art described above, and an object thereof is to provide an apparatus and method for improving the inference speed of a binarized convolutional neural network that can reduce inference time while maintaining the image analysis accuracy of an existing binarized convolutional neural network.

A further object of the present application is to provide an apparatus and method for improving the inference speed of a binarized convolutional neural network that improve the inference speed by changing the process of the binarized convolutional neural network.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a method for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application may include: (a) binarizing input data for a convolutional layer to generate binarized input data, binarizing parameters of the convolutional layer to generate binarized parameters, and performing a binarized convolution operation between the binarized input data and the binarized parameters; (b) performing scaling on the binarized parameters to generate a scaling parameter, performing a scaling operation by multiplying the result of the binarized convolution operation by the scaling parameter, and performing batch normalization on the result of the scaling operation; (c) binarizing the batch-normalized result to calculate a binarized operation result; and (d) performing inference through max pooling that takes the binarized operation result as its input elements.

According to an embodiment of the present application, the method may further include, before step (a), training the binarized convolutional neural network used for the inference, wherein the training may perform the batch normalization before the max pooling and may binarize the batch-normalized result after the max pooling.

According to an embodiment of the present application, each input element corresponding to the binarized operation result has a value of -1 or +1, and in step (d), when the elements within a pooling window of the binarized operation result include +1, the output value of that pooling window may be determined to be +1.

According to an embodiment of the present application, in step (d), if a +1 value is found while the input elements within the pooling window are examined sequentially for max pooling, the computation for the remaining input elements may be skipped and the max pooling output for that pooling window may be determined to be +1.
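The early termination described above can be sketched as follows. This is only an illustrative sketch, assuming the pooling window has been flattened into a Python list of -1/+1 values; the function name `binary_max_pool_window` and the sample window are hypothetical and do not come from the source.

```python
def binary_max_pool_window(window):
    """Max of a pooling window whose elements are all -1 or +1.

    Scans the elements in order and stops at the first +1, because no
    larger value can appear. Returns (output, elements_inspected).
    """
    inspected = 0
    for value in window:
        inspected += 1
        if value == 1:
            return 1, inspected  # early termination: skip the rest
    return -1, inspected  # no +1 found, so every element was -1


# Hypothetical flattened 3 x 3 window (K = 3, K*K = 9 elements).
window = [-1, -1, 1, -1, 1, -1, -1, -1, -1]
output, inspected = binary_max_pool_window(window)
```

In this sketch only the first three of the nine elements are inspected; a window containing no +1 still requires all K² inspections, so the saving depends on the data.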

According to an embodiment of the present application, the pooling window may be a K × K window having K² input elements.

According to an embodiment of the present application, the binarized input data may be calculated by Equation 1 below.

[Equation 1]

Here, sign(·) is an operation that returns the sign of its input, and A^b_{x,y,z} denotes the component at position (x, y, z) of A^b, the input to the binarized convolution.

According to an embodiment of the present application, the binarized convolution operation may be calculated by Equation 2 below.

[Equation 2]

Here, W^b_{m,x,y,z} denotes the component at position (x, y, z) of the m-th filter W^b_m.

According to an embodiment of the present application, the scaling factor may be calculated by Equation 3 below.

[Equation 3]

According to an embodiment of the present application, the batch normalization may be computed by Equation 4 below.

[Equation 4]

Here, X_m denotes the input to the batch normalization, and the remaining symbols of Equation 4 denote the mean, variance, scaling weight, and bias weight of the m-th channel, respectively.

An apparatus for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application may include: a convolution unit that binarizes input data for a convolutional layer to generate binarized input data, binarizes parameters of the convolutional layer to generate binarized parameters, and performs a binarized convolution operation between the binarized input data and the binarized parameters; a normalization unit that performs scaling on the binarized parameters to generate a scaling parameter, performs a scaling operation by multiplying the result of the binarized convolution operation by the scaling parameter, and performs batch normalization on the result of the scaling operation; a binarization unit that binarizes the batch-normalized result to calculate a binarized operation result; and an inference unit that performs inference through max pooling that takes the binarized operation result as its input elements.

According to an embodiment of the present application, the apparatus may further include a training unit that trains the binarized convolutional neural network used for the inference, wherein the training unit may perform the batch normalization before the max pooling and may binarize the batch-normalized result after the max pooling.

According to an embodiment of the present application, each input element corresponding to the binarized operation result has a value of -1 or +1, and when the elements within a pooling window of the binarized operation result include +1, the inference unit may determine the output value of that pooling window to be +1.

According to an embodiment of the present application, if a +1 value is found while the input elements within the pooling window are examined sequentially for max pooling, the inference unit may skip the computation for the remaining input elements and determine the max pooling output for that pooling window to be +1.

The means for solving the problem described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description.

According to the means for solving the problem described above, it is possible to provide an apparatus and method for improving the inference speed of a binarized convolutional neural network that can reduce inference time while maintaining the image analysis accuracy of an existing binarized convolutional neural network.

It is also possible to provide an apparatus and method for improving the inference speed of a binarized convolutional neural network that improve the inference speed by changing the process of the binarized convolutional neural network.

FIG. 1 is a diagram illustrating a conventional binarized convolution process.
FIG. 2 is a diagram illustrating an example of max pooling.
FIG. 3 is a diagram illustrating the configuration of an apparatus for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application.
FIG. 4 is a diagram illustrating a process for training a binarized convolutional neural network according to an embodiment of the present application.
FIG. 5 is a diagram illustrating a process for inference with a binarized convolutional neural network according to an embodiment of the present application.
FIG. 6 is a diagram comparing the analysis performance obtained with the apparatus for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application.
FIG. 7 is a flowchart illustrating a method for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them.

Throughout this specification, when a member is said to be positioned "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating a conventional binarized convolution process.

Inference with a conventional binarized convolutional neural network (BCNN) (hereinafter, conventional BCNN) involves a large number of multiply-and-accumulate (MAC) operations. In a conventional BCNN, however, each MAC operation can be realized equivalently by a simple XNOR-bitcount operation, so fast inference is possible even on an ordinary CPU-based system. For this reason, conventional BCNNs have been studied as a way to implement artificial intelligence on computation-limited devices that lack a GPU or dedicated hardware.
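The equivalence between a MAC over -1/+1 values and an XNOR-bitcount can be illustrated with a small sketch. It assumes each vector is packed into a Python integer with +1 encoded as bit 1 and -1 as bit 0; the helper names `pack_bits` and `xnor_dot` and the sample vectors are hypothetical.

```python
def pack_bits(values):
    """Pack a list of -1/+1 values into an int, encoding +1 as bit 1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors from their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs agree
    matches = bin(agree).count("1")     # bitcount of agreeing positions
    return 2 * matches - n              # agreements minus disagreements


a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
mac = sum(x * y for x, y in zip(a, b))                # ordinary MAC
fast = xnor_dot(pack_bits(a), pack_bits(b), len(a))   # XNOR-bitcount
```

Both `mac` and `fast` evaluate to the same value, because each agreeing position contributes +1 to the dot product and each disagreeing position contributes -1.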

A conventional BCNN, studied in order to narrow the accuracy gap with conventional convolutional neural networks while retaining the efficiency of binarization, may be arranged, as shown in FIG. 1, in the order of binarized convolution (S11), max pooling (S12), scaling and batch normalization (S13), and binarization (S14). When each process is represented as a block, the BCNN can be expressed as a set of blocks connected in series. The apparatus for improving the inference speed of a binarized convolutional neural network described below changes this block structure to improve inference speed while maintaining the image analysis accuracy of the BCNN. Looking first at the conventional BCNN, in the binarized convolution (S11) the input data is binarized through the binarization of the previous layer, and the parameters of the convolutional layer may also be binarized. The convolution between the binarized input and the binarized parameters can then be performed with bitwise XNOR and accumulation.

FIG. 2 is a diagram illustrating an example of max pooling.

Max pooling (S12) may be performed after the convolution operation. Max pooling reduces the size of the input data by selecting the maximum of the values within a pooling window (10) formed by a K × K local region. Accordingly, K² input values need to be computed for each output. For example, FIG. 2 shows the max pooling result for a stride of 2 and a 3 × 3 region. Scaling and batch normalization (S13) may be performed after max pooling. Because scaling and batch normalization can each be implemented efficiently as an affine transformation, they can be merged into a single block. The output of the layer may then be calculated by binarizing (S14) the batch-normalized result.
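As a rough illustration of max pooling with a 3 × 3 window and a stride of 2, the following sketch pools a small 2-D feature map held as nested Python lists. The function name `max_pool2d`, the absence of padding, and the 5 × 5 sample values are assumptions made for this example; they are not the values shown in FIG. 2.

```python
def max_pool2d(feature, k, stride):
    """Max pooling over a 2-D list with a k x k window and the given stride.

    No padding is applied (an assumption of this sketch).
    """
    rows, cols = len(feature), len(feature[0])
    out = []
    for top in range(0, rows - k + 1, stride):
        out_row = []
        for left in range(0, cols - k + 1, stride):
            window = [feature[top + i][left + j]
                      for i in range(k) for j in range(k)]
            out_row.append(max(window))  # inspects all k*k values
        out.append(out_row)
    return out


# Hypothetical 5 x 5 feature map.
feature = [
    [1, 3, 2, 0, 1],
    [4, 6, 5, 1, 2],
    [7, 2, 8, 3, 0],
    [1, 0, 2, 9, 4],
    [3, 5, 1, 2, 6],
]
pooled = max_pool2d(feature, k=3, stride=2)  # 2 x 2 output
```

With real-valued inputs such as these, every output requires a pass over all nine window elements, which is the cost the text attributes to the conventional arrangement.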

In a conventional BCNN, all K² input values must be computed to obtain a single component of the max pooling output, which is inefficient. Accordingly, the apparatus for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application changes the structure of the BCNN so that the input elements of max pooling are binary, which allows the max pooling operation to terminate early.

FIG. 3 is a diagram illustrating the configuration of an apparatus for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application, and FIG. 4 is a diagram illustrating a process for training a binarized convolutional neural network according to an embodiment of the present application.

Referring to FIG. 3, the apparatus (100) for improving the inference speed of a binarized convolutional neural network may include a training unit (110), a convolution unit (120), a normalization unit (130), a binarization unit (140), and an inference unit (150). Referring to FIG. 4, the convolution unit (120) may binarize the input data for a convolutional layer to generate binarized input data. The binarized input data may be calculated by Equation 1.

[Equation 1]

Here, sign(·) is an operation that returns the sign of its input, and A^b_{x,y,z} denotes the component at position (x, y, z) of A^b, the input to the binarized convolution. Binarizing the input data may degrade analysis performance because of the information lost in binarization, but because the convolution is replaced by a binarized convolution, the memory required for parameter storage and the computational complexity can be reduced.
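A minimal sketch of sign-based binarization is shown below. The source does not say how an input of exactly zero is handled, so mapping zero to +1 is an assumption of this example, as are the function name `binarize` and the sample values.

```python
def binarize(values):
    """Element-wise sign binarization to {-1, +1}.

    Assumption: zero maps to +1; the source does not state this case.
    """
    return [1 if v >= 0 else -1 for v in values]


activations = [0.7, -1.2, 0.0, 3.4, -0.1]  # hypothetical real-valued inputs
binary = binarize(activations)
```

Each output element needs only one bit of storage, which is where the memory reduction mentioned above comes from.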

The convolution unit (120) may also binarize the parameters of the convolutional layer to generate binarized parameters. The convolution unit (120) may generate the binarized parameters by Equation 2.

[Equation 2]

Here, (W₁)^b_{m,x,y,z} denotes a binarized parameter of the first layer, and (W₁)_{m,x,y,z} denotes the corresponding parameter of the first layer before binarization.

The convolution unit (120) may also perform a binarized convolution operation between the binarized input data and the binarized parameters (for example, binarized parameters that reflect the weights) (S41). The binarized convolution operation may be calculated by Equation 3.

[Equation 3]

Here, W^b_{m,x,y,z} denotes the component at position (x, y, z) of the m-th filter W^b_m. The binarized parameters may refer to parameters that reflect the weights of the corresponding convolutional layer.

The normalization unit (130) may perform scaling on the binarized parameters to generate a scaling parameter. The scaling operation may be performed to limit the loss of analysis performance caused by binarizing the weights. In other words, scaling introduces a scaling factor e so that the binarized parameters approximate the parameters before binarization. The scaling factor may be calculated by Equation 4. For each channel, the scaling factor may be defined as the mean of the absolute values of that channel's parameters, and as many scaling factors may be generated as there are channels in the layer.

[Equation 4]

Here, e_m denotes the scaling factor.
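The per-channel scaling factor described above (the mean of the absolute values of a channel's parameters) can be sketched as follows. Each filter is represented here as a flat list of weights; the function name `scaling_factors`, the sample weights, and the zero-maps-to-+1 sign convention are assumptions of this example.

```python
def scaling_factors(filters):
    """Per-channel scaling factor: mean absolute value of each filter's weights."""
    return [sum(abs(w) for w in f) / len(f) for f in filters]


# Two hypothetical filters, each flattened to a list of real-valued weights.
filters = [
    [0.5, -0.25, 0.75, -0.5],   # filter m = 0
    [-1.0, 1.0, -3.0, 3.0],     # filter m = 1
]
e = scaling_factors(filters)

# Binarized weights multiplied by e_m approximate the original weights.
approx = [[(1 if w >= 0 else -1) * e_m for w in f]
          for f, e_m in zip(filters, e)]
```

One factor is produced per filter, matching the statement that as many scaling factors are generated as there are channels in the layer.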

The normalization unit (130) may also perform the scaling operation by multiplying the result of the binarized convolution operation by the scaling parameter. By performing the scaling operation as in Equation 5, the normalization unit (130) can bring the binarized parameter values close to the parameter values before binarization.

[Equation 5]

That is, scaling multiplies each component of the binarized convolution result O by e and requires w·h·M floating-point multiplications; the batch normalization and max pooling that follow each consume the same number of floating-point operations.

The normalization unit (130) may perform the scaling operation by multiplying the result of the binarized convolution operation by the scaling parameter, and may perform batch normalization on the result of the scaling operation (S42). Batch normalization may be computed by Equation 6.

[Equation 6]

Here, X_m denotes the input to the batch normalization, and the remaining symbols of Equation 6 denote the mean, variance, scaling weight, and bias weight of the m-th channel, respectively. Each of the mean, variance, scaling weight, and bias weight may have as many values as there are filters, that is, as many as there are input channels. Pooling typically selects the average or the maximum of the values within a 2×2 or 3×3 local region, reducing the input size to (w/2)×(h/2)×M.
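Because scaling and batch normalization are both affine in their input, they can be folded into a single multiply-add per element. The sketch below assumes the commonly used batch normalization form, gamma * (x - mean) / sqrt(var + eps) + beta; the exact form of Equation 6 is not recoverable from this text, so this form, the `eps` term, and all names and sample values are assumptions.

```python
import math


def scale_then_batchnorm(o, e_m, mean, var, gamma, beta, eps=1e-5):
    """Reference path: scale the convolution output, then batch-normalize."""
    x = e_m * o
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def fused_affine(e_m, mean, var, gamma, beta, eps=1e-5):
    """Fold scaling and batch normalization into y = a * o + b."""
    inv_std = 1.0 / math.sqrt(var + eps)
    a = gamma * e_m * inv_std
    b = beta - gamma * mean * inv_std
    return a, b


# Hypothetical per-channel values.
e_m, mean, var, gamma, beta = 0.5, 1.0, 4.0, 2.0, 0.1
a, b = fused_affine(e_m, mean, var, gamma, beta)
y_fused = a * 6 + b
y_ref = scale_then_batchnorm(6, e_m, mean, var, gamma, beta)
```

The coefficients `a` and `b` depend only on trained per-channel values, so under these assumptions they could be computed once before inference.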

A training process may precede inference with the apparatus (100) for improving the inference speed of a binarized convolutional neural network. Training refers to the process of computing optimal values while updating the parameter values (parameters that reflect the weights) of the binarized convolutional neural network using input data for training (training data), and may be called the 'training stage' performed by the training unit (110). Inference with the apparatus (100), on the other hand, may be called the 'inference stage', which includes performing the convolution, scaling and batch normalization, binarization, and max pooling operations of the binarized convolutional neural network using the trained parameters. That is, both the training unit (110) and the inference unit (150) perform the convolution, scaling and batch normalization, binarization, and max pooling operations, but the training unit (110) performs the training process of finding the optimal parameter values, whereas the inference unit (150) performs the inference process that actually uses the parameters trained by the training unit (110).

Inference with the apparatus (100) is obtained through binarization and max pooling after scaling and batch normalization (S42), whereas in training the binarized convolutional neural network, max pooling (S43) may precede binarization (S44), as shown in FIG. 4. Specifically, unlike the conventional BCNN, in which scaling and batch normalization (S13) are performed after max pooling (S12), the training unit (110) may perform batch normalization (S42) before max pooling (S43) and may binarize (S44) the batch-normalized result after max pooling (S43). This can be regarded as applying the monotonically increasing binarization function to each element of the pooling window in a distributed manner, so the change in process order does not affect the binarized convolutional neural network. Specifically, because the scaling weight of batch normalization is kept positive throughout training, the batch normalization operation is a monotonically increasing function. In addition, the scaling factor e_m is always positive by definition, and the binarization function is also monotonically increasing, so the change in process order may not affect the final output value.

FIG. 5 is a diagram illustrating a process for inference of a binarized convolutional neural network according to an embodiment of the present application.

In the learning of the binarized convolutional neural network, the process of performing binarization (S45) before max pooling (S46), which is used in the inference process described with reference to FIG. 5, cannot be applied. Specifically, when learning with the structure 'convolution - batch normalization - binarization - max pooling', two or more maximum elements may exist in a pooling window (for example, two +1 values within a k x k window), so the selection of the maximum element for gradient backpropagation becomes ambiguous. That is, if binarization is performed immediately before max pooling during learning, the max pooling input takes only the values -1 or +1, so it is highly likely that two or more maximum elements exist within the pooling window. The single maximum element needed to backpropagate the gradient is then chosen ambiguously, and an inappropriate gradient may be selected. Therefore, the learning process may follow the order shown in FIG. 4, in which binarization (S44) is performed after max pooling (S43), whereas the inference process may follow the order shown in FIG. 5, in which max pooling (S46) is performed after binarization (S45). In the inference process, unlike the learning process, the elements in the pooling window are already binarized, so max pooling can be performed quickly and, as a result, the inference result can be obtained quickly.
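The tie problem during learning can be illustrated with a small example. This is only a sketch: the window values and the helper name `max_positions` are hypothetical.

```python
def max_positions(window):
    # Indices of every element equal to the window maximum.
    top = max(window)
    return [i for i, v in enumerate(window) if v == top]

# Hypothetical batch-normalized values in one pooling window.
real_valued = [0.31, -1.20, 0.87, -0.05]
# The same window after binarization (sign(0) = +1 is an assumed convention).
binarized = [1 if v >= 0 else -1 for v in real_valued]

unique_max = max_positions(real_valued)  # a single gradient target
tied_max = max_positions(binarized)      # several equally valid targets
```

With real-valued inputs the maximum is unique, whereas after binarization every +1 position is a maximum, so the element that should receive the gradient is no longer well defined.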

Specifically, the binarization operation unit 140 may binarize the batch-normalized result to calculate a binarization result. The inference unit 150 may perform inference through max pooling that takes the binarization result as its input elements. That is, each input element of the max pooling has a value of -1 or +1. The pooling window of the max pooling performed by the inference unit 150 is a K x K window with K² input elements. Because the K² input elements are binarized values, the inference unit 150 may determine the output of a pooling window to be +1 whenever the window contains a +1. Specifically, if a +1 value is found during the sequential max pooling calculation over the input elements of a pooling window, the inference unit 150 may omit the calculation for the remaining input elements and determine the output of the max pooling for that window to be +1. That is, unlike the conventional BCNN, in which all K² elements in the pooling window are calculated, the max pooling performed by the inference unit 150 can omit the redundant calculations needed to produce the remaining elements once a +1 is found, so max pooling terminates early and inference becomes faster. Moreover, because max pooling is performed on binarized input elements, fast inference remains possible even as the K x K pooling window grows.
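The early-terminating max pooling described above can be sketched as follows. The function name and the returned element count are illustrative additions, not part of the disclosure.

```python
def max_pool_binary(window):
    """Max-pool a window of -1/+1 values, stopping at the first +1.

    Returns (output, examined), where examined is the number of
    elements inspected before the output was decided.
    """
    examined = 0
    for value in window:
        examined += 1
        if value == 1:
            # +1 is the largest possible value, so the rest can be skipped.
            return 1, examined
    return -1, examined
```

For a 3 x 3 window whose second element is +1, only two elements are examined; a window containing only -1 values requires all nine.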

The ratio of -1 to +1 values within the pooling window affects the inference speedup. In the inference process according to an embodiment of the present application, the algorithm is constructed so that the multiplication and batch normalization (S42) and binarization (S45) operations are included within the max pooling operation, so the higher the proportion of -1 values in the pooling window, the longer the inference time may be. Specifically, in the inference process with the order 'convolution - batch normalization - binarization - max pooling', the number of batch normalization (including multiplication) and binarization operations may increase because they are evaluated inside the max pooling calculation. If the convolution time saved by this process is smaller than the additional time spent on batch normalization and binarization, the inference time may actually increase. For example, when the input size is 8 x 8, the pooling window size is 3 x 3, and the stride is 2, the output size is 4 x 4, and computing each output element requires (3 x 3 - skip) multiplication, batch normalization, and binarization operations, where skip denotes the number of operations omitted for the remaining input elements once a +1 value is found during the sequential max pooling calculation. Computing the entire 4 x 4 output therefore requires (4 x 4 x 3 x 3 - total skip) multiplication, batch normalization, and binarization operations. In contrast, in the conventional BCNN, which learns and infers in the order convolution, max pooling, multiplication, batch normalization, and binarization, max pooling is performed first, so multiplication, batch normalization, and binarization are performed only 4 x 4 times. However, even though the time spent on multiplication, batch normalization, and binarization increases relative to the conventional BCNN, the convolution operation, which takes far longer than those operations, is reduced by the total skip count, so the overall inference time can be significantly shorter than before. The effect of this process change on inference speed is verified through the experiments described below.
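The operation counts in the 8 x 8 example can be reproduced with a short calculation. This sketch takes the 4 x 4 output size and 3 x 3 window from the text and supplies hypothetical window contents directly; all function names are illustrative.

```python
def evaluations_for_window(window):
    # Elements are visited in order; the scan stops at the first +1.
    for count, value in enumerate(window, start=1):
        if value == 1:
            return count
    return len(window)

def proposed_evaluations(windows):
    # Multiplication, batch normalization, and binarization are evaluated
    # inside the pooling loop: (outputs x K x K - total skip) in total.
    return sum(evaluations_for_window(w) for w in windows)

def conventional_evaluations(windows):
    # Max pooling runs first, so there is one evaluation per output element.
    return len(windows)

K = 3
OUTPUTS = 4 * 4
# Worst case: every window holds only -1, so nothing is skipped.
worst_case = [[-1] * (K * K) for _ in range(OUTPUTS)]
# Best case: the first element of every window is +1.
best_case = [[1] + [-1] * (K * K - 1) for _ in range(OUTPUTS)]
```

In the worst case the proposed order needs 4 x 4 x 3 x 3 evaluations, while in the best case it needs as few as the conventional order. The same skip count also applies to the convolution outputs that feed those evaluations, which is where the time saving comes from.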

Meanwhile, in the learning process of the binarized convolutional neural network, max pooling (S43) and binarization (S44) are performed after multiplication and batch normalization (S42), as shown in FIG. 4, in order to improve the learning speed. Because multiplication and batch normalization (S42) occur at a different position than in the conventional BCNN, their results may take values different from the weights obtained in the learning process of the conventional BCNN. Because batch normalization evens out the distribution of its outputs, the proportion of -1 values within a pooling window is unlikely to be relatively high.

The effectiveness of the apparatus 100 is verified through experiments comparing inference time and classification accuracy. Two color image datasets are used as classification targets: ImageNet as a large-scale dataset and CIFAR-10 as a small-scale dataset. CIFAR-10 consists of small 32 x 32 color images in 10 object classes, with 50,000 training images and 10,000 test images. ImageNet consists of large color images, 469 x 387 on average, in 1,000 object classes, with 1.2 million training images and 50,000 test images. The conventional BCNN models used are Binarized ConvNet (for CIFAR-10) and Binarized AlexNet (for ImageNet). Binarized ConvNet is trained for 250 epochs with a batch size of 512 using an SGD optimizer with an initial learning rate of 0.1 and a momentum of 0.9. Binarized AlexNet uses the same optimizer and batch size settings and is trained for 40 epochs. No pre-processing or data augmentation technique that could affect classification accuracy is used.

FIG. 6 is a diagram comparing the performance of the apparatus for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application.

The classification experiments comparing the conventional BCNN with the apparatus 100 were implemented in C, and inference time was measured in a runtime environment with an 800 MHz ARM Cortex-A9 processor and 1 GB of SDRAM. FIG. 6(a) shows the average inference time in milliseconds, and FIG. 6(b) shows the top-1 classification accuracy. Referring to FIG. 6, compared with the conventional BCNN, which does not omit redundant operations within the pooling window during max pooling, the apparatus 100 reduces inference time by 27.5% for Binarized ConvNet and by 42.3% for Binarized AlexNet in image classification, confirming that the apparatus 100 is faster than the conventional BCNN. The apparatus 100 can thus effectively reduce inference time while maintaining the image classification accuracy of the conventional BCNN. Furthermore, when the upper bound of the max pooling input is set in advance, the approach is not limited to BCNNs and may be applied to various neural networks to improve inference speed.

FIG. 7 is a flowchart illustrating a method for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application.

The method for improving the inference speed of a binarized convolutional neural network shown in FIG. 7 may be performed by the apparatus described above with reference to FIGS. 3 to 6. Therefore, even where omitted below, the description of the apparatus given with reference to FIGS. 3 to 6 applies equally to FIG. 7.

Referring to FIG. 7, in step S710, the convolution operation unit 120 may binarize the input data of a convolutional layer to generate binarized input data. Binarizing the input data may degrade accuracy because of the information lost, but because the convolution is replaced with a binarized convolution, the memory required to store parameters and the computational complexity can be reduced. The convolution operation unit 120 may also binarize the parameters of the convolutional layer to generate binarized parameters, and may perform a binarized convolution operation between the binarized input data and the binarized parameters.
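Step S710 can be sketched for a single filter position as follows. The function names and sample values are hypothetical, and the match-counting form is a common way to evaluate products of -1/+1 vectors rather than something this section states.

```python
def sign(v):
    # Binarization to -1/+1; mapping 0 to +1 is an assumed convention.
    return 1 if v >= 0 else -1

def binarize(values):
    return [sign(v) for v in values]

def binary_dot(a_b, w_b):
    # Binarized convolution at one position: dot product of -1/+1 vectors.
    return sum(a * w for a, w in zip(a_b, w_b))

def binary_dot_by_matches(a_b, w_b):
    # Equivalent form: (matches) - (mismatches) = 2 * matches - n.
    matches = sum(1 for a, w in zip(a_b, w_b) if a == w)
    return 2 * matches - len(a_b)

# Hypothetical real-valued inputs and weights covered by one filter position.
a_b = binarize([0.5, -1.2, 3.0, -0.1])   # [1, -1, 1, -1]
w_b = binarize([0.3, -0.8, 0.9, 0.4])    # [1, -1, 1, 1]
```

Because both operands take only two values, each can be stored in one bit, which is the source of the reduced memory requirement and computational complexity mentioned above.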

In step S720, the normalization unit 130 may perform multiplication on the binarized parameters to generate a multiplication parameter. The multiplication operation is performed to limit the loss of accuracy caused by binarizing the weights. In other words, it introduces a multiplication coefficient e so that the binarized parameters approximate the parameters before binarization. The normalization unit 130 may then multiply the result of the binarized convolution operation by the multiplication parameter and perform batch normalization on the result of the multiplication operation.
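The role of the multiplication coefficient can be illustrated with a small sketch. The coefficient used here, the mean absolute value of the real-valued weights, is an assumption borrowed from common weight-binarization practice and is not confirmed by this text; the function names, sample weights, and `eps` default are likewise illustrative.

```python
def multiplication_coefficient(weights):
    # ASSUMPTION: mean absolute value of the real-valued filter weights.
    return sum(abs(w) for w in weights) / len(weights)

def approximation_error(weights, e):
    # Squared error between the original weights W and e * sign(W).
    return sum((w - e * (1 if w >= 0 else -1)) ** 2 for w in weights)

def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    # Standard batch normalization for one channel.
    return gamma * (x - mean) / (var + eps) ** 0.5 + beta

weights = [0.5, -1.5, 1.0, -1.0]          # hypothetical filter weights
e = multiplication_coefficient(weights)   # 1.0 for these weights
```

For these weights, e gives a smaller approximation error than nearby values such as 0.9 or 1.1, which shows how a single positive coefficient per filter can bring the binarized weights closer to the originals before batch normalization is applied.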

In step S730, the binarization operation unit 140 may binarize the batch-normalized result to calculate a binarization result.

In step S740, the inference unit 150 may perform inference through max pooling that takes the binarization result as its input elements. That is, each input element of the max pooling has a value of -1 or +1. The pooling window of the max pooling performed by the inference unit 150 is a K x K window with K² input elements. Because the K² input elements are binarized values, the inference unit 150 may determine the output of a pooling window to be +1 whenever the window contains a +1. Specifically, if a +1 value is found during the sequential max pooling calculation over the input elements of a pooling window, the inference unit 150 may omit the calculation for the remaining input elements and determine the output of the max pooling for that window to be +1.
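Steps S710 to S740 can be chained for a single pooling window so that convolution outputs are computed only until the first +1 appears. The sketch below is one possible arrangement under stated assumptions: `conv_at` is a hypothetical callable returning the binarized convolution output at a position, and all parameter names and values are illustrative.

```python
def infer_window(positions, conv_at, e_m, mean, var, gamma, beta, eps=1e-5):
    """Return (pooled_value, convolutions_computed) for one pooling window."""
    computed = 0
    for pos in positions:
        x = conv_at(pos)  # S710: binarized convolution output (on demand)
        computed += 1
        # S720: multiplication by e_m, then batch normalization.
        y = gamma * (e_m * x - mean) / (var + eps) ** 0.5 + beta
        # S730: binarize; sign(0) = +1 is an assumed convention.
        if y >= 0:
            # S740: +1 found, so the remaining positions are skipped.
            return 1, computed
    return -1, computed

# Hypothetical convolution outputs for a three-element window.
outputs = {0: -3, 1: 5, 2: 7}
unit_params = dict(e_m=1.0, mean=0.0, var=1.0, gamma=1.0, beta=0.0)
result = infer_window([0, 1, 2], outputs.get, **unit_params)
```

Here the third convolution output is never requested, which corresponds to the omitted calculation for the remaining input elements.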

According to an embodiment of the present application, before step S710, the learning unit 110 may perform learning of the binarized convolutional neural network used for the inference. The learning unit 110 may perform batch normalization before max pooling (S43) and binarize the batch-normalized result after max pooling.

The method for improving the inference speed of a binarized convolutional neural network according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is provided for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can be readily modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: pooling window
100: apparatus for improving the inference speed of a binarized convolutional neural network
110: learning unit
120: convolution operation unit
130: normalization unit
140: binarization operation unit
150: inference unit

Claims

A method for improving the inference speed of a binarized convolutional neural network, performed by an apparatus for improving the inference speed of a binarized convolutional neural network, the method comprising:
(a) binarizing input data of a convolutional layer to generate binarized input data, binarizing parameters of the convolutional layer to generate binarized parameters, and performing a binarized convolution operation between the binarized input data and the binarized parameters;
(b) performing multiplication on the binarized parameters to generate a multiplication parameter, multiplying a result of the binarized convolution operation by the multiplication parameter to perform a multiplication operation, and performing batch normalization on a result of the multiplication operation;
(c) binarizing the batch-normalized result to calculate a binarization result; and
(d) performing inference through max pooling that takes the binarization result as input elements,
wherein each input element corresponding to the binarization result has a value of -1 or +1,
wherein, in step (d),
when +1 is included among the elements in a pooling window according to the binarization result, the output value of the pooling window is determined to be +1, and
wherein, in step (d), when a +1 value is found during the sequential max pooling calculation over the input elements in the pooling window, the calculation for the remaining input elements is omitted and the output value of the max pooling for the pooling window is determined to be +1.

The method of claim 1,
further comprising, before step (a), performing learning of the binarized convolutional neural network for the inference,
wherein, in the performing of the learning, the batch normalization is performed before the max pooling, and the batch-normalized result is binarized after the max pooling.

(Deleted)

The method of claim 1,
wherein the pooling window is a K x K window having K² input elements.

The method of claim 1,
wherein the binarized input data is calculated by Equation 1 below,
[Equation 1]

where sign(·) is an operation that returns the sign of its input, and A^b_{x,y,z} is the component at position (x, y, z) of A^b, the input of the binarized convolution.

The method of claim 1,
wherein the binarized convolution operation is calculated by Equation 2 below,
[Equation 2]

where W^b_{m,x,y,z} is the component at position (x, y, z) of the m-th filter W^b_m.

The method of claim 1,
wherein the multiplication parameter is calculated by Equation 3 below.
[Equation 3]

The method of claim 1,
wherein the batch normalization is calculated by Equation 4 below,
[Equation 4]

where X_m represents the input of the batch normalization, and the remaining symbols in Equation 4 denote the mean, variance, scaling weight, and bias weight of the m-th channel, respectively.

An apparatus for improving the inference speed of a binarized convolutional neural network, comprising:
a convolution operation unit that binarizes input data of a convolutional layer to generate binarized input data, binarizes parameters of the convolutional layer to generate binarized parameters, and performs a binarized convolution operation between the binarized input data and the binarized parameters;
a normalization unit that performs multiplication on the binarized parameters to generate a multiplication parameter, multiplies a result of the binarized convolution operation by the multiplication parameter to perform a multiplication operation, and performs batch normalization on a result of the multiplication operation;
a binarization operation unit that binarizes the batch-normalized result to calculate a binarization result; and
an inference unit that performs inference through max pooling that takes the binarization result as input elements,
wherein each input element corresponding to the binarization result has a value of -1 or +1, and
wherein the inference unit
determines the output value of a pooling window to be +1 when +1 is included among the elements in the pooling window according to the binarization result, and, when a +1 value is found during the sequential max pooling calculation over the input elements in the pooling window, omits the calculation for the remaining input elements and determines the output value of the max pooling for the pooling window to be +1.

The apparatus of claim 10,
further comprising a learning unit that performs learning of the binarized convolutional neural network for the inference,
wherein the learning unit performs the batch normalization before the max pooling and binarizes the batch-normalized result after the max pooling.

(Deleted)

The apparatus of claim 10,
wherein the pooling window is a K x K window having K² input elements.

A computer-readable recording medium on which a program for executing, on a computer, the method of any one of claims 1, 2, and 5 to 9 is recorded.