KR20210048396A

KR20210048396A - Apparatus and method for generating binary neural network

Info

Publication number: KR20210048396A
Application number: KR1020200110356A
Authority: KR
Inventors: 박준용
Original assignee: 한국전자통신연구원
Priority date: 2019-10-23
Filing date: 2020-08-31
Publication date: 2021-05-03

Abstract

Disclosed are an apparatus and a method for generating a binary neural network. According to one embodiment of the present invention, the method for generating a binary neural network includes: extracting real-valued filter weights from a first neural network for which inference training has been completed; performing a binary orthogonal transform on the filter weights; and generating a second neural network using the binary weights calculated by the binary orthogonal transform. The present invention provides inference performance close to that of a full-precision artificial neural network while maintaining the inference speed of a lightweight neural network.

Description

Binary neural network generation method and apparatus {APPARATUS AND METHOD FOR GENERATING BINARY NEURAL NETWORK}

The present invention relates to a method and apparatus for generating a binary neural network, and more particularly, to a method and apparatus for generating a binary neural network by applying a binary transform to an existing artificial neural network.

Convolutional neural networks (CNNs), which are in the spotlight in the field of artificial intelligence, show overwhelmingly better performance than earlier artificial intelligence techniques and are advancing day by day. However, to achieve higher performance by training on more data, a convolutional neural network must become deeper and wider. As this continues, the model grows in size, and the computation time increases along with the number of operations required for processing.

To compensate for these shortcomings, techniques for reducing the model size of convolutional neural networks have been proposed. For example, the structure of the convolutional neural network itself may be designed to be slim (lightweight design), branches of the network may be deleted before training proceeds (pruning), or the weight values may be quantized to fewer bits (n-bit quantization).

A representative lightweight neural network is the binary neural network. The binary neural network is a groundbreaking approach in that it can greatly increase the speed of an existing artificial neural network and greatly reduce the memory footprint of the model. However, because it represents the existing floating-point weights and activation functions as -1 and 1, it has the disadvantage that information is lost. This loss of information leads to lower accuracy and may degrade performance in recognizing or detecting objects.

Noting that simple binary conversion is the main cause of the information loss and accuracy degradation in binarized neural networks, many binary neural networks have used amplification or compensation coefficients to make up for it and address the information loss. However, because the information loss caused by binarization also affects the gradient during training, training a binary neural network remains very difficult.

An object of the present invention for solving the above problems is to provide a method for generating a binary neural network.

Another object of the present invention for solving the above problems is to provide an apparatus for generating a binary neural network that uses the binary neural network generation method.

A method for generating a binary neural network according to an embodiment of the present invention for achieving the above object may include: extracting real-valued filter weights from a first neural network for which inference training has been completed; performing a binary orthogonal transform on the filter weights; and generating a second neural network using the binary weights calculated by the binary orthogonal transform.

Here, the first neural network may be a convolutional neural network, and the filter weights may include a multiplicative amplification coefficient and a constant amplification coefficient of a convolution filter.

The performing of the binary orthogonal transform on the filter weights may include: generating a binary orthogonal vector; generating one or more binary filters by extracting each column of the binary orthogonal vector; and calculating a binary multiplicative amplification coefficient and a binary constant amplification coefficient using the one or more binary filters.

The binary multiplicative amplification coefficient and the binary constant amplification coefficient may be calculated by an equation expressed using the vector of the real-valued convolution filter included in the first neural network, the vectors of the one or more binary filters, and the magnitude of the convolution filter vector.

The second neural network may include one or more convolutional layers, each including a normalization function, a binary activation function, a binary convolution function, and an activation function.

The binary multiplicative amplification coefficient and the binary constant amplification coefficient may be inserted as weights of a convolution filter in the second neural network.

The binary orthogonal vector may be a Hadamard matrix.

The binary activation function may include a sign function.

An apparatus for generating a binary neural network according to an embodiment of the present invention for achieving the above other object may include: a processor; and a memory storing at least one instruction executed by the processor, wherein the at least one instruction may include: an instruction to extract real-valued filter weights from a first neural network for which inference training has been completed; an instruction to perform a binary orthogonal transform on the filter weights; and an instruction to generate a second neural network using the binary weights calculated by the binary orthogonal transform.

Here, the first neural network may be a convolutional neural network, and the filter weights may include a multiplicative amplification coefficient and a constant amplification coefficient of a convolution filter.

The instruction to perform the binary orthogonal transform on the filter weights may include: an instruction to generate a binary orthogonal vector; an instruction to generate one or more binary filters by extracting each column of the binary orthogonal vector; and an instruction to calculate a binary multiplicative amplification coefficient and a binary constant amplification coefficient using the one or more binary filters.

The binary multiplicative amplification coefficient and the binary constant amplification coefficient may be inserted as weights of a convolution filter in the second neural network.

The binary orthogonal vector may be a Hadamard matrix.

In addition, the binary activation function may include a sign function.

According to the embodiments of the present invention described above, inference performance close to that of a full-precision artificial neural network can be provided while maintaining the inference speed of a lightweight neural network.

That is, performance superior to that of existing binary neural networks can be obtained, and a speed improvement through binary operations can also be expected.

FIG. 1 is a table showing performance characteristics according to the type of artificial neural network.
FIG. 2 is a conceptual diagram of an apparatus for generating a binary neural network according to an embodiment of the present invention.
FIGS. 3A and 3B are structural diagrams of convolutional layers inside an artificial neural network used in an inference model.
FIG. 4 shows several binarization functions and their function curves.
FIG. 5 is a block diagram of a convolutional layer in an artificial neural network according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the detailed concept of a weight binarization method according to an embodiment of the present invention.
FIG. 7 is a diagram showing a process of generating a Hadamard matrix applied to the present invention.
FIG. 8 is a flowchart of a method for generating a binary neural network according to an embodiment of the present invention.
FIG. 9 is a block diagram of an apparatus for generating a binary neural network according to an embodiment of the present invention.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the drawings, similar reference numerals are used for similar elements.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may also be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

The present invention relates to deep learning in the field of artificial intelligence, and in particular to model compression of convolutional neural network models. More specifically, it may be associated with binarization, the 1-bit case of the n-bit quantization techniques used to compress convolutional neural network models.

The present invention proposes a method of converting the 32-bit floating-point (Floating Point 32Bit, hereinafter FP32) weights of an existing convolutional neural network into binary form through direct calculation. With the method according to the present invention, performance superior to that of existing binary neural networks can be obtained, and a speed improvement through binary operations can also be expected. In addition, most of the training process needed to achieve this can be omitted, and the loss of information can be eliminated. The method proposed in the present invention is also a mathematically closed-form solution.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a table showing performance characteristics according to the type of artificial neural network.

Taking a conventional convolutional neural network 10 as the baseline, FIG. 1 shows the operations used in an artificial neural network 20 that uses binary weights and in an artificial neural network 30 that uses binary weights and binary inputs, together with the reduction in memory consumption and the reduction in computation for each network. XNOR-Net is a representative example of a neural network that uses binary weights and binary inputs.

Binary neural networks are networks that quantize the weights and activation functions of an existing convolutional neural network to 1 bit. Quantizing the existing 32-bit floating-point numbers to the 1-bit values {+1, -1} drastically reduces the size of the model. In addition, a convolution over binary values can be carried out with XNOR, which acts as the sign operation, and POPCOUNT, which counts bits, each applied to a whole machine word at once. With this approach, a processor that supports 64-bit words handles 64 bits at a time, so a speed gain of roughly 60 times can be expected.
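The XNOR and POPCOUNT idea described above can be illustrated with a minimal sketch. It assumes a bit encoding of +1 as 1 and -1 as 0, and the helper names `pack_bits` and `binary_dot` are hypothetical; the patent does not prescribe this encoding or these functions.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer (assumed encoding: +1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n using XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask   # a bit is 1 where the two signs agree
    matches = bin(xnor).count("1")     # POPCOUNT: number of agreeing positions
    return 2 * matches - n             # agreements minus disagreements


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))          # ordinary multiply-accumulate
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
```

Here `result` equals `reference`, and the packed version needs only one XOR, one NOT, and one bit count for the whole vector, which is where the word-level speedup comes from.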

Referring to FIG. 1, when a binary neural network runs on computing devices that use 32-bit or 64-bit variables, 64 values of -1 and +1 can each be packed into one bit and processed in a single operation, giving roughly 60 times the computation speed of a standard convolution. This is possible only when the input of the neural network and the weights and filters inside it have all been binarized to 1 bit.

The present invention proposes a binary neural network that uses both an input binarization method and a weight binarization method. As a result, the present invention not only speeds up the artificial neural network but can also greatly improve inference accuracy.

FIG. 2 is a conceptual diagram of an apparatus for generating a binary neural network according to an embodiment of the present invention.

The apparatus for generating a binary neural network according to embodiments of the present invention makes use of a general artificial neural network 100 with ordinary FP32 weights. A conventional full-precision artificial neural network can make precise predictions in many areas, but because the model is so large, it is difficult to use on edge devices and the like.

Accordingly, to binarize such a general artificial neural network, the present invention extracts the floating-point weights from the general artificial neural network 100 and generates a full-precision tensor 11, which is a multidimensional matrix. Any convolutional neural network may be used as the artificial neural network in the present invention, for example AlexNet, ResNet, or NasNet.

Next, the present invention binarizes the multidimensional matrix of floating-point weights to generate a binary tensor 21, which is a binary matrix. The generated binary matrix 21 is provided to the binary neural network 200 according to an embodiment of the present invention. In a mobile edge computing environment, the binary neural network 200 may receive an input such as the image file 201 shown in FIG. 2, perform inference, and output the result of the inference.

The lower part of FIG. 2 shows the operation sequence of a binary neural network generation method that can be performed in the apparatus for generating a binary neural network according to the present invention.

In the method for generating a binary neural network according to an embodiment of the present invention, all weights of the conventional artificial neural network 100 are first extracted (S110). Each of the extracted weights is packed in the form of a tensor and then binarized using an orthogonal matrix (S120). The orthogonally transformed weights (in the form of a binary tensor) are input to the binary neural network 200 according to the present invention (S130). The filters of the neural network are then fixed using the orthogonally transformed binary weights, and the binarization of the neural network may be completed after fine-tuning the binary neural network 200 (S140).

FIGS. 3A and 3B are structural diagrams of convolutional layers inside an artificial neural network used in an inference model.

Artificial neural networks are the most widely used technique in machine learning. When the data to be inferred is fed into an artificial neural network, the network is trained by having its neurons, organized in multiple layers, learn the features of the data. A convolutional neural network is one type of artificial neural network that uses the convolution of input data with filters to analyze the data more easily.

Convolutional neural networks are mainly used in fields that handle large amounts of visual information, and they are widely adopted because their inference accuracy remains high even when they are trained on large amounts of data.

FIG. 3A shows a layer that uses conventional full-precision floating-point convolution. In this case, the convolutional layer 310 may include a convolution function 311, a batch normalization (Batch Norm) function 312, and an activation function 313.

FIG. 3B shows a convolutional layer of an artificial neural network that uses input binarization. The convolutional layer 320 may include batch normalization (Batch Norm) 321 as the normalization function, a binary activation (Bin Active) function 322, a binary convolution (Bin Conv) function 323, and ReLU (Rectified Linear Unit) 324 as the activation function.

The binary activation function binarizes the input. Various binarization schemes may be used, and FIG. 4 shows several binarization functions and their function curves. Referring to FIG. 4, binarization can be understood as a process of simplifying input data to (-1) or (+1), and the hyperbolic tangent function (Tanh(x)), the sign function (Sign(x)), HTanh(x), and the like may be used as the binarization operation. FIG. 4 shows the function plots and the derivative plots of each function together.
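The three functions named above and their derivatives can be sketched with NumPy as follows. This is only an illustration: mapping an input of exactly 0 to +1 is an assumption, HTanh is taken to be the usual hard tanh (clipping to [-1, 1]), and the function names are hypothetical rather than taken from the patent.

```python
import numpy as np


def sign_binarize(x):
    """Sign-based binarization to -1/+1 (assumption: an input of 0 maps to +1)."""
    return np.where(x >= 0, 1.0, -1.0)


def hard_tanh(x):
    """HTanh(x), assumed here to be the usual hard tanh: clip to [-1, 1]."""
    return np.clip(x, -1.0, 1.0)


def hard_tanh_grad(x):
    """Derivative of hard tanh: 1 inside [-1, 1], 0 outside."""
    return (np.abs(x) <= 1.0).astype(x.dtype)


def tanh_grad(x):
    """Derivative of tanh(x), which is 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
signs = sign_binarize(x)        # values restricted to -1 and +1
clipped = hard_tanh(x)          # smooth surrogate with a usable gradient
clipped_grad = hard_tanh_grad(x)
```

The derivative of Sign(x) is zero almost everywhere, which is one reason smoother functions such as Tanh or HTanh appear next to it when gradients are discussed.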

According to a preferred embodiment of the present invention, functions based on the sign function may be used for binarization.

To benefit from the faster computation, the binary convolutional layer 320 must binarize the input as well as the weights. If the binary activation function that binarizes the input is applied first and normalization is applied afterward, information is lost and the input may become floating-point again. Therefore, it is preferable to normalize the input first (321) so that the data is centered on a mean of 0, then binarize the input (322), and then apply the binary convolution (Bin Conv) function 323 and the ReLU activation function 324.
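The ordering described above (normalize, binarize, binary convolution, ReLU) can be sketched as follows. To keep the example short, a matrix product stands in for the binary convolution and the normalization has no learned scale or shift; both simplifications, and all function names, are assumptions made for illustration.

```python
import numpy as np


def normalize(x, eps=1e-5):
    """Per-feature normalization over the batch (no learned scale or shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def bin_active(x):
    """Sign-based binary activation (assumption: 0 maps to +1)."""
    return np.where(x >= 0, 1.0, -1.0)


def relu(x):
    return np.maximum(x, 0.0)


def binary_layer(x, binary_weights):
    """Normalize first, then binarize, then apply the binary weights, then ReLU."""
    h = normalize(x)            # center the data on zero before taking signs
    h = bin_active(h)           # input is now -1/+1
    h = h @ binary_weights      # dense stand-in for the binary convolution
    return relu(h)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # batch of 4 inputs, 8 features
w = bin_active(rng.normal(size=(8, 3)))       # 3 binary filters
out = binary_layer(x, w)
```

Because both operands of the product are -1/+1, every entry of `out` is a non-negative integer, which is what allows the product to be replaced by XNOR and POPCOUNT on packed words.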

A binary neural network such as the one in FIG. 3B has attracted attention for its faster inference and smaller memory footprint, but the drawbacks of binarization appear during training. More specifically, such a binary neural network cannot binarize the gradient values, and because of the larger number of functions and the more complex gradient computation, training may actually be slower than for a conventional convolutional neural network.

Accordingly, the present invention takes an approach that draws on the advantages of both the non-binarized and the binarized artificial neural network.

FIG. 5 is a block diagram of a convolutional layer in an artificial neural network according to an embodiment of the present invention.

The artificial neural network 520 according to a preferred embodiment of the present invention is similar in configuration to the binarized artificial neural network of FIG. 3B, but its convolution function is configured differently. That is, a full-precision convolution function is first trained to produce the most precise results possible, and the trained convolution function is then binarized directly and used (523), which avoids all of the problems that arise when training a binary neural network.

In summary, the convolutional layer 520 according to embodiments of the present invention takes the filter weights obtained by training the artificial neural network 510, which uses a full-precision convolution function, transforms them, and uses them as the weights of the convolution function of the binary neural network 520. Accordingly, the convolutional layer 520 may include batch normalization (Batch Norm), a binary activation (Bin Active) function 322, a binary convolution (Bin Conv) function, and the ReLU (Rectified Linear Unit) activation function 324. Here, the binary convolution function (Bin Conv) holds the filter weights binarized through the weight transformation process, namely the binary multiplicative amplification coefficient and the binary constant amplification coefficient.

FIG. 6 is a diagram illustrating the detailed concept of a weight binarization method according to an embodiment of the present invention.

The method of binarizing the weights of an artificial neural network filter according to an embodiment of the present invention may include generating a binary orthogonal vector from the full-precision tensor 11, which consists of floating-point weights extracted from a well-trained general artificial neural network (S610), extracting the multiplicative amplification coefficient (S620), and extracting the constant amplification coefficient (S630).
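Steps S610 to S630 can be sketched under several stated assumptions: the binary orthogonal vectors are the columns of a Hadamard matrix built with the Sylvester construction, the flattened filter length is a power of two, and the coefficients are obtained by projecting the real filter onto each column and dividing by that column's squared norm. The patent's own equations are not reproduced in this text, so this is an illustration rather than the claimed formula, and the function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(n):
    """Return an n x n matrix of +1/-1 with orthogonal columns; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])   # Sylvester doubling step
    return h


def binarize_filter(w):
    """Project a flattened real filter onto Hadamard columns (assumed procedure)."""
    n = w.size
    h = sylvester_hadamard(n)             # S610: binary orthogonal vectors
    coeffs = h.T @ w / n                  # w . b_k / ||b_k||^2, where ||b_k||^2 = n
    alphas = coeffs[1:]                   # S620: multiplicative coefficients
    beta = coeffs[0]                      # S630: coefficient of the all-ones column
    binary_filters = h[:, 1:]             # remaining columns act as binary filters
    return alphas, beta, binary_filters


w = np.array([0.3, -1.2, 0.7, 0.05])      # toy real-valued filter
alphas, beta, b = binarize_filter(w)
w_rebuilt = b @ alphas + beta * np.ones(w.size)
```

With all columns kept, `w_rebuilt` matches `w` up to floating-point rounding, and `beta` equals the mean of `w` because the first Sylvester column is all ones. Keeping only the columns with the largest coefficients would trade accuracy for fewer binary filters.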

이하에서는 가중치를 이진화하는 방법을 보다 상세히 설명한다. Hereinafter, a method of binarizing the weights will be described in more detail.

The convolutional neural network applied to the present invention can be expressed as Equation 1 below.

Here, W is a real-valued filter, α is the multiplication amplification factor used with a binary filter, and β is the constant amplification factor used with the binary filters. B denotes a binary filter generated according to an embodiment of the present invention, and 1 denotes a constant binary filter consisting entirely of ones. That is, a real-valued filter can be expressed through a plurality of binary filters, a constant filter consisting entirely of ones, and amplification factors. In this case, even if the number of binary filters increases, the speedup provided by binary operations is large enough that both high accuracy and fast computation are possible.

For example, because the convolution operation obeys the commutative and associative laws, the relationship in Equation 2 below holds.

Taking the input as I, the real-valued convolution filter as W, and the convolution operator as ⊙, it can be confirmed that binary operations remain possible even when the original real-valued convolution is replaced by a binarized form that includes the amplification factors.
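The relationship described here can be checked numerically. The sketch below is illustrative only and makes several assumptions: a 1-D sliding dot product stands in for the convolution ⊙, a single binary filter is used, and the names `correlate`, `inputs`, `alpha`, and `beta` as well as all values are hypothetical. The property it actually relies on is that convolution is linear in the filter, so applying the filter α·B + β·1 gives the same result as combining the separate responses to B and to 1.

```python
def correlate(signal, kernel):
    """'Valid' sliding dot product, used here as a stand-in for the convolution."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

inputs = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]   # hypothetical input I
B = [1, -1, 1]                               # a binary filter with entries in {-1, +1}
ones = [1, 1, 1]                             # the constant all-ones filter
alpha, beta = 0.75, 0.25                     # hypothetical amplification factors

# Left side: convolve once with the real-valued filter alpha*B + beta*1.
w_approx = [alpha * b + beta * o for b, o in zip(B, ones)]
lhs = correlate(inputs, w_approx)

# Right side: convolve with each binary filter, then scale and add.
rhs = [alpha * p + beta * q
       for p, q in zip(correlate(inputs, B), correlate(inputs, ones))]

max_gap = max(abs(l - r) for l, r in zip(lhs, rhs))
```

The right-hand side needs only convolutions with ±1 filters followed by a few scalar multiplications, which is why the binarized form keeps the binary operations.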

Here, for convenience of calculation, the matrix W in the original real-valued convolution filter expression is replaced with a vector w and the matrix B with a vector b. Assuming a single binary filter (K = 1) and b = sign(w), the expression can be written as Equation 3 below.

In addition, to obtain the values of b, α, and β that bring the approximation close to w, the error between them is minimized, as expressed in Equation 4 below.

Taking the partial derivative with respect to each of α and β and solving for each in terms of the other gives Equation 5 below.
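The equations referenced in this passage are not reproduced in this text. As a hedged reconstruction from the surrounding definitions, and assuming the error is the squared Euclidean norm (an assumption, since the original form is not visible here), the single-filter case can be worked as follows, where M is the length of w and b has entries in {-1, +1}, so that b^T b = M:

```latex
\begin{aligned}
\mathbf{w} &\approx \alpha\,\mathbf{b} + \beta\,\mathbf{1}, \qquad \mathbf{b} = \operatorname{sign}(\mathbf{w}) \\
E(\alpha,\beta) &= \lVert \mathbf{w} - \alpha\,\mathbf{b} - \beta\,\mathbf{1} \rVert^{2} \\
\frac{\partial E}{\partial \alpha} = 0 \;&\Rightarrow\; \alpha = \frac{\mathbf{w}^{\mathsf T}\mathbf{b} - \beta\,\mathbf{b}^{\mathsf T}\mathbf{1}}{M} \\
\frac{\partial E}{\partial \beta} = 0 \;&\Rightarrow\; \beta = \frac{\mathbf{w}^{\mathsf T}\mathbf{1} - \alpha\,\mathbf{b}^{\mathsf T}\mathbf{1}}{M}
\end{aligned}
```

Each stationarity condition expresses one factor in terms of the other, which matches the description of solving for the values "in terms of each other"; the exact notation of the patent's own equations may differ.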

Equation 5 can in turn be rewritten as Equation 6 below.

Here, M is the size of the vector w.

That is, when a single binary filter is used, b approximates sign(w), and from this the values of α and β can be calculated directly.
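The direct calculation for a single binary filter can be sketched as below. This is an illustration under assumptions rather than the patent's exact formula: it assumes a squared-error objective and solves the resulting two linear conditions in closed form. The function name `fit_single_binary_filter` and the sample weights are hypothetical.

```python
def fit_single_binary_filter(w):
    """Least-squares fit of w ~ alpha*b + beta*1 with b = sign(w).

    Assumes a squared-error objective; sign(0) is taken as +1.
    """
    M = len(w)
    b = [1 if v >= 0 else -1 for v in w]
    s_b = sum(b)                                  # b^T 1
    s_w = sum(w)                                  # w^T 1
    s_wb = sum(x * y for x, y in zip(w, b))       # w^T b
    det = M * M - s_b * s_b
    if det == 0:
        # b is all +1 or all -1, so it is parallel to the constant filter;
        # only the mean of w can be fitted, and it is assigned to beta here.
        return b, 0.0, s_w / M
    alpha = (M * s_wb - s_b * s_w) / det
    beta = (M * s_w - s_b * s_wb) / det
    return b, alpha, beta

w = [0.9, -0.4, 0.3, 0.8]                         # hypothetical filter weights
b, alpha, beta = fit_single_binary_filter(w)
residual = [x - alpha * y - beta for x, y in zip(w, b)]
```

At the least-squares solution the residual is orthogonal to both b and the all-ones vector, which gives a quick way to check the result without any training data.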

Additionally, the present invention uses orthogonal vectors to construct the binary filters. In an embodiment of the present invention, a Hadamard matrix may be used as an example of such orthogonal vectors.

Given an N×N matrix whose entries are all in {-1, 1}, the matrix may be called an N-th order Hadamard matrix if it has the property defined by Equation 7 below.
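Equation 7 is not reproduced in this text. For reference, the standard defining property of a Hadamard matrix, written here with the assumed symbols H_N for the matrix and I_N for the N×N identity, is:

```latex
H_N\,H_N^{\mathsf T} = N\,I_N, \qquad (H_N)_{ij} \in \{-1, +1\}
```

The diagonal entries equal N because each row has N entries of magnitude 1, and the off-diagonal zeros state that distinct rows are orthogonal.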

That is, a Hadamard matrix is a matrix in which every entry is 1 or -1 and in which the row vectors are mutually orthogonal and the column vectors are mutually orthogonal.

FIG. 7 is a diagram showing the process of generating a Hadamard matrix applied to the present invention.

FIG. 7 shows the process of generating an eighth-order Hadamard matrix from a first-order Hadamard matrix. The key feature of a Hadamard matrix is that its columns are mutually orthogonal, as are its rows. That is, the inner product of any i-th column and j-th column (with i ≠ j) is always 0.
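One common way to grow an eighth-order Hadamard matrix from a first-order one is the Sylvester doubling construction, sketched below. Whether FIG. 7 uses exactly this construction is an assumption, and the function name `sylvester_hadamard` is hypothetical. The sketch also verifies the orthogonality property stated above by computing every pairwise column inner product.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix by repeated doubling: H_2n = [[H, H], [H, -H]]."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]                                     # first-order Hadamard matrix
    while len(H) < order:
        top = [row + row for row in H]
        bottom = [row + [-v for v in row] for row in H]
        H = top + bottom
    return H

H8 = sylvester_hadamard(8)
columns = list(zip(*H8))

# Inner product of every pair of columns: N on the diagonal, 0 elsewhere.
dots = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
```

The doubling step runs three times to go from order 1 to order 8 (1, 2, 4, 8), and this construction only produces orders that are powers of two.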

In the present invention, columns of a Hadamard matrix having this property are extracted to construct the binary filters, and the property is then used to convert directly to the amplification factor values without data or additional training.

Returning to Equation 3 for the convolution filter, extending the number of binary filters to K, where K is greater than 1, gives Equation 8 below.

In Equation 8, obtaining values of b, α, and β that approximate w requires satisfying the condition given by Equation 9 below.

If general binary filters b were used, the number of possible combinations would be so large that the calculation could be infeasible. Accordingly, in the present invention, columns of the Hadamard matrix are extracted one at a time, starting from the first column, and used as the binary filters b. With this method there is no need to store the binary filters themselves. Above all, because the columns are mutually orthogonal, most terms become 0 when computing any given value, which allows direct simplification.

In summary, the multiplication amplification factor α and the constant amplification factor β used with the binary filters can be expressed as Equation 10 below.

That is, once the k-th column b_k of the Hadamard matrix is extracted, the values of α and β can be derived directly from w and b alone, independently of the other columns. Here, M is the size of the vector.

Because the values derived by this direct calculation are already optimal, no additional training is needed.
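The column-by-column calculation can be sketched as follows. Equation 10 is not reproduced in this text, so the closed form used here is an assumption: it is what a squared-error fit yields when the filters are mutually orthogonal ±1 vectors, namely α_k = wᵀb_k / M and β = wᵀ1 / M. The sketch further assumes that M is a power of two, that the all-ones first column of a Sylvester-type Hadamard matrix plays the role of the constant filter, and that the following columns serve as b_1 to b_K. All names and sample values are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix by doubling; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H

def hadamard_binarize(w, K):
    """Return K binary filters with their alphas, plus beta, for a weight vector w."""
    M = len(w)
    if not 1 <= K <= M - 1:
        raise ValueError("K must be between 1 and M - 1")
    cols = list(zip(*sylvester_hadamard(M)))
    beta = sum(w) / M                              # projection onto the all-ones column
    filters = [list(cols[k]) for k in range(1, K + 1)]
    # Each alpha_k depends only on w and its own column b_k.
    alphas = [sum(wi * bi for wi, bi in zip(w, b)) / M for b in filters]
    return filters, alphas, beta

def reconstruct(filters, alphas, beta, M):
    """Rebuild the approximation sum_k alpha_k * b_k + beta * 1."""
    approx = [beta] * M
    for a, b in zip(alphas, filters):
        approx = [v + a * bi for v, bi in zip(approx, b)]
    return approx

w = [0.9, -0.4, 0.3, 0.8, -0.1, 0.2, -0.7, 0.5]    # hypothetical weights, M = 8
errors = []
for K in (1, 3, 7):
    filters, alphas, beta = hadamard_binarize(w, K)
    approx = reconstruct(filters, alphas, beta, len(w))
    errors.append(sum((x - y) ** 2 for x, y in zip(w, approx)))
```

Because the columns are orthogonal, adding more filters never increases the squared error, and with all M − 1 non-constant columns plus the constant filter the weight vector is recovered up to rounding. Only the column indices and the factors need to be stored, since the columns can be regenerated.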

Table 1 below compares inference accuracy for each model (here, AlexNet, VGG-11, and ResNet-18).

| Accuracy (%) | Original | XNOR-Net | Present invention |
|---|---|---|---|
| AlexNet | 88.98 | 84.15 | 88.39 |
| VGG-11 | 91.73 | 86.78 | 91.65 |
| ResNet-18 | 93.53 | 90.51 | 93.33 |

As Table 1 shows, the method according to the present invention performs far better than XNOR-Net, the industry standard, and achieves accuracy close to that of the original. Here, the original is a full-precision artificial neural network that has not been made lightweight or binarized.

Table 2 compares the accuracy of the weight binarization algorithm.

| Model used, accuracy (%) | Original (Top-1 / Top-5) | Proposed patent (Top-1 / Top-5) |
|---|---|---|
| ResNet-18 | 69.76 / 89.08 | 69.41 / 88.92 |
| ResNet-50 | 76.15 / 92.87 | 75.99 / 92.85 |
| VGG-11 | 69.02 / 88.63 | 68.92 / 88.59 |
| VGG-19 | 72.38 / 90.88 | 71.91 / 90.56 |
| SqueezeNet 1.1 | 58.19 / 80.62 | 58.18 / 80.47 |
| MNASNET 1.0 | 73.51 / 91.54 | 73.35 / 91.38 |

Considering the weight binarization algorithm alone, the method according to the present invention stays within 1% of the original accuracy.
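The gap stated here can be recomputed from the Table 2 figures. The short sketch below only does arithmetic on the values reported in the table; the variable names are hypothetical, and the gaps are expressed in percentage points.

```python
# (original, proposed) accuracy pairs as (Top-1, Top-5), copied from Table 2.
table2 = {
    "ResNet-18":      ((69.76, 89.08), (69.41, 88.92)),
    "ResNet-50":      ((76.15, 92.87), (75.99, 92.85)),
    "VGG-11":         ((69.02, 88.63), (68.92, 88.59)),
    "VGG-19":         ((72.38, 90.88), (71.91, 90.56)),
    "SqueezeNet 1.1": ((58.19, 80.62), (58.18, 80.47)),
    "MNASNET 1.0":    ((73.51, 91.54), (73.35, 91.38)),
}

# Accuracy drop per model, rounded to the table's two decimal places.
gaps = {
    model: (round(orig[0] - prop[0], 2), round(orig[1] - prop[1], 2))
    for model, (orig, prop) in table2.items()
}

worst_top1 = max(g[0] for g in gaps.values())
worst_top5 = max(g[1] for g in gaps.values())

for model, (top1_gap, top5_gap) in gaps.items():
    print(f"{model}: Top-1 drop {top1_gap:.2f}, Top-5 drop {top5_gap:.2f}")
```

The largest drops in the reported figures both occur for VGG-19, and every drop is below one percentage point.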

FIG. 8 is a flowchart of a method of generating a binary neural network according to an embodiment of the present invention.

The method of generating a binary neural network according to an embodiment of the present invention may be performed by a binary neural network generation apparatus, for example, a user terminal or an edge terminal, but the performing entity is not limited thereto.

Referring to FIG. 8, the binary neural network generation apparatus extracts real-valued filter weights from a first neural network (S810). Here, the first neural network may be an artificial neural network for which training for inference has been completed.

The binary neural network generation apparatus performs a binary orthogonal transform on the filter weights extracted from the first neural network (S820). In the binary orthogonal transform step (S820), a binary orthogonal vector is generated from the filter weights (S821), and one or more binary filters are generated by extracting each column of the binary orthogonal vector (S822). The binary multiplication amplification factor and the binary constant amplification factor are then calculated using the generated binary filters (S823).

Here, the binary multiplication amplification factor and the binary constant amplification factor may be calculated by an equation expressed using a vector for the real-valued convolution filter included in the first neural network, a vector for the one or more binary filters, and the size of the vector for the convolution filter.

The binary neural network generation apparatus may generate a second neural network using the binary weights calculated according to the binary orthogonal transform, that is, the binary multiplication amplification factor and the binary constant amplification factor (S830).
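The flow of steps S810 to S830 can be sketched end to end as below. This is a hypothetical illustration, not the apparatus's actual implementation: the first neural network is modeled as a plain dictionary mapping layer names to flattened filters, each filter length is assumed to be a power of two, the Hadamard matrix is built by Sylvester doubling, and the factors use the orthogonal-projection form α_k = wᵀb_k / M and β = wᵀ1 / M, which is an assumption about the unreproduced equation. All names are made up for the example.

```python
def hadamard(order):
    """Sylvester-type Hadamard matrix; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("filter length must be a power of two in this sketch")
    H = [[1]]
    while len(H) < order:
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H

def extract_filter_weights(first_network):
    """S810: copy the real-valued filter weights out of the trained network."""
    return {name: [list(f) for f in filters] for name, filters in first_network.items()}

def binary_orthogonal_transform(w, num_filters):
    """S820: produce binary weights for one real-valued filter w."""
    M = len(w)
    columns = list(zip(*hadamard(M)))                            # S821
    indices = list(range(1, num_filters + 1))
    filters = [columns[k] for k in indices]                      # S822
    alphas = [sum(a * b for a, b in zip(w, f)) / M for f in filters]  # S823
    beta = sum(w) / M
    # Only column indices are kept, since the columns can be regenerated.
    return {"column_indices": indices, "alphas": alphas, "beta": beta}

def generate_second_network(first_network, num_filters):
    """S830: assemble the binary weights of every filter into a second network."""
    weights = extract_filter_weights(first_network)
    return {name: [binary_orthogonal_transform(w, num_filters) for w in filters]
            for name, filters in weights.items()}

# Hypothetical trained network with one layer holding two length-4 filters.
first_network = {"conv1": [[0.9, -0.4, 0.3, 0.8], [-0.2, 0.1, 0.6, -0.5]]}
second_network = generate_second_network(first_network, num_filters=3)
```

No training data or gradient step appears anywhere in the flow: the second network is derived from the first network's weights alone.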

FIG. 9 is a block diagram of a binary neural network generation apparatus according to an embodiment of the present invention.

The binary neural network generation apparatus according to an embodiment of the present invention may include at least one processor 910, a memory 920 that stores at least one instruction executed by the processor, and a transceiver 930 that is connected to a network and performs communication.

The binary neural network generation apparatus 900 may further include an input interface device 940, an output interface device 950, a storage device 960, and the like. The components included in the binary neural network generation apparatus 900 may be connected by a bus 970 and communicate with one another.

The processor 910 may execute program commands stored in at least one of the memory 920 and the storage device 960. The processor 910 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed. Each of the memory 920 and the storage device 960 may be composed of at least one of a volatile storage medium and a nonvolatile storage medium. For example, the memory 920 may be composed of at least one of read only memory (ROM) and random access memory (RAM).

Here, the at least one instruction may include: an instruction causing the processor to extract real-valued filter weights from a first neural network on which inference training has been completed; an instruction causing the processor to perform a binary orthogonal transform on the filter weights; and an instruction causing the processor to generate a second neural network using the binary weights calculated according to the binary orthogonal transform.

The first neural network may be a convolutional neural network, and the filter weights may include a multiplication amplification factor and a constant amplification factor of a convolution filter.

The operations of the method according to an embodiment of the present invention can be implemented as a computer-readable program or code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable program or code is stored and executed in a distributed manner.

The computer-readable recording medium may also include a hardware device specially configured to store and execute program commands, such as ROM, RAM, or flash memory. The program commands may include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although some aspects of the present invention have been described in the context of an apparatus, they also represent a description of the corresponding method, where a block or apparatus corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method also represent a corresponding block or item, or a feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

In embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, the field programmable gate array may operate together with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by some hardware device.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the following claims.

Claims

1. A method of generating a binary neural network, the method comprising:
extracting real-valued filter weights from a first neural network on which inference training has been completed;
performing a binary orthogonal transform on the filter weights; and
generating a second neural network using binary weights calculated according to the binary orthogonal transform.

2. The method of claim 1,
wherein the first neural network is a convolutional neural network, and
the filter weights include a multiplication amplification factor and a constant amplification factor of a convolution filter.

3. The method of claim 1,
wherein performing the binary orthogonal transform on the filter weights comprises:
generating a binary orthogonal vector;
generating at least one binary filter by extracting each column of the binary orthogonal vector; and
calculating a binary multiplication amplification factor and a binary constant amplification factor using the at least one binary filter.

4. The method of claim 3,
wherein the binary multiplication amplification factor and the binary constant amplification factor
are calculated by an equation expressed using a vector for a real-valued convolution filter included in the first neural network, a vector for the at least one binary filter, and the size of the vector for the convolution filter.

5. The method of claim 1,
wherein the second neural network
comprises one or more convolutional layers including a generalization function, a binary activation function, a binary convolution function, and an activation function.

6. The method of claim 3,
wherein the binary multiplication amplification factor and the binary constant amplification factor are inserted as weights of a convolution filter in the second neural network.

7. The method of claim 3,
wherein the binary orthogonal vector is a Hadamard matrix.

8. The method of claim 5,
wherein the binary activation function includes a sign function.

9. An apparatus for generating a binary neural network, the apparatus comprising:
a processor; and
a memory storing at least one instruction executed by the processor,
wherein the at least one instruction includes:
an instruction to extract real-valued filter weights from a first neural network on which inference training has been completed;
an instruction to perform a binary orthogonal transform on the filter weights; and
an instruction to generate a second neural network using binary weights calculated according to the binary orthogonal transform.

10. The apparatus of claim 9,
wherein the first neural network is a convolutional neural network, and
the filter weights include a multiplication amplification factor and a constant amplification factor of a convolution filter.

11. The apparatus of claim 9,
wherein the instruction to perform the binary orthogonal transform on the filter weights includes:
an instruction to generate a binary orthogonal vector;
an instruction to generate at least one binary filter by extracting each column of the binary orthogonal vector; and
an instruction to calculate a binary multiplication amplification factor and a binary constant amplification factor using the at least one binary filter.

12. The apparatus of claim 11,
wherein the binary multiplication amplification factor and the binary constant amplification factor
are calculated by an equation expressed using a vector for a real-valued convolution filter included in the first neural network, a vector for the at least one binary filter, and the size of the vector for the convolution filter.

13. The apparatus of claim 9,
wherein the second neural network
comprises one or more convolutional layers including a generalization function, a binary activation function, a binary convolution function, and an activation function.

14. The apparatus of claim 11,
wherein the binary multiplication amplification factor and the binary constant amplification factor are inserted as weights of a convolution filter in the second neural network.

15. The apparatus of claim 11,
wherein the binary orthogonal vector is a Hadamard matrix.

16. The apparatus of claim 13,
wherein the binary activation function includes a sign function.