KR20230173840A

KR20230173840A - Device for training binary neural networks based on efficient gradient computations and method thereof

Info

Publication number: KR20230173840A
Application number: KR1020220074597A
Authority: KR
Inventors: 장혜령; 이상민
Original assignee: 동국대학교 산학협력단
Priority date: 2022-06-20
Filing date: 2022-06-20
Publication date: 2023-12-27

Abstract

The present invention relates to lightweight neural networks, and more particularly to a method and device for training a binary neural network using efficient gradient computation, in which the gradient itself is approximated so that it can be computed efficiently and is then propagated to the weights for training. According to an embodiment of the present invention, an artificial neural network can be trained and executed on mobile devices or small devices that have limited memory, power, or computing capability.

Description

Binary neural network learning method and device using efficient differential calculation {DEVICE FOR TRAINING BINARY NEURAL NETWORKS BASED ON EFFICIENT GRADIENT COMPUTATIONS AND METHOD THEREOF}

The present invention relates to lightweight neural networks, and more particularly to a method and device for training a binary neural network using efficient gradient computation, in which the gradient itself is approximated so that it can be computed efficiently and is then propagated to the weights for training.

Artificial intelligence is a field of research that aims to give machines the ability to solve, on their own, problems that require human intellectual ability. It has advanced greatly owing to the following factors.

First, artificial intelligence models based on artificial neural networks (ANNs) made an efficient gradient-based training method (the so-called backpropagation algorithm) possible.

Second, systems were established for generating and collecting large-scale data using a wide variety of devices.

Third and last, computing resources became abundant enough to store large-scale data and to handle the large amount of computation required to train artificial intelligence models with complex structures. On the basis of these factors, the application of artificial intelligence has expanded to various fields that require complex problem solving, such as computer vision and natural language processing, and it is attracting attention as a technology that brings about major changes in industry and in the structure of society.

However, the complexity of the artificial neural network structure increases with the complexity of the problem to be solved, and in particular a structure consisting of a large number of neurons and a large number of layers is required. Such deep neural networks demand a large amount of computation for both training and inference and consequently consume a large amount of resources. As a result, in environments with constraints on battery, memory, or computing capability, such as mobile devices and Internet of Things devices, general deep-neural-network-based artificial intelligence models and training methods are likely not to operate smoothly.

One representative way to reduce the resources required for training and inference of an artificial neural network or deep neural network is to solve the problem with a lightweight neural network. Two such techniques exist: weight quantization, which compresses the weights held by the neurons of the network, and pruning, which removes weights of low importance. Quantization compresses weights stored as 32-bit floating-point values into fewer bits, replacing 32-bit floating-point operations and reducing the storage required for each weight. In the extreme case, weights can be quantized to take only the values -1 and 1. Pruning, on the other hand, removes weights whose magnitude is small and whose importance is therefore low, which reduces the complexity and size of the model.

A binary neural network (BNN), one such lightweight neural network, is a model whose complexity is reduced as far as possible by quantizing each weight (parameter) and each activation result of the network to 1 bit, the smallest possible number of bits, so that they take binary values. Because the weights and activation values of a binary neural network consist of -1 and +1 (or 0 and 1), computation can be performed with far fewer resources, but the binarization process prevents the conventional backpropagation training technique from being applied. Specifically, the sign function is used to binarize the weights, and the derivative (gradient) of the sign function is 0 in most cases.

Meanwhile, because modern artificial intelligence training corrects the training error using the gradient with respect to the weights, conventional training techniques cannot be applied directly to a binary neural network. Accordingly, despite the improvement in energy efficiency obtained through binarization, using a binary neural network for training requires consideration of how the gradient can be computed efficiently.

Research addressing this problem has mainly proposed approximating the sign function with a differentiable function so that conventional training techniques can be applied. Computing the gradient with a function that approximates the sign function has the advantage that conventional training techniques can be applied without major difficulty, but it has the disadvantage that the exact gradient of the binary neural network is not computed. By contrast, approaches that use a probabilistic binarization process instead of the deterministic, sign-function-based one have the advantage that the gradient can be computed, but they require more complexity in the binarization process and their hardware implementation is unclear, so they have not yet been actively studied.

1. Korean Registered Patent Publication No. 10-2345409, "Processor for accelerating convolution operations in a convolutional neural network and operating method of the processor" (publication date: March 10, 2021)

Unlike sign-function approximation techniques or probabilistic binarization methods, the present invention provides a training method for a binary neural network in which the gradient itself is approximated so that it can be computed efficiently and is then propagated to the weights for training.

The present invention also provides a lightweight neural network that operates with few resources: an auxiliary variable is used to approximate the derivative of the quantized value, while the sign function binarizes the weights as they are.

According to one aspect of the present invention, a binary neural network training device using efficient gradient computation is provided.

A binary neural network training device using efficient gradient computation according to an embodiment of the present invention may include a binarization unit that performs the binarization step for binary neural network training, and a gradient computation unit that calculates the gradient values required for binary neural network training.

According to another aspect of the present invention, a binary neural network training method using efficient gradient computation and a computer program that executes the method are provided.

A binary neural network training method using efficient gradient computation, and a computer program that executes it, according to an embodiment of the present invention may include a step of binarizing the weights, a step of binarizing the activation results, and a step of differentiating with respect to the real-valued weights.

According to an embodiment of the present invention, an artificial neural network can be trained and executed on mobile devices or small devices that have limited memory, power, or computing capability.

In addition, according to an embodiment of the present invention, because binarization in the binary neural network is simple, the output of each layer that is multiplied by the weights can also be binarized, so that the multiplications can be replaced with XNOR binary operations, which reduces computational complexity.

In addition, according to an embodiment of the present invention, artificial neural networks can be trained on devices with limited computing resources, which broadens the range of applications of artificial intelligence.

FIG. 1 is an example of the structure of a binary neural network.
FIGS. 2 to 5 are diagrams for explaining a binary neural network training device using efficient gradient computation according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a binary neural network training method using efficient gradient computation according to an embodiment of the present invention.
FIGS. 7 and 8 are exemplary diagrams of a binary neural network training device using efficient gradient computation according to an embodiment of the present invention.
FIGS. 9 to 12 are diagrams showing example experimental results of a binary neural network training device using efficient gradient computation according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the gist of the invention. In addition, singular expressions used in this specification and the claims should generally be construed to mean "one or more" unless otherwise stated.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In that description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is an example of the structure of a binary neural network.

A binary neural network is a model in which the complexity of the artificial intelligence model is reduced as far as possible by quantizing each weight (parameter) and each activation result of the network to 1 bit, the smallest possible number of bits, so that they take binary values. Because the weights and activation values consist of -1 and 1 (or 0 and 1), computation can be performed with far fewer resources, but the binarization process prevents the conventional backpropagation training technique from being applied. Specifically, the derivative (gradient) of the sign function used to binarize the weights is 0 in most cases. Because recent artificial intelligence training corrects the training error using the gradient with respect to the weights, gradient-based training techniques cannot be applied to the binarized weights of a binary neural network, even though binarization improves energy efficiency. Furthermore, when a function that approximates the sign function is used, the gradient is easy to compute, but exact binarization is not performed, so the computational efficiency is lost.

The present invention can implement a binary neural network in which the gradient itself is approximated so that it can be computed efficiently and is then propagated to the weights for training.

FIGS. 2 to 5 are diagrams for explaining a binary neural network training device using efficient gradient computation according to an embodiment of the present invention.

Referring to FIG. 2, the binary neural network training device (10) using efficient gradient computation may include a binarization unit (100) and a gradient computation unit (200).

The binarization unit (100) may perform the binarization step for binary neural network training. The binarization unit (100) ensures that the overall computation of the neural network proceeds on binary values.

The binarization unit (100) may binarize both the weights and the activation result of each layer. When both the weights and the activation results are binarized, every operation in the binary neural network can be carried out with XNOR, so computational complexity is also reduced and the energy consumed in training and execution drops sharply.
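The XNOR claim above can be illustrated with a minimal sketch. It assumes the common encoding in which +1 is stored as bit 1 and -1 as bit 0; the source does not specify an encoding, and the helper names `pack_bits` and `xnor_dot` are hypothetical. Under that assumption, a dot product of two ±1 vectors equals twice the number of agreeing bits minus the vector length.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs agree
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n              # agreements minus disagreements


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate
assert xnor_dot(pack_bits(a), pack_bits(b), len(a)) == reference
```

The multiply-accumulate loop is replaced by one XOR, one NOT, and one population count, which is where the reduction in computation comes from.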

Referring to FIG. 3, the binarization unit (100) may include a weight binarization unit (110) and an activation binarization unit (120).

The weight binarization unit (110) may binarize all weights of the neural network. The weight binarization unit (110) may binarize a weight using the sign function. For example, the weight binarization unit (110) may output +1 if the input value x is greater than or equal to 0, and -1 if x is less than 0.

The weight binarization unit (110) binarizes a real-valued weight w as in [Equation 1] to obtain the binarized weight w^b; consistent with the example above, w^b = Sign(w), which is +1 when w is greater than or equal to 0 and -1 otherwise.
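The binarization rule described above can be sketched in a few lines. The function name `sign_binarize` and the sample weights are illustrative assumptions, not taken from the source.

```python
def sign_binarize(w):
    """Return +1 when w >= 0 and -1 otherwise, as described for the weight binarization unit."""
    return 1 if w >= 0 else -1


real_weights = [0.37, -1.2, 0.0, -0.05]
binary_weights = [sign_binarize(w) for w in real_weights]
print(binary_weights)  # [1, -1, 1, -1]
```

Note that this rule maps 0 to +1, unlike the mathematical sign function, which returns 0 at 0; that keeps every weight representable in a single bit.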

The activation binarization unit (120) may use the binarized weights to compute the activation results of the layers and may binarize those results using the sign function. Specifically, the activation binarization unit (120) may sum the products of the input values received from the neurons of the previous layer and the binarized weights, and may binarize that sum using the sign function.
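A minimal sketch of this forward step for one layer follows, assuming inputs that are already ±1 and no bias term (the source does not mention a bias). The names `binary_layer_forward` and `W_b` are hypothetical.

```python
def sign_binarize(v):
    """Return +1 when v >= 0 and -1 otherwise."""
    return 1 if v >= 0 else -1


def binary_layer_forward(inputs, binary_weights):
    """inputs: list of +1/-1 values; binary_weights: one row of +1/-1 per output neuron."""
    outputs = []
    for row in binary_weights:
        # Sum of products of previous-layer outputs and binarized weights.
        pre_activation = sum(x * w for x, w in zip(inputs, row))
        # Binarize the activation result with the sign function.
        outputs.append(sign_binarize(pre_activation))
    return outputs


x = [1, -1, 1]
W_b = [[1, 1, -1], [-1, -1, 1]]
print(binary_layer_forward(x, W_b))  # [-1, 1]
```

Stacking such layers keeps every intermediate value binary, which is what allows the whole forward pass to run on 1-bit operands.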

Referring to FIG. 4, the binarization unit (100) uses a deterministic binarization method based on the sign function, so the same input value always produces the same binarized output, and because each 32-bit real value is reduced to 1 bit, the network can be implemented with roughly 1/32 of the memory. In detail, FIG. 4(a) shows the sign function used for deterministic binarization, and FIG. 4(b) shows the derivative (gradient) of the sign function.
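The memory figure can be checked with a short sketch that stores the same 1024 sign values once as 32-bit floats and once as packed bits. The packing layout (least significant bit first within each byte) and the helper names are assumptions made for illustration.

```python
import struct


def pack_signs(signs):
    """Store +1 as bit 1 and -1 as bit 0, eight weights per byte."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, n):
    """Recover the first n +1/-1 values from the packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


signs = [1 if i % 3 else -1 for i in range(1024)]
as_floats = struct.pack(f"<{len(signs)}f", *[float(s) for s in signs])
as_bits = pack_signs(signs)

assert unpack_signs(as_bits, len(signs)) == signs   # packing is lossless
print(len(as_floats), len(as_bits), len(as_floats) // len(as_bits))  # 4096 128 32
```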

The binarization unit (100) performs the binarization of the weights and activations that takes place in the forward pass, and can thereby obtain the binarized weight value w^b for the real-valued weight w.

Referring again to FIG. 2, the gradient computation unit (200) can efficiently calculate the gradient values required for binary neural network training.

To address the problem that binarization through the sign function makes the gradient 0 over most of its domain, the gradient computation unit (200) can compute the gradient approximately using an auxiliary variable. For example, the gradient computation unit (200) may use the Straight Through Estimator (STE) technique.

The gradient computation unit (200) calculates the gradient by the chain rule, as in [Equation 3].

In a binary neural network, the loss function L of the network is generally differentiated with respect to the binarized weight w^b, but because of the nature of the sign function this leads to a gradient of 0 in most cases. In detail, deterministic binarization makes $\partial w^b / \partial w$ equal to 0 almost everywhere, so to enable meaningful training the Straight Through Estimator (STE) technique assumes $\partial w^b / \partial w$ to be 1 when computing the gradient.
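The equation images are not reproduced in this text, so the following is a reconstruction under standard notation rather than the patent's own typesetting. It assumes that [Equation 3] is the usual chain-rule factorization through the binarized weight, and it shows the effect of the STE assumption described above.

```latex
\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial w^{b}} \cdot \frac{\partial w^{b}}{\partial w},
\qquad
\frac{\partial w^{b}}{\partial w} \approx 1 \ \text{(STE)}
\quad\Longrightarrow\quad
\frac{\partial L}{\partial w} \approx \frac{\partial L}{\partial w^{b}}
```

In other words, the gradient computed with respect to the binarized weight is passed straight through to the real-valued weight.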

However, the Straight Through Estimator (STE) technique has the problem that the error grows when the binarized value and the actual real value differ greatly in range.

Referring to FIG. 5, the gradient computation unit (200) limits (clips) the real values to the range between -1 and +1 to prevent the error from growing. FIG. 5(a) shows the derivative (gradient) of the sign function, and FIG. 5(b) shows the approximation function used in STE-based gradient computation, which clips the real values to the range between -1 and 1 in the backward pass.

The gradient computation unit (200) may differentiate the loss function L with respect to the real-valued weight w in the backward pass, and the value learned in this way may be used as the auxiliary variable.
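The text does not spell out how the clipping enters the computation. One common reading, shown here purely as an assumption, is that the upstream gradient is passed through only while the real-valued weight lies in [-1, 1]; a second reading clips the stored weight itself after each update. Both helpers below are hypothetical sketches.

```python
def ste_clipped_grad(upstream_grad, w):
    """Pass the gradient w.r.t. the binarized weight through when -1 <= w <= 1, else block it."""
    return upstream_grad if -1.0 <= w <= 1.0 else 0.0


def clip_weight(w):
    """Alternative reading: keep the real-valued weight itself inside [-1, 1]."""
    return max(-1.0, min(1.0, w))


upstream = 0.8  # stand-in value for dL/dw^b
grads = [ste_clipped_grad(upstream, w) for w in (-1.5, -0.3, 0.0, 0.9, 2.0)]
print(grads)               # [0.0, 0.8, 0.8, 0.8, 0.0]
print(clip_weight(2.0))    # 1.0
```

Either way, real-valued weights far from the binarized values ±1 stop contributing unbounded error to the gradient estimate.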

FIG. 6 is a diagram illustrating a binary neural network training method using efficient gradient computation according to an embodiment of the present invention. Each step described below is performed by the corresponding functional unit of the binary neural network training device using efficient gradient computation, but for a concise and clear description the subject of each step is referred to collectively as the binary neural network training device using efficient gradient computation.

The binary neural network training device (10) using efficient gradient computation may set the initial value of w randomly and perform binary neural network training as shown in FIG. 6.

Referring to FIG. 6, in step S610 the binary neural network training device (10) using efficient gradient computation binarizes the weights in each layer by applying the sign function to the real-valued weights.

In step S620, the training device (10) uses the binarized weights to compute the activation results in the forward direction across the layers and binarizes them.

In step S630, the training device (10) obtains the loss function and, in the backward direction, uses STE to compute the gradient with respect to the real-valued weights.

In step S640, the training device (10) updates the real-valued weights using gradient descent. For example, gradient descent methods such as Adam, AdaGrad, and RMSProp can be used to improve the training speed.

In step S650, the training device (10) ends training when the gradient value is smaller than a predetermined value. That is, the training device (10) compares the gradient value with the predetermined value and, until the gradient value falls below it, returns to step S610 and repeats the above steps.
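Steps S610 to S650 can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a single linear layer (so the hidden-layer activation binarization of S620 does not appear), a mean-squared-error loss, plain gradient descent instead of Adam, and a hypothetical `teacher` weight vector to generate targets. All names are illustrative.

```python
import itertools
import random


def sign_binarize(v):
    return 1 if v >= 0 else -1


def train_bnn(inputs, targets, lr=0.1, tol=1e-9, max_iters=1000, seed=0):
    rng = random.Random(seed)
    n, m = len(inputs[0]), len(inputs)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random real-valued initial weights
    for step in range(1, max_iters + 1):
        # S610: binarize the real-valued weights with the sign function.
        w_b = [sign_binarize(v) for v in w]
        # S620: forward pass using only the binarized weights.
        scores = [sum(wb * x for wb, x in zip(w_b, xs)) / n for xs in inputs]
        # S630: loss, gradient w.r.t. w_b, then clipped STE to reach w.
        errors = [s - t for s, t in zip(scores, targets)]
        loss = sum(e * e for e in errors) / m
        grad_wb = [2.0 * sum(e * xs[j] for e, xs in zip(errors, inputs)) / (n * m)
                   for j in range(n)]
        grad_w = [g if abs(v) <= 1.0 else 0.0 for g, v in zip(grad_wb, w)]
        # S640: gradient descent update of the real-valued weights.
        w = [v - lr * g for v, g in zip(w, grad_w)]
        # S650: stop once the gradient is smaller than the threshold.
        if max(abs(g) for g in grad_w) < tol:
            break
    return w, [sign_binarize(v) for v in w], loss, step


inputs = list(itertools.product((-1, 1), repeat=4))
teacher = [1, -1, 1, -1]  # hypothetical target weights used only to build the toy data
targets = [sum(t * x for t, x in zip(teacher, xs)) / 4 for xs in inputs]

w, w_b, loss, steps = train_bnn(inputs, targets)
print(w_b, loss)  # [1, -1, 1, -1] 0.0
```

The real-valued weights act as the slowly changing auxiliary state: each update nudges them until their sign flips, at which point the binarized network changes and the gradient for that weight vanishes.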

FIGS. 7 and 8 are exemplary diagrams of a binary neural network training device using efficient gradient computation according to an embodiment of the present invention.

Referring to FIG. 7, the binary neural network training device (10) using efficient gradient computation binarizes the weights by applying the sign function to the real-valued weights, and computes and binarizes the activation results. The training device (10) can also carry out this binarization process in the forward direction across the layers to calculate the loss function.

Referring to FIG. 8, in the backward direction the binary neural network training device (10) using efficient gradient computation can differentiate the loss function with respect to the real-valued weights on the basis of STE, and can learn the weight gradient values and use them as auxiliary variables.

FIGS. 9 to 12 are diagrams showing example experimental results of a binary neural network training device using efficient gradient computation according to an embodiment of the present invention.

To verify the performance of the binary neural network training device (10) using efficient gradient computation, experiments were conducted on a total of three data sets: ECG200, ECG5000, and ECGThorax. Each data set consists of electrocardiogram data, and the results come from experiments in which the training device (10) learned a classification problem of classifying cardiac abnormalities.

FIG. 9 is an exemplary diagram that schematically shows the binary neural network structure used in the experiments by the binary neural network training device using efficient gradient computation according to an embodiment of the present invention.

Referring to FIG. 9, the network used by the binary neural network training device (10) using efficient gradient computation may consist of an input layer, an output layer, and two hidden layers. In the experiments, the number of neurons in the two hidden layers was varied according to the data set. In detail, each hidden layer was given 1024 neurons for the ECG200 and ECG5000 data sets, and 4096 neurons for the ECGThorax data set.

FIG. 10 compares a normal heartbeat with a heartbeat in ischemic heart disease, based on the ECG200 data set, and illustrates the criterion for classifying heartbeat abnormalities.

FIGS. 11 and 12 are examples of the training results of a binary neural network using efficient gradient computation according to an embodiment of the present invention.

With the ECG200 data set, the binary neural network training device (10) using efficient gradient computation performed a binary classification experiment that classifies which of 200 heartbeats are abnormal. The training device (10) used 100 samples of the ECG200 data set as training data and 100 samples as test data.

FIG. 11(a) is an example of the accuracy of the binary neural network trained on the ECG200 data set by the binary neural network training device (10) using efficient gradient computation.

FIG. 11(b) is an example of the learning curve of the binary neural network training device using efficient gradient computation on the ECG200 data set.

As the example of FIG. 11(b) shows, the binary neural network training device (10) using efficient gradient computation trains well: early in training, accuracy rises and loss falls. After about 50 training iterations, the loss on the test set gradually increases, which indicates overfitting. The test accuracy of the training device (10) is about 93%.

Referring to FIG. 12, the STE-based learning algorithm of the binary neural network learning device 10 using efficient differential calculation also works well on the ECG5000 data set and the ECGThorax data set, as it does on the ECG200 data set.
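As a non-limiting illustration of STE-based training in general, the following Python sketch binarizes real-valued weights with a sign function in the forward pass and, in the backward pass, passes the gradient computed for the binarized weights straight through to the real-valued weights. It assumes the commonly used straight-through estimator (STE) with a clipping threshold. The names sign, ste_grad, and train_step, the squared-error loss, the learning rate, and the threshold value are hypothetical choices made for this example and are not taken from the present specification, which approximates the differential value using an auxiliary variable.

```python
def sign(x):
    """Binarize a real value to +1.0 or -1.0 (0 is mapped to +1.0 here)."""
    return 1.0 if x >= 0 else -1.0


def ste_grad(upstream, w_real, clip=1.0):
    """Straight-through estimator for d sign(w) / d w.

    The true derivative of sign() is zero almost everywhere, so the
    upstream gradient is passed through unchanged when |w| <= clip
    and is set to zero otherwise (assumed clipping rule).
    """
    return upstream if abs(w_real) <= clip else 0.0


def train_step(w_real, x, target, lr=0.1):
    """One gradient step for a single linear unit with binarized weights."""
    # Forward pass uses the binarized weights only.
    w_bin = [sign(w) for w in w_real]
    output = sum(wb * xi for wb, xi in zip(w_bin, x))
    # Loss is 0.5 * (output - target)^2, so d loss / d output = err.
    err = output - target
    # Gradient with respect to each binarized weight.
    grad_bin = [err * xi for xi in x]
    # STE: transfer that gradient to the real-valued weights.
    grad_real = [ste_grad(g, w) for g, w in zip(grad_bin, w_real)]
    # The real-valued weights are the ones that get updated.
    return [w - lr * g for w, g in zip(w_real, grad_real)]


weights = [0.3, -0.2, 1.5]
updated = train_step(weights, x=[1.0, -1.0, 1.0], target=1.0)
print(updated)
```

In this example the third weight lies outside the clipping range, so it receives no update, while the first two move toward reducing the error.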

FIG. 12(a) is an example of the training results obtained on the ECG5000 data set by the binary neural network learning device 10 using efficient differential calculation. In the ECG5000 data set environment, the test accuracy of the binary neural network learning device 10 using efficient differential calculation is about 94.62%, and the accuracy can be seen to decline from around the 100th training iteration.

In the ECG5000 data set environment, the binary neural network learning device 10 using efficient differential calculation was tested on a five-class classification problem, with one normal-beat class and four abnormal-beat classes, using 500 training samples and 4500 test samples.

FIG. 12(b) is an example of the training results obtained on the ECGThorax data set by the binary neural network learning device 10 using efficient differential calculation. In the ECGThorax data set environment, the test accuracy of the binary neural network learning device 10 using efficient differential calculation is about 91.21%, and the performance did not deteriorate significantly up to 2000 training iterations.

In the ECGThorax data set environment, the problem is to classify the state of a fetal heartbeat into 42 classes, and the binary neural network learning device 10 using efficient differential calculation was tested with the data divided into 1800 training samples and 1965 test samples.

[Table 1]
DATA SET     BNN (STE)   FULL-PRECISION NN   OPTIMIZED NN
ECG5000      94.62%      94.62%              94.73%
ECGTHORAX    91.21%      92.73%              94.68%
ECG200       93%         94%                 89.05%

[Table 1] shows the accuracy obtained by the binary neural network learning device 10 using efficient differential calculation alongside the accuracy of existing neural networks (NN). The neural network that underwent no binarization at all (FULL-PRECISION NN) was in every case trained using a model with two layers of 4096 neurons, and OPTIMIZED NN denotes the best-performing model from prior research (state-of-the-art). Referring to [Table 1], the binary neural network learning device 10 using efficient differential calculation can be applied to a variety of data, and the structure of the model can also be varied by using a different model for each data set. This means that the binary neural network learning device 10 using efficient differential calculation can be put to use in many ways in real-world healthcare applications based on mobile terminals or smart devices.
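As a worked illustration of the comparison in [Table 1], the following Python sketch computes the difference, in percentage points, between the accuracy of the binary neural network (BNN (STE)) and each of the two reference networks. The accuracy values are copied from [Table 1]; the dictionary layout and the function name accuracy_gap are hypothetical and introduced only for this example.

```python
# Accuracies in percent, copied from [Table 1].
TABLE_1 = {
    "ECG5000":   {"bnn_ste": 94.62, "full_precision": 94.62, "optimized": 94.73},
    "ECGTHORAX": {"bnn_ste": 91.21, "full_precision": 92.73, "optimized": 94.68},
    "ECG200":    {"bnn_ste": 93.0,  "full_precision": 94.0,  "optimized": 89.05},
}


def accuracy_gap(dataset, baseline):
    """BNN (STE) accuracy minus the baseline accuracy, in percentage points."""
    row = TABLE_1[dataset]
    return round(row["bnn_ste"] - row[baseline], 2)


for name in TABLE_1:
    print(name,
          accuracy_gap(name, "full_precision"),
          accuracy_gap(name, "optimized"))
```

With the values of [Table 1], the binary neural network is at most 1.52 percentage points below the full-precision network on the three data sets, and on ECG200 it is 3.95 percentage points above the OPTIMIZED NN figure.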

The binary neural network learning method using efficient differential calculation described above can be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, hard disk built into a computer). The computer program recorded on the computer-readable recording medium can be transmitted to another computing device through a network such as the Internet and installed on the other computing device, and can thereby be used on the other computing device.

Although all the components constituting the embodiments of the present invention have been described above as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the purpose of the present invention, one or more of the components may be selectively combined and operated.

Although operations are shown in the drawings in a specific order, it should not be understood that the operations must be performed in the specific order shown or in sequential order, or that all illustrated operations must be performed, to obtain the desired results. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of the various components in the embodiments described above should not be understood as being necessarily required, and it should be understood that the described program components and systems can generally be integrated together into a single software product or packaged into multiple software products.

The present invention has so far been described with a focus on its embodiments. A person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in a modified form without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered from an illustrative rather than a restrictive perspective. The scope of the present invention is indicated in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

10: Binary neural network learning device using efficient differential calculation
100: Binarization unit
110: Weight binarization unit
120: Activation binarization unit
200: Differential calculation unit

Claims

A binary neural network learning device using efficient differential calculation, the device comprising:
a binarization unit that performs a binarization step for training a binary neural network; and
a differential calculation unit that calculates a differential value required for training the binary neural network.

The binary neural network learning device using efficient differential calculation according to claim 1,
wherein the differential calculation unit
calculates the differential value by approximating it using an auxiliary variable value.

The binary neural network learning device using efficient differential calculation according to claim 1,
wherein the binarization unit comprises:
a weight binarization unit that binarizes weights; and
an activation binarization unit that binarizes an activation result value using a sign function.

A binary neural network learning method using efficient differential calculation, performed by a binary neural network learning device using efficient differential calculation, the method comprising:
binarizing weights;
binarizing an activation result value; and
differentiating real-valued weights.

The binary neural network learning method using efficient differential calculation according to claim 4,
wherein the step of differentiating the real-valued weights
calculates the differential value by approximating it using an auxiliary variable value.

A computer program recorded on a computer-readable recording medium to execute the binary neural network learning method using efficient differential calculation according to claim 4 or claim 5.