KR20230089555A - Neural network quantization error correction apparatus and method

Info

Publication number: KR20230089555A
Application number: KR1020220172924A
Authority: KR
Inventors: 박은혁
Original assignee: 포항공과대학교 산학협력단
Priority date: 2021-12-13
Filing date: 2022-12-12
Publication date: 2023-06-20

Abstract

A neural network quantization error correction apparatus according to one embodiment of the present invention receives a first parameter of a neural network before quantization, receives a second parameter of the neural network after quantization, corrects the second parameter based on statistical information of the first parameter and statistical information of the second parameter, and outputs the corrected second parameter as a third parameter. The present invention can thereby minimize accuracy loss.

Description

Neural network quantization error correction apparatus and method

The present invention relates to a system that optimizes and lightens a neural network through quantization, and more particularly to a neural network that includes a quantization error correction function, to quantization of the neural network, and to an inference process that uses the quantized neural network.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Deep learning, a branch of artificial intelligence, recognizes patterns in complex data and enables sophisticated predictions, and is therefore used in many fields as a core technology of the Fourth Industrial Revolution era. Deep learning refers to a method of training a deeply layered artificial neural network, which models the characteristics of biological human neurons through mathematical expressions.

In general, deep learning consists of a training stage, in which an artificial neural network is trained on training data, and an inference stage, in which new data is input to the trained model to obtain an output. A deeper artificial neural network allows more sophisticated predictions and higher performance, but the enormous amount of computation requires more power and reduces speed. To address this problem, model lightweighting techniques are used to build models that have fewer parameters and less computation while maintaining similar performance.

These model lightweighting techniques fall broadly into two approaches: lightweight algorithm research, which designs the algorithm itself with fewer operations and an efficient structure, and algorithm lightweighting, which applies techniques such as model compression to reduce the parameters of an existing model.

Algorithm lightweighting can draw on several techniques for compressing an artificial neural network. The main one is quantization, which minimizes the number of bits used to store the weights.

A typical quantization process converts weights expressed in a floating-point representation into a representation with fewer bits. Because the quantized weights lose part of the information in the original weights, quantization can be regarded as a form of lossy compression.
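The lossy nature of quantization can be illustrated with a minimal sketch. The symmetric uniform quantizer below, its name `quantize_uniform`, and the 4-bit setting are assumptions chosen for illustration; the text does not specify which quantizer is used.

```python
import numpy as np

def quantize_uniform(w, num_bits=4):
    """Symmetric uniform quantizer (illustrative assumption, not the patent's quantizer)."""
    levels = 2 ** (num_bits - 1) - 1            # 7 positive levels for 4 bits
    scale = np.max(np.abs(w)) / levels          # spacing between representative values
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale                            # values restricted to the coarse grid

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=1000)            # stand-in for floating-point weights
w_q = quantize_uniform(w, num_bits=4)

num_unique = len(np.unique(w_q))                # at most 15 distinct values remain
max_err = float(np.max(np.abs(w - w_q)))        # information lost per weight
print(num_unique, max_err)
```

The thousand distinct floating-point values collapse onto at most 15 grid points, and the original values cannot be recovered from `w_q` alone.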

Because weight information is lost during quantization, conventional techniques attempt to describe the quantized weight parameters with auxiliary statistical information. For example, the mean, the median, and the variance or standard deviation are additionally included to describe the set of quantized weights.

In general, when quantization is performed after a neural network has been trained, methods for recovering the lost accuracy can be divided into post-quantization training (quantization-aware training) and post-training quantization, in which errors are corrected after quantization without additional training.

Of these, post-quantization training corrects the errors of the quantized network through training and includes a step of correcting the running mean/variance of the normalization layer. The process of correcting the quantization error is therefore complicated, which defeats the purpose of lightweighting.

As another attempt to correct quantization errors, [1] Korean Patent Publication No. KR 10-2021-0035702, "Quantization method of artificial neural network and calculation method using artificial neural network," has been proposed.

Prior document [1] includes a neural network system that runs an artificial neural network and a quantization system that quantizes it. The quantization system generates quantized parameters by quantizing the parameters of the artificial neural network, generates a quantization error from the parameters and the quantized parameters, generates a correction bias from the quantized parameters and the quantization error, and transmits the quantized parameters and the correction bias to the neural network system.

Prior document [1] proposes a configuration that performs a first multiply-accumulate (MAC) operation on quantized input samples and the quantized parameters, and generates a final operation result by applying the correction bias to the result of the first MAC operation.

This approach does not correct the quantized parameters directly. Instead, it requires a complicated process of separately learning or analyzing how the quantization error affects the inference result. As with most conventional techniques, the process of correcting the quantization error thus ends up making the neural network more complex.

[1] Korean Patent Publication No. KR 10-2021-0035702, "Quantization method of artificial neural network and calculation method using artificial neural network" (published April 1, 2021)

An object of the present invention, devised to solve the above problems, is to provide a neural network quantization error correction apparatus and method that use a post-training quantization technique and improve performance without requiring separate training with the quantized parameters.

Another object of the present invention is to provide a neural network quantization error correction apparatus and method that improve performance while preserving accuracy, by means of an algorithm that corrects the errors in statistical distribution introduced when the weights are quantized.

A further object of the present invention is to provide a neural network to which the proposed quantization error correction technique is applied, and a method of operating it.

A neural network quantization error correction apparatus according to an embodiment for achieving the object of the present invention includes a processor and a memory storing at least one instruction executed by the processor. By executing the at least one instruction, the processor receives a first parameter of a neural network before quantization, receives a second parameter of the neural network after quantization, corrects the second parameter based on statistical information of the first parameter and statistical information of the second parameter, and outputs the corrected second parameter as a third parameter.

By executing the at least one instruction, the processor may correct each second parameter so as to minimize the difference between the mean of the first parameter and the mean of the second parameter.

By executing the at least one instruction, the processor may correct each second parameter based on the difference between the mean of the first parameter and the mean of the second parameter.
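One way to read the mean-based correction is as an additive shift. The sketch below is a minimal illustration under that assumption; the function name `correct_mean`, the sample values, and the choice of a single shift for the whole weight set are hypothetical, since this passage states only that the correction is based on the difference between the two means.

```python
import numpy as np

def correct_mean(w_fp, w_q):
    """Shift the quantized weights so their mean equals the pre-quantization mean.

    Assumption: the correction is a uniform additive shift applied to every weight.
    """
    delta = np.mean(w_fp) - np.mean(w_q)   # difference between the two means
    return w_q + delta                     # corrected weights (third parameter)

w_fp = np.array([0.12, -0.07, 0.31, 0.05, -0.22])   # first parameter (hypothetical)
w_q = np.array([0.00, 0.00, 0.25, 0.00, -0.25])     # second parameter (hypothetical)
w3 = correct_mean(w_fp, w_q)

mean_gap_before = abs(np.mean(w_fp) - np.mean(w_q))
mean_gap_after = abs(np.mean(w_fp) - np.mean(w3))
print(mean_gap_before, mean_gap_after)
```

After the shift the corrected values no longer sit exactly on the quantization grid. This passage does not say how the shifted values are stored.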

By executing the at least one instruction, the processor may correct each second parameter so as to minimize the difference between the mean of the first parameter and the mean of the second parameter, and to minimize the difference between the standard deviation of the first parameter and the standard deviation of the second parameter.

By executing the at least one instruction, the processor may correct each second parameter based on a first difference between the mean of the first parameter and the mean of the second parameter, and a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter.
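Matching both the mean and the standard deviation can be sketched as an affine rescaling. This is a minimal illustration, assuming the correction standardizes the quantized weights and re-expresses them with the pre-quantization statistics; the name `correct_mean_std`, the `eps` guard, and the toy quantizer are hypothetical.

```python
import numpy as np

def correct_mean_std(w_fp, w_q, eps=1e-12):
    """Rescale and shift quantized weights to match the original mean and std.

    Assumption: the correction is affine, w3 = (w_q - mu_q) * (sigma_fp / sigma_q) + mu_fp.
    """
    mu_fp, mu_q = np.mean(w_fp), np.mean(w_q)
    sigma_fp, sigma_q = np.std(w_fp), np.std(w_q)
    return (w_q - mu_q) * (sigma_fp / (sigma_q + eps)) + mu_fp

rng = np.random.default_rng(0)
w_fp = rng.normal(0.02, 0.05, size=4096)                    # first parameter
# Toy quantizer: round to a 0.05 grid and truncate to [-0.1, 0.1].
w_q = np.clip(np.round(w_fp / 0.05) * 0.05, -0.1, 0.1)      # second parameter
w3 = correct_mean_std(w_fp, w_q)                            # third parameter

print(np.mean(w_fp), np.mean(w_q), np.mean(w3))
print(np.std(w_fp), np.std(w_q), np.std(w3))
```

Both statistics are computed from the weights alone, so a correction of this form needs no training data and no further training.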

A neural network quantization error correction method according to an embodiment of the present invention is performed by a processor that executes at least one instruction stored in a memory. By the processor executing the at least one instruction, the method includes: receiving a first parameter of a neural network before quantization; receiving a second parameter of the neural network after quantization; correcting the second parameter based on statistical information of the first parameter and statistical information of the second parameter; and outputting the corrected second parameter as a third parameter.

Correcting the second parameter may include correcting each second parameter so as to minimize the difference between the mean of the first parameter and the mean of the second parameter.

Correcting each second parameter may be based on the difference between the mean of the first parameter and the mean of the second parameter.

Correcting the second parameter may include correcting each second parameter so as to minimize the difference between the mean of the first parameter and the mean of the second parameter, and to minimize the difference between the standard deviation of the first parameter and the standard deviation of the second parameter.

Correcting each second parameter may be based on a first difference between the mean of the first parameter and the mean of the second parameter, and a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter.

A method of operating a neural network with corrected quantization error according to an embodiment of the present invention is performed by a processor that executes at least one instruction stored in a memory. By the processor executing the at least one instruction, the method includes: receiving a first parameter of the neural network before quantization; receiving a second parameter of the neural network after quantization; generating a third parameter by correcting the second parameter based on statistical information of the first parameter and statistical information of the second parameter; and generating an inference result for input data based on the third parameter.

The method of operating a neural network with corrected quantization error according to an embodiment of the present invention may further include generating a new neural network based on the third parameter.

Generating the inference result may include inputting the input data to the new neural network, and generating the output of the new neural network as the inference result.

The method of operating a neural network with corrected quantization error according to an embodiment of the present invention may further include transmitting the third parameter to the neural network, and updating all parameters of the neural network based on the third parameter.

Generating the inference result may include inputting the input data to the neural network in which all parameters have been updated, and generating the output of the neural network as the inference result.

Generating the third parameter may include correcting each second parameter so as to minimize the difference between the mean of the first parameter and the mean of the second parameter.

Generating the third parameter may include correcting each second parameter so as to minimize the difference between the mean of the first parameter and the mean of the second parameter, and to minimize the difference between the standard deviation of the first parameter and the standard deviation of the second parameter.

According to an embodiment of the present invention, the loss of accuracy after quantization can be minimized by correcting the errors in mean and variance that weight quantization introduces between the weights before and after quantization.

According to an embodiment of the present invention, the performance of a neural network can be improved with a post-training quantization technique, without requiring separate training with the quantized parameters.

According to an embodiment of the present invention, a neural network to which the proposed quantization error correction technique, based on the characteristics of the statistical distribution, is applied can be operated effectively.

FIG. 1 is a graph showing the quantization error that arises during data quantization.
FIG. 2 is a conceptual diagram illustrating a neural network to which a neural network quantization error correction technique according to an embodiment of the present invention is applied, and a method of operating the neural network.
FIG. 3 is a conceptual diagram illustrating a neural network to which a neural network quantization error correction technique according to another embodiment of the present invention is applied, and a method of operating the neural network.
FIG. 4 is a conceptual diagram illustrating a neural network quantization error correction apparatus and method according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram illustrating the detailed configuration of a neural network quantization error correction apparatus and method according to an embodiment of the present invention.
FIG. 6 is a conceptual diagram illustrating the detailed configuration of a neural network quantization error correction apparatus and method according to another embodiment of the present invention.
FIG. 7 is a graph showing how well a neural network quantization error correction apparatus according to an embodiment of the present invention minimizes accuracy loss through correction after weight quantization.
FIG. 8 is a graph showing how well a neural network quantization error correction apparatus according to another embodiment of the present invention minimizes accuracy loss through correction after weight quantization.
FIG. 9 is a conceptual diagram illustrating an example of a generalized neural network quantization error correction apparatus, or computing system, capable of performing at least part of the processes of FIGS. 1 to 8.

The present invention may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be termed a second component, and similarly a second component may be termed a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

In embodiments of the present application, "at least one of A and B" may mean "at least one of A or B" or "at least one of combinations of one or more of A and B". Likewise, in embodiments of the present application, "one or more of A and B" may mean "one or more of A or B" or "one or more of combinations of one or more of A and B".

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Even technology known before the filing date of the present application may, where necessary, be included as part of the configuration of the present invention, and it is described in this specification to the extent that it does not obscure the gist of the invention. However, in describing the configuration of the present invention, detailed descriptions of matters that were known before the filing date and that a person skilled in the art can readily understand may obscure the gist of the invention, so excessively detailed descriptions of known technology are omitted.

For example, techniques for quantizing the parameters (such as weights) of a neural network may use technology known before the filing of the present application, and at least some of these known techniques may be applied as elemental techniques needed to practice the present invention.

The present invention does not, however, claim rights to these known techniques, and their content may be included as part of the present invention to the extent that it does not depart from the gist of the invention.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

FIG. 1 is a graph showing the quantization error that arises during data quantization.

Referring to FIG. 1, continuously distributed weight values are quantized, and the quantization introduces a quantization error.

The weight values within each quantization interval are mapped to a quantized representative value, and in this process a quantization error arises between the original weight before quantization and the quantized weight after quantization.

Because of quantization, original values that fall in the intervals farthest from the mean of the continuously distributed weight values are truncated, and the resulting truncation error can also be regarded as a kind of quantization error.
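The two error components described for FIG. 1 can be separated numerically. The following sketch assumes a uniform grid with a hypothetical step of 0.5 and a hypothetical truncation threshold of 2.0; neither value comes from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=10_000)      # continuously distributed weights

clip = 2.0                                 # hypothetical truncation threshold
step = 0.5                                 # hypothetical quantization step

w_clipped = np.clip(w, -clip, clip)        # values far from the mean are truncated
w_q = np.round(w_clipped / step) * step    # mapping to representative values

truncation_err = w - w_clipped             # non-zero only where |w| > clip
rounding_err = w_clipped - w_q             # bounded by step / 2
total_err = w - w_q                        # overall quantization error

print(np.mean(np.abs(truncation_err) > 0), np.max(np.abs(rounding_err)))
print(np.std(w), np.std(w_q))              # the statistics of the two sets differ
```

The total error is exactly the sum of the two components.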

One class of conventional techniques for minimizing such quantization errors has developed into post-training quantization.

Conventional post-training quantization techniques focus on designing a quantization algorithm that minimizes the error between the given weights and the quantized weights. The difference between the quantized weights and the original weights can cause accuracy loss, and this loss has two sources: loss due to the limited representation of the quantized weights, and loss that accumulates through the neural network because the quantized weights have statistics (mean/variance) that differ from those of the original weights. Conventional techniques have attempted to reduce the latter error by correcting, for example, the running mean/variance of the batch normalization layer.

An embodiment of the present invention proposes an algorithm that corrects this error directly and can thereby minimize the error without correcting the normalization layer or the like.

FIG. 2 is a conceptual diagram illustrating a neural network to which a neural network quantization error correction technique according to an embodiment of the present invention is applied, and a method of operating the neural network.

Hereinafter, in this specification, a "parameter" of a neural network may mean a weight. In a modified embodiment of the present invention, the parameters of the neural network may mean weights and activation parameters. In embodiments of the present invention, quantizing the parameters may mean quantizing the weights, or quantizing the weights and the activation parameters.

본 발명의 일 실시예에 따른 양자화 오류가 보정된 뉴럴 네트워크의 동작 방법은 뉴럴 네트워크의 양자화되기 전 제1 파라미터를 수신하는 단계(S330); 오류가 보정된 양자화기(200)를 통하여 양자화 오류가 보정된 제3 파라미터를 생성하는 단계(S340); 및 제3 파라미터에 기반하여 입력 데이터에 대한 추론 결과를 생성하는 단계(S360)를 포함한다. A method of operating a neural network with quantization errors corrected according to an embodiment of the present invention includes receiving a first parameter before quantization of the neural network (S330); Generating a third parameter with corrected quantization errors through the error-corrected quantizer 200 (S340); and generating a reasoning result for the input data based on the third parameter (S360).

본 발명의 일 실시예에 따른 양자화 오류가 보정된 뉴럴 네트워크의 동작 방법은 제3 파라미터에 기반하여 새로운 뉴럴 네트워크(120)를 생성하는 단계를 더 포함할 수 있다. The method of operating a neural network with quantization error corrected according to an embodiment of the present invention may further include generating a new neural network 120 based on the third parameter.

Generating the inference result (S360) may include: inputting the input data to the new neural network 120 (S350); and generating an output of the new neural network 120 as the inference result (S360).

The method of operating a quantization-error-corrected neural network according to an embodiment of the present invention may further include: inputting training data to the neural network 100 before quantization (S310); and controlling the neural network 100 so that it is trained on the training data (S320).

FIG. 3 is a conceptual diagram illustrating a neural network to which a neural network quantization error correction technique according to another embodiment of the present invention is applied, and a method of operating the neural network.

A method of operating the quantization-error-corrected neural network 100 according to an embodiment of the present invention includes: receiving, at the error-corrected quantizer 200, a first parameter of the neural network 100 before quantization (S330); generating, at the error-corrected quantizer 200, a third parameter in which the quantization error has been corrected (S340); and generating an inference result for input data based on the third parameter (S360).

The method of operating the quantization-error-corrected neural network 100 according to an embodiment of the present invention may further include: transmitting the third parameter to the neural network 100 (S340); and updating all parameters of the neural network 100 based on the third parameter.

Generating the inference result (S360) may include: inputting the input data to the neural network 100 in which all parameters have been updated (S350); and generating an output of the neural network 100 as the inference result (S360).
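The two deployment variants described for FIG. 2 and FIG. 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `quantize`, `correct`, `error_corrected_quantizer`, and `TinyNet` are hypothetical names, the quantizer is a symmetric uniform one chosen only for the example, and the correction shown is the average-only variant.

```python
import numpy as np

def quantize(w, bits=4):
    """Hypothetical symmetric uniform quantizer (the text does not fix one)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(w))), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def correct(w, w_q):
    """Average-only correction per output row; a stand-in for block 220."""
    return w_q + (w.mean(axis=1, keepdims=True) - w_q.mean(axis=1, keepdims=True))

def error_corrected_quantizer(first_params, bits=4):
    """S330 to S340: first parameters in, corrected third parameters out."""
    return {name: correct(w, quantize(w, bits)) for name, w in first_params.items()}

class TinyNet:
    """Two fully-connected layers with ReLU; a stand-in for network 100 or 120."""
    def __init__(self, params):
        self.params = {k: v.copy() for k, v in params.items()}

    def update(self, params):
        # Overwrite every stored parameter with the supplied one.
        for k, v in params.items():
            self.params[k] = v.copy()

    def __call__(self, x):
        h = np.maximum(self.params["fc1"] @ x, 0.0)
        return self.params["fc2"] @ h

rng = np.random.default_rng(0)
net_100 = TinyNet({"fc1": rng.normal(size=(16, 8)), "fc2": rng.normal(size=(4, 16))})
third = error_corrected_quantizer(net_100.params)
x = rng.normal(size=8)

net_120 = TinyNet(third)   # FIG. 2 variant: build a new network from the third parameters
y_new = net_120(x)         # S350 to S360

net_100.update(third)      # FIG. 3 variant: update the existing network in place
y_updated = net_100(x)     # S350 to S360
```

Both variants run inference on the same third parameters, so in this sketch they produce the same output; they differ only in whether a separate network object is created.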

FIG. 4 is a conceptual diagram illustrating a neural network quantization error correction apparatus 220 and method according to an embodiment of the present invention.

A neural network quantization error correction apparatus 220 according to an embodiment for achieving the object of the present invention includes a processor and a memory storing at least one instruction executed by the processor. By executing the at least one instruction, the processor receives a first parameter of the neural network 100 before quantization (S330), receives a second parameter obtained after the first parameter of the neural network 100 has been quantized by the quantizer 210 (S212), corrects the second parameter based on statistical information of the first parameter and statistical information of the second parameter (220), and outputs the corrected second parameter as a third parameter (250).

The neural network quantization error correction method of the present invention relates to deep learning optimization, in particular quantization. After the neural network has been trained (S320) and quantization has been applied to its weights as an optimization (210), the method applies a correction (220) that minimizes the accuracy loss caused by accumulated errors, so that accuracy can be maintained.
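To make concrete what the correction step works against, the sketch below measures how far the per-output-row average and standard deviation of a weight matrix move after quantization at several bit widths. The symmetric uniform quantizer, the helper names `quantize` and `stat_drift`, and the random test weights are all assumptions made for illustration; the text does not specify the quantizer 210.

```python
import numpy as np

def quantize(w, bits):
    """Hypothetical symmetric per-tensor uniform quantizer, returned in float."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(w))), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def stat_drift(w, w_q):
    """Largest per-output-row gap in average and in standard deviation."""
    mean_gap = np.abs(w.mean(axis=1) - w_q.mean(axis=1)).max()
    std_gap = np.abs(w.std(axis=1) - w_q.std(axis=1)).max()
    return float(mean_gap), float(std_gap)

rng = np.random.default_rng(0)
w = rng.normal(loc=0.02, scale=0.1, size=(8, 64))  # synthetic (fan_out, fan_in) weights
for bits in (8, 4, 3, 2):
    mean_gap, std_gap = stat_drift(w, quantize(w, bits))
    print(f"{bits}-bit: mean gap {mean_gap:.5f}, std gap {std_gap:.5f}")
```

The printed gaps are the statistics that the correction (220) is meant to bring back in line with the original weights.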

The neural network quantization error correction apparatus 220 of the present invention is applicable to both convolution and fully-connected layers. In general, the weights of a convolution layer are four-dimensional and the weights of a fully-connected layer are two-dimensional; because the present invention applies the correction after dividing the weights into fan-in and fan-out dimensions, two-dimensional weights are assumed below for convenience.
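The reduction to two dimensions can be sketched as a reshape. The helper name `to_2d` is hypothetical, and the example assumes an output-channel-first layout `(out_channels, in_channels, kH, kW)`, which the text does not state.

```python
import numpy as np

def to_2d(weight: np.ndarray) -> np.ndarray:
    """Flatten a layer weight to (fan_out, fan_in), keeping axis 0 as fan-out."""
    if weight.ndim < 2:
        raise ValueError("expected a weight with at least 2 dimensions")
    return weight.reshape(weight.shape[0], -1)

conv_w = np.zeros((16, 3, 3, 3))  # assumed layout: (out_channels, in_channels, kH, kW)
fc_w = np.zeros((10, 64))         # (out_features, in_features)

print(to_2d(conv_w).shape)  # (16, 27): fan_in = 3 * 3 * 3
print(to_2d(fc_w).shape)    # (10, 64): already two-dimensional
```

With this view, per-output-dimension statistics are simply statistics taken along axis 1, and a corrected matrix can be reshaped back to the original four-dimensional shape.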

Given the original weights and the weights after quantization, a two-step correction is applied to correct the statistics of the weights.

FIG. 5 is a conceptual diagram illustrating a detailed configuration of the neural network quantization error correction apparatus 220 and the quantization error correction method according to an embodiment of the present invention.

프로세서가 적어도 하나의 명령을 실행함으로써, 제1 파라미터의 평균과 제2 파라미터의 평균 간의 차이를 최소화하도록 제2 파라미터 각각을 보정할 수 있다(230). By executing at least one instruction, the processor may correct each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter (230).

프로세서가 적어도 하나의 명령을 실행함으로써, 제1 파라미터의 평균과 제2 파라미터의 평균 간의 차이에 기반하여 제2 파라미터 각각을 보정할 수 있다(230). By executing at least one instruction, the processor may correct each of the second parameters based on the difference between the average of the first parameter and the average of the second parameter (230).

In the first of the two correction steps, the correction of Equation 1 below is applied to minimize the difference between the per-output-dimension averages of the quantized weights and the original weights.

[Equation 1]

The weights corrected on the basis of the average keep their average over each output dimension equal to that of the original weights, so the post-quantization accuracy loss expressed by Equation 2 below can be reduced.

[Equation 2]
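The average-based correction described above can be sketched as a per-row shift. This is one form consistent with the description (matching the average over each output dimension); the exact expressions of Equations 1 and 2 are not reproduced here, and the function name `mean_correct` and the coarse rounding used as a stand-in quantizer are assumptions.

```python
import numpy as np

def mean_correct(w: np.ndarray, w_q: np.ndarray) -> np.ndarray:
    """Shift each output row of w_q so its average equals that of w.

    w and w_q are (fan_out, fan_in); axis 1 is the fan-in dimension.
    """
    shift = w.mean(axis=1, keepdims=True) - w_q.mean(axis=1, keepdims=True)
    return w_q + shift

rng = np.random.default_rng(0)
w = rng.normal(loc=0.02, scale=0.1, size=(4, 32))
w_q = np.round(w * 8) / 8          # stand-in quantizer: snap to a 1/8 grid
w_m = mean_correct(w, w_q)

print(np.abs(w.mean(axis=1) - w_q.mean(axis=1)).max())  # gap before correction
print(np.abs(w.mean(axis=1) - w_m.mean(axis=1)).max())  # gap after correction
```

Note that the shifted values in general no longer lie exactly on the quantization grid; how the per-row offset is stored or applied at inference time is not stated in this text.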

FIG. 6 is a conceptual diagram illustrating a detailed configuration of the neural network quantization error correction apparatus 220 and the quantization error correction method according to another embodiment of the present invention.

By executing the at least one instruction, the processor may correct each of the second parameters so as to minimize the difference between the average of the first parameter and the average of the second parameter and to minimize the difference between the standard deviation of the first parameter and the standard deviation of the second parameter (240).

By executing the at least one instruction, the processor may correct each of the second parameters based on a first difference between the average of the first parameter and the average of the second parameter and a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter (240).

In the second of the two correction steps, the correction of Equation 3 below is applied to minimize the differences in both the per-output-dimension average and variance between the quantized weights and the original weights.

[Equation 3]

Here, the standard deviations in Equation 3 are those of the original weights and of the quantized weights. The weights corrected with both the average and the standard deviation taken into account keep the average and variance over each output dimension equal to those of the original weights, so the accuracy loss after quantization can be minimized.
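The second correction step can be sketched as a per-row standardize-and-rescale. This is one form consistent with the description (matching both average and standard deviation per output dimension), not a reproduction of Equation 3; the name `mean_std_correct`, the `eps` guard, and the stand-in quantizer are assumptions.

```python
import numpy as np

def mean_std_correct(w: np.ndarray, w_q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale and shift each output row of w_q to match the row statistics of w.

    w and w_q are (fan_out, fan_in). If a quantized row is constant (zero
    standard deviation), only its average can be matched.
    """
    mu = w.mean(axis=1, keepdims=True)
    mu_q = w_q.mean(axis=1, keepdims=True)
    sigma = w.std(axis=1, keepdims=True)
    sigma_q = w_q.std(axis=1, keepdims=True)
    scale = sigma / np.maximum(sigma_q, eps)  # eps avoids division by zero
    return (w_q - mu_q) * scale + mu

rng = np.random.default_rng(0)
w = rng.normal(loc=0.02, scale=0.1, size=(4, 32))
w_q = np.round(w * 8) / 8          # stand-in quantizer: snap to a 1/8 grid
w_ms = mean_std_correct(w, w_q)

print(np.abs(w.mean(axis=1) - w_ms.mean(axis=1)).max())  # average gap after correction
print(np.abs(w.std(axis=1) - w_ms.std(axis=1)).max())    # std gap after correction
```

Because the row is centered before it is rescaled, the rescaling does not disturb the average, so both statistics can be matched in a single pass.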

본 발명의 일 실시예에 따른 뉴럴 네트워크 양자화 오류 보정 장치(220)의 뉴럴 네트워크 양자화 오류 보정 방법의 작동을 설명하면 다음과 같다.The operation of the neural network quantization error correction method of the neural network quantization error correction apparatus 220 according to an embodiment of the present invention will be described below.

After quantization is applied to the weights, the quantized weights have statistics (average/variance) that differ from those of the original weights, so errors can accumulate and cause a loss of accuracy.

본 발명은 양자화 전후의 통계값 차이를 보정해주는 알고리즘을 통해 양자화 정확도 손실을 최소화하였다.In the present invention, quantization accuracy loss is minimized through an algorithm that corrects the difference in statistical values before and after quantization.
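A short worked derivation may help show why a mismatch in the per-row average matters. It is an illustrative sketch, not the patent's own Equation 2, and it rests on assumptions that the text does not make: a linear layer with fan-in n, weights written as W (original) and \hat{W} (quantized), and inputs x_j that all share the same expected value m.

```latex
\begin{align}
y_i &= \sum_{j=1}^{n} W_{ij}\,x_j, \qquad
\hat{y}_i = \sum_{j=1}^{n} \hat{W}_{ij}\,x_j \\
\mathbb{E}\left[\hat{y}_i - y_i\right]
  &= \sum_{j=1}^{n} \left(\hat{W}_{ij} - W_{ij}\right)\mathbb{E}[x_j]
   = m \sum_{j=1}^{n} \left(\hat{W}_{ij} - W_{ij}\right) \\
  &= m\,n\,\left(\mu(\hat{W}_i) - \mu(W_i)\right)
\end{align}
```

Under these assumptions, the expected output error of row i is proportional to the gap between the row averages and to the fan-in, so a small per-weight shift becomes a systematic offset that is passed on to later layers. Equalizing the row averages removes this term.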

FIG. 7 is a graph showing how well the neural network quantization error correction apparatus 220 according to an embodiment of the present invention minimizes accuracy loss through correction after weight quantization.

도 7에 도시된 일 실시예는 MobileNet-v2 네트워크를 이용하여 양자화 및 오류 보정 과정을 수행하였다. In the embodiment shown in FIG. 7, quantization and error correction processes are performed using a MobileNet-v2 network.

도 7의 그래프의 x축은 양자화 레벨, y축은 정확도를 나타내며, 왼쪽 그래프는 CIFAR-100 데이터셋을, 오른쪽 그래프는 CIFAR-10 데이터셋을 이용한 실험 결과가 도시된다. In the graph of FIG. 7, the x-axis represents the quantization level and the y-axis represents the accuracy. The left graph shows the experimental results using the CIFAR-100 dataset and the right graph shows the experimental results using the CIFAR-10 dataset.

도 7을 참조하면, MobileNet-v2 네트워크를 이미지 분류 작업에 대하여 학습시킨 후 가중치 양자화를 적용했을 때 정확도 손실을 측정한 결과가 도시된다. 다양한 가중치 양자화 레벨에 대하여 정확도 트렌드가 도시된다. 양자화 레벨이 낮을수록 양자화로 인한 손실이 크고 정확도가 낮음을 확인할 수 있다. Referring to FIG. 7 , a result of measuring accuracy loss when weight quantization is applied after training a MobileNet-v2 network for an image classification task is shown. Accuracy trends are shown for various weight quantization levels. It can be seen that the lower the quantization level, the greater the loss due to quantization and the lower the accuracy.

As shown in FIG. 7, the mean correction yields higher accuracy than the baseline, and the mean&std correction yields higher accuracy than the mean correction alone, confirming that the accuracy loss is reduced progressively.

FIG. 8 is a graph showing how well the neural network quantization error correction apparatus 220 according to another embodiment of the present invention minimizes accuracy loss through correction after weight quantization.

도 8에 도시된 일 실시예는 ResNet-18 네트워크를 이용하여 양자화 및 오류 보정 과정을 수행하였다. In the embodiment shown in FIG. 8, quantization and error correction processes were performed using a ResNet-18 network.

도 8의 그래프의 x축은 양자화 레벨, y축은 정확도를 나타내며, 왼쪽 그래프는 CIFAR-100 데이터셋을, 오른쪽 그래프는 CIFAR-10 데이터셋을 이용한 실험 결과가 도시된다.In the graph of FIG. 8, the x-axis represents the quantization level and the y-axis represents the accuracy. The left graph shows the experimental results using the CIFAR-100 dataset and the right graph shows the CIFAR-10 dataset.

도 8을 참조하면, ResNet-18 네트워크를 이미지 분류 작업에 대하여 학습시킨 후 가중치 양자화를 적용했을 때 정확도 손실을 측정한 결과가 도시된다. 다양한 가중치 양자화 레벨에 대하여 정확도 트렌드가 도시된다. 양자화 레벨이 낮을수록 양자화로 인한 손실이 크고 정확도가 낮음을 확인할 수 있다. Referring to FIG. 8 , a result of measuring accuracy loss when weight quantization is applied after training a ResNet-18 network for an image classification task is shown. Accuracy trends are shown for various weight quantization levels. It can be seen that the lower the quantization level, the greater the loss due to quantization and the lower the accuracy.

As shown in FIG. 8, the mean correction yields higher accuracy than the baseline, and the mean&std correction yields higher accuracy than the mean correction alone, confirming that the accuracy loss is reduced progressively.

도 9는 도 1 내지 도 8의 과정의 적어도 일부를 수행할 수 있는 일반화된 뉴럴 네트워크 양자화 오류 보정 장치(220), 또는 장치(220)를 구성하는 컴퓨팅 시스템의 예시를 도시하는 개념도이다. FIG. 9 is a conceptual diagram illustrating an example of a generalized neural network quantization error correction device 220 capable of performing at least part of the processes of FIGS. 1 to 8 or a computing system constituting the device 220 .

도 1 내지 도 8의 실시예에서도 도면 상으로는 생략되었으나 프로세서, 및 메모리가 전자적으로 각 구성 요소와 연결되고, 프로세서에 의하여 각 구성 요소의 동작이 제어되거나 관리될 수 있다. Even in the embodiments of FIGS. 1 to 8 , although omitted in the drawings, a processor and a memory are electronically connected to each component, and operations of each component may be controlled or managed by the processor.

본 발명의 일 실시예에 따른 방법의 적어도 일부의 과정은 도 9의 컴퓨팅 시스템(1000)에 의하여 실행될 수 있다. At least some of the processes of the method according to an embodiment of the present invention may be executed by the computing system 1000 of FIG. 9 .

Referring to FIG. 9, the computing system 1000 according to an embodiment of the present invention may include a processor 1100, a memory 1200, a communication interface 1300, a storage device 1400, an input interface 1500, an output interface 1600, and a bus 1700.

The computing system 1000 according to an embodiment of the present invention may include at least one processor 1100 and a memory 1200 storing instructions that direct the at least one processor 1100 to perform at least one step. At least some steps of the method according to an embodiment of the present invention may be performed by the at least one processor 1100 loading the instructions from the memory 1200 and executing them.

프로세서(1100)는 중앙 처리 장치(central processing unit, CPU), 그래픽 처리 장치(graphics processing unit, GPU), 또는 본 발명의 실시예들에 따른 방법들이 수행되는 전용의 프로세서를 의미할 수 있다. The processor 1100 may mean a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which methods according to embodiments of the present invention are performed.

메모리(1200) 및 저장 장치(1400) 각각은 휘발성 저장 매체 및 비휘발성 저장 매체 중에서 적어도 하나로 구성될 수 있다. 예를 들어, 메모리(1200)는 읽기 전용 메모리(read only memory, ROM) 및 랜덤 액세스 메모리(random access memory, RAM) 중에서 적어도 하나로 구성될 수 있다. Each of the memory 1200 and the storage device 1400 may include at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 1200 may include at least one of a read only memory (ROM) and a random access memory (RAM).

또한, 컴퓨팅 시스템(1000)은, 무선 네트워크를 통해 통신을 수행하는 통신 인터페이스(1300)를 포함할 수 있다. Also, the computing system 1000 may include a communication interface 1300 that performs communication through a wireless network.

또한, 컴퓨팅 시스템(1000)은, 저장 장치(1400), 입력 인터페이스(1500), 출력 인터페이스(1600) 등을 더 포함할 수 있다.In addition, the computing system 1000 may further include a storage device 1400, an input interface 1500, an output interface 1600, and the like.

또한, 컴퓨팅 시스템(1000)에 포함된 각각의 구성 요소들은 버스(bus)(1700)에 의해 연결되어 서로 통신을 수행할 수 있다.In addition, each component included in the computing system 1000 may be connected by a bus 1700 to communicate with each other.

Examples of the computing system 1000 of the present invention include a communication-capable desktop computer, laptop computer, notebook, smartphone, tablet PC, mobile phone, smart watch, smart glasses, e-book reader, portable multimedia player (PMP), portable game console, navigation device, digital camera, digital multimedia broadcasting (DMB) player, digital audio recorder, digital audio player, digital video recorder, digital video player, and personal digital assistant (PDA).

A method of operating the quantization-error-corrected neural network 100, 120 according to an embodiment of the present invention is performed by a processor 1100 executing at least one instruction stored in a memory 1200. By the processor 1100 executing the at least one instruction, the method includes: receiving a first parameter of the neural network 100 before quantization (S330); receiving a second parameter of the neural network 100 after quantization (S340); generating a third parameter by correcting the second parameter based on statistical information of the first parameter and statistical information of the second parameter (220); and generating an inference result for input data based on the third parameter (S360).

본 발명의 일 실시예에 따른 양자화 오류가 보정된 뉴럴 네트워크(100, 120)의 동작 방법은 제3 파라미터에 기반하여 새로운 뉴럴 네트워크(120)를 생성하는 단계를 더 포함할 수 있다. The method of operating the neural networks 100 and 120 with quantization errors corrected according to an embodiment of the present invention may further include generating a new neural network 120 based on the third parameter.

제3 파라미터를 생성하는 단계(220, 250)는, 제1 파라미터의 평균과 제2 파라미터의 평균 간의 차이를 최소화하도록 제2 파라미터 각각을 보정하는 단계(230)를 포함할 수 있다. Generating the third parameter (220, 250) may include correcting each second parameter to minimize a difference between the average of the first parameter and the average of the second parameter (230).

제2 파라미터 각각을 보정하는 단계(220)는, 제1 파라미터의 평균과 제2 파라미터의 평균 간의 차이에 기반하여 제2 파라미터 각각을 보정할 수 있다(230). In the step of correcting each second parameter (220), each second parameter may be corrected based on the difference between the average of the first parameter and the average of the second parameter (230).

Generating the third parameter (220, 250) may include correcting each of the second parameters so as to minimize the difference between the average of the first parameter and the average of the second parameter and to minimize the difference between the standard deviation of the first parameter and the standard deviation of the second parameter (240).

In correcting each of the second parameters (220), each of the second parameters may be corrected based on a first difference between the average of the first parameter and the average of the second parameter and a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter (240).

A neural network quantization error correction method according to an embodiment of the present invention is performed by a processor 1100 executing at least one instruction stored in a memory 1200. By the processor 1100 executing the at least one instruction, the method includes: receiving a first parameter of the neural network 100 before quantization (S330); receiving a second parameter of the neural network 100 after quantization (S212); correcting the second parameter based on statistical information of the first parameter and statistical information of the second parameter (220); and outputting the corrected second parameter as a third parameter (250).

본 발명의 실시예에 따른 방법의 동작은 컴퓨터로 읽을 수 있는 기록매체에 컴퓨터가 읽을 수 있는 프로그램 또는 코드로서 구현하는 것이 가능하다. 컴퓨터가 읽을 수 있는 기록매체는 컴퓨터 시스템에 의해 읽힐 수 있는 정보가 저장되는 모든 종류의 기록장치를 포함한다. 또한 컴퓨터가 읽을 수 있는 기록매체는 네트워크로 연결된 컴퓨터 시스템에 분산되어 분산 방식으로 컴퓨터로 읽을 수 있는 프로그램 또는 코드가 저장되고 실행될 수 있다.The operation of the method according to the embodiment of the present invention can be implemented as a computer readable program or code on a computer readable recording medium. A computer-readable recording medium includes all types of recording devices in which information readable by a computer system is stored. In addition, computer-readable recording media may be distributed to computer systems connected through a network to store and execute computer-readable programs or codes in a distributed manner.

또한, 컴퓨터가 읽을 수 있는 기록매체는 롬(rom), 램(ram), 플래시 메모리(flash memory) 등과 같이 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치를 포함할 수 있다. 프로그램 명령은 컴파일러(compiler)에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터(interpreter) 등을 사용해서 컴퓨터에 의해 실행될 수 있는 고급 언어 코드를 포함할 수 있다.In addition, the computer-readable recording medium may include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The program command may include high-level language codes that can be executed by a computer using an interpreter or the like as well as machine code generated by a compiler.

본 발명의 일부 측면들은 장치의 문맥에서 설명되었으나, 그것은 상응하는 방법에 따른 설명 또한 나타낼 수 있고, 여기서 블록 또는 장치는 방법 단계 또는 방법 단계의 특징에 상응한다. 유사하게, 방법의 문맥에서 설명된 측면들은 또한 상응하는 블록 또는 아이템 또는 상응하는 장치의 특징으로 나타낼 수 있다. 방법 단계들의 몇몇 또는 전부는 예를 들어, 마이크로프로세서, 프로그램 가능한 컴퓨터 또는 전자 회로와 같은 하드웨어 장치에 의해(또는 이용하여) 수행될 수 있다. 몇몇의 실시 예에서, 가장 중요한 방법 단계들의 적어도 하나 이상은 이와 같은 장치에 의해 수행될 수 있다.Although some aspects of the present invention have been described in the context of an apparatus, it may also represent a description according to a corresponding method, where a block or apparatus corresponds to a method step or feature of a method step. Similarly, aspects described in the context of a method may also be represented by a corresponding block or item or a corresponding feature of a device. Some or all of the method steps may be performed by (or using) a hardware device such as, for example, a microprocessor, programmable computer, or electronic circuitry. In some embodiments, at least one or more of the most important method steps may be performed by such a device.

실시예들에서, 프로그램 가능한 로직 장치(예를 들어, 필드 프로그래머블 게이트 어레이)가 여기서 설명된 방법들의 기능의 일부 또는 전부를 수행하기 위해 사용될 수 있다. 실시예들에서, 필드 프로그래머블 게이트 어레이(field-programmable gate array)는 여기서 설명된 방법들 중 하나를 수행하기 위한 마이크로프로세서(microprocessor)와 함께 작동할 수 있다. 일반적으로, 방법들은 어떤 하드웨어 장치에 의해 수행되는 것이 바람직하다.In embodiments, a programmable logic device (eg, a field programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, a field-programmable gate array may operate in conjunction with a microprocessor to perform one of the methods described herein. Generally, methods are preferably performed by some hardware device.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the present invention as set forth in the claims below.

Claims

processor; and
a memory in which at least one instruction executed by the processor is stored; including,
By the processor executing the at least one instruction,
Receiving a first parameter of a neural network before being quantized;
Receiving a second parameter after quantization of the neural network;
Correcting the second parameter based on the statistical information of the first parameter and the statistical information of the second parameter;
Outputting the corrected second parameter as a third parameter,
Neural network quantization error correction device.

According to claim 1,
By the processor executing the at least one instruction,
correcting each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter;
Neural network quantization error correction device.

According to claim 2,
By the processor executing the at least one instruction,
Correcting each of the second parameters based on a difference between an average of the first parameter and an average of the second parameter;
Neural network quantization error correction device.

According to claim 1,
By the processor executing the at least one instruction,
Correcting each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter, and to minimize a difference between a standard deviation of the first parameter and a standard deviation of the second parameter.
Neural network quantization error correction device.

According to claim 4,
By the processor executing the at least one instruction,
a first difference between an average of the first parameter and an average of the second parameter; and
a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter;
Correcting each of the second parameters based on
Neural network quantization error correction device.

A method performed by a processor executing at least one instruction stored in memory,
By the processor executing the at least one instruction,
receiving a first parameter of the neural network before being quantized;
Receiving a second parameter after quantization of the neural network;
correcting the second parameter based on the statistical information of the first parameter and the statistical information of the second parameter; and
outputting the corrected second parameter as a third parameter;
including,
Neural network quantization error correction method.

According to claim 6,
Correcting the second parameter,
correcting each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter;
including,
Neural network quantization error correction method.

According to claim 7,
The step of correcting each of the second parameters,
Correcting each of the second parameters based on a difference between an average of the first parameter and an average of the second parameter;
Neural network quantization error correction method.

According to claim 6,
Correcting the second parameter,
correcting each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter, and minimize a difference between a standard deviation of the first parameter and a standard deviation of the second parameter;
including,
Neural network quantization error correction method.

According to claim 9,
The step of correcting each of the second parameters,
a first difference between an average of the first parameter and an average of the second parameter; and
a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter;
Correcting each of the second parameters based on
Neural network quantization error correction method.

A method performed by a processor executing at least one instruction stored in memory,
By the processor executing the at least one instruction,
receiving a first parameter of the neural network before being quantized;
Receiving a second parameter after quantization of the neural network;
generating a third parameter by correcting the second parameter based on the statistical information of the first parameter and the statistical information of the second parameter; and
generating an inference result for input data based on the third parameter;
including,
A method of operating a neural network with quantization error corrected.

According to claim 11,
generating a new neural network based on the third parameter;
further including,
wherein the step of generating the inference result includes
inputting the input data to the new neural network; and
generating an output of the new neural network as the inference result;
including,
A method of operating a neural network with quantization error corrected.

According to claim 11,
transmitting the third parameter to the neural network; and
updating all parameters of the neural network based on the third parameter;
further including,
wherein the step of generating the inference result includes
inputting the input data to a neural network in which all parameters are updated; and
generating an output of the neural network as the inference result;
including,
A method of operating a neural network with quantization error corrected.

According to claim 11,
Generating the third parameter,
correcting each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter;
including,
A method of operating a neural network with quantization error corrected.

According to claim 14,
The step of correcting each of the second parameters,
Correcting each of the second parameters based on a difference between an average of the first parameter and an average of the second parameter;
A method of operating a neural network with quantization error corrected.

According to claim 11,
Generating the third parameter,
correcting each of the second parameters to minimize a difference between an average of the first parameter and an average of the second parameter, and minimize a difference between a standard deviation of the first parameter and a standard deviation of the second parameter;
including,
A method of operating a neural network with quantization error corrected.

According to claim 16,
The step of correcting each of the second parameters,
a first difference between an average of the first parameter and an average of the second parameter; and
a second difference between the standard deviation of the first parameter and the standard deviation of the second parameter;
Correcting each of the second parameters based on
A method of operating a neural network with quantization error corrected.