KR20220030108A

KR20220030108A - Method and system for training artificial neural network models

Info

Publication number: KR20220030108A
Application number: KR1020200111900A
Authority: KR
Inventors: 유영준; 김태훈
Original assignee: Naver Corporation; LINE Corporation
Priority date: 2020-09-02
Filing date: 2020-09-02
Publication date: 2022-03-10
Also published as: JP2022042467A; KR102505946B1

Abstract

The present disclosure relates to a method for training an artificial neural network model. The method comprises: training a first artificial neural network model up to a preset epoch by a momentum-based gradient descent method and determining a momentum value of the first artificial neural network model at that epoch; setting the determined momentum value as an initial momentum value of a second artificial neural network model; and updating parameter values of the second artificial neural network model using training data based on the initial momentum value, wherein the parameters comprise a plurality of weights and the momentum. The method can thereby reduce errors that arise when training the second model.

Description

Artificial neural network model training method and system {METHOD AND SYSTEM FOR TRAINING ARTIFICIAL NEURAL NETWORK MODELS}

The present disclosure relates to a method and system for training an artificial neural network model and, more particularly, to a method and system for training an artificial neural network model and generating a quantized artificial neural network model.

An artificial neural network may refer to a computing device, or a method or data structure executed by a computing device, that implements interconnected sets of artificial neurons (or neuron models). An artificial neuron in any layer of an artificial neural network may generate output data by performing operations such as multiplying input data by weights or applying an activation function, and the output data may be passed to artificial neurons in another layer. As an example of an artificial neural network, a deep neural network or deep learning model includes a plurality of layers, and the nodes of each layer are trained on a large amount of training data so that they can recognize characteristics of that data. Such artificial neural networks may be applied to various fields such as classification, object detection, semantic segmentation, or style transfer.
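The per-neuron computation described above (multiply the inputs by weights, sum, apply an activation function, and pass the result on) can be sketched for a single fully connected layer. The layer size, the weight values, and the choice of ReLU as the activation function are illustrative assumptions, not taken from the disclosure.

```python
def relu(z: float) -> float:
    # Rectified linear unit, one common choice of activation function.
    return z if z > 0.0 else 0.0

def dense_forward(inputs, weights, biases):
    """Forward pass of one fully connected layer.

    weights[j][i] is the weight from input i to neuron j. Each neuron takes
    the weighted sum of its inputs plus a bias and applies the activation."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(relu(z))
    return outputs

# A layer of two neurons fed by two inputs; its outputs would become the
# inputs of the neurons in the next layer.
hidden = dense_forward([1.0, 2.0], [[0.5, -1.0], [0.25, 0.25]], [0.0, 0.1])
print(hidden)
```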

Research on quantizing the weights and/or activations of artificial neural network models has been actively conducted in the related art, in order to reduce model complexity and speed up processing. Quantization of such a model aims to approximate the floating-point operations of a full-precision model, which are represented with a large number of bits, by operations on fixed-point data represented with a small number of bits. In other words, it aims at transfer learning from a pre-trained full-precision model to its quantized counterpart.

However, approximation errors caused by quantization accumulate during forward propagation, which can significantly degrade the performance of the artificial neural network model. In particular, for a lightweight model, the initial statistical errors introduced by quantization make it difficult to use the weights of the pre-trained full-precision model as they are. To address this problem, network quantization can be simulated during training. For example, quantization-aware training (QAT) simulates quantized inference during forward propagation and uses a straight-through estimator (STE) to compute gradients during back-propagation. Although QAT can reduce differences in the ranges of weights and the number of outlier weight values, it has difficulty overcoming the gradient approximation error introduced by the STE.
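To make the STE concrete, here is a minimal sketch of one simulated-quantization step on a scalar. It assumes symmetric INT8 quantization with a given scale and the common "clipped" STE variant (gradient passed through inside the representable range, zeroed outside); the disclosure does not specify either choice. Note that Python's `round` uses round-half-to-even.

```python
def fake_quantize(x: float, scale: float, qmin: int = -128, qmax: int = 127) -> float:
    """Forward pass of simulated quantization: snap x to the INT8 grid, then
    map it back to a float so the rest of the network still runs in FP32."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

def ste_grad(x: float, upstream_grad: float, scale: float,
             qmin: int = -128, qmax: int = 127) -> float:
    """Backward pass with a straight-through estimator.

    round() has zero derivative almost everywhere, so the STE treats it as
    the identity: the upstream gradient passes through unchanged for inputs
    inside the representable range and is zeroed where the input was clipped."""
    inside = qmin * scale <= x <= qmax * scale
    return upstream_grad if inside else 0.0

print(fake_quantize(0.234, 0.1), ste_grad(0.234, 1.5, 0.1))
```

The forward value (0.2) differs from the input (0.234), yet the backward pass returns the upstream gradient as if no rounding had happened; this mismatch is the gradient approximation error attributed to the STE above.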

The present disclosure provides a method for training an artificial neural network model, a computer program stored in a recording medium, and an apparatus (system) for solving the above problems.

The present disclosure may be implemented in various ways, including as a method, an apparatus (system), or a computer program stored in a readable storage medium.

According to an embodiment of the present disclosure, a method for training an artificial neural network model, performed by at least one processor, includes: training a first artificial neural network model up to a preset epoch by momentum-based gradient descent and determining a momentum value of the first artificial neural network model at that epoch; setting the determined momentum value as an initial momentum value of a second artificial neural network model; and updating parameter values of the second artificial neural network model using training data based on the initial momentum value, wherein the parameters include a plurality of weights and the momentum.
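The three steps above can be sketched on a toy one-weight regression problem. This is a minimal illustration under stated assumptions, not the disclosed implementation: the data, learning rate, momentum coefficient, quantization scale, and INT8 fake-quantization scheme are hypothetical; the update rule is the common heavy-ball form v ← μv + g, w ← w − ηv; and the second (QAT) model's weight is assumed to start from the same initial value as the first run, since the text here only specifies how the momentum is initialized.

```python
# Toy data for a one-weight model y = w * x; the true weight is 3.0 (hypothetical).
DATA = [(k / 10.0, 3.0 * k / 10.0) for k in range(-10, 11)]

def fake_quantize(w, scale=0.05):
    # Simulated INT8 quantization: round to the grid, clip to [-128, 127].
    q = max(-128, min(127, round(w / scale)))
    return q * scale

def grad(w_used):
    # Gradient of the mean of 0.5 * (w * x - y)**2 over DATA, evaluated at
    # the weight that was actually used in the forward pass.
    return sum((w_used * x - y) * x for x, y in DATA) / len(DATA)

def train(w, v, epochs, lr=0.1, mu=0.9, quantized=False):
    """Full-batch momentum SGD, one update per epoch.

    With quantized=True the forward pass uses the fake-quantized weight and
    the gradient is applied unchanged to the FP32 weight (straight-through)."""
    for _ in range(epochs):
        w_fwd = fake_quantize(w) if quantized else w
        v = mu * v + grad(w_fwd)   # momentum update
        w = w - lr * v             # weight update
    return w, v

# Step 1: train the first (FP) model up to a preset epoch; read its momentum.
w_fp, v_fp = train(w=0.0, v=0.0, epochs=1)
# Step 2: use that momentum as the initial momentum of the second (QAT) model.
# Step 3: keep updating the QAT model's weight and momentum from there.
w_qat, v_qat = train(w=0.0, v=v_fp, epochs=300, quantized=True)
print(w_fp, v_fp, w_qat)
```

Only the momentum is carried over from the first model; passing `v=0.0` in the second call gives the conventional zero-initialized baseline for comparison. The sketch shows the mechanics only and makes no claim about the size of the benefit.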

A computer program stored in a computer-readable recording medium is provided to execute, on a computer, the artificial neural network model training method according to an embodiment of the present disclosure.

An artificial neural network model training system according to an embodiment of the present disclosure includes a communication module, a memory, and at least one processor connected to the memory and configured to execute at least one computer-readable program included in the memory. The at least one program includes instructions for: training a first artificial neural network model up to a preset epoch by momentum-based gradient descent to determine a momentum value of the first artificial neural network model at that epoch; setting the determined momentum value as an initial momentum value of a second artificial neural network model; and updating parameter values of the second artificial neural network model using training data based on the initial momentum value, wherein the parameters include a plurality of weights and the momentum.

In various embodiments of the present disclosure, setting an appropriate initial momentum value for the QAT model can steer its parameters to be updated in an appropriate training direction, and erroneous gradient values can be effectively ignored. This prevents errors caused by quantization from being amplified as epochs are repeated, and reduces the errors that arise when training/updating the parameters of the QAT model.

In various embodiments of the present disclosure, the quantized model requires lower-cost, lower-capacity resources than the FP model during generation, training, and inference, and can execute inference faster.

In various embodiments of the present disclosure, unlike a model obtained by quantizing a conventional QAT model, the quantized model may achieve performance close to, or better than, that of the FP model.

According to various embodiments of the present disclosure, it is possible to generate and train lightweight models with excellent performance on tasks such as classification, object detection, semantic segmentation, or style transfer.

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but the embodiments are not limited thereto.
FIG. 1 is a diagram illustrating an example of an artificial neural network training process that generates an 8-bit integer (INT8) artificial neural network model by performing quantization-aware training (QAT) on a full-precision (FP) artificial neural network model, according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating the internal configuration of an artificial neural network model training system according to an embodiment of the present disclosure.
FIG. 3 is a block diagram illustrating the internal configuration of a processor connected to a training data DB according to an embodiment of the present disclosure.
FIG. 4 is a flowchart illustrating a method for training an artificial neural network model according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating a method of updating parameter values of a second artificial neural network model according to an embodiment of the present disclosure.
FIG. 6 shows histograms of weight gradient values in each layer, used to evaluate the performance of artificial neural network models generated and trained according to embodiments of the present disclosure.
FIG. 7 shows loss landscapes used to evaluate the performance of artificial neural network models generated and trained according to embodiments of the present disclosure.
FIG. 8 shows an example of performance evaluation results for an image style transfer artificial neural network model generated and trained according to embodiments of the present disclosure.
FIG. 9 is a flowchart of an algorithm for performing the artificial neural network model training method according to an embodiment of the present disclosure.

Hereinafter, specific details for carrying out the present disclosure will be described with reference to the accompanying drawings. However, in the following description, detailed descriptions of well-known functions or configurations will be omitted where they could unnecessarily obscure the gist of the present disclosure.

In the accompanying drawings, the same or corresponding components are assigned the same reference numerals. In the description of the embodiments below, repeated descriptions of the same or corresponding components may be omitted. However, omitting the description of a component does not mean that the component is not included in a given embodiment.

Advantages and features of the disclosed embodiments, and methods of achieving them, will become apparent with reference to the embodiments described below in conjunction with the accompanying drawings. However, the present disclosure is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the present disclosure will be complete and will fully convey the scope of the invention to those skilled in the art.

Terms used in this specification are briefly described below, followed by a detailed description of the disclosed embodiments. The terms used in this specification were selected, as far as possible, from general terms currently in wide use, taking into account their functions in the present disclosure; however, they may vary depending on the intention of those skilled in the art, precedent, the emergence of new technology, and the like. In certain cases, some terms were arbitrarily selected by the applicant, and their meanings are described in detail in the corresponding part of the description. Therefore, the terms used in the present disclosure should be defined based on their meaning and the overall content of the present disclosure, rather than on their names alone.

Singular expressions in this specification include the plural unless the context clearly specifies the singular, and plural expressions include the singular unless the context clearly specifies the plural. Throughout the specification, when a part is said to include a certain element, this means that it may further include other elements, rather than excluding them, unless stated otherwise.

In addition, the term 'module' or 'unit' used in the specification means a software or hardware component, and a 'module' or 'unit' performs certain roles. However, 'module' or 'unit' is not limited to software or hardware. A 'module' or 'unit' may be configured to reside on an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a 'module' or 'unit' may include at least one of: components such as software components, object-oriented software components, class components, and task components; processes; functions; attributes; procedures; subroutines; segments of program code; drivers; firmware; microcode; circuitry; data; databases; data structures; tables; arrays; or variables. The functions provided within components and 'modules' or 'units' may be combined into a smaller number of components and 'modules' or 'units', or further separated into additional components and 'modules' or 'units'.

According to an embodiment of the present disclosure, a 'module' or 'unit' may be implemented with a processor and a memory. 'Processor' should be interpreted broadly to include a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like. In some contexts, 'processor' may refer to an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), or the like. 'Processor' may also refer to a combination of processing devices, such as a combination of a DSP and a microprocessor, a combination of a plurality of microprocessors, a combination of one or more microprocessors with a DSP core, or any other such configuration. In addition, 'memory' should be interpreted broadly to include any electronic component capable of storing electronic information. 'Memory' may refer to various types of processor-readable media such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, and the like. A memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. A memory integrated in a processor is in electronic communication with the processor.

In the present disclosure, a 'full-precision (or high-precision) artificial neural network model' may refer to an artificial neural network model whose parameters use a data type represented with a large number of bits. For example, a full-precision model may include an artificial neural network model whose parameters (weights, momentum, kernels, activations, etc.) are of a floating-point data type represented with 32 or more bits. In contrast, a 'low-precision artificial neural network model' may refer to an artificial neural network model whose parameters use a data type represented with a small number of bits. For example, a low-precision model may include an artificial neural network model whose parameters are of an 8-bit fixed-point or integer data type.

In the present disclosure, 'quantization' may refer to a method of reducing the amount of memory needed to store the parameters of an artificial neural network, or the amount of computation performed on them, by reducing the number of bits in the parameters' data type. For example, quantization methods include post-quantization, in which, after training of a full-precision model is complete, the data type of its parameters is converted into a data type represented with fewer bits, and quantization-aware training (QAT), in which operations on at least some of the parameters are simulated with low-bit operations during training.
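Post-quantization of trained weights can be sketched as follows. The scheme shown (symmetric, per-tensor scaling so that the largest absolute weight maps to 127) is one common choice and an assumption here; the disclosure does not specify how its scales are chosen, and the weight values are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor post-training quantization to signed 8-bit ints.

    The scale maps the largest absolute weight onto 127, so every stored
    value fits in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the integer codes.
    return [v * scale for v in q]

weights_fp32 = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights_fp32)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each weight now needs 1 byte instead of 4 (plus one shared scale per tensor), and every restored value lies within half a quantization step (`scale / 2`) of the original; small weights such as 0.003 collapse to zero.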

In the present disclosure, 'error back-propagation training' may refer to a method of updating parameters (e.g., weights and momentum) during back-propagation based on the difference (or error) between the output value that the artificial neural network produces for a given input and the target value to be learned. Error back-propagation may use 'gradient descent' to determine the weights that minimize the difference between the network's output value and the target value. When the parameters are updated by gradient descent, training can stagnate at a local minimum. To prevent this, momentum may be used as a term that maintains the parameter update direction from previous epochs.
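The disclosure does not spell out the update rule. As an assumption, one common form of momentum-based gradient descent (the heavy-ball form, which is also what PyTorch's `SGD` optimizer computes with momentum enabled and no dampening) is shown below, where $w_t$ denotes the weights, $v_t$ the momentum, $g_t = \nabla_w L(w_t)$ the gradient of the loss $L$, $\eta$ the learning rate, and $\mu \in [0, 1)$ the momentum coefficient:

$$
v_{t+1} = \mu\, v_t + g_t, \qquad w_{t+1} = w_t - \eta\, v_{t+1}
$$

Because $v_{t+1}$ retains a fraction $\mu$ of the previous update direction, the weights keep moving even where the current gradient $g_t$ is close to zero, which is how momentum helps training avoid stalling at a shallow local minimum.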

FIG. 1 is a diagram illustrating an example of an artificial neural network training process that generates an 8-bit integer (INT8) artificial neural network model 150 by performing quantization-aware training (QAT) on a full-precision (FP) artificial neural network model, according to an embodiment of the present disclosure.

In the illustrated training process, at least some of the initial parameter values for training the QAT model 130 may be obtained by training the FP model 110. In one embodiment, when the parameters of the QAT model 130 are optimized using momentum-based gradient descent, the momentum, among the initial parameters of the QAT model 130, may be obtained by training the FP model 110. For example, the momentum determined during an initial period of parameter updates of the FP model 110 (e.g., the first epoch) may be used as the initial momentum of the QAT model 130. Determining the initial momentum of the QAT model 130 in this way, from the momentum obtained by training the FP model 110 up to that initial period, can help prevent the QAT model 130 from stagnating at a local minimum during training and help it search effectively for the global minimum.

Specifically, the first model training unit 120 may update the parameters of the FP model 110 using error back-propagation. Here, the parameters of the FP model 110 may be expressed in a data type that supports full-precision operations (e.g., floating-point data represented with 32 or more bits). Accordingly, generating and training the FP model 110, and running inference based on it, may require high-capacity, high-cost memory and computational resources to store and process the parameters.

The first model training unit 120 may train the FP model 110 up to a preset epoch (e.g., the first epoch) and determine the parameter values of the FP model 110 at that epoch. For example, when the first model training unit 120 optimizes the parameters using momentum-based gradient descent, in the first epoch it may execute a forward-propagation step, in which training data is input to each layer of the FP model 110 to compute an output value, and a back-propagation step, in which each weight is updated using the partial derivative (gradient) of the difference (error) between the output value and the target value with respect to that weight. The back-propagation step may include combining the momentum with the gradient when updating each weight, so that the direction in which each weight is updated is set to minimize the error of the FP model 110. Here, the momentum value may refer to an accumulation of the trace of gradient values computed in previous steps (or previous epochs). In this way, the momentum value of the FP model 110 at the first epoch can be determined.
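The statement that the momentum "accumulates the trace" of earlier gradients can be made concrete. Assuming the heavy-ball rule v ← μv + g with v starting at zero (an assumption; the disclosure does not give the formula), the momentum after n steps equals a geometrically discounted sum of all previous gradients, so the momentum the FP model holds at the preset epoch summarizes the gradient directions it has seen so far. The gradient values below are hypothetical.

```python
def momentum_trace(grads, mu=0.9):
    """Run the momentum recursion v <- mu * v + g over a gradient sequence."""
    v = 0.0
    for g in grads:
        v = mu * v + g
    return v

def discounted_sum(grads, mu=0.9):
    """Closed form: each past gradient contributes with weight mu ** age."""
    n = len(grads)
    return sum(mu ** (n - 1 - k) * g for k, g in enumerate(grads))

grads = [0.5, -0.2, 0.4, 0.1]
print(momentum_trace(grads), discounted_sum(grads))  # both approximately 0.6625
```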

The second model training unit 140 may generate the QAT model 130 based on the momentum of the FP model 110 determined in this way, and may train/update the generated QAT model 130. For example, the second model training unit 140 may set the momentum value of the FP model 110 determined at the first epoch as the initial momentum value of the QAT model 130 and, based on it, train/update the QAT model 130 using training data. For example, the second model training unit 140 may optimize the parameters of the QAT model 130 using momentum-based gradient descent.

In one embodiment, the second model training unit 140 may train/update the QAT model 130 by simulating the effect of parameter quantization to determine predicted values during forward propagation, and by updating the parameter values, expressed in a high-precision data type, in the conventional way during back-propagation. For example, the second model training unit 140 may perform forward propagation with at least some of the parameter values of the QAT model 130 expressed as 8-bit integers, and perform back-propagation using the parameter values of the QAT model 130 expressed as 32-bit floating-point numbers. The parameter values of the QAT model 130 can therefore be updated and stored as 32-bit floating-point values, and the QAT model 130 can achieve higher precision (or accuracy) than a model whose forward and back-propagation are both performed in 8-bit integers.
To this end, the QAT model 130 further includes fake quantization nodes on the paths that deliver input values to the nodes in each layer of the artificial neural network. During forward propagation, the second model training unit 140 converts input values into quantized 8-bit integer values through the fake quantization nodes, so that 8-bit integer computation can be mimicked with conventional 32-bit floating-point operations (e.g., operations involving parameters expressed as 32-bit floating-point numbers).
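The reason 32-bit floating-point operations can mimic 8-bit integer computation is a simple identity: if both operands pass through fake quantization nodes with per-tensor scales, an ordinary floating-point dot product gives the same result (up to floating-point rounding) as an integer multiply-accumulate followed by rescaling. A minimal check, assuming symmetric per-tensor INT8 quantization with hypothetical scales `sx` and `sw` (the disclosure does not specify the quantization scheme):

```python
def quantize(values, scale):
    # Map floats to signed 8-bit integer codes (round to nearest, clip).
    return [max(-128, min(127, round(v / scale))) for v in values]

def fake_quant(values, scale):
    # Fake quantization node: quantize, then immediately dequantize, so the
    # output is still a float but lies exactly on the 8-bit grid.
    return [q * scale for q in quantize(values, scale)]

x = [0.314, -0.723, 0.551]    # hypothetical layer inputs
w = [0.1212, 0.4031, -0.2577] # hypothetical FP32 weights
sx, sw = 0.01, 0.005          # hypothetical per-tensor scales

# What an INT8 kernel would compute: integer multiply-accumulate, then rescale.
acc = sum(a * b for a, b in zip(quantize(x, sx), quantize(w, sw)))
int8_result = acc * sx * sw

# What QAT computes in FP32: an ordinary dot product on fake-quantized values.
fp32_result = sum(a * b for a, b in zip(fake_quant(x, sx), fake_quant(w, sw)))
print(acc, int8_result, fp32_result)
```

Both results are about -0.3974, whereas the exact dot product of the raw inputs is about -0.3954. That gap is the quantization error the QAT model sees during training, while its weights themselves remain FP32 values.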

In one embodiment, the second model training unit 140 may update the plurality of parameters of the QAT model 130 based on gradient values during back-propagation. Specifically, it may amplify the gradient value of at least one of the plurality of parameters and update the parameter values of the QAT model 130 based on the amplified gradient value. Updating the parameter values of the QAT model 130 in this way addresses the problem that, in the early epochs of the QAT model 130, gradients computed from error-laden information narrow the search region for the optimal local minimum.
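The disclosure does not state which parameters have their gradients amplified or by how much. As a purely hypothetical sketch, the step below scales the gradients of a selected set of parameter indices by a factor `alpha` before the usual momentum update; `amplify`, `alpha`, and the heavy-ball update form are illustrative assumptions.

```python
def amplified_momentum_step(weights, velocity, grads, amplify, alpha=2.0,
                            lr=0.1, mu=0.9):
    """One momentum-SGD step in which the gradients of selected parameters
    are multiplied by alpha before the update.

    `amplify` (indices of parameters to boost) and `alpha` are hypothetical;
    they stand in for whatever selection and factor an implementation uses."""
    new_w, new_v = [], []
    for i, (w, v, g) in enumerate(zip(weights, velocity, grads)):
        if i in amplify:
            g = alpha * g          # amplified gradient
        v = mu * v + g             # momentum update
        new_v.append(v)
        new_w.append(w - lr * v)   # weight update
    return new_w, new_v

w, v = amplified_momentum_step([1.0, 1.0], [0.0, 0.0], [0.5, 0.5], amplify={0})
print(w, v)
```

With identical gradients, the amplified parameter (index 0) moves twice as far as the other one in this step.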

After training of the QAT model 130 is completed, the INT8 model 150 may be generated by quantizing the parameters of the QAT model 130 into a low-precision data type. For example, the 32-bit floating-point parameter values of the QAT model 130 may be quantized into 8-bit integers to produce the INT8 model 150. The resulting INT8 model 150 requires fewer and cheaper resources than the QAT model 130, and inference with it runs faster. Post-training quantization works differently: an FP model with parameters in a high-precision data type is first trained/updated and then quantized. Compared with a model obtained that way, the INT8 model 150 generated by the method described above can further reduce quantization error and, in some cases, achieves better performance.
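The final conversion step can be sketched as follows, assuming symmetric per-tensor quantization (the text does not fix the scheme). Trained FP32 weights are stored as int8 codes plus a single FP32 scale, which cuts weight storage to about a quarter. `quantize_to_int8` and `dequantize` are hypothetical helper names, and the random matrix only stands in for trained QAT weights.

```python
import numpy as np


def quantize_to_int8(w):
    """Map FP32 weights to int8 codes plus one FP32 scale (symmetric, per tensor)."""
    w = np.asarray(w, dtype=np.float32)
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)


def dequantize(q, scale):
    """Recover approximate FP32 weights, e.g. to measure the quantization error."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in for QAT weights
q, s = quantize_to_int8(w)
print(w.nbytes, q.nbytes)  # 262144 65536
```

Each stored weight can be off by at most half a quantization step, which is the error budget that QAT trains the model to tolerate.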

Although FIG. 1 shows the first model learning unit 120 and the second model learning unit 140 separately, the present disclosure is not limited thereto, and the two may be integrated into a single learning unit. Likewise, FIG. 1 shows the FP model 110, the QAT model 130, and the INT8 model 150 as separate artificial neural network models, but they are not limited thereto. They may instead be implemented as a single artificial neural network model structure whose parameter data types are modified or changed at each training stage.

FIG. 2 is a block diagram illustrating the internal configuration of an artificial neural network model training system 200 according to an embodiment of the present disclosure. The system 200 may include at least one of the apparatuses described above with reference to FIG. 1, or may perform at least one of the methods described there. As illustrated, the system 200 may include a communication module 210, a memory 220, and a processor 230, but is not limited thereto. The system 200 may create and train/update artificial neural network models that perform tasks such as classification, object detection, semantic segmentation, or style transfer according to embodiments of the present disclosure.

The communication module 210 may provide a configuration or function that lets the artificial neural network model training system 200 communicate with an external device or external system (for example, a separate cloud system). For example, the system 200 may receive training data and program code for training artificial neural network models from an external system through the communication module 210. Conversely, control signals, commands, data, and the like provided under the control of the processor 230 may be sent to an external device or external system through the communication module 210. For example, the system 200 may provide the parameters, program code, and the like of a created and fully trained artificial neural network model to an external device or external system through the communication module 210.

The memory 220 may store the parameters of an artificial neural network model, or program code implementing the artificial neural network training method. Such parameters include, for example, at least one of the characteristics of the inputs, the weights, and the characteristics of the kernels of the layers constituting the network. The memory 220 may be a volatile or a non-volatile memory. Alternatively, the memory 220 may include any non-transitory computer-readable recording medium. For example, the memory 220 may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive, a solid-state drive (SSD), or flash memory.

The processor 230 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 230 by the memory 220 or the communication module 210. For example, the processor 230 may control the artificial neural network model training system 200 by executing a program whose code is stored in the memory 220. The system 200 may connect to an external device (for example, a personal computer or a network) through the communication module 210 or an input/output device (not shown) and exchange data with it. According to an embodiment, the system 200 may be incorporated into a dedicated processor that executes neural-network operations at high speed and may control that processor. Examples include a CPU (central processing unit), a GPU (graphics processing unit), a CNN accelerator, an NPU (neural processing unit), or a VPU (vision processing unit). Depending on the design intent, the system 200 may employ, or be employed in, various kinds of hardware, and is not limited to the illustrated components.

Applying the embodiments described above when creating and training/updating a model reduces the amount of data or computation the network requires. This saves memory and increases processing speed, so the embodiments may be well suited to resource-constrained environments or embedded terminals. During backpropagation, the system 200 according to an embodiment adjusts the direction in which the parameters change so that errors in forward propagation are minimized. In this way, it can progressively improve the performance of the artificial neural network model.

FIG. 3 is a block diagram illustrating the internal configuration of the processor 230 connected to a training data DB 310 according to an embodiment of the present disclosure. As illustrated, the processor 230 may be connected to the training data DB 310, for example through a communication module. The processor 230 may include a data input unit 320, a first artificial neural network model learning unit 330, a second artificial neural network model learning unit 340, and a third artificial neural network model generation unit 350. The training data DB 310 may contain the training data used to create and train artificial neural network models. For example, it may contain images of objects to be classified, recognized, or semantically segmented, and images to which style transfer is to be applied. The training data DB 310 may be included in the memory of the artificial neural network model training system 200. Alternatively, it may be stored in an external system, external device, or external memory and connected to the system 200 through the communication module.

The data input unit 320 may receive training data from the training data DB 310. Optionally or additionally, the data input unit 320 may pre-process the received training data, for example by data type conversion, dimensionality reduction, or normalization. The first artificial neural network model learning unit 330 and/or the second artificial neural network model learning unit 340 may use the received training data to create and train artificial neural network models.

In one embodiment, the first artificial neural network model learning unit 330 and the second artificial neural network model learning unit 340 may train/update the parameters (for example, the weights) of an artificial neural network model using error backpropagation. In that case, the learning units 330 and 340 may use momentum-based gradient descent to determine the parameters that minimize the output error. For example, they may train/update the weights of the artificial neural network model based on Equation 1 below.

$$v_{t+1} = \beta\, v_t + g_t, \qquad w_{t+1} = w_t - \eta\, v_{t+1}$$

In the above equation, $w_t$ and $w_{t+1}$ denote the weight values at the t-th and (t+1)-th epochs, respectively. $g_t$ denotes the gradient value, $v_t$ denotes the momentum value, and $\beta$ denotes a constant that controls the influence of the momentum value. Here, the momentum value may be an accumulated trace of the gradient values computed in previous steps (or previous epochs). $\eta$ denotes a constant that controls the learning rate of the artificial neural network model training.
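The following sketch shows one momentum-based gradient descent step. It assumes Equation 1 takes the standard heavy-ball form v_{t+1} = βv_t + g_t, w_{t+1} = w_t − ηv_{t+1}; the original equation image is not available, so this form is an assumption. The toy objective f(w) = w²/2, whose gradient is w, is only for demonstration, and `momentum_step` is a hypothetical name.

```python
def momentum_step(w, v, grad, beta=0.9, lr=0.1):
    """One momentum-SGD step in the assumed form of Equation 1.

    v_{t+1} = beta * v_t + g_t   (accumulated trace of past gradients)
    w_{t+1} = w_t - lr * v_{t+1}
    Works for Python floats or NumPy arrays.
    """
    v_next = beta * v + grad
    w_next = w - lr * v_next
    return w_next, v_next


# Toy objective f(w) = 0.5 * w**2, so the gradient at w is simply w.
w1, v1 = momentum_step(1.0, 0.0, grad=1.0)   # v1 = 1.0, w1 = 0.9
w2, v2 = momentum_step(w1, v1, grad=w1)      # v2 = 1.8, w2 = 0.72 (up to rounding)
w, v = w2, v2
for _ in range(200):
    w, v = momentum_step(w, v, grad=w)       # spirals in toward the minimum at 0
```

The momentum term carries past gradient directions forward. The sketch under the following paragraphs builds on this: a well-chosen starting momentum can outweigh a few noisy early gradients.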

The first artificial neural network model learning unit 330 may train/update the parameters of a first artificial neural network model, whose parameter values may be expressed in a first data type. For example, the first learning unit 330 may train/update an FP model whose parameters are expressed as 32-bit floating-point numbers.

The second artificial neural network model learning unit 340 may train/update the parameters of a second artificial neural network model. In one embodiment, the second learning unit 340 performs forward propagation of the second model with parameter values expressed in a second data type. It performs backpropagation with parameter values expressed in the first data type, thereby training/updating the parameters of the second model. Here, the second data type may use fewer bits than the first data type (for example, an 8-bit integer).

For example, the second learning unit 340 may update the parameters of the second model by quantization-aware training (QAT), in which case the second model may be a QAT model. During forward propagation, the second learning unit 340 then simulates 8-bit integer quantized inference. During backpropagation, it uses a straight-through estimator (STE) to compute the weight gradients in 32-bit floating point and update the weight values. The QAT method used by the second learning unit 340 may be any known QAT method or a variant of one (see Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, "Quantization and training of neural networks for efficient integer-arithmetic-only inference," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2704-2713, IEEE, 2018).
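A straight-through estimator of the kind mentioned above can be sketched as follows. The forward pass rounds weights onto the int8 grid, while the backward pass treats rounding as the identity and hands the incoming gradient to the FP32 weights unchanged. Zeroing the gradient where the forward pass clipped is a common STE variant and is an assumption here, as are the fixed scale and the hypothetical name `fake_quant_ste`.

```python
import numpy as np


def fake_quant_ste(w, scale, qmax=127):
    """Fake-quantize w in the forward pass; return an STE backward function."""
    w = np.asarray(w, dtype=np.float32)
    q = np.round(w / scale)
    wq = np.clip(q, -qmax, qmax) * scale  # values an int8 tensor could represent

    def backward(grad_out):
        # STE: d(round(x))/dx is approximated by 1 inside the int8 range
        # and by 0 where the forward pass clipped.
        inside = np.abs(q) <= qmax
        return np.asarray(grad_out, dtype=np.float32) * inside

    return wq.astype(np.float32), backward


w = np.array([0.004, 0.3, 2.0], dtype=np.float32)
wq, backward = fake_quant_ste(w, scale=0.01)
grad_w = backward(np.ones(3))   # gradient reaching the FP32 master weights
w_fp32 = w - 0.1 * grad_w       # the update itself stays in 32-bit floating point
```

In this example, the weight 0.004 rounds to 0 in the forward pass yet still receives a full gradient of 1. This mismatch is the kind of gradient approximation error discussed for the QAT method.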

In the QAT method, quantizing some of the parameter values can introduce estimation (approximation) errors into the computed gradient values. These gradient errors can in turn cause errors when the quantization statistics of the weights are updated. Those errors then corrupt the gradient values computed in the next epoch, so the model's error can grow as QAT training is repeated. To prevent this error amplification, the second artificial neural network model learning unit 340 may set an appropriate initial momentum value for the second artificial neural network model. A suitable initial momentum value steers the parameter updates in an appropriate training direction and suppresses the influence of erroneous gradient values. As a result, the quantization statistics of the weights determined by the QAT method can be updated in the correct direction, and the gradient errors can also be reduced.

In one embodiment, an appropriate initial momentum value for the second artificial neural network model is obtained as follows. The first artificial neural network model is first trained up to a preset epoch, and the momentum value of the first model at that epoch is determined. The second artificial neural network model learning unit 340 may then receive this momentum value. For example, the first artificial neural network model learning unit 330 may train an FP model with 32-bit floating-point parameters up to the first epoch, determine the momentum value at that epoch, and pass it to the second learning unit 340. The second learning unit 340 may then set the received value as the initial momentum value of the second model and update the parameters of the second model using the training data. The suitable initial momentum value from the FP model thus lets the second learning unit 340 control the instability of the early stage of QAT training. This reduces the errors that arise when the parameters of the QAT model are trained by the QAT method.
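The momentum hand-off of steps S410 to S430 can be sketched on a toy linear-regression problem. The sketch relies on several assumptions not stated in the text:

- both models start from the same zero weights;
- the preset epoch is the first epoch;
- the QAT forward pass uses symmetric per-tensor 8-bit fake quantization with a straight-through gradient;
- the loss is half the mean squared error.

The key point is that the QAT run receives `v_fp`, the FP model's momentum after one epoch, instead of a zero vector. The function names are hypothetical.

```python
import numpy as np


def fake_quantize(w, num_bits=8):
    """Symmetric per-tensor fake quantization (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


def train(X, y, w, v, epochs, lr=0.05, beta=0.9, quantize=False, batch=16):
    """Momentum SGD on 0.5 * mean squared error for a linear model y ~ X @ w.

    With quantize=True the forward pass sees fake-quantized weights, and the
    gradient is passed straight through (STE) to the FP32 master weights.
    Returns the final weights and momentum.
    """
    w, v = w.copy(), v.copy()
    for _ in range(epochs):
        for i in range(0, len(X), batch):
            xb, yb = X[i:i + batch], y[i:i + batch]
            w_used = fake_quantize(w) if quantize else w
            err = xb @ w_used - yb
            grad = xb.T @ err / len(xb)  # STE: d(w_used)/dw treated as identity
            v = beta * v + grad
            w = w - lr * v
    return w, v


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 8))
y = X @ rng.standard_normal(8)
w0 = np.zeros(8)

# S410: train the FP model up to the preset epoch (here, the first) and keep its momentum.
w_fp, v_fp = train(X, y, w0, np.zeros(8), epochs=1)
# S420: use the FP momentum as the QAT model's initial momentum (instead of zeros).
# S430: update the QAT model's parameters with the training data.
w_qat, v_qat = train(X, y, w0, v_fp, epochs=5, quantize=True)
```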

In one embodiment, the second artificial neural network model learning unit 340 may amplify the gradient value of at least one of the plurality of weights of the second artificial neural network model. It may then update the parameters of the second model based on the amplified gradient value. For example, the second learning unit 340 may use the sign of a weight's gradient value to amplify that gradient in the direction of the current learning direction. It then updates the parameters of the second model based on the amplified gradient value.

The second artificial neural network model learning unit 340 may randomly select a subset of the plurality of weights and amplify the gradient values of the selected weights only. The gradient values of the remaining weights are left unamplified. In one embodiment, the second learning unit 340 may amplify the gradient values based on Equations 2 to 4 below.

[Equations 2 to 4]

The equations use the following terms:

- a distortion value determined from the probability distribution of the gradient values of the plurality of weights;
- the sign (+ or −) of the gradient value;
- a clamping factor that prevents abrupt gradient amplification;
- an exponentially decaying term raised to the power of the epoch index t.

The distortion value may be determined from a Laplace probability distribution of the gradient values of the plurality of weights. The equations therefore match the sign of the distortion value to the sign of the gradient value and add the distortion value to the gradient value. This amplifies the gradient value in the current learning direction.
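Equations 2 to 4 are not reproduced in this text, so the sketch below is only one hypothetical reading of the description. It draws a distortion term from a Laplace distribution fitted to the gradients, clamps it with a clamping factor, and decays it exponentially with the epoch index t. The term is then given the sign of the gradient and added to the gradients of a randomly selected subset of weights. The Laplace-scale estimate, the parameter names `frac`, `clamp`, and `decay`, and their defaults are all assumptions. The amplified gradients would then feed the momentum update (step S520).

```python
import numpy as np


def amplify_gradients(grad, epoch, rng, frac=0.5, clamp=1e-3, decay=0.9):
    """Hypothetical sign-aligned gradient amplification (not the exact Equations 2-4)."""
    g = np.asarray(grad, dtype=np.float64)
    # Laplace scale fitted to the gradients by maximum likelihood: mean |g - median(g)|.
    b = float(np.mean(np.abs(g - np.median(g)))) + 1e-12
    delta = rng.laplace(0.0, b, size=g.shape)                   # distortion value
    boost = np.minimum(np.abs(delta), clamp) * decay ** epoch   # clamped, decaying
    mask = rng.random(g.shape) < frac                           # randomly selected weights
    # Give the distortion the sign of g, so each selected gradient grows in its
    # current direction; zero gradients (sign 0) are left unchanged.
    return g + mask * np.sign(g) * boost


rng = np.random.default_rng(0)
g = rng.laplace(0.0, 1e-3, size=1000)
g_amp = amplify_gradients(g, epoch=2, rng=rng)
```

Because the added term always carries the gradient's own sign, amplification can only enlarge a gradient's magnitude, never reverse it. The decay factor makes the effect fade as training moves past the unstable early epochs.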

The third artificial neural network model generation unit 350 may generate a third artificial neural network model by quantizing the parameters of the second artificial neural network model. For example, it may quantize the 32-bit floating-point parameters of the second model into 8-bit integer parameters. Because the resulting third model consists of 8-bit integer parameters, it needs less memory for storage and runs inference faster. It is therefore suitable for resource-constrained environments or embedded terminals. In addition, the third model may perform better than an 8-bit integer artificial neural network model trained by a conventional QAT method.

FIG. 4 is a flowchart illustrating an artificial neural network model training method 400 according to an embodiment of the present disclosure. As illustrated, the method 400, performed by at least one processor, may begin by training a first artificial neural network model up to a preset epoch using momentum-based gradient descent. The momentum value of the first model at that epoch is then determined (S410). According to an embodiment, the first model may be trained up to the first epoch, and its momentum value at the first epoch may be determined. The determined momentum value may then be set as the initial momentum value of a second artificial neural network model (S420).

Next, starting from the initial momentum value, the parameter values of the second artificial neural network model may be updated using the training data (S430). Here, the parameters may include a plurality of weights and the momentum. In one embodiment, the processor may perform forward propagation of the second model with parameter values expressed in the second data type, and backpropagation with parameter values expressed in the first data type. The second data type may use fewer bits than the first data type. For example, the second data type may be an 8-bit fixed-point data type, and the first data type may be a 32-bit floating-point data type.

In one embodiment, a third artificial neural network model may be generated by quantizing the updated parameter values of the second artificial neural network model. In one embodiment, the data types are as follows:

- the parameter values of the first model are expressed in the first data type;
- those of the second model are expressed in the first or the second data type;
- those of the third model are expressed in the second data type.

FIG. 5 is a flowchart illustrating a method (S430) of updating the parameter values of the second artificial neural network model according to an embodiment of the present disclosure. In step S430, the parameter values of the second model are updated from the initial momentum value using the training data. According to an embodiment, the processor may base this update on the gradient values of the plurality of weights. Specifically, the processor may use the sign of at least one weight's gradient value to amplify that gradient in the direction of the current learning direction (S510). The processor may then update the parameter values of the second model based on the amplified gradient value (S520).

The artificial neural network model training method described above with reference to FIGS. 4 and 5 may be used to create and train artificial neural network models that perform tasks such as classification, object detection, semantic segmentation, or style transfer.

FIG. 6 shows histograms 610, 620, and 630 of the weight gradient values in each layer. They are used to evaluate the performance of artificial neural network models created and trained according to embodiments of the present disclosure.

To evaluate the training method and system according to embodiments of the present disclosure, a MobileNetV2 model was trained up to the 10th epoch on the CIFAR-10 dataset using the training methods described above. Each histogram shows the weight gradient values in each layer of a model trained as follows:

- First histogram 610: the conventional high-precision (FP) training method.
- Second histogram 620: the conventional QAT method, without using the FP model's momentum value as the initial momentum value and without gradient amplification.
- Third histogram 630: according to embodiments of the present disclosure, a QAT method that uses the FP model's momentum value at the first epoch as the initial momentum value and applies gradient amplification.

The first histogram 610 thus represents a model with the target performance that the training method and system of the present disclosure aim to match.

As the second histogram 620 shows, the conventional QAT method suffers from vanishing gradient values caused by the gradient approximation error of the STE. In contrast, the third histogram 630 shows a different result for the model trained according to embodiments of the present disclosure. Its per-layer distribution of weight gradient values is similar to that of the first histogram 610. This indicates that the model trained according to embodiments of the present disclosure behaves similarly to the model with the target performance.

FIG. 7 shows loss function graphs (loss landscapes) 710, 720, 730, and 740, which are used to evaluate the performance of artificial neural network models generated and trained according to embodiments of the present disclosure.

To evaluate the performance of the training method and system according to embodiments of the present disclosure, a MobileNetV2 model was trained on the CIFAR-10 dataset using each of the training methods described above.

The first graph 710 shows the loss landscape of a model trained by the conventional FP training method. The second graph 720 shows the loss landscape of a model trained by the conventional QAT method, without using the momentum value of the FP model determined at the first epoch and without gradient amplification. The third graph 730 shows the loss landscape of a model trained, according to an embodiment of the present disclosure, by the QAT method using the momentum value of the FP model determined at the first epoch as the initial momentum value. The fourth graph 740 shows the loss landscape of a model trained, according to embodiments of the present disclosure, by the QAT method that both uses this momentum value as the initial momentum value and applies gradient amplification. That is, the first graph 710 may be regarded as the loss landscape of a model having the target performance that the disclosed training method and system aim to achieve.

As shown, the second graph 720 appears as a flat surface, whereas the third graph 730 and the fourth graph 740 show stable loss landscapes similar to the first graph 710. In particular, the fourth graph 740 covers a wider range of loss terrain. This indicates that training a model with the QAT method that applies gradient amplification can expand the search area for an optimal local minimum.

FIG. 8 is an example of performance evaluation results when an artificial neural network model generated and trained according to embodiments of the present disclosure is applied to image style transfer. The training method according to various embodiments of the present disclosure may be applied to generating and training models that perform tasks such as classification, object detection, semantic segmentation, or style transfer. FIG. 8 shows input images to a style-transfer model trained according to embodiments of the present disclosure and the corresponding output images.

The robustness of the training method according to embodiments of the present disclosure against unstable training losses can be evaluated by training a Pix2Pix style-transfer model with a minimax generation loss. For compatibility with layer fusion, a ResNet-based Pix2Pix model was used together with an Adam optimizer to which the training method according to embodiments of the present disclosure was applied. Because the discriminator is not used during inference, fake quantization was applied only to the generator of the model. The results confirm that the training method according to an embodiment of the present disclosure does not cause mode collapse, which is regarded as a failure signal for minimax-based generative models, and that it is suitable for fuzzy training conditions. Accordingly, as shown, the training method according to an embodiment of the present disclosure can train a Pix2Pix model capable of performing various image-to-image style transfers.

Table 1 below shows the classification results (top-1 accuracy, %) on the ImageNet-1K dataset for classification models trained according to embodiments of the present disclosure.

| Model | Params | MAdds | FP training | QAT Fine-tune | StatAssist Only | StatAssist GradBoost |
|---|---|---|---|---|---|---|
| ResNet18 | 11.68M | 7.25B | 69.7 | 68.8 | 68.9 | 69.6 |
| MobileNetV2 | 3.51M | 1.19B | 71.8 | 70.3 | 70.7 | 71.5 |
| ShuffleNetV2 | 2.28M | 0.57B | 69.3 | 63.4 | 67.7 | 68.8 |
| ShuffleNetV2 ×0.5 | 1.36M | 0.16B | 58.2 | 44.8 | 56.8 | 57.3 |

In the table above, 'Params' denotes the number of parameters of each model, 'MAdds' denotes the multiply-add count measured for a 224 × 224 input, and 'QAT Fine-tune' denotes a quantized model trained or fine-tuned by the conventional QAT method. 'StatAssist Only' denotes a model trained by the QAT method using, according to embodiments of the present disclosure, the momentum value of the FP model determined at the first epoch as the initial momentum value. 'StatAssist GradBoost' denotes a model trained by the QAT method that, according to embodiments of the present disclosure, uses the momentum value of the FP model determined at the first epoch as the initial momentum value and also applies gradient amplification.

The performance gap between 'QAT Fine-tune' and 'FP training' depends on the model architecture and is notably larger for the ShuffleNetV2 and ShuffleNetV2 ×0.5 models. In contrast, the gap between 'StatAssist GradBoost' and 'FP training' is at most 0.9 percentage points. Accordingly, classification models trained by the method according to embodiments of the present disclosure perform similarly to FP-trained models.
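The gaps cited above can be checked directly against Table 1. The short Python snippet below recomputes, for each model, the top-1 accuracy drop of 'QAT Fine-tune' and of 'StatAssist GradBoost' relative to 'FP training'. The numbers are copied from the table, and the dictionary layout is only an illustrative choice.

```python
# Top-1 accuracies (%) copied from Table 1: (FP training, QAT Fine-tune, StatAssist GradBoost)
table1 = {
    "ResNet18": (69.7, 68.8, 69.6),
    "MobileNetV2": (71.8, 70.3, 71.5),
    "ShuffleNetV2": (69.3, 63.4, 68.8),
    "ShuffleNetV2 x0.5": (58.2, 44.8, 57.3),
}


def gaps(rows):
    """Return {model: (FP - QAT, FP - GradBoost)} in percentage points."""
    return {
        name: (round(fp - qat, 1), round(fp - gb, 1))
        for name, (fp, qat, gb) in rows.items()
    }


for name, (qat_gap, gb_gap) in gaps(table1).items():
    print(f"{name:18s} QAT gap {qat_gap:4.1f}  GradBoost gap {gb_gap:4.1f}")
```

The QAT gaps of 5.9 and 13.4 points for the two ShuffleNetV2 variants stand out. The largest GradBoost gap is 0.9 points, which is the bound stated above.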

Table 2 below shows the object detection results (mAP) on the PASCAL-VOC 2007 dataset for object detection models trained according to embodiments of the present disclosure.

| Model | Params | MAdds | FP training | QAT Fine-tune | StatAssist Only | StatAssist GradBoost |
|---|---|---|---|---|---|---|
| T-DSOD | 2.17M | 2.24B | 71.5 | 71.4 | 71.9 | 72.0 |
| SSD-mv2 | 2.95M | 1.60B | 71.0 | 70.8 | 71.1 | 71.3 |

'MAdds' denotes the multiply-add count measured for a 300 × 300 input. For the T-DSOD model, the initial learning rate was set to […], and the learning rate was multiplied by 0.1 at 120K and 150K iterations out of a total of 180K iterations. For the SSD-mv2 model, the initial learning rate was set to […], a total of 120K iterations was used, and the learning rate was adjusted at 80K and 100K iterations. In each case, the batch size was set to 64, and the detectors were slightly modified so that all layers of each model could be fused for the performance evaluation.
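This sketch reads the "0.1" in the T-DSOD setting as a multiplicative decay factor applied at each milestone, which is an interpretation. On that reading, the schedule is a step (multistep) decay and can be sketched as follows. The initial learning rates are not reproduced in this text, so `BASE_LR` below is a hypothetical placeholder and not the value used in the disclosure.

```python
def multistep_lr(base_lr: float, iteration: int, milestones, gamma: float = 0.1) -> float:
    """Learning rate after multiplying base_lr by gamma once per milestone passed."""
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed


# T-DSOD-style schedule: 180K iterations in total, decay at 120K and 150K.
BASE_LR = 0.05  # hypothetical placeholder value
for it in (0, 119_999, 120_000, 150_000, 179_999):
    print(it, multistep_lr(BASE_LR, it, milestones=(120_000, 150_000)))
```

In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma)` applies the same rule per scheduler step. The SSD-mv2 setting would use milestones (80_000, 100_000).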

As shown in Table 2, 'QAT Fine-tune' shows a slight drop in mAP compared with 'FP training', whereas 'StatAssist GradBoost' shows an increase in mAP. That is, 'QAT Fine-tune' does not surpass 'FP training', whereas 'StatAssist GradBoost' outperforms it.

Table 3 below shows the semantic segmentation results (mIOU) on the Cityscapes dataset for semantic segmentation models trained according to embodiments of the present disclosure.

| Model | Params | MAdds | FP training | QAT Fine-tune | StatAssist Only | StatAssist GradBoost |
|---|---|---|---|---|---|---|
| ESPNet | 0.60M | 16.8B | 65.4 | 64.6 | 65.0 | 65.5 |
| ESPNetV2 | 3.43M | 27.2B | 64.4 | 63.8 | 64.6 | 64.5 |
| Mv3-LRASPP-Large | 2.42M | 7.21B | 65.3 | 64.5 | 64.7 | 65.2 |
| Mv3-LRASPP-Small | 0.75M | 2.22B | 62.5 | 61.7 | 61.6 | 62.1 |
| Mv3-LRASPP-Large-RE (Ours) | 2.42M | 7.21B | 65.5 | 64.9 | 65.1 | 65.8 |
| Mv3-LRASPP-Small-RE (Ours) | 0.75M | 2.22B | 61.5 | 61.2 | 62.1 | 62.3 |

Here, 'MAdds' denotes the multiply-add count measured for a 768 × 768 input. '-Large' and '-Small' denote model configurations targeting high-resource and low-resource use cases, respectively, and '-RE' denotes a model in which the hard-swish activation is replaced with ReLU.
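For reference, hard-swish as introduced with MobileNetV3 is x · ReLU6(x + 3) / 6. The small Python sketch below compares it with the plain ReLU used in the '-RE' variants. It illustrates the two activations only and says nothing about why the '-RE' variants were chosen.

```python
def relu6(x: float) -> float:
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """MobileNetV3 hard-swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def relu(x: float) -> float:
    """Activation used in the '-RE' variants."""
    return max(x, 0.0)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:5.1f}  hard_swish={hard_swish(x):7.4f}  relu={relu(x):4.1f}")
```

The two functions coincide for x ≤ -3 and x ≥ 3. Between those points, hard-swish is non-zero for slightly negative inputs and smaller than ReLU for positive ones.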

For training, the initial learning rate was set to […], and Nesterov-momentum SGD was used with a poly learning-rate schedule. The models were trained on a single NVIDIA P40 GPU using training images randomly cropped to 768 × 768 to fit the model. The performance evaluation was performed on full-scale 2048 × 1024 images.
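The following is a minimal sketch of Nesterov-momentum SGD driven by a poly learning-rate schedule, run on a toy one-dimensional quadratic loss. The poly power of 0.9, the momentum coefficient, the base learning rate, and the loss are all illustrative assumptions. The Nesterov step follows the formulation used by PyTorch's `torch.optim.SGD` with `nesterov=True`, in which the buffer is updated as v ← μv + g and the parameter moves along g + μv.

```python
def poly_lr(base_lr: float, it: int, max_it: int, power: float = 0.9) -> float:
    """Poly schedule: base_lr * (1 - it / max_it) ** power."""
    return base_lr * (1.0 - it / max_it) ** power


def train_nesterov(w: float, max_it: int = 100, base_lr: float = 0.1, mu: float = 0.9) -> float:
    """Minimize the toy loss L(w) = (w - 3)^2 with Nesterov-momentum SGD."""
    v = 0.0  # momentum buffer
    for it in range(max_it):
        g = 2.0 * (w - 3.0)   # dL/dw
        v = mu * v + g        # update the momentum buffer
        step = g + mu * v     # Nesterov look-ahead direction
        w -= poly_lr(base_lr, it, max_it) * step
    return w


print(train_nesterov(w=0.0))
```

The learning rate decays smoothly to zero at `max_it`. As a result, most of the progress happens early, and the last iterations only make fine adjustments.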

As shown in Table 3, compared with 'FP training', 'QAT Fine-tune' shows an average mIOU decrease of 0.65 percentage points, whereas 'StatAssist GradBoost' shows an average mIOU increase of 0.13 percentage points. Therefore, the performance of 'StatAssist GradBoost' is similar to or better than that of 'FP training'.
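The two averages can be reproduced from Table 3 with a few lines of Python. The values below are copied from the table. Each average is the mean, over the six models, of the method's mIOU minus the 'FP training' mIOU.

```python
# mIOU values copied from Table 3: (FP training, QAT Fine-tune, StatAssist GradBoost)
table3 = {
    "ESPNet": (65.4, 64.6, 65.5),
    "ESPNetV2": (64.4, 63.8, 64.5),
    "Mv3-LRASPP-Large": (65.3, 64.5, 65.2),
    "Mv3-LRASPP-Small": (62.5, 61.7, 62.1),
    "Mv3-LRASPP-Large-RE": (65.5, 64.9, 65.8),
    "Mv3-LRASPP-Small-RE": (61.5, 61.2, 62.3),
}


def mean_delta(rows, column: int) -> float:
    """Average (method - FP training) difference in percentage points."""
    deltas = [r[column] - r[0] for r in rows.values()]
    return sum(deltas) / len(deltas)


print(f"QAT Fine-tune:        {mean_delta(table3, 1):+.2f}")
print(f"StatAssist GradBoost: {mean_delta(table3, 2):+.2f}")
```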

FIG. 9 is a flowchart of an algorithm 900 for performing an artificial neural network model training method according to an embodiment of the present disclosure. The algorithm that executes the training method shown in FIG. 9 may be implemented using, for example, the PyTorch 1.4 quantization library (see Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala, "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems 32, pp. 8024-8035, Curran Associates, Inc., 2019).

The algorithm for performing the training method according to the present embodiment may begin by preparing an FP model that is compatible with fake quantization (S910). For example, a pre-stored FP model may be loaded or declared in the program code used to train the model. Next, a training workflow for the FP model may be generated (S920). The optimizer that optimizes the parameters of the artificial neural network may then be switched to a version that amplifies gradient values when updating the weight values of the model (S930). Then, the FP model may be trained up to the first epoch to determine the first momentum value (S940). For example, the FP model may be trained up to the first epoch according to the training workflow generated in step S920.
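The momentum hand-off between step S940 and the preparation of the QAT model can be sketched in plain Python. This is a toy built on assumptions and is not the disclosure's code. A one-weight linear model is trained for one epoch with heavy-ball momentum SGD. The resulting momentum buffer is then read out, and a second optimizer is created with that buffer as its initial momentum instead of zero. The model, the data, the learning rate, the momentum coefficient, and the second model's weight initialization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MomentumSGD:
    """Heavy-ball SGD for a single weight: v <- mu*v + g, w <- w - lr*v."""
    lr: float = 0.05
    mu: float = 0.9
    v: float = 0.0  # momentum buffer; zero unless explicitly seeded

    def step(self, w: float, g: float) -> float:
        self.v = self.mu * self.v + g
        return w - self.lr * self.v


def grad(w: float, x: float, y: float) -> float:
    """d/dw of the squared error (w*x - y)**2."""
    return 2.0 * (w * x - y) * x


def run_epoch(w: float, opt: MomentumSGD, data) -> float:
    """One pass over the data, one optimizer step per sample."""
    for x, y in data:
        w = opt.step(w, grad(w, x, y))
    return w


# Hypothetical training data for the target y = 2x.
data = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0), (-1.0, -2.0)]

# S940: train the FP model up to the first epoch and read out its momentum.
fp_opt = MomentumSGD()
w_fp = run_epoch(0.0, fp_opt, data)
first_momentum = fp_opt.v

# The second model's optimizer starts from that momentum value instead of
# zero. The second model's own weight initialization (0.0 here) is an
# assumption of this sketch.
qat_opt = MomentumSGD(v=first_momentum)
w_qat = run_epoch(0.0, qat_opt, data)
print(w_fp, first_momentum, w_qat)
```

In PyTorch's `torch.optim.SGD`, the corresponding per-parameter buffer is stored in `optimizer.state[param]["momentum_buffer"]`, which is one place such a value could be copied from. This text does not state how the disclosed implementation performs the copy.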

Next, a layer-fused, fake-quantized QAT model may be prepared (S950). Layer fusion of the QAT model may refer to reducing computational latency by merging convolution, normalization, and activation into a single convolution operation during inference. For example, the QAT model may be prepared by setting the first momentum value of the FP model determined in step S940 as its initial momentum value.
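Folding batch normalization into the preceding convolution is the usual arithmetic behind this kind of fusion. With s = γ / √(σ² + ε), the fused weights are w' = s·w and the fused bias is b' = s·(b − μ) + β. The sketch below applies these per-channel formulas to plain Python lists and checks that the fused layer matches the unfused conv, BN, and ReLU sequence. It assumes that the normalization is batch normalization using running statistics, which this text does not state explicitly, and it models the convolution at a single output position as a dot product.

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BatchNorm into conv weights and bias.

    weights[c] is the flattened kernel of output channel c.
    """
    fused_w, fused_b = [], []
    for c, kernel in enumerate(weights):
        s = gamma[c] / math.sqrt(var[c] + eps)
        fused_w.append([k * s for k in kernel])
        fused_b.append((bias[c] - mean[c]) * s + beta[c])
    return fused_w, fused_b


def conv_bn_relu(x, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: conv (as a dot product), then BN, then ReLU."""
    out = []
    for c, kernel in enumerate(weights):
        z = sum(k * xi for k, xi in zip(kernel, x)) + bias[c]
        z = gamma[c] * (z - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
        out.append(max(z, 0.0))
    return out


def fused_conv_relu(x, fused_w, fused_b):
    """Fused version: a single conv (dot product) followed by ReLU."""
    return [max(sum(k * xi for k, xi in zip(kernel, x)) + b, 0.0)
            for kernel, b in zip(fused_w, fused_b)]


# Hypothetical two-channel layer with a three-element receptive field.
W = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
B = [0.1, -0.3]
G, BETA, MU, VAR = [1.5, 0.8], [0.0, 0.2], [0.05, -0.1], [0.9, 1.2]
x = [1.0, 2.0, -0.5]

fw, fb = fold_bn(W, B, G, BETA, MU, VAR)
print(conv_bn_relu(x, W, B, G, BETA, MU, VAR))
print(fused_conv_relu(x, fw, fb))
```

In PyTorch's quantization workflow, `torch.quantization.fuse_modules` performs this kind of conv/BN/ReLU fusion on eligible module sequences.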

The QAT model prepared in this way may then be trained by the QAT method (S960). Because the optimizer was switched in step S930 to a version that amplifies gradient values when updating the weight values, the gradient value for at least one of the plurality of weights of the QAT model is amplified, and the parameter values of the QAT model may be updated based on the amplified gradient value. When training of the QAT model is complete, the trained model may be converted into an INT8-quantized version (S970).
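This section does not give the exact amplification rule. The claims describe it only as amplifying a gradient "in a direction corresponding to a current learning direction based on the sign of the gradient value." The Python sketch below therefore assumes an additive rule, g' = g + boost · sign(g), with a hypothetical `boost` value. Under that assumption it shows one momentum-SGD update using the amplified gradient (in the spirit of S960), followed by a symmetric per-tensor INT8 conversion (in the spirit of S970). Both the rule and the quantization scheme are illustrative choices, not the disclosed method.

```python
def sign(x: float) -> float:
    """Return +1.0, -1.0, or 0.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def amplify(g: float, boost: float = 1e-3) -> float:
    """Hypothetical amplification along the gradient's own sign.

    g' = g + boost * sign(g): the magnitude grows while the direction
    is preserved, and a zero gradient stays zero.
    """
    return g + boost * sign(g)


def momentum_step(w: float, g: float, v: float, lr: float = 0.01, mu: float = 0.9):
    """One momentum-SGD update driven by the amplified gradient."""
    v = mu * v + amplify(g)
    return w - lr * v, v


def to_int8(values):
    """Symmetric per-tensor INT8 conversion; returns (q, scale)."""
    max_abs = max(abs(x) for x in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-128, min(127, round(x / scale))) for x in values]
    return q, scale


# Tiny gradients, as when QAT gradients vanish. Velocities start at zero
# here; in the disclosed flow they would start from the FP model's momentum.
weights = [0.30, -0.12, 0.05]
grads = [1e-4, -2e-5, 0.0]
velocity = [0.0, 0.0, 0.0]
for i, g in enumerate(grads):
    weights[i], velocity[i] = momentum_step(weights[i], g, velocity[i])

q, scale = to_int8(weights)
print(weights, q, scale)
```

With this additive rule, even a very small gradient produces an update of at least `lr * boost`. This is one way to counter the vanishing gradients seen in conventional QAT.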

The above-described training method may be provided as a computer program stored in a computer-readable recording medium for execution on a computer. The medium may store the computer-executable program continuously, or may store it temporarily for execution or download. The medium may also be any of various recording or storage means consisting of a single piece of hardware or a combination of several. It is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, and by sites, servers, and the like that supply or distribute various other software.

The methods, operations, or techniques of the present disclosure may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. Those of ordinary skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design requirements imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementations should not be interpreted as departing from the scope of the present disclosure.

In a hardware implementation, the processing units used to perform the techniques may be implemented within one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other electronic units designed to perform the functions described in this disclosure, a computer, or a combination thereof.

Accordingly, the various illustrative logic blocks, modules, and circuits described in connection with this disclosure may be implemented or performed with a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

In firmware and/or software implementations, the techniques may be implemented as instructions stored on a computer-readable medium such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, compact disc (CD), or a magnetic or optical data storage device. The instructions may be executable by one or more processors and may cause the processor(s) to perform certain aspects of the functionality described in this disclosure.

Although the embodiments described above have been described as utilizing aspects of the presently disclosed subject matter in one or more standalone computer systems, the present disclosure is not so limited and may be implemented in connection with any computing environment, such as a network or distributed computing environment. Furthermore, aspects of the subject matter of this disclosure may be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across the plurality of devices. Such devices may include PCs, network servers, and portable devices.

Although the present disclosure has been described herein in connection with some embodiments, various modifications and changes may be made without departing from the scope of the present disclosure as understood by those skilled in the art to which it pertains. Such modifications and changes should be considered to fall within the scope of the claims appended hereto.

110: FP model
120: first model learning unit
130: QAT model
140: second model learning unit
150: INT8 model
200: artificial neural network model learning system
210: communication module
220: memory
230: processor

Claims

1. An artificial neural network model training method performed by at least one processor, the method comprising:
training a first artificial neural network model up to a preset epoch by a momentum-based gradient descent method, and determining a momentum value of the first artificial neural network model at the epoch;
setting the determined momentum value as an initial momentum value of a second artificial neural network model; and
updating, based on the initial momentum value, a parameter value of the second artificial neural network model using training data,
wherein the parameter includes a plurality of weights and the momentum.

2. The method of claim 1,
wherein the training of the first artificial neural network model up to the preset epoch by the momentum-based gradient descent method and the determining of the momentum value of the first artificial neural network model at the epoch include:
training the first artificial neural network model up to a first epoch, and determining a momentum value of the first artificial neural network model at the first epoch.

3. The method of claim 1,
further comprising generating a third artificial neural network model by quantizing the parameter values of the updated second artificial neural network model.

4. The method of claim 3,
wherein the parameter values of the first artificial neural network model are expressed in a first data type,
the parameter values of the second artificial neural network model are expressed in the first data type or a second data type,
the parameter values of the third artificial neural network model are expressed in the second data type, and
the number of bits of the second data type is smaller than the number of bits of the first data type.

5. The method of claim 4,
wherein the updating of the parameter values of the second artificial neural network model using training data based on the initial momentum value includes:
performing a forward-propagation training process of the second artificial neural network model using the parameter values expressed in the second data type; and
performing a backward-propagation training process of the second artificial neural network model using the parameter values expressed in the first data type.

6. The method of claim 1,
wherein the updating of the parameter values of the second artificial neural network model using training data based on the initial momentum value includes:
updating the parameter values of the second artificial neural network model using training data based on gradient values for the plurality of weights of the second artificial neural network model.

7. The method of claim 6,
wherein the updating of the parameter values of the second artificial neural network model using training data based on the gradient values for the plurality of weights of the second artificial neural network model includes:
amplifying, for at least one of the plurality of weights, the gradient value in a direction corresponding to a current learning direction based on the sign of the gradient value; and
updating a parameter value of the second artificial neural network model based on the amplified gradient value.

8. A computer program stored in a computer-readable recording medium to execute, on a computer, the artificial neural network model training method according to any one of claims 1 to 7.

9. An artificial neural network model training system, comprising:
a communication module;
a memory; and
at least one processor coupled to the memory and configured to execute at least one computer-readable program included in the memory,
wherein the at least one program includes instructions for training a first artificial neural network model up to a preset epoch by a momentum-based gradient descent method to determine a momentum value of the first artificial neural network model at the epoch, setting the determined momentum value as an initial momentum value of a second artificial neural network model, and updating, based on the initial momentum value, parameter values of the second artificial neural network model using training data,
and wherein the parameters comprise a plurality of weights and the momentum.

10. The system of claim 9,
wherein the at least one program further includes instructions for training the first artificial neural network model up to a first epoch and determining a momentum value of the first artificial neural network model at the first epoch.

11. The system of claim 9,
wherein the at least one program further includes instructions for generating a third artificial neural network model by quantizing the parameter values of the updated second artificial neural network model.

12. The system of claim 11,
wherein the parameter values of the first artificial neural network model are expressed in a first data type, the parameter values of the second artificial neural network model are expressed in the first data type or a second data type, the parameter values of the third artificial neural network model are expressed in the second data type, and the number of bits of the second data type is smaller than the number of bits of the first data type.

13. The system of claim 12,
wherein the at least one program further includes instructions for performing a forward-propagation training process of the second artificial neural network model using the parameter values expressed in the second data type, and performing a backward-propagation training process of the second artificial neural network model using the parameter values expressed in the first data type.

14. The system of claim 9,
wherein the at least one program further includes instructions for updating the parameter values of the second artificial neural network model using training data based on gradient values for the plurality of weights of the second artificial neural network model.

15. The system of claim 14,
wherein the at least one program further includes instructions for amplifying, for at least one of the plurality of weights, the gradient value in a direction corresponding to a current learning direction based on the sign of the gradient value.