KR102368590B1

KR102368590B1 - Electronic apparatus and control method thereof

Info

Publication number: KR102368590B1
Application number: KR1020200121231A
Authority: KR
Inventors: 오지훈; 이상정; 박미정; 가우라브 푸니왈라; 권기석
Original assignee: 삼성전자주식회사
Priority date: 2020-05-19
Filing date: 2020-09-21
Publication date: 2022-03-02
Also published as: WO2021235656A1; KR20210143093A

Abstract

An electronic device is disclosed. The electronic device includes a memory storing an artificial intelligence model composed of a plurality of layers, and a processor. The artificial intelligence model includes a plurality of weight values that are scaled based on shift scaling factors that differ for each of a plurality of channels included in each of the plurality of layers, and that are quantized for each of the plurality of layers. When input data is received, the processor may, in a neural network operation on the input data, operate the operation result of each channel with a composite scale parameter that has been inversely scaled based on the shift scaling factor corresponding to that channel.

Description

Electronic device and its control method { ELECTRONIC APPARATUS AND CONTROL METHOD THEREOF }

The present disclosure relates to an electronic device and a control method thereof, and more particularly to an electronic device that processes an artificial intelligence (AI) model that mimics functions of the human brain such as cognition and judgment by using a machine learning algorithm such as deep learning, and to a control method thereof.

Quantization has recently been used to increase the compression ratio of deep learning models while minimizing performance degradation. Weight quantization methods can be divided, by the point at which quantization takes place, into post-training quantization and quantization-aware training, and, by the quantization scheme, into linear quantization and non-linear quantization.

Post-training quantization performs IntN quantization on an already trained Float32 model without retraining, so quantization is fast and no training data is required. Linear quantization is easily implemented in hardware with an INT multiplier and an INT adder, so it is widely supported by Neural Processing Units (NPUs).

Despite these advantages, the combination of post-training quantization and linear quantization generally loses more accuracy than the combination of quantization-aware training and non-linear quantization as the number of quantization bits becomes smaller (8 bits or less). This is because the channels included in one layer have different parameter distributions; in particular, a channel whose parameters are distributed over a narrow range converges to a single quantized value after quantization, which increases the quantization error.

Various methods have been developed to overcome this drawback.

First, channel-wise quantization differs from the conventional approach, which quantizes in units of neural network layers, obtains one pair of minimum and maximum parameter values per layer, and calculates from that pair the quantization parameters (QWeight, Scale, Zero point) required for hardware fixed computing. Channel-wise quantization instead quantizes in units of the channels included in a layer and obtains one pair of minimum and maximum parameter values per channel. For example, a layer with n channels yields n [min, max] pairs.
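The difference between the two granularities can be sketched as follows. This is a minimal illustration only; the function names and the sample weights are hypothetical and do not appear in the disclosure.

```python
from typing import List, Tuple

def layer_min_max(weights: List[List[float]]) -> Tuple[float, float]:
    """Layer-wise: a single [min, max] pair for all channels of the layer."""
    flat = [w for channel in weights for w in channel]
    return min(flat), max(flat)

def channel_min_max(weights: List[List[float]]) -> List[Tuple[float, float]]:
    """Channel-wise: one [min, max] pair per channel, n pairs for n channels."""
    return [(min(channel), max(channel)) for channel in weights]

# Hypothetical layer with n = 3 channels whose value ranges differ widely.
layer = [
    [-4.0, 1.0, 3.5],
    [-0.02, 0.01, 0.03],
    [-1.0, 0.5, 0.9],
]

per_layer = layer_min_max(layer)      # one pair for the whole layer
per_channel = channel_min_max(layer)  # three pairs, one per channel
```

The second channel occupies only a small fraction of the layer-wide range, which is the case in which layer-wise quantization collapses a channel onto very few quantized values.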

This reduces the loss of quantization precision, but the size of the quantization parameters grows in proportion to the number of channels. The time required to load the quantization parameters from main memory into cache memory increases accordingly, which degrades latency.

Cross Layer Equalization (CLE) performs preprocessing that applies a (float) scaling [formula omitted in source] to the output of the preceding convolution layer and a (float) rescaling [formula omitted in source] to the corresponding input of the following convolution layer. The scales are adjusted successively from the first layer to the last layer of the neural network, and the whole pass is repeated until [formula omitted in source] no longer changes. When the preprocessing ends, ordinary layer-wise quantization is performed.

This preprocessing adjusts the per-channel parameter ranges so that they overlap, which reduces the loss of quantization precision. However, because scaling equivariance must be preserved between consecutive neural network layers, a problem can arise when the activation between layers is a piecewise linear function such as ReLU6 or PReLU. ReLU6 or PReLU must then be replaced with ReLU, which changes the activated feature map distribution, increases the error, and lowers accuracy.

Accordingly, there is a need for a method that requires no structural change to the neural network layers, is favorable to hardware operation, and increases quantization efficiency while maintaining quantization accuracy.

The present disclosure addresses the above need, and an object of the present disclosure is to provide an electronic device that performs a neural network operation using an artificial intelligence model that is shift-scaled for each channel and quantized for each layer, and a control method thereof.

To achieve the above object, an electronic device according to an embodiment of the present disclosure includes a memory storing an artificial intelligence model composed of a plurality of layers, and a processor. The artificial intelligence model includes a plurality of weight values that are scaled based on shift scaling factors that differ for each of a plurality of channels included in each of the plurality of layers and that are quantized for each of the plurality of layers. When input data is received, the processor may, in a neural network operation on the input data, operate the operation result of each channel with a composite scale parameter that has been inversely scaled based on the shift scaling factor corresponding to that channel.

The artificial intelligence model may include the plurality of quantized weight values, a shift scaling factor for each of the plurality of channels included in each of the plurality of layers, and a scale parameter and a zero point parameter corresponding to each of the plurality of layers. When each of the plurality of layers is quantized, the scale parameter may represent the slope between the value before quantization and the value after quantization, and the zero point parameter may represent the quantized value of the pre-quantization zero value.

The processor may obtain the composite scale parameter by inversely scaling, based on the shift scaling factor corresponding to each channel, a value obtained based on the scale parameter of a current layer, the scale parameter of the layer immediately preceding the current layer, and the scale parameter of the plurality of weight values.

The processor may obtain the composite scale parameter by shifting the obtained value based on the shift scaling factor corresponding to each channel.

Here, the obtained value may be inversely proportional to the scale parameter of the current layer, and proportional to the scale parameter of the layer immediately preceding the current layer and to the scale parameter of the plurality of weight values.
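One way to read the relationship stated above is sketched below. The sketch assumes the simplest form consistent with the stated proportionality (the product of the preceding-layer scale and the weight scale, divided by the current-layer scale) and assumes that the inverse scaling is a division by 2 raised to the channel's shift. The function and parameter names are hypothetical.

```python
import math

def composite_scale(prev_scale: float, weight_scale: float,
                    cur_scale: float, shift: int) -> float:
    """Composite scale for one channel (illustrative assumption).

    base  : proportional to prev_scale and weight_scale,
            inversely proportional to cur_scale.
    shift : per-channel shift scaling factor; the inverse scaling is
            modeled as multiplication by 2**-shift, i.e. a right shift.
    """
    base = prev_scale * weight_scale / cur_scale
    return math.ldexp(base, -shift)  # base * 2**(-shift)

# Two channels of the same layer share `base` and differ only in the shift.
unshifted = composite_scale(0.5, 0.25, 0.125, shift=0)
shifted = composite_scale(0.5, 0.25, 0.125, shift=3)
```

Because the per-channel part is a power of two, only the small integer shift has to be stored per channel, while the three scale parameters remain per-layer quantities.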

The processor may perform the inverse scaling when it identifies that the artificial intelligence model includes a shift scaling factor for each of the plurality of channels included in each of the plurality of layers.

The electronic device may be implemented as a Neural Processing Unit (NPU).

The shift scaling factor corresponding to each channel may be determined based on the weight values included in that channel and the weight values included in the layer that includes that channel.

The shift scaling factor corresponding to each channel may be determined based on the weight value of largest magnitude in that channel and the weight value of largest magnitude in the layer that includes that channel.

Meanwhile, a method of controlling an electronic device according to an embodiment of the present disclosure includes receiving input data, and, in a neural network operation on the input data using an artificial intelligence model, operating the operation result of each channel of each of a plurality of layers constituting the artificial intelligence model with a composite scale parameter that has been inversely scaled based on the shift scaling factor corresponding to that channel. The artificial intelligence model may include a plurality of weight values that are scaled based on shift scaling factors that differ for each of the plurality of channels included in each of the plurality of layers and that are quantized for each of the plurality of layers.

The operating may include obtaining the composite scale parameter by inversely scaling, based on the shift scaling factor corresponding to each channel, a value obtained based on the scale parameter of a current layer, the scale parameter of the layer immediately preceding the current layer, and the scale parameter of the plurality of weight values.

The operating may include obtaining the composite scale parameter by shifting the obtained value based on the shift scaling factor corresponding to each channel.

The operating may be performed when it is identified that the artificial intelligence model includes a shift scaling factor for each of the plurality of channels included in each of the plurality of layers.

According to the various embodiments of the present disclosure described above, the electronic device operates the operation result of each channel, in a neural network operation on input data, with a composite scale parameter inversely scaled based on the shift scaling factor corresponding to that channel, and can thereby improve the accuracy of the neural network operation while using an artificial intelligence model implemented with a relatively small capacity.

FIG. 1A is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present disclosure.
FIG. 1B is a block diagram illustrating the software configuration of an electronic device according to an embodiment of the present disclosure.
FIG. 1C is a diagram illustrating per-channel scaling according to an embodiment of the present disclosure.
FIGS. 1D and 1E are diagrams illustrating methods of implementing a shifter according to various embodiments of the present disclosure.
FIG. 2 is a diagram illustrating the operation of a compiler and an electronic device according to an embodiment of the present disclosure.
FIG. 3 is a flowchart illustrating a method of obtaining a shift scaling factor according to various embodiments of the present disclosure.
FIGS. 4A and 4B are diagrams illustrating an inverse scaling operation according to an embodiment of the present disclosure.
FIGS. 5A to 5C are diagrams illustrating effects according to the present disclosure.
FIG. 6 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present disclosure.

Hereinafter, the present disclosure is described in detail with reference to the accompanying drawings.

The terms used in the embodiments of the present disclosure are, as far as possible, general terms in wide current use, selected in consideration of their function in the present disclosure; they may vary with the intention of those skilled in the art, precedent, the emergence of new technology, and the like. In certain cases a term has been arbitrarily selected by the applicant, and its meaning is then described in detail in the corresponding part of the description. The terms used in the present disclosure should therefore be defined based on their meaning and on the overall content of the present disclosure, not simply on their names.

In this specification, expressions such as "have," "may have," "include," or "may include" indicate the presence of the corresponding feature (for example, a numerical value, a function, an operation, or a component such as a part) and do not exclude the presence of additional features.

The expression "at least one of A or/and B" should be understood to denote any one of "A", "B", or "A and B".

Expressions such as "first" and "second" used in this specification may modify various components regardless of order and/or importance; they are used only to distinguish one component from another and do not limit the components.

A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "consist of" are intended to designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, the term "user" may refer to a person who uses an electronic device or to a device that uses an electronic device (for example, an artificial intelligence electronic device).

Hereinafter, various embodiments of the present disclosure are described in more detail with reference to the accompanying drawings.

FIG. 1A is a block diagram illustrating the configuration of an electronic device 100 according to an embodiment of the present disclosure. As shown in FIG. 1A, the electronic device 100 includes a memory 110 and a processor 120.

The electronic device 100 may be a device that performs a neural network operation based on an artificial intelligence model. For example, the electronic device 100 is a device that stores an artificial intelligence model and, when input data is received, performs a neural network operation on the input data based on the artificial intelligence model, and may be implemented as a desktop PC, a notebook computer, a TV, or the like. However, the electronic device 100 is not limited thereto and may be any device capable of performing a neural network operation based on an artificial intelligence model.

In particular, the electronic device 100 may be a resource-constrained device such as a smartphone, a tablet PC, or a wearable device that stores a quantized artificial intelligence model and performs a neural network operation based on the quantized artificial intelligence model. Quantization means dividing continuous values into a plurality of levels and replacing the values within each level with a value representing that level.

For example, the data size can be reduced by a quantization that replaces values between 0 and 1 with 1 and values between 1 and 2 with 2. By quantizing the artificial intelligence model in this way, a neural network operation can be performed on-device even in an electronic device 100 with limited resources. Quantization of the artificial intelligence model is described in detail below.

The memory 110 may refer to hardware that stores information such as data in electrical or magnetic form so that the processor 120 and the like can access it. To this end, the memory 110 may be implemented as at least one of non-volatile memory, volatile memory, flash memory, a hard disk drive (HDD), a solid state drive (SSD), RAM, ROM, and the like.

The memory 110 may store at least one instruction or module required for the operation of the electronic device 100 or the processor 120. Here, an instruction is a code unit that directs the operation of the electronic device 100 or the processor 120, and may be written in machine language, a language that a computer can understand. A module may be a set of instructions that performs a specific task in a unit of work.

The memory 110 may store data, that is, information in units of bits or bytes that can represent characters, numbers, images, and the like. For example, the memory 110 may store data such as a document including a plurality of sentences.

The memory 110 may store an artificial intelligence model composed of a plurality of layers. Here, the artificial intelligence model may include a plurality of weight values that are scaled based on shift scaling factors that differ for each of the plurality of channels included in each of the plurality of layers and that are quantized for each of the plurality of layers.

For example, assuming that the artificial intelligence model before scaling and quantization is composed of 5 layers and each of the 5 layers includes 32 channels, the model before scaling and quantization includes 160 channels in total. First, each of the 160 channels may be scaled based on a different shift scaling factor. The upper part of FIG. 1C shows the plurality of channels included in one of the 5 layers of the artificial intelligence model. The lower part of FIG. 1C shows the same channels after each has been scaled by its own shift scaling factor.

In this way the shift scaling factor may differ for each channel, so there may be 160 shift scaling factors in total. Here, the shift scaling factor of each channel may be determined in power-of-two form so that it can be carried out as a shift operation in hardware. For example, a first channel may be scaled based on a shift by 3 and a second channel based on a shift by 5. The shift scaling factor applied to each channel may be determined based on the weight value of largest magnitude in that channel and the weight value of largest magnitude in the layer that includes that channel. For example, if the weight value of largest magnitude in a first layer among the 5 layers is 10 and the weight value of largest magnitude in a first channel included in the first layer is 6, the initial value of the shift scaling factor of that first channel may be determined based on the base-2 logarithmic ratio between 10 and 6. The quantization-dequantization error or the Top-1 test accuracy may then be defined as a cost function, and the optimal shift scaling factor of each channel may be obtained through a nonlinear optimization method (Nelder-Mead, Bayesian optimization, etc.). As a result, a channel with a relatively small spread may have a relatively large shift scaling factor and a channel with a relatively large spread may have a relatively small one, and a certain level of accuracy can be secured even when quantization is performed layer by layer.
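The initial value described above can be sketched as follows. The text states only that the initial value is based on the base-2 logarithmic ratio of the two maxima; rounding that ratio down and clamping it to a 4-bit range are assumptions made here for illustration, as are the function and parameter names. The subsequent nonlinear optimization step is not shown.

```python
import math

def initial_shift(layer_max_abs: float, channel_max_abs: float,
                  max_shift: int = 15) -> int:
    """Initial power-of-two shift for one channel (illustrative assumption).

    Rounding down keeps the scaled channel maximum at or below the layer
    maximum; max_shift=15 assumes the factor is stored as a 4-bit integer.
    """
    if channel_max_abs <= 0:
        return 0  # an all-zero channel needs no scaling
    ratio = layer_max_abs / channel_max_abs
    shift = math.floor(math.log2(ratio))
    return max(0, min(max_shift, shift))

# Layer maximum 10 with channel maxima 6, 1.25 and 0.3.
shift_wide = initial_shift(10.0, 6.0)     # ratio ~1.67 -> shift 0
shift_mid = initial_shift(10.0, 1.25)     # ratio 8     -> shift 3
shift_narrow = initial_shift(10.0, 0.3)   # ratio ~33.3 -> shift 5
```

Under these assumptions the channel with the narrowest range receives the largest shift, which matches the tendency described in the text.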

When the scaling is complete, quantization is performed for each layer. In the above example, if the minimum and maximum of the weight values of the first layer are mapped to 0 and 255 respectively, the weight values of the first layer may be replaced with integers between 0 and 255.

An artificial intelligence model that is shift-scaled for each channel and quantized for each layer as described above may include the plurality of quantized weight values, a shift scaling factor for each of the plurality of channels included in each of the plurality of layers, and a scale parameter and a zero point parameter corresponding to each of the plurality of layers. Here, when each of the plurality of layers is quantized, the scale parameter may represent the slope between the value before quantization and the value after quantization, and the zero point parameter may represent the quantized value of the pre-quantization zero value.
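The layer-wise mapping and the two per-layer parameters can be sketched with a standard affine (linear) quantizer. This is a minimal sketch assuming 8-bit unsigned levels and round-to-nearest; the function name and the sample weights are hypothetical.

```python
from typing import List, Tuple

def quantize_layer(weights: List[float],
                   num_levels: int = 256) -> Tuple[List[int], float, int]:
    """Map the layer minimum to 0 and the layer maximum to num_levels - 1."""
    w_min, w_max = min(weights), max(weights)
    # Scale: slope between real values and quantized values.
    scale = (w_max - w_min) / (num_levels - 1)
    if scale == 0:
        scale = 1.0  # constant layer: avoid division by zero
    # Zero point: the quantized value that represents real 0.
    zero_point = round(-w_min / scale)
    quantized = [
        min(num_levels - 1, max(0, round(w / scale) + zero_point))
        for w in weights
    ]
    return quantized, scale, zero_point

# Hypothetical layer weights spanning [-27.5, 100.0].
q, scale, zero_point = quantize_layer([-27.5, 0.0, 1.2, 100.0])
```

With these inputs the minimum maps to 0, the maximum to 255, and the real value 0 maps to the zero point.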

Compared with conventional layer-wise quantization without per-channel scaling, this requires additional data for the per-channel shift scaling factors. However, conventional quantization without scaling lowers the accuracy of some channels, and the per-channel shift scaling of the present disclosure resolves this problem. Conventional channel-wise quantization, on the other hand, secures accuracy but requires quantization parameters (a scale in 32-bit floating-point form and a zero point in 8-bit integer form) for every channel, and therefore a considerable amount of data. According to the present disclosure, per-channel shift scaling secures a certain level of accuracy while the only added data is a per-channel shift scaling factor in 4-bit integer form.
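The parameter overhead of the three schemes can be compared with a short calculation for the 5-layer, 32-channel example used earlier. The sketch assumes a 32-bit floating-point scale and an 8-bit zero point per quantization unit and a 4-bit shift scaling factor per channel, and counts only quantization parameters, not the weights themselves. The function names are hypothetical.

```python
SCALE_BITS = 32       # assumed float32 scale per quantization unit
ZERO_POINT_BITS = 8   # 8-bit integer zero point per quantization unit
SHIFT_BITS = 4        # 4-bit integer shift scaling factor per channel

def layer_wise_bits(num_layers: int) -> int:
    """One (scale, zero point) pair per layer."""
    return num_layers * (SCALE_BITS + ZERO_POINT_BITS)

def channel_wise_bits(num_layers: int, channels_per_layer: int) -> int:
    """One (scale, zero point) pair per channel."""
    return num_layers * channels_per_layer * (SCALE_BITS + ZERO_POINT_BITS)

def shift_scaled_bits(num_layers: int, channels_per_layer: int) -> int:
    """Per-layer (scale, zero point) plus one shift factor per channel."""
    return (layer_wise_bits(num_layers)
            + num_layers * channels_per_layer * SHIFT_BITS)

layer_only = layer_wise_bits(5)          # 5 * 40 bits
per_channel = channel_wise_bits(5, 32)   # 160 * 40 bits
shift_scaled = shift_scaled_bits(5, 32)  # 5 * 40 + 160 * 4 bits
```

Under these assumptions the shift-scaled model carries 840 bits of quantization parameters, against 200 bits for plain layer-wise quantization and 6400 bits for channel-wise quantization.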

That is, an artificial intelligence model quantized for each layer after per-channel shift scaling requires additional storage for the per-channel shift scaling factors compared with conventional layer-wise quantization without scaling, but the amount is negligible relative to the size of the weights and poses no significant problem for on-device neural network operation. Using a model quantized for each layer after per-channel scaling nevertheless secures a certain level of accuracy, as described below together with the operation of the processor 120.

The processor 120 controls the overall operation of the electronic device 100. Specifically, the processor 120 may be connected to each component of the electronic device 100 to control its overall operation. For example, the processor 120 may be connected to components such as the memory 110 and a communication interface (not shown) to control the operation of the electronic device 100.

According to an embodiment, the processor 120 may be implemented as a digital signal processor (DSP), a microprocessor, or a time controller (TCON). However, the processor 120 is not limited thereto and may include, or be defined by the term of, one or more of a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a controller, an application processor (AP), a communication processor (CP), and an ARM processor. The processor 120 may also be implemented as a system on chip (SoC) or large scale integration (LSI) with an embedded processing algorithm, or in the form of a field programmable gate array (FPGA).

When input data is received, the processor 120 may, in the course of a neural network operation on the input data, operate on the computation result of each channel with a composite scale parameter that has been inverse-scaled based on the shift scaling factor corresponding to that channel. As described above, this operation restores to the original scale an output whose scale was changed by the per-channel shift scaling applied to reduce quantization error.

This operation of the processor 120 is described in more detail below with reference to the modules of FIG. 1B.

FIG. 1B is a block diagram illustrating the software configuration of the electronic device 100 according to an embodiment of the present disclosure. In FIG. 1B, the plurality of modules are shown inside the processor 120 to indicate that they have been loaded (or executed) by the processor 120 and are running on it; the modules themselves may be stored in advance in the memory 110.

Referring to FIG. 1B, the memory 110 may store input data and an artificial intelligence model that has been shift-scaled per channel and quantized per layer. The artificial intelligence model may include scale parameters and zero-point parameters.

The processor 120 may control the overall operation of the electronic device 100 by executing modules or instructions stored in the memory 110. Specifically, the processor 120 may read and interpret a module or instruction, determine a sequence for data processing, and accordingly transmit control signals that control the operation of other components such as the memory 110.

The processor 120 may apply the input data to the quantized artificial intelligence model by executing the neural network operation module and the per-channel inverse scaling module. In doing so, the processor 120 may perform a neural network operation on the input data and may obtain a composite scale parameter for inverse scaling of the computation result of each channel. The neural network operation module and the per-channel inverse scaling module may be physically implemented as a single module or as separate modules.

For example, the processor 120 may operate on input data or feature map data with the weight values of the corresponding channel, and then operate on the result with the composite scale parameter that has been inverse-scaled based on the corresponding shift scaling factor. Specifically, the processor 120 may operate on the input data with the weight values of each of a plurality of first channels included in a first layer, and operate on each of those results with the composite scale parameter inverse-scaled based on the shift scaling factor corresponding to each first channel. The processor 120 may then operate on the feature map data output from the first layer with the weight values of each of a plurality of second channels included in a second layer following the first layer, and operate on each of those results with the composite scale parameter inverse-scaled based on the shift scaling factor corresponding to each second channel. The inverse scaling operation of the processor 120 is implemented as a shift operation; the shift operation and the composite scale parameter are described later.

First, the weight values included in the artificial intelligence model are values obtained by shift scaling based on a shift scaling factor that differs for each channel of the model, followed by quantization per layer. To explain this, the data structure of the artificial intelligence model is described next.

The artificial intelligence model may include a scale parameter and a zero-point parameter corresponding to each of the plurality of layers. When a layer is quantized, the scale parameter represents the slope between the values before and after quantization, and the zero-point parameter represents the quantized value to which the pre-quantization value zero is mapped.

For example, if quantization is performed so that the minimum and maximum weight values of the first layer are mapped to 0 and 255, respectively, the result is a scale parameter and a zero-point parameter, which together describe the relationship between the minimum and maximum weight values and 0 and 255, and the quantized weight values. Here, the scale parameter is the slope of the relationship between the data before and after quantization, and the zero-point parameter is the quantized value that represents the real number 0.0, that is, the degree to which the relationship is offset from the origin.
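
The min/max mapping just described can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function names `quantize_layer` and `dequantize` are hypothetical, and the scale is assumed to be the real-valued step per integer level (the slope of real value versus quantized value).

```python
import numpy as np

def quantize_layer(w, num_bits=8):
    """Layer-wise affine quantization (assumed form): map [min, max] of w
    onto the integer range [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / qmax           # real-valued step per integer level
    zero_point = int(round(-w_min / scale))  # integer that represents real 0.0
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate real values."""
    return scale * (q.astype(np.float64) - zero_point)
```

The sketch assumes the layer contains both a minimum at or below zero and a maximum at or above zero, so that the zero point falls inside the integer range.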

In this way, a scale parameter and a zero-point parameter may be obtained for the weights of each layer. Scale parameters and zero-point parameters for the input and output of each layer may be obtained in the same way. The processor 120 may obtain the composite scale parameter by taking a value obtained from the scale parameter of the current layer, the scale parameter of the layer immediately preceding the current layer, and the scale parameter of the plurality of weight values, and inverse scaling that value based on the shift scaling factor corresponding to each channel.

For example, the processor 120 may obtain a quantized output value according to Equation 1 below. The derivation of Equation 1 is described later with reference to the drawings.

[Equation 1] (equation image not reproduced in this text)

Equation 1 is expressed in terms of the following quantities: the feature map data (or input data) output by the previous layer, the scale parameter of the previous layer, and the zero-point parameter of the previous layer; the quantized weight value, the scale parameter of the weight values, and the zero-point parameter of the weight values; and the output data of the current layer, the scale parameter of the current layer, and the zero-point parameter of the current layer. It also includes a quantized bias value, obtained by symmetric quantization of the floating-point bias at a given scale. The indices i and j denote the output channel and the input channel, respectively.
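
Because the equation image is missing here, the following is only a plausible reconstruction based on the standard affine quantization relation r = S(q − Z) and on the quantities listed above. All symbols are assumptions introduced for illustration: the subscripts in, w, and out refer to the previous layer, the weights, and the current layer, and the bias is assumed to be quantized at scale S_in·S_w.

$$
q_{\text{out}}^{(i)} = Z_{\text{out}} + \frac{S_{\text{in}}\, S_{w}}{S_{\text{out}}} \left( \sum_{j} \bigl(q_{w}^{(i,j)} - Z_{w}\bigr)\bigl(q_{\text{in}}^{(j)} - Z_{\text{in}}\bigr) + q_{\text{bias}}^{(i)} \right)
$$

Under these assumptions the factor S_in·S_w / S_out is proportional to the scale parameters of the previous layer and of the weights and inversely proportional to the scale parameter of the current layer, which agrees with the description of the value from which the composite scale parameter is obtained.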

By executing the neural network operation module, the processor 120 may obtain from the memory 110 the weight values, the zero-point parameters (of the previous layer, the current layer, and the weight values), and the input data. By executing the per-channel inverse scaling module, it may obtain from the memory 110 the per-channel shift scaling factors and the scale parameters (of the previous layer, the current layer, and the weight values). The per-channel inverse scaling module may obtain the composite scale parameter by taking a value obtained from the scale parameter of the previous layer, the scale parameter of the current layer, and the scale parameter of the weight values, and inverse scaling that value based on the shift scaling factor corresponding to each channel. In other words, obtaining the composite scale parameter includes the inverse scaling, and the per-channel inverse scaling module may provide the composite scale parameter to the neural network operation module. The obtained value may be inversely proportional to the scale parameter of the current layer and proportional to the scale parameter of the immediately preceding layer and the scale parameter of the plurality of weight values.

That is, the processor 120 may convert the value computed from the scale parameters in Equation 1 into a multiplier-and-shift form, and may additionally apply the shift scaling factor corresponding to each channel to that form, thereby obtaining the inverse-scaled composite scale parameter. (The symbols for these expressions were embedded as images and are not reproduced in this text.)

In particular, because the processor 120 processes data in binary, applying the shift scaling factor can be implemented as a shift operation. That is, the processor 120 may obtain the composite scale parameter by shifting the obtained value based on the shift scaling factor corresponding to each channel. Since this merely adds a shift operation, it is easy to implement in hardware.
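
As a rough illustration of why this costs only a shift, the sketch below approximates a real-valued scale as an integer multiplier times a power of two and folds a per-channel shift into the same right shift. It is a sketch under stated assumptions, not the disclosed hardware: the names `to_multiplier_shift` and `requantize` are hypothetical, both shifts are treated as right-shift amounts (so the per-channel term is added), and rounding to nearest is an arbitrary choice.

```python
import math

def to_multiplier_shift(scale, bits=31):
    """Approximate a positive real scale as m * 2**(-shift), with integer m <= 2**bits."""
    mantissa, exponent = math.frexp(scale)   # scale == mantissa * 2**exponent, 0.5 <= mantissa < 1
    m = round(mantissa * (1 << bits))        # integer multiplier
    shift = bits - exponent                  # right-shift amount
    return m, shift

def requantize(acc, m, shift, s_i=0):
    """Apply m * 2**(-(shift + s_i)) to an integer accumulator using integer ops only.

    s_i is the per-channel shift; with s_i == 0 this is plain layer-wise rescaling.
    """
    total = shift + s_i
    return (acc * m + (1 << (total - 1))) >> total   # right shift with round-to-nearest
```

For example, an accumulator that is four times too large because its channel's weights were scaled by 2^2 is brought back to the same result by passing `s_i=2`, with no extra multiplication.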

By executing the neural network operation module, the processor 120 may perform the operation of Equation 1. Specifically, the processor 120 may perform the neural network operation based on the weight values, the zero-point parameters (of the previous layer, the current layer, and the weight values), the input data, and the composite scale parameter. The zero-point parameters of the previous layer and the current layer are fixed while the current layer is being computed, and the zero-point parameter of the weight values, which is the value used to quantize the weights, remains the same until the layer changes. The composite scale parameter obtained through the per-channel inverse scaling module during the neural network operation, by contrast, may differ for each channel.

Meanwhile, by executing the per-channel inverse scaling module, the processor 120 may perform inverse scaling when it identifies that the quantized artificial intelligence model includes shift scaling factors for the plurality of channels included in each of the plurality of layers. That is, the processor 120 may decide whether to perform the inverse scaling operation based on whether the artificial intelligence model includes per-channel shift scaling factors. Specifically, if the model includes per-channel shift scaling factors, the processor 120 performs inverse scaling through a shift operation; if it does not, the processor 120 may skip the shift operation. In other words, the processor 120 uses the inverse-scaled composite scale parameter in the neural network operation when the model includes per-channel shift scaling factors, and uses the composite scale parameter without inverse scaling when it does not.

Meanwhile, the electronic device 100 may be implemented as a neural processing unit (NPU). In this case, a cache memory or the like included in the NPU may operate as the memory 110, and a plurality of processing elements or the like included in the NPU may operate as the processor 120.

As described above, the processor 120 may perform the inverse scaling operation during the neural network operation, and the result of inverse scaling can be obtained merely by shifting some of the data, which makes on-device implementation easy.

In addition, scaling each channel during quantization of the artificial intelligence model keeps the data of each channel from being collapsed by quantization, which makes it possible to secure a certain level of accuracy.

Meanwhile, as described above, the inverse scaling operation may be implemented by a shifter. For example, as shown in FIG. 1D, the shifter may be implemented as a component inside the processor 120. That is, the per-channel inverse scaling module of FIG. 1B may be implemented as a shifter.

Alternatively, as shown in FIG. 1E, the shifter 130 may be implemented as a component outside the processor 120. In this case, the shifter 130 may receive the scale parameters and the per-channel shift scaling factors from the memory 110, and may obtain the composite scale parameter by taking a value obtained from the scale parameter of the previous layer, the scale parameter of the current layer, and the scale parameter of the weight values, and inverse scaling that value based on the shift scaling factor corresponding to each channel. The shifter 130 may then provide the composite scale parameter to the processor 120.

When the shifter is implemented as in FIG. 1E, the inverse scaling of the present disclosure is possible even with a conventional processor.

Hereinafter, various embodiments of the present disclosure are described in more detail with reference to the drawings.

FIG. 2 is a diagram illustrating the operation of the compiler 200 and the electronic device 100 according to an embodiment of the present disclosure.

The compiler 200 may scale the plurality of weight values included in an input model (Float Model File) based on a shift scaling factor that differs for each of the plurality of channels included in each of the plurality of layers, and may then quantize the weight values per layer.

The compiler 200 may include a parsing module that parses the input model, an Instruction Stream module that restructures the model into ops supported by a custom NPU, a WES (weight equalizing scaler) module, a quantizer module that quantizes Float parameters to IntN, an optimization module that optimizes (tiles) memory allocation and computation, and a Binarization module that produces a binary file.

In particular, the WES module may receive from the Instruction Stream module a graph file composed of NPU HW operations linked unidirectionally like a chain, together with parameters in Float32 format.

As shown in the upper part of FIG. 1C, the WES module may obtain the minimum and maximum values of the original parameters for each channel, use those per-channel minimum and maximum values to set a reference range that yields the smallest quantization error, and obtain the per-channel shift scaling factors (channel-wise shift scale values).

Specifically, the WES module may determine the range of the original parameters for each channel, set the range having the maximum value among them (channel 22 in the upper part of FIG. 1C) as the reference range, and obtain the shift scaling factor of each channel based on the reference range.

For example, the WES module may identify the weight value with the largest magnitude among the weight values included in a layer and obtain the shift scaling factor of each channel based on the identified weight value. In this case, the WES module may obtain the shift scaling factor of each channel based on the largest-magnitude weight value within that channel and the weight value identified for the layer.

Alternatively, the WES module may shift-scale each channel with respect to the overall range, sum the ratios of the resulting range values, and apply a gradient-descent method so that this sum increases, thereby obtaining the shift scaling factor of each channel.

Alternatively, the WES module may shift-scale each channel, quantize the result to integers, dequantize it back to floating point, sum the resulting quantization error over the channels, and obtain the shift scaling factor of each channel so that this sum is minimized.

The WES module may obtain Si by dividing the reference range by the range of each channel, taking the base-2 logarithm of the quotient, and rounding the result down. Because 2^Si can be handled as a simple bit shift (<< Si) in hardware that operates on binary numbers, hardware implementation is easy.
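
The rule above (divide the reference range by each channel's range, take the base-2 logarithm, round down) can be sketched as follows. This is an illustrative sketch only: `shift_scaling_factors` is a hypothetical name, "range" is assumed to mean maximum minus minimum, the reference range is assumed to be the widest channel range, and the clip to 15 reflects the 4-bit Si mentioned elsewhere in this description.

```python
import numpy as np

def shift_scaling_factors(weights, max_shift=15):
    """Return an integer S_i per output channel for weights shaped (out_channels, ...).

    Assumes every channel has a nonzero range.
    """
    flat = weights.reshape(weights.shape[0], -1)
    ch_range = flat.max(axis=1) - flat.min(axis=1)   # per-channel range
    ref_range = ch_range.max()                       # reference: widest channel
    s = np.floor(np.log2(ref_range / ch_range)).astype(np.int64)
    return np.clip(s, 0, max_shift)                  # keep S_i within a 4-bit field
```

Multiplying channel i by 2^Si then widens its range to more than half of the reference range without exceeding it, since Si is the largest integer power of two that fits.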

As shown in the lower part of FIG. 1C, the parameters (wi) updated by the per-channel scaling of the form 2^Si match the minimum-to-maximum range of the whole layer as closely as possible, giving a range that is well suited to layer-wise quantization. The same 2^Si scale may also be applied to the bias (bi).

Thereafter, the quantization module may perform layer-wise linear quantization on the convolution layer whose parameter scale has been adjusted.
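
To make the benefit concrete, the following sketch compares the weight error of plain layer-wise quantization with that of shift scaling followed by layer-wise quantization, on a made-up two-channel layer. The helper `layerwise_roundtrip`, the example weights, and the chosen shift factors are all assumptions for illustration.

```python
import numpy as np

def layerwise_roundtrip(w, qmax=255):
    """Quantize all of w with one scale and zero point, then dequantize."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return scale * (q - zero_point)

w = np.array([[-4.0, 1.0, 3.5],       # channel 0: wide range
              [-0.05, 0.01, 0.03]])   # channel 1: narrow range
s_i = np.array([0, 6])                # assumed per-channel shift factors
factor = (2.0 ** s_i)[:, None]

plain = layerwise_roundtrip(w)                      # layer-wise quantization only
scaled = layerwise_roundtrip(w * factor) / factor   # shift scale, quantize, inverse scale

err_plain = np.abs(plain - w).max(axis=1)    # worst-case weight error per channel
err_scaled = np.abs(scaled - w).max(axis=1)
```

In this toy layer the narrow channel occupies only a few quantization levels without scaling, so its error drops substantially once it is scaled by 2^6 before quantization, while the wide channel is unaffected.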

Meanwhile, the electronic device (NPU HW, 100) may include an ALU module responsible for fixed computing, a cache memory that stores the parameters and the input/output feature maps needed for the operations of each cycle, and a memory that shares all parameters and feature maps.

Here, in accordance with the operation of the WES module of the compiler 200, the ALU module may be structurally changed into a Fixed Computing ALU w/ Ch-wise Shift Scaling, as described in more detail with reference to the following drawings.

FIG. 3 is a flowchart illustrating methods of obtaining the shift scaling factor according to various embodiments of the present disclosure.

First, the WES module may determine the range of the original parameters for each channel (S310). The WES module may then select the overall range (S320), obtain the per-channel shift scaling factors based on the overall range and the per-channel ranges, and perform per-channel shift scaling based on the obtained factors (S330).

After the per-channel shift scaling, the WES module may additionally perform layer-wise quantization (S331), calculate the quantization error (S332), re-obtain the per-channel shift scaling factors based on the quantization error, and perform the per-channel shift scaling again based on the re-obtained factors (S390).

Alternatively, after the per-channel shift scaling, the WES module may additionally define a cost function over the overall range (S340), perform per-channel shift scaling based on the per-channel shift scaling factors (S350), quantize the result (S360), and calculate the quantization error (S370). If the quantization error converges to a preset value, the WES module may fix the per-channel shift scaling factors and perform per-channel scaling based on the fixed factors (S390). If the quantization error does not converge to the preset value, the WES module may sum the ratios of the range values changed by the scaling and readjust the per-channel shift scaling factors by applying a gradient-descent method so that this sum increases.

In this manner, the WES module may obtain the per-channel shift scaling factors.

FIGS. 4A and 4B are diagrams illustrating the inverse scaling operation according to an embodiment of the present disclosure.

As shown in FIG. 4A, the convolution operation and the scaler on the left may be implemented as a single component, as on the right. The configuration on the left is described first.

The fixed computing ALU, which performs INT operations, may load the INTN-quantized input data, the weight values, the scale parameters, and the zero-point parameters from memory. Here, the scale parameters and zero-point parameters may include those of the current layer, of the layer immediately preceding the current layer, and of the plurality of weight values.

The scale parameter is the slope of the relationship between the data before and after quantization, and the zero-point parameter is the quantized value that represents the real number 0.0, that is, the degree to which the relationship is offset from the origin. For example, as shown in FIG. 4B, the maximum value (max) on the real axis may be quantized to 255 and the minimum value (min) on the real axis may be quantized to 0; the slope of this mapping is the scale parameter. The value 0 on the real axis is quantized to z, and this z is the zero-point parameter.

In this way, a scale parameter and a zero-point parameter may be obtained for each layer.

The fixed computing ALU may obtain an output value from the quantized input values and parameters in the fixed computing convolution layer, as in Equation 1 above. Here, the scale parameters of the previous layer, the current layer, and the weight values are float scale values used for int-to-float conversion, and they may differ from layer to layer. The fixed computing ALU may convert the value computed from these scale parameters into a form consisting of a multiplier M (mantissa) and a shift amount (exponent). (The symbols for these expressions were embedded as images and are not reproduced in this text.)

The fixed computing ALU may additionally load from memory, for each channel, the integer Si of 2^Si, expressed in 4 bits. The per-channel integer Si may be received from the compiler 200 and stored in the memory 110. The fixed computing ALU may then apply the shift scaling factor corresponding to each channel to the multiplier-and-shift form of the composite scale, thereby obtaining the inverse-scaled composite scale parameter, and may additionally perform the inverse scaling operation during the convolution operation, as shown on the right of FIG. 4A. In particular, the inverse scaling operation can be processed as a per-channel shift in the opposite direction, which makes hardware implementation easy.

In the above, the compiler 200 was described as providing the electronic device 100 with the information on the per-channel shift scaling factor as the integer Si of 2^Si, expressed in 4 bits. For example, in hardware that supports layer-wise quantization, the composite scale consists of a Multiplier (M, 32 bits) and a Shift amount (S, 6 bits) shared by all channels. In the method described above, the Multiplier (M) and the Shift amount (S) remain shared by all channels, while the Ch-wise shift scale (Si) is received from the compiler as an additional parameter in a separate 4-bit format. Inverse scaling may then be performed by scaling the output value by the Shift amount (S) minus the Ch-wise shift scale (Si) and multiplying by the Multiplier. Alternatively, a double shift operation may be performed in which the output value is shifted by the Shift amount, shifted again by the Ch-wise shift scale, and then multiplied by the Multiplier.

However, the disclosure is not limited thereto, and the compiler 200 may instead provide per-channel shift information to which the inverse scaling has already been applied. For example, the Multiplier (M) may be shared by all channels, while the compiler 200 provides the electronic device 100 with 6-bit per-channel information obtained by subtracting the Ch-wise shift scale (Si) of each channel from the channel-common Shift amount (S).

Meanwhile, although the compiler 200 and the electronic device 100 have been described above as separate devices, the two may be implemented as a single integrated device.

FIGS. 5A to 5C are diagrams for explaining the effects of the present disclosure.

As shown in FIG. 5A, the accuracy obtained when WES is used is quite close to the baseline accuracy without quantization and is higher than that of the conventional approaches.

On the other hand, as shown in FIG. 5B, the parameter size when WES is used is considerably smaller than with channel-wise quantization and only slightly larger than with layer-wise quantization.

FIG. 5C compares parameter sizes as a function of the number of channels and shows a result similar to that of FIG. 5B. That is, WES adds a small number of parameters compared with layer-wise quantization, but the accuracy gained in return approaches that of the unquantized model.
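The trend can be illustrated by counting only the scale-related bits per layer, using the bit widths stated in the description (32-bit M, 6-bit S, 4-bit Si). This is a rough sketch, not the data behind FIGS. 5B and 5C: weights and zero points are ignored, and the assumption that channel-wise quantization stores one (M, S) pair per channel is mine.

```python
MULTIPLIER_BITS = 32  # M, as stated in the description
SHIFT_BITS = 6        # S, as stated in the description
CH_SHIFT_BITS = 4     # Si, as stated in the description


def scale_param_bits(num_channels):
    """Count scale-related bits for one layer under three schemes."""
    layer_wise = MULTIPLIER_BITS + SHIFT_BITS
    # Assumption: channel-wise quantization keeps a full (M, S) pair per channel.
    channel_wise = num_channels * (MULTIPLIER_BITS + SHIFT_BITS)
    # WES: one shared (M, S) pair plus a 4-bit Si per channel.
    wes = layer_wise + num_channels * CH_SHIFT_BITS
    return {"layer_wise": layer_wise, "wes": wes, "channel_wise": channel_wise}


bits = scale_param_bits(64)  # hypothetical layer with 64 channels
```

For 64 channels this count gives 38 bits for layer-wise, 294 bits for WES, and 2432 bits for channel-wise quantization.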

FIG. 6 is a flowchart illustrating a control method of an electronic device according to an embodiment of the present disclosure.

First, input data is received (S610). Then, in the course of a neural network operation on the input data using the artificial intelligence model, the per-channel operation result of each of the plurality of layers constituting the model is operated on with a composite scale parameter that has been inversely scaled based on the shift scaling factor corresponding to each channel (S620). Here, the artificial intelligence model may include a plurality of weight values that are scaled based on a different shift scaling factor for each of the plurality of channels included in each of the plurality of layers and are quantized layer by layer.
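The effect of scaling a channel before layer-wise quantization and undoing the scaling afterwards can be seen in a small numeric sketch. Symmetric int8 rounding, the sample weights, the shift value 4, and the helper names are all assumptions made for illustration; they are not taken from the disclosure.

```python
def quantize_layer(weights, scale):
    """Symmetric int8 rounding with one scale for the whole layer (an assumption)."""
    return [[max(-127, min(127, round(w / scale))) for w in ch] for ch in weights]


def channel_errors(weights, quantized, scale, shifts):
    """Largest reconstruction error per channel after undoing the 2**Si scaling."""
    return [
        max(abs(w - qw * scale / (1 << s)) for w, qw in zip(ch, qch))
        for ch, qch, s in zip(weights, quantized, shifts)
    ]


weights = [[0.9, -0.5], [0.05, -0.03]]  # hypothetical layer with two channels
scale = 0.9 / 127                        # layer-wise scale from the largest magnitude

# Layer-wise quantization only: every channel shift is 0.
plain_q = quantize_layer(weights, scale)
plain_err = channel_errors(weights, plain_q, scale, [0, 0])

# Scale the small channel by 2**4 before quantizing, then undo it afterwards.
shifts = [0, 4]
scaled = [[w * (1 << s) for w in ch] for ch, s in zip(weights, shifts)]
wes_q = quantize_layer(scaled, scale)
wes_err = channel_errors(weights, wes_q, scale, shifts)
```

In this example the small-magnitude channel is reconstructed with a noticeably smaller error when it is shifted up before quantization, while the large channel is unaffected.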

In addition, the artificial intelligence model may include the plurality of quantized weight values, the shift scaling factor for each of the plurality of channels included in each of the plurality of layers, and a scale parameter and a zero point parameter corresponding to each of the plurality of layers. Here, when each of the plurality of layers is quantized, the scale parameter represents the slope between the values before quantization and the values after quantization, and the zero point parameter represents the post-quantization value of the pre-quantization zero value.
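The roles of the two parameters can be sketched with a common affine quantization scheme. The unsigned 8-bit target range, the value range [-1.0, 3.0], and the helper names are assumptions chosen for illustration.

```python
def affine_params(x_min, x_max, q_min=0, q_max=255):
    """Derive a scale (slope) and zero point for an assumed unsigned 8-bit range."""
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = round(q_min - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=0, q_max=255):
    """Map a real value to its quantized integer, clamped to the target range."""
    return max(q_min, min(q_max, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a real value."""
    return (q - zero_point) * scale


scale, zero_point = affine_params(-1.0, 3.0)
q_zero = quantize(0.0, scale, zero_point)
```

Here the real value 0.0 quantizes exactly to the zero point, and the scale is the step in real value per quantized unit.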

In the calculating step (S620), the composite scale parameter may be obtained by inversely scaling, based on the shift scaling factor corresponding to each channel, a value obtained from the scale parameter of the current layer, the scale parameter of the layer immediately preceding the current layer, and the scale parameter of the plurality of weight values.

In addition, in the calculating step (S620), the composite scale parameter may be obtained by shifting the obtained value based on the shift scaling factor corresponding to each channel.

Here, the obtained value may be inversely proportional to the scale parameter of the current layer and proportional to the scale parameter of the layer immediately preceding the current layer and to the scale parameter of the plurality of weight values.
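One form consistent with these proportionality statements is a simple product and quotient of the three scales, followed by a per-channel division by 2^Si. The exact formula, the direction of the per-channel correction, and the function names are assumptions of this sketch.

```python
def composite_scale(prev_scale, weight_scale, cur_scale):
    """Proportional to the previous-layer and weight scales, inverse to the current one."""
    return prev_scale * weight_scale / cur_scale


def descaled_composite(prev_scale, weight_scale, cur_scale, ch_shift):
    """Undo a per-channel weight scaling of 2**Si (the direction is an assumption)."""
    return composite_scale(prev_scale, weight_scale, cur_scale) / (1 << ch_shift)


# Hypothetical scales: previous layer 0.5, weights 0.25, current layer 2.0.
base = composite_scale(0.5, 0.25, 2.0)
per_channel = [descaled_composite(0.5, 0.25, 2.0, si) for si in (0, 1, 3)]
```

Because the correction is a power of two, each channel's composite scale differs from the shared value only by a shift.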

In the calculating step (S620), the operation may be performed when it is identified that the shift scaling factors for the plurality of channels included in each of the plurality of layers are included in the artificial intelligence model.

Meanwhile, the electronic device may be implemented as a neural processing unit (NPU).

In addition, the shift scaling factor corresponding to each channel may be determined based on the weight values included in that channel and the weight values included in the layer that contains the channel.

Here, the shift scaling factor corresponding to each channel may be determined based on the weight value with the largest magnitude in that channel and the weight value with the largest magnitude in the layer that contains the channel.
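One plausible way to derive Si from these two maxima is to take the largest power of two that keeps the scaled channel within the layer's range. This passage only says the factor is "based on" the two maxima, so the floor-of-log2 rule, the clamp to 4 bits, and the function name below are assumptions.

```python
import math


def shift_scaling_factor(channel_weights, layer_max, bits=4):
    """Pick Si so that channel_max * 2**Si does not exceed layer_max (assumed rule)."""
    ch_max = max(abs(w) for w in channel_weights)
    if ch_max == 0:
        return 0  # an all-zero channel needs no scaling
    si = math.floor(math.log2(layer_max / ch_max))
    return max(0, min((1 << bits) - 1, si))  # clamp to the 4-bit range 0..15


layer = [[0.9, -0.5], [0.05, -0.03], [0.2, 0.1]]  # hypothetical weights
layer_max = max(abs(w) for ch in layer for w in ch)
shifts = [shift_scaling_factor(ch, layer_max) for ch in layer]
```

For these sample weights the channel holding the layer maximum gets Si = 0, while the smaller channels are shifted up by 4 and 2 bits.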

Meanwhile, according to an embodiment of the present disclosure, the various embodiments described above may be implemented as software including instructions stored in a machine-readable storage medium readable by a machine (e.g., a computer). The machine is a device that can call the stored instructions from the storage medium and operate according to the called instructions, and may include an electronic device (e.g., electronic device (A)) according to the disclosed embodiments. When an instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly, or by using other components under the control of the processor. The instructions may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' means only that the storage medium does not include a signal and is tangible; it does not distinguish whether data is stored in the storage medium semi-permanently or temporarily.

Also, according to an embodiment of the present disclosure, the methods according to the various embodiments described above may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)) or online through an application store (e.g., Play Store™). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored, or temporarily generated, in a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

Also, according to an embodiment of the present disclosure, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device, using software, hardware, or a combination thereof. In some cases, the embodiments described herein may be implemented by the processor itself. In a software implementation, embodiments such as the procedures and functions described herein may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described herein.

Meanwhile, computer instructions for performing the processing operations of the device according to the various embodiments described above may be stored in a non-transitory computer-readable medium. When executed by the processor of a specific device, the computer instructions stored in the non-transitory computer-readable medium cause that device to perform the processing operations according to the various embodiments described above. A non-transitory computer-readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specific examples of the non-transitory computer-readable medium include a CD, DVD, hard disk, Blu-ray disc, USB, memory card, and ROM.

In addition, each of the components (e.g., a module or a program) according to the various embodiments described above may consist of a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity that performs, identically or similarly, the functions performed by each corresponding component before integration. Operations performed by a module, program, or other component according to the various embodiments may be executed sequentially, in parallel, repetitively, or heuristically; at least some operations may be executed in a different order or omitted, or other operations may be added.

Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described. Various modifications may of course be made by those of ordinary skill in the art to which the disclosure pertains without departing from the gist of the disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or prospect of the present disclosure.

100: electronic device 110: memory
120: processor 200: compiler

Claims

An electronic device comprising:
a memory storing an artificial intelligence model composed of a plurality of layers; and
a processor,
wherein the artificial intelligence model includes a plurality of weight values that are scaled based on a different shift scaling factor for each of a plurality of channels included in each of the plurality of layers and are quantized for each of the plurality of layers, and
wherein the processor is configured to, when input data is received, perform, in a neural network operation on the input data, an operation between an operation result for each channel and a composite scale parameter inversely scaled based on the shift scaling factor corresponding to each channel.

The electronic device of claim 1,
wherein the artificial intelligence model includes the plurality of quantized weight values, the shift scaling factor for each of the plurality of channels included in each of the plurality of layers, and a scale parameter and a zero point parameter corresponding to each of the plurality of layers,
wherein the scale parameter indicates, when each of the plurality of layers is quantized, a slope between a value before quantization and a value after quantization, and
wherein the zero point parameter indicates, when each of the plurality of layers is quantized, the value after quantization of a zero value before quantization.

The electronic device of claim 2,
wherein the processor is configured to obtain the composite scale parameter by inversely scaling, based on the shift scaling factor corresponding to each channel, a value obtained based on a scale parameter of a current layer, a scale parameter of a layer immediately preceding the current layer, and a scale parameter of the plurality of weight values.

The electronic device of claim 3,
wherein the processor is configured to obtain the composite scale parameter by shifting the obtained value based on the shift scaling factor corresponding to each channel.

The electronic device of claim 3,
wherein the obtained value is inversely proportional to the scale parameter of the current layer and proportional to the scale parameter of the layer immediately preceding the current layer and the scale parameter of the plurality of weight values.

The electronic device of claim 2,
wherein the processor is configured to perform the inverse scaling when it is identified that the shift scaling factors for the plurality of channels included in each of the plurality of layers are included in the artificial intelligence model.

The electronic device of claim 1,
wherein the electronic device is implemented as a neural processing unit (NPU).

The electronic device of claim 1,
wherein the shift scaling factor corresponding to each channel is determined based on a weight value included in each channel and a weight value included in a layer including each channel.

The electronic device of claim 8,
wherein the shift scaling factor corresponding to each channel is determined based on a weight value having the largest magnitude in each channel and a weight value having the largest magnitude in the layer including each channel.

A method for controlling an electronic device, comprising:
receiving input data; and
performing, in a neural network operation on the input data using an artificial intelligence model, an operation between an operation result for each channel of each of a plurality of layers constituting the artificial intelligence model and a composite scale parameter inversely scaled based on a shift scaling factor corresponding to each channel,
wherein the artificial intelligence model includes a plurality of weight values that are scaled based on a different shift scaling factor for each of a plurality of channels included in each of the plurality of layers and are quantized for each of the plurality of layers.

The method of claim 10,
wherein the artificial intelligence model includes the plurality of quantized weight values, the shift scaling factor for each of the plurality of channels included in each of the plurality of layers, and a scale parameter and a zero point parameter corresponding to each of the plurality of layers,
wherein the scale parameter indicates, when each of the plurality of layers is quantized, a slope between a value before quantization and a value after quantization, and
wherein the zero point parameter indicates, when each of the plurality of layers is quantized, the value after quantization of a zero value before quantization.

The method of claim 11,
wherein the calculating comprises obtaining the composite scale parameter by inversely scaling, based on the shift scaling factor corresponding to each channel, a value obtained based on a scale parameter of a current layer, a scale parameter of a layer immediately preceding the current layer, and a scale parameter of the plurality of weight values.

The method of claim 12,
wherein the calculating comprises obtaining the composite scale parameter by shifting the obtained value based on the shift scaling factor corresponding to each channel.

The method of claim 12,
wherein the obtained value is inversely proportional to the scale parameter of the current layer and proportional to the scale parameter of the layer immediately preceding the current layer and the scale parameter of the plurality of weight values.

The method of claim 11,
wherein the calculating comprises performing the operation when it is identified that the shift scaling factors for the plurality of channels included in each of the plurality of layers are included in the artificial intelligence model.

The method of claim 10,
wherein the electronic device is implemented as a neural processing unit (NPU).

The method of claim 10,
wherein the shift scaling factor corresponding to each channel is determined based on a weight value included in each channel and a weight value included in a layer including each channel.

The method of claim 17,
wherein the shift scaling factor corresponding to each channel is determined based on a weight value having the largest magnitude in each channel and a weight value having the largest magnitude in the layer including each channel.