KR20210035702A

KR20210035702A - Method of artificial neural network quantization and method of computation using artificial neural network

Info

Publication number: KR20210035702A
Application number: KR1020200029807A
Authority: KR
Inventors: 박주희; 김경영; 하상혁
Original assignee: 삼성전자주식회사
Priority date: 2019-09-24
Filing date: 2020-03-10
Publication date: 2021-04-01

Abstract

According to a technical idea of the present disclosure, a computing system comprises a neural network system for driving an artificial neural network and a quantization system for quantizing the artificial neural network. The quantization system can quantize parameters of an artificial neural network to generate quantized parameters of the artificial neural network, generate a quantization error of parameters of the artificial neural network based on the parameters of the artificial neural network and the quantized parameters, generate a correction bias based on the quantized parameters and the quantization error of the parameters of the artificial neural network, and transmit the generated quantized parameters and the correction bias to the neural network system.

Description

Quantization method of artificial neural network and computation method using artificial neural network {METHOD OF ARTIFICIAL NEURAL NETWORK QUANTIZATION AND METHOD OF COMPUTATION USING ARTIFICIAL NEURAL NETWORK}

The technical idea of the present disclosure relates to a method of quantizing an artificial neural network and a method of computation using an artificial neural network. More specifically, it relates to a quantization method and a computation method in which an expected value of the error occurring in the quantization process is generated as a correction bias, and the generated correction bias is reflected in the result of a computation performed through the quantized artificial neural network.

An artificial neural network may refer to a computing device, or a method performed by a computing device, for implementing interconnected sets of artificial neurons (or neuron models). An artificial neuron may generate output data by performing simple operations on input data, and the output data may be passed to other artificial neurons. As an example of an artificial neural network, a deep neural network, or deep learning, may have a multi-layer structure.

The technical idea of the present disclosure provides a method of quantizing an artificial neural network and a method of computation using an artificial neural network, in which an expected value of the error occurring in the quantization process is generated as a correction bias and the generated correction bias is reflected in the result of a computation performed through the quantized artificial neural network.

In order to achieve the above object, a computing system according to an aspect of the technical idea of the present disclosure includes a neural network system that drives an artificial neural network and a quantization system that quantizes the artificial neural network. The quantization system may generate quantized parameters of the artificial neural network by quantizing parameters of the artificial neural network, generate a quantization error of the parameters based on the parameters and the quantized parameters, generate a correction bias based on the quantized parameters and the quantization error of the parameters, and transmit the generated quantized parameters and the correction bias to the neural network system.

A method of computation using an artificial neural network according to an aspect of the technical idea of the present disclosure may include quantizing a weight and a bias of the artificial neural network, correcting the quantized bias so that it includes an error due to quantization, quantizing an input sample, performing a first multiply-accumulate (MAC) operation based on the quantized weight of the artificial neural network and the quantized input sample, and reflecting the corrected quantized bias in the result of the first MAC operation.

A method of quantizing an artificial neural network according to an aspect of the technical idea of the present disclosure may include quantizing parameters of the artificial neural network, calculating a quantization error of the parameters based on the parameters of the artificial neural network and the quantized parameters, and generating a correction bias based on the quantized parameters and the quantization error of the parameters.

According to the quantization method and the computation method of embodiments of the present disclosure, an expected value of the error occurring in the quantization process is generated as a correction bias, and the generated correction bias is reflected in the result of a computation performed through the quantized artificial neural network. The quantized artificial neural network can thereby have reduced complexity while also retaining good performance owing to the correction bias.

FIG. 1 is a diagram illustrating an artificial neural network according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a computing system according to an embodiment of the present disclosure.
FIG. 3 is a flowchart illustrating operations of a neural network system, a parameter quantizer, a sample quantizer, and a bias corrector according to an embodiment of the present disclosure.
FIG. 4 is a diagram for describing the architecture of a computational graph according to an embodiment of the present disclosure.
FIG. 5 is a diagram for describing an operation in which a neural network system reflects an expected value of a quantization error in a quantized computation according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating a method of generating a correction bias according to an embodiment of the present disclosure.
FIG. 7 is a diagram illustrating a method of generating a correction bias according to an embodiment of the present disclosure.
FIG. 8 is a diagram for describing an operation in which a neural network system reflects an expected value of a quantization error in a quantized computation according to an embodiment of the present disclosure.
FIG. 9 is a flowchart illustrating a method of determining a reference sample, a quantized reference sample, and a quantization error of the reference sample according to an embodiment of the present disclosure.
FIG. 10 is a diagram illustrating a computing system according to an embodiment of the present disclosure.
FIG. 11 is a diagram illustrating a method of predicting a next input sample according to an embodiment of the present disclosure.
FIG. 12 is a diagram illustrating a method of predicting a next input sample according to an embodiment of the present disclosure.
FIG. 13 is a flowchart illustrating a method of determining a reference sample, a quantized reference sample, and a quantization error of the reference sample according to an embodiment of the present disclosure.
FIG. 14 is a diagram illustrating a method of performing a computation on the next input sample in sequence through a quantized artificial neural network according to an embodiment of the present disclosure.
FIG. 15 is a flowchart illustrating a method of computation using an artificial neural network according to an embodiment of the present disclosure.
FIG. 16 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.
FIG. 17 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.

FIG. 1 is a diagram illustrating an artificial neural network according to an embodiment of the present disclosure. Specifically, FIG. 1 schematically illustrates the structure of a deep neural network 10 as an example of an artificial neural network according to an embodiment of the present disclosure.

An artificial neural network (ANN) may refer to a computing system inspired by the biological neural networks that make up an animal's brain. Unlike classical algorithms that perform tasks according to predefined conditions, such as rule-based programming, an ANN may learn to perform a task by considering a large number of samples (or examples). An ANN may have a structure in which artificial neurons (or neurons) are connected, and a connection between neurons may be referred to as a synapse. A neuron may process a received signal and transmit the processed signal to other neurons through synapses. The output of a neuron may be referred to as an "activation". Neurons and/or synapses may have weights that can vary, and the influence of a signal processed by a neuron may increase or decrease depending on the weight. In particular, a weight associated with an individual neuron may be referred to as a bias.

A deep neural network (DNN), or deep learning architecture, may have a layered structure in which the output of one layer becomes the input of the subsequent layer. In such a multi-layered structure, each of the layers may be trained on a large number of samples. An artificial neural network such as a deep neural network may be implemented by a plurality of processing nodes, each corresponding to an artificial neuron. Obtaining good results, for example highly accurate results, may require high computational complexity and therefore many computing resources.

Referring to FIG. 1, the deep neural network 10 may include a plurality of layers (L1, L2, L3, ..., LN), and the output of a layer may be input to the subsequent layer through at least one channel. For example, the first layer L1 may process a sample SAM and provide its output to the second layer L2 through a plurality of channels CH11...CH1x, and the second layer L2 may likewise provide its output to the third layer L3 through a plurality of channels CH21...CH2y. Finally, the N-th layer LN may output a result RES, and the result RES may include at least one value related to the sample SAM. The numbers of channels through which the outputs of the respective layers (L1, L2, L3, ..., LN) are delivered may be the same or different. For example, the number of channels CH21...CH2y of the second layer L2 and the number of channels CH31...CH3z of the third layer L3 may be the same or different.

The sample SAM may be input data processed by the deep neural network 10. For example, the sample SAM may be an image containing a character handwritten with a pen, and the deep neural network 10 may recognize the character from the image and output a result RES including a value representing the character. The result RES may include a plurality of probabilities corresponding to different characters, and the most likely character among them may correspond to the highest probability. Each of the layers (L1, L2, L3, ..., LN) of the deep neural network 10 may generate its own outputs by processing the sample SAM or the output of the previous layer based on values, such as weights and biases, generated by learning from a large number of images containing characters.
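As a small illustration of how a result RES made of per-character probabilities maps to a recognized character, the sketch below picks the entry with the highest probability. The dictionary layout, the name `most_likely`, and the example values are assumptions made for this example and do not come from the disclosure.

```python
def most_likely(result):
    """Return the character whose probability is highest.

    `result` is a hypothetical dict mapping each candidate character
    to the probability assigned by the final layer.
    """
    return max(result, key=result.get)


# Hypothetical result RES for one handwritten-character image.
RES = {"a": 0.07, "b": 0.85, "c": 0.08}
recognized = most_likely(RES)
```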

Depending on the embodiment, the deep neural network 10 may include a large number of layers or channels, and its computational complexity may increase accordingly. A deep neural network 10 with high computational complexity may require many resources. Therefore, the deep neural network 10 may be quantized in order to reduce its computational complexity. Quantization of the deep neural network 10 may refer to a process of mapping input values onto a set containing fewer distinct values than the inputs can take, for example mapping real numbers to integers by rounding. A quantized deep neural network 10 may have low computational complexity, but its accuracy may be reduced by the error that occurs in the quantization process.
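The rounding-based mapping described above can be sketched as follows. This is a minimal illustration, not the quantizer of the disclosure: the `scale` parameter, the 8-bit signed range, and the clamping step are assumptions chosen for the example.

```python
def quantize(values, scale=1.0, qmin=-128, qmax=127):
    """Map real values to integers by scaling, rounding, and clamping.

    scale, qmin, and qmax are illustrative assumptions (an 8-bit signed
    integer grid); the disclosure does not fix a particular scheme.
    """
    out = []
    for v in values:
        q = round(v / scale)          # nearest integer level
        q = max(qmin, min(qmax, q))   # clamp to the representable range
        out.append(q)
    return out


reals = [0.26, -1.74, 3.1, 200.0]
ints = quantize(reals)  # many distinct reals collapse onto a few integers
```

The last value shows why quantization can cost accuracy: 200.0 does not fit in the assumed range and is clamped, and every other value loses its fractional part.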

As will be described below with reference to the drawings, the quantized deep neural network 10 according to an embodiment of the present disclosure may reflect an expected value of the error occurring in the quantization process in its computation result, and may thereby have reduced complexity while retaining good performance.

FIG. 2 is a diagram illustrating a computing system according to an embodiment of the present disclosure.

Referring to FIG. 2, the computing system 1000 may include a quantization system 100 and a neural network system 200. The neural network system 200 may provide an artificial neural network, and the quantization system 100 may quantize the artificial neural network provided by the neural network system 200 and provide the at least partially quantized artificial neural network back to the neural network system 200. Although FIG. 2 shows the neural network system 200 and the quantization system 100 as separate, they may be implemented as a single system depending on the embodiment.

The neural network system 200 may be any system that provides (or drives) an artificial neural network, and may also be referred to as a neural network device. For example, the neural network system 200 may be a computing system including at least one processor and a memory. As non-limiting examples, the neural network system 200 may be a stationary computing system such as a desktop computer or a server, or a mobile computing system such as a laptop computer or a smartphone.

In an embodiment, the neural network system 200 may drive an artificial neural network and provide information about the artificial neural network to the quantization system 100. In an embodiment, the neural network system 200 may drive the artificial neural network according to information provided by the quantization system 100, and may provide information about the driven artificial neural network to the quantization system 100.

The quantization system 100 may be any system that performs quantization of an artificial neural network, and may also be referred to as a quantization device. For example, the quantization system 100 may be a computing system including at least one processor and a memory. The quantization system 100 may be a stationary computing system or a mobile computing system. The quantization system 100 may quantize the artificial neural network based on information about the artificial neural network provided by the neural network system 200.

Referring to FIG. 2, the quantization system 100 may include a neural network interface 110, a parameter quantizer 120, a sample quantizer 130, and a bias corrector 140. In an embodiment, each of the neural network interface 110, the parameter quantizer 120, the sample quantizer 130, and the bias corrector 140 may be implemented as a logic block realized through logic synthesis, a software block executed by a processor, or a combination thereof. In an embodiment, each of these components may be a procedure, that is, a set of instructions executed by a processor, and may be stored in a memory accessible by the processor.

The neural network interface 110 may provide the parameter quantizer 120 and the sample quantizer 130 with an interface to the neural network system 200. For example, the neural network interface 110 may provide the parameters of the artificial neural network received from the neural network system 200 to the parameter quantizer 120, and may provide the quantized parameters received from the parameter quantizer 120 to the neural network system 200. The neural network interface 110 may also provide samples received from the neural network system 200 to the sample quantizer 130, and may provide the quantized samples received from the sample quantizer 130 to the neural network system 200. In addition, the neural network interface 110 may provide the correction bias received from the bias corrector 140 to the neural network system 200.

The parameter quantizer 120 may generate quantized parameters from the parameters received from the neural network system 200 through the neural network interface 110. For example, the parameter quantizer 120 may receive a weight and a bias as parameters of the artificial neural network and generate a quantized weight and a quantized bias. The parameter quantizer 120 may provide the quantized weight to the neural network system 200 through the neural network interface 110.

The parameter quantizer 120 may also use the quantized weight and the quantized bias to generate a quantization error of the weight and a quantization error of the bias. The quantization error of the weight may include the error that arises in the process of quantizing the weight of the artificial neural network. Because the unquantized weight equals the sum of the quantized weight and the quantization error of the weight, the parameter quantizer 120 may generate the quantization error of the weight from the received weight and the quantized weight. Likewise, the quantization error of the bias may include the error that arises in the process of quantizing the bias of the artificial neural network. Because the unquantized bias equals the sum of the quantized bias and the quantization error of the bias, the parameter quantizer 120 may generate the quantization error of the bias from the received bias and the quantized bias.
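The relationship used here, that an unquantized parameter equals its quantized value plus its quantization error, can be sketched as below. The uniform rounding grid and the `scale` value are assumptions made for illustration; only the subtraction that yields the error follows from the text.

```python
def quantize_with_error(params, scale):
    """Return (quantized values, quantization errors) for a list of parameters.

    The quantized values are kept in real units (integer level * scale),
    so that param == quantized + error holds term by term.
    The rounding grid is an illustrative assumption.
    """
    quantized = [round(p / scale) * scale for p in params]
    errors = [p - q for p, q in zip(params, quantized)]
    return quantized, errors


W = [0.30, -0.72, 1.05]                       # hypothetical weights
q_W, e_W = quantize_with_error(W, scale=0.25)

bias = [0.40]                                 # hypothetical bias
q_bias, e_bias = quantize_with_error(bias, scale=0.25)
```

With round-to-nearest, each error is at most half a quantization step in magnitude, which is what keeps a correction term small relative to the parameters themselves.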

The parameter quantizer 120 may provide the bias corrector 140 with the information used to generate a correction bias for correcting the error due to quantization. For example, the parameter quantizer 120 may provide the quantized weight, the quantized bias, the quantization error of the weight, and the quantization error of the bias to the bias corrector 140.

The sample quantizer 130 may generate quantized samples from the samples received from the neural network system 200 through the neural network interface 110. For example, the sample quantizer 130 may receive a plurality of images, voice data, and the like, and generate quantized images, quantized voice data, and the like. The sample quantizer 130 may provide the quantized samples to the neural network system 200 through the neural network interface 110. Because parameters and samples have different characteristics in an artificial neural network, the quantization of parameters and the quantization of samples may be performed separately.

The bias corrector 140 may use the information received from the parameter quantizer 120 to generate a correction bias for correcting the error due to quantization. In an embodiment, the bias corrector 140 may generate the correction bias by correcting the quantized bias so that it includes the error due to quantization. Examples of the operation of the bias corrector 140 for generating the correction bias will be described later with reference to FIGS. 6 and 7. The bias corrector 140 may provide the generated correction bias to the neural network system 200 through the neural network interface 110.

The neural network system 200 may perform a multiply-accumulate (MAC) operation based on the quantized weight received from the parameter quantizer 120 and the quantized sample received from the sample quantizer 130. The neural network system 200 may then generate a final computation result by reflecting the correction bias received from the bias corrector 140 in the result of the MAC operation.

FIG. 3 is a flowchart illustrating operations of a neural network system, a parameter quantizer, a sample quantizer, and a bias corrector according to an embodiment of the present disclosure. Specifically, FIG. 3 illustrates the operation of quantizing an artificial neural network and the operation of computing with the quantized artificial neural network, as performed by the neural network system 200, the parameter quantizer 120, the sample quantizer 130, and the bias corrector 140 of FIG. 2.

Referring to FIGS. 2 and 3, the neural network system 200 may include an artificial neural network (S100). The neural network system 200 may provide a weight (W) and a bias (bias), as parameters of the artificial neural network, to the parameter quantizer 120 together with a quantization request (S105). The parameter quantizer 120 may quantize the received weight (W) and bias (bias) (S110). The parameter quantizer 120 may provide the quantized weight (q_W) to the neural network system 200 (S115), and the neural network system 200 may store the received quantized weight (q_W). The parameter quantizer 120 may provide the quantized weight (q_W), the quantized bias (q_bias), the quantization error of the weight (e_W), and the quantization error of the bias (e_bias) to the bias corrector 140 (S120). The order of operations S115 and S120 may be changed.

The bias corrector 140 may generate a correction bias for correcting the error due to quantization based on the received information (S125). Specifically, the bias corrector 140 may generate the correction bias based on the received quantized weight (q_W), quantized bias (q_bias), quantization error of the weight (e_W), and quantization error of the bias (e_bias). Examples of the operation of the bias corrector 140 for generating the correction bias will be described later with reference to FIGS. 6 and 7. The bias corrector 140 may provide the generated correction bias (q_bias1) to the neural network system 200 (S130), and the neural network system 200 may store the received correction bias (q_bias1).
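To see why a bias-like term can compensate for quantization, it may help to expand a single layer output using the decomposition stated earlier (unquantized value = quantized value + quantization error). The expansion below is an illustrative derivation only: the symbols X, q_X, and e_X for the sample and its quantization error are assumptions carried over from later parts of the description, and the actual formula used to build the correction bias is the one described with reference to FIGS. 6 and 7, which is not reproduced here.

```latex
\begin{aligned}
Y &= W X + \mathit{bias} \\
  &= (q_W + e_W)(q_X + e_X) + (q_{\mathit{bias}} + e_{\mathit{bias}}) \\
  &= \underbrace{q_W\, q_X + q_{\mathit{bias}}}_{\text{quantized computation}}
   \;+\; \underbrace{q_W\, e_X + e_W\, q_X + e_W\, e_X + e_{\mathit{bias}}}_{\text{error introduced by quantization}}
\end{aligned}
```

A purely quantized computation keeps only the first group of terms. An estimate of the second group, such as its expected value, is the kind of quantity that a correction bias can carry.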

The neural network system 200 may receive a new sample (X) (S135). The neural network system 200 may provide the sample (X) to the sample quantizer 130 together with a quantization request (S140). The sample quantizer 130 may quantize the received sample (X) (S145) and provide the quantized sample (q_X) to the neural network system 200 (S150). The neural network system 200 may perform a MAC operation based on the quantized weight (q_W) and the quantized sample (q_X) (S155). The neural network system 200 may then generate a final computation result by reflecting the correction bias (q_bias1) in the result of the MAC operation (S160).
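Steps S145 to S160 can be sketched for a single output neuron as follows. This is a minimal illustration under stated assumptions: the correction bias q_bias1 is taken as already computed (its generation is described with FIGS. 6 and 7), all quantities live on the same integer grid, and the helper names and `x_scale` are hypothetical.

```python
def quantize(values, scale):
    """Round each value to the nearest integer level (illustrative scheme)."""
    return [round(v / scale) for v in values]


def mac(q_weights, q_inputs):
    """Multiply-accumulate over paired quantized weights and inputs."""
    acc = 0
    for w, x in zip(q_weights, q_inputs):
        acc += w * x
    return acc


def quantized_neuron(q_W, q_bias1, X, x_scale):
    """Quantize the sample, run the MAC, then add the correction bias."""
    q_X = quantize(X, x_scale)   # S145: quantize the incoming sample
    acc = mac(q_W, q_X)          # S155: MAC on quantized operands
    return acc + q_bias1         # S160: reflect the correction bias


# Hypothetical stored values (S115, S130) and a new sample (S135).
q_W = [2, -1, 3]
q_bias1 = 4
X = [0.9, 2.2, -1.1]
result = quantized_neuron(q_W, q_bias1, X, x_scale=1.0)
```

Because the correction bias is a stored constant, applying it costs one addition per output, so the MAC itself still runs entirely on quantized operands.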

FIG. 4 is a diagram for describing the architecture of a computational graph according to an embodiment of the present disclosure.

Referring to FIG. 4, the computational graph 20 is a graph representing a mathematical model expressed using nodes and edges. The architecture of the computational graph 20 may correspond to the architecture of an artificial neural network or a quantized artificial neural network. For example, the artificial neural network or the quantized artificial neural network may be implemented as a convolutional neural network (CNN), although the present disclosure is not limited thereto. When the quantized artificial neural network of FIG. 4 represents a convolutional neural network, the computational graph 20 may correspond to some of the layers of the convolutional neural network. For example, the computational graph 20 may correspond to a single layer of the convolutional neural network that performs MAC operations, such as a convolution layer or a fully connected layer. In the following, for convenience of explanation, it is assumed that the computational graph 20 is a convolution layer, and the way in which the neural network system 200 of FIG. 2 performs a MAC operation using quantized weights and quantized samples is described.

Referring to FIGS. 2 and 4, the neural network system 200 may provide the weight (W) of the artificial neural network to the parameter quantizer 120 together with a quantization request, and may receive the quantized weight (q_W) from the parameter quantizer 120. The neural network system 200 may also provide the received input sample (X) to the sample quantizer 130 together with a quantization request, and may receive the quantized input sample (q_X) from the sample quantizer 130. The neural network system 200 may then perform a MAC operation based on the quantized weight (q_W) and the quantized input sample (q_X), and may generate a quantized output sample (q_Y) by applying a bias (not shown) to the result of the MAC operation.

The quantized input sample (q_X) and the quantized output sample (q_Y) may each be a matrix of two or more dimensions and may have their own activation parameters. When the quantized input sample (q_X) and the quantized output sample (q_Y) correspond to, for example, three-dimensional matrices, they may have a width (W) (also called columns), a height (H) (also called rows), and a depth (D). Here, the depth (D) may be referred to as the number of channels.

In the convolution layer, a convolution operation may be performed on the quantized input sample (q_X) and the quantized weight (q_W), and a quantized output sample (q_Y) may be generated as a result. The quantized weight (q_W) filters the quantized input sample (q_X) and may be referred to as a filter or a kernel. The quantized weight (q_W) may have a kernel size (K) (that is, the size of the weight), and the depth of the quantized weight (q_W), that is, its number of channels, may be equal to the depth of the quantized input sample (q_X). The quantized weight (q_W) may be shifted across the quantized input sample (q_X) as a sliding window. At each shift, each of the weights included in the quantized weight (q_W) may be multiplied by the corresponding value in the region where it overlaps the quantized input sample (q_X), and the products may be summed. Convolving the quantized input sample (q_X) with the quantized weight (q_W) may generate one channel of the quantized output sample (q_Y). Although a single quantized weight (q_W) is shown in FIG. 1, in practice a plurality of quantized weights (q_W) may be convolved with the quantized input sample (q_X) to generate a plurality of channels of the quantized output sample (q_Y).
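The sliding-window MAC described above can be sketched as follows. This is a simplified illustration under stated assumptions: a single input channel, stride 1, no padding, and no kernel flip (cross-correlation, as convolution layers are commonly implemented). The function name and the sample values are hypothetical.

```python
def conv2d_single_channel(q_x, q_w):
    """Slide a K x K kernel over an H x W input and accumulate products at each shift."""
    height, width = len(q_x), len(q_x[0])
    k = len(q_w)
    out = []
    for i in range(height - k + 1):
        row = []
        for j in range(width - k + 1):
            acc = 0
            # Multiply every kernel weight by the overlapped input value, then sum.
            for di in range(k):
                for dj in range(k):
                    acc += q_w[di][dj] * q_x[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


q_X = [[1, 2, 0],
       [0, 1, 3],
       [2, 1, 1]]
q_W = [[1, 0],
       [0, -1]]
q_Y = conv2d_single_channel(q_X, q_W)   # one output channel
```

With several input channels, the same accumulation would additionally run over the channel dimension, and each additional kernel would produce one more output channel.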

Meanwhile, a MAC operation performed with a quantized artificial neural network and a quantized input may produce an error relative to a MAC operation performed with an unquantized artificial neural network and an unquantized input.

Specifically, the computation of the artificial neural network may be expressed as the following equation.

$$
\begin{aligned}
y &= \sum_{c=1}^{C}\sum_{k=1}^{K} (q_w + e_w)(q_x + e_x) + (q_{bias} + e_{bias}) \\
  &= \underbrace{\sum_{c=1}^{C}\sum_{k=1}^{K} q_w q_x + q_{bias}}_{\text{①}}
   + \underbrace{\sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w e_x + e_w q_x + e_w e_x \right) + e_{bias}}_{\text{②}}
\end{aligned}
$$

(C is the number of input channels, K is the kernel size, $q_w$ is the quantized weight (W), $e_w$ is the quantization error of the weight (W), $q_{bias}$ is the quantized bias, $e_{bias}$ is the quantization error of the bias, $q_x$ is the quantized input sample (X), and $e_x$ is the quantization error of the input sample (X).)

The computation of the artificial neural network can thus be separated into a quantized operation (①) and a quantization error (②). A conventional neural network system 200 concentrated on implementing a quantized operation (①) optimized to reduce the quantization errors of the weight (W) and the input sample (X), without directly considering the quantization error (②), that is, without reflecting the quantization error (②) in the computation of the artificial neural network. In contrast, the neural network system 200 according to an embodiment of the present disclosure calculates the expected value of the quantization error (②) and reflects the calculated expected value in the quantized operation (①), thereby increasing the accuracy of the quantized artificial neural network.
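The separation into a quantized operation and a quantization error can be checked numerically. The sketch below assumes a uniform quantizer with a fixed step and uses a one-dimensional MAC (a dot product); all names and values are hypothetical. The identity itself does not depend on the quantizer, because every value is written as its quantized part plus its error.

```python
STEP = 0.25  # assumed uniform quantization step


def split(value):
    """Return (quantized value, quantization error) so that value == q + e."""
    q = round(value / STEP) * STEP
    return q, value - q


def mac(a, b):
    return sum(x * y for x, y in zip(a, b))


W = [0.3, -0.8, 0.55]
X = [1.1, 0.4, -0.7]
bias = 0.2

q_W, e_W = zip(*(split(w) for w in W))
q_X, e_X = zip(*(split(x) for x in X))
q_bias, e_bias = split(bias)

full = mac(W, X) + bias                      # unquantized computation
quantized_op = mac(q_W, q_X) + q_bias        # the quantized operation
quant_error = (mac(q_W, e_X) + mac(e_W, q_X)
               + mac(e_W, e_X) + e_bias)     # the quantization error
```

Here `full` equals `quantized_op + quant_error` up to floating-point rounding, so compensating for `quant_error` recovers the unquantized result.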

Although FIG. 4 illustrates and describes the operation of a single layer of the artificial neural network for convenience of explanation, the operation according to the technical idea of the present disclosure may be applied in substantially the same way to each of the plurality of layers constituting the artificial neural network.

FIG. 5 is a diagram for describing an operation in which a neural network system according to an embodiment of the present disclosure reflects the expected value of a quantization error in a quantized operation.

Referring to FIGS. 2, 4, and 5, the neural network system 200 according to an embodiment of the present disclosure may perform a MAC operation based on the quantized input sample (q_X) and the quantized weight (q_W) received from the quantization system 100. The neural network system 200 may then generate the quantized output sample (q_Y) by applying to the MAC result the correction bias (q_bias1) generated by the bias corrector 140 of the quantization system 100, instead of the bias (q_bias) obtained by simply quantizing the bias of the artificial neural network. As described later with reference to FIGS. 6 and 7, the correction bias (q_bias1) may have the same size as the result of the MAC operation. The correction bias (q_bias1) can therefore be added to every value in the region where it overlaps the MAC result. Although FIG. 5 shows the quantized output sample (q_Y) for an arbitrary channel (channel k), in practice a quantized output sample (q_Y) may be generated for each channel.

The bias corrector 140 may generate a correction bias (q_bias1) that includes the quantization error in addition to the quantized bias (q_bias) of the artificial neural network. Specifically, the bias corrector 140 may generate the correction bias (q_bias1) so that it includes the term ③ in the following quantization equation of the artificial neural network.

$$
y = \sum_{c=1}^{C}\sum_{k=1}^{K} q_w q_x
  + \underbrace{\sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w e_x + e_w q_x + e_w e_x \right) + q_{bias} + e_{bias}}_{\text{③}}
$$

In the above quantization equation, the quantized weight $q_w$, the quantization error of the weight $e_w$, and the bias terms are values known in advance, because the parameter quantizer 120 has already quantized the artificial neural network. However, the quantized input sample $q_x$ and its quantization error $e_x$ cannot be known before the input sample (X) is processed. Therefore, the bias corrector 140 may generate the correction bias (q_bias1) using $q_{x'}$ and $e_{x'}$ of a reference sample (X') that stands in for the actual input sample (X), rather than $q_x$ and $e_x$ of the actual input sample (X).

For example, the sample quantizer 130 may quantize a plurality of samples selected from a sample pool, thereby generating a plurality of quantized samples and the quantization errors of those samples. The bias corrector 140 may generate $q_{x'}$ using the plurality of quantized samples and may generate $e_{x'}$ using the quantization errors of the plurality of samples. The bias corrector 140 may then generate the correction bias (q_bias1) using the generated $q_{x'}$ and $e_{x'}$. The specific operation of generating the correction bias (q_bias1) from a plurality of samples is described later with reference to FIG. 9.

As another example, the bias corrector 140 may generate $q_{x'}$ of the expected next input sample using input samples that have already been processed through the artificial neural network, and may generate $e_{x'}$ of the next input sample using the quantization errors of the already processed input samples. The bias corrector 140 may then generate the correction bias (q_bias1) using the generated $q_{x'}$ and $e_{x'}$. The specific operation of generating the correction bias (q_bias1) from already processed input samples is described later with reference to FIGS. 10 to 14.

FIG. 6 is a diagram illustrating a method of generating a correction bias according to an embodiment of the present disclosure. Specifically, FIG. 6 illustrates a method of generating a correction bias that includes both the quantized bias of the artificial neural network and the quantization error.

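The calculation of the correction bias may be expressed as the following equation.

$$
\begin{aligned}
q_{bias1} &= q_{bias} + E\!\left[ \sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w e_x + e_w q_x + e_w e_x \right) + e_{bias} \right] \\
          &= \sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w E[e_x] + e_w E[q_x + e_x] \right) + q_{bias} + e_{bias} \\
          &= \sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w E[e_x] + e_w E[x] \right) + bias
\end{aligned}
$$

(E is the expected value, $x = q_x + e_x$ is the input sample, and $bias = q_{bias} + e_{bias}$ is the bias of the artificial neural network.)

In the last expression above, $E[x]$ may denote the reference sample (X') rather than the actual input sample (X), and $E[e_x]$ may be generated using the reference sample (X'). The method of determining the reference sample (X') and the method of calculating the quantization error of the reference sample ($e_{x'}$) are described later with reference to FIGS. 9 and 13.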

Referring to FIG. 6, the last expression above may be implemented as an operation of the bias corrector 140. The bias corrector 140 may perform a first MAC operation based on the quantization error of the reference sample (e_X') and the quantized weight (q_W). The bias corrector 140 may perform a second MAC operation based on the reference sample (X') and the quantization error of the weight (e_W). The bias corrector 140 may then generate the correction bias (q_bias1) by adding the result of the first MAC operation, the result of the second MAC operation, and the bias of the artificial neural network (bias, which equals the sum of the quantized bias (q_bias) and the quantization error of the bias (e_bias)).
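The operation of FIG. 6 can be sketched in Python as two sliding-window MAC operations plus the bias. The uniform quantizer, the single-channel stride-1 window, and all names and values are assumptions made for illustration, not details taken from the disclosure.

```python
def split(grid, step=0.25):
    """Assumed uniform quantizer: return (quantized grid, error grid)."""
    q = [[round(v / step) * step for v in row] for row in grid]
    e = [[v - qv for v, qv in zip(row, q_row)] for row, q_row in zip(grid, q)]
    return q, e


def sliding_mac(x, w):
    """Single-channel, stride-1 sliding-window MAC of a K x K kernel over x."""
    k = len(w)
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(len(x[0]) - k + 1)]
            for i in range(len(x) - k + 1)]


def correction_bias_fig6(ref_x, ref_e_x, q_w, e_w, bias):
    first = sliding_mac(ref_e_x, q_w)   # first MAC: e_X' with q_W
    second = sliding_mac(ref_x, e_w)    # second MAC: X' with e_W
    return [[f + s + bias for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(first, second)]


W = [[0.3, -0.6], [0.9, 0.2]]                       # hypothetical weights
ref_X = [[1.1, 0.4, -0.7],
         [0.2, 0.85, 0.3],
         [-0.45, 0.6, 1.3]]                         # hypothetical reference sample
bias = 0.2
q_W, e_W = split(W)
q_ref_X, e_ref_X = split(ref_X)
q_bias1 = correction_bias_fig6(ref_X, e_ref_X, q_W, e_W, bias)
```

In the special case where the actual input equals the reference sample, adding `q_bias1` to the quantized MAC result reproduces the unquantized output. For other inputs the correction is only an estimate based on the reference sample.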

FIG. 7 is a diagram illustrating a method of generating a correction bias according to an embodiment of the present disclosure. Specifically, FIG. 7 illustrates a modified embodiment of FIG. 6.

The calculation of the correction bias of FIG. 6 may also be expressed as the following equation.

$$
\begin{aligned}
q_{bias1} &= \sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w E[e_x] + e_w E[x] \right) + bias \\
          &= \sum_{c=1}^{C}\sum_{k=1}^{K} \left( q_w E[e_x] + e_w E[q_x] + e_w E[e_x] \right) + bias \\
          &= \sum_{c=1}^{C}\sum_{k=1}^{K} \left( w E[e_x] + e_w E[q_x] \right) + bias
\end{aligned}
$$

In the last expression above, $w = q_w + e_w$ may denote the unquantized weight (W) of the artificial neural network. $E[q_x]$ and $E[e_x]$ may be generated using the reference sample (X') rather than the actual input sample (X). The method of determining the reference sample (X') and the method of calculating the quantized reference sample ($q_{x'}$) and the quantization error of the reference sample ($e_{x'}$) are described later with reference to FIGS. 9 and 13.

Referring to FIG. 7, the last expression above may be implemented as an operation of the bias corrector 140. The bias corrector 140 may perform a third MAC operation based on the quantization error of the reference sample (e_X') and the unquantized weight (W). The bias corrector 140 may perform a fourth MAC operation based on the quantized reference sample (q_X') and the quantization error of the weight (e_W). The bias corrector 140 may then generate the correction bias (q_bias1) by adding the result of the third MAC operation, the result of the fourth MAC operation, and the bias of the artificial neural network.

The correction biases (q_bias1) generated in FIGS. 6 and 7 are produced by different methods but may have the same value. The quantized input sample (q_X) of FIG. 5 and the reference sample (X'), the quantized reference sample (q_X'), and the quantization error of the reference sample (e_X') of FIGS. 6 and 7 may all have the same width (W) and the same height (H), and the quantized weight (q_W) of FIG. 5 and the quantized weight (q_W) and the quantization error of the weight (e_W) of FIGS. 6 and 7 may have the same kernel size (K). Accordingly, the size of the correction bias (q_bias1) generated in FIGS. 6 and 7 may be the same as the size of the result of the MAC operation on the quantized weight (q_W) and the quantized input sample (q_X) in one channel of FIG. 5. The generated correction bias (q_bias1) can therefore be added to every value in the region where it overlaps the result of the MAC operation on the quantized weight (q_W) and the quantized input sample (q_X).
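That the two generation methods yield the same value can be checked with a small numeric sketch. It uses a one-dimensional MAC (a dot product) and an assumed uniform quantizer; the names and values are hypothetical.

```python
def split(values, step=0.25):
    """Assumed uniform quantizer: return (quantized values, quantization errors)."""
    q = [round(v / step) * step for v in values]
    e = [v - qv for v, qv in zip(values, q)]
    return q, e


def mac(a, b):
    return sum(x * y for x, y in zip(a, b))


W = [0.3, -0.8, 0.55]        # hypothetical unquantized weights
ref_X = [1.1, 0.4, -0.7]     # hypothetical reference sample X'
bias = 0.2

q_W, e_W = split(W)
q_X, e_X = split(ref_X)

# FIG. 6: first MAC (e_X', q_W) + second MAC (X', e_W) + bias
fig6 = mac(q_W, e_X) + mac(e_W, ref_X) + bias
# FIG. 7: third MAC (e_X', W) + fourth MAC (q_X', e_W) + bias
fig7 = mac(W, e_X) + mac(e_W, q_X) + bias
```

Both expressions expand to the same three products (q_W with e_X', e_W with q_X', and e_W with e_X') plus the bias, so `fig6` and `fig7` agree up to floating-point rounding.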

FIG. 8 is a diagram for describing an operation in which a neural network system according to an embodiment of the present disclosure reflects the expected value of a quantization error in a quantized operation. Specifically, FIG. 8 illustrates a modified embodiment of FIG. 5.

As described above with reference to FIG. 7, the correction bias (q_bias1) may have the same size as the result of the MAC operation on the quantized weight (q_W) and the quantized input sample (q_X) in one channel. According to a modified embodiment, however, the correction bias may be a scalar value rather than a two-dimensional matrix. In one embodiment, the bias corrector 140 may generate a second correction bias (q_bias2) having a scalar value by taking the mean of the values constituting the correction bias (q_bias1). The method by which the bias corrector 140 generates the scalar second correction bias (q_bias2) is not limited to this example, and various methods may be applied.

The bias corrector 140 may provide the generated second correction bias (q_bias2) to the neural network system 200. Alternatively, depending on the embodiment, the bias corrector 140 may provide the original correction bias (q_bias1) to the neural network system 200, and the neural network system 200 may generate the scalar second correction bias (q_bias2) from the received correction bias (q_bias1).

Referring to FIG. 8, the neural network system 200 may perform a MAC operation based on the quantized input sample (q_X) and the quantized weight (q_W). The neural network system 200 may then generate the quantized output sample (q_Y) by applying the second correction bias (q_bias2) to the MAC result, instead of the bias (q_bias) obtained by simply quantizing the bias of the artificial neural network. Using the second correction bias (q_bias2) instead of the original correction bias (q_bias1) allows memory storage space to be used more efficiently, and the computation speed may increase because only a single scalar value is applied to the MAC result.
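The reduction to a scalar and its application to the MAC result can be sketched as follows. The mean is the example named in the text; the function name and the numeric values are hypothetical.

```python
def scalar_bias(bias_map):
    """Collapse a 2-D correction bias into one scalar by taking the mean of its values."""
    values = [v for row in bias_map for v in row]
    return sum(values) / len(values)


q_bias1 = [[0.5, 0.25],
           [0.75, 0.5]]              # hypothetical matrix-shaped correction bias
q_bias2 = scalar_bias(q_bias1)       # single scalar stored instead of the matrix

mac_result = [[1.0, 2.0],
              [3.0, 4.0]]            # hypothetical MAC result for one channel
q_Y = [[m + q_bias2 for m in row] for row in mac_result]
```

Only one value per channel has to be stored and broadcast over the MAC result, at the cost of dropping the position-dependent part of the correction.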

FIG. 9 is a flowchart illustrating a method of determining a reference sample, a quantized reference sample, and a quantization error of the reference sample according to an embodiment of the present disclosure. Specifically, FIG. 9 illustrates how the neural network system 200 and the quantization system 100 of FIG. 2 determine the reference sample, the quantized reference sample, and the quantization error of the reference sample in order to generate a correction bias that can be used in a fixed manner.

Referring to FIGS. 2 and 9, the neural network system 200 may select a plurality of first samples from a sample pool (S210). The plurality of first samples may be training samples that were used to train the artificial neural network. However, the present disclosure is not limited thereto: the plurality of first samples may be samples that were not used to train the artificial neural network, or may include both training samples and samples not used for training.

The neural network system 200 may then perform computation on the plurality of first samples through the artificial neural network (S220). Specifically, the neural network system 200 may perform the computation on the plurality of first samples through the unquantized neural network.

The neural network system 200 may then select at least one second sample from among the plurality of first samples based on the statistical distributions of the output samples of each of the layers constituting the artificial neural network (S230). Specifically, the neural network system 200 may examine the statistical distributions of the output samples of each layer of the unquantized artificial neural network, for example each of the first layer (L1) to the n-th layer (Ln) of FIG. 1. The statistical distributions may include at least one of the mean, variance, expected value, skewness, and kurtosis of the output samples, and are not limited to these examples. The neural network system 200 may select at least one second sample from among the plurality of first samples based on the statistical distributions. For example, the neural network system 200 may select, from among the plurality of first samples, at least one second sample corresponding to an output sample whose value is close to the mean of each layer. The method of selecting at least one second sample based on the statistical distributions is not limited to this example.

The neural network system 200 may then use the selected second sample to calculate the quantized second sample and the quantization error of the second sample (S240). Specifically, the neural network system 200 may calculate the quantized second sample by performing computation on the second sample through the quantized artificial neural network. Using the property that an input sample (X) equals the sum of the quantized input sample (q_X) and the quantization error of the input sample (e_X), the neural network system 200 may calculate the quantization error of the second sample from the second sample and the quantized second sample.

The quantization system 100 may then determine the reference sample, the quantized reference sample, and/or the quantization error of the reference sample (S250). In one embodiment, the quantization system 100 may calculate the mean of each of the at least one second sample, the quantized second sample, and the quantization error of the second sample, and may determine the mean of the second samples, the mean of the quantized second samples, and the mean of the quantization errors of the second samples as the reference sample, the quantized reference sample, and the quantization error of the reference sample, respectively. The method by which the quantization system 100 determines the reference sample, the quantized reference sample, and/or the quantization error of the reference sample from the second sample, the quantized second sample, and the quantization error of the second sample is not limited to this example, and various methods may be applied.
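Steps S210 to S250 can be sketched as follows for a single layer. This is a simplified illustration under several assumptions: the layer is a hypothetical function returning one scalar per sample, "close to the mean" is measured by absolute distance, the second samples are quantized directly with an assumed uniform quantizer rather than by running the quantized network, and all names and values are invented for the example.

```python
def quantize(values, step=0.25):
    """Assumed uniform quantizer: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pick_reference(first_samples, layer_fn, num_second=2, step=0.25):
    outputs = [layer_fn(s) for s in first_samples]           # S220: unquantized outputs
    mean_out = sum(outputs) / len(outputs)                   # S230: layer statistic (mean)
    ranked = sorted(range(len(first_samples)),
                    key=lambda i: abs(outputs[i] - mean_out))
    second = [first_samples[i] for i in ranked[:num_second]]  # S230: second samples
    q_second = [quantize(s, step) for s in second]            # S240: quantized samples
    e_second = [[x - q for x, q in zip(s, qs)]                # S240: error = X - q_X
                for s, qs in zip(second, q_second)]
    # S250: averages become X', q_X' and e_X'
    return mean_vectors(second), mean_vectors(q_second), mean_vectors(e_second)


first_samples = [[0.0, 0.0], [1.0, 0.95], [0.6, 0.7], [0.35, 0.4]]
ref_X, q_ref_X, e_ref_X = pick_reference(first_samples, layer_fn=sum)
```

In this example the two samples whose outputs lie closest to the mean output are averaged, so the reference sample still satisfies X' = q_X' + e_X' element-wise.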

The quantization system 100 may then generate the correction bias based on the determined reference sample, quantized reference sample, and/or quantization error of the reference sample. Specifically, referring to FIG. 6, the bias corrector 140 of the quantization system 100 may perform the first MAC operation based on the quantization error of the reference sample (e_X') and the quantized weight (q_W). The bias corrector 140 may perform the second MAC operation based on the reference sample (X') and the quantization error of the weight (e_W). The bias corrector 140 may then generate the correction bias (q_bias1) by adding the result of the first MAC operation, the result of the second MAC operation, and the bias of the artificial neural network (bias, which equals the sum of the quantized bias (q_bias) and the quantization error of the bias (e_bias)).

Alternatively, referring to FIG. 7, the bias corrector 140 of the quantization system 100 may perform the third MAC operation based on the quantization error of the reference sample (e_X') and the unquantized weight (W). The bias corrector 140 may perform the fourth MAC operation based on the quantized reference sample (q_X') and the quantization error of the weight (e_W). The bias corrector 140 may then generate the correction bias (q_bias1) by adding the result of the third MAC operation, the result of the fourth MAC operation, and the bias of the artificial neural network.

As described above with reference to FIG. 8, the bias corrector 140 may also generate the second correction bias (q_bias2) having a scalar value by taking the mean of the values constituting the generated correction bias (q_bias1).

In this way, the bias corrector 140 may determine the reference sample, the quantized reference sample, and/or the quantization error of the reference sample using a plurality of samples, and may generate the correction bias (q_bias1 or q_bias2) based on the determined reference sample, quantized reference sample, and quantization error of the reference sample. The neural network system 200 may use the generated correction bias (q_bias1 or q_bias2) in a fixed manner when processing input samples. However, the present disclosure is not limited thereto, and depending on the embodiment, the bias corrector 140 may regenerate the correction bias periodically or aperiodically.

The bias corrector 140 may also determine the reference sample, the quantized reference sample, and the quantization error of the reference sample using input samples that have already been processed. This is described in detail later with reference to FIGS. 10 to 14.

FIG. 10 is a diagram illustrating a computing system according to an embodiment of the present disclosure. Specifically, FIG. 10 illustrates a modified embodiment of FIG. 2.

Referring to FIG. 10, the computing system 1000a may include a quantization system 100a and a neural network system 200. The quantization system 100a may include a neural network interface 110, a parameter quantizer 120, a sample quantizer 130, a bias corrector 140a, and a sample generator 150a. The neural network system 200, the neural network interface 110, the parameter quantizer 120, and the sample quantizer 130 of the embodiment of FIG. 10 may correspond to the neural network system 200, the neural network interface 110, the parameter quantizer 120, and the sample quantizer 130 of FIG. 2, so redundant descriptions are omitted.

The sample generator 150a may be implemented as a logic block implemented through logic synthesis, a software block executed by a processor, or a combination thereof. Although FIG. 10 illustrates and describes the sample generator 150a as being included in the quantization system 100a, depending on the embodiment, the sample generator 150a may be included in the neural network system 200 or may be implemented as a component separate from both the neural network system 200 and the quantization system 100a.

The sample generator 150a may generate an expected next input sample using at least one input sample already processed by the neural network system 200. For example, the sample generator 150a may analyze at least one input sample that has already been processed and predict the next input sample according to the analysis result. The sample generator 150a may then provide the expected input sample to the bias corrector 140a. The operation by which the sample generator 150a generates the expected next input sample from at least one already processed input sample is described in detail below with reference to FIGS. 11 and 12.

The bias corrector 140a may generate a correction bias using the received expected input sample. The specific method by which the bias corrector 140a generates the correction bias from the expected input sample is described below with reference to FIG. 11.

FIG. 11 is a diagram illustrating a method of predicting a next input sample according to an embodiment of the present disclosure. Specifically, FIG. 11 illustrates a method by which the sample generator 150a of FIG. 10 predicts the next input sample using at least one input sample that has already been processed.

The input samples processed by the neural network system 200 may be consecutively captured images. Each of the consecutively captured images differs only slightly from the image before or after it, and the differences between images may have a direction over time. Accordingly, the sample generator 150a may predict the next input sample by analyzing the differences between already processed input samples, for example, consecutively captured images. The sample generator 150a may then provide the expected next input sample to the bias corrector 140a. The sample generator 150a may read the already processed input samples from a memory (not shown) included in the quantization system 100a, or may receive them from the neural network system 200 or from any other separate component.

For example, referring to FIG. 11, the sample generator 150a may use a Kalman filter or the like to calculate motion vectors of the N-2th frame (Frame N-2) and the N-1th frame (Frame N-1), which are the most recent of the already processed images. The sample generator 150a may then use the calculated motion vectors to identify a moving object (for example, a person) in the image. By predicting the movement path of the identified object within the image, the sample generator 150a may generate an expected Nth frame (Expected Frame N).
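The prediction step described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses plain constant-velocity extrapolation rather than a Kalman filter, it assumes the object's position in each frame is already known, and it builds the expected frame by shifting the whole of Frame N-1 by the motion vector. The function names `motion_vector`, `predict_next_position`, and `shift_frame` are hypothetical and do not come from the disclosure.

```python
def motion_vector(prev_pos, curr_pos):
    """Displacement (dx, dy) of a tracked object between two frames."""
    return (curr_pos[0] - prev_pos[0], curr_pos[1] - prev_pos[1])


def predict_next_position(prev_pos, curr_pos):
    """Constant-velocity guess: repeat the last displacement once more."""
    dx, dy = motion_vector(prev_pos, curr_pos)
    return (curr_pos[0] + dx, curr_pos[1] + dy)


def shift_frame(frame, dx, dy, fill=0):
    """Shift a 2-D frame (list of rows) by (dx, dy); exposed pixels get `fill`.

    Assumption: the whole frame moves with the object, which is a crude
    stand-in for per-object motion compensation.
    """
    height, width = len(frame), len(frame[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = frame[y][x]
    return out


# Object seen at x=0 in Frame N-2 and x=1 in Frame N-1 (positions are (x, y)).
frame_n_minus_1 = [[0, 1, 0],
                   [0, 0, 0]]
dx, dy = motion_vector((0, 0), (1, 0))
expected_frame_n = shift_frame(frame_n_minus_1, dx, dy)
```

A real implementation would more likely estimate motion per block or per object and smooth it over more than two frames; the sketch only shows how two recent frames can yield an expected next frame.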

Although FIG. 11 illustrates and describes the sample generator 150a as predicting the next image using two recent images, the present disclosure is not limited thereto, and the next image may of course be predicted using two or more recent images.

FIG. 12 is a diagram illustrating a method of predicting a next input sample according to an embodiment of the present disclosure. Specifically, FIG. 12 illustrates a method by which the sample generator 150a of FIG. 10 predicts the next input sample using at least one input sample that has already been processed.

The sample generator 150a may include an artificial intelligence (AI) module 151a. The AI module 151a may be a module trained to predict the next sample based on an input sample. The sample generator 150a may input at least one already processed input sample to the AI module 151a and provide the output sample produced by the AI module 151a to the bias corrector 140a as the next input sample. The AI module 151a may be implemented as a logic block implemented through logic synthesis, a software block executed by a processor, or a combination thereof.

Although the input samples processed by the neural network system 200 have been described with reference to FIGS. 11 and 12 as consecutively captured images, the present disclosure is not limited thereto, and the input samples may be various other kinds of data, such as voice data or depth data.

FIG. 13 is a flowchart illustrating a method of determining a reference sample, a quantized reference sample, and a quantization error of the reference sample according to an embodiment of the present disclosure. Specifically, FIG. 13 is a flowchart of a method by which the neural network system 200 and the quantization system 100a of FIG. 10 determine the reference sample, the quantized reference sample, and the quantization error of the reference sample in order to generate a correction bias usable in the computation of the next input sample.

Referring to FIGS. 10 and 13, the quantization system 100a may select at least one first sample from among the input samples that have already been processed (S310). Specifically, the sample generator 150a of the quantization system 100a may select a preset number of first samples from among the already processed input samples. The preset number may be set by the manufacturer or the user. For example, when the input samples are consecutively captured images, the sample generator 150a may select the 100 most recently processed images as the first samples.

The quantization system 100a may then predict the next sample, a second sample, based on the first samples (S320). Specifically, the sample generator 150a may analyze the preset number of first samples and predict the second sample according to the analysis result. The method by which the sample generator 150a predicts the second sample may be substantially the same as the method described above with reference to FIGS. 11 and 12. The sample generator 150a may then provide the second sample to the bias corrector 140a.

The quantization system 100a may then determine the predicted second sample as the reference sample (S330), and may determine the quantization error of the reference sample using the quantization error of at least one first sample (S340). In one embodiment, the bias corrector 140a may determine the quantization error of the reference sample by averaging the quantization errors of the at least one first sample. Alternatively, the bias corrector 140a may identify the quantization error of the most recently processed sample among the at least one first sample and use that error as the quantization error of the reference sample. That is, the bias corrector 140a may determine the quantization error of the reference sample without relying on the property that an input sample (X) equals the sum of the quantized input sample (q_X) and the quantization error of the input sample (e_X).
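The two options for step S340 (average the stored errors, or reuse the most recent one) might look like the following sketch. The function name `reference_quantization_error`, the `mode` parameter, and the representation of each error as a flat list of floats are assumptions made for illustration only.

```python
def reference_quantization_error(stored_errors, mode="mean"):
    """Estimate e_X' for the reference sample from errors of earlier samples.

    stored_errors: list of per-sample error vectors, oldest first.
    mode: "mean" averages element-wise over all stored errors;
          "latest" reuses the error of the most recently processed sample.
    """
    if not stored_errors:
        raise ValueError("at least one stored quantization error is required")
    if mode == "latest":
        return list(stored_errors[-1])
    if mode == "mean":
        count = len(stored_errors)
        return [sum(column) / count for column in zip(*stored_errors)]
    raise ValueError(f"unknown mode: {mode}")


errors = [[0.25, -0.5], [0.75, 0.0]]
e_x_ref_mean = reference_quantization_error(errors, mode="mean")
e_x_ref_latest = reference_quantization_error(errors, mode="latest")
```

Either way, the error of the reference sample is obtained without quantizing the reference sample itself, which matches the closing remark of the paragraph above.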

The quantization system 100a may then generate a correction bias based on the determined reference sample and the quantization error of the reference sample. Specifically, referring to FIG. 6, the bias corrector 140a of the quantization system 100a may perform a first MAC operation based on the quantization error of the reference sample (e_X') and the quantized weight (q_W). The bias corrector 140a may perform a second MAC operation based on the reference sample (X') and the quantization error of the weight (e_W). The bias corrector 140a may then generate the correction bias (q_bias1) by adding the first MAC operation result, the second MAC operation result, and the bias of the artificial neural network (bias, which equals the sum of the quantized bias (q_bias) and the quantization error of the bias (e_bias)).
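As a minimal sketch of the computation just described, assuming a single fully connected layer whose MAC operation is a matrix-vector product, the correction bias can be written as q_bias1 = q_W·e_X' + e_W·X' + bias. The helper names `matvec` and `correction_bias` are hypothetical, and the numeric values are chosen only so that the arithmetic is exact.

```python
def matvec(matrix, vector):
    """Multiply-accumulate: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def correction_bias(q_W, e_W, x_ref, e_x_ref, bias):
    """q_bias1 = (q_W . e_X') + (e_W . X') + bias, element-wise per output."""
    first_mac = matvec(q_W, e_x_ref)   # quantized weight x sample error
    second_mac = matvec(e_W, x_ref)    # weight error x reference sample
    return [a + b + c for a, b, c in zip(first_mac, second_mac, bias)]


# Illustrative values: W = q_W + e_W and X = q_X + e_X.
W, q_W, e_W = [[0.75, -1.25]], [[1.0, -1.0]], [[-0.25, -0.25]]
X, q_X, e_X = [2.5, 1.5], [2.0, 2.0], [0.5, -0.5]
bias = [0.5]

q_bias1 = correction_bias(q_W, e_W, X, e_X, bias)
quantized_output = [m + b for m, b in zip(matvec(q_W, q_X), q_bias1)]
full_precision_output = [m + b for m, b in zip(matvec(W, X), bias)]
```

In this example the reference sample is the actual input, so the quantized MAC result plus q_bias1 reproduces the full-precision output exactly. With a predicted reference sample the correction is only approximate, and how close it gets depends on how well X' and e_X' match the real input.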

According to a modified embodiment, the quantization system 100a may instead quantize the determined reference sample and generate the correction bias based on the quantized reference sample and the quantization error of the reference sample. Specifically, referring to FIG. 7, the bias corrector 140a of the quantization system 100a may perform a third MAC operation based on the quantization error of the reference sample (e_X') and the unquantized weight (W). The bias corrector 140a may perform a fourth MAC operation based on the quantized reference sample (q_X') and the quantization error of the weight (e_W). The bias corrector 140a may then generate the correction bias (q_bias1) by adding the third MAC operation result, the fourth MAC operation result, and the bias of the artificial neural network.
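The two ways of forming the correction bias can be related by a short derivation. The following is a sketch that assumes a single linear layer Y = WX + bias with W = q_W + e_W and X = q_X + e_X; it is reconstructed from the descriptions in the text, not copied from the figures.

```latex
\begin{aligned}
Y &= W X + \mathrm{bias} \\
  &= (q_W + e_W)\,X + \mathrm{bias}
   = q_W q_X + \bigl(q_W e_X + e_W X + \mathrm{bias}\bigr)
   && \text{(first and second MAC operations)} \\
  &= W\,(q_X + e_X) + \mathrm{bias}
   = q_W q_X + \bigl(W e_X + e_W q_X + \mathrm{bias}\bigr)
   && \text{(third and fourth MAC operations)}
\end{aligned}
```

Both bracketed terms equal Y - q_W q_X, so the two variants describe the same correction. Replacing X, q_X, and e_X with the reference quantities X', q_X', and e_X' turns the equality into an approximation.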

As described above with reference to FIG. 8, the bias corrector 140a may of course also generate a second correction bias (q_bias2) having a scalar value by averaging the values that make up the generated correction bias (q_bias1).
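A minimal sketch of this reduction, assuming q_bias1 is held as a flat list of per-output values (the function name `scalar_correction_bias` is hypothetical):

```python
def scalar_correction_bias(q_bias1):
    """Collapse the per-output correction bias into one scalar (q_bias2)."""
    if not q_bias1:
        raise ValueError("q_bias1 must contain at least one value")
    return sum(q_bias1) / len(q_bias1)


q_bias2 = scalar_correction_bias([0.5, 1.5, -0.5])
```

Storing one scalar instead of a full vector reduces memory, at the cost of applying the same correction to every output element.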

In this way, the bias corrector 140a may determine the reference sample, the quantized reference sample, and/or the quantization error of the reference sample using input samples that have already been processed, and may generate a correction bias (q_bias1 or q_bias2) based on the determined reference sample, quantized reference sample, and quantization error of the reference sample. The neural network system 200 may use the generated correction bias (q_bias1 or q_bias2) in the computation of the next input sample.

FIG. 14 is a diagram illustrating a method of performing an operation on the next input sample through a quantized artificial neural network according to an embodiment of the present disclosure. Specifically, FIG. 14 illustrates a method by which the quantization system 100a and the neural network system 200 of FIG. 10 perform an operation on the next input sample. For convenience, the following description assumes that the input samples are consecutively captured images.

Referring to FIGS. 10 and 14, the consecutively captured images (Frame 0 to Frame N), which are a plurality of input samples, may be processed sequentially. When processing of the N-1th frame is complete, the quantization system 100a may prepare to process the Nth frame.

Specifically, the quantization system 100a may select at least one image from among the already processed images (Frame 0 to Frame N-1). For example, referring to FIG. 14, the sample generator 150a of the quantization system 100a may select the N-1th frame (Frame N-1). The quantization system 100a may then predict the next image based on the selected image. For example, referring to FIG. 14, the sample generator 150a may generate an expected Nth frame (Expected Frame N) by predicting the Nth frame (Frame N) based on the N-1th frame (Frame N-1).

The quantization system 100a may then generate a correction bias using the predicted next image. For example, referring to FIG. 14, the bias corrector 140a may receive the expected Nth frame (Expected Frame N) from the sample generator 150a and determine it as the reference sample (X'). The bias corrector 140a may determine the quantization error of the reference sample (e_X') using the quantization error of at least one of the already processed images (Frame 0 to Frame N-1). In one embodiment, the bias corrector 140a may use the quantization error of the N-1th frame (Frame N-1) as the quantization error of the reference sample (e_X'), or may determine the quantization error of the reference sample (e_X') by averaging the quantization errors of a preset number of the most recently processed images.

The quantization errors of the already processed images (Frame 0 to Frame N-1) may have been calculated during the earlier operations on those images and stored in a memory 160. For example, they may be calculated using the property that an input sample (X) equals the sum of the quantized input sample (q_X) and the quantization error of the input sample (e_X), and then stored in the memory 160. Accordingly, the bias corrector 140a may read the quantization error of at least one of the already processed images (Frame 0 to Frame N-1) from the memory 160 and use it to determine the quantization error of the reference sample. Although FIG. 14 illustrates and describes the memory 160 as being included in the quantization system 100a, the memory 160 may be included in the neural network system 200 or implemented as a component separate from both the neural network system 200 and the quantization system 100a.

The bias corrector 140a may then generate the correction bias (q_bias1 or q_bias2) based on the reference sample (X') and the quantization error of the reference sample (e_X'), and may provide the generated correction bias (q_bias1 or q_bias2) to the neural network system 200 through the neural network interface 110.

The sample quantizer 130 may generate a quantized Nth frame (Quantized Frame N) by quantizing the Nth frame (Frame N), and may provide the quantized Nth frame (Quantized Frame N) to the neural network system 200 through the neural network interface 110.

The neural network system 200 may then perform an operation based on the received quantized Nth frame (Quantized Frame N) and the correction bias (q_bias1 or q_bias2). Specifically, the neural network system 200 may perform a MAC operation based on the quantized Nth frame (Quantized Frame N) and the quantized weight, and may generate a quantized output sample (q_Y) by applying the correction bias (q_bias1 or q_bias2) to the MAC operation result.
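The per-frame flow described for FIG. 14 can be sketched as a loop. The sketch rests on several assumptions that are not in the disclosure: a uniform rounding quantizer with a fixed step, a single linear layer, the most recent error used as e_X', a caller-supplied `predict_next` function standing in for the sample generator, and the unmodified bias used for the first frame when no history exists yet. All names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantizer (assumed): snap each value to a multiple of `step`."""
    return [round(v / step) * step for v in values]


def matvec(matrix, vector):
    """Multiply-accumulate: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def process_stream(frames, W, bias, step, predict_next):
    """Process frames in order, refreshing the correction bias per frame."""
    q_W = [quantize(row, step) for row in W]
    e_W = [[w - q for w, q in zip(row, q_row)] for row, q_row in zip(W, q_W)]
    history, last_error, outputs = [], None, []
    for frame in frames:
        if last_error is None:
            q_bias = list(bias)  # assumption: no correction before any history
        else:
            x_ref = predict_next(history)  # expected frame, used as X'
            q_bias = [a + b + c for a, b, c in
                      zip(matvec(q_W, last_error), matvec(e_W, x_ref), bias)]
        q_x = quantize(frame, step)
        outputs.append([m + b for m, b in zip(matvec(q_W, q_x), q_bias)])
        last_error = [x - q for x, q in zip(frame, q_x)]  # stored for next frame
        history.append(frame)
    return outputs


frames = [[1.25, 0.75], [1.25, 0.75]]
outputs = process_stream(frames, W=[[0.75, 1.25]], bias=[0.5], step=1.0,
                         predict_next=lambda past: past[-1])
```

With two identical frames and a predictor that repeats the last frame, the second output equals the full-precision result (2.375), while the uncorrected first output is 2.5. For real video the prediction is imperfect, so the correction would only reduce the error rather than remove it.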

FIG. 15 is a flowchart illustrating a computation method using an artificial neural network according to an embodiment of the present disclosure. Specifically, FIG. 15 is a flowchart of a computation method using the artificial neural network of the computing system 1000 or 1000a of FIG. 2 or FIG. 10.

Referring to FIGS. 2, 10, and 15, the computing system 1000 or 1000a may quantize the parameters of the artificial neural network (S410). For example, the quantization system 100 or 100a of the computing system 1000 or 1000a may quantize parameters of the artificial neural network such as weights and biases. The computing system 1000 or 1000a may then generate a correction bias by correcting the quantized bias so that it includes the error caused by quantization (S420). The correction bias may be generated using at least one of the quantized weight of the artificial neural network, the quantization error of the weight, the quantized bias, the quantization error of the bias, the reference sample, the quantized reference sample, and the quantization error of the reference sample. The reference sample may be determined from a plurality of samples different from the input sample currently being processed, or from at least one of the input samples that have already been processed.

The computing system 1000 or 1000a may then quantize the input sample (S430). For example, the quantization system 100 or 100a of the computing system 1000 or 1000a may quantize the input sample and provide the quantized input sample to the neural network system 200 of the computing system 1000 or 1000a. The computing system 1000 or 1000a may then perform a multiply-accumulate (MAC) operation based on the quantized weight and the quantized input sample (S440). For example, the neural network system 200 may receive the quantized input sample and perform the MAC operation based on the quantized weight and the received quantized input sample.

The computing system 1000 or 1000a may then apply the correction bias for correcting the quantization error to the result of the MAC operation (S450). For example, the neural network system 200 of the computing system 1000 or 1000a may generate the final operation result by applying the correction bias received from the quantization system 100 or 100a to the result of the MAC operation. The computing system according to an embodiment of the present disclosure may generate the expected value of the error arising in the quantization process as a correction bias and apply the generated correction bias to the result of the MAC operation performed through the quantized artificial neural network. Accordingly, the computation method using an artificial neural network according to an embodiment of the present disclosure may have the reduced complexity that comes from using a quantized artificial neural network while also achieving good performance through the applied correction bias.

FIG. 16 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.

In one embodiment, the quantization system 100 or 100a of FIG. 2 or FIG. 10 may be implemented as the electronic device 300 of FIG. 16. As shown in FIG. 16, the electronic device 300 may include a system memory 310, a processor 330, a storage 350, input/output devices 370, and communication connections 390. The components of the electronic device 300 may be communicatively connected to one another, for example, through a bus.

The system memory 310 may include a program 312. The program 312 may cause the processor 330 to perform quantization of an artificial neural network, quantization of an input sample, and generation of a correction bias according to embodiments of the present disclosure. For example, the program 312 may include a plurality of instructions executable by the processor 330. When the instructions included in the program 312 are executed by the processor 330, quantization of the artificial neural network, quantization of the input sample, or generation of the correction bias may be performed. As non-limiting examples, the system memory 310 may include a volatile memory such as static random access memory (SRAM) or dynamic random access memory (DRAM), or a non-volatile memory such as flash memory.

The processor 330 may include at least one core capable of executing an arbitrary instruction set (for example, Intel Architecture-32 (IA-32), 64-bit extended IA-32, x86-64, PowerPC, Sparc, MIPS, ARM, or IA-64). The processor 330 may execute instructions stored in the system memory 310 and, by executing the program 312, may perform quantization of the artificial neural network, quantization of the input sample, or generation of the correction bias.

The storage 350 may retain stored data even when the power supplied to the electronic device 300 is cut off. For example, the storage 350 may include a non-volatile memory such as electrically erasable programmable read-only memory (EEPROM), flash memory, phase change random access memory (PRAM), resistance random access memory (RRAM), nano floating gate memory (NFGM), polymer random access memory (PoRAM), magnetic random access memory (MRAM), or ferroelectric random access memory (FRAM), or may include a storage medium such as magnetic tape, an optical disk, or a magnetic disk. In some embodiments, the storage 350 may be detachable from the electronic device 300.

In one embodiment, the storage 350 may store the program 312 for quantizing an artificial neural network, quantizing an input sample, and generating a correction bias according to an embodiment of the present disclosure. Before the program 312 is executed by the processor 330, the program 312 or at least a portion of it may be loaded from the storage 350 into the system memory 310. In one embodiment, the storage 350 may store a file written in a programming language, and the program 312 generated from that file by a compiler or the like, or at least a portion of it, may be loaded into the system memory 310.

In one embodiment, the storage 350 may store data to be processed by the processor 330 and/or data already processed by the processor 330. For example, the storage 350 may store input samples, quantized input samples and the quantization errors of the input samples, and the generated correction bias.

The input/output devices 370 may include input devices such as a keyboard and a pointing device, and output devices such as a display device and a printer. For example, through the input/output devices 370, a user may trigger execution of the program 312 by the processor 330, enter an input sample, or check an output sample and/or an error message.

The communication connections 390 may provide access to a network outside the electronic device 300. For example, the network may include multiple computing systems and communication links, and the communication links may include wired links, optical links, wireless links, or any other type of link.

FIG. 17 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure. In an embodiment, the neural network system 200 of FIG. 2 or FIG. 10 may be implemented as the electronic device 400 of FIG. 17. As a non-limiting example, the electronic device 400 may be any portable electronic device powered by a battery or by self-generated power, such as a mobile phone, a tablet PC, a wearable device, or an Internet of Things (IoT) device.

As shown in FIG. 17, the electronic device 400 may include a memory subsystem 410, input/output devices 430, a processing unit 450, and a network interface 470, and these components may communicate with one another through a bus 490. In some embodiments, at least two of the memory subsystem 410, the input/output devices 430, the processing unit 450, and the network interface 470 may be included in one package as a system-on-a-chip (SoC).

The memory subsystem 410 may include a RAM 412 and a storage 414. The RAM 412 and/or the storage 414 may store instructions executed by the processing unit 450 and data processed by it. For example, the RAM 412 and/or the storage 414 may store parameters of the artificial neural network such as signals, weights, and biases. In some embodiments, the storage 414 may include a non-volatile memory.

The processing unit 450 may include a central processing unit (CPU) 452, a graphics processing unit (GPU) 454, a digital signal processor (DSP) 456, and a neural processing unit (NPU) 458. Unlike what is shown in FIG. 17, in an embodiment the processing unit 450 may include only some of the CPU 452, the GPU 454, the DSP 456, and the NPU 458.

The CPU 452 may control the overall operation of the electronic device 400. For example, in response to an external input received through the input/output devices 430, it may directly perform a specific task or instruct other components of the processing unit 450 to perform it. The GPU 454 may generate data for an image output through a display device included in the input/output devices 430, and may encode data received from a camera included in the input/output devices 430. The DSP 456 may generate useful data by processing a digital signal, for example a digital signal provided from the network interface 470.

The NPU 458 is dedicated hardware for an artificial neural network. It may include a plurality of computation nodes corresponding to at least some of the artificial neurons constituting the artificial neural network, and at least some of the computation nodes may process signals in parallel. Because an artificial neural network quantized according to an embodiment of the present disclosure has low computational complexity as well as high accuracy, it can be readily implemented in the electronic device 400 of FIG. 17, can have a fast processing speed, and can be implemented even by, for example, a simple, small-scale NPU 458.

The input/output devices 430 may include input devices such as a touch input device, a sound input device, and a camera, and output devices such as a display device and a sound output device. For example, when a user's voice is input through the sound input device, the voice may be recognized by the artificial neural network implemented in the electronic device 400, and a corresponding operation may be triggered. Likewise, when an image is input through the camera, an object included in the image may be recognized by a deep neural network implemented in the electronic device 400, and an output such as virtual reality may be provided to the user. The network interface 470 may provide the electronic device 400 with access to a mobile communication network such as Long Term Evolution (LTE) or 5G, or with access to a local network such as Wi-Fi.

As described above, exemplary embodiments have been disclosed in the drawings and the specification. Although the embodiments have been described using specific terms, those terms are used only to explain the technical idea of the present disclosure, not to limit their meaning or the scope of the present disclosure as set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present disclosure should be determined by the technical spirit of the appended claims.

Claims

A computing system comprising:
a neural network system that drives an artificial neural network; and
a quantization system that quantizes the artificial neural network,
wherein the quantization system
generates quantized parameters of the artificial neural network by quantizing parameters of the artificial neural network, generates a quantization error of the parameters based on the parameters and the quantized parameters, generates a correction bias based on the quantized parameters and the quantization error of the parameters, and transmits the generated quantized parameters and the correction bias to the neural network system.

The computing system of claim 1,
wherein the quantization system,
upon receiving an input sample from the neural network system, quantizes the input sample and transmits the quantized input sample to the neural network system, and
wherein the neural network system,
upon receiving the quantized input sample, performs a first multiply-accumulate (MAC) operation based on the quantized input sample and the quantized parameters, and generates an operation result by reflecting the correction bias in a result of the first MAC operation.
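
The inference path recited above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: uniform rounding to a fixed step, the per-output list form of the correction bias, and the names `quantize`, `quantized_forward`, and `x_scale` are all assumptions made for illustration.

```python
def quantize(values, scale):
    """Assumed uniform quantizer: snap each value to the nearest multiple of `scale`."""
    return [round(v / scale) * scale for v in values]


def quantized_forward(q_weights, correction_bias, sample, x_scale):
    """Run the first MAC on quantized operands, then reflect the correction bias."""
    q_sample = quantize(sample, x_scale)
    # First MAC: quantized weights times quantized input sample.
    first_mac = [sum(w * x for w, x in zip(row, q_sample)) for row in q_weights]
    # The correction bias is added to the MAC result, one entry per output.
    return [m + c for m, c in zip(first_mac, correction_bias)]


# Hypothetical values: two outputs, two inputs.
q_weights = [[0.5, -0.5], [1.0, 0.0]]
correction_bias = [0.1, -0.2]
output = quantized_forward(q_weights, correction_bias, [0.26, 0.74], x_scale=0.25)
```

Only the quantized weights and the precomputed correction bias are needed at inference time in this sketch, which is why the quantization system transmits exactly those two items.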

The computing system of claim 1,
wherein the parameters of the artificial neural network
include a weight and a bias of the artificial neural network,
the quantized parameters
include a quantized weight and a quantized bias, and
the quantization error of the parameters
includes a quantization error of the weight and a quantization error of the bias.

The computing system of claim 3,
wherein the quantization system
identifies a reference sample for generating the correction bias, generates a quantized reference sample by quantizing the reference sample, generates a quantization error of the reference sample based on the reference sample and the quantized reference sample, and generates the correction bias based on at least one of the reference sample, the quantized reference sample, the quantization error of the reference sample, the quantized parameters, and the quantization error of the parameters of the artificial neural network.

The computing system of claim 4,
wherein the quantization system
performs a second MAC operation based on the reference sample and the quantization error of the weight, performs a third MAC operation based on the quantization error of the reference sample and the quantized weight, and generates the correction bias based on a result of the second MAC operation and a result of the third MAC operation.

The computing system of claim 5,
wherein the quantization system
generates the correction bias by summing the result of the second MAC operation, the result of the third MAC operation, and the bias of the artificial neural network.

The computing system of claim 6,
wherein the quantization system
calculates an average value of the summation result and generates the correction bias as a scalar having the calculated average value.
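
A minimal Python sketch of the correction-bias computation in the three preceding claims follows. It assumes one reading of the operands: the second MAC multiplies the weight quantization error by the reference sample, and the third MAC multiplies the quantized weight by the reference-sample quantization error. The uniform quantizer, the function names, and the sample values are hypothetical.

```python
def quantize(values, scale):
    """Assumed uniform quantizer: snap each value to the nearest multiple of `scale`."""
    return [round(v / scale) * scale for v in values]


def mac(matrix, vector):
    """Multiply-accumulate: one dot product per matrix row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def correction_terms(weights, bias, ref, w_scale, x_scale):
    """Return quantized weights, quantized reference sample, and the summed terms."""
    q_w = [quantize(row, w_scale) for row in weights]
    e_w = [[w - q for w, q in zip(row, q_row)] for row, q_row in zip(weights, q_w)]
    q_x = quantize(ref, x_scale)
    e_x = [x - q for x, q in zip(ref, q_x)]
    second = mac(e_w, ref)   # assumed operands: weight error, reference sample
    third = mac(q_w, e_x)    # assumed operands: quantized weight, sample error
    summed = [s + t + b for s, t, b in zip(second, third, bias)]
    return q_w, q_x, summed


def scalar_correction_bias(summed):
    """Average the per-output sums into a single scalar correction bias."""
    return sum(summed) / len(summed)


# Hypothetical layer with two outputs and two inputs.
weights = [[0.30, -0.70], [1.20, 0.45]]
bias = [0.10, -0.20]
ref = [0.26, 0.74]
q_w, q_x, summed = correction_terms(weights, bias, ref, w_scale=0.5, x_scale=0.25)
exact = [y + b for y, b in zip(mac(weights, ref), bias)]
restored = [y + c for y, c in zip(mac(q_w, q_x), summed)]
scalar = scalar_correction_bias(summed)
```

Under this reading, adding the per-output sums to the quantized MAC result reproduces the unquantized output for the reference sample exactly (up to floating-point rounding). Collapsing the sums to one scalar trades that exactness for a smaller correction term.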

The computing system of claim 4,
wherein the quantization system
performs a fourth MAC operation based on the quantized reference sample and the quantization error of the weight, performs a fifth MAC operation based on the quantization error of the reference sample and the weight of the artificial neural network, and generates the correction bias based on a result of the fourth MAC operation and a result of the fifth MAC operation.
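
The role of the fourth and fifth MAC operations can be seen from a short decomposition. The symbols are assumptions introduced here for illustration and do not appear in the claims: $W$ and $b$ are the weight and bias, $x$ is the reference sample, $W_q$ and $x_q$ are their quantized versions, and $e_W = W - W_q$, $e_x = x - x_q$ are the quantization errors.

```latex
\begin{aligned}
W x + b &= W\,(x_q + e_x) + b \\
        &= (W_q + e_W)\,x_q + W e_x + b \\
        &= \underbrace{W_q x_q}_{\text{first MAC}}
         + \underbrace{e_W x_q}_{\text{fourth MAC}}
         + \underbrace{W e_x}_{\text{fifth MAC}}
         + b
\end{aligned}
```

Under these assumptions, the terms other than $W_q x_q$ are what a correction bias would need to supply for the quantized result to match the unquantized one on the reference sample.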

The computing system of claim 4,
wherein the neural network system
selects at least one first sample from a sample pool and transmits the at least one first sample to the quantization system, and
wherein the quantization system
generates the reference sample based on the at least one first sample,
generates at least one quantized first sample by quantizing the at least one first sample, and generates the quantized reference sample using the at least one quantized first sample.

The computing system of claim 9,
wherein the neural network system
selects a plurality of second samples from the sample pool, performs an operation on the plurality of second samples through the artificial neural network, and selects the at least one first sample based on a result of the operation.

The computing system of claim 10,
wherein the neural network system
identifies, based on the result of the operation, a statistical distribution of the output samples of each layer constituting the artificial neural network, and selects the at least one first sample based on the identified statistical distributions.

The computing system of claim 3,
wherein the quantization system
generates an expected input sample by predicting the next sample in sequence based on at least one third sample already processed by the neural network system using the quantized parameters, generates a quantization error of the expected input sample based on a quantization error of the at least one third sample, and generates the correction bias based on the expected input sample, the quantization error of the expected input sample, the quantized parameters, and the parameters of the artificial neural network.

The computing system of claim 12,
wherein the quantization system
calculates a motion vector of the at least one third sample and generates the expected input sample using the calculated motion vector.

The computing system of claim 12,
wherein the quantization system
generates the expected input sample by inputting the at least one third sample into an artificial intelligence module trained to predict the next sample in sequence from preceding samples.

A computation method using an artificial neural network, the method comprising:
quantizing a weight and a bias of the artificial neural network;
generating a correction bias by correcting the quantized bias to include an error due to quantization;
quantizing an input sample;
performing a first multiply-accumulate (MAC) operation based on the quantized weight of the artificial neural network and the quantized input sample; and
reflecting the correction bias in a result of the first MAC operation.

The method of claim 15,
wherein generating the correction bias comprises
correcting the quantized bias based on a first error, which is a quantization error of a reference sample, and a second error, which is a quantization error of the weight.

The method of claim 16,
wherein generating the correction bias comprises:
selecting at least one first sample from a sample pool and generating the reference sample based on the at least one first sample; and
calculating the quantization error of the reference sample based on a quantization error of the at least one first sample.

The method of claim 17,
wherein generating the reference sample comprises:
predicting an expected input sample based on at least one first sample, from the sample pool, that has already been processed through the artificial neural network; and
generating the expected input sample as the reference sample.

A quantization method of an artificial neural network, the method comprising:
quantizing parameters of the artificial neural network;
calculating a quantization error of the parameters based on the parameters of the artificial neural network and the quantized parameters; and
generating a correction bias based on the quantized parameters and the quantization error of the parameters.
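
The first two steps of this method can be sketched in a few lines of Python. The uniform quantizer with a fixed step, the names `quantize` and `quantization_error`, and the example weights are assumptions for illustration; the claim does not fix a particular quantization scheme.

```python
def quantize(values, scale):
    """Assumed uniform quantizer: snap each value to the nearest multiple of `scale`."""
    return [round(v / scale) * scale for v in values]


def quantization_error(values, quantized):
    """Error is the original parameter minus its quantized version."""
    return [v - q for v, q in zip(values, quantized)]


# Hypothetical weights quantized with a step of 0.5.
weights = [0.30, -0.70, 1.20]
q_weights = quantize(weights, scale=0.5)
w_err = quantization_error(weights, q_weights)
```

With this definition each parameter equals its quantized value plus its error, and under round-to-nearest each error is at most half a quantization step in magnitude. The third step would combine `q_weights` and `w_err` into the correction bias.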

The method of claim 19,
wherein generating the correction bias comprises:
generating a reference sample based on at least one first sample;
quantizing the reference sample;
calculating a quantization error of the reference sample based on the reference sample and the quantized reference sample; and
generating the correction bias using at least one of the reference sample, the quantized reference sample, the quantization error of the reference sample, the quantized parameters, and the quantization error of the parameters.