KR20230138372A

KR20230138372A - Apparatus and method of neural network operation

Info

Publication number: KR20230138372A
Application number: KR1020220069660A
Authority: KR
Inventors: 최정욱; 박성민
Original assignee: 삼성전자주식회사
Priority date: 2022-03-23
Filing date: 2022-06-08
Publication date: 2023-10-05

Abstract

Disclosed are an apparatus and a method for neural network operation. A neural network operation apparatus according to one embodiment comprises: a receiver which receives data for performing a neural network operation; and a processor which extracts calibration data based on training data, among the received data, for training the neural network, generates a look-up table (LUT) for performing a nonlinear operation included in the neural network by means of an auxiliary network corresponding to a layer of the neural network based on the calibration data, and updates parameters of the LUT based on the output of the nonlinear operation and the output of the auxiliary network.

Description

Neural network operation apparatus and method {APPARATUS AND METHOD OF NEURAL NETWORK OPERATION}

The embodiments below relate to a neural network operation apparatus and method.

In conventional neural network operations, using a look-up table (LUT) produces errors when approximating a nonlinear function whose output changes sharply relative to the change in its input. These errors reduce the prediction accuracy of the final neural network model.

In addition, approximating intervals with a high rate of change requires a large number of LUT indices, which increases hardware cost. A neural network operation method that preserves accuracy without increasing the number of LUT indices is therefore required.

However, when a single LUT is used to approximate the nonlinear functions of a neural network, inference performance may degrade because the layers of the neural network receive input data with different statistical characteristics.

A neural network operation apparatus according to an embodiment includes a receiver that receives data for performing a neural network operation, and a processor that extracts calibration data based on training data, among the data, for training the neural network, generates a look-up table (LUT) for performing a nonlinear operation included in the neural network through an auxiliary network corresponding to a layer of the neural network based on the calibration data, and updates parameters of the LUT based on an output of the nonlinear operation and an output of the auxiliary network.

The processor may extract the calibration data by extracting a predetermined ratio of the training data.

The processor may generate a first LUT based on a first auxiliary network corresponding to a first layer of the neural network, and generate a second LUT based on a second auxiliary network corresponding to a second layer of the neural network.

The processor may generate a first nonlinear operation output corresponding to the first layer of the neural network based on the calibration data, and perform forward propagation by inputting the first nonlinear operation output to the second layer of the neural network.

The processor may generate the LUT by determining a scale and a bias of the LUT for approximating the nonlinear operation.

The processor may fine-tune parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation.

The processor may fine-tune the parameters of the auxiliary network based on the mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.

The processor may train the first layer and the second layer simultaneously based on the output of the nonlinear operation.

The nonlinear operation may include a Gaussian error linear unit (GELU) operation, a softmax operation, or a layer normalization operation.

A neural network operation method according to an embodiment includes receiving data for performing a neural network operation; extracting calibration data based on training data, among the data, for training the neural network; generating a look-up table (LUT) for performing a nonlinear operation included in the neural network through an auxiliary network corresponding to a layer of the neural network based on the calibration data; and updating parameters of the LUT based on an output of the nonlinear operation and an output of the auxiliary network.

Extracting the calibration data may include extracting the calibration data by extracting a predetermined ratio of the training data.

Generating the LUT may include generating a first LUT based on a first auxiliary network corresponding to a first layer of the neural network, and generating a second LUT based on a second auxiliary network corresponding to a second layer of the neural network.

Updating the parameters may include generating a first nonlinear operation output corresponding to the first layer of the neural network based on the calibration data, and performing forward propagation by inputting the first nonlinear operation output to the second layer of the neural network.

Generating the LUT may include generating the LUT by determining a scale and a bias of the LUT for approximating the nonlinear operation.

Updating the parameters may include fine-tuning parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation.

Fine-tuning the parameters of the auxiliary network may include fine-tuning the parameters of the auxiliary network based on the mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.

Updating the parameters may include training the first layer and the second layer simultaneously based on the output of the nonlinear operation.

FIG. 1 shows a schematic block diagram of a neural network operation apparatus.
FIG. 2 shows the operation of the neural network operation apparatus shown in FIG. 1.
FIG. 3A is a diagram for explaining the training process of an auxiliary network.
FIG. 3B shows an example of an auxiliary network.
FIG. 4 shows the performance of the neural network operation apparatus.
FIG. 5 shows a flowchart of the operation for measuring the performance of FIG. 4.
FIG. 6 shows a flowchart of the operation of the neural network operation apparatus shown in FIG. 1.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of this specification includes changes, equivalents, or substitutes falling within the technical idea described by the embodiments.

Terms such as first or second may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly, the second component may be named the first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to the other component, but it should be understood that intervening components may also be present.

Singular expressions include plural expressions unless the context clearly dictates otherwise. In this specification, terms such as "comprise" or "have" are intended to designate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical components are assigned the same reference numerals regardless of the figure number, and redundant descriptions thereof are omitted.

FIG. 1 shows a schematic block diagram of a neural network operation apparatus.

Referring to FIG. 1, the neural network operation apparatus 10 may perform a neural network operation. The neural network operation apparatus 10 may perform operations on nonlinear functions included in the neural network operation. The neural network operation apparatus 10 may improve operation speed by generating a look-up table (LUT) that approximates a nonlinear function of the neural network and replacing the nonlinear function with the LUT.

A nonlinear function may mean any function other than a linear function, that is, a function whose graph over its variables is not a straight line. For example, the nonlinear function may include a sigmoid (or logistic) function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, a leaky ReLU function, a parametric ReLU function, an exponential linear unit (ELU) function, a softmax function, a swish function, a Gaussian error linear unit (GELU) function, and/or a scaled exponential linear unit (SELU) function.

The neural network operation apparatus 10 may perform operations on nonlinear functions using LUTs. The neural network operation apparatus 10 may generate a LUT using an auxiliary network corresponding to each layer included in the neural network that performs the main operation. The auxiliary network may itself be implemented in the form of a neural network.

A neural network may refer to any model in which artificial neurons (nodes), forming a network through synaptic connections, acquire problem-solving capability by changing the strength of those connections through training.

A neuron of a neural network may include a combination of weights or biases. A neural network may include one or more layers, each consisting of one or more neurons or nodes. A neural network can infer a desired result from an arbitrary input by changing the weights of its neurons through training.

The neural network may include a deep neural network. The neural network may include a convolutional neural network (CNN), a recurrent neural network (RNN), a perceptron, a multilayer perceptron, a feed forward (FF) network, a radial basis function (RBF) network, a deep feed forward (DFF) network, a long short-term memory (LSTM), a gated recurrent unit (GRU), an auto encoder (AE), a variational auto encoder (VAE), a denoising auto encoder (DAE), a sparse auto encoder (SAE), a Markov chain (MC), a Hopfield network (HN), a Boltzmann machine (BM), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a deep convolutional network (DCN), a deconvolutional network (DN), a deep convolutional inverse graphics network (DCIGN), a generative adversarial network (GAN), a liquid state machine (LSM), an extreme learning machine (ELM), an echo state network (ESN), a deep residual network (DRN), a differentiable neural computer (DNC), a neural Turing machine (NTM), a capsule network (CN), a Kohonen network (KN), and an attention network (AN).

The neural network operation apparatus 10 may be implemented in a personal computer (PC), a data server, or a portable device.

The portable device may be implemented as a laptop computer, a mobile phone, a smartphone, a tablet PC, a mobile internet device (MID), a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal navigation device or portable navigation device (PND), a handheld game console, an e-book, or a smart device. The smart device may be implemented as a smart watch, a smart band, or a smart ring.

Because the layers included in a neural network have input data with different statistical characteristics, the neural network operation apparatus 10 may perform the nonlinear operations of the neural network by generating a LUT corresponding to each layer. By processing nonlinear operations with the LUT corresponding to each layer, the neural network operation apparatus 10 can improve the performance of the neural network operation.

The neural network operation apparatus 10 includes a receiver 100 and a processor 200. The neural network operation apparatus 10 may further include a memory 300.

The receiver 100 may receive data for performing a neural network operation. The receiver 100 may include a receiving interface. The receiver 100 may output the received data to the processor 200. The data related to the neural network may include model parameters (or weights) of the neural network, input data for performing the neural network operation, data output from the neural network, training data for training the neural network, and/or information related to the neural network operation.

The processor 200 may process data stored in the memory 300. The processor 200 may execute computer-readable code (e.g., software) stored in the memory 300 and instructions triggered by the processor 200.

The "processor 200" may be a hardware-implemented data processing device having circuitry with a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.

For example, the hardware-implemented data processing device may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).

The processor 200 may extract calibration data based on training data, among the received data, for training the neural network. The processor 200 may extract the calibration data by extracting a predetermined ratio of the training data. For example, the predetermined ratio may be 1/10.
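The extraction step can be sketched as follows. This is a minimal illustration, assuming the calibration set is a uniform random subset of the training set; the function name `extract_calibration_data`, the `seed` parameter, and the random sampling strategy are assumptions for illustration, since the text only states that a predetermined ratio (for example 1/10) is extracted.

```python
import random

def extract_calibration_data(training_data, ratio=0.1, seed=0):
    """Return a random subset holding `ratio` of the training samples.

    Uniform random sampling is an assumption; the text only fixes the ratio.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    count = max(1, round(len(training_data) * ratio))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(list(training_data), count)

training_data = list(range(1000))  # placeholder samples
calibration_data = extract_calibration_data(training_data, ratio=0.1)
print(len(calibration_data))  # 100
```

Sampling without replacement keeps every calibration sample distinct, which matters when the subset is small.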

Based on the calibration data, the processor 200 may generate a look-up table (LUT) for performing a nonlinear operation included in the neural network through an auxiliary network corresponding to a layer of the neural network. The nonlinear operation may include an operation on a nonlinear function. For example, the nonlinear operation may include a Gaussian error linear unit (GELU) operation, a softmax operation, or a layer normalization operation.
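For reference, the three nonlinear operations named above can be written in their standard textbook forms. The sketch below is not taken from the patent; it uses plain Python lists, and the layer normalization omits the learned gain and offset for brevity. It mainly shows which primitives (`erf`, `exp`, reciprocal square root) a LUT would have to approximate.

```python
import math

def gelu(x):
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(values):
    """Softmax over a list, shifted by the maximum for numerical stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(values, eps=1e-5):
    """Layer normalization without the learned gain and offset."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = 1.0 / math.sqrt(var + eps)  # reciprocal square root
    return [(v - mean) * inv_std for v in values]

row = [1.0, 2.0, 3.0, 4.0]
print([round(gelu(v), 4) for v in row])
print(softmax(row))
print(layer_norm(row))
```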

The processor 200 may generate the LUT by determining a scale and a bias of the LUT for approximating the nonlinear operation. The processor 200 may generate a first LUT based on a first auxiliary network corresponding to a first layer of the neural network. The processor 200 may generate a second LUT based on a second auxiliary network corresponding to a second layer of the neural network.
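One way to picture a LUT defined by a scale and a bias is a piecewise-linear table: each entry stores a (scale, bias) pair for one input interval, and the output is `scale * x + bias`. The sketch below is an illustration under assumptions, not the patent's procedure: the evenly spaced `breakpoints` and the simple endpoint interpolation in `build_lut` are hypothetical, whereas the text derives the table through an auxiliary network.

```python
import bisect
import math

def gelu(x):
    """Exact GELU using the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def build_lut(func, breakpoints):
    """Build per-segment (scale, bias) pairs by connecting func at the breakpoints."""
    entries = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        scale = (func(hi) - func(lo)) / (hi - lo)
        bias = func(lo) - scale * lo
        entries.append((scale, bias))
    return entries

def lut_eval(x, breakpoints, entries):
    """Find the segment containing x, then apply scale * x + bias."""
    idx = bisect.bisect_right(breakpoints, x) - 1
    idx = min(max(idx, 0), len(entries) - 1)  # clamp: outer segments extrapolate
    scale, bias = entries[idx]
    return scale * x + bias

breakpoints = [-4.0 + 0.5 * i for i in range(17)]  # 16 segments on [-4, 4]
entries = build_lut(gelu, breakpoints)
max_err = max(
    abs(lut_eval(x / 100.0, breakpoints, entries) - gelu(x / 100.0))
    for x in range(-400, 401)
)
print(f"max abs error on [-4, 4]: {max_err:.4f}")
```

With a fixed number of entries, the error concentrates where the function curves most, which is why choosing the table per layer, from the inputs that layer actually sees, can help.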

The processor 200 may update the parameters of the LUT based on the output of the nonlinear operation and the output of the auxiliary network. The processor 200 may generate a first nonlinear operation output corresponding to the first layer of the neural network based on the calibration data. The processor 200 may perform forward propagation by inputting the first nonlinear operation output to the second layer of the neural network.

The processor 200 may fine-tune the parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation. The processor 200 may fine-tune the parameters of the auxiliary network based on the mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.

The processor 200 may train the first layer and the second layer simultaneously based on the output of the nonlinear operation.

The memory 300 may store data for an operation (e.g., a neural network operation) or an operation result. The memory 300 may store instructions (or programs) executable by the processor 200. For example, the instructions may include instructions for executing the operation of the processor and/or the operation of each component of the processor.

The memory 300 may be implemented as a volatile memory device or a non-volatile memory device.

The volatile memory device may be implemented as dynamic random access memory (DRAM), static random access memory (SRAM), thyristor RAM (T-RAM), zero capacitor RAM (Z-RAM), or twin transistor RAM (TTRAM).

The non-volatile memory device may be implemented as electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic RAM (MRAM), spin-transfer torque MRAM (STT-MRAM), conductive bridging RAM (CBRAM), ferroelectric RAM (FeRAM), phase change RAM (PRAM), resistive RAM (RRAM), nanotube RRAM, polymer RAM (PoRAM), nano floating gate memory (NFGM), holographic memory, a molecular electronic memory device, or insulator resistance change memory.

FIG. 2 shows the operation of the neural network operation apparatus shown in FIG. 1.

Referring to FIG. 2, a processor (e.g., the processor 200 of FIG. 1) may train a neural network and auxiliary networks. The processor 200 may generate LUTs in consideration of the characteristics of each layer constituting the neural network. The processor 200 may train an auxiliary network by fine-tuning its parameters based on backpropagation. The processor 200 may improve the operation performance of the neural network by generating an optimal LUT based on the fine-tuned auxiliary network.

The processor 200 may generate an auxiliary network corresponding to each of the plurality of layers included in the neural network, using as initial values the weights of an auxiliary network that generates a LUT for the entire neural network.
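The initialization described above can be sketched as copying one set of pretrained weights into an independent auxiliary network per layer. The parameter names (`hidden_weight`, `hidden_bias`, `output_weight`), their values, and the helper `init_per_layer_aux` are hypothetical; the text does not specify the auxiliary network's parameter layout at this point.

```python
import copy

# Hypothetical weights of one auxiliary network trained beforehand
# to approximate the nonlinear function for the whole network.
pretrained_aux = {
    "hidden_weight": [1.0, 1.0, -1.0],
    "hidden_bias": [0.5, -0.5, 0.0],
    "output_weight": [0.3, 0.6, 0.1],
}

def init_per_layer_aux(pretrained, num_layers):
    """Give every layer its own deep copy so later fine-tuning stays per layer."""
    return [copy.deepcopy(pretrained) for _ in range(num_layers)]

aux_networks = init_per_layer_aux(pretrained_aux, num_layers=3)

# Stand-in for a fine-tuning update that touches only the first layer's copy.
aux_networks[0]["hidden_bias"][0] += 0.1
print(aux_networks[0]["hidden_bias"][0], aux_networks[1]["hidden_bias"][0])
```

A deep copy is used so that updating one layer's auxiliary network leaves the shared starting point and the other layers untouched.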

The processor 200 may perform fine-tuning based on the difference between the output of the LUT generated by the auxiliary network and the output of the nonlinear operation of the neural network. The processor 200 may perform the fine-tuning using the mean absolute error between the output of the LUT and the output of the nonlinear operation of the neural network as the loss function.
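The mean absolute error loss and a gradient-based update can be sketched as below. To stay short, the auxiliary network is reduced to a single hypothetical scale and bias pair (`y = scale * x + bias`), and plain subgradient descent stands in for backpropagation through a full auxiliary network; the inputs `xs`, the learning rate, and the step count are likewise illustrative assumptions.

```python
import math

def gelu(x):
    """Reference nonlinear operation whose output is the training target."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mae(predictions, targets):
    """Mean absolute error between two equally long lists."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def sign(v):
    return (v > 0) - (v < 0)

def fine_tune(xs, targets, scale, bias, lr=0.01, steps=200):
    """Subgradient descent on the MAE for the stand-in y = scale * x + bias."""
    n = len(xs)
    for _ in range(steps):
        # d|e|/de is sign(e); chain it through the linear model.
        signs = [sign(scale * x + bias - t) for x, t in zip(xs, targets)]
        grad_scale = sum(s * x for s, x in zip(signs, xs)) / n
        grad_bias = sum(signs) / n
        scale -= lr * grad_scale
        bias -= lr * grad_bias
    return scale, bias

xs = [1.0, 2.0, 3.0]               # placeholder calibration inputs
targets = [gelu(x) for x in xs]    # output of the exact nonlinear operation
initial_loss = mae([0.0 for _ in xs], targets)
scale, bias = fine_tune(xs, targets, scale=0.0, bias=0.0)
final_loss = mae([scale * x + bias for x in xs], targets)
print(f"MAE before: {initial_loss:.4f}, after: {final_loss:.4f}")
```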

In this way, the processor 200 can naturally train the neural network and/or the auxiliary networks by performing backpropagation so as to reduce the loss function, without analyzing the statistical characteristics of the input data for each layer.

The neural network may include a first layer 210-1, a second layer 210-2, ..., and an N-th layer 210-N, where N is a natural number. Each layer may include a nonlinear operation. A first auxiliary network 250-1, a second auxiliary network 250-2, ..., and an N-th auxiliary network 250-N may correspond to the respective layers. The processor 200 may generate and train a first LUT 230-1, a second LUT 230-2, ..., and an N-th LUT 230-N through the plurality of auxiliary networks.

The processor 200 may generate the plurality of LUTs 230-1, 230-2, ..., 230-N by determining the scale and bias of each LUT for approximating the nonlinear operation. The processor 200 may generate the first LUT 230-1 based on the first auxiliary network 250-1 corresponding to the first layer 210-1 of the neural network. The processor 200 may generate the second LUT 230-2 based on the second auxiliary network 250-2 corresponding to the second layer 210-2 of the neural network.

프로세서(200)는 비선형 연산의 출력 및 보조 네트워크의 출력에 기초하여 복수의 LUT(230-1, 230-2, ..., 230-N)의 파라미터를 업데이트할 수 있다. 프로세서(200)는 비선형 연산의 출력에 기초하여 제1 레이어(210-1) 및 제2 레이어(210-2)를 동시에 학습시킬 수 있다.The processor 200 may update parameters of the plurality of LUTs 230-1, 230-2, ..., 230-N based on the output of the non-linear operation and the output of the auxiliary network. The processor 200 may simultaneously learn the first layer 210-1 and the second layer 210-2 based on the output of the non-linear operation.

The processor 200 may generate a first nonlinear operation output corresponding to the first layer 210-1 of the neural network based on the calibration data. The processor 200 may perform forward propagation by inputting the first nonlinear operation output to the second layer 210-2 of the neural network.

The processor 200 may fine-tune the parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation. The processor 200 may fine-tune the parameters of the auxiliary network based on the mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.
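
Written out, the mean absolute error objective for the auxiliary network of one layer could take the following form. The symbols are notation introduced here for illustration and do not appear in the source: $f_l$ is the nonlinear operation of layer $l$, $g_l(\cdot;\theta_l)$ is the corresponding auxiliary network with parameters $\theta_l$, $x_i$ are the $M$ inputs to the nonlinear operation obtained from the calibration data, and $\eta$ is a step size. The plain gradient-descent update is likewise an assumption; the source only states that backpropagation is used.

```latex
\mathcal{L}_l(\theta_l) = \frac{1}{M} \sum_{i=1}^{M} \left| g_l(x_i; \theta_l) - f_l(x_i) \right|,
\qquad
\theta_l \leftarrow \theta_l - \eta \, \nabla_{\theta_l} \mathcal{L}_l(\theta_l)
```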

FIG. 3A is a diagram for explaining a training process of an auxiliary network, and FIG. 3B shows an example of an auxiliary network.

Referring to FIGS. 3A and 3B, a processor (e.g., the processor 200 of FIG. 1) may generate a LUT corresponding to a nonlinear operation of a neural network. The processor 200 may generate the LUT by configuring an auxiliary network.

The example of FIG. 3A illustrates the training process using an auxiliary network 310 and a nonlinear operation 330 corresponding to one layer. Training may be performed for the other auxiliary networks and layers in the same manner as described below.

The processor 200 may use the weights of one pre-trained auxiliary network 310 as the initial values of the weights of the auxiliary networks 310 corresponding to all layers.

The processor 200 may extract calibration data from the data based on training data for training the neural network. The processor 200 may extract the calibration data by extracting a predetermined ratio of the training data. Because the calibration data is extracted from the training data of the neural network, it may have statistical characteristics similar to those of the training data.
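
The extraction step can be sketched as follows. This is a minimal illustration, not the method of the source: the function name `extract_calibration_data`, the use of uniform random sampling without replacement, and the fixed seed are all assumptions, since the source only states that a predetermined ratio of the training data is extracted.

```python
import numpy as np

def extract_calibration_data(training_data, ratio=0.1, seed=0):
    """Return a subset holding `ratio` of the training samples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_total = len(training_data)
    n_calib = max(1, int(round(n_total * ratio)))
    # Assumption: sample uniformly without replacement so that the subset
    # keeps statistics similar to the full training set.
    indices = rng.choice(n_total, size=n_calib, replace=False)
    return training_data[np.sort(indices)]

# Toy training set: 100 samples with 10 features each.
training_data = np.arange(1000, dtype=np.float32).reshape(100, 10)
calibration_data = extract_calibration_data(training_data, ratio=0.1)
```

With a ratio of 1/10, the toy set of 100 samples yields 10 calibration samples.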

The processor 200 may fine-tune the auxiliary network 310 based on the extracted calibration data.

The processor 200 may fine-tune the parameters of the auxiliary network 310 by performing backpropagation based on the output of the auxiliary network 310 and the output of the nonlinear operation 330. The processor 200 may fine-tune the parameters of the auxiliary network 310 based on the mean absolute error (MAE) between the output of the auxiliary network 310 and the output of the nonlinear operation 330.
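
A minimal sketch of one such fine-tuning loop is given below, under stated assumptions. The auxiliary network is taken to be a scalar-input network with one ReLU hidden layer; the tanh approximation of GELU stands in for the nonlinear operation 330; the update is plain gradient descent on the MAE. The function names, the initial values, the learning rate, and the step count are all hypothetical and are not taken from the source.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, used here as the reference nonlinear operation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def aux_forward(x, w1, b1, w2):
    pre = np.outer(x, w1) + b1          # pre-activations, shape (M, K)
    hidden = np.maximum(pre, 0.0)       # ReLU
    return hidden @ w2, pre, hidden

def mae_finetune_step(x, target, w1, b1, w2, lr=1e-3):
    """One gradient-descent step on the MAE; returns the loss before the update."""
    out, pre, hidden = aux_forward(x, w1, b1, w2)
    residual = out - target
    loss = np.mean(np.abs(residual))
    g_out = np.sign(residual) / x.size           # d(MAE)/d(out)
    g_w2 = hidden.T @ g_out
    g_pre = np.outer(g_out, w2) * (pre > 0.0)    # backpropagate through ReLU
    g_w1 = (g_pre * x[:, None]).sum(axis=0)
    g_b1 = g_pre.sum(axis=0)
    return loss, w1 - lr * g_w1, b1 - lr * g_b1, w2 - lr * g_w2

x = np.linspace(-4.0, 4.0, 256)      # stand-in for calibration inputs
target = gelu(x)                     # output of the nonlinear operation
w1 = np.ones(7)                      # hypothetical initial parameters
b1 = -np.linspace(-3.0, 3.0, 7)
w2 = np.zeros(7)

initial_loss, *_ = mae_finetune_step(x, target, w1, b1, w2, lr=0.0)
for _ in range(1000):
    _, w1, b1, w2 = mae_finetune_step(x, target, w1, b1, w2)
final_loss, *_ = mae_finetune_step(x, target, w1, b1, w2, lr=0.0)
```

The target here is computed directly from the nonlinear operation itself, so no task labels enter the loss.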

As the layers of the neural network become deeper, errors caused by approximating the nonlinear operation 330 may accumulate. By using the output of the nonlinear operation 330 as the output passed to the next layer, the processor 200 may prevent the error accumulation that would otherwise occur when a plurality of layers are trained simultaneously. Because the processor 200 can thereby train all layers of the neural network at once without accumulating errors, training efficiency may be improved.
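
The forwarding rule described above can be sketched as follows. This is an illustrative toy, not the implementation of the source: the layers are assumed to be plain matrix multiplications followed by GELU (tanh approximation), and the names `calibration_forward`, `layer_weights`, and `aux_params` are hypothetical. The point shown is that each auxiliary network contributes only a local loss, while the exact nonlinear output is what reaches the next layer.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, standing in for the nonlinear operation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def aux_forward(x, w1, b1, w2):
    # Element-wise auxiliary network with one ReLU hidden layer
    return np.maximum(x[..., None] * w1 + b1, 0.0) @ w2

def calibration_forward(x, layer_weights, aux_params):
    """Forward pass that collects a per-layer MAE between auxiliary and exact outputs."""
    losses = []
    h = x
    for W, (w1, b1, w2) in zip(layer_weights, aux_params):
        pre = h @ W
        exact = gelu(pre)                         # output of the nonlinear operation
        approx = aux_forward(pre, w1, b1, w2)     # output of the auxiliary network
        losses.append(np.mean(np.abs(approx - exact)))
        h = exact   # forward the exact output, so approximation error does not accumulate
    return h, losses

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
layer_weights = [rng.normal(size=(4, 4)), rng.normal(size=(4, 4))]
aux_params = [(rng.normal(size=7), rng.normal(size=7), rng.normal(size=7)) for _ in range(2)]
out, losses = calibration_forward(x, layer_weights, aux_params)
```

Because the input of every layer is independent of the auxiliary networks, the per-layer losses can be minimized in the same pass.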

The auxiliary network 310 may take the form of a neural network including one hidden layer. When an N-entry LUT is generated, the auxiliary network 310 may be composed of N-1 neurons. In this case, ReLU may be used as the nonlinear function.

The first layer of the auxiliary network 310 may include both weight and bias parameters, and the second layer may include only weight parameters.
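
One plausible reading of how such a network maps to an N-entry LUT is sketched below. The source does not spell out the conversion, so the mapping is an assumption: a scalar-input network with N-1 ReLU neurons is piecewise linear with at most N-1 breakpoints, so it can be stored as N (scale, bias) pairs indexed by the interval that contains the input. The function names are hypothetical, and the first-layer weights are assumed to be nonzero.

```python
import numpy as np

def aux_forward(x, w1, b1, w2):
    # y = sum_j w2[j] * relu(w1[j] * x + b1[j])
    return np.maximum(np.outer(x, w1) + b1, 0.0) @ w2

def aux_to_lut(w1, b1, w2):
    """Convert N-1 ReLU neurons into N (scale, bias) entries (assumed mapping)."""
    breakpoints = np.sort(-b1 / w1)            # where each neuron switches on or off
    # One probe point inside each of the N intervals.
    probes = np.concatenate((
        [breakpoints[0] - 1.0],
        0.5 * (breakpoints[:-1] + breakpoints[1:]),
        [breakpoints[-1] + 1.0],
    ))
    active = ((np.outer(probes, w1) + b1) > 0.0).astype(float)   # shape (N, N-1)
    scale = active @ (w2 * w1)    # slope of the piecewise-linear function per interval
    bias = active @ (w2 * b1)     # intercept per interval
    return breakpoints, scale, bias

def lut_forward(x, breakpoints, scale, bias):
    idx = np.searchsorted(breakpoints, x)      # interval index in [0, N-1]
    return scale[idx] * x + bias[idx]

rng = np.random.default_rng(0)
n_neurons = 15                                  # N-1 neurons give a 16-entry LUT
w1 = rng.uniform(0.5, 1.5, n_neurons) * rng.choice([-1.0, 1.0], n_neurons)
b1 = rng.normal(size=n_neurons)
w2 = rng.normal(size=n_neurons)

breakpoints, scale, bias = aux_to_lut(w1, b1, w2)
x = np.linspace(-5.0, 5.0, 1001)
lut_out = lut_forward(x, breakpoints, scale, bias)
net_out = aux_forward(x, w1, b1, w2)
```

Under this reading the LUT reproduces the auxiliary network exactly, so inference needs only one comparison-based lookup, one multiplication, and one addition per element.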

FIG. 4 shows the performance of the neural network operation apparatus.

Referring to FIG. 4, a processor (e.g., the processor 200 of FIG. 1) may improve the performance of the neural network operation by generating the LUT through fine-tuning of the auxiliary network. The example of FIG. 4 may show that the fine-tuned LUT performs better than the baseline and the LUT without fine-tuning.

The example of FIG. 4 may show the score for each task when the layer normalization operation, which is a nonlinear operation of the RoBERTa model, is approximated with a LUT.

The baseline column shows the performance when the nonlinear operation of the RoBERTa model is used as is, the LUT column shows the performance when the nonlinear operation is replaced with a single LUT, and the fine-tuned LUT column shows the performance when LUTs fine-tuned using the auxiliary network corresponding to each layer are used.

By performing fine-tuning to generate a LUT optimized for each layer, the processor 200 may, for most tasks, recover the performance degradation that occurs when the nonlinear operation is replaced using only a single LUT.

By fine-tuning a plurality of layers simultaneously in parallel, the processor 200 may perform training more efficiently than a scheme that fine-tunes the layers sequentially, one layer at a time.

FIG. 5 shows a flowchart of the operation for measuring the performance shown in FIG. 4.

Referring to FIG. 5, a processor (e.g., the processor 200 of FIG. 1) may place an auxiliary network for generating a LUT at each layer constituting the neural network, and thereby generate a LUT corresponding to the nonlinear operation of the neural network (510). In this case, the weights of one pre-trained auxiliary network may be used as the initial values of the weights of the auxiliary networks for all layers.

The processor 200 may perform forward propagation using the calibration data (530). The processor 200 may use a part of the training data of the neural network as the calibration data. For example, the processor 200 may use 1/10 of the training data of the neural network as the calibration data. The processor 200 may pass the output of the nonlinear operation to the next layer.

The processor 200 may generate the LUT by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation (550). The processor 200 may generate the LUT by fine-tuning the weights of the auxiliary network (570).

The processor 200 may measure the accuracy of a task using the fine-tuned auxiliary network (590). The processor 200 may measure the accuracy of the task through the auxiliary network fine-tuned by repeatedly performing operations 530 to 570. The processor 200 may update the parameters of the neural network and/or the auxiliary network through the fine-tuning.

Through the fine-tuning scheme described above, the processor 200 may fine-tune the neural network and/or the auxiliary network without requiring labels for regression.

By passing the output of the nonlinear function as the input of the next layer, the processor 200 may prevent the accumulation of errors that occurs during fine-tuning.

FIG. 6 shows a flowchart of the operation of the neural network operation apparatus shown in FIG. 1.

Referring to FIG. 6, a receiver (e.g., the receiver 100 of FIG. 1) may receive data for performing a neural network operation (610).

A processor (e.g., the processor 200 of FIG. 1) may extract calibration data from the data based on training data for training the neural network (630). The processor 200 may extract the calibration data by extracting a predetermined ratio of the training data. For example, the predetermined ratio may be 1/10.

Based on the calibration data, the processor 200 may generate a LUT for performing a nonlinear operation included in the neural network through an auxiliary network corresponding to a layer of the neural network (650). For example, the nonlinear operation may include a GELU operation, a softmax operation, or a layer normalization operation.
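
For reference, the three nonlinear operations named above can be written with their standard definitions as in the sketch below. This is background only: the source does not state which part of each operation is tabulated by the LUT, and the layer normalization here omits the learnable gain and offset for brevity. The `eps` default is an assumed value.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)   # element-wise error function

def gelu(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))

def softmax(x, axis=-1):
    # Subtract the maximum first for numerical stability
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5, axis=-1):
    # Normalize each row to zero mean and (approximately) unit variance
    mean = np.mean(x, axis=axis, keepdims=True)
    var = np.var(x, axis=axis, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0, 4.0], [-1.0, 0.0, 1.0, 2.0]])
probs = softmax(x)
normed = layer_norm(x)
activated = gelu(x)
```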

The processor 200 may generate the LUT by determining a scale and a bias of the LUT for approximating the nonlinear operation. The processor 200 may generate a first LUT based on a first auxiliary network corresponding to a first layer of the neural network. The processor 200 may generate a second LUT based on a second auxiliary network corresponding to a second layer of the neural network.

The processor 200 may update the parameters of the LUT based on the output of the nonlinear operation and the output of the auxiliary network (670). The processor 200 may generate a first nonlinear operation output corresponding to the first layer of the neural network based on the calibration data. The processor 200 may perform forward propagation by inputting the first nonlinear operation output to the second layer of the neural network.

The processor 200 may fine-tune the parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation. The processor 200 may fine-tune the parameters of the auxiliary network based on the mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device; however, those skilled in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to the embodiments may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and constructed for the embodiments or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, those skilled in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A neural network operation apparatus comprising:
a receiver configured to receive data for performing a neural network operation; and
a processor configured to:
extract calibration data from the data based on training data for training the neural network;
generate, based on the calibration data, a look-up table (LUT) for performing a nonlinear operation included in the neural network through an auxiliary network corresponding to a layer of the neural network; and
update a parameter of the LUT based on an output of the nonlinear operation and an output of the auxiliary network.

The neural network operation apparatus of claim 1,
wherein the processor is configured to extract the calibration data by extracting a predetermined ratio of data from the training data.

The neural network operation apparatus of claim 1,
wherein the processor is configured to:
generate a first LUT based on a first auxiliary network corresponding to a first layer of the neural network; and
generate a second LUT based on a second auxiliary network corresponding to a second layer of the neural network.

The neural network operation apparatus of claim 1,
wherein the processor is configured to:
generate a first nonlinear operation output corresponding to a first layer of the neural network based on the calibration data; and
perform forward propagation by inputting the first nonlinear operation output to a second layer of the neural network.

The neural network operation apparatus of claim 1,
wherein the processor is configured to generate the LUT by determining a scale and a bias of the LUT for approximating the nonlinear operation.

The neural network operation apparatus of claim 1,
wherein the processor is configured to fine-tune parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation.

The neural network operation apparatus of claim 6,
wherein the processor is configured to fine-tune the parameters of the auxiliary network based on a mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.

The neural network operation apparatus of claim 3,
wherein the processor is configured to train the first layer and the second layer simultaneously based on the output of the nonlinear operation.

The neural network operation apparatus of claim 1,
wherein the nonlinear operation includes a Gaussian error linear unit (GELU) operation, a softmax operation, or a layer normalization operation.

A neural network operation method comprising:
receiving data for performing a neural network operation;
extracting calibration data from the data based on training data for training the neural network;
generating, based on the calibration data, a look-up table (LUT) for performing a nonlinear operation included in the neural network through an auxiliary network corresponding to a layer of the neural network; and
updating a parameter of the LUT based on an output of the nonlinear operation and an output of the auxiliary network.

The neural network operation method of claim 10,
wherein the extracting of the calibration data comprises extracting the calibration data by extracting a predetermined ratio of data from the training data.

The neural network operation method of claim 10,
wherein the generating of the LUT comprises:
generating a first LUT based on a first auxiliary network corresponding to a first layer of the neural network; and
generating a second LUT based on a second auxiliary network corresponding to a second layer of the neural network.

The neural network operation method of claim 10,
wherein the updating of the parameter comprises:
generating a first nonlinear operation output corresponding to a first layer of the neural network based on the calibration data; and
performing forward propagation by inputting the first nonlinear operation output to a second layer of the neural network.

The neural network operation method of claim 10,
wherein the generating of the LUT comprises generating the LUT by determining a scale and a bias of the LUT for approximating the nonlinear operation.

The neural network operation method of claim 10,
wherein the updating of the parameter comprises fine-tuning parameters of the auxiliary network by performing backpropagation based on the output of the auxiliary network and the output of the nonlinear operation.

The neural network operation method of claim 15,
wherein the fine-tuning of the parameters of the auxiliary network comprises fine-tuning the parameters of the auxiliary network based on a mean absolute error between the output of the auxiliary network and the output of the nonlinear operation.

The neural network operation method of claim 12,
wherein the updating of the parameter comprises training the first layer and the second layer simultaneously based on the output of the nonlinear operation.

The neural network operation method of claim 10,
wherein the nonlinear operation includes a Gaussian error linear unit (GELU) operation, a softmax operation, or a layer normalization operation.

A computer program stored in a computer-readable medium in combination with hardware to execute the method of any one of claims 10 to 18.