KR102596769B1

KR102596769B1 - Method for designing neural network and device for the same

Info

Publication number: KR102596769B1
Application number: KR1020220085575A
Authority: KR
Inventors: 최정환; 은현; 김하나
Original assignee: 오픈엣지테크놀로지 주식회사
Priority date: 2022-07-12
Filing date: 2022-07-12
Publication date: 2023-11-02
Also published as: WO2024014632A1; KR20240008816A

Abstract

Disclosed is a method for determining the computational precision of each layer of a neural network operating on a computing device. The method includes: calculating an indicator for each layer of a first set of layers included in the neural network; assigning a first computational precision to the N1 layers of the first set having the smallest indicator values, and assigning a second computational precision, lower than the first computational precision, to the N2 layers having the largest indicator values; and storing in a memory information about the layers of the first set to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned.

Description

Method for designing neural network and device for the same

The present invention relates to technology for designing an artificial neural network, and in particular to technology for determining the precision of the computational data used in each layer of a neural network.

A neural network is a processing module that converts input data into output data, and is one of the functional modules that can be used in a computing device. A neural network may include a plurality of cascaded layers and other computational modules distinct from those layers; this is described in numerous publications.

Data input to each layer may be referred to as an input activation, and data output from each layer may be referred to as an output activation. The input activation and the output activation are each data composed of multiple values. The output activation of one layer may be provided as the input activation of another layer, or as input data to another computational module distinct from the layers. The computation in each layer may require not only the input activation supplied to that layer but also a weight, which is data combined with the input activation in that layer according to a predetermined rule.

The operation of a neural network may be executed by loading the command codes and data for it into a CPU. Alternatively, it may be executed using a neural network accelerator, which is dedicated hardware designed and manufactured to support neural network operation. In that case the CPU and the neural network accelerator may cooperate: the CPU executes command code for driving the neural network accelerator, and the neural network accelerator reads the data it needs from memory.

The raw data used in neural network computation may be, for example, 32-bit floating-point (FP32) data or data in another format. However, to reduce data memory traffic and lighten the computation, the data used in each layer of the neural network needs to be converted to a fixed-point format (INT4, INT8, or INT16).

For example, the first input activation and the first weight prepared for the computation in a first layer may need to be provided in a fixed-point format having a first number of bits. If they are not in that format, they need to be converted into the fixed-point format having the first number of bits. The technique of approximating floating-point data by a fixed-point format that uses fewer bits, for the purpose of reducing data memory traffic and lightening the computation, may be referred to as neural network quantization.
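The conversion described above can be illustrated with a minimal sketch. It assumes symmetric uniform quantization with a single per-tensor scale, which is one common scheme and is not stated in the source; the function names `quantize_fixed_point` and `dequantize` and the sample values are hypothetical.

```python
import numpy as np

def quantize_fixed_point(x, num_bits):
    """Approximate floating-point data by signed integer codes of width num_bits.

    Assumption: symmetric uniform quantization with one scale per tensor.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return codes.astype(np.float32) * scale

# Hypothetical FP32 data standing in for an activation or a weight.
x = np.array([-1.0, -0.3, 0.0, 0.2, 1.0], dtype=np.float32)
q8, s8 = quantize_fixed_point(x, 8)
x_hat = dequantize(q8, s8)
```

With this scheme the round-trip error of each element is bounded by half of the scale, so a smaller bit count (a larger scale) gives a coarser approximation.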

When a given neural network includes a plurality of layers, all of the layers may be quantized with the same precision (single-precision neural network quantization). Consider a first case, in which all of the layers are quantized into a fixed-point format having a first number of bits, and a second case, in which all of the layers are quantized into a fixed-point format having a second number of bits. Assuming the second number of bits is greater than the first, the first case is more favorable for making the neural network lightweight but less favorable in terms of quantization error, while the second case is less favorable for making the network lightweight but more favorable in terms of quantization error.

Alternatively, some of the layers of a given neural network may be quantized into a fixed-point format having a first number of bits and others into a fixed-point format having a second number of bits (mixed-precision neural network quantization). This can achieve a better overall result when both quantization error and network size are considered. To do so, it must be possible to select which layers form a first group to be quantized in the fixed-point format of the first number of bits (first precision) and which form a second group to be quantized in the fixed-point format of the second number of bits (second precision).

The present invention aims to provide a technique for determining, among the plurality of layers included in a given neural network, a first group of layers to be quantized with a first precision and a second group of layers to be quantized with a second precision.

According to one aspect of the present invention, a method for determining the computational precision of each layer of a neural network operating on a computing device may be provided. The method includes: calculating, by a computing device, an indicator (SQNR_Avg) for each layer of a first set of layers included in the neural network; assigning, by the computing device, a first computational precision to the N1 layers of the first set having the smallest indicator values, and a second computational precision, lower than the first computational precision, to the N2 layers having the largest indicator values; and storing in a memory, by the computing device, information about the layers of the first set to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned. Here, the information about a layer to which the first computational precision is assigned may be information identifying that layer. Likewise, the information about a layer to which the second computational precision is assigned may be information identifying that layer.
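The assignment step of this method can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_precision`, the layer names, the indicator values, and the bit widths 8 and 4 are all hypothetical, and the returned dictionary stands in for the information stored in memory.

```python
def assign_precision(indicators, n1, n2, high_bits=8, low_bits=4):
    """Assign a bit width to layers based on their indicator values.

    indicators: dict mapping a layer identifier to its indicator value.
    The n1 layers with the smallest indicator values get high_bits
    (first, higher precision); the n2 layers with the largest indicator
    values get low_bits (second, lower precision).
    """
    if n1 < 0 or n2 < 0 or n1 + n2 > len(indicators):
        raise ValueError("n1 + n2 must not exceed the number of layers")
    ranked = sorted(indicators, key=indicators.get)  # ascending indicator
    assignment = {}
    for layer in ranked[:n1]:
        assignment[layer] = high_bits
    for layer in ranked[len(ranked) - n2:]:
        assignment[layer] = low_bits
    return assignment

# Hypothetical indicator values for four layers.
indicators = {"conv1": 12.5, "conv2": 30.1, "conv3": 21.7, "fc": 8.3}
assignment = assign_precision(indicators, n1=2, n2=2)
```

A small indicator value signals a layer that is sensitive to quantization, which is why such layers receive the higher precision in this sketch.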

Here, calculating the indicator for a specific layer included in the neural network may include correcting, using the size of the output activation of the specific layer, a value generated from the difference or distance between a first output activation, which the specific layer outputs when first data processed in the specific layer is quantized in a fixed-point format having a first number of bits, and a second output activation, which the specific layer outputs when the first data is quantized in a fixed-point format having a second number of bits.

Here, the indicator for a specific layer included in the neural network may have a larger value as the difference becomes smaller between a first output activation, which the specific layer outputs when first data processed in the specific layer is quantized in a fixed-point format having a first number of bits, and a second output activation, which the specific layer outputs when the first data is quantized in a fixed-point format having a second number of bits.

Here, the indicator for a specific layer included in the neural network may have a larger value as the distance becomes smaller between a first output activation, which the specific layer outputs when first data processed in the specific layer is quantized in a fixed-point format having a first number of bits, and a second output activation, which the specific layer outputs when the first data is quantized in a fixed-point format having a second number of bits.

Here, the indicator for a specific layer included in the neural network may have a smaller value as the size of the output activation of the specific layer increases.

Here, calculating the indicator may include: obtaining, by the computing device, the size (T) of the output activation of a k-th layer included in the neural network; preparing, by the computing device, first data used in the k-th layer in a fixed-point format having a first number of bits, generating a first output activation (v_i, i = 1, 2, 3, ..., T) from the k-th layer using the first data so prepared, and calculating a first sum ($\sum_{i=1}^{T} v_i^{2}$), which is the sum of the squares of the elements of the first output activation; preparing, by the computing device, the first data in a fixed-point format having a second number of bits, generating a second output activation (qv_i) from the k-th layer using the first data so prepared, and calculating a second sum ($\sum_{i=1}^{T} (v_i - qv_i)^{2}$), which is the sum of the squares of the differences between each element of the first output activation and the corresponding element of the second output activation; calculating, by the computing device, a first log value ($\log_{10}\left(\sum_{i=1}^{T} v_i^{2} / \sum_{i=1}^{T} (v_i - qv_i)^{2}\right)$), which is the logarithm of the ratio of the first sum to the second sum, and a second log value ($\log_{10} T$), which is the logarithm of the size (T) of the output activation of the k-th layer; and determining, by the computing device, the indicator of quantization error for the k-th layer (SQNR[k]_Avg) by subtracting a value proportional to the second log value from a value proportional to the first log value. Here, i (i = 1, 2, 3, ..., T) is an index identifying each of the T activation values.
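The steps above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `sqnr_avg` is hypothetical, both proportionality constants default to 10 purely for the example (the source only says the terms are proportional to the log values), and base-10 logarithms are assumed for both terms.

```python
import math
import numpy as np

def sqnr_avg(v, qv, alpha=10.0, beta=10.0):
    """Per-layer indicator built from two output activations.

    v:  output activation with the data quantized at the first bit count.
    qv: output activation with the data quantized at the second bit count.
    alpha, beta: proportionality constants (assumed values).
    """
    v = np.asarray(v, dtype=np.float64).ravel()
    qv = np.asarray(qv, dtype=np.float64).ravel()
    if v.shape != qv.shape:
        raise ValueError("v and qv must have the same number of elements")
    T = v.size                                   # size of the output activation
    first_sum = float(np.sum(v ** 2))            # sum of squares of v_i
    second_sum = float(np.sum((v - qv) ** 2))    # sum of squared differences
    if first_sum == 0.0 or second_sum == 0.0:
        raise ValueError("indicator is undefined when either sum is zero")
    first_log = math.log10(first_sum / second_sum)
    second_log = math.log10(T)
    return alpha * first_log - beta * second_log

# Hypothetical activations: first_sum = 30, second_sum = 1, T = 4.
value = sqnr_avg([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 3.0])
```

When `alpha` equals `beta`, subtracting the second log term is the same as dividing the sum ratio by T inside the logarithm, so layers with large output activations are not favored merely because of their size.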

Alternatively, the indicator (SQNR_Avg) for a specific layer of the neural network may satisfy Equation 1.

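[Equation 1]

$$\mathrm{SQNR}_{Avg} = \alpha \log_{10}\left(\frac{\sum_{i=1}^{T} v_i^{2}}{\sum_{i=1}^{T} (v_i - qv_i)^{2}}\right) - \beta \log_{10} T$$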

Here, T is the size of the output activation of the specific layer.

Here, v_i is the value of the i-th data, having index i, in the output activation of the specific layer when the first data processed in the specific layer is quantized in a fixed-point format having a first number of bits.

Here, qv_i is the value of the i-th data, having index i, in the output activation of the specific layer when the first data processed in the specific layer is quantized in a fixed-point format having a second number of bits smaller than the first number of bits.

Here, α and β are constants.

And here, [expression missing from the source].

Here, the first data may include the k-th input activation input to the k-th layer and the k-th weight used in the k-th layer.

Here, data processed in the predetermined number of layers having the smallest indicator values may be computed in a fixed-point format of a first number of bits, data processed in the predetermined number of layers having the largest indicator values may be computed in a fixed-point format of a second number of bits, and the first number of bits may be greater than the second number of bits.

According to another aspect of the present invention, a neural network computation method for calculating the output activation of a layer included in a neural network operating on a computing device may be provided. The method includes: obtaining, by the computing device, the computational precision assigned to a k-th layer of the neural network, a k-th input activation to be input to the k-th layer, and a k-th weight corresponding to the k-th layer; determining, by the computing device, whether the k-th input activation and the k-th weight are expressed in the number of bits corresponding to the obtained computational precision assigned to the k-th layer; if the k-th weight is not expressed in that number of bits, converting, by the computing device, the k-th input activation and the k-th weight into a fixed-point format having the number of bits corresponding to the obtained computational precision; and calculating, by the computing device, the k-th output activation of the k-th layer using the converted k-th input activation and k-th weight.
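The check-convert-compute flow of this method can be sketched as below. Several details are assumptions made for the example and are not taken from the source: the `FixedPoint` container, symmetric per-tensor scaling, a matrix product as the layer operation, checking the activation and the weight individually, and returning the output activation as real values.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FixedPoint:
    codes: np.ndarray   # signed integer codes
    scale: float        # real value = codes * scale
    num_bits: int       # bit width the codes were quantized to

def to_fixed_point(x, num_bits):
    """Quantize real-valued data to num_bits (assumed symmetric scheme)."""
    x = np.asarray(x, dtype=np.float64)
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
    return FixedPoint(codes, scale, num_bits)

def ensure_precision(t, num_bits):
    """Return t unchanged if it already has num_bits, else convert it."""
    if isinstance(t, FixedPoint):
        if t.num_bits == num_bits:
            return t
        t = t.codes * t.scale           # back to real values before requantizing
    return to_fixed_point(t, num_bits)

def layer_forward(activation, weight, num_bits):
    """Compute the k-th output activation at the precision assigned to layer k."""
    a = ensure_precision(activation, num_bits)
    w = ensure_precision(weight, num_bits)
    acc = a.codes @ w.codes             # integer accumulate (illustrative op)
    return acc * (a.scale * w.scale)    # rescale to real values

# Hypothetical floating-point inputs for a layer assigned 8 bits.
x = np.array([[0.5, -1.0]])
w = np.array([[1.0], [0.25]])
y = layer_forward(x, w, num_bits=8)
```

Keeping the bit width alongside the codes lets the runtime skip the conversion when the data already matches the precision stored for the layer.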

Here, the computational precision assigned to the k-th layer of the neural network may have been determined by executing a computational precision determination method that includes: calculating an indicator (SQNR_Avg) for each layer of a first set of layers included in the neural network; assigning a first computational precision to the N1 layers of the first set having the smallest indicator values, and a second computational precision, lower than the first computational precision, to the N2 layers having the largest indicator values; and storing in a memory information about the layers of the first set to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned.

Here, the indicator (SQNR_Avg) may satisfy Equation 1 above.

According to another aspect of the present invention, a neural network computation method for calculating the output activation of a layer included in a neural network may be provided. The method includes: calculating, by a first computing device, an indicator (SQNR_Avg) for each layer of a first set of layers included in the neural network; assigning, by the first computing device, a first computational precision to the N1 layers of the first set having the smallest indicator values, and a second computational precision, lower than the first computational precision, to the N2 layers having the largest indicator values; storing in a memory, by the first computing device, information about the layers of the first set to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned; obtaining, by a second computing device, the computational precision assigned to a k-th layer of the neural network, a k-th input activation to be input to the k-th layer, and a k-th weight corresponding to the k-th layer; determining, by the second computing device, whether the k-th input activation and the k-th weight are expressed in the number of bits corresponding to the obtained computational precision assigned to the k-th layer; if the k-th weight is not expressed in that number of bits, converting, by the second computing device, the k-th input activation and the k-th weight into a fixed-point format having the number of bits corresponding to the obtained computational precision; and calculating, by the second computing device, the k-th output activation of the k-th layer using the converted k-th input activation and k-th weight.

According to another aspect of the present invention, a computing device that can access a non-volatile recording medium and includes a processing unit may be provided. The non-volatile recording medium stores information about a neural network, a first command code for calculating a per-layer indicator, and a second command code for determining per-layer computational precision. The processing unit is configured to execute: by reading and executing the first command code, calculating an indicator (SQNR_Avg) for each layer of a first set of layers included in the neural network; by reading and executing the second command code, assigning a first computational precision to the N1 layers of the first set having the smallest indicator values, and a second computational precision, lower than the first computational precision, to the N2 layers having the largest indicator values; and storing in the non-volatile recording medium information about the layers of the first set to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned.

According to yet another aspect of the present invention, a computing device that can access a non-volatile recording medium and includes a processing unit may be provided. The non-volatile recording medium stores information about a neural network and the computational precision assigned to a k-th layer of the neural network. The processing unit is configured to execute: obtaining the computational precision assigned to the k-th layer of the neural network, a k-th input activation to be input to the k-th layer, and a k-th weight corresponding to the k-th layer; determining whether the k-th input activation and the k-th weight are expressed in the number of bits corresponding to the obtained computational precision assigned to the k-th layer; if the k-th weight is not expressed in that number of bits, converting the k-th input activation and the k-th weight into a fixed-point format having the number of bits corresponding to the obtained computational precision; and calculating the k-th output activation of the k-th layer using the converted k-th input activation and k-th weight.

According to the present invention, a technique can be provided for determining, among the plurality of layers included in a given neural network, a first group of layers to be quantized with a first precision and a second group of layers to be quantized with a second precision.

Figure 1 conceptually shows only the layers included in a given neural network structure.
Figure 2 shows the configuration of a computing device, according to an embodiment of the present invention, that is used to design a neural network for use by a computing device.
Figure 3 is a flowchart showing a process for determining the computational precision of each layer of a neural network, provided according to an embodiment of the present invention.
Figure 4 is a flowchart showing a process for calculating the indicator for each layer of a neural network according to an embodiment of the present invention.
Figure 5 shows the main structure of part of a computing device used in an embodiment of the present invention.
Figure 6 is a flowchart showing a method of calculating an output activation, provided according to an embodiment of the present invention.
Figure 7 shows the configuration of a computing device that executes a neural network computation method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described with reference to the attached drawings. The present invention is not limited to the embodiments described herein and may be implemented in various other forms. The terms used in this specification are intended to aid understanding of the embodiments and are not intended to limit the scope of the present invention. In addition, singular forms used below include plural forms unless the wording clearly indicates the contrary.

Figure 1 conceptually shows only the layers included in a given neural network structure.

The neural network 1000 may include a plurality of layers 10[k] (k = 1, 2, 3, ..., K). A k-th input activation is input to the k-th layer 10[k], and a k-th output activation may be output from the k-th layer 10[k]. To generate the k-th output activation, an operation using the k-th input activation and a k-th weight prepared in advance for the k-th layer 10[k] may be executed. The values constituting the k-th input activation, the k-th output activation, and the k-th weight may be quantized in a fixed-point format having a k-th number of bits.

In one embodiment, the k-th number of bits may be the same for all values of k (k = 1, 2, 3, ..., K).

In another embodiment, the k-th number of bits may equal a first number of bits for every k in a first group of k values, and may equal a second number of bits for every k in a second group of k values. Here, the first number of bits may be greater than the second number of bits.

Figure 2 shows the configuration of a computing device that is used to design a neural network for use by a computing device.

The computing device 11 may include a non-volatile recording medium 12, a device interface 13, and a processing unit 14. The non-volatile recording medium 12 may be a memory soldered to the computing device 11 or a removable memory. The processing unit 14 may be a CPU or the like. The device interface 13 may include a data bus, a command bus, a bus controller, and the like.

The non-volatile recording medium 12 may store neural network information, which includes information about the structure of the neural network. The neural network information may include the information needed to process input data supplied to the neural network, such as the number of layers of the neural network, the input/output transfer function of each layer, and the functions of computational modules other than the layers.

In one embodiment of the present invention, an indicator that arithmetically evaluates the characteristics of each layer of the neural network is defined. The specific definition of this indicator is described in detail in Equation 2 below.

The non-volatile recording medium 12 may include a first command code (per-layer indicator calculation command code), which is command code for calculating the indicator for each layer of the neural network.

The non-volatile recording medium 12 may include a second command code (per-layer computational precision determination command code), which uses the calculated indicators to determine the computational precision assigned to each layer of the neural network.

For example, the k-th layer generates a k-th output activation through operations that multiply or add a k-th input activation and a k-th weight. For this computation, the k-th input activation and the k-th weight are expressed in a fixed-point format having a k-th number of bits. The value of the k-th number of bits must be determined in advance: the larger the value, the higher the computational precision and the smaller the quantization error. Determining the computational precision can therefore be understood as determining a specific value for the k-th number of bits.
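The relationship between the number of bits and the quantization error can be illustrated with a minimal sketch. The function name `quantize_fixed_point`, the signed format, the split between total and fractional bits, and the sample values are all assumptions made for illustration; the specification does not prescribe a particular fixed-point layout.

```python
def quantize_fixed_point(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x onto a signed fixed-point grid and return the represented value.

    total_bits and frac_bits describe a hypothetical format: the integer code
    is saturated to the signed range of total_bits, and frac_bits of that code
    lie to the right of the binary point.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    code = max(q_min, min(q_max, round(x * scale)))
    return code / scale


values = [0.3141, -0.2718, 0.0577, 0.1618]

# Worst-case error with 8 bits (6 fractional) versus 4 bits (2 fractional).
err8 = max(abs(v - quantize_fixed_point(v, 8, 6)) for v in values)
err4 = max(abs(v - quantize_fixed_point(v, 4, 2)) for v in values)
print(err8 < err4)
```

With these sample values the 8-bit format yields the smaller worst-case error, which matches the statement that a larger number of bits reduces the quantization error.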

The neural network information may include information indicating the computational precision of each layer, and this information may be generated or updated by executing the first command code and the second command code.

The neural network information, the first command code, and the second command code stored in the non-volatile recording medium 12 may be loaded into the processing unit 14 through the device interface 13.

The processing unit 14 may execute a per-layer index calculation process by executing the first command code, and may execute a per-layer computational precision determination process by executing the second command code.

FIG. 3 is a flowchart showing a process for determining the computational precision of each layer of a neural network, provided according to an embodiment of the present invention.

In step S110, the computing device 11 may calculate an index of quantization error (SQNR[k]_Avg) for each layer of a selected first set of layers of a given neural network.

Here, the first set of layers may be all of the layers included in the neural network, or only some of them. In one embodiment, the criterion for selecting the first set of layers is not particularly limited.

The index of quantization error (SQNR[k]_Avg) is defined by Equation 2 below.

If the first set contains N layers, N values of the index of quantization error (SQNR[k]_Avg) may be generated.

In step S120, the computing device 11 may assign a first quantization precision (high precision, large number of bits) to the N1 layers having the smallest index values among the first set of layers, and may assign a second quantization precision (low precision, small number of bits) to the N2 layers having the largest index values.

Here, when the number of layers in the first set is N, either N1+N2=N or N1+N2<N may hold.

For example, the data processed in the N1 layers having the smallest index values may be computed in an 8-bit fixed-point format, and the data processed in the N2 layers having the largest index values may be computed in a 4-bit fixed-point format.
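The assignment in step S120 can be sketched as a sort followed by two slices. The function name `assign_precisions`, the dictionary representation, the layer names, and the index values are hypothetical; the 8-bit and 4-bit defaults follow the example given above.

```python
from typing import Dict


def assign_precisions(index_by_layer: Dict[str, float], n1: int, n2: int,
                      high_bits: int = 8, low_bits: int = 4) -> Dict[str, int]:
    """Give high_bits to the n1 smallest-index layers and low_bits to the n2 largest."""
    n = len(index_by_layer)
    if n1 < 0 or n2 < 0 or n1 + n2 > n:
        raise ValueError("N1 + N2 must not exceed the number of layers N")
    order = sorted(index_by_layer, key=index_by_layer.get)  # ascending index
    assignment: Dict[str, int] = {}
    for layer in order[:n1]:
        assignment[layer] = high_bits
    for layer in order[n - n2:]:
        assignment[layer] = low_bits
    return assignment


# Hypothetical index values for four layers (N = 4, N1 = 1, N2 = 2).
indices = {"conv1": 12.5, "conv2": 31.0, "fc1": 20.1, "fc2": 45.3}
result = assign_precisions(indices, n1=1, n2=2)
print(result)
```

Because N1+N2<N in this usage, the layer `fc1` receives neither precision here; how such remaining layers are handled is outside the scope of this sketch.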

In step S130, the computing device 11 may include, in the neural network information, information about the layers to which the first quantization precision is assigned and information about the layers to which the second quantization precision is assigned. This neural network information may be stored in the non-volatile recording medium 12.

In this specification, a quantization precision (k-th quantization precision) may also be referred to as a computational precision (k-th computational precision).

In one embodiment, SQNR_Avg, a metric for a specific layer, may be defined as in Equation 2 as the index of quantization error described above.

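[Equation 2]

$$\mathrm{SQNR}_{Avg} = \alpha \cdot \log_{10}\left(\frac{\sum_{i=1}^{T} v_i^{2}}{\sum_{i=1}^{T} (v_i - qv_i)^{2}}\right) - \beta \cdot \log_{10} T$$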

In Equation 2, T is the size of the output activation of the specific layer, that is, the number of data elements constituting the output activation,

v_i is the value of the i-th data element (index i) of the output activation of the specific layer when all data operated on in the specific layer is quantized with a first precision, or quantized in a fixed-point format having a first number of bits,

qv_i is the value of the i-th data element (index i) of the output activation of the specific layer when all data operated on in the specific layer is quantized with a second precision, or quantized in a fixed-point format having a second number of bits,

v_i - qv_i may be referred to as the quantization error difference value or the quantization error difference, and

α and β are constants.

Here, the first precision is higher than the second precision, and the first number of bits may be larger than the second number of bits.

In other words, Equation 2 compares the results of executing the operation of the specific layer twice. In the first execution, a first input activation and a first weight quantized with the first precision are provided to the specific layer to calculate the first output activation (v_i). In the second execution, a second input activation and a second weight quantized with the second precision are provided to the specific layer to calculate the second output activation (qv_i). The first input activation and the second input activation are the same input activation, merely quantized with different precisions. Likewise, the first weight and the second weight are the same weight, merely quantized with different precisions.

For example, {the first number of bits, the second number of bits} may be {16, 8}, {16, 4}, or {8, 4}.

Equations 3 and 4 below follow from Equation 2.

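[Equation 3]

$$\mathrm{SQNR} = \alpha \cdot \log_{10}\left(\frac{\sum_{i=1}^{T} v_i^{2}}{\sum_{i=1}^{T} (v_i - qv_i)^{2}}\right)$$

[Equation 4]

$$\mathrm{SQNR}_{Avg} = \mathrm{SQNR} - \beta \cdot \log_{10} T$$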

When the specific layer is the k-th layer (10[k]), SQNR_Avg in Equation 2 may be written as SQNR[k]_Avg.

A large SQNR in Equation 3, that is, a small v_i - qv_i resulting from the change in the number of bits, means that the quantization error of the output activation does not increase much even if the number of bits representing the data used in the layer's computation is reduced. Therefore, for a layer with a large SQNR, the data used in that layer's computation may be quantized with a smaller number of bits (lower precision).

Conversely, for a layer with a small SQNR, the data used in that layer's computation should be quantized with a larger number of bits (higher precision).

That is, it is preferable that data used for computation in layers with a large SQNR be quantized with a small number of bits, and that data used for computation in layers with a small SQNR be quantized with a large number of bits.

In a preferred embodiment of the present invention, a corrected value SQNR_Avg, obtained by reducing the calculated SQNR according to a predetermined rule, may be used (Equation 2). The correction term for this reduction is -β·log₁₀T, shown in Equation 4. Accordingly, the SQNR of a layer is reduced more as the size (T) of that layer's output activation is larger, and reduced less as the size (T) is smaller.

For a layer whose output activation is relatively large, even a large SQNR is reduced substantially by the correction term. That is, a layer with a relatively large output activation becomes more likely to be quantized with a large number of bits (high precision).

Conversely, for a layer whose output activation is relatively small, even a small SQNR is reduced only slightly by the correction term. That is, a layer with a relatively small output activation becomes more likely to be quantized with a small number of bits (low precision).

In short, the correction term makes quantization with a large number of bits (high precision) more likely when the output activation is relatively large than when it is relatively small.
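As a worked illustration of the correction term, consider two layers that have the same uncorrected SQNR but output activations of different sizes. The values SQNR = 30 and β = 10 are hypothetical and chosen only to keep the arithmetic simple; the specification states only that β is a constant.

```latex
\begin{aligned}
T = 10^{2}:\quad & \mathrm{SQNR}_{Avg} = 30 - 10 \cdot \log_{10} 10^{2} = 30 - 20 = 10 \\
T = 10^{4}:\quad & \mathrm{SQNR}_{Avg} = 30 - 10 \cdot \log_{10} 10^{4} = 30 - 40 = -10
\end{aligned}
```

Under these assumed values, the layer with the larger output activation ends up with the smaller index, so it is more likely to fall among the N1 layers that receive the higher precision in step S120.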

In one embodiment, step S140 may be executed after step S130. In step S140, the weight values corresponding to each layer of the neural network may be stored in the non-volatile memory. The stored weight values may be included in the neural network information. If the first quantization precision has been assigned to the k-th layer, the weight values corresponding to the k-th layer may be quantized and stored in a fixed-point format having the first number of bits. If, instead, the second quantization precision has been assigned to the k-th layer, the weight values corresponding to the k-th layer may be quantized and stored in a fixed-point format having the second number of bits.

In another embodiment, the weight values corresponding to each layer of the neural network may not be included in the neural network information stored in the non-volatile recording medium 12. That is, the quantization precision assigned to each layer of the neural network is included in the neural network information, but the weight values corresponding to each layer may be prepared by a separate weight determination process. The weight determination process may use the per-layer quantization precision included in the neural network information to determine the number of bits that represents the weight values corresponding to each layer.

The index for a specific layer, provided according to an embodiment of the present invention and presented in Equation 2, can be regarded as a value that is generated from the difference (v_i - qv_i), or distance, between a first output activation and a second output activation and is then corrected using the size (T) of the output activation of the specific layer. Here, the first output activation is what the specific layer outputs when the first data operated on in the specific layer is quantized in a fixed-point format having the first number of bits, and the second output activation is what the specific layer outputs when the first data is quantized in a fixed-point format having the second number of bits.

The index presented in Equation 2 may take a larger value as the difference (v_i - qv_i), or distance, between the first output activation and the second output activation becomes smaller.

In addition, the index presented in Equation 2 may take a smaller value as the size of the output activation of the specific layer becomes larger.

FIG. 4 is a flowchart showing a process for calculating the per-layer index of a neural network according to an embodiment of the present invention.

The flowchart shown in FIG. 4 details step S110 of FIG. 3.

In step S111, the computing device 11 may obtain, from the neural network information, the size of the output activation of the k-th layer included in the neural network.

For example, the size of the output activation may be T as presented in Equation 2.

Next, in step S112, the computing device 11 may:

① prepare the first data, which the k-th layer uses to generate the k-th output activation, in a fixed-point format having a first number of bits,

② generate a first output activation from the k-th layer using the first data in the fixed-point format having the first number of bits, and

③ calculate a first sum, which is the sum of the squares of the elements of the first output activation.

For example, the first output activation may be v_i (i=1, 2, 3, ..., T) presented in Equation 2, and the first sum may be Σ_{i=1}^{T} v_i² presented in Equation 2.

Here, the first data may include the k-th input activation input to the k-th layer and the k-th weight used in the k-th layer.

In step S113, the computing device 11 may:

① prepare the first data, which the k-th layer uses to generate the k-th output activation, in a fixed-point format having a second number of bits different from the first number of bits,

② generate a second output activation from the k-th layer using the first data in the fixed-point format having the second number of bits, and

③ calculate a second sum, which is the sum of the squares of the differences between each element of the first output activation and the corresponding element of the second output activation.

The k-th layer and the first data in step S113 are the same as the k-th layer and the first data in step S112. However, the number of bits representing the first data in step S113 differs from the number of bits representing the first data in step S112.

For example, the second output activation may be qv_i (i=1, 2, 3, ..., T) presented in Equation 2, the difference between each element of the first output activation and the corresponding element of the second output activation may be v_i - qv_i presented in Equation 2, and the second sum may be Σ_{i=1}^{T} (v_i - qv_i)² presented in Equation 2.

Next, in step S114, the computing device 11 may:

① calculate a first log value, which is the logarithm of the ratio of the first sum to the second sum, and

② calculate a second log value, which is the logarithm of the size (T) of the output activation.

For example, the ratio of the first sum to the second sum may be Σ_{i=1}^{T} v_i² / Σ_{i=1}^{T} (v_i - qv_i)² as shown in Equation 2, the first log value may be log₁₀(Σ_{i=1}^{T} v_i² / Σ_{i=1}^{T} (v_i - qv_i)²) as shown in Equation 2, and the second log value may be log₁₀T as shown in Equation 2.

Next, in step S115, the computing device 11 may determine the index of quantization error (SQNR[k]_Avg) for the k-th layer by subtracting a value proportional to the second log value from a value proportional to the first log value.

For example, the value proportional to the first log value may be the SQNR shown in Equation 3, and the value proportional to the second log value may be β·log₁₀T shown in Equation 4.
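Steps S111 to S115 can be sketched as a single function that takes the two output activations as flat lists. The function name `sqnr_avg`, the default values α = 10 and β = 10, and the handling of identical activations are assumptions for illustration; the specification states only that α and β are constants.

```python
import math
from typing import Sequence


def sqnr_avg(v: Sequence[float], qv: Sequence[float],
             alpha: float = 10.0, beta: float = 10.0) -> float:
    """Index of quantization error for one layer, following steps S111 to S115.

    v  : first output activation (computed with the first number of bits)
    qv : second output activation (computed with the second number of bits)
    alpha, beta : constants; the defaults here are hypothetical.
    """
    if len(v) != len(qv) or len(v) == 0:
        raise ValueError("v and qv must be non-empty and of equal length")
    t = len(v)                                              # S111: size T
    first_sum = sum(x * x for x in v)                       # S112: sum of v_i^2
    second_sum = sum((x - y) ** 2 for x, y in zip(v, qv))   # S113: sum of (v_i - qv_i)^2
    if second_sum == 0.0:
        # Assumption: identical activations are treated as an unbounded index.
        return math.inf
    first_log = math.log10(first_sum / second_sum)          # S114 (1)
    second_log = math.log10(t)                              # S114 (2)
    return alpha * first_log - beta * second_log            # S115


small_layer = sqnr_avg([1.0] * 10, [0.9] * 10)    # T = 10
large_layer = sqnr_avg([1.0] * 100, [0.9] * 100)  # T = 100, same error ratio
print(small_layer > large_layer)
```

Both calls share the same ratio of the first sum to the second sum, so the difference between the two results comes entirely from the term that depends on T.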

FIG. 5 shows the main structure of part of a computing device used in an embodiment of the present invention.

The computing device 1 may be the same as the computing device 11 shown in FIG. 2, or may be a different device. The computing device 11 of FIG. 2 may be a device used by a designer who designs the structure and operating rules of a neural network, and the computing device 1 of FIG. 5 may be a device that performs predetermined operations using the neural network whose design has been completed.

The computing device 1 may include a DRAM (Dynamic Random Access Memory) 130, a hardware accelerator 110, a bus 700 connecting the DRAM 130 and the hardware accelerator 110, and other hardware 99 and a main processor 160 connected to the bus 700. Here, the DRAM 130 may be referred to as the memory 130.

In addition, the computing device 1 may further include a power supply unit, a communication unit, a user interface, a storage unit 170, and peripheral device units, which are not shown. The bus 700 may be shared by the hardware accelerator 110, the other hardware 99, and the main processor 160.

The hardware accelerator 110 may include a DMA unit (Direct Memory Access part) 20, a control unit 40, an internal memory 30, an input buffer 650, a data operation unit 610, and an output buffer 640.

Some or all of the data temporarily stored in the internal memory 30 may be provided from the DRAM 130 through the bus 700. To move the data stored in the DRAM 130 to the internal memory 30, the control unit 40 and the DMA unit 20 may control the internal memory 30 and the DRAM 130.

The data stored in the internal memory 30 may be provided to the data operation unit 610 through the input buffer 650.

The output values generated by the operation of the data operation unit 610 may be stored in the internal memory 30 via the output buffer 640. The output values stored in the internal memory 30 may be written to the DRAM 130 under the control of the control unit 40 and the DMA unit 20.

The control unit 40 may control the overall operation of the DMA unit 20, the internal memory 30, and the data operation unit 610.

In one implementation, the data operation unit 610 may perform a first operation function during a first time period and a second operation function during a second time period. For example, during the first time period it may calculate the output activation of a first layer of the neural network from the input activation of the first layer, and during the second time period it may calculate the output activation of a second layer of the neural network from the input activation of the second layer.

FIG. 5 shows one data operation unit 610 in the hardware accelerator 110. However, in a modified embodiment that is not shown, a plurality of the data operation units 610 shown in FIG. 5 may be provided in the hardware accelerator 110, each performing in parallel the operations requested by the control unit 40.

In one implementation, the data operation unit 610 may output its output data sequentially over time in a given order, rather than all at once.

The memory 130 of FIG. 5 may store the neural network information that was completed in step S130 of FIG. 3 and stored in the non-volatile recording medium 12 of FIG. 2.

The neural network information may include information about the layers to which the first quantization precision is assigned and information about the layers to which the second quantization precision is assigned. Accordingly, information indicating whether the quantization precision assigned to the k-th layer of the neural network is the first quantization precision or the second quantization precision may be available.

In one time period, the hardware accelerator 110 may calculate, based on the neural network information stored in the memory 130, the k-th output activation that the k-th layer is to output. To this end, the memory 130 may store the k-th input activation input to the k-th layer and the k-th weight. In that time period, the hardware accelerator 110 may read the k-th input activation and the k-th weight stored in the memory 130 through the bus 700, temporarily store them in the internal memory 30, and then calculate the k-th output activation using the data operation unit 610. This series of operations may be controlled by the control unit 40, and at least some of the command codes that operate the control unit 40 may be provided by the main processor 160.

In one embodiment, the k-th weight and/or the k-th input activation stored in the internal memory 30 or the memory 130 for calculating the k-th output activation of the k-th layer may be in a fixed-point format having the first number of bits. If the quantization precision assigned to the k-th layer is found to correspond to the second number of bits, the control unit 40 may convert the k-th weight and/or the k-th input activation from the fixed-point format having the first number of bits into a fixed-point format having the second number of bits. The data operation unit 610 may then calculate the k-th output activation using the converted k-th weight and/or k-th input activation in the fixed-point format having the second number of bits.
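One way to picture the conversion from the first number of bits to the second number of bits is a rounding right shift followed by saturation, applied to the integer code of each value. This is a sketch under assumed formats (signed codes, with the number of fractional bits known for both formats); the function name `requantize` is hypothetical, and the specification does not state how the control unit 40 performs the conversion.

```python
def requantize(code: int, src_frac_bits: int,
               dst_total_bits: int, dst_frac_bits: int) -> int:
    """Convert a signed fixed-point code to a format with fewer fractional bits.

    The code is shifted right with round-half-up, then saturated to the signed
    range of dst_total_bits. Both formats are assumptions for illustration.
    """
    shift = src_frac_bits - dst_frac_bits
    if shift < 0:
        raise ValueError("this sketch only narrows the fractional part")
    if shift > 0:
        code = (code + (1 << (shift - 1))) >> shift
    q_min = -(1 << (dst_total_bits - 1))
    q_max = (1 << (dst_total_bits - 1)) - 1
    return max(q_min, min(q_max, code))


# 8-bit codes with 6 fractional bits converted to 4-bit codes with 2 fractional bits.
print(requantize(20, 6, 4, 2))    # 0.3125 -> code 1 (0.25)
print(requantize(-17, 6, 4, 2))   # -0.265625 -> code -1 (-0.25)
print(requantize(127, 6, 4, 2))   # 1.984375 saturates to code 7 (1.75)
```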

FIG. 6 is a flowchart showing a method of calculating an output activation, provided according to an embodiment of the present invention.

In one embodiment, each step shown in FIG. 6 may be executed in the hardware accelerator 110 of FIG. 5.

In step S210, the hardware accelerator 110 may read some or all of the neural network information from the memory 130. The neural network information may include the structure of the neural network.

In step S220, the hardware accelerator 110 may identify, from the neural network information, the quantization precision assigned to the k-th layer.

In step S230, the hardware accelerator 110 may read from memory the k-th input activation to be input to the k-th layer of the neural network and the k-th weight corresponding to the k-th layer. Step S230 may already have been performed as part of step S210.

In step S240, the hardware accelerator 110 may check whether the k-th input activation and the k-th weight are expressed with the number of bits corresponding to the identified quantization precision. If the k-th weight is expressed with the number of bits corresponding to the identified quantization precision, the process may proceed to step S260; otherwise, it may proceed to step S250.

In step S250, the hardware accelerator 110 may convert the k-th input activation and the k-th weight into a fixed-point format having the number of bits corresponding to the identified quantization precision.

In step S260, the hardware accelerator 110 may calculate the k-th output activation of the k-th layer using the k-th input activation and the k-th weight.

The calculated k-th output activation may be used while temporarily stored in the internal memory 30, or may be stored in the memory 130.

도 7은 본 발명의 일 실시예에 따라 제공되는 신경망 연산방법을 실행하는 컴퓨팅 장치의 구성을 나타낸 것이다.Figure 7 shows the configuration of a computing device that executes a neural network calculation method provided according to an embodiment of the present invention.

컴퓨팅 장치(21)는 비휘발성 기록매체(22), 장치 인터페이스(23), 및 처리부(24)를 포함할 수 있다. 상기 비휘발성 기록매체(22)는 상기 컴퓨팅 장치(21)에 납땜되어 있거나 또는 탈착 가능한 메모리일 수 있다. 상기 처리부(24)는 CPU 등일 수 있다. 상기 장치 인터페이스는 데이터 버스, 명령 버스, 및 버스 컨트롤러 등을 포함할 수 있다.Computing device 21 may include a non-volatile recording medium 22, a device interface 23, and a processing unit 24. The non-volatile recording medium 22 may be soldered to the computing device 21 or may be a removable memory. The processing unit 24 may be a CPU or the like. The device interface may include a data bus, command bus, and bus controller.

The non-volatile recording medium 22 may store information about the neural network, the computational precision assigned to the k-th layer of the neural network, the k-th input activation to be input to the k-th layer, and the k-th weight corresponding to the k-th layer. This information may have been generated using the computing device 11 shown in FIG. 2.

The non-volatile recording medium 22 may store a program in which a third command code (neural network operation command code) is recorded. The third command code causes the processing unit 24 to execute a neural network computation process (i.e., the neural network computation method) comprising: obtaining the computational precision assigned to the k-th layer of the neural network, the k-th input activation to be input to the k-th layer, and the k-th weight corresponding to the k-th layer; determining whether the k-th input activation and the k-th weight are expressed with the number of bits corresponding to the obtained computational precision assigned to the k-th layer; if the k-th weight is not expressed with the number of bits corresponding to the obtained computational precision assigned to the k-th layer, converting the k-th input activation and the k-th weight into a fixed-point format having the number of bits corresponding to the obtained computational precision; and calculating the k-th output activation of the k-th layer using the converted k-th input activation and k-th weight.

The processing unit 24 may be configured to execute the neural network computation method by reading and executing the third command code through the device interface 23.

The computing device 11 of FIG. 2 and the computing device 21 of FIG. 7 may be the same device or may be separate devices.

When the computing device 11 of FIG. 2 and the computing device 21 of FIG. 7 are separate devices, they may cooperate with each other to form a single computing system. In this case, the computing system can execute all of the methods of the present invention described above. The computing device 11 of FIG. 2 and the computing device 21 of FIG. 7 may exchange information over a local network, or over a remote network such as a metropolitan area network.

Using the embodiments of the present invention described above, those skilled in the art to which the present invention pertains will be able to readily make various changes and modifications without departing from the essential characteristics of the present invention. The content of each claim may be combined with other claims that it does not cite, to the extent that can be understood from this specification.

Claims

A method of determining the computational precision of each layer of a neural network operating on a computing device, the method comprising:
calculating, by the computing device, an indicator for each of a first set of layers included in the neural network;
assigning, by the computing device, a first computational precision to the top N1 layers having the smallest values of the indicator among the first set of layers, and assigning a second computational precision, different from the first computational precision, to the top N2 layers having the largest values of the indicator; and
storing, by the computing device, in a memory, information about the layers to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned among the first set of layers,
wherein the maximum value of the indicators of the N1 layers is smaller than the minimum value of the indicators of the N2 layers, and
wherein calculating the indicator for a specific layer among the first set of layers comprises:
calculating, by the computing device, a first set of values of a first output activation output by the specific layer when first data operated on in the specific layer is quantized in fixed-point form with a first number of bits;
calculating, by the computing device, a second set of values of a second output activation output by the specific layer when the first data operated on in the specific layer is quantized in fixed-point form with a second number of bits; and
calculating, by the computing device, the indicator for the specific layer using a difference between the first set of values and the second set of values.
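The assignment step recited above can be illustrated with a short sketch. It is only one possible reading under stated assumptions: the function name `assign_precisions`, the layer names, and the default bit widths of 16 and 8 are hypothetical and do not appear in the claims. The check on tied values reflects the condition that the maximum indicator among the N1 layers be smaller than the minimum indicator among the N2 layers.

```python
from typing import Dict


def assign_precisions(indicators: Dict[str, float], n1: int, n2: int,
                      first_bits: int = 16, second_bits: int = 8) -> Dict[str, int]:
    """Map layer name -> assigned bit width (bit widths are assumed values)."""
    if n1 < 0 or n2 < 0 or n1 + n2 > len(indicators):
        raise ValueError("N1 and N2 must be non-negative and fit within the layer set")
    ranked = sorted(indicators, key=indicators.get)  # ascending indicator value
    low_group = ranked[:n1]                  # N1 layers with the smallest indicators
    high_group = ranked[len(ranked) - n2:]   # N2 layers with the largest indicators
    # Condition from the claim: max indicator of N1 group < min indicator of N2 group.
    if low_group and high_group:
        if indicators[low_group[-1]] >= indicators[high_group[0]]:
            raise ValueError("tied indicators straddle the two groups")
    assignment = {name: first_bits for name in low_group}
    assignment.update({name: second_bits for name in high_group})
    return assignment
```

With indicators `{"conv1": 12.0, "conv2": 30.5, "fc": 21.0}`, N1 = 1, and N2 = 2, this sketch assigns the first precision to `conv1` and the second precision to `fc` and `conv2`. The returned mapping corresponds to the information that the method stores in memory.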

The method of claim 1, wherein, in calculating the indicator for the specific layer using the difference between the first set of values and the second set of values, the indicator for the specific layer is calculated using a distance between the first set of values and the second set of values.

The method of claim 2, wherein the smaller the distance between the first set of values and the second set of values, the larger the value of the indicator calculated for the specific layer.

The method of claim 2, wherein the indicator for the specific layer is calculated further using a size relating to the number of data elements constituting the output activation of the specific layer.

The method of claim 1, wherein the smaller the size relating to the number of data elements constituting the output activation of the specific layer, the larger the value of the indicator calculated for the specific layer.

The method of claim 1,
wherein calculating the indicator comprises:
obtaining, by the computing device, a size of the output activation of a k-th layer included in the neural network;
a first step of preparing, by the computing device, first data used in the k-th layer in a fixed-point format with a first number of bits, generating a first output activation from the k-th layer using the first data prepared with the first number of bits, and calculating a first sum, which is the sum of the squares of the elements of the first output activation;
preparing, by the computing device, the first data in a fixed-point format with a second number of bits, generating a second output activation from the k-th layer using the first data prepared with the second number of bits, and calculating a second sum, which is the sum of the squares of the differences between each element of the first output activation and the corresponding element of the second output activation;
calculating, by the computing device, a first log value, which is the logarithm of the ratio of the first sum to the second sum, and a second log value, which is the logarithm of the size of the output activation of the k-th layer; and
determining, by the computing device, an indicator of quantization error for the k-th layer by subtracting a value proportional to the second log value from a value proportional to the first log value.
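The indicator calculation recited above can be sketched as follows. This is a hedged illustration and not the claimed formula itself: the function name `quantization_indicator`, the base-10 logarithm, the default proportionality constants `alpha = beta = 10`, and the use of the element count as the "size" of the output activation are all assumptions made for the example.

```python
import math
from typing import Sequence


def quantization_indicator(v: Sequence[float], qv: Sequence[float],
                           alpha: float = 10.0, beta: float = 10.0) -> float:
    """v: first output activation; qv: second output activation (same layer)."""
    if len(v) != len(qv) or len(v) == 0:
        raise ValueError("activations must be non-empty and of equal length")
    size = len(v)  # assumed meaning of the "size" of the output activation
    # First sum: sum of squares of the elements of the first output activation.
    first_sum = sum(x * x for x in v)
    # Second sum: sum of squared element-wise differences between the activations.
    second_sum = sum((x - y) ** 2 for x, y in zip(v, qv))
    if first_sum == 0.0 or second_sum == 0.0:
        raise ValueError("logarithm undefined for a zero sum")
    first_log = math.log10(first_sum / second_sum)   # assumed base 10
    second_log = math.log10(size)
    # Value proportional to the first log minus value proportional to the second log.
    return alpha * first_log - beta * second_log
```

For `v = [3.0, 4.0]` and `qv = [3.0, 3.5]`, the first sum is 25 and the second sum is 0.25, so the first log value is 2 and the sketch returns `20 - 10 * log10(2)`. Under these assumptions a smaller difference between the two activations, or a smaller output activation, yields a larger indicator.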

The method of claim 1, wherein the indicator (SQNR_Avg) for a specific layer of the neural network satisfies Formula 1,
[Formula 1]

where T is a size relating to the number of data elements constituting the output activation of the specific layer;
v_i is the value of the i-th data element, having index i, among the output activations of the specific layer when the first data operated on in the specific layer is quantized in fixed-point form with the first number of bits;
qv_i is the value of the i-th data element, having index i, among the output activations of the specific layer when the first data operated on in the specific layer is quantized in fixed-point form with a second number of bits smaller than the first number of bits;
α and β are constants; and
,
How to determine calculation precision.

The method of claim 6 or 7, wherein the first data includes an input activation input to the specific layer and a weight used in the specific layer.

The method of claim 1, wherein a first number, which is the number of bits representing in fixed-point form the data processed in the top N1 layers having the smallest values of the indicator, is greater than a second number, which is the number of bits representing in fixed-point form the data processed in the top N2 layers having the largest values of the indicator.

The method of claim 1, wherein the second computational precision is lower than the first computational precision.

delete

A computing device capable of accessing a non-volatile recording medium and including a processing unit,
wherein the non-volatile recording medium stores information about a neural network, a first command code for calculating an indicator for each layer, and a second command code for determining the computational precision of each layer,
wherein the processing unit is configured to execute:
calculating an indicator for each of a first set of layers included in the neural network by reading and executing the first command code;
by reading and executing the second command code, assigning a first computational precision to the top N1 layers having the smallest values of the indicator among the first set of layers, and assigning a second computational precision, different from the first computational precision, to the top N2 layers having the largest values of the indicator; and
storing, in the non-volatile recording medium, information about the layers to which the first computational precision is assigned and information about the layers to which the second computational precision is assigned among the first set of layers,
wherein the maximum value of the indicators of the N1 layers is smaller than the minimum value of the indicators of the N2 layers, and
wherein calculating the indicator for a specific layer among the first set of layers comprises:
calculating a first set of values of a first output activation output by the specific layer when first data operated on in the specific layer is quantized in fixed-point form with a first number of bits;
calculating a second set of values of a second output activation output by the specific layer when the first data operated on in the specific layer is quantized in fixed-point form with a second number of bits; and
calculating the indicator for the specific layer using a difference between the first set of values and the second set of values.

The computing device of claim 17, wherein the second computational precision is lower than the first computational precision.