KR20240093407A

KR20240093407A - Method for bit quantization of artificial neural network and convolution processor

Info

Publication number: KR20240093407A
Application number: KR1020240074371A
Authority: KR
Inventors: 김녹원
Original assignee: 주식회사 딥엑스
Priority date: 2019-02-25
Filing date: 2024-06-07
Publication date: 2024-06-24
Also published as: KR20200106475A; KR20200104201A; KR102152374B1; KR102261715B1; KR20220142986A; CN113396427A; KR20210023912A

Abstract

The present disclosure provides a method for bit quantization of an artificial neural network. The method may include: (a) selecting one parameter or one parameter group to be quantized in the artificial neural network; (b) a bit quantization step of reducing, in units of bits, the size of the data representation of the selected parameter or parameter group; (c) determining whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value; and (d) repeating steps (a) to (c) when the accuracy of the artificial neural network is greater than or equal to the target value.

Description

Bit quantization method and convolution processing device of artificial neural network {METHOD FOR BIT QUANTIZATION OF ARTIFICIAL NEURAL NETWORK AND CONVOLUTION PROCESSOR}

The present disclosure relates to hardware for an artificial neural network and to a convolution processing device, and more specifically, to hardware for an artificial neural network and a convolution processing device that can improve performance and reduce memory usage while substantially maintaining the accuracy of the artificial neural network.

An artificial neural network is a computing structure modeled on the biological brain. In an artificial neural network, nodes corresponding to the neurons of the brain are interconnected, and the strength of the synaptic connection between neurons is expressed as a weight. Through learning, the artificial neurons (nodes) change the strength of the synaptic connections between nodes, thereby forming a model capable of solving a given problem.

In a narrow sense, an artificial neural network may refer to a multi-layered perceptron, which is a type of feedforward neural network. However, it is not limited thereto and may include various types of neural networks, such as a radial basis function network, a self-organizing network, and a recurrent neural network.

Recently, multi-layer deep neural networks have been widely used for image recognition, and a representative example is the convolutional neural network (CNN). In a general multi-layer feedforward neural network, the input data is limited to a one-dimensional form. When image data composed of two or three dimensions is flattened into one-dimensional data, spatial information is lost, which can make it difficult to train the neural network while preserving the spatial information of the image. A convolutional neural network, however, can learn visual information while preserving two-dimensional or three-dimensional spatial information.

Specifically, a convolutional neural network effectively recognizes features in relation to adjacent images while preserving the spatial information of the image, and it includes a max pooling process that gathers and reinforces the extracted image features, which makes it effective for recognizing patterns in visual data. However, a multi-layer deep neural network such as a convolutional neural network uses a deep layer structure to provide high recognition performance, so its structure is very complex and it requires a large amount of computation and a large amount of memory. In a multi-layer deep neural network, most internal operations are performed using multiplication and addition (or accumulation). Because the number of connections between nodes in the artificial neural network is large and the number of parameters that require multiplication (for example, weight data, feature map data, and activation map data) is large, a large amount of computation is required in the training and recognition processes.
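To make the multiply-and-accumulate cost described above concrete, the following minimal sketch counts the multiply-accumulate (MAC) operations of a single convolution layer. The function names and the example layer dimensions are illustrative assumptions and do not come from the disclosure; the count uses the standard formula for a dense convolution.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv_macs(in_h: int, in_w: int, in_c: int,
              k_h: int, k_w: int, out_c: int,
              stride: int = 1, padding: int = 0) -> int:
    """Number of multiply-accumulate operations for one dense convolution layer.

    Each output element needs k_h * k_w * in_c multiplications, and there are
    out_h * out_w * out_c output elements.
    """
    out_h = conv_output_size(in_h, k_h, stride, padding)
    out_w = conv_output_size(in_w, k_w, stride, padding)
    return out_h * out_w * out_c * k_h * k_w * in_c


# Hypothetical layer: 32x32x3 input, six 5x5x3 kernels, stride 1, no padding.
macs = conv_macs(32, 32, 3, 5, 5, 6)
print(macs)
```

Even this small hypothetical layer needs several hundred thousand multiplications, which is why the bit width of each operand matters for hardware cost.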

As described above, a multi-layer deep neural network such as a convolutional neural network requires a large amount of computation and memory for training and recognition. One way to reduce the computation and memory of a multi-layer deep neural network is bit quantization, which reduces, in units of bits, the data representation size of the parameters used in the operations of the artificial neural network. Conventional bit quantization uses uniform bit quantization, which quantizes all parameters of the artificial neural network to the same number of bits. However, conventional uniform bit quantization does not accurately reflect the effect that changing the number of bits of each individual parameter has on overall performance.
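As an illustration of reducing a data representation to a given number of bits, the sketch below applies one common scheme, symmetric linear quantization, to a list of values. The disclosure does not specify a numeric quantization scheme, so the choice of scheme, the function name `quantize_uniform`, and the sample values are all assumptions made for this example.

```python
def quantize_uniform(values, bits):
    """Quantize floats to signed `bits`-bit integer levels and map them back.

    Assumed scheme: symmetric linear quantization with a single scale factor
    derived from the largest absolute value in `values`.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign plus magnitude)")
    qmax = 2 ** (bits - 1) - 1            # largest representable integer level
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    scale = max_abs / qmax                # real value represented by one level
    quantized = []
    for v in values:
        level = round(v / scale)          # nearest integer level
        level = max(-qmax, min(qmax, level))
        quantized.append(level * scale)   # value the hardware would effectively use
    return quantized


weights = [0.5, -1.0, 0.25]
print(quantize_uniform(weights, 8))   # small rounding error
print(quantize_uniform(weights, 2))   # coarse: only the levels -1, 0, +1 remain
```

Applying the same `bits` value to every parameter corresponds to the uniform approach described above; the embodiments that follow instead vary the bit width per parameter or per group.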

The embodiments disclosed in this specification are intended to provide a method and system for quantizing each item of parameter data constituting an artificial neural network, or parameter data grouped according to a specific criterion, to a specific number of bits, so that the accuracy of the artificial intelligence is maintained while the overall performance of the artificial neural network is improved.

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method may include: (a) selecting at least one parameter from among a plurality of parameters used in the artificial neural network; (b) a bit quantization step of reducing, in units of bits, the size of the data required for operations on the selected parameter; (c) determining whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value; and (d) when the accuracy of the artificial neural network is greater than or equal to the target value, repeating steps (b) to (c) for the parameter to further reduce the number of bits in the data representation of the parameter. The method further includes: (e) when the accuracy of the artificial neural network is less than the target value, restoring the number of bits of the parameter to the number of bits at which the accuracy of the artificial neural network was greater than or equal to the target value, and then repeating steps (a) to (d).
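Steps (a) to (e) above can be sketched as a simple search loop. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the one-bit reduction step, the `min_bits` floor, the dictionary-based bit-width table, and the toy accuracy function `toy_accuracy` (which stands in for evaluating a real network) are all hypothetical.

```python
def quantize_parameters(initial_bits, evaluate, target, min_bits=1):
    """Reduce each parameter's bit width while accuracy stays at or above target.

    initial_bits: dict mapping a parameter (or parameter group) name to its bit width.
    evaluate:     callable taking the bit-width dict and returning an accuracy.
    """
    bits = dict(initial_bits)
    for name in list(bits):                  # (a) select a parameter
        while bits[name] > min_bits:
            previous = bits[name]
            bits[name] = previous - 1        # (b) reduce the representation by one bit
            if evaluate(bits) < target:      # (c) compare accuracy with the target
                bits[name] = previous        # (e) restore the last passing bit width
                break                        #     and move on to the next parameter
            # (d) accuracy still meets the target: keep reducing this parameter
    return bits


# Toy stand-in for a real accuracy measurement (assumption for illustration only):
# accuracy collapses as soon as any parameter drops below a hidden requirement.
REQUIRED_BITS = {"conv1": 6, "fc": 4}


def toy_accuracy(bits):
    ok = all(bits[name] >= REQUIRED_BITS[name] for name in bits)
    return 0.95 if ok else 0.50


result = quantize_parameters({"conv1": 8, "fc": 8}, toy_accuracy, target=0.90)
print(result)
```

In practice `evaluate` would run the quantized network on a validation set, which dominates the cost of the search; the loop structure itself stays the same.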

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method includes: (a) selecting, by a parameter selection module, at least one layer from among a plurality of layers constituting the artificial neural network; (b) a bit quantization step of reducing, by a bit quantization module, the size of the data representation of the parameters of the selected layer in units of bits; (c) determining, by an accuracy determination module, whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value; and (d) repeating steps (a) to (c) when the accuracy of the artificial neural network is greater than or equal to the target value.

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method includes: (a) selecting, by a parameter selection module, one or more items of data or one or more groups of data from among the weight, feature map, and activation map data of the artificial neural network; (b) a bit quantization step of reducing, by a bit quantization module, the data representation size of the selected data in units of bits; (c) measuring whether the artificial intelligence accuracy of the artificial neural network is greater than or equal to a target value; and (d) repeating steps (a) to (c) until no data remains to be quantized among the data of the artificial neural network.

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method includes: training the artificial neural network according to one or more parameters of the artificial neural network; performing bit quantization on the one or more parameters of the artificial neural network according to the bit quantization method of the embodiments described above; and training the artificial neural network according to the one or more parameters on which the bit quantization has been performed.

According to another embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system may include: a parameter selection module that selects at least one parameter from among a plurality of parameters of the artificial neural network; a bit quantization module that reduces the size of the data representation of the selected parameter in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module may control the parameter selection module and the bit quantization module so that quantization is performed until each of the plurality of parameters has the minimum number of bits at which the accuracy of the artificial neural network remains greater than or equal to the target value.

According to an embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation of the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another layer among the plurality of layers. The bit quantization module sets n bits (where n is an integer with n > 0) for all weights of the plurality of layers and sets m bits (where m is an integer with m > 0) for the output data of the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation of the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another layer among the plurality of layers. The bit quantization module allocates n bits (where n is an integer with n > 0) to the weights and output data of the plurality of layers, and sets the number of bits allocated to each of the plurality of layers differently.

According to an embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation of the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another layer among the plurality of layers. The bit quantization module individually allocates different numbers of bits to the weights and to the output data of the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces, in units of bits, the size of the memory for storing the parameters of the selected layer; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another layer among the plurality of layers. The bit quantization module allocates a different number of bits to each weight used in the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation of the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another layer among the plurality of layers. The bit quantization module individually allocates a different number of bits to each specific unit of the output data output from the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system for an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation of the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another layer among the plurality of layers. The bit quantization module allocates a different number of bits to each individual value of the output data output from the plurality of layers.

According to various embodiments of the present disclosure, overall computational performance can be improved by quantizing the number of bits of the data required for operations such as training or inference in an artificial neural network. In addition, it is possible to implement an artificial neural network without degradation of artificial intelligence accuracy while reducing the hardware resources required to implement it, its power consumption, and its memory usage.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description of the claims.

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which like reference numerals indicate like elements, but are not limited thereto.
FIG. 1 is a diagram showing an example of an artificial neural network that obtains output data for input data using a plurality of layers and a plurality of layer weights according to an embodiment of the present disclosure.
FIGS. 2 and 3 are diagrams for explaining specific implementation examples of the artificial neural network shown in FIG. 1 according to an embodiment of the present disclosure.
FIG. 4 is a diagram showing another example of an artificial neural network including a plurality of layers according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating input data and a weight kernel used in a convolution operation in a convolution layer according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating a procedure for generating a first activation map by performing convolution on input data using a first weight kernel according to an embodiment of the present disclosure.
FIG. 7 is a diagram illustrating a procedure for generating a second activation map by executing convolution on input data using a second weight kernel according to an embodiment of the present disclosure.
FIG. 8 is a diagram expressing the operation process of a convolution layer according to an embodiment of the present disclosure as a matrix.
FIG. 9 is a diagram expressing the operation process of a fully connected layer according to an embodiment of the present disclosure as a matrix.
FIG. 10 is a diagram expressing the bit quantization process of a convolution layer according to an embodiment of the present disclosure as a matrix.
FIG. 11 is a flowchart showing a bit quantization method of an artificial neural network according to an embodiment of the present disclosure.
FIG. 12 is a flowchart showing a bit quantization method of an artificial neural network according to another embodiment of the present disclosure.
FIG. 13 is a flowchart showing a bit quantization method of an artificial neural network according to yet another embodiment of the present disclosure.
FIG. 14 is a graph showing an example of the amount of computation for each layer of an artificial neural network according to an embodiment of the present disclosure.
FIG. 15 is a graph showing the number of bits per layer of an artificial neural network in which bit quantization is performed using a forward bit quantization method according to an embodiment of the present disclosure.
FIG. 16 is a graph showing the number of bits per layer of an artificial neural network in which bit quantization is performed using a backward bit quantization method according to an embodiment of the present disclosure.
FIG. 17 is a graph showing the number of bits per layer of an artificial neural network in which bit quantization is performed using a high computational cost layer first bit quantization method according to an embodiment of the present disclosure.
FIG. 18 is a graph showing the number of bits per layer of an artificial neural network in which bit quantization is performed using a low computational cost layer first bit quantization method according to an embodiment of the present disclosure.
FIG. 19 is a diagram illustrating an example of hardware implementation of an artificial neural network according to an embodiment of the present disclosure.
FIG. 20 is a diagram illustrating an example of hardware implementation of an artificial neural network according to another embodiment of the present disclosure.
FIG. 21 is a diagram illustrating an example of hardware implementation of an artificial neural network according to yet another embodiment of the present disclosure.
FIG. 22 is a diagram illustrating the configuration of a system that performs bit quantization for an artificial neural network according to an embodiment of the present disclosure.
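
The four layer-selection strategies named in the captions of FIGS. 15 to 18 can be read as different orderings in which layers are offered to the quantizer. The sketch below is one hedged interpretation: it assumes that "forward" means input-to-output order and "backward" the reverse, and that the cost-based strategies sort by per-layer computation. The function name `selection_order`, the strategy strings, and the sample costs are hypothetical.

```python
def selection_order(layer_costs, strategy):
    """Return layer indices in the order they would be selected for quantization.

    layer_costs: per-layer computation amounts, listed from input to output.
    strategy:    "forward", "backward", "high_cost_first", or "low_cost_first".
    """
    indices = list(range(len(layer_costs)))
    if strategy == "forward":
        return indices
    if strategy == "backward":
        return indices[::-1]
    if strategy == "high_cost_first":
        return sorted(indices, key=lambda i: -layer_costs[i])
    if strategy == "low_cost_first":
        return sorted(indices, key=lambda i: layer_costs[i])
    raise ValueError(f"unknown strategy: {strategy}")


costs = [10, 50, 30]  # hypothetical per-layer computation amounts
for name in ("forward", "backward", "high_cost_first", "low_cost_first"):
    print(name, selection_order(costs, name))
```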

Hereinafter, specific details for implementing the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present disclosure.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the description of the following embodiments, redundant descriptions of identical or corresponding components may be omitted. However, the omission of a description of a component does not mean that the component is excluded from any embodiment.

In the present disclosure, "parameter" may mean any one or more of the weight data, feature map data, and activation map data of an artificial neural network or of each layer constituting an artificial neural network. "Parameter" may also refer to an artificial neural network, or each layer constituting an artificial neural network, that is expressed by such data. In addition, in the present disclosure, "bit quantization" may mean an operation or action that reduces the number of bits of the data representation of a parameter or a group of parameters.

The present disclosure provides various embodiments of a quantization method and system that reduce, in units of bits, the data representation size of the parameters used in the relevant operations, in order to reduce the amount of computation, memory usage, and power consumption of a digital hardware system. In some embodiments, the bit quantization method and system of the present disclosure can reduce the size of the parameters used in the operations of an artificial neural network in units of bits. In general, the hardware used for artificial neural network operations (for example, CPU, GPU, memory, cache, and buffer) uses data structures in units of 32, 16, or 8 bits. The quantization method and system of the present disclosure can reduce the size of the parameters used in artificial neural network operations to a number of bits other than 32, 16, or 8. Moreover, a specific number of bits can be allocated individually and differently to each parameter or each group of parameters of the artificial neural network.
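To show what storing parameters at a bit width other than 32, 16, or 8 can look like, the following sketch packs unsigned integer levels of an arbitrary width into a byte string and unpacks them again. It is an illustrative assumption about a storage layout (little-endian, values packed back to back); the disclosure does not prescribe one, and the function names are hypothetical.

```python
def pack_bits(values, bits):
    """Pack unsigned integers of width `bits` back to back into bytes."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        acc |= v << (i * bits)                 # place value i at bit offset i * bits
    n_bytes = (len(values) * bits + 7) // 8    # round up to whole bytes
    return acc.to_bytes(n_bytes, "little")


def unpack_bits(data, bits, count):
    """Recover `count` unsigned integers of width `bits` from packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


levels = [5, 0, 7, 3, 1]            # five 3-bit values (hypothetical)
packed = pack_bits(levels, 3)
print(len(packed))                   # bytes used for 15 bits of payload
print(unpack_bits(packed, 3, len(levels)))
```

Five 3-bit values occupy two bytes here, compared with five bytes if each value were stored in an 8-bit container.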

In some embodiments, the bit quantization method and system of the present disclosure may, for an artificial neural network model, set n bits (n is an integer with n > 0) for all weights and set m bits (m is an integer with m > 0) for the output data of each layer.

In another embodiment, the bit quantization method and system of the present disclosure may allocate n bits to the weights and output data of each layer of the artificial neural network model, where n may be set to a different number for each layer.

In yet another embodiment, the bit quantization method and system of the present disclosure may allocate different numbers of bits to the weights and to the output data of each layer of the artificial neural network model, and may further allocate, for each layer, a different number of bits to the weights and to the output feature map parameters of that layer.

The bit quantization method and system of the present disclosure may be applied to various types of artificial neural networks. For example, when the bit quantization method and system of the present disclosure are applied to a convolutional neural network (CNN), different numbers of bits may be allocated individually to the weight kernels used within each layer of the artificial neural network.

In yet another embodiment, the bit quantization method and system of the present disclosure may allocate a different number of bits to each weight used within each layer of a multi-layer artificial neural network model, may allocate individual numbers of bits to specific units of the output data of each layer, or may allocate different numbers of bits to individual values of the output data of each layer.

The bit quantization method and system according to the various embodiments of the present disclosure described above may apply any one of those embodiments to an artificial neural network model, but are not limited thereto, and may also apply a combination of one or more of those embodiments to an artificial neural network model.
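
The allocation schemes in the embodiments above can be pictured as a table that maps each layer to one bit width for its weights and one for its output data. The following is a minimal sketch under stated assumptions: the layer names, element counts, and bit widths are hypothetical and do not come from the disclosure.

```python
# Hypothetical layer sizes: (number of weights, number of output values).
LAYERS = {
    "conv1": (864, 4096),
    "conv2": (4608, 1024),
    "fc1": (10240, 10),
}


def uniform_allocation(n_bits, m_bits):
    """All weights get n bits; the output data of every layer gets m bits."""
    return {name: {"weight": n_bits, "output": m_bits} for name in LAYERS}


def memory_bits(allocation):
    """Total storage, in bits, implied by a per-layer bit allocation."""
    total = 0
    for name, (n_weights, n_outputs) in LAYERS.items():
        bits = allocation[name]
        total += n_weights * bits["weight"] + n_outputs * bits["output"]
    return total


# Hypothetical per-layer allocation: weights and outputs get different widths,
# and the widths differ from layer to layer.
per_layer = {
    "conv1": {"weight": 6, "output": 5},
    "conv2": {"weight": 4, "output": 5},
    "fc1": {"weight": 3, "output": 7},
}

baseline = memory_bits(uniform_allocation(32, 32))
reduced = memory_bits(per_layer)
print(baseline, reduced)
```

Finer-grained schemes, such as a width per weight kernel or per individual weight, would only change the keys of the table, not the accounting.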

FIG. 1 is a diagram showing an example of an artificial neural network 100 that obtains output data for input data using a plurality of layers and a plurality of layer weights, according to an embodiment of the present disclosure.

In general, a multi-layer artificial neural network such as the artificial neural network 100 includes, in machine learning technology and cognitive science, a statistical learning algorithm implemented on the basis of the structure of a biological neural network, or a structure that executes that algorithm. That is, in the artificial neural network 100, nodes, which are artificial neurons that form a network through synaptic connections as in a biological neural network, repeatedly adjust the synaptic weights and learn so that the error between the correct output for a given input and the inferred output decreases, thereby producing a machine learning model with problem-solving capability.

In one example, the artificial neural network 100 may be implemented as a multilayer perceptron (MLP) composed of layers containing one or more nodes and the connections between them. However, the artificial neural network 100 according to this embodiment is not limited to the MLP structure, and may be implemented using any one of various artificial neural network structures having a multi-layer structure.

As shown in FIG. 1, when input data is received from the outside, the artificial neural network 100 is configured to output the output data corresponding to the input data by way of a plurality of layers (110_1, 110_2, ..., 110_N), each composed of one or more nodes.

In general, the learning methods of the artificial neural network 100 include supervised learning, which learns to be optimized for solving a problem from the input of a teacher signal (the correct answer); unsupervised learning, which requires no teacher signal; and semi-supervised learning, which uses supervised and unsupervised learning together. The artificial neural network 100 shown in FIG. 1, which generates the output data, may be trained using at least one of supervised learning, unsupervised learning, and semi-supervised learning, according to the user's selection.

FIGS. 2 and 3 are diagrams for describing specific implementation examples of the artificial neural network 100 shown in FIG. 1, according to an embodiment of the present disclosure.

Referring to FIG. 2, the artificial neural network 200 may include input nodes to which input data 210 is input, output nodes that output the output data corresponding to the input data 210, hidden nodes located between the input nodes and the output nodes, and a number of parameters. The input nodes constitute the input layer 220 and receive the input data 210 (for example, an image) from the outside, and the output nodes constitute the output layer 240 and may output the output data to the outside. The hidden nodes located between the input nodes and the output nodes constitute the hidden layer 230 and may connect the output data of the input nodes to the input data of the output nodes. As shown in FIG. 2, each node of the input layer 220 may be fully connected to each output node of the output layer 240, or may be incompletely connected. The input nodes may also serve to receive input data from the outside and pass it to the hidden nodes. The hidden nodes and the output nodes may then perform calculations on the data, multiplying the received input data by parameters (or weights). When the calculation at each node is complete, the resulting products are summed, and the output data may be output using a preset activation function.
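
The per-node calculation just described (multiply inputs by weights, sum, apply an activation function) can be sketched as follows. This is an illustrative sketch only: the function names, the choice of a sigmoid activation, and the sample values are assumptions, not part of the disclosure.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, used here only as an example activation."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, weights, activation=sigmoid):
    """Multiply each input by its weight, sum the products, then activate."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


def layer_output(inputs, weight_rows, activation=sigmoid):
    """Evaluate a whole layer; weight_rows holds one weight list per node."""
    return [node_output(inputs, row, activation) for row in weight_rows]


# Two hypothetical hidden nodes fed by two input values.
hidden = layer_output([1.0, 0.5], [[0.2, -0.4], [0.0, 0.0]])
print(hidden)
```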

The hidden nodes and the output nodes have an activation function. The activation function may be any one of a step function, a sign function, a linear function, a logistic sigmoid function, a hyperbolic tangent function, a ReLU function, and a softmax function. A person skilled in the art may select the activation function appropriately according to the learning method of the artificial neural network.
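
For reference, the activation functions listed above can be written in a few lines each. This is a sketch using common textbook definitions; conventions such as the step function returning 1 at exactly 0 are assumptions, since the disclosure does not define them.

```python
import math


def step(x):
    # Assumed convention: the step is 1 at x == 0.
    return 1.0 if x >= 0 else 0.0


def sign(x):
    # Returns -1, 0, or 1.
    return (x > 0) - (x < 0)


def linear(x):
    return x


def logistic_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def softmax(values):
    # Subtract the maximum first for numerical stability.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)
```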

The artificial neural network 200 performs machine learning through a process of repeatedly updating (or modifying) the weight values to appropriate values. Representative methods by which the artificial neural network 200 performs machine learning include supervised learning and unsupervised learning.

Supervised learning is a learning method in which, with the target output data that a given neural network is expected to compute for the input data clearly defined, the weight values are updated so that the output data obtained by feeding the input data into the neural network becomes similar to the target data. The multi-layer artificial neural network 200 of FIG. 2 may be generated on the basis of supervised learning.
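
As a toy illustration of updating weights so that the output approaches the target, the sketch below fits a single weight by gradient descent on the squared error. The update rule, learning rate, and sample data are assumptions chosen for illustration; the disclosure does not prescribe a particular update rule here.

```python
def train_weight(samples, weight=0.0, learning_rate=0.1, epochs=200):
    """Fit output = weight * x to (x, target) pairs by gradient descent."""
    for _ in range(epochs):
        for x, target in samples:
            output = weight * x
            error = output - target
            # Gradient of 0.5 * error**2 with respect to the weight is error * x.
            weight -= learning_rate * error * x
    return weight


# Hypothetical training pairs generated by target = 2 * x.
learned = train_weight([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(learned)
```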

Referring to FIG. 3, another example of a multi-layer artificial neural network is a convolutional neural network (CNN) 300, which is a type of deep neural network (DNN). A convolutional neural network (CNN) is a neural network composed of one or more convolutional layers, pooling layers, and fully connected layers. A convolutional neural network (CNN) has a structure suited to learning two-dimensional data and may be trained through the backpropagation algorithm. It is one of the representative DNN models widely used in various application fields, such as object classification and object detection in images.

It should be noted that the multi-layer artificial neural network of the present invention is not limited to the artificial neural networks shown in FIGS. 2 and 3, and that a trained model may also be obtained by machine learning other kinds of data in various other artificial neural networks.

FIG. 4 is a diagram showing another example of an artificial neural network including a plurality of layers, according to an embodiment of the present disclosure. The artificial neural network 400 shown in FIG. 4 is a convolutional neural network (CNN) that includes, as disclosed in FIG. 3, a plurality of convolution layers (CONV) 420, a plurality of subsampling layers (SUBS) 430, and a plurality of fully-connected layers (FC) 440.

The CONV 420 of the CNN 400 generates a feature map by applying a convolution weight kernel to the input data 410. Here, the CONV 420 may serve as a kind of template that extracts features from high-dimensional input data (for example, an image or video). Specifically, one convolution may be applied repeatedly to portions of the input data 410 while changing position, so that features are extracted from the entire input data 410. The SUBS 430 serves to reduce the spatial resolution of the feature map generated by the CONV 420. Subsampling reduces the dimension of its input data (for example, a feature map), which can reduce the complexity of the problem of analyzing the input data 410. The SUBS 430 may use a max pooling operator, which takes the maximum of the values in a portion of the feature map, or an average pooling operator, which takes their average. Through the pooling operation, the SUBS 430 not only reduces the dimension of the feature map but also makes the feature map robust to shift and distortion. Finally, the FC 440 may perform the function of classifying the input data on the basis of the feature map.
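
The max pooling and average pooling operators mentioned above can be sketched as follows. The window size, the non-overlapping windows, and the sample feature map are assumptions made for illustration.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def average(values):
    return sum(values) / len(values)


# Hypothetical 4 x 4 feature map.
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 3, 7, 8]]

max_pooled = pool2d(fmap, 2, max)      # 2 x 2 result
avg_pooled = pool2d(fmap, 2, average)  # 2 x 2 result
print(max_pooled, avg_pooled)
```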

The CNN 400 may take various configurations and perform various functions depending on the number of layers, or the types of operators, of the CONV 420, SUBS 430, and FC 440. For example, the CNN 400 may include any one of various CNN configurations such as AlexNet, VGGNet, LeNet, and ResNet, but is not limited thereto.

When image data is input as the input data 410, the CONV 420 of the CNN 400 having the configuration described above may apply weights to the input data 410 and generate a feature map through a convolution operation; the group of weights used here may be referred to as a weight kernel. The weight kernel consists of a three-dimensional n x m x d matrix (where n denotes rows of a specific size, as in the input image data, m denotes columns of a specific size, and d denotes the channels of the input image data, each of these dimensions being an integer of 1 or more), and may traverse the input data 410 at a specified interval and generate a feature map through the convolution operation. If the input data 410 is a color image having a plurality of channels (for example, the three RGB channels), the weight kernel may traverse each channel of the input data 410, compute the convolution, and then generate a feature map for each channel.

FIG. 5 is a diagram showing the input data of a convolution layer and the weight kernel used in the convolution operation, according to an embodiment of the present disclosure.

As shown, the input data 510 may be an image or video represented as a two-dimensional matrix composed of rows 530 of a specific size and columns 540 of a specific size. As described above, the input data 510 may have a plurality of channels 550, where the channels 550 may represent the number of color components of the input data image. Meanwhile, the weight kernel 520 may be a weight kernel used in the convolution that scans a portion of the input data 510 and extracts the features of that portion. Like the input data image, the weight kernel 520 may be configured to have rows 560 of a specific size, columns 570 of a specific size, and a specific number of channels 580. In general, the rows 560 and columns 570 of the weight kernel 520 are set to the same size, and the number of channels 580 may equal the number of channels 550 of the input data image.

FIG. 6 is a diagram describing a procedure for generating a first activation map by performing convolution on the input data using a first kernel, according to an embodiment of the present disclosure.

The first weight kernel 610 may be a weight kernel representing the first channel of the weight kernel 520 of FIG. 5. The first weight kernel 610 traverses the input data at a specified interval and performs convolution, thereby finally generating the first activation map 630. When the first weight kernel 610 is applied to a portion of the input data 510, the convolution is performed by multiplying each input data value at a given position in that portion by the value at the corresponding position of the weight kernel and then adding all of the resulting products. This convolution process produces a first result value 620, and each time the first weight kernel 610 traverses the input data 510, such convolution result values are produced and make up the feature map. Each element value of the feature map is converted into the first activation map 630 through the activation function of the convolution layer.

FIG. 7 is a diagram describing a procedure for generating a second activation map by performing convolution on the input data using a second weight kernel, according to an embodiment of the present disclosure.

After the first activation map 630 is generated by performing convolution on the input data 510 using the first weight kernel 610 as shown in FIG. 6, the second activation map 730 may be generated by performing convolution on the input data 510 using the second weight kernel 710 as shown in FIG. 7.

The second weight kernel 710 may be a weight kernel representing the second channel of the weight kernel 520 of FIG. 5. The second weight kernel 710 traverses the input data at a specified interval and performs convolution, thereby finally generating the second activation map 730. As in FIG. 6, when the second weight kernel 710 is applied to a portion of the input data 510, the convolution is performed by multiplying each input data value at a given position in that portion by the value at the corresponding position of the weight kernel and then adding all of the resulting products. This convolution process produces a second result value 720, and each time the second weight kernel 710 traverses the input data 510, such convolution result values are produced and make up the feature map. Each element value of the feature map is converted into the second activation map 730 through the activation function of the convolution layer.

FIG. 8 is a diagram expressing, in matrix form, the operation of a convolution layer when the input feature map has one channel, according to an embodiment of the present disclosure.

The convolution layer 420 shown in FIG. 8 may correspond to the CONV 420 shown in FIG. 4. In FIG. 8, the input data 810 input to the convolution layer 420 is represented as a two-dimensional matrix of size 6 x 6, and the weight kernel 814 is represented as a two-dimensional matrix of size 3 x 3. However, the sizes of the input data 810 and the weight kernel 814 of the convolution layer 420 are not limited to these and may be varied according to the performance and requirements of the artificial neural network that includes the convolution layer 420.

As shown, when the input data 810 is input to the convolution layer 420, the weight kernel 814 traverses the input data 810 at a predetermined interval (for example, 1) and may perform element-wise multiplication, multiplying the values of the input data 810 and the weight kernel 814 at the same positions. The weight kernel 814 traverses the input data 810 at a regular interval and sums (816) the values obtained through the element-wise multiplication.

Specifically, the value (for example, "3") that the weight kernel 814 computes from the element-wise multiplication at a specific position 820 of the input data 810 is assigned to the corresponding element 824 of the feature map 818. Next, the value (for example, "1") that the weight kernel 814 computes at the next position 822 of the input data 810 is assigned to the corresponding element 826 of the feature map 818. When all of the values computed as the weight kernel 814 traverses the input data 810 have been assigned to the feature map 818 in this way, the 4 x 4 feature map 818 is complete. If the input data 810 consists of, for example, three channels (R, G, and B channels), feature maps for each channel may be generated through a convolution in which the same weight kernel, or a different kernel channel for each channel, traverses the data of each channel of the input data 810 and performs the element-wise multiplication 812 and the summation 816.
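
The traversal described above can be sketched as follows for a single channel: a 3 x 3 kernel sliding over a 6 x 6 input at an interval of 1 yields a 4 x 4 feature map. The input values and kernel values below are hypothetical and are not those of FIG. 8; square inputs and kernels are assumed.

```python
def convolve2d(data, kernel, stride=1):
    """Slide the kernel over the data; multiply element-wise and sum."""
    k = len(kernel)
    out_size = (len(data) - k) // stride + 1
    feature_map = []
    for r in range(0, out_size * stride, stride):
        row = []
        for c in range(0, out_size * stride, stride):
            total = sum(data[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 6 x 6 input: every row is [0, 1, 2, 0, 1, 2].
data = [[c % 3 for c in range(6)] for _ in range(6)]
# Hypothetical 3 x 3 kernel with ones on the diagonal.
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

feature_map = convolve2d(data, kernel)
print(len(feature_map), len(feature_map[0]))
```

For multi-channel input, the same routine would be called once per channel with the matching kernel channel.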

Referring again to FIG. 4, the CONV 420 may apply an activation function to the feature map generated according to the method described with reference to FIGS. 5 to 8, thereby generating an activation map, which is the final output of the convolution layer. Here, the activation function may be any one of various activation functions such as a sigmoid function, a radial basis function (RBF), and a rectified linear unit (ReLU), a modified form of one of these, or another function.

Meanwhile, the SUBS 430 receives the activation map, which is the output data of the CONV 420, as its input data. The SUBS 430 performs the function of reducing the size of the activation map or emphasizing specific data. When the SUBS 430 uses max pooling, it selects and outputs the maximum of the values within a specific region of the activation map. Through this pooling process, the SUBS 430 can remove noise from the input data and reduce the size of the data.

The FC 440 may receive the output data of the SUBS 430 and generate the final output data 450. The activation map extracted by the SUBS 430 is flattened into one dimension so that it can be input to the fully connected layer 440.

FIG. 9 is a diagram expressing, in matrix form, the operation of a fully connected layer according to an embodiment of the present disclosure.

The fully connected layer 440 shown in FIG. 9 may correspond to the FC 440 of FIG. 4. As described above, the activation map extracted by the max pooling layer 430 may be flattened into one dimension so that it can be input to the fully connected layer 440. The flattened one-dimensional activation map may be received as input data 910 by the fully connected layer 440. The fully connected layer 440 may perform element-wise multiplication 912 of the input data 910 and a one-dimensional weight kernel 914. The results of the element-wise multiplication of the input data 910 and the weight kernel 914 may be summed (916) and output as output data 918. The output data 918 may represent the inference value for the input data 410 that was input to the CNN 400.
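
The flattening step and the fully connected calculation can be sketched together as follows. Row-major flattening order, the function names, and the sample values are assumptions for illustration.

```python
def flatten(activation_maps):
    """Concatenate every value of every 2-D map into one 1-D list (row-major)."""
    return [value for amap in activation_maps for row in amap for value in row]


def fully_connected(inputs, weights):
    """Multiply inputs element-wise by a 1-D weight kernel, then sum."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# One hypothetical 2 x 2 pooled activation map and a 1-D weight kernel.
flat = flatten([[[1, 2], [3, 4]]])
output = fully_connected(flat, [0.5, -1.0, 0.25, 1.0])
print(flat, output)
```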

In the CNN 400 having the configuration described above, input data in the form of a two-dimensional or one-dimensional matrix is input to each of the plurality of layers, and the training and inference processes are carried out through complex operations on the input data, such as element-wise multiplication by a weight kernel and summation. Therefore, depending on the number of layers constituting the CNN 400 or the complexity of the operations, the resources required for training on and inferring from data (for example, the number of operators or the amount of memory) may increase considerably. Accordingly, to reduce the amount of computation and memory of an artificial neural network having a plurality of layers, such as the CNN 400, bit quantization may be performed on the input and output data used by each layer. In one embodiment, bit quantization of the CNN 400 having a plurality of layers may be performed on the CONV 420 and the FC 440, which require large amounts of computation and memory.
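
To make the resource growth concrete, the sketch below counts the multiply-accumulate operations and the weight storage of one convolution layer. The cost model (square inputs and kernels, no padding) and all layer dimensions are hypothetical assumptions, not figures from the disclosure.

```python
def conv_layer_cost(in_size, kernel_size, channels, num_kernels,
                    weight_bits, stride=1):
    """Return (multiply-accumulate count, weight storage in bits)."""
    out_size = (in_size - kernel_size) // stride + 1
    weights = kernel_size * kernel_size * channels * num_kernels
    # Every output position applies every weight once.
    macs = out_size * out_size * weights
    return macs, weights * weight_bits


# Hypothetical layer: 32 x 32 input, 3 channels, sixteen 3 x 3 kernels.
full_macs, full_bits = conv_layer_cost(32, 3, 3, 16, weight_bits=32)
_, reduced_bits = conv_layer_cost(32, 3, 3, 16, weight_bits=5)
print(full_macs, full_bits, reduced_bits)
```

The operation count is unchanged by the bit width in this simple model; what shrinks is the storage per weight and, in hardware, the width of each multiplier.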

FIG. 10 is a diagram expressing, in matrix form, the bit quantization process of a convolution layer according to an embodiment of the present disclosure.

The bit quantization performed in the convolution layer may include weight or weight kernel quantization 1028, which reduces the number of bits of each element value of the weight kernel used in the convolution operation, and/or feature map or activation map quantization 1030, which reduces the number of bits of each element value of the feature map or activation map.

The bit quantization process of the convolution layer according to one embodiment may be performed as follows. Before the weight kernel 1014 is applied to the input data 1010 of the convolution layer to perform the convolution, a quantization 1016 process is performed on the weight kernel 1014 to generate a quantized weight kernel 1018. The quantized weight kernel 1018 is then applied to the input data 1010, and element-wise multiplication 1012 and summation 1020 are performed to output the convolution values and generate a feature map, after which an activation map 1022 may be generated through the activation function. Next, the final quantized activation map 1026 may be generated by applying quantization 1024 to the activation map.

In the bit quantization process of the convolution layer described above, the weight kernel quantization 1028 may be performed using the following equation.

$$a_q = \frac{1}{2^k}\,\mathrm{round}\left(2^k \cdot a_f\right)$$

Here, $a_f$ denotes the weight value to be quantized (for example, a real-valued weight and each weight in a weight kernel), $k$ denotes the number of bits to quantize to, and $a_q$ denotes the result of quantizing $a_f$ by $k$ bits. That is, according to the above equation, $a_f$ is first multiplied by a predetermined binary number $2^k$, so that the number of digits of $a_f$ is increased by $k$ bits (hereinafter the "first value"). Next, a rounding or truncation operation is performed on the first value, so that the digits after the decimal point are removed (hereinafter the "second value"). The second value is then divided by the binary number $2^k$, reducing the number of digits by $k$ bits again, so that the element value of the final quantized weight kernel can be calculated. This weight or weight kernel quantization 1028 is performed repeatedly on all element values of the weights or weight kernel 1014 to generate the quantized weight values 1018.
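
The three steps described above (scale up by k bits, round or truncate, scale back down) can be sketched as follows. This is a minimal sketch, not the disclosure's implementation: the function names are hypothetical, and rounding halves upward is an assumed tie-breaking rule, since the text does not specify one.

```python
import math


def quantize(value, k, mode="round"):
    """Quantize a real value to k fractional bits."""
    scaled = value * (2 ** k)                # "first value": shifted up by k bits
    if mode == "round":
        integer = math.floor(scaled + 0.5)   # assumed rule: round half up
    elif mode == "truncate":
        integer = math.trunc(scaled)         # drop the fractional digits
    else:
        raise ValueError("mode must be 'round' or 'truncate'")
    return integer / (2 ** k)                # shift back down by k bits


def quantize_kernel(kernel, k, mode="round"):
    """Apply quantize() to every element of a 2-D weight kernel."""
    return [[quantize(w, k, mode) for w in row] for row in kernel]


rounded = quantize(0.7, 2)                   # 0.7 * 4 = 2.8 -> 3 -> 0.75
truncated = quantize(0.7, 2, "truncate")     # 0.7 * 4 = 2.8 -> 2 -> 0.5
kernel_q = quantize_kernel([[0.7, -0.3], [0.1, 0.9]], 3)
print(rounded, truncated, kernel_q)
```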

Meanwhile, the feature map or activation map quantization 1030 may be performed using the following equation.

In the feature map or activation map quantization 1030, the same equation as in the weight or weight kernel quantization 1028 may be used. However, before quantization is applied to each element value of the feature map or activation map 1022 (for example, a real-valued coefficient), a step may be added in which clipping is applied to normalize each element value of the feature map or activation map 1022 to a value between 0 and 1.

Next, the normalized element value is multiplied by a predetermined binary number so that its number of digits is increased by k bits (the "first value"). A rounding or truncation operation is then performed on the first value to remove its fractional digits (the "second value"). The second value is divided by the binary number so that the number of digits is reduced again by k bits, which yields the element value of the final quantized feature map or activation map 1026. This feature map or activation map quantization 1030 is repeated for every element value of the feature map or activation map 1022 to generate the quantized feature map or activation map 1026.
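The clipping step followed by the same scaling and rounding can be sketched as below. As before, this assumes the predetermined binary number is 2^k and uses rounding rather than truncation; `quantize_activation` is a hypothetical name.

```python
import math


def quantize_activation(a: float, k: int) -> float:
    """Clip an activation to [0, 1], then quantize it to k fractional bits."""
    clipped = min(max(a, 0.0), 1.0)          # clipping: normalize into the range 0..1
    scale = 2 ** k                           # assumption: the binary number is 2^k
    second = math.floor(clipped * scale + 0.5)  # round half up ("second value")
    return second / scale
```

Values outside the range saturate before quantization, so an activation of 1.7 maps to 1.0 and a negative activation maps to 0.0.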

Through the weight or weight kernel quantization 1028 and the feature map or activation map quantization 1030 described above, the memory size and the amount of computation required for operations such as the convolution of the convolution layer 420 of the convolutional neural network can be reduced in units of bits.

FIG. 11 is a flowchart showing a bit quantization method of an artificial neural network according to an embodiment of the present disclosure. In this embodiment, the unit of the data group that can be quantized in the artificial neural network is assumed to be all of the parameters belonging to each layer of the artificial neural network.

As shown, the bit quantization method 1100 of an artificial neural network may begin with a step S1110 of selecting at least one layer among the plurality of layers included in the artificial neural network. Which layer is selected may be determined according to the influence that the layer has on the overall performance or the amount of computation (or amount of memory) of the artificial neural network. In one embodiment, in the multi-layer artificial neural network described with reference to FIGS. 1 to 3, a layer having a large influence on the overall performance or the amount of computation of the artificial neural network may be arbitrarily selected. In the case of the convolutional neural network (CNN) 400 described with reference to FIGS. 4 to 10, the convolution layer 420 and/or the fully connected layer 440 have a large influence on the overall performance or the amount of computation of the CNN 400, so at least one of these layers 420 and 440 may be selected.

The method of selecting at least one of the plurality of layers included in the artificial neural network may be determined according to the influence of the selected layer on the overall performance or the amount of computation of the artificial neural network, but is not limited thereto and may be any one of various methods. For example, the at least one layer may be selected by (i) selecting layers sequentially, according to the arrangement order of the layers of the artificial neural network, from the first layer, which receives the input data, toward later layers; (ii) selecting layers sequentially, according to the arrangement order of the layers, from the last layer, which generates the final output data, toward earlier layers; (iii) selecting layers starting from the layer with the largest amount of computation; or (iv) selecting layers starting from the layer with the smallest amount of computation.

When the layer selection in step S1110 is complete, the method may proceed to a step S1120 of reducing, in units of bits, the data representation size of the parameters (for example, the weights) of the selected layer.

In one embodiment, when the size of the weights or the output data among the parameters of the selected layer is reduced in units of bits, the weight kernel quantization 1028 and the activation map quantization 1024 described with reference to FIGS. 4 to 10 may be performed. For example, the weight kernel quantization 1028 may be calculated by the following equation.

Here, the operand of the equation is the element value of the weight kernel to be quantized (for example, a real-valued weight kernel coefficient), k is the number of bits to quantize to, and the result is that element value quantized to k bits. That is, according to the above equation, the element value is first multiplied by a predetermined binary number so that its number of digits is increased by k bits (the "first value"). Next, a rounding or truncation operation is performed on the first value to remove its fractional digits (the "second value"). The second value is then divided by the binary number so that the number of digits is reduced again by k bits, which yields the element value of the final quantized weight kernel. This weight kernel quantization 1028 is repeated for every element value of the weight kernel 1014 to generate the quantized weight kernel 1018.

Meanwhile, the activation map quantization 1030 may be performed using the following equation.

In the activation map quantization 1030, before quantization is applied to each element value of the activation map 1022 (for example, a real-valued coefficient), a step may be added in which clipping is applied to normalize each element value of the activation map 1022 to a value between 0 and 1. Next, the normalized element value is multiplied by a predetermined binary number so that its number of digits is increased by k bits (the "first value"). A rounding or truncation operation is then performed on the first value to remove its fractional digits (the "second value"). The second value is divided by the binary number so that the number of digits is reduced again by k bits, which yields the element value of the final quantized activation map 1026. This activation map quantization 1030 is repeated for every element value of the activation map 1022 to generate the quantized activation map 1026.

The embodiments described above reduce the number of bits of the weight values or the activation map data in order to reduce the data representation size of the parameters of the selected layer, but the bit quantization method of the present disclosure is not limited thereto. In another embodiment, different numbers of bits may be allocated to the intermediate-stage data that exist between the several operation stages applied to the various data of the selected layer. Accordingly, when the artificial neural network is implemented in hardware, the number of bits of each datum stored in a memory (for example, a buffer, a register, or a cache) may be reduced, and the bit width of that memory reduced with it, in order to reduce the size of the memory. In yet another embodiment, the width of the data path over which the data of the selected layer are transmitted may be reduced in units of bits.

After step S1120, a step S1130 of determining whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value may be performed. If, after the data representation size of the parameters of the selected layer has been reduced in units of bits, the accuracy of the output of the artificial neural network (for example, its training result or inference result) is greater than or equal to the predetermined target value, it can be expected that the overall performance of the artificial neural network will be maintained even if the bits of the data are reduced further.

Accordingly, if it is determined in step S1130 that the accuracy of the artificial neural network is greater than or equal to the target value, the method may return to step S1120 and further reduce the data representation size of the selected layer in units of bits. It may then be determined again whether the accuracy of the output of the artificial neural network (for example, its training result or inference result) is greater than or equal to the predetermined target value (step S1130).

If, in step S1130, the accuracy of the artificial neural network is below the target value, it may be determined that the bit quantization just performed has degraded the accuracy of the artificial neural network. In this case, the smallest number of bits that satisfied the accuracy target in the immediately preceding bit quantization may be fixed as the final number of bits for the parameters of the selected layer (step S1140).

Next, it is determined whether bit quantization has been completed for all layers of the artificial neural network (step S1150). If it has, the entire process ends. If any layer has not yet been bit quantized, step S1110 is performed again in order to bit quantize that layer.

Here, in step S1110, the next layer may be selected from among the plurality of layers included in the artificial neural network by (i) selecting, according to the arrangement order of the layers, the layer that follows the previously selected layer ("forward bit quantization"); (ii) selecting, according to the arrangement order of the layers, the layer that precedes the previously selected layer ("backward bit quantization"); (iii) selecting, in order of the amount of computation, the layer with the next-largest amount of computation after the previously selected layer ("high computational cost bit quantization"); or (iv) selecting, in order of the amount of computation, the layer with the next-smallest amount of computation after the previously selected layer ("low computational cost bit quantization").
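The four selection orders named above can be illustrated with a small helper that returns the order in which layers would be visited. The function name `selection_order`, the strategy strings, and the representation of per-layer cost as a plain list are assumptions made for this sketch.

```python
from typing import List, Sequence


def selection_order(costs: Sequence[float], strategy: str) -> List[int]:
    """Return layer indices in the order they would be selected in step S1110."""
    n = len(costs)
    if strategy == "forward":      # first (input-side) layer toward the last
        return list(range(n))
    if strategy == "backward":     # last (output-side) layer toward the first
        return list(range(n - 1, -1, -1))
    if strategy == "high_cost":    # largest amount of computation first
        return sorted(range(n), key=lambda i: -costs[i])
    if strategy == "low_cost":     # smallest amount of computation first
        return sorted(range(n), key=lambda i: costs[i])
    raise ValueError(f"unknown strategy: {strategy}")
```

Because `sorted` is stable, layers with equal cost keep their arrangement order, which is one reasonable tie-breaking choice among several.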

In one embodiment, the accuracy of the artificial neural network may mean the probability that, after learning how to solve a given problem (for example, recognizing an object contained in an input image), the artificial neural network presents the correct solution to that problem at the inference stage. The target value used in the bit quantization method described above may represent the minimum accuracy that must be maintained after bit quantization of the artificial neural network. For example, if the target value is an accuracy of 90%, additional bit quantization may be performed as long as the accuracy of the artificial neural network remains at or above 90% after the parameters of the selected layer have been reduced in units of bits. If the accuracy is measured at 94% after the first bit quantization, additional bit quantization may be performed. If the accuracy is then measured at 88% after the second bit quantization, the result of that second bit quantization is discarded, and the number of bits determined by the first bit quantization (that is, the number of bits used to represent the data) is fixed as the final bit quantization result.
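The loop of FIG. 11 (steps S1110 to S1150) can be sketched as below. This is a simplified reading of the flowchart: `evaluate` is a hypothetical callback that returns the accuracy of the network for a given per-layer bit assignment, each reduction step removes one bit, and `min_bits` is an added guard so that the loop terminates.

```python
from typing import Callable, Dict, List


def quantize_layerwise(order: List[str],
                       bits: Dict[str, int],
                       evaluate: Callable[[Dict[str, int]], float],
                       target: float,
                       min_bits: int = 1) -> Dict[str, int]:
    """Reduce each layer's bit width until accuracy would fall below target."""
    bits = dict(bits)                      # do not mutate the caller's mapping
    for layer in order:                    # S1110: select the next layer
        while bits[layer] > min_bits:
            bits[layer] -= 1               # S1120: reduce by one bit
            if evaluate(bits) < target:    # S1130: accuracy below the target?
                bits[layer] += 1           # S1140: keep the last passing bit width
                break
    return bits                            # S1150: every layer has been processed
```

With a toy `evaluate` that reports 94% at 7 bits and 88% at 6 bits, a layer starting at 8 bits with a 90% target ends at 7 bits, matching the worked example above.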

In one embodiment, when the layer to be bit quantized is selected from among the plurality of layers of an artificial neural network on the basis of the amount of computation (computational cost bit quantization), the amount of computation of each layer may be determined as follows. When an addition in a given layer adds an n-bit operand and an m-bit operand, the amount of computation of that operation is counted as (n+m)/2. When a given layer multiplies an n-bit operand by an m-bit operand, the amount of computation of that operation is counted as n x m. The amount of computation of a given layer is then the sum of the amounts of computation of all additions and multiplications performed by that layer.
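The cost model above translates directly into code. In this sketch the representation of a layer as a list of `(kind, n, m)` tuples and the function names are assumptions chosen for illustration.

```python
from typing import Iterable, Tuple


def add_cost(n: int, m: int) -> float:
    """Cost of adding an n-bit and an m-bit operand: (n + m) / 2."""
    return (n + m) / 2


def mul_cost(n: int, m: int) -> float:
    """Cost of multiplying an n-bit by an m-bit operand: n x m."""
    return float(n * m)


def layer_cost(ops: Iterable[Tuple[str, int, int]]) -> float:
    """Sum the cost of every addition and multiplication in one layer."""
    total = 0.0
    for kind, n, m in ops:
        if kind == "add":
            total += add_cost(n, m)
        elif kind == "mul":
            total += mul_cost(n, m)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return total
```

Under this model, one 8-bit by 8-bit multiplication followed by one 16-bit plus 16-bit addition costs 64 + 16 = 80, so multiplier bit widths dominate the total.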

The method of performing bit quantization by selecting layers on the basis of the amount of computation (computational cost bit quantization) is not limited to the one shown in FIG. 11, and various modifications are possible.

In another embodiment, the bit quantization of the parameters of each layer in the embodiment shown in FIG. 11 may be performed separately for the weights and for the activation map. For example, quantization is first performed on the weights of the selected layer, as a result of which the weights have n bits. Separately, bit quantization is performed on the output activation data of the selected layer, so that the number of bits representing the activation map data is determined as m bits. Alternatively, quantization may proceed while the same number of bits is allocated to the weights and the activation map data of the layer, so that both end up represented with the same n bits.

FIG. 12 is a flowchart showing a bit quantization method of an artificial neural network according to another embodiment of the present disclosure.

As shown, the bit quantization method 1200 of an artificial neural network may begin with a step S1210 of selecting the layer with the largest amount of computation among the plurality of layers included in the artificial neural network.

When the layer selection in step S1210 is complete, the method may proceed to a step S1220 of reducing, in units of bits, the data representation size of the parameters of the selected layer. In one embodiment, when the size of the data of the selected layer is reduced in units of bits, the weight kernel quantization 1028 and the activation map quantization 1024 described with reference to FIGS. 4 to 10 may be performed.

After step S1220, a step S1230 may be performed to determine whether the accuracy of the artificial neural network, reflecting the bit quantization results so far, is greater than or equal to a predetermined target value. If the accuracy is determined in step S1230 to be greater than or equal to the target value, the data size of the layer is set to the current bit quantization result, and the method returns to step S1210 to repeat steps S1210 to S1230. That is, on returning to step S1210, the amount of computation is recalculated for all layers in the artificial neural network, and the layer with the largest amount of computation is selected again on that basis.

If, in step S1230, the accuracy of the artificial neural network is below the target value, the bit reduction applied to the currently selected layer is cancelled, and that layer is excluded from the layers that can be selected in the layer selection step S1210. The layer with the next-largest amount of computation after that layer may then be selected (step S1240), and the size of the data of the selected layer may be reduced in units of bits (step S1250).

In step S1260, it is determined whether the accuracy of the artificial neural network, reflecting the bit quantization results so far, is greater than or equal to the target value. If the accuracy is below the target value, it is determined whether bit quantization has been completed for all layers of the artificial neural network (step S1270). If it is determined in step S1270 that bit quantization has been completed for all layers, the entire bit quantization procedure ends. If it is determined in step S1270 that bit quantization has not been completed for all layers, the method may proceed to step S1240.

If it is determined in step S1260 that the accuracy of the artificial neural network is greater than or equal to the target value, the method may proceed to step S1220 and continue with the subsequent steps.
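One way to read FIG. 12 is as a greedy loop that always reduces the currently most expensive layer and drops a layer from consideration once a reduction hurts accuracy. The sketch below is a simplification under that reading: it folds steps S1240 to S1270 into the same loop as S1210 to S1230, and `layer_cost`, `evaluate`, and `min_bits` are hypothetical names rather than elements of the flowchart.

```python
from typing import Callable, Dict


def quantize_high_cost_first(bits: Dict[str, int],
                             layer_cost: Callable[[str, Dict[str, int]], float],
                             evaluate: Callable[[Dict[str, int]], float],
                             target: float,
                             min_bits: int = 1) -> Dict[str, int]:
    """Greedily reduce the bit width of the most expensive remaining layer."""
    bits = dict(bits)
    candidates = set(bits)                 # layers still eligible for selection
    while candidates:
        # S1210 / S1240: recompute costs and pick the most expensive candidate
        # (sorted() makes tie-breaking deterministic).
        layer = max(sorted(candidates), key=lambda name: layer_cost(name, bits))
        if bits[layer] <= min_bits:
            candidates.discard(layer)      # nothing left to remove from this layer
            continue
        bits[layer] -= 1                   # S1220 / S1250: reduce by one bit
        if evaluate(bits) < target:        # S1230 / S1260: accuracy check
            bits[layer] += 1               # cancel the reduction
            candidates.discard(layer)      # exclude the layer from later selection
    return bits                            # S1270: no layer remains
```

Recomputing the cost on every iteration matters because a layer's cost falls as its bit width shrinks, so a different layer can become the most expensive one partway through.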

FIG. 13 is a flowchart showing a bit quantization method of an artificial neural network having a plurality of layers according to yet another embodiment of the present disclosure.

As shown, the bit quantization method 1300 of an artificial neural network having a plurality of layers includes steps S1310 to S1350 of searching for the accuracy change point of each of the layers included in the artificial neural network. The method 1300 begins with a step S1310 of initially fixing the bit size of the data of all layers included in the artificial neural network to the maximum and selecting one layer for which the accuracy change point has not yet been searched.

When a layer of the artificial neural network has been selected in step S1310, the method may proceed to a step S1320 of reducing the size of the data of the selected layer in units of bits. In one embodiment, when the size of the data of the selected layer is reduced in units of bits, the weight kernel quantization 1028 and the activation map quantization 1024 described with reference to FIGS. 4 to 10 may be performed.

After step S1320, a step S1330 may be performed to determine whether the accuracy of the artificial neural network, reflecting the bit quantization results so far for the selected layer, is greater than or equal to a predetermined target value. If the accuracy is determined in step S1330 to be greater than or equal to the target value, the method returns to step S1320 and performs a further bit reduction on the currently selected layer.

If, in step S1330, the accuracy of the artificial neural network is below the target value, the number of data bits of the currently selected layer is set to the smallest number of bits that most recently satisfied the target value. It is then determined whether the search for accuracy change points has been completed for all layers of the artificial neural network (step S1340). If the search has not been completed for all layers, the method may proceed to step S1310. In step S1310, with the bit size of the data of all layers included in the artificial neural network at the maximum, another layer for which the accuracy change point has not yet been searched is selected.

If it is determined in step S1340 that the search for accuracy change points has been completed for all layers of the artificial neural network, the bit quantization result corresponding to the accuracy change point of each layer may be applied to the artificial neural network (step S1350). In one embodiment, in step S1350, each layer is set to the bit size of the data immediately before its accuracy change point as determined in steps S1310 to S1340 (for example, the point at which the accuracy of the artificial neural network begins to degrade for that layer).

In another embodiment, in step S1350, each layer is set to a size larger than the amount of resources required for computation on the parameters immediately before the accuracy change point of that layer as determined in steps S1310 to S1340. For example, the number of bits of the parameters of each layer may be set 2 bits larger than the number of bits immediately before the accuracy change point. A bit quantization method is then performed on the artificial neural network having the per-layer data sizes set in step S1350 (step S1360). The bit quantization method performed in step S1360 may include, for example, the method shown in FIG. 11 or FIG. 12.
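The change-point search of FIG. 13 and the margin-based starting point can be sketched as follows. The function names, the `evaluate` callback, the one-bit step, the `min_bits` guard, and the cap at the maximum bit width are assumptions of this sketch; only the 2-bit margin comes from the example in the text.

```python
from typing import Callable, Dict


def find_change_points(max_bits: Dict[str, int],
                       evaluate: Callable[[Dict[str, int]], float],
                       target: float,
                       min_bits: int = 1) -> Dict[str, int]:
    """For each layer alone, find the smallest bit width that still meets target."""
    change_points = {}
    for layer in max_bits:                 # S1310: pick an unsearched layer
        bits = dict(max_bits)              # all other layers stay at maximum
        while bits[layer] > min_bits:
            bits[layer] -= 1               # S1320: reduce by one bit
            if evaluate(bits) < target:    # S1330: accuracy below the target?
                bits[layer] += 1           # bit width just before the change point
                break
        change_points[layer] = bits[layer]
    return change_points                   # S1340: every layer searched


def initial_bits(change_points: Dict[str, int],
                 max_bits: Dict[str, int],
                 margin: int = 2) -> Dict[str, int]:
    """S1350 variant: start each layer `margin` bits above its change point."""
    # Capping at the maximum bit width is an assumption of this sketch.
    return {name: min(change_points[name] + margin, max_bits[name])
            for name in change_points}
```

The resulting assignment would then serve as the starting point for step S1360, which runs a method such as the one in FIG. 11 or FIG. 12 from a much smaller search space than the all-maximum configuration.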

The bit quantization method of an artificial neural network according to the various embodiments described above is not limited to being performed on the weight kernel and the feature map (or activation map) of each of the layers of the artificial neural network in turn. In one embodiment, the bit quantization method of the present disclosure may first be performed on the weight kernels (or weights) of all layers of the artificial neural network, and bit quantization may then be performed on the feature maps of all layers of the artificial neural network with that weight kernel quantization applied. In another embodiment, bit quantization may first be performed on the feature maps of all layers of the artificial neural network, and bit quantization may then be performed on the kernels of all layers of the artificial neural network with that feature map quantization applied.

Furthermore, the bit quantization method of the present disclosure is not limited to applying the same level of bit quantization to all weight kernels of each layer of the artificial neural network. In one embodiment, bit quantization may be performed per weight kernel within each layer, or individually per weight, so that each weight that is an element of a weight kernel can have a different number of bits.

Hereinafter, examples of execution results of the bit quantization method of an artificial neural network according to various embodiments of the present disclosure are described with reference to the drawings.

FIG. 14 is a graph showing an example of the computational cost of each layer of an artificial neural network according to an embodiment of the present disclosure. The artificial neural network shown in FIG. 14 is an example of a convolutional artificial neural network of the VGG-16 model, which includes 16 layers, and each layer of this artificial neural network has a different computational cost.

For example, because the 2nd, 4th, 6th, 7th, 9th, and 10th layers have the highest computational cost, bit quantization may be applied to them first under the high computational cost bit quantization method. After bit quantization has been executed on the 2nd, 4th, 6th, 7th, 9th, and 10th layers, bit quantization may be executed on the 14th layer, which has the next highest computational cost.

FIG. 15 is a graph showing the number of bits per layer of an artificial neural network on which bit quantization has been executed by a forward bit quantization method according to an embodiment of the present disclosure.

As described above, forward quantization is a method of executing bit quantization sequentially, starting from the frontmost layer (for example, the layer that first receives the input data) according to the arrangement order of the plurality of layers included in the artificial neural network. FIG. 15 shows the number of bits of each layer after forward quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the reduction rate of the computational cost of the artificial neural network achieved by forward quantization. For example, when an n-bit operand and an m-bit operand are added, the computational cost of the operation is calculated as (n+m)/2. When an n-bit operand and an m-bit operand are multiplied, the computational cost of the operation may be calculated as n x m. Accordingly, the total computational cost of the artificial neural network may be the sum of the computational costs of all additions and multiplications executed by the artificial neural network.
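The cost model in the preceding paragraph can be written out as a small calculation. The following is a minimal sketch, not the disclosure's implementation: the names `op_cost`, `total_cost`, and `reduction_rate` are hypothetical, and the operand widths in the example (a 9-element dot product with 16-bit activations, and weights reduced from 16 to 8 bits) are illustrative assumptions rather than values taken from FIG. 15.

```python
from typing import Iterable, Tuple

# (kind, n_bits, m_bits); kind is "add" or "mul". Hypothetical representation.
Op = Tuple[str, int, int]


def op_cost(kind: str, n_bits: int, m_bits: int) -> float:
    """Cost of one operation under the model stated in the text."""
    if kind == "add":
        return (n_bits + m_bits) / 2   # addition: (n + m) / 2
    if kind == "mul":
        return float(n_bits * m_bits)  # multiplication: n x m
    raise ValueError(f"unknown operation kind: {kind}")


def total_cost(ops: Iterable[Op]) -> float:
    """Total cost is the sum over every addition and multiplication."""
    return sum(op_cost(kind, n, m) for kind, n, m in ops)


def reduction_rate(cost_before: float, cost_after: float) -> float:
    """Fraction of the original cost removed by quantization."""
    return 1.0 - cost_after / cost_before


# Illustrative assumption: one 3x3 window needs 9 multiplications and 8 additions.
ops_before = [("mul", 16, 16)] * 9 + [("add", 16, 16)] * 8
ops_after = [("mul", 8, 16)] * 9 + [("add", 16, 16)] * 8  # weights cut to 8 bits

cost_before = total_cost(ops_before)
cost_after = total_cost(ops_after)
rate = reduction_rate(cost_before, cost_after)
```

Because multiplication cost grows with the product of the operand widths, reducing the width of one multiplier operand has a much larger effect on the total than reducing the width of an adder operand by the same amount.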

As shown, when bit quantization was executed on the VGG-16 artificial neural network using forward quantization, the number of bits of the layers arranged toward the front of the artificial neural network was reduced by a relatively large amount, while the number of bits of the layers arranged toward the rear was reduced only slightly. For example, the number of bits of the 1st layer decreased to 12 bits and the number of bits of the 2nd and 3rd layers each decreased to 9 bits, whereas the number of bits of the 16th layer decreased to 13 bits and that of the 15th layer decreased only to 15 bits. When forward quantization was applied sequentially from the 1st layer to the 16th layer in this way, the reduction rate of the computational cost of the entire artificial neural network was calculated to be 56%.

FIG. 16 is a graph showing the number of bits per layer of an artificial neural network on which bit quantization has been executed by a backward bit quantization method according to an embodiment of the present disclosure.

Backward quantization is a method of executing bit quantization sequentially, starting from the rearmost layer (for example, the layer from which the output data is finally output) according to the arrangement order of the plurality of layers included in the artificial neural network. FIG. 16 shows the number of bits of each layer after backward quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the reduction rate of the computational cost of the artificial neural network achieved by backward quantization.

As shown, when bit quantization was executed on the VGG-16 artificial neural network using backward quantization, the number of bits of the layers arranged toward the rear of the artificial neural network was reduced by a relatively large amount, while the number of bits of the layers arranged toward the front was reduced only slightly. For example, the number of bits of the 1st, 2nd, and 3rd layers each decreased to 15 bits and the number of bits of the 4th layer decreased to 14 bits, whereas the number of bits of the 16th layer decreased to 9 bits and that of the 15th layer decreased to 15 bits. When backward quantization was applied sequentially from the 16th layer back to the 1st layer in this way, the reduction rate of the computational cost of the entire artificial neural network was calculated to be 43.05%.

FIG. 17 is a graph showing the number of bits per layer of an artificial neural network on which bit quantization has been executed by a high computational cost layer first bit quantization method according to an embodiment of the present disclosure.

High computational cost layer first quantization (or high computational cost quantization) is a method of executing bit quantization sequentially, starting from the layer with the highest computational cost among the plurality of layers included in the artificial neural network. FIG. 17 shows the number of bits of each layer after high computational cost quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the reduction rate of the computational cost of the artificial neural network achieved by high computational cost quantization.

As shown, when bit quantization was executed on the VGG-16 artificial neural network using high computational cost quantization, the number of bits of the layers with high computational cost among the plurality of layers was reduced by a relatively large amount. For example, the number of bits of the 2nd layer and the 10th layer decreased to 5 bits and 6 bits, respectively, whereas the number of bits of the 1st layer decreased to 14 bits. When high computational cost quantization was applied to the layers of the artificial neural network in order of computational cost in this way, the reduction rate of the computational cost of the entire artificial neural network was calculated to be 70.70%.

FIG. 18 is a graph showing the number of bits per layer of an artificial neural network on which bit quantization has been executed by a low computational cost layer first quantization (low computational cost bit quantization) method according to an embodiment of the present disclosure.

Low computational cost layer first quantization (or low computational cost quantization) is a method of executing bit quantization sequentially, starting from the layer with the lowest computational cost among the plurality of layers included in the artificial neural network. FIG. 18 shows the number of bits of each layer after low computational cost quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the reduction rate of the computational cost of the artificial neural network achieved by low computational cost quantization.

As shown, even when bit quantization was executed on the VGG-16 artificial neural network using low computational cost quantization, the number of bits of the layers with high computational cost among the plurality of layers was reduced by a relatively large amount. For example, the number of bits of the 6th layer and the 7th layer decreased to 6 bits and 5 bits, respectively, whereas the number of bits of the 1st layer decreased to 13 bits. When low computational cost quantization was applied to the layers of the artificial neural network in order of computational cost in this way, the reduction rate of the computational cost of the entire artificial neural network was calculated to be 49.11%.
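The four layer orderings compared in FIGS. 15 to 18 differ only in the sequence in which layers are visited. The sketch below illustrates that difference; the function name `quantization_order`, the strategy strings, the 0-based layer indices, and the tie-breaking rule (layers with equal cost keep their network order) are assumptions made for illustration and are not specified in the text.

```python
from typing import List, Sequence


def quantization_order(layer_costs: Sequence[float], strategy: str) -> List[int]:
    """Return 0-based layer indices in the order they would be quantized."""
    indices = list(range(len(layer_costs)))
    if strategy == "forward":
        # From the layer that first receives the input data.
        return indices
    if strategy == "backward":
        # From the layer that produces the final output.
        return indices[::-1]
    if strategy == "high_cost_first":
        # Highest computational cost first; sorted() is stable, so ties
        # keep their original network order (an assumption of this sketch).
        return sorted(indices, key=lambda i: layer_costs[i], reverse=True)
    if strategy == "low_cost_first":
        # Lowest computational cost first.
        return sorted(indices, key=lambda i: layer_costs[i])
    raise ValueError(f"unknown strategy: {strategy}")


# Illustrative per-layer costs for a hypothetical 4-layer network.
costs = [1.0, 5.0, 3.0, 5.0]
forward = quantization_order(costs, "forward")
backward = quantization_order(costs, "backward")
high_first = quantization_order(costs, "high_cost_first")
low_first = quantization_order(costs, "low_cost_first")
```

The ordering matters because layers visited earlier can absorb more of the available accuracy margin, which is consistent with the reported results in which the high computational cost ordering gave the largest reduction rate.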

Hereinafter, hardware implementation examples of an artificial neural network to which bit quantization according to the various embodiments of the present disclosure described above is applied are described in detail. When a convolutional artificial neural network including a plurality of layers is implemented in hardware, the weight kernel may be arranged outside and/or inside the processing unit that executes the convolution of the convolutional layer.

In one embodiment, the weight kernel may be stored in a memory (for example, a register, buffer, or cache) separate from the processing unit that executes the convolution of the convolutional layer. In this case, bit quantization is applied to the weight kernel to reduce the number of bits of its element values, and the size of the memory can then be determined according to the number of bits of the weight kernel. In addition, the bit width of a multiplier or adder arranged in the processing unit, which receives the element values of the weight kernel stored in the memory and the element values of the input feature map and executes multiplication and/or addition operations, may also be designed to match the number of bits resulting from the bit quantization.

In another embodiment, the weight kernel may be implemented in hard-wired form within the processing unit that executes the convolution of the convolutional layer. In this case, bit quantization is applied to the weight kernel to reduce the number of bits of its element values, and hard wires representing each of the element values of the weight kernel can then be implemented within the processing unit according to the number of bits of the weight kernel. In addition, the bit width of a multiplier or adder arranged in the processing unit, which receives the element values of the hard-wired weight kernel and the element values of the input feature map and executes multiplication and/or addition operations, may also be designed to match the number of bits resulting from the bit quantization.

FIGS. 19 to 21, described below, are diagrams showing hardware implementation examples of an artificial neural network including a plurality of layers according to further embodiments of the present disclosure. The bit quantization method and system for an artificial neural network including a plurality of layers of the present disclosure can be applied to any ANN (artificial neural network) computing system, such as a CPU, GPU, FPGA, or ASIC, to reduce the required amount of computation, the bit width of the arithmetic units, and the memory. Although the examples here are based on integers, they may also be implemented with floating point operations.

FIG. 19 is a diagram showing a hardware implementation example of an artificial neural network according to an embodiment of the present disclosure. The illustrated example implements, in hardware, the convolution processing device 1900 of a convolutional layer of a convolutional artificial neural network. Here, the description assumes that the convolutional layer executes convolution by applying a weight kernel of size 3x3x3 to a portion of the input feature map (data of size 3x3x3). The size and number of weight kernels of each layer may differ depending on the application field and the number of input and output feature map channels.

As shown, the weight kernel may be stored in a weight kernel cache 1910 that is separate from the processing unit 1930 that executes the convolution of the convolutional layer. In this case, bit quantization is applied to the weight kernel to reduce the number of bits of its element values (w₁, w₂, ..., w₉), and the size of the cache can then be determined according to the number of bits of the weight kernel. In addition, the bit width of a multiplier or adder arranged in the processing unit 1930, which receives the element values of the weight kernel stored in the memory and the element values of the input feature map and executes multiplication and/or addition operations, may also be designed to match the number of bits of the weight kernel element values resulting from the bit quantization.

According to one embodiment, the input feature map cache 1920 may receive and store a portion of the input data (the portion corresponding to the size of the weight kernel). The weight kernel traverses the input data, and the input feature map cache 1920 may sequentially receive and store the portion of the input data corresponding to the current position of the weight kernel. The portion of the input data (x₁, x₂, ..., x₉) stored in the input feature map cache 1920 and some of the element values of the weight kernel (w₁, w₂, ..., w₉) stored in the weight kernel cache 1910 are each input to the corresponding multiplier 1932, and multiple multiplications are executed. The results of the multiplications by the multipliers 1932 are summed by the tree adder 1934 and input to the adder 1940. When the input data consists of multiple channels (for example, when the input data is an RGB color image), the adder 1940 may add the value stored in the accumulator 1942 (whose initial value is 0) to the input sum for a particular channel and store the result back in the accumulator 1942. The sum stored in the accumulator 1942 may then be added to the sum from the adder 1940 for the next channel and input to the accumulator 1942 again. This summation by the adder 1940 and the accumulator 1942 is executed for all channels of the input data, and the total sum may be input to the output activation map cache 1950. The convolution procedure described above may be repeated for the weight kernel and the portion of the input data corresponding to each traversal position of the weight kernel on the input data.
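The data path of FIG. 19 (multipliers 1932, tree adder 1934, adder 1940, accumulator 1942) can be mirrored in software to make the order of operations concrete. This is a behavioral sketch only, under the assumption that one channel (9 values) is processed per pass; the function names `tree_add` and `conv_at_position` are hypothetical, and bit widths are not modeled.

```python
from typing import List, Sequence


def tree_add(values: Sequence[int]) -> int:
    """Sum a non-empty sequence by pairwise reduction, as an adder tree would."""
    vals = list(values)
    while len(vals) > 1:
        # Each level adds neighbouring pairs.
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # an odd element passes through to the next level
        vals = nxt
    return vals[0]


def conv_at_position(window: List[List[int]], kernel: List[List[int]]) -> int:
    """Convolution result for one kernel position.

    window and kernel each hold one list of 9 values per channel.
    """
    acc = 0  # accumulator 1942 starts at 0
    for x_ch, w_ch in zip(window, kernel):
        products = [x * w for x, w in zip(x_ch, w_ch)]  # multipliers 1932
        channel_sum = tree_add(products)                # tree adder 1934
        acc = acc + channel_sum                         # adder 1940 into accumulator 1942
    return acc  # total sum goes to the output activation map cache 1950


# Illustrative 3-channel window and kernel (values are arbitrary).
window = [[1] * 9, [2] * 9, [3] * 9]
kernel = [[1] * 9, [0] * 9, [-1] * 9]
result = conv_at_position(window, kernel)
```

In this arrangement a single set of nine multipliers is reused for every channel, so the channel loop trades latency for a smaller processing unit.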

As described above, when the element values of the weight kernel are stored in the weight kernel cache 1910 arranged outside the processing unit 1930, the number of bits of the weight kernel element values can be reduced by bit quantization according to the present disclosure, which in turn reduces the size of the weight kernel cache 1910 and the size of the multipliers and adders of the processing unit 1930. In addition, as the size of the processing unit 1930 decreases, the power consumption of the processing unit 1930 may also be reduced.
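How the weight bit count feeds into cache and arithmetic unit sizing can be estimated with standard fixed-point width rules. The sketch below is an assumption-laden illustration rather than anything stated in the disclosure: it uses the general facts that the full-precision product of an n-bit and an m-bit integer fits in n + m bits, and that the sum of k values of b bits each fits in b + ceil(log2 k) bits. The function names and the example widths are hypothetical.

```python
def cache_bits(num_elements: int, bits_per_element: int) -> int:
    """Storage needed to hold the weight kernel element values."""
    return num_elements * bits_per_element


def product_bits(weight_bits: int, input_bits: int) -> int:
    """Width of a full-precision multiplier output."""
    return weight_bits + input_bits


def adder_tree_output_bits(operand_bits: int, num_operands: int) -> int:
    """Width needed so that summing num_operands values cannot overflow."""
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1, without floats.
    return operand_bits + (num_operands - 1).bit_length()


# Illustrative assumption: a 3x3x3 kernel (27 weights) and 16-bit inputs.
cache_16 = cache_bits(27, 16)        # weights kept at 16 bits
cache_8 = cache_bits(27, 8)          # weights quantized to 8 bits
mult_out = product_bits(8, 16)       # multiplier output width with 8-bit weights
tree_out = adder_tree_output_bits(mult_out, 9)  # tree adder over 9 products
```

Under these assumptions, halving the weight width halves the weight storage and narrows every multiplier output by the same number of bits that were removed from the weights.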

FIG. 20 is a diagram showing a hardware implementation example of an artificial neural network according to another embodiment of the present disclosure.

The illustrated example implements, in hardware, the convolution processing device 2000 of a convolutional layer of a convolutional artificial neural network. Here, the convolutional layer executes convolution by applying a weight kernel of size 3x3x3 to a portion of the input activation map (data of size 3x3x3).

As shown, the weight kernel may be stored in a weight kernel cache 2010 that is separate from the processing unit 2030 that executes the convolution of the convolutional layer. In this case, bit quantization is applied to the weight kernel to reduce the number of bits of its element values (w₁, w₂, ..., w₉), and the size of the cache can then be determined according to the number of bits of the weight kernel. In addition, the bit width of a multiplier or adder arranged in the processing unit 2030, which receives the element values of the weight kernel stored in the memory and the element values of the input activation map (or feature map) and executes multiplication and/or addition operations, may also be designed to match the number of bits of the weight kernel element values resulting from the bit quantization.

According to one embodiment, the input activation map cache 2020 may receive and store a portion (the portion corresponding to the size of the weight kernel) of input data composed of multiple channels (for example, three RGB channels). The weight kernel traverses the input data, and the input activation map cache 2020 may sequentially receive and store the portion of the input data corresponding to the current position of the weight kernel. The portion of the input data (x₁, x₂, ..., x₂₇) stored in the input activation map cache 2020 and the element values of the weight kernel (w₁, w₂, ..., w₂₇) stored in the weight kernel cache 2010 are each input to the corresponding multiplier, and multiple multiplications are executed. Here, the kernel element values (w₁, w₂, ..., w₉) of the weight kernel cache 2010 and the first-channel portion (x₁, x₂, ..., x₉) of the input data stored in the input activation map cache 2020 are input to the first convolution processing unit 2032. The weight kernel element values (w₁₀, w₁₁, ..., w₁₈) of the weight kernel cache 2010 and the second-channel portion (x₁₀, x₁₁, ..., x₁₈) of the input data stored in the input activation map cache 2020 are input to the second convolution processing unit 2034. The weight kernel element values (w₁₉, w₂₀, ..., w₂₇) of the weight kernel cache 2010 and the third-channel portion (x₁₉, x₂₀, ..., x₂₇) of the input data stored in the input activation map cache 2020 are input to the third convolution processing unit 2036.

Each of the first convolution processing unit 2032, the second convolution processing unit 2034, and the third convolution processing unit 2036 may operate in the same manner as the processing unit 1930 shown in FIG. 19. The convolution results calculated by the first convolution processing unit 2032, the second convolution processing unit 2034, and the third convolution processing unit 2036 may be summed by the tree adder 2038 and input to the output activation map cache 2040.
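The channel-parallel arrangement of FIG. 20 can be sketched as three 9-element units whose partial results are combined by a final adder stage. The following is a behavioral illustration only; the names `conv_unit` and `parallel_conv` are hypothetical, the flat 27-element layout (channel 1 first, then channel 2, then channel 3) follows the index ranges given in the text, and the software loop stands in for units that would operate concurrently in hardware.

```python
from typing import List, Sequence


def conv_unit(x9: Sequence[int], w9: Sequence[int]) -> int:
    """One convolution processing unit: 9 multiplications followed by their sum."""
    return sum(x * w for x, w in zip(x9, w9))


def parallel_conv(x27: Sequence[int], w27: Sequence[int]) -> int:
    """Combine three per-channel units (2032, 2034, 2036) with a final adder stage."""
    if len(x27) != 27 or len(w27) != 27:
        raise ValueError("expected 27 inputs and 27 weights (3x3x3)")
    # x1..x9 -> unit 1, x10..x18 -> unit 2, x19..x27 -> unit 3.
    partials: List[int] = [
        conv_unit(x27[i:i + 9], w27[i:i + 9]) for i in range(0, 27, 9)
    ]
    # Tree adder 2038: two partial sums are added first, then the third.
    return (partials[0] + partials[1]) + partials[2]


# Illustrative values: inputs 1..27, channel weights +1, 0, -1.
x = list(range(1, 28))
w = [1] * 9 + [0] * 9 + [-1] * 9
out = parallel_conv(x, w)
```

Compared with the single-unit arrangement of FIG. 19, this layout uses three times as many multipliers but needs no accumulator loop over channels, so the saving from narrower weights applies to every replicated unit.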

As described above, when the element values of the weight kernel are stored in the weight kernel cache 2010 arranged outside the processing unit 2030, the number of bits of the weight kernel element values can be reduced by bit quantization according to the present disclosure, which in turn reduces the size of the weight kernel cache 2010 and the size of the multipliers and adders of the processing unit 2030. In addition, as the size of the processing unit 2030 decreases, the power consumption of the processing unit 2030 may also be reduced.

FIG. 21 is a diagram showing a hardware implementation example of an artificial neural network according to yet another embodiment of the present disclosure.

The illustrated example implements, in hardware, the convolution processing device 2200 of a convolutional layer of a convolutional artificial neural network. Here, the convolutional layer executes convolution by applying a weight kernel of size 3x3x3 to a portion of the input activation map (data of size 3x3x3).

As shown, the weight kernel may be implemented in hard-wired form within the processing unit 2220 that executes the convolution of the convolutional layer. In this case, bit quantization is applied to the weight kernel to reduce the number of bits of its element values (w_{1_K}, w_{2_K}, ..., w_{27_K}), and the wires representing each element value can then be implemented within the processing unit 2220 according to the number of bits of the weight kernel. In addition, the bit width of a multiplier or adder arranged in the processing unit 2220, which receives the element values of the weight kernel implemented as wires within the processing unit 2220 and the element values of the input activation map (or feature map) and executes multiplication and/or addition operations, may also be designed to match the number of bits of the weight kernel element values resulting from the bit quantization.

According to one embodiment, the input activation map cache 2210 may receive and store a portion (the portion corresponding to the size of the weight kernel) of input data composed of multiple channels (for example, three RGB channels). The weight kernel traverses the input data, and the input activation map cache 2210 may sequentially receive and store the portion of the input data corresponding to the current position of the weight kernel. The portion of the input data (x₁, x₂, ..., x₂₇) stored in the input activation map cache 2210 and the element values of the weight kernel (w_{1_K}, w_{2_K}, ..., w_{27_K}) implemented as wires within the processing unit 2220 are each input to the corresponding multiplier, and multiple multiplications are executed. Here, the weight kernel element values (w_{1_K}, w_{2_K}, ..., w_{9_K}) implemented as wires within the processing unit 2220 and the first-channel portion (x₁, x₂, ..., x₉) of the input data stored in the input activation map cache 2210 are input to the first convolution processing unit 2222. The weight kernel element values (w_{10_K}, w_{11_K}, ..., w_{18_K}) implemented as wires within the processing unit 2220 and the second-channel portion (x₁₀, x₁₁, ..., x₁₈) of the input data stored in the input activation map cache 2210 are input to the second convolution processing unit 2224. The weight kernel element values (w_{19_K}, w_{20_K}, ..., w_{27_K}) implemented as wires within the processing unit 2220 and the third-channel portion (x₁₉, x₂₀, ..., x₂₇) of the input data stored in the input activation map cache 2210 are input to the third convolution processing unit 2226.

The convolution results calculated by the first convolution processing unit 2222, the second convolution processing unit 2224, and the third convolution processing unit 2226 may be summed by the tree adder 2228 and input to the output activation map cache 2230.

As described above, when the element values of the weight kernel are implemented in hard-wired form within the processing unit 2220, the number of bits of the weight kernel element values can be reduced by bit quantization according to the present disclosure, which in turn reduces the number of wires implemented inside the processing unit 2220 and the size of the multipliers and adders of the processing unit 2220. In addition, as the size of the processing unit 2220 decreases, the power consumption of the processing unit 2220 may also be reduced.

FIG. 22 is a diagram illustrating the configuration of a system that performs bit quantization on an artificial neural network according to an embodiment of the present disclosure.

As shown, the system 2300 may include a parameter selection module 2310, a bit quantization module 2320, and an accuracy determination module 2330. The parameter selection module 2310 may analyze the configuration information of the input artificial neural network. The configuration information may include, but is not limited to, the number of layers included in the artificial neural network, the function and role of each layer, information about the input and output data of each layer, the type and number of multiplications and additions performed by each layer, the type of activation function executed by each layer, the type and configuration of the weight kernel input to each layer, the size and number of weight kernels belonging to each layer, the size of the output feature map, and the initial values of the weight kernel (e.g., weight kernel element values set as real numbers). The configuration information may include information on various components depending on the type of artificial neural network (e.g., convolutional neural network, recurrent neural network, or multilayer perceptron).

The parameter selection module 2310 may refer to the input configuration information and select at least one parameter or parameter group to be quantized in the artificial neural network. How one parameter (or data item) or parameter group is selected may be determined according to the effect that the candidate parameter has on the overall performance or computational amount of the artificial neural network (or on the amount of resources required when implemented in hardware). The selection may be any one of: one weight; one feature map and activation map; one weight kernel; all weights belonging to one layer; or all feature maps or activation maps belonging to one layer.

In one embodiment, in the case of the convolutional neural network (CNN) 400 described above with reference to FIGS. 4 to 10, the convolutional layer 420 and/or the fully connected layer 440 have a large impact on the overall performance or computational amount of the CNN 400. Therefore, the weight kernel or the feature map/activation map of at least one of these layers 420 and 440 may be selected as one parameter to be quantized.

In one embodiment, at least one of the plurality of layers included in the artificial neural network may be selected, and all weight kernels within that layer or all activation map data of that layer may be set as one parameter group. The selection may be determined according to the impact of the selected layer on the overall performance or computational amount of the artificial neural network, but is not limited thereto and may use any one of various methods. For example, the selection of at least one layer among the plurality of layers may be performed by (i) selecting layers sequentially, in their arrangement order, from the first layer that receives the input data toward subsequent layers, (ii) selecting layers sequentially, in their arrangement order, from the last layer that generates the final output data toward preceding layers, (iii) selecting layers starting from the one with the largest computational amount, or (iv) selecting layers starting from the one with the smallest computational amount.
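The four selection orders (i) to (iv) can be illustrated with a short sketch. The `Layer` class, the `selection_order` function, the strategy names, and the cost figures are all hypothetical and serve only to show how the orders differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    cost: int  # hypothetical computational amount of the layer


def selection_order(layers: List[Layer], strategy: str) -> List[str]:
    """Return layer names in the order they would be picked for quantization."""
    if strategy == "forward":        # (i) first layer toward the last
        ordered = list(layers)
    elif strategy == "backward":     # (ii) last layer toward the first
        ordered = list(reversed(layers))
    elif strategy == "high_cost":    # (iii) largest computational amount first
        ordered = sorted(layers, key=lambda layer: layer.cost, reverse=True)
    elif strategy == "low_cost":     # (iv) smallest computational amount first
        ordered = sorted(layers, key=lambda layer: layer.cost)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [layer.name for layer in ordered]


# Hypothetical three-layer network listed in arrangement order.
layers = [Layer("conv1", 120), Layer("conv2", 300), Layer("fc", 80)]
high_first = selection_order(layers, "high_cost")
```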

When the parameter selection module 2310 has finished selecting the data to be quantized in the artificial neural network, information on the selected data is input to the bit quantization module 2320. The bit quantization module 2320 may refer to the information on the selected parameter and reduce the data representation size of that parameter in bit units. Resources required for operations on the selected parameter may include, but are not limited to, a memory for storing the selected parameter or a data path for transmitting the selected parameter.

In one embodiment, when the bit quantization module 2320 reduces the data size of the selected parameter in bit units, the weight kernel quantization and/or activation map quantization described with reference to FIGS. 4 to 13 may be performed.
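As a rough illustration of reducing a data representation to a given number of bits, the sketch below applies uniform symmetric quantization to real-valued weights. This particular scheme is an assumption chosen for illustration; the quantization described with reference to FIGS. 4 to 13 is not reproduced in this passage and may differ. The function name `quantize` is hypothetical.

```python
from typing import List, Tuple


def quantize(values: List[float], n_bits: int) -> Tuple[List[int], float]:
    """Map real values to signed n_bits integer codes plus a shared scale factor."""
    if n_bits < 2:
        raise ValueError("n_bits must be at least 2 for a signed representation")
    q_max = 2 ** (n_bits - 1) - 1                 # e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return codes, scale


# Hypothetical weight values quantized to 4 bits and then to 3 bits.
weights = [0.5, -1.0, 0.25]
codes4, scale4 = quantize(weights, 4)
codes3, scale3 = quantize(weights, 3)
```

Multiplying each code by the returned scale gives the approximate original value; fewer bits means a coarser approximation and a narrower data path.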

When the bit quantization module 2320 completes bit quantization of the selected parameter, it transmits information on the bit-quantized artificial neural network to the accuracy determination module 2330. The accuracy determination module 2330 may reflect this information in the configuration information of the artificial neural network input to the system 2300. Based on the configuration information so updated, the accuracy determination module 2330 may determine whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. For example, if, after the size of the data representation of the selected parameter has been reduced in bit units, the accuracy of the output result of the artificial neural network (e.g., its inference result) is greater than or equal to the predetermined target value, the accuracy determination module 2330 may predict that the overall performance of the artificial neural network can be maintained even if additional bit quantization is performed.

Therefore, when the accuracy determination module 2330 determines that the accuracy of the artificial neural network is greater than or equal to the target value, it transmits a control signal to the parameter selection module 2310 so that the parameter selection module 2310 selects another parameter or parameter group included in the artificial neural network. Here, the parameter may be selected by (i) sequentially selecting the parameter that follows the previously selected parameter in the arrangement order of the parameters or parameter groups constituting the artificial neural network ("forward bit quantization"), (ii) selecting, in the reverse direction, the parameter that precedes the previously selected parameter in the arrangement order of the parameters or parameter groups constituting the artificial neural network ("backward bit quantization"), (iii) selecting, in order of computational amount, the parameter with the next largest computational amount after the previously selected parameter ("high computational cost bit quantization"), or (iv) selecting, in order of computational amount, the parameter with the next smallest computational amount after the previously selected parameter ("low computational cost bit quantization").

On the other hand, if the accuracy determination module 2330 determines that the accuracy of the artificial neural network is below the target value, it may determine that the bit quantization performed on the currently selected parameter has degraded the accuracy of the artificial neural network. In this case, the number of bits determined by the immediately preceding bit quantization may be set as the final number of bits. In one embodiment, the accuracy of an artificial neural network may mean the probability that, after learning how to solve a given problem (e.g., recognizing an object included in an input image), the network presents the correct answer to that problem in the inference stage. The target value used in the bit quantization method described above may represent the minimum accuracy to be maintained after bit quantization of the artificial neural network. For example, assuming that the threshold is 90 percent, if the accuracy of the artificial neural network is still 90 percent or more after bit quantization has reduced, in bit units, the memory size for storing the parameters of the selected layer, additional bit quantization may be performed. For example, if the accuracy of the artificial neural network is measured to be 94 percent after the first bit quantization, additional bit quantization may be performed. If the accuracy is measured to be 88 percent after the second bit quantization, the result of that bit quantization may be ignored, and the number of data representation bits determined by the first bit quantization may be confirmed as the final bit quantization result.
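The stop-and-revert rule in the 90 percent example can be sketched as a small loop for a single parameter group. This is a simplified illustration, not the disclosed system: `search_bit_width`, the `evaluate` callback, the starting width of 8 bits, and the 95 percent starting accuracy are assumptions, while the 94 and 88 percent figures follow the example above.

```python
from typing import Callable, Dict


def search_bit_width(start_bits: int, target: float,
                     evaluate: Callable[[int], float], min_bits: int = 1) -> int:
    """Reduce the bit width one bit at a time while accuracy stays at or above target."""
    bits = start_bits
    while bits > min_bits:
        candidate = bits - 1
        if evaluate(candidate) >= target:
            bits = candidate   # accuracy held: keep this width and try one bit fewer
        else:
            break              # accuracy fell: ignore this result, keep the previous width
    return bits


# Hypothetical accuracy measured at each bit width.
toy_accuracy: Dict[int, float] = {8: 0.95, 7: 0.94, 6: 0.88}
final_bits = search_bit_width(8, 0.90, lambda b: toy_accuracy[b])
```

With these figures, the first reduction (to 7 bits, 94 percent) is accepted and the second (to 6 bits, 88 percent) is discarded, so 7 bits is kept as the final width.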

In one embodiment, when the parameter or parameter group to be bit quantized is selected on the basis of computational amount according to a computational cost bit quantization method, the computational amount of each parameter may be determined as follows. When a specific operation of the artificial neural network adds an n-bit value and an m-bit value, the computational amount of that operation is calculated as (n+m)/2. When a specific operation of the artificial neural network multiplies an n-bit value by an m-bit value, the computational amount of that operation may be calculated as n × m. Accordingly, the computational amount for a specific parameter of the artificial neural network may be the sum of the computational amounts of all additions and multiplications performed on that parameter.
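The cost rule above is simple arithmetic and can be written out directly. In the sketch below, the function names and the example operation list (nine 8-bit by 8-bit multiplications followed by eight 16-bit additions) are hypothetical; only the two formulas, (n+m)/2 and n × m, come from the text.

```python
from typing import Iterable, Tuple


def add_cost(n: int, m: int) -> float:
    """Computational amount of adding an n-bit and an m-bit value."""
    return (n + m) / 2


def mul_cost(n: int, m: int) -> float:
    """Computational amount of multiplying an n-bit by an m-bit value."""
    return float(n * m)


def parameter_cost(ops: Iterable[Tuple[str, int, int]]) -> float:
    """Sum the cost of every addition and multiplication performed on a parameter."""
    total = 0.0
    for kind, n, m in ops:
        if kind == "add":
            total += add_cost(n, m)
        elif kind == "mul":
            total += mul_cost(n, m)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return total


# Hypothetical 3x3 kernel position: 9 multiplications and 8 additions.
ops = [("mul", 8, 8)] * 9 + [("add", 16, 16)] * 8
cost = parameter_cost(ops)
```

Here the multiplications contribute 9 × 64 = 576 and the additions contribute 8 × 16 = 128, for a total of 704.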

In such bit quantization, a specific parameter or parameter group may be selected by treating as an individual parameter group the weight data, or the feature map and activation map data, belonging to each layer; each weight kernel belonging to one layer; or each item of weight data within one weight kernel.

For reference, the components shown in FIG. 22 according to an embodiment of the present disclosure may be implemented as software or as hardware components such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).

However, the term "components" is not limited to software or hardware, and each component may be configured to reside on an addressable storage medium or may be configured to execute on one or more processors.

Thus, as an example, a component may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.

The components and the functionality provided within them may be combined into a smaller number of components or further separated into additional components.

Embodiments of the present disclosure may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media may be any available media that can be accessed by a computer and include volatile and non-volatile media as well as removable and non-removable media. In addition, computer-readable media may include both computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transmission mechanism, and include any information delivery medium.

Although the present disclosure has been described herein in relation to some embodiments, it should be understood that various modifications and changes may be made without departing from the scope of the present disclosure as understood by those skilled in the art to which the present invention pertains. Such modifications and changes should also be considered to fall within the scope of the claims appended hereto.

100: Artificial neural network    110_1 to 110_N: Layers
200: Artificial neural network    210: Input data
220: Input layer    230: Hidden layer
240: Output layer    400: Artificial neural network
410: Input data    420: Convolutional layer (CONV)
430: Subsampling layer (SUBS)    440: Fully connected layer (FC)
450: Output data    510: Input data
520: Weight kernel    530, 260: Row
540, 270: Column    550, 280: Depth (channel)
610: First weight kernel    620: First result value
630: First activation map    710: Second weight kernel
720: Second result value    730: Second activation map
810: Input data    812: Multiple multiplication
814: Weight kernel    816: Sum
818: Output data    820: First traversal section
822: Second traversal section    824: First feature map value
826: Second feature map value    910: Input data
912: Multiple multiplication    914: Weight kernel
916: Sum    918: Output data
1010: Input data    1012: Multiple multiplication
1014: Weight kernel    1016: Quantization
1018: Quantized weight kernel    1020: Sum
1022: Output data    1024: Quantization
1026: Quantized activation map    1028: Weight kernel quantization
1030: Activation map quantization

Claims

Artificial neural network hardware comprising:
a memory configured to perform bit quantization by sequentially selecting one layer from among a plurality of layers constituting an artificial neural network, determine, by comparing the accuracy of the artificial neural network with a target value, whether the bit quantization performed on the selected layer has degraded the accuracy of the artificial neural network, determine a final number of bits according to the determination result, and store the quantized artificial neural network; and
a processing unit comprising a plurality of multipliers or a plurality of adders configured to process the quantized artificial neural network,
wherein the one layer is selected from among the plurality of layers based on one of an arrangement order, an amount of computation, and whether to search for an accuracy change point for each layer,
wherein the memory is configured such that, if the accuracy of the artificial neural network is greater than or equal to the target value, the size of the data representation for storing the parameters of the selected layer is reduced in 1-bit units and additional bit quantization is then performed, and if the accuracy of the artificial neural network is less than the target value, the result of that bit quantization is ignored and the minimum number of bits that satisfied the target value in the previous bit quantization is determined as the final number of bits for the parameters of the selected layer, and
wherein the target value represents the minimum accuracy to be maintained after bit quantization of the artificial neural network.

The artificial neural network hardware of claim 1,
wherein the quantized artificial neural network is one in which at least one of feature map data and activation map data for each of the plurality of layers has been sequentially quantized based on an amount of computation or an amount of memory.

The artificial neural network hardware of claim 1,
wherein the processing unit is configured to process an artificial neural network quantized by at least one of a computational cost bit quantization method, a forward bit quantization method, and a backward bit quantization method.

The artificial neural network hardware of claim 1,
wherein the amount of computation and the amount of memory of the quantized artificial neural network are reduced relative to before quantization, and the number of bits of at least one of the feature map data and activation map data for each of the plurality of layers stored in the memory is reduced.

The artificial neural network hardware of claim 1,
wherein the memory includes at least one of a buffer memory, a register memory, and a cache memory.

The artificial neural network hardware of claim 1,
wherein the size in data bits of a data path through which data of a specific layer among the plurality of layers is transmitted is reduced in bit units.

The artificial neural network hardware of claim 1,
wherein the quantized artificial neural network is bit quantized so as to reduce the storage size of the memory configured to store at least one of feature map data and activation map data for each of the plurality of layers.

The artificial neural network hardware of claim 1,
wherein the memory further includes a feature map cache.

The artificial neural network hardware of claim 1,
wherein the processing unit further includes a tree adder configured to sum the result values of multiple multiplication by the plurality of multipliers.

The artificial neural network hardware of claim 1,
further comprising an adder connected to the processing unit and an accumulator connected to the adder.

The artificial neural network hardware of claim 1,
further comprising an output activation map cache configured to store convolution result values of the processing unit.

The artificial neural network hardware of claim 1,
wherein the processing unit further includes a plurality of convolution processing units.

The artificial neural network hardware of claim 12,
wherein the processing unit further includes a tree adder configured to sum the convolution result values of each of the plurality of convolution processing units.

A convolution processing device comprising:
a memory configured to perform bit quantization by sequentially selecting one layer from among a plurality of layers constituting an artificial neural network, determine, by comparing the accuracy of the artificial neural network with a target value, whether the bit quantization performed on the selected layer has degraded the accuracy of the artificial neural network, determine a final number of bits according to the determination result, store the quantized artificial neural network, and store a quantized feature map; and
a processing unit configured to receive a weight kernel and the quantized feature map and process convolution,
wherein the one layer is selected from among the plurality of layers based on one of an arrangement order, an amount of computation, and whether to search for an accuracy change point for each layer,
wherein the memory is configured such that, if the accuracy of the artificial neural network is greater than or equal to the target value, the size of the data representation for storing the parameters of the selected layer is reduced in 1-bit units and additional bit quantization is then performed, and if the accuracy of the artificial neural network is less than the target value, the result of that bit quantization is ignored and the minimum number of bits that satisfied the target value in the previous bit quantization is determined as the final number of bits for the parameters of the selected layer, and
wherein the target value represents the minimum accuracy to be maintained after bit quantization of the artificial neural network.

The convolution processing device of claim 14,
wherein the processing unit further includes a multiplier and an adder, and
wherein the bit sizes of the multiplier and the adder are designed to match the number of bits of the quantized feature map.

The convolution processing device of claim 14,
wherein the quantized feature map is a feature map that has been sequentially quantized for each of the plurality of layers based on an amount of computation or an amount of memory.

The convolution processing device of claim 14,
wherein, for each of the plurality of layers, the number of bits of the quantized feature map is set to the number of bits immediately before the accuracy change point of that layer.

The convolution processing device of claim 14,
wherein the quantized feature map is bit quantized by searching for an accuracy change point for each of the plurality of layers.

The convolution processing device of claim 14,
wherein bit quantization is performed on the feature map according to the number of bits of the weight kernel.

An apparatus comprising:
a memory configured to perform bit quantization by sequentially selecting one layer from among a plurality of layers constituting an artificial neural network, determine, by comparing the accuracy of the artificial neural network with a target value, whether the bit quantization performed on the selected layer has degraded the accuracy of the artificial neural network, determine a final number of bits according to the determination result, and store the quantized artificial neural network;
a processing unit configured to receive a weight kernel and the quantized feature map and process convolution;
a weight kernel cache configured to store weight kernel data of the artificial neural network;
an input feature map cache configured to store feature map data of the quantized artificial neural network; and
an output activation map cache configured to store output activation map data of the quantized artificial neural network,
wherein the one layer is selected from among the plurality of layers based on one of an arrangement order, an amount of computation, and whether to search for an accuracy change point for each layer,
wherein the memory is configured such that, if the accuracy of the artificial neural network is greater than or equal to the target value, the size of the data representation for storing the parameters of the selected layer is reduced in 1-bit units and additional bit quantization is then performed, and if the accuracy of the artificial neural network is less than the target value, the result of that bit quantization is ignored and the minimum number of bits that satisfied the target value in the previous bit quantization is determined as the final number of bits for the parameters of the selected layer, and
wherein the target value represents the minimum accuracy to be maintained after bit quantization of the artificial neural network.