KR102261715B1

KR102261715B1 - Method and system for bit quantization of artificial neural network

Info

Publication number: KR102261715B1
Application number: KR1020200110330A
Authority: KR
Inventors: 김녹원
Original assignee: 주식회사 딥엑스
Priority date: 2019-02-25
Filing date: 2020-08-31
Publication date: 2021-06-07
Also published as: KR20200104201A; KR20220142986A; KR20200106475A; KR20240093407A; CN113396427A; KR102152374B1; KR20210023912A

Abstract

The present disclosure provides a method for bit quantization of an artificial neural network. The method may include: (a) selecting one parameter or one parameter group to be quantized in the artificial neural network; (b) a bit quantization step of reducing the data representation size of the selected parameter or parameter group in units of bits; (c) determining whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value; and (d) repeating steps (a) to (c) when the accuracy of the artificial neural network is greater than or equal to the target value.

Description

Method and system for bit quantization of artificial neural network

The present disclosure relates to a method and system for bit quantization of an artificial neural network, and more particularly, to a bit quantization method and system capable of improving performance and reducing memory usage while substantially maintaining the accuracy of the artificial neural network.

An artificial neural network is a computing structure modeled on the biological brain. In an artificial neural network, nodes corresponding to the neurons of the brain are interconnected, and the strength of the synaptic connection between neurons is expressed as a weight. Through learning, the artificial neurons (nodes) change the strength of the synaptic connections between nodes, forming a model capable of solving a given problem.

In a narrow sense, an artificial neural network may refer to a multi-layered perceptron, which is a type of feedforward neural network, but it is not limited thereto and may include various types of neural networks such as a radial basis function network, a self-organizing network, and a recurrent neural network.

Recently, multi-layered deep neural networks have been widely used as a technology for image recognition, and a representative example is the convolutional neural network (CNN). In a general multi-layered feedforward neural network, input data is limited to a one-dimensional form. When image data composed of two or three dimensions is flattened into one-dimensional data, spatial information is lost, so it can be difficult to train the neural network while preserving the spatial information of the image. A convolutional neural network, however, can learn visual information while preserving two-dimensional or three-dimensional spatial information.

Specifically, a convolutional neural network effectively recognizes features relative to adjacent images while preserving the spatial information of the image, and includes a max pooling process that collects and reinforces the features of the extracted image, which makes it effective for pattern recognition in visual data. However, a multi-layered deep neural network such as a convolutional neural network uses a deep layer structure to provide high recognition performance, and that structure is very complex and requires a large amount of computation and a large amount of memory. In a multi-layered deep neural network, most internal operations are executed using multiplication and addition (or accumulation). Because the number of connections between nodes in the artificial neural network is large and the number of parameters requiring multiplication (for example, weight data, feature map data, and activation map data) is large, a large amount of computation is required in both the learning process and the recognition process.

As described above, the learning and recognition processes of a multi-layered deep neural network such as a convolutional neural network require a large amount of computation and memory. One way to reduce the computation and memory of a multi-layered deep neural network is bit quantization, which reduces the data representation size of the parameters used in the computation of the artificial neural network in units of bits. Conventional bit quantization uses uniform bit quantization, which quantizes all parameters of the artificial neural network with the same number of bits. However, the conventional uniform bit quantization method cannot accurately reflect the effect that changing the number of bits for each individual parameter has on overall performance.
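By way of illustration only, uniform bit quantization can be sketched as rounding every value onto the same n-bit grid. The symmetric max-abs scaling and the name `quantize_uniform` are assumptions made for this sketch; the disclosure does not prescribe a particular rounding scheme.

```python
def quantize_uniform(values, n_bits):
    """Round each value onto a symmetric signed n-bit grid (illustrative assumption)."""
    if n_bits < 2:
        raise ValueError("a signed grid needs at least 2 bits")
    levels = 2 ** (n_bits - 1) - 1            # largest positive integer code
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / levels                  # real value of one integer step
    return [round(v / scale) * scale for v in values]


weights = [0.82, -0.31, 0.05, -0.77]          # hypothetical weight values
w8 = quantize_uniform(weights, 8)             # every weight uses 8 bits
w3 = quantize_uniform(weights, 3)             # every weight uses 3 bits

err8 = max(abs(a - b) for a, b in zip(weights, w8))
err3 = max(abs(a - b) for a, b in zip(weights, w3))
print(err8, err3)
```

In this sketch the same width is applied to every value, which is exactly the limitation described above: the rounding error grows as the width shrinks, but the scheme has no way to spend more bits on the parameters that matter most.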

The embodiments disclosed herein are intended to provide a method and system for quantizing each parameter data constituting an artificial neural network, or parameter data grouped according to a specific criterion, to a specific number of bits, so that the accuracy of the artificial intelligence can be maintained while the overall performance of the artificial neural network is improved.

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method may include: (a) selecting at least one parameter from among a plurality of parameters used in the artificial neural network; (b) a bit quantization step of reducing, in units of bits, the size of the data required for computation on the selected parameter; (c) determining whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value; and (d) when the accuracy of the artificial neural network is greater than or equal to the target value, repeating steps (b) to (c) for the parameter to further reduce the number of bits in the data representation of the parameter. The method further includes (e) when the accuracy of the artificial neural network is less than the target value, restoring the number of bits of the parameter to the number of bits at which the accuracy of the artificial neural network was greater than or equal to the target value, and then repeating steps (a) to (d).
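Steps (a) to (e) above can be sketched as a greedy search over bit widths. This is a minimal sketch under stated assumptions: the function name `search_bit_widths`, the one-bit step size, and the `toy_accuracy` model are all hypothetical, and a real system would measure accuracy by running the quantized network on evaluation data.

```python
def search_bit_widths(param_names, evaluate, target, start_bits=16, min_bits=1):
    """Greedy per-parameter bit reduction following steps (a) to (e)."""
    bits = {name: start_bits for name in param_names}
    for name in param_names:                  # (a) select one parameter
        while bits[name] > min_bits:
            bits[name] -= 1                   # (b) reduce its width by one bit
            if evaluate(bits) < target:       # (c) compare accuracy with target
                bits[name] += 1               # (e) restore the last passing width
                break
            # (d) target still met: loop again and reduce further
    return bits


# Hypothetical stand-in for measuring accuracy on evaluation data:
# each parameter costs sensitivity * 2**(-bits) of accuracy.
SENSITIVITY = {"conv1": 4.0, "fc": 0.5}


def toy_accuracy(bits):
    return 1.0 - sum(s * 2.0 ** -bits[name] for name, s in SENSITIVITY.items())


result = search_bit_widths(["conv1", "fc"], toy_accuracy, target=0.95, start_bits=8)
print(result)
```

With these toy sensitivities the search settles on 7 bits for `conv1` and 5 bits for `fc`: the more sensitive parameter keeps more bits, which a single uniform width could not express.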

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method includes: (a) selecting, by a parameter selection module, at least one layer from among a plurality of layers; (b) a bit quantization step of reducing, by a bit quantization module, the size of the data representation for the parameters of the selected layer in units of bits; (c) determining, by an accuracy determination module, whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value; and (d) repeating steps (a) to (c) when the accuracy of the artificial neural network is greater than or equal to the target value.

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method includes: (a) selecting, by a parameter selection module, one or more data or one or more groups of data from among the weight, feature map, and activation map data of the artificial neural network; (b) a bit quantization step of reducing, by a bit quantization module, the data representation size of the selected data in units of bits; (c) measuring whether the artificial intelligence accuracy of the artificial neural network is greater than or equal to a target value; and (d) repeating steps (a) to (c) until no data remains to be quantized among the data of the artificial neural network.

According to an embodiment of the present disclosure, a method for bit quantization of an artificial neural network is provided. The method includes: training the artificial neural network according to one or more parameters of the artificial neural network; performing bit quantization on the one or more parameters of the artificial neural network according to the bit quantization method of the above embodiments; and training the artificial neural network according to the one or more parameters on which the bit quantization has been performed.

According to another embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system may include: a parameter selection module that selects at least one parameter in the artificial neural network; a bit quantization module that reduces the size of the data representation of the selected parameter in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module may control the parameter selection module and the bit quantization module so that quantization is performed until each of the plurality of parameters has the minimum number of bits while the accuracy of the artificial neural network is kept at or above the target value.

According to an embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation for the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another one of the plurality of layers. The bit quantization module sets n bits (where n is an integer with n > 0) for all weights of the plurality of layers and sets m bits (where m is an integer with m > 0) for the output data of the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation for the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another one of the plurality of layers. The bit quantization module allocates n bits (where n is an integer with n > 0) to the weights and output data of the plurality of layers, with the number of bits allocated to each of the plurality of layers set differently.

According to an embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation for the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another one of the plurality of layers. The bit quantization module allocates the number of bits for the weights and the number of bits for the output data of the plurality of layers individually and differently.

According to an embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces, in units of bits, the size of the memory for storing the parameters of the selected layer; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another one of the plurality of layers. The bit quantization module allocates a different number of bits to each weight used in the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation for the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another one of the plurality of layers. The bit quantization module individually allocates a different number of bits to each specific unit of the output data output from the plurality of layers.

According to an embodiment of the present disclosure, a bit quantization system of an artificial neural network is provided. The system includes: a parameter selection module that selects at least one layer from among a plurality of layers constituting the artificial neural network; a bit quantization module that reduces the size of the data representation for the parameters of the selected layer in units of bits; and an accuracy determination module that determines whether the accuracy of the artificial neural network is greater than or equal to a predetermined target value. When the accuracy of the artificial neural network is greater than or equal to the target value, the accuracy determination module controls the parameter selection module and the bit quantization module so that bit quantization is performed on another one of the plurality of layers. The bit quantization module allocates a different number of bits to each individual value of the output data output from the plurality of layers.

According to various embodiments of the present disclosure, overall computational performance can be improved by quantizing the number of bits of the data required for computations such as learning or inference in an artificial neural network. In addition, it is possible to implement an artificial neural network without degradation of artificial intelligence accuracy while saving the hardware resources required to implement the artificial neural network and reducing power consumption and memory usage.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description of the claims.

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but are not limited thereto.
FIG. 1 is a diagram illustrating an example of an artificial neural network that obtains output data for input data using a plurality of layers and a plurality of layer weights according to an embodiment of the present disclosure.
FIGS. 2 and 3 are diagrams for explaining specific implementations of the artificial neural network shown in FIG. 1 according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating another example of an artificial neural network including a plurality of layers according to an embodiment of the present disclosure.
FIG. 5 is a diagram illustrating input data and a weight kernel used for a convolution operation in a convolution layer according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating a procedure for generating a first activation map by performing convolution on input data using a first weight kernel according to an embodiment of the present disclosure.
FIG. 7 is a diagram illustrating a procedure for generating a second activation map by performing convolution on input data using a second weight kernel according to an embodiment of the present disclosure.
FIG. 8 is a diagram expressing the computation process of a convolution layer as a matrix according to an embodiment of the present disclosure.
FIG. 9 is a diagram expressing the computation process of a fully connected layer as a matrix according to an embodiment of the present disclosure.
FIG. 10 is a diagram expressing the bit quantization process of a convolution layer as a matrix according to an embodiment of the present disclosure.
FIG. 11 is a flowchart illustrating a bit quantization method of an artificial neural network according to an embodiment of the present disclosure.
FIG. 12 is a flowchart illustrating a bit quantization method of an artificial neural network according to another embodiment of the present disclosure.
FIG. 13 is a flowchart illustrating a bit quantization method of an artificial neural network according to yet another embodiment of the present disclosure.
FIG. 14 is a graph illustrating an example of the amount of computation for each layer of an artificial neural network according to an embodiment of the present disclosure.
FIG. 15 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been performed by a forward bit quantization method according to an embodiment of the present disclosure.
FIG. 16 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been performed by a backward bit quantization method according to an embodiment of the present disclosure.
FIG. 17 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been performed by a high computational cost layer first bit quantization method according to an embodiment of the present disclosure.
FIG. 18 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been performed by a low computational cost layer first bit quantization method according to an embodiment of the present disclosure.
FIG. 19 is a diagram illustrating a hardware implementation example of an artificial neural network according to an embodiment of the present disclosure.
FIG. 20 is a diagram illustrating a hardware implementation example of an artificial neural network according to another embodiment of the present disclosure.
FIG. 21 is a diagram illustrating a hardware implementation example of an artificial neural network according to yet another embodiment of the present disclosure.
FIG. 22 is a diagram illustrating the configuration of a system that performs bit quantization on an artificial neural network according to an embodiment of the present disclosure.

Hereinafter, specific details for carrying out the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations will be omitted where they might unnecessarily obscure the subject matter of the present disclosure.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the description of the embodiments below, redundant description of identical or corresponding components may be omitted. However, omitting the description of a component is not intended to mean that the component is excluded from any embodiment.

In the present disclosure, a "parameter" may mean any one or more of the weight data, feature map data, and activation map data of an artificial neural network or of each layer constituting the artificial neural network. A "parameter" may also mean the artificial neural network represented by such data, or each layer constituting the artificial neural network. In addition, in the present disclosure, "bit quantization" may mean a computation or operation that reduces the number of bits in the data representation of a parameter or a group of parameters.

The present disclosure provides various embodiments of a quantization method and system that reduce, in units of bits, the data representation size of the parameters used in the relevant computations in order to reduce the computation, memory usage, and power consumption of a digital hardware system. In some embodiments, the bit quantization method and system of the present disclosure can reduce the size of the parameters used in the computation of an artificial neural network in units of bits. In general, the computation of an artificial neural network uses data structures in units of 32, 16, or 8 bits (for example, in the CPU, GPU, memory, cache, and buffers). The quantization method and system of the present disclosure can therefore reduce the size of the parameters used in the computation of an artificial neural network to bit widths other than 32, 16, and 8 bits. Moreover, a specific number of bits can be assigned individually and differently to each parameter or each group of parameters of the artificial neural network.
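As a worked illustration of why widths other than 32, 16, or 8 bits can matter, the following sketch computes the storage needed for a block of parameters at several widths. The parameter count and the 5-bit width are hypothetical values chosen for the example, and the sketch assumes values are packed back to back with no padding.

```python
def storage_bytes(num_values, bits_per_value):
    """Bytes needed when values are packed back to back with no padding."""
    total_bits = num_values * bits_per_value
    return (total_bits + 7) // 8              # round up to whole bytes


num_weights = 1_000_000                       # hypothetical count, not from the disclosure
for bits in (32, 16, 8, 5):
    print(bits, storage_bytes(num_weights, bits))
```

Under these assumptions, one million 5-bit values occupy 625,000 bytes, compared with 1,000,000 bytes at 8 bits and 4,000,000 bytes at 32 bits.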

In some embodiments, the bit quantization method and system of the present disclosure may, for an artificial neural network model, set n bits (n is an integer with n > 0) for all weights and set m bits (m is an integer with m > 0) for the output data of each layer.

In another embodiment, the bit quantization method and system of the present disclosure may allocate n bits to the weights and output data of each layer of the artificial neural network model, where n may be set to a different number for each layer.

In yet another embodiment, the bit quantization method and system of the present disclosure may allocate different numbers of bits to the weights and to the output data of each layer of the artificial neural network model, and may further allocate, for each layer, different numbers of bits to the weights and to the output feature map parameters of that layer.

The bit quantization method and system of the present disclosure can be applied to various types of artificial neural networks. For example, when the bit quantization method and system of the present disclosure are applied to a convolutional neural network (CNN), different numbers of bits may be allocated individually to the weight kernels used within each layer of the network.

In yet another embodiment, the bit quantization method and system of the present disclosure may allocate a different number of bits to each weight used within each layer of a multi-layered artificial neural network model, allocate separate numbers of bits to specific units of the output data of each layer, or allocate different numbers of bits to individual values of the output data of each layer.

The bit quantization method and system according to the various embodiments of the present disclosure described above may apply any one of these embodiments to an artificial neural network model, but are not limited thereto, and may also apply a combination of one or more of these embodiments to an artificial neural network model.
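The allocation schemes above can be compared with a short sketch. The layer names, element counts, and bit widths below are hypothetical values chosen for illustration only; the disclosure does not specify them. The sketch simply totals the bits needed to store weights and layer outputs under a uniform allocation and under a per-layer allocation.

```python
# Hypothetical layer sizes (not from the disclosure): number of weight
# elements and number of output elements per layer.
layers = {
    "conv1": {"num_weights": 100, "num_outputs": 1000},
    "conv2": {"num_weights": 200, "num_outputs": 500},
    "fc": {"num_weights": 400, "num_outputs": 10},
}


def memory_bits(layers, allocation):
    """Total storage in bits, given (weight_bits, output_bits) per layer."""
    total = 0
    for name, info in layers.items():
        weight_bits, output_bits = allocation[name]
        total += info["num_weights"] * weight_bits
        total += info["num_outputs"] * output_bits
    return total


# Same n for all weights and same m for all layer outputs (here n = m = 8).
uniform = {name: (8, 8) for name in layers}

# A different pair of bit widths per layer (illustrative values only).
per_layer = {"conv1": (6, 7), "conv2": (5, 6), "fc": (3, 5)}

uniform_total = memory_bits(layers, uniform)
per_layer_total = memory_bits(layers, per_layer)
```

Keeping the allocation as a plain mapping from layer name to bit widths makes it easy to extend the same idea to finer units, such as individual weight kernels.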

FIG. 1 is a diagram illustrating an example of an artificial neural network 100 that obtains output data for input data using a plurality of layers and a plurality of layer weights according to an embodiment of the present disclosure.

In general, a multi-layered artificial neural network such as the artificial neural network 100 includes, in machine learning technology and cognitive science, a statistical learning algorithm implemented on the basis of the structure of a biological neural network, or a structure that executes such an algorithm. That is, in the artificial neural network 100, nodes, which are artificial neurons that form a network through synaptic connections as in a biological neural network, repeatedly adjust the synaptic weights and learn so that the error between the correct output for a given input and the inferred output is reduced, thereby producing a machine learning model with problem-solving ability.

In one example, the artificial neural network 100 may be implemented as a multilayer perceptron (MLP) composed of layers containing one or more nodes and the connections between them. However, the artificial neural network 100 according to the present embodiment is not limited to the MLP structure and may be implemented using any of various artificial neural network structures having multiple layers.

As shown in FIG. 1, when input data is supplied from outside, the artificial neural network 100 is configured to output the output data corresponding to that input data after passing it through a plurality of layers 110_1, 110_2, ..., 110_N, each composed of one or more nodes.

In general, the learning methods for the artificial neural network 100 include supervised learning, which learns to be optimized for solving a problem from the input of a teacher signal (the correct answer); unsupervised learning, which does not require a teacher signal; and semi-supervised learning, which uses supervised and unsupervised learning together. The artificial neural network 100 shown in FIG. 1, which generates output data, may be trained using at least one of the supervised, unsupervised, and semi-supervised learning methods, according to the user's selection.

FIGS. 2 and 3 are diagrams for explaining specific implementations of the artificial neural network 100 shown in FIG. 1 according to an embodiment of the present disclosure.

Referring to FIG. 2, the artificial neural network 200 may include input nodes to which input data 210 is input, output nodes that output the output data corresponding to the input data 210, hidden nodes located between the input nodes and the output nodes, and a number of parameters. The input nodes constitute the input layer 220 and receive the input data 210 (for example, an image) from outside, and the output nodes constitute the output layer 240 and may output the output data to the outside. The hidden nodes, located between the input nodes and the output nodes, constitute the hidden layer 230 and may connect the output data of the input nodes to the input data of the output nodes. As shown in FIG. 2, each node of the input layer 220 may be fully connected or partially connected to each output node of the output layer 240. The input nodes may also serve to receive input data from outside and pass it to the hidden nodes. The hidden nodes and output nodes may then perform calculations on the data by multiplying the received input data by parameters (or weights). When the calculation at each node is complete, the results are summed, and the output data may be output using a preset activation function.
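The per-node computation described above (multiply each input by its weight, sum the products, then apply the activation function) can be sketched as follows. The function names and the example values are illustrative assumptions, and the sketch omits anything the text does not mention, such as a bias term.

```python
def node_output(inputs, weights, activation):
    """Weighted sum of the inputs followed by the node's activation function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


def layer_output(inputs, weight_rows, activation):
    """Apply node_output once per node; each row holds one node's weights."""
    return [node_output(inputs, row, activation) for row in weight_rows]


def relu(x):
    return max(0.0, x)


# Two inputs feeding a layer of two nodes (illustrative numbers).
hidden = layer_output([1.0, 2.0], [[1.0, 1.0], [0.5, -1.0]], relu)
```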

The hidden nodes and the output nodes have activation functions. The activation function may be any one of a step function, a sign function, a linear function, a logistic sigmoid function, a hyperbolic tangent function, a ReLU function, and a softmax function. A person skilled in the art can select the activation function appropriately according to the learning method of the artificial neural network.
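For reference, the listed activation functions can be written with the Python standard library as below. This is a minimal sketch: the behavior of the step and sign functions at exactly zero is a convention chosen here, not something the disclosure specifies, and softmax is shown operating on a list because it normalizes across a whole layer's outputs.

```python
import math


def step(x):
    # Convention assumed here: the step is taken at x >= 0.
    return 1.0 if x >= 0 else 0.0


def sign(x):
    # Convention assumed here: sign(0) is 0.
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)


def linear(x):
    return x


def logistic_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```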

The artificial neural network 200 performs machine learning by repeatedly updating (or correcting) its weight values to appropriate values. Supervised learning and unsupervised learning are representative methods by which the artificial neural network 200 performs machine learning.

Supervised learning is a learning method in which the target output data that a neural network is expected to compute for given input data is clearly defined, and the weight values are updated so that the output data obtained by feeding the input data into the neural network becomes similar to the target data. The multi-layered artificial neural network 200 of FIG. 2 may be generated on the basis of supervised learning.

Referring to FIG. 3, another example of a multi-layered artificial neural network is the convolutional neural network (CNN) 300, which is a type of deep neural network (DNN). A convolutional neural network (CNN) is a neural network composed of one or more convolutional layers, pooling layers, and fully connected layers. A CNN has a structure suited to learning two-dimensional data and can be trained through the backpropagation algorithm. It is one of the representative DNN models, widely used in applications such as object classification and object detection in images.

It should be noted that the multi-layered artificial neural network of the present invention is not limited to the artificial neural networks shown in FIGS. 2 and 3, and a trained model may also be obtained by machine learning other kinds of data on various other artificial neural networks.

FIG. 4 is a diagram illustrating another example of an artificial neural network including a plurality of layers according to an embodiment of the present disclosure. The artificial neural network 400 shown in FIG. 4 is a convolutional neural network (CNN) that includes the plurality of convolution layers (CONV) 420, the plurality of subsampling layers (SUBS) 430, and the plurality of fully-connected layers (FC) 440 disclosed in FIG. 3.

The CONV 420 of the CNN 400 generates a feature map by applying a convolution weight kernel to the input data 410. Here, the CONV 420 may serve as a kind of template that extracts features from high-dimensional input data (for example, an image or video). Specifically, one convolution may be applied repeatedly to portions of the input data 410 while changing position, so that features are extracted from the entire input data 410. The SUBS 430 serves to reduce the spatial resolution of the feature map generated by the CONV 420. Subsampling reduces the dimension of its input (for example, a feature map), which can reduce the complexity of the problem of analyzing the input data 410. The SUBS 430 may use a max pooling operator, which takes the maximum of the values in a portion of the feature map, or an average pooling operator, which takes their average. Through the pooling operation, the SUBS 430 not only reduces the dimension of the feature map but also makes the feature map robust to shift and distortion. Finally, the FC 440 may classify the input data on the basis of the feature map.
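The two pooling operators mentioned above can be sketched in a few lines. The sketch assumes non-overlapping square windows (stride equal to the window size), which the text does not specify, and the 4 x 4 feature map is made-up data.

```python
def pool2d(feature_map, size, reduce_fn):
    """Reduce each non-overlapping size x size window to a single value."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def average(values):
    return sum(values) / len(values)


# Illustrative 4 x 4 feature map, pooled down to 2 x 2.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
max_pooled = pool2d(feature_map, 2, max)
avg_pooled = pool2d(feature_map, 2, average)
```

Both variants shrink the map by the same factor; they differ only in the reduction applied inside each window.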

The CNN 400 may take on various configurations and functions depending on the number of CONV 420, SUBS 430, and FC 440 layers or the types of operators. For example, the CNN 400 may include any one of various CNN configurations such as AlexNet, VGGNet, LeNet, and ResNet, but is not limited thereto.

When image data is supplied as the input data 410, the CONV 420 of the CNN 400 configured as described above may generate a feature map through a convolution operation that applies weights to the input data 410. The group of weights used here may be referred to as a weight kernel. The weight kernel is a three-dimensional n x m x d matrix (where, as with the input image data, n denotes rows of a specific size, m denotes columns of a specific size, and d denotes the channels of the input image data, each of these dimensions being an integer of 1 or more), and it may generate the feature map through the convolution operation while traversing the input data 410 at a specified interval. If the input data 410 is a color image with a plurality of channels (for example, the three RGB channels), the weight kernel may traverse each channel of the input data 410, compute the convolution, and then generate a feature map for each channel.

FIG. 5 is a diagram illustrating the input data of a convolution layer and the weight kernel used in the convolution operation according to an embodiment of the present disclosure.

As illustrated, the input data 510 may be an image or video represented as a two-dimensional matrix composed of rows 530 of a specific size and columns 540 of a specific size. As described above, the input data 510 may have a plurality of channels 550, where the channels 550 may represent the number of color components of the input image. The weight kernel 520 may be a weight kernel used in the convolution that extracts the features of a given portion of the input data 510 while scanning it. Like the input image, the weight kernel 520 may be configured to have rows 560 of a specific size, columns 570 of a specific size, and a specific number of channels 580. In general, the rows 560 and columns 570 of the weight kernel 520 are set to the same size, and the number of channels 580 may equal the number of channels 550 of the input image.

FIG. 6 is a diagram describing a procedure of generating a first activation map by performing convolution on input data using a first weight kernel according to an embodiment of the present disclosure.

The first weight kernel 610 may be a weight kernel representing the first channel of the weight kernel 520 of FIG. 5. The first weight kernel 610 may ultimately generate the first activation map 630 by traversing the input data at a specified interval and performing convolution. When the first weight kernel 610 is applied to a portion of the input data 510, the convolution is performed by multiplying each input data value at a given position in that portion by the weight kernel value at the corresponding position and then adding up all the resulting products. This convolution process generates a first result value 620, and each time the first weight kernel 610 moves across the input data 510, further convolution result values are generated, which together make up the feature map. Each element value of the feature map is converted into the first activation map 630 through the activation function of the convolution layer.

FIG. 7 is a diagram describing a procedure of generating a second activation map by performing convolution on input data using a second weight kernel according to an embodiment of the present disclosure.

After the first activation map 630 is generated by performing convolution on the input data 510 using the first weight kernel 610 as shown in FIG. 6, the second activation map 730 may be generated by performing convolution on the input data 510 using the second weight kernel 710 as shown in FIG. 7.

The second weight kernel 710 may be a weight kernel representing the second channel of the weight kernel 520 of FIG. 5. The second weight kernel 710 may ultimately generate the second activation map 730 by traversing the input data at a specified interval and performing convolution. As in FIG. 6, when the second weight kernel 710 is applied to a portion of the input data 510, the convolution is performed by multiplying each input data value at a given position in that portion by the weight kernel value at the corresponding position and then adding up all the resulting products. This convolution process generates a second result value 720, and each time the second weight kernel 710 moves across the input data 510, further convolution result values are generated, which together make up the feature map. Each element value of the feature map is converted into the second activation map 730 through the activation function of the convolution layer.

FIG. 8 is a diagram expressing, in matrix form, the operation of a convolution layer when the input feature map has a single channel according to an embodiment of the present disclosure.

The convolution layer 420 illustrated in FIG. 8 may correspond to the CONV 420 illustrated in FIG. 4. In FIG. 8, the input data 810 supplied to the convolution layer 420 is represented as a two-dimensional matrix of size 6 x 6, and the weight kernel 814 is represented as a two-dimensional matrix of size 3 x 3. However, the sizes of the input data 810 and the weight kernel 814 of the convolution layer 420 are not limited to these and may be varied according to the performance and requirements of the artificial neural network that includes the convolution layer 420.

As illustrated, when the input data 810 is supplied to the convolution layer 420, the weight kernel 814 traverses the input data 810 at a predetermined interval (for example, 1) and may perform elementwise multiplication 812, multiplying the values of the input data 810 and the weight kernel 814 at the same positions. The weight kernel 814 traverses the input data 810 at this regular interval, and the values obtained through the elementwise multiplication are summed (summation 816).

Specifically, the value computed by the weight kernel 814 at a specific position 820 of the input data 810 (for example, "3") is assigned to the corresponding element 824 of the feature map 818. Next, the value computed by the weight kernel 814 at the next position 822 of the input data 810 (for example, "1") is assigned to the corresponding element 826 of the feature map 818. Once all the values computed as the weight kernel 814 traverses the input data 810 have been assigned in this way, the 4 x 4 feature map 818 is complete. If the input data 810 consists of, for example, three channels (an R channel, a G channel, and a B channel), feature maps for each channel may be generated through convolution in which the same weight kernel, or a different kernel for each channel, traverses the data of each channel of the input data 810 and performs the elementwise multiplication 812 and summation 816.
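The single-channel operation described above (a 3 x 3 kernel sliding over 6 x 6 input with an interval of 1, giving a 4 x 4 feature map) can be sketched as follows. The input and kernel values are made up for illustration and are not the values shown in FIG. 8; the function name `conv2d` is likewise an assumption.

```python
def conv2d(data, kernel, stride=1):
    """Slide the kernel over the data; at each position multiply elementwise and sum."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(data) - k_rows) // stride + 1
    out_cols = (len(data[0]) - k_cols) // stride + 1
    feature_map = []
    for r in range(out_rows):
        out_row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += data[r * stride + i][c * stride + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Illustrative 6 x 6 checkerboard input and 3 x 3 kernel.
data = [[(r + c) % 2 for c in range(6)] for r in range(6)]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = conv2d(data, kernel)  # 4 x 4 result for stride 1
```

For multi-channel input, the same function could be called once per channel with either a shared kernel or a per-channel kernel, matching the two options the text mentions.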

Referring again to FIG. 4, the CONV 420 may generate an activation map, which is the final output of the convolution layer, by applying an activation function to the feature map generated according to the method described with reference to FIGS. 5 to 8. Here, the activation function may be any one of various activation functions such as a sigmoid function, a radial basis function (RBF), or a rectified linear unit (ReLU), a variant of one of these, or some other function.

The SUBS 430 receives the activation map, which is the output data of the CONV 420, as its input data. The SUBS 430 reduces the size of the activation map or emphasizes specific data. When the SUBS 430 uses max pooling, it selects and outputs the maximum of the values within a specific region of the activation map. Through this pooling process, the SUBS 430 can remove noise from its input data and reduce the size of that data.

The FC 440 may receive the output data of the SUBS 430 and generate the final output data 450. The activation map extracted by the SUBS 430 is flattened into one dimension so that it can be input to the fully connected layer 440.

FIG. 9 is a diagram expressing the operation of a fully connected layer in matrix form according to an embodiment of the present disclosure.

The fully connected layer 440 illustrated in FIG. 9 may correspond to the FC 440 of FIG. 4. As described above, the activation map extracted by the max pooling layer 430 may be flattened into one dimension so that it can be input to the fully connected layer 440. The flattened one-dimensional activation map may be received by the fully connected layer 440 as input data 910. The fully connected layer 440 may perform elementwise multiplication 912 of the input data 910 and a one-dimensional weight kernel 914. The results of this elementwise multiplication may be summed (916) and output as output data 918. The output data 918 may represent the inference value for the input data 410 supplied to the CNN 400.
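The flattening and fully connected steps can be sketched as below. The sketch assumes one one-dimensional weight kernel per output value, and the activation map and weights are illustrative numbers rather than values from FIG. 9.

```python
def flatten(activation_map):
    """Flatten a 2-D activation map into a 1-D list, row by row."""
    return [value for row in activation_map for value in row]


def fully_connected(inputs, weight_kernels):
    """For each 1-D weight kernel, multiply elementwise with the inputs and sum."""
    outputs = []
    for kernel in weight_kernels:
        if len(kernel) != len(inputs):
            raise ValueError("kernel length must match input length")
        outputs.append(sum(x * w for x, w in zip(inputs, kernel)))
    return outputs


# Illustrative 2 x 2 pooled activation map and two weight kernels.
flat = flatten([[1, 2], [3, 4]])
scores = fully_connected(flat, [[1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]])
```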

In the CNN 400 configured as described above, two-dimensional or one-dimensional matrix input data is supplied to each of the plurality of layers, and the learning and inference processes are carried out through complex operations on that input data, such as elementwise multiplication with a weight kernel followed by summation. Consequently, the resources required for learning and inference (for example, the number of operators or the amount of memory) can increase considerably with the number of layers in the CNN 400 and the complexity of the operations. Bit quantization may therefore be applied to the input and output data used by each layer in order to reduce the amount of computation and memory of an artificial neural network with a plurality of layers, such as the CNN 400. In one embodiment, bit quantization of the multi-layer CNN 400 may be performed on the CONV 420 and the FC 440, which require large amounts of computation and memory.

FIG. 10 is a diagram expressing the bit quantization process of a convolution layer in matrix form according to an embodiment of the present disclosure.

The bit quantization performed in the convolution layer may include weight or weight kernel quantization 1028, which reduces the number of bits of each element value of the weight kernel used in the convolution operation, and/or feature map or activation map quantization 1030, which reduces the number of bits of each element value of the feature map or activation map.

The bit quantization process of the convolution layer according to an embodiment may be performed as follows. Before convolution is performed by applying the weight kernel 1014 to the input data 1010 of the convolution layer, a quantization 1016 process is performed on the weight kernel 1014 to generate a quantized weight kernel 1018. The quantized weight kernel 1018 is then applied to the input data 1010, and elementwise multiplication 1012 and summation 1020 are performed to output the convolution values that form the feature map, after which the activation map 1022 may be generated through the activation function. Next, the final quantized activation map 1026 may be generated by applying quantization 1024 to the activation map.

In the bit quantization process of the convolution layer described above, the weight kernel quantization 1028 may be performed using the following equation.

여기서,

는 양자화될 가중치 값(예를 들어, 실수의 가중치 및 가중치 커널 내의 각 가중치)을 나타내고,

는 양자화할 비트 수를 나타내고,

는

가k비트 만큼 양자화된 결과를 나타낸다. 즉, 위 수식에 따르면, 먼저

에 대해, 사전 결정된 이진수

를 곱하여 ,

가 k 비트만큼 자리수가 증가된다(이하 "제1 값"이라고 함). 다음으로, 제1 값에 대해 라운딩(rounding) 또는 트렁케이션(truncation) 연산을 실행함으로써,

의 소수점 이하 숫자가 제거된다(이하 "제2 값"이라고 함). 제2 값은 이진수

으로 나누어, k 비트만큼 자리수가 다시 감소됨으로써, 최종 양자화된 가중치 커널의 요소 값이 계산될 수 있다. 이와 같은 가중치 또는 가중치 커널 양자화(1028)는 가중치 또는 가중치 커널(1014)의 모든 요소 값에 대해 반복 실행되어, 양자화 된 가중치 값들(1018)이 생성된다.here,

denotes the weight values to be quantized (eg, real weights and each weight in the weight kernel),

represents the number of bits to be quantized,

is

It represents the result quantized by k bits. That is, according to the above formula, first

For , a predetermined binary number

Multiply by ,

is incremented by k bits (hereinafter referred to as "first value"). Next, by executing a rounding or truncation operation on the first value,

The number after the decimal point of ' is removed (hereinafter referred to as "second value"). the second value is binary

By dividing by , the number of digits is again reduced by k bits, so that the element value of the final quantized weight kernel can be calculated. This weight or weight kernel quantization 1028 is iteratively executed for all element values of the weight or weight kernel 1014 , thereby generating quantized weight values 1018 .
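The multiply, round, and divide steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `quantize_weight` and `quantize_kernel` and the `truncate` flag are assumptions introduced here and do not appear in the disclosure.

```python
import math


def quantize_weight(w: float, k: int, truncate: bool = False) -> float:
    """Quantize one real-valued weight to k fractional bits."""
    scale = 1 << k                 # the predetermined binary number 2^k
    first = w * scale              # "first value": digits increased by k bits
    # "second value": fractional part removed by rounding or truncation.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    second = math.trunc(first) if truncate else round(first)
    return second / scale          # digits reduced by k bits again


def quantize_kernel(kernel, k, truncate=False):
    """Apply the same quantization to every element of a 2-D weight kernel."""
    return [[quantize_weight(w, k, truncate) for w in row] for row in kernel]
```

For instance, with `k = 3` the weight 0.7 becomes 5.6 after scaling, 6 after rounding (5 after truncation), and 0.75 (or 0.625) after dividing by 8.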

Meanwhile, the feature map or activation map quantization 1030 may be performed by the following equation:

$$a_q = \frac{1}{2^k}\,\mathrm{round}\left(2^k \cdot \mathrm{clip}(a, 0, 1)\right)$$

The feature map or activation map quantization 1030 may use the same formula as the weight or weight kernel quantization 1028. However, before quantization is applied to each element value $a$ (for example, a real-valued coefficient) of the feature map or activation map 1022, a clipping step may be added that normalizes each element value to a value between 0 and 1.

Next, the normalized $a$ is multiplied by the predetermined binary number $2^k$, which increases the number of digits of $a$ by $k$ bits ("first value"). A rounding or truncation operation is then performed on the first value to remove its fractional part ("second value"). The second value is divided by the binary number $2^k$, which reduces the number of digits by $k$ bits again and yields the element value of the final quantized feature map or activation map 1026. This feature map or activation map quantization 1030 is repeated for every element value of the feature map or activation map 1022 to generate the quantized feature map or activation map 1026.
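The activation-side variant differs from the weight case only in the clipping step that precedes the scaling. A minimal sketch follows, assuming rounding (not truncation) is used; the name `quantize_activation` is hypothetical.

```python
def quantize_activation(a: float, k: int) -> float:
    """Clip an activation to [0, 1], then quantize it to k fractional bits."""
    clipped = min(max(a, 0.0), 1.0)   # clipping: normalize into [0, 1]
    scale = 1 << k                    # the predetermined binary number 2^k
    second = round(clipped * scale)   # first value, then fractional part removed
    return second / scale             # digits reduced by k bits again
```

Values outside the interval saturate: an input of 1.7 maps to 1.0 and an input of -0.2 maps to 0.0 regardless of `k`.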

Through the weight or weight kernel quantization 1028 and the feature map or activation map quantization 1030 described above, the memory size and the amount of computation required for operations such as the convolution of the convolutional layer 420 of the convolutional neural network can be reduced in units of bits.
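The overall flow of FIG. 10 (quantize the kernel, convolve, activate, quantize the activation map) might look as follows for a single-channel input. This sketch makes several assumptions that the disclosure does not state: the activation function is taken to be ReLU, the convolution has stride 1 and no padding, and the names `quantize` and `conv_layer_quantized` are invented for illustration.

```python
def quantize(x: float, k: int) -> float:
    """round(2^k * x) / 2^k, as in the quantization equation."""
    scale = 1 << k
    return round(x * scale) / scale


def conv_layer_quantized(inp, kernel, k_w: int, k_a: int):
    """Single-channel convolution with quantized kernel and quantized output."""
    # Weight kernel quantization: kernel 1014 -> quantized kernel 1018.
    qk = [[quantize(w, k_w) for w in row] for row in kernel]
    kh, kw = len(qk), len(qk[0])
    out = []
    for i in range(len(inp) - kh + 1):
        row = []
        for j in range(len(inp[0]) - kw + 1):
            # Multiplications and summation produce one feature map element.
            s = sum(inp[i + di][j + dj] * qk[di][dj]
                    for di in range(kh) for dj in range(kw))
            act = max(s, 0.0)                 # activation (ReLU assumed)
            clipped = min(act, 1.0)           # clip into [0, 1]
            row.append(quantize(clipped, k_a))  # activation map quantization
        out.append(row)
    return out
```

The two bit widths `k_w` and `k_a` are kept separate so that weights and activations can be quantized to different precisions.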

FIG. 11 is a flowchart illustrating a bit quantization method of an artificial neural network according to an embodiment of the present disclosure. In this embodiment, the unit of a data group that can be quantized in the artificial neural network is assumed to be all parameters belonging to each layer of the artificial neural network.

As shown, the bit quantization method 1100 of the artificial neural network may begin with a step S1110 of selecting at least one layer from among the plurality of layers included in the artificial neural network. Which layer to select may be determined according to the effect that the layer has on the overall performance or the amount of computation (or amount of memory) of the artificial neural network. In an embodiment, in the multi-layer artificial neural network described above with reference to FIGS. 1 to 3, a layer that has a large effect on the overall performance or amount of computation of the artificial neural network may be arbitrarily selected. In the case of the convolutional neural network (CNN) 400 described with reference to FIGS. 4 to 10, the convolutional layer 420 and/or the fully connected layer 440 have a large effect on the overall performance or amount of computation of the CNN 400, so at least one of these layers 420 and 440 may be selected.

The method of selecting at least one of the plurality of layers included in the artificial neural network may be determined according to the effect of the selected layer on the overall performance or amount of computation of the artificial neural network, but is not limited thereto and may be any one of various methods. For example, the selection may be performed by (i) a method of sequentially selecting layers, in the order in which the plurality of layers are arranged, from the first layer, which receives the input data, toward subsequent layers; (ii) a method of sequentially selecting layers, in the arrangement order, from the last layer, which generates the final output data, toward preceding layers; (iii) a method of selecting layers starting from the layer with the largest amount of computation; or (iv) a method of selecting layers starting from the layer with the smallest amount of computation.
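The four selection orders (i) to (iv) can be expressed as orderings over layer indices. The sketch below assumes each layer's amount of computation is already known as a number; the function name `layer_order` and the strategy labels are hypothetical.

```python
def layer_order(costs, strategy: str):
    """Return layer indices in the order in which they would be selected.

    costs[i] is the (assumed known) amount of computation of layer i.
    """
    indices = list(range(len(costs)))
    if strategy == "forward":      # (i) first layer toward the last
        return indices
    if strategy == "backward":     # (ii) last layer toward the first
        return indices[::-1]
    if strategy == "high_cost":    # (iii) largest amount of computation first
        return sorted(indices, key=lambda i: -costs[i])
    if strategy == "low_cost":     # (iv) smallest amount of computation first
        return sorted(indices, key=lambda i: costs[i])
    raise ValueError(f"unknown strategy: {strategy}")
```

Because Python's `sorted` is stable, layers with equal cost keep their arrangement order, which is one reasonable tie-breaking choice among several.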

When the layer selection is completed in step S1110, the method may proceed to a step S1120 of reducing, in units of bits, the data representation size of a parameter (for example, a weight) of the selected layer.

In an embodiment, when the size of the weights or output data among the parameters of the selected layer is reduced in units of bits, the weight kernel quantization 1028 and the activation map quantization 1024 described with reference to FIGS. 4 to 10 may be performed. For example, the weight kernel quantization 1028 may be calculated by the following equation:

$$w_q = \frac{1}{2^k}\,\mathrm{round}\left(2^k \cdot w\right)$$

Here, $w$ denotes an element value of the weight kernel to be quantized (for example, a real-valued weight kernel coefficient), $k$ denotes the number of bits to quantize to, and $w_q$ denotes the result of quantizing $w$ to $k$ bits. $w$ is multiplied by the predetermined binary number $2^k$ ("first value"), and the fractional part of the product is removed by a rounding or truncation operation ("second value"). The second value is divided by the binary number $2^k$, which reduces the number of digits by $k$ bits again and yields the element value of the final quantized weight kernel. This weight kernel quantization 1028 is repeated for every element value of the weight kernel 1014 to generate the quantized weight kernel 1018.

Meanwhile, the activation map quantization 1030 may be performed by the following equation:

$$a_q = \frac{1}{2^k}\,\mathrm{round}\left(2^k \cdot \mathrm{clip}(a, 0, 1)\right)$$

In the activation map quantization 1030, before quantization is applied to each element value $a$ (for example, a real-valued coefficient) of the activation map 1022, a clipping step may be added that normalizes each element value of the activation map 1022 to a value between 0 and 1. Next, the normalized $a$ is multiplied by the predetermined binary number $2^k$ ("first value"), and the fractional part of the product is removed by a rounding or truncation operation ("second value"). The second value is divided by the binary number $2^k$, which reduces the number of digits by $k$ bits again and yields the element value of the final quantized activation map 1026. This activation map quantization 1030 is repeated for every element value of the activation map 1022 to generate the quantized activation map 1026.

The embodiments described above reduce the number of bits of the weight values or the activation map data in order to reduce the size of the data representation for the parameters of the selected layer, but the bit quantization method of the present disclosure is not limited thereto. In another embodiment, a different number of bits may be allocated to each item of intermediate-stage data that exists between the operation stages applied to the various data in the selected layer. Accordingly, when the artificial neural network is implemented in hardware, the number of bits of each item of data stored in a memory (for example, a buffer, register, or cache) may be reduced, and the bit width of that memory reduced with it, in order to reduce the size of the memory. In still another embodiment, the bit width of the data path over which the data of the selected layer is transmitted may be reduced in units of bits.

After step S1120, the method may proceed to a step S1130 of determining whether the accuracy of the artificial neural network is equal to or greater than a predetermined target value. If, after the data representation size of the parameter of the selected layer has been reduced in units of bits, the accuracy of the output of the artificial neural network (for example, its learning result or inference result) is still equal to or greater than the predetermined target value, it can be expected that the overall performance of the artificial neural network will be maintained even if the bits of that data are reduced further.

Accordingly, when it is determined in step S1130 that the accuracy of the artificial neural network is equal to or greater than the target value, the method returns to step S1120 to further reduce the data representation size of the selected layer in units of bits. It is then determined again whether the accuracy of the output of the artificial neural network (for example, its learning result or inference result) is equal to or greater than the predetermined target value (step S1130).

If the accuracy of the artificial neural network is below the target value in step S1130, it may be determined that the most recently executed bit quantization has degraded the accuracy of the artificial neural network. In this case, the smallest number of bits that still satisfied the accuracy target, that is, the number of bits from the immediately preceding bit quantization, may be determined as the final number of bits for the parameter of the selected layer (step S1140).

Next, it is determined whether bit quantization has been completed for all layers of the artificial neural network (step S1150). If bit quantization has been completed for all layers, the entire process ends. If a layer that has not yet been bit quantized remains, step S1110 is executed again to perform bit quantization on that layer.
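Steps S1110 to S1150 amount to a per-layer loop that keeps removing one bit until the accuracy target is missed and then restores the last passing bit width. The following is a minimal sketch under stated assumptions: `evaluate` is a caller-supplied, hypothetical function that returns the network accuracy for a given per-layer bit assignment, all layers start at the same `start_bits`, and `min_bits` is an added lower bound that the disclosure does not mention.

```python
from typing import Callable, Dict, List


def quantize_per_layer(layers: List[str], start_bits: int, target: float,
                       evaluate: Callable[[Dict[str, int]], float],
                       min_bits: int = 1) -> Dict[str, int]:
    """Per-layer bit reduction loop in the spirit of FIG. 11."""
    bits = {name: start_bits for name in layers}
    for name in layers:                    # S1110: select the next layer
        while bits[name] > min_bits:
            bits[name] -= 1                # S1120: reduce by one bit
            if evaluate(bits) < target:    # S1130: accuracy below target?
                bits[name] += 1            # S1140: keep last passing bit width
                break
    return bits                            # S1150: all layers processed
```

The order of `layers` determines the selection strategy, so passing the list in arrangement order corresponds to forward selection and passing it sorted by cost corresponds to cost-based selection.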

Here, the selection of another layer from among the plurality of layers in step S1110 may be performed by (i) a method of selecting, in the arrangement order of the layers, the layer that follows the previously selected layer ("forward bit quantization"); (ii) a method of selecting, in the arrangement order of the layers, the layer that precedes the previously selected layer ("backward bit quantization"); (iii) a method of selecting, in order of amount of computation, the layer with the next largest amount of computation after the previously selected layer ("high computational cost bit quantization"); or (iv) a method of selecting, in order of amount of computation, the layer with the next smallest amount of computation after the previously selected layer ("low computational cost bit quantization").

In an embodiment, the accuracy of the artificial neural network may mean the probability that, after the network has learned how to solve a given problem (for example, recognizing an object in an input image), it presents the solution to that problem in the inference stage. The target value used in the bit quantization method described above may represent the minimum accuracy that must be maintained after bit quantization. For example, if the target value is an accuracy of 90%, additional bit quantization may be performed as long as the accuracy of the artificial neural network remains at 90% or higher after the parameters of the selected layer have been reduced in units of bits. If the accuracy is measured at 94% after the first bit quantization, additional bit quantization may be performed. If the accuracy is measured at 88% after the second bit quantization, the result of the second bit quantization is discarded, and the number of bits determined by the first bit quantization (that is, the number of bits used to represent the data) may be fixed as the final bit quantization result.

In an embodiment, when a layer to be bit quantized is selected from among the plurality of layers of an artificial neural network on the basis of the amount of computation, according to the computational cost bit quantization method, the amount of computation of each layer may be determined as follows. When one addition operation in a given layer adds an n-bit value and an m-bit value, the amount of computation of that operation is calculated as (n+m)/2. When a given layer multiplies an n-bit value by an m-bit value, the amount of computation of that operation is calculated as n × m. Accordingly, the amount of computation of a given layer may be the sum of the amounts of computation of all additions and multiplications that the layer performs.
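The cost rule above is simple enough to state directly in code. In this sketch a layer is described, purely as an assumption for illustration, by a list of `(kind, n, m, count)` tuples giving the operation type, its two operand bit widths, and how many times it occurs; the names `op_cost` and `layer_cost` are hypothetical.

```python
def op_cost(kind: str, n: int, m: int) -> float:
    """Amount of computation of one n-bit by m-bit operation."""
    if kind == "add":
        return (n + m) / 2   # addition: (n + m) / 2
    if kind == "mul":
        return n * m         # multiplication: n x m
    raise ValueError(f"unknown operation: {kind}")


def layer_cost(ops) -> float:
    """Sum the costs of all additions and multiplications in a layer.

    ops is an iterable of (kind, n, m, count) tuples.
    """
    return sum(op_cost(kind, n, m) * count for kind, n, m, count in ops)
```

As an illustration with assumed bit widths, one output element of a 3 × 3 kernel needs 9 multiplications and 8 additions; with 8-bit operands for the multiplications and 16-bit operands for the additions the cost is 9 × 64 + 8 × 16 = 704. Lowering the weight width to 4 bits halves the multiplication term, which is why cost-based selection recomputes costs after each quantization step.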

In addition, the method of selecting a layer from among the plurality of layers on the basis of the amount of computation and performing bit quantization on it, according to the computational cost bit quantization method, is not limited to that shown in FIG. 11, and various modifications are possible.

In another embodiment, the bit quantization of the parameters of each layer in the embodiment shown in FIG. 11 may be performed separately for the weights and for the activation map. For example, quantization is first performed on the weights of the selected layer, as a result of which the weights have n bits. Separately, bit quantization is performed on the output activation data of the selected layer, and the number of bits used to represent the activation map data may be determined to be m bits. Alternatively, quantization may proceed while allocating the same number of bits to the weights and the activation map data of the layer, so that both end up represented with the same n bits.

FIG. 12 is a flowchart illustrating a bit quantization method of an artificial neural network according to another embodiment of the present disclosure.

As shown, the bit quantization method 1200 of the artificial neural network may begin with a step S1210 of selecting the layer with the largest amount of computation from among the plurality of layers included in the artificial neural network.

When the layer selection is completed in step S1210, the method may proceed to a step S1220 of reducing, in units of bits, the size of the data representation for the parameters of the selected layer. In an embodiment, when the data size of the selected layer is reduced in units of bits, the weight kernel quantization 1028 and the activation map quantization 1024 described with reference to FIGS. 4 to 10 may be performed.

After step S1220, the method may proceed to a step S1230 of determining whether the accuracy of the artificial neural network, reflecting the bit quantization results so far, is equal to or greater than a predetermined target value. When it is determined in step S1230 that the accuracy is equal to or greater than the target value, the data size of the layer is set to the current bit quantization result, and the method returns to step S1210 to repeat steps S1210 to S1230. That is, in step S1210 the amount of computation is recalculated for all layers in the artificial neural network, and the layer with the largest amount of computation is selected again on that basis.

If the accuracy of the artificial neural network is below the target value in step S1230, the bit reduction quantization for the currently selected layer is cancelled, and that layer is excluded from the layers that can be selected in the layer selection step S1210. The layer with the next largest amount of computation may then be selected (step S1240), and the data size of the selected layer may be reduced in units of bits (step S1250).

In step S1260, it is determined whether the accuracy of the artificial neural network, reflecting the bit quantization results so far, is equal to or greater than the target value. If the accuracy is below the target value, it is determined whether bit quantization has been completed for all layers of the artificial neural network (S1270). If it is determined in step S1270 that bit quantization has been completed for all layers, the entire bit quantization procedure ends. If it is determined in step S1270 that bit quantization has not been completed for all layers, the method may proceed to step S1240.

If it is determined in step S1260 that the accuracy of the artificial neural network is equal to or greater than the target value, the method may return to step S1220 and continue with the subsequent steps.
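One way to read the flow of FIG. 12 is as a greedy loop: always reduce the costliest layer that is still eligible, and exclude a layer once a reduction on it breaks the accuracy target. The sketch below is a simplified interpretation, not the flowchart itself: it merges the two branches (S1210 to S1230 and S1240 to S1260) into a single loop, and `evaluate`, `cost`, and `min_bits` are hypothetical caller-supplied inputs.

```python
from typing import Callable, Dict


def quantize_by_cost(bits: Dict[str, int], target: float,
                     evaluate: Callable[[Dict[str, int]], float],
                     cost: Callable[[str, Dict[str, int]], float],
                     min_bits: int = 1) -> Dict[str, int]:
    """Greedy, cost-ordered bit reduction loosely following FIG. 12."""
    bits = dict(bits)          # work on a copy of the initial assignment
    excluded = set()           # layers whose last reduction was cancelled
    while True:
        candidates = [n for n in bits
                      if n not in excluded and bits[n] > min_bits]
        if not candidates:     # S1270: nothing left to quantize
            break
        # S1210 / S1240: recompute costs and pick the costliest eligible layer.
        name = max(candidates, key=lambda n: cost(n, bits))
        bits[name] -= 1        # S1220 / S1250: reduce by one bit
        if evaluate(bits) < target:   # S1230 / S1260: target missed
            bits[name] += 1    # cancel the reduction
            excluded.add(name) # and exclude the layer from selection
    return bits
```

Because `cost` receives the current bit assignment, a layer that has already been reduced several times can fall behind another layer in the ranking, which reflects the recalculation of the amount of computation described for step S1210.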

FIG. 13 is a flowchart illustrating a bit quantization method of an artificial neural network having a plurality of layers according to still another embodiment of the present disclosure.

As shown, the bit quantization method 1300 of an artificial neural network having a plurality of layers includes steps S1310 to S1350 of searching for the accuracy variation point of each layer included in the artificial neural network. The method 1300 begins with a step S1310 of initially fixing the bit size of the data of all layers in the artificial neural network at the maximum and selecting one layer for which the accuracy variation point has not yet been searched.

When a layer of the artificial neural network has been selected in step S1310, the method may proceed to a step S1320 of reducing the data size of the selected layer in units of bits. In an embodiment, when the data size of the selected layer is reduced in units of bits, the weight kernel quantization 1028 and the activation map quantization 1024 described with reference to FIGS. 4 to 10 may be performed.

After step S1320, the method may proceed to a step S1330 of determining whether the accuracy of the artificial neural network, reflecting the bit quantization results so far for the selected layer, is equal to or greater than a predetermined target value. When it is determined in step S1330 that the accuracy is equal to or greater than the target value, the method returns to step S1320 to perform additional bit reduction quantization on the currently selected layer.

If the accuracy of the artificial neural network is below the target value in step S1330, the number of data bits of the currently selected layer is set to the smallest number of bits that most recently satisfied the target value. It is then determined whether the search for the accuracy variation point has been completed for all layers of the artificial neural network (S1340). If the search has not been completed for all layers, the method may return to step S1310. In step S1310, with the bit size of the data of all layers in the artificial neural network at the maximum, another layer for which the accuracy variation point has not yet been searched is selected.

If it is determined in step S1340 that the search for the accuracy variation point has been completed for all layers of the artificial neural network, the bit quantization result corresponding to the accuracy variation point of each layer may be applied to the artificial neural network (S1350). In an embodiment, in step S1350 each layer is set to the bit size of the data immediately before the accuracy variation point of that layer determined in steps S1310 to S1340 (for example, the point at which the accuracy of the artificial neural network begins to deteriorate for that layer).

In another embodiment, in step S1350 each layer is set to a size larger than the resources required for operations on the parameters immediately before the accuracy variation point of that layer determined in steps S1310 to S1340. For example, the number of bits of the parameters of each layer may be set 2 bits larger than the number of bits immediately before the accuracy variation point. A bit quantization method is then executed on the artificial neural network having the per-layer data sizes set in step S1350 (S1360). The bit quantization method executed in step S1360 may include, for example, the method shown in FIG. 11 or FIG. 12.
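The search phase of FIG. 13 (S1310 to S1350) can be sketched as follows: each layer is probed in isolation while every other layer stays at the maximum bit width, and the resulting per-layer bit widths, optionally enlarged by a margin, become the starting point for S1360. The function names, the `evaluate` callback, the `min_bits` floor, and the capping of the margin at `max_bits` are assumptions made for this illustration.

```python
from typing import Callable, Dict, List


def find_variation_points(layers: List[str], max_bits: int, target: float,
                          evaluate: Callable[[Dict[str, int]], float],
                          min_bits: int = 1) -> Dict[str, int]:
    """For each layer, find the smallest bit width that still meets the target
    while all other layers are held at max_bits (S1310 to S1340)."""
    points = {}
    for name in layers:                           # S1310: pick an unsearched layer
        bits = {n: max_bits for n in layers}      # all layers at the maximum
        passing = max_bits
        while passing > min_bits:
            bits[name] = passing - 1              # S1320: try one bit fewer
            if evaluate(bits) < target:           # S1330: variation point reached
                break
            passing -= 1
        points[name] = passing                    # bit width just before the point
    return points


def initial_bits(points: Dict[str, int], max_bits: int,
                 margin: int = 2) -> Dict[str, int]:
    """S1350 variant: start each layer `margin` bits above its variation point."""
    return {n: min(b + margin, max_bits) for n, b in points.items()}
```

The dictionary returned by `initial_bits` would then serve as the starting assignment for a joint procedure such as the ones shown in FIG. 11 or FIG. 12, since bit widths found in isolation may not meet the target once all layers are reduced together.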

The bit quantization method of an artificial neural network according to the various embodiments described above is not limited to being performed on the weight kernel and feature map (or activation map) of each of the plurality of layers individually. In an embodiment, the bit quantization method of the present disclosure may first be performed on the weight kernels (or weights) of all layers of the artificial neural network, and bit quantization may then be performed on the feature maps of all layers of the artificial neural network in which that weight kernel quantization is reflected. In another embodiment, bit quantization may first be performed on the feature maps of all layers of the artificial neural network, and bit quantization may then be performed on the kernels of all layers of the artificial neural network in which that feature map quantization is reflected.

In addition, the bit quantization method of the artificial neural network of the present disclosure is not limited to applying the same level of bit quantization to all weight kernels of each layer of the artificial neural network. In one embodiment, the bit quantization method of the present disclosure may execute bit quantization per weight kernel of each layer, or may execute individual bit quantization so that each weight that is an element of a weight kernel can have a different number of bits.

Hereinafter, examples of execution results of the bit quantization method of an artificial neural network according to various embodiments of the present disclosure are described with reference to the drawings.

FIG. 14 is a graph illustrating an example of the amount of computation for each layer of an artificial neural network according to an embodiment of the present disclosure. The artificial neural network shown in FIG. 14 is an example of a convolutional neural network of the VGG-16 model, which includes 16 layers, and each layer of this network has a different amount of computation.

For example, because the second, fourth, sixth, seventh, ninth, and tenth layers have the highest amount of computation, bit quantization may be applied to them first under the high computational cost bit quantization method. After bit quantization has been executed on the second, fourth, sixth, seventh, ninth, and tenth layers, bit quantization may be executed on the 14th layer, which has the next highest amount of computation.

FIG. 15 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been executed by a forward bit quantization method according to an embodiment of the present disclosure.

As described above, forward quantization is a method of executing bit quantization sequentially, starting from the frontmost layer (for example, the layer that first receives the input data) according to the arrangement order of the plurality of layers included in the artificial neural network. FIG. 15 shows the number of bits of each layer after forward quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the resulting reduction rate of the network's amount of computation. For example, when an n-bit value and an m-bit value are added, the amount of computation of that operation is calculated as (n+m)/2. When an n-bit value and an m-bit value are multiplied, the amount of computation of that operation may be calculated as n × m. Accordingly, the total amount of computation of the artificial neural network may be the sum of the amounts of computation of all additions and multiplications executed in that network.
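The cost model above can be sketched in Python as follows. The formulas (n+m)/2 for an addition and n × m for a multiplication come from the text; the function names, the operation list format, and the example bit widths are assumptions made only for illustration.

```python
def add_cost(n_bits, m_bits):
    """Amount of computation of adding an n-bit and an m-bit value."""
    return (n_bits + m_bits) / 2


def mul_cost(n_bits, m_bits):
    """Amount of computation of multiplying an n-bit and an m-bit value."""
    return n_bits * m_bits


def total_cost(operations):
    """Sum the cost of (kind, n_bits, m_bits) tuples, kind being 'add' or 'mul'."""
    total = 0.0
    for kind, n_bits, m_bits in operations:
        if kind == "add":
            total += add_cost(n_bits, m_bits)
        elif kind == "mul":
            total += mul_cost(n_bits, m_bits)
        else:
            raise ValueError(f"unknown operation kind: {kind}")
    return total


def reduction_rate(before, after):
    """Reduction of the amount of computation, in percent."""
    return 100.0 * (before - after) / before


# Hypothetical example: one 3x3 dot product (9 multiplications, 8 additions),
# first with 16-bit weights, then with weights quantized to 8 bits.
before = [("mul", 16, 16)] * 9 + [("add", 16, 16)] * 8
after = [("mul", 8, 16)] * 9 + [("add", 16, 16)] * 8
print(total_cost(before), total_cost(after))
print(round(reduction_rate(total_cost(before), total_cost(after)), 2))
```

Because the multiplication term grows with the product of the two bit widths, reducing the bit width of either operand lowers the total much more than the addition term does.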

As shown, when bit quantization was executed on the VGG-16 artificial neural network using forward quantization, the number of bits of the layers arranged toward the front of the network decreased relatively more, while the number of bits of the layers arranged toward the back decreased less. For example, the number of bits of the first layer was reduced to 12 bits, and the numbers of bits of the second and third layers were each reduced to 9 bits, whereas the number of bits of the 16th layer was reduced to 13 bits and that of the 15th layer only to 15 bits. When forward quantization was applied sequentially from the first layer to the 16th layer in this way, the reduction rate of the amount of computation of the entire artificial neural network was calculated to be 56%.

FIG. 16 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been executed by a backward bit quantization method according to an embodiment of the present disclosure.

Backward quantization is a method of executing bit quantization sequentially, starting from the rearmost layer (for example, the layer from which the output data is finally output) according to the arrangement order of the plurality of layers included in the artificial neural network. FIG. 16 shows the number of bits of each layer after backward quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the resulting reduction rate of the network's amount of computation.

As shown, when bit quantization was executed on the VGG-16 artificial neural network using backward quantization, the number of bits of the layers arranged toward the back of the network decreased relatively more, while the number of bits of the layers arranged toward the front decreased less. For example, the numbers of bits of the first, second, and third layers were each reduced to 15 bits, and the number of bits of the fourth layer was reduced to 14 bits, whereas the number of bits of the 16th layer was reduced to 9 bits and that of the 15th layer to 15 bits. When backward quantization was applied sequentially from the 16th layer to the first layer in this way, the reduction rate of the amount of computation of the entire artificial neural network was calculated to be 43.05%.

FIG. 17 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been executed by a high computational cost layer first bit quantization method according to an embodiment of the present disclosure.

High computational cost layer first quantization (or high computational cost quantization) is a method of executing bit quantization sequentially, starting from the layer with the highest amount of computation among the plurality of layers included in the artificial neural network. FIG. 17 shows the number of bits of each layer after high computational cost quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the resulting reduction rate of the network's amount of computation.

As shown, when bit quantization was executed on the VGG-16 artificial neural network using high computational cost quantization, the number of bits of the layers with a high amount of computation decreased relatively more. For example, the numbers of bits of the second and tenth layers were reduced to 5 bits and 6 bits, respectively, whereas the number of bits of the first layer was reduced to 14 bits. When high computational cost quantization was applied to the layers in order of their amount of computation in this way, the reduction rate of the amount of computation of the entire artificial neural network was calculated to be 70.70%.

FIG. 18 is a graph illustrating the number of bits per layer of an artificial neural network on which bit quantization has been executed by a low computational cost layer first quantization (low computational cost bit quantization) method according to an embodiment of the present disclosure.

Low computational cost layer first quantization (or low computational cost quantization) is a method of executing bit quantization sequentially, starting from the layer with the lowest amount of computation among the plurality of layers included in the artificial neural network. FIG. 18 shows the number of bits of each layer after low computational cost quantization is applied to the VGG-16 artificial neural network shown in FIG. 14, together with the resulting reduction rate of the network's amount of computation.
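The four layer orderings described in this section (forward, backward, high computational cost first, low computational cost first) differ only in the sequence in which layers are visited. A minimal Python sketch, assuming a hypothetical list of per-layer amounts of computation and hypothetical strategy names:

```python
def layer_order(costs, strategy):
    """Return 1-based layer numbers in the order they would be quantized.

    costs[i] is the amount of computation of layer i + 1.
    Ties keep their original front-to-back order because sorted() is stable.
    """
    layers = list(range(1, len(costs) + 1))
    if strategy == "forward":
        return layers
    if strategy == "backward":
        return layers[::-1]
    if strategy == "high_cost_first":
        return sorted(layers, key=lambda layer: -costs[layer - 1])
    if strategy == "low_cost_first":
        return sorted(layers, key=lambda layer: costs[layer - 1])
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical amounts of computation for a 4-layer network.
costs = [10, 80, 40, 80]
for name in ("forward", "backward", "high_cost_first", "low_cost_first"):
    print(name, layer_order(costs, name))
```

How ties between equally expensive layers are broken is not specified in the text; the stable sort used here is only one possible choice.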

As shown, even when bit quantization was executed on the VGG-16 artificial neural network using low computational cost quantization, the number of bits of the layers with a high amount of computation decreased relatively more. For example, the numbers of bits of the sixth and seventh layers were reduced to 6 bits and 5 bits, respectively, whereas the number of bits of the first layer was reduced to 13 bits. When low computational cost quantization was applied to the layers in order of their amount of computation in this way, the reduction rate of the amount of computation of the entire artificial neural network was calculated to be 49.11%.

Hereinafter, hardware implementation examples of an artificial neural network to which bit quantization according to the various embodiments of the present disclosure has been applied are described in detail. When a convolutional neural network including a plurality of layers is implemented in hardware, the weight kernel may be arranged outside and/or inside the processing unit that executes the convolution of the convolutional layer.

In one embodiment, the weight kernel may be stored in a memory (for example, a register, buffer, or cache) separate from the processing unit that executes the convolution of the convolutional layer. In this case, after bit quantization is applied to the weight kernel to reduce the number of bits of its element values, the size of the memory may be determined according to the number of bits of the weight kernel. In addition, the bit width of the multipliers or adders arranged in the processing unit, which receives the element values of the weight kernel stored in the memory and the element values of the input feature map and executes multiplication and/or addition operations, may also be designed to match the number of bits resulting from the bit quantization.

In another embodiment, the weight kernel may be implemented in hard-wired form within the processing unit that executes the convolution of the convolutional layer. In this case, after bit quantization is applied to the weight kernel to reduce the number of bits of its element values, hard wires representing each of the element values of the weight kernel may be implemented in the processing unit according to the number of bits of the weight kernel. In addition, the bit width of the multipliers or adders arranged in the processing unit, which receives the hard-wired element values of the weight kernel and the element values of the input feature map and executes multiplication and/or addition operations, may also be designed to match the number of bits resulting from the bit quantization.

FIGS. 19 to 21, described below, are diagrams illustrating hardware implementation examples of an artificial neural network including a plurality of layers according to further embodiments of the present disclosure. The bit quantization method and system for an artificial neural network including a plurality of layers according to the present disclosure can be applied to any artificial neural network (ANN) computing system, such as a CPU, GPU, FPGA, or ASIC, to reduce the required amount of computation, the bit width of the arithmetic units, and the memory. In addition, although these examples are based on integer arithmetic, the disclosure may also be practiced with floating-point arithmetic.

FIG. 19 is a diagram illustrating a hardware implementation example of an artificial neural network according to an embodiment of the present disclosure. The illustrated example is a hardware implementation of a convolution processing device 1900 for a convolutional layer of a convolutional neural network. Here, the convolutional layer is assumed to execute convolution by applying a weight kernel of size 3x3x3 to a portion (data of size 3x3x3) of the input feature map. The size and number of the weight kernels of each layer may differ depending on the application and on the number of input and output feature map channels.

As shown, the weight kernel may be stored in a weight kernel cache 1910 that is separate from the processing unit 1930 that executes the convolution of the convolutional layer. In this case, after bit quantization is applied to the weight kernel to reduce the number of bits of its element values (w₁, w₂, ..., w₉), the size of the cache may be determined according to the number of bits of the weight kernel. In addition, the bit width of the multipliers or adders arranged in the processing unit 1930, which receives the element values of the weight kernel stored in the memory and the element values of the input feature map and executes multiplication and/or addition operations, may also be designed to match the number of bits of the weight kernel element values resulting from the bit quantization.

According to an embodiment, the input feature map cache 1920 may receive and store a portion of the input data (a portion corresponding to the size of the weight kernel). The weight kernel traverses the input data, and the input feature map cache 1920 may sequentially receive and store the portion of the input data corresponding to the current position of the weight kernel. The portion of the input data (x₁, x₂, ..., x₉) stored in the input feature map cache 1920 and the corresponding element values of the weight kernel (w₁, w₂, ..., w₉) stored in the weight kernel cache 1910 are each input to the corresponding multiplier 1932, and the multiplications are executed. The products from the multipliers 1932 are summed by the tree adder 1934 and input to the adder 1940. When the input data has multiple channels (for example, when the input data is an RGB color image), the adder 1940 may add the value stored in the accumulator 1942 (initially 0) to the input sum for a given channel and store the result back in the accumulator 1942. The sum stored in the accumulator 1942 may then be added to the sum supplied to the adder 1940 for the next channel and stored in the accumulator 1942 again. This summing process of the adder 1940 and the accumulator 1942 may be executed for all channels of the input data, and the total sum may be input to the output activation map cache 1550. The convolution procedure described above may be repeated for the weight kernel and the portion of the input data corresponding to each traversal position of the weight kernel on the input data.
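The data path described above (nine multipliers, a tree adder, and an adder with an accumulator that loops over the channels) can be sketched behaviorally in Python. This is an illustrative software model under simple assumptions (integer inputs, one kernel position); the function names and test values are hypothetical, and it does not model bit widths or timing.

```python
def tree_add(values):
    """Sum values pairwise, level by level, as an adder tree would."""
    values = list(values)
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element passes to the next level
        values = paired
    return values[0] if values else 0


def convolve_patch(patch, kernel):
    """Convolve one kernel position; patch and kernel are lists of channels,
    each channel being a list of 9 integers (a flattened 3x3 window)."""
    accumulator = 0  # accumulator 1942, initial value 0
    for x_channel, w_channel in zip(patch, kernel):
        products = [x * w for x, w in zip(x_channel, w_channel)]  # multipliers 1932
        channel_sum = tree_add(products)                          # tree adder 1934
        accumulator = accumulator + channel_sum                   # adder 1940
    return accumulator  # total sum passed to the output activation map cache


# Hypothetical 3-channel input patch and 3x3x3 weight kernel.
patch = [[1] * 9, [2] * 9, [3] * 9]
kernel = [[1] * 9, [0] * 9, [-1] * 9]
print(convolve_patch(patch, kernel))
```

Processing one channel per pass keeps the hardware to nine multipliers, at the cost of one accumulation step per channel.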

As described above, when the element values of the weight kernel are stored in the weight kernel cache 1910 arranged outside the processing unit 1930, the number of bits of the weight kernel element values can be reduced by the bit quantization of the present disclosure, which in turn reduces the size of the weight kernel cache 1910 and the size of the multipliers and adders of the processing unit 1930. In addition, as the size of the processing unit 1930 decreases, the processing unit 1930 may also operate faster and consume less power.
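To make the size reduction concrete, the following Python sketch estimates the storage needed for a 3x3x3 weight kernel and the output width of a multiplier at two bit widths. It assumes unsigned integer operands, for which the product of an n-bit and an m-bit value always fits in n + m bits; the function names and the 16-bit and 8-bit figures are hypothetical examples.

```python
def kernel_cache_bits(num_elements, bits_per_element):
    """Storage needed for the weight kernel element values, in bits."""
    return num_elements * bits_per_element


def product_width(weight_bits, activation_bits):
    """Bits sufficient for the product of two unsigned integer operands."""
    return weight_bits + activation_bits


NUM_ELEMENTS = 3 * 3 * 3  # 3x3x3 weight kernel, as in the example above

for weight_bits in (16, 8):  # hypothetical: before and after quantization
    print(
        weight_bits,
        kernel_cache_bits(NUM_ELEMENTS, weight_bits),
        product_width(weight_bits, 16),  # 16-bit activations assumed
    )
```

Under these assumptions, halving the weight bit width halves the kernel storage and narrows each multiplier output by 8 bits.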

FIG. 20 is a diagram illustrating a hardware implementation example of an artificial neural network according to another embodiment of the present disclosure.

The illustrated example is a hardware implementation of a convolution processing device 2000 for a convolutional layer of a convolutional neural network. Here, the convolutional layer executes convolution by applying a weight kernel of size 3x3x3 to a portion (data of size 3x3x3) of the input activation map.

As shown, the weight kernel may be stored in a weight kernel cache 2010 that is separate from the processing unit 2030 that executes the convolution of the convolutional layer. In this case, after bit quantization is applied to the weight kernel to reduce the number of bits of its element values (w₁, w₂, ..., w₉), the size of the cache may be determined according to the number of bits of the weight kernel. In addition, the bit width of the multipliers or adders arranged in the processing unit 2030, which receives the element values of the weight kernel stored in the memory and the element values of the input activation map (or feature map) and executes multiplication and/or addition operations, may also be designed to match the number of bits of the weight kernel element values resulting from the bit quantization.

According to an embodiment, the input activation map cache 2020 may receive and store a portion (a portion corresponding to the size of the weight kernel) of input data composed of multiple channels (for example, three RGB channels). The weight kernel traverses the input data, and the input activation map cache 2020 may sequentially receive and store the portion of the input data corresponding to the current position of the weight kernel. The portion of the input data (x₁, x₂, ..., x₂₇) stored in the input activation map cache 2020 and the element values of the weight kernel (w₁, w₂, ..., w₂₇) stored in the weight kernel cache 2010 are each input to the corresponding multiplier, and the multiplications are executed. Here, the kernel element values (w₁, w₂, ..., w₉) of the weight kernel cache 2010 and the first-channel portion (x₁, x₂, ..., x₉) of the input data stored in the input activation map cache 2020 are input to the first convolution processing unit 2032. The weight kernel element values (w₁₀, w₁₁, ..., w₁₈) of the weight kernel cache 2010 and the second-channel portion (x₁₀, x₁₁, ..., x₁₈) of the input data stored in the input activation map cache 2020 are input to the second convolution processing unit 2034. The weight kernel element values (w₁₉, w₂₀, ..., w₂₇) of the weight kernel cache 2010 and the third-channel portion (x₁₉, x₂₀, ..., x₂₇) of the input data stored in the input activation map cache 2020 are input to the third convolution processing unit 2036.

Each of the first convolution processing unit 2032, the second convolution processing unit 2034, and the third convolution processing unit 2036 may operate in the same manner as the processing unit 1930 illustrated in FIG. 19. The convolution results calculated by the first convolution processing unit 2032, the second convolution processing unit 2034, and the third convolution processing unit 2036 may be summed by the tree adder 2038 and input to the output activation map cache 2040.

As described above, when the element values of the weight kernel are stored in the weight kernel cache 2010 arranged outside the processing unit 2030, the number of bits of the weight kernel element values can be reduced by the bit quantization of the present disclosure, which in turn reduces the size of the weight kernel cache 2010 and the size of the multipliers and adders of the processing unit 2030. In addition, as the size of the processing unit 2030 decreases, the processing unit 2030 may also operate faster and consume less power.

FIG. 21 is a diagram illustrating a hardware implementation example of an artificial neural network according to yet another embodiment of the present disclosure.

The illustrated example is a hardware implementation of a convolution processing device 2200 for a convolutional layer of a convolutional neural network. Here, the convolutional layer executes convolution by applying a weight kernel of size 3x3x3 to a portion (data of size 3x3x3) of the input activation map.

As shown, the weight kernel may be implemented in hard-wired form within the processing unit 2220 that executes the convolution of the convolutional layer. In this case, after bit quantization is applied to the weight kernel to reduce the number of bits of its element values (w_{1_K}, w_{2_K}, ..., w_{27_K}), the size of the cache may be determined according to the number of bits of the weight kernel. In addition, the bit width of the multipliers or adders arranged in the processing unit 2220, which receives the element values of the weight kernel implemented as wires in the processing unit 2220 and the element values of the input activation map (or feature map) and executes multiplication and/or addition operations, may also be designed to match the number of bits of the weight kernel element values resulting from the bit quantization.

According to an embodiment, the input activation map cache 2210 may receive and store a portion (a portion corresponding to the size of the weight kernel) of input data composed of multiple channels (for example, three RGB channels). The weight kernel traverses the input data, and the input activation map cache 2210 may sequentially receive and store the portion of the input data corresponding to the current position of the weight kernel. The portion of the input data (x₁, x₂, ..., x₂₇) stored in the input activation map cache 2210 and the element values of the weight kernel (w_{1_K}, w_{2_K}, ..., w_{27_K}) implemented as wires in the processing unit 2220 are each input to the corresponding multiplier, and the multiplications are executed. Here, the weight kernel element values (w_{1_K}, w_{2_K}, ..., w_{9_K}) implemented as wires in the processing unit 2220 and the first-channel portion (x₁, x₂, ..., x₉) of the input data stored in the input activation map cache 2210 are input to the first convolution processing unit 2222. The weight kernel element values (w_{10_K}, w_{11_K}, ..., w_{18_K}) implemented as wires in the processing unit 2220 and the second-channel portion (x₁₀, x₁₁, ..., x₁₈) of the input data stored in the input activation map cache 2210 are input to the second convolution processing unit 2224. The weight kernel element values (w_{19_K}, w_{20_K}, ..., w_{27_K}) implemented as wires in the processing unit 2220 and the third-channel portion (x₁₉, x₂₀, ..., x₂₇) of the input data stored in the input activation map cache 2210 are input to the third convolution processing unit 2226.

The convolution results calculated by the first convolution processing unit 2222, the second convolution processing unit 2224, and the third convolution processing unit 2226 may be summed by the tree adder 2228 and input to the output activation map cache 2230.

As described above, when the element values of the weight kernel are implemented in hard-wired form within the processing unit 2220, bit quantization according to the present disclosure can reduce the number of bits of the weight kernel element values. This in turn reduces the number of wires implemented inside the processing unit 2220 and the size of its multipliers and adders. In addition, as the size of the processing unit 2220 decreases, the processing unit 2220 may also operate faster and consume less power.

FIG. 22 is a diagram illustrating the configuration of a system that performs bit quantization on an artificial neural network according to an embodiment of the present disclosure.

As illustrated, the system 2300 may include a parameter selection module 2310, a bit quantization module 2320, and an accuracy determination module 2330. The parameter selection module 2310 may analyze the configuration information of the input artificial neural network. The configuration information may include, but is not limited to, the number of layers in the artificial neural network, the function and role of each layer, information about the input and output data of each layer, the type and number of multiplications and additions executed by each layer, the type of activation function executed by each layer, the type and configuration of the weight kernels input to each layer, the size and number of weight kernels belonging to each layer, the size of the output feature map, and the initial values of the weight kernels (e.g., weight kernel element values set as real numbers). The configuration information may include information on various components depending on the type of the artificial neural network (e.g., a convolutional neural network, a recurrent neural network, or a multilayer perceptron).

The parameter selection module 2310 may select at least one parameter or parameter group to be quantized in the artificial neural network with reference to the input configuration information. How a parameter (or data) or parameter group is selected may be determined according to the effect that the parameter to be selected has on the overall performance or the amount of computation (or the amount of resources required for hardware implementation) of the artificial neural network. The selection may be made by choosing any one of: one weight, one feature map and activation map, one weight kernel, all weights belonging to one layer, or all feature maps or activation maps belonging to one layer.

In an embodiment, in the case of the convolutional neural network (CNN) 400 described above with reference to FIGS. 4 to 10, the convolutional layer 420 and/or the fully connected layer 440 have a large effect on the overall performance or the amount of computation of the CNN 400. Accordingly, the weight kernel or the feature map/activation map of at least one of these layers 420 and 440 may be selected as a parameter to be quantized.

In an embodiment, at least one of the plurality of layers included in the artificial neural network may be selected, and all weight kernels in that layer or all activation map data of that layer may be set as one parameter group. The selection may be determined according to the effect of the selected layer on the overall performance or the amount of computation of the artificial neural network, but is not limited thereto and may use any of various methods. For example, at least one layer may be selected by (i) selecting layers sequentially, following the arrangement order of the layers, from the first layer that receives the input data toward subsequent layers; (ii) selecting layers sequentially, following the arrangement order of the layers, from the last layer that generates the final output data toward preceding layers; (iii) selecting layers starting from the layer with the largest amount of computation; or (iv) selecting layers starting from the layer with the smallest amount of computation.

When the parameter selection module 2310 has finished selecting the data to be quantized in the artificial neural network, the information on the selected data is input to the bit quantization module 2320. The bit quantization module 2320 may refer to the information on the selected parameter and reduce the size of the data representation for that parameter in units of bits. The resources required to operate on the selected parameter may include, but are not limited to, a memory for storing the selected parameter and a data path for transmitting the selected parameter.

In an embodiment, when the bit quantization module 2320 reduces the data size of the selected parameter in units of bits, the weight kernel quantization and/or activation map quantization described with reference to FIGS. 4 to 13 may be performed.
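To make "reducing the data representation in units of bits" concrete, the following is a minimal sketch of one common way to map real-valued parameters onto a grid representable with a given number of bits. It is an assumption for illustration: it uses generic uniform symmetric quantization, not the weight kernel or activation map quantization of FIGS. 4 to 13, which is not reproduced in this passage. The name `quantize_uniform` is hypothetical.

```python
from typing import List, Sequence


def quantize_uniform(values: Sequence[float], bits: int) -> List[float]:
    """Round values onto a symmetric grid representable with `bits` bits.

    Assumption: uniform symmetric quantization, chosen only for illustration.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign plus magnitude)")
    qmax = 2 ** (bits - 1) - 1  # largest integer code, e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / qmax  # real value represented by one integer step
    # Each value is replaced by the nearest multiple of `scale`.
    return [round(v / scale) * scale for v in values]
```

Lowering `bits` by one roughly halves the number of available levels, so the rounding error grows; this is the accuracy cost that the accuracy determination module 2330 is meant to monitor.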

When the bit quantization module 2320 completes bit quantization of the selected parameter, it transmits the information of the bit-quantized artificial neural network to the accuracy determination module 2330. The accuracy determination module 2330 may reflect the bit-quantized information in the configuration information of the artificial neural network input to the system 2300. Based on the configuration information in which the bit-quantized information is reflected, the accuracy determination module 2330 may determine whether the accuracy of the artificial neural network is equal to or greater than a predetermined target value. For example, if, after the size of the data representation of the selected parameter is reduced in units of bits, the accuracy of the output result of the artificial neural network (e.g., its inference result) is equal to or greater than the predetermined target value, the accuracy determination module 2330 may predict that the overall performance of the artificial neural network can be maintained even if additional bit quantization is performed.

Accordingly, when the accuracy determination module 2330 determines that the accuracy of the artificial neural network is equal to or greater than the target value, it transmits a control signal to the parameter selection module 2310 so that the parameter selection module 2310 selects another parameter or parameter group included in the artificial neural network. Here, a parameter may be selected by (i) sequentially selecting the parameter that follows the previously selected parameter, according to the arrangement order of the parameters or parameter groups constituting the artificial neural network ("forward bit quantization"); (ii) selecting, in the reverse direction, the parameter that precedes the previously selected parameter, according to the arrangement order of the parameters or parameter groups ("backward bit quantization"); (iii) selecting, in order of the amount of computation, the parameter with the next-largest amount of computation after the previously selected parameter ("high computational cost bit quantization"); or (iv) selecting, in order of the amount of computation, the parameter with the next-smallest amount of computation after the previously selected parameter ("low computational cost bit quantization").
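The four selection methods above amount to four orderings of the same list of parameters or parameter groups. A minimal sketch follows, assuming each parameter is given as a hypothetical `(name, cost)` pair listed in the arrangement order of the network; the function name `selection_order` and the method labels are illustrative and not part of the disclosure.

```python
from typing import List, Tuple


def selection_order(params: List[Tuple[str, float]], method: str) -> List[str]:
    """Return parameter names in the order they would be selected.

    `params` is listed in network arrangement order (input side first).
    """
    if method == "forward":    # (i) forward bit quantization
        return [name for name, _ in params]
    if method == "backward":   # (ii) backward bit quantization
        return [name for name, _ in reversed(params)]
    if method == "high_cost":  # (iii) largest amount of computation first
        ranked = sorted(params, key=lambda p: p[1], reverse=True)
        return [name for name, _ in ranked]
    if method == "low_cost":   # (iv) smallest amount of computation first
        ranked = sorted(params, key=lambda p: p[1])
        return [name for name, _ in ranked]
    raise ValueError(f"unknown method: {method}")
```

For instance, with hypothetical costs `[("conv1", 30.0), ("conv2", 90.0), ("fc", 50.0)]`, the `"high_cost"` order visits `conv2` first, while `"forward"` starts at `conv1`.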

On the other hand, if the accuracy determination module 2330 determines that the accuracy of the artificial neural network is below the target value, it may conclude that the bit quantization performed on the currently selected parameter has degraded the accuracy of the artificial neural network. In this case, the number of bits determined by the immediately preceding bit quantization may be set as the final number of bits. In an embodiment, the accuracy of the artificial neural network may mean the probability that, after learning how to solve a given problem (e.g., recognizing an object contained in an input image), the artificial neural network presents the correct answer to that problem in the inference stage. The target value used in the bit quantization method described above may represent the minimum accuracy that must be maintained after bit quantization of the artificial neural network. For example, assuming that the target value (threshold) is 90 percent, additional bit quantization may be performed if the accuracy of the artificial neural network is still 90 percent or higher after the memory size for storing the parameters of the selected layer has been reduced in units of bits. For instance, if the accuracy of the artificial neural network is measured at 94 percent after the first bit quantization, additional bit quantization may be performed. If the accuracy is then measured at 88 percent after the second bit quantization, the result of that second bit quantization is discarded, and the number of data representation bits determined by the first bit quantization may be fixed as the final bit quantization result.
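The select, quantize, and check cycle described above can be sketched as a simple search loop. This is one possible reading under stated assumptions: each selected parameter is reduced one bit at a time until the accuracy falls below the target, after which the previous width is kept and the next parameter is selected. The callback `evaluate`, which stands in for the accuracy determination module 2330, and the function name `search_bit_widths` are hypothetical.

```python
from typing import Callable, Dict, List


def search_bit_widths(
    params: List[str],
    start_bits: int,
    target: float,
    evaluate: Callable[[Dict[str, int]], float],
    min_bits: int = 1,
) -> Dict[str, int]:
    """Return a final bit width per parameter that keeps accuracy >= target.

    `params` is assumed to be already ordered by the parameter selection step.
    `evaluate` maps the current bit widths to a measured accuracy.
    """
    bits = {p: start_bits for p in params}
    for p in params:
        while bits[p] > min_bits:
            bits[p] -= 1                 # bit quantization: drop one bit
            if evaluate(bits) < target:  # accuracy fell below the target value
                bits[p] += 1             # discard this step, keep the previous width
                break
    return bits
```

With a target of 90 percent and measured accuracies of 94 percent after the first step and 88 percent after the second, this sketch keeps the width reached by the first step, which matches the example above.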

In an embodiment, when a parameter or parameter group to be bit-quantized is selected based on the amount of computation, according to the computational cost bit quantization method, the amount of computation of each parameter may be determined as follows. When a specific operation of the artificial neural network adds an n-bit value and an m-bit value, the amount of computation of that operation is calculated as (n+m)/2. When a specific operation multiplies an n-bit value by an m-bit value, the amount of computation of that operation may be calculated as n × m. Accordingly, the amount of computation for a specific parameter of the artificial neural network may be the sum of the amounts of computation of all additions and multiplications performed on that parameter.
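The cost rules above translate directly into a small calculation. The sketch below assumes that the operations performed on a parameter are supplied as a list of `(kind, n, m)` tuples; that representation and the function names are hypothetical, while the formulas (n+m)/2 and n × m are the ones stated in the text.

```python
from typing import List, Tuple


def add_cost(n: int, m: int) -> float:
    """Cost of adding an n-bit value and an m-bit value: (n + m) / 2."""
    return (n + m) / 2


def mul_cost(n: int, m: int) -> float:
    """Cost of multiplying an n-bit value by an m-bit value: n * m."""
    return float(n * m)


def parameter_cost(ops: List[Tuple[str, int, int]]) -> float:
    """Sum the costs of all additions and multiplications on one parameter."""
    total = 0.0
    for kind, n, m in ops:
        if kind == "add":
            total += add_cost(n, m)
        elif kind == "mul":
            total += mul_cost(n, m)
        else:
            raise ValueError(f"unknown operation kind: {kind}")
    return total
```

Under these rules, one 8-bit by 8-bit multiplication costs 64 and one 16-bit plus 16-bit addition costs 16, so a parameter involved in exactly those two operations has a cost of 80. Reducing the multiplication to 4 bits by 8 bits lowers its cost from 64 to 32.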

In such bit quantization, a specific parameter or parameter group may be selected by treating as individual parameter groups the weight data or the feature map and activation map data belonging to each layer, each weight kernel belonging to one layer, or each weight data item within one weight kernel.

For reference, the components illustrated in FIG. 22 according to an embodiment of the present disclosure may be implemented as software or as hardware components such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).

However, the term "components" is not limited to software or hardware; each component may be configured to reside in an addressable storage medium or configured to run on one or more processors.

Thus, as an example, a component includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The components and the functions provided within them may be combined into a smaller number of components or further separated into additional components.

Embodiments of the present disclosure may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media and both removable and non-removable media. A computer-readable medium may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery media.

Although the present disclosure has been described herein with reference to some embodiments, it should be understood that various modifications and changes may be made without departing from the scope of the present disclosure as understood by those of ordinary skill in the art to which the present invention pertains. Such modifications and changes should be considered to fall within the scope of the claims appended hereto.

100: artificial neural network 110_1 to 110_N: layer
200: artificial neural network 210: input data
220: input layer 230: hidden layer
240: output layer 400: artificial neural network
410: input data 420: convolutional layer (CONV)
430: subsampling layer (SUBS) 440: fully connected layer (FC)
450: output data 510: input data
520: weight kernel 530, 260: row
540, 270: column 550, 280: depth (channel)
610: first weight kernel 620: first result value
630: first activation map 710: second weight kernel
720: second result value 730: second activation map
810: input data 812: multiple product
814: weight kernel 816: sum
818: output data 820: first traversal section
822: second traversal section 824: first feature map value
826: second feature map value 910: input data
912: multiple product 914: weight kernel
916: sum 918: output data
1010: input data 1012: multiple product
1014: weight kernel 1016: quantization
1018: quantization weight kernel 1020: sum
1022: output data 1024: quantization
1026: quantization activation map 1028: weight kernel quantization
1030: activation map quantization

Claims

1. A bit quantization method for a multi-layered artificial neural network, executed by a system, the method comprising:
calculating an amount of computation for each of a plurality of layers of the multi-layered artificial neural network;
selecting at least one layer from among the plurality of layers in descending order of the amount of computation;
a bit quantization step of reducing, in units of bits, the size of the data representation for a parameter of the selected at least one layer;
determining, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

2. A bit quantization method for a multi-layered artificial neural network, executed by a system, the method comprising:
calculating an amount of computation for each of a plurality of layers of the multi-layered artificial neural network;
selecting at least one layer from among the plurality of layers in ascending order of the amount of computation;
a bit quantization step of reducing, in units of bits, the size of the data representation for a parameter of the selected at least one layer;
determining, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

3. A bit quantization method for a multi-layered artificial neural network, executed by a system, the method comprising:
selecting, with the bit sizes of the data for the parameters of a plurality of layers of the multi-layered artificial neural network fixed at the maximum, at least one layer for which a search for an accuracy change point, which is the point at which the accuracy of the artificial neural network begins to deteriorate in each layer, has not been performed;
a bit quantization step of reducing, in units of bits, the size of the data representation for a parameter of the selected at least one layer;
determining, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

4. The method according to any one of claims 1 to 3, further comprising:
when the accuracy of the multi-layered artificial neural network is less than the target value, determining, as the final number of bits for the parameter of the selected layer, the size of the data representation for the parameter of the selected layer at which the accuracy was greater than or equal to the target value.

5. The method of claim 4, further comprising:
selecting, from among the plurality of layers, at least one layer for which the final number of bits for the parameter has not been determined, and repeatedly executing the bit quantization step to determine the final number of bits for the selected at least one layer.

6. The method according to any one of claims 1 to 3,
wherein the bit quantization step reduces the size of the data representation in units of 1 bit.

7. The method according to any one of claims 1 to 3,
wherein the parameter of the selected at least one layer comprises at least one of weight data, feature map data, and activation map data.

8. The method according to any one of claims 1 to 3,
wherein the bit quantization step performs bit quantization so as to reduce the storage size of at least one of a buffer memory, a register memory, and a cache memory configured to store the parameter of the selected at least one layer.

9. The method according to any one of claims 1 to 3,
wherein the number of bits of a multiplier and an adder of a processing unit that processes the bit-quantized multi-layered artificial neural network is designed to correspond to the number of bits resulting from the bit quantization step.

10. A bit quantization method for a multi-layered artificial neural network, executed by a system including a processor, the method comprising:
calculating, through the processor, an amount of computation for each layer of the multi-layered artificial neural network;
selecting, through the processor, weight data of a layer in descending order of the amount of computation of the layers or in ascending order of the amount of computation of the layers;
a bit quantization step of reducing, through the processor, the size of the data representation of the selected weight data in units of bits;
determining, through the processor, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing, through the processor, the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

11. A bit quantization method for a multi-layered artificial neural network, executed by a system including a processor, the method comprising:
calculating, through the processor, an amount of computation for each layer of the multi-layered artificial neural network;
selecting, through the processor, feature map data of a layer in descending order of the amount of computation of the layers or in ascending order of the amount of computation of the layers;
a bit quantization step of reducing, through the processor, the size of the data representation of the selected feature map data in units of bits;
determining, through the processor, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing, through the processor, the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

12. A bit quantization method for a multi-layered artificial neural network, executed by a system including a processor, the method comprising:
calculating, through the processor, an amount of computation for each layer of the multi-layered artificial neural network;
selecting, through the processor, activation map data of a layer in descending order of the amount of computation of the layers or in ascending order of the amount of computation of the layers;
a bit quantization step of reducing, through the processor, the size of the data representation of the selected activation map data in units of bits;
determining, through the processor, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing, through the processor, the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

13. A bit quantization method for a multi-layered artificial neural network, executed by a system including a processor, the method comprising:
selecting, through the processor, with the bit sizes of the data for parameters including one of weight data, feature map data, and activation map data of each layer of the multi-layered artificial neural network fixed at the maximum, one parameter for which a search for an accuracy change point, which is the point at which the accuracy of the artificial neural network begins to deteriorate in each layer, has not been performed;
a bit quantization step of reducing, in units of bits, the size of the data representation for the selected one parameter;
determining, through the processor, after the bit quantization step, whether the accuracy of the multi-layered artificial neural network is greater than or equal to a target value; and
performing, through the processor, the bit quantization step again when the accuracy of the multi-layered artificial neural network is greater than or equal to the target value.

14. The method according to any one of claims 10 to 13, further comprising:
when the accuracy of the multi-layered artificial neural network is less than the target value, determining, as the final number of bits for the selected at least one parameter, the size of the data representation for the selected at least one parameter at which the accuracy was greater than or equal to the target value.

15. The method of claim 14, further comprising:
selecting, from among the plurality of parameters, at least one parameter for which the final number of bits has not been determined, and repeatedly executing the bit quantization step to determine the final number of bits for the selected at least one parameter.

16. The method according to any one of claims 10 to 13,
wherein the bit quantization step reduces the size of the data representation in units of 1 bit.

17. The method according to any one of claims 10 to 13,
wherein the bit quantization step performs bit quantization so as to reduce the storage size of at least one of a buffer memory, a register memory, and a cache memory configured to store the selected at least one parameter.

18. The method according to any one of claims 10 to 13,
wherein the number of bits of a multiplier and an adder of a processing unit that processes the bit-quantized multi-layered artificial neural network is designed to correspond to the number of bits resulting from the bit quantization step.

19. The method of claim 13, further comprising:
setting the number of bits of the parameter of each layer of the multi-layered artificial neural network to be 2 bits larger than the number of bits immediately before the accuracy change point.