KR102153791B1

KR102153791B1 - Digital neural, artificial neuron for artificial neuron network and inference engine having the same

Info

Publication number: KR102153791B1
Application number: KR1020180099090A
Authority: KR
Inventors: 김시호; 박현빈
Original assignee: 연세대학교 산학협력단
Priority date: 2017-12-20
Filing date: 2018-08-24
Publication date: 2020-09-08
Also published as: KR20190074938A

Abstract

A digital neuron for an artificial neural network, an artificial neuron, and an inference engine including the same according to an embodiment of the present invention receive integerized weights and perform multiplication with an input value by summing partial products obtained through bit-shift operations. As a result, they can be implemented as compact digital hardware, reduce power consumption, and greatly improve operation speed.

Description

Digital neurons for artificial neural networks, artificial neurons, and inference engines including the same {DIGITAL NEURAL, ARTIFICIAL NEURON FOR ARTIFICIAL NEURON NETWORK AND INFERENCE ENGINE HAVING THE SAME}

The present invention relates to a digital neuron for an artificial neural network, an artificial neuron, and an inference engine including the same. In particular, it relates to a digital neuron, an artificial neuron, and an inference engine including the same that simplify a large number of multiplication and addition operations on an integer basis, thereby improving operation speed and reducing power consumption, and that can be implemented as digital hardware occupying a small area.

Recently, research has been actively conducted on deep learning using artificial neural networks, which are configured to process information in a manner similar to the brain by mimicking the way the human brain recognizes patterns. Deep learning is applied to fields such as object classification, object detection, speech recognition, and natural language processing, and its range of application continues to expand.

An artificial neural network using a deep learning technique generally includes a plurality of artificial neurons, and the artificial neurons are configured to perform a dot product operation on matrices (or vectors). That is, the artificial neurons constitute the inference engine of the artificial neural network and function as arithmetic units.

In addition, before an artificial neural network is used for inference, it must be trained on a vast amount of training data in order to exhibit the required pattern recognition performance. For this reason, a strategy has generally been adopted in which training is performed on a server or on a device equipped with a high-performance general-purpose computation accelerator such as a GPU (Graphics Processing Unit) or TPU (Tensor Processing Unit), while inference in actual use is performed on an embedded system. The artificial neural network performs a large number of addition and multiplication operations in both training and inference.

However, when an artificial neural network is used for inference rather than training, the processing speed varies greatly depending on the hardware performance of the system to which it is applied. In an embedded system in particular, constraints such as size and power consumption make it difficult to use a high-performance general-purpose computation accelerator for the artificial neural network. Therefore, in an embedded system, the artificial neurons of the artificial neural network are generally implemented in the form of an inference engine, which is digital hardware designed separately for inference. Even so, the size, power consumption, and operation speed of the embedded system also constrain the design of the inference engine. This makes it difficult to apply artificial neural networks to small devices and limits their fields of application.

Accordingly, there is a need for an artificial neuron optimized for an inference engine that can be implemented in a small area, consumes little power, and significantly improves the speed of the large number of operations required by an artificial neural network.

Korean Patent Laid-Open Publication No. 10-2016-0143505 (published December 14, 2016)

An object of the present invention is to provide a digital neuron, an artificial neuron, and an inference engine including the same, implemented as digital hardware for an artificial neural network, that can be made compact, consume little power, and improve operation speed.

Another object of the present invention is to provide a digital neuron for an artificial neural network, an artificial neuron, and an inference engine including the same that receive integerized weights and perform multiplication with integer input values using a limited number of bit-shift and addition operations, thereby improving operation speed, consuming little power, and reducing implementation area.

Another object of the present invention is to provide a digital neuron, an artificial neuron, and an inference engine including the same that can improve operation speed by computing the multiplications of a plurality of input values and a plurality of weights simultaneously in parallel.

Still another object of the present invention is to provide a digital neuron for an artificial neural network, an artificial neuron, and an inference engine including the same that can be implemented in a small area by handling the sign bits separately when the partial products are summed.

Still another object of the present invention is to provide a digital neuron, an artificial neuron, and an inference engine including the same in which the size of the weight filter is adjustable: a plurality of digital neurons are included so that weight filters at or below a predetermined size are processed in parallel, while for weight filters exceeding the predetermined size the weights of the filter are divided and processed simultaneously.

Still another object of the present invention is to provide an inference engine that allows the layers of a multi-layer artificial neural network to share digital neurons, so that the layers can be implemented with a minimum number of digital neurons.

To achieve the above objects, a digital neuron according to an embodiment of the present invention is implemented as digital hardware that receives an input vector including a plurality of input values in an integer format with a predetermined number of bits and a weight vector including a plurality of weights in an integer format with a predetermined number of bits, and performs a dot product operation. The digital neuron includes a neural element, and the neural element includes: a plurality of weight decomposition units, each of which receives a corresponding weight from the weight vector, decomposes the integer-format weight into a sum of partial products (R·2ⁿ), each expressed as the product of a coefficient (R) having one of the values -1, 0, and 1 and a power of 2 (2ⁿ), and outputs a control signal according to the coefficient (R) and exponent (n) of each partial product; a plurality of partial product generators, each of which receives a corresponding input value from the input vector and, in response to the control signal, shifts the input value by the exponent (n) toward the upper bits to output a plurality of partial products; and a partial product adder that sums the plurality of partial products in parallel to output a channel operation value.
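The decomposition of a weight into terms R·2ⁿ and the resulting shift-and-add multiplication can be illustrated in software. The following is a minimal sketch, not the patented circuit: it assumes the non-adjacent form (NAF) as the signed-digit recoding, which the text does not specify, and the function names `decompose_weight` and `shift_add_multiply` are hypothetical.

```python
def decompose_weight(w: int) -> list[tuple[int, int]]:
    """Split integer w into terms (R, n), R in {-1, 1}, with w == sum(R * 2**n).

    Assumption: uses the non-adjacent form (NAF) as the recoding.
    Terms whose coefficient would be 0 are simply left out of the list.
    """
    terms = []
    n = 0
    while w != 0:
        if w & 1:
            r = 2 - (w & 3)  # +1 when w % 4 == 1, -1 when w % 4 == 3
            terms.append((r, n))
            w -= r
        w >>= 1
        n += 1
    return terms


def shift_add_multiply(x: int, w: int) -> int:
    """Compute x * w using only shifts, negations, and additions."""
    total = 0
    for r, n in decompose_weight(w):
        partial = x << n  # partial product magnitude: x shifted by exponent n
        total += partial if r == 1 else -partial
    return total


if __name__ == "__main__":
    print(decompose_weight(7))        # 7 = -1 * 2**0 + 1 * 2**3
    print(shift_add_multiply(5, 7))   # same value as 5 * 7
```

With this recoding, a weight such as 7 needs two partial products (8 - 1) instead of the three (4 + 2 + 1) that a plain binary expansion would need, which is the motivation for allowing the coefficient -1.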

Each of the plurality of partial product generators may additionally perform a one's complement operation or a zero-conversion operation on the partial products according to the coefficient (R).

The digital neuron may further include a negative partial product counter that counts the number of one's complement operations performed by the plurality of partial product generators to obtain a count value, and the partial product adder may calculate the channel operation value by summing the count value together with the plurality of partial products.

The partial product adder may calculate the channel operation value by summing the count value with the partial products excluding their sign bits, and then adding the two's complement of the count value at the sign bit position of the result.
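The arithmetic behind the negative partial product counter follows from the identity -v = ~v + 1: each negated term is produced as a one's complement, and the missing +1s are added once as a count. The sketch below works through that bookkeeping under stated assumptions: the shifted inputs are non-negative and fit in `width` bits, and the name `sum_partial_products` and its parameters are hypothetical, not taken from the source.

```python
def sum_partial_products(values, negate, width):
    """Sum +v or -v for each v, handling sign bits separately.

    values: non-negative shifted inputs, each < 2**width (assumption)
    negate: flags, True where the coefficient R is -1
    width:  number of magnitude bits below the sign bit
    """
    mask = (1 << width) - 1
    low_sum = 0
    neg_count = 0
    for v, neg in zip(values, negate):
        if neg:
            low_sum += ~v & mask  # one's complement with the sign bit excluded
            neg_count += 1
        else:
            low_sum += v
    # One +1 per negated term completes the two's complement (-v == ~v + 1).
    low_sum += neg_count
    # Each negated term's sign bit weighs -2**width; combined, this is the
    # two's complement of the count placed at the sign bit position.
    return low_sum + ((-neg_count) << width)


if __name__ == "__main__":
    # 3 - 12 - 5 computed without per-term sign extension
    print(sum_partial_products([3, 12, 5], [False, True, True], width=4))
```

The point of the separation is that no individual partial product has to be sign-extended to the full accumulator width; only the low bits enter the adder, and the sign contribution is applied once.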

Each of the plurality of partial product generators may include a plurality of partial product calculators that are driven according to the control signal and each calculate and output a partial product. Each partial product calculator may include: a bit shifter that shifts the input value by the exponent (n); a one's complement operator that, when the coefficient (R) is -1, performs a one's complement operation on the output of the bit shifter and outputs the result as the partial product; and a zero multiplier that, when the coefficient (R) is 0, sets the partial product to 0 in response to the control signal.

The partial product adder may be composed of a plurality of stages, each including a plurality of full adders. The first stage receives the plurality of partial products, groups the values at the same bit position of a preset number of different partial products, and adds them. Each remaining stage groups and adds the sums of the previous stage, and the final stage calculates the channel operation value by adding the count value to the lower bits of the sum from the previous stage.

The partial product adder may further include a two's complement operator that receives the count value and performs a two's complement operation to obtain a count complement value, and the final stage may extend the sign bits of the channel operation value by adding the count complement value to the upper bits of the sum from the previous stage.
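A multi-stage adder built from full adders can be modeled in software as a carry-save reduction, where each row of full adders turns three operands into a sum word and a carry word. This is a sketch of that general technique only; the grouping sizes and stage layout of the adder described above are not given in the text, and `full_adder_row` and `staged_sum` are hypothetical names.

```python
def full_adder_row(a: int, b: int, c: int) -> tuple[int, int]:
    """Apply a full adder at every bit position of three operands.

    Returns (sum_word, carry_word) with a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def staged_sum(operands):
    """Reduce operands stage by stage; returns (total, number_of_stages)."""
    ops = list(operands)
    stages = 0
    while len(ops) > 2:
        reduced = []
        # Each complete group of three operands becomes two operands.
        for i in range(0, len(ops) - 2, 3):
            reduced.extend(full_adder_row(ops[i], ops[i + 1], ops[i + 2]))
        leftover = len(ops) % 3
        if leftover:
            reduced.extend(ops[-leftover:])  # pass ungrouped operands through
        ops = reduced
        stages += 1
    # The final stage is an ordinary addition of the remaining operands.
    return sum(ops), stages


if __name__ == "__main__":
    partial_products = [3, 12, 40, 7, 96, 1, 18, 64, 5]
    negative_count = 2  # in this sketch, treated as one more operand
    print(staged_sum(partial_products + [negative_count]))
```

Because no carry propagates inside a stage, every stage has the delay of a single full adder, and only the last addition needs a carry-propagating adder.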

The digital neuron may include a single neural element that receives a two-dimensional weight filter having a plurality of weights and a two-dimensional input feature map having a plurality of input values, and the partial product adder may receive a predetermined bias value and add it to the channel operation value to output the output value of the digital neuron.

The digital neuron may include a plurality of neural elements, each of which receives a corresponding two-dimensional weight filter among the plurality of two-dimensional weight filters that make up a three-dimensional weight filter and a corresponding two-dimensional input feature map among the plurality of two-dimensional input feature maps that make up a three-dimensional input feature map. The digital neuron may further include a channel adder that adds the channel operation values output from the neural elements and a predetermined bias value to output the output value of the digital neuron.

To reduce the number of partial products, the weight may be applied after being replaced with a designated adjacent integer, so that predetermined integers are excluded.
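One way to picture the weight replacement is to cap the number of non-zero terms a weight may need and move any weight above the cap to the nearest integer that satisfies it. The sketch below is an illustration under assumptions: the rule for choosing which integers are excluded is not stated in the text, so the cap `max_terms`, the tie-break toward the smaller integer, and the names `nonzero_terms` and `snap_weight` are all hypothetical.

```python
def nonzero_terms(w: int) -> int:
    """Number of terms R * 2**n (R = +1 or -1) needed for w in non-adjacent form."""
    count = 0
    while w != 0:
        if w & 1:
            w -= 2 - (w & 3)  # subtract the signed digit (+1 or -1)
            count += 1
        w >>= 1
    return count


def snap_weight(w: int, max_terms: int) -> int:
    """Return the integer nearest to w that needs at most max_terms terms.

    Assumption: ties are resolved toward the smaller integer.
    """
    distance = 0
    while True:
        for candidate in (w - distance, w + distance):
            if nonzero_terms(candidate) <= max_terms:
                return candidate
        distance += 1


if __name__ == "__main__":
    # 11 = 16 - 4 - 1 needs three terms; 10 = 8 + 2 needs two.
    print(nonzero_terms(11), snap_weight(11, max_terms=2))
```

Every multiplication then costs at most `max_terms` shifts and additions, at the price of a small weight error for the replaced integers.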

To achieve the above objects, a digital neuron according to another embodiment of the present invention includes a neural element implemented as digital hardware that receives an input vector including a plurality of input values in an integer format with a predetermined number of bits and a weight vector including a plurality of weights in an integer format with a predetermined number of bits, and performs a dot product operation. The neural element includes: a plurality of weight decomposition units, each of which receives a corresponding weight from the weight vector and decomposes the integer-format weight into a sum of partial products (R·2ⁿ), each expressed as the product of a coefficient (R) having one of the values -1, 0, and 1 and a power of 2 (2ⁿ); a plurality of partial product generators, each of which receives a corresponding input value from the input vector, a negative input value that is the two's complement of the input value, and a voltage corresponding to the value 0, selects the value corresponding to the coefficient (R), and shifts the selected value by the exponent (n) toward the upper bits to output a plurality of partial products; and a partial product adder that sums the plurality of partial products in parallel to output a channel operation value.

Each of the plurality of partial product generators may include a plurality of partial product calculators that each calculate and output a partial product. Each partial product calculator may include: a multiplexer (mux) that receives the input value, the negative input value that is the two's complement of the input value, and a voltage corresponding to the value 0, and selects and outputs the value corresponding to the coefficient (R); and a bit shifter that shifts the value selected by the mux by the exponent (n).
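In this variant the order of operations is select first, then shift: a multiplexer picks the input, its negation, or zero according to R, and the shifter applies the exponent. A minimal software analogue follows, with the hypothetical names `partial_product_mux` and `multiply_with_terms`; the term list passed in is assumed to come from a weight decomposition such as 6 = 8 - 2.

```python
def partial_product_mux(x: int, r: int, n: int) -> int:
    """Select x, -x, or 0 according to coefficient r, then shift left by n."""
    if r not in (-1, 0, 1):
        raise ValueError("coefficient R must be -1, 0, or 1")
    candidates = {1: x, -1: -x, 0: 0}  # the three mux inputs
    return candidates[r] << n


def multiply_with_terms(x: int, terms) -> int:
    """Sum the partial products for a weight given as (R, n) terms."""
    return sum(partial_product_mux(x, r, n) for r, n in terms)


if __name__ == "__main__":
    # Assumed decomposition of the weight 6: +1 * 2**3 and -1 * 2**1
    print(multiply_with_terms(5, [(1, 3), (-1, 1)]))  # same value as 5 * 6
```

Because the negative input is already a full two's complement value, this form needs no separate count of negated terms, at the cost of producing the negated input up front.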

Accordingly, in the digital neuron for an artificial neural network, the artificial neuron, and the inference engine including the same according to the present invention, the digital neuron performs multiplication by bit shifting and summation, so that it can be implemented as compact digital hardware, reduce power consumption, and greatly improve operation speed. In addition, the digital neuron provides negative partial products using a two's complement operation so that the number of partial products is minimized, which further improves operation speed and further reduces size and power consumption.

Multiplication is performed on a plurality of weights and a plurality of input values simultaneously in parallel, which further improves operation speed.

The digital neuron also simplifies the circuit further by replacing the two's complement operations needed for the negative partial products with one's complement operations followed by a single addition of the number of negative partial products. Furthermore, the digital neuron handles the sign bits separately when the partial products are summed, which greatly increases operation speed while minimizing size and power consumption.

The inference engine includes a plurality of digital neurons so that weight filters at or below a predetermined size are processed in parallel, while for weight filters exceeding the predetermined size the weights of the filter are divided for processing. Weight filters of various sizes can therefore be used without changing the hardware structure.

In addition, the inference engine of a multi-layer artificial neural network allows the layers to share digital neurons, so that the layers are implemented with a minimum number of hardware layers. This minimizes the number of digital neurons in the artificial neural network and makes it easy to implement a complex artificial neural network with a minimal circuit configuration.

FIG. 1 shows an example of a deep neural network among artificial neural networks.
FIG. 2 shows an example of a mathematical model of an artificial neuron of an artificial neural network.
FIG. 3 is a block diagram of a digital neuron of an artificial neural network according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the error that arises when a weight is expressed with a specified number of partial products.
FIG. 5 shows an example of the detailed configuration of the neural element in the artificial neuron of FIG. 3.
FIG. 6 shows an example of the detailed configuration of the partial product adder of FIG. 5.
FIG. 7 is a diagram for explaining the operating concept of the two's complement operator of FIG. 6.
FIG. 8 shows another example of the neural element of the digital neuron.
FIG. 9 is a block diagram of a digital neuron of an artificial neural network according to another embodiment of the present invention.
FIG. 10 shows an example of the detailed configuration of the neural element in the artificial neuron of FIG. 9.
FIG. 11 shows an example of the detailed configuration of the partial product adder of FIG. 10.
FIG. 12 shows the result of simulating the power consumption of the digital neuron according to the present embodiment.
FIG. 13 shows an example of an artificial neuron structure in which the weight filter size can be varied using digital neurons according to an embodiment of the present invention.
FIG. 14 shows a schematic configuration of an inference engine of an artificial neural network according to another embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the contents described in those drawings.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. The present invention may, however, be implemented in various different forms and is not limited to the embodiments described. Parts irrelevant to the description are omitted so that the present invention is described clearly, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "...er", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 shows an example of a convolutional neural network among artificial neural networks, and FIG. 2 shows an example of a mathematical model of an artificial neuron of an artificial neural network.

As shown in FIG. 1, an artificial neural network, and in particular a deep neural network (DNN), generally includes a plurality of layers. Each of the artificial neurons included in each layer receives the output of the input layer or the previous layer as an input vector (X) and performs a predetermined operation.

Each of the artificial neurons in the layers may be configured to operate on the applied input vector (X) in a predetermined manner using a weight vector (w) specified in advance, and to output the result.

Referring to FIG. 2, the elements (X₁, X₂, …, X_N) of the input vector (X) are applied to each artificial neuron as input values. The weights (w₁, w₂, …, w_N), which are the elements of the weight vector (w), are values that adjust the importance of the corresponding input values among the input values (X₁, X₂, …, X_N) of the input vector (X).

The weights (w₁, w₂, …, w_N) of the weight vector (w) may be adjusted by error backpropagation during the training of the deep neural network and thereby set to various values. The weight vector (w) determined by training may be stored in advance in each layer or in a separate memory.

For example, the artificial neuron multiplies each of the input values (X₁, X₂, …, X_N) of the input vector (X) by the corresponding weight among the weights (w₁, w₂, …, w_N) of the weight vector (w), and sums the products together with a bias value (b).

That is, the artificial neuron can perform an operation that takes the inner product of the input vector (X) with a predetermined weight vector (w) and outputs the result. The process of this mathematical model of the artificial neuron can be expressed as Equation 1.

$$\text{Output} = \sum_{i=1}^{N} X_i \, w_i + b \qquad \text{(Equation 1)}$$

Here, X_i, w_i, b, and Output denote the input value, the weight, the bias value, and the output value, respectively.
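The weighted sum described above (each input multiplied by its weight, with the bias added to the total) translates directly into a few lines of code. This is a minimal sketch with the hypothetical name `neuron_output`; it uses plain integers and leaves out any activation function.

```python
def neuron_output(inputs, weights, bias):
    """Return the sum of X_i * w_i over all i, plus the bias b."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights)) + bias


if __name__ == "__main__":
    # 1*4 + 2*(-5) + 3*6 + 7
    print(neuron_output([1, 2, 3], [4, -5, 6], bias=7))
```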

Equation 1 assumes that the input vector (X) and the weight vector (w) are each one-dimensional vectors, but they may be multidimensional (n-dimensional, where n is a natural number) and may be expressed as matrices. In this case, the weight vector (w) may have a dimension corresponding to that of the input vector (X). For example, when the input vector (X) is a two-dimensional matrix, the weight vector (w) may also be a two-dimensional matrix.

FIG. 1 shows a convolutional neural network (CNN) as an example of an artificial neural network. The CNN is a neural network mainly used for image recognition, speech recognition, natural language processing, handwriting recognition, and the like. FIG. 1 shows the schematic structure of LeNet-5, a well-known representative convolutional neural network.

As shown in FIG. 1, the convolutional neural network may include a feature extraction unit, which extracts features of an input image, and a classification unit.

The feature extraction unit of the convolutional neural network of FIG. 1 may include at least one convolution layer (C1, C2, C3) and at least one pooling layer (S1, S2, S3), and the classification unit may include at least one fully connected layer (FC1, FC2).

FIG. 1 shows, as an example, a convolutional neural network with three convolution layers (C1, C2, C3), three pooling layers (S1, S2, S3), and two fully connected layers (FC1, FC2), but the number of layers may be varied.

컨볼루션 레이어(C1, C2, C3)는 이전 단의 레이어에서 출력되는 적어도 하나의 출력 특징 맵(또는 입력 영상)을 입력 벡터(X)로서 인가받아 기지정된 가중치 벡터(w)와 컨볼루션 연산을 수행하여 출력 특징 맵을 생성한다.The convolution layer (C1, C2, C3) receives at least one output feature map (or input image) output from the previous layer as an input vector (X) and performs a predetermined weight vector (w) and a convolution operation. To generate an output feature map.

컨볼루션 신경망(CNN)에서는 입력 벡터(X)를 입력 특징 맵(Input feature map)이라 하며, 가중치 벡터(w)를 가중치 필터(weight filter)라 한다. 즉 컨볼루션 신경망(CNN)의 컨볼루션 레이어는 입력 특징 맵을 인가받아 가중치 필터로 필터링하여 출력 특징 맵을 출력한다.In a convolutional neural network (CNN), the input vector X is called an input feature map, and the weight vector w is called a weight filter. That is, the convolutional layer of the convolutional neural network (CNN) receives an input feature map, filters it with a weight filter, and outputs an output feature map.

The pooling layers (S1, S2, S3) perform sub-sampling on the values output from the preceding layers (C1, C2, C3) in order to reduce overfitting of the artificial neural network and improve its performance.

The fully connected layers (FC1, FC2) classify each of the feature maps extracted by the feature extraction unit into a predetermined class.

Each convolution layer (C1, C2, C3) may include a plurality of digital neurons, each of which receives its corresponding input feature map or maps from among the at least one input feature map and performs a convolution operation with a different pre-stored weight filter. Each of the convolution layers may be configured to include a different number of artificial neurons.

An artificial neuron of the convolution layers (C1, C2, C3) performs a convolution operation between the at least one applied input feature map and a predetermined weight filter. During the convolution operation, the artificial neuron moves the weight filter across the input feature map and multiplies the input values in the region onto which the weight filter is projected by the corresponding weights. It then sums the multiplication results and a bias value, applies an activation function, and outputs the result. The output value is assigned to the position in the output feature map that corresponds to the position at which the weight filter is projected onto the input feature map. That is, the output feature map may be obtained as a vector expressed in matrix form.

Accordingly, an artificial neuron of the convolution layers (C1, C2, C3) can perform the convolution operation by taking the dot product while moving the predetermined weight filter (w) over the input feature map (X), and the process of the artificial neuron of a convolution layer can be expressed as in Equation 2.

Here, X, w, b, F, and Output denote the input value, the weight, the bias value, the activation function, and the output value, respectively. R and S denote the distances over which the weight vector (w) can move on the input vector (X) in the row direction and the column direction, respectively, and x and y denote the position of the weight vector (w) relative to the input vector (X).

Referring to Equation 2, it can be seen that an artificial neuron of a convolutional neural network (CNN), as in Equation 1, also computes the dot product of the input vector (X) and the weight vector (w).
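The sliding dot product described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `conv2d_single`, a stride of 1, no padding, and ReLU as the activation function are all choices made here and are not specified in the text.

```python
def conv2d_single(feature_map, weight_filter, bias=0,
                  activation=lambda v: max(v, 0)):
    """Slide weight_filter over feature_map (stride 1, no padding; assumed)."""
    fh, fw = len(weight_filter), len(weight_filter[0])
    rows = len(feature_map) - fh + 1   # vertical positions of the filter
    cols = len(feature_map[0]) - fw + 1  # horizontal positions of the filter
    output = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            # Dot product of the filter with the region it is projected onto.
            acc = bias
            for i in range(fh):
                for j in range(fw):
                    acc += feature_map[y + i][x + j] * weight_filter[i][j]
            out_row.append(activation(acc))
        output.append(out_row)
    return output


feature_map = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
weight_filter = [[1, 0],
                 [0, 1]]
result = conv2d_single(feature_map, weight_filter, bias=1)
print(result)
```

Each output element is one dot product plus the bias, which is why the same multiply-and-accumulate hardware can serve both the dot product and the convolution.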

However, owing to the nature of the convolution operation, an artificial neuron of the convolution layers (C1, C2, C3) repeats the multiplication on the input feature map a number of times corresponding to the size of the weight filter and accumulates the results of the repeated multiplications.

In general, in a circuit implementation a multiplication operation requires more power and computation time than an addition operation, and in a chip implementation in particular it requires a larger area. Therefore, a digital neuron implemented as a digital circuit, serving as a computation accelerator for an artificial neural network applied to an embedded system, should be configured to perform multiplication efficiently.

FIG. 3 is a block diagram of a digital neuron of an artificial neural network according to an embodiment of the present invention.

Referring to FIG. 3, the digital neuron according to the present embodiment is configured to include at least one neural element (NE).

The neural element (NE) receives an input feature map and a weight filter, performs a predetermined operation on them, and outputs the result. The predetermined operation may be, for example, a dot product operation or a convolution operation performed by accumulating dot product operations.

As described above, the convolution operation is the cumulative sum of the dot products of the input feature map and the weight filter, and each dot product is calculated as the sum of the products of the input values (X_i), which are the elements of the input feature map, and the weights (w_i = w₁, w₂, …, w_N), which are the elements of the weight filter.

Here, the input values (X_i) are applied in an integer format, and the weights (w_i) of the weight filter may also be applied as integer-format weights having a predetermined number of bits. For example, a weight (w_i) may be a weight in a 5-bit integer format.

In general, the training process of an artificial neural network yields weights in a 32-bit single-precision floating-point format (FP32) or a 16-bit half-precision floating-point format (FP16). However, a multiplication circuit for multiplying floating-point numbers has a very complex structure, consumes a large amount of power, and requires a large area.

In contrast, when the weight (w_i) is an integer, the multiplication of an integer-format input value by the weight can be performed as a combination of bit shift operations and addition operations. A bit shifter that performs the bit shift operation and an adder that performs the addition operation can be implemented with a much simpler structure than a multiplication circuit, so they can be made compact and can greatly reduce power consumption. In particular, the operation speed is dramatically higher than that of a multiplication circuit.
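As a minimal sketch of the shift-and-add multiplication described above, the following Python function adds one shifted copy of the input for every weight bit that is 1. The name `shift_add_multiply` and the `bits` parameter are illustrative assumptions, and the sketch handles only non-negative weights; signed weights are treated later in the description.

```python
def shift_add_multiply(x, w, bits=5):
    """Multiply integer x by a non-negative integer weight w (w < 2**bits)
    using only bit shifts and additions."""
    if not 0 <= w < (1 << bits):
        raise ValueError("weight out of range for this sketch")
    acc = 0
    for n in range(bits):
        if (w >> n) & 1:      # bit n of the weight is 1
            acc += x << n     # partial product: x shifted left by n bits
    return acc


print(shift_add_multiply(6, 7))   # 6*4 + 6*2 + 6*1
```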

However, it must be considered whether the artificial neural network can exhibit the same performance when the digital neuron uses integer-format weights instead of floating-point-format weights. Accordingly, in the present embodiment, the performance of the artificial neural network was simulated both with floating-point-format weights and with integerized weights. The simulation was performed with LeNet-5, a representative convolutional neural network developed for recognizing handwritten digits, using the MNIST (Modified National Institute of Standards and Technology) database. The simulation results confirmed that the handwritten digit recognition accuracy of LeNet-5 remains the same when weights in an 8-bit integer format are used instead of weights in the 32-bit single-precision floating-point format. That is, it was confirmed that 8-bit integer-format weights, which have fewer bits than 32-bit floating-point-format weights, yield the same accuracy, so that the performance of the artificial neural network is maintained.

Accordingly, in the present embodiment, the digital neuron receives integerized weights so that the multiplication operation can be performed as a combination of bit shift operations and addition operations.

In particular, in the present embodiment, the digital neuron may receive weights (w_i) in an integer format with fewer bits (for example, 5 bits) than the 8-bit integer format that exhibits the same accuracy as the 32-bit single-precision floating-point format.

Artificial neural networks are mainly used for pattern recognition and classification, and such networks do not require arithmetic calculations of very high precision. Rather, a pooling layer is frequently added to an artificial neural network to prevent problems such as overfitting, in which the network is excessively optimized to the training data, so that diverse input data can be handled flexibly.

Accordingly, in the present embodiment, in order to implement the circuit of the neural element (NE) more simply and to reduce power consumption, an additional simulation was performed for the case in which integer weights with fewer bits (here, 5 bits as an example) than the 8-bit integer format, whose accuracy matches that of 32-bit single-precision floating-point weights, are applied. The simulation confirmed that the performance degradation of the artificial neural network is negligible even when the digital neuron receives weights in the 5-bit integer format. Therefore, the present embodiment is described on the assumption that the digital neuron receives weights (w_i) in an integer format with fewer bits than the 8-bit integer that exhibits the same performance as the 32-bit single-precision floating-point format.

Here, the integer-format weights (w_i) with a small number of bits may be obtained, for example, by having the training device of the artificial neural network multiply each weight in the 32-bit single-precision floating-point format by a predetermined integerization value (for example, 16 (binary 10000)) and then truncate or round off the fractional part.
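The integerization step can be sketched as follows. The function name `integerize`, the `mode` switch, and in particular the clamping of out-of-range results to the 5-bit signed range are assumptions added for illustration; the text states only the scaling by 16 followed by truncation or rounding.

```python
import math


def integerize(w_float, scale=16, bits=5, mode="round"):
    """Scale a floating-point weight and drop or round the fractional part.

    Clamping to the signed range is an assumption of this sketch.
    Note that Python's round() rounds exact halves to the nearest even integer.
    """
    scaled = w_float * scale
    q = round(scaled) if mode == "round" else math.trunc(scaled)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # -16 .. 15 for 5 bits
    return max(lo, min(hi, q))


print(integerize(0.44))                  # 0.44 * 16 = 7.04
print(integerize(-0.3))                  # -0.3 * 16 = -4.8, rounded
print(integerize(-0.3, mode="truncate"))  # -4.8, fractional part dropped
```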

In addition, the weight (w_i) may be applied as a signed integer rather than a positive integer. That is, a weight (w_i) in the 5-bit signed integer format may be applied as a value in the range of -16 (10000) to 15 (01111).
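For reference, the two's complement bit patterns of the 5-bit signed range can be checked with a short Python helper. The name `to_twos_complement` is hypothetical and used only for this illustration.

```python
def to_twos_complement(value, bits=5):
    """Return the two's complement bit string of a signed integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Masking maps negative values onto their two's complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


for v in (-16, -1, 0, 7, 15):
    print(v, to_twos_complement(v))
```

In this encoding the most negative value, -16, is 10000, the largest value, 15, is 01111, and 11111 encodes -1.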

In the present embodiment, since the weight (w_i) has an integer value with a predetermined number of bits (here, 5 bits as an example), the neural element (NE) can perform the multiplication of the input value (X) by the weight (w_i) as a combination of bit shift operations and an addition operation that sums the partial products produced by the bit shift operations. Here, the bit shift operation computes each partial product by shifting the input value (X) according to the value of each bit of the weight (w_i).

Because the multiplication is thus performed with bit shifts and additions, the neural element (NE) improves the operation speed of the digital neuron, reduces power consumption, and allows a compact implementation.

Referring to FIG. 3, the neural element (NE) includes N partial product generators (MBS), where N is a natural number, and a partial product adder (PSUM).

The number (N) of partial product generators (MBS) is equal to the number of elements of the two-dimensional weight filter, that is, the number of weights (w_i = w₁, w₂, …, w_N). For example, when the weight filter is a 5 × 5 matrix (N = 25), the neural element (NE) includes 25 partial product generators (MBS).

The N partial product generators (MBS) generate and output the partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)) of the corresponding input values (X) of the applied two-dimensional input feature map and the corresponding weights (w_i) of the two-dimensional weight filter.

Since the weight (w_i) has an integer value with a predetermined number of bits, each partial product generator (MBS) carries out the multiplication of the input value (X) by the weight (w_i) by bit-shifting the input value (X) according to the bit values of the weight (w_i), thereby computing two partial products each ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)). Because the partial product generators (MBS) compute the partial products in parallel, all of the partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)) can be computed simultaneously.

In the present embodiment, it is assumed that each of the N partial product generators (MBS) generates and outputs two partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)); the reason each partial product generator (MBS) generates two partial products is described later.

The partial product adder (PSUM) adds the partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)) delivered two at a time from each of the N partial product generators (MBS) together with a bias value (BS), and outputs an output value (Out) that is the result of the dot product of the two-dimensional input feature map and the two-dimensional weight filter. The output value (Out) of the partial product adder (PSUM) is the result of a dot product rather than of a convolution because it is the result of operating the two-dimensional weight filter on only a part of the two-dimensional input feature map. The bias value (BS) is the bias value (b) of Equations 1 and 2.

A conventional GPU or TPU computes the result by multiplying one weight (w_i) of the weight filter by the corresponding input value (X) in each cycle and accumulating the products. The operation must therefore be repeated for a number of cycles corresponding to the number (N) of weights (w_i) in the weight filter, that is, for N cycles.

In contrast, in the digital neuron according to the present embodiment, the N partial product generators (MBS) of the neural element (NE) operate simultaneously in parallel, so the operation can be performed at high speed. In addition, because the partial product generators (MBS) compute the partial products by bit shifting, the digital circuit is simple to implement, which improves operation speed, reduces power consumption, and allows a compact implementation.

Meanwhile, because the neural element (NE) according to the present embodiment is configured to receive weights in a signed integer format, the partial product generator (MBS) is able to generate negative partial products, that is, partial products with a negative sign. Using negative partial products further reduces the number of partial products that must be generated to multiply the input value (X) by the weight (w_i).

For example, when the weight (w_i) is 7, the multiplication of the weight (w_i) by the input value (X) is 7(00111)·X. When this multiplication is performed as partial products obtained by bit shifting followed by addition of the partial products, it is calculated as the sum of the partial products of the input value (X) and the value of each bit of the weight (w_i), that is, each power of two (2ⁿ): 4(00100)·X + 2(00010)·X + 1(00001)·X. Three partial products must therefore be generated. Each partial product can be obtained by shifting the input value (X) toward the upper bits by the exponent (n) of the power of two (2ⁿ) for each bit of the weight (w_i) that has the value 1. This requires three bit shift circuits and an adder.

However, when the neural element (NE) can generate negative partial products, the multiplication of the weight (w_i) 7 by the input value (X) can be calculated as 8(01000)·X + (-X). Here, the negative partial product -X can be obtained by performing a two's complement operation on the input value (X). That is, the product can be calculated using two partial products and a two's complement operation, so the same operation can be performed with two bit shift circuits, a two's complement circuit, and an adder.
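The equivalence of the two forms for a weight of 7 can be checked with a small Python sketch. The function names are illustrative, and Python's unbounded integers stand in for a fixed-width datapath, so `~x + 1` plays the role of the two's complement circuit.

```python
def negate_twos_complement(x):
    """Two's complement negation: invert all bits, then add 1."""
    return ~x + 1


def times7_three_products(x):
    # 4X + 2X + 1X: three partial products.
    return (x << 2) + (x << 1) + x


def times7_negative_product(x):
    # 8X + (-X): one shifted partial product and one negative partial product.
    return (x << 3) + negate_twos_complement(x)


for x in (0, 1, 5, -3):
    print(x, times7_three_products(x), times7_negative_product(x))
```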

When the weights (w_i) are restructured in this way, using partial products and two's complement so that the number of partial products is minimized, all cases of the multiplication of the positive integer values (1 to 15) of a 5-bit weight (w_i) by the input value (X) are as shown in Table 1.

In the signed integer format of the weights (w_i) in Table 1, binary 10000 denotes -16; however, when it is used to calculate the weights 14 and 15 as in Table 1, binary 10000 is treated as +16.

As Table 1 shows, the multiplication of a weight (w_i) in the range of 1 to 15 by the input value (X) can be calculated as the sum of at most three partial products. Where Table 1 calls for a negative partial product, it can be obtained by performing a two's complement operation on the computed partial product.

An ordinary multiplication of a weight (w_i) having a 5-bit integer value by the input value (X) requires summing up to five partial products. In contrast, when negative partial products are used as in Table 1, at most three partial products need to be summed, so the circuit that performs the multiplication can be simplified.

Meanwhile, as shown in Table 1, the only weights that require the maximum of three partial products in the multiplication of a 5-bit integer weight (w_i) by the input value (X) are 11 and 13; the multiplication of the remaining weights (1 to 10, 12, 14, 15) by the input value (X) can be calculated as the sum of two or fewer partial products.
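The claim that only 11 and 13 need three partial products can be checked by brute force. The sketch below enumerates sums of signed powers of two up to 16; the function names and the search strategy are assumptions made for illustration, and the particular decompositions it implies may differ from the ones listed in Table 1.

```python
def signed_power_sums(max_terms, max_exp=4):
    """All integers expressible as a sum of at most max_terms values of the
    form +2**e or -2**e with 0 <= e <= max_exp."""
    terms = [0] + [sign * (1 << e) for e in range(max_exp + 1) for sign in (1, -1)]
    sums = {0}
    for _ in range(max_terms):
        sums = {s + t for s in sums for t in terms}
    return sums


def min_partial_products(w, max_exp=4):
    """Smallest number of signed power-of-two partial products that sum to w."""
    for k in range(1, max_exp + 2):
        if w in signed_power_sums(k, max_exp):
            return k
    raise ValueError(f"{w} is not representable")


needs_three = [w for w in range(1, 16) if min_partial_products(w) == 3]
print(needs_three)   # [11, 13]
```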

Accordingly, among the weights (w_i) having a 5-bit integer value, the multiplication of the input value (X) by any weight other than 11 and 13, which require the maximum number of partial products, can be expressed as in Equation 3.

Referring to Equation 3, a weight (w_i) having a 5-bit integer value is expressed as the sum of two partial products (w_b1·2^i, w_b2·2^j), each being the product of a coefficient (w_b1, w_b2) with a value of -1, 0, or 1 and a power of two (2^i, 2^j), and a coefficient (w_b1, w_b2) value of -1 can be expressed as the two's complement of 1.

This indicates that the multiplication of the weight (w_i) by the input value (X) can be calculated as the sum of two partial products (w_b1·2^i·X, w_b2·2^j·X), combined with a two's complement operation when a coefficient (w_b1, w_b2) is -1.
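Equation 3 itself is not reproduced in this text. Based solely on the description above, it presumably has a form along the following lines, where the weight is written as w (without its index) to avoid a clash with the exponent i; this is a reconstruction under that assumption, not the original equation.

$$
w = w_{b1}\,2^{i} + w_{b2}\,2^{j}, \qquad w_{b1}, w_{b2} \in \{-1, 0, 1\}
$$

$$
w \cdot X = w_{b1}\,\bigl(2^{i} X\bigr) + w_{b2}\,\bigl(2^{j} X\bigr)
$$

For instance, w = 7 corresponds to w_b1 = 1, i = 3, w_b2 = -1, j = 0, which gives 8X - X.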

Meanwhile, when the weight (w_i) is in the range of -1 to -15, the product can be calculated as in Table 2 by taking the two's complement of the product of the corresponding weight in the range of 1 to 15 and the input value (X). Therefore, with the exception of the weights -11 and -13, a negative weight (w_i) can also be handled by summing a combination of two partial products based on powers of two (2ⁿ) and two's complement operations.

Table 2 likewise shows that only -11 and -13 require three partial products.

Accordingly, in order to simplify the structure of the neural element (NE), the digital neuron according to the present embodiment may be configured to receive weights (w_i) in an integer format with a specified number of bits from which at least one predetermined integer (here, ±11 and ±13 as an example) is excluded.

As a result, the multiplication of a weight (w_i) in the 5-bit range by the input value (X) can be performed with only two bit shift circuits, which produce the two partial products based on powers of two (2ⁿ), and a two's complement circuit, so the structure of the neural element (NE) can be simplified as far as possible. That is, each of the N partial product generators (MBS) can be configured to output two partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)).

Even when the weight (w_i) is specified as a value in a range of 6 bits or more, the multiplication of the weight (w_i) by the input value (X) can similarly be performed using two or three bit shift circuits and a two's complement circuit.

Separately, the neural element (NE) may include a circuit that converts the result to 0 regardless of the input value (X) when the weight (w_i) is 0; for example, it may include a logic circuit that receives the input value (X) and performs a logical AND operation with 0.

Although excluding ±11 and ±13 from the 5-bit weights (w_i) in the range of -16 to 15 restricts the values that can be selected as weights (w_i), this restriction has little effect on the performance of the artificial neural network.

FIG. 4 is a diagram for explaining the error that arises when a weight is expressed with a specified number of partial products.

FIG. 4 considers, as an example, only the positive integer range of 0 to 128 for the case in which the weight (w_i) is in an 8-bit integer format. It shows the error for three cases: all weights (w_i) in the range are represented; some integers are excluded so that every weight is expressed as a combination of two partial products based on powers of two (2ⁿ) and two's complement; and some integers are excluded so that every weight is expressed as a combination of three such partial products and two's complement.

In FIG. 4, the X axis represents the actual weight (w_i) in the 7-bit positive integer format, and the Y axis represents the weight after some values have been excluded and replaced with adjacent integer values so that the weight (w_i) is expressed as the sum of two or three partial products. Green indicates weights expressed as the sum of three partial products, and blue indicates weights expressed as the sum of two partial products.

As FIG. 4 shows, even when some values of the 7-bit positive integer weight (w_i) are excluded and replaced with adjacent integer values so that the weight is expressed as the sum of two or three partial products, the maximum error is approximately 9.4% when two partial products are used and approximately 2.3% when three partial products are used. That is, the error is not large even if the 7-bit positive integer weight (w_i) is not represented by every integer in the range of 0 to 128 but is instead replaced with an integer expressible as the sum of two or three partial products.
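A rough way to estimate such errors is sketched below. It assumes that each excluded weight is replaced by the nearest integer expressible with the allowed number of signed power-of-two terms and that the error is measured relative to the original weight; the replacement rule and error measure behind FIG. 4 are not stated in the text, so the numbers this sketch prints need not match the figure exactly. All function names are illustrative.

```python
def signed_power_sums(max_terms, max_exp=8):
    """Integers expressible as a sum of at most max_terms values +/- 2**e."""
    terms = [0] + [sign * (1 << e) for e in range(max_exp + 1) for sign in (1, -1)]
    sums = {0}
    for _ in range(max_terms):
        sums = {s + t for s in sums for t in terms}
    return sums


def max_relative_error(max_terms, lo=1, hi=128):
    """Largest relative error when every weight in [lo, hi] is replaced by
    the nearest representable integer (assumed replacement rule)."""
    candidates = signed_power_sums(max_terms)
    worst = 0.0
    for w in range(lo, hi + 1):
        nearest_gap = min(abs(w - c) for c in candidates)
        worst = max(worst, nearest_gap / w)
    return worst


error_two_terms = max_relative_error(2)
error_three_terms = max_relative_error(3)
print(f"two partial products:   {error_two_terms:.2%}")
print(f"three partial products: {error_three_terms:.2%}")
```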

Although FIG. 4 considers only the positive integer range (0 to 128), it is evident that the same holds for the negative integer range (0 to -128). That is, when the weight is an 8-bit signed integer, the error remains below 10% even if the weight is replaced with an integer expressible with two or three partial products.

Accordingly, the error of the artificial neural network as a function of the number of partial products used to express the weights (w_i) may be examined in advance through simulation, and when the examined error is within a predetermined tolerance, the digital neuron may be configured to receive weights (w_i) expressed as integers that can be represented by a combination of a small number of partial products and two's complement.

Table 3 shows the simulated inference accuracy with which LeNet-5, a deep convolutional neural network, identifies the handwritten digits provided by MNIST, for different weight (w_i) formats.

Referring to Table 3, the inference accuracy of LeNet-5 is the same, 99.10%, when the weights (w_i) are in the 32-bit single-precision floating-point format and when they are in the 8-bit integer format. The accuracy is 98.95% when the weights (w_i) are in the 5-bit integer format and 98.92% when they are in the 5-bit integer format with ±11 and ±13 excluded. That is, even when the weights (w_i) are applied in the 5-bit integer format with ±11 and ±13 excluded, the effect on the performance of the artificial neural network is small.

The learning device of the artificial neural network may convert floating-point weights into 5-bit integers and transfer them to the digital neurons of the artificial neural network. If the conversion yields a value designated for exclusion (±11, ±13), the learning device may convert it to another adjacent integer before transferring it. For example, when the learned weights (w_i) are 10.7 and 11.3, the learning device may convert them to 10 and 12, respectively.
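The conversion described above can be sketched as follows. This is a minimal illustration, not the learning device's actual procedure: the function name `quantize_weight`, the assumed signed 5-bit range of -16 to 15, and the "nearest allowed integer" rule are assumptions chosen to reproduce the 10.7 → 10 and 11.3 → 12 example in the text. The weight is assumed to be already scaled to the integer range.

```python
# Hypothetical helper: map a trained floating-point weight to the nearest
# allowed 5-bit signed integer, skipping the excluded values.
EXCLUDED = {11, -11, 13, -13}   # values the text designates for exclusion
ALLOWED = [v for v in range(-16, 16) if v not in EXCLUDED]  # assumed 5-bit signed range


def quantize_weight(w: float) -> int:
    """Return the allowed integer closest to w (clamped to the 5-bit range)."""
    return min(ALLOWED, key=lambda v: abs(w - v))


if __name__ == "__main__":
    # 10.7 would round to the excluded value 11, so it falls back to 10;
    # 11.3 would also round to 11, so it moves up to 12.
    print(quantize_weight(10.7), quantize_weight(11.3))
```

Choosing the nearest allowed value keeps the quantization error for an excluded weight below 1.5, which is one plausible reading of "another adjacent integer".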

Accordingly, the weight (w_i) applied to the neural element (NE) of the digital neuron may be, as described above, an integer with a predetermined number of bits from which certain designated values are excluded.

Since the weight (w_i) can be expressed as the sum of two partial products, each partial product generator (MBS) may be implemented with two partial product circuits, a two's complement circuit, and a zero-weight calculator, as in Equation 3. Each of the N partial product generators (MBS) outputs two partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)) to the partial product adder (PSUM).

The partial product adder (PSUM) sums the 2N partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)) applied from the N partial product generators (MBS) together with the bias value (BS), and outputs the output value (Out).

Here, the output value (Out) is the result of the dot product of the two-dimensional weight filter with a partial region of the two-dimensional input feature map, and is one element of the two-dimensional output feature map.

To perform the convolution operation, the artificial neuron may move the two-dimensional weight filter across the two-dimensional input feature map and again compute the dot product of the weight filter with the corresponding region of the input feature map.

FIG. 5 shows an example of the detailed configuration of the neural element in the artificial neuron of FIG. 3.

Referring to FIG. 5, the neural element (NE) of the digital neuron includes N weight decomposition units (WDC), N partial product generators (MBS), a partial product adder (PSUM), and a negative partial product counter (NCCP).

As described above, the neural element (NE) receives the two-dimensional input feature map and performs a dot product with the two-dimensional weight filter. It is assumed that the N input values (X) of the two-dimensional input feature map are m-bit (for example, 8-bit) integers, and that the N weights (w_i) of the weight filter are integers with a predetermined number of bits n (for example, 5 bits) from which designated integers (for example, ±11 and ±13) are excluded.

Each of the N weight decomposition units (WDC) receives the corresponding weight (w_i) among the N weights, decomposes it into partial products (w_b1·2^i, w_b2·2^j) of coefficients (w_b1, w_b2) and powers of two (2^i, 2^j), and controls the operation of the corresponding partial product generator (MBS_i) among the N partial product generators (MBS) according to the decomposed weight (w_b1·2^i, w_b2·2^j). The weight decomposition unit (WDC) decomposes the weight (w_i) according to Tables 1 and 2, which combine powers of two (2^n) and two's complements so that the number of partial products is minimized, and determines the operation required of the partial product generator (MBS_i). It then controls the partial product generator (MBS_i) to perform the determined operation, so that the partial product generator (MBS_i) can multiply the weight (w_i) by the input value (X).

Using Tables 1 and 2, the weight decomposition unit (WDC) determines the number of bits by which the input value (X) must be shifted for the given weight (w_i), and outputs shift control signals (SH1, SH2) according to the determined number of bits. The number of bits to shift corresponds to the exponents (i, j) of the powers of two (2^i, 2^j) in the decomposed weight (w_i).

The weight decomposition unit (WDC) also determines, from the coefficients (w_b1, w_b2) of the weight (w_i) decomposed according to Tables 1 and 2, which terms require a negative partial product, that is, which terms must be two's complemented. When a coefficient (w_b1, w_b2) is negative, that is, -1, the weight decomposition unit (WDC) determines that a two's complement is to be taken, and outputs the inversion control signals (INV1, INV2) for the terms that require a negative partial product.

When a coefficient (w_b1, w_b2) of the decomposed weight (w_i) is 0, the weight decomposition unit (WDC) outputs the zero operation signals (ALZ1, ALZ2). The weight decomposition unit (WDC) also transmits all inversion control signals (INV1, INV2) applied to the N partial product generators (MBS) to the negative partial product counter (NCCP).

The weight decomposition unit (WDC) may include at least one look-up table for generating, based on Tables 1 and 2, the shift control signals (SH1, SH2), the inversion control signals (INV1, INV2), and the zero operation signals (ALZ1, ALZ2) for a given weight (w_i). As an example, FIG. 5 illustrates the weight decomposition unit (WDC) with three look-up tables, one each for the shift control signals (SH1, SH2), the inversion control signals (INV1, INV2), and the zero operation signals (ALZ1, ALZ2).
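The role of the weight decomposition unit can be illustrated with a short sketch. Tables 1 and 2 are not reproduced in this passage, so the brute-force search below merely stands in for them: it is an assumption that a search over coefficients in {-1, 0, 1} and exponents 0 to 4 yields equivalent entries. The names `decompose` and `control_signals` and the dictionary layout are hypothetical.

```python
from itertools import product

# Candidate terms coef * 2**exp; the exponent range 0..4 is an assumption for 5-bit weights.
TERMS = [(c, e) for c in (-1, 0, 1) for e in range(5)]


def decompose(w: int):
    """Return two (coef, exp) terms whose sum equals w, or None if impossible.

    Among all valid pairs, the one with the fewest nonzero coefficients wins.
    """
    best = None
    for (c1, e1), (c2, e2) in product(TERMS, repeat=2):
        if c1 * 2**e1 + c2 * 2**e2 != w:
            continue
        nonzero = (c1 != 0) + (c2 != 0)
        if best is None or nonzero < best[0]:
            best = (nonzero, [(c1, e1), (c2, e2)])
    return None if best is None else best[1]


def control_signals(w: int):
    """Derive per-term shift (SH), inversion (INV) and zero (ALZ) signals."""
    terms = decompose(w)
    if terms is None:
        raise ValueError(f"{w} needs more than two partial products")
    return [{"SH": e, "INV": int(c < 0), "ALZ": int(c == 0)} for c, e in terms]


if __name__ == "__main__":
    print(control_signals(7))   # 7 = 8 - 1: one term is inverted
    print(decompose(11))        # None: 11 cannot be built from two terms
```

Under these assumptions every 5-bit value except ±11 and ±13 decomposes into at most two terms, which matches the exclusion rule stated earlier in the description.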

Each of the N partial product generators (MBS) can multiply the input value (X) by the weight (w_i) by performing two bitwise partial product operations and a two's complement operation, as in Equation 3. To this end, each of the N partial product generators (MBS) includes two partial product calculators (PP1, PP2). In this embodiment the weight decomposition unit (WDC) is assumed to decompose the weight (w_i) into two partial products (w_b1·2^i, w_b2·2^j), so each of the N partial product generators (MBS) is shown with two partial product calculators (PP1, PP2). If, however, the weight decomposition unit (WDC) decomposes the weight (w_i) into three or more partial products, each of the N partial product generators (MBS) is configured with as many partial product calculators as there are partial products in the decomposed weight.

Each of the two partial product calculators (PP1, PP2) includes a bit shifter (BSH), a one's complement operator (1CMP), and a zero multiplier (ZOP), and outputs a partial product (P1, P2) obtained by multiplying the input value (X) by a specific bit of the weight (w_i).

The bit shifter (BSH) shifts the input value (X) toward the upper bits in response to the shift control signals (SH1, SH2) applied from the weight decomposition unit (WDC). That is, it multiplies the input value (X) by the power of two (2^i, 2^j), that is, the specific bit, of the decomposed weight (w_b1·2^i, w_b2·2^j). The bit shifter (BSH) may be implemented, for example, as a barrel shifter. As shown in Tables 1 and 2, the bit shifter (BSH) shifts the input value (X) toward the upper bits by the exponent (i, j) of the power of two (2^i, 2^j) in the decomposed weight. The number of bits (q) of the shift control signals (SH1, SH2) is determined by the maximum range over which the bit shifter (BSH) must shift for the weights (w_i) listed in Tables 1 and 2.

The one's complement operator (1CMP) computes the one's complement by inverting the value applied from the bit shifter (BSH) in response to the inversion control signals (INV1, INV2) applied from the weight decomposition unit (WDC).

The zero multiplier (ZOP) replaces the value applied from the one's complement operator (1CMP) with 0 and outputs it in response to the zero operation signals (ALZ1, ALZ2) applied from the weight decomposition unit (WDC). The zero multiplier (ZOP) is added so that a value of 0 is output regardless of the input value (X) when the applied weight (w_i) is 0.

The two partial product calculators (PP1, PP2) include a one's complement operator rather than a two's complement operator in order to further simplify their structure.

A two's complement operator must invert every applied bit and then add 1. It therefore needs an adder in addition to the inversion circuit that performs the bitwise inversion of the applied value.

A one's complement circuit, by contrast, simply inverts every bit of the applied value and outputs the result, so it needs no adder. Its circuit configuration is therefore simpler than that of a two's complement circuit.

The size and power consumption of a single adder are not large. However, since the neural element (NE) includes N partial product generators (MBS), each with two partial product calculators (PP1, PP2), the artificial neuron would have to include 2N two's complement circuits. Replacing the two's complement circuits with one's complement circuits therefore removes 2N adders from a single neural element (NE) of the digital neuron. This greatly reduces the size of the artificial neuron, improves the operation speed, and reduces power consumption.

However, because the partial product calculators (PP1, PP2) include a one's complement operator instead of a two's complement operator, the sum of the partial products (P1, P2) they output may differ from the product of the weight (w_i) and the input value (X). The difference arises because, when a negative partial product among (P1, P2) is computed, the one's complement is used instead of the two's complement, so the 1 is not added.

In this embodiment, a separate negative partial product counter (NCCP) is therefore provided to account for the 1s that are left unadded because the partial product calculators (PP1, PP2) perform a one's complement rather than a two's complement operation. The negative partial product counter (NCCP) counts the inversion control signals (INV1, INV2) applied from the weight decomposition units (WDC). The counted value is the number of negative partial products required, according to Tables 1 and 2, across the N weights (w_i). That is, the negative partial product counter (NCCP) counts, in one pass over the entire neural element (NE), the 1s that must be added to complete the two's complement operations, obtains the count value (NUM_P), and passes the count value (NUM_P) to the partial product adder (PSUM).

The partial product adder (PSUM) receives two partial products (P1_i, P2_i) from each of the N partial product generators (MBS), the count value (NUM_P) from the negative partial product counter (NCCP), and the bias value (BS), sums them all, and outputs the output value (Out).
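The following behavioral sketch shows why adding the count of inverted terms restores the exact dot product. It models the bit shifter, the one's complement operator, and the zero multiplier with ordinary Python integers; the function name `neural_element` and the pre-decomposed weight lists are hypothetical, and no fixed bit width is modeled here.

```python
def neural_element(xs, decomposed_weights, bias):
    """Sum shifted and inverted partial products, then add the deferred +1s.

    decomposed_weights[k] is a list of (coef, exp) terms with coef in {-1, 0, 1}.
    """
    partial_sum = 0
    num_p = 0                      # negative partial product count (NUM_P)
    for x, terms in zip(xs, decomposed_weights):
        for coef, exp in terms:
            if coef == 0:          # zero multiplier (ZOP): contributes nothing
                continue
            value = x << exp       # bit shifter (BSH): multiply by 2**exp
            if coef < 0:
                value = ~value     # one's complement (1CMP): equals -value - 1
                num_p += 1         # remember the missing +1
            partial_sum += value
    return partial_sum + num_p + bias


if __name__ == "__main__":
    xs = [3, 5, 7]
    # 7 = 8 - 1, -6 = -4 - 2, 0 = 0 + 0
    weights = [[(1, 3), (-1, 0)], [(-1, 2), (-1, 1)], [(0, 0), (0, 0)]]
    print(neural_element(xs, weights, bias=4))  # 3*7 + 5*(-6) + 7*0 + 4
```

Because the one's complement of a value v equals -v - 1, each inverted term is short by exactly 1, so a single addition of the count replaces one adder per partial product calculator.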

FIG. 6 shows an example of the detailed configuration of the partial product adder of FIG. 5.

As described above, the partial product adder (PSUM) receives two partial products (P1_i, P2_i) from each of the N partial product generators (MBS). The description here assumes N = 25. In that case the partial product adder (PSUM) receives 50 (= 2N) partial products ((P1₁ to P1_N), (P2₁ to P2_N)), the count value (NUM_P), and the bias value (BS), sums them, and outputs the output value (Out).

In this embodiment, the partial product adder (PSUM) performs addition in a manner basically similar to a Wallace tree adder. The partial product adder (PSUM) includes a plurality of stages (ST1 to STk), whose number corresponds to the number of partial products to be added, and a CLA adder (CAL). Each of the stages (ST1 to STk) includes a plurality of addition groups (GP), and each addition group includes a plurality of full adders (FA).

In the first stage (ST1), each addition group (GP) adds bits at the same position across the 2N partial products ((P1₁ to P1_N), (P2₁ to P2_N)), taken in groups of a preset size a. When each of the 2N partial products is a j-bit value (j is a natural number, for example j = m+n), each addition group (GP) may include j full adders (FA).

For example, the full adders (FA) of the first addition group (GP) of the first stage (ST1) may receive and add the same-position bits of three partial products (P1₁, P1₂, P1₃). Likewise, the full adders (FA) of the second addition group (GP) may receive and add the same-position bits of three partial products (P1₄, P1₅, P1₆).

In some cases, however, the full adders (FA) of each addition group (GP) of the first stage (ST1) may receive and add the same-position bits of five partial products (P1₁, P1₂, P1₃, P1₄, P1₅). That is, the number of same-position bits that each addition group (GP) receives and adds can be adjusted in various ways.

When the first stage (ST1) adds the 2N partial products ((P1₁ to P1_N), (P2₁ to P2_N)) in groups of a, the first stage (ST1) contains A (= 2N/a) addition groups (GP).

The second stage (ST2) and each later stage receive the results added by the preceding stage, group them again in sets of a, and add them. As shown in FIG. 4, a full adder (FA) receives a input bits and outputs 2 bits, a carry and a sum, so the number of addition groups (GP) in each stage (ST2 to STk-1) shrinks by a factor of 2/a from one stage to the next.

For example, when the full adders (FA) of each addition group (GP) are configured to receive 3 input bits and output 2 bits, and the number of partial products (2N) is 50, the first stage (ST1) includes 17 addition groups (GP) (50/3 = 16.6, rounded up). The second stage (ST2) includes 12 addition groups (GP) (17 × 2/3 = 11.3, rounded up), and the third stage (ST3) includes 8 addition groups (GP). Since the number of addition groups (GP) is reduced to 2/3 at each stage, the number of addition groups (GP) becomes 2 at the seventh stage (ST7).
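The stage counts in this example can be checked with a few lines of arithmetic. The sketch below assumes, as the figures 17 and 12 in the text suggest, that fractional group counts are rounded up; the function name `groups_per_stage` is hypothetical.

```python
def groups_per_stage(num_partial_products: int, a: int = 3):
    """List the addition-group count of each stage for a:2 compression.

    The first stage has ceil(2N / a) groups; each later stage has
    ceil(previous * 2 / a) groups, until only two groups remain.
    """
    if a < 3:
        raise ValueError("a must be at least 3 for the count to shrink")
    counts = [-(-num_partial_products // a)]       # ceiling division
    while counts[-1] > 2:
        counts.append(-(-counts[-1] * 2 // a))     # ceiling of previous * 2 / a
    return counts


if __name__ == "__main__":
    print(groups_per_stage(50))  # [17, 12, 8, 6, 4, 3, 2]
```

With 50 partial products and 3-input full adders the list has seven entries, so the group count reaches 2 at the seventh stage, in agreement with the text.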

As the addition results pass from stage to stage (ST1 to STk), their bit width grows by 1 bit per stage. For example, when each of the 2N partial products ((P1₁ to P1_N), (P2₁ to P2_N)) is a j-bit value, each of the A addition groups (GP) of the first stage (ST1) includes j full adders (FA), and the stage outputs A j-bit sums. Each of the B addition groups (GP) of the second stage (ST2), however, includes j+1 full adders (FA), and the stage outputs B (j+1)-bit sums. Likewise, each of the C (C = 2B/a) addition groups (GP) of the third stage (ST3) includes j+2 full adders (FA), and the stage outputs C (j+2)-bit sums.

The method by which the partial product adder (PSUM) adds the 2N partial products ((P1₁ to P1_N), (P2₁ to P2_N)) is similar to that of a Wallace tree adder. This structure, however, applies as-is only when the 2N input partial products are natural numbers; it does not account for the sign bits of negative partial products.

When the partial product adder (PSUM) computes the sum of individually signed numbers and outputs a channel operation value (MO) that includes the sign bits (s), the sign bit (1 bit) of each of the j-bit 2N partial products ((P1₁ to P1_N), (P2₁ to P2_N)) must be extended by s bits (for example, 6 bits), so that each partial product is input as a (j+s)-bit value. That is, s additional bits (s = T-1) of sign extension are required.

In that case, every addition group (GP) in every stage (ST1 to STk) of the partial product adder (PSUM) needs s additional full adders (FA), which increases the circuit area of the partial product adder (PSUM) by about 21%.

The partial product adder (PSUM) according to this embodiment therefore receives the value of the sign bits that would have to be extended as a separate input and adds it at the final stage (STk), which reduces the size and power consumption of the partial product adder (PSUM).

To this end, the partial product adder (PSUM) further includes a two's complement operator (2CMP) that converts the count value (NUM_P) applied from the negative partial product counter (NCCP) into its two's complement. The two's complement operator (2CMP) computes the two's complement of the count value (NUM_P) and outputs the count complement value (N_NUM_P). The number of bits (T) of the count complement value (N_NUM_P) corresponds to the number of sign bits to be extended.

The two's complement operator (2CMP) converts the count value (NUM_P) into its two's complement because the two's complement of the count value (NUM_P) equals the sum of the sign bits of all negative partial products among the 2N partial products ((P1₁ to P1_N), (P2₁ to P2_N)).

FIG. 7 is a diagram for explaining the operating concept of the two's complement operator of FIG. 6.

As an example, FIG. 7 assumes that the partial product adder (PSUM) adds and outputs six partial products in 5-bit binary integer format (11101, 10110, 10010, 00101, 10101, 10000). Of the six partial products in FIG. 7, the five whose sign bit is 1 (11101, 10110, 10010, 10101, 10000) are negative numbers, that is, negative partial products, and the one whose sign bit is 0 (00101) is a positive number, that is, a positive partial product.

The sign bit of each of the six partial products is extended by s bits (for example, 6 bits), converting each into a (j+s)-bit value (11 bits in FIG. 7). Each extended bit takes the same value as the sign bit.

The total value of the extended sign bits, however, is the sum of five values of -1 and one value of 0, that is, (-1) × 5 + 0 × 1 = -5. This equals the two's complement (-5) of the number of negative values, that is, of the number of negative partial products (5).

This means that, when the six partial products in 5-bit binary integer format (11101, 10110, 10010, 00101, 10101, 10000) are added, the same result is obtained by adding the 2-bit positive binary integer parts excluding the sign bit portion and then separately adding the two's complement (-5) of the number of negative partial products (5).
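The equivalence can be verified numerically with the six values from FIG. 7. The sketch below is an illustration under one stated assumption: the magnitude part of each value is taken to be the four bits below the sign bit, and the negated count is weighted at the sign-bit position (2^4). The helper name `to_signed` is hypothetical.

```python
PARTS = ["11101", "10110", "10010", "00101", "10101", "10000"]
WIDTH = 5


def to_signed(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


# Direct route: sign-extend every value (implicitly) and add.
direct = sum(to_signed(p) for p in PARTS)

# Shortcut: add only the bits below the sign bit, then add the negated
# count of negative values once, weighted at the sign-bit position.
low_sum = sum(int(p[1:], 2) for p in PARTS)
num_negative = sum(p[0] == "1" for p in PARTS)
shortcut = low_sum + ((-num_negative) << (WIDTH - 1))

if __name__ == "__main__":
    print(direct, shortcut, num_negative)
```

Both routes give -49 with five negative values, so the sign extension of each operand can be replaced by one shared correction term.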

Accordingly, the partial product adder (PSUM) according to the embodiment of the present invention includes the two's complement operator (2CMP), which converts the count value (NUM_P) applied from the negative partial product counter (NCCP) into its two's complement and thereby obtains a count complement value (N_NUM_P) whose bit count corresponds to the number of sign bits to be extended. This reduces the size and power consumption of the partial product adder (PSUM).

Referring again to FIG. 6, the count complement value (N_NUM_P) output from the two's complement operator (2CMP) is, as shown in FIG. 4, applied at the final stage (STk) as one of the inputs of the full adders (FA) corresponding to the upper bits of the sum accumulated through the preceding stages, and is added there.

Meanwhile, because the partial product calculators (PP1, PP2) of the partial product generators (MBS) compute partial products with the one's complement rather than the two's complement, the partial product adder (PSUM) according to this embodiment must additionally add the count value (NUM_P) applied from the negative partial product counter (NCCP). As shown in FIG. 4, each bit of the count value (NUM_P) is therefore applied as one of the inputs of the full adders (FA) corresponding to the lower bits of the final stage (STk), and is added there.

The partial product adder (PSUM) also receives the bias value (BS) at a stage preceding the final stage (STk) and adds it. As shown in FIG. 6, the bias value (BS) may be converted into a binary value whose sign bit is additionally extended (BS'[j-1+s:0]) before being applied. Each bit of the extended binary bias value (BS') is applied as one of the inputs of a full adder (FA) and added.

The CLA adder (CLA) receives and adds the bit values produced by the final stage (STk), and finally outputs the output value (Out[j+s:0]).

As a result, the partial product adder (PSUM) according to this embodiment can operate at high speed by using the Wallace tree addition scheme, which supports fast addition. The contribution of the sign bits is computed as the two's complement of the count value (NUM_P) applied from the negative partial product counter (NCCP) and added separately to the upper bits, which reduces circuit size and power consumption. In addition, adding the count value (NUM_P) to the lower bits allows the partial product calculators (PP1, PP2) of the partial product generators (MBS) to use a one's complement operator (1CMP) instead of a two's complement operator (2CMP).
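The two corrections can be combined in one fixed-width model. The sketch below is an assumption-laden illustration rather than the circuit of FIG. 6: it treats each partial product as an unsigned J-bit pattern with J = 13 (8-bit input plus 5-bit weight), adds NUM_P at the low end, and adds the negated count at bit position J. The names `psum`, `J`, and `MASK` are hypothetical.

```python
J = 13                      # assumed partial product width j = m + n
MASK = (1 << J) - 1


def psum(xs, decomposed_weights, bias):
    """Model the adder tree on unsigned J-bit patterns plus two corrections."""
    tree_sum = 0            # what the adder stages accumulate
    num_p = 0               # count of inverted (negative) partial products
    for x, terms in zip(xs, decomposed_weights):
        for coef, exp in terms:
            if coef == 0:
                continue                      # zero multiplier output
            pattern = (x << exp) & MASK       # shifted input, J bits wide
            if coef < 0:
                pattern = ~pattern & MASK     # bitwise inversion within J bits
                num_p += 1
            tree_sum += pattern
    n_num_p = -num_p                          # two's complement of the count
    # Low correction (+NUM_P) completes each two's complement; the high
    # correction (N_NUM_P at bit J) replaces per-operand sign extension.
    return tree_sum + num_p + (n_num_p << J) + bias


if __name__ == "__main__":
    xs = [200, 17, 255]
    # 15 = 16 - 1, -9 = -8 - 1, 6 = 4 + 2
    weights = [[(1, 4), (-1, 0)], [(-1, 3), (-1, 0)], [(1, 2), (1, 1)]]
    print(psum(xs, weights, bias=-3))
```

Each inverted pattern equals 2^J - 1 minus the shifted value, so adding 1 and subtracting 2^J per negative term recovers the exact signed product sum.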

FIG. 8 shows another example of the neural element of the digital neuron.

Like the neural element (NE) of FIG. 5, the neural element (NE) of FIG. 8 includes N weight decomposition units (WDC), N partial product generators (MBS), a partial product adder (PSUM), and a negative partial product counter (NCCP).

Each of the N weight decomposition units (WDC) receives the corresponding weight (w_i) among the N weights and decomposes it into partial products (R_a·2^a, R_b·2^b, R_c·2^c) of coefficients (R_a, R_b, R_c) and powers of two (2^a, 2^b, 2^c), so that w_i = R_a·2^a + R_b·2^b + R_c·2^c, where R_a, R_b, R_c ∈ {-1, 0, 1}. It then controls the operation of the corresponding partial product generator (MBS_i) among the N partial product generators (MBS) according to the decomposed weight (R_a·2^a, R_b·2^b, R_c·2^c).

The weight decomposition unit (WDC) can control the partial product generator (MBS_i) by passing it the coefficients (R_a, R_b, R_c) and the exponents (a, b, c) of the powers of two (2^a, 2^b, 2^c) from the decomposed weight (R_a·2^a, R_b·2^b, R_c·2^c).

FIG. 8 assumes, as an example, that the weight decomposition unit (WDC) decomposes the weight (w_i) into three partial products (R_a·2^a, R_b·2^b, R_c·2^c), unlike the weight decomposition unit (WDC) of FIG. 5. However, the weight decomposition unit (WDC) may also be configured to decompose the weight (w_i) into two partial products, as in FIG. 5.
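
The decomposition w_i = R_a·2^a + R_b·2^b + R_c·2^c can be illustrated with a brute-force search. This is only a sketch: the patent does not state how the weight decomposition unit finds the terms, and the function name `decompose_weight`, the exponent limit `max_exponent`, and the preference for the fewest non-zero coefficients are all assumptions made for the example.

```python
from itertools import product


def decompose_weight(weight: int, max_exponent: int = 4, terms: int = 3):
    """Search for coefficients R in {-1, 0, 1} and exponents e such that
    sum(R * 2**e) == weight, using at most `terms` partial products.

    Returns a list of (R, e) pairs, or None if the weight cannot be
    written with that many signed powers of two.
    """
    candidates = []
    for coeffs in product((-1, 0, 1), repeat=terms):
        for exps in product(range(max_exponent + 1), repeat=terms):
            value = sum(r * (1 << e) for r, e in zip(coeffs, exps))
            if value == weight:
                nonzero = sum(1 for r in coeffs if r != 0)
                candidates.append((nonzero, list(zip(coeffs, exps))))
    if not candidates:
        return None
    # Prefer the candidate with the fewest non-zero coefficients.
    return min(candidates, key=lambda candidate: candidate[0])[1]


three_terms = decompose_weight(11)          # representable, e.g. as 16 - 4 - 1
two_terms = decompose_weight(11, terms=2)   # not representable with two terms
```

Under these assumptions, a weight such as 11 needs three signed powers of two and has no two-term form, which shows why the number of partial products per weight is a design parameter.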

Each of the N partial product generators (MBS) receives the input value (X_i) and the negative input value (X̄_i + 1), where X̄_i denotes the bitwise inversion of X_i. The negative input value (X̄_i + 1) is thus the two's complement of the input value (X_i). Each of the N partial product generators (MBS) may include three partial product calculators (PP1 to PP3), because, as described above, the weight decomposition unit (WDC) is assumed to decompose the weight (w_i) into three partial products (R_a·2^a, R_b·2^b, R_c·2^c).

Each of the three partial product calculators (PP1 to PP3) includes a mux (PMX) and a bit shifter (BSH). The mux (PMX) applies the coefficient (R_a, R_b, R_c) of the decomposed weight (R_a·2^a, R_b·2^b, R_c·2^c) to the input: in response to the corresponding coefficient supplied by the weight decomposition unit (WDC), it selects one of the input value (X_i), the negative input value (X̄_i + 1), and the ground supply, and passes the selection to the bit shifter (BSH).

As described above, each coefficient (R_a, R_b, R_c) takes one of the values -1, 0, and 1. Unlike FIG. 5, in which the input value (X_i) is inverted when the coefficient is -1, the mux (PMX) in the partial product calculators (PP1 to PP3) of FIG. 8 obtains a negative partial product by selecting the negative input value (X̄_i + 1), which is supplied already negated. When the coefficient is 0, no zero multiplication is performed; instead, the mux selects and outputs the ground supply, which represents logic 0. Because the mux (PMX) can select the negative input value (X̄_i + 1) and the logic-0 ground supply, the partial product calculators (PP1 to PP3) of FIG. 8 do not need a one's complement operator (1CMP) or a zero multiplier (ZOP).

The bit shifter (BSH) receives the value selected by the mux (PMX) and, in response to the exponent (a, b, c) supplied by the weight decomposition unit (WDC), shifts that value toward the upper bits to output the partial product (P1, P2, P3).
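
The select-then-shift behaviour of one partial product calculator can be sketched in software as follows. Python integers stand in for the fixed-width hardware words, `-x` stands in for the two's complement input supplied to the mux, and the function names are hypothetical.

```python
def partial_product(x: int, coefficient: int, exponent: int) -> int:
    """Model of one partial product calculator: mux (PMX), then bit shifter (BSH)."""
    negative_x = -x  # stands in for the pre-negated (two's complement) input
    if coefficient == 1:
        selected = x
    elif coefficient == -1:
        selected = negative_x
    elif coefficient == 0:
        selected = 0  # ground supply, logic 0
    else:
        raise ValueError("coefficient must be -1, 0, or 1")
    return selected << exponent  # shift toward the upper bits


def multiply_by_decomposed_weight(x: int, decomposed_weight) -> int:
    """Sum the partial products for a weight given as (R, exponent) pairs."""
    return sum(partial_product(x, r, e) for r, e in decomposed_weight)


# 11 = 1*2^3 + 1*2^2 + (-1)*2^0, so this computes 6 * 11 without a multiplier.
product_value = multiply_by_decomposed_weight(6, [(1, 3), (1, 2), (-1, 0)])
```

No multiplication appears anywhere in the sketch: the coefficient only steers a selection, and the power of two only sets a shift amount.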

The partial product adder (PSUM) likewise receives three partial products (P1_i, P2_i, P3_i) from each of the N partial product generators (MBS) together with the bias value (BS), sums them all, and outputs the output value (Out).

FIG. 9 is a block diagram of a digital neuron of an artificial neural network according to another embodiment of the present invention, FIG. 10 shows an example of the detailed configuration of a neural element in the artificial neuron of FIG. 9, and FIG. 11 shows an example of the detailed configuration of the partial product adder of FIG. 10.

FIG. 3 assumes, as an example, that the digital neuron receives one two-dimensional input feature map and one two-dimensional weight filter corresponding to it, and therefore shows the digital neuron as including a single neural element (NE).

However, as shown in FIG. 1, an artificial neuron of the convolution layers (C1, C2, C3) may receive a three-dimensional input feature map and perform a three-dimensional convolution operation using a three-dimensional weight filter.

Here, the three-dimensional input feature map may consist of M two-dimensional input feature maps (where M is a natural number), which are the M channels of the three-dimensional input feature map. The three-dimensional input feature map made up of M two-dimensional input feature maps may be a subset of the output feature maps produced by the previous layer. For example, the previous layer may output L two-dimensional output feature maps (L is a natural number with L > M; 32 in the example of FIG. 1). The artificial neuron may then select M of the L two-dimensional output feature maps and receive them as the M input feature maps that make up its three-dimensional input feature map.

In addition, within the same convolution layer, each of the artificial neurons may receive a three-dimensional input feature map made up of a different number of two-dimensional input feature maps.

Corresponding to the M channels of the three-dimensional input feature map, the three-dimensional weight filter also has M two-dimensional weight filters. Here, each of the M two-dimensional input feature maps may be an output feature map generated by a respective artificial neuron of the previous convolution layer.

Accordingly, FIGS. 9 to 11 show the case in which the digital neuron is configured to receive M two-dimensional input feature maps and M two-dimensional weight filters, so the digital neuron is configured to include M neural elements (NE).

Each of the M neural elements (NE) shown in FIG. 9 basically has a configuration similar to the neural element (NE) shown in FIG. 3. However, in the digital neuron shown in FIGS. 9 and 10, each of the M neural elements (NE) does not add the bias value (BS); it sums only the 2N partial products ((P1₁, P2₁), (P1₂, P2₂), …, (P1_N, P2_N)) supplied by the N partial product generators (MBS) and outputs a channel operation value (MO₁ to MO_M).

The digital neuron of FIG. 9 then sums the M channel operation values (MO₁ to MO_M) output by the M neural elements (NE) and the bias value (BS) to produce the output value (Out). Here, the output value (Out) is the result of the dot product of the three-dimensional weight filter with one region of the three-dimensional input feature map, and is one element of the two-dimensional output feature map.
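
The split between per-channel sums and a single bias addition can be sketched as follows. Ordinary integer multiplication stands in for the shift-based partial products, and the function names are illustrative only.

```python
def channel_value(inputs, weights):
    """One neural element (NE): sum of products for one channel, no bias."""
    return sum(x * w for x, w in zip(inputs, weights))


def digital_neuron_output(channel_inputs, channel_weights, bias):
    """Sum the M channel operation values and add the bias once."""
    channel_values = [
        channel_value(inputs, weights)
        for inputs, weights in zip(channel_inputs, channel_weights)
    ]
    return sum(channel_values) + bias


# M = 2 channels, N = 4 inputs per channel (values chosen for illustration).
inputs = [[1, 2, 3, 4], [5, 6, 7, 8]]
weights = [[1, 0, -1, 2], [2, -2, 1, 0]]
out = digital_neuron_output(inputs, weights, bias=3)
```

Adding the bias only at the final stage keeps the M neural elements identical, which matches the description of each element outputting just its channel operation value.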

The digital neuron can then move the three-dimensional weight filter across the three-dimensional input feature map and repeat the dot product to compute another element of the two-dimensional output feature map.

FIG. 10 shows, as an example, the neural elements (NE) as having the same configuration as in FIG. 3, but the neural elements (NE) may instead have the configuration of FIG. 8.

FIG. 12 shows the results of simulating the power consumption of the digital neuron according to this embodiment.

FIG. 12 compares the power consumption of the digital neuron of this embodiment with that of several commercial GPUs, the general-purpose computing devices most commonly used to train and run artificial neural networks. The digital neuron is compared with commercial GPUs because, for artificial neural networks that repeat simple operations, GPUs outperform CPUs and consume less power.

Referring to FIG. 12, the digital neuron implemented in hardware according to this embodiment consumes less than 300 mW, whereas commercial GPUs consume 100 W or more. That is, the digital neuron of this embodiment can reduce power consumption to less than 1/300 of that of a commercial GPU.

Table 4 compares the performance of the digital neuron according to this embodiment with that of other artificial neurons.

Referring to Table 4, the digital neuron according to this embodiment achieves accuracy similar to an approach that uses a Booth multiplier with 5-bit integer weights, while greatly reducing gate delay and power consumption. Compared with the YodaNN artificial neuron, which uses 1-bit integer weights, it consumes more power but is far more accurate.

Consequently, the digital neuron for an artificial neural network, the artificial neuron, and the inference engine including the same according to this embodiment use weights converted to integers of a specified number of bits, excluding certain values, and thereby minimize the number of partial products. Negative partial products are obtained with a one's complement operation rather than a two's complement operation, which simplifies the partial product circuit. In addition, when the channel operation value (MO) is computed, the sign bit value is calculated separately from the count value (NUM_P) and then added, which simplifies the partial product adder. High-speed operation, reduced power consumption, and a small-area implementation are therefore possible.

FIG. 13 shows an example of an artificial neuron that can vary the weight filter size using digital neurons according to an embodiment of the present invention.

The digital neuron of FIG. 9 is configured to receive a three-dimensional input feature map containing M two-dimensional input feature maps and to perform a convolution or dot product operation using a three-dimensional weight filter containing M two-dimensional weight filters.

However, in the digital neuron of FIG. 9 it is not easy to adjust the number of weights in the weight filter, that is, the size of the weight filter.

If the digital neuron of FIG. 9 is built for a weight filter made up of M two-dimensional weight filters of size 5 X 5, then for a filter with smaller two-dimensional weight filters (for example, 3 X 3) or with fewer than M of them, it can add dummy weights that do not affect the dot product (for example, 0) to convert the filter into M two-dimensional weight filters of size 5 X 5, and in that form compute the dot product of the input feature map and the weight filter.
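
The effect of the dummy weights can be sketched in a few lines. The patent does not say where the smaller filter sits inside the 5 X 5 array, so the top-left placement used here is an assumption, as are the function names.

```python
def pad_filter(filter_2d, target_size=5):
    """Embed a smaller 2D weight filter in a target_size x target_size
    array of zeros (top-left aligned, an assumption for this sketch)."""
    rows, cols = len(filter_2d), len(filter_2d[0])
    if rows > target_size or cols > target_size:
        raise ValueError("filter exceeds the size the hardware was designed for")
    padded = [[0] * target_size for _ in range(target_size)]
    for r in range(rows):
        for c in range(cols):
            padded[r][c] = filter_2d[r][c]
    return padded


def dot_product(window, filter_2d):
    """Element-wise multiply two equally sized 2D arrays and sum the result."""
    return sum(
        x * w
        for row_x, row_w in zip(window, filter_2d)
        for x, w in zip(row_x, row_w)
    )


window_5x5 = [[r * 5 + c for c in range(5)] for r in range(5)]
small_filter = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
padded_result = dot_product(window_5x5, pad_filter(small_filter))
```

The zero weights contribute nothing to the sum, so the fixed-size hardware returns the same value as a 3 X 3 dot product while still spending cycles on the padded positions.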

That is, the digital neuron can readily perform a dot product or convolution operation with a weight filter smaller than the size fixed when the digital hardware was designed. It cannot, however, perform a dot product or convolution operation with a weight filter that exceeds that size.

If, to avoid this problem, the supported weight filter size is greatly enlarged when the digital neuron is designed, the digital neuron not only performs many unnecessary operations but also grows in size and power consumption.

FIG. 13 conceptually shows the structure of an artificial neuron that can operate with a weight filter exceeding the size specified for its digital neurons and that can increase operation speed when operating with a weight filter of the specified size or smaller.

Referring to FIG. 13, the artificial neuron according to this embodiment includes an input value extraction unit (INEX), a plurality of digital neurons (DN), an output adder (AD), at least one input selection mux (Mux1), and an output selection mux (Mux2).

The input value extraction unit (INEX) extracts input values of a specified size from predetermined positions in the input feature map. It can extract input values from different positions in response to the mode signal (MD), which designates either a basic mode or an extended mode.

The basic mode is for operating with a weight filter no larger than the specified size that each of the digital neurons (DN) can handle, and the extended mode is for operating with a weight filter that exceeds the specified size.

FIG. 13 assumes, as an example, that the weight filter size specified for the digital neuron (DN) is M two-dimensional weight filters of size 5 X 5. The basic mode can therefore be set when the weight filter consists of at most M two-dimensional weight filters of size 5 X 5 or smaller.

However, when the weight filter is larger than 5 X 5 or contains more than M two-dimensional weight filters, the extended mode is designated.

The description here assumes two cases: a weight filter made up of M two-dimensional weight filters of size 5 X 5, and a weight filter made up of M two-dimensional weight filters of size 7 X 7.

When the mode signal (MD) designates the basic mode, the input value extraction unit (INEX) extracts from the input feature map two input value sets, each of a size corresponding to the size of the weight filter.

Referring to FIG. 13, in the basic mode the input value extraction unit (INEX) extracts the input values of the first region (IN_T1) and the second region (IN_T2) of the input feature map as the first basic input value set, and the input values of the second region (IN_T2) and the fourth region (IN_T4) as the second basic input value set.

In the extended mode, however, the input value extraction unit (INEX) extracts the input values of the first region (IN_T1) and the second region (IN_T2) of the input feature map as the first extended input value set, and the input values of the third region (IN_T3) and the fourth region (IN_T4) as the second extended input value set.

That is, the first basic input value set and the first extended input value set contain the same input values, whereas the second basic input value set and the second extended input value set contain different input values.

The extracted first basic input value set and first extended input value set are applied to the corresponding first digital neuron (DN1) among the digital neurons, and the second basic input value set and second extended input value set are applied to the corresponding second digital neuron (DN2).

In the basic mode, the first and second digital neurons (DN1, DN2) perform dot products using the first and second basic input value sets, respectively, and the designated weight filter, and output the output values (Out₁, Out₂). The first and second digital neurons (DN1, DN2) use the same weight filter.

That is, in a convolution operation, where the weight filter moves across the input feature map and a dot product is computed at each position, the first and second digital neurons (DN1, DN2) in the basic mode simultaneously compute the dot products for two different filter positions. Because two of the many dot products that make up the convolution are performed in parallel by the two digital neurons (DN1, DN2), the operation speed can be doubled.

FIG. 13 assumes, as an example, that the artificial neuron includes two digital neurons (DN1, DN2), so the operation speed is doubled; if the artificial neuron includes more digital neurons, the operation speed can increase proportionally.

In the extended mode, the first and second digital neurons (DN1, DN2) each perform a dot product using the first and second extended input value sets, respectively, and the corresponding region of the designated weight filter, and the output value (Out₁) is output.

In the extended mode, the weight filter is larger than what a digital neuron (DN1, DN2) can handle, so a single digital neuron cannot perform the required dot product. In this embodiment, each of the digital neurons (DN1, DN2) therefore receives the input values and weights for its corresponding region of the large weight filter and performs its own dot product, so that a weight filter larger than a digital neuron (DN1, DN2) can handle can still be processed.

Since each digital neuron (DN1, DN2) is assumed above to handle a weight filter of size 5 X 5 X M, each digital neuron (DN1, DN2) can operate on 25 X M input values and weights. The two digital neurons (DN1, DN2) can therefore operate on 50 X M input values and weights.

Since the weight filter applied in the extended mode is assumed to be 7 X 7 X M (= 49 X M), an artificial neuron using two digital neurons (DN1, DN2) of size 5 X 5 X M can readily perform the dot product for the 7 X 7 X M weight filter.

However, the parallel operations of the basic mode must be output as separate output values (Out₁, Out₂), whereas in the extended mode the output values of the digital neurons (DN1, DN2) must be combined and output as a single output value (Out₁).

Accordingly, the artificial neuron of FIG. 13 further includes an output adder (ADD) and an output selection mux (Mux2). In the extended mode, the output adder (ADD) sums the output values of the digital neurons (DN1, DN2) and outputs the sum to the output selection mux (Mux2).

In response to the mode signal (MD), the output selection mux (Mux2) selects the output value of the first digital neuron (DN1) in the basic mode and outputs the sum from the output adder (ADD) in the extended mode.
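
The two modes can be sketched for a single channel (M = 1) as follows. The sketch ignores the bias and assumes the 49 values of a 7 X 7 filter are simply split into a first group of 25 and a second group of 24; the actual mapping of regions IN_T1 to IN_T4 onto the two digital neurons is defined by FIG. 13 and is not reproduced here. All names are hypothetical.

```python
CAPACITY = 25  # weights one digital neuron can handle per channel (5 X 5)


def digital_neuron(inputs, weights):
    """Dot product limited to the size fixed at design time."""
    if len(inputs) != len(weights) or len(inputs) > CAPACITY:
        raise ValueError("input/weight set does not fit the digital neuron")
    return sum(x * w for x, w in zip(inputs, weights))


def artificial_neuron(mode, inputs_1, inputs_2, weights_1, weights_2):
    """Two digital neurons with an output adder and an output selection mux.

    basic:    two independent outputs (Out1, Out2), one per filter position.
    extended: the two partial dot products are added into a single Out1.
    """
    out_1 = digital_neuron(inputs_1, weights_1)
    out_2 = digital_neuron(inputs_2, weights_2)
    if mode == "basic":
        return (out_1, out_2)
    if mode == "extended":
        return (out_1 + out_2,)
    raise ValueError("mode must be 'basic' or 'extended'")


# Extended mode: a flattened 7 X 7 filter split across the two digital neurons.
x = list(range(49))
w = [(i % 3) - 1 for i in range(49)]
extended_out = artificial_neuron("extended", x[:25], x[25:], w[:25], w[25:])
```

In basic mode the same function returns two values, so the same pair of digital neurons either doubles throughput or covers one larger filter.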

Meanwhile, as shown in FIG. 13, the first basic input value set and the second basic input value set share the input values of the second region (IN_T2). The second basic input value set and the second extended input value set share the input values of the fourth region (IN_T4), whereas the input values of the third region (IN_T3) are included only in the second extended input value set.

If these shared and differing input values were each supplied individually to the digital neurons (DN1, DN2), many data lines would have to be routed in the artificial neuron. Because an input value set is output for each of the M channels, the number of data lines needed to carry the duplicated input values is very large, which increases the size and power consumption of the artificial neuron.

Accordingly, this embodiment provides a plurality of input selection muxes (Mux1) so that, between the second basic input value set and the second extended input value set, the shared input values and the differing input values are selectively applied to the second digital neuron (DN2).

Here, the at least one input selection mux (Mux1) may select, in response to the mode signal (MD), either the shared input values or the differing input values of the second basic input value set and the second extended input value set.

FIG. 14 shows a schematic configuration of an inference engine of an artificial neural network according to another embodiment of the present invention.

FIG. 14 shows an example of an inference engine for a convolutional neural network. In particular, the inference engine of FIG. 14 includes one convolution engine unit (CEN), one fully connected engine unit (FEN), and a memory (MEM). As shown in FIG. 1, a convolutional neural network may include a plurality of convolution layers (C1, C2, C3) and a plurality of fully connected layers (FC1, FC2). However, implementing each of the convolution layers (C1, C2, C3) and fully connected layers (FC1, FC2) as a separate circuit is very inefficient in terms of size, power consumption, and manufacturing cost.

FIG. 14 therefore includes one convolution engine unit (CEN) that performs the operations of the multiple convolution layers, which behave similarly, and one fully connected engine unit (FEN) that performs the operations of the multiple fully connected layers, so that the convolutional neural network can be implemented in digital hardware.

FIG. 14 assumes, as shown on its left side, that the artificial neural network includes two convolution layers (C1, C2) and three fully connected layers (FC1, FC2, INF).

First, the memory (MEM) stores the input data (or input image) applied to the convolutional neural network and the weight filters for the convolution layers (C1, C2) and the fully connected layers (FC1, FC2, INF).

The convolution engine unit (CEN) includes an input value acquisition unit (INOB), a convolution filter bank buffer (FB1), an input mux (MuxIN), an input selection mux (MuxDI), a demux (Demux), an output feature map buffer (OFMB), and at least one digital neuron (DN1).

The input mux (MuxIN), as the input stage of the artificial neural network, receives the input data stored in the memory (MEM) or the output value of the convolution engine unit (CEN). In response to the layer selection signal (SelIN_L), it selects either the input image or the output value and outputs it to the input value acquisition unit (INOB) as an input value, that is, an element of the input feature map.

Here, the layer selection signal (SelIN_L) selects the initial input of the convolution engine unit (CEN) in an artificial neural network that includes multiple convolution layers (C1, C2). It distinguishes whether the current layer is the initial layer of the network (here, the first convolution layer (C1)) or a layer after the convolution layers (C1, C2) (here, the fully connected layers (FC1, FC2, INF)).

As described above, when the artificial neural network includes two convolution layers (C1, C2), the first convolution layer (C1) receives the input data, whereas the second convolution layer (C2) may be configured to receive the output value of the first convolution layer (C1).

Based on this structure, the input mux (MuxIN), in response to the layer selection signal (SelIN_L), selects the input data from the memory (MEM) and outputs it to the input value acquisition unit (INOB) if the layer is the initial layer (C1). If the layer is not a convolution layer (C1, C2), it instead selects the output value of the convolution engine unit (CEN) and outputs it to the input value acquisition unit (INOB).

When the current layer is not one of the convolution layers (C1, C2), the output value of the convolution engine (CEN) may be a predetermined set value or an initialized value. That is, in this case the input mux (MuxIN) outputs the initialized value to the input value acquisition unit (INOB), thereby initializing the input value acquisition unit (INOB) to a predetermined initialization value.

The input value acquisition unit (INOB) receives the input data from the input mux (MuxIN), constructs and stores the input feature map, extracts the input values contained in a designated region of the stored input feature map, and outputs them to the input selection mux (MuxDI).

As described above, to perform a convolution operation, a convolution layer must compute dot products while the designated weight filter moves across the input feature map.

This requires selecting, from the many input values of the input feature map stored in the input value acquisition unit (INOB), those at the positions covered by the weight filter, moving the selected positions as the filter moves, and delivering the selected values to the digital neuron (DN1). Implementing this behavior directly in hardware is very difficult.

In this embodiment, however, the input value acquisition unit (INOB) is implemented with a plurality of shift registers, as shown in FIG. 14, and is configured so that the input values of the input feature map move sequentially to designated positions. As a result, the input values can always be extracted from the same positions and output to the input selection mux (MuxDI).

That is, using shift registers allows the input value acquisition unit (INOB) to be implemented in hardware very easily.
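As a non-limiting software illustration of the shift-register approach described above, the following Python sketch streams a feature map through a single register chain and always reads the filter window from the same tap positions. The function name `sliding_windows`, the raster-order streaming, and the chain length `(k - 1) * width + k` are assumptions made for this sketch and are not taken from FIG. 14.

```python
from collections import deque


def sliding_windows(feature_map, k):
    """Yield (row, col, window) for every valid k x k window.

    Hypothetical model: the feature map is streamed in raster order into
    one shift-register chain, and the window is read from fixed taps.
    """
    width = len(feature_map[0])
    depth = (k - 1) * width + k            # registers needed to span k rows
    regs = deque([0] * depth, maxlen=depth)

    stream = (value for row in feature_map for value in row)
    for i, value in enumerate(stream):
        regs.append(value)                 # one shift per clock cycle
        row, col = divmod(i, width)
        if row >= k - 1 and col >= k - 1:  # window lies fully inside the map
            # Tap positions never change: regs[0] is the window's top-left.
            window = [[regs[r * width + c] for c in range(k)]
                      for r in range(k)]
            yield row - k + 1, col - k + 1, window
```

In this sketch the data moves past the taps rather than the taps moving over the data, which is the property the paragraph above relies on to simplify the hardware.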

In addition, when the layer currently to be processed is not one of the convolution layers (C1, C2), the input value acquisition unit (INOB) is initialized by storing the initialization value applied from the input mux (MuxIN).

In response to the input data selection signal (SelDI_L), the input selection mux (MuxDI) selects either the input value applied from the input value acquisition unit (INOB) or the output value applied from the output feature map buffer (OFMB) and outputs it to the digital neuron (DN1).

Here, the input data selection signal (SelDI_L) indicates whether the convolution layer on which the convolution engine (CEN) must currently perform the convolution operation is the initial convolution layer (C1).

If the current layer is the initial convolution layer (C1) among the convolution layers (C1, C2) of the artificial neural network, the input selection mux (MuxDI) outputs the input value applied from the input value acquisition unit (INOB) to the digital neuron (DN1).

If the current layer is not the initial convolution layer (C1), the input selection mux (MuxDI) outputs the values of the output feature map applied from the output feature map buffer (OFMB) to the digital neuron (DN1) as input values. That is, the output feature map obtained in the previous convolution layer is delivered to the digital neuron (DN1) as the input feature map.

The convolution filter bank buffer (FB1) receives the weight filters stored in the memory (MEM) and stores them temporarily. The convolution filter bank buffer (FB1) may receive and store at once the weight filters corresponding to each of the convolution layers (C1, C2), or it may receive and store only the weight filter for the convolution layer that the convolution engine (CEN) is currently processing.

The digital neuron (DN1) is the artificial neuron shown in FIG. 9. It computes and outputs the dot product of the input values applied from the input selection mux (MuxDI) and the weight filter applied from the convolution filter bank buffer (FB1).
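For illustration only, the dot product performed by the digital neuron can be sketched in Python as a sum of bit-shifted partial products, in line with the abstract's description of multiplying by integerized weights through bit shift operations. The signed-digit decomposition used below (the non-adjacent form) and the names `decompose`, `shift_add_multiply`, and `shift_add_dot` are assumptions of this sketch; the text above only requires that each weight be expressed as a sum of terms R·2ⁿ.

```python
def decompose(weight):
    """Split an integer weight into terms (R, n) with R in {-1, +1}
    so that sum(R * 2**n) == weight.

    Assumption: uses the non-adjacent form; terms with R == 0 are
    simply omitted rather than emitted.
    """
    terms = []
    n = 0
    w = weight
    while w != 0:
        if w & 1:
            r = 2 - (w & 3)   # +1 when w % 4 == 1, -1 when w % 4 == 3
            terms.append((r, n))
            w -= r            # clears the lowest set digit
        w >>= 1
        n += 1
    return terms


def shift_add_multiply(x, weight):
    """Multiply x by weight using only shifts, negation, and addition."""
    total = 0
    for r, n in decompose(weight):
        partial = x << n      # partial product before applying the sign
        total += partial if r == 1 else -partial
    return total


def shift_add_dot(inputs, weights):
    """Dot product built from shift-and-add multiplications."""
    return sum(shift_add_multiply(x, w) for x, w in zip(inputs, weights))
```

For example, under this decomposition the weight 7 becomes 8 - 1, so a multiplication by 7 needs two partial products instead of three.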

In response to the output data selection signal (SelDO_L), the demux (Demux) outputs the output value of the digital neuron (DN1) to either the output feature map buffer (OFMB) or the fully connected engine (FEN).

Here, unlike the input data selection signal (SelDI_L), the output data selection signal (SelDO_L) indicates whether the convolution layer on which the convolution engine (CEN) must currently perform the convolution operation is the final convolution layer.

If the current layer is the final convolution layer (C2) among the convolution layers (C1, C2), the demux (Demux) outputs the output value of the digital neuron (DN1) to the fully connected engine (FEN). Otherwise, it outputs the output value of the digital neuron (DN1) to the output feature map buffer (OFMB).

In addition, when the layer currently being processed is not one of the convolution layers (C1, C2), the output of the demux (Demux) is applied to the input mux (MuxIN).

The output feature map buffer (OFMB) sequentially accumulates the output values from the demux (Demux), stores them as the output feature map, and outputs the stored output values to the input selection mux (MuxDI).

For the initial convolution layer among the convolution layers of the feature extraction unit, the convolution engine (CEN) configured as above receives the input image stored in the memory (MEM) as the input feature map, extracts input values and delivers them to the digital neuron (DN1), and stores the output values of the digital neuron (DN1) in the output feature map buffer (OFMB).

For an intermediate convolution layer, it receives the output feature map from the output feature map buffer (OFMB) as the input feature map and stores the result back in the output feature map buffer (OFMB).

For the final convolution layer, however, it outputs the output values of the digital neuron (DN1) to the fully connected engine (FEN).

That is, a feature extraction unit including a plurality of convolution layers (C1, C2) can be implemented with a single convolution engine (CEN).
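The reuse of a single convolution engine across layers can be sketched in Python as follows. This is a simplified behavioral model under stated assumptions: the string encodings of SelDI_L and SelDO_L, the function names `select_signals` and `run_conv_layers`, and the representation of each layer as a callable standing in for the digital neuron (DN1) loaded with that layer's weight filter are all hypothetical.

```python
def select_signals(layer_index, num_conv_layers):
    """Derive the two routing decisions for the current convolution layer.

    The string values are a hypothetical encoding of the select signals.
    """
    is_first = layer_index == 0
    is_last = layer_index == num_conv_layers - 1
    return {
        "SelDI_L": "INOB" if is_first else "OFMB",  # source chosen by MuxDI
        "SelDO_L": "FEN" if is_last else "OFMB",    # target chosen by Demux
    }


def run_conv_layers(input_image, layer_fns):
    """Run every convolution layer on one shared engine.

    layer_fns: one callable per layer, each standing in for DN1 with that
    layer's weight filter loaded (an assumption of this sketch).
    Returns the value handed to the fully connected engine (FEN).
    """
    ofmb = None                                     # output feature map buffer
    for i, layer_fn in enumerate(layer_fns):
        sel = select_signals(i, len(layer_fns))
        source = input_image if sel["SelDI_L"] == "INOB" else ofmb
        result = layer_fn(source)
        if sel["SelDO_L"] == "FEN":
            return result                           # final layer feeds the FEN
        ofmb = result                               # otherwise store for reuse
    return None
```

With two layers, the first reads from the input value acquisition unit and writes to the buffer, and the second reads from the buffer and writes to the fully connected engine, matching the C1 and C2 cases described above.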

The convolution engine (CEN) may further include a pooling unit (not shown) that subsamples the output feature map stored in the output feature map buffer (OFMB) in a predetermined manner. For example, the pooling unit may be configured to perform average pooling or max pooling.
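A minimal Python sketch of such a pooling unit is given below for illustration. The window size of 2, the non-overlapping stride, the floor division used for the integer average, and the name `pool2d` are assumptions of the sketch, since the text above does not fix these details.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Subsample a 2-D feature map with non-overlapping size x size blocks.

    mode="max" keeps the largest value of each block; any other mode
    takes the integer (floor) average, an assumption for integer hardware.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for r in range(0, height - size + 1, size):
        pooled_row = []
        for c in range(0, width - size + 1, size):
            block = [feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(block))
            else:
                pooled_row.append(sum(block) // len(block))
        pooled.append(pooled_row)
    return pooled
```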

Although the output feature map buffer (OFMB) is illustrated above as being included in the convolution engine (CEN), the output feature map may instead be stored in the memory (MEM) in some cases.

Meanwhile, the fully connected engine (FEN) may include an input inference selection mux (MuxFI), an input register (FIRG), an output register (FORG), a fully connected filter bank buffer (FB2), and a digital neuron (DN2).

Similarly to the input selection mux (MuxDI) of the convolution engine (CEN), the input inference selection mux (MuxFI) responds to the connection layer selection signal (SelFI_L) by selecting either the output value applied from the convolution engine (CEN) or the output value stored in the output register (FORG) and passing it to the input register (FIRG).

Here, the connection layer selection signal (SelFI_L) distinguishes whether the current layer is the initial fully connected layer (FC1) or one of the remaining fully connected layers (FC2, INF) among the fully connected layers (FC1, FC2, INF) of the classification unit.

In response to the connection layer selection signal (SelFI_L), the input inference selection mux (MuxFI) selects the output value applied from the convolution engine (CEN) and passes it to the input register (FIRG) if the current layer is the initial fully connected layer (FC1). If the current layer is one of the remaining fully connected layers (FC2, INF), it selects the output value stored in the output register (FORG) and passes it to the input register (FIRG).

The input register (FIRG) receives and temporarily stores the output value passed from the input inference selection mux (MuxFI) and outputs it to the digital neuron (DN2), and the output register (FORG) temporarily stores the output value of the digital neuron (DN2) and outputs it to the input inference selection mux (MuxFI). The fully connected filter bank buffer (FB2) receives the weight filters stored in the memory (MEM) and stores them temporarily.

The digital neuron (DN2) receives the output value applied from the input register (FIRG) and the weight filter stored in the fully connected filter bank buffer (FB2), performs a predetermined operation, and outputs an output value. The output value of the digital neuron (DN2) may be an inference value (inference output).
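The register loop of the fully connected engine can be modeled in Python as shown below. This is a behavioral sketch only: it assumes that the "predetermined operation" of the digital neuron (DN2) is a plain dot product per output, and the function name `run_fc_layers` and the nested-list weight layout are hypothetical.

```python
def run_fc_layers(conv_output, fc_weights):
    """Run all fully connected layers on one shared neuron.

    conv_output: list of values delivered by the convolution engine (CEN).
    fc_weights: one weight matrix per layer, given as a list of rows, where
        each row is the 1-D weight filter for one output value.
    Assumption: DN2 computes a plain dot product (no bias or activation).
    """
    forg = None                                        # output register (FORG)
    for layer_index, weights in enumerate(fc_weights):
        # MuxFI: the first layer reads the CEN output, later layers read FORG.
        firg = conv_output if layer_index == 0 else forg   # input register
        forg = [sum(x * w for x, w in zip(firg, row)) for row in weights]
    return forg
```

After the last layer, the content of the output register corresponds to the inference output mentioned above.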

Here, the fully connected engine (FEN) may additionally include a demux that selects whether the output value of the digital neuron (DN2) is output to the output register (FORG) or to the outside.

In this case, the convolution engine (CEN) and the fully connected engine (FEN) may have almost the same structure. The inference engine nevertheless provides the convolution engine (CEN) and the fully connected engine (FEN) separately because the weight filters used in convolution layers and those used in fully connected layers generally differ greatly in size, so implementing a fully connected layer with the convolution engine (CEN) would be very inefficient.

That is, in FIG. 14, both the digital neuron (DN1) of the convolution engine (CEN) and the digital neuron (DN2) of the fully connected engine (FEN) may be implemented as the artificial neuron of FIG. 9, but because the sizes of their weight filters and input feature maps differ greatly, it is preferable to implement them separately. In particular, the digital neuron (DN1) of the convolution engine (CEN) uses a three-dimensional weight filter, whereas the digital neuron (DN2) of the fully connected engine (FEN) mostly uses a one-dimensional weight filter.

Accordingly, in this embodiment, the digital neuron (DN1) of the convolution engine (CEN) is configured as the artificial neuron of FIG. 9, while the digital neuron (DN2) of the fully connected engine (FEN) may be regarded as a separately implemented fully connected artificial neuron.

Although FIG. 14 is drawn on the assumption that the convolution engine (CEN) includes a single digital neuron (DN1), each of the convolution layers of the artificial neural network of FIG. 14 may use a weight filter of a different size. In that case, the weight filter may be larger than the filter size designated for the digital neuron (DN1).

Accordingly, in the inference engine for the artificial neural network of FIG. 14, the artificial neuron of FIG. 13 may be used in place of the digital neuron (DN1). For example, the convolution engine (CEN) in the inference engine of FIG. 14 may include a plurality of digital neurons as in FIG. 13. The input value acquisition unit (INOB) of FIG. 14 may be configured, like the input value extraction unit (INEX) of FIG. 13, to extract input values at different positions in response to the mode signal (MD). Furthermore, a plurality of input selection muxes (MuxDI) may be placed between the input value acquisition unit (INOB) and the digital neurons, and an adder (AD) and at least one output selection mux (Mux2) may be placed between the digital neurons and the demux (Demux).

That is, depending on the size of the weight filter of the convolution layer that the convolution engine (CEN) is to process, two or more dot product operations are performed in parallel for weight filters within the designated size, while a weight filter exceeding the designated size is split so that its dot product can be computed in parts.
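The two operating modes described above can be illustrated with the following Python sketch. It assumes a neuron capacity of nine weights (for example, a 3x3 filter) and splits an oversized filter by chunking its flattened weights; both choices, along with the names `NEURON_SIZE`, `neuron_dot`, `parallel_dots`, and `split_dot`, are assumptions of the sketch rather than details taken from FIG. 13.

```python
NEURON_SIZE = 9  # assumed capacity of one digital neuron (e.g. a 3x3 filter)


def neuron_dot(inputs, weights):
    """Dot product performed by one digital neuron of limited capacity."""
    if len(weights) > NEURON_SIZE:
        raise ValueError("filter exceeds the capacity of a single neuron")
    return sum(x * w for x, w in zip(inputs, weights))


def parallel_dots(windows, weights):
    """Small filter: each neuron handles a different input window.

    A list comprehension stands in for neurons working side by side.
    """
    return [neuron_dot(window, weights) for window in windows]


def split_dot(inputs, weights, size=NEURON_SIZE):
    """Large filter: each neuron handles one slice, then an adder sums them."""
    partial_sums = [
        neuron_dot(inputs[i:i + size], weights[i:i + size])
        for i in range(0, len(weights), size)
    ]
    return sum(partial_sums)  # role played by the adder (AD)
```

For instance, a flattened 5x5 filter with 25 weights would be handled by `split_dot` as three slices of 9, 9, and 7 weights whose partial results are then added.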

In some cases, the fully connected engine (FEN), like the convolution engine (CEN), may also include a plurality of artificial neurons and be configured to perform dot product operations in a parallel or split manner according to the size of the weights.

The method according to the present invention may be implemented as a computer program stored on a medium for execution on a computer. Here, the computer-readable medium may be any available medium that can be accessed by a computer and may include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and may include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom.

Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

A digital neuron implemented in digital hardware that receives an input vector including a plurality of input values in an integer format having a predetermined number of bits and a weight vector including a plurality of weights in an integer format having a predetermined number of bits, and performs a dot product operation,
wherein the digital neuron includes a neural element, and the neural element comprises:
a plurality of weight decomposition units, each of which receives a corresponding weight from the weight vector, decomposes the received integer-format weight into a sum of a plurality of partial products (R·2ⁿ), each expressed as the product of a coefficient (R) having at least one of the values -1, 0, and 1 and a power of two (2ⁿ), and outputs a control signal according to the coefficient (R) and the exponent (n) of each decomposed partial product;
a plurality of partial product generators, each of which receives a corresponding input value from the input vector and, in response to the control signal, outputs a plurality of partial products by bit-shifting the input value toward the upper bits by the exponent (n); and
a partial product adder that sums the plurality of partial products in parallel and outputs a channel operation value,
wherein each of the plurality of partial product generators
additionally performs a one's complement operation or a zero conversion operation on the plurality of partial products according to the coefficient (R).

(Claim 2 deleted)

The digital neuron of claim 1, wherein the digital neuron
further includes a negative partial product counter that obtains a count value by counting the number of times the one's complement operation is performed by the plurality of partial product generators, and
the partial product adder
calculates the channel operation value by summing the count value together with the plurality of partial products.

The digital neuron of claim 3, wherein the partial product adder
calculates the channel operation value by summing the count value with the values of the plurality of partial products excluding their sign bits, and adding the two's complement of the count value, at the sign bit position, to the summation result.

The digital neuron of claim 4, wherein each of the plurality of partial product generators
includes a plurality of partial product calculators that are driven according to the control signal and each calculate and output a partial product, and
each of the plurality of partial product calculators comprises:
a bit shifter that bit-shifts the input value by the exponent (n);
a one's complement operator that, when the coefficient (R) is -1, performs a one's complement operation on the output value of the bit shifter and outputs the partial product; and
a zero multiplier that, when the coefficient (R) is 0, converts the partial product to zero in response to the control signal.

The digital neuron of claim 4, wherein the partial product adder
is composed of a plurality of stages each including a plurality of full adders,
the first stage among the plurality of stages receives the plurality of partial products and adds them by grouping the same-position bits of a preset number of different partial products,
each remaining stage adds by grouping the addition values of the previous stage, and
the final stage calculates the channel operation value by adding the count value to the lower bits of the addition value of the previous stage.

The digital neuron of claim 6, wherein the partial product adder
further includes a two's complement calculator that receives the count value and calculates a count complement value, which is the two's complement of the count value, and
the final stage extends the sign bit of the channel operation value by adding the count complement value to the upper bits of the addition value of the previous stage.

The digital neuron of claim 1, wherein the digital neuron
includes a plurality of the neural elements, each of which receives a corresponding two-dimensional weight filter among a plurality of two-dimensional weight filters of a three-dimensional weight filter, each two-dimensional weight filter having a plurality of weights, and a corresponding two-dimensional input feature map among a plurality of two-dimensional input feature maps of a three-dimensional input feature map, each two-dimensional input feature map having a plurality of input values, and
further includes a channel adder that outputs the output value of the digital neuron by adding the channel operation values output from the plurality of neural elements and a predetermined bias value.

The digital neuron of claim 1, wherein the weight,
in order to reduce the number of terms of the partial products, is applied after being replaced with a designated adjacent integer so that predetermined integers are excluded.

A digital neuron including a neural element implemented in digital hardware that receives an input vector including a plurality of input values in an integer format having a predetermined number of bits and a weight vector including a plurality of weights in an integer format having a predetermined number of bits, and performs a dot product operation,
wherein the neural element comprises:
a plurality of weight decomposition units, each of which receives a corresponding weight from the weight vector and decomposes the received integer-format weight into a sum of a plurality of partial products (R·2ⁿ), each expressed as the product of a coefficient (R) having at least one of the values -1, 0, and 1 and a power of two (2ⁿ);
a plurality of partial product generators, each of which receives a corresponding input value from the input vector, a negative input value of that input value, and a voltage corresponding to the value 0, selects the value corresponding to the coefficient (R), and outputs a plurality of partial products by bit-shifting the selected value toward the upper bits by the exponent (n); and
a partial product adder that sums the plurality of partial products in parallel and outputs a channel operation value,
wherein each of the plurality of partial product generators
additionally performs a one's complement operation or a zero conversion operation on the plurality of partial products according to the coefficient (R).

The digital neuron of claim 10, wherein each of the plurality of partial product generators
includes a plurality of partial product calculators that each calculate and output a partial product, and
each of the plurality of partial product calculators comprises:
a mux that receives the input value, a negative input value that is the two's complement of the input value, and a voltage corresponding to the value 0, and selects and outputs the value corresponding to the coefficient (R); and
a bit shifter that bit-shifts the value selected by and applied from the mux by the exponent (n).

The digital neuron of claim 10, wherein the weight,
in order to reduce the number of terms of the partial products, is applied after being replaced with a designated adjacent integer so that predetermined integers are excluded.