KR20040040075A

KR20040040075A - Hardware of reconfigurable and expandable neural networks

Info

Publication number: KR20040040075A
Application number: KR1020020068396A
Authority: KR
Inventors: 윤석배; 김영주; 이종호
Original assignee: 학교법인 인하학원
Priority date: 2002-11-06
Filing date: 2002-11-06
Publication date: 2004-05-12
Also published as: KR100445264B1

Abstract

PURPOSE: To provide expandable and reconfigurable digital neural network hardware that is as versatile as software. It secures variability and expandability by adopting a ladder bus and an interconnecting bus between modules, performs fast neural network computation through parallel and pipelined processing, and allows the architecture of the neural network to be changed. CONSTITUTION: The processing elements comprise SPEs (Synapse Processing Elements) (10) and an LPE (Layer Processing Element) (20). Each SPE has a multiplier that applies the weights of a neuron's synapses and an accumulator that accumulates the results over the synapses. The LPE has an AFU (Activation Function Unit), which receives the result computed by the SPEs and outputs the corresponding activation function value, and a bus controller, which controls the direction of data transfer between processing elements. The interconnecting bus (30) connects the SPEs to one another and expands the synapses by passing the accumulated synapse value to an adjacent SPE. The ladder bus (40) connects the SPEs to the LPE and transfers the results of the SPEs to the LPE.

Description

Neural Network Hardware with Reconfigurability and Expansion {HARDWARE OF RECONFIGURABLE AND EXPANDABLE NEURAL NETWORKS}

The present invention relates to neural network hardware with reconfigurability and expandability, and more particularly to ERNA (Expandable and Reconfigurable Neural network Architecture), a new architecture for implementing digital neural networks in hardware.

Unlike the von Neumann computer architecture, which is based on logic expressions and operates under program control, a neural network is a computational method modeled on the human brain. The human brain differs from a computer in two main respects. First, it computes in a massively distributed and parallel manner. Second, it solves problems through learning rather than through a program developed in advance for the problem. A neural network is a computational method that imitates these characteristics of the human brain.

A neural network is generally defined as follows.

“A structure composed of numerous interconnections among units (artificial neurons). Each unit has its own input/output characteristics, which are implemented as a local operation or function. The output of each unit is determined by its input/output characteristics, and each unit is interconnected with external inputs and neighboring units. A neural network is occasionally configured by hand, but in general its overall function is developed through some form of learning.” FIG. 1 shows the basic form of a neural network modeled on visual cells.

Hardware neural networks have been under development for about as long as digital computers based on the von Neumann architecture. In other words, the neural network computer was not developed to replace the von Neumann computer; it is a different kind of computer that grew out of efforts to imitate the structure of the human brain. The first hardware implementation of such a neural network computer was the MARK I Perceptron, developed by F. Rosenblatt in 1958. Over the following 40 years neural networks went through repeated cycles of progress and decline; while digital computers advanced rapidly with the advent of the IC, neural network computers did not advance far enough to be applied to general problems.

Most of these early neural networks were built with analog technology. Analog neural networks generally have the advantage that a logical neural network can be built intuitively, and because they apply weights without multipliers, they achieve a very fast parallel structure in a small area. On the other hand, their generality is severely limited, learned weight values are difficult to store, they are sensitive to electrical noise and temperature and therefore have a high error rate, and they are difficult to interface with existing digital computers; for these reasons they did not succeed as commercial products. Development of neural network chips then stagnated somewhat until the mid-1990s, but in the late 1990s rapid progress in digital semiconductor technology revived research on them. Advances in ASIC technology and the arrival of FPGAs made fast, highly integrated digital VLSI possible, and many neural network chips, including commercial ones, were developed on this basis. The arrival of high-speed DSPs also made it possible to implement neural networks with parallel DSPs. Compared with analog neural network chips, digital implementations occupy more area and are slower, but weight values are easy to store, they can be built readily with existing digital processor technology, and they allow a comparatively flexible structure.

There are two main ways to implement a digital neural network. The first is a microprocessor-based neural network using a DSP or CPU; the second is an ASIC in which circuits for synapses, neurons, and so on are built directly. A digital-processor-based neural network occupies little area relative to the number of synapses implemented, is easy to design because an existing processor is used as is, and can implement various types of neural network merely by changing the program. However, it is slower than an ASIC implementation, is difficult to build as a single chip, and is less efficient at the parallel and distributed processing that characterizes neural networks. An ASIC implementation can realize various types of neural network according to the intended purpose and can come close to the theoretical model, but once the network is implemented its form is difficult to change and the network is difficult to expand.

In a typical fully connected neural network, the number of synapses grows as the square of the number of neurons, and the number of multipliers grows in proportion. The resulting area cost and loss of speed are a greater drawback in digital neural networks than in analog neural networks, which use no multipliers. One line of research has addressed this by keeping the structure and shrinking the multipliers, for example multiplier-free digital neural networks or serial-data digital neural networks that reduce multiplier size. Another line of research avoids physically implementing the fully connected structure and uses a modular architecture instead, which addresses the growth in multipliers as neurons and synapses increase. Examples are digital neural networks that reuse multipliers over a bus structure in SIMD fashion, neural networks that adopt a systolic array architecture so that the number of multipliers grows only linearly, and neural networks that use neighborhood connections, such as the cellular neural network.

As described above, neural networks have advantages over conventional von Neumann computers in parallelism, learning ability, fault tolerance, and adaptability, and they can readily solve problems such as optimization, pattern recognition, and associative memory that are difficult to implement on a von Neumann computer or can be solved there only with complex algorithms. Because neural networks process data in a parallel and distributed manner, they are well suited to functions such as association, inference, and learning; however, because their range of application is limited and they are difficult to implement in hardware, they have not been widely used as practical hardware.

Nevertheless, because neural networks are fundamentally parallel and distributed, simulation on a conventional digital computer has its limits, and an actual hardware implementation of the neural network is therefore needed.

The present invention was devised to solve the problems described above. Its object is to provide digital neural network hardware that is as versatile as software: it adopts a ladder bus and an interconnecting bus between modules to secure variability and expandability, performs high-speed neural network computation through parallel and pipelined processing, and allows the structure of the neural network to be changed by setting configuration bit registers.

FIG. 1 shows the basic form of a neural network modeled on visual cells.

FIG. 2 is a block diagram of a basic processing module according to the present invention.

FIG. 3 is a block diagram of an SPE according to the present invention.

FIG. 4 is a block diagram of an LPE according to the present invention.

FIG. 5 shows synapse expansion using the interconnecting bus.

FIG. 6 shows module expansion using the ladder bus.

FIG. 7 shows the forward and reverse bus configurations selected by setting the configuration bit registers of the LPE.

FIG. 8 shows neuron expansion within a layer.

Description of reference numerals for the main parts of the drawings:

(10) SPE (Synapse Processing Element)

(20) LPE (Layer Processing Element)

(30) Interconnecting bus

(40) Ladder bus

(100) Basic processing module

To achieve the above object, the present invention provides neural network hardware having one or more basic processing modules, each composed of a plurality of processing elements (PEs) and a plurality of data buses. The processing elements comprise: a plurality of synapse processing elements (SPEs), each having a multiplier that performs the weight multiplication for a neuron's synapses and an accumulator that computes the running sum over the synapses; and a layer processing element (LPE) having an activation function unit (AFU), which receives the result computed by the synapse processing elements and outputs the corresponding activation function value, and a bus controller, which controls the direction of data transfer between the processing elements. The data buses comprise: an interconnecting bus, which connects the synapse processing elements to one another and expands the synapses by passing the synapse value accumulated in one synapse processing element to an adjacent synapse processing element; and a ladder bus, which connects the synapse processing elements to the layer processing element and transfers the results of the synapse processing elements to the layer processing element.

The present invention proposes a method of securing expandability and variability through a modular neural network based on a variable structure.

Each proposed processing module has an overall ladder-type bus structure, and neurons or synapses are expanded by changing the direction of the ladder bus.

Each module internally consists of two kinds of element, the SPE (Synapse Processing Element) and the LPE (Layer Processing Element), and adjacent SPEs are connected by an interconnecting bus that can pass on the accumulated sum over a neuron's synapses. By passing a neuron's accumulated sum over this interconnecting bus, the number of synapses can be expanded not only within a module but also beyond it.

Finally, the LPE outputs the neuron's activation function value and can change the direction of the ladder bus. These functions allow neurons and layers to be expanded beyond the module and make it possible to configure neural networks of various forms, such as single-layer, multilayer, feedforward, and feedback networks.

The computational model for synapse expansion under the configuration of the present invention is as follows.

First, to express the proposed structure mathematically, the computational model for an arbitrary i-th SPE (Synapse Processing Element), which computes the sum of products over the synapses, and an arbitrary j-th LPE (Layer Processing Element), excluding its bus controller, is given by Equation 1 below.

Here, [symbol missing] is the maximum number of synapses that an SPE can process. The model of a general neuron, in turn, can be written as Equation 2.

Likewise, [symbol missing] is the maximum number of synapses of a single neuron, and f(y) in Equations 1 and 2 is the neuron's activation function. Within the range of the SPE, let [symbol missing] be the set containing an arbitrary [symbol missing], and [symbol missing] the set containing [symbol missing] under the same condition; within the range of the neuron, let [symbol missing] be the set containing an arbitrary [symbol missing], and [symbol missing] the set containing [symbol missing] under the same condition. Each set can then be written as in Equation 3.

The relationship of the SPE to the neuron falls into two cases, Equations 4 and 5, according to the size of [symbol missing] relative to [symbol missing].

i) In the case of Equation 4, when a neuron is built from an SPE and the condition of Equation 6 is satisfied, Equation 2 can be written as Equation 7.

Substituting the first expression of Equation 1 into Equation 7 gives Equation 8.

Thus the neuron output [symbol missing] is obtained by feeding the output of one SPE into one LPE.

ii) Equation 5 is the case in which the neuron has more synapses than an SPE can hold. When a neuron is again to be built from SPEs, the condition on the set relationship between the neuron's synapses and the SPE's synapses is given by Equation 9.

When the condition of Equation 9 is satisfied, let k be the quotient and r the remainder of [symbol missing] divided by [symbol missing]; the relationship of Equation 10 then holds.

Using this relationship, Equation 2 can be rearranged into Equation 11.

Substituting the first expression of Equation 1 into Equation 11 gives Equation 12.

Equation 7 can be applied to the remainder term, which gives Equation 13.

Replacing f(x) here with [symbol missing] gives Equation 14.

Therefore the neuron output [symbol missing] is obtained by feeding the sum of the outputs of k+1 SPEs into one LPE. In this way, when an SPE has fewer synapses than the neuron requires, as in Equation 5, several SPEs can be used to expand the synapses of a single neuron.
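The split of one neuron over k+1 SPEs can be sketched as below. The equation images are not reproduced in this text, so every symbol here is an assumption rather than the patent's own notation: N is the number of synapses of the neuron, M the synapse capacity of one SPE, w and x the weights and inputs, y_i the output of the i-th SPE, and o the neuron output produced by the LPE.

```latex
% Assumed notation (not taken from the source):
%   N: synapses of the neuron,  M: synapse capacity of one SPE
\begin{align}
N   &= kM + r, \qquad 0 \le r < M \\
y_i &= \sum_{m=1}^{M} w_{iM+m}\, x_{iM+m} \quad (0 \le i < k), \qquad
y_k  = \sum_{m=1}^{r} w_{kM+m}\, x_{kM+m} \\
o   &= f\!\left(\sum_{n=1}^{N} w_n x_n\right)
     = f\!\left(\sum_{i=0}^{k-1} y_i + y_k\right)
     = f\!\left(\sum_{i=0}^{k} y_i\right)
\end{align}
```

Under these assumptions, the first k SPEs are fully used and the last one handles the r remaining synapses; when r = 0 the last term is empty and k SPEs suffice.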

The configuration and operation of an embodiment of the present invention are described in detail below with reference to the accompanying drawings.

The present invention is based on a neural network with a SIMD (Single Instruction Multiple Data) structure and uses a pipeline structure so that data input incurs a latency of one clock. It applies a modular hardware design in which distributed, per-module control units pass data between adjacent PEs (Processing Elements) in a master-slave manner. This resolves the problems that arise when the synapses, neurons, and layers of a neural network are expanded: loss of speed, growth in the number of control signals, and the nonlinear growth of synapses as neurons are added.

In addition, a multi-bus structure connects the PEs through the ladder bus (40) and the interconnecting bus (30). The direction of the ladder bus applied to each module, and whether the interconnecting bus between PEs is connected, can be reconfigured using an FPGA (Field Programmable Gate Array) style reconfigurable hardware technique. This yields a reconfigurable structure in which the overall neural network can be restructured and expanded by manipulating a few configuration bits. Because chips are connected to one another in the same way as the units inside a chip, external expansion is also possible.

In the present invention, a circuit block that can implement one or more neurons making up a layer is called a processing module. A processing module basically consists of two kinds of PE (Processing Element): the SPE (Synapse Processing Element), which multiplies the input nodes by their weights (the synapse function) and accumulates the sum over the synapses, and the LPE (Layer Processing Element), which outputs the activation function value for the accumulated synapse result and controls the direction of data transfer between PEs. FIG. 2 shows the structure of a basic processing module composed of four SPEs and one LPE.

The SPE stores weight values in its built-in memory, uses the stored weights to perform the multiplication for each synapse while accumulating the products, receives and adds the result computed by an adjacent SPE, and outputs the computed result to the LPE. FIG. 3 shows the internal block diagram of the SPE.

As shown in FIG. 3, the arithmetic unit of the SPE basically consists of an adder, a multiplier, and an accumulator. Because the SPE effectively determines the computation speed of a neuron, it was designed with speed as a priority. It therefore processes all data in parallel and, to improve speed, has a throughput of 4 clocks and a latency of 1 clock.

The LPE has two main functions. The first is the AFU (Activation Function Unit), which outputs the neuron's activation function value; the second is the bus controller, which reconfigures the ladder bus. The actual expansion of layers and neurons is carried out through the LPE. FIG. 4 is a block diagram of the LPE.
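One way to picture the AFU is as a look-up table indexed by the accumulated sum; the description later names a LUT as one of the things that determines the network type. The following is only a sketch under assumptions: the sigmoid function, the table size `LUT_SIZE`, and the input range `X_MIN`..`X_MAX` are illustrative choices, not values given in the source.

```python
import math
from typing import List

LUT_SIZE = 256              # assumed number of table entries
X_MIN, X_MAX = -8.0, 8.0    # assumed input range covered by the table


def build_sigmoid_lut(size: int = LUT_SIZE) -> List[float]:
    """Sample a sigmoid at evenly spaced points between X_MIN and X_MAX."""
    step = (X_MAX - X_MIN) / (size - 1)
    return [1.0 / (1.0 + math.exp(-(X_MIN + i * step))) for i in range(size)]


def afu(y: float, lut: List[float]) -> float:
    """Return the table entry nearest to the accumulated sum y."""
    y = max(X_MIN, min(X_MAX, y))  # clamp inputs that fall outside the table
    index = round((y - X_MIN) / (X_MAX - X_MIN) * (len(lut) - 1))
    return lut[index]


if __name__ == "__main__":
    lut = build_sigmoid_lut()
    for y in (-20.0, -1.0, 0.0, 1.0, 20.0):
        print(y, afu(y, lut))
```

Loading a different table would change the activation function without touching the surrounding logic, which is in keeping with the reconfigurable intent of the design.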

The bus structure is described next.

All data in the present invention is carried over a multi-bus architecture. The data buses fall into three groups: first, the input/output buses used by the SIMD structure inside a module; second, the interconnecting bus (30), which links the SPEs inside a module; and third, the ladder bus (40), which handles the connections between modules.

The interconnecting bus (30) expands the synapses by passing the synapse value accumulated in one SPE to an adjacent SPE. As the processing module structure of FIG. 2 shows, this applies between modules as well as within a module. Synapse expansion is therefore not confined to a single module: one neuron can handle more synapses than a module provides. FIG. 5 shows how synapses are expanded using the interconnecting bus; FIG. 5a shows expansion within one module, and FIG. 5b shows expansion across two modules.
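The chaining of partial sums over the interconnecting bus can be illustrated with a small behavioral model. This is a sketch, not the patent's circuit: the `SPE` class, the `chained_sum` helper, and the floating-point arithmetic are assumptions made for illustration, and only the names `passin` and `passout` echo the bus names in Table 1.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SPE:
    """Hypothetical behavioral model of one Synapse Processing Element."""
    weights: List[float]  # weight memory; its length is the synapse capacity

    def run(self, inputs: Sequence[float], passin: float = 0.0) -> float:
        # Start from the partial sum received from the neighboring SPE,
        # then multiply-accumulate over this SPE's own synapses.
        acc = passin
        for w, x in zip(self.weights, inputs):
            acc += w * x
        return acc  # would be driven onto passout, or towards the LPE


def chained_sum(weights: Sequence[float], inputs: Sequence[float],
                capacity: int) -> float:
    """Spread one neuron's synapses over as many chained SPEs as needed."""
    partial = 0.0
    for start in range(0, len(weights), capacity):
        spe = SPE(list(weights[start:start + capacity]))
        partial = spe.run(inputs[start:start + capacity], passin=partial)
    return partial


if __name__ == "__main__":
    w = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    x = [1] * 10
    # Ten synapses with four per SPE: two full SPEs plus one for the remainder.
    print(chained_sum(w, x, capacity=4))
```

The result equals the plain dot product of the weights and inputs, whether the chained SPEs sit in one module or in neighboring modules.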

The ladder bus (40) allows various neural networks to be implemented by changing the connection structure between modules. The LPE is responsible for switching and controlling the direction of the ladder bus. The configuration bits stored in the LPE determine the bus direction and the function of the AFU (Activation Function Unit). Depending on the configuration bits, a module can take as its input not an external input or the output of another module, but its own output or the output of another module on the same layer. This makes it possible to implement feedback neural networks such as the Hopfield network. FIG. 6 is a diagram of modules connected by the ladder bus, FIG. 7 shows the forward (a) and reverse (b) bus configurations selected by the LPE settings, and FIG. 8 shows an example of expanding neurons in ladder form through the LPE.
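The effect of the LPE configuration bits on the data path can be sketched as follows. The actual bit layout is given only in a figure that is not reproduced here, so the two fields of `LPEConfig`, the `module_step` function, and the sign activation are hypothetical stand-ins chosen to show forward operation followed by Hopfield-style feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LPEConfig:
    feedback: bool = False    # hypothetical bit: False = forward bus, True = reverse bus
    afu_enable: bool = True   # hypothetical bit: apply the AFU or pass the sum through


def sign(y: float) -> float:
    return 1.0 if y >= 0 else -1.0


def module_step(weights: Sequence[Sequence[float]],
                external_in: Sequence[float],
                previous_out: Sequence[float],
                cfg: LPEConfig,
                activation: Callable[[float], float] = sign) -> List[float]:
    """One pass through a module holding one neuron per weight row."""
    # Bus controller: choose where the neuron inputs come from.
    source = previous_out if cfg.feedback else external_in
    # SPE work: one sum of products per neuron.
    sums = [sum(w * x for w, x in zip(row, source)) for row in weights]
    # AFU: optional activation on each accumulated sum.
    return [activation(s) if cfg.afu_enable else s for s in sums]


if __name__ == "__main__":
    # Weights storing the pattern [1, -1, 1] (outer product, zero diagonal).
    W = [[0, -1, 1], [-1, 0, -1], [1, -1, 0]]
    out = module_step(W, [1, 1, 1], [0, 0, 0], LPEConfig(feedback=False))
    print(out)  # forward pass on a noisy input
    out = module_step(W, [0, 0, 0], out, LPEConfig(feedback=True))
    print(out)  # feedback pass reuses the module's own output
```

Switching a single field redirects the input source without changing the weights, which is the kind of reconfiguration the configuration bits are described as providing.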

The present invention uses two data formats, 24-bit and 32-bit. The 24-bit format (ladder bus) is used for neuron input data, weight values, and neuron output values; the 32-bit format (interconnecting bus) is used for SPE output values, that is, accumulated sums.

The data buses of a module according to the present invention form a multi-bus of six buses: a 24-bit data input bus, a 24-bit data output bus, a 32-bit accumulated-sum input bus, a 32-bit accumulated-sum output bus, and, for synapse expansion, a 32-bit synapse expansion input bus and a 32-bit synapse expansion output bus. Table 1 summarizes the data buses.

Bus name   Direction   Width (bits)   Description
din        in          24             Inputs data to the neural network module
passin     in          32             Inputs the accumulated synapse value from an adjacent module
sumin      inout       32             Inputs SPE output values from adjacent modules and internal SPEs
dout       inout       24             Outputs data to a neural network module, or receives feedback input
passout    out         32             Outputs the accumulated synapse value to an adjacent module
sumout     out         32             Connects to sumin of an adjacent module (direction and function are opposite to those of the connected module)

Control signals are divided into SIMD-type control signals, which are applied identically to every module, and master-slave control signals, which a module exchanges with its adjacent modules. Because each SPE or LPE communicates with its adjacent SPEs or LPEs in a master-slave manner, each module has control over its own slave module. This structure increases the number of control signals, but it has the advantage that modules can be added without any limit on the number of PEs. Table 2 lists the control signals.

Signal name   Direction   Description
clk           in          Clock
rst           in          Reset
ale           in          Address latch enable
ld            in          Load data bus
ndset         in          New data set
cmode         in          Configuration mode
wmode         in          Weight update mode
ovin          in          Overflow in
poe           in          Previous output enable
prev_ae       in          Previous address enable
next_ae       out         Next address enable
ovfl          out         Overflow out
noe           out         Next output enable

To define the configurations and expansions of a neural network according to the present invention, the configuration bits must be defined first. The configuration bits of ERNA are divided into those for the SPE and those for the LPE. FIG. 9 describes the SPE configuration bits and FIG. 10 the LPE configuration bits. Through its configuration bits, an SPE module determines whether it operates, whether its accumulator increments or decrements, and whether the interconnecting bus is used; through its configuration bits, an LPE module determines the data direction of the variable ladder bus and whether the AFU operates.

The types of neural network that can be configured with the proposed structure depend primarily on the configuration bits and the look-up table (LUT). In addition, the form of the internal connection bus and whether the synapses are fully or partially connected affect the configuration. The forms configurable from these factors are the SPE configuration bits (2⁴), the LPE configuration bits (2²), the internal connection bus form (2¹), and full or partial synapse connection (2¹), giving at least 2⁸ = 256 forms. In practice, many more types of neural network can be composed, depending on the number of SPEs in a module, the output of the nonlinear function, and the construction of a TDNN (Time Delayed Neural Network) through partial feedback.
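The lower bound of 256 forms is the product of the independent binary choices listed above. A short check of that arithmetic, with dictionary keys chosen here only as labels:

```python
from math import prod

# Number of binary choices contributed by each factor named in the text.
choice_bits = {
    "SPE configuration bits": 4,
    "LPE configuration bits": 2,
    "internal connection bus form": 1,
    "full or partial synapse connection": 1,
}

# Each factor with n bits contributes 2**n alternatives; independent
# factors multiply, so the total is 2**(4 + 2 + 1 + 1).
total = prod(2 ** bits for bits in choice_bits.values())
print(total)  # 256
```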

The present invention is not limited to the specific preferred embodiments described above. Any person having ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

The present invention configured as described above secures variability and expandability, enables high-speed neural network operation, and allows the structure of the neural network to be changed by setting the configuration bit registers, thereby providing general-purpose neural network hardware of high commercial value. With the advantages of a hardware chip, it can be used for real-time inference, which is widely used in fields such as securities and finance, and for intelligent services on portable mobile devices, making it a useful invention that can contribute to industrial development.

Claims

Neural network hardware having reconfigurability and expandability, comprising one or more basic processing modules each composed of a plurality of processing elements (PEs) and a plurality of data buses, wherein:

the processing elements include a plurality of synapse processing elements (SPEs), each having a multiplier that multiplies by the weight of a synapse of a neuron and an accumulator that accumulates the synapse products; and a layer processing element (LPE) having an activation function unit (AFU) that receives the operation result of the synapse processing elements and outputs the corresponding activation function value, and a bus controller that controls the data transfer direction between the processing elements; and

the data buses include an internal connection bus that connects the synapse processing elements to one another and expands synapses by transferring the synapse value accumulated in one synapse processing element to an adjacent synapse processing element; and a ladder bus that connects the synapse processing elements to the layer processing element and transfers the operation result of the synapse processing elements to the layer processing element.

The neural network hardware of claim 1,

wherein the plurality of synapse processing elements are arranged in a pipeline structure having a latency of one clock for each data input.

The neural network hardware of claim 1,

further comprising a modular control unit that transfers data between adjacent processing elements in a master-slave manner.

The neural network hardware of claim 1,

wherein the internal connection bus expands synapses between basic processing modules as well as inside a basic processing module.

The neural network hardware of claim 4,

wherein the internal connection bus transmits the operation result of the synapse processing element in a 32-bit format.

The neural network hardware of claim 1,

wherein the layer processing element has means for storing configuration bits and, according to the configuration bits, determines the transmission direction of the ladder bus and whether the activation function unit operates.

The neural network hardware of claim 1 or 6,

wherein the synapse processing element has means for storing configuration bits and, according to the configuration bits, determines whether the synapse processing element operates, whether the accumulator increments or decrements, and whether the internal connection bus is used.

The neural network hardware of claim 1,

wherein the ladder bus transmits the input data, the weight values, and the output data of the neurons in a 24-bit format.