
KR102176695B1 - Neural network hardware

Info

Publication number: KR102176695B1
Application number: KR1020180075553A
Authority: KR
Inventors: 김재준; 이우석
Original assignee: 포항공과대학교 산학협력단 (POSTECH Academy-Industry Foundation)
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2020-11-09
Also published as: KR20200002250A

Abstract

The present embodiment is neural network hardware that performs on-chip learning of a neural network. When updating a learning parameter, the neural network hardware carries out the on-chip learning process using a random learning-parameter update method.

Description

Neural network hardware {NEURAL NETWORK HARDWARE}

The technology described below relates to a neural network hardware acceleration device.

To overcome the structural limitations of chips based on the conventional von Neumann architecture, IC chip developers have been developing neural network hardware, or neuromorphic hardware, based on neural networks made up of neurons (the basic units of the human brain) and the synapses that connect them. Neural networks have overcome the limitations of earlier machine learning algorithms, show near-human ability to learn and recognize images, video, and patterns, and are already used in many fields. Many companies and researchers have been developing dedicated ASIC chips to perform neural network computation faster and at lower power.

In the human brain, neurons connect to form a network, and learning, reasoning, and thinking take place through that network. Neural networks are modeled on this activity. A neural network may consist of neurons and synapses. A synapse is the part that connects one neuron to another, and the strength of the synaptic connection between two neurons is called a weight.

Training a neural network means building the network, feeding it training data, and training it until it produces the desired output. In effect, learning is the process of obtaining the weights.

Neural network training and inference are performed on dedicated (customized) hardware because running a neural network learning (or inference) program requires a very large number of parallel operations. The connections between neurons can be expressed as a weight matrix, so computing a layer amounts to multiplying the input vector by the weight matrix to obtain the output vector; however, the number of operations involved is large.
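To make the scale concrete, the sketch below (Python with NumPy, an illustrative choice not taken from this patent) computes one fully connected layer as a weight-matrix/input-vector product and counts the multiply-accumulate operations for a hypothetical 784-500-10 network. The function names and layer sizes are assumptions for illustration only.

```python
import numpy as np


def layer_forward(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Weighted sums of one fully connected layer: W @ x."""
    # W has shape (n_out, n_in); row j holds the weights into output neuron j.
    return W @ x


def mac_count(layer_sizes: list[int]) -> int:
    """Multiply-accumulate operations for one pass through fully connected layers."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


x = np.array([1.0, 0.0, 1.0])
W = np.array([[0.5, -1.0, 0.25],
              [1.0, 2.0, -0.5]])
print(layer_forward(x, W))        # weighted sums of the 2 output neurons
print(mac_count([784, 500, 10]))  # 784*500 + 500*10 = 397000
```

Even this small network needs about 397,000 multiply-accumulates per input, and the products within a layer can all be computed in parallel, which is the parallelism dedicated hardware exploits.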

The technical task of this embodiment is the design and implementation of the controller needed to build a neural network accelerator chip that overcomes the structural limitations of conventional von Neumann-based chips, operates at higher speed and lower power, and includes two or more neurons per class.

One technical task of this embodiment concerns neural network acceleration hardware whose label neuron unit has two or more label neurons per class. The task is to provide a hardware control device that fires the neurons of the label neuron unit according to an externally supplied signal, then sums the number of fired neurons in each class and either selects the class with the most fired neurons as the final inference result or presents an inference probability for each class.

Another technical task of this embodiment concerns single neural network acceleration hardware based on a pseudorandom number generator. In ensemble inference on a probability-model-based neural network, inference is repeated two or more times on the same data and the results are summed to derive the final result. The task is to design and implement a control device that replaces the seed buffer of the pseudorandom number generator at every inference trial, in order to secure higher inference accuracy.

Yet another technical task of this embodiment concerns neural network acceleration hardware capable of on-chip learning. The task is to implement the control device that, when weights and biases are updated during learning, determines from an external signal whether to apply a random weight update method and, if it is applied, updates the weights and biases according to that method.

Neural network hardware according to this embodiment has a plurality of classes and includes a label neuron unit with two or more label neurons per class. The firing results of the label neurons are selected from the results of multiple inferences performed through the neural network.

The hardware accelerator according to this embodiment makes it possible to implement a device that controls hardware computing a neural network structure with two or more neurons per class, a structure that provides higher inference accuracy than the conventional structure with one neuron per class.

Ensemble inference normally combines the results of physically separate hardware devices. When it is instead performed on a single hardware based on a pseudorandom number generator, the hardware accelerator according to this embodiment changes the generator's seed buffer to a different value at every trial, every two or more trials, or at randomly chosen trials, so that the results resemble those inferred by different hardware. This makes it possible to implement a hardware control device that provides higher inference accuracy.

In addition, the hardware accelerator according to this embodiment samples a random number. If the random number satisfies a predetermined condition, the calculated correction value is applied when the weight and bias values are updated; if not, the correction is not applied. By controlling the neural network acceleration hardware to use this random weight update method, a control device can be implemented that achieves higher on-chip learning inference accuracy than the conventional method.
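A minimal sketch of the random weight update idea, assuming the condition is "a uniform sample in [0, 1) falls below a hypothetical probability `p_apply`" and that the decision is made separately for each weight and bias element. The text does not specify the condition or its granularity, so both are assumptions, as is the function name.

```python
import numpy as np


def random_weight_update(W, b, dW, db, rng, p_apply=0.5):
    """Apply computed corrections only where a sampled random number meets a condition.

    Assumed condition: a uniform sample in [0, 1) is below p_apply,
    decided independently for every weight and bias element.
    """
    mask_W = rng.random(W.shape) < p_apply   # per-weight accept/skip decision
    mask_b = rng.random(b.shape) < p_apply   # per-bias accept/skip decision
    W_new = W + np.where(mask_W, dW, 0.0)    # skipped entries keep their old value
    b_new = b + np.where(mask_b, db, 0.0)
    return W_new, b_new


rng = np.random.default_rng(0)
W, b = np.zeros((2, 3)), np.zeros(2)
dW, db = np.full((2, 3), 0.1), np.full(2, -0.1)
W, b = random_weight_update(W, b, dW, db, rng)  # roughly half the corrections applied
```

Setting `p_apply=1.0` reduces this to an ordinary update, which is one way an external signal could switch the random method off.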

FIG. 1 is a diagram illustrating a process of inferring the label neuron unit of a neural network.
FIG. 2 is a diagram schematically showing the structure of acceleration hardware according to an embodiment.
FIG. 3 is a diagram showing an overview of an accumulation-expression unit according to an embodiment.
FIG. 4 is a diagram showing an overview of a voting and inference unit.
FIG. 5 is a schematic diagram illustrating the structure of acceleration hardware according to another embodiment.
FIG. 6 is a schematic diagram illustrating a voting and inference unit according to another embodiment.
FIG. 7 is a diagram showing an overview of a pseudorandom number generator according to another embodiment.
FIG. 8 is a diagram illustrating the structure of acceleration hardware according to yet another embodiment.
FIG. 9 is a diagram showing an overview of a weight correction unit according to yet another embodiment.
FIG. 10 shows an example structure of a device equipped with a neural network.

The technology described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technology described below to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes within its spirit and scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms serve only to distinguish one component from another. For example, without departing from the scope of the technology described below, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

As used in this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, note that the components in this specification are divided only by the main function each is responsible for. Two or more components described below may be combined into one, or one component may be split into two or more by more finely subdivided functions. Each component described below may also perform some or all of the functions of other components in addition to its own main function, and some of a component's main functions may be performed exclusively by another component.

In addition, when a method or operating method is performed, its steps may occur in an order different from the stated one unless the context clearly requires a specific order. That is, the steps may be performed in the stated order, substantially simultaneously, or in reverse order.

Hereinafter, embodiments of neural network hardware including a control device are described.

First Embodiment

Hereinafter, an embodiment of neural network hardware is described with reference to FIGS. 1 to 4. Algorithms that perform inference with a trained neural network include deterministic algorithms, which always produce the same output for the same input, and stochastic algorithms, which may produce a different output each time. This embodiment concerns ensemble inference using a stochastic algorithm. Ensemble inference stores (votes on) the results of multiple inferences and bases the final inference result on the outcome of that vote.

Referring to FIG. 1, neural network hardware 2, which includes a control unit 20 according to an embodiment of the present invention, receives externally provided control signals D1 through the control unit 20. Based on these signals, it controls the input data buffer unit 24, the voting and inference unit 224, and the pseudorandom number generator 26 through an input data buffer unit control signal D2, a voting and inference unit control signal D3, and a pseudorandom number generator control signal D4, respectively.

The external control signal D1 may include any one or more of training data, inference data, and one or more control signals. Among them is information on how many inference results the hardware 2 accumulates to derive the final result when performing ensemble inference.

The control unit 20 processes the control signal D1 to form the input data buffer unit control signal D2, and provides this signal, together with the input data, to the input data buffer unit 24.

When single-hardware ensemble inference is performed, on the first trial the input data buffer unit 24 stores the data provided with the input data buffer unit control signal D2 in the input data buffer and supplies it to the accumulation-expression unit 222.

On the trials after the first, up to the predetermined number of ensemble inferences, the input data buffer unit 24 receives no new data and instead supplies the values already stored in the input data buffer to the accumulation-expression unit 222, so that the repeated inferences run faster.

The accumulation-expression unit 222 multiplies the input data (input vector) by part of the weight matrix to compute a partial sum, and accumulates the computed partial sums. The accumulated value forms the weighted sum.

The accumulation-expression unit 222 then passes the weighted sum as an argument to an activation function and evaluates it to obtain an activation value. The activation value corresponds to the neuron's firing (expression) result. In one embodiment, the accumulation-expression unit 222 may add a bias value to the weighted sum and pass the result to the activation function. The bias may take vector form, so each output neuron being computed may have its own bias value. The bias values can be obtained through a learning process, as described later.
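The accumulate-then-activate flow of the accumulation-expression unit can be sketched as follows. The tile size, the sigmoid activation, and the function name are assumptions; the sketch only shows that summing per-tile partial sums reproduces the full weighted sum before the bias is added and the activation is applied.

```python
import numpy as np


def accumulate_and_activate(x, W, bias, tile=4):
    """Build the weighted sum from partial sums over input tiles, then add bias and activate.

    x: (n_in,), W: (n_out, n_in), bias: (n_out,). `tile` is a hypothetical
    number of inputs processed per accumulation step.
    """
    acc = np.zeros(W.shape[0])
    for start in range(0, x.shape[0], tile):
        stop = start + tile
        # Partial sum: this slice of the input times the matching weight columns.
        acc += W[:, start:stop] @ x[start:stop]
    z = acc + bias                      # per-output-neuron bias
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation (assumed)
```

Tiling reflects that hardware may process only part of the weight matrix per step; numerically the result equals applying the activation to `W @ x + bias` directly.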

When the predetermined number of ensemble inferences is complete and a new inference begins, the control unit clears the values previously stored in the input data buffer and has the buffer store the newly received data.

Referring to FIG. 2, the voting and inference unit 224 of the hardware 2 receives the voting and inference unit control signal D3, which the control unit 20 forms by processing the control signal D1, and uses it to set how many trials the voting unit 2242 accumulates results over. Until the predetermined number of ensemble inferences is reached, the voting unit 2242 accumulates the results of the accumulation-expression unit 222 in the expression result buffer without outputting them to the inference unit 2246, and the next inference is performed.

When the predetermined number of ensemble inferences is reached, the finally accumulated value of the expression result buffer is sent to the inference unit 2246 to determine the final ensemble inference result, and the expression result buffer is then cleared to prepare for the next ensemble inference.
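The accumulate-and-flush behavior of the voting unit over a fixed number of ensemble trials might look like this. The sketch assumes one label neuron per class, 0/1 outputs on each trial, and an argmax for the inference step; the class and method names are hypothetical.

```python
import numpy as np


class VotingUnit:
    """Accumulates label-neuron outputs over a fixed number of ensemble trials."""

    def __init__(self, n_label: int, n_trials: int):
        self.n_trials = n_trials                     # set from the control signal
        self.buffer = np.zeros(n_label, dtype=int)   # expression result buffer
        self.count = 0

    def add(self, fired: np.ndarray):
        """Add one trial's 0/1 outputs; return the final class once n_trials is reached."""
        self.buffer += fired
        self.count += 1
        if self.count < self.n_trials:
            return None                          # keep accumulating, no output yet
        result = int(np.argmax(self.buffer))     # inference step on the accumulated votes
        self.buffer[:] = 0                       # clear for the next ensemble inference
        self.count = 0
        return result
```

Returning `None` until the last trial mirrors the unit withholding its output from the inference unit during accumulation.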

Referring to FIG. 3, the pseudorandom number generator 26 of the hardware 2 receives the pseudorandom number generator control signal D4 from outside or from the control unit, and has the seed buffer unit 262 change the value of the seed buffer, on which pseudorandom number generation is based, according to the number of inferences performed. The seed buffer of the seed buffer unit 262 may be changed on every trial of the ensemble inference, every two or more trials, or irregularly. Typically it returns to the initial seed buffer value once the ensemble inference is finished, but it may also be set to a new seed buffer value.
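A sketch of the seed-change policies described above (every trial, every k trials, or on randomly chosen trials). The increment-by-one update rule, the 50% chance in the random policy, and the function name are assumptions; Python's `random.Random` stands in for the hardware PRNG in the usage line.

```python
import random


def seed_schedule(n_trials, base_seed, policy="every", k=2, rng=None):
    """Seed-buffer value for each trial (0-based) of one ensemble inference.

    Hypothetical policies: "every" (new seed on each trial), "every_k"
    (new seed every k trials), "random" (new seed with probability 0.5).
    Every call starts again from base_seed, i.e. the initial seed is restored.
    """
    if rng is None:
        rng = random.Random()
    seeds, seed = [], base_seed
    for t in range(n_trials):
        if t > 0:
            if policy == "every":
                change = True
            elif policy == "every_k":
                change = t % k == 0
            elif policy == "random":
                change = rng.random() < 0.5
            else:
                raise ValueError(f"unknown policy: {policy}")
            if change:
                seed += 1  # assumed update rule: step to the next seed value
        seeds.append(seed)
    return seeds


# Each trial draws from a PRNG seeded with that trial's seed-buffer value.
first_draws = [random.Random(s).random() for s in seed_schedule(3, 42)]
```

Because each trial uses a different seed, the stochastic neurons see different random sequences, which is what lets one device imitate an ensemble of devices.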

In a single inference, the input is multiplied by the weight matrix to obtain the output, that is, the values of the output neurons. In a deterministic algorithm, the output obtained by multiplying the input vector by the weight matrix (the weighted sum) is passed as an argument to an activation function. The output of the activation function is the activation value, and that value is the neuron's output.

In a stochastic algorithm, however, the activation value is compared with a random number drawn from an appropriate range, and the neuron's output is set to 1 or 0 depending on which is larger. For example, if the activation value is less than the random number, the neuron's final output is 0; if it is greater, the final output is 1. Random numbers are needed because, in a stochastic algorithm, the activation value computed from the weighted sum is the probability that the neuron fires; sampling the neuron's final output from that probability requires comparing it with a randomly drawn number. A stochastic algorithm of this kind is closer to the real brain than a deterministic one.
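The stochastic firing rule reduces to one comparison per neuron. A minimal sketch, assuming activations already lie in [0, 1] (for example, sigmoid outputs) and using NumPy's random generator in place of the hardware pseudorandom number generator; the function name is hypothetical.

```python
import numpy as np


def stochastic_fire(activation, rng):
    """Sample 0/1 outputs, treating each activation in [0, 1] as a firing probability."""
    a = np.asarray(activation, dtype=float)
    r = rng.random(a.shape)      # one uniform random number in [0, 1) per neuron
    return (a > r).astype(int)   # 1 if the activation exceeds the random number, else 0


rng = np.random.default_rng(0)
print(stochastic_fire([0.1, 0.5, 0.9], rng))  # result depends on the seed
```

Averaged over many samples, the fraction of 1s approaches the activation value, which is why repeating the inference and voting recovers information a single 0/1 sample discards.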

When the seed buffer unit 262 sets the seed buffer value, the linear feedback shift register (LFSR) and random number generator 264 generate pseudorandom numbers from it.
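For reference, a common 16-bit Fibonacci LFSR (taps 16, 14, 13, 11, i.e. the polynomial x^16 + x^14 + x^13 + x^11 + 1) is sketched below. The text does not state the register width or taps, so these are assumptions; the seed plays the role of the seed buffer value and must be nonzero. Scaling the state to [0, 1) and the function names are likewise illustrative.

```python
def lfsr16_step(state: int) -> int:
    """One shift of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1  # feedback bit
    return (state >> 1) | (bit << 15)                               # shift right, insert at MSB


def lfsr16_uniform(seed: int, n: int) -> list[float]:
    """Generate n pseudorandom numbers in [0, 1) from a nonzero 16-bit seed."""
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a nonzero 16-bit value")
    out, state = [], seed
    for _ in range(n):
        state = lfsr16_step(state)
        out.append(state / 65536.0)
    return out
```

This polynomial is maximal-length, so any nonzero seed cycles through all 65,535 nonzero states before repeating. Successive states are shifted copies of one another, so consecutive outputs are correlated and should not be treated as independent samples in a careful design.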

Second Embodiment

Hereinafter, another embodiment of neural network hardware is described with reference to FIGS. 4 to 7. Elements identical or similar to those of the embodiment described above may be omitted from the drawings and their description omitted. This embodiment concerns inference using a neural network, in which the results of a plurality of different inferences are expressed through a plurality of different label neurons.

Referring to FIG. 4, the label neuron unit 13 of the neural network includes two or more neurons and is computed through the hidden neuron unit 11. In one embodiment, the neural network can be divided broadly into three layers: an input layer that receives the input vector, a label layer that provides the output, and a hidden layer between them. The hidden layer may consist of a single neuron layer or of two or more neuron layers. Adjacent layers may be connected by weight matrices.
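The layer organization can be made concrete with a shape check. In this sketch the input size, the single hidden layer of 500 neurons, the sigmoid activation, the Gaussian initialization, and the function names are all assumptions; the label layer simply has (number of classes) x (label neurons per class) neurons, grouped class by class.

```python
import numpy as np


def build_weights(n_input, hidden_sizes, n_classes, neurons_per_class, rng):
    """Random weight matrices for input -> hidden layer(s) -> label layer."""
    sizes = [n_input, *hidden_sizes, n_classes * neurons_per_class]
    return [rng.normal(0.0, 0.1, size=(n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, weights):
    """Propagate through every layer with a sigmoid activation (assumed)."""
    h = x
    for W in weights:
        h = 1.0 / (1.0 + np.exp(-(W @ h)))
    return h  # label-layer activations, grouped class by class


rng = np.random.default_rng(0)
weights = build_weights(784, [500], n_classes=10, neurons_per_class=5, rng=rng)
label_activations = forward(np.ones(784), weights)  # 10 classes x 5 = 50 values
```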

The embodiment of FIG. 4 shows the case of five label neurons per class. In this embodiment, the number of neurons finally activated in the label neuron unit 13 may be limited, according to an external input signal, to the number of neurons allocated per class.

In the example of FIG. 4, five label neurons have finally fired. The fired neurons are then counted per class, and the class with the most fired neurons is selected as the final inference result for the input data.
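Counting votes per class and picking the winner is a reshape and a sum, assuming the label neurons of each class are stored contiguously. The function name is hypothetical, and ties are broken toward the lower class index because that is how `np.argmax` behaves.

```python
import numpy as np


def vote(label_fired, n_classes, neurons_per_class):
    """Count fired label neurons per class and pick the class with the most votes."""
    counts = np.asarray(label_fired).reshape(n_classes, neurons_per_class).sum(axis=1)
    return counts, int(np.argmax(counts))


# 3 classes x 5 label neurons, with 5 neurons fired in total.
fired = [1, 0, 0, 0, 0,   # class 0
         1, 1, 1, 0, 0,   # class 1
         0, 0, 0, 0, 1]   # class 2
counts, winner = vote(fired, n_classes=3, neurons_per_class=5)  # counts [1, 3, 1], winner 1
```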

In one embodiment, when several label neurons fire in a single inference, their results may be combined and selected as the final inference result. In another embodiment, the results of multiple inferences may be collected and used to fire the label neurons. That is, the firing result of the label neurons is selected from the results of one or more inferences performed through the neural network.

In another embodiment (not shown), the number of neurons finally activated in the label neuron unit 13 is not limited, and neurons fire according to a predetermined firing condition. The number of fired neurons in each class is then converted into a per-class inference probability and provided as the final inference result.
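Converting per-class counts into inference probabilities can be as simple as dividing by the total number of fired label neurons. Both the normalization and the uniform fallback when nothing fires are assumptions, since the text does not specify the conversion; the function name is hypothetical.

```python
import numpy as np


def class_probabilities(label_fired, n_classes, neurons_per_class):
    """Per-class inference probabilities from fired-label-neuron counts."""
    counts = np.asarray(label_fired).reshape(n_classes, neurons_per_class).sum(axis=1)
    total = counts.sum()
    if total == 0:
        return np.full(n_classes, 1.0 / n_classes)  # nothing fired: uniform (assumed)
    return counts / total


probs = class_probabilities([1, 1, 0, 1, 0, 0], n_classes=2, neurons_per_class=3)
```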

In the embodiment illustrated in FIG. 5, neural network hardware 1 including the control unit 10 receives externally provided control signals D1 through the control unit 10. As described above, the control signal D1 may include one or more control signals together with the input data. It may include the number of classes of the data input to the hardware 1, the number of label neurons allocated per class, and a signal that determines whether the number of firing label neurons is limited.

Based on the received control signal D1, the control unit 10 forms and outputs an accumulation-expression unit control signal D2 to control the accumulation-expression unit 122, and provides the input data D4 to the input data buffer unit 14. The data may be stored in the input data buffer unit 14, which supplies the stored data D4' to the accumulation unit 1224 of the accumulation-expression unit 122. The control unit 10 also forms a voting and inference unit control signal D3 from the control signal D1 and outputs it to control the voting and inference unit 124.

Referring to FIG. 6, the accumulation-expression unit 122 of the hardware 1 receives the accumulation-expression unit control signal D2, which the expression unit 1226 uses when firing the label neuron unit 13. The weight storage unit 1222 stores the weight values between neuron layers and includes memory elements. FIG. 6 shows the weight storage unit 1222 as part of the accumulation-expression unit 122. In another embodiment, the weight storage unit 1222 may be located outside the accumulation-expression unit 122 and simply supply values read from the memory elements to it.

The accumulation unit 1224 receives the data D4' stored in the input data buffer unit 14. It also receives weight values from the weight storage unit 1222, combines them with the received data D4' to compute partial sums, and accumulates the partial sums to form a weighted sum. The expression unit 1226 receives the weighted sum and determines whether the neuron unit fires. In one embodiment, the expression unit 1226 may add a bias value to the weighted sum before making this determination.

When the accumulation-expression unit 122 determines the firing of the label neuron unit 13, it follows the accumulation-expression unit control signal D2: it either limits the total number of fired label neurons to the number of label neurons allocated per class, or applies no limit and fires them in the same way as the other neuron units.
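The two firing modes selected by the control signal D2 could be sketched as below. Picking the k strongest activations in limited mode and using a fixed 0.5 threshold in unlimited mode are both assumptions; the text states only that the count is, or is not, limited to the number of label neurons per class. The function name is hypothetical.

```python
import numpy as np


def fire_label_neurons(activations, neurons_per_class, limit=True, threshold=0.5):
    """Fire label neurons, optionally capping the total at neurons_per_class."""
    a = np.asarray(activations, dtype=float)
    if not limit:
        return (a > threshold).astype(int)           # every neuron decides independently
    fired = np.zeros(a.shape, dtype=int)
    top = np.argsort(a)[::-1][:neurons_per_class]    # indices of the k strongest activations
    fired[top] = 1
    return fired


acts = [0.9, 0.1, 0.8, 0.2, 0.7, 0.6]  # 2 classes x 3 label neurons
limited = fire_label_neurons(acts, neurons_per_class=3)               # exactly 3 fire
unlimited = fire_label_neurons(acts, neurons_per_class=3, limit=False)
```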

Referring to FIG. 7, the voting and inference unit 124 of the hardware 1 receives the voting and inference unit control signal D3. It uses this signal to direct the voting unit 1242 to count the number of fired neurons per class, based on the output of the expression unit 1226. The voting unit 1242 passes the counts to the inference unit 1246, which determines the final inference result or reports an inference probability for each class.

The voting unit 1242 tallies the number of fired label neurons belonging to each class. It uses the number of classes and the number of neurons allocated per class, both supplied in the voting and inference unit control signal D3. Based on the voting unit 1242's tally, the inference unit 1246 does one of two things. It either selects the class with the most votes as the final class inference candidate for the input data, or it converts each class's vote count into a per-class inference probability. It reports the result as the final output.
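The voting and inference steps can be sketched as follows, under three stated assumptions. First, the label neurons of class c occupy the contiguous block `[c * neurons_per_class, (c + 1) * neurons_per_class)`. Second, a class's probability is its share of all votes. Third, a uniform distribution is returned when no label neuron fired. None of these details come from the document, and `vote` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def vote(label_firing: Sequence[int], num_classes: int,
         neurons_per_class: int) -> Tuple[int, List[float]]:
    """Tally fired label neurons per class; return (winning class, per-class probabilities)."""
    if len(label_firing) != num_classes * neurons_per_class:
        raise ValueError("firing vector length must equal num_classes * neurons_per_class")
    # voting: count fired neurons inside each class's (assumed contiguous) block
    counts = [sum(label_firing[c * neurons_per_class:(c + 1) * neurons_per_class])
              for c in range(num_classes)]
    # inference: most votes wins; max() keeps the first class on ties
    winner = max(range(num_classes), key=lambda c: counts[c])
    total = sum(counts)
    if total == 0:
        probs = [1.0 / num_classes] * num_classes  # assumed fallback: no votes at all
    else:
        probs = [count / total for count in counts]
    return winner, probs
```

For three classes with three label neurons each, the firing vector `[1, 1, 0, 0, 0, 1, 1, 1, 1]` yields counts 2, 1, and 3. Class 2 wins, and its probability is 0.5.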

Third Embodiment

Another embodiment of the neural network hardware is described below with reference to FIGS. 8 and 9. Elements identical or similar to those of the embodiments described above may be omitted from the drawings, and their description may be omitted.

When a neural network is computed in hardware, the usual approach is to run training on a server to derive the learning parameter values, including weights and biases. The hardware then performs inference with the trained network. The present embodiment instead performs learning on-chip and can carry out inference in parallel.

The random learning parameter update method of the present embodiment is described below. When a neural network is trained, a weighted sum is obtained by multiplying the input vector by the weight matrix. A bias value is added to the weighted sum, and the result is passed as the argument of an activation function. The output of the activation function is the activation value, from which the neuron produces its output. The difference between this output and the desired output is the error.
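The forward computation and error described above can be written compactly. The notation is assumed rather than taken from the document. W is the weight matrix, with one row per output neuron. x is the input vector and b is the bias vector. f is the activation function, applied elementwise. g maps activation values to neuron outputs. y* is the desired output. The sign convention for the error e is also an assumption.

```latex
\mathbf{s} = W\mathbf{x}, \qquad
\mathbf{a} = f(\mathbf{s} + \mathbf{b}), \qquad
\mathbf{y} = g(\mathbf{a}), \qquad
\mathbf{e} = \mathbf{y}^{*} - \mathbf{y}
```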

In one embodiment, the bias may take a vector form, with a different bias value for each output neuron currently being computed. The bias value may be computed during training from the difference between the currently inferred result and the ideal correct answer.

Training a neural network adjusts the learning parameters in the direction that reduces this error. For example, the learning parameters may include weights and biases. In random learning parameter learning according to the present embodiment, weight learning may be performed by adding r·η·ΔW to the weight value being updated. Here η is the learning rate, r is a random number, and ΔW is the weight error.

For example, η·ΔW is added if the random number is 1 and is not added if the random number is 0. It is generally known that a larger learning rate makes training proceed faster but lowers the final inference accuracy. A smaller learning rate makes training take longer but raises the final inference accuracy. In the present embodiment, the product of the learning rate and the weight change ΔW is multiplied by a random number. This lowers the effective learning rate η_eff, which is the overall learning rate observed over many training iterations. For example, if the random number equals 1 with probability 1/2, then η_eff ≈ η/2. Using random numbers can also make learning more general instead of biased toward a few specific training results, which can improve the final inference accuracy.
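A small simulation can check the claim that a random gate lowers the effective learning rate to about P(r = 1)·η. The sketch assumes r is an independent Bernoulli(p) draw in {0, 1} at every step and holds ΔW fixed at 1. It then estimates η_eff as the accumulated change divided by the number of steps. The function names are hypothetical.

```python
import random


def random_update(w: float, eta: float, delta_w: float,
                  rng: random.Random, p: float = 0.5) -> float:
    """One random update: w <- w + r * eta * delta_w, with r = 1 with probability p, else 0."""
    r = 1 if rng.random() < p else 0
    return w + r * eta * delta_w


def effective_learning_rate(eta: float, p: float, steps: int, seed: int = 0) -> float:
    """Estimate eta_eff as (total change in w) / steps, using a fixed delta_w of 1."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w = random_update(w, eta, 1.0, rng, p)
    return w / steps
```

With `eta=0.1` and `p=0.5`, the estimate over 100000 steps lands close to 0.05, which is η/2 as stated above. With `p=1.0`, the gate always opens and the estimate equals η.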

In random learning parameter learning according to the present embodiment, bias learning works like the weight learning above. The difference between the theoretical bias value and the bias value obtained through learning is multiplied by a random number and a learning rate. The result is added to the bias value being updated.
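Both update rules can be summarized in one place, together with the effective-learning-rate argument. The notation is assumed. r_W and r_b are 1-bit random numbers. η_W and η_b are the weight and bias learning rates. ΔW is the weight error and Δb is the bias error (theoretical minus learned). The expectation line additionally assumes that r_W is drawn independently of ΔW.

```latex
W \leftarrow W + r_W\,\eta_W\,\Delta W, \qquad
\mathbf{b} \leftarrow \mathbf{b} + r_b\,\eta_b\,\Delta\mathbf{b}, \qquad
r_W, r_b \in \{0, 1\}

\mathbb{E}\left[r_W\,\eta_W\,\Delta W\right] = \Pr(r_W = 1)\,\eta_W\,\Delta W
\;\Rightarrow\; \eta_{\mathrm{eff}} = \Pr(r_W = 1)\,\eta_W
```

With Pr(r_W = 1) = 1/2, this gives η_eff = η_W/2, matching the example in the text.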

Referring to FIG. 8, the neural network hardware 3 according to an embodiment of the present invention includes the control unit 30, through which it receives externally supplied control signals D1. Based on these signals, the control unit 30 controls two components. It controls the learning parameter correction unit 326 through the learning parameter correction unit control signal D2. It controls the random number generator 34 through the random number generator control signal D3. The external control signal D1 includes one or more control signals. One component determines whether the random learning parameter update method is applied when the hardware 3 performs on-chip learning. Another signal component determines under what conditions random numbers are drawn and applied when the method is used. The external control signal D1 may also include weight values supplied from a memory device (not shown) that stores the weights. For example, the memory device (not shown) may be included on the same chip as the hardware of the present embodiment.

The random number generator 34 may be, for example, a pseudo-random number generator or a true random number generator. The voting and inference unit 324 and the learning parameter correction unit 326 may use the random numbers generated by the random number generator 34 when they operate.
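The reference-sign list names a pseudo-random number generator 26 built from a seed buffer unit 262 and an LFSR and random number generation unit 264. A linear-feedback shift register (LFSR) is a common way to build such a generator in hardware. The sketch below is a 16-bit Fibonacci LFSR with the maximal-length feedback polynomial x^16 + x^14 + x^13 + x^11 + 1. The register width, taps, and default seed are assumptions for illustration, not values from the document. `next_bits` shows one way to obtain the multi-bit random numbers discussed later, by concatenating successive output bits.

```python
class LFSR16:
    """16-bit Fibonacci LFSR, feedback polynomial x^16 + x^14 + x^13 + x^11 + 1.

    A nonzero seed yields a maximal-length sequence: the state cycles through
    all 65535 nonzero values before repeating.
    """

    def __init__(self, seed: int = 0xACE1) -> None:
        seed &= 0xFFFF
        if seed == 0:
            raise ValueError("an all-zero seed would lock the LFSR at zero")
        self.state = seed  # plays the role of the seed buffer's initial value

    def next_bit(self) -> int:
        """Shift once and return the bit shifted out of the register."""
        s = self.state
        out = s & 1
        # feedback = XOR of the tapped bits (16, 14, 13, 11 in polynomial terms)
        feedback = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (feedback << 15)
        return out

    def next_bits(self, k: int) -> int:
        """Concatenate k successive output bits into one k-bit random number."""
        value = 0
        for _ in range(k):
            value = (value << 1) | self.next_bit()
        return value
```

An LFSR uses only a shift register and a few XOR gates, which is why it is a typical hardware pseudo-random source. A true random number generator would replace it when unpredictability matters more than reproducibility.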

The overall computation of the computation unit 32 proceeds as follows. The accumulation-expression unit 322 multiplies externally supplied input data values by their associated weights, accumulates the products, and adds a bias. The voting and inference unit 324 then performs inference using the results of the accumulation-expression unit 322.

The inference result is compared with the ideal correct answer. The error calculation unit 3262, included in the learning parameter correction unit 326, computes the difference. The error calculation unit 3262 may compute two errors: the weight error ΔW between the ideal and actual weight values, and the bias error between the ideal and actual bias values.

The error calculation unit 3262 passes the computed weight error ΔW and bias error to the learning parameter correction value calculation unit 3264. That unit multiplies each error by a learning rate η and a random number r to determine how much the actual weights and biases are corrected.

In one embodiment, the learning rate applied to the weight error ΔW differs from the learning rate applied to the bias error. The random number applied to ΔW also differs from the random number applied to the bias error. In another embodiment, the weight error ΔW and the bias error share both the same learning rate and the same random number.
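The correction value calculation can be sketched with both embodiments selectable through a configuration object. Separate or shared learning rates are expressed by setting `eta_w` and `eta_b` to different or equal values. Separate or shared random numbers are selected with `share_random`. The class and field names, and the use of a 1-bit random number that equals 1 with probability `p`, are assumptions for illustration.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CorrectionConfig:
    eta_w: float         # learning rate applied to the weight error
    eta_b: float         # learning rate applied to the bias error
    share_random: bool   # True: same r for weight and bias; False: independent draws
    p: float = 0.5       # assumed probability that a 1-bit random number is 1


def draw_bit(rng: random.Random, p: float) -> int:
    """1-bit random number that equals 1 with probability p."""
    return 1 if rng.random() < p else 0


def corrections(delta_w: float, delta_b: float, cfg: CorrectionConfig,
                rng: random.Random) -> Tuple[float, float]:
    """Return (weight correction, bias correction) = (r_w*eta_w*dW, r_b*eta_b*db)."""
    r_w = draw_bit(rng, cfg.p)
    r_b = r_w if cfg.share_random else draw_bit(rng, cfg.p)
    return r_w * cfg.eta_w * delta_w, r_b * cfg.eta_b * delta_b
```

With `share_random=True`, the weight and bias are either both corrected or both left unchanged on a given step. With independent draws, they can be gated differently.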

The control signal D1 mainly controls the operation of two hardware parts. Through the learning parameter correction unit control signal D2, the control unit 30 controls the learning parameter correction value calculation unit 3264 of the learning parameter correction unit 326. It also supplies the random number generator control signal D3 to control the random number generator 34.

If the random weight update method is applied during on-chip learning of the hardware 3, the weight correction value calculation unit 3264 computes the final weight correction value using the random numbers generated by the random number generator 34.

How the random numbers are used may vary with the conditions set by the external control signal D1. For example, one 1-bit random number may be generated per weight. If it is 0, the final weight correction value is 0. If it is 1, the final weight correction value is the correction value computed from the error value of the error calculation unit 3262.

In other words, the computed weight correction value becomes the final weight correction value, and is reflected in the weight update, only when the random numbers satisfy a given condition. Depending on the conditions set by the external control signal D1, there are two further options. N 1-bit random numbers may be generated per weight, and the correction applied only when all of them are 1. Alternatively, a random number of 2 or more bits may be generated, and the correction applied only when its value is greater than or equal to a specified value.
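The three gating conditions described above can be sketched as one selectable function. The mode names `"single"`, `"all_n"`, and `"multi"` are hypothetical stand-ins for however the external control signal D1 encodes the choice. The random bits come from Python's `random.Random.getrandbits`. In the hardware, they would come from the random number generator 34. `apply_probability` gives the exact gate probability for each condition, which is also the factor by which each condition scales the effective learning rate.

```python
import random


def apply_correction(rng: random.Random, mode: str, n: int = 1,
                     bits: int = 2, threshold: int = 0) -> bool:
    """Decide whether the computed weight correction is used as the final one.

    mode "single": one 1-bit random number, apply if it is 1
    mode "all_n":  N 1-bit random numbers, apply only if all N are 1 (n >= 1)
    mode "multi":  one `bits`-bit random number, apply if it is >= threshold
    """
    if mode == "single":
        return rng.getrandbits(1) == 1
    if mode == "all_n":
        # N independent bits are all 1 exactly when the N-bit word is all ones
        return rng.getrandbits(n) == (1 << n) - 1
    if mode == "multi":
        return rng.getrandbits(bits) >= threshold
    raise ValueError("unknown mode: " + mode)


def apply_probability(mode: str, n: int = 1, bits: int = 2, threshold: int = 0) -> float:
    """Exact probability that apply_correction returns True, for uniform random bits."""
    if mode == "single":
        return 0.5
    if mode == "all_n":
        return 1.0 / (1 << n)
    if mode == "multi":
        count = max(0, (1 << bits) - max(threshold, 0))
        return count / (1 << bits)
    raise ValueError("unknown mode: " + mode)


def final_correction(computed: float, rng: random.Random, **mode_args) -> float:
    """Final weight correction: the computed value if the gate opens, else 0."""
    return computed if apply_correction(rng, **mode_args) else 0.0
```

The gate probabilities work out as follows. Requiring three 1-bit random numbers to all be 1 gives a probability of 1/8. A 2-bit random number compared against a threshold of 3 gives 1/4. The choice of condition therefore acts as a coarse, hardware-friendly knob on the effective learning rate.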

FIG. 10 shows example structures of devices equipped with a neural network. Such a device may prepare a neural network by training it on training data. It may also provide a service using the trained neural network. The device described with reference to FIG. 10 may build a neural network using the process described above.

FIG. 10(A) shows an example of a device, such as a PC 200, that generates a neural network. The PC 200 receives training data from the training data DB 50. It may also receive training data through a physically connected storage medium or the like. While generating the neural network, the PC 200 may use the asymmetric contrastive divergence algorithm described above to compute the correction value ΔU of the weights between the label neurons and the hidden neurons. The PC 200 may construct and reconstruct the hidden neurons using only the data neurons.

FIG. 10(B) shows an example of a device, such as a server 300, that generates a neural network. The server 300 receives training data from a client device 80. It may also receive training data from other objects connected to the network. While generating the neural network, the server 300 may use the learning algorithm described above to compute the correction value ΔU of the weights between the label neurons and the hidden neurons. The server 300 may construct and reconstruct the hidden neurons using only the data neurons.

The neural network may also be implemented in a circuit or chipset made up of memory and computing elements. FIG. 10(C) shows an example configuration of a device 400 equipped with a neural network, without limiting its physical form. The configuration of FIG. 10(C) may be that of the PC 200, the server 300, or a circuit board (or chip) equipped with artificial intelligence.

The device 400 equipped with a neural network includes an input device 410, a computing device 420, and a storage device 430. It may further include an output device 440.

The input device 410 receives training data or input data. It may be a communication device or interface device that receives data over a network, or an interface device that receives data over a wired network. The input device 410 may also receive an external control signal.

The storage device 430 stores the neural network model and may be implemented with various media capable of storing data. It may store the initial neural network before training, the information and parameters used during training, and the trained neural network.

The computing device 420 trains a neural network using training data. It may build data neurons and label neurons from the training data, and build hidden neurons from the output values of the data neurons. It may also reconstruct the data neurons and label neurons from the output values of the hidden neurons. The computing device 420 may train the neural network according to the process described above. Accordingly, it may construct and reconstruct the hidden neurons using only the data neurons. While generating the neural network, it may use the asymmetric contrastive divergence algorithm described above to compute the correction value ΔU of the weights between the label neurons and the hidden neurons.

The computing device 420 may feed input data into the trained neural network to derive a result.

The computing device 420 processes data by executing given instructions or programs. It may be implemented with a memory (buffer) that temporarily stores instructions or information and a processor that performs the computation. Depending on the device type, the processor may be a CPU, an AP, an FPGA, or the like.

The output device 440 may be a communication device that transmits required data to the outside, such as the results derived by the trained neural network. In some cases, the output device 440 may instead be a device that displays, on a screen, the training process or the results derived by the trained neural network.

In addition, the methods described above may be implemented as a program (or application) containing an executable algorithm that runs on a computer. These are the neural network training method, the inference method using the neural network, and the neural network operation method. The program may be stored and provided on a non-transitory computer-readable medium.

A non-transitory readable medium stores data semi-permanently and can be read by a device. This distinguishes it from media that hold data only briefly, such as registers, caches, or memory. Specifically, the applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

The present embodiments and the accompanying drawings merely illustrate part of the technical idea contained in the technology described above. A person skilled in the art can readily infer modifications and specific embodiments within the scope of the technical idea contained in the specification and drawings. All such modifications and embodiments evidently fall within the scope of rights of the technology described above.

1, 2, 3: hardware
10, 20, 30: control unit
11: hidden neuron unit
13: label neuron unit
12, 22, 32: computation unit
34: random number generator
24: input data buffer unit
26: pseudo-random number generator
262: seed buffer unit
264: LFSR and random number generation unit
122, 222, 322: accumulation-expression unit
1222: weight storage unit
1224: accumulation unit
1226: expression unit
124, 224, 324: voting and inference unit
1242, 2242: voting unit
1246, 2246: inference unit
326: weight correction unit
3262: error calculation unit
3264: weight correction value calculation unit
200: PC
80: client device
300: server
400: device equipped with ClassRBM
410: input device
420: computing device
430: storage device
440: output device

Claims

Neural network hardware that performs on-chip learning of a neural network, wherein
the neural network hardware performs the on-chip learning process so that a random learning parameter update method is applied when a learning parameter is updated,
the learning parameter includes a weight and a bias, and
in the learning parameter update,
the actual weight correction amount is determined by multiplying the weight error by a weight learning rate and a random number, and
the actual bias correction amount is determined by multiplying the bias error by a bias learning rate and a random number.

The neural network hardware of claim 1, wherein
the random learning parameter update method
applies the calculated learning parameter correction value to the actual learning parameter update only when the provided random number satisfies a predetermined condition.

The neural network hardware of claim 2, wherein
the random number generator is either a pseudo-random number generator or a true random number generator, and
the neural network hardware controls the random number generator, according to an external control signal, to generate a random number of 1 bit, or of 2 or more bits, one or more times each time one learning parameter is updated.

delete