WO2013115431A1 - Neural network computing device and system, and method therefor - Google Patents
Neural network computing device and system, and method therefor (신경망 컴퓨팅 장치 및 시스템과 그 방법)
- Publication number: WO2013115431A1 (PCT/KR2012/003067)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory
- neuron
- attribute value
- neural network
- connection line
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Definitions
- The present invention relates to the field of digital neural network computing technology, and more particularly, to a neural network computing device, system, and method that operate as a synchronous circuit in which all components are synchronized to one system clock, with a distributed memory structure for storing artificial neural network data and a computational structure that processes all neurons in time division on a pipeline circuit.
- A digital neural network computer is an electronic circuit implemented to simulate a biological neural network, with the aim of realizing functions similar to those of the brain.
- A method of constructing an artificial neural network is called a neural network model.
- In an artificial neural network, artificial neurons are connected by directed connection lines to form a network; each neuron has its own attribute values and passes values through its connection lines to affect the attribute values of adjacent neurons.
- The connection lines between neurons also have unique attribute values that control the strength of the transmitted signal.
- The neuron attribute value most commonly used in the various neural network models is the state value corresponding to the output value of the neuron, and the most commonly used connection line attribute value is the weight value representing the connection strength of the connection line.
- Neurons in an artificial neural network can be divided into input neurons that receive input values from the outside, output neurons that deliver result values to the outside, and the remaining hidden neurons.
- The cycle in which the attribute values of all neurons are calculated once is called the neural network update cycle.
- The execution of a digital artificial neural network proceeds by repeatedly executing the neural network update cycle.
- Knowledge is stored in the connection line attribute values inside the neural network.
- The step of adjusting the connection line attribute values of the artificial neural network to accumulate knowledge is called the learning mode, and the step of retrieving stored knowledge by presenting input data is called the recall mode.
- In the recall mode, input data values are assigned to the input neurons and the neural network update cycle is repeated to derive the state values of the output neurons; within one neural network update cycle, all neurons in the neural network are recalculated.
- The new state value of each neuron j is calculated as shown in Equation 1 below (reconstructed in LaTeX from the variable definitions that follow):

  $$y_j(T) = f\left( \sum_{i=1}^{p_j} w_{ij} \, y_{M_{ij}}(T-1) \right) \qquad \text{(Equation 1)}$$

- y_j(T) is the state value (attribute value) of neuron j calculated in the T-th neural network update cycle,
- f is the activation function that determines the output of neuron j,
- p_j is the number of input connection lines of neuron j,
- w_ij is the weight value (attribute value) of the i-th input connection line of neuron j, and
- M_ij is the number of the neuron connected to the i-th input connection line of neuron j.
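- As an illustration, the recall-mode update of Equation 1 can be sketched in a few lines of Python. The names w, M, and f mirror the symbols defined above; the sigmoid activation and the toy three-neuron network are assumptions made for the example, not part of the patent.

```python
import math

def update_cycle(y_prev, w, M, f=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """One neural network update cycle: y_j(T) = f(sum_i w_ij * y_{M_ij}(T-1))."""
    y_next = []
    for j in range(len(w)):                    # for every neuron j
        net = sum(w[j][i] * y_prev[M[j][i]]    # i-th input connection line of j
                  for i in range(len(w[j])))   # p_j connection lines in total
        y_next.append(f(net))
    return y_next

# Toy example: neuron 2 reads from neurons 0 and 1.
w = [[0.5], [0.3], [0.8, -0.2]]                # connection line weights w_ij
M = [[1], [0], [0, 1]]                         # source neuron numbers M_ij
y = [1.0, 0.0, 0.0]                            # state values of cycle T-1
y = update_cycle(y, w, M)                      # state values of cycle T
```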
- Some neural network models use a method in which neurons emit instantaneous spike signals, the connection lines (synapses) that receive these spike signals generate signals in various patterns over a period of time, and these signals are summed and delivered to the neuron.
- The pattern in which the signal is delivered may differ for each connection line.
- In such models, the attribute values of the connection lines as well as the attribute values of the neurons are updated together in one neural network update cycle.
- The backpropagation algorithm is a supervised learning method in which a supervisor outside the system provides the most desirable output value for a specific input value in the learning mode; one neural network update cycle consists of four sub-cycles, such as sub-cycles 1 to 4 below.
- A first sub-cycle in which each output neuron obtains its error value from the externally provided desired output value and its current output value.
- A second sub-cycle in which the error values of the output neurons are propagated to the other neurons so that non-output neurons also obtain error values.
- A third sub-cycle in which the values of the input neurons are propagated through the network to calculate new state values for all neurons (identical to the recall mode).
- A fourth sub-cycle in which a new attribute value is calculated for each connection line (Equation 5 below).
- The order of execution of the four sub-cycles within the neural network update cycle is not important.
- The first sub-cycle is a step of calculating Equation 3 below for all output neurons,
- where teach_j is the learning value (training data) provided to output neuron j, and
- δ_j is the error value of neuron j.
- The second sub-cycle is a step of calculating Equation 4 below for all neurons other than the output neurons,
- where δ_j(T) is the error value of neuron j in neural network update cycle T, and
- R_ij is the number of the neuron connected to the i-th reverse connection line of neuron j.
- The third sub-cycle calculates Equation 1 for each neuron; it is identical to the recall mode.
- The fourth sub-cycle is a step of calculating Equation 5 below for each neuron,
- where η is a constant (the learning rate) and net_j is the net input value of neuron j.
- Depending on the neural network model, a delta learning rule or Hebb's rule may be used for learning instead of the backpropagation algorithm.
- These methods can be generalized to Equation 6 below,
- where {intrinsic value of neuron j} in Equation 6 denotes a learning attribute value specific to neuron j.
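- The equation images themselves are not reproduced in this text. Restated from the variable definitions above, Equations 3 to 6 plausibly take the standard backpropagation forms below; this is a reconstruction, and the exact forms in the original filing may differ (for example, in whether the derivative f' appears):

```latex
\begin{align*}
\delta_j &= (\mathrm{teach}_j - y_j)\, f'(net_j)                         && \text{(Equation 3, output neurons)} \\
\delta_j(T) &= f'(net_j) \sum_i w_{ij}\, \delta_{R_{ij}}(T)              && \text{(Equation 4, non-output neurons)} \\
w_{ij}(T) &= w_{ij}(T-1) + \eta\, \delta_j\, y_{M_{ij}}                  && \text{(Equation 5, weight update)} \\
w_{ij}(T) &= w_{ij}(T-1) + \eta\, \{\text{intrinsic value of neuron } j\}\, y_{M_{ij}} && \text{(Equation 6, generalized rule)}
\end{align*}
```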
- a neural network model such as a deep belief network may alternately calculate forward propagation and backward propagation for all or part of a network.
- Neural network computers are used to find the pattern that best matches a given input based on pattern recognition, or to predict the future based on a priori knowledge, and can be used in various fields such as robot control, military equipment, medicine, games, weather information processing, and human-machine interfaces.
- Direct implementation maps the logical neurons of the artificial neural network one-to-one onto physical neurons; most analog neural network chips fall into this category.
- The conventional direct implementation method can achieve high processing speed, but it is difficult to apply various neural network models and difficult to apply to large-scale neural networks.
- The conventional virtual implementation method can execute various neural network models and large-scale neural networks, but has the problem that high speed is difficult to obtain; it is an object of the present invention to solve these problems.
- Accordingly, the present invention aims to provide a neural network computing device, system, and method that operate as a synchronous circuit in which all components are synchronized to one system clock, include a distributed memory structure for storing artificial neural network data and a computational structure that processes all neurons in time division on a pipeline circuit, and can thereby apply various neural network models and large-scale neural networks while processing at high speed.
- a first apparatus of the present invention for achieving the above object is a neural network computing device, comprising: a control unit for controlling the neural network computing device; A plurality of memory units for outputting connection line attribute values and neuron attribute values, respectively; And a calculation unit for calculating a new neuron attribute value by using a connection line attribute value and a neuron attribute value respectively input from the plurality of memory units, and feeding back a new neuron attribute value to each of the plurality of memory units.
- A second apparatus of the present invention for achieving the above object is a neural network computing device comprising: a control unit for controlling the neural network computing device; a plurality of memory units for outputting connection line attribute values and neuron attribute values, respectively; one calculation unit for calculating new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; input means for providing input data from the control unit to the input neurons; switching means for switching either the input data from the input means or the new neuron attribute values from the calculation unit into the plurality of memory units under the control of the control unit; and first and second output means, connected as a dual memory replacement (SWAP) circuit in which all inputs and outputs are interchanged under the control of the control unit, for outputting the new neuron attribute values from the calculation unit to the control unit.
- A first system of the present invention for achieving the above object is a neural network computing system comprising: a control unit for controlling the neural network computing system; a plurality of memory units, each including a plurality of memory parts that output connection line attribute values and neuron attribute values, respectively; and a plurality of calculation units for calculating new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the corresponding memory parts in the plurality of memory units, and feeding them back to each of the corresponding memory parts.
- the third apparatus of the present invention for achieving the above object, the neural network computing device, comprising: a control unit for controlling the neural network computing device; A plurality of memory units for outputting connection line attribute values and neuron error values, respectively; And a calculation unit for calculating a new neuron error value by using a connection line attribute value and a neuron error value respectively inputted from the plurality of memory units, and feeding back a new neuron error value to each of the plurality of memory units.
- a fourth apparatus of the present invention for achieving the above object, the neural network computing device, comprising: a control unit for controlling the neural network computing device; A plurality of memory units for outputting connection line attribute values and neuron attribute values, and calculating new connection line attribute values using the connection line attribute values, neuron attribute values, and learning attribute values, respectively; And one calculation unit for calculating a new neuron attribute value and a learning attribute value using the connection line attribute value and the neuron attribute value respectively input from the plurality of memory units.
- A fifth apparatus of the present invention for achieving the above object is a neural network computing device comprising: a control unit for controlling the neural network computing device; a first learning attribute value memory for storing learning attribute values of neurons; a plurality of memory units for outputting connection line attribute values and neuron attribute values, respectively, and for calculating new connection line attribute values using the connection line attribute value, the neuron attribute value, and the learning attribute value from the first learning attribute value memory; one calculation unit for calculating new neuron attribute values and learning attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; and a second learning attribute value memory for storing the new learning attribute values calculated by the calculation unit.
- a sixth apparatus of the present invention for achieving the above object is a neural network computing device, comprising: a control unit for controlling the neural network computing device; A plurality of memory units for storing and outputting a connection line attribute value, a forward neuron attribute value, and a reverse neuron attribute value, respectively, and for calculating a new connection line attribute value; And a calculation unit for calculating a new forward neuron attribute value and a reverse neuron attribute value based on data input from each of the plurality of memory units, and feeding the new neuron attribute value back to each of the plurality of memory units.
- A second system of the present invention for achieving the above object is a neural network computing system comprising: a control unit for controlling the neural network computing system; a plurality of memory units, each including a plurality of memory parts that either output connection line attribute values and reverse neuron attribute values, respectively, or output connection line attribute values and forward neuron attribute values, respectively, and calculate new connection line attribute values using the connection line attribute value, the forward neuron attribute value, and the learning attribute value; and a plurality of calculation units that either calculate new reverse neuron attribute values using the connection line attribute values and reverse neuron attribute values respectively input from the corresponding memory parts and feed them back to each of the corresponding memory parts, or calculate new forward neuron attribute values and learning attribute values using the connection line attribute values and forward neuron attribute values respectively input from the corresponding memory parts and feed them back to each of the corresponding memory parts.
- A seventh apparatus of the present invention for achieving the above object is, in a memory device of a digital system, a dual memory replacement (SWAP) circuit in which all inputs and outputs of two memories are interchangeably connected to each other using a plurality of digital switches controlled by a control signal from an external control unit.
- A first method of the present invention for achieving the above object is a neural network computing method comprising: outputting, by a plurality of memory units under the control of a control unit, connection line attribute values and neuron attribute values, respectively; and calculating, by one calculation unit under the control of the control unit, new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units, and feeding them back to each of the plurality of memory units.
- Here, the plurality of memory units and the one calculation unit operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- A second method of the present invention for achieving the above object is a neural network computing method comprising: receiving, under the control of a control unit, data to be provided to the input neurons from the control unit; switching either the input data or the new neuron attribute values from the calculation unit into a plurality of memory units under the control of the control unit; outputting, by the plurality of memory units under the control of the control unit, connection line attribute values and neuron attribute values, respectively; calculating, by one calculation unit under the control of the control unit, new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; and outputting the new neuron attribute values from the calculation unit to the control unit through first and second output means composed of a dual memory replacement (SWAP) circuit whose inputs and outputs are all swapped under the control of the control unit.
- A third method of the present invention for achieving the above object is a neural network computing method comprising: outputting, by a plurality of memory parts in a plurality of memory units under the control of a control unit, connection line attribute values and neuron attribute values, respectively; and calculating, by a plurality of calculation units under the control of the control unit, new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the corresponding memory parts in the plurality of memory units, and feeding them back to each of the corresponding memory parts. The plurality of memory parts in the plurality of memory units and the plurality of calculation units operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- A fourth method of the present invention for achieving the above object is a neural network computing method comprising: outputting, by a plurality of memory units under the control of a control unit, connection line attribute values and neuron error values, respectively; and calculating, by one calculation unit under the control of the control unit, new neuron error values using the connection line attribute values and neuron error values respectively input from the plurality of memory units, and feeding them back to each of the plurality of memory units.
- Here, the plurality of memory units and the one calculation unit operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- A fifth method of the present invention for achieving the above object is a neural network computing method comprising: outputting, by a plurality of memory units under the control of a control unit, connection line attribute values and neuron attribute values, respectively; calculating, by one calculation unit under the control of the control unit, new neuron attribute values and learning attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; and calculating, by the plurality of memory units under the control of the control unit, new connection line attribute values using the connection line attribute value, the neuron attribute value, and the learning attribute value.
- Here, the plurality of memory units and the one calculation unit operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- A sixth method of the present invention for achieving the above object is a neural network computing method comprising: storing and outputting, by a plurality of memory units under the control of a control unit, connection line attribute values, forward neuron attribute values, and reverse neuron attribute values, respectively, and calculating new connection line attribute values; and calculating, by one calculation unit under the control of the control unit, new forward neuron attribute values and reverse neuron attribute values based on the data input from each of the plurality of memory units, and feeding them back to each of the plurality of memory units.
- Here, the plurality of memory units and the one calculation unit operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- A seventh method of the present invention for achieving the above object is a neural network computing method in which, under the control of a control unit, a plurality of memory parts in a plurality of memory units output connection line attribute values and reverse neuron attribute values, respectively,
- and a plurality of calculation units calculate new reverse neuron attribute values using the connection line attribute values and reverse neuron attribute values respectively input from the corresponding memory parts in the plurality of memory units and feed them back;
- or the plurality of memory parts in the plurality of memory units output connection line attribute values and forward neuron attribute values, respectively, and calculate new connection line attribute values using the connection line attribute value, the forward neuron attribute value, and the learning attribute value, while, under the control of the control unit, the plurality of calculation units calculate new forward neuron attribute values and learning attribute values using the connection line attribute values and forward neuron attribute values respectively input from the corresponding memory parts and feed them back to each of the corresponding memory parts. The plurality of memory parts in the plurality of memory units and the plurality of calculation units operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- The present invention is not limited by the network topology, the number of neurons, or the number of connection lines of the neural network, and has the effect of executing various neural network models including arbitrary activation functions.
- The present invention can be designed with an arbitrarily chosen number p of connection lines that the neural network computing system handles simultaneously, and can recall or train up to p connection lines in every memory access cycle, so it has the advantage of running at high speed.
- The present invention has the advantage that the precision of the computation can be increased arbitrarily without reducing the maximum achievable speed.
- The present invention can be used not only to implement large-capacity general-purpose neural network computers but also integrated into small semiconductors, and thus may be applied to a wide range of artificial neural network applications.
- FIG. 1 is a configuration diagram of an embodiment of a neural network computing device according to the present invention.
- FIG. 2 is a detailed configuration diagram of an embodiment of a control unit according to the present invention.
- FIG. 3 is an exemplary view illustrating the flow of data driven by control signals according to the present invention.
- FIG. 4 is an exemplary view for explaining a pipeline structure of a neural network computing device according to the present invention.
- FIG. 8 is a detailed configuration diagram of an embodiment of a calculation unit according to the present invention.
- FIG. 10 is a detailed illustration for explaining a multi-stage pipeline structure of a neural network computing device according to the present invention.
- FIG. 11 is an exemplary view for explaining a parallel calculation line technique according to the present invention.
- FIG. 13 is an exemplary diagram illustrating a case where a parallel calculation line scheme according to the present invention is applied to a multiplier, an adder, or an activation function calculator.
- FIG. 14 is an exemplary view illustrating a case where a parallel calculation line technique according to the present invention is applied to an accumulator
- FIG. 15 is a view showing the flow of input and output data when the parallel calculation line technique according to the present invention is applied.
- FIG. 16 is a detailed illustration for explaining the multi-stage pipeline structure when the parallel calculation line technique is applied to a neural network computing device according to the present invention.
- FIG. 17 is a view for explaining another structure of the calculation unit according to the present invention.
- FIG. 18 is a view showing input and output data flow in the calculation unit of the other structure of FIG. 17 according to the present invention.
- FIG. 19 is a view for explaining another structure of the activation function operator and the YN memory according to the present invention.
- FIG. 20 is a configuration diagram of another embodiment of a neural network computing device according to the present invention.
- FIG. 21 is a diagram for explaining the neural network update cycle according to the present invention.
- FIG. 22 is a detailed block diagram of an embodiment of a multiplier of a calculation unit that calculates Equation 2;
- FIG. 23 is a configuration diagram of an embodiment of a neural network computing system according to the present invention.
- FIG. 24 is a view for explaining the structure of a neural network computing device that executes the first and second sub-cycles of the backpropagation learning algorithm according to the present invention.
- FIG. 25 is a view for explaining the structure of a neural network computing device for executing a learning algorithm according to the present invention.
- FIG. 26 is a diagram illustrating the data flow in the neural network computing device of FIG. 25 according to the present invention.
- FIG. 27 is a diagram illustrating a neural network computing device for alternately performing a reverse propagation period and a forward propagation period for all or some networks of one neural network according to the present invention
- FIG. 28 is a view for explaining another, simplified calculation structure of the neural network computing device of FIG. 27 according to the present invention.
- FIG. 29 is a detailed configuration diagram of a calculation unit in the neural network computing devices of FIGS. 27 and 28 according to the present invention.
- FIG. 30 is a detailed configuration diagram of the soma processor in the calculation unit of FIG. 29 according to the present invention.
- FIG. 31 is a configuration diagram of another embodiment of a neural network computing system according to the present invention.
- FIG. 32 is a detailed block diagram of an embodiment of a multiplier of a calculation unit when the calculation model of the neural network executed in the calculation unit is a dynamic synaptic model or a spiking neural network model.
- FIG. 33 is a view for explaining another structure of the neural network computing device for executing the learning algorithm according to the present invention.
- FIG. 1 is a configuration diagram of a neural network computing device according to an embodiment of the present invention, showing its basic detailed structure.
- As shown in FIG. 1, the neural network computing device includes: a control unit 119 for controlling the neural network computing device; a plurality of memory units 100 (also known as synapse units) for outputting connection line attribute values and neuron attribute values, respectively; and one calculation unit 101 for calculating a new neuron attribute value (used as the neuron attribute value of the next neural network update cycle) using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units 100, and feeding it back to each of the plurality of memory units 100.
- The InSel input (connection line bundle number, 112) and the OutSel input (address and write-enable signal for storing the neuron attribute value of the next neural network update cycle, 113), each driven by the control unit 119, are commonly connected to all of the plurality of memory units 100. The outputs (connection line attribute value and neuron attribute value, 114 and 115) of each of the plurality of memory units 100 are connected to the inputs of the calculation unit 101. The output of the calculation unit 101 (the neuron attribute value of the next neural network update cycle) is commonly connected to the inputs of all the memory units 100 through the Y bus 111.
- Each memory unit 100 includes: a W memory (first memory) 102 for storing connection line attribute values; an M memory (second memory) 103 for storing the unique numbers of neurons (for example, the address values of the YC memory where the neuron attribute values are stored); a YC memory (third memory) 104 for storing the neuron attribute values of the current neural network update cycle; and a YN memory (fourth memory) 105 for storing the neuron attribute values of the next neural network update cycle.
- The address inputs (AD: Address Input) of the W memory 102 and the M memory 103 are tied together and connected to the InSel input 112, and the data output (DO: Data Output) of the M memory 103 is connected to the address input of the YC memory 104.
- the data outputs of the W memory 102 and the YC memory 104 are connected to the inputs of the calculation unit 101, respectively.
- the OutSel input 113 is connected to an address input of the YN memory 105 and a write enable (WE) input, and the Y bus 111 is connected to a data input (DI) of the YN memory 105.
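- Behaviourally, the lookup path just described (InSel addresses the W and M memories in parallel, the M memory's output addresses the YC memory, and OutSel writes the fed-back result into the YN memory) can be modelled as below. Plain Python lists stand in for the memories; all names are illustrative, and the pipeline registers are ignored.

```python
class MemoryUnit:
    """Behavioural model of one memory unit 100 (registers omitted)."""
    def __init__(self, W, M, num_neurons):
        self.W = W                      # W memory: connection line attribute values
        self.M = M                      # M memory: source neuron number per line
        self.YC = [0.0] * num_neurons   # YC memory: current neuron attribute values
        self.YN = [0.0] * num_neurons   # YN memory: next-cycle attribute values

    def read(self, in_sel):
        """InSel addresses W and M; M's output addresses YC."""
        w = self.W[in_sel]
        y = self.YC[self.M[in_sel]]
        return w, y                     # driven to the calculation unit

    def write(self, out_sel, new_y):
        """OutSel stores the calculation unit's result into YN."""
        self.YN[out_sel] = new_y
```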
- The memory unit 100 may further include a first register 106 (temporarily storing the connection line bundle number input to the W memory) at the address input terminal of the W memory 102, and a second register 107 (temporarily storing the unique number of the neuron output from the M memory) at the address input terminal of the YC memory 104.
- The first and second registers 106 and 107 are synchronized to one system clock so that the W memory 102, the M memory 103, and the YC memory 104 operate in a pipelined manner under the control of the control unit 119.
- A plurality of third registers 108 and 109 (temporarily storing the connection line attribute values from the W memories and the neuron attribute values from the YC memories) may be further included between the outputs of all the memory units 100 and the inputs of the calculation unit 101.
- A fourth register 110 (temporarily storing the new neuron attribute value output from the calculation unit) may be further included at the output terminal of the calculation unit 101.
- The third and fourth registers 108 to 110 are synchronized to one system clock so that the plurality of memory units 100 and the one calculation unit 101 operate in a pipelined manner under the control of the control unit 119.
- A digital switch 116 may further be included to select the Y bus 111, on which the calculated neuron attribute value is output, and connect it to each memory unit 100.
- the output 118 of the calculation unit 101 is connected to the control unit 119 to transmit the value of the neuron to the outside.
- Initial values of the W memory 102, the M memory 103, and the YC memory 104 of the memory unit 100 are stored in advance by the control unit 119.
- When the control unit 119 stores values in each memory in the memory unit 100, the values may be stored according to the following procedures a to h.
- The neural network adds one virtual neuron having an attribute value that does not affect any neuron, and all unused (virtual) connection lines are connected to this virtual neuron.
- The k-th address of the M memory 103 of the i-th memory unit stores the number of the neuron connected to the i-th connection line of the k-th connection line bundle (that is, the address value at which that neuron's attribute value is stored in the YC memory 104 of the i-th memory unit).
- The control unit 119 supplies the InSel input with connection line bundle numbers starting from 1 and incremented by 1 every system clock cycle.
- Then, every system clock cycle, the outputs of the plurality of memory units 100 sequentially present the connection line attribute values of the connection lines included in a specific connection line bundle, together with the attribute values of the neurons connected as inputs to those connection lines. The sequence proceeds from the first connection line bundle of the first neuron to its last bundle, then on to the next neuron in the same way, and repeats until the last bundle of the last neuron has been output.
- the calculation unit 101 receives an output (connection line attribute value and neuron attribute value) of the memory unit 100 as an input and calculates a new attribute value of the neuron.
- Data of the connection line bundles of each neuron are sequentially input to the calculation unit 101 after a certain number of system clock cycles has elapsed from the start of the neural network update cycle.
- When each neuron has n connection line bundles, a new neuron attribute value is calculated and output every n system clock cycles.
- FIG. 2 is a detailed configuration diagram of an embodiment of a control unit according to the present invention.
- As described above with reference to FIG. 1, the control unit 201 provides various control signals to the neural network computing device 202, and performs initialization of each memory in the memory units, real-time or non-real-time input data loading, and real-time or non-real-time output data retrieval.
- the control unit 201 may be connected to the host computer 200 to receive control from the user.
- The control memory 204 stores the timing and control information of all control signals 205 necessary for processing each connection line bundle and each neuron in the neural network update cycle, and the control signals can be extracted according to the clock period within the neural network update cycle provided by the clock period counter 203.
- FIG. 3 is an exemplary view illustrating the flow of data driven by control signals according to the present invention.
- The unique numbers of the connection line bundles are sequentially input by the control unit 201 via the InSel input 112. If the InSel input 112 is given the value k of a particular connection line bundle in a particular clock period, then in the next clock period the first register 106 stores the value k and the second register 107 stores the unique number of the neuron providing the attribute value to the i-th connection line of the k-th connection line bundle.
- In the following clock period, the plurality of third registers 108 and 109 store the attribute value of the i-th connection line of the k-th bundle and the attribute value of the neuron providing input to that connection line.
- In this way, the p memory units 100 simultaneously output to the calculation unit 101 the attribute values of the p connection lines belonging to one connection line bundle and the attribute values of the neurons connected to each connection line; once all connection line bundles of neuron j have been provided, the calculation unit computes the new attribute value of neuron j.
- the newly calculated attribute value of the neuron j is stored in the fourth register 110.
- The new neuron attribute value stored in the fourth register 110 is commonly stored in the YN memory 105 of every memory unit 100 in the next clock period (the new neuron attribute values stored in each YN memory are used as the neuron attribute values of the next neural network update cycle).
- the address to be stored and the write permission signal WE are provided by the control unit 201 through the OutSel input 113.
- When this has been repeated for all neurons, one neural network update cycle ends and the next neural network update cycle can begin.
- FIG. 4 is an exemplary view illustrating a pipeline structure of a neural network computing device according to the present invention.
- the neural network computing device operates like a pipeline circuit composed of stages under the control of a control unit.
- The clock cycle, or pipeline cycle, of a pipeline circuit can be shortened to the time of the slowest pipeline stage. Therefore, if tmem is the memory access time and tcalc is the calculation cycle of the calculation unit, the ideal pipeline cycle of the neural network computing device according to the present invention is max(tmem, tcalc).
- the calculation unit may be internally configured as a pipeline circuit to further shorten the calculation period tcalc of the calculation unit.
- The calculation unit is characterized in that input data are sequentially input, calculation results are sequentially output, and there is no time dependency between input and output. Therefore, when there is a large amount of data to calculate, the latency between the input of data and the output of its result does not significantly affect the performance of the system; instead, the calculation cycle at which output data are produced determines the performance. Accordingly, in order to shorten the calculation cycle, it is desirable to design the internal structure of the calculation unit in a pipelined manner.
- a method of processing each calculation step into a pipeline by adding a register synchronized by the system clock between the calculation steps inside the calculation unit can be used.
- the calculation period of the calculation unit can be shortened to the maximum value of the calculation periods of each calculation step. This content can be applied regardless of the type of calculation performed by the calculation unit, and will be made clearer through the embodiment of FIG. 8 described below, for example, under the premise of a specific calculation.
- a method of implementing the calculation device internal structure in a pipeline circuit synchronized to the system clock may be used. In this case, the calculation period of each calculation device can be shortened to the pipeline period of the internal structure.
- a parallel calculation line technique for distributing sequentially input data to a plurality of computing devices through a divider and collecting the calculation results of the plurality of computing devices into a multiplexer may be applied. This content can be applied regardless of the type of calculation performed by the calculation unit, and will be made clearer with reference to the embodiment of FIG. 11 described below, for example, under the premise of a specific calculation.
- The contents of the YN memory 401 must be stored in the YC memory 400. However, if the contents of the YN memory 401 are physically copied to the YC memory 400, the required processing time may greatly reduce the performance of the system.
- To remove this copying step, (1) a dual memory replacement (SWAP) method, (2) a single memory redundant storage method, or (3) a single memory replacement method, each described below, may be used.
- The dual memory replacement method uses a plurality of 1-bit digital switches to completely exchange the inputs and outputs of two identical devices (memories), which has the same effect as physically swapping them.
- a logic circuit as shown in FIG. 5A may be used.
- Hereinafter, a 1-bit switch is represented by 500 as shown in FIG. 5(b), and an N-bit switch composed of N 1-bit switches is represented as shown in FIG. 5(b2).
- FIG. 5C illustrates a structure in which two physical devices D1 and D2 having a 3-bit input and a 1-bit output are implemented as replacement circuits.
- Physical device D1 501 has a11, a21, and a31 connected to its inputs and a41 to its output, while physical device D2 502 has a12, a22, and a32 connected to its inputs and a42 to its output.
- The dual memory replacement circuit in which this replacement circuit is applied to the two memories 505 and 506 is shown in FIG. 5.
- A circuit in which the dual memory replacement method is applied to the YC memory 104 and the YN memory 105 of FIG. 1, with unused inputs and outputs omitted, is shown in FIG. 5F.
- In each neural network update cycle, the roles of the two memories are swapped under the control of the control unit, so the contents of the YN memory 105 stored in one update cycle can be used directly as the YC memory 104 in the next cycle without physically moving the contents of the memory.
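- A behavioural sketch of the idea, assuming the network of digital switches is abstracted into a reference swap (in hardware the exchange happens on the memory ports; no data moves in either case):

```python
class SwappedPair:
    """YC/YN pair managed by dual memory replacement (SWAP)."""
    def __init__(self, num_neurons):
        self.yc = [0.0] * num_neurons   # read side: current update cycle
        self.yn = [0.0] * num_neurons   # write side: next update cycle

    def swap(self):
        """End of update cycle: YN takes the YC role without copying."""
        self.yc, self.yn = self.yn, self.yc
```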
- The single memory redundant storage method uses one memory instead of two (the YC memory and YN memory of FIG. 1); the read process (the role of the YC memory in FIG. 1) and the write process (the role of the YN memory in FIG. 1) are time-divided within one pipeline cycle, and the attribute value of a neuron is stored in the same storage location without distinguishing between the old value and the new value.
- The single memory replacement (SWAP) method likewise uses one memory, with the read process (the role of the YC memory in FIG. 1) and the write process (the role of the YN memory in FIG. 1) time-divided within one pipeline cycle; the attribute values of the neurons in the current update cycle are stored in one half of the memory space, while the attribute values calculated by the calculation unit for the next neural network update cycle are stored in the other half.
- At the end of each update cycle, the roles of the two memory regions are swapped.
- The READ/WRITE control input 604 of the N-bit switch 601 is connected to one input of the exclusive-OR gate 603, and the EVENCYCLE control input 605 is connected to the other input of the gate. The output of the exclusive-OR gate 603 is connected to the most significant bit of the address input of the memory 602.
- One pipeline cycle is divided into a stage in which the digital switch 601 is connected to the upper position and operated in read mode, and a stage in which it is connected to the lower position and operated in write mode.
- a read / write control input 604 is provided with a value of 1 for reading the attribute value of a neuron of the current update period and a value of 0 for storing the newly calculated attribute value of the neuron.
- The EVENCYCLE control input 605 is given the value 0 when the neural network update cycle number is even and the value 1 when it is odd.
- The entire area of the memory 602 is divided into an upper half area and a lower half area.
- In even update cycles, the upper half area of the memory 602 is used as the YC memory and the lower half area as the YN memory; in odd update cycles, the upper half area is used as the YN memory and the lower half area as the YC memory, the roles alternating every cycle.
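- The addressing rule therefore reduces to a single XOR on the most significant address bit, as sketched below; which half counts as "upper" depends on the wiring, so the mapping here is illustrative.

```python
def half_select(read: bool, even_cycle: bool) -> int:
    """MSB of the address: (READ/WRITE) XOR (EVENCYCLE)."""
    rw = 1 if read else 0        # 1 = read current value, 0 = write new value
    ec = 0 if even_cycle else 1  # 0 = even update cycle, 1 = odd update cycle
    return rw ^ ec               # selects one of the two half regions

def address(base_addr: int, half_bits: int, read: bool, even_cycle: bool) -> int:
    """Full memory address: selected half region plus the neuron address."""
    return (half_select(read, even_cycle) << half_bits) | base_addr
```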
- To calculate Equation 1, the basic structure of the calculation unit 101 may be implemented as shown in FIG. 8.
- The calculation unit 101 comprises: a multiplication operation unit 800 containing as many multipliers as there are memory units 100, which multiplies the neuron attribute value by the connection line attribute value from each memory unit 100; addition operation units 802, 804, and 806, which perform a multi-stage (tree) addition on the output values of the multiplication operation unit 800; an accumulator 808 for accumulating the output values from 806; and one activation function operator 811 for calculating, by applying the activation function to the accumulated value from the accumulator 808, the new neuron attribute value to be used in the next neural network update cycle.
- The calculation unit 101 may further include registers 801, 803, 805, 807, and 809 between the operation steps: a plurality of registers 801 between the multiplication operation unit 800 and the first addition operation unit 802 of the addition tree; a plurality of registers 803 and 805 between the stages of the addition tree 802, 804, 806; a register 807 between the last addition operation unit 806 of the tree and the accumulator 808; and a register 809 between the accumulator 808 and the activation function operator 811.
- each register is synchronized according to one system clock and each calculation step operates in a pipelined manner.
- The multiplication operation unit 800 and the tree of addition operation units 802, 804, and 806 sequentially calculate, bundle by bundle, the sum of the inputs arriving through the connection lines included in each connection line bundle.
- The accumulator 808 calculates the total net input of a neuron by accumulating the sums of its connection line bundles. When the data arriving at the accumulator 808 from the addition tree belong to the first connection line bundle of a particular neuron, the digital switch 810 is switched to the left terminal by the control unit 201 so that the value 0 is provided to the other input of the accumulator 808, initializing the output of the accumulator 808 for the new neuron.
- the activation function calculator 811 calculates a new neuron attribute value (state value) by applying the activation function to the total sum of the neuron inputs.
- The activation function operator 811 may be implemented with a simple structure such as a memory lookup table, or as a dedicated processor executing microcode.
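- Behaviourally, the FIG. 8 datapath for one neuron reduces to the sketch below; the pipeline registers are abstracted away, and tanh stands in for an arbitrary activation function.

```python
import math

def calc_unit(bundles, f=math.tanh):
    """bundles: the B (w_vector, y_vector) pairs of one neuron, one per cycle."""
    acc = 0.0                                        # accumulator 808, reset per neuron
    for w, y in bundles:                             # one connection line bundle per cycle
        products = [wi * yi for wi, yi in zip(w, y)] # multiplication operation unit 800
        acc += sum(products)                         # adder tree 802/804/806 + accumulator
    return f(acc)                                    # activation function operator 811
```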
- FIG. 9 is an embodiment diagram showing the data flow in the calculation unit according to the present invention.
- When the data of connection line bundle k are presented at the input of the multiplication operation unit 800 at a particular time, they appear at the output of the multiplication operation unit 800 in the next clock period, then at the output of the first addition operation unit 802, and so on, stage by stage, until at the final addition operation unit 806 they have been reduced to the net input of connection line bundle k.
- The net inputs of the bundles are summed one by one by the accumulator 808; when the number of connection line bundles of one neuron is n, after n additions the result is the net input of neuron j.
- The net input of neuron j is then passed through the activation function and output as the new attribute value of the neuron, so a new value is produced every n clock periods.
- FIG. 10 is a detailed exemplary diagram for explaining a multi-stage pipeline structure of a neural network computing device according to the present invention, and shows a multi-stage pipeline circuit.
- In this structure, the ideal pipeline cycle is max(tmem, tmul, tadd, tacti/B), where B is the number of connection line bundles per neuron.
- In addition, the multiplier, the adder, and the activation function operator may each be configured internally as pipelined circuits. If the number of pipeline stages of the multiplier is smul, of the adder sadd, and of the activation function operator sacti, the pipeline cycle of the entire system is max(tmem, tmul/smul, tadd/sadd, tacti/(B*sacti)). This means that if the multipliers, adders, and activation function operators can be pipelined internally deeply enough, the pipeline cycle can be shortened further. Even when a device cannot operate in a pipelined manner internally, it can be converted into a pipelined circuit by using a plurality of computing devices; this method, described below, is referred to as the parallel calculation line technique.
- FIG. 11 is an exemplary diagram for describing a parallel calculation line scheme according to the present invention
- FIG. 12 is a diagram illustrating the flow of input / output data according to the parallel calculation line scheme according to the present invention.
- As shown in FIG. 11, one divider (demultiplexer) 1101 is used at the input, a plurality of devices C 1102 are used internally, and one multiplexer 1103 is used at the output; the divider 1101 and the multiplexer 1103 are synchronized by a clock t_ck.
- One input datum enters the input stage every t_ck clock period, and the divider 1101 distributes the input data sequentially to each internal device C 1102. After receiving its input data, each internal device C 1102 completes its calculation in time t_c and outputs the result.
- At every t_ck, the multiplexer 1103 selects and latches (1104) the output of whichever device C 1102 has completed its calculation.
- The divider 1101 and the multiplexer 1103 can be implemented using simple logic gates and decoder circuits, and have little effect on processing speed. This arrangement is referred to in the present invention as the "parallel calculation line technique".
- The circuit of the parallel calculation line technique outputs one result every t_ck, so the computational throughput becomes one calculation per t_ck.
- By using more devices C 1102, the throughput can be increased arbitrarily to a desired level, on the same principle as adding production lines to increase output in a factory. As an example, when the number of devices C is 4, the flow of input/output data is as shown in FIG. 12.
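- The throughput argument can be checked with a small timing model: with k devices, each needing t_c = k·t_ck per calculation, completions are still spaced t_ck apart. The function below is a sketch with illustrative names, not part of the patent.

```python
def parallel_line_schedule(n_inputs, k, t_ck=1.0):
    """(start, finish) times when input t goes to lane t % k and t_c = k * t_ck."""
    t_c = k * t_ck
    return [(t * t_ck, t * t_ck + t_c) for t in range(n_inputs)]

# k = 4: finishes fall at 4, 5, 6, 7, ... clock periods -- one result per t_ck,
# and lane t % k becomes free exactly when its next input arrives.
print(parallel_line_schedule(8, 4))
```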
- FIG. 13 is an exemplary view illustrating a case where the parallel calculation line scheme according to the present invention is applied to a multiplier, an adder, or an activation function calculator.
- When the multiplier 1301, the adder 1303, or the activation function operator 1305 is substituted for device C 1102 in the parallel calculation line technique described above, a multiplier 1302, an adder 1304, or an activation function operator 1306 whose throughput is improved in proportion to the number of devices introduced can be implemented.
- each multiplier in the multiplication operation unit 800 is composed of one divider, a plurality of multipliers 1301, and a multiplexer, and sequentially divides input data input in a clock period to the plurality of multipliers 1301 by a divider. Then, the calculated data are multiplexed in order by the multiplexer and output in the clock cycle.
- Each adder in the add operation units 802, 804, and 806 consists of a divider, a plurality of adders 1303, and a multiplexer, and the input data inputted in a clock cycle is sequentially transferred to the plurality of adders 1303 by the divider. The data is calculated and multiplexed in order by the multiplexer and outputted in clock cycles.
- the activation function operator 811 is composed of one distributor, a plurality of activation function operators 1305, and one multiplexer, and distributes the input data input in a clock cycle to the plurality of activation function operators 1305 in turn by a divider.
- the calculated data are multiplexed in order by the multiplexer and output in clock cycles.
- FIG. 14 is an exemplary view illustrating a case where a parallel calculation line technique according to the present invention is applied to an accumulator.
- The divider 1400 and the multiplexer 1401 are implemented as described above, but each internal device is replaced with a circuit in which a first-in first-out (FIFO) queue 1402 and an accumulator 1403 are connected in series; the device thus constructed is denoted 1405.
- the input data drawn in the clock cycle is distributed to the first-in first-out (FIFO) queue 1402 by the divider 1400 in turn, and the data whose calculation is completed in the accumulator 1403 is multiplexed in order by the multiplexer 1401. It is output in clock cycles.
- If the unit accumulation time of the accumulator 1403 is taccum and the pipeline cycle is t_ck, the number of accumulators 1403 required to implement the circuit shown in FIG. 14 is two.
- the flow of input / output data is as shown in FIG. 15.
- FIG. 15 is a diagram illustrating the flow of input/output data when the parallel calculation line technique according to the present invention is applied.
- The net input data net_j of the connection line bundles of the neurons, provided sequentially to the input of the divider 1400, are stored alternately in the first first-in first-out queue q1 and the second first-in first-out queue q2, the two units corresponding to the number of connection line bundles per neuron.
- Each unit accumulator acc1 and acc2 accumulates the data stored in the first-in first-out queue q1 or q2 of the preceding stage, and the multiplexer selects and outputs the result whose accumulation is complete.
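- One plausible reading of FIGS. 14 and 15 (an assumption, since the figures are not reproduced here) is that the bundle sums of successive neurons alternate between the FIFO-plus-accumulator lanes, so each accumulator has several pipeline cycles per addition while the pair still completes one neuron sum per B cycles:

```python
from collections import deque

def lane_accumulate(bundle_sums, bundles_per_neuron, lanes=2):
    """bundle_sums: flat stream with B consecutive bundle sums per neuron."""
    B = bundles_per_neuron
    fifos = [deque() for _ in range(lanes)]
    for n in range(0, len(bundle_sums), B):       # divider 1400: neuron by neuron
        fifos[(n // B) % lanes].append(bundle_sums[n:n + B])
    totals = {}
    for lane, fifo in enumerate(fifos):           # unit accumulators 1403
        j = lane
        while fifo:
            totals[j] = sum(fifo.popleft())       # net input of neuron j
            j += lanes
    return [totals[j] for j in sorted(totals)]    # multiplexer 1401, original order
```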
- In this case, each component of FIG. 10 is replaced by the corresponding multiplier, adder, accumulator, and activation function operator to which the parallel calculation line technique is applied, as shown in FIG. 16.
- FIG. 16 is a detailed diagram illustrating a multi-stage pipeline structure when the parallel computing line technique is applied to a neural network computing device according to the present invention.
- All the multipliers 1601, adders 1602, accumulators 1603, and activation function operators 1604 have the parallel calculation line technique applied, so unit calculation devices can be added as needed and the calculation cycle can be shortened arbitrarily.
- Since every pipeline stage except the memory access cycle tmem can thus be shortened arbitrarily, and the pipeline cycle can only be shortened to the time of the slowest stage, the ideal pipeline cycle is tmem.
- If p is the number of memory units, the maximum processing speed is p/tmem CPS (Connections Per Second). For example (illustrative figures, not from the patent), p = 16 memory units with tmem = 10 ns would give a peak rate of 1.6 billion connections per second.
- FIG. 22 is a detailed block diagram of an embodiment of a multiplier of a calculation unit that calculates Equation 2.
- In this case, each multiplier in the calculation unit of FIG. 8 may be replaced by a circuit in which the two input values (the connection line attribute value and the neuron attribute value) are connected to one subtractor 2200, and the output of the subtractor 2200 is connected to a square calculator 2201.
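- The image of Equation 2 is not reproduced here; from the subtractor-plus-square structure just described, it plausibly computes a squared-distance net input of the form below (an assumption consistent with the circuit, not a quotation of the patent):

```latex
net_j = \sum_{i=1}^{p_j} \left( w_{ij} - y_{M_{ij}}(T-1) \right)^2
```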
- FIG. 32 is a detailed block diagram of an embodiment of a multiplier of a calculation unit when the calculation model of the neural network executed in the calculation unit is a dynamic synaptic model or a spiking neural network model.
- In this case, each multiplier in the calculation unit of FIG. 8 can be replaced by a circuit consisting of one reference table 3200 and one multiplier 3201.
- the attribute values of the connection line stored in the W memory of the memory unit are stored separately by being divided into the weight value w ij of the connection line and the dynamic type identifier (type ij ) of the connection line.
- the type identifier selects one of a plurality of tables of the reference table 3200.
- the attribute value y M (i, j) of the neuron represents the value of the time axis in the reference table 3200.
- When a specific neuron generates a spike, the activation function operator outputs as its output value a signal that starts at 0 and gradually increases every neural network update cycle; as shown in part (c) of the figure, this signal is converted by the reference table 3200 into a signal that changes with time and is passed to one of the inputs of the multiplier 3201.
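- A minimal behavioural sketch of this multiplier replacement; the waveform tables, the encoding of elapsed time in the neuron attribute value, and all names are illustrative assumptions.

```python
def dynamic_synapse(weight, type_id, t_since_spike, tables):
    """FIG. 32 path: reference table 3200 followed by multiplier 3201."""
    waveform = tables[type_id]                           # one waveform per dynamic type
    sample = waveform[min(t_since_spike, len(waveform) - 1)]
    return weight * sample

# Example: two synapse types with different post-spike time courses.
tables = {0: [0.0, 1.0, 0.6, 0.3, 0.1],
          1: [0.0, 0.4, 0.8, 0.4, 0.2]}
out = dynamic_synapse(0.5, 1, 2, tables)                 # 0.5 * 0.8
```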
- The memory storage method in which all neurons have the same number of connection line bundles, and the corresponding structure of the calculation unit 101, can lose efficiency when the difference in the number of connection lines between neurons is large, since neurons with few connection line bundles must carry many unused connection lines. In this case, the calculation time available to the activation function operator 1604 is also shortened, so either a fast activation function operator 1604 is required or a large number of activation function operators 1604 must be added in the parallel calculation line configuration.
- FIG. 17 is a view for explaining another structure of the calculation unit according to the present invention
- FIG. 18 is a diagram showing the input / output data flow in the calculation unit of the other structure of FIG. 17 according to the present invention.
- a first-in first-out (FIFO) queue 1700 may be placed between an accumulator and an activation function operator as described above in FIG. 8 or FIG. 13.
- When the activation function calculation time corresponds to the average number of connection line bundles over all neurons, the input side of the activation function operator is not synchronized to the pipeline cycle of the neural network computing device; instead, the oldest stored value is retrieved from the first-in first-out queue 1700 and used.
- The activation function operator can thus process the data accumulated in the first-in first-out queue 1700 one by one, spending an equal calculation time on every neuron.
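- A minimal sketch of this producer/consumer decoupling (the activation function below is a placeholder; the specification does not prescribe one here):

```python
from collections import deque

fifo = deque()  # the first-in first-out queue 1700

def activation(x: float) -> float:
    return max(0.0, x)          # placeholder activation function (assumed)

def accumulator_emit(net_input: float) -> None:
    fifo.append(net_input)      # producer side: pipeline-synchronous writes

def activation_step():
    if fifo:                    # consumer side: runs at its own fixed cadence
        return activation(fifo.popleft())   # oldest stored value first
    return None                 # queue empty this cycle
```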
- The control unit stores a value in each memory inside the memory unit 100 of FIG. 1; for this, procedures such as a to h described above can be used.
- Since the calculation time of the activation function operator corresponds to the average number of connection line bundles over all neurons, the activation function operator can operate at a regular period, which improves efficiency.
- Because the timing at which the output data of the activation function appears can be known in advance, when that output data is stored in the YN memories 105 of all the memory units 100, the control unit 201 can produce the OutSel input 113, the address value at which the data is to be stored, in a predetermined order.
- FIG. 19 is a view for explaining another structure of the activation function operator and the YN memory according to the present invention.
- In the activation function operator 1900, in addition to a first input 1902 for receiving the net input data of a neuron and a first output 1904 for outputting the new attribute value (state value), a second input 1903 and a second output 1905 are added.
- The number j of the neuron is supplied to the second input 1903.
- The activation function operator 1900 temporarily stores the neuron number while calculating the activation function; when the calculation is completed and the new attribute value (state value) is output at the first output 1904, the neuron number is output at the second output 1905.
- When the attribute value (state value) of the neuron is stored in the YN memory 1901, the neuron number is provided as the OutSel input 1906, which is commonly connected to the address input of the YN memory 1901. In this way the result value is always stored at the correct memory position.
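- A minimal sketch of carrying the neuron number alongside the data (the activation function and memory size are assumed for illustration):

```python
def activation_operator(net_input: float, j: int):
    """Mirror of FIG. 19: the neuron number j travels with the data, so the
    result can be written to the correct YN address without the control unit
    tracking the operator's latency."""
    y_new = max(0.0, net_input)  # placeholder activation function (assumed)
    return y_new, j              # first output 1904, second output 1905

yn_memory = [0.0] * 1024         # assumed YN memory depth
value, addr = activation_operator(0.7, j=42)
yn_memory[addr] = value          # OutSel 1906 is driven by the carried number
```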
- recall mode execution of an artificial neural network including an input and an output may be performed by the following processes 1 to 3.
- This method slows down the processing speed of the system, because the calculation must be stopped to set the input data or to extract the values of the output neurons.
- To set the neural network input data and extract the output neuron values without stopping the calculation, a method as shown in FIG. 20 may be used.
- FIG. 20 is a configuration diagram of another embodiment of the neural network computing device according to the present invention.
- In this case, the neural network computing device includes: a control unit 2006 for controlling the neural network computing device; a plurality of memory units 2002 for outputting connection line attribute values and neuron attribute values, respectively; one calculation unit 2003 for calculating new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units 2002; an input memory 2000 for holding the input data from the control unit 2006 that is to be supplied to the input neurons; a digital switch 2004 for selecting either the output of the input memory 2000 or the output of the calculation unit 2003 under the control of the control unit 2006; and first and second output memories 2001 and 2005, implemented in a dual memory replacement (SWAP) scheme in which all inputs and all outputs are interchangeably connected under the control of the control unit 2006, through which the new neuron attribute values from the calculation unit 2003 are output to the control unit 2006.
- One neural network update cycle is divided into two steps: storing the values of the input neurons, and storing the values of the newly calculated neurons.
- In the step of storing the values of the input neurons, the digital switch 2004 is connected to the output of the input memory 2000, so that the attribute values of the input neurons stored in the input memory 2000 are written to the YN memories of all the memory units 2002.
- In the step of storing the values of the newly calculated neurons, the digital switch 2004 is connected to the output of the calculation unit 2003, so that the attribute values of the newly calculated neurons output from the calculation unit 2003 are written to the YN memories of all the memory units 2002.
- Meanwhile, the control unit 2006 may store in the input memory 2000 the values of the input neurons to be used in the next neural network update cycle.
- Alternatively, the step of storing the values of the input neurons may be performed all at once at the beginning of the neural network update cycle. With this method, since storing the values of the input neurons requires nothing other than the YN memory, the start of the next neural network update cycle can be brought forward as shown in FIG. 21(b), which improves efficiency somewhat. However, if the number of input neurons is large, the input process may still affect the performance of the neural network computing device.
- As another alternative, since the output of the calculation unit is generated only once every two or more clock cycles, the device can switch to the step of storing the input neurons during the clock cycles in which no output occurs, storing the input data one value at a time (interleaving). In this case, the process of storing the values of the input neurons does not affect the performance of the neural network computing device at all.
- The first output memory 2001 and the second output memory 2005 form a dual memory pair in which all inputs and all outputs can be interchanged (SWAP) according to a control signal.
- The attribute values of the neurons newly calculated within a neural network update cycle are stored in the first output memory 2001; when the update cycle ends, the two memories are swapped, so that the data stored during that cycle is found at the second output memory position.
- During the following update cycle, the control unit 2006 can read the attribute values of all neurons except the input neurons from the second output memory 2005 and take the attribute values of the output neurons for use as real-time output values of the neural network. This approach has the advantage that the control unit can access the attribute values of the output neurons at any time, regardless of the execution timing of the neural network computing device.
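- A minimal sketch of this double-buffering behaviour (the memory depth and values are illustrative):

```python
# Two equal memories trade roles at each update-cycle boundary, so the control
# unit always has a stable copy of the completed cycle's results to read.
first_out, second_out = [0.0] * 1024, [0.0] * 1024

def update_cycle(new_values):
    """new_values: (neuron number, new attribute value) pairs."""
    global first_out, second_out
    for j, v in new_values:
        first_out[j] = v                  # written during the cycle
    first_out, second_out = second_out, first_out   # SWAP at the boundary

update_cycle([(3, 0.9), (7, 0.1)])
print(second_out[3])  # 0.9: the completed cycle's data, stable for reading
```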
- FIG. 21 is a diagram for explaining the neural network update cycle according to the present invention.
- FIG. 21(a) illustrates the case where the process of storing the attribute values of the input neurons in the memory units 2002 is not performed at the beginning of the neural network update cycle. Here, a new neural network update cycle 2101 can begin only after the previous neural network update cycle 2100 has fully completed.
- FIG. 21(b) illustrates the case where the process of storing the attribute values of the input neurons in the memory units 2002 is executed at the beginning of the neural network update cycle. Since the input neurons 2102 do not need the calculation unit to compute their values, the interval between neural network update cycles can be narrower than in FIG. 21(a).
- FIG. 21(c) illustrates the method of interleaving the process of storing the attribute values of the input neurons in the memory units 2002 into the gap times at which no output occurs in the calculation unit. In this case, the overall processing speed is not affected no matter how many input neurons there are.
- a method of connecting several neural network computing devices to each other may be used.
- the neural network computing devices may be executed in parallel at the same time, thereby increasing the processing speed of the neural network computing devices.
- The disadvantages of this approach are that the network topology is constrained so that it can be divided into sub-networks, and that communication between the systems introduces overhead and degrades performance.
- To avoid these problems, a plurality of neural network computing devices may be combined into one large synchronized circuit, as shown in FIG. 23.
- FIG. 23 is a block diagram of an embodiment of a neural network computing system in accordance with the present invention.
- The neural network computing system includes: a control unit for controlling the neural network computing system (not shown; see FIG. 2 and the corresponding description); a plurality of memory units 2300, each including a plurality of memory parts 2309 that output connection line attribute values and neuron attribute values, respectively; and a plurality of calculation units 2301, each of which calculates new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the corresponding memory parts 2309 in the plurality of memory units 2300 and feeds them back to each of the corresponding memory parts 2309.
- the plurality of memory parts 2309 and the plurality of calculation units 2301 in the plurality of memory units 2300 operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- Each memory part includes a W memory (first memory) 2302 for storing connection line attribute values, an M memory (second memory) 2303 for storing the unique numbers of neurons, a YC memory group (first memory group) 2304 for storing neuron attribute values, and a YN memory group (second memory group) 2305 for storing the new neuron attribute values calculated in the calculation unit.
- The i-th memory unit of the h-th neural network computing device before combination becomes the h-th memory part of the i-th memory unit of the combined neural network computing system; thus, in the multiple neural network computing system, one memory unit 2300 is composed of H memory parts.
- One memory part basically has the same structure as the memory unit described above with reference to FIG. 1, with the following differences 1 and 2.
- 1. The H YC memories are combined by a decoder circuit so that they function as one memory of H times the capacity.
- 2. The H YN memories are tied together in common.
- A multiple neural network computing system consisting of H neural network computing devices includes H calculation units 2301, and the h-th calculation unit is coupled to the h-th memory part of each memory unit.
- The control unit stores values in each memory of each memory part in the memory units 2300 according to procedures a to j; among them, step j stores in common, at the j-th address of all memories of the YN memory group (second memory group) 2305 of the h-th memory part of each memory unit, the attribute value of the neuron whose unique number is j in the h-th neuron group.
- In FIG. 23, for arbitrary a and b, each memory denoted YCa-b and the memory denoted YNa-b with the same a and b form a pair under the dual memory replacement (SWAP) scheme described above: for any natural numbers i and j, the j-th memory of the YC memory group (first memory group) of the i-th memory part and the i-th memory of the YN memory group (second memory group) of the j-th memory part are implemented as a dual memory replacement pair in which all inputs and outputs are interchanged under the control of the control unit.
- The control unit supplies the InSel input 2308 of each memory part with a connection line bundle number that starts from 1 and increases by 1 every system clock cycle. After a certain number of system clock cycles from the start of the neural network update cycle, the memories 2302 to 2305 of the h-th memory part in each memory unit 2300 sequentially output the attribute values of the connection lines of the k-th bundle in the h-th neuron group and the attribute values of the neurons connected to those connection lines. The outputs of the h-th memory parts of all the memory units are input to the h-th calculation unit and together constitute the data of one connection line bundle of the h-th neuron group. The bundles are processed in order, from the first bundle of the first neuron in the h-th neuron group to its last bundle, then from the first bundle of the second neuron, and so on, until the last bundle of the last neuron has been output.
- The input of the h-th calculation unit thus sequentially receives the data of the connection line bundles of each neuron in the h-th neuron group, and a new neuron attribute value is calculated and appears at the output of the calculation unit every n system clock cycles.
- The new neuron attribute values of the h-th neuron group calculated in the h-th calculation unit 2301 are stored in common in all the YN memories 2305 of the h-th memory part of all the memory units. At this time, the address to be written and the write enable signal WE are provided by the control unit 201 through the OutSel input 2310 of each memory part.
- When the neural network update cycle ends, the control unit swaps every YC memory with its corresponding YN memory, so that in the new neural network update cycle the previously stored YN values function as one large YC memory 2304.
- the large YC memory 2304 of all memory parts stores attribute values of all neurons in the neural network.
- the maximum processing speed of the neural network computing system is p * H / tmem CPS.
- The multiple neural network computing system configured as described above can be scaled up without limit and without any restriction on the neural network topology, and its performance increases in proportion to the invested resources, without the communication overhead that occurs in conventional multi-system approaches.
- the neural network update period of the backpropagation learning algorithm includes first, second, third and fourth sub periods.
- a calculation structure for performing only the first and second sub periods and a calculation structure for performing the third and fourth sub periods will be described separately, and then a method of integrating the two calculation structures into one will be described.
- FIG. 24 is a diagram for describing a structure of a neural network computing device that simultaneously executes a first sub period and a second sub period of a backpropagation learning algorithm according to the present invention.
- The neural network computing device that executes the first and second sub periods of the backpropagation learning algorithm together includes: a control unit for controlling the neural network computing device; a plurality of memory units 2400 for outputting connection line attribute values and neuron error values, respectively; and one calculation unit 2401 for calculating new neuron error values (used as the neuron error values of the next neural network update cycle) from the connection line attribute values and neuron error values respectively input from the plurality of memory units 2400 and the learning data provided through the control unit (or from a supervisor external to the system), and feeding them back to each of the plurality of memory units 2400.
- the plurality of memory units 2400 and one calculation unit 2401 operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- the InSel input 2408 and the OutSel input 2409 respectively connected to the control unit are commonly connected to all the memory units 2400.
- the outputs of all the memory units 2400 are each connected to the inputs of the calculation unit 2401.
- the output of the calculation unit 2401 is commonly connected to the inputs of all the memory units 2400.
- Each memory unit 2400 includes a W memory (first memory) 2403 for storing connection line attribute values, an R2 memory (second memory) 2404 for storing the unique numbers of neurons, an EC memory (third memory) 2405 for storing neuron error values, and an EN memory (fourth memory) 2406 for storing the new neuron error values calculated in the calculation unit.
- the InSel input 2408 in each memory unit 2400 is commonly connected to the address input of the W memory 2403 and the address input of the R2 memory 2404.
- the data output of the R2 memory 2404 is connected to the address input of the EC memory 2405.
- the data output of the W memory 2403 and the data output of the EC memory 2405 are respectively output of the memory unit 2400 and are commonly connected to the input of the calculation unit 2401.
- the output of the calculation unit 2401 is connected to the data input of the EN memory 2406 of the memory unit 2400, and the address input of the EN memory 2406 is connected to the OutSel input 2409.
- The EC memory 2405 and the EN memory 2406 are implemented in a dual memory replacement (SWAP) scheme in which all inputs and all outputs are interchangeably connected under the control of the control unit.
- the neural network computing device shown in FIG. 24 is similar to the basic structure of the neural network computing device of FIG. 1 described above, but has the following differences.
- Instead of the M memory of FIG. 1, the R2 memory 2404 stores the unique numbers of the neurons connected to each connection line in the reverse network.
- the EC memory 2405 and the EN memory 2406 store error values of the neurons instead of the attribute values of the neurons.
- For the output neurons of the overall network (which are the input neurons of the reverse network), the calculation unit computes the error value by comparing the neuron's attribute value with the learning value provided through the teach data input 2407 of the calculation unit [Equation 2].
- Whereas the calculation unit of FIG. 1 calculates the attribute values of neurons, here, for every neuron other than an output neuron, the error value is calculated by accumulating the error values arriving through the reverse connection lines [Equation 3].
- The learning data of the output neurons are input, one per clock cycle, through the learning data input 2407 of the calculation unit by the control unit. The calculation unit calculates an error value by applying Equation 2 and outputs it; the result is fed back to each of the plurality of memory units 2400 and stored in the EN memory (fourth memory) 2406. This process is repeated until the error values of all the output neurons have been calculated.
- The control unit supplies the InSel input with a connection line bundle number starting from 1 and increasing by 1 every system clock cycle. After a certain number of system clock cycles from the start of the neural network update cycle, the W memory 2403 and the EC memory 2405 of each memory unit 2400 sequentially output the attribute values of the connection lines of each bundle and the error values of the neurons connected to those connection lines.
- the output of each of all the memory units 2400 is input to the input of one calculation unit 2401 and constitutes data of one wire bundle.
- The bundles are processed in order, from the first bundle of the first neuron to its last bundle, then from the first bundle of the second neuron to its last bundle, and so on, until the last bundle of the last neuron has been output.
- The calculation unit 2401 calculates the total sum of the error values over the bundles of each neuron by applying Equation 3, and the resulting values are fed back to each of the plurality of memory units 2400 and stored in the EN memory (fourth memory) 2406.
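- A hedged sketch of this error accumulation: weighting each incoming error by the connection line attribute value follows standard backpropagation and the W/EC dataflow described above, but the exact form of the patent's Equation 3 is not reproduced here.

```python
def hidden_neuron_error(bundles):
    """bundles: per-bundle lists of (w, e) pairs, where w comes from the
    W memory and e from the EC memory of each memory unit (assumed form)."""
    total = 0.0
    for bundle in bundles:              # one bundle per pipeline step
        total += sum(w * e for w, e in bundle)
    return total                        # fed back and stored in the EN memory

print(hidden_neuron_error([[(0.5, 0.2), (-0.3, 0.1)]]))  # -> approximately 0.07
```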
- FIG. 25 is a diagram for explaining the structure of a neural network computing device that executes a learning algorithm according to the present invention. This structure can be used with neural network models that use the delta learning rule or Hebb's rule.
- The neural network computing device that executes the learning algorithm includes: a control unit for controlling the neural network computing device; a plurality of memory units 2500, each of which outputs a connection line attribute value and a neuron attribute value to the calculation unit 2501 and calculates a new connection line attribute value; and one calculation unit 2501 for calculating a new neuron attribute value and a learning attribute value using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units 2500.
- the plurality of memory units 2500 and one calculation unit 2501 operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- Each of the plurality of memory units 2500 includes: a WC memory (first memory) 2502 for storing connection line attribute values; an M memory (second memory) 2503 for storing the unique numbers of neurons; a YC memory (third memory) 2504 for storing neuron attribute values; a YN memory (fourth memory) 2506 for storing the new neuron attribute values calculated in the calculation unit 2501; a first first-in first-out queue (first delay means) 2509 for delaying the connection line attribute value from the WC memory 2502; a second first-in first-out queue (second delay means) 2510 for delaying the neuron attribute value from the YC memory 2504; a connection line adjustment module 2511 for calculating a new connection line attribute value using the learning attribute value from the calculation unit 2501, the connection line attribute value from the first first-in first-out queue 2509, and the neuron attribute value from the second first-in first-out queue 2510; and a WN memory (fifth memory) 2505 for storing the new connection line attribute value calculated in the connection line adjustment module 2511.
- The first first-in first-out queue (FIFO queue) 2509 and the second first-in first-out queue (FIFO queue) 2510 serve to delay the attribute value (W) of a connection line and the attribute value (Y) of the neuron connected to that connection line, respectively.
- Through the X output of the calculation unit 2501, the learning attribute value necessary for the learning of the neuron is output.
- connection line adjustment module 2511 receives the three input data (W, Y, X), calculates the attribute value of the new connection line of the next neural network update period, and stores the attribute value in the WN memory 2505.
- The YC memory 2504 and the YN memory 2506, and the WC memory 2502 and the WN memory 2505, are each implemented in a dual memory swap (SWAP) scheme in which all inputs and all outputs are interchangeably connected under the control of the control unit.
- As an alternative, the YC memory 2504 and the YN memory 2506, and the WC memory 2502 and the WN memory 2505, may each be implemented with one memory using the single-memory redundant-storage method or the single-memory replacement method.
- connection line adjustment module 2511 performs the calculation as shown in Equation 7 below.
- In [Equation 7], W_ij(T+1) = f(W_ij(T), Y, L_j), where W_ij represents the attribute value of the i-th connection line of neuron j, Y represents the attribute value of the neuron connected to that connection line, and L_j represents the attribute value required for the learning of neuron j.
- [Equation 7] is a more generalized function encompassing [Equation 5]; in contrast to [Equation 5], W_ij corresponds to the weight value w_ij of the connection line, Y to the state value of the connected neuron, and L_j to the learning value of neuron j. The concrete form used here is [Equation 8] below.
- [Equation 8] W_ij(T+1) = W_ij(T) + L_j × Y
- The connection line adjustment module 2511 that calculates Equation 8 may be implemented with one multiplier 2513, one first-in first-out queue 2512, and one adder 2514. That is, the connection line adjustment module 2511 consists of: a third first-in first-out queue (third delay means) 2512 for delaying the connection line attribute value from the first first-in first-out queue 2509; a multiplier 2513 for multiplying the learning attribute value from the calculation unit 2501 by the neuron attribute value from the second first-in first-out queue 2510; and an adder 2514 for adding the connection line attribute value from the third first-in first-out queue 2512 to the output value of the multiplier 2513 and outputting the new connection line attribute value.
- The FIFO queue 2512 serves to delay the W_ij(T) value while the multiplication is performed by the multiplier 2513.
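- A minimal sketch of this datapath under the Equation 8 form given above (a FIFO depth of 1 stands in for the multiplier latency, which in hardware would fix the queue depth):

```python
from collections import deque

w_delay = deque()   # third first-in first-out queue 2512

def adjust_connection(w_ij: float, y: float, l_j: float) -> float:
    """W_ij(T+1) = W_ij(T) + L_j * Y, per the module of FIG. 25 (sketch)."""
    w_delay.append(w_ij)                 # delay W_ij(T) during the multiply
    product = l_j * y                    # multiplier 2513
    return w_delay.popleft() + product   # adder 2514 -> stored in WN memory

print(adjust_connection(0.40, 1.0, 0.05))  # -> approximately 0.45
```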
- FIG. 26 is a diagram illustrating a data flow in the neural network computing device of FIG. 25 according to the present invention.
- In FIG. 26, connection line bundle k is the first connection line bundle of neuron j.
- As another structure for executing the learning algorithm, a neural network computing device as shown in FIG. 33 may be used.
- This neural network computing device includes: a control unit for controlling the neural network computing device; a plurality of memory units 3300, each of which outputs a connection line attribute value and a neuron attribute value to the calculation unit 3301 and calculates a new connection line attribute value using the connection line attribute value, the neuron attribute value, and the learning attribute value; and one calculation unit 3301 for calculating a new neuron attribute value and a learning attribute value using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units 3300.
- the plurality of memory units 3300 and one calculation unit 3301 are operated in a pipelined manner in synchronization with one system clock under the control of the control unit.
- Each of the plurality of memory units 3300 includes: a WC memory (first memory) 3302 for storing connection line attribute values; an M memory (second memory) 3303 for storing the unique numbers of neurons; a YC memory (third memory) 3304 for storing neuron attribute values; a YN memory (fourth memory) 3306 for storing the new neuron attribute values calculated in the calculation unit 3301; a connection line adjustment module 3311 for calculating a new connection line attribute value using the connection line attribute value, the attribute value of the input neuron from the YC memory 3304, and the learning attribute value of the neuron; and a WN memory (fifth memory) 3305 for storing the new connection line attribute value calculated in the connection line adjustment module 3311.
- the memories in the memory unit operate in a pipelined manner in synchronization with one system clock.
- the calculation unit 3301 calculates the new attribute value of the neuron and outputs it to the Y output, and simultaneously calculates and outputs the learning attribute value required for the neuron's learning to the X output.
- the X output of the calculation unit 3301 is connected to the LN memory 3322, and the LN memory 3322 serves to store the newly calculated learning attribute value Lj (T + 1).
- The LC memory 3331 stores the learning attribute value Lj(T) of each neuron calculated in the previous neural network update cycle, and the data output of this memory is connected to the X input of the connection line adjustment module 3311 of all the memory units 3300.
- the attribute value output of the specific connection line output from the memory unit 3300 and the attribute value output of the neuron connected to the connection line are respectively connected to the W input and the Y input of the connection line adjustment module 3311 in the memory unit 3300.
- the connection line adjustment module 3311 receives these three input data (W, Y, L), calculates a new connection line attribute value of the next neural network update period, and stores it in the WN memory 3305.
- The YC memory 3304 and the YN memory 3306, the WC memory 3302 and the WN memory 3305, and the LC memory 3331 and the LN memory 3322 are each implemented in a dual memory swap (SWAP) scheme in which all inputs and all outputs are interchangeably connected under the control of the control unit. As an alternative, each of these pairs may be implemented with one memory using the single-memory redundant-storage method or the single-memory replacement method.
- Since the connection line adjustment module 3311 is similar to the module described above with reference to FIG. 25, its description is not repeated here.
- FIG. 27 is a diagram illustrating a neural network computing device for alternately performing a reverse propagation period and a forward propagation period for all or part of a network of one neural network according to the present invention.
- the structure of the present invention can execute a learning mode of a neural network model that alternates a backward propagation period and a forward propagation period with respect to a partial network of a neural network, such as a deep belief network.
- the first and second sub periods correspond to the reverse propagation period
- the third and fourth sub periods correspond to the forward propagation period.
- The neural network computing device that alternately performs the backward propagation period and the forward propagation period for all or part of the network of one neural network includes: a control unit for controlling the neural network computing device; a plurality of memory units 2700, each of which stores and outputs connection line attribute values, forward neuron attribute values, and backward neuron attribute values, and calculates new connection line attribute values; and one calculation unit 2701 for calculating new forward neuron attribute values and backward neuron attribute values based on the data input from each of the plurality of memory units 2700 and feeding them back to each of the plurality of memory units 2700.
- the neuron attribute value corresponds to the forward neuron attribute value
- the neuron error value corresponds to the reverse neuron attribute value.
- a circuit for calculating a new connection attribute value in FIG. 27 is omitted since it can be easily inferred by those skilled in the art based on the description of FIGS. 25 and 33.
- the plurality of memory units 2700 and one calculation unit 2701 operate in a pipelined manner in synchronization with one system clock under control of the control unit.
- Each of the plurality of memory units 2700 includes: an R1 memory (first memory) 2705 for storing, for the reverse network, the address values of the WC memory (second memory) 2704; the WC memory 2704 for storing connection line attribute values; an R2 memory 2706 and an M memory 2702 for storing the unique numbers of neurons; an EC memory 2707 and an EN memory 2710 for storing the backward neuron attribute values and the new backward neuron attribute values calculated in the calculation unit 2701, respectively; a YC memory (seventh memory) 2703 and a YN memory (eighth memory) 2709 for storing the forward neuron attribute values and the new forward neuron attribute values calculated in the calculation unit 2701, respectively; a first digital switch 2712 for selecting the input of the WC memory 2704; a second digital switch 2713 for switching the output of the EC memory 2707 or the YC memory 2703 to the calculation unit 2701; a third digital switch for switching the output of the calculation unit 2701 to the EN memory 2710 or the YN memory 2709; and a fourth digital switch for switching the OutSel input to the EN memory 2710 or the YN memory 2709.
- In the backward propagation period the N-bit switches 2712 to 2715 are each placed at the lower position, and in the forward propagation period the control unit controls the N-bit switches 2712 to 2715 so that they are each placed at the upper position.
- The YC memory 2703 and the YN memory 2709, the EC memory 2707 and the EN memory 2710, and the WC memory 2704 and the WN memory 2708 are each implemented in a dual memory swap (SWAP) scheme in which all inputs and all outputs are interchangeably connected under the control of the control unit. As an alternative, each of these pairs may be implemented with one memory using the single-memory redundant-storage method or the single-memory replacement method.
- When the neural network update cycle starts, the control unit places the N-bit switches 2712 to 2715 at the lower position and performs the backward propagation period; the N-bit switches 2712 to 2715 are then switched to the upper position and the forward propagation period is performed.
- The effective system configuration when the N-bit switches 2712 to 2715 are at the lower position is as shown in FIG. 24, except that the InSel input and the WC memory are not directly connected but are connected through the R1 memory 2705.
- The effective system configuration when the N-bit switches 2712 to 2715 are at the upper position corresponds to the structures of FIGS. 25 and 33 described above.
- The procedure by which the system operates in the backward propagation period is basically the same as described above with reference to FIG. 24, with the difference that the contents of the WC memory 2704 are selected indirectly through the R1 memory 2705. This provides the additional feature that, even if the contents of the WC memory 2704 do not match the order of the connection line bundles of the reverse network, they can still be referenced through the R1 memory 2705 as long as they are within the same memory unit.
- the procedure of operating the system in the forward propagation period is as described above with reference to FIGS. 25 and 33.
- Values may be stored in each memory according to procedures a to q. In step a, when the two ends of each connection line are distinguished as the end where the arrow starts and the end where the arrow ends, numbers satisfying conditions 1 to 4 (unique outbound numbers at every neuron, unique inbound numbers at every neuron, identical numbers at both ends of each connection line, and numbers as low as possible) are assigned to both ends of every connection line.
- Each added connection line has a connection line attribute value that has no effect regardless of which neuron it is connected to, or is connected to a null neuron (p being the number of memory units 2700 in the neural network computing device).
- In step o, the position value at which the i-th connection line of the k-th backward connection line bundle is located in the WC memory 2704 of the i-th memory unit is stored in the R1 memory 2705.
- By step a, when a specific connection line of the forward neural network is stored in the i-th memory unit, the same connection line is also stored in the i-th memory unit for the reverse network. Therefore, as described above, the WC memory 2704 is used in the backward propagation period in such a way that, even if its storage order does not match the order of the connection line bundles of the reverse network, it can be referenced through the R1 memory 2705.
- The numbering problem of step a is the same as the edge coloring problem in graph theory, in which the edges attached to each node of a graph are colored; it can be solved by applying an edge coloring algorithm, regarding the connection line numbers to be assigned at each neuron as the colors.
- By Vizing's theorem and König's bipartite theorem in graph theory, if n is the number of edges at the node having the most edges in the graph, then n colors are sufficient for a bipartite graph. This means that if the numbers are assigned by applying an edge coloring algorithm in step a, the connection line numbers do not exceed the number of connection lines of the neuron having the largest number of connection lines in the entire network.
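- A hedged sketch of this numbering as edge coloring: the greedy assignment below always satisfies conditions 1 to 3 but may use more numbers than the maximum degree; the optimal bound stated above is guaranteed by König's theorem for bipartite graphs and requires a full bipartite edge coloring algorithm, which is omitted here.

```python
def number_connections(edges):
    """edges: (source neuron, target neuron) pairs of the forward network.
    Assigns each connection the lowest number unused among the source's
    outbound numbers and the target's inbound numbers (conditions 1 to 3)."""
    out_used, in_used, numbering = {}, {}, {}
    for src, dst in edges:
        n = 1
        while n in out_used.get(src, set()) or n in in_used.get(dst, set()):
            n += 1
        out_used.setdefault(src, set()).add(n)
        in_used.setdefault(dst, set()).add(n)
        numbering[(src, dst)] = n
    return numbering

# Example: a 2x2 bipartite network; the maximum degree is 2, and two numbers
# suffice: {(0, 2): 1, (0, 3): 2, (1, 2): 2, (1, 3): 1}
print(number_connections([(0, 2), (0, 3), (1, 2), (1, 3)]))
```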
- FIG. 28 is a diagram for describing another, simplified calculation structure of the neural network computing device of FIG. 27 according to the present invention.
- In this structure, the pairs of memories serving the forward and reverse networks are combined, as shown in FIG. 28: the M memory 2702 and the R2 memory 2706, the YC memory 2703 and the EC memory 2707, and the YN memory 2709 and the EN memory 2710 are each merged into one memory.
- Half of the memory area of the M memory 2802 of FIG. 28 is used for the purpose of the M memory 2702 of the neural network computing device of FIG. 27, and the other half is used for the purpose of the R2 memory 2706.
- Half of the memory area of the YEC memory 2803 of FIG. 28 is used for the purpose of the YC memory 2703 of the device of FIG. 27, and the other half for the purpose of the EC memory 2707.
- Half of the memory area of the YEN memory 2823 of FIG. 28 is used for the purpose of the YN memory 2709 of the device of FIG. 27, and the other half for the purpose of the EN memory 2710.
- Each of the plurality of memory units 2800 of FIG. 28 includes: an R1 memory (first memory) 2805 for storing the address values of the WC memory (second memory) 2804; the WC memory 2804 for storing connection line attribute values; an M memory (third memory) 2802; a YEC memory (fourth memory) 2803; a YEN memory (fifth memory) 2823 for storing the new backward neuron attribute values or forward neuron attribute values calculated in the calculation unit 2801; and a digital switch 2812 for selecting the input of the WC memory 2804.
- FIG. 29 is a detailed block diagram of the calculation units 2701 and 2801 of the neural network computing devices of FIGS. 27 and 28 according to the present invention.
- Each of the calculation units 2701 and 2801 includes: a multiplication operation unit 2900, consisting of multipliers corresponding in number to the memory units 2700 and 2800, for multiplying the connection line attribute value from each memory unit 2700 and 2800 by the forward neuron attribute value, or the connection line attribute value by the backward neuron attribute value; a tree-structured addition operation unit 2901 for performing multi-stage addition on the plurality of output values from the multiplication operation unit 2900; one accumulator 2902 for accumulating the output values from the addition operation unit 2901; and a soma processor 2903 that receives the accumulated output value from the accumulator 2902 and the learning data provided through the control unit from a supervisor external to the system, and calculates the new forward or backward neuron attribute values to be used in the next neural network update cycle.
- The calculation units 2701 and 2801 may further include registers between the operation stages; the registers are synchronized to the system clock, so that each calculation stage operates in a pipelined manner.
- the structure of the calculation unit of FIG. 29 is the same as that of the calculation unit of FIG. 8 described above, except that the soma processor 2903 is used instead of the activation function operator.
- The soma processor 2903 performs different calculations, such as a to c below, according to the sub period within the neural network update cycle.
- a. In the backpropagation learning algorithm, in the sub period for calculating the errors of the output neurons, the learning value of each neuron is received from the teach data input 2904, and a new error value is calculated by applying Equation 3, stored internally, and output to the Y output. That is, in the period for calculating the errors of the output neurons, the error value is calculated as the difference between the input training data (Teach) and the internally stored attribute value of the neuron, stored internally, and output to the Y output. This process can be omitted if the algorithm is not the backpropagation learning algorithm.
- b. When the backpropagation learning algorithm is executed, in the error calculation sub period the total of the error inputs of each neuron that is not an output neuron is received from the accumulator 2902 in neuron order, stored internally, and output to the Y output. If the algorithm is not the backpropagation learning algorithm, the calculation follows the backward calculation formula of the neural network model and the result is output to the Y output.
- c. In the sub period for calculating the neuron attribute values, the net input value NETk of each neuron is received from the accumulator 2902, and the new attribute value (state value) of the neuron is calculated by applying the activation function, stored internally, and output to the Y output. At the same time, the neuron attribute value required for learning is calculated and output to the X output. If the algorithm is not the backpropagation learning algorithm, the calculation follows the forward calculation formula of the neural network model and the result is output to the Y output.
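- A hedged sketch of this per-sub-period behaviour; the concrete activation function and error forms below are placeholders standing in for the patent's equations:

```python
class SomaProcessor:
    """Per-neuron state machine mirroring sub periods a to c (sketch)."""
    def __init__(self):
        self.y = 0.0                    # internally stored attribute value

    def output_neuron_error(self, teach: float) -> float:
        return teach - self.y           # (a) difference with training data

    def hidden_neuron_error(self, err_sum: float) -> float:
        return err_sum                  # (b) total error from accumulator 2902

    def update_attribute(self, net: float) -> float:
        self.y = max(0.0, net)          # (c) placeholder activation function
        return self.y                   # Y output; the X output would carry
                                        # the learning attribute value
```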
- FIG. 30 is a detailed configuration diagram of the soma processor 2903 in the calculation unit of FIG. 29 according to the present invention.
- One unit soma processor has an input and output as shown in (a) of FIG. 30, and may store various attribute information of neurons therein.
- A soma processor whose throughput is increased by the parallel calculation line technique may be implemented as shown in (b) of FIG. 30.
- The soma processor receives the net input of a neuron, or the total error sum, from the accumulator 2902 through the first input 3000, and receives the training data of the output neurons through the second input 3001. The newly calculated attribute value or error value of the neuron is output through the first output 3003, and the neuron attribute value used for adjusting the connection lines is output through the second output 3002.
- the soma processor includes distributors 3004 and 3005 corresponding to each input, a plurality of soma processors 3006, and multiplexers 3007 and 3008 corresponding to each output.
- The input data arriving every clock cycle are distributed in sequence to the plurality of soma processors 3006 by the distributors 3004 and 3005, and the data whose calculation is complete are collected in sequence by the multiplexers 3007 and 3008 and output every clock cycle.
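- A minimal sketch of this round-robin distribute-and-collect scheme (the per-item function is a placeholder; in hardware each unit would have several clock cycles per item while the line as a whole still accepts and emits one item per clock):

```python
def parallel_calculation_line(inputs, units=4, f=lambda x: max(0.0, x)):
    """Distribute items round-robin over `units` lanes, then collect the
    results in the same round-robin order, preserving the input order."""
    lanes = [[] for _ in range(units)]
    for i, x in enumerate(inputs):
        lanes[i % units].append(f(x))   # distributor: lane i % units
    # multiplexer: item i is entry i // units of lane i % units
    return [lanes[i % units][i // units] for i in range(len(inputs))]

print(parallel_calculation_line([0.3, -1.0, 2.0, 0.0, 5.0]))
# -> [0.3, 0.0, 2.0, 0.0, 5.0]
```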
- Like the recall-mode neural network computing device described above, this device can provide real-time input and output in the learning mode: the input values of the input neurons are provided in real time through the input memory, and the output data of the output neurons are obtained through the output memory. In addition, a memory may be provided at the teach (Teach) input unit 2723 so that the learning data can also be provided in real time.
- The structure of a neural network computing system that bundles a plurality of the learning-capable neural network computing devices of FIG. 27 to obtain a multiple of the performance is as shown in FIG. 31.
- FIG. 31 is a configuration diagram of another embodiment of the neural network computing system according to the present invention.
- The neural network computing system includes: a control unit for controlling the neural network computing system; a plurality of memory units 3100, each including a plurality of memory parts that output connection line attribute values and backward neuron attribute values, or connection line attribute values and forward neuron attribute values, respectively; and a plurality of calculation units 3101, each of which either calculates new backward neuron attribute values using the connection line attribute values and backward neuron attribute values respectively input from the corresponding plurality of memory parts and feeds them back to each of the corresponding memory parts, or calculates new forward neuron attribute values and learning attribute values using the connection line attribute values and forward neuron attribute values respectively input from the corresponding plurality of memory parts and feeds them back to each of the corresponding memory parts.
- a circuit for calculating a new connection attribute value in FIG. 31 will be omitted since it can be easily inferred by those skilled in the art based on the description of FIGS. 25 and 33.
- the plurality of memory parts and the plurality of calculation units 3101 in the plurality of memory units 3100 operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- Each memory part includes: an R1 memory (first memory) 3103 for storing the address values of the WC memory (second memory) 3102; the WC memory 3102 for storing connection line attribute values; an R2 memory (third memory) 3115 for storing the unique numbers of neurons; an EC memory group (first memory group) 3106 for storing backward neuron attribute values; an EN memory group (second memory group) 3108 for storing the new backward neuron attribute values calculated in the calculation unit 3101; an M memory (fourth memory) 3104 for storing the unique numbers of neurons; a YC memory group (third memory group) 3105 for storing forward neuron attribute values; a YN memory group (fourth memory group) 3107 for storing the new forward neuron attribute values calculated in the calculation unit 3101; a first digital switch for selecting the input of the WC memory 3102; a second digital switch for switching the output of the EC memory group 3106 or the YC memory group 3105 to the calculation unit 3101; a third digital switch for switching the output of the calculation unit 3101 to the EN memory group 3108 or the YN memory group 3107; and a fourth digital switch for switching the OutSel input to the EN memory group 3108 or the YN memory group 3107.
- the memory unit 3100 includes n memory units 2700 of individual neural network computing devices combined into one.
- In the YC memory location 3105, n YC memories are combined by a decoder circuit into a memory of n times the capacity, and in the YN memory location 3107, n YN memories are tied together in common.
- Likewise, in each EC memory location 3106, n EC memories are combined by a decoder circuit into a large memory of n times the capacity, and in the EN memory location 3108, n EN memories are tied together in common.
- the h th neural network computing device processes the h group of neurons when the neurons of the entire system are divided into n groups.
- In FIG. 31, for arbitrary integers a and b, each memory designated YCa-b in each memory unit is implemented with the memory designated YNa-b having the same a and b in the dual memory replacement (SWAP) scheme described above; similarly, each memory designated ECa-b is implemented with the memory designated ENa-b having the same a and b in the dual memory replacement (SWAP) schemes 3113 and 3114 described above.
- The operating procedure of this system differs from that of the system of FIG. 23 in that it supports the learning procedure, but it is otherwise similar and can be derived by those skilled in the art, so a detailed description is omitted.
- In this case, the maximum processing speed of the neural network computing system is p * H / tmem CUPS (Connection Updates Per Second).
- The neural network computing method according to the present invention as described above may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium.
- the computer readable medium may include program instructions, data files, data structures, etc. alone or in combination.
- Program instructions recorded on the media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts.
- Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
- the medium may be a transmission medium such as an optical or metal wire, a waveguide, or the like including a carrier wave for transmitting a signal specifying a program command, a data structure, or the like.
- Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
- the hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.
- the present invention can be used for digital neural network computing systems and the like.
Claims (68)
- 1. A neural network computing device, comprising: a control unit for controlling the neural network computing device; a plurality of memory units each for outputting a connection line attribute value and a neuron attribute value; and one calculation unit for calculating a new neuron attribute value using the connection line attribute values and the neuron attribute values respectively input from the plurality of memory units and feeding the new neuron attribute value back to each of the plurality of memory units.
- 2. The neural network computing device of claim 1, wherein the control unit comprises: a clock cycle counter for providing the clock cycles within a neural network update cycle; and a control memory for storing the timing and control information of control signals and outputting them to the neural network computing device according to the clock cycle from the clock cycle counter.
- 3. The neural network computing device of claim 1, wherein the control unit is controlled by a host computer.
- 4. The neural network computing device of claim 1, further comprising switching means, provided between the output of the calculation unit and the plurality of memory units, for selecting either the input data from the control unit or the new neuron attribute value from the calculation unit under the control of the control unit and switching the selected value to the plurality of memory units.
- 5. The neural network computing device of any one of claims 1 to 4, wherein each of the plurality of memory units comprises: a first memory for storing connection line attribute values; a second memory for storing the unique numbers of neurons; a third memory, the address input of which is connected to the data output of the second memory, for storing neuron attribute values; and a fourth memory for storing the new neuron attribute value calculated in the calculation unit.
- 6. The neural network computing device of claim 5, wherein each of the plurality of memory units further comprises: a first register, operating in synchronization with a system clock and provided at the address input of the first memory, for temporarily storing the connection line bundle number input to the first memory; and a second register, operating in synchronization with the system clock and provided at the address input of the third memory, for temporarily storing the unique number of the neuron output from the second memory, and wherein the first memory, the second memory, and the third memory operate in a pipelined manner under the control of the control unit.
- 7. The neural network computing device of claim 5, further comprising: a plurality of third registers, operating in synchronization with the system clock and provided between each output of the plurality of memory units and the input of the one calculation unit, for temporarily storing the connection line attribute value and the neuron attribute value; and a fourth register, operating in synchronization with the system clock and provided at the output of the one calculation unit, for temporarily storing the new neuron attribute value output from the one calculation unit, wherein the plurality of memory units and the one calculation unit operate in a pipelined manner under the control of the control unit.
- 8. The neural network computing device of claim 5, wherein the control unit stores data in each memory in each of the memory units according to processes a to h below: a. finding the number (Pmax) of input connection lines of the neuron having the largest number of input connection lines in the neural network; b. where p is the number of memory units, adding to each neuron virtual connection lines having connection line attribute values that do not affect the adjacent neuron regardless of which neuron is connected, so that every neuron in the neural network has ⌈Pmax/p⌉ × p connection lines; c. arranging all neurons in the neural network in an arbitrary order and assigning serial numbers; e. assigning a serial number k, in order, from the first connection line bundle of the first neuron to the last connection line bundle of the last neuron; f. storing, at the k-th address of the first memory of the i-th memory unit, the attribute value of the i-th connection line of the k-th connection line bundle; g. storing, at the j-th address of the third memories of the plurality of memory units, the attribute value of the j-th neuron; h. storing, at the k-th address of the second memory of the i-th memory unit, the number of the neuron connected to the i-th connection line of the k-th connection line bundle.
- 9. The neural network computing device of claim 8, wherein in process b the virtual connection lines are added in either of the following ways: giving each virtual connection line a connection line attribute value that does not affect the attribute value of a neuron regardless of which neuron it is connected to; or adding to the neural network one virtual neuron having an attribute value that has no effect regardless of which neuron it is connected to, and connecting all the virtual connection lines to the virtual neuron.
- 10. The neural network computing device of claim 5, wherein the control unit stores data in each memory in each of the memory units according to processes a to h below: a. sorting all neurons in the neural network in ascending order of the number of input connection lines of each neuron and numbering them in order; b. adding to the neural network one null neuron having an attribute value that has no effect even when connected to other neurons by connection lines; c. where pj is the number of input connection lines of neuron j and p is the number of memory units, adding ⌈pj/p⌉ × p − pj connection lines, each having a connection line attribute value with no effect regardless of which neuron is connected and each connected to the null neuron, so that each neuron j in the neural network has ⌈pj/p⌉ × p connection lines; e. assigning a number k, starting from 1 and increasing by 1, in order from the first connection line bundle of the first neuron to the last connection line bundle of the last neuron; f. storing, at the k-th address of the first memory of the i-th memory unit, the attribute value of the i-th connection line of the k-th connection line bundle; g. storing, at the k-th address of the second memory of the i-th memory unit, the number of the neuron connected to the i-th connection line of the k-th connection line bundle; h. storing, at the j-th address of the third memory of the i-th memory unit, the attribute value of the j-th neuron.
- 11. The neural network computing device of claim 5, wherein a dual memory replacement (SWAP) circuit, which interchanges all the inputs and outputs of two identical memories using a plurality of digital switches controlled by control signals from the control unit, is applied to the third memory and the fourth memory.
- 12. The neural network computing device of any one of claims 1 to 4, wherein each of the plurality of memory units comprises: a first memory for storing connection line attribute values; a second memory for storing the unique numbers of neurons; and a third memory for storing neuron attribute values.
- 13. The neural network computing device of claim 12, wherein the existing neuron attribute values and the new neuron attribute values calculated in the calculation unit are stored in the third memory without distinction, and a single-memory redundant-storage circuit, which processes the reading of the existing neuron attribute value and the writing of the new neuron attribute value calculated in the calculation unit by time division within one pipeline cycle, is applied to the third memory.
- 14. The neural network computing device of claim 12, wherein the existing neuron attribute values are stored in a first half area of the third memory and the new neuron attribute values calculated in the calculation unit are stored in a second half area, and a single-memory replacement circuit, which processes the reading of the existing neuron attribute value and the writing of the new neuron attribute value calculated in the calculation unit by time division within one pipeline cycle, is applied to the third memory.
- 15. The neural network computing device of any one of claims 1 to 4, further comprising registers synchronized by the system clock between the calculation stages inside the calculation unit, so that each calculation stage is processed in a pipelined manner.
- 16. The neural network computing device of any one of claims 1 to 4, wherein the internal structure of all or some of the calculation devices provided in the calculation unit is implemented as a pipeline circuit operating in synchronization with the system clock.
- 17. The neural network computing device of claim 16, wherein the internal structure of each calculation device is implemented in a pipelined manner by applying a parallel calculation line technique that uses distributors corresponding in number to the inputs of a specific calculation device, a plurality of the specific calculation devices, and multiplexers corresponding in number to the outputs of the specific calculation device, distributing sequentially arriving input data to the plurality of specific calculation devices through the distributors and collecting the calculation results of the plurality of specific calculation devices with the multiplexers.
- 18. The neural network computing device of any one of claims 1 to 4, wherein the calculation unit comprises: a multiplication operation unit for performing multiplication on the connection line attribute values and the neuron attribute values from the plurality of memory units; a tree-structured addition operation unit for performing addition in one or more stages on the plurality of output values from the multiplication operation unit; an accumulator for accumulating the output values from the addition operation unit; and an activation function operator for applying an activation function to the accumulated output value from the accumulator to calculate the new neuron attribute value to be used in the next neural network update cycle.
- 19. The neural network computing device of claim 18, wherein the accumulator is implemented by applying a parallel calculation line technique that uses one distributor, a plurality of first-in first-out queues, a plurality of accumulators, and one multiplexer, distributing sequentially arriving input data to the plurality of first-in first-out queues through the distributor and collecting the results accumulated through the first-in first-out queues and the accumulators with the multiplexer.
- 20. The neural network computing device of claim 18, wherein each multiplier provided in the multiplication operation unit is implemented with one subtractor and one square power calculator, the two input values being connected to the subtractor and the output of the subtractor being connected to the square power calculator.
- 21. The neural network computing device of claim 18, wherein each multiplier provided in the multiplication operation unit is implemented using one reference table and one multiplier.
- 22. The neural network computing device of claim 18, further comprising a first-in first-out queue between the accumulator and the activation function operator.
- 23. The neural network computing device of claim 18, wherein the activation function operator receives the accumulated output value from the accumulator (the net input data of a neuron) through a first input and outputs, through a first output, the new neuron attribute value to be used in the next neural network update cycle to each of the plurality of memory units, and receives the number of the corresponding neuron through a second input and, when the new neuron attribute value is output at the first output, connects the number of that neuron through a second output to the input of each of the plurality of memory units.
- 24. A neural network computing device, comprising: a control unit for controlling the neural network computing device; a plurality of memory units each for outputting a connection line attribute value and a neuron attribute value; one calculation unit for calculating a new neuron attribute value using the connection line attribute values and the neuron attribute values respectively input from the plurality of memory units; input means for providing input data from the control unit to input neurons; switching means for switching either the input data from the input means or the new neuron attribute value from the calculation unit to the plurality of memory units under the control of the control unit; and first and second output means, composed of a dual memory replacement (SWAP) circuit whose inputs and outputs are all interchanged under the control of the control unit, for outputting the new neuron attribute values from the calculation unit to the control unit.
- 25. The neural network computing device of claim 24, wherein the process of storing the input data from the control unit in the plurality of memory units is executed at the beginning of the neural network update cycle.
- 26. The neural network computing device of claim 24, wherein the process of storing the input data from the control unit in the plurality of memory units is executed by interleaving it into the clock cycles in which no output of the calculation unit occurs.
- 27. A neural network computing system, comprising: a control unit for controlling the neural network computing system; a plurality of memory units each including "a plurality of memory parts each outputting connection line attribute values and neuron attribute values"; and a plurality of calculation units for calculating new neuron attribute values using the connection line attribute values and the neuron attribute values respectively input from the corresponding plurality of memory parts in the plurality of memory units and feeding them back to each of the corresponding plurality of memory parts.
- 28. The neural network computing system of claim 27, wherein the plurality of memory parts in the plurality of memory units and the plurality of calculation units operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- 29. The neural network computing system of claim 27 or 28, wherein each of the memory parts comprises: a first memory for storing connection line attribute values; a second memory for storing the unique numbers of neurons; a first memory group, in which a plurality of memories function as an integrated memory of multiple capacity by means of a decoder circuit, for storing neuron attribute values; and a second memory group, in which a plurality of memories are tied in common, for storing the new neuron attribute values calculated in the corresponding calculation unit.
- 30. The neural network computing system of claim 29, wherein the j-th memory (j being an arbitrary natural number) of the first memory group of the i-th memory part (i being an arbitrary natural number) and the i-th memory of the second memory group of the j-th memory part are implemented in a dual memory replacement (SWAP) scheme in which all inputs and outputs are interchanged under the control of the control unit.
- 31. The neural network computing system of claim 29, wherein the control unit stores data in each memory in each of the memory parts according to processes a to j below: a. dividing all neurons in the neural network into H uniform neuron groups; b. finding the number (Pmax) of input connection lines of the neuron having the largest number of input connection lines within each neuron group; c. where p is the number of memory units, adding to each neuron virtual connection lines having connection line attribute values that do not affect the adjacent neuron regardless of which neuron is connected, so that every neuron in the neural network has ⌈Pmax/p⌉ × p connection lines; d. for each neuron group, numbering all neurons in the group in an arbitrary order; e. for each neuron group, dividing the connection lines of every neuron in the group into groups of p, classifying them into ⌈Pmax/p⌉ bundles, and assigning each connection line within a bundle a number i starting from 1 and increasing by 1 in an arbitrary order; f. for each neuron group, assigning a number k starting from 1 and increasing by 1, in order from the first connection line bundle of the first neuron in the group to the last connection line bundle of the last neuron; g. storing, at the k-th address of the first memory of the h-th memory part of the i-th memory unit, the attribute value of the i-th connection line of the k-th connection line bundle of the h-th neuron group; h. storing, at the k-th address of the h-th second memory of the i-th memory unit, the unique number of the neuron connected to the i-th connection line of the k-th connection line bundle of the h-th neuron group; i. storing, at the j-th address of the g-th memory constituting the first memory group of every memory part of every memory unit, the attribute value of the neuron whose unique number is j in the g-th neuron group; j. storing in common, at the j-th address of all memories of the second memory group of the h-th memory part of every memory unit, the attribute value of the neuron whose unique number is j in the h-th neuron group.
- 32. The neural network computing system of claim 27 or 28, wherein each of the calculation units comprises: a multiplication operation unit for performing multiplication on the connection line attribute values and the neuron attribute values from the corresponding plurality of memory parts; a tree-structured addition operation unit for performing addition in one or more stages on the plurality of output values from the multiplication operation unit; an accumulator for accumulating the output values from the addition operation unit; and an activation function operator for applying an activation function to the accumulated output value from the accumulator to calculate a new neuron attribute value.
- 33. A neural network computing device, comprising: a control unit for controlling the neural network computing device; a plurality of memory units each for outputting a connection line attribute value and a neuron error value; and one calculation unit for calculating a new neuron error value using the connection line attribute values and the neuron error values respectively input from the plurality of memory units and feeding it back to each of the plurality of memory units.
- 34. The neural network computing device of claim 33, wherein the calculation unit calculates the new neuron error value using the connection line attribute values and neuron error values respectively input from the plurality of memory units and the learning data provided through the control unit, and feeds it back to each of the plurality of memory units.
- 35. The neural network computing device of claim 33 or 34, wherein each of the plurality of memory units comprises: a first memory for storing connection line attribute values; a second memory for storing the unique numbers of neurons; a third memory for storing neuron error values; and a fourth memory for storing the new neuron error values calculated in the calculation unit.
- 36. A neural network computing device, comprising: a control unit for controlling the neural network computing device; a plurality of memory units, each of which outputs a connection line attribute value and a neuron attribute value and calculates a new connection line attribute value using the connection line attribute value, the neuron attribute value, and a learning attribute value; and one calculation unit for calculating a new neuron attribute value and a learning attribute value using the connection line attribute values and the neuron attribute values respectively input from the plurality of memory units.
- 37. The neural network computing device of claim 36, wherein each of the plurality of memory units comprises: a first memory for storing connection line attribute values; a second memory for storing the unique numbers of neurons; a third memory for storing neuron attribute values; a fourth memory for storing the new neuron attribute values calculated in the calculation unit; first delay means for delaying the connection line attribute value from the first memory; second delay means for delaying the neuron attribute value from the third memory; a connection line adjustment module for calculating a new connection line attribute value using the learning attribute value from the calculation unit, the connection line attribute value from the first delay means, and the neuron attribute value from the second delay means; and a fifth memory for storing the new connection line attribute value calculated in the connection line adjustment module.
- 제 37 항에 있어서,상기 제어 유닛의 제어에 따라 모든 입출력을 서로 바꾸어 연결하는 이중 메모리 교체(SWAP) 회로를 상기 제1메모리와 상기 제5메모리에 적용하고 또한 상기 제3메모리와 상기 제4메모리에 적용하는, 신경망 컴퓨팅 장치.
- 제 37 항에 있어서,상기 제1메모리와 상기 제5메모리, 상기 제3메모리와 상기 제4메모리를 각각 하나의 메모리로 구현하고, 읽기 과정과 쓰기 과정을 시간 분할로 처리하는, 신경망 컴퓨팅 장치.
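A sketch of this time-division alternative: one physical memory serves the old-value read in the first half of the clock period and the new-value write in the second half, preserving the read-before-write ordering that the paired-memory scheme guarantees within a cycle. The class and method names are illustrative:

```python
# One physical memory doing the work of a read/write memory pair by
# splitting each clock period into a read phase and a write phase.
class TimeSharedMemory:
    def __init__(self, size):
        self.cells = [0.0] * size

    def cycle(self, read_addr, write_addr, write_value):
        old = self.cells[read_addr]            # read phase (first half of clock)
        self.cells[write_addr] = write_value   # write phase (second half)
        return old                             # old value even if addresses match

mem = TimeSharedMemory(4)
print(mem.cycle(read_addr=2, write_addr=2, write_value=0.9))  # 0.0, then cell holds 0.9
```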
- The neural network computing apparatus of claim 37, wherein the connection line adjustment module comprises: third delay means for delaying the connection line attribute value from the first delay means; a multiplier for performing a multiplication operation on the learning attribute value from the calculation unit and the neuron attribute value from the second delay means; and an adder for performing an addition operation on the connection line attribute value from the third delay means and the output value of the multiplier to output a new connection line attribute value.
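The datapath above computes, per connection, new_weight = weight + learn · neuron_value; the delay means only align the three operands in the pipeline. A sketch follows, where reading the learning attribute value as eta times the destination neuron's error (a delta-rule interpretation) is an assumption for illustration, not fixed by the claim:

```python
# Sketch of the connection line adjustment module: one multiplier feeding
# one adder, applied to every weight as it streams past.
def adjust_connection(weight, learn_value, neuron_value):
    return weight + learn_value * neuron_value  # multiplier then adder

# Illustrative delta-rule reading: learn_value = eta * error of the
# destination neuron (an assumption, not from the patent text).
eta, error, x, w = 0.1, 0.8, 0.5, 0.3
print(adjust_connection(w, eta * error, x))  # 0.34
```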
- A neural network computing apparatus, comprising: a control unit for controlling the neural network computing apparatus; a first learning attribute value memory for storing learning attribute values of neurons; a plurality of memory units each for outputting a connection line attribute value and a neuron attribute value and for calculating a new connection line attribute value using a connection line attribute value, a neuron attribute value, and the learning attribute value from the first learning attribute value memory; one calculation unit for calculating a new neuron attribute value and a learning attribute value using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; and a second learning attribute value memory for storing the new learning attribute value calculated by the one calculation unit.
- The neural network computing apparatus of claim 41, wherein each of the plurality of memory units comprises: a first memory for storing connection line attribute values; a second memory for storing unique neuron numbers; a third memory for storing neuron attribute values; a fourth memory for storing the new neuron attribute values calculated by the calculation unit; a connection line adjustment module for calculating a new connection line attribute value using a connection line attribute value, a neuron attribute value, and the learning attribute value from the first learning attribute value memory; and a fifth memory for storing the new connection line attribute value calculated by the connection line adjustment module.
- The neural network computing apparatus of claim 42, wherein a dual-memory swap (SWAP) circuit, which cross-connects all inputs and outputs under the control of the control unit, is applied respectively to the first learning attribute value memory and the second learning attribute value memory, to the first memory and the fifth memory, and to the third memory and the fourth memory.
- The neural network computing apparatus of claim 42, wherein the first learning attribute value memory and the second learning attribute value memory, the first memory and the fifth memory, and the third memory and the fourth memory are each implemented as a single memory, and the read process and the write process are handled by time division.
- The neural network computing apparatus of claim 42, wherein the connection line adjustment module comprises: first delay means for delaying the connection line attribute value from the memory unit; a multiplier for performing a multiplication operation on the learning attribute value from the first learning attribute value memory and the neuron attribute value from the memory unit; and an adder for performing an addition operation on the connection line attribute value from the first delay means and the output value of the multiplier to output a new connection line attribute value.
- A neural network computing apparatus, comprising: a control unit for controlling the neural network computing apparatus; a plurality of memory units each for storing and outputting connection line attribute values, forward neuron attribute values, and backward neuron attribute values, and for calculating new connection line attribute values; and one calculation unit for calculating new forward neuron attribute values and backward neuron attribute values based on the data respectively input from the plurality of memory units, and for feeding them back to each of the plurality of memory units.
- The neural network computing apparatus of claim 46, wherein the plurality of memory units and the one calculation unit operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- The neural network computing apparatus of claim 46 or 47, wherein each of the plurality of memory units comprises: a first memory for storing address values of a second memory; the second memory, for storing connection line attribute values; a third memory for storing unique neuron numbers; a fourth memory for storing backward neuron attribute values; a fifth memory for storing the new backward neuron attribute values calculated by the calculation unit; a sixth memory for storing unique neuron numbers; a seventh memory for storing forward neuron attribute values; an eighth memory for storing the new forward neuron attribute values calculated by the calculation unit; a first switch for selecting the input of the second memory; a second switch for switching the output of the fourth memory or the seventh memory to the calculation unit; a third switch for switching the output of the calculation unit to the fifth memory or the eighth memory; and a fourth switch for switching the OutSel input to the fifth memory or the eighth memory.
- The neural network computing apparatus of claim 48, wherein a dual-memory swap (SWAP) circuit, which cross-connects all inputs and outputs under the control of the control unit, is applied to the fourth memory and the fifth memory, and also to the seventh memory and the eighth memory.
- The neural network computing apparatus of claim 48, wherein the fourth memory and the fifth memory, and the seventh memory and the eighth memory, are each implemented as a single memory, and the read process and the write process are handled by time division.
- The neural network computing apparatus of claim 48, wherein the control unit stores data in each memory of each memory unit according to processes a through q below.
a. where the two ends of every connection line in the forward network of the artificial neural network are distinguished as the end at which the arrow starts and the end at which the arrow ends, assigning to both ends of all connection lines numbers satisfying conditions 1 to 4 below
1. the condition that the numbers of the outbound connection lines going from each neuron to other neurons are unique and not duplicated
2. the condition that the numbers of the inbound connection lines coming into each neuron from other neurons are unique and not duplicated
3. the condition that both ends of every connection line have the same number
4. the condition that the numbers satisfy conditions 1 to 3 while being as small as possible
b. finding the largest number (Pmax) among the numbers assigned to the outbound or inbound connection lines of all neurons
c. adding, inside the forward network of the neural network, one null neuron whose attribute value has no effect even when it is connected to other neurons by connection lines
d. while keeping the numbers assigned to the connection lines of every neuron in the forward network, adding new connection lines at all vacant numbers from 1 to ⌈Pmax/p⌉ × p so that each neuron is expanded to have a total of ⌈Pmax/p⌉ × p input connection lines, each added connection line being set either to have a connection line attribute value that has no effect regardless of which neuron it is connected to or to be connected to the null neuron (p being the number of memory units in the neural network computing apparatus)
e. assigning a number to every neuron in the forward network in an arbitrary order
f. dividing the connection lines of every neuron in the forward network into groups of p in order from number 1, classifying them into ⌈Pmax/p⌉ forward connection line bundles, and assigning to each connection line in a bundle a new number i starting from 1 and increasing by 1 in order
g. assigning a number k, starting from 1 and increasing by 1, in order from the first forward connection line bundle of the first neuron to the last forward connection line bundle of the last neuron
h. storing, at the k-th address of the second memory and the ninth memory of the i-th memory unit, the initial attribute value of the i-th connection line of the k-th forward connection line bundle
i. storing, at the k-th address of the sixth memory of the i-th memory unit, the unique number of the neuron connected to the i-th connection line of the k-th forward connection line bundle
j. storing, at the j-th address of each of the seventh memory and the eighth memory of every memory unit, the forward neuron attribute value of the neuron whose unique number is j
k. adding, inside the backward network of the neural network, one null neuron whose attribute value has no effect even when it is connected to other neurons by connection lines
l. while keeping the numbers assigned to the connection lines of every neuron in the backward network, adding new connection lines at all vacant numbers from 1 to ⌈Pmax/p⌉ × p so that each neuron is expanded to have a total of ⌈Pmax/p⌉ × p input connection lines, each added connection line being set either to have a connection line attribute value that has no effect regardless of which neuron it is connected to or to be connected to the null neuron
m. dividing the connection lines of every neuron in the backward network into groups of p in order from number 1, classifying them into ⌈Pmax/p⌉ backward connection line bundles, and assigning to each connection line in a bundle a new number i starting from 1 and increasing by 1 in order
n. assigning a number k, starting from 1 and increasing by 1, in order from the first backward connection line bundle of the first neuron to the last backward connection line bundle of the last neuron
o. storing, at the k-th address of the first memory of the i-th memory unit, the position value at which the i-th connection line of the k-th backward connection line bundle is located in the second memory of the i-th memory unit
p. storing, at the k-th address of the third memory of the i-th memory unit, the unique number of the neuron connected to the i-th connection line of the k-th backward connection line bundle
q. storing, at the j-th address of each of the fourth memory and the fifth memory of every memory unit, the backward neuron attribute value of the neuron whose unique number is j
- The neural network computing apparatus of claim 51, wherein a solution satisfying the conditions of process a is obtained using an edge coloring algorithm.
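Conditions 1 to 3 of process a describe an edge coloring of the bipartite graph formed by the source ends and the target ends of the connection lines, for which a proper bipartite edge coloring (Konig's theorem) attains the minimum of Pmax numbers. The greedy sketch below satisfies conditions 1 to 3 but only approximates condition 4; it is illustrative, not the patent's algorithm, and parallel connection lines between the same pair would need distinct edge keys:

```python
# Heuristic sketch of the connection numbering: give each connection the
# smallest number unused among its source's outbound numbers and its
# target's inbound numbers.
def number_connections(edges):
    used_out, used_in, numbering = {}, {}, {}
    for src, dst in edges:
        n = 1
        while n in used_out.get(src, set()) or n in used_in.get(dst, set()):
            n += 1                      # smallest number free at both ends
        used_out.setdefault(src, set()).add(n)
        used_in.setdefault(dst, set()).add(n)
        numbering[(src, dst)] = n
    return numbering

print(number_connections([(1, 3), (1, 4), (2, 3), (2, 4)]))
```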
- The neural network computing apparatus of claim 46 or 47, wherein each of the plurality of memory units comprises: a first memory for storing address values of a second memory; the second memory, for storing connection line attribute values; a third memory for storing unique neuron numbers; a fourth memory for storing backward neuron attribute values or forward neuron attribute values; a fifth memory for storing the new backward neuron attribute values or forward neuron attribute values calculated by the calculation unit; and a switch for selecting the input of the second memory.
- The neural network computing apparatus of claim 46 or 47, wherein the calculation unit comprises: a multiplication unit for performing multiplication operations on the connection line attribute values and forward neuron attribute values, or the connection line attribute values and backward neuron attribute values, from the plurality of memory units; a tree-structured addition unit for performing addition operations, in one or more stages, on the plurality of output values from the multiplication unit; an accumulator for accumulating the output values from the addition unit; and a soma processor for receiving the training data (Teach) from the control unit and the accumulated output value from the accumulator and calculating a new forward neuron attribute value or backward neuron attribute value.
- The neural network computing apparatus of claim 54, wherein the soma processor receives, through a first input, the net input of a neuron or the total sum of errors from the accumulator; receives, through a second input, the training data of an output neuron; outputs, through a first output, the newly calculated attribute value or error value of the neuron; and outputs, through a second output, the neuron attribute value used for connection line adjustment; wherein, in the cycle for calculating the error of an output neuron, it calculates an error value as the difference between the received training data (Teach) and the internally stored neuron attribute value, stores it internally, and outputs it through the first output; in the cycle for calculating the error of a non-output neuron, it receives the total sum of error inputs from the accumulator, stores it internally, and outputs it through the first output; and in the recall cycle, it receives the net input value of the neuron from the accumulator, calculates a new neuron attribute value by applying an activation function, stores it internally, outputs it through the first output, and calculates the neuron attribute value required for connection line adjustment and outputs it through the second output.
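A behavioral sketch of the three cycles described above, with the sigmoid and its derivative as assumed placeholders for the activation function and the connection-adjustment output; the class and method names are illustrative:

```python
from math import exp

# Sketch of the soma processor's three operating phases. State held per
# neuron: its attribute value and its error value.
class Soma:
    def __init__(self):
        self.value = 0.0   # neuron attribute value (internal register)
        self.error = 0.0   # neuron error value (internal register)

    def recall(self, net_input):                 # recall cycle
        self.value = 1.0 / (1.0 + exp(-net_input))
        learn = self.value * (1.0 - self.value)  # second output: value used
        return self.value, learn                 # for connection adjustment

    def output_error(self, teach):               # output-neuron error cycle
        self.error = teach - self.value          # Teach input minus stored value
        return self.error

    def hidden_error(self, error_sum):           # non-output-neuron error cycle
        self.error = error_sum                   # total fed by the accumulator
        return self.error
```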
- The neural network computing apparatus of claim 54, wherein the soma processor is implemented by applying a parallel calculation line technique.
- A neural network computing system, comprising: a control unit for controlling the neural network computing system; a plurality of memory units each including "a plurality of memory parts that each output a connection line attribute value and a backward neuron attribute value, or that each output a connection line attribute value and a forward neuron attribute value and calculate a new connection line attribute value using a connection line attribute value, a forward neuron attribute value, and a learning attribute value"; and a plurality of calculation units for respectively calculating new backward neuron attribute values using the connection line attribute values and backward neuron attribute values respectively input from the corresponding plurality of memory parts in the plurality of memory units and feeding them back to each of the corresponding plurality of memory parts, or for respectively calculating new forward neuron attribute values and learning attribute values using the connection line attribute values and forward neuron attribute values respectively input from the corresponding plurality of memory parts and feeding them back to each of the corresponding plurality of memory parts.
- The neural network computing system of claim 57, wherein the plurality of memory parts in the plurality of memory units and the plurality of calculation units operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- The neural network computing system of claim 57 or 58, wherein each memory part comprises: a first memory for storing address values of a second memory; the second memory, for storing connection line attribute values; a third memory for storing unique neuron numbers; a first memory group for storing backward neuron attribute values; a second memory group for storing the new backward neuron attribute values calculated by the calculation unit; a fourth memory for storing unique neuron numbers; a third memory group for storing forward neuron attribute values; a fourth memory group for storing the new forward neuron attribute values calculated by the calculation unit; a first switch for selecting the input of the second memory; a second switch for switching the output of the first memory group or the third memory group to the calculation unit; a third switch for switching the output of the calculation unit to the second memory group or the fourth memory group; and a fourth switch for switching the OutSel input to the second memory group or the fourth memory group.
- The neural network computing system of claim 57 or 58, wherein the calculation unit comprises: a multiplication unit for performing multiplication operations on the connection line attribute values and forward neuron attribute values, or the connection line attribute values and backward neuron attribute values, from the corresponding plurality of memory parts; a tree-structured addition unit for performing addition operations, in one or more stages, on the plurality of output values from the multiplication unit; an accumulator for accumulating the output values from the addition unit; and a soma processor for receiving the training data (Teach) from the control unit and the accumulated output value from the accumulator and calculating a new forward neuron attribute value or backward neuron attribute value.
- A memory device of a digital system, wherein a dual-memory swap (SWAP) circuit, which cross-connects all inputs and outputs of two memories using a plurality of digital switches controlled by control signals from an external control unit, is applied to the two memories.
- A neural network computing method, comprising: a step in which, under the control of a control unit, a plurality of memory units each output a connection line attribute value and a neuron attribute value; and a step in which, under the control of the control unit, one calculation unit calculates new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units and feeds them back to each of the plurality of memory units, wherein the plurality of memory units and the one calculation unit operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- A neural network computing method, comprising: a step of receiving, under the control of a control unit, data to be provided from the control unit to input neurons; a step of switching the received data or a new neuron attribute value from a calculation unit to a plurality of memory units under the control of the control unit; a step in which, under the control of the control unit, the plurality of memory units each output a connection line attribute value and a neuron attribute value; a step in which, under the control of the control unit, the one calculation unit calculates new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; and a step in which first and second output means, composed of a dual-memory swap (SWAP) circuit that cross-connects all inputs and outputs under the control of the control unit, output the new neuron attribute values from the calculation unit to the control unit.
- A neural network computing method, comprising: a step in which, under the control of a control unit, a plurality of memory parts in a plurality of memory units each output a connection line attribute value and a neuron attribute value; and a step in which, under the control of the control unit, a plurality of calculation units respectively calculate new neuron attribute values using the connection line attribute values and neuron attribute values respectively input from the corresponding plurality of memory parts in the plurality of memory units and feed them back to each of the corresponding plurality of memory parts, wherein the plurality of memory parts in the plurality of memory units and the plurality of calculation units operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- A neural network computing method, comprising: a step in which, under the control of a control unit, a plurality of memory units each output a connection line attribute value and a neuron error value; and a step in which, under the control of the control unit, one calculation unit calculates new neuron error values using the connection line attribute values and neuron error values respectively input from the plurality of memory units and feeds them back to each of the plurality of memory units, wherein the plurality of memory units and the one calculation unit operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- A neural network computing method, comprising: a step in which, under the control of a control unit, a plurality of memory units each output a connection line attribute value and a neuron attribute value; a step in which, under the control of the control unit, one calculation unit calculates new neuron attribute values and learning attribute values using the connection line attribute values and neuron attribute values respectively input from the plurality of memory units; and a step in which, under the control of the control unit, the plurality of memory units calculate new connection line attribute values using connection line attribute values, neuron attribute values, and learning attribute values, wherein the plurality of memory units and the one calculation unit operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- A neural network computing method, comprising: a step in which, under the control of a control unit, a plurality of memory units each store and output connection line attribute values, forward neuron attribute values, and backward neuron attribute values, and calculate new connection line attribute values; and a step in which, under the control of the control unit, one calculation unit calculates new forward neuron attribute values and backward neuron attribute values based on the data respectively input from the plurality of memory units and feeds them back to each of the plurality of memory units, wherein the plurality of memory units and the one calculation unit operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
- A neural network computing method, comprising: a step in which, under the control of a control unit, a plurality of memory parts in a plurality of memory units each output a connection line attribute value and a backward neuron attribute value; a step in which, under the control of the control unit, a plurality of calculation units respectively calculate new backward neuron attribute values using the connection line attribute values and backward neuron attribute values respectively input from the corresponding plurality of memory parts in the plurality of memory units and feed them back to each of the corresponding plurality of memory parts; a step in which, under the control of the control unit, the plurality of memory parts in the plurality of memory units each output a connection line attribute value and a forward neuron attribute value and calculate new connection line attribute values using connection line attribute values, forward neuron attribute values, and learning attribute values; and a step in which, under the control of the control unit, the plurality of calculation units respectively calculate new forward neuron attribute values and learning attribute values using the connection line attribute values and forward neuron attribute values respectively input from the corresponding plurality of memory parts and feed them back to each of the corresponding plurality of memory parts, wherein the plurality of memory parts in the plurality of memory units and the plurality of calculation units operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
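Taken together, the steps of this last method trace one training iteration: recall, error propagation through the backward network, and connection adjustment. A dataflow sketch on a single weight matrix, using plain backpropagation with a sigmoid activation as an illustrative learning rule (the claims do not fix either choice):

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# One training iteration in the order the method claim interleaves it:
# forward recall, backward error computation, then weight adjustment.
def train_step(W, x, teach, eta=0.5):
    # forward pass: new forward neuron attribute values
    y = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    # backward pass: new backward neuron attribute values (errors)
    delta = [(t - yi) * yi * (1 - yi) for t, yi in zip(teach, y)]
    # connection adjustment, done inside each memory unit in the hardware
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] += eta * delta[i] * x[j]
    return y

W = [[0.1, -0.2], [0.4, 0.3]]
print(train_step(W, x=[1.0, 0.5], teach=[1.0, 0.0]))
```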
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/376,380 US20140344203A1 (en) | 2012-02-03 | 2012-04-20 | Neural network computing apparatus and system, and method therefor |
CN201280068894.7A CN104145281A (zh) | 2012-02-03 | 2012-04-20 | 神经网络计算装置和系统及其方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2012-0011256 | 2012-02-03 | ||
KR1020120011256A KR20130090147A (ko) | 2012-02-03 | 2012-02-03 | 신경망 컴퓨팅 장치 및 시스템과 그 방법 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013115431A1 true WO2013115431A1 (ko) | 2013-08-08 |
Family
ID=48905446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2012/003067 WO2013115431A1 (ko) | 2012-02-03 | 2012-04-20 | 신경망 컴퓨팅 장치 및 시스템과 그 방법 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20140344203A1 (ko) |
KR (1) | KR20130090147A (ko) |
CN (1) | CN104145281A (ko) |
WO (1) | WO2013115431A1 (ko) |
Families Citing this family (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9159020B2 (en) | 2012-09-14 | 2015-10-13 | International Business Machines Corporation | Multiplexing physical neurons to optimize power and area |
US9747547B2 (en) | 2013-10-22 | 2017-08-29 | In2H2 | Hardware enhancements to radial basis function with restricted coulomb energy learning and/or k-Nearest Neighbor based neural network classifiers |
US9852006B2 (en) | 2014-03-28 | 2017-12-26 | International Business Machines Corporation | Consolidating multiple neurosynaptic core circuits into one reconfigurable memory block maintaining neuronal information for the core circuits |
WO2016076534A1 (ko) * | 2014-11-12 | 2016-05-19 | 서울대학교산학협력단 | 뉴런 디바이스 및 뉴런 디바이스를 포함하는 집적회로 |
KR101727546B1 (ko) | 2014-11-12 | 2017-05-02 | 서울대학교산학협력단 | 뉴런 디바이스 및 뉴런 디바이스를 포함하는 집적회로 |
US10192162B2 (en) * | 2015-05-21 | 2019-01-29 | Google Llc | Vector computation unit in a neural network processor |
CN106250981B (zh) * | 2015-06-10 | 2022-04-01 | 三星电子株式会社 | 减少存储器访问和网络内带宽消耗的脉冲神经网络 |
CN105095966B (zh) * | 2015-07-16 | 2018-08-21 | 北京灵汐科技有限公司 | 人工神经网络和脉冲神经网络的混合计算系统 |
CN106447035B (zh) * | 2015-10-08 | 2019-02-26 | 上海兆芯集成电路有限公司 | 具有可变率执行单元的处理器 |
US20170140264A1 (en) * | 2015-11-12 | 2017-05-18 | Google Inc. | Neural random access machine |
US10846591B2 (en) * | 2015-12-29 | 2020-11-24 | Synopsys, Inc. | Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks |
CN111340200B (zh) * | 2016-01-20 | 2024-05-03 | 中科寒武纪科技股份有限公司 | 用于执行人工神经网络正向运算的装置和方法 |
WO2017127763A1 (en) * | 2016-01-21 | 2017-07-27 | In2H2 | Hardware enhancements to neural network classifiers |
CN107203807B (zh) * | 2016-03-16 | 2020-10-02 | 中国科学院计算技术研究所 | 神经网络加速器的片上缓存带宽均衡方法、系统及其装置 |
CN105760931A (zh) * | 2016-03-17 | 2016-07-13 | 上海新储集成电路有限公司 | 人工神经网络芯片及配备人工神经网络芯片的机器人 |
EP3451239A4 (en) * | 2016-04-29 | 2020-01-01 | Cambricon Technologies Corporation Limited | APPARATUS AND METHOD FOR PERFORMING RECURRENT NEURONAL NETWORK AND LTSM CALCULATIONS |
CN109284825B (zh) * | 2016-04-29 | 2020-04-14 | 中科寒武纪科技股份有限公司 | 用于执行lstm运算的装置和方法 |
CN106056211B (zh) * | 2016-05-25 | 2018-11-23 | 清华大学 | 神经元计算单元、神经元计算模块及人工神经网络计算核 |
US20180053086A1 (en) * | 2016-08-22 | 2018-02-22 | Kneron Inc. | Artificial neuron and controlling method thereof |
US10552732B2 (en) * | 2016-08-22 | 2020-02-04 | Kneron Inc. | Multi-layer neural network |
CN110908931B (zh) | 2016-08-26 | 2021-12-28 | 中科寒武纪科技股份有限公司 | Tlb模块的更新方法 |
DE102016216947A1 (de) * | 2016-09-07 | 2018-03-08 | Robert Bosch Gmbh | Modellberechnungseinheit und Steuergerät zur Berechnung eines mehrschichtigen Perzeptronenmodells |
JP6847386B2 (ja) | 2016-09-09 | 2021-03-24 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | ニューラルネットワークの正則化 |
KR20180034853A (ko) | 2016-09-28 | 2018-04-05 | 에스케이하이닉스 주식회사 | 합성곱 신경망의 연산 장치 및 방법 |
CN107992942B (zh) * | 2016-10-26 | 2021-10-01 | 上海磁宇信息科技有限公司 | 卷积神经网络芯片以及卷积神经网络芯片操作方法 |
US10140574B2 (en) * | 2016-12-31 | 2018-11-27 | Via Alliance Semiconductor Co., Ltd | Neural network unit with segmentable array width rotator and re-shapeable weight memory to match segment width to provide common weights to multiple rotator segments |
CN108304922B (zh) * | 2017-01-13 | 2020-12-15 | 华为技术有限公司 | 用于神经网络计算的计算设备和计算方法 |
US11823030B2 (en) | 2017-01-25 | 2023-11-21 | Tsinghua University | Neural network information receiving method, sending method, system, apparatus and readable storage medium |
US11551028B2 (en) * | 2017-04-04 | 2023-01-10 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network |
US11544545B2 (en) | 2017-04-04 | 2023-01-03 | Hailo Technologies Ltd. | Structured activation based sparsity in an artificial neural network |
US10387298B2 (en) | 2017-04-04 | 2019-08-20 | Hailo Technologies Ltd | Artificial neural network incorporating emphasis and focus techniques |
US11238334B2 (en) | 2017-04-04 | 2022-02-01 | Hailo Technologies Ltd. | System and method of input alignment for efficient vector operations in an artificial neural network |
US11615297B2 (en) | 2017-04-04 | 2023-03-28 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network compiler |
CN108734288B (zh) * | 2017-04-21 | 2021-01-29 | 上海寒武纪信息科技有限公司 | 一种运算方法及装置 |
CN107169563B (zh) * | 2017-05-08 | 2018-11-30 | 中国科学院计算技术研究所 | 应用于二值权重卷积网络的处理系统及方法 |
CN109214502B (zh) | 2017-07-03 | 2021-02-26 | 清华大学 | 神经网络权重离散化方法和系统 |
CN107832082B (zh) * | 2017-07-20 | 2020-08-04 | 上海寒武纪信息科技有限公司 | 一种用于执行人工神经网络正向运算的装置和方法 |
DE102017212835A1 (de) * | 2017-07-26 | 2019-01-31 | Robert Bosch Gmbh | Steuerungssystem für ein autonomes Fahrzeug |
US11256985B2 (en) | 2017-08-14 | 2022-02-22 | Sisense Ltd. | System and method for generating training sets for neural networks |
US20190050724A1 (en) | 2017-08-14 | 2019-02-14 | Sisense Ltd. | System and method for generating training sets for neural networks |
US11216437B2 (en) | 2017-08-14 | 2022-01-04 | Sisense Ltd. | System and method for representing query elements in an artificial neural network |
CN107748914A (zh) * | 2017-10-19 | 2018-03-02 | 珠海格力电器股份有限公司 | 人工神经网络运算电路 |
CN108874445A (zh) * | 2017-10-30 | 2018-11-23 | 上海寒武纪信息科技有限公司 | 神经网络处理器及使用处理器执行向量点积指令的方法 |
CN107844826B (zh) * | 2017-10-30 | 2020-07-31 | 中国科学院计算技术研究所 | 神经网络处理单元及包含该处理单元的处理系统 |
CN108304856B (zh) * | 2017-12-13 | 2020-02-28 | 中国科学院自动化研究所 | 基于皮层丘脑计算模型的图像分类方法 |
CN108153200A (zh) * | 2017-12-29 | 2018-06-12 | 贵州航天南海科技有限责任公司 | 一种三层神经网络路径规划的立体车库控制方法 |
US20190286988A1 (en) * | 2018-03-15 | 2019-09-19 | Ants Technology (Hk) Limited | Feature-based selective control of a neural network |
EP3776367A1 (en) * | 2018-03-28 | 2021-02-17 | Nvidia Corporation | Detecting data anomalies on a data interface using machine learning |
WO2019194466A1 (ko) * | 2018-04-03 | 2019-10-10 | 주식회사 퓨리오사에이아이 | 뉴럴 네트워크 프로세서 |
US10698730B2 (en) * | 2018-04-03 | 2020-06-30 | FuriosaAI Co. | Neural network processor |
CN115115720B (zh) * | 2018-04-25 | 2024-10-29 | 杭州海康威视数字技术股份有限公司 | 一种图像解码、编码方法、装置及其设备 |
CA3101026A1 (en) * | 2018-06-05 | 2019-12-12 | Lightelligence, Inc. | Optoelectronic computing systems |
CN109325591B (zh) * | 2018-09-26 | 2020-12-29 | 中国科学院计算技术研究所 | 面向Winograd卷积的神经网络处理器 |
KR102191428B1 (ko) * | 2018-10-30 | 2020-12-15 | 성균관대학교산학협력단 | 머신러닝 가속기 및 그의 행렬 연산 방법 |
JP7135743B2 (ja) * | 2018-11-06 | 2022-09-13 | 日本電信電話株式会社 | 分散処理システムおよび分散処理方法 |
JP6852141B2 (ja) * | 2018-11-29 | 2021-03-31 | キヤノン株式会社 | 情報処理装置、撮像装置、情報処理装置の制御方法、および、プログラム |
US12061971B2 (en) | 2019-08-12 | 2024-08-13 | Micron Technology, Inc. | Predictive maintenance of automotive engines |
US11635893B2 (en) * | 2019-08-12 | 2023-04-25 | Micron Technology, Inc. | Communications between processors and storage devices in automotive predictive maintenance implemented via artificial neural networks |
KR20210052059A (ko) * | 2019-10-31 | 2021-05-10 | 에스케이하이닉스 주식회사 | 반도체장치 |
US20210232902A1 (en) * | 2020-01-23 | 2021-07-29 | Spero Devices, Inc. | Data Flow Architecture for Processing with Memory Computation Modules |
US11221929B1 (en) | 2020-09-29 | 2022-01-11 | Hailo Technologies Ltd. | Data stream fault detection mechanism in an artificial neural network processor |
US11811421B2 (en) | 2020-09-29 | 2023-11-07 | Hailo Technologies Ltd. | Weights safety mechanism in an artificial neural network processor |
US11263077B1 (en) | 2020-09-29 | 2022-03-01 | Hailo Technologies Ltd. | Neural network intermediate results safety mechanism in an artificial neural network processor |
US11237894B1 (en) | 2020-09-29 | 2022-02-01 | Hailo Technologies Ltd. | Layer control unit instruction addressing safety mechanism in an artificial neural network processor |
US11874900B2 (en) | 2020-09-29 | 2024-01-16 | Hailo Technologies Ltd. | Cluster interlayer safety mechanism in an artificial neural network processor |
CN112732436B (zh) * | 2020-12-15 | 2022-04-22 | 电子科技大学 | 一种多核处理器-单图形处理器的深度强化学习加速方法 |
KR102547997B1 (ko) * | 2021-04-27 | 2023-06-27 | 건국대학교 산학협력단 | 효율적인 메모리 접근 방식을 이용한 뉴럴 네트워크 연산 가속 방법 및 장치 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020143720A1 (en) * | 2001-04-03 | 2002-10-03 | Anderson Robert Lee | Data structure for improved software implementation of a neural network |
US20030065631A1 (en) * | 2001-10-03 | 2003-04-03 | Mcbride Chad B. | Pipelined hardware implementation of a neural network circuit |
KR20040040075A (ko) * | 2002-11-06 | 2004-05-12 | 학교법인 인하학원 | 재구성능력 및 확장능력을 가진 신경회로망 하드웨어 |
US20080319933A1 (en) * | 2006-12-08 | 2008-12-25 | Medhat Moussa | Architecture, system and method for artificial neural network implementation |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4974169A (en) * | 1989-01-18 | 1990-11-27 | Grumman Aerospace Corporation | Neural network with memory cycling |
US5065339A (en) * | 1990-05-22 | 1991-11-12 | International Business Machines Corporation | Orthogonal row-column neural processor |
US5329611A (en) * | 1990-05-22 | 1994-07-12 | International Business Machines Corp. | Scalable flow virtual learning neurocomputer |
JP2647330B2 (ja) * | 1992-05-12 | 1997-08-27 | インターナショナル・ビジネス・マシーンズ・コーポレイション | 超並列コンピューティングシステム |
2012
- 2012-02-03 KR KR1020120011256A patent/KR20130090147A/ko not_active Application Discontinuation
- 2012-04-20 CN CN201280068894.7A patent/CN104145281A/zh active Pending
- 2012-04-20 WO PCT/KR2012/003067 patent/WO2013115431A1/ko active Application Filing
- 2012-04-20 US US14/376,380 patent/US20140344203A1/en not_active Abandoned
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10169051B2 (en) | 2013-12-05 | 2019-01-01 | Blue Yonder GmbH | Data processing device, processor core array and method for characterizing behavior of equipment under observation |
US10481923B2 (en) | 2013-12-05 | 2019-11-19 | Jda Software, Inc. | Data processing device, processor core array and method for characterizing behavior of equipment under observation |
US10922093B2 (en) | 2013-12-05 | 2021-02-16 | Blue Yonder Group, Inc. | Data processing device, processor core array and method for characterizing behavior of equipment under observation |
TWI831312B (zh) * | 2022-07-29 | 2024-02-01 | 大陸商北京集創北方科技股份有限公司 | 燒錄控制電路、單次可編程記憶體裝置、電子晶片及資訊處理裝置 |
Also Published As
Publication number | Publication date |
---|---|
CN104145281A (zh) | 2014-11-12 |
US20140344203A1 (en) | 2014-11-20 |
KR20130090147A (ko) | 2013-08-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12867148; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 14376380; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16/01/2015) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 12867148; Country of ref document: EP; Kind code of ref document: A1 |