WO2015016640A1 - Neural network computing device and system and method thereof - Google Patents
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Description
- Some embodiments of the present invention relate to the field of digital neural network computing technology, and more particularly to a neural network computing device, system, and method comprising a distributed memory structure that stores artificial neural network data and a computational structure that time-divides all neurons in a pipeline circuit, operating as a synchronous circuit in which all components are synchronized to one system clock.
- A digital neural network computer is an electronic circuit implemented to simulate a biological neural network, with the aim of reproducing functions similar to those of the brain.
- the method of constructing artificial neural networks is called a neural network model.
- Artificial neurons are connected by directional connection lines (synapses) to form a network; the signals from the outputs of the pre-synaptic neurons connected to a neuron are summed in the dendrites and passed to the neuron's main body (soma), where they are processed.
- Each neuron has its own state and attribute values. In the soma, the neuron updates its state values and calculates a new output value based on the input from the dendrites; the output is passed through the input connection lines of a plurality of other neurons to affect adjacent neurons.
- A connection line between two neurons may also have a plurality of its own state and attribute values, which basically control the strength of the signal the connection line transmits.
- The connection line state value most commonly used in neural network models is a weight value indicating the connection strength of the line.
- A state value is a value that is initially assigned and then changes as the computation proceeds.
- An attribute value is a value that does not change once it is specified.
- The state and attribute values of a connection line are called connection-specific values.
- The state and attribute values of a neuron are called neuron-specific values.
- Digital neural network computers cannot change neuron values continuously, so they calculate the entire set of neurons once and reflect the result in the next calculation.
- The cycle in which the entire set of neurons is calculated once is called the neural network update cycle.
- The execution of a digital artificial neural network proceeds by repeatedly executing the neural network update cycle.
- The method of reflecting the results of the neuron calculations is divided into the non-overlapping updating method, in which the results are reflected in the next cycle after the calculation of all neurons is completed, and the overlapping updating method, in which the result of each sequentially calculated neuron is reflected immediately, at any time within the current update cycle.
- The calculation of the output value of a neuron may be expressed by the generalized equation shown in [Equation 1] below.
- [Equation 1] $y_j(T) = f_N\left(SN_j,\ \sum_{i=1}^{p_j} f_S\big(SS_{ij},\, y_{M_{ij}}(T-1)\big)\right)$
- $y_j(T)$ is the output value of neuron j calculated in the T-th neural network update cycle.
- $f_N$ is a neuron function that updates a plurality of state values of the neuron and calculates one new output value.
- $f_S$ is a synapse function that updates a plurality of connection line state values and calculates one connection line output value.
- $SN_j$ is the set of state and attribute values of neuron j.
- $SS_{ij}$ is the set of state and attribute values of the i-th input connection line of neuron j.
- $p_j$ is the number of input connection lines of neuron j.
- $M_{ij}$ is the reference number of the neuron connected to the i-th input connection line of neuron j.
- The value of a neuron is expressed as a real number or an integer, and is calculated as shown in [Equation 2] below.
- [Equation 2] $y_j(T) = f_N\left(\sum_{i=1}^{p_j} w_{ij}\, y_{M_{ij}}(T-1)\right)$
- $w_{ij}$ is the weight value of the i-th input connection line of neuron j.
- [Equation 2] is a special case of [Equation 1] in which $SS_{ij}$ of [Equation 1] is reduced to the single weight value of a connection line, and the synapse function $f_S$ simply multiplies the weight value ($w_{ij}$) by the input value ($y_{M_{ij}}$).
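- As a concrete illustration, the following minimal Python sketch (not part of the patent; all names are illustrative, and tanh is an assumed neuron function) performs one neural network update cycle according to [Equation 2], using a table M of pre-synaptic reference numbers and a weight table W:

```python
import numpy as np

def update_cycle(y_prev, M, W, f_N=np.tanh):
    """One neural network update cycle per [Equation 2].

    y_prev : output values of all neurons from cycle T-1
    M[j][i]: reference number of the neuron feeding the i-th input line of neuron j
    W[j][i]: weight w_ij of that input line
    f_N    : neuron function producing the new output value (tanh is an assumption)
    """
    y_new = np.empty_like(y_prev)
    for j in range(len(M)):
        # synapse function f_S of [Equation 2]: weight times pre-synaptic output
        net = sum(W[j][i] * y_prev[M[j][i]] for i in range(len(M[j])))
        y_new[j] = f_N(net)  # soma: new output from the net input
    return y_new  # non-overlapping update: used only in the next cycle

# toy network: 3 neurons, each reading the other two
M = [[1, 2], [0, 2], [0, 1]]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
y = np.array([0.0, 1.0, 0.5])
y = update_cycle(y, M, W)
```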
- Spiking neural network models work similarly to the neural networks in biological brains.
- In these models, neurons emit instantaneous spike signals; a spike is delayed for a certain amount of time, determined by intrinsic properties of the connection line, before being transmitted to the synapse.
- The synapse converts the delayed spike signal into signals of various patterns, and the dendrites add these signals together to form the input of the soma.
- The soma updates its state values based on this input signal and a plurality of internal state values, and emits one spike signal at its output when certain conditions are satisfied.
- A connection line may have several state and attribute values in addition to its weight, and may involve arbitrary calculation formulas according to the neural network model; a neuron may likewise have one or more state and attribute values and be calculated by an arbitrary formula according to the neural network model. For example, in the Izhikevich model, one neuron has two state values and four attribute values, and depending on the attribute values it can reproduce various spike patterns similar to biological neurons.
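- For reference, a minimal Python sketch of the published Izhikevich update (the equations come from Izhikevich's model, not from this patent): the two state values are v and u, and the four attribute values are a, b, c, and d.

```python
def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """One update cycle of an Izhikevich neuron.

    State values: v (membrane potential), u (recovery variable).
    Attribute values: a, b, c, d select the spike pattern.
    """
    v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u += dt * a * (b * v - u)
    spike = v >= 30.0
    if spike:            # emit a spike, then reset the state values
        v, u = c, u + d
    return v, u, spike
```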
- Some of these spiking neural network models, such as the biologically realistic Hodgkin-Huxley (HH) model, require more than 240 operations to compute one neuron, and have the disadvantage that the amount of computation is huge because the calculation must be repeated every neural network update cycle, each corresponding to 0.05 milliseconds of neuron time.
- Neurons in an artificial neural network can be divided into input neurons that receive input from the outside, output neurons that deliver the processing result, and the remaining hidden neurons.
- In a multi-layer neural network, an input layer consisting of input neurons, one or more hidden layers, and an output layer consisting of output neurons are connected in series, and neurons in one layer are connected only to neurons in the next layer.
- An artificial neural network stores knowledge in the form of connection line weight values in order to derive a desirable result.
- The step of accumulating knowledge by adjusting the connection line weight values of the artificial neural network is called the learning mode, and the step of retrieving stored knowledge by presenting input data is called the recall mode.
- In the learning mode, the weight values of the connection lines are updated together with the neurons' state and output values in one neural network update cycle.
- Hebb's theory states that the strength of a connection line is strengthened when both the output of the pre-synaptic neuron connected as input to the connection line and the value of the post-synaptic neuron that accepts input through the connection line are strong, and is gradually weakened otherwise. Generalized, this learning method can be expressed as [Equation 3] below.
- [Equation 3] $w_{ij}(T) = w_{ij}(T-1) + L_j \cdot y_{M_{ij}}(T-1)$
- $L_j$ is a value calculated from the state values and the output value of neuron j, and will be referred to as the learning state value for convenience.
- The learning state value is characterized in that it excludes connection-specific values and is composed only of neuron-specific values.
- A typical Hebbian learning rule is defined as in [Equation 4] below.
- [Equation 4] $w_{ij}(T) = w_{ij}(T-1) + \eta\, y_j\, y_{M_{ij}}(T-1)$
- $\eta$ is a constant value that adjusts the learning speed.
- In terms of [Equation 3], the learning state value is $L_j = \eta \cdot y_j$.
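- A minimal Python sketch of the Hebbian update of [Equation 4] (illustrative, reusing the M/W/y layout of the earlier sketch): the learning state value L_j = η·y_j is computed once per neuron and applied to each of its input connection lines.

```python
def hebbian_update(W, y, M, eta=0.01):
    """w_ij += L_j * y_Mij with the learning state value L_j = eta * y_j."""
    for j in range(len(W)):
        L_j = eta * y[j]                 # neuron-specific learning state value
        for i in range(len(W[j])):
            W[j][i] += L_j * y[M[j][i]]  # strengthened when both sides are active
```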
- STDP (Spike Timing Dependent Plasticity) is a representative learning rule used in spiking neural network models.
- The backpropagation algorithm is a supervised learning method in which a supervisor outside the system specifies, in the learning mode, the most desirable output value corresponding to a specific input value (the training value); one neural network update cycle consists of several sub-cycles.
- In the weight update of the backpropagation algorithm, the output of the pre-synaptic neuron that provides the value on the connection line and the error of the post-synaptic neuron are both reflected.
- The calculation formula for the learning state value $L_j$ may vary according to the specific method even within the backpropagation algorithm.
- The backpropagation algorithm is characterized in that data flows both forward and backward through the neural network, and the weight values of the connection lines are shared between the forward and reverse directions.
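- A compressed Python sketch of this weight sharing (illustrative; the tanh layer, the learning rate, and the function names are assumptions, not the patent's method): the forward pass uses W and the reverse pass reuses the same W transposed, and the weight update multiplies the post-synaptic error by the pre-synaptic output, as described above.

```python
import numpy as np

def backprop_layer(x, W, grad_out, lr=0.1):
    """Forward and reverse pass through one layer sharing the weight matrix W."""
    h = np.tanh(W @ x)                 # forward: data flows through W
    delta = grad_out * (1.0 - h * h)   # error of the post-synaptic neurons
    grad_in = W.T @ delta              # reverse: the same W, transposed
    W -= lr * np.outer(delta, x)       # post-synaptic error x pre-synaptic output
    return h, grad_in
```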
- A deep belief network has a network structure in which a plurality of Restricted Boltzmann Machines (RBMs) are connected in series.
- Each RBM is composed of n visible-layer neurons and m hidden-layer neurons, for arbitrary n and m, with a network structure in which neurons are not connected to neurons of the same layer and each neuron is connected to all neurons of the other layer.
- The learning calculation of the deep belief network assigns the training data to the values of the visible-layer neurons of the front RBM and executes the RBM learning procedure to adjust the connection line values and derive new values for the hidden layer; the hidden-layer neuron values then become the input values of the visible layer of the next RBM, and all RBMs are calculated sequentially in this way.
- The learning calculation of the deep belief network adjusts the connection line weights by repeatedly applying a plurality of training data items.
- The calculation procedure for learning one training data item is as follows.
- a. The vector of values of the visible-layer neurons is called vpos; taking vpos as input, the values of all hidden-layer neurons are calculated, and the vector of all hidden-neuron values is called hpos.
- b. The vector hpos is the output of this RBM (RBM step 1).
- c. Taking hpos as input, the values of all visible-layer neurons are recalculated; this vector is called vneg (RBM step 2).
- d. The vector vneg is input to recalculate the values of the hidden-layer neurons; this vector is called hneg (RBM step 3).
- e. For each connection line, let vpos_i and vneg_i be the elements of vpos and vneg for the visible neuron connected to the line, and hpos_j and hneg_j the elements of hpos and hneg for the hidden neuron connected to the line; the weight of the line is then adjusted in proportion to (vpos_i · hpos_j - vneg_i · hneg_j).
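- The following minimal Python sketch (illustrative; it assumes sigmoid activations and deterministic mean-field values rather than stochastic sampling) runs RBM steps 1 to 3 and the proportional weight update of the last step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_cd1_step(W, vpos, eta=0.1):
    """One training step over RBM steps 1-3 plus the weight update."""
    hpos = sigmoid(W @ vpos)      # step 1: hidden values from the visible vector
    vneg = sigmoid(W.T @ hpos)    # step 2: reconstruct the visible layer
    hneg = sigmoid(W @ vneg)      # step 3: recompute the hidden layer
    # every weight w_ji is adjusted in proportion to vpos_i*hpos_j - vneg_i*hneg_j
    W += eta * (np.outer(hpos, vpos) - np.outer(hneg, vneg))
    return hpos                   # the output of this RBM, fed to the next one
```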
- Such a deep belief network requires a large amount of computation and, because of the scale and complexity of the computational process, is difficult to implement in hardware; it therefore has the disadvantage that it must be processed in software, so the computation speed is slow and low-power, real-time processing is not easy.
- Neural network computers are used to find the pattern most appropriate for a given input based on pattern recognition, or to predict the future based on a priori knowledge, and can be applied in various fields such as robot control, military equipment, medicine, games, weather information processing, and human-machine interfaces.
- In the direct implementation method, the logical neurons of the artificial neural network are mapped one-to-one onto physical neurons; most analog neural network chips fall into this category.
- The conventional direct implementation method can achieve high processing speed, but it is difficult to apply to various neural network models and to large-scale neural networks.
- The conventional virtual implementation method, in which a number of logical neurons are simulated in time-division fashion on shared physical hardware, can execute various neural network models and large-scale neural networks, but it is difficult to obtain high speed; one object of the present invention is to solve this problem.
- An embodiment of the present invention provides a neural network computing device, system, and method that comprise a distributed memory structure storing artificial neural network data and a computational structure that time-divides all neurons in a pipeline circuit, operating as a synchronous circuit in which all components are synchronized to one system clock, so that various neural network models and large-scale neural networks can be supported while enabling high-speed processing.
- A neural network computing device according to an embodiment includes: a control unit for controlling the neural network computing device; a plurality of memory units, each using a dual port memory, for respectively outputting the output values of pre-synaptic neurons; and a calculation subsystem for calculating the output values of new post-synaptic neurons using the pre-synaptic neuron output values input from the plurality of memory units, and feeding the new output values back to each of the plurality of memory units.
- A neural network computing system according to an embodiment may include: a control unit for controlling the neural network computing system; a plurality of network subsystems, each consisting of a plurality of memory units using dual port memory for outputting the output values of pre-synaptic neurons; and a plurality of calculation subsystems, each calculating the output values of new post-synaptic neurons using the pre-synaptic neuron output values received from the plurality of memory units of one of the network subsystems, and feeding them back to each of those memory units.
- A multiprocessor computing system according to an embodiment includes: a control unit for controlling the multiprocessor computing system; and a plurality of processor subsystems, each of which calculates a portion of the total amount of calculation and outputs a portion of its calculation result for sharing with the other processors.
- A memory device according to an embodiment includes: a first memory for storing the reference numbers of the pre-synaptic neurons of connection lines; and a second memory, composed of a dual port memory having a read port and a write port, for storing the output values of neurons.
- A neural network computing method according to an embodiment, performed under the control of a control unit, includes: outputting, by a plurality of memory units, the output values of pre-synaptic neurons using dual port memories; and calculating, by one calculation subsystem, the output values of new post-synaptic neurons using the pre-synaptic neuron output values input from the plurality of memory units, and feeding them back to each of the memory units, wherein the plurality of memory units and the calculation subsystem operate in a pipelined manner in synchronization with one system clock under the control of the control unit.
- According to embodiments of the present invention, there is no restriction on the network topology, the number of neurons, or the number of connection lines of the neural network, and various neural network models including arbitrary synapse and neuron functions can be executed.
- The neural network computing system can be designed with an arbitrary number p of connection lines processed at the same time, and has the advantage that it can recall or train up to p connection lines simultaneously in every clock cycle, and can therefore run at high speed.
- Not only can a large-capacity general-purpose neural network computer be implemented, it can also be integrated into a small semiconductor, so it can be applied to various artificial neural network applications.
- FIG. 1 is a block diagram of a neural network computing device according to an embodiment of the present invention.
- FIG. 2 is a detailed configuration diagram of a control unit according to an embodiment of the present invention.
- FIG. 3 is an exemplary diagram of a neural network showing neurons and data flows according to an embodiment of the present invention
- FIGS. 4A and 4B are diagrams for explaining a method of distributing and storing the reference numbers of pre-synaptic neurons in the M memories according to an embodiment of the present invention.
- FIG. 5 is a diagram showing the flow of data driven by control signals according to an embodiment of the present invention.
- FIG. 6 is a configuration diagram of a dual memory replacement (SWAP) circuit according to an embodiment of the present invention.
- FIG. 7 is a block diagram of a calculation subsystem according to an embodiment of the present invention.
- FIG. 8 is a configuration diagram of a synaptic unit supporting a spiking neural network model according to an embodiment of the present invention.
- FIG. 9 is a configuration diagram of a dendrite unit according to an embodiment of the present invention.
- FIG. 10 is a configuration diagram of one attribute value memory according to an embodiment of the present invention.
- FIG. 11 is a diagram showing the structure of a system using a multi-time scale method according to an embodiment of the present invention.
- FIG. 12 is a diagram illustrating a structure for calculating a neural network using a learning method as described in [Equation 3] according to an embodiment of the present invention
- FIG. 13 is a diagram showing a structure for calculating a neural network using a learning method according to another embodiment of the present invention.
- FIG. 14 is an exemplary diagram of a memory unit according to an embodiment of the present invention.
- FIG. 15 is another exemplary diagram of a memory unit according to an embodiment of the present invention.
- FIG. 16 is another exemplary diagram of a memory unit according to an embodiment of the present invention.
- FIG. 17 is an exemplary diagram of a neural network computing system according to an embodiment of the present invention.
- FIG. 18 is a diagram for describing a memory control signal generation method in a control unit according to one embodiment of the present invention.
- FIG. 19 is a block diagram of a multiprocessor computing system according to another embodiment of the present invention.
- FIGS. 20A to 20C are diagrams showing a synapse function expressed in assembly code, the circuit designed from the assembly code according to the design procedure, and the result after optimization, according to an embodiment of the present invention.
- FIG. 1 is a block diagram of a neural network computing device according to an embodiment of the present invention, and shows a basic detailed structure thereof.
- Referring to FIG. 1, the neural network computing device includes a control unit 100 for controlling the neural network computing device; a plurality of memory units 102 for respectively outputting the output values of the pre-synaptic neurons of connection lines; and one calculation subsystem 106 for calculating the output values of new post-synaptic neurons and feeding them back through its output 104 to the input 105 of each of the plurality of memory units 102.
- An InSel input (the connection line bundle number, 107) and an OutSel input (the address at which a newly calculated neuron output value is to be stored, together with a write-enable signal, 108) are connected from the control unit 100 to each of the plurality of memory units 102.
- the outputs 101 of the plurality of memory units 102 are connected to the inputs of the computation subsystem 106.
- the output of the computation subsystem 106 (output value of the postsynaptic neuron) is commonly connected to the inputs of all of the plurality of memory units 102 via a "HILLOCK" bus 109.
- The values of input neurons are also supplied from the control unit 100 and stored in the memory units under the control of the control unit 100.
- The output 104 of the calculation subsystem 106 is further connected to the control unit 100 so that the output values of neurons can be transmitted to the outside.
- Each memory unit 102 includes an M memory (first memory, 112) for storing the reference numbers of pre-synaptic neurons (that is, the addresses of the Y memory below at which the neuron output values are stored), and a Y memory (second memory, 113) for storing the output values of neurons.
- The Y memory 113 is a dual port memory having two ports, a read port (114, 115) and a write port (116, 117).
- The data output (DO) of the M memory (first memory) is connected to the address input (AD, 114) of the read port; the data output 115 of the read port is connected to the output 101 of the memory unit 102; and the data input (DI, 117) of the write port is connected to the input 105 of the memory unit 102 and commonly connected with the inputs of the other memory units.
- The address inputs (AD, 119) of the M memories 112 of all the memory units 102 are tied together and connected to the InSel input 107, and the address input (116) and write enable (WE) of the write port of the Y memory 113 are commonly connected to the OutSel input 108 and used to store the output values of neurons. As a result, the Y memories 113 of all the memory units 102 hold the same output values of all the neurons.
- A first register 120, which temporarily stores the reference number of the pre-synaptic neuron output from the M memory, may further be included at the data output of the M memory 112. The first register 120 is synchronized to the one system clock so that the M memory 112 and the read port (114, 115) of the Y memory 113 operate in a pipelined manner under the control of the control unit 100.
- A second register 121 temporarily storing the neuron output value may be included at the output 101 of each memory unit 102, and a third register 122 temporarily storing the output value of the new neuron output from the calculation subsystem may further be included at the output terminal 104 of the calculation subsystem 106.
- The second and third registers 121 and 122 are synchronized to the one system clock so that the plurality of memory units 102 and the one calculation subsystem 106 operate in a pipelined manner under the control of the control unit 100.
- A method for operating the neural network computing device to calculate a general artificial neural network is as follows: the reference numbers of the pre-synaptic neurons connected to the input connection lines of all neurons in the artificial neural network are distributed and stored in the M memories 112 of the plurality of memory units 102, and the calculation function is performed according to steps a to d below.
- a. The value of the InSel input 107 is changed sequentially and delivered to the address input 119 of the M memory 112 of each of the plurality of memory units 102, so that the data output 118 of each M memory 112 sequentially outputs the reference numbers of the pre-synaptic neurons connected to the input connection lines of the neurons.
- b. The output values of the pre-synaptic neurons connected to the input connection lines of the neurons are sequentially output at the data output 115 of the read port of the Y memory 113 of each of the plurality of memory units 102, and are input to the plurality of inputs 103 of the calculation subsystem 106 through the outputs 101 of the memory units 102.
- c. The output values of the post-synaptic neurons calculated by the calculation subsystem 106 are output through the output 104 and then sequentially stored through the input 105 and the write port 117 of the Y memory 113 of each of the plurality of memory units 102.
- More specifically, the neural network computing device distributes and stores the reference numbers of the pre-synaptic neurons connected to the input connection lines of all neurons in the artificial neural network in the M memories 112 of the plurality of memory units 102, and may perform the calculation according to steps a to f of the following process.
- Because the write ports 116 and 117 of the Y memories 113 of the plurality of memory units 102 are commonly connected with the write ports of the Y memories of all the other memory units, the same contents are stored in all Y memories 113, with the output value of the i-th neuron stored at the i-th address.
- The control unit 100 supplies to the InSel input 107 a connection line bundle number that starts at 1 and is incremented by 1 every system clock cycle. A fixed number of system clock cycles after the neural network update cycle starts, the output values of the pre-synaptic neurons of all connection lines belonging to the corresponding bundle are output in sequence at the outputs 115 of the plurality of memory units 102. The bundles are output in order: from the first to the last connection line bundle of neuron 1, then from the first to the last bundle of the next neuron, and so on, repeating until the last connection line bundle of the last neuron has been output.
- The calculation subsystem 106 then receives the outputs 101 of the memory units 102 as inputs and calculates the new state values and output values of the neurons. If every neuron has n connection line bundles, then after a fixed number of system clock cycles from the start of the neural network update cycle, the data of the connection line bundles of each neuron arrive sequentially at the input 103 of the calculation subsystem 106, and the output 104 of the calculation subsystem 106 produces the output value of a new neuron every n system clock cycles.
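- Functionally, each memory unit performs a two-stage table lookup per clock cycle. A minimal (non-pipelined) Python sketch of what the M memory / Y memory pair and the HILLOCK bus compute, with illustrative names:

```python
def memory_unit_output(M_mem, Y_mem, insel):
    """One memory unit: InSel -> M memory -> Y memory -> output.

    M_mem[insel] holds the reference number of the pre-synaptic neuron on this
    unit's connection line of bundle insel; Y_mem holds all neuron output values.
    """
    ref = M_mem[insel]     # stage 1: read the pre-synaptic reference number
    return Y_mem[ref]      # stage 2: read that neuron's output value

def hillock_store(Y_mems, outsel, y_new):
    """HILLOCK bus: a newly calculated output is written to every Y memory."""
    for Y_mem in Y_mems:   # write ports are commonly connected (OutSel)
        Y_mem[outsel] = y_new
```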
- FIG. 2 is a detailed configuration diagram of a control unit according to an embodiment of the present invention.
- The control unit 200 provides various control signals to the neural network computing device 201 described above with reference to FIG. 1, and performs functions such as initializing each memory in the system (202), loading real-time or non-real-time input data (203), and retrieving real-time or non-real-time output data (204).
- the control unit 200 may be connected to the host computer 208 to receive control from the user.
- The control circuit 205 provides the neural network computing device 201 with all the control signals 206 and the clock signal 207 necessary to sequentially process each connection line bundle and each neuron within the neural network update cycle.
- Embodiments of the present invention may also be pre-programmed by a microprocessor or the like to operate stand-alone in applications that process real-time input/output.
- FIG. 3 is an exemplary diagram of a neural network showing neurons and data flows according to an embodiment of the present invention.
- Each neuron has a unique output value 303, and each connection line between two neurons has a unique weight value 304.
- For example, w_14 (304) represents the weight of the connection line from neuron 1 (301) to neuron 4 (302); the pre-synaptic neuron of this connection line is neuron 1 (301) and the post-synaptic neuron is neuron 4 (302).
- FIGS. 4A and 4B are diagrams for describing a method of distributing and storing the reference numbers of the pre-synaptic neurons of connection lines in the M memories according to an embodiment of the present invention.
- Specifically, a method of distributing and storing the reference numbers of the pre-synaptic neurons connected to the input connection lines of all neurons in the artificial neural network in the M memories 112 of the plurality of memory units 102 is illustrated.
- First, two virtual connection lines (400, 401) are added so that the number of connection lines of every neuron is the same.
- The four connection lines of each neuron are then arranged in a row as two bundles of two (see FIG. 4A).
- The first column 402 of the set of sorted connection line bundles is stored as the contents of the M memory 403 of the first memory unit 406, and the second column 404 is stored as the contents of the M memory 405 of the second memory unit.
- FIG. 4B is a diagram showing the contents of a memory inside each of the two memory units.
- The output values of the neurons are stored in the Y memory 407 of the first memory unit 406.
- To handle the virtual connection lines, a virtual neuron 8 (408) whose output value is always 0 is added, and all virtual connection lines 409 are connected to the virtual neuron 8 (408).
- FIG. 5 is a diagram illustrating the flow of data driven by control signals according to an embodiment of the present invention.
- The unique numbers of the connection line bundles are sequentially input by the control unit 100 via the InSel input (410, 500).
- One clock cycle later, the first register (411, 501) holds the reference number of the neuron connected as input to the i-th connection line of the k-th connection line bundle.
- In the following clock cycle, the output value of the neuron connected as input to the i-th connection line of the k-th bundle is stored in the second register (121, 502) connected to the output 407 of the memory unit 406, and is delivered to the calculation subsystem 106.
- The calculation subsystem 106 performs its calculations on the input data, sequentially calculating and outputting the output values of new neurons; each new output value is temporarily stored in the third register 122 and then stored in the Y memory 113 through the "HILLOCK" bus 109 and the input (105, 503) of each memory unit 102.
- The cells 504 indicated by bold lines show the flow of the data of neuron 1. Once all neurons in the neural network have been calculated, one neural network update cycle ends and the next neural network update cycle can begin.
- When the neural network to be calculated is a multi-layer network, the neural network computing device described in the above embodiment may additionally use the following method.
- The neural network computing device accumulates and stores, for each of the one or more hidden layers and the output layer, the reference numbers of the neurons connected to the input connection lines of the neurons belonging to that layer in a specific address range of the M memories (first memories, 112) of the plurality of memory units 102, and performs the calculation function according to steps a and b below.
- a. The value of the address input 119 of the M memory (first memory, 112) of each of the plurality of memory units 102 is changed sequentially within the address range of the corresponding layer, so that the data output 118 of the M memory 112 sequentially outputs the reference numbers of the neurons connected to the input connection lines of the neurons in that layer.
- To calculate a neural network composed of such a multi-layer network, the neural network computing device may accumulate and store the reference numbers of neurons in specific address ranges of the M memories 112 of the plurality of memory units 102, and repeat steps a to f for each of the one or more hidden layers and the output layer in the multi-layer network.
- In this case, the calculation function is performed step by step from the input layer to the output layer, with each layer taking as input the calculation results (neuron output values) of the previous layer.
- The dual port memory used as the Y memory 113 of the memory unit 102, providing a read port and a write port, may be a physical dual port memory with logic circuitry that allows two simultaneous accesses to one memory in the same clock cycle.
- Alternatively, the dual port memory used as the Y memory 113 of the memory unit 102 may consist of two input/output ports that access one physical memory in time-division fashion in different clock cycles.
- As another alternative, the dual port memory used as the Y memory 113 of the memory unit 102 may consist of two identical physical memories 600 and 601, as shown in FIG. 6, with all inputs and outputs of the two memories 600 and 601 interchangeably connected through a plurality of digital switches 602 to 606 controlled by a control signal from the control unit 100.
- This circuit is called a dual memory replacement (SWAP) circuit.
- Such a dual memory replacement circuit can be used effectively when the neural network computing device uses the non-overlapping updating method, which completes the calculation of all neurons and then reflects the result in the next cycle. That is, when the dual memory replacement circuit is used as the Y memory 113 of the memory unit 102 and the control unit 100 changes the SWAP signal at the end of one neural network update cycle, the contents stored through the write port (116, 117) during the previous update cycle instantly become the contents of the memory accessed through the read port (114, 115).
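- The behavior of the dual memory replacement circuit can be summarized by the following minimal software sketch (illustrative, not the hardware itself): values written during one update cycle become visible to reads only after the SWAP signal, which realizes the non-overlapping update.

```python
class SwapMemory:
    """Dual memory replacement (SWAP): reads see cycle T-1, writes fill cycle T."""
    def __init__(self, size):
        self.read_mem = [0.0] * size    # memory currently behind the read port
        self.write_mem = [0.0] * size   # memory currently behind the write port

    def read(self, addr):
        return self.read_mem[addr]

    def write(self, addr, value):
        self.write_mem[addr] = value

    def swap(self):
        # SWAP signal at the end of a neural network update cycle:
        # the two physical memories exchange roles instantly
        self.read_mem, self.write_mem = self.write_mem, self.read_mem
```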
- FIG. 7 is a block diagram of a calculation subsystem according to an embodiment of the present invention.
- The calculation subsystem (106, 700) calculates the output values of new post-synaptic neurons using the output values of the pre-synaptic neurons input 103 from the plurality of memory units 102, and feeds them back to the input 105 of each of the plurality of memory units 102 through the output 104.
- It includes a plurality of synaptic units 702, each of which receives values from the corresponding outputs of the plurality of memory units 701 and performs the synapse-specific calculation f_S; one dendrite unit 703, which receives the outputs of the plurality of synaptic units 702 and calculates the total sum of the inputs transmitted from all connection lines of a neuron; and a soma unit 704, which receives the output of the dendrite unit 703, updates the state values of the neuron, calculates a new output value, and outputs it to the output 708 of the calculation subsystem 700.
- The internal structure of the synaptic unit 702, the dendrite unit 703, and the soma unit 704 may vary depending on the neural network model that the calculation subsystem 700 calculates.
- An example of a synaptic unit 702 that is implemented differently according to the neural network model is found in the spiking neural network model.
- The synaptic unit 702 performs the synapse-specific calculation.
- For a spiking model, the synapse-specific calculation consists of an axon delay function, which delays the signal by a specific number of neural network update cycles according to an attribute value (the axon delay value) specific to each synapse, and a calculation function, which adjusts the strength of the signal passing through the synapse according to the state values of the connection line, including its weight.
- FIG. 8 is a block diagram of a synaptic unit supporting a spiking neural network model according to an embodiment of the present invention.
- Referring to FIG. 8, the synaptic unit includes an axon delay unit 800, which delays the signal by a specific number of neural network update cycles according to the attribute value (axon delay value) specific to each synapse, together with the circuitry that handles the state values of the connection line, including its weight.
- The axon delay unit 800 may include an axon delay state value memory 808, implemented as a dual port memory whose data width is n-1 bits when the maximum delay (in update cycles) is n; one n-bit shift register 802; one n-to-1 selector 803; and an axon delay attribute value memory 804 that stores the axon delay attribute values of the synapses.
- The 1-bit input from the input (707, 805) of the synaptic unit and the data output of the read port of the axon delay state value memory 808 are connected to bit 0 and bits 1 to (n-1) of the shift register 802, respectively, and the lower n-1 bits of the output of the shift register 802 are connected to the data input 807 of the write port of the axon delay state value memory 808.
- The n-bit output of the shift register 802 is also connected to the input of the n-to-1 selector 803, where one bit is selected according to the output value of the axon delay attribute value memory 804 and passed to the output of the n-to-1 selector 803.
- When a spike enters through the input, it is stored in bit 0 of the shift register 802 and then stored in the axon delay state value memory 808 through the data input 807 of its write port.
- In the next neural network update cycle, this one-bit signal appears as bit 1 of the data output 806 of the read port of the axon delay state value memory 808, and it moves up one bit each time the update cycle repeats. As a result, the spike values of the last n neural network update cycles are held in the n-bit output of the shift register 802, with the spike from i cycles ago appearing in the i-th bit; therefore, if the axon delay attribute value memory 804 holds the value i, the spike from i cycles ago is output at the output of the n-to-1 selector 803.
- Using the circuit of the axon delay unit 800 described above has the advantage that all spikes are delayed correctly no matter how frequently spikes occur.
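- A software sketch of the axon delay mechanism (illustrative; in the hardware the state value memory holds the per-synapse history between time-multiplexed passes, which is modeled here as one integer per synapse):

```python
class AxonDelayUnit:
    """Per-synapse n-bit spike history with a programmable delay tap."""
    def __init__(self, num_synapses, n):
        self.n = n
        self.history = [0] * num_synapses  # n-bit shift register contents
        self.delay = [0] * num_synapses    # axon delay attribute value per synapse

    def step(self, syn, spike_in):
        # shift the history up one bit and insert the new spike at bit 0
        mask = (1 << self.n) - 1
        self.history[syn] = ((self.history[syn] << 1) | int(spike_in)) & mask
        # the n-to-1 selector picks the bit named by the delay attribute value,
        # i.e. the spike from `delay` update cycles ago
        return (self.history[syn] >> self.delay[syn]) & 1
```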
- FIG. 9 is a block diagram of a dendrite unit according to an embodiment of the present invention.
- For most neural network models, the dendrite unit 703 includes an adder tree 900, which performs the addition of a plurality of input values in one or more stages, and an accumulator 901, which accumulates the output values of the adder tree 900.
- Registers 902 to 904 synchronized with the system clock are further included between the components so that they operate as a pipeline circuit in synchronization with the system clock.
- The soma unit 704 updates the state values using the net input value of the neuron received from the dendrite unit 703 together with the state values held inside the soma unit 704, calculates a new output value, and sends it to the output 708. Because neuron-specific calculations vary greatly between neural network models, the structure of the soma unit 704 is not fixed.
- The synapse-specific calculations of the synaptic unit 702 and the neuron-specific calculations of the soma unit 704 are not standardized across neural network models and may include very complex functions.
- For an arbitrary calculation function, however, the synaptic unit 702 or the soma unit 704 can be designed as a high-speed pipeline circuit capable of processing one input/output every clock cycle by using the following method.
- Step (1): define the input values, the state values, the attribute values, and the output value of the calculation.
- Step (2): express the calculation in pseudo-assembly code.
- The input values defined in step (1) become the input values of the pseudo-assembly code, and the output value becomes its return value.
- Each state value and attribute value is assumed to have a corresponding memory; the first part of the code reads the attribute and state values from memory, and at the end of the code the changed state values are stored back to memory.
- Step (3): in an empty circuit, arrange and connect groups of shift registers, one register in each group for each input value, state value, and attribute value, with as many groups as there are instructions in the assembly code designed in step (2). This arrangement is called a register file.
- Step (4): add, in parallel with the register file, a plurality of dual port memories corresponding to each of the state values and attribute values defined in step (1); connect the data output of the read port of each memory to the input of the corresponding register of the first register group of the register file, and connect the output of the register corresponding to each state value in the last register group of the register file to the data input of the write port of that state value memory.
- Each external input is connected to the input of the corresponding register of the first register group of the register file.
- As an example, consider a synapse function in which, in the absence of a spike, the state value x gradually decays at a rate determined by the magnitude of x and a constant a, and, when a spike arrives, x increases instantaneously by a constant b.
- The input value is a 1-bit spike I, the state value is x, and the attribute values are a and b.
- This function is expressed in assembly code as shown in FIG. 20A. The assembly code includes one conditional statement (2000), a subtraction (2001), a division (2002), and an addition (2003).
- The result of designing a circuit from this assembly code according to the design procedure above is shown in FIG. 20B, and the result after optimization is shown in FIG. 20C.
- Each shift register acts as a pipeline stage operating in synchronization with the clock, so all steps execute in parallel and the circuit has the throughput to process one input and one output per clock cycle.
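- Read back as ordinary code, the function of FIGS. 20A to 20C is roughly the following (the exact decay form x - x/a is an assumption consistent with the subtraction and division listed above):

```python
def synapse_step(I, x, a, b):
    """Example synapse function: decay x toward 0, add b when a spike arrives.

    I is the 1-bit spike input, x the state value, a and b attribute values.
    One call corresponds to one pass through the pipelined circuit.
    """
    x = x - x / a   # subtraction (2001) and division (2002): gradual decay
    if I:           # conditional statement (2000)
        x = x + b   # addition (2003): instantaneous increase by b
    return x        # new state value, written back to the state value memory
```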
- The circuits of the synaptic unit 702, the soma unit 704, or special cases of the dendrite unit 703 may be implemented by a combination of circuits designed as described above.
- The characteristic of such a circuit is that it is implemented as a pipeline circuit (calculation circuit) that takes an arbitrary number of state value memories implemented as dual port memories and an arbitrary number of attribute value memories, sequentially reads data from the read ports of the state value and attribute value memories, takes all or part of that data as input, sequentially calculates new state values and an output value, and sequentially stores all or part of the calculation results back into the state value memories.
- Registers 705 and 706, operating in synchronization with the system clock, are placed between the units so that each unit can operate in a pipelined manner.
- Registers operating in synchronization with the system clock may further be included between all or some of the components constituting each of the units of the calculation subsystem 700, so that the internal structure of the components is also implemented as pipeline circuits operating in synchronization with the system clock.
- In this way, the entire calculation subsystem can be designed as a pipeline circuit operating in synchronization with the system clock.
- The attribute value memories included in the calculation subsystem are memories that are only read while the calculation is in progress.
- The total amount of attribute value memory required in the calculation subsystem can be reduced in the manner shown in FIG. 10.
- One attribute value memory may be composed of a lookup memory 1000, which stores a finite number of attribute values and whose output is connected to the calculation circuit to provide the attribute values, and an attribute value reference number memory 1001, which stores for each entry the reference number of one of the stored attribute values and whose output is connected to the address input of the lookup memory 1000.
- Neuron calculation requires a large amount of computation, and the amount of computation increases further because the state must be updated every short period corresponding to the time scale of the biological neuron.
- Synapse-specific calculations do not require such short calculation cycles, but they end up being performed very many times when the update period of the entire system is matched to the neuron-specific calculations.
- To solve this, a multi-time scale (MTS) method may be used, in which the calculation cycle of the synapses and the calculation cycle of the neurons are set differently. In this method the synapse-specific calculation has a longer update period than the neuron-specific calculation, and the neuron-specific calculation is performed several times while the synapse-specific calculation is performed once, as sketched below.
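- Schematically (illustrative function names, not the hardware interface):

```python
def mts_update(synapse_cycle, neuron_cycle, k):
    """Multi-time scale: one synapse-specific cycle per k neuron-specific cycles."""
    net_input = synapse_cycle()   # long period: weights, axon delays, summation
    for _ in range(k):            # short period: the soma reuses the same
        neuron_cycle(net_input)   # buffered net input value k times
```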
- FIG. 11 is a diagram illustrating a structure of a system using a multi-time scale method according to an embodiment of the present invention.
- In this structure, a dual port memory 1103 is added, which acts as a buffer between the two different neural network update periods.
- Each Y memory of each memory unit 1106 may be implemented as a dual replacement memory using two independent memories 1107 and 1108, as described above. While one synapse-specific calculation cycle is performed and the net input values of the neurons are stored in the dual port memory 1103, the soma unit 1104 reads the net input values of the neurons from the dual port memory 1103 several times and performs the neuron-specific calculations repeatedly.
- That is, the calculation subsystem 1100 sets the neural network update period of the synapse-specific calculation, performed by the synaptic units 1101 and the dendrite unit 1102, differently from the neural network update period of the neuron-specific calculation performed by the soma unit 1104.
- The neural network update cycle that performs the neuron-specific calculations is repeated one or more times while one neural network update cycle that performs the synapse-specific calculations is performed. The net input value calculated once is therefore reused unchanged during several neuron-specific calculations.
- The output of the soma unit 1104 (i.e., the spikes of the neurons) is stored cumulatively in one of the Y memories 1108 while the synapse-specific calculation continues; at the end of the synapse-specific calculation cycle, the two memories 1107 and 1108 reverse roles through the multiplexer circuitry, so that the next synapse-specific calculation resumes based on the accumulated spikes.
- Using such a multi-time scale method reduces the number of synaptic units required and uses the soma unit more efficiently, so that high performance is obtained with the same hardware resources.
- FIG. 12 is a diagram illustrating a structure for calculating a neural network using a learning method as described in Equation 3 according to an embodiment of the present invention.
- Referring to FIG. 12, the synaptic unit 1200 includes a connection line weight memory, which stores the weight values of the connection lines, as one of its state value memories, and an additional input 1211 for receiving the learning state value.
- The soma unit 1201 includes an additional output 1210 for outputting the learning state value, and this output 1210 of the soma unit 1201 is commonly connected to the additional inputs 1211 of all the synaptic units 1200.
- The neural network computing device distributes and stores the reference numbers of the neurons connected to the input connection lines of all neurons in the neural network in the M memories 112 of the plurality of memory units (102, 1202), stores the initial weight value of each input connection line of each neuron in the connection line weight memory of the synaptic units 1200, and may then perform the learning calculation function according to steps a to f below.
- a. The synaptic unit 1200 sequentially calculates new connection line output values by taking as input the output values of the input neurons sequentially transmitted from the memory unit 1202 through its one input 1203 and the connection line weight values sequentially transmitted from the output of the connection line weight memory, and outputs them at the output 1204 of the synaptic unit.
- b. The dendrite unit 1205 sequentially receives the outputs 1204 of the plurality of synaptic units through its input 1206, which consists of a plurality of inputs, sequentially calculates the total sum of the inputs transmitted from all connection lines of each neuron, and outputs it through the output 1207.
- c. The soma unit 1201 sequentially receives the neuron input values from the output 1207 of the dendrite unit through its input 1208, updates the state values of the neurons, sequentially calculates new output values and outputs them through its one output 1209, and at the same time calculates a new learning state value L_j based on the input value and the state values and sequentially outputs it at its other output 1210.
- d. Each of the plurality of synaptic units 1200 sequentially calculates new connection line weight values by taking as input the learning state value L_j sequentially transmitted through its other input 1211, the output values of the input neurons sequentially transmitted through its one input 1203, and the connection line weight values sequentially transmitted from the output of the connection line weight memory, and stores them back in the connection line weight memory.
- The learning state value memory 1212, implemented as a dual port memory, temporarily stores the learning state value on the commonly connected input 1211 and serves to adjust the timing.
- The learning calculation is performed at the time when the output values of the input neurons sequentially transmitted through the one input 1203 of the synaptic unit 1200 and the connection line weight values sequentially transmitted from the output of the connection line weight memory arrive; the learning state value L_j sequentially transmitted through the input 1211 was calculated in the soma unit 1201 during the previous neural network update cycle and is taken from the learning state value memory 1212.
- In the structure of FIG. 13, the learning calculation function may be performed according to steps a to f below.
- a. The synaptic unit 1300 sequentially calculates new connection line output values by taking as input the output values of the input neurons sequentially transmitted from the memory unit 1303 through its one input and the connection line weight values sequentially transmitted from the output of the connection line weight memory 1304, and outputs them at the output of the synaptic unit 1300; at the same time, the output values of the input neurons and the connection line weight values sequentially transmitted from the output of the connection line weight memory 1304 are entered into the two first-in first-out queues (1305, 1306), respectively.
- b. The dendrite unit 1301 sequentially receives the outputs of the plurality of synaptic units 1300 through its input, which consists of a plurality of inputs, sequentially calculates the total sum of the inputs transmitted from all connection lines of each neuron, and outputs it through its output.
- c. The soma unit 1302 sequentially receives the neuron input values from the output of the dendrite unit 1301 through its input, updates the state values of the neurons, sequentially calculates new output values and outputs them through its one output, and at the same time calculates a new learning state value L_j based on the input value and the state values and sequentially outputs it at its other output 1308.
- d. Each of the plurality of synaptic units 1300 sequentially calculates new connection line weight values by taking as input the learning state value L_j sequentially transmitted through the other input 1308 together with the output value of the input neuron and the connection line weight value (1307) delayed by the two queues 1305 and 1306, and stores them in the connection line weight memory 1304.
- In this structure, all data used for learning are values generated during the current update cycle.
- In a method for calculating a neural network that includes bidirectional connection lines, in which forward and reverse calculations are applied to the same connection line as in the back-propagation algorithm, the neural network computing device stores data in the M memories 112 of the plurality of memory units 102 and in the state value memories and attribute value memories of the plurality of synaptic units according to steps a to d below.
- a. For each forward connection line from a neuron A that provides a forward input to a neuron B that receives it, a new reverse connection line from neuron B to neuron A is added to the forward network.
- b. A connection line placement algorithm places the forward connection line and the reverse connection line of each bidirectional connection in the same memory unit and the same synaptic unit.
- c. At each k-th address of the state value memory and the attribute value memory included in each of the plurality of synaptic units, when the connection line at that address is a forward connection line, the connection line state values and attribute values of that connection line are stored.
- FIG. 14 is a diagram illustrating a memory unit in accordance with an embodiment of the present invention.
- When each of the plurality of memory units (102, 1400) accesses the state value memory 1402 and the attribute value memory 1403 of the synaptic unit 1401 and the connection line concerned is a reverse connection line, a reverse connection line reference number memory 1404 supplies the reference number of the forward connection line corresponding to that reverse connection line.
- A digital switch 1406, controlled by the control unit 100, selects between a control signal of the control unit 100 and the data output of the reverse connection line reference number memory 1404, and is connected to the synaptic unit 1401 through the output 1405 of the memory unit 1400; it is used to sequentially select the state values and attribute values of the connection lines.
- When the connection line is a forward connection line, the control signal is provided directly from the control unit without passing through the reverse connection line reference number memory.
- The connection line placement algorithm arranges the connection lines so that the memory unit in which the data of a forward connection line is stored and the memory unit in which the data of the corresponding reverse connection line is stored are the same.
- The algorithm can use an edge coloring algorithm on a graph: every bidirectional connection in the neural network is represented as an edge of the graph, every neuron as a node, and the number of the memory unit in which a connection line is stored as the color of the edge; the forward and reverse connection lines are then placed in the memory unit with the same number.
- An edge coloring algorithm assigns the same color to both sides of an edge and does not assign that color to any other edge incident on the two neurons joined by the edge; this is essentially the same as guaranteeing that the forward and reverse connection lines of a particular connection are assigned the same memory unit number. Therefore, an edge coloring algorithm can be used as the connection line placement algorithm.
- When the structure of the neural network to be calculated is such that all bidirectional connection lines are contained in complete bipartite graphs between pairs of layers, that is, when the connections whose forward and reverse directions share weights connect two groups of neurons such that every neuron of one group is connected to every neuron of the other group, a simpler method than edge coloring can be used.
- In that case, the forward and reverse connection lines of the bidirectional connection from the i-th neuron of one group to the j-th neuron of the other group are both placed in the ((i + j) mod p)-th memory unit.
- Since (i + j) mod p has the same value for the forward and reverse directions, the same memory unit number is assigned to both.
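- A minimal Python sketch of this placement rule (illustrative names; p is the number of memory units):

```python
def place_bipartite(n_group_a, n_group_b, p):
    """Assign each bidirectional connection of a complete bipartite layer pair
    to memory unit (i + j) mod p, identically for forward and reverse lines."""
    placement = {}
    for i in range(n_group_a):
        for j in range(n_group_b):
            unit = (i + j) % p
            placement[("fwd", i, j)] = unit  # forward line: neuron i -> neuron j
            placement[("rev", j, i)] = unit  # reverse line lands in the same unit
    return placement
```

- Note that for a fixed i the unit number (i + j) mod p cycles through the p memory units as j varies, so the connection lines are also spread evenly across the memory units (exactly evenly when the group sizes are multiples of p).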
- FIG. 15 is another exemplary diagram of a memory unit according to an embodiment of the present invention.
- Referring to FIG. 15, each of the plurality of memory units (102, 1500) includes: an M memory 1501 for storing the reference numbers of the neurons connected to the connection lines; a Y1 memory 1502 composed of a dual port memory having two ports, a read port and a write port; a Y2 memory 1503 composed of a dual port memory having two ports, a read port and a write port; and a dual memory replacement (SWAP) circuit 1504 that is controlled by a control signal from the control unit 100 and exchanges the roles of the Y1 memory 1502 and the Y2 memory 1503.
- The first logical dual port 1505 formed by the dual memory replacement circuit 1504 has the address input 1506 of its read port connected to the output of the M memory 1501; the data output 1507 of its read port is the output of the memory unit 1500; and the data input 1508 of its write port is commonly connected with the data inputs of the write ports of the first logical dual ports of the other memory units and is used for storing newly calculated neuron outputs.
- The second logical dual port 1509 formed by the dual memory replacement circuit 1504 has the data input 1510 of its write port commonly connected with the data inputs of the write ports of the second logical dual ports of the other memory units, and can be used to store the values of the input neurons to be used in the next neural network update cycle.
- This structure has the advantage that calculation and the storage of input data can be performed in parallel throughout the entire neural network update cycle.
- This method can be used effectively when the number of input neurons is large, which is a common feature of multi-layer neural networks.
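- The double buffering can be modeled in software as follows (a sketch of the behavior, not the patent's circuit; the interface is illustrative): the control unit swaps the two buffers at each update cycle boundary, so the pipeline reads neuron outputs from one buffer while the next cycle's input data streams into the other.

```python
class SwapMemoryUnit:
    """Software model of a memory unit with a dual memory replacement
    (SWAP) circuit: the two buffers exchange roles every update cycle."""

    def __init__(self, size):
        self.active = [0.0] * size   # read by the calculation pipeline
        self.shadow = [0.0] * size   # collects input for the next cycle

    def read(self, addr):
        return self.active[addr]

    def load_next_input(self, addr, value):
        # proceeds in parallel with the pipeline's reads from `active`
        self.shadow[addr] = value

    def swap(self):
        # issued by the control unit at the update cycle boundary
        self.active, self.shadow = self.shadow, self.active
```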
- FIG. 16 is another exemplary diagram of a memory unit according to an embodiment of the present invention.
- each of the plurality of memory units 102 and 1600 includes: an M memory 1601 for storing the reference numbers of neurons connected to a connection line; a Y1 memory 1602 consisting of a dual port memory having two ports, a read port and a write port; a Y2 memory 1603 likewise consisting of a dual port memory having a read port and a write port; and a dual memory replacement (SWAP) circuit, controlled by a control signal from the control unit 100, that exchanges all inputs and outputs of the Y1 memory 1602 and the Y2 memory 1603.
- the first logical dual port 1605 has the address input 1606 of its read port connected to the output of the M memory 1601, and the data output 1607 of its read port serves as one output of the memory unit 1600; the data input 1608 of its write port is connected in common with the data inputs of the write ports of the first logical dual ports of the other memory units, to store newly calculated neuron outputs.
- the second logical dual port 1609 formed by the dual memory replacement circuit has the address input 1610 of its read port connected to the output of the M memory 1601, and the data output 1611 of its read port connected to the other output of the memory unit 1600, so that the neuron output values of the previous neural network update cycle may be output.
- this structure can simultaneously output the neuron output values of the previous neural network update cycle and of the current neural network update cycle, and can be used effectively when the neural network calculation model needs the neuron outputs of update cycle T and of update cycle T-1 at the same time.
- each of the plurality of memory units includes: an M memory for storing the reference numbers of neurons connected to a connection line; a Y1 memory consisting of a dual port memory having two ports, a read port and a write port; a Y2 memory likewise consisting of a dual port memory having a read port and a write port; a Y3 memory consisting of a dual port memory having a read port and a write port; and a triple memory replacement (SWAP) circuit that, under control of the control unit, cyclically exchanges all inputs and outputs of the Y1 to Y3 memories.
- the first logical dual port formed by the triple memory replacement circuit has the data input of its write port connected in common with the data inputs of the write ports of the first logical dual ports of the other memory units, and stores the values of input neurons to be used in the next neural network update cycle.
- the second logical dual port formed by the triple memory replacement circuit has the address input of its read port connected to the output of the M memory, the data output of its read port serving as one output of the memory unit, and the data input of its write port connected in common with the data inputs of the write ports of the second logical dual ports of the other memory units, to store newly calculated neuron outputs.
- the third logical dual port formed by the triple memory replacement circuit has the address input of its read port connected to the output of the M memory, and the data output of its read port connected to another output of the memory unit, outputting the neuron output values of the previous neural network update cycle.
- This method combines the methods shown in Figs. 15 and 16 described above, and can be used when the input of input data, the calculation, and the learning process based on the values of previous neurons all occur simultaneously.
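- A software model of this triple rotation (illustrative names, not the patent's interfaces): three buffers cycle through the roles of input staging, current outputs, and previous outputs, so input loading, calculation, and learning can overlap within one update cycle.

```python
class TripleSwapMemoryUnit:
    """Software model of a memory unit with a triple memory replacement
    (SWAP) circuit rotating three buffers each update cycle."""

    def __init__(self, size):
        self.staging = [0.0] * size    # next cycle's input neurons
        self.current = [0.0] * size    # outputs being read/written now
        self.previous = [0.0] * size   # last cycle's outputs, for learning

    def rotate(self):
        # at the cycle boundary: current becomes previous, the staged
        # inputs become current, and the old previous buffer is reused
        # as the new staging area
        self.staging, self.current, self.previous = (
            self.previous, self.staging, self.current)
```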
- in the method for calculating the backpropagation neural network algorithm, each synapse unit has a connection line weight memory, which stores the weight values of connection lines, as one of its state value memories, and further has another input for receiving a learning state value.
- the soma unit further includes a learning temporary value memory for temporarily storing a learning temporary value, another input for receiving learning data, and another output for outputting the learning state value.
- the calculation subsystem further includes a learning state value memory, connected at its input side to the other output of the soma unit and at its output side in common to the other inputs of the synapse units, which temporarily stores the learning state value and adjusts timing.
- in a method for computing the backpropagation neural network learning algorithm, the neural network computing device distributes and stores, for each of the one or plurality of hidden layers and the output layer of the forward network and each of the one or plurality of hidden layers of the reverse network, the reference numbers of the neurons connected to the input connection lines of each neuron included in the corresponding layer in a specific address range of the first memories of the plurality of memory units, stores the initial values of the connection line weights of the input connection lines of each neuron in the connection line weight memories of the plurality of synapse units, and performs the calculation according to steps a to e below.
- the second memory of each of the plurality of memory units comprises two dual port memories forming two logical dual port memories through a dual memory replacement circuit, and the input data to be used in the next neural network update cycle is prestored in the second logical dual port memory, so that step a above and steps b to e above can be performed in parallel.
- the soma unit 704 in the calculation subsystem 106 calculates a learning temporary value when performing step b above and stores it in the learning temporary value memory for safekeeping until the learning state value L_j is calculated later.
- the soma unit 704 in the calculation subsystem 106 may perform the calculation of the error values of the output neurons of step c above together with the forward propagation of step b above, to shorten the calculation time.
- the soma unit 704 in the calculation subsystem 106 calculates the learning state value L_j after calculating the error value of each neuron in steps c and d above, and outputs it through its other output to be stored in the learning state value memory; the learning state value L_j stored in the learning state value memory is used to calculate the connection line weight value W_ij in step e.
- the Y memory of each of the plurality of memory units 102 has two dual port memories forming two logical dual port memories through a dual memory replacement circuit; the second logical dual port memory can output the neuron output values of the previous neural network update cycle to the other output of the memory unit, so that step e above and step b of the next neural network update cycle can be performed simultaneously, reducing the calculation time.
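- For orientation, steps a to e above follow the conventional backpropagation sequence, sketched below in NumPy; this models the algorithm only (the patent executes it as a time-division hardware pipeline), and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_cycle(x, target, weights, lr=0.1):
    """One update cycle: forward pass (step b), output error (step c),
    error propagation (step d) and weight adjustment (step e).
    weights is a list of layer matrices, modified in place."""
    activations = [x]                         # learning temporary values
    for W in weights:                         # step b: forward pass
        activations.append(sigmoid(activations[-1] @ W))
    out = activations[-1]
    delta = (target - out) * out * (1 - out)  # step c: output error
    for layer in range(len(weights) - 1, -1, -1):
        grad = np.outer(activations[layer], delta)   # uses the L_j values
        if layer > 0:                         # step d: propagate error
            a = activations[layer]
            delta = (delta @ weights[layer].T) * a * (1 - a)
        weights[layer] += lr * grad           # step e: adjust W_ij
    return out
```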
- a method for performing the learning calculation of a deep belief network stores, for each of the RBM stages 1, 2 and 3 of each RBM, the reference numbers of the neurons connected to the input connection lines of each neuron included in the corresponding stage, distributed and accumulated in a specific address range of the first memories of the plurality of memory units; accumulates the reverse connection line information of RBM stage 2 in the reverse connection line reference number memory; and accumulates the initial connection line weight values of the input connection lines of each neuron in the connection line weight memories of the plurality of synapse units.
- the calculation subsystem takes the Y(S) area of the second memory of the memory unit as input, performs the calculation of RBM stage 1, and stores the calculation result (hpos) in the Y(D) area of the second memory of the memory unit.
- Processes c3 to c6 may be performed simultaneously in one process.
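- The c1 to c4 flow corresponds loosely to one contrastive-divergence (CD-1) step of RBM training; the sketch below follows that reading (the mapping of stages to passes is our interpretation, and all names are illustrative).

```python
import numpy as np

def cd1_step(v_pos, W, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 step for an RBM with weight matrix W (visible x hidden)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    h_pos = sigmoid(v_pos @ W)                   # up pass: hpos (c1)
    h_sample = rng.random(h_pos.shape) < h_pos   # stochastic hidden states
    v_neg = sigmoid(h_sample @ W.T)              # down pass: reconstruction (c2)
    h_neg = sigmoid(v_neg @ W)                   # second up pass (c3)
    W += lr * (np.outer(v_pos, h_pos) - np.outer(v_neg, h_neg))  # c4
    return v_neg, h_neg
```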
- the neural network computing device further includes an offset circuit at the address input of each of one or a plurality of memories in the memory unit or the calculation subsystem, which presents to the memory the accessed address value plus a specified offset value, so that the control unit can easily change the access range of the memory.
- to simplify control, the control unit is provided with a Stage Operation Table (SOT) containing the information necessary for generating the control signals of each control step, and operates the system by reading one record of the SOT at each control step.
- the SOT is composed of a plurality of records, each of which contains the various system parameters required to perform one calculation procedure, such as the offset of each memory and the size of the network; some records contain the identifier of another record and act as a GO TO statement.
- the system reads the system parameters from the current record of the SOT to set up the system and then moves the current record pointer to the next sequential record; if the current record is a GO TO record, it moves not to the sequential record but to the record whose identifier is contained in the current record.
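- A minimal software model of such an interpreter (record fields and function names are illustrative, not the patent's record format):

```python
def run_sot(sot, max_steps=10000):
    """sot: list of dict records, e.g. {"offset": 0, "size": 1024}
    for a calculation step or {"goto": 2} for a GO TO record."""
    pc = 0                      # current record pointer
    for _ in range(max_steps):  # guard, since GO TO records can loop
        if pc >= len(sot):
            return
        record = sot[pc]
        if "goto" in record:
            pc = record["goto"]       # jump to the named record
            continue
        configure_step(record)        # set offsets, network size, etc.
        pc += 1                       # otherwise, next sequential record

def configure_step(record):
    print("control step parameters:", record)

# e.g. records 0..1 run repeatedly until the guard stops the loop:
run_sot([{"offset": 0, "size": 8}, {"offset": 8, "size": 8}, {"goto": 0}],
        max_steps=6)
```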
- FIG. 17 is an exemplary diagram of a neural network computing system according to an embodiment of the present invention.
- the neural network computing system includes: a control unit 1700 for controlling the neural network computing system; a plurality of network subsystems 1702, each consisting of a plurality of memory units 1701; a plurality of calculation subsystems 1703, each of which calculates and outputs the output values of new post-synaptic neurons using the output values of the pre-synaptic neurons delivered from the plurality of memory units 1701 included in one of the plurality of network subsystems 1702; and a multiplexer 1706, provided between the outputs 1704 of the plurality of calculation subsystems 1703 and the input signal 1705 to which the feedback inputs of all memory units 1701 are commonly connected, which multiplexes the outputs 1704.
- each of the plurality of memory units 1701 in the network subsystems 1702 has the same structure as the memory unit 102 of the single system described above, and each outputs the output values of pre-synaptic neurons.
- data appears at the output 1704 of each of the plurality of calculation subsystems 1703 once every n clock cycles; therefore, when the multiplexer 1706 multiplexes the outputs of the calculation subsystems 1703, up to n calculation subsystems 1703 can be multiplexed without overflow, and the multiplexed data can be stored in the Y memories of all memory units 1701 in all network subsystems 1702.
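- The timing argument can be sketched as follows (an illustrative model): with one result per subsystem per n-clock window, interleaving up to n subsystems exactly fills the shared feedback bus.

```python
def multiplex(per_subsystem_results, n):
    """per_subsystem_results: one list per calculation subsystem, with
    one result per n-clock window; returns the merged bus sequence."""
    assert len(per_subsystem_results) <= n, "more than n subsystems overflow"
    bus = []
    for window in zip(*per_subsystem_results):
        bus.extend(window)   # one bus slot per subsystem in each window
    return bus

# with n=3: multiplex([["a0", "a1"], ["b0", "b1"], ["c0", "c1"]], 3)
# yields ["a0", "b0", "c0", "a1", "b1", "c1"] at full clock rate.
```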
- the control unit 100 includes a plurality of shift registers 1800 connected in a row; by sequentially changing only the signal of the first register 1801, control signals that change in the same order but with successive time differences are obtained, matching the timing of each pipeline stage.
- the memory structure combining a plurality of computing devices described in an embodiment of the present invention may be utilized not only in neural network computing systems but also in a multiprocessor computing system including a plurality of general-purpose processors.
- FIG. 19 is a block diagram of a multiprocessor computing system according to another embodiment of the present invention.
- a multiprocessor computing system includes a control unit 1900 for controlling the multiprocessor computing system, and a plurality of processor subsystems 1901, each of which calculates a portion of the total computation and outputs a portion of its calculation results to share with the other processors.
- each processor subsystem includes one processor device 1902, which calculates a portion of the total computation and outputs a portion of its calculation results to share with other processors, and one memory group 1903, which performs the communication function between the different processors; the memory group 1903 includes N dual port memories 1904, each having a read port and a write port, and a decoder circuit (not shown) that integrates the read ports of the dual port memories 1904 so that they function as an integrated memory 1905 of N times the capacity of each memory, each memory occupying a portion of the total capacity.
- the integrated memory 1905, formed by the decoder circuit of the memory group, has a bundle 1906 of address inputs and data outputs directed to the processor device 1902, so that the stored results are always accessible by the processor device 1902; the write ports 1907 of the N dual port memories are respectively connected to the outputs 1908 of the N processor subsystems 1901.
- when the processor device 1902 in any of the processor subsystems 1901 produces data that needs to be shared with other processor devices, it places the data on its output 1908; the output data is stored in all processor subsystems 1901 through the write port 1907 of one of the dual port memories 1904 in the memory group 1903, and can immediately be accessed through the read port of the memory group on all other processor subsystems.
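- A software model of this write-broadcast scheme (illustrative names; the real fabric is N dual port memories plus a decoder circuit in every subsystem):

```python
class SharedMemoryFabric:
    """Every subsystem holds one bank per processor; a write by
    processor k lands in bank k of every subsystem, and each processor
    reads its own subsystem's banks as one integrated memory."""

    def __init__(self, num_processors, mem_size):
        self.mem_size = mem_size
        self.banks = [[[0] * mem_size for _ in range(num_processors)]
                      for _ in range(num_processors)]

    def write(self, processor_k, addr, value):
        # processor k's output drives the write port of memory k in
        # every subsystem, replicating the data everywhere at once
        for subsystem in self.banks:
            subsystem[processor_k][addr] = value

    def read(self, processor_s, global_addr):
        # the decoder circuit maps the integrated address to bank+offset
        bank, offset = divmod(global_addr, self.mem_size)
        return self.banks[processor_s][bank][offset]
```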
- each processor subsystem 1901 further includes a local memory 1909 used independently by the processor device; the space of the local memory and the space of the memory accessible through the read port 1906 of the memory group are mapped into one memory map, so that a program of the processor device 1902 can access the data of the local memory 1909 and the data of the integrated memory (the shared memory written by other systems) without distinction.
- in an image processing system that processes an image represented by a combination of a plurality of pixels of a two-dimensional screen, each processor subsystem computes a portion of the two-dimensional screen.
- an image processing algorithm applies a series of filter functions to a raw image, and each pixel value of the n-th filtered image is used to calculate the (n+1)-th filtered image.
- since the calculation of a particular pixel takes as input the neighboring pixels at that position in the previously filtered screen, each processor subsystem needs the pixel values calculated by other processor subsystems in order to calculate the edge pixels of the screen area it is responsible for; because each processor subsystem shares its calculation results with the other processor subsystems through the memory group, each processor subsystem can perform the calculation without a separate hardware device for communication and without any delay time for communication.
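- A minimal sketch of the stripe-per-processor arrangement, assuming the image height divides evenly among the processors (NumPy; the stripe split and the 3x3 box filter are illustrative, not the patent's algorithm):

```python
import numpy as np

def filter_pass(image, num_procs):
    """Each of num_procs subsystems filters one horizontal stripe; edge
    rows of a stripe use pixel values computed by the neighbouring
    subsystem, read directly from the shared memory group."""
    stripe = image.shape[0] // num_procs
    out = np.zeros_like(image, dtype=float)
    for p in range(num_procs):           # conceptually concurrent
        for i in range(p * stripe, (p + 1) * stripe):
            for j in range(image.shape[1]):
                i0, i1 = max(i - 1, 0), min(i + 2, image.shape[0])
                j0, j1 = max(j - 1, 0), min(j + 2, image.shape[1])
                out[i, j] = image[i0:i1, j0:j1].mean()
    return out
```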
- the present invention can be used in the field of digital neural network computing technology and the like.
Claims (56)
- A neural network computing device, comprising: a control unit for controlling the neural network computing device; a plurality of memory units, each of which outputs the output values of pre-synaptic neurons using a dual port memory; and one calculation subsystem for calculating the output values of new post-synaptic neurons using the output values of the pre-synaptic neurons input from the plurality of memory units and feeding them back to each of the plurality of memory units.
- The neural network computing device of claim 1, wherein each of the plurality of memory units includes: a first memory for storing reference numbers of pre-synaptic neurons; and a second memory, consisting of the dual port memory having a read port and a write port, for storing the output values of neurons.
- The neural network computing device of claim 2, wherein the device distributes and stores the reference numbers of the neurons connected to the input connection lines of all neurons in the neural network in the first memories of the plurality of memory units, and performs the calculation function according to steps a to d below: a. sequentially changing the value of the address input of the first memory of each of the plurality of memory units so that the reference numbers of the neurons connected to the input connection lines of neurons are output in sequence at the data output of the first memory; b. sequentially outputting, at the data output of the read port of the second memory of each of the plurality of memory units, the output values of the neurons connected to the input connection lines of neurons, and feeding them through the output of each memory unit into the plural inputs of the calculation subsystem; c. sequentially calculating the output values of new post-synaptic neurons in the calculation subsystem; d. sequentially storing the output values of the post-synaptic neurons calculated by the calculation subsystem through the write ports of the second memories of the plurality of memory units.
- The neural network computing device of claim 3, wherein the device distributes and stores the reference numbers of the neurons connected to the input connection lines of all neurons in the neural network in the first memories of the plurality of memory units according to processes a to f below: a. finding the number of input connection lines (Pmax) of the neuron having the largest number of input connection lines in the neural network; b. when the number of memory units is p, adding to each neuron virtual connection lines, which do not affect adjacent neurons regardless of which neuron is connected, so that every neuron in the neural network has … connection lines; c. sorting all neurons in the neural network in an arbitrary order and assigning serial numbers; e. assigning serial numbers k in order from the first connection line bundle of the first neuron to the last connection line bundle of the last neuron; f. storing, at the k-th address of the first memory of the i-th memory unit among the plurality of memory units, the reference number of the pre-synaptic neuron connected to the i-th connection line of the k-th connection line bundle.
- The neural network computing device of claim 2, wherein the device distributes and accumulates, for each of one or a plurality of hidden layers and an output layer, the reference numbers of the neurons connected to the input connection lines of each neuron included in the corresponding layer in a specific address range of the first memories of the plurality of memory units, and calculates a neural network composed of a multi-layer network according to steps a and b below: a. storing input data in the second memories of the plurality of memory units as the values of the neurons of the input layer; b. for each of the hidden layers and the output layer, calculating sequentially from the layer connected to the input layer up to the output layer according to processes b1 to b4 below: b1. sequentially changing the value of the address input of the first memories of the plurality of memory units within the address range of the corresponding layer so that the reference numbers of the neurons connected to the input connection lines of the neurons in the corresponding layer are output in sequence at the data output of the first memory; b2. sequentially outputting, at the data output of the read port of the second memories of the plurality of memory units, the output values of the neurons connected to the input connection lines of the neurons in the corresponding layer; b3. sequentially calculating the new output values of all neurons in the corresponding layer in the calculation subsystem; b4. sequentially storing the output values of the neurons calculated by the calculation subsystem through the write ports of the second memories of the plurality of memory units.
- The neural network computing device of claim 5, wherein, to calculate the neural network composed of the multi-layer network, the device repeatedly performs processes a to f below for each of the one or plurality of hidden layers and the output layer in the multi-layer network, distributing and accumulating the reference numbers of neurons in a specific address range of the first memories of the plurality of memory units: a. finding the number of input connection lines (Pmax) of the neuron having the largest number of input connection lines within the corresponding layer; b. when the number of memory units is p, adding to each neuron virtual connection lines, which do not affect adjacent neurons regardless of which neuron is connected, so that every neuron in the corresponding layer has … connection lines; c. sorting the neurons in the corresponding layer in an arbitrary order and assigning serial numbers; e. assigning serial numbers k in order from the first connection line bundle of the first neuron in the corresponding layer to the last connection line bundle of the last neuron; f. storing, at the k-th address within the specific address range for the corresponding layer in the first memory of the i-th memory unit, the reference number of the neuron connected to the i-th connection line of the k-th connection line bundle.
- The neural network computing device of claim 1, wherein the dual port memory includes a physical dual port memory having a logic circuit that allows one memory to be accessed simultaneously in the same clock cycle.
- The neural network computing device of claim 1, wherein the dual port memory includes two input/output ports that access one memory in a time-division manner in different clock cycles.
- The neural network computing device of claim 1, wherein the dual port memory includes a dual memory replacement (SWAP) circuit that has two identical physical memories inside and connects all inputs and outputs of the two identical physical memories interchangeably using a plurality of switches controlled by a control signal from the control unit.
- The neural network computing device of claim 1, wherein the calculation subsystem includes, from among a plurality of synapse units for receiving the outputs of the corresponding plurality of memory units and performing synapse-specific calculations, a dendrite unit for receiving the outputs of the plurality of synapse units and calculating the total sum of the inputs delivered over all connection lines of a neuron, and a soma unit for receiving the output of the dendrite unit, updating the state values of a neuron and calculating a new output value: either the plurality of synapse units and the soma unit, or the plurality of synapse units, the dendrite unit and the soma unit.
- The neural network computing device of claim 10, wherein each of the plurality of synapse units includes: an axon delay unit for delaying the signal delivered to the input of a connection line according to an attribute value of the connection line; and a synapse potential unit for adjusting the strength of the signal passing through the connection line according to the state values of the connection line, including the connection line weight.
- The neural network computing device of claim 11, wherein the axon delay unit includes: an axon delay state value memory for storing the axon delay state value of a connection line; a shift register that receives a 1-bit input and the n-bit output of the axon delay state value memory as an (n+1)-bit wide input, and outputs to the axon delay state value memory an n-bit output of the (n+1) bits that includes the output corresponding to the 1-bit input; an axon delay attribute value memory for storing the axon delay attribute value of the connection line; and a bit selector for selecting one of the n bits from the shift register according to the output of the axon delay attribute value memory.
- The neural network computing device of claim 1, wherein the calculation subsystem includes one or more of: a state value memory for storing state values; and a calculation circuit that takes the data read sequentially from the output of the state value memory as all or part of its input, sequentially calculates new state values, and sequentially stores all or part of the calculation results in the state value memory.
- The neural network computing device of claim 13, wherein the state value memory includes a physical dual port memory having a logic circuit that allows one memory to be accessed simultaneously in the same clock cycle.
- The neural network computing device of claim 1, wherein the calculation subsystem includes one or more of: a lookup memory for storing a plurality of attribute values and providing attribute values to a calculation circuit; and an attribute value reference number memory for storing a plurality of attribute value reference numbers and providing attribute value reference numbers to the lookup memory.
- The neural network computing device of claim 10, wherein the calculation subsystem sets the neural network update cycle of the synapse-specific calculations performed in the synapse units and the dendrite unit differently from the neural network cycle of the neuron-specific calculations performed in the soma unit, so that the neural network update cycle performing the neuron-specific calculations is repeated one or more times while the neural network update cycle performing the synapse-specific calculations proceeds once.
- The neural network computing device of claim 16, wherein the calculation subsystem further includes, between the dendrite unit and the soma unit, a dual port memory performing a buffering function between the different neural network update cycles.
- The neural network computing device of claim 16, wherein each of the plurality of memory units includes a dual memory replacement (SWAP) circuit that has two identical physical memories inside and connects all inputs and outputs of the two identical physical memories interchangeably using a plurality of switches controlled by a control signal from the control unit.
- The neural network computing device of claim 10, wherein each of the plurality of synapse units has a connection line weight memory, which stores the weight values of connection lines, as one of its state value memories, and further has an input for receiving a learning state value; the soma unit further has an output for outputting the learning state value; and the calculation subsystem further has a connection line connected in common from the output of the soma unit to each of the inputs of the plurality of synapse units.
- The neural network computing device of claim 19, wherein the device distributes and stores the reference numbers of the neurons connected to the input connection lines of all neurons in the neural network in the first memories of the plurality of memory units, stores the initial values of the connection line weights of the input connection lines of all neurons in the connection line weight memories of the plurality of synapse units, and performs the learning calculation according to steps a to f below: a. sequentially outputting, from the plurality of memory units, the values of the neurons connected to the input connection lines of all neurons; b. each of the plurality of synapse units sequentially calculating and outputting new connection line output values from the output values of input neurons delivered sequentially from the corresponding memory unit through one input and the connection line weight values delivered sequentially from the connection line weight memory; c. the dendrite unit sequentially receiving the connection line outputs from the plurality of synapse units and sequentially calculating and outputting the total sum of the inputs delivered over all connection lines of each neuron; d. the soma unit sequentially receiving the input values of neurons from the dendrite unit, updating the state values of the neurons, sequentially calculating and outputting new output values, and sequentially calculating and outputting new learning state values based on the input values and the state values; e. each of the plurality of synapse units sequentially calculating new connection line weight values from the learning state values delivered sequentially through the other input, the output values of input neurons delivered sequentially through the one input, and the connection line weight values delivered sequentially from the connection line weight memory, and storing them in the connection line weight memory; f. sequentially storing the values output from the soma unit through the write ports of the second memories of the plurality of memory units.
- The neural network computing device of claim 19, wherein the calculation subsystem further includes a learning state value memory, provided between the output of the soma unit and each of the inputs of the plurality of synapse units, for temporarily storing the learning state value and adjusting timing.
- The neural network computing device of claim 21, wherein the learning state value memory includes a physical dual port memory having a read port and a write port, the write port being connected to the output of the soma unit, and the read port being connected in common with each of the inputs of the plurality of synapse units.
- The neural network computing device of claim 19, wherein each of the plurality of synapse units includes two first-in first-out queues for respectively delaying the output values of input neurons delivered sequentially from the corresponding memory unit and the connection line weight values delivered sequentially from the connection line weight memory, so as to align their timing with the output of the learning state value from the soma unit.
- The neural network computing device of claim 10, wherein, to calculate a neural network including one or more bidirectional connections, in which forward calculation and backward calculation are applied simultaneously to the same connection line, the device stores data in the first memories of the plurality of memory units and in the state value memories and attribute value memories of the plurality of synapse units according to processes a to d below: a. for every bidirectional connection line, where A is the neuron providing the forward input and B is the neuron receiving the forward input, adding a new reverse connection line from neuron B to neuron A to the forward network, to construct an unfolded network; b. distributing the input connection line information of all neurons in the unfolded network across the plurality of memory units and the plurality of synapse units by using a connection line placement algorithm to place the forward connection line and the reverse connection line of each bidirectional connection line in the same memory unit and synapse unit; c. storing, at the k-th address of each state value memory and attribute value memory included in each of the plurality of synapse units, the connection line state value and connection line attribute value of the corresponding connection line when that connection line is a forward connection line; d. when accessing the state values and attribute values of connection lines stored in the state value memories and attribute value memories of the plurality of synapse units, accessing the k-th address of the corresponding memory if the k-th connection line is a forward connection line, and accessing the state value and attribute value of the forward connection line corresponding to the reverse connection line if it is a reverse connection line, so that the forward and reverse connection lines share the same state values and attribute values.
- The neural network computing device of claim 24, wherein each of the plurality of memory units further includes: a reverse connection line reference number memory for storing the reference numbers of the forward connection lines corresponding to reverse connection lines; and a switch, controlled by the control unit, that selects between the control signal of the control unit and the data output of the reverse connection line reference number memory and outputs the selection to the corresponding synapse unit, used to sequentially select the state values and attribute values of connection lines.
- The neural network computing device of claim 24, wherein the connection line placement algorithm uses an edge coloring algorithm on a graph, expressing every bidirectional connection line of the neural network as an edge of the graph, every neuron of the neural network as a node of the graph, and the number of the memory unit in which a connection line is stored as a color of the graph, thereby placing the forward and reverse connection lines at the same memory unit number.
- The neural network computing device of claim 24, wherein, when all bidirectional connection lines are included in a complete bipartite graph, the connection line placement algorithm places the forward connection line and the reverse connection line of each bidirectional connection line connecting the i-th neuron of one group with the j-th neuron of the other group at the memory unit numbered (i+j) mod p.
- The neural network computing device of claim 1, wherein each of the plurality of memory units includes: a first memory for storing the reference numbers of neurons connected to connection lines; a second memory consisting of the dual port memory having a read port and a write port; a third memory consisting of the dual port memory having a read port and a write port; and a dual memory replacement (SWAP) circuit, consisting of a plurality of switches controlled by a control signal from the control unit, that exchanges all inputs and outputs of the second memory and the third memory.
- The neural network computing device of claim 28, wherein the first logical dual port formed by the dual memory replacement circuit has the address input of its read port connected to the output of the first memory, the data output of its read port serving as the output of the memory unit, and the data input of its write port connected in common with the data inputs of the write ports of the first logical dual ports of the other memory units, to store newly calculated neuron outputs; and the second logical dual port formed by the dual memory replacement circuit has the data input of its write port connected in common with the data inputs of the write ports of the second logical dual ports of the other memory units, to store the values of input neurons to be used in the next neural network update cycle.
- The neural network computing device of claim 28, wherein the first logical dual port formed by the dual memory replacement circuit has the address input of its read port connected to the output of the first memory, the data output of its read port serving as one output of the memory unit, and the data input of its write port connected in common with the data inputs of the write ports of the first logical dual ports of the other memory units, to store newly calculated neuron outputs; and the second logical dual port formed by the dual memory replacement circuit has the address input of its read port connected to the output of the first memory, and the data output of its read port connected to the other output of the memory unit, to output the neuron output values of the previous neural network update cycle.
- The neural network computing device of claim 1, wherein each of the plurality of memory units includes: a first memory for storing the reference numbers of neurons connected to connection lines; a second memory consisting of the dual port memory having a read port and a write port; a third memory consisting of the dual port memory having a read port and a write port; a fourth memory consisting of the dual port memory having a read port and a write port; and a triple memory replacement (SWAP) circuit, consisting of a plurality of switches controlled by a control signal from the control unit, that sequentially rotates the connections of all inputs and outputs of the second to fourth memories.
- The neural network computing device of claim 31, wherein the first logical dual port formed by the triple memory replacement circuit has the data input of its write port connected in common with the data inputs of the write ports of the first logical dual ports of the other memory units, to store the values of input neurons to be used in the next neural network update cycle; the second logical dual port formed by the triple memory replacement circuit has the address input of its read port connected to the output of the first memory, the data output of its read port serving as one output of the memory unit, and the data input of its write port connected in common with the data inputs of the write ports of the second logical dual ports of the other memory units, to store newly calculated neuron outputs; and the third logical dual port formed by the triple memory replacement circuit has the address input of its read port connected to the output of the first memory, and the data output of its read port connected to the other output of the memory unit, to output the neuron output values of the previous neural network update cycle.
- The neural network computing device of claim 10, wherein each of the plurality of synapse units has a connection line weight memory, which stores the weight values of connection lines, as one of its state value memories, and further has an input for receiving a learning state value; the soma unit further has a learning temporary value memory for temporarily storing a learning temporary value, an input for receiving learning data, and an output for outputting the learning state value; and the calculation subsystem further includes a learning state value memory, provided between the output of the soma unit and each of the inputs of the plurality of synapse units, for temporarily storing the learning state value and adjusting timing.
- The neural network computing device of claim 33, wherein the device distributes and stores, for each of the one or plurality of hidden layers and the output layer of the forward network and each of the one or plurality of hidden layers of the reverse network, the reference numbers of the neurons connected to the input connection lines of each neuron included in the corresponding layer in a specific address range of the first memories of the plurality of memory units, stores the initial values of the connection line weights of the input connection lines of all neurons in the connection line weight memories of the plurality of synapse units, and calculates the backpropagation neural network learning algorithm according to steps a to e below: a. storing input data in the second memories of the plurality of memory units as the values of the neurons of the input layer; b. performing the multi-layer forward calculation sequentially from the layer connected to the input layer up to the output layer; c. for each neuron of the output layer, calculating in the soma unit the difference, i.e., the error value, between the learning data input through the input of the soma unit and the newly calculated output value of the neuron; d. for each layer of the reverse network of the one or plurality of hidden layers, sequentially propagating the error values from the layer connected to the output layer down to the layer connected to the input layer; e. for each of the one or plurality of hidden layers and the one output layer, adjusting the weight values of the connection lines connected to each neuron, from the layer connected to the input layer up to the output layer.
- The neural network computing device of claim 34, wherein the second memory of each of the plurality of memory units has two dual port memories forming two logical dual port memories through a dual memory replacement circuit, and the device prestores the input data to be used in the next neural network update cycle in the second logical dual port memory, performing step a and steps b to e in parallel.
- The neural network computing device of claim 34, wherein the soma unit calculates a learning temporary value when performing step b and stores it in the learning temporary value memory for safekeeping until the learning state value is calculated later.
- The neural network computing device of claim 34, wherein the soma unit performs the calculation of the error values of the output neurons of step c together with the forward propagation of step b, to shorten the calculation time.
- The neural network computing device of claim 34, wherein the soma unit calculates the learning state value after calculating the error value of each neuron in steps c and d, outputs it through its output, and stores it in the learning state value memory; and the learning state value stored in the learning state value memory is used to calculate the weight values of connection lines in step e.
- The neural network computing device of claim 34, wherein the second memory of each of the plurality of memory units has two dual port memories forming two logical dual port memories through a dual memory replacement circuit, and the second of the two logical dual port memories outputs the neuron output values of the previous neural network update cycle to the output of the memory unit, so that step e and step b of the next neural network update cycle are performed simultaneously to shorten the calculation time.
- The neural network computing device of claim 10, wherein, for each of the RBM stages 1, 2 and 3 of each RBM (Restricted Boltzmann Machine), the device distributes and accumulates the reference numbers of the neurons connected to the input connection lines of each neuron included in the corresponding stage in a specific address range of the first memories of the plurality of memory units, accumulates the reverse connection line information of RBM stage 2 in a reverse connection line reference number memory, accumulates the initial values of the connection line weights of the input connection lines of all neurons in the connection line weight memories of the plurality of synapse units, divides the area of the second memory into three parts called areas Y(1), Y(2) and Y(3), and performs the calculation procedure for learning one item of learning data according to steps a to c below: a. storing the learning data in area Y(1); b. setting the variables S=1 and D=2; c. performing processes c1 to c6 below for each RBM in the neural network: c1. the calculation subsystem taking area Y(S) of the second memory of the memory unit as input, performing the calculation of RBM stage 1, and storing the calculation result (hpos) in area Y(D) of the second memory; c2. taking area Y(D) as input, performing the calculation of RBM stage 2, and storing the calculation result in area Y(3); c3. taking area Y(3) as input and performing the calculation of RBM stage 2; c4. adjusting the values of all connection lines; c5. exchanging the values of the variables S and D; c6. if the current RBM is the last RBM, storing the next learning data in area Y(1).
- The neural network computing device of claim 1, further comprising an offset circuit at the address input of each of one or a plurality of memories in the memory unit or the calculation subsystem, which designates as the memory address the accessed address value plus a specified offset value, so that the control unit can easily change the access range of the memory.
- The neural network computing device of claim 1, wherein the control unit is provided with a Stage Operation Table (SOT) containing the information necessary for generating the control signals of each control step, and reads one record of the SOT at each control step to use in operating the system.
- The neural network computing device of claim 42, wherein the SOT includes a "GO TO" record instructing a move not to the sequential record but to the record identifier contained in the record.
- A neural network computing system, comprising: a control unit for controlling the neural network computing system; a plurality of network subsystems, each consisting of a plurality of memory units that each output the output values of pre-synaptic neurons using a dual port memory; and a plurality of calculation subsystems, each of which calculates the output values of new post-synaptic neurons using the output values of the pre-synaptic neurons input from the plurality of memory units included in one of the plurality of network subsystems and feeds them back to each of the plurality of network subsystems.
- The neural network computing system of claim 44, further comprising a multiplexer, provided between the outputs of the plurality of calculation subsystems and the input to which the feedback inputs of the plurality of memory units of the plurality of network subsystems are commonly connected, for multiplexing the outputs of the plurality of calculation subsystems.
- The neural network computing system of claim 44, wherein the control unit generates, using a plurality of shift registers connected in a row, control signals that change in the same order with time differences, and supplies them to the address inputs of each memory in the neural network computing system.
- A multiprocessor computing system, comprising: a control unit for controlling the multiprocessor computing system; and a plurality of processor subsystems, each calculating a portion of the total computation and outputting a portion of its calculation results to share with other processors, wherein each of the plurality of processor subsystems includes: one processor that calculates a portion of the total computation and outputs a portion of its calculation results to share with the other processors; and one memory group that performs the communication function between the processor and the other processors.
- The multiprocessor computing system of claim 47, wherein the memory group includes: a plurality (N) of dual port memories, each having a read port and a write port; and a decoder circuit that integrates the read ports of the plurality of dual port memories so that they function as an integrated memory of N times the capacity, in which each dual port memory occupies a portion of the total capacity.
- The multiprocessor computing system of claim 48, wherein the integrated memory integrated by the decoder circuit has its address input and data output connected to the corresponding processor so as to be accessible by that processor at all times, and the write ports of the plurality of dual port memories are respectively connected to the outputs of the plurality of processors.
- The multiprocessor computing system of claim 48, wherein some of the plurality of dual port memories are implemented as virtual memories to which no physical memory is allocated.
- The multiprocessor computing system of any one of claims 47 to 50, wherein each of the plurality of processor subsystems further includes a local memory used independently by the processor, and the space of the memory accessible through the read port of the memory group and the read space of the local memory are integrated into one memory space, so that a program of the processor accesses the data of the local memory and of the memory group without distinction.
- A memory device, comprising: a first memory for storing reference numbers of pre-synaptic neurons; and a second memory, consisting of a dual port memory having a read port and a write port, for storing the output values of neurons.
- The memory device of claim 52, wherein the dual port memory includes a physical dual port memory having a logic circuit that allows one memory to be accessed simultaneously in the same clock cycle.
- The memory device of claim 52, wherein the dual port memory includes two input/output ports that access one memory in a time-division manner in different clock cycles.
- The memory device of claim 52, wherein the dual port memory includes a dual memory replacement (SWAP) circuit that has two identical physical memories inside and connects all inputs and outputs of the two identical physical memories interchangeably using a plurality of switches controlled by a control signal from a control unit.
- A neural network computing method, comprising: outputting, by each of a plurality of memory units under the control of a control unit, the output values of pre-synaptic neurons using a dual port memory; and calculating, by one calculation subsystem under the control of the control unit, the output values of new post-synaptic neurons using the output values of the pre-synaptic neurons input from the plurality of memory units and feeding them back to each of the plurality of memory units, wherein the plurality of memory units and the one calculation subsystem operate in a pipelined manner, synchronized to one system clock under the control of the control unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/909,338 US20160196488A1 (en) | 2013-08-02 | 2014-07-31 | Neural network computing device, system and method |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2013-0091855 | 2013-08-02 | ||
KR20130091855 | 2013-08-02 | ||
KR1020140083688A KR20150016089A (ko) | 2013-08-02 | 2014-07-04 | Neural network computing device, system and method |
KR10-2014-0083688 | 2014-07-04 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2015016640A1 true WO2015016640A1 (ko) | 2015-02-05 |
WO2015016640A9 WO2015016640A9 (ko) | 2015-04-30 |
Family
ID=52432091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2014/007065 WO2015016640A1 (ko) | 2014-07-31 | Neural network computing device, system and method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2015016640A1 (ko) |
- 2014-07-31: WO PCT/KR2014/007065 patent/WO2015016640A1/ko active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020143720A1 (en) * | 2001-04-03 | 2002-10-03 | Anderson Robert Lee | Data structure for improved software implementation of a neural network |
KR20080046065A (ko) * | 2006-11-21 | 2008-05-26 | 엠텍비젼 주식회사 | Dual port memory having shared memory access control device, multiprocessor system having shared memory access control device, and shared memory access control method of multiprocessor system |
US20120166374A1 (en) * | 2006-12-08 | 2012-06-28 | Medhat Moussa | Architecture, system and method for artificial neural network implementation |
US20110307685A1 (en) * | 2010-06-11 | 2011-12-15 | Song William S | Processor for Large Graph Algorithm Computations and Matrix Operations |
US20120063240A1 (en) * | 2010-09-14 | 2012-03-15 | Samsung Electronics Co., Ltd. | Memory system supporting input/output path swap |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3185184A1 (en) | 2015-12-21 | 2017-06-28 | Aiton Caldwell SA | The method for analyzing a set of billing data in neural networks |
CN106560848B (zh) * | 2016-10-09 | 2021-05-11 | 辽宁工程技术大学 | Novel neural network model simulating biological bidirectional cognition ability, and training method |
CN106560848A (zh) * | 2016-10-09 | 2017-04-12 | 辽宁工程技术大学 | Novel neural network model simulating biological bidirectional cognition ability, and training method |
CN108304922A (zh) * | 2017-01-13 | 2018-07-20 | 华为技术有限公司 | Computing device and computing method for neural network computation |
WO2019050297A1 (ko) * | 2017-09-08 | 2019-03-14 | 삼성전자 주식회사 | Neural network learning method and apparatus |
US11586923B2 (en) | 2017-09-08 | 2023-02-21 | Samsung Electronics Co., Ltd. | Neural network learning method and device |
WO2019088470A1 (en) * | 2017-10-31 | 2019-05-09 | Samsung Electronics Co., Ltd. | Processor and control methods thereof |
US11093439B2 (en) | 2017-10-31 | 2021-08-17 | Samsung Electronics Co., Ltd. | Processor and control methods thereof for performing deep learning |
CN112805727A (zh) * | 2018-10-08 | 2021-05-14 | 深爱智能科技有限公司 | Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and acceleration method for the artificial neural network |
CN111738429A (zh) * | 2019-03-25 | 2020-10-02 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN111738429B (zh) * | 2019-03-25 | 2023-10-13 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN113128675A (zh) * | 2021-04-21 | 2021-07-16 | 南京大学 | Multiplication-free convolution scheduler based on spiking neural network, and hardware implementation method thereof |
CN113128675B (zh) * | 2021-04-21 | 2023-12-26 | 南京大学 | Multiplication-free convolution scheduler based on spiking neural network, and hardware implementation method thereof |
US11954579B2 (en) * | 2021-06-04 | 2024-04-09 | Lynxi Technologies Co., Ltd. | Synaptic weight training method, target identification method, electronic device and medium |
CN113821321A (zh) * | 2021-08-31 | 2021-12-21 | 上海商汤阡誓科技有限公司 | Task processing chip, method, apparatus, computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2015016640A9 (ko) | 2015-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015016640A1 (ko) | 2015-02-05 | Neural network computing device, system and method | |
WO2013115431A1 (ko) | 2013-08-08 | Neural network computing device, system and method | |
US11360930B2 (en) | Neural processing accelerator | |
US5542026A (en) | Triangular scalable neural array processor | |
WO2021054614A1 (en) | Electronic device and method for controlling the electronic device thereof | |
US4967340A (en) | Adaptive processing system having an array of individually configurable processing components | |
WO2020231049A1 (en) | Neural network model apparatus and compressing method of neural network model | |
WO2021060609A1 (ko) | Distributed computing system including a plurality of edges and a cloud, and method for providing a model for adaptive intelligence utilization thereof | |
WO2016099036A1 (ko) | Memory access method and apparatus | |
WO2021153969A1 (en) | Methods and systems for managing processing of neural network across heterogeneous processors | |
WO2020075957A1 (ko) | Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and acceleration method for the artificial neural network | |
Kumar et al. | Design and implementation of Carry Select Adder without using multiplexers | |
WO2023229410A1 (ko) | RNA therapeutics design method and system | |
WO2019088470A1 (en) | Processor and control methods thereof | |
WO1991018348A1 (en) | A triangular scalable neural array processor | |
WO2021125496A1 (ko) | Electronic device and control method therefor | |
WO2022019443A1 (ko) | Efficient quantum modular multiplier and quantum modular multiplication method | |
WO2022270815A1 (ko) | Electronic device and control method of electronic device | |
WO2023085862A1 (en) | Image processing method and related device | |
US5146420A (en) | Communicating adder tree system for neural array processor | |
WO2024010437A1 (ko) | Neural processing unit and operation method thereof | |
Swenson et al. | A hardware FPGA implementation of a 2D median filter using a novel rank adjustment technique | |
Wang et al. | A DSP48-based reconfigurable 2-D convolver on FPGA | |
WO2024058572A1 (en) | Multi-bit accumulator and in-memory computing processor with same | |
WO2023128283A1 (ko) | Ledger information access system having a plurality of storage spaces, and method for performing same | |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14831891; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 14909338; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 08/04/2016) |
122 | Ep: pct application non-entry in european phase | Ref document number: 14831891; Country of ref document: EP; Kind code of ref document: A1 |