CN112835844A - Communication sparsization method for load calculation of impulse neural network - Google Patents
- Publication number
- CN112835844A CN202110233847.4A
- Authority
- CN
- China
- Prior art keywords
- node
- communication
- nodes
- neuron
- neurons
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a communication sparsification method for spiking neural network computing loads. It addresses the poor scalability of communication efficiency on distributed computing platforms, where computing efficiency gradually falls as the number of computing nodes grows. In the technical scheme of this patent, neurons are reassigned to the nodes by a redistribution operation so that each neuron assigned to a node has the largest number of post-synaptic neurons in that node. The nodes use a non-blocking communication mode: in each communication, every node asynchronously sends pulse data to all of its target nodes and waits to receive the pulses sent by its source nodes. Each node therefore sends only the necessary data to the target nodes of that communication and does not communicate with non-target nodes, which avoids communication with non-adjacent processes.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a communication sparsification method for spiking neural network computing loads.
Background
With the development of deep learning, more and more research in computational neuroscience has appeared, in the hope of overcoming the shortcomings of existing deep learning by analyzing the working mechanism of the brain and developing brain-like computing. The basis of brain-like computing is the Spiking Neural Network (SNN), which works in a way closer to the biological brain than the traditional Deep Neural Network (DNN).
To obtain the best computing capacity and to approach the computing scale of the brain, brain-like computing in the prior art is mostly carried out on large-scale distributed computing platforms built as clusters. However, as the number of computing nodes increases, the share of time spent on communication between the hardware nodes of the cluster grows and the share of effective computation on each node shrinks. In other words, the computing efficiency of the SNN gradually decreases as computing nodes are added to the distributed computing platform.
Those skilled in the art have tried to solve this problem from various angles. For example, NEST, a commonly used simulator for neural network models, adopts a "buffer dynamic equalization" method to address the reduced computing efficiency. In the NEST simulation environment, the nodes directly and dynamically maintain a send buffer without communicating its size. Because the buffer size does not need to be communicated, the communication time is optimized to some extent. However, the method performs poorly in terms of energy consumption and introduces a new problem: the communication energy consumption of NEST grows rapidly as nodes are added, so NEST scales poorly in communication energy consumption.
Disclosure of Invention
To solve the problem that SNN computing efficiency gradually decreases as computing nodes are added to a distributed computing platform, the invention provides a communication sparsification method for spiking neural network computing loads, which improves the scalability of communication efficiency on the distributed computing platform.
The technical scheme of the invention is as follows: a communication sparsification method for spiking neural network computing loads, comprising the following steps:
S1: constructing a spiking neural network based on a distributed system architecture;
characterized by further comprising the following steps:
S2: traversing the spiking neural network to find all nodes, all neurons, and the neuron connection table;
S3: performing a redistribution operation on the neurons on each node;
the redistribution operation ensures that each neuron assigned to a node has the largest number of post-synaptic neurons in that node;
S4: establishing synapses according to the nodes after the redistribution operation and the neurons assigned to each node;
S5: after entering the communication stage and before each communication, each node communicates with the other nodes to obtain information on all of its source nodes and target nodes for that communication;
S6: each node asynchronously sends pulse data to all of its target nodes and waits to receive the pulses sent by its source nodes; during the communication, every node communicates with its source nodes and target nodes in a non-blocking communication mode;
S7: after a node has finished communicating with its source nodes and target nodes, the communication stage ends.
It is further characterized in that:
in step S3, the redistribution operation includes the following steps:
a1: putting all nodes into a node set and all neurons into a neuron set;
a2: selecting a node from the node set, setting it as the current node, and removing it from the node set;
a3: selecting a neuron from the neuron set and setting it as the current neuron;
a4: assigning the current neuron to the current node;
removing the current neuron from the neuron set;
a5: checking whether the current node has reached its maximum capacity;
if the maximum capacity has not been reached, performing step a6; otherwise, performing step a8;
a6: based on the neuron connection table, finding the neuron in the neuron set that has the most post-synaptic neurons assigned to the current node, and recording it as the neuron to be assigned;
a7: assigning the neuron to be assigned to the current node and removing it from the neuron set; repeating steps a5 to a7;
a8: checking whether the node set still contains nodes to which no neurons have been assigned;
if so, repeating steps a2 to a8;
otherwise, recording all assignment results and ending the redistribution operation;
in step S1, when the spiking neural network is constructed, only the connections between neurons need to be established, without creating specific synapses, and the neuron connection table is obtained from them;
the redistribution operation in step S3 is run on any one of the nodes, and after it finishes, that node sends the redistribution result to the other nodes;
the memory usage of the redistribution operation is calculated as:
$M = N \cdot M_{int} \cdot (2N + 1)$
where $M$ is the total memory, $N$ is the total number of neurons, and $M_{int}$ is the memory occupied by one int-type value.
In the communication sparsification method for spiking neural network computing loads provided by the invention, the neurons are reassigned to the nodes by the redistribution operation so that each neuron assigned to a node has the largest number of post-synaptic neurons in that node. This reduces the generation of cross-node pulses and makes the connections across hardware nodes sparse, so that communication between nodes and the communication time between the hardware nodes of the distributed cluster are reduced and the computing efficiency of the SNN is improved. With the non-blocking communication mode, every node asynchronously sends pulse data to all of its target nodes in each communication and waits to receive the pulses sent by its source nodes. Each node thus sends only the necessary data to the target nodes of that communication and does not communicate with non-target nodes; communication with non-adjacent processes is avoided and the communication efficiency between nodes is further optimized. Moreover, after the redistribution operation, the pulses that each update round of the SNN model sends across nodes are reduced, point-to-point communication is reduced, and only the pulse data required by each target process needs to be sent, which alleviates the poor scalability of communication energy consumption caused by collective communication.
Drawings
FIG. 1 is a schematic flow chart of the redistribution operation of this patent;
FIG. 2 is a schematic flow chart of implementing the technical scheme of this patent on the NEST simulator;
FIG. 3 is a diagram of a parallel SNN connectivity structure in the prior art;
FIG. 4 is a diagram of the communication mode between nodes in the prior art;
FIG. 5 is a schematic diagram of an SNN subgraph structure in this patent;
FIG. 6 shows results of running the technical scheme of this patent on the cortical microcircuit model: simulation time;
FIG. 7 shows results of running the technical scheme of this patent on the cortical microcircuit model: communication time;
FIG. 8 shows results of running the technical scheme of this patent on the cortical microcircuit model: communication data volume;
FIG. 9 shows results of running the technical scheme of this patent on the cortical microcircuit model: time consumed by the technical scheme of this patent;
FIG. 10 shows results of running the technical scheme of this patent on the cortical microcircuit model: comparison of the number of neuron-to-node connections.
Detailed Description
For the communication sparsification method for spiking neural network computing loads of this patent, as shown in fig. 2, the NEST simulator is used as the SNN load characterization and simulation tool, and the implementation steps of the technical scheme are described in detail below.
S1: constructing a spiking neural network based on a distributed system architecture;
when the spiking neural network is constructed, specific synapses do not need to be created; only the connections between neurons need to be established, from which the neuron connection table is obtained.
S2: traversing the spiking neural network to find all nodes, all neurons, and the neuron connection table.
When the technical scheme of this patent is applied to NEST, only the connectivity between neurons needs to be obtained after Create; no specific synapse objects are created at Connect time, and a connection table is instead obtained from all the connections.
S3: performing a redistribution operation on the neurons on each node;
the redistribution operation ensures that each neuron assigned to a node has the largest number of post-synaptic neurons in that node.
On NEST, when the Simulate function is called to initialize the simulation, the neurons are redistributed according to the connection table, and only then are synapses created and the simulation run. That is, at initialization the Redistribute module is called to run the redistribution operation, the result is applied to all nodes, and the Connect operation is then carried out to create the specific synapse objects.
In a typical SNN implemented on the NEST simulator, the neurons 2 are distributed across different nodes 1. As shown in fig. 3, the lines between neurons 2 represent pulse data passed between neurons; as shown in fig. 4, the lines between nodes 1 represent communication between nodes. As shown in fig. 5, if neuron 20 generates a pulse, it sends pulse data to all of its post-synaptic neurons 21, 22 and 23, and all pulse transmission is based on SNN subgraphs of this kind. In the NEST simulator, when one node holds several post-synaptic neurons of the same neuron, those post-synaptic neurons share a single copy of the pulse data. Thus, regardless of how many post-synaptic neurons a node holds, a pulse sent by that neuron to the node counts only once, as one neuron-to-node connection. The total number of pulses generated per neuron update is therefore always less than or equal to the number of nodes × the number of neurons.
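The sharing rule can be illustrated with a minimal Python sketch. The neuron labels 20 to 23 follow fig. 5; the two placements of neurons on nodes and the helper name `target_nodes` are hypothetical and chosen only for illustration.

```python
def target_nodes(neuron, post, node_of):
    """Return the set of nodes that hold at least one post-synaptic neuron
    of `neuron`. One copy of the pulse is needed per node in this set."""
    return {node_of[p] for p in post[neuron]}


# Subgraph of fig. 5: neuron 20 projects to neurons 21, 22 and 23.
post = {20: {21, 22, 23}}

# Hypothetical placements: targets spread over three nodes, or kept together.
scattered = {20: 0, 21: 1, 22: 2, 23: 3}
grouped = {20: 0, 21: 1, 22: 1, 23: 1}

print(len(target_nodes(20, post, scattered)))  # three neuron-to-node connections
print(len(target_nodes(20, post, grouped)))    # one neuron-to-node connection
```

Keeping the three post-synaptic neurons on one node turns three neuron-to-node connections into one, which is the effect the redistribution operation aims for.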
Therefore, in the technical scheme of this patent, the connection relations among neurons are analyzed and, in the redistribution operation, the neuron in the neuron set that has the most post-synaptic neurons in the current node is selected each time, so that each SNN subgraph is placed on as few nodes as possible. Redistributing the neurons in this way reduces the generation of cross-node pulses and makes the connections across hardware nodes sparse, which reduces communication between nodes.
As shown in fig. 1, the redistribution operation includes the following steps:
a1: putting all nodes into a node set and all neurons into a neuron set;
a2: selecting a node from the node set, setting it as the current node, and removing it from the node set;
a3: selecting a neuron from the neuron set and setting it as the current neuron;
a4: assigning the current neuron to the current node;
removing the current neuron from the neuron set;
a5: checking whether the current node has reached its maximum capacity;
if the maximum capacity has not been reached, performing step a6; otherwise, performing step a8;
a6: based on the neuron connection table, finding the neuron in the neuron set that has the most post-synaptic neurons assigned to the current node, and recording it as the neuron to be assigned;
a7: assigning the neuron to be assigned to the current node and removing it from the neuron set; repeating steps a5 to a7;
a8: checking whether the node set still contains nodes to which no neurons have been assigned;
if so, repeating steps a2 to a8;
otherwise, recording all assignment results and ending the redistribution operation.
The redistribution operation assigns neurons to the nodes according to how closely they are connected. A maximum capacity is set for each node, and limiting the number of neurons a node can hold keeps the load of the nodes well balanced. This ensures that the technical scheme improves computing efficiency without causing node load or performance problems.
In a specific implementation, the algorithm description of the redistribution operation (hereinafter abbreviated as ReLOC) is as follows:
ReLOC algorithm
Input: number of nodes and neuron connection table;
Output: the neurons assigned to each node;
① for each k ∈ np
②   r = rand(neurons);
③   put r in part[k];
④   while part[k].size < max_neurons
⑤     c = max(post(neurons) in part[k]);
⑥     put c in part[k];
where: np is the number of nodes, neurons is the set of unassigned neurons, part[k] is the set of neurons on node k, max_neurons is the maximum capacity of a node, and post[n] is the set of all post-synaptic neurons of neuron n.
In the ReLOC algorithm, a node k is selected first and a random neuron r is assigned to it. Unassigned neurons are then placed on node k one at a time until the node reaches its maximum capacity max_neurons. The rule is that the neuron placed each time is the one in the unassigned set neurons that has the most post-synaptic neurons already in node k. When neurons are matched in this order, the neuron matched at each step is the one most closely related to the set already matched.
In the redistribution operation, the number of a neuron's post-synaptic neurons in the node's neuron set determines how closely the neuron is related to that set. Because each match places as much of an SNN subgraph (fig. 5) as possible on the same node, when all matches are optimal each subgraph is assigned to a single node, which avoids the traffic increase caused by spreading one subgraph over several nodes.
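The greedy matching described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `reloc`, the `seed` parameter, breaking ties by the smallest neuron id, and stopping early when the unassigned set runs out are all choices made here for the example.

```python
import random


def reloc(num_nodes, post, max_neurons, seed=0):
    """Greedy redistribution sketch.

    post maps each neuron id to the set of its post-synaptic neuron ids
    (the neuron connection table). Returns one set of neurons per node.
    """
    rng = random.Random(seed)          # assumption: seeded for repeatability
    unassigned = set(post)
    parts = []
    for _ in range(num_nodes):
        part = set()
        if unassigned:
            # Steps a3/a4: start the node with a randomly chosen neuron.
            first = rng.choice(sorted(unassigned))
            part.add(first)
            unassigned.remove(first)
        # Steps a5 to a7: fill the node up to its maximum capacity.
        while len(part) < max_neurons and unassigned:
            # Pick the unassigned neuron with the most post-synaptic
            # neurons already on this node (ties: smallest id).
            best = max(sorted(unassigned), key=lambda n: len(post[n] & part))
            part.add(best)
            unassigned.remove(best)
        parts.append(part)
    return parts


# Two fully connected groups of three neurons, two nodes of capacity three.
post = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(reloc(num_nodes=2, post=post, max_neurons=3))
```

In this toy network each group ends up on its own node regardless of which neuron is drawn first, so no pulse has to cross a node boundary.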
The redistribution operation of this patent is run on any one node, and after it finishes, that node sends the assignment result to the other nodes. The redistribution operation itself executes very efficiently and places low performance demands on the device, so the technical scheme does not impose an excessive extra burden on the platform, which ensures its practicality.
The memory usage of the redistribution operation is calculated as:
$M = N \cdot M_{int} \cdot (2N + 1)$
where $M$ is the total memory, $N$ is the total number of neurons, and $M_{int}$ is the memory occupied by one int-type value.
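As a worked example of this formula, the short Python sketch below evaluates M for one set of inputs. Both inputs are assumptions made for illustration: a 4-byte int, and N = 7,700 neurons, which is roughly the 77k-neuron model scaled by 0.1. The function name `reloc_memory_bytes` is hypothetical.

```python
def reloc_memory_bytes(num_neurons, int_bytes=4):
    """Evaluate M = N * M_int * (2N + 1) in bytes."""
    return num_neurons * int_bytes * (2 * num_neurons + 1)


m = reloc_memory_bytes(7_700)  # assumed N and a 4-byte int
print(f"{m} bytes, about {m / 2**20:.0f} MiB")
```

Because the formula is quadratic in N, doubling the number of neurons roughly quadruples the estimate.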
S4: establishing synapses according to the nodes after the redistribution operation and the neurons assigned to each node.
In this technical scheme, node-to-node communication is implemented with a dynamic sparse data exchange algorithm, in which a node sends only the necessary data to another node. Multiple MPI communications have to be started during this exchange, so its overall efficiency is not as good as that of collective communication (MPI_Alltoall). However, after the redistribution operation has reduced the number of neighbor nodes of each node, the number of MPI communications that are started falls overall, and the communication efficiency improves accordingly.
S5: after entering the communication stage and before each communication starts, each node communicates with the other nodes to obtain information on all of its source nodes and target nodes for that communication.
In a specific implementation, each node uses the MPI_Reduce_scatter function to communicate with the other nodes and obtain information on its source nodes, that is, the nodes from which it needs to receive pulse data. This avoids communication with non-adjacent processes in the subsequent exchange and thereby improves communication efficiency.
S6: each node asynchronously sends pulse data to all of its target nodes and waits to receive the pulses sent by its source nodes; during the communication, every node communicates with its source nodes and target nodes in a non-blocking communication mode.
When a node transfers data to and from its target and source nodes, non-blocking communication is used: MPI_Isend sends the data, MPI_Recv receives the data, and MPI_Waitall waits for all data transfers to complete, after which the simulation continues. A node does not need to wait for the communication of all nodes to finish, and non-blocking communication lets the communication and computation stages overlap, which improves the efficiency of parallel computing.
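The bookkeeping of steps S5 and S6 can be sketched without an MPI installation by emulating all ranks in a single Python process. This is an illustrative emulation only: the function name `sparse_exchange` and the mailbox lists are hypothetical stand-ins, the element-wise sum plays the role of MPI_Reduce_scatter, and appending to and reading from a mailbox plays the role of MPI_Isend and MPI_Recv.

```python
def sparse_exchange(outgoing):
    """Emulate one communication round for all ranks in one process.

    outgoing[rank] maps a target rank to the pulse records to send there.
    Returns (inbox, num_sources): inbox[rank] maps source rank to records,
    num_sources[rank] is how many messages that rank must receive.
    """
    size = len(outgoing)
    # S5: every rank flags its targets with 1. Summing the flags across
    # ranks and handing element r to rank r is what a reduce-scatter does.
    flags = [[1 if dest in outgoing[rank] else 0 for dest in range(size)]
             for rank in range(size)]
    num_sources = [sum(flags[rank][dest] for rank in range(size))
                   for dest in range(size)]
    # S6: each rank sends only to its own targets (stand-in for MPI_Isend).
    mailboxes = [[] for _ in range(size)]
    for rank, targets in enumerate(outgoing):
        for dest, records in targets.items():
            mailboxes[dest].append((rank, list(records)))
    # Each rank receives exactly num_sources[rank] messages, from any source.
    inbox = []
    for rank in range(size):
        received = {}
        for _ in range(num_sources[rank]):
            src, records = mailboxes[rank].pop(0)
            received[src] = records
        inbox.append(received)
    return inbox, num_sources


# Four ranks; ranks 1 and 3 have nothing to send in this round.
outgoing = [{1: [10, 11]}, {}, {1: [30], 0: [31]}, {}]
inbox, num_sources = sparse_exchange(outgoing)
print(num_sources)
print(inbox)
```

Only three messages are exchanged among four ranks here, whereas a collective all-to-all would involve every pair of ranks; the sparser the target lists, the larger that difference.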
S7: after a node has finished communicating with its source nodes and target nodes, the communication stage ends.
When the SNN is implemented on the NEST simulator, the communication mechanism is selected by setting a parameter through the Setcomm(type) function. If this function is not called, the default is the native NEST communication mode; calling Setcomm(type) enables the communication mechanism of this technical scheme.
In this technical scheme, the redistribution operation reduces the connections from neurons to processes, so that cross-node pulses are reduced and the connections between nodes become sparser. Because the cross-node pulses generated in each update round of the SNN model are sparse after redistribution, point-to-point communication occurs less often, and only the pulse data required by each target process needs to be sent, which alleviates the poor scalability of communication energy consumption caused by NEST collective communication.
The performance of the technical scheme of this patent is verified below on the Cortical Microcircuit (CM) model.
The experimental environment is a high-performance heterogeneous brain-like platform consisting of 28 PYNQ-Z2 boards. Each node contains an ARM Cortex-A9 dual-core processor system at the PS (processing system) end, one FPGA (field programmable gate array) device at the PL (programmable logic) end, and 512 MB of physical memory. The nodes communicate over Ethernet with a network bandwidth of 1000 Mbps using the TCP/IP protocol. The platform aims to build a high-performance brain-like computing platform along two dimensions: first, parallel computing using the PS ends of multiple nodes, and second, an efficient acceleration architecture using the PL end of each node.
The Cortical Microcircuit (CM) model network has four layers, each consisting of an inhibitory and an excitatory neuron population, giving 8 populations with a total of about 77k neurons and 300 million synapses. The neuron type is iaf_psc_exp, all connections are static synapses, the connection rule is fixed_total_number, and 8 Poisson generators are fully connected to their corresponding neuron populations.
To verify the benefit of the technical scheme for distributed SNN computing, the numbers of neurons and synapses of the cortical microcircuit model were scaled to 0.1 and 0.02 respectively, and 200 ms were simulated with a time step of 0.1 ms. The cortical microcircuit model is based on the NEST example written in Python, which calls the NEST kernel (implemented in C++) to simulate the SNN. The simulation is time-step driven and uses the MPI library for parallel communication. The timing statistics of the model are as follows:
(1) computation time is the time taken to update the states of all neurons on each node in sequence;
(2) communication time is the time from the start of a node's communication to the completion of data reception;
(3) simulation time is the total time taken to complete a simulation, including computation time and communication time.
The communication data volume is the total number of bytes (signaled int type) actually sent by all nodes through MPI functions within the 200 ms of simulated time.
To compare the communication efficiency and performance of the technical scheme with the original NEST mode, simulations were run on 4, 8, 16 and 28 nodes. The results are shown in figs. 6 to 10, where the technical scheme of this patent is labeled "redistribution and sparse exchange" and the original NEST mode is labeled "NEST". In the cortical microcircuit model the initial neuron pulses are dense, so the send buffer is very large at the beginning and the waste in subsequent communication data is more pronounced; as a result, in the simulation on 28 nodes the communication energy consumption of the technical scheme is about 73 times lower than that of the original NEST mode. As shown in fig. 6, with few nodes the simulation time of the original NEST mode is shorter than that of the technical scheme, but as nodes are added the simulation time of the original NEST mode rises rapidly. Consistent with the curves in fig. 6, fig. 7 shows the communication time between nodes following the same trend as the number of nodes increases. It follows that, with the technical scheme of this patent, the time spent on communication is effectively suppressed as computing nodes are added to the distributed computing platform, leaving more time for computation. As shown in fig. 8, as nodes are added the communication data volume of the original NEST mode rises rapidly, while that of the technical scheme grows slowly, which indicates that the technical scheme greatly improves the scalability of communication energy consumption.
As shown in fig. 9, the time consumed by the technical scheme decreases as nodes are added. With more nodes, each node holds fewer neurons, so when the algorithm loops over neuron matching, far fewer computations are needed to count the post-synaptic neurons located in the node's neuron set.
Because neuron redistribution is a cost incurred before the simulation starts, the relative cost of ReLOC in the technical scheme gradually shrinks as the simulation time grows. The cost of the ReLOC algorithm is thus inversely related to both the number of nodes and the simulation time, and it eventually becomes marginal.
The core of the ReLOC algorithm is the matching process. Each match selects the optimal matching set, where optimal means cutting the SNN subgraphs at the lowest cost, that is, distributing each SNN subgraph over the fewest nodes. To describe the topology of the SNN, the neuron connection graph is represented by an adjacency matrix A with A[i,j] ∈ {0,1}, where A[i,j] = 1 when there is a synaptic connection between neurons i and j and A[i,j] = 0 otherwise. The assignment matrix is denoted by T, where T[i,j] = 1 when neuron i is assigned to node j and T[i,j] = 0 otherwise.
Because post-synaptic neurons on the same node share pulse data, cross-node communication can be measured by $N_{NP}$, the number of neuron-to-node connections. A matrix P is defined first, where
P[i,j] ≧ 0 indicates that there is a connection from neuron i to node j; $N_{NP}$ can then be expressed in terms of P.
The experiment shown in fig. 10 compares $N_{NP}$ for the cyclic distribution of the original NEST mode (labeled "cyclic distribution" in the figure) with $N_{NP}$ for the redistribution of the technical scheme (labeled "redistribution" in the figure). The results show that the ReLOC algorithm reduces $N_{NP}$, that is, the communication between nodes becomes sparser.
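The defining equations for P and $N_{NP}$ are not reproduced in the text. As a hedged sketch only, the Python example below assumes that P is the matrix product of A and T, so that P[i,j] counts the post-synaptic neurons of neuron i on node j, and that $N_{NP}$ is the number of entries with P[i,j] > 0. Both assumptions, the helper names, and the six-neuron toy network are illustrative and not taken from the patent.

```python
def count_neuron_to_node(A, T):
    """Count entries of P = A * T that are positive (assumed N_NP)."""
    n, nodes = len(A), len(T[0])
    total = 0
    for i in range(n):
        for j in range(nodes):
            p_ij = sum(A[i][k] * T[k][j] for k in range(n))
            if p_ij > 0:
                total += 1
    return total


def adjacency(post, n):
    """Build A from a connection table: A[i][k] = 1 if i projects to k."""
    return [[1 if k in post[i] else 0 for k in range(n)] for i in range(n)]


def assignment(node_of, n, nodes):
    """Build T from a placement: T[i][j] = 1 if neuron i is on node j."""
    return [[1 if node_of[i] == j else 0 for j in range(nodes)] for i in range(n)]


# Toy network: two fully connected groups of three neurons, two nodes.
post = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
A = adjacency(post, 6)
cyclic = assignment({i: i % 2 for i in range(6)}, 6, 2)      # round-robin
grouped = assignment({i: i // 3 for i in range(6)}, 6, 2)    # one group per node

print(count_neuron_to_node(A, cyclic))
print(count_neuron_to_node(A, grouped))
```

Under these assumptions the round-robin placement gives 10 neuron-to-node connections and the grouped placement gives 6, which mirrors the direction of the comparison reported for fig. 10.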
Claims (4)
1. A communication sparsification method for the computing load of a spiking neural network, comprising the following step:
S1: constructing a spiking neural network based on a distributed system architecture;
characterized in that the method further comprises the following steps:
S2: traversing the spiking neural network to find all nodes, all neurons, and the neuron connection table;
S3: performing a redistribution operation on the neurons on each of the nodes;
the redistribution operation ensures that the neurons assigned to each node have the most post-synaptic neurons in that node;
S4: establishing synapses according to the nodes resulting from the redistribution operation and the neurons distributed on each node;
S5: after entering the communication stage and before each communication, each node communicates with the other nodes to obtain information on all of its source nodes and target nodes for that communication;
S6: each node asynchronously sends spike data to all of its target nodes and waits to receive the spikes sent by its source nodes; during communication, every node communicates with its source nodes and target nodes in a non-blocking communication mode;
S7: after each node has finished communicating with its source nodes and target nodes, the communication stage ends.
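The communication pattern of steps S5 to S7 can be sketched as an in-process simulation. This is only an illustration under stated assumptions: `asyncio` queues stand in for the network, the source lists in S5 are derived centrally from the target lists rather than exchanged between nodes, and all names (`node_task`, `communication_phase`, `targets_of`, `spikes_of`) are hypothetical. A distributed implementation would more likely rely on a message-passing library with non-blocking send and receive calls.

```python
import asyncio


async def node_task(rank, targets, sources, inboxes, spikes):
    # S6: post every send without waiting for the receiver (non-blocking).
    for target in targets:
        inboxes[target].put_nowait((rank, spikes))
    # S6: wait until one message has arrived from every source node.
    received = {}
    while len(received) < len(sources):
        source, data = await inboxes[rank].get()
        received[source] = data
    # S7: this node's part of the communication stage is complete.
    return received


async def communication_phase(targets_of, spikes_of):
    node_count = len(targets_of)
    inboxes = [asyncio.Queue() for _ in range(node_count)]
    # S5 (simplified): each node learns which nodes will send to it.
    sources_of = [
        [s for s in range(node_count) if r in targets_of[s]]
        for r in range(node_count)
    ]
    tasks = [
        node_task(r, targets_of[r], sources_of[r], inboxes, spikes_of[r])
        for r in range(node_count)
    ]
    return await asyncio.gather(*tasks)


# Hypothetical sparse pattern: node 0 -> 1, node 1 -> 2, node 2 -> 0 and 1.
targets_of = [[1], [2], [0, 1]]
spikes_of = [[3, 7], [], [11]]  # ids of neurons that fired on each node
result = asyncio.run(communication_phase(targets_of, spikes_of))
print(result)
```

Each node exchanges messages only with its own sources and targets, so a node with few partners finishes without waiting on unrelated nodes.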
2. The communication sparsification method for the computing load of a spiking neural network according to claim 1, characterized in that: in step S3, the redistribution operation comprises the following steps:
a1: putting all nodes into a node set, and putting all neurons into a neuron set;
a2: selecting a node from the node set, setting it as the current node, and removing it from the node set;
a3: selecting a neuron from the neuron set and setting it as the current neuron;
a4: distributing the current neuron onto the current node;
removing the current neuron from the neuron set;
a5: confirming whether the current node has reached its maximum capacity;
if the maximum capacity has not been reached, performing step a6; otherwise, performing step a8;
a6: based on the neuron connection table, finding, in the neuron set, the neuron with the most post-synaptic neurons distributed on the current node, and recording it as the neuron to be distributed;
a7: distributing the neuron to be distributed onto the current node and removing it from the neuron set; then repeating steps a5 to a7;
a8: confirming whether the node set still contains nodes with no neurons distributed;
if so, repeating steps a2 to a8;
otherwise, recording all distribution results and ending the redistribution operation.
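Steps a1 to a8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how the node in a2 or the neuron in a3 is chosen, so the sketch takes them in ascending order, breaks ties in a6 by lowest neuron id, and stops filling a node once the neuron set is empty. The names `redistribute`, `connections` and `capacity` are hypothetical.

```python
def redistribute(num_nodes, capacity, connections):
    """Greedy redistribution sketch.

    connections maps each neuron id to the set of its post-synaptic neuron ids
    (the neuron connection table). Returns a dict: neuron id -> node id.
    """
    remaining = sorted(connections)      # a1: neuron set (sorted for determinism)
    assignment = {}
    for node in range(num_nodes):        # a2 / a8: visit each node once
        if not remaining:
            break
        placed = {remaining.pop(0)}      # a3 / a4: seed the node with one neuron
        # a5: keep filling until the node reaches its maximum capacity
        while len(placed) < capacity and remaining:
            # a6: neuron with the most post-synaptic neurons already on this node
            best = max(remaining, key=lambda n: len(connections[n] & placed))
            remaining.remove(best)       # a7: distribute it and drop it from the set
            placed.add(best)
        for neuron in placed:
            assignment[neuron] = node
    return assignment


# Hypothetical connection table: two fully connected groups of three neurons.
connections = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1},
    3: {4, 5}, 4: {3, 5}, 5: {3, 4},
}
assignment = redistribute(num_nodes=2, capacity=3, connections=connections)
print(assignment)
```

Each pass of a6 scans the remaining neurons, so the work per node shrinks as the capacity per node shrinks, consistent with the timing trend described for Fig. 9.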
3. The communication sparsification method for the computing load of a spiking neural network according to claim 1, characterized in that: in step S1, when constructing the spiking neural network, only the connections between neurons need to be constructed, without creating specific synapses, so that the neuron connection table can be obtained.
4. The communication sparsification method for the computing load of a spiking neural network according to claim 1, characterized in that: the redistribution operation in step S3 is performed on any one of the nodes, and after the operation finishes, the node that performed it sends the redistribution result to the other nodes;
the memory used by the redistribution operation is calculated as:
M = N * M_int * (2N + 1)
where M represents the total memory, N represents the total number of neurons, and M_int represents the memory occupied by an int-type parameter.
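As a worked example of the formula, the short Python sketch below evaluates M for a given N. The 4-byte default for M_int is an assumption (a common size for an int), and the function name is hypothetical.

```python
def redistribution_memory_bytes(num_neurons, int_bytes=4):
    """Evaluate M = N * M_int * (2N + 1) in bytes."""
    return num_neurons * int_bytes * (2 * num_neurons + 1)


# Example: 10,000 neurons with an assumed 4-byte int.
memory = redistribution_memory_bytes(10_000)
print(memory, "bytes, about", round(memory / 1e6, 2), "MB")
```

Because M grows with the square of N, doubling the number of neurons roughly quadruples the memory needed on the node that performs the redistribution.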
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110233847.4A CN112835844B (en) | 2021-03-03 | 2021-03-03 | Communication sparsification method for impulse neural network calculation load |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112835844A (en) | 2021-05-25 |
CN112835844B (en) | 2024-03-19 |
Family
ID=75934398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110233847.4A Active CN112835844B (en) | 2021-03-03 | 2021-03-03 | Communication sparsification method for impulse neural network calculation load |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112835844B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114090261A (en) * | 2021-11-26 | 2022-02-25 | Jiangnan University | SNN workload prediction method and system |
CN114116596A (en) * | 2022-01-26 | 2022-03-01 | Zhejiang Lab | Dynamic relay-based infinite routing method and architecture for neural network on chip |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104094293A (en) * | 2012-02-08 | 2014-10-08 | Qualcomm Incorporated | Methods and apparatus for spiking neural computation |
CN108875846A (en) * | 2018-05-08 | 2018-11-23 | Hohai University, Changzhou Campus | Handwritten digit recognition method based on an improved spiking neural network |
WO2021012752A1 (en) * | 2019-07-23 | 2021-01-28 | China Construction Third Bureau Intelligent Technology Co., Ltd. | Spiking neural network-based short-range tracking method and system |
Non-Patent Citations (3)
Title |
---|
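STEVE B. FURBER et al.: "Overview of the SpiNNaker System Architecture", IEEE Transactions on Computers * |
李康; 张鲁飞; 张新伟; 郁龚健; 刘家航; 吴东; 柴志雷: "Design of a Spiking Neural Network Simulator Based on an FPGA Cluster", Computer Engineering, no. 10 * |
陈浩; 吴庆祥; 王颖; 林梅燕; 蔡荣太: "Vehicle Type Recognition Based on a Spiking Neural Network Model", Computer Systems & Applications, no. 04 * |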
Also Published As
Publication number | Publication date |
---|---|
CN112835844B (en) | 2024-03-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||