Background
Definition of terms
A neuron cluster, a term borrowed from the concept of neuron clusters in the human brain, denotes a set of neurons. A physical neuron is the minimum computing unit in a brain-like chip, and a physical neuron cluster consists of multiple physical neurons. A logical neuron cluster is a neuron cluster represented in a brain-like computing model that has not yet been assigned to a physical neuron cluster.
In recent years, brain-like computing and brain-like chips have attracted increasing attention from both the research community and industry. In this field, Intel developed the Loihi brain-like chip. The paper "Programming Spiking Neural Networks on Intel's Loihi" (link: ieeexplore.ieee.org/document/8303802; publisher: IEEE; pages 52-61; published in Computer, Volume 51, Issue 3, March 2018; date of publication: 27 February 2018) states that the Loihi compiler first partitions the spiking neural network, aiming to use as few neuron computing cores as possible. The paper does not, however, describe a specific partitioning method.
Chinese patent application publication No. CN106650922A, entitled "Hardware neural network conversion method, computing device, software and hardware cooperation system", discloses a neural network splitting step that divides the neural network connection graph into basic neural network units. Each basic unit has only input nodes and output nodes and no intermediate-layer nodes. Within a unit, the input nodes and output nodes are fully connected. Every outgoing connection of each input-node neuron lies inside the unit, and every incoming connection of each output-node neuron also lies inside the unit. If the connection graph is a directed acyclic graph, the basic units are converted one by one in the topological order of the graph. If the connection graph contains cycles, the cycles are first broken so that the graph becomes a directed acyclic graph, and the basic units are then converted one by one in the topological order of that graph.
FIG. 1 is a schematic diagram of the 7 different regions within a Darwin brain-like chip. The Darwin brain computer is built from sub-boards, each composed of four brain-like chips. Three sub-boards make up one brain-like computing node, whose structure is shown in FIG. 2.
The regions within the Darwin brain-like chip are explained in detail below. In the following description, $[(x_1, y_1), (x_2, y_2)]$ denotes the rectangular area enclosed by the upper-left corner $(x_1, y_1)$ and the lower-right corner $(x_2, y_2)$. The seven regions are labelled $R_1$ to $R_7$ in the order in which they are described:
Region $R_1$ is the area where the output neuron clusters are located. It outputs the pulses of the chipset to the FPGA, which receives them and sends them to the ARM processor over a system bus.
Region $R_2$ is an area of forwarding neuron clusters. As FIG. 2 shows, it forwards received pulses to a region of chipset B.
Region $R_3$ is an area of forwarding neuron clusters. As FIG. 2 shows, it forwards received pulses to a region of chipset C.
Region $R_4$ is an area of common neuron clusters. As FIG. 2 shows, it either receives pulses from the forwarding region of chipset C or acts as a set of common neuron clusters.
Region $R_5$ is an area of common neuron clusters. As FIG. 2 shows, it either receives pulses from the forwarding region of chipset B or acts as a set of common neuron clusters.
Region $R_6$ is an area of common neuron clusters. It either sends pulses to the forwarding regions and the output region $R_1$ or acts as a set of common neuron clusters.
Region $R_7$ is an area of common neuron clusters and serves only as common neuron clusters.
The registers of the brain-like computing chip fall into two types. The first type is the neuron cluster register, which describes a neuron cluster. It records configuration information about the neurons and the synaptic memory of that cluster, such as the number of configured neurons and the base address in synaptic memory where each neuron's synaptic connection information is stored. The second type is the Network Interface (NI) register, which describes routing-related information. The key NI register is the dynamic reference origin register, which stores the reference origin coordinates for synaptic connections.
In a Darwin brain-like computing chip, the connections between neurons are stored in synaptic memory. The synaptic memory of each neuron cluster provides 32 KB of SRAM and is divided into two areas: a linked-list area and a synaptic packet area. The linked-list area records the starting address of each neuron's data packet in the synaptic memory, and the synaptic packet area stores the actual contents of those neuron data packets.
A small-scale spiking neural network can run directly on a single brain-like chip. A large-scale spiking neural network model, however, must first be split and then deployed on multiple brain-like chips.
Disclosure of Invention
The invention aims to provide a method for splitting spiking neural network models, laying a foundation for brain-like computers to run large-scale spiking neural network models. To this end, the invention provides a neural model splitting method for a brain-like computer operating system. The method overcomes the limit that a single brain-like chip places on model size and deploys the neural model across chips on the brain-like computer.
An embodiment of the neural model splitting method of the brain-like computer operating system adopts the following technical scheme:
A neural model splitting method of a brain-like computer operating system comprises the following steps:
(1) parsing the neural model file and extracting a directed graph from it;
(2) determining whether the directed graph contains a cycle and, if it does, converting the cyclic directed graph into a directed acyclic graph;
(3) obtaining a topological sequence of the directed acyclic graph;
(4) determining the split points of the topological sequence, and determining the resource demand according to the split points;
(5) performing a resource amount check to determine whether the current brain-like computing node satisfies the resources required by the neural model; if it does, the model is split successfully; if not, returning to step (4) and repeating until the model is split successfully.
Further, the directed graph is extracted as follows: reading flit instructions from the binary neural model file; storing the data carried by the flits into the emulated synaptic memory or the emulated registers; extracting the neuron connections; establishing the neuron cluster connections; and building the model directed graph.
Further, the split points of the topological sequence are determined as follows: initializing the calculation parameters; determining split points with the brain-like computing node as the basic unit; and then determining split points with the chipset as the basic unit.
To address the hardware resource constraints of the brain-like computer, the invention provides a neural model splitting method for a brain-like computer operating system. The method overcomes the limit that a single brain-like chip places on model size and deploys the model across chips on the brain-like computer, laying a foundation for brain-like computers to run large-scale spiking neural network models.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described in detail below with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
FIG. 3 shows a neural model splitting method of a brain-like computer operating system according to an embodiment of the present invention; this embodiment uses the Darwin brain computer as an example. First, a directed graph is extracted from the brain-like computing model file. Next, the method checks whether the graph contains a cycle and, if it does, converts the cyclic directed graph into a directed acyclic graph. This conversion is needed because a loop in the pulse communication between brain-like computing nodes forces the pulse input to be split, which violates the cross-node pulse communication constraint. A topological order of the directed acyclic graph is then obtained, the split points of the topological sequence are determined, and the resource demand is calculated. Finally, a resource amount check determines whether the current brain-like computing node satisfies the resources required by the model. If the check fails, the method returns to the split-point step and repeats until the check passes.
Further, the splitting method for spiking neural network models on the brain-like computer is implemented as follows:
S101: parse the neural model file for the brain-like computer and extract a directed graph from it;
S102: determine whether the graph contains a cycle, and convert a cyclic directed graph into a directed acyclic graph. In a cyclic directed graph, a strongly connected component cannot be split across two brain-like computing nodes. If it were, the pulse communication between those nodes would contain a loop, the pulse input would inevitably be split, and the cross-node pulse communication constraint could not be met;
S103: obtain a topological sequence of the directed acyclic graph. Although the directed acyclic graph itself has no cycle, an improper mapping order can still create a pulse communication loop between the mapped brain-like computing nodes. Using the topological sequence as the mapping order of the logical neuron clusters solves this problem;
S104: determine the split points of the topological sequence and, for the current split points, calculate the resource demand for each region of the brain-like chip;
S105: perform a resource amount check to determine whether the current brain-like computing node satisfies the resources required by the submodel. If the check fails, return to step S104; if it passes, the model is split successfully.
The data inside the computational model file in step S101 is stored in binary form. The model data consists of flits, which carry information such as instructions, addresses and data.
The directed graph in step S101 is extracted as follows. First, parse the instructions and data carried by the flits. Next, write the data into the emulated registers or the emulated synaptic memory as the instructions direct. Finally, extract parameters from the emulated registers or synaptic memory and establish the connections between logical neuron cluster nodes, which form the directed graph.
In step S102, a cyclic directed graph is converted into a directed acyclic graph by first computing its strongly connected components and then contracting them. Contraction means treating all the vertices of a strongly connected component as a single vertex.
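The contraction in step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation. The source does not name an algorithm for finding strongly connected components, so Kosaraju's two-pass algorithm is used here as one standard choice. The functions `strongly_connected_components` and `condense` are hypothetical names; vertices are assumed to be the integers 0 to n-1, and edges are (u, v) pairs.

```python
from collections import defaultdict


def strongly_connected_components(n, edges):
    """Kosaraju's algorithm. Returns (comp, count), where comp[v] is the
    component id of vertex v (vertices are 0..n-1)."""
    graph, rgraph = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        rgraph[v].append(u)

    # Pass 1: iterative DFS on G, recording vertices in order of completion.
    visited, order = [False] * n, []
    for root in range(n):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, iter(graph[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all successors explored: node is finished
                stack.pop()
                order.append(node)

    # Pass 2: DFS on the reversed graph in decreasing finish time;
    # each tree found this way is one strongly connected component.
    comp, count = [-1] * n, 0
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = count
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in rgraph[node]:
                if comp[prev] == -1:
                    comp[prev] = count
                    stack.append(prev)
        count += 1
    return comp, count


def condense(n, edges):
    """Contract each strongly connected component into one vertex.
    The returned edge list describes a directed acyclic graph."""
    comp, count = strongly_connected_components(n, edges)
    dag_edges = sorted({(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]})
    return comp, count, dag_edges
```

Take the edges 0→1→2→0 together with 2→3→4. The three vertices on the cycle collapse into one component, and the condensed graph becomes a chain of three vertices that can then be topologically sorted.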
The cross-node pulse communication constraint in step S102 requires that the pulse communication between brain-like computing nodes contain no loop.
FIG. 4 illustrates a point made in step S103: a directed acyclic graph has no cycle itself, yet an incorrect mapping order can create a pulse communication loop between the mapped brain-like computing nodes. FIG. 4(a) shows that the connections between the model's neuron cluster nodes form an acyclic graph. FIG. 4(b) shows that, after mapping, a pulse communication loop forms between the brain-like computing nodes.
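Whether a given mapping produces such a loop can be checked mechanically. The sketch below assumes a hypothetical representation in which the mapping is a dictionary from logical neuron cluster to computing-node id. It projects the cluster-level edges onto computing nodes and then searches the resulting node-level graph for a cycle using depth-first search. The function name `node_graph_has_loop` is illustrative only.

```python
def node_graph_has_loop(edges, node_of):
    """Return True if mapping logical neuron clusters onto computing nodes
    creates a pulse communication loop between nodes.

    edges: iterable of (src_cluster, dst_cluster) synaptic connections
    node_of: dict mapping each logical neuron cluster to a computing-node id
    """
    # Project cluster-level edges onto computing nodes, dropping intra-node edges.
    succ = {}
    for u, v in edges:
        a, b = node_of[u], node_of[v]
        if a != b:
            succ.setdefault(a, set()).add(b)

    # Three-colour DFS: reaching a GREY (in-progress) node means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(x):
        colour[x] = GREY
        for y in succ.get(x, ()):
            c = colour.get(y, WHITE)
            if c == GREY or (c == WHITE and visit(y)):
                return True
        colour[x] = BLACK
        return False

    return any(colour.get(x, WHITE) == WHITE and visit(x)
               for x in set(node_of.values()))
```

Consider a toy example in the spirit of FIG. 4 (not the figure's actual graph) with edges 0→1→2. Placing clusters 0 and 2 on node X and cluster 1 on node Y produces the loop X→Y→X. Placing the contiguous prefix 0, 1 on X and cluster 2 on Y produces no loop.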
FIG. 1 shows the "different regions inside the Darwin brain chip" referred to in step S104. Region $R_1$ holds the output neurons, regions $R_2$ and $R_3$ hold the forwarding neurons, and regions $R_4$, $R_5$, $R_6$ and $R_7$ hold the common neurons.
The "resource demand of different regions" in step S104 is the number of neuron clusters that the computational model requires in each region of the Darwin brain chip. The demand for the ith region is denoted $D_i$, where i = 1, ..., 7 indexes regions $R_1$ to $R_7$.
"The resource amount check passes" in step S105 is defined as follows. Let $N_i$ be the actual number of neuron clusters in the ith region of the Darwin brain chip, and let $D_i$ be the number of neuron clusters the model requires in that region. The resource amount check passes if $D_i \le N_i$ holds for every i. In other words, the check confirms that, in every region of the brain-like chip, the number of neuron clusters required by the split model stays within the chip's resource limit.
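A minimal sketch of this check follows. It assumes the demands $D_i$ and capacities $N_i$ are given as two lists ordered by region index. The function names are hypothetical, and reporting the violating regions is a convenience added here, not part of the described method.

```python
def over_budget_regions(demand, capacity):
    """Return the 1-based indices i of regions R_i with D_i > N_i.

    demand: [D_1, ..., D_7], neuron clusters the split model needs per region
    capacity: [N_1, ..., N_7], neuron clusters actually available per region
    """
    if len(demand) != len(capacity):
        raise ValueError("demand and capacity must list the same regions")
    return [i for i, (d, n) in enumerate(zip(demand, capacity), start=1) if d > n]


def resource_check(demand, capacity):
    """The check passes iff D_i <= N_i holds for every region i."""
    return not over_budget_regions(demand, capacity)
```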
The following is a specific splitting method flow of the embodiment of the invention:
Step S101: read the computational model file of a handwritten digit recognition model and extract a directed graph from it.
The data inside the computational model file is stored in binary form, and the model data consists of flits. The flits carry information such as instructions, addresses and data. FIG. 5 shows an example flit. The bits of the flit header have the following meanings: (1) bits 32-35: virtual channel, giving the buffer number of the virtual channel; (2) bits 36-37: flit type, indicating whether the current flit is a head flit, a middle flit or a tail flit; (3) bits 62-63: chipset number, identifying the chipset to which the current flit is addressed. The bits of the flit tail of a head flit have the following meanings: (1) bits 0-11: coordinates of the source neuron cluster, with the abscissa and the ordinate each occupying 6 bits; (2) bits 12-23: coordinates of the target neuron cluster, with the abscissa and the ordinate each occupying 6 bits; (3) bits 24-28: port direction, identifying the router input/output port through which the data packet passes. Both sets of coordinates are absolute and are not affected by the dynamic reference origin register. The flit tails of middle and tail flits contain only payload data, which is used to configure the registers and synaptic memory of the brain-like computing chip.
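The bit layout above can be decoded with shifts and masks. This sketch assumes the flit is held in a 64-bit integer, with bit 0 as the least significant bit. The source does not say which half of each 12-bit coordinate field holds the abscissa, so the upper 6 bits are assumed to be the abscissa. The flit-type field is returned raw because its encoding values are not given. `HeadFlit` and `parse_head_flit` are hypothetical names.

```python
from dataclasses import dataclass


def bits(word, lo, hi):
    """Extract bits lo..hi (inclusive; bit 0 = least significant)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_xy(field):
    """Split a 12-bit coordinate field into (abscissa, ordinate).
    ASSUMPTION: the abscissa occupies the upper 6 bits."""
    return (field >> 6) & 0x3F, field & 0x3F


@dataclass
class HeadFlit:
    virtual_channel: int  # bits 32-35: virtual-channel buffer number
    flit_type: int        # bits 36-37: head / middle / tail (encoding not given)
    chipset: int          # bits 62-63: chipset the flit is addressed to
    src: tuple            # bits 0-11: source neuron cluster (absolute coords)
    dst: tuple            # bits 12-23: target neuron cluster (absolute coords)
    port: int             # bits 24-28: router input/output port direction


def parse_head_flit(word):
    """Decode the fields of a head flit held in a 64-bit integer."""
    return HeadFlit(
        virtual_channel=bits(word, 32, 35),
        flit_type=bits(word, 36, 37),
        chipset=bits(word, 62, 63),
        src=split_xy(bits(word, 0, 11)),
        dst=split_xy(bits(word, 12, 23)),
        port=bits(word, 24, 28),
    )
```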
FIG. 6 shows how the binary computational model file is parsed to obtain the directed graph:
C101: parse the flit instructions. The target neuron cluster coordinates in the head flit identify the neuron cluster to which the model data packet is sent. The data part carried by a middle or tail flit contains four key pieces of information: whether the operation is a read or a write, the object of the operation, the address within that object, and the content to be written.
C102: store the data into the emulated synaptic memory or the emulated registers. When a middle flit parsed in step C101 describes a write operation whose object is the synaptic memory or a register, the address of the write target is extracted. The data to be written is then extracted from the data part of the next middle flit.
C103: determine whether any flits remain unprocessed. If so, return to step C101; otherwise, go to step C104.
C104: extract the neuron connections and establish the neuron cluster connections. The neuron cluster register holds the number of neurons configured in the current neuron cluster. Each neuron data packet in the synaptic memory is traversed to extract the consecutive neuron numbers and the relative coordinates of the target neuron cluster nodes. The dynamic reference origin coordinates are then read from the network interface register and combined with these relative coordinates to calculate the absolute coordinates of the target neuron cluster nodes to which the neurons connect.
C105: build the model directed graph and store it as an adjacency matrix.
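Steps C104 and C105 can be sketched as follows. The description says only that the dynamic reference origin is combined with the relative coordinates; this sketch assumes the combination is per-axis addition. The input layout (a list of cluster coordinates plus (source, origin, relative target) triples) and the function names are hypothetical.

```python
def absolute_coord(origin, relative):
    """ASSUMPTION: absolute = dynamic reference origin + relative offset."""
    return (origin[0] + relative[0], origin[1] + relative[1])


def build_adjacency(cluster_coords, connections):
    """Build the model directed graph as an adjacency matrix.

    cluster_coords: absolute (x, y) of each logical neuron cluster; the list
        position is the vertex index
    connections: iterable of (src_coord, origin, rel_target) triples, one per
        synaptic connection read from the emulated memories (hypothetical layout)
    """
    index = {c: i for i, c in enumerate(cluster_coords)}
    n = len(cluster_coords)
    adj = [[0] * n for _ in range(n)]
    for src, origin, rel in connections:
        dst = absolute_coord(origin, rel)
        adj[index[src]][index[dst]] = 1  # one edge per connected cluster pair
    return adj
```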
Step S102: determine whether the directed graph contains a cycle, and convert a cyclic directed graph into a directed acyclic graph so that the cross-node pulse communication constraint is satisfied. Topological sorting can detect whether a directed graph contains a cycle, and contracting strongly connected components converts a cyclic directed graph into a directed acyclic graph. The cross-node pulse communication constraint exists for two main reasons. First, a Darwin brain-like computing node can receive external pulse input only once per time step. Second, the membrane potential of an LIF-model neuron decays naturally over time. If the pulse input were spread over several time steps, the dynamic behavior of an LIF-based spiking neural network model would change. A loop in the pulse communication between brain-like computing nodes forces the pulse input to be split, so the constraint cannot be met. In summary, all pulse input that one brain-like computing node sends to another must be delivered together.
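A discrete-time LIF neuron makes the second reason concrete. The update rule and parameters below (decay factor 0.5, threshold 1.0, reset to 0) are illustrative assumptions, not the Darwin chip's actual neuron model. The example shows only that the same total input produces a spike when delivered in one time step and no spike when split across two.

```python
def lif_spike_times(inputs, decay=0.5, threshold=1.0):
    """Discrete-time LIF neuron (illustrative parameters, not Darwin's).

    Each step: V <- decay * V + input; if V >= threshold, spike and reset V to 0.
    Returns the time steps at which the neuron fired.
    """
    v, fired = 0.0, []
    for t, current in enumerate(inputs):
        v = decay * v + current
        if v >= threshold:
            fired.append(t)
            v = 0.0
    return fired


# Two pulses of weight 0.6 delivered together vs. split over two time steps.
together = lif_spike_times([0.6 + 0.6, 0.0])  # V = 1.2 at t=0, so it fires
split = lif_spike_times([0.6, 0.6])           # V = 0.6, then about 0.9: no spike
```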
Step S103: obtain a topological sequence of the directed acyclic graph by applying a topological sorting method to the directed acyclic graph obtained in step S102. A topological sequence fixes a precedence relation: no neuron cluster mapped later has a synaptic connection pointing to a neuron cluster mapped earlier. This property eliminates pulse communication loops among brain-like computing nodes.
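The source does not specify which topological sorting method is used. Kahn's algorithm is one standard choice and is sketched below on the adjacency-matrix representation of step C105. Kahn's algorithm fails to order every vertex exactly when the graph has a cycle, so the same routine also serves as the cycle test mentioned in step S102. The function name is hypothetical.

```python
from collections import deque


def topological_order(adj):
    """Kahn's algorithm on an adjacency matrix (adj[u][v] != 0 means u -> v).

    Returns a list of vertices in topological order, or None if the graph
    contains a cycle (some vertices never reach in-degree zero).
    """
    n = len(adj)
    indeg = [sum(1 for u in range(n) if adj[u][v]) for v in range(n)]
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in range(n):
            if adj[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return order if len(order) == n else None
```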
Step S104: determine the split points of the topological sequence and calculate the resource demand, as follows:
First, the key parameters s and e are initialized, where s is the starting position and e is the ending position of the topological sequence segment to be processed;
Then, split points are determined with the brain-like computing node as the basic unit: the topological sequence is traversed in reverse order to obtain the split point k. Reverse-order traversal serves the splitting goal of mapping the model onto the same chipset, or the same brain-like computing node, wherever possible. Next, split points are determined with the chipset as the basic unit. Because a brain-like computing node has 3 chipsets, the sub-sequence in the interval [s, k] is split into at most 3 subsequences.
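The reverse-order search can be sketched as follows. The predicates `fits_node` and `fits_chipset` are hypothetical stand-ins for the resource-demand calculation and resource amount check of steps S104 and S105. The greedy largest-segment rule used for the chipset-level split is an assumption, because the source states only that [s, k] is split into at most 3 subsequences.

```python
def farthest_split(s, e, fits):
    """Scan k = e, e-1, ..., s (reverse order) and return the first k for which
    the segment [s, k] of the topological sequence fits; None if none does."""
    for k in range(e, s - 1, -1):
        if fits(s, k):
            return k
    return None


def split_sequence(n, fits_node, fits_chipset, chipsets_per_node=3):
    """Cut a topological sequence of length n into computing-node segments,
    each cut into at most `chipsets_per_node` chipset sub-segments.

    fits_node, fits_chipset: hypothetical predicates standing in for the
    resource-demand calculation and resource check of steps S104/S105.
    Returns a list of nodes, each a list of (start, end) chipset segments.
    """
    plan, s = [], 0
    while s < n:
        k = farthest_split(s, n - 1, fits_node)
        if k is None:
            raise ValueError(f"position {s} does not fit on a computing node")
        parts, t = [], s
        while t <= k:
            if len(parts) == chipsets_per_node:
                raise ValueError(f"segment [{s}, {k}] needs more than "
                                 f"{chipsets_per_node} chipsets")
            c = farthest_split(t, k, fits_chipset)
            if c is None:
                raise ValueError(f"position {t} does not fit on a chipset")
            parts.append((t, c))
            t = c + 1
        plan.append(parts)
        s = k + 1
    return plan
```

Suppose, as a toy rule, that a node holds 6 clusters and a chipset holds 2. A sequence of length 8 is then cut into nodes [0, 5] and [6, 7], and the first node uses three chipset segments.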
Then, the resource demand is determined from the split points. The resource demand is the number of neuron clusters that the computational model requires in each region of the Darwin brain chip, and the model's demand for the ith region is denoted $D_i$. The resource demand is calculated as follows:
1) Initialization. The directed graph $G = (V, E)$ represents the directed acyclic graph obtained in step S102, where
$V$ is the set of vertices in the directed graph;
$E$ is the set of edges in the directed graph;
$\mathrm{pre}(v)$ is the set of vertices pointing to vertex v;
$\mathrm{suc}(v)$ is the set of vertices that vertex v points to;
$V_A$ is the set of vertices mapped to chipset A;
$V_B$ is the set of vertices mapped to chipset B;
$V_C$ is the set of vertices mapped to chipset C;
$D_i$ is the model's neuron cluster demand for the ith region ($R_1$ to $R_7$ in FIG. 1), initialized to 0.
2) Take a vertex v from chipset A, that is, from $V_A$.
3) If $\mathrm{pre}(v) \cap V_B$ is not empty, some node on chipset B connects to logical neuron cluster v. Cluster v must therefore be placed in $R_5$, the region that receives pulses forwarded from chipset B, so $D_5 \leftarrow D_5 + 1$.
4) If $\mathrm{pre}(v) \cap V_C$ is not empty, some node on chipset C connects to logical neuron cluster v. Cluster v must therefore be placed in $R_4$, the region that receives pulses forwarded from chipset C, so $D_4 \leftarrow D_4 + 1$.
5) If $\mathrm{suc}(v) \cap V_B$ or $\mathrm{suc}(v) \cap V_C$ is not empty, the chipset pulse output constraint requires logical neuron cluster v to be mapped to region $R_6$, so $D_6 \leftarrow D_6 + 1$.
6) If $\mathrm{suc}(v) \cap V_B$ is not empty, logical neuron cluster v is mapped to region $R_6$ and must be allocated a forwarding neuron cluster that forwards its pulses from chipset A to chipset B. This forwarding neuron cluster lies in region $R_2$, so $D_2 \leftarrow D_2 + 1$.
7) If $\mathrm{suc}(v) \cap V_C$ is not empty, logical neuron cluster v is mapped to region $R_6$ and must be allocated a forwarding neuron cluster that forwards its pulses from chipset A to chipset C. This forwarding neuron cluster lies in region $R_3$, so $D_3 \leftarrow D_3 + 1$.
8) If $\mathrm{suc}(v)$ is empty, neuron cluster v belongs to the last layer and is assigned a common neuron cluster and an output neuron cluster, so $D_6 \leftarrow D_6 + 1$ and $D_1 \leftarrow D_1 + 1$.
9) If unprocessed vertices remain in $V_A$, return to step 2).
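Steps 2) to 9) can be sketched as follows. The region indices assume these roles: $R_1$ output; $R_2$ and $R_3$ forwarding to chipsets B and C; $R_4$ and $R_5$ receiving from chipsets C and B; and $R_6$ sending to the forwarding and output regions. Treat this assignment as an assumption. The dictionary-based data layout and the function name `region_demand` are hypothetical.

```python
def region_demand(chipset_of, pred, succ):
    """Compute D_1..D_7 for the chip on chipset A, following steps 2)-9).

    chipset_of: dict mapping each logical neuron cluster to "A", "B" or "C"
    pred, succ: dicts mapping each cluster to its predecessor / successor set
    Region indices (assumption): R_1 output, R_2 / R_3 forward to chipset
    B / C, R_4 / R_5 receive from chipset C / B, R_6 feeds forwarding/output.
    """
    def any_on(clusters, chipset):
        return any(chipset_of[u] == chipset for u in clusters)

    D = {i: 0 for i in range(1, 8)}
    for v in (x for x, c in chipset_of.items() if c == "A"):  # steps 2) and 9)
        if any_on(pred[v], "B"):              # step 3)
            D[5] += 1
        if any_on(pred[v], "C"):              # step 4)
            D[4] += 1
        to_b, to_c = any_on(succ[v], "B"), any_on(succ[v], "C")
        if to_b or to_c:                      # step 5)
            D[6] += 1
        if to_b:                              # step 6)
            D[2] += 1
        if to_c:                              # step 7)
            D[3] += 1
        if not succ[v]:                       # step 8): last-layer cluster
            D[6] += 1
            D[1] += 1
    return D
```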
Step S105: perform the resource amount check to determine whether the current brain-like computing node satisfies the resources required by the model; if the check fails, return to step S104. For the ith region of each Darwin brain chip, check whether $D_i \le N_i$, where $D_i$ is the model's neuron cluster demand for that region and $N_i$ is the actual number of neuron clusters in it. If this holds for all regions, the resource amount check passes.
Based on the above process, the method improves the utilization of the hardware resources of the Darwin brain computer. It overcomes the limit that a single Darwin second-generation brain-like chip places on model size and deploys the model across chips on the Darwin brain computer.