CN110991626A - Multi-CPU brain simulation system - Google Patents

Multi-CPU brain simulation system

Info

Publication number
CN110991626A
CN110991626A (application CN201910582931.XA; also published as CN110991626B)
Authority
CN
China
Prior art keywords
computing node
cpu
brain
core
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910582931.XA
Other languages
Chinese (zh)
Other versions
CN110991626B (en)
Inventor
刘怡俊
梁君泽
叶武剑
翁韶伟
张子文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910582931.XA
Publication of CN110991626A
Application granted
Publication of CN110991626B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 - Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 - Partitioning or combining of resources
    • G06F9/5066 - Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation using electronic means
    • G06N3/065 - Analogue means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-CPU brain-like simulation system comprising a plurality of brain-like simulation system motherboards that are connected in sequence. Each brain-like simulation system motherboard consists of six computing nodes, and the computing nodes communicate with the motherboard through SATA interfaces. Each computing node comprises a plurality of CPUs and a routing system; the routing system consists of an FPGA and a CAM, and the CPUs are connected to the FPGA through RGMII communication interfaces. The computing nodes on the same motherboard are logically connected in a regular-hexagon interconnection structure, in which each edge of a regular hexagon represents one logical connection. Based on the routing system composed of the FPGA and the CAM and on the regular-hexagon interconnection structure, the invention effectively connects a large number of CPUs, so that the number of physical connections is reduced while normal communication between the computing nodes is maintained, the difficulty of implementing the system is reduced, and the system has good expansion capability.

Description

Multi-CPU brain simulation system
Technical Field
The invention relates to the technical field of computers, in particular to a multi-CPU brain simulation system.
Background
Through long-term natural development and biological evolution, the human brain has gradually formed extremely strong logical thinking ability and excellent intelligent perception ability. The human brain can reason about the things and living beings that humans encounter, and can draw broader inferences from a single instance. By means of touch, vision, hearing, logical reasoning and decision-making strategies, the human brain can easily deal with a wide variety of problems. These abilities are something modern computers cannot yet match, and they are also what modern computer technology continually strives toward. For the further development and study of intelligent brain-like computing, biological brain mechanisms, especially the mechanisms of the human brain, should be taken as a reference, and constructing a general intelligent system on that basis should be the primary approach.
Brain-like computing refers to simulating the nervous system of the brain and its information-processing process in order to realize a high-performance, low-power computing system. Although existing supercomputers are capable of brain-like computation, their high power consumption, high cost, large volume and low efficiency make it difficult to advance brain-like computing research and to realize applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a multi-CPU brain-like simulation system that has low power consumption, low implementation difficulty and reasonable cost, and that can be scaled up or down according to the computational load of the task.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a multi-CPU brain analog system comprises a plurality of brain analog system main boards which are sequentially connected;
the brain-like simulation system mainboard consists of six computing nodes, and the computing nodes and the brain-like simulation system mainboard are communicated by adopting SATA interfaces;
the computing node comprises a plurality of CPUs and a routing system; the routing system consists of an FPGA and a CAM, and the CPU is connected with the FGPA by adopting an RGMII communication interface;
the computing nodes in the same brain simulation system mainboard are connected logically, a regular hexagon interconnection structure is adopted, and each edge of the regular hexagon represents a logical connection.
Further, the six computing nodes are a first, a second, a third, a fourth, a fifth and a sixth computing node, respectively;
in the logical connection, the regular-hexagon interconnection structure is as follows:
the first, second, third, fourth, fifth and sixth computing nodes are all regular hexagons;
wherein the first, second and third computing nodes are connected edge to edge in sequence; the fifth and sixth computing nodes fit seamlessly between the first and second computing nodes and between the second and third computing nodes, respectively; after this seamless fitting, the fifth and sixth computing nodes are connected to each other; and two adjacent edges of the fourth computing node coincide with one edge of the first computing node and one edge of the fifth computing node, respectively.
Further, in the physical connection, the first computing node is connected to the second, fourth and fifth computing nodes respectively; the second computing node is connected to the first, third, fifth and sixth computing nodes respectively; the third computing node is connected to the second and sixth computing nodes respectively; the fourth computing node is connected to the first and fifth computing nodes respectively; the fifth computing node is connected to the first, second, fourth and sixth computing nodes respectively; and the physical connections between the computing nodes correspond one to one to the logical connections between the six regular hexagons of the regular-hexagon interconnection structure.
Furthermore, the first, third, fourth and sixth computing nodes are each provided with an external link for connection to the motherboard of an external brain-like simulation system;
the external link provided on the first computing node is connected to the link between the first and second computing nodes, the link between the first and fourth computing nodes, and the link between the first and fifth computing nodes;
the external link provided on the third computing node is connected to the link between the third and second computing nodes and the link between the third and sixth computing nodes;
the external link provided on the fourth computing node is connected to the link between the fourth and first computing nodes and the link between the fourth and fifth computing nodes;
and the external link provided on the sixth computing node is connected to the link between the sixth and second computing nodes, the link between the sixth and third computing nodes, and the link between the sixth and fifth computing nodes.
Furthermore, the CPU is a single-core CPU; a thread for simulating neurons and a clock synchronization thread are bound to the core of the CPU, and the core is provided with an internal cache for storing the pulse data packets generated by the neurons in the thread, an intra-core routing table, and a weight table.
Further, the intra-core routing table contains the in-core neuron ID, a weight address and the number of weights; the weight table contains an offset address, a destination neuron ID and a weight.
Furthermore, the CPU is a multi-core CPU; one core of the CPU is bound with a routing thread that handles the routing of pulse data packets between neuron threads, one core is bound with a clock synchronization thread, and the remaining cores are bound with threads for simulating neurons, each provided with an internal cache for storing the pulse data packets generated by the neurons in the thread and a receiving cache for storing the pulse data packets sent by other threads; and the multi-core CPU is provided with a first out-of-core routing table and a second out-of-core routing table for addressing pulse data packets arriving from outside the thread.
Further, the first out-of-core routing table contains the in-core neuron ID, an offset address and the number of entries; the second out-of-core routing table contains a CPU number, a core number, a weight address and the number of weights.
Furthermore, the CPU is provided with an external receiving buffer for receiving the pulse data packet sent by the routing system.
Compared with the prior art, the principles and advantages of this solution are as follows:
1. Based on the routing system consisting of an FPGA and a CAM and on the regular-hexagon interconnection structure, a large number of CPUs are effectively connected, so that the number of physical connections is reduced while normal communication between the computing nodes is maintained, the difficulty of implementing the system is reduced, the system scales well, and a hardware basis is provided for simulating neurons with multiple CPUs.
2. The first out-of-core routing table is built with the in-core neuron ID as its index, which greatly reduces the number of entries in the out-of-core routing table and shortens the time needed to traverse the weight table.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a multi-CPU brain-like simulation system according to the present invention;
FIG. 2 is a schematic structural diagram of a brain-like simulation system motherboard in a multi-CPU brain-like simulation system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computing node in a multi-CPU brain-like simulation system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the interconnection within a brain-like simulation system motherboard in a multi-CPU brain-like simulation system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the interconnection between brain-like simulation system motherboards in a multi-CPU brain-like simulation system according to an embodiment of the present invention;
FIG. 6 shows the addressing process of a pulse data packet when a single core of a single CPU simulates neurons;
FIG. 7 is a block diagram of a system in which multiple cores of a single CPU simulate neurons;
FIG. 8 shows the addressing process of an external pulse data packet when multiple cores of a single CPU simulate neurons;
FIG. 9 is a schematic diagram of a system in which multiple cores of multiple CPUs simulate neurons;
FIG. 10 is a schematic diagram of interconnection structures based on regular quadrangles, regular hexagons and regular octagons.
Detailed Description
The invention is further illustrated below with reference to three specific examples:
example 1:
As shown in FIG. 1, the multi-CPU brain-like simulation system according to this embodiment includes a plurality of brain-like simulation system motherboards 1, which are connected in sequence. Each brain-like simulation system motherboard 1 is composed of six computing nodes 2, namely a first, a second, a third, a fourth, a fifth and a sixth computing node, and the computing nodes 2 communicate with the brain-like simulation system motherboard 1 through SATA interfaces, as shown in FIG. 2.
As shown in FIG. 3, each computing node 2 includes eight CPUs and a routing system; the routing system consists of an FPGA and a CAM, and the CPUs are connected to the FPGA through RGMII communication interfaces.
Because the nodes of a multi-CPU brain-like simulation system require a large number of connections, the connections between circuit boards need to be symmetrical and allow unlimited expansion. Therefore, in this embodiment, the computing nodes 2 on the same brain-like simulation system motherboard 1 are logically connected in a regular-hexagon interconnection structure, in which each edge of a regular hexagon represents one logical connection.
As shown in FIG. 4(a), the regular-hexagon interconnection structure is as follows:
The first, second, third, fourth, fifth and sixth computing nodes are all regular hexagons. The first, second and third computing nodes are connected edge to edge in sequence; the fifth and sixth computing nodes fit seamlessly between the first and second computing nodes and between the second and third computing nodes, respectively; after this seamless fitting, the fifth and sixth computing nodes are connected to each other; and two adjacent edges of the fourth computing node coincide with one edge of the first computing node and one edge of the fifth computing node, respectively.
As shown in FIG. 4(b), in the physical connection, the first computing node is connected to the second, fourth and fifth computing nodes respectively; the second computing node is connected to the first, third, fifth and sixth computing nodes respectively; the third computing node is connected to the second and sixth computing nodes respectively; the fourth computing node is connected to the first and fifth computing nodes respectively; and the fifth computing node is connected to the first, second, fourth and sixth computing nodes respectively. The physical connections between the computing nodes correspond one to one to the logical connections between the six regular hexagons of the regular-hexagon interconnection structure. Computing node pairs whose hexagons share no logically adjacent edge, for example the third and fifth computing nodes, have no physical connection on the brain-like simulation system motherboard 1.
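For illustration only (this sketch is not part of the original disclosure), the nine intra-board physical links listed above can be written down as a small adjacency table in C; the array name and the use of node numbers 1 to 6 are assumptions made for the example.

    #include <stdio.h>

    /* The nine physical links between the six computing nodes of one motherboard,
     * exactly as listed in the text above (node numbering 1-6). */
    static const int INTRA_BOARD_LINKS[][2] = {
        {1, 2}, {1, 4}, {1, 5},   /* first node  <-> second, fourth, fifth */
        {2, 3}, {2, 5}, {2, 6},   /* second node <-> third, fifth, sixth   */
        {3, 6},                   /* third node  <-> sixth                 */
        {4, 5},                   /* fourth node <-> fifth                 */
        {5, 6},                   /* fifth node  <-> sixth                 */
    };

    int main(void)
    {
        size_t n = sizeof(INTRA_BOARD_LINKS) / sizeof(INTRA_BOARD_LINKS[0]);
        printf("%zu physical links per motherboard:\n", n);
        for (size_t i = 0; i < n; i++)
            printf("  node %d <-> node %d\n",
                   INTRA_BOARD_LINKS[i][0], INTRA_BOARD_LINKS[i][1]);
        return 0;
    }

Each pair corresponds to one shared hexagon edge in FIG. 4(a), which is why nine physical links per motherboard suffice.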
Based on the abundant routing resources and multi-way switching capability of the FPGA and on the regular-hexagon communication network structure, the number of connections between motherboards is reduced by sharing lines. The logical connections remain unchanged; physically, the first, third, fourth and sixth computing nodes are each provided with an external link for connection to an external brain-like simulation system motherboard 1. The external link provided on the first computing node is connected to the link between the first and second computing nodes, the link between the first and fourth computing nodes, and the link between the first and fifth computing nodes; the external link provided on the third computing node is connected to the link between the third and second computing nodes and the link between the third and sixth computing nodes; the external link provided on the fourth computing node is connected to the link between the fourth and first computing nodes and the link between the fourth and fifth computing nodes; and the external link provided on the sixth computing node is connected to the link between the sixth and second computing nodes, the link between the sixth and third computing nodes, and the link between the sixth and fifth computing nodes, as shown in FIG. 5.
In this embodiment 1, the CPUs are single-core CPUs, and neurons are simulated with a single core of a single CPU, as follows:
In a biological neural network, neurons of the same type cluster together and are connected over short distances, while neurons of different types are connected over relatively long distances. Therefore, when neurons are partitioned onto CPUs, neurons of the same type are generally assigned to the same CPU or the same CPU core. A brain-like simulation system must simulate a very large number of neurons, and the computation time of a single neuron is short; if each neuron were simulated by a separate process, process switching would be extremely frequent and system resources would be used inefficiently. In this embodiment 1, a core of the CPU is therefore bound with a thread that simulates a certain number of neurons and with a clock synchronization thread, and the core is provided with an internal cache for storing the pulse data packets generated by the neurons in the thread, an intra-core routing table, and a weight table. The intra-core routing table contains the in-core neuron ID, a weight address and the number of weights; the weight table contains an offset address, a destination neuron ID and a weight.
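As an editorial illustration (not taken from the patent), the per-core binding and the two tables described above might look as follows in C on Linux/glibc; the struct layouts, field names and the use of pthread_setaffinity_np are assumptions made for the sketch.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdint.h>

    struct intra_core_route {        /* intra-core routing table entry */
        uint32_t neuron_id;          /* in-core (source) neuron ID     */
        uint32_t weight_addr;        /* offset into the weight table   */
        uint32_t weight_count;       /* number of weight entries       */
    };

    struct weight_entry {            /* weight table entry             */
        uint32_t dst_neuron_id;      /* destination neuron ID          */
        float    weight;             /* synaptic weight                */
    };

    /* Pin the calling thread to one core, as the embodiment binds the
     * neuron-simulation thread and the clock synchronization thread. */
    static int bind_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void)
    {
        return bind_to_core(0);      /* e.g. pin the main thread to core 0 */
    }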
When the clock synchronization signal arrives, the thread first updates the state of its neurons and then reads the internally cached pulse data packets for addressing. As shown in FIG. 6, the internal cache holds a pulse data packet whose in-core neuron ID is D1. The corresponding entry is found in the intra-core routing table using this in-core neuron ID, and the offset address into the weight table and the number of weight entries are obtained from that entry. Using these two items, the destination neuron IDs and weights are looked up in the weight table, and each weight is finally added to the input variable of the corresponding destination neuron.
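A minimal sketch of the FIG. 6 addressing step, assuming the table layouts introduced above; all identifiers and example values (source neuron D1 represented as ID 7) are illustrative and not taken from the patent.

    #include <stdint.h>
    #include <stddef.h>

    struct route_entry  { uint32_t neuron_id, weight_addr, weight_count; };  /* intra-core routing table row */
    struct weight_entry { uint32_t dst_neuron_id; float weight; };           /* weight table row             */

    /* Look up the source neuron in the intra-core routing table, then add every
     * listed weight to the input variable of its destination neuron. */
    static void deliver_spike(uint32_t src_neuron_id,
                              const struct route_entry *routes, size_t n_routes,
                              const struct weight_entry *weights,
                              float *neuron_input)
    {
        for (size_t i = 0; i < n_routes; i++) {
            if (routes[i].neuron_id != src_neuron_id)
                continue;
            const struct weight_entry *w = &weights[routes[i].weight_addr];
            for (uint32_t k = 0; k < routes[i].weight_count; k++)
                neuron_input[w[k].dst_neuron_id] += w[k].weight;
            break;
        }
    }

    int main(void)
    {
        struct route_entry  routes[]  = { { 7 /* "D1" */, 0, 2 } };
        struct weight_entry weights[] = { { 3, 0.5f }, { 4, -0.25f } };
        float input[8] = { 0 };
        deliver_spike(7, routes, 1, weights, input);   /* input[3] and input[4] are updated */
        return 0;
    }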
Example 2:
compared with embodiment 1, in this embodiment 2, the CPU is a multi-core CPU, and a single-CPU multi-core neuron is simulated, specifically as follows:
the single CPU multi-core simulation neuron simulates the neuron by using a plurality of threads to work cooperatively. Two key problems to be solved for realizing single-CPU multi-core simulation of neurons are respectively: the issue of sending packets to other threads and the problem of addressing received burst packets.
To solve these two problems, one core of the multi-core CPU is bound with a routing thread that handles the routing of pulse data packets between neuron threads, one core is bound with a clock synchronization thread, and the remaining cores are bound with threads that simulate neurons, each provided with an internal cache for storing the pulse data packets generated by the neurons in the thread and a receiving cache for storing the pulse data packets sent by other threads. The multi-core CPU is also provided with a first out-of-core routing table and a second out-of-core routing table for addressing pulse data packets arriving from outside the thread. The first out-of-core routing table contains the in-core neuron ID, an offset address and the number of entries; the second out-of-core routing table contains a CPU number, a core number, a weight address and the number of weights. A block diagram of the system in which multiple cores of a single CPU simulate neurons is shown in FIG. 7.
Neurons of the same type are assigned to the same thread as far as possible, so that there are fewer external connections, and the connections between neurons are known when the routing tables are built. A forwarding flag variable is included in the neuron data structure; its lower 7 bits indicate which threads the pulse data packet should be forwarded to. When a bit is set to 1, the packet is forwarded to the corresponding thread; when it is 0, the packet is not forwarded to that thread.
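A short illustrative sketch (not patent code) of how the lower seven bits of the forwarding flag variable could be tested; the mapping of bit i to thread i is an assumption.

    #include <stdint.h>
    #include <stdio.h>

    #define FWD_MASK 0x7Fu    /* lower 7 bits of the forwarding flag: one bit per neuron thread */

    int main(void)
    {
        uint8_t fwd_flag = 0x05;                      /* example: bits 0 and 2 are set */
        for (int t = 0; t < 7; t++)
            if ((fwd_flag & FWD_MASK) & (1u << t))
                printf("forward pulse packet to thread %d\n", t);
        return 0;
    }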
When the clock synchronization signal arrives, the thread first updates the state of its neurons and then reads the internally cached pulse data packets for addressing. When multiple cores of a single CPU simulate neurons, the forwarding flag variable of the neuron is examined at this point to check whether forwarding to the routing thread is needed. If any bit of the forwarding flag variable is set to 1, a data packet is assembled from the CPU number, the CPU core number, the neuron ID and the forwarding flag variable, and is sent to the routing thread through the SOCKET interface. After receiving the data packet, the routing thread checks the forwarding flag variable; if several bits are set to 1, it removes the forwarding flag variable from the data packet, makes the corresponding number of copies, and finally sends them through the SOCKET interface to the receiving caches of the corresponding threads, where they wait to be processed when the next synchronization signal arrives.
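The fan-out performed by the routing thread can be sketched as follows; the packet layout and the send_to_thread() placeholder are assumptions, and the SOCKET transport used in the embodiment is abstracted away.

    #include <stdint.h>
    #include <stdio.h>

    struct pulse_packet {
        uint8_t  cpu_no;        /* CPU number of the source neuron      */
        uint8_t  core_no;       /* CPU core number of the source neuron */
        uint32_t neuron_id;     /* in-core neuron ID                    */
        uint8_t  fwd_flag;      /* lower 7 bits: destination threads    */
    };

    /* Placeholder for the SOCKET send into a thread's receiving cache. */
    static void send_to_thread(int thread, struct pulse_packet pkt)
    {
        printf("to thread %d: neuron %u from CPU %u core %u\n",
               thread, (unsigned)pkt.neuron_id, (unsigned)pkt.cpu_no, (unsigned)pkt.core_no);
    }

    /* The routing thread strips the forwarding flag and sends one copy of the
     * packet to each neuron thread whose bit is set. */
    static void route_packet(struct pulse_packet pkt)
    {
        uint8_t flags = pkt.fwd_flag;
        pkt.fwd_flag = 0;
        for (int t = 0; t < 7; t++)
            if (flags & (1u << t))
                send_to_thread(t, pkt);
    }

    int main(void)
    {
        struct pulse_packet p = { 1, 3, 42, 0x06 };   /* forward to threads 1 and 2 */
        route_packet(p);
        return 0;
    }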
After the internally cached pulse data packets have been processed, the pulse data packets in the receiving cache are processed. As shown in FIG. 8, the receiving cache holds a pulse data packet whose in-core neuron ID is D1. The entry for D1 is looked up in the first out-of-core routing table to obtain the offset address and the number of entries in the second out-of-core routing table. The CPU number and CPU core number in the pulse data packet are compared with those at address 0 of the second out-of-core routing table, and the match succeeds. The weight address and the number of entries are obtained from the entry at address 0, the corresponding destination neuron IDs and weights are found in the weight table, and the weights are added to the input variables of the destination neurons.
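An illustrative sketch of the two-level lookup of FIG. 8, assuming the table layouts described above; all identifiers are assumptions and the tables are filled with toy values.

    #include <stdint.h>
    #include <stddef.h>

    struct first_route  { uint32_t neuron_id, offset, count; };                     /* keyed by in-core neuron ID */
    struct second_route { uint8_t cpu_no, core_no; uint32_t weight_addr, weight_count; };
    struct weight_entry { uint32_t dst_neuron_id; float weight; };

    static void deliver_external_spike(uint32_t neuron_id, uint8_t cpu_no, uint8_t core_no,
                                       const struct first_route *t1, size_t n1,
                                       const struct second_route *t2,
                                       const struct weight_entry *weights,
                                       float *neuron_input)
    {
        for (size_t i = 0; i < n1; i++) {
            if (t1[i].neuron_id != neuron_id)
                continue;
            for (uint32_t j = 0; j < t1[i].count; j++) {        /* slice of the second table   */
                const struct second_route *r = &t2[t1[i].offset + j];
                if (r->cpu_no != cpu_no || r->core_no != core_no)
                    continue;                                    /* source CPU/core must match  */
                for (uint32_t k = 0; k < r->weight_count; k++) {
                    const struct weight_entry *w = &weights[r->weight_addr + k];
                    neuron_input[w->dst_neuron_id] += w->weight; /* accumulate the input        */
                }
            }
            break;                                               /* entry for this neuron done  */
        }
    }

    int main(void)
    {
        struct first_route  t1[] = { { 7 /* "D1" */, 0, 1 } };
        struct second_route t2[] = { { 1, 1, 0, 1 } };           /* packet came from CPU 1, core 1 */
        struct weight_entry w[]  = { { 3, 0.5f } };
        float input[8] = { 0 };
        deliver_external_spike(7, 1, 1, t1, 1, t2, w, input);    /* input[3] += 0.5 */
        return 0;
    }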
Example 3:
and combining a hardware platform of a multi-CPU brain simulation system, and further optimizing the single-CPU multi-core simulated neuron system to realize the multi-CPU multi-core simulated neuron.
Compared with embodiment 2, in this embodiment 3 the multi-core CPU is provided with an external receiving cache for receiving the pulse data packets sent by the routing system, and a network service based on raw sockets is added for receiving data from and sending data to the routing system, as shown in FIG. 9. An outgoing flag bit is also added to the forwarding flag variable of the neuron; setting this bit to 1 indicates that the pulse data packet is to be sent out of the CPU.
The forwarding of a pulse data packet from a thread to the routing thread is the same as when multiple cores of a single CPU simulate neurons. When the forwarding flag variable is examined, the outgoing flag bit must also be checked; if it is 1, the pulse data packet is forwarded to the routing system. After receiving the pulse data packet, the routing system performs routing table matching, and in this example the matching result indicates that the packet should be forwarded to the port of CPU 2. After the routing thread of CPU 2 receives the pulse data packet, it performs a lookup using the first and second out-of-core routing tables, and the result indicates forwarding to thread 2. The routing thread forwards the pulse data packet to thread 2 through the SOCKET interface, where it is stored in the receiving cache and waits to be processed when the next synchronization clock arrives.
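A sketch of the outgoing-flag check added in this embodiment; which bit of the forwarding flag variable serves as the outgoing flag is not stated in the text, so bit 7 is assumed here purely for illustration, and send_to_routing_system() merely stands in for the raw-socket path to the FPGA/CAM routing system.

    #include <stdint.h>
    #include <stdio.h>

    #define LOCAL_THREAD_MASK 0x7Fu   /* bits 0-6: neuron threads on this CPU      */
    #define OUTGOING_BIT      0x80u   /* assumed position of the outgoing flag bit */

    /* Placeholder for the raw-socket path towards the FPGA/CAM routing system. */
    static void send_to_routing_system(uint32_t neuron_id)
    {
        printf("neuron %u: pulse packet leaves the CPU via the routing system\n",
               (unsigned)neuron_id);
    }

    static void dispatch(uint32_t neuron_id, uint8_t fwd_flag)
    {
        if (fwd_flag & OUTGOING_BIT)                     /* outgoing flag set: off-CPU delivery */
            send_to_routing_system(neuron_id);
        for (int t = 0; t < 7; t++)                      /* local threads are still served      */
            if (fwd_flag & LOCAL_THREAD_MASK & (1u << t))
                printf("neuron %u: forward to local thread %d\n", (unsigned)neuron_id, t);
    }

    int main(void)
    {
        dispatch(42, OUTGOING_BIT | 0x02);   /* off-CPU and to local thread 1 */
        return 0;
    }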
In the three embodiments above, the regular-hexagon interconnection structure reduces the number of physical connections while maintaining normal communication between the computing nodes, lowers the difficulty of implementing the system, and provides good expansion capability. In the general design, network structures based on regular quadrangles, regular hexagons or regular octagons could be considered. The regular-hexagon and regular-octagon structures have the same network diameter, while the regular-quadrangle structure has the largest network diameter of the three and therefore longer transmission times; the regular-quadrangle structure has the fewest physical links of the three, and the regular-octagon structure has the most. As shown in FIG. 10, a regular-octagon network contains both regular octagons and regular quadrilateral regions, so it is not an expandable, fully symmetric structure, and the regular octagon is therefore not adopted. Because the nodes of the brain-like simulation system require many connections, each computing node should offer as many connections to other nodes as possible, and the regular quadrangle offers fewer connections per node than the regular hexagon. The regular-hexagon structure is therefore chosen as the interconnection structure of the brain-like simulation system.
In addition, keeping the number of routing table entries small reduces query time and improves system performance. When neurons are assigned to CPUs, neurons of the same type are placed on the same CPU or the same core as far as possible. Suppose that all the neurons in core 1 of CPU 1, with in-core IDs 0-999, send pulse data packets to core 2 of CPU 1, and that a small number of pulse data packets from other CPU cores are also sent to core 2 of CPU 1. If the first out-of-core routing table were built with the CPU number and core ID as the index and the second out-of-core routing table with the core ID as the index, the first out-of-core routing table would contain a large number of empty rows, and each traversal of the second out-of-core routing table would take a long time. If instead the first out-of-core routing table is built with the in-core neuron ID as the index, its number of entries is greatly reduced, and traversing the weight table does not take too long.
The embodiments described above are merely preferred embodiments of the present invention, and the scope of protection of the present invention is not limited thereto; variations based on the shapes and principles of the present invention shall also fall within the scope of protection of the present invention.

Claims (9)

1. A multi-CPU brain-like simulation system, characterized by comprising a plurality of brain-like simulation system motherboards (1), wherein the brain-like simulation system motherboards (1) are connected in sequence;
the brain-like simulation system motherboard (1) consists of six computing nodes (2), and the computing nodes (2) communicate with the brain-like simulation system motherboard (1) through SATA interfaces;
the computing node (2) comprises a plurality of CPUs and a routing system; the routing system consists of an FPGA and a CAM, and the CPUs are connected to the FPGA through RGMII communication interfaces;
the computing nodes (2) on the same brain-like simulation system motherboard (1) are logically connected in a regular-hexagon interconnection structure, and each edge of a regular hexagon represents one logical connection.
2. The multi-CPU brain-like simulation system according to claim 1, wherein the six computing nodes (2) are a first, a second, a third, a fourth, a fifth and a sixth computing node, respectively;
in the logical connection, the regular-hexagon interconnection structure is as follows:
the first, second, third, fourth, fifth and sixth computing nodes are all regular hexagons;
wherein the first, second and third computing nodes are connected edge to edge in sequence; the fifth and sixth computing nodes fit seamlessly between the first and second computing nodes and between the second and third computing nodes, respectively; after this seamless fitting, the fifth and sixth computing nodes are connected to each other; and two adjacent edges of the fourth computing node coincide with one edge of the first computing node and one edge of the fifth computing node, respectively.
3. The multi-CPU brain-like simulation system according to claim 2, wherein the first computing node is connected to the second, fourth and fifth computing nodes respectively; the second computing node is connected to the first, third, fifth and sixth computing nodes respectively; the third computing node is connected to the second and sixth computing nodes respectively; the fourth computing node is connected to the first and fifth computing nodes respectively; the fifth computing node is connected to the first, second, fourth and sixth computing nodes respectively; and the physical connections between the computing nodes correspond one to one to the logical connections between the six regular hexagons of the regular-hexagon interconnection structure.
4. The multi-CPU brain-like simulation system according to claim 3, wherein, in the physical connection, the first, third, fourth and sixth computing nodes are each provided with an external link for connection to an external brain-like simulation system motherboard (1);
the external link provided on the first computing node is connected to the link between the first and second computing nodes, the link between the first and fourth computing nodes, and the link between the first and fifth computing nodes;
the external link provided on the third computing node is connected to the link between the third and second computing nodes and the link between the third and sixth computing nodes;
the external link provided on the fourth computing node is connected to the link between the fourth and first computing nodes and the link between the fourth and fifth computing nodes;
and the external link provided on the sixth computing node is connected to the link between the sixth and second computing nodes, the link between the sixth and third computing nodes, and the link between the sixth and fifth computing nodes.
5. The multi-CPU brain-like simulation system according to claim 1, wherein the CPU is a single-core CPU, a thread for simulating neurons and a clock synchronization thread are bound to the core of the CPU, and the core is provided with an internal cache for storing the pulse data packets generated by the neurons in the thread, an intra-core routing table, and a weight table.
6. The multi-CPU brain-like simulation system according to claim 5, wherein the intra-core routing table contains the in-core neuron ID, a weight address and the number of weights; the weight table contains an offset address, a destination neuron ID and a weight.
7. The multi-CPU brain-like simulation system according to claim 1, wherein the CPU is a multi-core CPU, one core of the CPU is bound with a routing thread that handles the routing of pulse data packets between neuron threads, one core is bound with a clock synchronization thread, and the remaining cores are bound with threads for simulating neurons, each provided with an internal cache for storing the pulse data packets generated by the neurons in the thread and a receiving cache for storing the pulse data packets sent by other threads; and the multi-core CPU is provided with a first out-of-core routing table and a second out-of-core routing table for addressing pulse data packets arriving from outside the thread.
8. The multi-CPU brain-like simulation system according to claim 7, wherein the first out-of-core routing table contains the in-core neuron ID, an offset address and the number of entries; the second out-of-core routing table contains a CPU number, a core number, a weight address and the number of weights.
9. The multi-CPU brain-like simulation system according to claim 7, wherein the CPU is provided with an external receiving cache for receiving the pulse data packets sent by the routing system.
CN201910582931.XA 2019-06-28 2019-06-28 Multi-CPU brain simulation system Active CN110991626B (en)

Priority Applications (1)

Application number: CN201910582931.XA (granted as CN110991626B) - Priority date: 2019-06-28 - Filing date: 2019-06-28 - Title: Multi-CPU brain simulation system

Applications Claiming Priority (1)

Application number: CN201910582931.XA (granted as CN110991626B) - Priority date: 2019-06-28 - Filing date: 2019-06-28 - Title: Multi-CPU brain simulation system

Publications (2)

CN110991626A - published 2020-04-10
CN110991626B - published 2023-04-28

Family

ID=70081580

Family Applications (1)

Application number: CN201910582931.XA - Title: Multi-CPU brain simulation system - Status: Active - Granted as CN110991626B

Country Status (1)

Country Link
CN (1) CN110991626B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073982A (en) * 2016-11-18 2018-05-25 上海磁宇信息科技有限公司 Class brain computing system
CN108182473A (en) * 2017-12-12 2018-06-19 中国科学院自动化研究所 Full-dimension distributed full brain modeling system based on class brain impulsive neural networks
CN109858620A (en) * 2018-12-29 2019-06-07 北京灵汐科技有限公司 One type brain computing system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552563A (en) * 2020-04-20 2020-08-18 南昌嘉研科技有限公司 Multithreading data architecture, multithreading message transmission method and system
CN112242963A (en) * 2020-10-14 2021-01-19 广东工业大学 Rapid high-concurrency neural pulse data packet distribution and transmission method
CN112242963B (en) * 2020-10-14 2022-06-24 广东工业大学 Rapid high-concurrency neural pulse data packet distribution and transmission method and system
CN112270407A (en) * 2020-11-11 2021-01-26 浙江大学 Brain-like computer supporting hundred-million neurons
CN112270407B (en) * 2020-11-11 2022-09-13 浙江大学 Brain-like computer supporting hundred-million neurons
CN113837354A (en) * 2021-08-19 2021-12-24 北京他山科技有限公司 R-SpiNNaker chip

Also Published As

Publication number Publication date
CN110991626B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN110991626B (en) Multi-CPU brain simulation system
US11132127B2 (en) Interconnect systems and methods using memory links to send packetized data between different data handling devices of different memory domains
US9009648B2 (en) Automatic deadlock detection and avoidance in a system interconnect by capturing internal dependencies of IP cores using high level specification
US4247892A (en) Arrays of machines such as computers
US8050256B1 (en) Configuring routing in mesh networks
EP2159694B1 (en) Method and device for barrier synchronization, and multicore processor
CN102270180B (en) Multicore processor cache and management method thereof
US6138166A (en) Interconnection subsystem for interconnecting a predetermined number of nodes to form a Moebius strip topology
Ma et al. Process distance-aware adaptive MPI collective communications
US20040073702A1 (en) Shortest path search method "Midway"
US20140177473A1 (en) Hierarchical asymmetric mesh with virtual routers
WO2020078470A1 (en) Network-on-chip data processing method and device
CN104794100A (en) Heterogeneous multi-core processing system based on on-chip network
CN106569896B (en) A kind of data distribution and method for parallel processing and system
CN103744644A (en) Quad-core processor system built in quad-core structure and data switching method thereof
Paul et al. MG-Join: A scalable join for massively parallel multi-GPU architectures
US11573898B2 (en) System and method for facilitating hybrid hardware-managed and software-managed cache coherency for distributed computing
CN113312283A (en) Heterogeneous image learning system based on FPGA acceleration
Li et al. Scalable Graph500 design with MPI-3 RMA
CN111901257B (en) Switch, message forwarding method and electronic equipment
Sun et al. Multi-node acceleration for large-scale GCNs
CN114764374A (en) Method and equipment for executing communication task in accelerator card system
US20230305991A1 (en) Network Computer with Two Embedded Rings
DE102022129890A1 (en) ERROR CORRECTION CODE WITH LOW OVERHEAD
CN110297802A (en) Interconnection architecture between a kind of new types of processors

Legal Events

Code - Description
PB01 - Publication
SE01 - Entry into force of request for substantive examination
GR01 - Patent grant