CN111210014A - Control method and device of a neural network accelerator, and neural network accelerator

Info

Publication number: CN111210014A (granted as CN111210014B)
Application number: CN202010009676.2A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 陈虹, 张吉霖
Original and current assignee: Tsinghua University
Legal status: Granted; active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a control method and device for a neural network accelerator, and a neural network accelerator, relating to artificial intelligence technology. The scheme comprises the following steps: after an input pulse data packet is acquired, acquiring an input excitation data packet and storing it, the input excitation data packet being used to judge whether the current computing core satisfies the update condition; acquiring the update condition of the current computing core; judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition; and when the current computing core satisfies the update condition, performing the update operation and sending an output pulse data packet. With the method and device, each computing core in the neural network accelerator performs its update operation according to the actual delay of the pulses it receives, which significantly improves the overall operating performance of the neural network accelerator.

Description

Control method and device of a neural network accelerator, and neural network accelerator
Technical Field
The present application relates to artificial intelligence technology, and in particular, to a method and an apparatus for controlling a neural network accelerator, and a neural network accelerator.
Background
The rapid development of artificial intelligence technology has brought the neural network accelerator, a high-performance computing device for brain-like computation, to a peak of development. A neural network accelerator simulates the operating mode of the human brain and computes with membrane potentials and the pulse potentials carried by pulses.
To ensure that every computing core in the neural network accelerator can complete its update operation under differing pulse-reception delays, the prior art usually sets the time step of the update operation according to the worst operating condition, i.e., the longest delay. Every computing core in the neural network accelerator therefore works at the longest pulse-reception delay regardless of the actual delay of the pulses it receives, which greatly reduces the overall operating performance of the neural network accelerator.
Disclosure of Invention
In view of this, a main object of the present application is to provide a control method for a neural network accelerator that enables each computing core in the neural network accelerator to perform its update operation according to the actual delay of the pulses it receives, thereby significantly improving the overall operating performance of the neural network accelerator.
To achieve this purpose, the technical scheme provided by the application is as follows:
in a first aspect, an embodiment of the present application provides a control method for a neural network accelerator, applied to a control device of a computing core, comprising the following steps:
after an input pulse data packet is obtained, an input excitation data packet is obtained; the input excitation data packet comprises an update condition of a current computing core;
storing the input excitation data packet, and judging whether the updating condition of the current computing core is met or not according to the stored input excitation data packet;
when the updating condition of the current computing core is met, executing updating operation and sending an output pulse data packet;
generating and transmitting an output excitation data packet; the output excitation data packet includes a computing core address of a target computing core and an update condition of the target computing core.
In a possible implementation, the update condition of the current computing core is that a preset number of excitation data packets are received;
the step of judging whether the update condition of the current computing core is met according to the stored input excitation data packet comprises the following steps:
judging, according to the total number of stored input excitation data packets, whether the preset number of excitation data packets has been received.
In a possible implementation manner, the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
the step of performing an update operation includes:
for each input pulse potential, sending the input pulse potential to a target neuron according to the neuron address corresponding to that input pulse potential;
receiving an output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
and generating the output pulse data packet according to each output pulse potential.
In a possible implementation, after the step of performing the update operation, the method further includes:
emptying the stored input excitation data packet.
In a possible embodiment, the step of generating and transmitting the output excitation data packet comprises:
acquiring a computing core address of the target computing core and an updating condition of the target computing core;
generating the output excitation data packet according to the computing core address of the target computing core and the updating condition of the target computing core;
and transmitting the output excitation data packet.
In a second aspect, an embodiment of the present application further provides a control apparatus for a neural network accelerator, which is applied to a control device of a computing core, and includes:
the excitation acquisition module is used for acquiring an input excitation data packet after acquiring the input pulse data packet; the input excitation data packet comprises an update condition of a current computing core;
the excitation judging module is used for judging whether the updating condition of the current computing core is met or not according to the stored input excitation data packet;
the updating module is used for executing updating operation and sending an output pulse data packet when the updating condition of the current computing core is met;
the excitation sending module is used for generating and sending an output excitation data packet; the output excitation data packet includes a computing core address of a target computing core and an update condition of the target computing core.
In a possible implementation, the update condition of the current computing core is that a preset number of excitation data packets are received;
the excitation judging module is specifically configured to:
judging whether the preset number of excitation data packets has been received.
In a possible implementation manner, the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
the update module specifically includes:
the neuron sending unit is used for sending, for each input pulse potential, the input pulse potential to a target neuron according to the neuron address corresponding to that input pulse potential;
the neuron receiving unit is used for receiving the output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
and the pulse generating unit is used for generating the output pulse data packet according to each output pulse potential.
In a possible embodiment, the apparatus further comprises:
and the emptying module is used for emptying the stored input excitation data packet.
In a possible embodiment, the excitation sending module includes:
an obtaining unit, configured to obtain a computing core address of the target computing core and an update condition of the target computing core;
the generating unit is used for generating the output excitation data packet according to the computing core address of the target computing core and the updating condition of the target computing core;
a sending unit, configured to send the output excitation data packet.
In a third aspect, an embodiment of the present application further provides a neural network accelerator, comprising: a configuration device, a storage device, a routing device, and computing cores;
each computing core comprises neurons and a control device;
the control device is configured to implement the steps of the first aspect or of any possible embodiment of the first aspect, or to implement the apparatus of the second aspect or of any possible embodiment of the second aspect.
In summary, the present application provides a control method and device for a neural network accelerator, and a neural network accelerator. Unlike the prior art, in which each computing core works at the longest pulse-reception delay, the computing core here acquires and stores input excitation data packets, judges from them whether the update condition of the current computing core is satisfied, and performs the update operation immediately, without waiting, once the condition is met. Because every computing core at every stage updates as soon as its condition is satisfied, each computing core in the neural network accelerator updates according to the actual delay of the pulses it receives, and the overall operating performance of the neural network accelerator is significantly improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive labor.
Fig. 1 is a schematic architecture diagram of a neural network accelerator according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the architecture of a routing device in a neural network accelerator;
FIG. 3 is a schematic diagram of the architecture of a compute core in a neural network accelerator;
FIG. 4a is a schematic diagram of one connection between computational cores in a neural network accelerator;
FIG. 4b is a schematic diagram of another connection between computational cores in a neural network accelerator;
FIG. 4c is a schematic diagram of another connection between computational cores in a neural network accelerator;
fig. 5 is a schematic flowchart of a control method of a neural network accelerator according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a neural network accelerator with three layers of neural networks;
fig. 7 is a schematic flowchart of another control method for a neural network accelerator according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a control device and target neuron in a computational core;
FIG. 9a is a schematic diagram of one connection between computational cores in a neural network accelerator;
FIG. 9b is a schematic diagram of the update operation time of the neural network accelerator;
FIG. 9c is a schematic diagram of another update operation time of the neural network accelerator;
fig. 10 is a schematic structural diagram of a control device of a neural network accelerator according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an update module in a control device of a neural network accelerator;
FIG. 12 is a schematic diagram of a structure of a stimulus transmission module in a control device of a neural network accelerator;
FIG. 13 is a diagram illustrating one manner of connection between computational cores in a neural network accelerator.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or apparatus is not necessarily limited to those steps or apparatus explicitly listed, but may include other steps or apparatus not explicitly listed or inherent to such process, method, article, or apparatus.
The rapid development of artificial intelligence technology has brought the neural network accelerator, a high-performance computing device for brain-like computation, to a peak of development. A neural network accelerator simulates the operating mode of the human brain and computes with membrane potentials and the pulse potentials carried by pulses.
To ensure that every computing core in the neural network accelerator can complete its update operation under differing pulse-reception delays, the prior art usually sets the time step of the update operation according to the worst operating condition, i.e., the longest delay. Every computing core in the neural network accelerator therefore works at the longest pulse-reception delay regardless of the actual delay of the pulses it receives, which greatly reduces the overall operating performance of the neural network accelerator.
In view of this, the core inventive concept of the embodiments of the present application is as follows: a computing core acquires and stores input excitation data packets, judges from them whether the update condition of the current computing core is satisfied, and performs the update operation immediately, without waiting, when the condition is satisfied. After performing the update operation, it generates and sends an output excitation data packet, so that the subsequent target computing core can in turn judge whether its own update condition is satisfied according to that packet and perform its update operation when it is. Because every computing core at every stage updates as soon as its condition is satisfied, each computing core in the neural network accelerator updates according to the actual delay of the pulses it receives, and the overall operating performance of the neural network accelerator is significantly improved.
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described in detail below with specific embodiments. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
As shown in fig. 1, a neural network accelerator 100 provided in an embodiment of the present application includes a configuration device 101, a storage device 102, a plurality of routing devices 103, and a computing core 104 corresponding to each routing device 103.
In the neural network accelerator 100 of the embodiment of the present application, the configuration device 101 is used to configure the structure of the neural network and the weights of its neurons. Here, the structure of the neural network specifically includes the computing cores 104 contained in each layer of the neural network, the connection relationships of neurons within each computing core 104 and between computing cores 104, and so on. Each specific neural network implements a different function depending on the weights of its neurons. Configuring the structure of the neural network and the weights of the neurons specifically means inputting them into the storage device 102 for storage.
The storage device 102 is used to store the structure of the neural network and the weights of the different neurons.
The routing devices 103 form the on-chip interconnection network of the neural network accelerator; pulse data packets, including input pulse data packets and output pulse data packets, excitation data packets, including input excitation data packets and output excitation data packets, and other data are transmitted through them.
Data in the neural network accelerator 100 of the embodiment of the present application is transferred hop by hop between the routing devices 103. As shown in fig. 2, a routing device 201 has 5 bidirectional ports: east, south, west, north, and local, and each bidirectional port carries both input and output. For convenience of description, the ports are named after their positions relative to the routing device 201: north for above, south for below, west for left, and east for right. This naming has no special meaning; above, below, left, and right are relative to the routing device 201 and do not imply directions in an absolute coordinate system, and east, south, west, and north likewise describe only port positions relative to the routing device 201, not compass directions in a physical coordinate system. When the routing device 201 acquires a pulse data packet or an excitation data packet, it determines from the computing core address carried by the packet in which direction the packet should be transmitted: if the carried address is the computing core address of the current computing core 202, the packet is sent to the local current computing core 202; otherwise, the packet is sent to the adjacent routing device in the direction corresponding to the carried computing core address. To facilitate transmission, pulse data packets and excitation data packets are generated and transmitted in units of computing cores 104 and are distributed to specific neurons inside the computing core 104.
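As an illustration of this port selection, a minimal sketch follows. Dimension-ordered (X-then-Y) routing and the (x, y) core addresses are assumptions; the patent only requires that the carried computing core address determine the output direction.

```python
# Hedged sketch of a routing device's port selection: deliver locally on
# an address match, otherwise forward toward the target computing core.
# X-then-Y (dimension-ordered) routing is an assumption.

def select_port(packet_core_addr, router_addr):
    """Return 'local', 'east', 'west', 'north', or 'south'."""
    px, py = packet_core_addr
    rx, ry = router_addr
    if (px, py) == (rx, ry):
        return "local"                         # hand the packet to the attached computing core
    if px != rx:
        return "east" if px > rx else "west"   # resolve the X offset first
    return "north" if py > ry else "south"     # then the Y offset

assert select_port((2, 1), (2, 1)) == "local"
assert select_port((4, 1), (2, 1)) == "east"
assert select_port((2, 3), (2, 1)) == "north"
```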
As shown in fig. 3, each computing core 104 includes a control device 301 and at least one neuron 302. In general, a computing core includes a plurality of neurons 302, for example 64, 128, or 1024 neurons 302. Each neuron 302 maintains its own membrane potential, which it changes according to the pulse potentials it receives.
In the prior art, the computation core 104 only receives input pulse data packets and sends output pulse data packets, and the update operation of each neuron is completed through these pulse data packets. Here, an input pulse data packet includes: at least one input pulse potential, and for each pulse potential the address of the neuron that is to receive it. For each input pulse potential, the control device 301 of the computation core 104 transmits the potential to the neuron at the corresponding neuron address. Because a computation core 104 contains a plurality of neurons 302, an input pulse data packet received by a computation core 104 may carry a plurality of input pulse potentials.
Because pulse data packets and excitation data packets are generated and sent in units of computation cores 104, a neural network may be connected as shown in fig. 4a, where one preceding computation core is connected to one subsequent computation core and sends it pulse and excitation data packets; as shown in fig. 4b, where two or more preceding computation cores are connected to one subsequent computation core and each sends it pulse and excitation data packets; or as shown in fig. 4c, where one preceding computation core is connected to two or more subsequent computation cores and sends pulse and excitation data packets to each of them. A neuron 302 in a computation core 104 must receive all of its input pulse potentials before it can perform a correct update operation. Since each computation core 104 sits at a different position in the on-chip interconnection network and the connection path lengths between computation cores 104 differ, the transmission delays of pulse data packets between computation cores 104 differ; moreover, because the number of preceding computation cores connected to each computation core differs, the times at which computation cores satisfy the update condition and perform the update operation differ, which in turn affects when a computation core sends its output pulse data packet to the subsequent computation cores connected to it.
To sum up, in order to ensure that the neurons in every computation core receive all input pulse potentials and complete the update operation correctly, the prior art adopts a waiting scheme: the longest delay for transmitting pulse data packets between computation cores under the worst condition is determined by experimental measurement or calculation, for example the longest transmission delay over the longest connection path between computation cores 104 in the on-chip interconnection network, and the time step of the update operation of every computation core 104 is set according to this longest delay. No matter when a computation core 104 finishes receiving all its pulse data packets, and no matter when the neurons 302 in it finish receiving all their input pulse potentials, the computation core 104 waits out the longest delay before performing the update operation. This prior-art method undoubtedly greatly reduces the overall operational performance of the neural network accelerator.
In the embodiment of the present application, the computation core 104 not only receives input pulse data packets and sends output pulse data packets, but also receives input excitation data packets and sends output excitation data packets. Handshaking between computation cores is completed through these excitation data packets.
Specifically, fig. 5 shows a method for controlling a neural network accelerator by receiving input excitation data packets and sending output excitation data packets. The method is applied to the control device of a computation core and mainly includes:
s501: after an input pulse data packet is acquired, acquiring an input excitation data packet, and storing the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets the updating condition.
In the embodiment of the present application, the control device of the computation core acquires an input excitation data packet in addition to the input pulse data packet of the prior art. The acquired input excitation data packet carries the update condition of the current computing core. In general, a correct update operation can be performed only after the computation core has received all input pulse data packets; the update condition carried in the input excitation data packets therefore ensures that the computation core has received all input pulse data packets before it performs the update operation.
Since, as shown in fig. 4b, one computation core may in an actual neural network structure be connected to two or more preceding computation cores, a computation core may receive two or more input pulse data packets. The control device of the computation core then receives one corresponding input excitation data packet for every input pulse data packet it receives, and stores each input excitation data packet as it arrives. Specifically, the packets may be stored in a unit with a storage function inside the computation core, or in a storage device inside or outside the neural network accelerator.
S502: acquiring the update condition of the current computing core.
The update condition of the current computing core may be stored in a storage device internal or external to the neural network accelerator, and thus, the update condition of the current computing core may be retrieved from the storage device internal or external to the neural network accelerator.
The step of obtaining the update condition of the current compute core may be performed before or after the step of obtaining the input excitation packet. Preferably, as shown in fig. 5, the step of obtaining the update condition of the current computing core and the step of obtaining the input excitation packet may be performed in parallel.
S503: judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition.
Typically, at this point at least one input excitation data packet is stored in a unit with a storage function in the computation core, or in a storage device inside or outside the neural network accelerator. Whether the update condition of the current computing core is satisfied is then judged according to the stored input excitation data packets and the acquired update condition of the current computing core.
S504: when the current computing core satisfies the update condition, performing the update operation and sending an output pulse data packet.
When it is judged that all input pulse data packets have been received, that is, that the update condition of the current computing core is satisfied, the update operation is performed, an output pulse data packet is generated from the result of the update operation, and the output pulse data packet is sent to the subsequent target computing core.
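To make steps S501 to S504 concrete, a minimal control-loop sketch follows. The class, callback, and packet names are illustrative assumptions; only the store, judge, and update-then-send logic comes from the method above.

```python
# Hedged sketch of S501-S504 for one computation core's control device.

class CoreController:
    def __init__(self, preset_count, send_pulse_packet):
        self.preset_count = preset_count            # update condition obtained in S502
        self.input_excitations = []                 # input excitation packets stored in S501
        self.input_pulses = []                      # input pulse packets awaiting the update
        self.send_pulse_packet = send_pulse_packet  # hands packets to the routing device

    def on_input_pulse_packet(self, packet):
        self.input_pulses.append(packet)

    def on_input_excitation_packet(self, packet):
        self.input_excitations.append(packet)                 # S501: store
        if len(self.input_excitations) >= self.preset_count:  # S503: judge
            output = self.run_update(self.input_pulses)       # S504: update operation
            self.input_pulses.clear()
            self.input_excitations.clear()
            self.send_pulse_packet(output)                    # S504: send output pulse packet

    def run_update(self, pulses):
        # placeholder for the neuron update operation detailed later in the text
        return {"pulses_consumed": len(pulses)}

ctrl = CoreController(preset_count=2, send_pulse_packet=print)
ctrl.on_input_pulse_packet("pulse_from_A"); ctrl.on_input_excitation_packet("exc_from_A")
ctrl.on_input_pulse_packet("pulse_from_B"); ctrl.on_input_excitation_packet("exc_from_B")
```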
Each computation core in the neural network accelerator provided by the embodiment of the present application adopts the control method provided by the embodiment of the present application: it judges whether the update condition is satisfied according to the acquired input excitation data packets and performs the update operation when the condition is satisfied, unlike the prior-art method of performing the update operation only after waiting out the longest delay.
For convenience of understanding, in the embodiments of the present application, a neural network accelerator including three layers of neural networks is taken as an example, and a control method of the neural network accelerator is described in detail.
The architecture of a neural network accelerator containing a three-layer neural network is shown schematically in fig. 6. The configuration device and the storage device of a neural network accelerator do not change much in structure regardless of the neural network implemented on it; therefore, fig. 6 shows only the routing devices 601 of the neural network accelerator 600 and the computing cores 602 corresponding to them. The neural network accelerator 600 in fig. 6 includes an input layer, an intermediate layer, and an output layer.
Whether the computation core 602 is located in an input layer, an intermediate layer, or an output layer, the control method is similar, and as shown in fig. 7, the specific control method includes:
s701: and acquiring an input pulse data packet.
Generally, a neural network accelerator operates driven by an external electronic device, which may be a CPU, an SoC, an FPGA, or another common electronic device. The input layer is connected to the external electronic device and acquires its input pulse data packets from it; the input pulse data packets acquired by the computation cores of the input layer are generated by the external electronic device. The input layer is also connected to the intermediate layer: a computation core of the input layer generates an output pulse data packet and sends it to a computation core of the intermediate layer, for which it is an input pulse data packet with the intermediate-layer computation core as the current computing core. The intermediate layer is in turn connected to the output layer: a computation core of the intermediate layer receives the output pulse data packet generated by a computation core of the input layer as an input pulse data packet, generates its own output pulse data packet, and sends it to a computation core of the output layer, for which it is an input pulse data packet with the output-layer computation core as the current computing core. A computation core of the output layer may also be connected to the external electronic device and send the computation result to it. During training, a computation core of the output layer may also generate an output pulse data packet and back-propagate it to a computation core of the intermediate layer, which then receives it as an input pulse data packet. Every pulse data packet, whether input or output, includes: a computing core address, at least one input pulse potential, and a neuron address corresponding to each input pulse potential. The computing core address is the address of the computation core that is to receive the packet; each input pulse potential is delivered to one neuron in that computation core, and the neuron address corresponding to the input pulse potential is the address of the neuron that receives it.
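The fields just enumerated can be written down as a data structure. The layout below is an illustrative assumption, since the patent does not fix field widths or ordering.

```python
# Hedged sketch of a pulse data packet with the fields listed above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PulsePacket:
    core_address: Tuple[int, int]   # computing core that is to receive the packet
    potentials: List[float]         # at least one input pulse potential
    neuron_addresses: List[int]     # one receiving-neuron address per potential

pkt = PulsePacket(core_address=(1, 2), potentials=[1.0, 1.0], neuron_addresses=[7, 12])
```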
Taking the computing core of the input layer as an example, the external electronic device may generate an input pulse data packet of each computing core of the input layer, and send the input pulse data packet to each computing core of the input layer in a step-by-step transfer or direct transfer manner.
S702: acquiring an input excitation data packet.
Excitation data packets, including input excitation data packets and output excitation data packets, are usually received and transmitted in units of computation cores. An excitation data packet may include a computing core address and an excitation signal. The current computing core receiving an input excitation data packet means that the preceding computation core connected to it has completed its update operation, and hence that the current computing core has received that preceding core's input pulse data packet. To ensure that every computation core has acquired all input pulse data packets before performing the update operation, whether the current computing core satisfies the update condition is judged according to the acquired input excitation data packets. For this judgment to be sufficient, the input excitation data packets acquired by a computation core have the same sources as its acquired input pulse data packets. Specifically, the input layer is connected to the external electronic device and acquires its input pulse data packets from it, so the computation cores of the input layer also acquire their input excitation data packets from the external electronic device. Similarly, after a computation core of the input layer generates and sends an output pulse data packet to the intermediate layer, it also generates and sends an output excitation data packet to the intermediate layer; the intermediate layer, having acquired the input-layer core's output pulse data packet as an input pulse data packet, likewise acquires its output excitation data packet as an input excitation data packet. In the same way, after a computation core of the intermediate layer generates and sends an output pulse data packet to the output layer, it also generates and sends an output excitation data packet to the output layer, and the output layer acquires them as its input pulse data packet and input excitation data packet.
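By contrast with a pulse data packet, an excitation data packet carries no potentials. A sketch under the same illustrative assumptions:

```python
# Hedged sketch of an excitation data packet: a computing core address
# plus an excitation signal marking that a preceding core has updated.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ExcitationPacket:
    core_address: Tuple[int, int]   # target computing core
    excitation_signal: int = 1      # "the preceding computing core has completed its update"
```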
S703: storing the input excitation data packet.
Because one computation core may be connected to two or more preceding computation cores, every acquired input excitation data packet is stored so that it can later be judged whether the update condition is satisfied. Specifically, the packets may be stored in a unit with a storage function inside the computation core, or in a storage device inside or outside the neural network accelerator.
S704: acquiring the update condition of the current computing core.
The update condition of the current computing core may be stored in a storage device internal or external to the neural network accelerator, and thus, the update condition of the current computing core may be retrieved from the storage device internal or external to the neural network accelerator. For example, the update condition of the current computing core may be obtained from a storage device inside or outside the neural network accelerator according to the computing core address of the current computing core.
S705: judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition.
Specifically, when the stored input excitation data packets indicate that all input pulse data packets have been received, it is judged that the update condition of the current computing core is satisfied. Preferably, therefore, the update condition of the current computing core is that a preset number of excitation data packets have been received, the preset number being determined by the number of preceding computation cores connected to the current computing core.
Judging whether the update condition of the current computing core is satisfied according to the stored input excitation data packets then specifically means judging, from the total number of stored input excitation data packets, whether the preset number of excitation data packets has been received.
For example, suppose the number of preceding computation cores connected to the current computing core is 3, so the preset number is determined to be 3. Because a preceding computation core sends its excitation data packet after sending its pulse data packet, receiving 3 excitation data packets, as determined from the total number of stored input excitation data packets, proves that the computation core has received the input pulse data packets and input excitation data packets sent by all 3 preceding computation cores. It can then be judged that the computation core has received all input pulse data packets and that the update condition of the current computing core is satisfied.
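A tiny sketch of this counting check follows; the packet values and names are illustrative.

```python
# Hedged sketch of the preset-number check from the example above.

def update_condition_met(stored_excitation_packets, preset_number):
    # the preset number equals the number of preceding computation cores
    return len(stored_excitation_packets) >= preset_number

stored = ["exc_from_core_1", "exc_from_core_2", "exc_from_core_3"]
assert update_condition_met(stored, 3)          # all three predecessors heard from
assert not update_condition_met(stored[:2], 3)  # still waiting for one predecessor
```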
When the update condition of the current computing core is satisfied, step S706 is executed; when the update condition is not satisfied, the method returns to step S701 to wait for an input pulse data packet.
S706: when the update condition of the current computing core is satisfied, performing the update operation and sending an output pulse data packet.
Specifically, an input pulse data packet carries a computation core address, and the input pulse data packet is sent to a current computation core according to the computation core address; the input pulse data packet also carries at least one input pulse potential and a neuron address corresponding to each input pulse potential. And each input pulse potential carried by the input pulse data packet is sent to a neuron in the current computation core.
Specifically, the update operation may be performed according to the following steps 1 to 3:
and step 1, aiming at each input pulse potential, sending the input pulse potential to a target neuron according to a neuron address corresponding to the input pulse potential.
As shown in fig. 3, the computation core includes a control device 301 and at least one neuron 302, and specifically, for each input pulse potential, the control device 301 in the computation core sends the input pulse potential to a target neuron according to a neuron address corresponding to the input pulse potential. The target neuron is a neuron corresponding to the input pulse potential. As shown in fig. 8, the control device 301 transmits an input pulse potential to the target neuron 801 according to the neuron address.
Step 2: receive the output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to its own membrane potential, the input pulse potential it received, a preset leakage potential, and a preset potential threshold.
When the update condition is satisfied and the update operation is performed, the target neuron 801 generates an output pulse potential according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leak potential, and a preset potential threshold.
Specifically, the target neuron 801 updates its membrane potential according to its current membrane potential, the input pulse potential it received, and the preset leakage potential. In general, the updated membrane potential is the neuron's own membrane potential plus the received input pulse potential minus the preset leakage potential. When a neural network is implemented with the accelerator, the received input pulse potential is usually first multiplied by the weight of the target neuron 801, i.e., the updated membrane potential is the neuron's own membrane potential plus the weighted input pulse potential minus the preset leakage potential; the weight may be positive or negative. When the target neuron 801 receives no input pulse potential, or the received input pulse potential is 0, the preset leakage potential is still subtracted from its membrane potential. After updating its membrane potential, the target neuron 801 judges whether the updated membrane potential is greater than the preset potential threshold; if it is, the target neuron 801 generates and emits an output pulse potential, after which its membrane potential returns to zero.
The control device 301 receives the output pulse potential from the target neuron 801.
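A minimal sketch of this membrane-potential rule follows, assuming the weighted-input variant described above; the numeric parameter values are illustrative.

```python
# Hedged sketch of the target neuron's update: v += weight * input - leak,
# fire and reset to zero when the preset potential threshold is exceeded.

class Neuron:
    def __init__(self, weight, leak, threshold):
        self.v = 0.0                 # membrane potential
        self.weight = weight         # may be positive or negative
        self.leak = leak             # preset leakage potential
        self.threshold = threshold   # preset potential threshold

    def update(self, input_potential=0.0):
        # the leakage potential is subtracted even when no pulse arrives
        self.v += self.weight * input_potential - self.leak
        if self.v > self.threshold:
            self.v = 0.0             # membrane potential returns to zero after firing
            return 1.0               # emitted output pulse potential
        return None                  # no output pulse this update

n = Neuron(weight=0.8, leak=0.05, threshold=1.0)
assert n.update(2.0) == 1.0          # 0.8 * 2.0 - 0.05 = 1.55 > 1.0, so it fires
assert n.update(0.0) is None         # leak only: v = -0.05, below threshold
```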
Step 3: generate the output pulse data packet according to each output pulse potential.
Each neuron in the computation core may or may not emit an output pulse potential: an output pulse potential is emitted when the neuron's membrane potential is greater than the preset potential threshold, and none is emitted when it is not. The control device 301 of the computation core generates the output pulse data packet from the output pulse potentials it receives. According to the neural network structure, that is, the connection relationships between the computation cores in the neural network accelerator and between the neurons in each computation core, the computing core address of the target computing core and the neuron address corresponding to each output pulse potential are determined, and the output pulse data packet is generated. Since the neural network structure is stored in the storage device, the computation core can retrieve it from there.
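Step 3 can be sketched as follows; the dictionary standing in for the stored neural network structure, and its keys, are assumptions.

```python
# Hedged sketch of assembling the output pulse data packet from the
# output pulse potentials the control device received in step 2.

def build_output_pulse_packet(output_potentials, structure):
    # output_potentials: {source_neuron_address: potential} for neurons that fired
    potentials, neuron_addresses = [], []
    for source, potential in output_potentials.items():
        potentials.append(potential)
        # the target neuron address comes from the stored connection relationships
        neuron_addresses.append(structure["neuron_map"][source])
    return {
        "core_address": structure["target_core"],   # target computing core
        "potentials": potentials,
        "neuron_addresses": neuron_addresses,
    }

structure = {"target_core": (1, 2), "neuron_map": {0: 7, 3: 12}}
packet = build_output_pulse_packet({0: 1.0, 3: 1.0}, structure)
```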
After the update operation is completed, the stored input excitation data packets are of no further use; to avoid affecting the judgment of the next update condition and the execution of the next update operation, they are cleared after the current update operation has been performed. The step of clearing the stored input excitation data packets may be performed in parallel with the step of sending the output pulse data packet or the steps of generating and sending the output excitation data packet; no execution order between these steps needs to be defined.
S707: sending the output pulse data packet.
The control device of the computation core sends the output pulse data packet to the routing device, and the routing device sends the output pulse data packet to the computation core of the middle layer or the computation core of the output layer, or of course, to the computation core of the input layer.
S708: generating and sending an output excitation data packet.
In a possible embodiment, in order that the subsequent target computing core can in turn judge whether its update condition is satisfied according to the output excitation data packet and perform its update operation when the condition is satisfied, the control device of the computation core generates the excitation data packet according to the connection relationships of the neural network and sends the output excitation data packet to the subsequent target computing core. To ensure that the subsequent target computing core performs the update operation correctly, the target computing core of the output excitation data packet and the target computing core of the output pulse data packet are the same computing core, with the same computing core address. The update condition of the target computing core likewise ensures that the target computing core receives all input pulse data packets, with itself as the current computing core, before performing its update operation.
So that the subsequent computation core receiving the output pulse data packet can judge whether the update condition is satisfied, the output excitation data packet must be generated and transmitted when the output pulse data packet is transmitted. Specifically, the computing core address of the target computing core and the update condition of the target computing core are obtained from the storage device; the output excitation data packet is generated according to them; the packet is then sent to the routing device, which forwards it to the target computing core. Here, the target computing core of the output excitation data packet and that of the output pulse data packet are the same target computing core. The update condition of the target computing core may be input through the configuration device and stored in the storage device.
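A sketch of this generate-and-send step follows; the storage and router interfaces, and the lookup names, are illustrative assumptions.

```python
# Hedged sketch of generating and sending the output excitation packet.

def send_output_excitation(storage, router, current_core_address):
    # both lookups read what the configuration device stored in the storage device
    target = storage.lookup_target_core(current_core_address)
    condition = storage.lookup_update_condition(target)
    packet = {"core_address": target, "update_condition": condition}
    router.send(packet)   # forwarded hop by hop to the target computing core
```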
It should be understood that an output excitation data packet is identical in composition and function to an input excitation data packet: the same excitation data packet sent by a preceding computation core to the current computing core is an output excitation data packet for the preceding core and an input excitation data packet for the current core. In the same way, input and output pulse data packets are the same kind of packet: the same pulse data packet sent by a preceding computation core to the current computing core is an output pulse data packet for the preceding core and an input pulse data packet for the current core.
As for the computation cores of the output layer, when the neural network performs computation, the output layer generally does not send output pulse data packets to a further computation core but sends the computation result to the external electronic device. When the output layer completes its update operation, the entire neural network has completed the update operation, and the output pulse data packet is sent to the external electronic device as the computation result. Depending on the external electronic device, an output excitation data packet may or may not also be generated and sent to it.
In actual implementations, connections such as that shown in fig. 9a often arise: computation core A, computation core B, computation core C, and computation core D are all connected to computation core E. Because the numbers of input pulse data packets fed to computation cores A, B, C, and D are not constant, and the durations of their update operations are not constant either, computation core B may receive few input pulse data packets while computation core C receives many, so that the time step of the update operations computation core B performs according to its input pulse data packets is much shorter than that of computation core C.
In this case, when the control method provided in the embodiment of the present application is used to control the update operations of the neural network accelerator, computation core B performs its next update immediately once the next update condition is satisfied after the previous update operation completes. If computation core B again receives few input pulse data packets during the time step of the next update operation, computation core B may complete two update operations before computation core C completes one, as shown in fig. 9b. The second-layer computation core E then receives input pulse data packets and input excitation data packets from computation core B belonging to two different time steps, and a computation error results.
To overcome this defect, in the embodiment of the present application an output feedback signal may be sent after the step of performing the update operation and sending the output pulse data packet, and before the step of generating and sending the output excitation data packet; the output feedback signal is used to characterize the receipt of the input excitation data packets. The steps of generating and sending the output excitation data packet are then performed only upon receiving an input feedback signal.
In this case, the execution steps of the preferred control method of the neural network accelerator include: acquiring an input pulse data packet; acquiring an input excitation data packet; storing the input excitation data packet; acquiring the update condition of the current computing core; judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition; when the current computing core satisfies the update condition, performing the update operation and sending an output pulse data packet; sending an output feedback signal; judging whether an input feedback signal has been received; and upon receiving the input feedback signal, performing the steps of generating and sending an output excitation data packet.
In this way, as shown in fig. 9c, the time at which computation core B sends its output pulse data packet and output excitation data packet for the second time step is delayed until computation core C has sent its output pulse data packet and output excitation data packet for the first time step, avoiding the error condition above.
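The feedback-gated handshake can be sketched as follows; the direction of the signals and all interface names are assumptions consistent with figs. 9b and 9c.

```python
# Hedged sketch of the feedback-gated handshake: after its update, a core
# acknowledges the excitation packets it consumed, and it holds back its
# own output excitation packet until an input feedback signal arrives,
# so a fast core cannot run a full time step ahead of a slow sibling.

class FeedbackGatedController:
    def __init__(self, send_upstream, send_downstream):
        self.send_upstream = send_upstream        # toward the preceding computation cores
        self.send_downstream = send_downstream    # toward the target computing core
        self.pending_excitation = None

    def after_update(self, output_pulse_packet, output_excitation_packet):
        self.send_downstream(output_pulse_packet)            # output pulse packet goes out first
        self.send_upstream({"type": "output_feedback"})      # "your input excitation packets were received"
        self.pending_excitation = output_excitation_packet   # held until feedback arrives

    def on_input_feedback(self, _signal):
        # only now are the generate-and-send steps for the excitation packet performed
        self.send_downstream(self.pending_excitation)
        self.pending_excitation = None
```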
Based on the same design concept, the embodiment of the application also provides a control device of the neural network accelerator and the neural network accelerator.
As shown in fig. 10, a control apparatus 1000 of a neural network accelerator provided in an embodiment of the present application is applied to a control device of a computing core, and includes:
an excitation obtaining module 1001, configured to obtain an input excitation data packet after obtaining the input pulse data packet;
a storage module 1002, configured to store the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets the updating condition;
a condition obtaining module 1003, configured to obtain an update condition of the current computing core;
the excitation judging module 1004 is configured to judge whether the current computing core meets the update condition according to the stored input excitation data packet and the obtained update condition;
an update module 1005, configured to execute an update operation and send an output pulse data packet when the current computing core meets an update condition.
The excitation obtaining module 1001 is connected to the storage module 1002; after obtaining the input pulse data packet, the excitation obtaining module 1001 obtains the input excitation data packet and stores it in the storage module 1002. The excitation judging module 1004 is connected to the storage module 1002 and the condition obtaining module 1003, respectively, and judges whether the update condition of the current computing core is satisfied according to the input excitation data packets stored in the storage module 1002 and the update condition obtained by the condition obtaining module 1003. When the update condition is satisfied, the excitation judging module 1004 drives the update module 1005 to start the update operation.
In a possible implementation, the update condition of the current computing core is that a preset number of excitation data packets are received;
the excitation determining module 1004 is specifically configured to:
and judging whether a preset number of excitation data packets are received or not.
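As a hedged illustration of this check, assuming for the sketch that the stored packets are kept in a simple Python list and the preset number is an integer:

    # Illustrative only; the names stored_excitations and preset_count are assumed.
    def meets_update_condition(stored_excitations, preset_count):
        # The core may update once the preset number of excitation packets arrived.
        return len(stored_excitations) >= preset_count

    print(meets_update_condition(["e1", "e2"], 3))        # False: still waiting
    print(meets_update_condition(["e1", "e2", "e3"], 3))  # True: update may begin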
In a possible implementation manner, the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
as shown in fig. 11, the update module 1005 specifically includes:
a neuron sending unit 1101 configured to send, for each input pulse potential, the input pulse potential to a target neuron according to a neuron address corresponding to the input pulse potential;
a neuron receiving unit 1102, configured to receive an output pulse potential emitted by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
a pulse generating unit 1103 configured to generate the output pulse packet according to each of the output pulse potentials.
The neuron sending unit 1101 is connected to the target neuron 1104 and sends the input pulse potential to the target neuron 1104; the neuron receiving unit 1102 is likewise connected to the target neuron 1104 and receives the output pulse potential from the target neuron 1104. The neuron receiving unit 1102 is further connected to the pulse generating unit 1103 and transmits the output pulse potentials to it, and the pulse generating unit 1103 generates the output pulse data packet according to each of the output pulse potentials.
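The membrane-potential rule described above resembles a leaky integrate-and-fire neuron. The following sketch is one plausible reading of it; the function name, the leakage value, the threshold and the reset-to-zero behaviour are illustrative assumptions, not values fixed by the present application.

    def neuron_update(membrane, input_potential, leak=0.1, threshold=1.0):
        """Integrate the input, apply the preset leakage, compare to the threshold."""
        membrane = membrane + input_potential - leak
        if membrane >= threshold:          # preset potential threshold
            return 0.0, 1.0                # fire an output pulse potential and reset
        return membrane, None              # sub-threshold: no output pulse

    m, out = neuron_update(0.0, 0.5)   # m = 0.4, out = None
    m, out = neuron_update(m, 0.5)     # m = 0.8, out = None
    m, out = neuron_update(m, 0.5)     # m reaches 1.2 >= 1.0: out = 1.0, m resets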
In a possible implementation, the apparatus 1000 further comprises:
a clearing module 1006, configured to clear the stored input excitation data packets.
The clearing module 1006 is connected to the update module 1005 and the storage module 1002, and clears the input excitation data packets stored in the storage module 1002 after the update module 1005 performs the update operation.
In a possible implementation, the apparatus 1000 further comprises:
an excitation sending module 1007, configured to generate and send an output excitation data packet; the output excitation data packet comprises a computing core address of a target computing core; the output excitation data packet is used for judging whether the target computing core meets an update condition.
In one possible implementation, as shown in fig. 12, the excitation sending module 1007 includes:
an obtaining unit 1201, configured to obtain a computing core address of the target computing core and an update condition of the target computing core;
a generating unit 1202, configured to generate the output excitation data packet according to a computing core address of the target computing core and an update condition of the target computing core;
a sending unit 1203, configured to send the output excitation data packet.
The obtaining unit 1201 is connected to the storage device and the generating unit 1202, and the obtaining unit 1201 obtains the computing core address of the target computing core and the update condition of the target computing core, and sends the obtained address and the update condition to the generating unit 1202. The generation unit 1202 generates an output excitation packet according to the computation core address of the target computation core and the update condition of the target computation core. The generation unit 1202 is connected to the transmission unit 1203, and the generation unit 1202 transmits the output excitation packet to the transmission unit 1203. The sending unit 1203 is connected to the routing device, and the sending unit 1203 sends the output excitation data packet to the routing device, and sends the output excitation data packet to the target computing core through the routing device.
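A minimal sketch of the data packet the generating unit 1202 assembles is given below; the dictionary representation and the field names are assumptions made for illustration, since the actual packet format is hardware-defined.

    # Assumed field names; the real packet is a hardware bit field, not a dict.
    def make_output_excitation_packet(target_core_address, target_update_condition):
        # The routing device delivers the packet by the carried core address; the
        # target core judges its own update against the carried condition.
        return {"core_address": target_core_address,
                "update_condition": target_update_condition}

    pkt = make_output_excitation_packet((1, 2), {"packet_count": 4})
    print(pkt)  # {'core_address': (1, 2), 'update_condition': {'packet_count': 4}}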
In a possible implementation, the apparatus 1000 further comprises:
a feedback sending module 1008, configured to send an output feedback signal; the output feedback signal is used to indicate that the input excitation data packet has been received.
The feedback sending module 1008 is connected to the update module 1005, and sends the output feedback signal after the update module 1005 completes the update operation.
In a possible implementation, the apparatus 1000 further comprises:
a feedback judging module 1009, configured to judge whether an input feedback signal is received. The feedback judging module 1009 is connected to the excitation sending module 1007, and drives the excitation sending module 1007 to perform the steps of generating and sending the output excitation data packet upon receiving the input feedback signal.
The control device of the neural network accelerator provided in the embodiment of the present application enables each computing core in the neural network accelerator to perform its update operation according to the actual arrival timing of the received pulses, so that the overall operation performance of the neural network accelerator is significantly improved.
An embodiment of the present application further provides a neural network accelerator, including: a configuration device, a storage device, a routing device and a computing core;
the computing core comprises: neurons and a control device;
the control device is configured to implement any method provided by the embodiments of the present application, or to implement any apparatus provided by the embodiments of the present application.
In a possible implementation manner, the neural network accelerator provided in the embodiment of the present application performs transmission of the input stimulus packet and/or the output stimulus packet through the routing device.
In a possible implementation manner, the neural network accelerator provided in the embodiment of the present application performs transmission of the input feedback signal and/or the output feedback signal through a dedicated signal line.
Here, the input feedback signal and the output feedback signal likewise have the same structure and function: one and the same feedback signal, transmitted from a subsequent-stage computing core to the current computing core, is an output feedback signal for the subsequent-stage computing core and an input feedback signal for the current computing core.
Feedback signals, including input feedback signals and output feedback signals, may be transmitted using the routing device. However, since the input feedback signal and/or the output feedback signal can be implemented as a signal of only 1 bit, there is no need to transmit them through the on-chip network (i.e., the routing device of the neural network accelerator); avoiding the on-chip network further improves the transmission efficiency and the overall operation performance of the neural network accelerator. Preferably, dedicated signal lines can be added directly between the computing cores to connect them. As shown in fig. 13, a dedicated signal line is added between the computing core E and each of its previous-stage computing cores (i.e., computing core A, computing core B, computing core C and computing core D) for transmitting the feedback signals, including the input feedback signals and the output feedback signals. Since each feedback signal is only 1 bit wide, the number of feedback signal lines needed by one computing core is at most the number of computing cores in the network (for example, at most 16 in a neural network accelerator with a 4 × 4 computing core array), so even adding dedicated signal lines does not bring excessive design and implementation cost. Moreover, transmitting the feedback signals directly between the computing cores over dedicated signal lines reduces the feedback latency and speeds up the update operation. Therefore, it is preferable to transmit the input feedback signal and/or the output feedback signal over the above-mentioned dedicated signal lines, independent of the routing device.
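The wiring cost cited above can be sanity-checked with a short calculation; the 4 × 4 array size is taken from the example in the text, and one line per core pair is an illustrative worst case.

    # Each feedback line carries 1 bit, so a core never needs more dedicated
    # lines than there are computing cores in the array.
    ARRAY_ROWS, ARRAY_COLS = 4, 4
    total_cores = ARRAY_ROWS * ARRAY_COLS
    max_feedback_lines_per_core = total_cores
    print(max_feedback_lines_per_core)  # 16, matching the 4 x 4 example above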
The control method and device of the neural network accelerator and the neural network accelerator provided by the embodiments of the present application are all based on the same design concept; the technical means of any embodiments of the present application may be freely combined, and the combined technical means still fall within the protection scope of the present application.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that the features described in the various embodiments and/or claims of the present application may be combined and/or incorporated in many ways, even if such combinations or incorporations are not explicitly described in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways without departing from the spirit and teachings of the present application, and all such combinations fall within the scope of the present disclosure.
The principle and implementation of the present application have been explained herein through specific embodiments; the above description of the embodiments is only intended to help understand the method and the core idea of the present application, and is not intended to limit it. It will be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles, spirit and scope of the present application, and all such modifications, equivalents and improvements are intended to be protected by the claims.

Claims (11)

1. A control method of a neural network accelerator, applied to a control device of a computing core, the method comprising:
after acquiring an input pulse data packet, acquiring an input excitation data packet, and storing the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets an update condition;
acquiring the update condition of the current computing core;
judging whether the current computing core meets the update condition according to the stored input excitation data packets and the acquired update condition;
and when the current computing core meets the update condition, performing an update operation and sending an output pulse data packet.
2. The method of claim 1, wherein the update condition of the current computing core is that a preset number of excitation data packets have been received;
the step of judging whether the update condition of the current computing core is met according to the stored input excitation data packets comprises:
judging, according to the total number of the stored input excitation data packets, whether the preset number of excitation data packets has been received.
3. The method of claim 1, wherein the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
the step of performing an update operation includes:
for each input pulse potential, sending the input pulse potential to a target neuron according to the neuron address corresponding to the input pulse potential;
receiving an output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
and generating the output pulse data packet according to each output pulse potential.
4. The method of claim 1, wherein after the step of performing the update operation, the method further comprises:
emptying the stored input excitation data packets.
5. The method of claim 1, wherein after the step of performing the update operation and sending the output pulse data packet, the method further comprises:
generating and sending an output excitation data packet; the output excitation data packet comprises a computing core address of a target computing core; the output excitation data packet is used for judging whether the target computing core meets an update condition.
6. The method of claim 5, wherein the step of generating and sending an output excitation data packet comprises:
acquiring a computing core address of the target computing core and an update condition of the target computing core;
generating the output excitation data packet according to the computing core address of the target computing core and the update condition of the target computing core;
and sending the output excitation data packet.
7. The method of claim 5, wherein after the step of performing the update operation and sending the output pulse data packet and before the step of generating and sending the output excitation data packet, the method further comprises:
sending an output feedback signal; the output feedback signal is used to indicate that the input excitation data packet has been received.
8. The method of claim 7, wherein after the step of performing the update operation and sending the output pulse data packet and before the step of generating and sending the output excitation data packet, the method further comprises:
judging whether an input feedback signal is received;
and upon receiving the input feedback signal, performing the steps of generating and sending the output excitation data packet.
9. A control device of a neural network accelerator, applied to a computing core, comprising:
an excitation obtaining module, configured to acquire an input excitation data packet after acquiring an input pulse data packet;
a storage module, configured to store the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets an update condition;
a condition obtaining module, configured to acquire the update condition of the current computing core;
an excitation judging module, configured to judge whether the current computing core meets the update condition according to the stored input excitation data packets and the acquired update condition;
and an update module, configured to perform an update operation and send an output pulse data packet when the current computing core meets the update condition.
10. A neural network accelerator, comprising: a configuration device, a storage device, a routing device and a computing core;
the computing core comprises: neurons and a control device;
the control device is used for realizing the method of any one of claims 1 to 8 or realizing the device of claim 9.
11. The neural network accelerator of claim 10, wherein the transmission of the input feedback signal and/or the output feedback signal is performed via dedicated signal lines.
CN202010009676.2A 2020-01-06 2020-01-06 Control method and device of neural network accelerator and neural network accelerator Active CN111210014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010009676.2A CN111210014B (en) 2020-01-06 2020-01-06 Control method and device of neural network accelerator and neural network accelerator

Publications (2)

Publication Number Publication Date
CN111210014A true CN111210014A (en) 2020-05-29
CN111210014B CN111210014B (en) 2023-06-02

Family

ID=70787005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010009676.2A Active CN111210014B (en) 2020-01-06 2020-01-06 Control method and device of neural network accelerator and neural network accelerator

Country Status (1)

Country Link
CN (1) CN111210014B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4924517A (en) * 1988-02-04 1990-05-08 Nec Corporation Encoder of a multi-pulse type capable of controlling the number of excitation pulses
CN101042424A (en) * 2007-04-26 2007-09-26 北京南山之桥信息技术有限公司 Method and apparatus for detecting application-specific integrated circuits
US20180075344A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN109684672A (en) * 2018-11-30 2019-04-26 上海芯钛信息科技有限公司 A kind of SOC chip whole-system verification system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JILIN ZHANG ET AL.: "An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update", IEEE Asian Solid-State Circuits Conference *
SHEN Yangjing; SHEN Juncheng; YE Jun; MA Qi: "Design of a Spiking Neural Network Accelerator Based on FPGA", Electronic Science and Technology
WANG Haitao; ZHANG Xiao; SHI Lichen; WANG Kun; KANG Zhenya: "Research and Application of the Multi-Pulse Excitation Method for the Wear Condition of Bearing Balls", Mechanical Science and Technology for Aerospace Engineering

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112242963A (en) * 2020-10-14 2021-01-19 广东工业大学 Rapid high-concurrency neural pulse data packet distribution and transmission method
CN112242963B (en) * 2020-10-14 2022-06-24 广东工业大学 Rapid high-concurrency neural pulse data packet distribution and transmission method and system

Also Published As

Publication number Publication date
CN111210014B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
CN110389909A (en) Use the system and method for the performance of deep neural network optimization solid state drive
CN108334942B (en) Data processing method, device, chip and storage medium of neural network
EP3710995A1 (en) Deep neural network processor with interleaved backpropagation
CN109246027B (en) Network maintenance method and device and terminal equipment
KR20190116040A (en) Neural network processor
CN108111335A (en) A kind of method and system dispatched and link virtual network function
CN111723901A (en) Training method and device of neural network model
CN111210014A (en) Control method and device of neural network accelerator and neural network accelerator
WO2021096590A1 (en) Threshold triggered back propagation of an artificial neural network
CN110600020B (en) Gradient transmission method and device
CN114819114A (en) Pulse neural network hardware accelerator and optimization method thereof in convolution operation
CN111985634B (en) Operation method and device of neural network, computer equipment and storage medium
CN109871958B (en) Method, device and equipment for training model
WO2020093654A1 (en) Multichip system and data processing method adapted to the same for implementing neural network application
CN116680565A (en) Combined learning model training method, device, equipment and storage medium
CN117391148A (en) Convolution calculation unit, AI operation array and related equipment
CN113014659B (en) Microservice migration method and device, storage medium and electronic equipment
EP4052188B1 (en) Neural network instruction streaming
CN113312169B (en) Computing resource allocation method and device
CN113255902A (en) Neural network circuit, system and method for controlling data flow
CN108564170B (en) Reconfigurable neural network operation method and circuit based on NOC
CN113396425B (en) Acceleration method, device and system-on-chip
CN109325582B (en) Computing device and method for binary neural network
CN114365148A (en) Neural network operation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant