CN111210014A - Control method and device of a neural network accelerator, and neural network accelerator

Info

Publication number: CN111210014A (granted as CN111210014B)
Application number: CN202010009676.2A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 陈虹, 张吉霖
Original and current assignee: Tsinghua University
Legal status: Granted; active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a control method and device for a neural network accelerator, and a neural network accelerator, relating to artificial intelligence technology. The scheme comprises the following steps: after an input pulse data packet is acquired, acquiring an input excitation data packet and storing it, the input excitation data packet being used to judge whether the current computing core satisfies the update condition; acquiring the update condition of the current computing core; judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition; and when the current computing core satisfies the update condition, performing the update operation and sending an output pulse data packet. With the method and device, each computing core in the neural network accelerator performs its update operation according to the actual delay of the pulses it receives, which significantly improves the overall operating performance of the neural network accelerator.

Description

Control method and device of a neural network accelerator, and neural network accelerator
Technical Field
The present application relates to artificial intelligence technology, and in particular, to a method and an apparatus for controlling a neural network accelerator, and a neural network accelerator.
Background
The rapid development of artificial intelligence technology has brought the neural network accelerator, a high-performance computing device for brain-like computation, to a peak of development. A neural network accelerator simulates the operating mode of the human brain and computes with membrane potentials and the pulse potentials carried by pulses.
To ensure that every computing core in the neural network accelerator can complete its update operation under differing pulse-reception delays, the prior art usually sets the time step of the update operation according to the worst operating condition, i.e., the longest delay. Every computing core in the neural network accelerator therefore works at the longest pulse-reception delay regardless of the actual delay of the pulses it receives, which greatly reduces the overall operating performance of the neural network accelerator.
Disclosure of Invention
In view of this, a main object of the present application is to provide a control method for a neural network accelerator that enables each computing core in the neural network accelerator to perform its update operation according to the actual delay of the pulses it receives, thereby significantly improving the overall operating performance of the neural network accelerator.
To achieve this purpose, the technical scheme provided by the application is as follows:
in a first aspect, an embodiment of the present application provides a control method for a neural network accelerator, applied to a control device of a computing core, comprising the following steps:
after an input pulse data packet is obtained, an input excitation data packet is obtained; the input excitation data packet comprises an update condition of a current computing core;
storing the input excitation data packet, and judging whether the updating condition of the current computing core is met or not according to the stored input excitation data packet;
when the updating condition of the current computing core is met, executing updating operation and sending an output pulse data packet;
generating and transmitting an output excitation data packet; the output excitation data packet includes a computing core address of a target computing core and an update condition of the target computing core.
In a possible implementation, the update condition of the current computing core is that a preset number of excitation data packets are received;
the step of judging whether the update condition of the current computing core is met according to the stored input excitation data packet comprises the following steps:
judging, according to the total number of stored input excitation data packets, whether the preset number of excitation data packets has been received.
In a possible implementation manner, the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
the step of performing an update operation includes:
for each input pulse potential, sending the input pulse potential to a target neuron according to the neuron address corresponding to that input pulse potential;
receiving an output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
and generating the output pulse data packet according to each output pulse potential.
In a possible implementation, after the step of performing the update operation, the method further includes:
emptying the stored input excitation data packet.
In a possible embodiment, the step of generating and transmitting the output excitation data packet comprises:
acquiring a computing core address of the target computing core and an updating condition of the target computing core;
generating the output excitation data packet according to the computing core address of the target computing core and the updating condition of the target computing core;
and transmitting the output excitation data packet.
In a second aspect, an embodiment of the present application further provides a control apparatus for a neural network accelerator, which is applied to a control device of a computing core, and includes:
the excitation acquisition module is used for acquiring an input excitation data packet after acquiring the input pulse data packet; the input excitation data packet comprises an update condition of a current computing core;
the excitation judging module is used for judging whether the updating condition of the current computing core is met or not according to the stored input excitation data packet;
the updating module is used for executing updating operation and sending an output pulse data packet when the updating condition of the current computing core is met;
the excitation sending module is used for generating and sending an output excitation data packet; the output excitation data packet includes a computing core address of a target computing core and an update condition of the target computing core.
In a possible implementation, the update condition of the current computing core is that a preset number of excitation data packets are received;
the excitation judging module is specifically configured to:
judging whether the preset number of excitation data packets has been received.
In a possible implementation manner, the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
the update module specifically includes:
the neuron sending unit is used for sending, for each input pulse potential, the input pulse potential to a target neuron according to the neuron address corresponding to that input pulse potential;
the neuron receiving unit is used for receiving the output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
and the pulse generating unit is used for generating the output pulse data packet according to each output pulse potential.
In a possible embodiment, the apparatus further comprises:
and the emptying module is used for emptying the stored input excitation data packet.
In a possible embodiment, the excitation sending module includes:
an obtaining unit, configured to obtain a computing core address of the target computing core and an update condition of the target computing core;
the generating unit is used for generating the output excitation data packet according to the computing core address of the target computing core and the updating condition of the target computing core;
a sending unit, configured to send the output excitation data packet.
In a third aspect, an embodiment of the present application further provides a neural network accelerator, comprising: a configuration device, a storage device, a routing device, and computing cores;
each computing core comprises neurons and a control device;
the control device is configured to implement the steps of the first aspect or of any possible embodiment of the first aspect, or to implement the apparatus of the second aspect or of any possible embodiment of the second aspect.
In summary, the present application provides a control method and device for a neural network accelerator, and a neural network accelerator. Unlike the prior art, in which each computing core works at the longest pulse-reception delay, the computing core here acquires and stores input excitation data packets, judges from them whether the update condition of the current computing core is satisfied, and performs the update operation immediately, without waiting, once the condition is met. Because every computing core at every stage updates as soon as its condition is satisfied, each computing core in the neural network accelerator updates according to the actual delay of the pulses it receives, and the overall operating performance of the neural network accelerator is significantly improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive labor.
Fig. 1 is a schematic architecture diagram of a neural network accelerator according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the architecture of a routing device in a neural network accelerator;
FIG. 3 is a schematic diagram of the architecture of a compute core in a neural network accelerator;
FIG. 4a is a schematic diagram of one connection between computational cores in a neural network accelerator;
FIG. 4b is a schematic diagram of another connection between computational cores in a neural network accelerator;
FIG. 4c is a schematic diagram of another connection between computational cores in a neural network accelerator;
fig. 5 is a schematic flowchart of a control method of a neural network accelerator according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a neural network accelerator with three layers of neural networks;
fig. 7 is a schematic flowchart of another control method for a neural network accelerator according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a control device and target neuron in a computational core;
FIG. 9a is a schematic diagram of one connection between computational cores in a neural network accelerator;
FIG. 9b is a schematic diagram of the update operation time of the neural network accelerator;
FIG. 9c is a schematic diagram of another update operation time of the neural network accelerator;
fig. 10 is a schematic structural diagram of a control device of a neural network accelerator according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an update module in a control device of a neural network accelerator;
FIG. 12 is a schematic diagram of a structure of a stimulus transmission module in a control device of a neural network accelerator;
FIG. 13 is a diagram illustrating one manner of connection between computational cores in a neural network accelerator.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or apparatus is not necessarily limited to those steps or apparatus explicitly listed, but may include other steps or apparatus not explicitly listed or inherent to such process, method, article, or apparatus.
The rapid development of artificial intelligence technology has brought the neural network accelerator, a high-performance computing device for brain-like computation, to a peak of development. A neural network accelerator simulates the operating mode of the human brain and computes with membrane potentials and the pulse potentials carried by pulses.
To ensure that every computing core in the neural network accelerator can complete its update operation under differing pulse-reception delays, the prior art usually sets the time step of the update operation according to the worst operating condition, i.e., the longest delay. Every computing core in the neural network accelerator therefore works at the longest pulse-reception delay regardless of the actual delay of the pulses it receives, which greatly reduces the overall operating performance of the neural network accelerator.
In view of this, the core inventive concept of the embodiments of the present application is as follows: a computing core acquires and stores input excitation data packets, judges from them whether the update condition of the current computing core is satisfied, and performs the update operation immediately, without waiting, when the condition is satisfied. After performing the update operation, it generates and sends an output excitation data packet, so that the subsequent target computing core can in turn judge whether its own update condition is satisfied according to that packet and perform its update operation when it is. Because every computing core at every stage updates as soon as its condition is satisfied, each computing core in the neural network accelerator updates according to the actual delay of the pulses it receives, and the overall operating performance of the neural network accelerator is significantly improved.
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described in detail below with specific embodiments. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
As shown in fig. 1, a neural network accelerator 100 provided in an embodiment of the present application includes a configuration device 101, a storage device 102, a plurality of routing devices 103, and a computing core 104 corresponding to each routing device 103.
In the neural network accelerator 100 of the embodiment of the present application, the configuration device 101 is used to configure the structure of the neural network and the weights of its neurons. Here, the structure of the neural network specifically includes the computing cores 104 contained in each layer of the neural network, the connection relationships of neurons within each computing core 104 and between computing cores 104, and so on. Each specific neural network implements a different function depending on the weights of its neurons. Configuring the structure of the neural network and the weights of the neurons specifically means inputting them into the storage device 102 for storage.
The storage device 102 is used to store the structure of the neural network and the weights of the different neurons.
The routing devices 103 form the on-chip interconnection network of the neural network accelerator; pulse data packets, including input pulse data packets and output pulse data packets, excitation data packets, including input excitation data packets and output excitation data packets, and other data are transmitted through them.
Data in the neural network accelerator 100 of the embodiment of the present application is transferred hop by hop between the routing devices 103. As shown in fig. 2, a routing device 201 has 5 bidirectional ports: east, south, west, north, and local, and each bidirectional port carries both input and output. For convenience of description, the ports are named after their positions relative to the routing device 201: north for above, south for below, west for left, and east for right. This naming has no special meaning; above, below, left, and right are relative to the routing device 201 and do not imply directions in an absolute coordinate system, and east, south, west, and north likewise describe only port positions relative to the routing device 201, not compass directions in a physical coordinate system. When the routing device 201 acquires a pulse data packet or an excitation data packet, it determines from the computing core address carried by the packet in which direction the packet should be transmitted: if the carried address is the computing core address of the current computing core 202, the packet is sent to the local current computing core 202; otherwise, the packet is sent to the adjacent routing device in the direction corresponding to the carried computing core address. To facilitate transmission, pulse data packets and excitation data packets are generated and transmitted in units of computing cores 104 and are distributed to specific neurons inside the computing core 104.
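As an illustration of this port selection, a minimal sketch follows. Dimension-ordered (X-then-Y) routing and the (x, y) core addresses are assumptions; the patent only requires that the carried computing core address determine the output direction.

```python
# Hedged sketch of a routing device's port selection: deliver locally on
# an address match, otherwise forward toward the target computing core.
# X-then-Y (dimension-ordered) routing is an assumption.

def select_port(packet_core_addr, router_addr):
    """Return 'local', 'east', 'west', 'north', or 'south'."""
    px, py = packet_core_addr
    rx, ry = router_addr
    if (px, py) == (rx, ry):
        return "local"                         # hand the packet to the attached computing core
    if px != rx:
        return "east" if px > rx else "west"   # resolve the X offset first
    return "north" if py > ry else "south"     # then the Y offset

assert select_port((2, 1), (2, 1)) == "local"
assert select_port((4, 1), (2, 1)) == "east"
assert select_port((2, 3), (2, 1)) == "north"
```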
As shown in fig. 3, each computing core 104 includes a control device 301 and at least one neuron 302. In general, a computing core includes a plurality of neurons 302, for example 64, 128, or 1024 neurons 302. Each neuron 302 maintains its own membrane potential, which it changes according to the pulse potentials it receives.
In the prior art, the computation core 104 only receives input pulse data packets and sends output pulse data packets, and the update operation of each neuron is completed through these pulse data packets. Here, an input pulse data packet includes: at least one input pulse potential, and for each pulse potential the address of the neuron that is to receive it. For each input pulse potential, the control device 301 of the computation core 104 transmits the potential to the neuron at the corresponding neuron address. Because a computation core 104 contains a plurality of neurons 302, an input pulse data packet received by a computation core 104 may carry a plurality of input pulse potentials.
Because pulse data packets and excitation data packets are generated and sent in units of computation cores 104, a neural network may be connected as shown in fig. 4a, where one preceding computation core is connected to one subsequent computation core and sends it pulse and excitation data packets; as shown in fig. 4b, where two or more preceding computation cores are connected to one subsequent computation core and each sends it pulse and excitation data packets; or as shown in fig. 4c, where one preceding computation core is connected to two or more subsequent computation cores and sends pulse and excitation data packets to each of them. A neuron 302 in a computation core 104 must receive all of its input pulse potentials before it can perform a correct update operation. Since each computation core 104 sits at a different position in the on-chip interconnection network and the connection path lengths between computation cores 104 differ, the transmission delays of pulse data packets between computation cores 104 differ; moreover, because the number of preceding computation cores connected to each computation core differs, the times at which computation cores satisfy the update condition and perform the update operation differ, which in turn affects when a computation core sends its output pulse data packet to the subsequent computation cores connected to it.
To sum up, in order to ensure that the neurons in every computation core receive all input pulse potentials and complete the update operation correctly, the prior art adopts a waiting scheme: the longest delay for transmitting pulse data packets between computation cores under the worst condition is determined by experimental measurement or calculation, for example the longest transmission delay over the longest connection path between computation cores 104 in the on-chip interconnection network, and the time step of the update operation of every computation core 104 is set according to this longest delay. No matter when a computation core 104 finishes receiving all its pulse data packets, and no matter when the neurons 302 in it finish receiving all their input pulse potentials, the computation core 104 waits out the longest delay before performing the update operation. This prior-art method undoubtedly greatly reduces the overall operational performance of the neural network accelerator.
In the embodiment of the present application, the computation core 104 not only receives input pulse data packets and sends output pulse data packets, but also receives input excitation data packets and sends output excitation data packets. Handshaking between computation cores is completed through these excitation data packets.
Specifically, fig. 5 shows a method for controlling a neural network accelerator by receiving input excitation data packets and sending output excitation data packets. The method is applied to the control device of a computation core and mainly includes:
s501: after an input pulse data packet is acquired, acquiring an input excitation data packet, and storing the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets the updating condition.
In the embodiment of the present application, the control device of the computation core acquires an input excitation data packet in addition to the input pulse data packet of the prior art. The acquired input excitation data packet carries the update condition of the current computing core. In general, a correct update operation can be performed only after the computation core has received all input pulse data packets; the update condition carried in the input excitation data packets therefore ensures that the computation core has received all input pulse data packets before it performs the update operation.
Since, as shown in fig. 4b, one computation core may in an actual neural network structure be connected to two or more preceding computation cores, a computation core may receive two or more input pulse data packets. The control device of the computation core then receives one corresponding input excitation data packet for every input pulse data packet it receives, and stores each input excitation data packet as it arrives. Specifically, the packets may be stored in a unit with a storage function inside the computation core, or in a storage device inside or outside the neural network accelerator.
S502: acquiring the update condition of the current computing core.
The update condition of the current computing core may be stored in a storage device internal or external to the neural network accelerator, and thus, the update condition of the current computing core may be retrieved from the storage device internal or external to the neural network accelerator.
The step of obtaining the update condition of the current compute core may be performed before or after the step of obtaining the input excitation packet. Preferably, as shown in fig. 5, the step of obtaining the update condition of the current computing core and the step of obtaining the input excitation packet may be performed in parallel.
S503: judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition.
Typically, at this point at least one input excitation data packet is stored in a unit with a storage function in the computation core, or in a storage device inside or outside the neural network accelerator. Whether the update condition of the current computing core is satisfied is then judged according to the stored input excitation data packets and the acquired update condition of the current computing core.
S504: when the current computing core satisfies the update condition, performing the update operation and sending an output pulse data packet.
When it is judged that all input pulse data packets have been received, that is, that the update condition of the current computing core is satisfied, the update operation is performed, an output pulse data packet is generated from the result of the update operation, and the output pulse data packet is sent to the subsequent target computing core.
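To make steps S501 to S504 concrete, a minimal control-loop sketch follows. The class, callback, and packet names are illustrative assumptions; only the store, judge, and update-then-send logic comes from the method above.

```python
# Hedged sketch of S501-S504 for one computation core's control device.

class CoreController:
    def __init__(self, preset_count, send_pulse_packet):
        self.preset_count = preset_count            # update condition obtained in S502
        self.input_excitations = []                 # input excitation packets stored in S501
        self.input_pulses = []                      # input pulse packets awaiting the update
        self.send_pulse_packet = send_pulse_packet  # hands packets to the routing device

    def on_input_pulse_packet(self, packet):
        self.input_pulses.append(packet)

    def on_input_excitation_packet(self, packet):
        self.input_excitations.append(packet)                 # S501: store
        if len(self.input_excitations) >= self.preset_count:  # S503: judge
            output = self.run_update(self.input_pulses)       # S504: update operation
            self.input_pulses.clear()
            self.input_excitations.clear()
            self.send_pulse_packet(output)                    # S504: send output pulse packet

    def run_update(self, pulses):
        # placeholder for the neuron update operation detailed later in the text
        return {"pulses_consumed": len(pulses)}

ctrl = CoreController(preset_count=2, send_pulse_packet=print)
ctrl.on_input_pulse_packet("pulse_from_A"); ctrl.on_input_excitation_packet("exc_from_A")
ctrl.on_input_pulse_packet("pulse_from_B"); ctrl.on_input_excitation_packet("exc_from_B")
```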
Each computation core in the neural network accelerator provided by the embodiment of the present application adopts the control method provided by the embodiment of the present application: it judges whether the update condition is satisfied according to the acquired input excitation data packets and performs the update operation when the condition is satisfied, unlike the prior-art method of performing the update operation only after waiting out the longest delay.
For convenience of understanding, in the embodiments of the present application, a neural network accelerator including three layers of neural networks is taken as an example, and a control method of the neural network accelerator is described in detail.
The architecture of a neural network accelerator containing a three-layer neural network is shown schematically in fig. 6. The configuration device and the storage device of a neural network accelerator do not change much in structure regardless of the neural network implemented on it; therefore, fig. 6 shows only the routing devices 601 of the neural network accelerator 600 and the computing cores 602 corresponding to them. The neural network accelerator 600 in fig. 6 includes an input layer, an intermediate layer, and an output layer.
Whether the computation core 602 is located in an input layer, an intermediate layer, or an output layer, the control method is similar, and as shown in fig. 7, the specific control method includes:
s701: and acquiring an input pulse data packet.
Generally, a neural network accelerator operates driven by an external electronic device, which may be a CPU, an SoC, an FPGA, or another common electronic device. The input layer is connected to the external electronic device and acquires its input pulse data packets from it; the input pulse data packets acquired by the computation cores of the input layer are generated by the external electronic device. The input layer is also connected to the intermediate layer: a computation core of the input layer generates an output pulse data packet and sends it to a computation core of the intermediate layer, for which it is an input pulse data packet with the intermediate-layer computation core as the current computing core. The intermediate layer is in turn connected to the output layer: a computation core of the intermediate layer receives the output pulse data packet generated by a computation core of the input layer as an input pulse data packet, generates its own output pulse data packet, and sends it to a computation core of the output layer, for which it is an input pulse data packet with the output-layer computation core as the current computing core. A computation core of the output layer may also be connected to the external electronic device and send the computation result to it. During training, a computation core of the output layer may also generate an output pulse data packet and back-propagate it to a computation core of the intermediate layer, which then receives it as an input pulse data packet. Every pulse data packet, whether input or output, includes: a computing core address, at least one input pulse potential, and a neuron address corresponding to each input pulse potential. The computing core address is the address of the computation core that is to receive the packet; each input pulse potential is delivered to one neuron in that computation core, and the neuron address corresponding to the input pulse potential is the address of the neuron that receives it.
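The fields just enumerated can be written down as a data structure. The layout below is an illustrative assumption, since the patent does not fix field widths or ordering.

```python
# Hedged sketch of a pulse data packet with the fields listed above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PulsePacket:
    core_address: Tuple[int, int]   # computing core that is to receive the packet
    potentials: List[float]         # at least one input pulse potential
    neuron_addresses: List[int]     # one receiving-neuron address per potential

pkt = PulsePacket(core_address=(1, 2), potentials=[1.0, 1.0], neuron_addresses=[7, 12])
```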
Taking the computing core of the input layer as an example, the external electronic device may generate an input pulse data packet of each computing core of the input layer, and send the input pulse data packet to each computing core of the input layer in a step-by-step transfer or direct transfer manner.
S702: acquiring an input excitation data packet.
Excitation data packets, including input excitation data packets and output excitation data packets, are usually received and transmitted in units of computation cores. An excitation data packet may include a computing core address and an excitation signal. The current computing core receiving an input excitation data packet means that the preceding computation core connected to it has completed its update operation, and hence that the current computing core has received that preceding core's input pulse data packet. To ensure that every computation core has acquired all input pulse data packets before performing the update operation, whether the current computing core satisfies the update condition is judged according to the acquired input excitation data packets. For this judgment to be sufficient, the input excitation data packets acquired by a computation core have the same sources as its acquired input pulse data packets. Specifically, the input layer is connected to the external electronic device and acquires its input pulse data packets from it, so the computation cores of the input layer also acquire their input excitation data packets from the external electronic device. Similarly, after a computation core of the input layer generates and sends an output pulse data packet to the intermediate layer, it also generates and sends an output excitation data packet to the intermediate layer; the intermediate layer, having acquired the input-layer core's output pulse data packet as an input pulse data packet, likewise acquires its output excitation data packet as an input excitation data packet. In the same way, after a computation core of the intermediate layer generates and sends an output pulse data packet to the output layer, it also generates and sends an output excitation data packet to the output layer, and the output layer acquires them as its input pulse data packet and input excitation data packet.
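By contrast with a pulse data packet, an excitation data packet carries no potentials. A sketch under the same illustrative assumptions:

```python
# Hedged sketch of an excitation data packet: a computing core address
# plus an excitation signal marking that a preceding core has updated.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ExcitationPacket:
    core_address: Tuple[int, int]   # target computing core
    excitation_signal: int = 1      # "the preceding computing core has completed its update"
```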
S703: storing the input excitation data packet.
Because one computation core may be connected to two or more preceding computation cores, every acquired input excitation data packet is stored so that it can later be judged whether the update condition is satisfied. Specifically, the packets may be stored in a unit with a storage function inside the computation core, or in a storage device inside or outside the neural network accelerator.
S704: acquiring the update condition of the current computing core.
The update condition of the current computing core may be stored in a storage device internal or external to the neural network accelerator, and thus, the update condition of the current computing core may be retrieved from the storage device internal or external to the neural network accelerator. For example, the update condition of the current computing core may be obtained from a storage device inside or outside the neural network accelerator according to the computing core address of the current computing core.
S705: judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition.
Specifically, when the stored input excitation data packets indicate that all input pulse data packets have been received, it is judged that the update condition of the current computing core is satisfied. Preferably, therefore, the update condition of the current computing core is that a preset number of excitation data packets have been received, the preset number being determined by the number of preceding computation cores connected to the current computing core.
Judging whether the update condition of the current computing core is satisfied according to the stored input excitation data packets then specifically means judging, from the total number of stored input excitation data packets, whether the preset number of excitation data packets has been received.
For example, suppose the number of preceding computation cores connected to the current computing core is 3, so the preset number is determined to be 3. Because a preceding computation core sends its excitation data packet after sending its pulse data packet, receiving 3 excitation data packets, as determined from the total number of stored input excitation data packets, proves that the computation core has received the input pulse data packets and input excitation data packets sent by all 3 preceding computation cores. It can then be judged that the computation core has received all input pulse data packets and that the update condition of the current computing core is satisfied.
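A tiny sketch of this counting check follows; the packet values and names are illustrative.

```python
# Hedged sketch of the preset-number check from the example above.

def update_condition_met(stored_excitation_packets, preset_number):
    # the preset number equals the number of preceding computation cores
    return len(stored_excitation_packets) >= preset_number

stored = ["exc_from_core_1", "exc_from_core_2", "exc_from_core_3"]
assert update_condition_met(stored, 3)          # all three predecessors heard from
assert not update_condition_met(stored[:2], 3)  # still waiting for one predecessor
```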
When the update condition of the current computing core is satisfied, step S706 is executed; when the update condition is not satisfied, the method returns to step S701 to wait for an input pulse data packet.
S706: when the update condition of the current computing core is satisfied, performing the update operation and sending an output pulse data packet.
Specifically, an input pulse data packet carries a computation core address, and the input pulse data packet is sent to a current computation core according to the computation core address; the input pulse data packet also carries at least one input pulse potential and a neuron address corresponding to each input pulse potential. And each input pulse potential carried by the input pulse data packet is sent to a neuron in the current computation core.
Specifically, the update operation may be performed according to the following steps 1 to 3:
and step 1, aiming at each input pulse potential, sending the input pulse potential to a target neuron according to a neuron address corresponding to the input pulse potential.
As shown in fig. 3, the computation core includes a control device 301 and at least one neuron 302, and specifically, for each input pulse potential, the control device 301 in the computation core sends the input pulse potential to a target neuron according to a neuron address corresponding to the input pulse potential. The target neuron is a neuron corresponding to the input pulse potential. As shown in fig. 8, the control device 301 transmits an input pulse potential to the target neuron 801 according to the neuron address.
Step 2: receive the output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to its own membrane potential, the input pulse potential it received, a preset leakage potential, and a preset potential threshold.
When the update condition is satisfied and the update operation is performed, the target neuron 801 generates an output pulse potential according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leak potential, and a preset potential threshold.
Specifically, the target neuron 801 updates its membrane potential according to its current membrane potential, the input pulse potential it received, and the preset leakage potential. In general, the updated membrane potential is the neuron's own membrane potential plus the received input pulse potential minus the preset leakage potential. When a neural network is implemented with the accelerator, the received input pulse potential is usually first multiplied by the weight of the target neuron 801, i.e., the updated membrane potential is the neuron's own membrane potential plus the weighted input pulse potential minus the preset leakage potential; the weight may be positive or negative. When the target neuron 801 receives no input pulse potential, or the received input pulse potential is 0, the preset leakage potential is still subtracted from its membrane potential. After updating its membrane potential, the target neuron 801 judges whether the updated membrane potential is greater than the preset potential threshold; if it is, the target neuron 801 generates and emits an output pulse potential, after which its membrane potential returns to zero.
The control device 301 receives the output pulse potential from the target neuron 801.
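A minimal sketch of this membrane-potential rule follows, assuming the weighted-input variant described above; the numeric parameter values are illustrative.

```python
# Hedged sketch of the target neuron's update: v += weight * input - leak,
# fire and reset to zero when the preset potential threshold is exceeded.

class Neuron:
    def __init__(self, weight, leak, threshold):
        self.v = 0.0                 # membrane potential
        self.weight = weight         # may be positive or negative
        self.leak = leak             # preset leakage potential
        self.threshold = threshold   # preset potential threshold

    def update(self, input_potential=0.0):
        # the leakage potential is subtracted even when no pulse arrives
        self.v += self.weight * input_potential - self.leak
        if self.v > self.threshold:
            self.v = 0.0             # membrane potential returns to zero after firing
            return 1.0               # emitted output pulse potential
        return None                  # no output pulse this update

n = Neuron(weight=0.8, leak=0.05, threshold=1.0)
assert n.update(2.0) == 1.0          # 0.8 * 2.0 - 0.05 = 1.55 > 1.0, so it fires
assert n.update(0.0) is None         # leak only: v = -0.05, below threshold
```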
Step 3: generate the output pulse data packet according to each output pulse potential.
Each neuron in the computation core may or may not emit an output pulse potential: an output pulse potential is emitted when the neuron's membrane potential is greater than the preset potential threshold, and none is emitted when it is not. The control device 301 of the computation core generates the output pulse data packet from the output pulse potentials it receives. According to the neural network structure, that is, the connection relationships between the computation cores in the neural network accelerator and between the neurons in each computation core, the computing core address of the target computing core and the neuron address corresponding to each output pulse potential are determined, and the output pulse data packet is generated. Since the neural network structure is stored in the storage device, the computation core can retrieve it from there.
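Step 3 can be sketched as follows; the dictionary standing in for the stored neural network structure, and its keys, are assumptions.

```python
# Hedged sketch of assembling the output pulse data packet from the
# output pulse potentials the control device received in step 2.

def build_output_pulse_packet(output_potentials, structure):
    # output_potentials: {source_neuron_address: potential} for neurons that fired
    potentials, neuron_addresses = [], []
    for source, potential in output_potentials.items():
        potentials.append(potential)
        # the target neuron address comes from the stored connection relationships
        neuron_addresses.append(structure["neuron_map"][source])
    return {
        "core_address": structure["target_core"],   # target computing core
        "potentials": potentials,
        "neuron_addresses": neuron_addresses,
    }

structure = {"target_core": (1, 2), "neuron_map": {0: 7, 3: 12}}
packet = build_output_pulse_packet({0: 1.0, 3: 1.0}, structure)
```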
After the update operation is completed, the stored input excitation data packets are of no further use; to avoid affecting the judgment of the next update condition and the execution of the next update operation, they are cleared after the current update operation has been performed. The step of clearing the stored input excitation data packets may be performed in parallel with the step of sending the output pulse data packet or the steps of generating and sending the output excitation data packet; no execution order between these steps needs to be defined.
S707: sending the output pulse data packet.
The control device of the computation core sends the output pulse data packet to the routing device, and the routing device sends the output pulse data packet to the computation core of the middle layer or the computation core of the output layer, or of course, to the computation core of the input layer.
S708: generating and sending an output excitation data packet.
In a possible embodiment, in order that the subsequent target computing core can in turn judge whether its update condition is satisfied according to the output excitation data packet and perform its update operation when the condition is satisfied, the control device of the computation core generates the excitation data packet according to the connection relationships of the neural network and sends the output excitation data packet to the subsequent target computing core. To ensure that the subsequent target computing core performs the update operation correctly, the target computing core of the output excitation data packet and the target computing core of the output pulse data packet are the same computing core, with the same computing core address. The update condition of the target computing core likewise ensures that the target computing core receives all input pulse data packets, with itself as the current computing core, before performing its update operation.
So that the subsequent computation core receiving the output pulse data packet can judge whether the update condition is satisfied, the output excitation data packet must be generated and transmitted when the output pulse data packet is transmitted. Specifically, the computing core address of the target computing core and the update condition of the target computing core are obtained from the storage device; the output excitation data packet is generated according to them; the packet is then sent to the routing device, which forwards it to the target computing core. Here, the target computing core of the output excitation data packet and that of the output pulse data packet are the same target computing core. The update condition of the target computing core may be input through the configuration device and stored in the storage device.
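A sketch of this generate-and-send step follows; the storage and router interfaces, and the lookup names, are illustrative assumptions.

```python
# Hedged sketch of generating and sending the output excitation packet.

def send_output_excitation(storage, router, current_core_address):
    # both lookups read what the configuration device stored in the storage device
    target = storage.lookup_target_core(current_core_address)
    condition = storage.lookup_update_condition(target)
    packet = {"core_address": target, "update_condition": condition}
    router.send(packet)   # forwarded hop by hop to the target computing core
```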
It should be understood that an output excitation data packet is identical in composition and function to an input excitation data packet: the same excitation data packet sent by a preceding computation core to the current computing core is an output excitation data packet for the preceding core and an input excitation data packet for the current core. In the same way, input and output pulse data packets are the same kind of packet: the same pulse data packet sent by a preceding computation core to the current computing core is an output pulse data packet for the preceding core and an input pulse data packet for the current core.
As for the computation cores of the output layer, when the neural network performs computation, the output layer generally does not send output pulse data packets to a further computation core but sends the computation result to the external electronic device. When the output layer completes its update operation, the entire neural network has completed the update operation, and the output pulse data packet is sent to the external electronic device as the computation result. Depending on the external electronic device, an output excitation data packet may or may not also be generated and sent to it.
In actual implementations, connections such as that shown in fig. 9a often arise: computation core A, computation core B, computation core C, and computation core D are all connected to computation core E. Because the numbers of input pulse data packets fed to computation cores A, B, C, and D are not constant, and the durations of their update operations are not constant either, computation core B may receive few input pulse data packets while computation core C receives many, so that the time step of the update operations computation core B performs according to its input pulse data packets is much shorter than that of computation core C.
In this case, when the control method provided in the embodiment of the present application is used to control the update operations of the neural network accelerator, computation core B performs its next update immediately once the next update condition is satisfied after the previous update operation completes. If computation core B again receives few input pulse data packets during the time step of the next update operation, computation core B may complete two update operations before computation core C completes one, as shown in fig. 9b. The second-layer computation core E then receives input pulse data packets and input excitation data packets from computation core B belonging to two different time steps, and a computation error results.
To overcome this defect, in the embodiment of the present application an output feedback signal may be sent after the step of performing the update operation and sending the output pulse data packet, and before the step of generating and sending the output excitation data packet; the output feedback signal is used to characterize the receipt of the input excitation data packets. The steps of generating and sending the output excitation data packet are then performed only upon receiving an input feedback signal.
In this case, the execution steps of the preferred control method of the neural network accelerator include: acquiring an input pulse data packet; acquiring an input excitation data packet; storing the input excitation data packet; acquiring the update condition of the current computing core; judging whether the current computing core satisfies the update condition according to the stored input excitation data packets and the acquired update condition; when the current computing core satisfies the update condition, performing the update operation and sending an output pulse data packet; sending an output feedback signal; judging whether an input feedback signal has been received; and upon receiving the input feedback signal, performing the steps of generating and sending an output excitation data packet.
In this way, as shown in fig. 9c, the time at which computation core B sends its output pulse data packet and output excitation data packet for the second time step is delayed until computation core C has sent its output pulse data packet and output excitation data packet for the first time step, avoiding the error condition above.
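The feedback-gated handshake can be sketched as follows; the direction of the signals and all interface names are assumptions consistent with figs. 9b and 9c.

```python
# Hedged sketch of the feedback-gated handshake: after its update, a core
# acknowledges the excitation packets it consumed, and it holds back its
# own output excitation packet until an input feedback signal arrives,
# so a fast core cannot run a full time step ahead of a slow sibling.

class FeedbackGatedController:
    def __init__(self, send_upstream, send_downstream):
        self.send_upstream = send_upstream        # toward the preceding computation cores
        self.send_downstream = send_downstream    # toward the target computing core
        self.pending_excitation = None

    def after_update(self, output_pulse_packet, output_excitation_packet):
        self.send_downstream(output_pulse_packet)            # output pulse packet goes out first
        self.send_upstream({"type": "output_feedback"})      # "your input excitation packets were received"
        self.pending_excitation = output_excitation_packet   # held until feedback arrives

    def on_input_feedback(self, _signal):
        # only now are the generate-and-send steps for the excitation packet performed
        self.send_downstream(self.pending_excitation)
        self.pending_excitation = None
```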
Based on the same design concept, the embodiment of the application also provides a control device of the neural network accelerator and the neural network accelerator.
As shown in fig. 10, a control apparatus 1000 of a neural network accelerator provided in an embodiment of the present application is applied to a control device of a computing core, and includes:
an excitation obtaining module 1001, configured to obtain an input excitation data packet after obtaining the input pulse data packet;
a storage module 1002, configured to store the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets the updating condition;
a condition obtaining module 1003, configured to obtain an update condition of the current computing core;
the excitation judging module 1004 is configured to judge whether the current computing core meets the update condition according to the stored input excitation data packet and the obtained update condition;
an update module 1005, configured to execute an update operation and send an output pulse data packet when the current computing core meets an update condition.
The excitation obtaining module 1001 is connected to the storage module 1002; after obtaining the input pulse data packet, the excitation obtaining module 1001 obtains the input excitation data packet and stores it in the storage module 1002. The excitation judging module 1004 is connected to the storage module 1002 and the condition obtaining module 1003, respectively, and judges whether the update condition of the current computing core is satisfied according to the input excitation data packets stored in the storage module 1002 and the update condition obtained by the condition obtaining module 1003. When the update condition is satisfied, the excitation judging module 1004 drives the update module 1005 to start the update operation.
In a possible implementation, the update condition of the current computing core is that a preset number of excitation data packets are received;
the excitation determining module 1004 is specifically configured to:
and judging whether a preset number of excitation data packets are received or not.
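As a hedged illustration of this check, assuming for the sketch that the stored packets are kept in a simple Python list and the preset number is an integer:

    # Illustrative only; the names stored_excitations and preset_count are assumed.
    def meets_update_condition(stored_excitations, preset_count):
        # The core may update once the preset number of excitation packets arrived.
        return len(stored_excitations) >= preset_count

    print(meets_update_condition(["e1", "e2"], 3))        # False: still waiting
    print(meets_update_condition(["e1", "e2", "e3"], 3))  # True: update may begin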
In a possible implementation manner, the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
as shown in fig. 11, the update module 1005 specifically includes:
a neuron sending unit 1101 configured to send, for each input pulse potential, the input pulse potential to a target neuron according to a neuron address corresponding to the input pulse potential;
a neuron receiving unit 1102, configured to receive an output pulse potential emitted by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
a pulse generating unit 1103 configured to generate the output pulse packet according to each of the output pulse potentials.
The neuron sending unit 1101 is connected to the target neuron 1104 and sends the input pulse potential to the target neuron 1104; the neuron receiving unit 1102 is likewise connected to the target neuron 1104 and receives the output pulse potential from the target neuron 1104. The neuron receiving unit 1102 is further connected to the pulse generating unit 1103 and transmits the output pulse potentials to it, and the pulse generating unit 1103 generates the output pulse data packet according to each of the output pulse potentials.
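The membrane-potential rule described above resembles a leaky integrate-and-fire neuron. The following sketch is one plausible reading of it; the function name, the leakage value, the threshold and the reset-to-zero behaviour are illustrative assumptions, not values fixed by the present application.

    def neuron_update(membrane, input_potential, leak=0.1, threshold=1.0):
        """Integrate the input, apply the preset leakage, compare to the threshold."""
        membrane = membrane + input_potential - leak
        if membrane >= threshold:          # preset potential threshold
            return 0.0, 1.0                # fire an output pulse potential and reset
        return membrane, None              # sub-threshold: no output pulse

    m, out = neuron_update(0.0, 0.5)   # m = 0.4, out = None
    m, out = neuron_update(m, 0.5)     # m = 0.8, out = None
    m, out = neuron_update(m, 0.5)     # m reaches 1.2 >= 1.0: out = 1.0, m resets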
In a possible implementation, the apparatus 1000 further comprises:
a clearing module 1006, configured to clear the stored input excitation data packets.
The clearing module 1006 is connected to the update module 1005 and the storage module 1002, and clears the input excitation data packets stored in the storage module 1002 after the update module 1005 performs the update operation.
In a possible implementation, the apparatus 1000 further comprises:
an excitation sending module 1007, configured to generate and send an output excitation data packet; the output excitation data packet comprises a computing core address of a target computing core; the output excitation data packet is used for judging whether the target computing core meets an update condition.
In one possible implementation, as shown in fig. 12, the excitation sending module 1007 includes:
an obtaining unit 1201, configured to obtain a computing core address of the target computing core and an update condition of the target computing core;
a generating unit 1202, configured to generate the output excitation data packet according to a computing core address of the target computing core and an update condition of the target computing core;
a sending unit 1203, configured to send the output excitation data packet.
The obtaining unit 1201 is connected to the storage device and the generating unit 1202, and the obtaining unit 1201 obtains the computing core address of the target computing core and the update condition of the target computing core, and sends the obtained address and the update condition to the generating unit 1202. The generation unit 1202 generates an output excitation packet according to the computation core address of the target computation core and the update condition of the target computation core. The generation unit 1202 is connected to the transmission unit 1203, and the generation unit 1202 transmits the output excitation packet to the transmission unit 1203. The sending unit 1203 is connected to the routing device, and the sending unit 1203 sends the output excitation data packet to the routing device, and sends the output excitation data packet to the target computing core through the routing device.
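A minimal sketch of the data packet the generating unit 1202 assembles is given below; the dictionary representation and the field names are assumptions made for illustration, since the actual packet format is hardware-defined.

    # Assumed field names; the real packet is a hardware bit field, not a dict.
    def make_output_excitation_packet(target_core_address, target_update_condition):
        # The routing device delivers the packet by the carried core address; the
        # target core judges its own update against the carried condition.
        return {"core_address": target_core_address,
                "update_condition": target_update_condition}

    pkt = make_output_excitation_packet((1, 2), {"packet_count": 4})
    print(pkt)  # {'core_address': (1, 2), 'update_condition': {'packet_count': 4}}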
In a possible implementation, the apparatus 1000 further comprises:
a feedback sending module 1008, configured to send an output feedback signal; the output feedback signal is used to indicate that the input excitation data packet has been received.
The feedback sending module 1008 is connected to the update module 1005, and sends the output feedback signal after the update module 1005 completes the update operation.
In a possible implementation, the apparatus 1000 further comprises:
a feedback judging module 1009, configured to judge whether an input feedback signal is received. The feedback judging module 1009 is connected to the excitation sending module 1007, and drives the excitation sending module 1007 to perform the steps of generating and sending the output excitation data packet upon receiving the input feedback signal.
The control device of the neural network accelerator provided in the embodiment of the present application enables each computing core in the neural network accelerator to perform its update operation according to the actual arrival timing of the received pulses, so that the overall operation performance of the neural network accelerator is significantly improved.
An embodiment of the present application further provides a neural network accelerator, including: a configuration device, a storage device, a routing device and a computing core;
the computing core comprises: neurons and a control device;
the control device is configured to implement any method provided by the embodiments of the present application, or to implement any apparatus provided by the embodiments of the present application.
In a possible implementation manner, the neural network accelerator provided in the embodiment of the present application performs transmission of the input stimulus packet and/or the output stimulus packet through the routing device.
In a possible implementation manner, the neural network accelerator provided in the embodiment of the present application performs transmission of the input feedback signal and/or the output feedback signal through a dedicated signal line.
Here, the input feedback signal and the output feedback signal likewise have the same structure and function: one and the same feedback signal, transmitted from a subsequent-stage computing core to the current computing core, is an output feedback signal for the subsequent-stage computing core and an input feedback signal for the current computing core.
Feedback signals, including input feedback signals and output feedback signals, may be transmitted using the routing device. However, since the input feedback signal and/or the output feedback signal can be implemented as a signal of only 1 bit, there is no need to transmit them through the on-chip network (i.e., the routing device of the neural network accelerator); avoiding the on-chip network further improves the transmission efficiency and the overall operation performance of the neural network accelerator. Preferably, dedicated signal lines can be added directly between the computing cores to connect them. As shown in fig. 13, a dedicated signal line is added between the computing core E and each of its previous-stage computing cores (i.e., computing core A, computing core B, computing core C and computing core D) for transmitting the feedback signals, including the input feedback signals and the output feedback signals. Since each feedback signal is only 1 bit wide, the number of feedback signal lines needed by one computing core is at most the number of computing cores in the network (for example, at most 16 in a neural network accelerator with a 4 × 4 computing core array), so even adding dedicated signal lines does not bring excessive design and implementation cost. Moreover, transmitting the feedback signals directly between the computing cores over dedicated signal lines reduces the feedback latency and speeds up the update operation. Therefore, it is preferable to transmit the input feedback signal and/or the output feedback signal over the above-mentioned dedicated signal lines, independent of the routing device.
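The wiring cost cited above can be sanity-checked with a short calculation; the 4 × 4 array size is taken from the example in the text, and one line per core pair is an illustrative worst case.

    # Each feedback line carries 1 bit, so a core never needs more dedicated
    # lines than there are computing cores in the array.
    ARRAY_ROWS, ARRAY_COLS = 4, 4
    total_cores = ARRAY_ROWS * ARRAY_COLS
    max_feedback_lines_per_core = total_cores
    print(max_feedback_lines_per_core)  # 16, matching the 4 x 4 example above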
The control method and device of the neural network accelerator and the neural network accelerator provided by the embodiments of the present application are all based on the same design concept; the technical means of any embodiments of the present application may be freely combined, and the combined technical means still fall within the protection scope of the present application.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that the features described in the various embodiments and/or claims of the present application may be combined and/or incorporated in many ways, even if such combinations or incorporations are not explicitly described in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways without departing from the spirit and teachings of the present application, and all such combinations fall within the scope of the present disclosure.
The principle and implementation of the present application have been explained herein through specific embodiments; the above description of the embodiments is only intended to help understand the method and the core idea of the present application, and is not intended to limit it. It will be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles, spirit and scope of the present application, and all such modifications, equivalents and improvements are intended to be protected by the claims.

Claims (11)

1. A control method of a neural network accelerator, applied to a control device of a computing core, the method comprising:
after acquiring an input pulse data packet, acquiring an input excitation data packet, and storing the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets an update condition;
acquiring the update condition of the current computing core;
judging whether the current computing core meets the update condition according to the stored input excitation data packets and the acquired update condition;
and when the current computing core meets the update condition, performing an update operation and sending an output pulse data packet.
2. The method of claim 1, wherein the update condition of the current computing core is that a preset number of excitation data packets have been received;
the step of judging whether the update condition of the current computing core is met according to the stored input excitation data packets comprises:
judging, according to the total number of the stored input excitation data packets, whether the preset number of excitation data packets has been received.
3. The method of claim 1, wherein the input pulse data packet carries at least one input pulse potential and a neuron address corresponding to each input pulse potential;
the step of performing an update operation includes:
for each input pulse potential, sending the input pulse potential to a target neuron according to the neuron address corresponding to the input pulse potential;
receiving an output pulse potential sent by the target neuron; the output pulse potential is generated by the target neuron according to the membrane potential of the target neuron, the input pulse potential received by the target neuron, a preset leakage potential and a preset potential threshold;
and generating the output pulse data packet according to each output pulse potential.
4. The method of claim 1, wherein after the step of performing the update operation, the method further comprises:
emptying the stored input excitation data packets.
5. The method of claim 1, wherein after the step of performing the update operation and sending the output pulse data packet, the method further comprises:
generating and sending an output excitation data packet; the output excitation data packet comprises a computing core address of a target computing core; the output excitation data packet is used for judging whether the target computing core meets an update condition.
6. The method of claim 5, wherein the step of generating and sending an output excitation data packet comprises:
acquiring a computing core address of the target computing core and an update condition of the target computing core;
generating the output excitation data packet according to the computing core address of the target computing core and the update condition of the target computing core;
and sending the output excitation data packet.
7. The method of claim 5, wherein after the step of performing the update operation and sending the output pulse data packet and before the step of generating and sending the output excitation data packet, the method further comprises:
sending an output feedback signal; the output feedback signal is used to indicate that the input excitation data packet has been received.
8. The method of claim 7, wherein after the step of performing the update operation and sending the output pulse data packet and before the step of generating and sending the output excitation data packet, the method further comprises:
judging whether an input feedback signal is received;
and upon receiving the input feedback signal, performing the steps of generating and sending the output excitation data packet.
9. A control device of a neural network accelerator, applied to a computing core, comprising:
an excitation obtaining module, configured to acquire an input excitation data packet after acquiring an input pulse data packet;
a storage module, configured to store the input excitation data packet; the input excitation data packet is used for judging whether the current computing core meets an update condition;
a condition obtaining module, configured to acquire the update condition of the current computing core;
an excitation judging module, configured to judge whether the current computing core meets the update condition according to the stored input excitation data packets and the acquired update condition;
and an update module, configured to perform an update operation and send an output pulse data packet when the current computing core meets the update condition.
10. A neural network accelerator, comprising: a configuration device, a storage device, a routing device and a computing core;
the computing core comprises: neurons and a control device;
the control device is used for realizing the method of any one of claims 1 to 8 or realizing the device of claim 9.
11. The neural network accelerator of claim 10, wherein the transmission of the input feedback signal and/or the output feedback signal is performed via dedicated signal lines.
CN202010009676.2A 2020-01-06 2020-01-06 Control method and device of neural network accelerator and neural network accelerator Active CN111210014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010009676.2A CN111210014B (en) 2020-01-06 2020-01-06 Control method and device of neural network accelerator and neural network accelerator

Publications (2)

Publication Number Publication Date
CN111210014A true CN111210014A (en) 2020-05-29
CN111210014B CN111210014B (en) 2023-06-02

Family

ID=70787005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010009676.2A Active CN111210014B (en) 2020-01-06 2020-01-06 Control method and device of neural network accelerator and neural network accelerator

Country Status (1)

Country Link
CN (1) CN111210014B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4924517A (en) * 1988-02-04 1990-05-08 Nec Corporation Encoder of a multi-pulse type capable of controlling the number of excitation pulses
CN101042424A (en) * 2007-04-26 2007-09-26 北京南山之桥信息技术有限公司 Method and apparatus for detecting application-specific integrated circuits
US20180075344A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN109684672A (en) * 2018-11-30 2019-04-26 上海芯钛信息科技有限公司 A kind of SOC chip whole-system verification system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JILIN ZHANG ET AL.: "An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update", IEEE Asian Solid-State Circuits Conference *
SHEN Yangjing; SHEN Juncheng; YE Jun; MA Qi: "Design of a Spiking Neural Network Accelerator Based on FPGA", Electronic Science and Technology
WANG Haitao; ZHANG Xiao; SHI Lichen; WANG Kun; KANG Zhenya: "Research and Application of the Multi-Pulse Excitation Method for the Wear Condition of Bearing Balls", Mechanical Science and Technology for Aerospace Engineering

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112242963A (en) * 2020-10-14 2021-01-19 广东工业大学 Rapid high-concurrency neural pulse data packet distribution and transmission method
CN112242963B (en) * 2020-10-14 2022-06-24 广东工业大学 Rapid high-concurrency neural pulse data packet distribution and transmission method and system

Also Published As

Publication number Publication date
CN111210014B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
CN110389909A (en) Use the system and method for the performance of deep neural network optimization solid state drive
CN108334942B (en) Data processing method, device, chip and storage medium of neural network
EP3710995A1 (en) Deep neural network processor with interleaved backpropagation
CN109246027B (en) Network maintenance method and device and terminal equipment
KR20190116040A (en) Neural network processor
CN108111335A (en) A kind of method and system dispatched and link virtual network function
CN111723901A (en) Training method and device of neural network model
CN111210014A (en) Control method and device of neural network accelerator and neural network accelerator
WO2021096590A1 (en) Threshold triggered back propagation of an artificial neural network
CN110600020B (en) Gradient transmission method and device
CN114819114A (en) Pulse neural network hardware accelerator and optimization method thereof in convolution operation
CN111985634B (en) Operation method and device of neural network, computer equipment and storage medium
CN109871958B (en) Method, device and equipment for training model
WO2020093654A1 (en) Multichip system and data processing method adapted to the same for implementing neural network application
CN116680565A (en) Combined learning model training method, device, equipment and storage medium
CN117391148A (en) Convolution calculation unit, AI operation array and related equipment
CN113014659B (en) Microservice migration method and device, storage medium and electronic equipment
EP4052188B1 (en) Neural network instruction streaming
CN113312169B (en) Computing resource allocation method and device
CN113255902A (en) Neural network circuit, system and method for controlling data flow
CN108564170B (en) Reconfigurable neural network operation method and circuit based on NOC
CN113396425B (en) Acceleration method, device and system-on-chip
CN109325582B (en) Computing device and method for binary neural network
CN114365148A (en) Neural network operation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant