WO2020051918A1 - Neuron circuit, chip, system and method therefor, and storage medium - Google Patents

Neuron circuit, chip, system and method therefor, and storage medium

Info

Publication number
WO2020051918A1
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
neural network
network layer
neuron
module
Prior art date
Application number
PCT/CN2018/105847
Other languages
English (en)
French (fr)
Inventor
王峥
梁明兰
林跃金
李善辽
赵玮
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2018/105847
Publication of WO2020051918A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08: Learning methods

Definitions

  • The invention belongs to the field of computer technology and particularly relates to a neuron circuit, a chip, a system, methods thereof, and a storage medium.
  • ASIC: application-specific integrated circuit.
  • The purpose of the present invention is to provide a neuron circuit, a chip, a system, methods thereof, and a storage medium, which aim to solve the problem in the prior art that a deep learning chip's neural network architecture cannot be reconfigured, which limits the applications of deep learning chips.
  • the present invention provides a neuron circuit.
  • the neuron circuit includes:
  • a computing module;
  • a configuration information storage module for storing neuron processing mode configuration information; and,
  • a control module, configured to control the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
  • the present invention provides a deep learning chip.
  • the deep learning chip includes:
  • a storage unit, configured to store a deep learning instruction set and the data to be processed by deep learning, the deep learning instruction set including a plurality of neural network layer instructions having a predetermined processing order;
  • a neuron array composed of several neuron circuits as described above;
  • a central controller, configured to perform control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction, together with the corresponding data to be processed, is loaded from the storage unit into the neuron circuits of the neuron array, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed; and,
  • an input-output unit, configured to implement data transmission between the storage unit and the neuron array.
  • the present invention also provides a deep learning chip cascade system.
  • the deep learning chip cascade system includes: at least two deep learning chips as described above, in a cascade relationship with each other.
  • the present invention also provides a deep learning system.
  • the deep learning system includes: at least one deep learning chip as described above, and peripheral devices connected to the deep learning chip.
  • the present invention also provides a neuron control method.
  • the neuron control method includes the following steps:
  • obtaining neuron processing mode configuration information; and,
  • according to the processing mode configuration information, controlling the computing module to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
  • the present invention also provides a deep learning control method.
  • the deep learning control method includes the following steps:
  • obtaining a deep learning instruction set, the deep learning instruction set comprising a plurality of neural network layer instructions having a predetermined processing order; and,
  • performing control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded into the neuron circuits of the neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
  • the present invention also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the methods described above.
  • the present invention also provides a deep learning method.
  • the deep learning method is based on the deep learning chip described above or the deep learning chip cascade system described above.
  • the deep learning method includes the following steps:
  • loading the deep learning instruction set and the data into the storage unit; and,
  • the central controller performing control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded into the neuron circuits of the neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
  • the invention includes the following structure in a neuron circuit: a computing module; a configuration information storage module for storing neuron processing mode configuration information; and a control module for controlling the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
  • In this way, the neuron circuit and the deep learning chip in which it is applied can be flexibly configured according to the needs of different scene functions, neural network types, neural network scales, and neuron operation modes, so that the deep learning chip and neuron circuit can be reconfigured according to actual neural network computing needs, thereby meeting the complex and diverse computing demands of rapidly iterating neural networks. They can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the applications of deep learning chips.
  • FIG. 1 is a schematic structural diagram of a neuron circuit provided by Embodiment 1 of the present invention.
  • FIG. 2 is a schematic structural diagram of a neuron circuit provided in Embodiment 2 of the present invention.
  • FIG. 3 is a schematic structural diagram of a neuron circuit provided by Embodiment 3 of the present invention.
  • FIG. 4 is a schematic structural diagram of a neuron circuit according to a fourth embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a deep learning chip provided by Embodiment 5 of the present invention.
  • FIG. 6 is a schematic structural diagram of a deep learning chip cascade system according to an eighth embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a deep learning system provided by Embodiment 9 of the present invention.
  • FIG. 8 is a schematic flowchart of a neuron control method according to Embodiment 10 of the present invention.
  • FIG. 9 is a schematic flowchart of a deep learning control method according to Embodiment 11 of the present invention.
  • FIG. 10 is a schematic diagram of a data structure of a convolutional network layer instruction in an application example of the present invention.
  • FIG. 11 is a schematic diagram of a data structure of a pooling network layer instruction in an application example of the present invention.
  • FIG. 12 is a schematic diagram of a data structure of a fully connected network layer instruction in an application example of the present invention.
  • FIG. 13 is a schematic diagram of a data structure of an activation function network layer instruction in an application example of the present invention.
  • FIG. 14 is a schematic diagram of a data structure of a state-action network layer instruction in an application example of the present invention.
  • FIG. 15 is a schematic structural diagram of a CRNA architecture chip in an application example of the present invention.
  • FIG. 16 is a schematic structural diagram of a neuron circuit in an application example of the present invention.
  • FIG. 17 is a schematic diagram of a 128-level state machine control flow of a central controller in an application example of the present invention.
  • FIG. 1 shows the structure of a neuron circuit provided in Embodiment 1 of the present invention. Specifically, it relates to a digital neuron circuit for forming a deep learning neural network: the deep learning neural network performs the required, ordered processing of input data through its neural network layers, and the neuron circuit performs the data processing required at a corresponding node of a neural network layer.
  • For ease of description, only parts related to the embodiment of the present invention are shown, including:
  • the computing module 101, which can adjust its computing infrastructure to execute different neural network layer node data processing. In this embodiment, the computing module 101 may perform a single process, such as a multiplication operation, an addition operation, or activation using an activation function, or a flexible combination of different processes.
  • The statement that the computing module 101 can execute at least two different kinds of neural network layer node data processing carries at least the following meanings. First, it corresponds to different types of neural network layers, such as convolutional network layers, pooling network layers, fully connected network layers, activation function network layers, and state-action network layers, whose data processing requirements differ; the computing module 101 can satisfy at least two such sets of requirements. This ability to adapt to different kinds of neural network layer data processing is achieved by flexibly combining multiplication, addition, and activation processing as required. For example, when the computing module 101 is required to perform convolutional network layer data processing, one flexible combination of the above processes satisfies the convolutional layer's requirements; when it is required to perform fully connected network layer data processing, another flexible combination satisfies those requirements. This demand-driven combination is realized together with the configuration information storage module 102 and the control module 103 described below. In this sense, the neuron circuit can perform the data processing of one node in one kind of neural network layer, and can also perform the data processing of a node in another kind of neural network layer. Second, the neuron circuit can perform the data processing corresponding to different nodes in the same type of neural network layer; for example, a neuron circuit can process a node in a first convolutional network layer and also a node in a second convolutional network layer. Third, the neuron circuit can perform the data processing corresponding to different nodes within the same neural network layer, i.e., the neuron circuit is reused across the data processing required by a single layer.
  • When processing different nodes, the computing infrastructure of the computing module 101 is not the same; the computing module 101 can be controlled to adjust its computing infrastructure to suit the processing of different nodes. When the data processing of all nodes in a neural network layer is completed, that layer's data processing is complete; when every neural network layer in the network has been processed, the data processing of the whole neural network is complete.
  • the configuration information storage module 102 is configured to store neuron processing mode configuration information.
  • the processing mode configuration information indicates the configuration required when the neuron circuit performs data processing at a corresponding neural network layer node; this configuration information may indicate, for example, which node operations the neuron circuit needs to implement.
  • the control module 103, configured to control the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
  • By implementing this embodiment, the neuron circuit can perform the data processing of one node in any given neural network layer of the whole deep learning neural network. In this way, neuron circuits can be flexibly configured according to the needs of different scene functions, neural network types, neural network scales, and neuron operation modes, so that they can be reconfigured according to actual neural network computing needs. This meets the complex and diverse computing demands of rapidly iterating neural networks, can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, and extends the applications of deep learning chips.
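  • To make the reconfiguration idea concrete, the following is a minimal Python sketch of a neuron whose datapath is selected by processing mode configuration information, mirroring the division of labor among the computing module 101, the configuration information storage module 102, and the control module 103. The mode names, fields, and activation choices are illustrative assumptions, not the patent's actual encoding.

```python
import math
from dataclasses import dataclass

@dataclass
class ProcessingModeConfig:
    """Stand-in for the configuration information storage module 102 (fields assumed)."""
    layer_type: str           # e.g. "conv", "fc", "pool", "act"
    activation: str = "none"  # e.g. "relu", "sigmoid", "none"

def activate(x, kind):
    if kind == "relu":
        return max(0.0, x)
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x))
    return x  # "none": the activation stage is bypassed

def neuron_process(cfg, inputs, params):
    """Control-module role: pick a compute structure from the config, then run it."""
    if cfg.layer_type in ("conv", "fc"):
        acc = sum(x * w for x, w in zip(inputs, params))  # multiply-accumulate
    elif cfg.layer_type == "pool":
        acc = max(inputs)       # max-pooling needs no parameters
    elif cfg.layer_type == "act":
        acc = inputs[0]         # pure activation layer
    else:
        raise ValueError(cfg.layer_type)
    return activate(acc, cfg.activation)

# The same "circuit" handles a fully connected node, then is reconfigured for pooling:
print(neuron_process(ProcessingModeConfig("fc", "relu"), [1.0, -2.0], [0.5, 0.25]))  # 0.0
print(neuron_process(ProcessingModeConfig("pool"), [0.3, 0.9, 0.1], []))             # 0.9
```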
  • As shown in FIG. 2, the neuron circuit further includes:
  • the parameter storage module 201 is configured to store parameters required for data processing of a neural network layer node.
  • the parameter may be a neural network parameter obtained through training.
  • the address generation module 202, which is controlled by the control module 103 to look up the parameters corresponding to the data targeted by the neural network layer node data processing; the retrieved parameters are input to the computing module 101 to participate in the corresponding data processing.
  • In this embodiment, since the data processing of some types of neural network layers, such as pooling network layers and activation function network layers, does not need to call parameters, the basic configuration of the neuron circuit can omit the parameter storage module 201 and the address generation module 202. Other layers, such as convolutional network layers, fully connected network layers, and state-action network layers, need to call neural network parameters during data processing, in which case the parameter storage module 201 and the address generation module 202 are configured in the neuron circuit. This enhances the broad applicability of the neuron circuit.
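  • As a minimal sketch of the address generation idea, assuming a simple sequential addressing scheme (the actual scheme is not specified here): a counter-style generator keeps the serially arriving input data aligned with the matching locally stored parameter.

```python
class AddressGenerator:
    """Keeps streamed inputs aligned with locally stored parameters (sequential scheme assumed)."""
    def __init__(self, base=0, stride=1):
        self.addr, self.stride = base, stride

    def next_address(self):
        addr = self.addr
        self.addr += self.stride
        return addr

param_memory = [0.5, -0.25, 0.125]  # parameter storage module, filled during initialization
gen = AddressGenerator()
acc = 0.0
for x in [1.0, 2.0, 4.0]:           # serially arriving input data
    w = param_memory[gen.next_address()]
    acc += x * w                    # each input meets its matching parameter
print(acc)                          # 0.5 - 0.5 + 0.5 = 0.5
```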
  • As shown in FIG. 3, the neuron circuit further includes:
  • the temporary storage module 301 is configured to store intermediate data of the data processing of the neural network layer node.
  • In this embodiment, since some types of neural networks, such as convolutional networks and regional networks, do not need to preserve the intermediate data produced by the neuron circuit for later processing, the basic configuration of the neuron circuit can omit the temporary storage module 301. Other processing, such as that of reinforcement learning networks and recurrent networks, needs to reuse the intermediate data produced by the neuron circuit, in which case the temporary storage module 301 is configured in the neuron circuit. This likewise enhances the broad applicability of the neuron circuit.
  • Embodiment 4:
  • As shown in FIG. 4, the computing module 101 includes:
  • the basic computing module 401, which includes a multiplier, an adder, and/or an activation function module;
  • the gating module 402, configured to perform corresponding gating actions under the control of the control module 103 so that the basic computing module 401 forms the corresponding computing infrastructure. The gating module 402 may include a multiplexer (MUX) and/or a demultiplexer (DEMUX).
  • In this embodiment, the basic computing module 401 can perform basic processing such as multiplication, addition, and activation using an activation function. During basic processing, the required parameters can be obtained from the parameter storage module 201 described above, and the gating module 402 can adjust the basic computing module 401 as needed to obtain the computing infrastructure required by the current neural network layer node data processing, realizing the reconfiguration of the computing module 101.
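  • As a sketch, the gating can be modeled as select signals that include or bypass each primitive so that the same multiplier, adder, and activation stages form different computing infrastructures. The two select flags below are illustrative assumptions, not the patent's control encoding.

```python
def gated_datapath(x, w, use_multiplier, acc, use_activation,
                   act=lambda v: max(0.0, v)):
    """MUX/DEMUX-style gating: each stage is included or bypassed by a select flag."""
    product = x * w if use_multiplier else x         # bypassable multiplier stage
    total = acc + product                            # adder stage
    return act(total) if use_activation else total   # bypassable activation stage

# Convolution/fully-connected style step: multiply-accumulate, no activation yet.
s = gated_datapath(2.0, 0.5, use_multiplier=True, acc=1.0, use_activation=False)
# Final step of an activation layer: bypass the multiplier, apply ReLU.
y = gated_datapath(s, 0.0, use_multiplier=False, acc=0.0, use_activation=True)
print(s, y)  # 2.0 2.0
```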
  • Embodiment 5:
  • FIG. 5 shows the structure of a deep learning chip provided in Embodiment 5 of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown, including:
  • the storage unit 501, configured to store a deep learning instruction set and the data to be processed by deep learning.
  • the deep learning instruction set includes: a plurality of neural network layer instructions having a predetermined processing sequence.
  • In this embodiment, the deep learning instruction set includes all of the neural network layer instructions covered by the deep learning task, such as convolutional network layer instructions, pooling network layer instructions, fully connected network layer instructions, activation function network layer instructions, and state-action network layer instructions.
  • Of course, this instruction set usually also includes other information needed to complete the deep learning task, such as neural network type information and neural network structure information. The neural network type information may indicate whether the network is a convolutional network, regional network, recurrent network, reinforcement learning network, and so on; the neural network structure information may include the number of layers in the network, the number of nodes in each layer, and indications of which operations each layer needs to implement.
  • This indication information corresponds to the processing mode configuration information stored in the configuration information storage module 102.
  • the neuron array 504 is composed of a plurality of neuron circuits 502 as described above.
  • the specific functions and structures are as described in the other embodiments, and are not repeated here.
  • the central controller 503, configured to perform control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded from the storage unit 501 into the neuron circuits 502 of the neuron array 504, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
  • In this embodiment, the central controller 503 can be any of various types of controllers, such as an Advanced RISC Machine (ARM) controller, an Intel-series controller, or a Huawei HiSilicon-series controller. Its architecture may use a finite state machine that completes transitions between states according to conditions, thereby controlling the workflow of the neural network it covers, including the configuration process, the neural network computing process, and the data transmission process. This may involve the neuron array 504 processing one neural network layer in a single batch, or processing one layer in multiple batches; multi-batch processing involves the reuse of the neuron circuits 502.
  • The central controller 503 is mainly used to configure the deep learning chip that constitutes the neural network, so that the neuron array 504 can perform orderly data processing according to the neural network layer instructions in the deep learning instruction set. Throughout the operation of the neural network, the central controller 503 implements the core operations of the network, including instruction updating, content decoding, and the like.
  • the input-output unit 505, configured to implement data transmission between the storage unit 501 and the neuron array 504.
  • The processing flow of a deep learning task is roughly as follows. First, the current neural network layer processing task is executed: the current processing mode configuration information corresponding to the current neural network layer instruction in the storage unit 501 is loaded through the input-output unit 505 into the neuron circuits 502 of the neuron array 504, completing the configuration of the neuron circuits 502; the data to be processed in the storage unit 501 is then loaded through the input-output unit 505 into the neuron circuits 502, which process the loaded data on the basis of the completed configuration. The processed data serves as the data to be processed by the next neural network layer processing task. The next neural network layer processing task is then executed in the same way, until all neural network layer processing tasks are completed and the deep learning task is finished.
  • In addition, if the deep learning task requires neural network parameters, then in the above flow, after the processing mode configuration information is loaded into the neuron circuits 502, the corresponding parameters are also transferred from the storage unit 501 into the neuron circuits 502 of the neuron array 504 before the data is loaded and processed.
  • By implementing this embodiment, the neuron circuits and the deep learning chip in which they are applied can be flexibly configured according to the needs of different scene functions, neural network types, neural network scales, and neuron operation modes, so that the deep learning chip and neuron circuits can be reconfigured according to actual neural network computing needs. This meets the complex and diverse computing demands of rapidly iterating neural networks, can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, and extends the applications of deep learning chips.
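  • The chip-level control flow can be summarized as a loop over the layer instructions: configure the array, load parameters if the layer needs them, stream the data through, and feed each layer's output to the next. The sketch below models that loop in Python; the array API, the instruction fields, and the two layer behaviors are assumptions for illustration, not the chip's interfaces.

```python
class NeuronArraySim:
    """Tiny stand-in for the neuron array 504 (behavior is illustrative only)."""
    def configure(self, mode):
        self.mode = mode
    def load_params(self, params):
        self.params = params
    def process(self, data):
        if self.mode == "fc":
            return [sum(x * w for x, w in zip(data, row)) for row in self.params]
        if self.mode == "relu":
            return [max(0.0, x) for x in data]
        raise ValueError(self.mode)

def run_deep_learning_task(instructions, storage, array):
    """Central-controller-style loop: one pass per neural network layer instruction."""
    data = storage["input"]
    for instr in instructions:                 # predetermined processing order
        array.configure(instr["mode"])         # load processing mode configuration
        if "params" in instr:                  # layers such as fc also need parameters
            array.load_params(storage["params"][instr["params"]])
        data = array.process(data)             # layer output feeds the next layer
    return data

storage = {"input": [1.0, -1.0], "params": {"w1": [[0.5, 0.5], [1.0, -1.0]]}}
plan = [{"mode": "fc", "params": "w1"}, {"mode": "relu"}]
print(run_deep_learning_task(plan, storage, NeuronArraySim()))  # [0.0, 2.0]
```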
  • Embodiment 6:
  • On the basis of the deep learning chip of the other embodiments, this embodiment further relates to the following:
  • the input-output unit 505 is a stream-in, stream-out shift register, with independent data transmission paths established between the neuron circuits 502 and the input-output unit 505.
  • In this embodiment, the data to be processed for each node of a neural network layer, stored in the storage unit 501, is transmitted through the input shift register and an independent data transmission path to the corresponding neuron circuit 502 in the neuron array 504 for processing. After processing, the resulting data is transmitted through the independent data transmission path and the output shift register to the storage unit 501 for storage. If the data produced by all nodes of the current neural network layer is the data to be processed by the next layer, then only after all nodes of the current layer have completed their data processing is the full set of processed data used as the next layer's input.
  • By implementing this embodiment, pipelined stream-in, stream-out data transmission can be realized. Compared with the multi-fanout circuits traditionally required for many neurons to access many data items, there is no longer any need to compute memory access addresses; reads and writes are greatly simplified, the memory bandwidth requirement is reduced, and input-output power consumption drops substantially. The shift registers, and the data transmission paths established between each neuron circuit 502 and the shift registers, are independent between neuron circuits 502, which avoids contention among the neuron array's many cores for the same memory; neither the arbitration mechanism used to avoid conflicts on a communication bus nor the complex cache synchronization mechanism required by traditional multi-core processor systems is therefore needed. The multi-array cascaded input-output system formed by the stream-in, stream-out registers thus lets computing throughput grow linearly with the number of neuron circuits 502, while the optimized memory access mechanism avoids useless computation.
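  • A minimal sketch of the stream-in, stream-out idea: one value is shifted in per cycle past a row of neurons, each with a private tap on its own register stage, so no shared-memory addressing or bus arbitration is involved. The per-cycle behavior below is a deliberate simplification for illustration.

```python
from collections import deque

def stream_through(inputs, weights):
    """Shift-register pipeline (sketch): one value enters per cycle; each neuron
    reads only its own register stage, never a shared memory."""
    stages = deque([0.0] * len(weights), maxlen=len(weights))
    accs = [0.0] * len(weights)
    for value in inputs:
        stages.appendleft(value)          # stream-in: shift the pipeline by one stage
        for i, (stage_val, w) in enumerate(zip(stages, weights)):
            accs[i] += stage_val * w      # each neuron taps its private stage
    return accs                           # stream-out would shift these back out serially

print(stream_through([1.0, 2.0, 3.0], [0.5, 0.25]))  # [3.0, 0.75]
```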
  • Embodiment 7:
  • On the basis of the deep learning chip of the other embodiments, this embodiment further relates to the following:
  • the storage unit 501 is further configured to store the intermediate data of neural network layer node data processing.
  • The purpose of this embodiment is the same as that of Embodiment 3 above and is not repeated here.
  • Embodiment 8:
  • FIG. 6 shows a structure of a deep learning chip cascade system provided in Embodiment 8 of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown, including:
  • at least two deep learning chips 601, as in any of the above embodiments, in a cascade relationship with each other.
  • By implementing this embodiment, multiple acceleration chips can be cascaded, thereby expanding parallel processing capability and meeting the usage requirements of different scenarios.
  • FIG. 7 shows the structure of a deep learning system provided by Embodiment 9 of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown, including:
  • at least one deep learning chip 701 as described above, and peripheral devices 702 connected to the deep learning chip 701. In this embodiment, when there are at least two deep learning chips 701, they may be cascaded or may be independent of each other without cascading.
  • The peripheral devices 702 may be other embedded processors, sensors, or the like.
  • Embodiment 10:
  • FIG. 8 shows a flow of a neuron control method provided in Embodiment 10 of the present invention. For convenience of explanation, only a part related to the embodiment of the present invention is shown, which involves the following steps:
  • In step S801, neuron processing mode configuration information is obtained.
  • In step S802, according to the processing mode configuration information, the computing module is controlled to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
  • Embodiment 11:
  • FIG. 9 shows a flow of a deep learning control method provided in Embodiment 11 of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown, which involves the following steps:
  • In step S901, a deep learning instruction set is obtained; the deep learning instruction set includes a plurality of neural network layer instructions having a predetermined processing order.
  • In step S902, control is performed according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded into the neuron circuits of the neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
  • Embodiment 12:
  • In an embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method embodiments above, for example, steps S801 to S802 shown in FIG. 8.
  • The computer-readable storage medium of the embodiments of the present invention may include any entity or apparatus capable of carrying computer program code, or a recording medium, for example, a memory such as a ROM/RAM, a magnetic disk, an optical disc, or a flash memory.
  • Embodiment 13:
  • The deep learning method provided by Embodiment 13 of the present invention is based on the deep learning chip described above, the deep learning chip cascade system described above, or the deep learning system described above. For ease of description, only parts related to the embodiment of the present invention are shown, involving the following steps:
  • loading the deep learning instruction set and the data into the storage unit 501; and,
  • the central controller 503 performing control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded into the neuron circuits 502 of the neuron array 504, wherein each neuron circuit 502 adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
  • The following application example gives a concrete description of the neuron circuit, chip, system, methods, and storage medium involved in the above embodiments. This application example specifically relates to the design and application of a deep learning instruction set and of a Coarse-grained Reconfigurable Neuromorphic Array (CRNA) architecture based on that instruction set.
  • This application example uses all-digital circuits to design the neurons and the neuron array, introduces a pipelined design approach, and, through dynamic configuration, flexibly realizes configuration of the neural network type, the neural network structure (the number of neural network layer nodes and the number of layers), combined applications of multiple kinds of neural networks, the working modes of the neurons, and so on. Applying this example can greatly increase data processing speed while meeting the demands of today's rapidly iterating neural network algorithms. It is characterized by low power consumption, fast processing, and reconfigurability, is especially suitable for usage scenarios with limited computing resources, small storage capacity, power consumption constraints, and fast processing requirements, and broadens the application fields of neural-network-based hardware and software.
  • the instruction set is the core of the processor design and the interface between the software system and the hardware chip.
  • This application example supports an instruction set for layer-wise description of neural networks, which specifically involves the following five (or more) types of neural network layer instructions.
  • In this application example, the instruction width is uniformly 96 bits; in other application examples, the instruction width can be adapted as needed. The instructions specifically involve: the convolutional network layer instruction shown in FIG. 10, the pooling network layer instruction shown in FIG. 11, the fully connected network layer instruction shown in FIG. 12, the activation function network layer instruction shown in FIG. 13, and the state-action network layer instruction shown in FIG. 14. Corresponding data bits in an instruction are assigned values to realize the corresponding functions. For example, in FIG. 10, bit 70 can be assigned "1" for padding and "0" for no padding, and bits 65-67 can be assigned "001" for a 1×1 convolution kernel, "010" for a 2×2 kernel, and so on. In FIG. 11, bits 65-67 can be assigned "000" for max-pooling, "001" for min-pooling, "010" for average-pooling, and so on, and bit 70 can be assigned "1" for forward and "0" for reverse. In FIG. 13, bits 5-9 can be assigned "00001" for a Rectified Linear Unit (ReLU) activation function, "00010" for a Sigmoid activation function, and "00011"-"11111" to indicate that the encoding is extensible. In FIG. 14, bits 45-47 can be assigned "000" for an iteration strategy using the Deep Q-learning (DQN) algorithm, "001" for the State-Action-Reward-State-Action (SARSA) algorithm, and "010"-"111" to indicate that the encoding is extensible; the E-greedy probability can take values from 0 to 100.
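  • As a sketch of how such fixed-width layer instructions can be packed and decoded, the helpers below set and read the bit fields named above (activation mode in bits 5-9, kernel size in bits 65-67, padding in bit 70). Treating bit 0 as the least significant bit and representing the 96-bit instruction as a Python integer are assumptions for illustration.

```python
def set_field(instr, lsb, width, value):
    """Write `value` into bits [lsb, lsb+width) of a 96-bit instruction word."""
    mask = (1 << width) - 1
    return (instr & ~(mask << lsb)) | ((value & mask) << lsb)

def get_field(instr, lsb, width):
    return (instr >> lsb) & ((1 << width) - 1)

instr = 0                                # blank 96-bit instruction word
instr = set_field(instr, 70, 1, 0b1)     # bit 70 = 1: padding enabled
instr = set_field(instr, 65, 3, 0b010)   # bits 65-67 = 010: 2x2 convolution kernel
instr = set_field(instr, 5, 5, 0b00001)  # bits 5-9 = 00001: ReLU activation

KERNEL = {0b001: "1x1", 0b010: "2x2"}
ACT = {0b00001: "ReLU", 0b00010: "Sigmoid"}
print(KERNEL[get_field(instr, 65, 3)], ACT[get_field(instr, 5, 5)],
      "padding" if get_field(instr, 70, 1) else "no padding")  # 2x2 ReLU padding
```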
  • The CRNA architecture chip proposed in this application example includes, as a whole, the storage unit 1501, the input-output unit 1502, a number of neuron circuits 1503, and the central controller 1504 shown in FIG. 15.
  • This architecture breaks the limits of the traditional von Neumann architecture: it relies on distributed memory to optimize memory usage, and through dynamic configuration it flexibly realizes different neural network modes, neural network structures, and combined applications of several neural network modes. Based on the control modules in the neuron circuits 1503 and the central controller 1504, it realizes the configuration of storage and fast pipelined neural network computation, as well as a hardware-optimized design of the artificial neuron, greatly improving the computing capability of the whole CRNA architecture.
  • The CRNA architecture makes full use of memory resources; by breaking through the von Neumann architecture it further raises computing capability, effectively reduces the amount of data transmission, and greatly lowers power consumption. The CRNA architecture proposed in this application example supports the deployment of multiple hybrid neural network layers and has the advantages of excellent flexibility and reconfigurability, low power consumption, and high computing capability.
  • The composition of each unit in the CRNA architecture can be as follows.
  • Memory: includes the storage unit 1501 shown in FIG. 15 and the parameter storage modules 15031 located in the neuron circuits 1503. The storage unit 1501 can be deployed in a distributed manner as a first storage subunit 15011, a second storage subunit 15012, and a third storage subunit 15013; these storage subunits can also be deployed centrally in one physical memory. Details follow:
  • The first storage subunit 15011 is configured to store the data processed by the neural network, including input data, data stored between neural network layers, and output data.
  • The parameter storage module is used to store the parameters required for the data processing of trained neural network nodes; parameter storage can be completed during the initialization phase of the neural network. The neuron circuit 1503 can read the parameters in its parameter storage module to complete the corresponding neural network layer node operations; since each neuron circuit 1503 reads only local parameters, the possibility of data access conflicts between neurons is avoided.
  • The second storage subunit 15012: this part of the memory determines the neural network type of the CRNA architecture (convolutional network, regional network, recurrent network, or reinforcement learning network) and the neural network structure (the number of nodes in each neural network layer, the number of neural network layers, and so on).
  • The third storage subunit 15013: specific to the reinforcement learning network mode or the recurrent network mode, it stores the intermediate data generated by the operation of reinforcement learning networks and recurrent networks.
  • The input-output unit 1502 is configured to implement serial input of input data and serial output of output data through an input shift register and an output shift register, respectively.
  • The neuron circuits 1503 perform, according to their configuration, neuron operations in the specified mode on the input data of the neural network and obtain operation results. The artificial neuron design method of the CRNA architecture can flexibly realize the deployment of a single kind of neural network or the combined deployment of several kinds, as described below.
  • The neuron circuit 1503 may include the structure shown in FIG. 16, which includes: a computing module, a configuration information storage module 1601, a control module 1602, a parameter storage module 1603, an address generation module 1604, a temporary storage module 1605, an operation cache module 1606, a configuration information/parameter input module 1607, a data input module 1608, a data output module 1609, and so on.
  • The configuration information storage module 1601 may use a configuration-chain register; the operation cache module 1606 may be an accumulation register; the computing module may include a multiplier, an adder, an activation function module 1610, a gating module, and the like. The functions of each module are detailed below:
  • The configuration information/parameter input module 1607 is used to input the neuron's processing mode configuration information and neural network parameters into the neuron circuit 1503; the processing mode configuration information is used to configure the working mode of the neuron.
  • The gating module may be embodied as MUXes and/or DEMUXes; the MUXes are labeled M1 and M2 in the figure, and the DEMUXes are labeled DM1 and DM2. M1 selects whether to skip the multiplication unit: if the parameter read is 0, the multiplication is skipped. DM1 controls whether the destination of the input content is the configuration information storage module 1601 or the parameter storage module 1603. DM2 specifies the activation function or skips activation processing; M2 selects the output of the activation function. M1, M2, DM1, and DM2 are all selected by the control module 1602.
  • The address generation module 1604 is used to ensure that the input data of the neuron matches the parameters read in real time from the parameter memory.
  • The multiplier and the adder form a multiply-add module that multiplies data by parameters and accumulates; the result is stored in the operation cache module 1606 and read back as one of the adder's inputs in the next cycle. If a computation result needs to be backed up, it is stored in the temporary storage module 1605.
  • The control module 1602 is configured to control the working mode of the entire neuron according to the configuration information, including the selection of the MUXes and DEMUXes, the working mode of the address generation module 1604, and the like.
  • The data output module 1609 is configured to output the computation results of the neuron circuit 1503.
  • The workflow of the neuron circuit 1503 is roughly as follows. First, the serial input content of the configuration information/parameter input module 1607 is used to configure the neuron: the neural network parameters and the configuration information are stored in the parameter storage module 1603 and the configuration information storage module 1601, respectively. The neuron then obtains input data from the data input module 1608 and looks up, in the parameter storage module 1603, the neural network parameters that match the input data, so that the neuron can perform the required multiply-add operations. Based on the mode field in the activation function layer instruction, the result of the multiply-add operations is routed to the corresponding activation function for the required activation processing; the neuron's activation result is then stored in the corresponding memory according to the neural network mode (the operation cache module 1606, or the operation cache module 1606 together with the temporary storage module 1605). Finally, the output result of the neuron is output through the data output module 1609 and then through the input-output unit 1502 of the CRNA architecture for storage.
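  • The per-cycle datapath just described (M1 skipping the multiplier on zero parameters, the accumulation register feeding the adder on the next cycle, optional activation before output) can be sketched behaviorally as follows. This is a Python stand-in under the stated assumptions, not the circuit itself.

```python
class NeuronSim:
    """Behavioral sketch of neuron circuit 1503 (software model, not the hardware)."""
    def __init__(self, params, activation=None):
        self.params = params      # parameter storage module 1603
        self.acc = 0.0            # operation cache module 1606 (accumulation register)
        self.temp = []            # temporary storage module 1605
        self.activation = activation

    def step(self, x, addr, backup=False):
        w = self.params[addr]     # address generation keeps x and w matched
        if w != 0:                # M1: skip the multiplication unit when the parameter is 0
            self.acc += x * w     # multiply-add, accumulated cycle by cycle
        if backup:
            self.temp.append(self.acc)  # back up intermediate results when required
        return self.acc

    def output(self):
        y = self.activation(self.acc) if self.activation else self.acc  # DM2/M2 routing
        self.acc = 0.0            # clear for the next node's accumulation
        return y

n = NeuronSim([0.5, 0.0, -1.0], activation=lambda v: max(0.0, v))
for addr, x in enumerate([2.0, 9.0, 0.5]):
    n.step(x, addr)
print(n.output())  # ReLU(1.0 - 0.5) = 0.5
```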
  • In the CRNA architecture, the central controller 1504 uses a finite state machine; the state machine completes transitions between states according to transition conditions, thereby controlling the workflow of the entire architecture, which, as shown in FIG. 17, includes the configuration process S1701, the neural network computation process S1702, the data transmission process S1703, and so on. With reference to FIG. 17, the 128-level state machine control flow is described in detail as follows.
  • In configuration process S1701, the second storage subunit 15012, the parameter storage modules 1603, and the first storage subunit 15011 are configured according to the algorithm requirements.
  • In neural network computation process S1702, a 128-level state machine control flow is involved, which includes instruction updating and content decoding to implement the core operations of the neural network. In this application example, 128 neurons are used; the 128 neurons are used for multiple batches of computation, that is, the artificial neuron array composed of the 128 neurons is reused continuously. Global parameter configuration controls the overall characteristics of the neural network and controls jumps of the data flow between neural network layers, data input and output, parameter assignment, and so on.
  • In data transmission process S1703, the output result of the neural network is transmitted to a host computer for use.
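  • A minimal sketch of the three-process state machine follows; the state names track S1701/S1702/S1703, while the transition conditions and the per-state actions are illustrative assumptions rather than the actual 128-level state encoding.

```python
def crna_state_machine(layer_count):
    """Controller FSM sketch: CONFIGURE -> COMPUTE (once per layer) -> TRANSMIT."""
    state, layer = "S1701_CONFIGURE", 0
    while state != "DONE":
        if state == "S1701_CONFIGURE":
            print("configure memories and neuron array")
            state = "S1702_COMPUTE"
        elif state == "S1702_COMPUTE":
            print(f"update instruction, decode, run layer {layer}")
            layer += 1
            # transition condition: all layers of the instruction set processed
            state = "S1703_TRANSMIT" if layer == layer_count else "S1702_COMPUTE"
        elif state == "S1703_TRANSMIT":
            print("transmit results to the host computer")
            state = "DONE"

crna_state_machine(layer_count=3)
```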
  • The overall workflow is as follows. First, the memories are initialized and configured according to the configuration information, parameters, and data corresponding to the deep learning instruction set; specifically, the neural network type (mode), the number of neural network layers, and the processing modes of the neurons may be configured. The corresponding neural network operation is then performed according to the network type: corresponding data processing is performed for the current neural network layer instruction, completing the current layer's processing task, after which the data produced by the current layer serves as the data to be processed by the next layer's task, and the next neural network layer processing task is executed.
  • When executing a neural network layer task, the configuration information is generally read from memory and loaded into the neuron circuits of the neuron array to complete their processing mode configuration; the neural network parameters are then read from memory and loaded into the neuron circuits; the data to be processed is serially input from memory for processing, with the neural network parameters called upon during processing. The operation results are continuously registered in the accumulation register for accumulation until all of the data targeted by one neural network layer instruction has been processed; the results are then stored to memory in order through the output shift register and used as the input data for the next neural network layer. The operation results of the last neural network layer are likewise stored in memory and are output in order to the host computer, following the data output process, for subsequent use.
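  • To show how a layer wider than the physical array is handled by the batch reuse described above, here is a small sketch that maps a layer's nodes onto a fixed pool of 128 neurons in successive batches; the batching scheme itself is an assumption for illustration.

```python
ARRAY_SIZE = 128  # physical neurons available per batch

def run_layer(node_count, compute_node):
    """Map node_count logical layer nodes onto ARRAY_SIZE physical neurons,
    reusing the array over ceil(node_count / ARRAY_SIZE) batches."""
    outputs = []
    for start in range(0, node_count, ARRAY_SIZE):
        batch = range(start, min(start + ARRAY_SIZE, node_count))
        outputs.extend(compute_node(i) for i in batch)  # one physical neuron per node
    return outputs

# A 300-node layer occupies the 128-neuron array for 3 successive batches:
result = run_layer(300, compute_node=lambda i: i * 0.1)
print(len(result))  # 300
```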
  • This CRNA architecture chip design was simulated and logically synthesized on a United Microelectronics Corporation (UMC) 65 nm Complementary Metal Oxide Semiconductor (CMOS) process.
  • A reconfigurable array integrating 128 digital neurons is designed as the computing unit. Each neuron contains two 1 KB data memories and two 1 KB parameter memories, and has a real-time, flexible control port for shutting neurons down, which can reduce the dynamic power consumption of the neurons.
  • The main state machine can flexibly adjust the working states of the memories and the neuron array and, through data flow control, implements functions such as jumps between network layers, data input and output, and parameter allocation.
  • In chip simulation, a fully connected neural network with 10 layers and arbitrary numbers of input and output nodes was configured; the waveform shows that the neuron array runs at 100% efficiency while enabled, and that the jump between layers costs only a 2-clock-cycle state machine delay. The fully connected network function allows a complete mapping from algorithm to circuit.
  • Table 1 Basic unit usage and area occupation
  • The proposed instructions are assembly-language-level instructions, unlike the platform-level network deployment frameworks that run on an operating system (TensorFlow, Caffe, etc.). They require no operating system support and change the chip's operating mode directly; programming efficiency is extremely high, and they can be deployed directly in ultra-low-power computing scenarios.
  • The CRNA architecture uses digital circuits to implement the artificial neurons, giving the neurons strong noise immunity, high reliability, high accuracy, good scalability, and a mature, standardized design methodology.
  • The computation precision of this design uses 8-bit fixed-point quantization; compared with the 1-bit binary quantization networks used in some of today's deep learning processors, this design has higher computation accuracy.
  • The implementation of the neural network layers is more flexible: arrays are used to deploy complex neural networks, and blocks are used to implement neural network models with different numbers of nodes. A large number of neural computing units are reused, which greatly improves the utilization of hardware resources, saves hardware cost, and provides high flexibility.
  • The CRNA architecture is reconfigurable and programmable. It uses a pipelined, distributed storage method, which reduces latency and power consumption, improves system reliability, and makes each computing unit a relatively small, independently functioning system with relatively complete functions and structure. Each neuron is connected individually, and the configuration processes are similar and progressive, so reconfiguration is easier to implement; networks of different modes are configured globally through the configuration instruction memory. Reconfigurability and programmability are thus realized both globally and locally. The distributed storage method also makes the distribution of data and parameters more uniform; compared with centralized storage it is better balanced and therefore integrates well.
  • The CRNA architecture can be used together with embedded processors and sensors, and multiple acceleration chips can be cascaded to expand parallel processing capability and meet the needs of different scenarios.
  • The instructions and the CRNA architecture presented in this application example are fast, low-power, flexible, and reconfigurable, providing a reliable computing platform for today's heterogeneous deep neural networks and promoting the wide application of deep neural network algorithms in mobile IoT terminal devices, drones, autonomous driving, and other fields.
  • Each unit or module involved in the above embodiments may be implemented by a corresponding hardware or software unit; each unit or module may be an independent software or hardware unit or module, or may be integrated into a single software or hardware unit or module.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention is applicable to the field of computer technology and provides a neuron circuit, a chip, a system, methods thereof, and a storage medium. The neuron circuit includes the following structure: a computing module; a configuration information storage module for storing neuron processing mode configuration information; and a control module for controlling the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing. In this way, the complex and diverse computing demands of rapidly iterating neural networks can be met; the invention can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the applications of deep learning chips.

Description

NEURON CIRCUIT, CHIP, SYSTEM AND METHOD THEREFOR, AND STORAGE MEDIUM
Technical Field
The present invention belongs to the field of computer technology, and in particular relates to a neuron circuit, a chip, a system, methods thereof, and a storage medium.
Background Art
In recent years, with the wide application of deep learning technology based on artificial neural networks in fields such as computer vision, natural language processing, and intelligent system decision-making, artificial intelligence chip technology that accelerates neural network computation has drawn the attention of academia and industry.
Most existing application-specific integrated circuit (ASIC) chips customized for neural network computation are still based on pre-specified network structures and algorithms and over-pursue power and speed performance. Their hardware structure is therefore fixed and lacks neural network architecture reconfigurability, so they cannot deploy today's rapidly iterating, complex and diverse neural network structures and cannot be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, such as mobile IoT terminals, drones, and autonomous driving; the application of ASIC chips is thus limited.
Summary of the Invention
The purpose of the present invention is to provide a neuron circuit, a chip, a system, methods thereof, and a storage medium, aiming to solve the problem in the prior art that the neural network architecture cannot be reconfigured, which limits the applications of deep learning chips.
In one aspect, the present invention provides a neuron circuit, the neuron circuit including:
a computing module;
a configuration information storage module for storing neuron processing mode configuration information; and,
a control module for controlling the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
In another aspect, the present invention provides a deep learning chip, the deep learning chip including:
a storage unit for storing a deep learning instruction set and the data to be processed by deep learning, the deep learning instruction set including a plurality of neural network layer instructions having a predetermined processing order;
a neuron array composed of a plurality of neuron circuits as described above;
a central controller for performing control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded from the storage unit into the neuron circuits of the neuron array, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed; and,
an input-output unit for implementing data transmission between the storage unit and the neuron array.
In another aspect, the present invention further provides a deep learning chip cascade system, the deep learning chip cascade system including: at least two deep learning chips as described above in a cascade relationship with each other.
In another aspect, the present invention further provides a deep learning system, the deep learning system including: at least one deep learning chip as described above, and peripheral devices connected to the deep learning chip.
In another aspect, the present invention further provides a neuron control method, the neuron control method including the following steps:
obtaining neuron processing mode configuration information; and,
according to the processing mode configuration information, controlling the computing module to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
In another aspect, the present invention further provides a deep learning control method, the deep learning control method including the following steps:
obtaining a deep learning instruction set, the deep learning instruction set including a plurality of neural network layer instructions having a predetermined processing order; and,
performing control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded into the neuron circuits of the neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
In another aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the methods described above.
In another aspect, the present invention further provides a deep learning method based on the deep learning chip described above or the deep learning chip cascade system described above, the deep learning method including the following steps:
loading the deep learning instruction set and the data into the storage unit; and,
the central controller performing control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded into the neuron circuits of the neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
The present invention includes the following structure in a neuron circuit: a computing module; a configuration information storage module for storing neuron processing mode configuration information; and a control module for controlling the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing. In this way, the neuron circuit and the deep learning chip in which it is applied can be flexibly configured according to the needs of different scene functions, neural network types, neural network scales, and neuron operation modes, so that the deep learning chip and neuron circuit can be reconfigured according to actual neural network computing needs, thereby meeting the complex and diverse computing demands of rapidly iterating neural networks. The invention can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the applications of deep learning chips.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a neuron circuit provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of a neuron circuit provided by Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of a neuron circuit provided by Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of a neuron circuit provided by Embodiment 4 of the present invention;
FIG. 5 is a schematic structural diagram of a deep learning chip provided by Embodiment 5 of the present invention;
FIG. 6 is a schematic structural diagram of a deep learning chip cascade system provided by Embodiment 8 of the present invention;
FIG. 7 is a schematic structural diagram of a deep learning system provided by Embodiment 9 of the present invention;
FIG. 8 is a schematic flowchart of a neuron control method provided by Embodiment 10 of the present invention;
FIG. 9 is a schematic flowchart of a deep learning control method provided by Embodiment 11 of the present invention;
FIG. 10 is a schematic diagram of the data structure of a convolutional network layer instruction in an application example of the present invention;
FIG. 11 is a schematic diagram of the data structure of a pooling network layer instruction in an application example of the present invention;
FIG. 12 is a schematic diagram of the data structure of a fully connected network layer instruction in an application example of the present invention;
FIG. 13 is a schematic diagram of the data structure of an activation function network layer instruction in an application example of the present invention;
FIG. 14 is a schematic diagram of the data structure of a state-action network layer instruction in an application example of the present invention;
FIG. 15 is a schematic structural diagram of a CRNA architecture chip in an application example of the present invention;
FIG. 16 is a schematic structural diagram of a neuron circuit in an application example of the present invention;
FIG. 17 is a schematic diagram of the 128-level state machine control flow of the central controller in an application example of the present invention.
Detailed Description of the Embodiments
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below in combination with specific embodiments.
Embodiment 1:
FIG. 1 shows the structure of a neuron circuit provided by Embodiment 1 of the present invention. Specifically, it relates to a digital neuron circuit for forming a deep learning neural network: the deep learning neural network performs the required, ordered processing of input data through its neural network layers, and the neuron circuit performs the data processing required at a corresponding node of a neural network layer. For ease of description, only parts related to the embodiment of the present invention are shown, including:
the computing module 101, which can adjust its computing infrastructure to execute different neural network layer node data processing. In this embodiment, the computing module 101 may perform a single process, such as a multiplication operation, an addition operation, or activation using an activation function, or a flexible combination of different processes. The statement that the computing module 101 can execute at least two different kinds of neural network layer node data processing carries at least the following meanings. First, it corresponds to different types of neural network layers, such as convolutional network layers, pooling network layers, fully connected network layers, activation function network layers, and state-action network layers, whose data processing requirements differ; the computing module 101 can satisfy at least two such sets of requirements. This ability to adapt to different kinds of neural network layer data processing is achieved by flexibly combining multiplication, addition, and activation processing as required: for example, when the computing module 101 is required to perform convolutional network layer data processing, one flexible combination of the above processes satisfies the convolutional layer's requirements, and when it is required to perform fully connected network layer data processing, another flexible combination satisfies those requirements. This demand-driven combination is realized together with the configuration information storage module 102 and the control module 103 described below. In this sense, the neuron circuit can perform the data processing of one node in one kind of neural network layer, and can also perform the data processing of a node in another kind of neural network layer. Second, the neuron circuit can perform the data processing corresponding to different nodes in the same type of neural network layer; for example, a neuron circuit can process a node in a first convolutional network layer and also a node in a second convolutional network layer. Third, the neuron circuit can perform the data processing corresponding to different nodes within the same neural network layer, i.e., the neuron circuit is reused across the data processing required by a single layer. When processing different nodes, the computing infrastructure of the computing module 101 is not the same; the computing module 101 can be controlled to adjust its computing infrastructure to suit the processing of different nodes. When the data processing of all nodes in a neural network layer is completed, that layer's data processing is complete; when every neural network layer in the network has been processed, the data processing of the whole neural network is complete.
the configuration information storage module 102, which is used to store neuron processing mode configuration information. In this embodiment, the processing mode configuration information indicates the configuration required when the neuron circuit performs data processing at a corresponding neural network layer node; this configuration information may indicate, for example, which node operations the neuron circuit needs to implement.
the control module 103, which is used to control the computing module, according to the processing mode configuration information, to adjust to the corresponding computing infrastructure and execute the corresponding neural network layer node data processing.
By implementing this embodiment, the neuron circuit can perform the data processing of one node in any given neural network layer of the whole deep learning neural network. In this way, neuron circuits can be flexibly configured according to the needs of different scene functions, neural network types, neural network scales, and neuron operation modes, so that they can be reconfigured according to actual neural network computing needs. This meets the complex and diverse computing demands of rapidly iterating neural networks, can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, and extends the applications of deep learning chips.
Embodiment 2:
On the basis of the neuron circuit of the other embodiments, this embodiment further provides the following content:
As shown in FIG. 2, the neuron circuit further includes:
the parameter storage module 201, which is used to store the parameters required by neural network layer node data processing. In this embodiment, the parameters may be neural network parameters obtained through training.
the address generation module 202, which is controlled by the control module 103 to look up the parameters corresponding to the data targeted by the neural network layer node data processing; the retrieved parameters are input to the computing module 101 to participate in the corresponding data processing.
In this embodiment, since the data processing of some types of neural network layers, such as pooling network layers and activation function network layers, does not need to call parameters, the basic configuration of the neuron circuit can omit the parameter storage module 201 and the address generation module 202. Other layers, such as convolutional network layers, fully connected network layers, and state-action network layers, need to call neural network parameters during data processing, in which case the parameter storage module 201 and the address generation module 202 are configured in the neuron circuit. This enhances the broad applicability of the neuron circuit.
Embodiment 3:
On the basis of the neuron circuit of the other embodiments, this embodiment further provides the following content:
As shown in FIG. 3, the neuron circuit further includes:
the temporary storage module 301, which is used to store the intermediate data of neural network layer node data processing.
In this embodiment, since some types of neural networks, such as convolutional networks and regional networks, do not need to preserve the intermediate data produced by the neuron circuit for later processing, the basic configuration of the neuron circuit can omit the temporary storage module 301. Other processing, such as that of reinforcement learning networks and recurrent networks, needs to reuse the intermediate data produced by the neuron circuit, in which case the temporary storage module 301 is configured in the neuron circuit. This likewise enhances the broad applicability of the neuron circuit.
Embodiment 4:
On the basis of the neuron circuit of the other embodiments, this embodiment further provides the following content:
As shown in FIG. 4, the computing module 101 includes:
the basic computing module 401, which includes a multiplier, an adder, and/or an activation function module, etc.;
the gating module 402, which is used to perform the corresponding gating actions under the control of the control module 103 so that the basic computing module 401 forms the corresponding computing infrastructure; the gating module 402 may include a multiplexer (MUX) and/or a demultiplexer (DEMUX), etc.
In this embodiment, the basic computing module 401 can perform basic processing such as multiplication, addition, and activation using an activation function. During basic processing, the required parameters can be obtained from the parameter storage module 201 described above, and the gating module 402 can adjust the basic computing module 401 as needed to obtain the computing infrastructure required by the current neural network layer node data processing, realizing the reconfiguration of the computing module 101.
Embodiment 5:
FIG. 5 shows the structure of a deep learning chip provided by Embodiment 5 of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown, including:
the storage unit 501, which is used to store a deep learning instruction set and the data to be processed by deep learning, the deep learning instruction set including a plurality of neural network layer instructions having a predetermined processing order. In this embodiment, the deep learning instruction set includes all of the neural network layer instructions covered by the deep learning task, such as convolutional network layer instructions, pooling network layer instructions, fully connected network layer instructions, activation function network layer instructions, and state-action network layer instructions. Of course, this instruction set usually also includes other information needed to complete the deep learning task, such as neural network type information and neural network structure information: the neural network type information may indicate whether the network is a convolutional network, regional network, recurrent network, reinforcement learning network, and so on, and the neural network structure information may include the number of layers in the network, the number of nodes in each layer, and indications of which operations each layer needs to implement, where the indication information corresponds to the processing mode configuration information stored in the configuration information storage module 102.
the neuron array 504, composed of a plurality of neuron circuits 502 as described above. The specific functions and structure are as described in the other embodiments and are not repeated here.
the central controller 503, which is used to perform control according to the deep learning instruction set such that: the current processing mode configuration information corresponding to the current neural network layer instruction and the corresponding data to be processed are loaded from the storage unit 501 into the neuron circuits 502 of the neuron array 504, and, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed. In this embodiment, the central controller 503 can be any of various types of controllers, such as an Advanced RISC Machine (ARM) controller, an Intel-series controller, or a Huawei HiSilicon-series controller. Its architecture may use a finite state machine that completes transitions between states according to conditions, thereby controlling the workflow of the neural network it covers, specifically including the configuration process, the neural network computing process, and the data transmission process; this may involve the neuron array 504 processing one neural network layer in a single batch, or processing one layer in multiple batches, where multi-batch processing involves the reuse of the neuron circuits 502. The central controller 503 is mainly used to configure the deep learning chip that constitutes the neural network, so that the neuron array 504 can perform orderly data processing according to the neural network layer instructions in the deep learning instruction set; throughout the operation of the neural network, the central controller 503 implements the core operations of the network, including instruction updating, content decoding, and the like.
the input-output unit 505, which is used to implement data transmission between the storage unit 501 and the neuron array 504.
The processing flow of a deep learning task is roughly as follows:
First, the current neural network layer processing task is executed: the current processing mode configuration information corresponding to the current neural network layer instruction in the storage unit 501 is loaded through the input-output unit 505 into the neuron circuits 502 of the neuron array 504, completing the configuration of the neuron circuits 502; the data to be processed in the storage unit 501 is then loaded through the input-output unit 505 into the neuron circuits 502, which process the loaded data on the basis of the completed configuration, and the processed data serves as the data to be processed by the next neural network layer processing task. The next neural network layer processing task is then executed in the same way, until all neural network layer processing tasks are completed and the deep learning task is finished.
In addition, if the deep learning task requires neural network parameters, then in the above flow, after the processing mode configuration information is loaded into the neuron circuits 502, the corresponding parameters are also transferred from the storage unit 501 into the neuron circuits 502 of the neuron array 504 before the data is loaded and processed.
By implementing this embodiment, the neuron circuits and the deep learning chip in which they are applied can be flexibly configured according to the needs of different scene functions, neural network types, neural network scales, and neuron operation modes, so that the deep learning chip and neuron circuits can be reconfigured according to actual neural network computing needs. This meets the complex and diverse computing demands of rapidly iterating neural networks, can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, and extends the applications of deep learning chips.
Embodiment 6:
Building on the deep learning chip of the other embodiments, this embodiment further involves:
Input/output unit 505 is a stream-in/stream-out shift register, with an independent data transfer path established between each neuron circuit 502 and input/output unit 505.
In this embodiment, the pending data of each layer node stored in storage unit 501 is delivered through the input shift register and the independent transfer path to the corresponding neuron circuit 502 in neuron array 504 for processing; once processing is complete, the results travel back through the independent transfer path and the output shift register to storage unit 501 for storage. If the results of all nodes of the current layer are the pending data of the next layer, then only after every node of the current layer has finished processing are all the results taken as the next layer's pending data.
Implementing this embodiment realizes stream-in/stream-out pipelined data transfer. Compared with the multi-fanout circuitry a traditional many-neurons-to-many-data access scheme requires, no storage access addresses need to be computed, reads and writes are greatly simplified, memory bandwidth requirements drop, and input/output power consumption falls substantially. Because the shift registers and the data paths between them and each neuron circuit 502 are mutually independent, contention among the array's many cores for the same storage is avoided, so neither the bus arbitration traditional multi-core processor systems need to avoid conflicts nor their complex cache synchronization mechanisms are required. The multi-array cascaded input/output system formed by the stream-in/stream-out registers thus lets computational throughput grow linearly with the number of neuron circuits 502, while the storage access mechanism is optimized and useless computation avoided.
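A small software model of the stream-in/stream-out behavior, with the register depth chosen arbitrarily, may make the mechanism concrete; timing and word width are not modeled.

```python
# Sketch of stream-in/stream-out transfers with a shift register
# (illustrative model; register depth and timing are assumptions).

from collections import deque

class ShiftRegister:
    def __init__(self, depth):
        self.cells = deque([0.0] * depth, maxlen=depth)

    def shift_in(self, value):
        """Shift one value in; the value falling off the end streams out."""
        out = self.cells[0]
        self.cells.append(value)
        return out

sr = ShiftRegister(depth=4)
outputs = [sr.shift_in(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
print(outputs)  # the first `depth` shifts drain the initial zeros
```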
Embodiment 7:
Building on the deep learning chip of the other embodiments, this embodiment further involves:
Storage unit 501 is further used to store the intermediate data of neural network layer node data processing.
The purpose of this embodiment is the same as that of Embodiment 3 above and is not repeated here.
Embodiment 8:
Fig. 6 shows the structure of the deep learning chip cascade system provided by Embodiment 8 of the present invention; for ease of description, only the parts relevant to this embodiment are shown, including:
At least two deep learning chips 601, each as described in any of the embodiments above, in a cascade relationship with one another.
Implementing this embodiment, multiple accelerator chips can be cascaded to enlarge parallel processing capacity and meet the needs of different usage scenarios.
Embodiment 9:
Fig. 7 shows the structure of the deep learning system provided by Embodiment 9 of the present invention; for ease of description, only the relevant parts are shown, including:
At least one deep learning chip 701 as described above, together with peripheral devices 702 connected to the deep learning chip 701. In this embodiment, when there are at least two deep learning chips 701, they may be cascaded or may remain mutually independent without cascading; the peripheral devices 702 may be other embedded processors, sensors, and the like.
Embodiment 10:
Fig. 8 shows the flow of the neuron control method provided by Embodiment 10 of the present invention; for ease of description, only the relevant parts are shown, involving the following steps:
In step S801, neuron processing mode configuration information is obtained.
In step S802, according to the processing mode configuration information, the computing module is controlled to adjust to the corresponding computing infrastructure and to execute the corresponding neural network layer node data processing.
The content of steps S801 and S802 is presented in detail in the other embodiments and is incorporated here by reference without repetition.
Embodiment 11:
Fig. 9 shows the flow of the deep learning control method provided by Embodiment 11 of the present invention; for ease of description, only the relevant parts are shown, involving the following steps:
In step S901, a deep learning instruction set is obtained, the instruction set comprising a number of neural network layer instructions with a predetermined processing order.
In step S902, control is exercised according to the deep learning instruction set so that: the current processing mode configuration information corresponding to the current neural network layer instruction, and the data to be processed, are loaded into the neuron circuits of the neuron array, where each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing; after the current layer processing task indicated by the current layer instruction is completed, the next layer processing task is executed, until the deep learning task indicated by the instruction set is complete.
The content of steps S901 and S902 is presented in detail in the other embodiments and is incorporated here by reference without repetition.
Embodiment 12:
In this embodiment of the invention, a computer-readable storage medium is provided which stores a computer program; when the program is executed by a processor, it implements the steps of method Embodiment 10 or 11 above, for example steps S801 to S802 shown in Fig. 8.
The computer-readable storage medium of embodiments of the invention may include any entity or apparatus capable of carrying computer program code, or a recording medium, for example ROM/RAM, magnetic disk, optical disc, flash memory, or other storage.
Embodiment 13:
The flow of the deep learning method provided by Embodiment 13 of the present invention is based on the deep learning chip, the deep learning chip cascade system, or the deep learning system described above; for ease of description, only the relevant parts are shown, involving the following steps:
Loading the deep learning instruction set and the data into storage unit 501;
Central controller 503, according to the deep learning instruction set, controlling so that: the current processing mode configuration information corresponding to the current neural network layer instruction, and the data to be processed, are loaded into the neuron circuits 502 of neuron array 504, where each neuron circuit 502 adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing; after the current layer processing task indicated by the current layer instruction is completed, the next layer processing task is executed, until the deep learning task indicated by the instruction set is complete.
An application example is now used to describe in concrete terms the neuron circuit, chip, system, methods, and storage medium involved in the embodiments above.
This application example concerns a deep learning instruction set and the design and application of a Coarse-grained Reconfigurable Neuromorphic Array (CRNA) architecture based on that instruction set, covering the neuron circuits, chips, systems, methods, and storage media of the embodiments above. The example designs the neurons and the neuron array entirely in digital circuitry, introduces a pipelined design style, and through dynamic configuration flexibly realizes the network type, the network structure (the number of nodes per layer and the number of layers), combined deployment of several network types, the neurons' operating modes, and so on. Applied, it greatly increases data processing speed while meeting the needs of today's rapidly iterating neural network algorithms; it features low power, fast processing, and reconfigurability, and is especially suitable for usage scenarios with limited computing resources, small storage capacity, power constraints, and high processing-speed requirements, broadening the application fields of neural-network-based hardware and software.
First, the deep learning instruction set. The instruction set is the core of processor design and the interface between the software system and the hardware chip. This application example supports an instruction set that describes a neural network layer by layer, involving the following five (or more) types of layer instruction. In this application example every instruction is 96 bits wide; in other application examples the width can be adapted. The instructions are: the convolutional-layer instruction of Fig. 10, the pooling-layer instruction of Fig. 11, the fully-connected-layer instruction of Fig. 12, the activation-function-layer instruction of Fig. 13, and the state-action-layer instruction of Fig. 14. Functions are realized by assigning values to the relevant bit fields of an instruction. For example, in Fig. 10, bit 70 may be set to "1" for padding and "0" for no padding, and bits 65-67 to "001" for a 1×1 convolution kernel, "010" for a 2×2 kernel, and so on. In Fig. 11, bits 65-67 may be set to "000" for max-pooling, "001" for min-pooling, "010" for average-pooling, and so on, and bit 70 to "1" for forward and "0" for reverse. In Fig. 13, bits 5-9 may be set to "00001" for the Rectified Linear Unit (ReLU) activation function and "00010" for the Sigmoid function, with "00011"-"11111" reserved as extensible encodings. In Fig. 14, bits 45-47 may be set to "000" for the Deep Q-learning (DQN) iteration strategy and "001" for the State-Action-Reward-State-Action (SARSA) strategy, with "010"-"111" reserved as extensible encodings, and the ε-greedy probability may take values from 0 to 100.
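Since the instructions are fixed 96-bit words with documented bit fields, field packing and unpacking can be sketched directly; only the bit positions quoted above are modeled, and the rest of the field layout is deliberately left unspecified.

```python
# Sketch of encoding/decoding a 96-bit layer instruction as an integer
# (illustrative; only the bit positions quoted above are modeled, and the
# layout outside those positions is an assumption).

def set_field(word, lo, hi, value):
    """Write `value` into bits lo..hi (inclusive) of a 96-bit word."""
    width = hi - lo + 1
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | ((value & ((1 << width) - 1)) << lo)

def get_field(word, lo, hi):
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

conv = 0
conv = set_field(conv, 70, 70, 0b1)     # bit 70 = 1: apply padding
conv = set_field(conv, 65, 67, 0b010)   # bits 65-67 = 010: 2x2 kernel
assert get_field(conv, 65, 67) == 0b010
print(f"instruction word: {conv:#026x}")  # 96 bits -> 24 hex digits
```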
Second, the design and application of the CRNA architecture chip. The CRNA architecture chip proposed in this application example comprises, overall, the storage unit 1501, input/output unit 1502, a number of neuron circuits 1503, and central controller 1504 shown in Fig. 15. The architecture breaks the constraints of the traditional von Neumann architecture, optimizing memory use through distributed memory and flexibly realizing, through dynamic configuration, different network modes, network structures, and combined deployment of several network modes. The control module in each neuron circuit 1503 and the central controller 1504 realize the configuration of storage and fast pipelined network computation, and the hardware-optimized design of the artificial neurons greatly raises the computing power of the whole CRNA architecture. The architecture makes full use of memory resources, going beyond the von Neumann architecture to further raise computing power; data transfer volume is effectively reduced and power consumption falls substantially. The CRNA architecture supports deployment of mixed neural network layers and offers flexible reconfigurability, low power, and high computing power.
The functions of the individual units of the CRNA architecture are as follows:
(1) Memory:
The memory comprises the storage unit 1501 shown in Fig. 15 and the parameter storage module 15031 located in each neuron circuit 1503. Storage unit 1501 can be deployed in a distributed fashion as a first storage subunit 15011, a second storage subunit 15012, and a third storage subunit 15013; these subunits can also be deployed centrally in a single physical storage. Specifically:
First storage subunit 15011, for storing the data the network processes, including input data, inter-layer data, and output data.
Parameter storage module, for storing the trained parameters required for network node data processing; the parameters are stored during the network initialization phase. During the computation phase, neuron circuit 1503 reads the parameters from its parameter storage module to complete the corresponding layer node operations; since each neuron circuit 1503 reads only local parameters, data accesses between neurons are avoided.
Second storage subunit 15012: this memory determines the CRNA architecture's network type (convolutional, region, recurrent, or reinforcement learning network) and the network structure (the number of nodes per layer, the number of layers, the operations each layer implements), and so on.
Third storage subunit 15013: this memory serves specifically the reinforcement learning and recurrent network modes, storing the intermediate data those computations produce.
(2) Input/output:
Input/output unit 1502, for streaming input data in and output data out through an input shift register and an output shift register respectively; see Embodiment 6 above for details, which are not repeated here.
(3) Artificial neurons:
Neuron circuit 1503 performs, according to its configuration, the specified mode of neuron computation on the network's input data and produces the computation result. The CRNA artificial-neuron design method flexibly supports the deployment of a single network type or a combined deployment of several. Specifically:
Neuron circuit 1503 may take the structure shown in Fig. 16, involving: a computing module, configuration information storage module 1601, control module 1602, parameter storage module 1603, address generation module 1604, temporary storage module 1605, operation cache module 1606, configuration/parameter input module 1607, data input module 1608, data output module 1609, and so on. Configuration information storage module 1601 may be implemented as a configuration-chain register, operation cache module 1606 as an accumulator register, and the computing module may include multipliers, adders, activation function module 1610, and a gating module. The functions of the modules are detailed as follows:
Configuration/parameter input module 1607, for feeding the neuron's processing mode configuration information and the neural network parameters into neuron circuit 1503; the processing mode configuration information configures the neuron's working mode.
The gating module may be embodied as MUXes and/or DEMUXes; in the figure the MUXes are labeled M1 and M2, and the DEMUXes DM1 and DM2. M1 selects whether to skip the multiplier unit, skipping it when the parameter read out is 0; DM1 controls whether incoming content is destined for configuration information storage module 1601 or parameter storage module 1603; DM2 designates an activation function or bypasses activation processing; M2 selects the activation function's output. The selections of M1, M2, DM1, and DM2 are all specified by control module 1602 (a sketch of these routing choices follows the module descriptions below).
Address generation module 1604, for ensuring that the neuron's input data matches the parameter read from parameter memory in real time.
The multipliers and adders form a multiply-accumulate module that multiplies data by parameters; the result is stored in operation cache module 1606 and read out in the next cycle as one of the addition inputs. If the result needs to be backed up, it is also stored in temporary storage module 1605.
Control module 1602, for controlling, according to the configuration information, the working mode of the whole neuron, including the selections of the MUXes and DEMUXes and the working mode of address generation module 1604.
Data output module 1609, for outputting the computation results of neuron circuit 1503.
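As noted above, the M1/DM1/DM2/M2 routing choices can be sketched as plain conditionals; only the behaviors described in the text (zero-parameter multiplier skip, input routing, activation selection or bypass) are modeled, and the control encoding is assumed.

```python
# Sketch of the M1/DM1/DM2/M2 routing decisions (illustrative only; the
# control encoding is an assumption).

def dm1_route(neuron, kind, payload):
    """DM1: steer serial input to config storage or parameter storage."""
    neuron["config" if kind == "config" else "params"] = payload

def neuron_step(x, w, acc):
    if w != 0:                        # M1: skip the multiplier when the
        acc += x * w                  # parameter read out is zero
    return acc

def activate(acc, activation=None):
    if activation is None:            # DM2: bypass activation entirely
        return acc
    return activation(acc)            # DM2 selects the unit, M2 its output

neuron = {}
dm1_route(neuron, "params", [0.5, 0.0, 0.25])    # parameters, not config
acc = 0.0
for x, w in zip([1.0, 2.0, 3.0], neuron["params"]):
    acc = neuron_step(x, w, acc)                 # the zero weight is skipped
print(activate(acc, activation=lambda v: max(v, 0.0)))
```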
The workflow of neuron circuit 1503 is roughly as follows:
First, the content input serially through configuration/parameter input module 1607 configures the neuron, with the network parameters stored in parameter storage module 1603 and the configuration information in configuration information storage module 1601.
Next, once configuration is complete, the neuron obtains input data from data input module 1608 and finds in parameter storage module 1603 the network parameters matching that input data for the required multiply-accumulate operations.
Then, according to the mode content of the activation-function-layer instruction, the corresponding instruction field selects the activation function that performs the required activation on the multiply-accumulate result; the neuron's activation result is then stored, according to the network mode, in the corresponding memory (operation cache module 1606, or operation cache module 1606 together with temporary storage module 1605).
When the computation over all input data of the current layer node is complete, the neuron's output is delivered through data output module 1609 and then output through the CRNA architecture's input/output unit 1502 for storage.
It should be noted that the flow above includes multiply-accumulate and activation processing, but in practice these operations are selected and configured only as required.
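A compact end-to-end sketch of one node's flow, under the same illustrative assumptions as the earlier sketches, ties these steps together.

```python
# Sketch of one node's end-to-end flow: accumulate over the input stream,
# then activate and emit (illustrative; numeric details are assumptions).

def process_node(inputs, params, activation=None, keep_intermediate=False):
    acc = 0.0                        # operation cache (accumulator register)
    backups = []                     # temporary storage module
    for x, w in zip(inputs, params):
        acc += x * w
        if keep_intermediate:        # e.g. recurrent / RL network modes
            backups.append(acc)
    if activation is not None:
        acc = activation(acc)
    return acc, backups              # result leaves via the data output module

out, saved = process_node([1.0, -2.0], [0.3, 0.1],
                          activation=lambda v: max(v, 0.0),
                          keep_intermediate=True)
print(out, saved)
```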
(4) Central control:
In the CRNA architecture, central controller 1504 is a finite state machine; the state machine switches between states according to transition conditions and thereby controls the workflow of the whole architecture, including, as shown in Fig. 17: configuration flow S1701, neural network computation flow S1702, and data transfer flow S1703. Taking the 128-stage state-machine control flow of Fig. 17 as an example:
In flow S1701, the second storage subunit 15012, parameter storage module 1603, and first storage subunit 15011 are configured according to the algorithm's requirements.
Flow S1702 involves the 128-stage very-long-vector pipelined state-machine control flow, including instruction updating and content decoding, and realizes the network's core computation. This CRNA design uses 128 neurons; when a layer contains more nodes than the architecture's 128 artificial neurons, the 128 neurons compute in multiple batches, i.e., the artificial neuron array of 128 neurons is reused repeatedly (see the sketch after flow S1703 below). By reading and decoding instructions, the processing modes of the 128 neurons are configured; global parameter configuration controls the overall characteristics of the network and governs the data flow's transitions between layers, the input and output of data, parameter distribution, and so on.
In flow S1703, the network's output results are transferred to the host computer for use.
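The batch reuse described in flow S1702 amounts to tiling a layer's nodes over a fixed-size array; the sketch below shows that tiling with the 128-neuron figure from the text.

```python
# Sketch of reusing a fixed-size neuron array when a layer has more nodes
# than neurons (illustrative; ARRAY_SIZE matches the 128-neuron example).

ARRAY_SIZE = 128

def process_layer(node_jobs, process_batch):
    """Split a layer's node jobs into batches of at most ARRAY_SIZE."""
    results = []
    for start in range(0, len(node_jobs), ARRAY_SIZE):
        batch = node_jobs[start:start + ARRAY_SIZE]   # one pass of the array
        results.extend(process_batch(batch))
    return results

jobs = list(range(300))                               # 300 nodes -> 3 batches
print(len(process_layer(jobs, lambda b: [j * 2 for j in b])))
```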
The flow realized by the CRNA architecture chip above mainly involves:
First, initializing the memory according to the configuration information, parameters, and data corresponding to the deep learning instruction set; specifically, configuring the network type (mode), the number of network layers, the neuron processing modes, and so on.
Second, performing the network computation appropriate to the network type. Specifically: executing the data processing corresponding to the current layer instruction to complete the current layer processing task, then taking the data produced by the current layer as the pending data of the next layer processing task and executing that task.
When executing a layer task, configuration information is generally read from memory first and loaded into the neuron circuits of the array to complete their processing mode configuration; the network parameters are then read from memory and loaded into the neuron circuits; and the pending data is then streamed in serially from memory for processing, during which the network parameters may be called upon.
In this way, the very long 128-stage pipeline realizes the network computation on the data. Throughout this process, results are continually latched into the accumulator register and accumulated; once all the data targeted by one layer instruction has completed the array's network computation, the results are stored in sequence through the output shift register into memory as the input data for the next layer. When the last layer is processed, its results are likewise stored in memory and then, following the data output flow, output in sequence to the host computer for subsequent use.
This CRNA architecture chip design was simulated and logically synthesized on United Microelectronics Corporation's (UMC) 65 nm Complementary Metal Oxide Semiconductor (CMOS) process. A reconfigurable array integrating 128 digital neurons was designed as the computing unit; each neuron contains two 1 KB data memories and two 1 KB parameter memories and provides a control port for flexibly switching neurons off in real time, lowering the neurons' dynamic power. The main state machine flexibly regulates the working states of the memory and the neuron array, and data-flow control realizes inter-layer transitions, data input/output, parameter distribution, and other functions. The synthesis results show that the distributed storage design greatly reduces storage-access fanout, lowers the complexity of the storage control system and the bandwidth of data access, and improves the balance of parameter distribution. Through configuration, network models of different types and specifications can be deployed effectively. Some of the experimental results follow:
Chip simulation: a 10-layer fully connected network with arbitrary numbers of input and output nodes was configured. The waveforms show that the neuron array's utilization is close to 100% whenever it is enabled, and that inter-layer transitions incur a delay of only two state-machine clock cycles. The fully connected network's functionality is thus mapped completely from algorithm to circuit.
Logic synthesis: the results indicate that this reconfigurable design occupies about 1.5 mm² of chip area, very economical in hardware resources and convenient to integrate into resource-constrained terminals, as Table 1 below shows; power consumption is on the order of tens of milliwatts, making it easy to integrate into low-power terminal devices, as Table 2 below shows.
Table 1 (reproduced in the original as an image, not shown here): basic-cell usage and area occupation.
Table 2 (reproduced in the original as an image, not shown here): module power consumption proportions.
This application example offers the following advantages:
First, the proposed instructions are assembly-language-level instructions. Unlike existing operating-system-based, platform-level network deployment frameworks (TensorFlow, Caffe, etc.), they require no operating system support and directly change the chip's operating mode; programming efficiency is very high, and they can be deployed directly in ultra-low-power computing scenarios.
Second, the CRNA architecture implements the artificial neurons in digital circuitry, giving the neurons strong noise immunity, high reliability, high precision, good scalability, and a mature, standardized design methodology. The design computes with 8-bit fixed-point quantization, a higher computational precision than the single-bit binary quantized networks some present-day deep learning processors employ.
Third, the neural network layers are implemented more flexibly. Complex networks are deployed through array reuse, network models with differing node counts are realized block by block, and the neural computing units are heavily reused, greatly raising hardware utilization, saving hardware cost, and providing high flexibility.
Fourth, the CRNA architecture is reconfigurable and programmable. It adopts pipelined distributed storage, which reduces latency and power consumption and improves system reliability, and it makes each computing unit a relatively complete, self-contained small system. The configuration chain connects directly to each individual neuron, and the configuration process is similar and progressive from neuron to neuron, so reconfiguration is comparatively easy to realize; different network modes are configured globally through the instruction memory. Reconfigurability and programmability are thus achieved both globally and locally.
Fifth, integrability and scalability. Distributed storage reduces latency and power consumption, improves system reliability, and distributes data and parameters more evenly, balancing better than centralized storage and hence integrating well. The CRNA architecture can be used together with embedded processors and sensors, and multiple accelerator chips can be cascaded to enlarge parallel processing capacity for different usage scenarios.
In summary, the instructions and the CRNA architecture proposed in this application example offer high speed, low power, and flexible reconfigurability, providing a reliable computing platform for today's diverse deep neural networks and promoting the wide application of deep neural network algorithms in mobile IoT terminal devices, drones, autonomous driving, and other fields.
It should be noted that the units or modules involved in the embodiments above may be realized by corresponding hardware or software units; they may be independent software or hardware units or modules, or may be integrated into a single software or hardware unit or module, and the invention is not limited in this respect.
The above are merely preferred embodiments of the present invention and do not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (13)

  1. A neuron circuit, characterized in that the neuron circuit comprises:
    a computing module;
    a configuration information storage module for storing neuron processing mode configuration information; and
    a control module for controlling, according to the processing mode configuration information, the computing module to adjust to a corresponding computing infrastructure and to execute corresponding neural network layer node data processing.
  2. The neuron circuit of claim 1, characterized in that the neuron circuit further comprises:
    a parameter storage module for storing parameters required for the neural network layer node data processing; and
    an address generation module, controlled by the control module, for looking up the parameters corresponding to the data targeted by the neural network layer node data processing.
  3. The neuron circuit of claim 1, characterized in that the neuron circuit further comprises:
    a temporary storage module for storing intermediate data of the neural network layer node data processing.
  4. The neuron circuit of claim 1, characterized in that the computing module comprises:
    a basic computing module comprising a multiplier, an adder and/or an activation function module; and
    a gating module for performing, under the control of the control module, corresponding gating actions so that the basic computing module forms the corresponding computing infrastructure, the gating module comprising a multiplexer and/or a demultiplexer.
  5. A deep learning chip, characterized in that the deep learning chip comprises:
    a storage unit for storing a deep learning instruction set and the data targeted by the deep learning, the deep learning instruction set comprising a number of neural network layer instructions having a predetermined processing order;
    a neuron array composed of a number of neuron circuits according to any one of claims 1 to 4;
    a central controller for controlling, according to the deep learning instruction set, so that the current processing mode configuration information corresponding to the current neural network layer instruction and the data to be processed are loaded from the storage unit into the neuron circuits of the neuron array, and so that, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed; and
    an input/output unit for transferring data between the storage unit and the neuron array.
  6. The deep learning chip of claim 5, characterized in that the input/output unit is a stream-in/stream-out shift register, and an independent data transfer path is established between the neuron circuits and the input/output unit.
  7. The deep learning chip of claim 5, characterized in that the storage unit is further used to store intermediate data of the neural network layer node data processing.
  8. A deep learning chip cascade system, characterized in that the deep learning chip cascade system comprises: at least two deep learning chips according to any one of claims 5 to 7 in a cascade relationship with one another.
  9. A deep learning system, characterized in that the deep learning system comprises: at least one deep learning chip according to any one of claims 5 to 7, and peripheral devices connected to the deep learning chip.
  10. A neuron control method, characterized in that the neuron control method comprises the steps of:
    obtaining neuron processing mode configuration information;
    controlling, according to the processing mode configuration information, a computing module to adjust to a corresponding computing infrastructure and to execute corresponding neural network layer node data processing.
  11. A deep learning control method, characterized in that the deep learning control method comprises the steps of:
    obtaining a deep learning instruction set, the deep learning instruction set comprising a number of neural network layer instructions having a predetermined processing order;
    controlling, according to the deep learning instruction set, so that the current processing mode configuration information corresponding to the current neural network layer instruction and the data to be processed are loaded into the neuron circuits of a neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and so that, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
  12. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of claim 10 or 11.
  13. A deep learning method, characterized in that the deep learning method is based on the deep learning chip of any one of claims 5 to 7, or the deep learning chip cascade system of claim 8, or the deep learning system of claim 9, the deep learning method comprising the steps of:
    loading the deep learning instruction set and the data into the storage unit;
    the central controller controlling, according to the deep learning instruction set, so that the current processing mode configuration information corresponding to the current neural network layer instruction and the data to be processed are loaded into the neuron circuits of the neuron array, wherein each neuron circuit adjusts to the corresponding computing infrastructure according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and so that, after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
PCT/CN2018/105847 2018-09-14 2018-09-14 Neuron circuit, chip, system and method therefor, and storage medium WO2020051918A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/105847 WO2020051918A1 (zh) Neuron circuit, chip, system and method therefor, and storage medium

Publications (1)

Publication Number Publication Date
WO2020051918A1 (zh)

Family

ID=69777414

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105847 WO2020051918A1 (zh) Neuron circuit, chip, system and method therefor, and storage medium

Country Status (1)

Country Link
WO (1) WO2020051918A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI746084B (zh) * 2020-07-24 2021-11-11 I-Shou University Multiple-function calculator

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295799A (zh) * 2015-05-12 2017-01-04 Beijing Research Institute of Uranium Geology Implementation method for a deep learning multilayer neural network
CN108364063A (zh) * 2018-01-24 2018-08-03 Fuzhou Rockchip Electronics Co., Ltd. Neural network training method and apparatus that allocate resources based on weights
CN109409510A (zh) * 2018-09-14 2019-03-01 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Neuron circuit, chip, system and method therefor, and storage medium

Legal Events

Code  Description
121   Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 18933025; country of ref document: EP; kind code of ref document: A1)
NENP  Non-entry into the national phase (ref country code: DE)
122   Ep: PCT application non-entry in European phase (ref document number: 18933025; country of ref document: EP; kind code of ref document: A1)