WO2020133317A1 - Computing resource allocation technology and neural network system - Google Patents

Computing resource allocation technology and neural network system

Info

Publication number
WO2020133317A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
layer
computing
output data
network system
Prior art date
Application number
PCT/CN2018/125239
Other languages
French (fr)
Chinese (zh)
Inventor
刘哲
曾重
王铁英
段小祥
张慧敏
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201880100574.2A priority Critical patent/CN113597621A/en
Priority to PCT/CN2018/125239 priority patent/WO2020133317A1/en
Publication of WO2020133317A1 publication Critical patent/WO2020133317A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of computer technology, in particular to a computing resource allocation technology and a neural network system.
  • Deep learning is an important branch of artificial intelligence (Artificial Intelligence, AI). Deep learning is a neural network constructed to imitate the human brain, which can achieve better recognition results than traditional shallow learning methods.
  • Image processing is an application that identifies and analyzes an input image and finally outputs a set of classified image content. For example, a convolutional neural network algorithm can be used to extract and classify the body color, license plate number, and model of a motor vehicle in a picture.
  • Convolutional neural networks usually use a three-layer sequence of convolutional layer (Convolutional Layer), pooling layer (Pooling Layer), and rectified linear units (ReLU) to extract the features of a picture.
  • The process of extracting picture features is actually a series of matrix operations (for example, matrix multiply-add operations). Therefore, how to process pictures in the network quickly and in parallel becomes a problem to be studied in convolutional neural networks.
  • the application provides a computing resource allocation technology and a neural network system, which can improve the data processing speed in the neural network.
  • an embodiment of the present invention provides a computing resource allocation method applied in a neural network system.
  • This method can be performed by a host connected to the neural network chip.
  • According to the deployment requirement, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer are determined. Further, according to the calculation specifications of the calculation units in the neural network system, the N first weights are deployed on P calculation units, and the M second weights are deployed on Q calculation units.
  • The input data of the second neural network layer includes the first output data, N and M are both positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  • Both P and Q are positive integers, the P computing units are used to perform operations of the first neural network layer, and the Q computing units are used to perform operations of the second neural network layer.
  • The computing resource allocation method provided by the embodiment of the present invention takes into account the amount of data output by adjacent neural network layers when configuring, according to the deployment requirements, the computing units that execute each layer's neural network operations, so that the computing power of the computing nodes that perform different neural network layer operations is matched. The computing power of the computing nodes that perform each layer of neural network operations can thus be fully utilized to improve the efficiency of data processing.
  • In a possible implementation manner, the deployment requirement includes a calculation delay, and the first neural network layer is the starting layer of all neural network layers in the neural network system.
  • The determining of the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer includes: determining the value of N according to the data amount of the first output data, the calculation delay, and the calculation frequency of the resistive random access memory (ReRAM) crossbar in the calculation unit; and determining the value of M according to the ratio of the data amount of the first output data to the data amount of the second output data and the value of N.
  • The value of N may be obtained according to the following formula:

$$N = \left\lceil \frac{H_1 \times W_1}{t \times f} \right\rceil$$

  • where N indicates the number of weights required to be configured for the first-layer neural network, $H_1$ is the number of rows of the output data of the first-layer neural network, $W_1$ is the number of columns of the output data of the first-layer neural network, t is the set calculation delay, and f is the calculation frequency of the crossbar in the calculation unit.
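  • As an illustrative sketch only (not part of the claims), the following Python snippet evaluates the formula above; the function name and the assumption that one weight copy produces one output element per crossbar cycle are ours:

```python
import math

def weights_for_first_layer(h1: int, w1: int, delay_s: float, freq_hz: float) -> int:
    """Weight copies N for the starting layer.

    h1, w1: rows and columns of the first layer's output data.
    delay_s: the required calculation delay t.
    freq_hz: the calculation frequency f of one ReRAM crossbar.
    Assumes each weight copy yields one output element per crossbar cycle.
    """
    return math.ceil((h1 * w1) / (delay_s * freq_hz))

# Example: a 224 x 224 output computed within 1 ms on 1 MHz crossbars:
print(weights_for_first_layer(224, 224, 1e-3, 1e6))  # ceil(50176 / 1000) -> 51
```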
  • In a possible implementation manner, the neural network system includes multiple neural network chips, each neural network chip includes multiple computing units, and each computing unit includes at least one resistive random access memory (ReRAM) crossbar; the deployment requirement includes the number of chips of the neural network system.
  • The determining of the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer includes: determining the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight for each neural network layer, and the output data amounts of the adjacent neural network layers; and determining the value of M according to the ratio of the data amount of the first output data to the data amount of the second output data and the value of N.
  • In a possible implementation manner, when the deployment requirement is the number of chips required by the neural network system and the first neural network layer is the starting layer of the neural network system, the following two formulas are used to obtain the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer, where the value of N is the value of $N_1$ and the value of M is the value of $N_2$:

$$xb_1 \times N_1 + xb_2 \times N_2 + \cdots + xb_n \times N_n \leq K \times L$$

$$\frac{N_{i-1}}{N_i} = \frac{D_{i-1}}{D_i}$$

  • where $xb_1$ represents the number of crossbars required to deploy one weight of the first-layer (or starting-layer) neural network and $N_1$ represents the number of weights required for the starting layer; $xb_2$ represents the number of crossbars required to deploy one weight of the second-layer neural network and $N_2$ represents the number of weights required for the second-layer neural network; $xb_n$ represents the number of crossbars required to deploy one weight of the n-th-layer neural network and $N_n$ represents the number of weights required for the n-th-layer neural network; $D_i$ is the output data volume of the i-th layer; K is the number of chips of the neural network system required by the deployment requirement; and L is the number of crossbars in each chip.
  • the value of i can be from 2 to n, where n is the total number of neural network layers in the neural network system.
  • In a possible implementation manner, the neural network system includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, and each secondary computing node includes multiple computing units.
  • the method further includes mapping the P computing units and the Q computing units into multiple secondary computing nodes according to the number of computing units included in the secondary computing nodes in the neural network system. Wherein, at least a part of the P computing units and at least a part of the Q computing units are mapped into the same secondary computing node.
  • In this way, the computing units that perform the operations of adjacent neural network layers can be located in the same secondary computing node as much as possible, thereby reducing the amount of data transmitted between computing nodes and increasing the speed of data transmission between different neural network layers.
  • In a possible implementation manner, the method further includes: mapping, according to the number of secondary computing nodes included in each neural network chip, the multiple secondary computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips.
  • At least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped into the same neural network chip.
  • In this way, the secondary computing nodes that perform the operations of adjacent neural network layers can be located in the same neural network chip as much as possible, which further reduces the amount of data transmitted between the computing nodes and improves the speed of data transmission between different neural network layers.
  • In a possible implementation manner, that the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data includes: the ratio of N to M is the same as the ratio of the data volume of the first output data to the data volume of the second output data.
  • In a second aspect, the present application provides a neural network system, including a host and a plurality of neural network chips, where each neural network chip includes a plurality of computing units, and the host is connected to the plurality of neural network chips and is used for executing the computing resource allocation method described in the first aspect and any possible implementation manner of the first aspect.
  • the present application provides a resource allocation apparatus, including a functional module capable of executing the computing resource allocation method described in the first aspect and any possible implementation manner of the first aspect.
  • In another aspect, the present application also provides a computer program product, including program code, where the instructions included in the program code are executed by a computer to implement the computing resource allocation method described in the first aspect and any possible implementation manner of the first aspect.
  • the present application also provides a computer-readable storage medium for storing program code, and the instructions included in the program code are executed by a computer to implement the foregoing first aspect and The method for computing resource allocation described in any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a neural network system provided by an embodiment of the present invention.
  • FIG. 1A is a schematic structural diagram of yet another neural network system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a computing node in a neural network chip provided by an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a logical structure of a neural network layer in a neural network system according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a set of computing nodes for processing data of different neural network layers in a neural network system according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method for computing resource allocation in a neural network system according to an embodiment of the present invention
  • FIG. 6 is a flowchart of yet another method for computing resource allocation according to an embodiment of the present invention.
  • FIG. 6A is a flowchart of a resource mapping method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of another computing resource allocation method according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a weight matrix of a neural network layer according to an embodiment of the present invention;
  • FIG. 10 is a schematic structural diagram of a resistive random access memory crossbar (ReRAM crossbar) according to an embodiment of the present invention;
  • FIG. 11 is a schematic structural diagram of a resource allocation apparatus according to an embodiment of the present invention.
  • Deep learning is an important branch of artificial intelligence (Artificial Intelligence, AI). Deep learning is a neural network constructed to imitate the human brain, which can achieve better recognition results than traditional shallow learning methods.
  • Artificial neural networks can include convolutional neural networks (Convolutional Neural Networks, CNN), deep neural networks (Deep Neural Networks, DNN), multilayer perceptrons (Multilayer Perceptron, MLP) and other neural networks.
  • FIG. 1 is a schematic structural diagram of an artificial neural network system according to an embodiment of the present invention.
  • FIG. 1 takes a convolutional neural network as an example.
  • the convolutional neural network system 100 may include a host 105 and a convolutional neural network circuit 110.
  • the convolutional neural network circuit 110 may also be referred to as a neural network accelerator.
  • the convolutional neural network circuit 110 is connected to the host 105 through the host interface.
  • the host interface may include a standard host interface and a network interface.
  • the host interface may include a Peripheral Component Interconnect Express (PCIE) interface.
  • the convolutional neural network circuit 110 may be connected to the host 105 through the PCIE bus 106.
  • The host 105 can input data to be processed into the convolutional neural network circuit 110 through the PCIE bus 106, and receive the processed data from the convolutional neural network circuit 110 through the PCIE bus 106.
  • the host 105 may also monitor the working state of the convolutional neural network circuit 110 through the host interface.
  • the host 105 may include a processor 1052 and memory 1054. It should be noted that, in addition to the devices shown in FIG. 1, the host 105 may also include a communication interface and other devices such as a magnetic disk as an external storage, which is not limited herein.
  • The processor 1052 is the arithmetic core and the control unit (Control Unit) of the host 105.
  • the processor 1052 may include multiple processor cores.
  • the processor 1052 may be a very large-scale integrated circuit.
  • An operating system and other software programs are installed in the processor 1052, so that the processor 1052 can access the memory 1054, the cache, disks, and peripheral devices (such as the neural network circuit in FIG. 1).
  • The core in the processor 1052 may be, for example, a central processing unit (CPU) or an application-specific integrated circuit (ASIC).
  • the memory 1054 is the main memory of the host 105.
  • the memory 1054 is connected to the processor 1052 through a double data rate (DDR) bus.
  • The memory 1054 is generally used to store various running software in the operating system, input and output data, and information exchanged with external storage. In order to improve the access speed of the processor 1052, the memory 1054 needs to have a fast access speed. The memory 1054 is generally a dynamic random access memory (DRAM).
  • the processor 1052 can access the memory 1054 at a high speed through a memory controller (not shown in FIG. 1), and can perform a read operation and a write operation on any storage unit in the memory 1054.
  • a convolutional neural network (CNN) circuit 110 is a chip array composed of multiple neural network (NN) chips.
  • the CNN circuit 110 includes multiple NN chips 115 and multiple routers 120.
  • For brevity, the embodiments of the present invention refer to the NN chip 115 as the chip 115.
  • the plurality of chips 115 are connected to each other through a router 120.
  • one chip 115 may be connected to one or more routers 120.
  • Multiple routers 120 may constitute one or more network topologies. Data can be transmitted between the chips 115 through the one or more network topologies.
  • the plurality of routers 120 may constitute a first network 1106 and a second network 1108, where the first network 1106 is a ring network and the second network 1108 is a two-dimensional mesh (2D mesh) network. Therefore, the data input from the input port 1102 can be sent to the corresponding chip 115 by the network composed of the plurality of routers 120, and the data processed by any one chip 115 can also be sent to other chips 115 through the network composed of the plurality of routers 120. Process or send out from output port 1104.
  • FIG. 1 also shows a schematic structural diagram of the chip 115.
  • chip 115 may include multiple neural network processing units 125 and multiple routers 122.
  • FIG. 1 takes a tile as an example of the neural network processing unit.
  • one tile 125 may be connected to one or more routers 122.
  • the multiple routers 122 in the chip 115 may constitute one or more network topologies. Data can be transmitted between tiles 125 through the various network topologies.
  • the plurality of routers 122 may constitute a first network 1156 and a second network 1158, where the first network 1156 is a ring network and the second network 1158 is a two-dimensional mesh (2D mesh) network.
  • The data input to the chip 115 from the input port 1152 can be sent to the corresponding tile 125 through the network composed of the plurality of routers 122, and the data processed by any tile 125 can also be sent to other tiles 125 through the network or output from the output port 1154.
  • It should be noted that, although the chips 115 are interconnected by routers, the one or more network topologies composed of the multiple routers 120 in the convolutional neural network circuit 110 and the network topologies composed of the multiple routers 122 in the chip 115 may be the same or different, as long as data can be transmitted between the chips 115 or the tiles 125 through the network topology and the chips 115 or tiles 125 can receive and output data through the network topology.
  • The embodiments of the present invention do not limit the number and types of networks composed of the multiple routers 120 and 122.
  • The router 120 and the router 122 may be the same or different; for clarity of description, they are labeled 120 and 122 respectively in FIG. 1.
  • The chip 115 connected to the router 120 may also be referred to as a computing node.
  • FIG. 1A is a schematic structural diagram of yet another neural network system according to an embodiment of the present invention.
  • As shown in FIG. 1A, the host 105 is connected to multiple PCIE cards 109 through a PCIE interface 107, each PCIE card 109 may include multiple neural network chips 115, and the neural network chips are connected through a high-speed interconnection interface.
  • The interconnection manner between the chips is not limited here. It can be understood that, in actual applications, the tiles within a chip may not be connected by routers while the chips adopt the high-speed interconnection shown in FIG. 1A; it is also possible to use the router connection shown in FIG. 1 between the tiles within a chip and the high-speed interconnection shown in FIG. 1A between the chips.
  • the embodiments of the present invention do not limit the connection modes between chips or within chips.
  • each tile 125 may include an input-output interface (TxRx) 1252, a switching device (TSW) 1254, and multiple processing devices (PE) 1256.
  • The switching device (TSW) 1254 is connected to the TxRx 1252 and is used to implement data transmission between the TxRx 1252 and the multiple PEs 1256.
  • Each PE 1256 may include one or more computing engines (computing engines) 1258.
  • The one or more computing engines 1258 are used to implement neural network calculations on the data input to the PE 1256. For example, the data input to the tile 125 may be multiplied and added with the convolution kernel preset in the tile 125.
  • the calculation result of Engine 1258 can be sent to other tiles 125 through TSW 1254 and TxRx 1252.
  • an Engine 1258 may include modules that implement convolution, pooling, or other neural network operations.
  • The specific circuit or function of the engine is not limited in the embodiments of the present invention. For simplicity of description, the calculation engine is simply referred to as the engine.
  • In the embodiments of the present invention, the engine 1258 is implemented based on resistive random-access memory (ReRAM), and an engine 1258 may include one or more ReRAM crossbars.
  • The structure of the ReRAM crossbar is shown in FIG. 10. How the matrix multiply-add operation is performed through the ReRAM crossbar will be introduced later.
  • As described above, the neural network circuit provided by the embodiments of the present invention includes multiple NN chips, each NN chip includes multiple tiles, each tile includes multiple processing devices (PE), each PE includes multiple engines, and each engine is realized by one or more ReRAM crossbars.
  • The neural network system provided by the embodiments of the present invention may include multi-level computing nodes, for example, four levels of computing nodes: the first-level computing node is a chip 115, the second-level computing node is a tile within the chip, the third-level computing node is a PE in the tile, and the fourth-level computing node is an engine in the PE.
  • the neural network system may include multiple neural network layers.
  • the neural network layer is a logical layer concept.
  • a neural network layer refers to performing a neural network operation once.
  • Each layer of neural network computing is implemented by computing nodes.
  • the neural network layer may include a convolution layer, a pooling layer, and the like.
  • the neural network system may include n neural network layers (also called n-layer neural networks), where n is an integer greater than or equal to 2.
  • FIG. 3 shows some neural network layers in the neural network system. As shown in FIG. 3, the neural network system may include a first layer 302, a second layer 304, a third layer 306, a fourth layer 308, and a fifth layer 310, up to the n-th layer 312.
  • the first layer 302 can perform a convolution operation
  • the second layer 304 can perform a pooling operation on the output data of the first layer 302
  • the third layer 306 can perform a convolution operation on the output data of the second layer 304
  • The fourth layer 308 may perform a convolution operation on the output result of the third layer 306, and the fifth layer 310 may perform a sum operation on the output data of the second layer 304 and the output data of the fourth layer 308, and so on. It can be understood that FIG. 3 is merely an example and does not limit the operation performed by each neural network layer.
  • the fourth layer 308 may also be a pooling operation.
  • the fifth layer 310 may also be other neural network operations such as convolution operations or pooling operations.
  • In a conventional neural network implementation, the calculation result of the i-th layer is temporarily stored in a preset buffer, and before performing the calculation of the (i+1)-th layer, the calculation unit needs to reload the calculation result of the i-th layer and the weight of the (i+1)-th layer from the preset cache, where the i-th layer is any layer in the neural network system.
  • In the embodiments of the present invention, because the engine of the neural network system uses ReRAM crossbars, and ReRAM has the advantage of integrating storage and computation, the weights can be configured on the ReRAM before calculation, and the calculation results of one layer can be directly sent to the next layer for pipelined calculation.
  • each layer of neural network only needs to cache very little data.
  • each layer of neural network only needs to cache enough input data for one window calculation.
  • an embodiment of the present invention provides a method for streaming data through a neural network. For clarity of description, the following briefly introduces the stream processing of the neural network system in conjunction with the convolutional neural network system of FIG. 1.
  • FIG. 4 takes the division of tiles 125 in the neural network system shown in FIG. 1 as an example to illustrate different sets of computing nodes that implement neural network computing at different layers in the embodiment of the present invention.
  • multiple tiles 125 in the chip 115 may be divided into multiple node sets. For example: first node set 402, second node set 404, third node set 406, fourth node set 408, and fifth node set 410.
  • each node set includes at least one computing node (for example, tile 125).
  • the computing nodes of the same node set are used to perform neural network operations on the data entering the same neural network layer, and the data of different neural network layers are processed by the computing nodes of different node sets.
  • the processing results of a computing node will be transmitted to the computing nodes in other node sets for processing.
  • This pipelined processing method allows each layer of the neural network to cache only a very small amount of data, and enables multiple computing nodes to concurrently process the same data stream, improving processing efficiency.
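  • The pipelining idea can be sketched with a toy Python model; the stage functions below are hypothetical stand-ins for the convolution and pooling performed by two node sets, and only illustrate that downstream layers consume results as soon as upstream layers emit them:

```python
# Toy model of stream (pipeline) processing: each node set applies its layer
# to items flowing through, so layer i+1 starts as soon as layer i emits a
# result and only a small window of data is buffered per stage.

def conv_stage(stream):
    for x in stream:
        yield x * 2          # stand-in for convolution on node set 1

def pool_stage(stream):
    for x in stream:
        yield x + 1          # stand-in for pooling on node set 2

for result in pool_stage(conv_stage(iter(range(5)))):
    print(result)            # 1, 3, 5, 7, 9 -- produced incrementally
```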
  • FIG. 4 uses tiles as an example to illustrate a set of computing nodes used to process different neural network layers (for example, convolutional layers). In actual applications, because a tile contains multiple PEs, each PE contains multiple Engines, and different application scenarios require different amounts of calculation.
  • the computing nodes in the neural network system can be divided with the granularity of PE, Engine or chip, so that the computing nodes in different sets are used to handle the operations of different neural network layers.
  • the computing node referred to in the embodiment of the present invention may be Engine, PE, tile, or chip.
  • When a computing node (for example, tile 125) performs a neural network operation (for example, a convolution calculation), it may calculate the data input to the computing node based on the weight of the corresponding neural network layer.
  • a certain tile 125 may perform a convolution operation on the input data input to the tile 125 based on the weight of the corresponding convolution layer, for example, perform a matrix multiply-add calculation on the weight and the input data.
  • the weight is usually used to indicate the importance of the input data to the output data.
  • The weights are usually represented by a matrix. The weight matrix of j rows and k columns shown in FIG. 9 may be one weight of a neural network layer.
  • each element in the weight matrix represents a weight value.
  • Since the computing nodes of a node set are used to perform the operation of one neural network layer, the computing nodes of the same node set may share weights, and the computing nodes in different node sets may have different weights.
  • In the embodiments of the present invention, the weights in each computing node can be configured in advance. Specifically, each element in a weight matrix is configured into a ReRAM cell in the corresponding crossbar array, so that the matrix multiply-add operation of the input data and the configured weights can be implemented through the crossbar array. How the matrix multiply-add operation is implemented through the crossbar will be briefly introduced later.
  • the computing nodes in the neural network may be divided into a set of nodes for processing different neural network layers, and corresponding weights are configured.
  • computing nodes of different node sets can perform corresponding calculations according to the configured weights.
  • the computing nodes of each node set can send the computing results to the computing nodes used to perform the next layer of neural network operations.
  • A person skilled in the art may know that, in the process of realizing neural network stream processing, if the computing resources for performing the operations of different layers do not match, for example, if there are fewer computing resources performing the operation of an upper neural network layer and relatively many computing resources performing the operation of the next layer, the computing resources of the computing nodes of the next layer will be wasted.
  • Based on this consideration, embodiments of the present invention provide a computing resource allocation method for allocating the computing nodes that perform the operations of different neural network layers, so that the computing power of the computing nodes used to perform the operations of two adjacent neural network layers in the neural network system is matched, which improves the data processing efficiency in the neural network system and does not waste computing resources.
  • FIG. 5 is a flowchart of a method for computing resource allocation in a neural network system according to an embodiment of the present invention. This method can be applied to the neural network system shown in FIG. 1. This method may be implemented by the host computer 105 when deploying a neural network or when configuring a neural network system. Specifically, it may be implemented by the processor 1052 in the host computer 105. As shown in FIG. 5, the method may include the following steps.
  • In step 502, the network model information of the neural network system is obtained.
  • the network model information includes the first output data amount of the first neural network layer and the second output data amount of the second neural network layer in the neural network system.
  • Network model information can be determined according to actual application requirements. For example, the total number of neural network layers and the algorithm of each layer can be determined according to the application scenario of the neural network system.
  • the network model information may include the total number of neural network layers in the neural network system, the algorithm of each layer, and the data output of each layer of the neural network.
  • the algorithm refers to a neural network operation that needs to be performed.
  • For example, the algorithm may refer to a convolution operation, a pooling operation, and so on.
  • As shown in FIG. 3, the neural network system may have n neural network layers, where n is an integer not less than 2.
  • the first neural network layer and the second neural network layer may be two layers in the n layer that are operationally dependent.
  • the two neural network layers having a dependency relationship mean that the input data of one neural network layer includes the output data of another neural network layer.
  • Two neural network layers with dependencies can also be referred to as adjacent layers.
  • the output data of the first layer 302 is the input data of the second layer 304, therefore, the first layer 302 and the second layer 304 have a dependency relationship.
  • the output data of the second layer 304 is the input data of the third layer 306, the input data of the fifth layer 310 includes the output data of the second layer 304, therefore, the second layer 304 and the third layer 306 have a dependency relationship, the second layer 304 and the fifth layer 310 also have a dependency relationship.
  • the first layer 302 shown in FIG. 3 is the first neural network layer
  • the second layer 304 is the second neural network layer as an example for description.
  • In step 504, according to the deployment requirement, the first output data amount, and the second output data amount, the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer are determined.
  • N and M are both positive integers
  • the ratio of N and M corresponds to the ratio of the first output data volume and the second output data volume.
  • the deployment requirements may include the calculation delay of the neural network system, or may include the number of chips required to be deployed by the neural network system.
  • the operation of the neural network is mainly to perform matrix multiply-add operations.
  • The output data of each layer of the neural network is also a one-dimensional or multi-dimensional real matrix. Therefore, the first output data amount includes the number of rows and columns of the output data of the first neural network layer, and the second output data amount includes the number of rows and columns of the output data of the second neural network layer.
  • When a computing node performs a convolution operation or a pooling operation, it is necessary to perform a multiply-add calculation on the input data and the weight of the corresponding neural network layer. Since the weights are configured on the cells in the crossbars, the crossbars in the calculation units perform calculations on the input data in parallel, and the number of weights therefore determines the parallel computing capability of the multiple calculation units that perform a neural network operation. In other words, the computing power of the computing nodes performing a neural network operation is determined by the number of weights configured in the calculation units performing that operation.
  • In the embodiments of the present invention, the number of weights to be configured for the first neural network layer and the second neural network layer may be determined based on the specific deployment requirement, the first output data amount, and the second output data amount. Since the weights of different neural network layers are not necessarily the same, for clarity of description, in the embodiments of the present invention, the weights required for the operation of the first neural network layer are called first weights, and the weights required for the operation of the second neural network layer are called second weights.
  • Performing the first neural network layer operation means that the computing node performs the corresponding calculation on the data input to the first neural network layer based on the first weights, and performing the second neural network layer operation means that the computing node performs the corresponding calculation on the data input to the second neural network layer based on the second weights. The calculations here can be neural network operations such as convolution or pooling operations.
  • the number of weights to be configured for each layer of the neural network includes the number N of first weights to be configured by the first neural network layer and the number M of second weights to be configured by the second neural network layer.
  • the weight refers to a weight matrix.
  • the number of weights refers to the number of weight matrices required, or the number of copies of weights.
  • the number of weights can also be understood as how many identical weight matrices need to be configured.
  • In an implementation, the number of weights that need to be configured for the first layer of the neural network (that is, the starting layer of all neural network layers in the neural network system) is determined according to the data output volume of the first layer, the calculation delay, and the calculation frequency of the ReRAM crossbars used in the neural network system; then the number of weights that need to be configured for each layer of the neural network is determined according to the number of weights configured for the first layer and the output data amount of each layer.
  • Specifically, the number of weights required for the first-layer (that is, starting-layer) neural network can be obtained according to the following formula 1:

$$N_1 = \left\lceil \frac{H_1 \times W_1}{t \times f} \right\rceil \qquad \text{(formula 1)}$$

  • The first-layer neural network is the starting-layer neural network among all neural network layers in the neural network system. It can be understood that, when the first neural network layer is the starting layer of all neural network layers in the neural network system, the number N of the first weights is the value calculated according to formula 1.
  • the ratio of the number of weights required by two adjacent layers can be made to correspond to the ratio of the output data amount of the two adjacent layers.
  • In the embodiments of the present invention, the ratio can be the same. Therefore, the number of weights required by each layer of the neural network can be determined according to the number of weights required by the first-layer neural network and the ratio of the output data amounts of the layers. Specifically, the number of weights required for each layer can be calculated according to the following formula 2:

$$N_i = \left\lceil N_{i-1} \times \frac{H_i \times W_i}{H_{i-1} \times W_{i-1}} \right\rceil \qquad \text{(formula 2)}$$

  • where $H_i$ and $W_i$ are the number of rows and columns of the output data of the i-th layer, and the value of i can be from 2 to n, where n is the total number of neural network layers in the neural network system.
  • That is, the ratio of the number of weights required to perform the operation of the (i-1)-th neural network layer to the number of weights required to perform the operation of the i-th neural network layer corresponds to the ratio of the output data volume of the (i-1)-th layer to the output data volume of the i-th layer.
  • the output data of each neural network layer may include multiple channels (channel), where the channel refers to the number of kernels in each neural network layer.
  • A kernel represents a feature extraction method and corresponds to a feature map; multiple feature maps constitute the output data of the layer.
  • The weight used by a neural network layer includes multiple kernels. Therefore, in practical applications, in another situation, the output data volume of each layer can also take into account the number of channels of each neural network layer. Specifically, after obtaining the number of weights required for the first neural network layer according to the above formula 1, the number of weights required for each layer can be obtained according to the following formula 3:

$$N_i = \left\lceil N_{i-1} \times \frac{H_i \times W_i \times C_i}{H_{i-1} \times W_{i-1} \times C_{i-1}} \right\rceil \qquad \text{(formula 3)}$$

  • Formula 3 further considers the number of channels output by each neural network layer on the basis of formula 2.
  • Here, $C_{i-1}$ represents the number of channels of the (i-1)-th layer, $C_i$ represents the number of channels of the i-th layer, the value of i is from 2 to n, and n is the total number of neural network layers in the neural network system, where n is an integer not less than 2.
  • the number of channels of each layer of neural network can be obtained from the network model information.
  • After the number of weights required for the starting layer is obtained according to the above formula 1, the number of weights required for each layer of the neural network can be calculated according to formula 2 (or formula 3) and the output data amount of each layer included in the network model information.
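  • A minimal sketch of this propagation via formulas 2 and 3, assuming the per-layer output sizes (and optionally channel counts) come from the network model information; rounding up to keep integer weight counts is our assumption:

```python
import math

def weights_per_layer(n_first: int, out_sizes, channels=None):
    """Propagate the starting layer's weight count through formula 2
    (or formula 3 when channel counts are given).

    n_first: N, the weight count of the starting layer (from formula 1 or 4).
    out_sizes: list of (H_i, W_i) for layers 1..n.
    channels: optional list of C_i for layers 1..n (enables formula 3).
    """
    counts = [n_first]
    for i in range(1, len(out_sizes)):
        h_prev, w_prev = out_sizes[i - 1]
        h_cur, w_cur = out_sizes[i]
        ratio = (h_cur * w_cur) / (h_prev * w_prev)
        if channels is not None:
            ratio *= channels[i] / channels[i - 1]
        counts.append(math.ceil(counts[-1] * ratio))
    return counts

# Example: three layers whose outputs shrink 224x224 -> 112x112 -> 56x56.
print(weights_per_layer(8, [(224, 224), (112, 112), (56, 56)]))  # [8, 2, 1]
```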
  • When the above-mentioned first neural network layer is the starting layer of all neural network layers in the neural network system and the number N of the first weights is obtained according to formula 1, the number M of second weights required by the second neural network layer can be calculated according to formula 2, based on the value of N and the set first output data amount and second output data amount.
  • In another implementation, when the deployment requirement is the number of chips required by the neural network system, the number of weights required for the first-layer neural network can be calculated by combining the following formula 4 with the foregoing formula 2, or by combining the following formula 4 with the foregoing formula 3:

$$xb_1 \times N_1 + xb_2 \times N_2 + \cdots + xb_n \times N_n \leq K \times L \qquad \text{(formula 4)}$$
  • Here, $xb_1$ represents the number of crossbars required to deploy one weight of the first-layer (or starting-layer) neural network, and $N_1$ represents the number of weights required for the starting layer; $xb_2$ represents the number of crossbars required to deploy one weight of the second-layer neural network, and $N_2$ represents the number of weights required for the second-layer neural network; $xb_n$ represents the number of crossbars required to deploy one weight of the n-th-layer neural network, and $N_n$ represents the number of weights required for the n-th-layer neural network; K is the number of chips of the neural network system required by the deployment requirement; and L is the number of crossbars in each chip.
  • In the embodiments of the present invention, the network model information of the neural network system also includes the size of one weight used by each neural network layer and the crossbar specification information. Therefore, the $xb_i$ of the i-th-layer neural network can be obtained according to the weight of each layer (that is, the number of rows and columns of the weight matrix) and the specifications of the crossbar, where i takes values from 1 to n.
  • the value of L can be obtained from the parameters of the chip used by the neural network system.
  • After the number of weights required for the starting-layer neural network (that is, $N_1$) is obtained according to formula 4 and formula 2 above, the number of weights that need to be configured for each layer can be obtained according to formula 2 and the output data amount of each layer obtained from the network model information. Similarly, after the number of weights required for the starting-layer neural network (that is, $N_1$) is obtained according to formula 4 and formula 3, the number of weights of each layer can also be obtained according to formula 3 and the output data amount of each layer.
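  • The following sketch combines formula 4 with the formula-2 ratios to solve for the starting layer's weight count $N_1$ under a budget of K chips with L crossbars each; the closed-form rearrangement and the flooring are our assumptions:

```python
import math

def starting_layer_weights(k_chips: int, l_xbars: int, xb_per_weight, out_volumes):
    """Solve formula 4 for N_1 given the chip budget.

    k_chips: K, number of chips required by the deployment.
    l_xbars: L, number of crossbars per chip.
    xb_per_weight: [xb_1, ..., xb_n], crossbars needed for one weight copy
        of each layer (from the weight sizes and crossbar specification).
    out_volumes: [D_1, ..., D_n], output data volume of each layer
        (e.g. H_i * W_i, or H_i * W_i * C_i when channels are considered).
    Under formula 2, N_i = N_1 * D_i / D_1, so
        sum_i(xb_i * N_1 * D_i / D_1) <= K * L.
    """
    denom = sum(xb * d / out_volumes[0]
                for xb, d in zip(xb_per_weight, out_volumes))
    return math.floor(k_chips * l_xbars / denom)

# Example: 2 chips with 96 crossbars each, a 3-layer network.
print(starting_layer_weights(2, 96, [4, 2, 2], [50176, 12544, 3136]))  # -> 41
```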
  • In step 506, according to the calculation specifications of the calculation units in the neural network system, the N first weights are deployed on P calculation units, and the M second weights are deployed on Q calculation units.
  • P and Q are both positive integers
  • the P computing units are used to perform operations of the first neural network layer
  • the Q computing units are used to perform operations of the second neural network layer.
  • the calculation specification of the calculation unit refers to the number of crossbars included in one calculation unit.
  • a computing unit may include one or more crossbars. Specifically, as mentioned above, since the network model information of the neural network system further includes the size of one weight used by each neural network layer and the specification information of the crossbar, the deployment relationship between one weight and the crossbar can be obtained.
  • the weights of each layer may be deployed on the corresponding number of calculation units according to the number of crossbars included in each calculation unit.
  • the elements in the weight matrix are respectively configured in the ReRAM cells of the crossbar of the calculation unit.
  • the computing unit may refer to a PE or an engine, one PE may include multiple engines, and one engine may include one or more crossbars. Since the weight of each layer may be different, a weight can be deployed on one or more engines.
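  • As a hedged sketch of this deployment step: given how many crossbars one weight occupies and how many crossbars one calculation unit (for example, an engine) contains, the number of calculation units a layer's weights occupy can be estimated as below; the simple whole-crossbar packing rule is our assumption:

```python
import math

def units_for_layer(n_weights: int, xb_per_weight: int, xb_per_unit: int) -> int:
    """Estimate how many calculation units a layer's weights occupy.

    Packing assumption: each weight copy occupies whole crossbars, units are
    filled with as many copies as fit, and a copy larger than one unit spans
    ceil(xb_per_weight / xb_per_unit) units.
    """
    if xb_per_weight >= xb_per_unit:
        # each copy spans one or more whole units
        return n_weights * math.ceil(xb_per_weight / xb_per_unit)
    copies_per_unit = xb_per_unit // xb_per_weight
    return math.ceil(n_weights / copies_per_unit)

# Example: N = 8 first weights, 2 crossbars per weight, 4 crossbars per engine
print(units_for_layer(8, 2, 4))  # -> P = 4 engines
```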
  • In this way, according to the calculation specifications of the calculation units, the P calculation units on which the N first weights need to be deployed and the Q calculation units on which the M second weights need to be deployed can be determined.
  • N first weights of the first neural network layer may be deployed on P computing units
  • M second weights may be deployed on Q computing units.
  • the elements in the N first weights are respectively allocated to the corresponding crossbar ReRAM cells in the P calculation units.
  • the elements in the M second weights are respectively allocated to the corresponding crossbar ReRAM cells in the Q calculation units.
  • In this way, the P computing units may perform the operation of the first neural network layer on the input data input to the P computing units based on the configured N first weights, and the Q computing units may perform the operation of the second neural network layer on the input data input to the Q computing units based on the configured M second weights.
  • The computing resource allocation method provided by the embodiments of the present invention considers the amount of data output by adjacent neural network layers when configuring, according to the deployment requirements, the computing units that perform each layer's neural network operations, so that the computing power of the computing nodes operating at different neural network layers matches and can be fully utilized to improve the efficiency of data processing.
  • Further, in order to save the transmission bandwidth between computing units or computing nodes, the computing units can be mapped to the superior computing nodes of the computing units according to the following method.
  • the neural network system may include four-level computing nodes: a first-level computing node chip, a second-level computing node tile, a third-level computing node PE, and a fourth-level computing node engine.
  • FIG. 6 describes in detail how to map the P computing units on which the N first weights need to be deployed and the Q computing units on which the M second weights need to be deployed to the superior computing nodes. This method can still be implemented by the host 105 in the neural network system shown in FIG. 1 and FIG. 1A. As shown in FIG. 6, the method may include the following steps.
  • In step 602, the network model information of the neural network system is obtained.
  • the network model information includes the first output data amount of the first neural network layer and the second output data amount of the second neural network layer in the neural network system.
  • In step 604, according to the deployment requirement, the first output data amount, and the second output data amount, the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer are determined.
  • In step 606, according to the calculation specifications of the calculation units in the neural network system, the P calculation units on which the N first weights need to be deployed and the Q calculation units on which the M second weights need to be deployed are determined.
  • For steps 602, 604, and 606, reference may be made to the related descriptions of steps 502, 504, and 506, respectively.
  • In the method shown in FIG. 6, after the P computing units to be deployed with the N first weights and the Q computing units to be deployed with the M second weights are determined in step 606, the N first weights are not directly deployed to the P computing units, nor are the M second weights directly deployed to the Q computing units; instead, step 608 is entered.
  • In step 608, the P computing units and the Q computing units are mapped into multiple third-level computing nodes according to the number of computing units included in the third-level computing nodes in the neural network system.
  • FIG. 6A is a flowchart of a resource mapping method according to an embodiment of the present invention. FIG. 6A takes the computing unit as the fourth-level computing node engine as an example and describes how to map the engines into the third-level computing nodes PE. As shown in FIG. 6A, the method may include the following steps.
  • The P computing units and the Q computing units are divided into m groups, and each group includes P/m computing units for executing the operation of the first neural network layer and Q/m computing units for executing the operation of the second neural network layer, where m is an integer not less than 2, and the values of P/m and Q/m are both integers.
  • Here, the P computing units are used as the computing units performing the (i-1)-th layer, and the Q computing units are used as the computing units performing the i-th layer as an example.
  • Then, each group of computing units is mapped to the third-level computing nodes. During the mapping process, the computing units that perform the operations of adjacent neural network layers are mapped to the same third-level node as much as possible.
  • each first-level computing node chip includes eight second-level computing node tiles, and each tile includes two third-level computing nodes PE, and each PE includes 4 engines.
  • For the computing units in the first group, the four engines of the (i-1)-th layer can be mapped to one third-level computing node PE (such as PE1 in FIG. 7), and the two engines of the i-th layer and the two engines of the (i+1)-th layer are mapped together to another third-level computing node PE (such as PE2 in FIG. 7).
  • Similar to the mapping method for the first group, for the computing units in the second group, the four engines of the (i-1)-th layer can be mapped to PE3, and the two engines of the i-th layer and the two engines of the (i+1)-th layer are mapped together onto PE4.
  • the computing units of other groups can be mapped in a mirrored manner according to the mapping method of the first group.
  • According to this mapping method, the computing units that execute adjacent neural network layers can be mapped to the same third-level computing node as much as possible. Therefore, when the output data of the i-th layer is sent to the computing units of the (i+1)-th layer, it only needs to be transmitted within the same third-level node (PE) and does not need to occupy the bandwidth between third-level nodes, which can improve the data transmission speed and reduce the transmission bandwidth consumption between nodes.
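  • A sketch of this grouping step of FIG. 6A, with engines represented as simple (layer, index) tuples of our own choosing; each group is meant to land on one pair of PEs so adjacent-layer traffic stays local:

```python
def group_engines(p_engines, q_engines, m):
    """Split the engines of two adjacent layers into m groups.

    p_engines: engines running layer i-1 (P of them).
    q_engines: engines running layer i (Q of them).
    m: number of groups; P/m and Q/m must be integers.
    Each group is later mapped onto the same third-level node (PE), so the
    output of layer i-1 reaches layer i without crossing PE boundaries.
    """
    assert len(p_engines) % m == 0 and len(q_engines) % m == 0
    p_step, q_step = len(p_engines) // m, len(q_engines) // m
    return [
        {"group": g,
         "prev_layer": p_engines[g * p_step:(g + 1) * p_step],
         "next_layer": q_engines[g * q_step:(g + 1) * q_step]}
        for g in range(m)
    ]

p = [("layer_i-1", i) for i in range(8)]   # 8 engines for layer i-1
q = [("layer_i", i) for i in range(4)]     # 4 engines for layer i
for grp in group_engines(p, q, 2):         # 2 groups of 4 + 2 engines
    print(grp)
```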
  • In step 610, according to the number of third-level computing nodes included in the second-level computing nodes in the neural network system, the multiple third-level computing nodes to which the P computing units and the Q computing units are mapped are mapped into multiple second-level computing nodes.
  • In step 612, according to the number of second-level computing nodes included in each neural network chip, the multiple second-level computing nodes to which the P computing units and the Q computing units are mapped are mapped into the multiple neural network chips.
  • FIG. 6A takes the mapping of the engines performing the layer operations to the third-level computing nodes as an example. Similarly, according to the method shown in FIG. 6A, the third-level nodes can also be mapped to the second-level nodes, and the second-level nodes can be mapped to the first-level nodes.
  • PE1 performing the operation of the i-1 layer and PE2 performing the operations of the i-th layer and the i+1-th layer may be mapped into the same second-level computing node Tile1.
  • PE3 performing the operation of the i-1 layer and PE4 performing the operations of the i-th layer and the i+1th layer can be further mapped into the same second-level computing node Tile2.
  • Further, Tile1 and Tile2, which perform the operations of the (i-1)-th, i-th, and (i+1)-th layers, can also be mapped into the same chip chip1. In this way, the mapping relationship from the first-level computing node chip to the fourth-level computing node engine in the neural network system can be obtained.
  • Then, the N first weights and the M second weights are deployed to the P calculation units and the Q calculation units corresponding to the multiple third-level nodes, multiple second-level computing nodes, and multiple first-level computing nodes, respectively.
  • the mapping relationship from the first-level computing node chip to the fourth-level computing node engine in the neural network system can be obtained according to the methods described in FIGS. 6A and 7. For example, a mapping relationship between the P computing units and the Q computing units and the multiple third-level nodes, multiple second-level computing nodes, and multiple first-level computing nodes may be obtained, respectively.
  • the weights of the corresponding neural network layer can be deployed to the computing units of the computing nodes at all levels according to the obtained mapping relationship.
  • the N weights of the i-1th layer can be deployed in the four computing units corresponding to chip1, tile1, and PE1 and the four computing units corresponding to chip1, tile2, and PE3, respectively.
  • the M second weights of the i-th layer are respectively deployed to two computing units corresponding to chip1, tile1 and PE2 and two computing units corresponding to chip1, tile2 and PE4.
  • the N weights of the i-1 layer are respectively deployed in the four computing units in chip1—>tile1—>PE1 and the four computing units in chip1—>tile2—>PE3.
  • the M weights of the i-th layer are respectively deployed in two computing units in chip1—>tile1—>PE2 and two computing units in chip1—>tile2—>PE4.
  • According to the method described in the embodiments of the present invention, not only can the computing power of the computing units supporting the operations of adjacent neural network layers be matched, but also the computing units performing the operations of adjacent neural network layers can be located in the same third-level computing node as much as possible, the third-level computing nodes executing adjacent neural network layers can be located in the same second-level computing node as much as possible, and the second-level computing nodes executing adjacent neural network layers can be located in the same first-level computing node (for example, a neural network chip) as much as possible.
  • In the above embodiments, the fourth-level computing node engine is used as the computing unit to describe the process of allocating computing resources for performing the operations of different neural network layers; that is, the above embodiments divide the sets that perform the operations of different neural network layers with the engine as the granularity.
  • In actual applications, the third-level computing node PE can also be used as the computing unit for allocation, and the mapping between the third-level computing node PE, the second-level computing node tile, and the first-level computing node chip can be established according to the above method.
  • the calculation unit may be engine, PE, tile, or chip, which is not limited herein.
  • FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention. This method is applied to the neural network system shown in FIG. 1, and the neural network system shown in FIG. 1 is configured by the method shown in FIGS. 5-7 to allocate computing resources for performing different neural network layer operations. As shown in FIG. 8, the method may be implemented by the neural network circuit shown in FIG. 1. The method may include the following steps.
  • P computing units in the neural network system receive first input data.
  • the P computing units are used to perform the first neural network layer operation of the neural network system.
  • the first neural network layer is any layer in the neural network system.
  • the first input data is data that needs to perform the operation of the first neural network layer.
  • the first input data may be data input to the neural network system for the first time.
  • the first input data may be output data processed by other neural network layers.
  • the P calculation units perform calculation on the first input data according to the configured N first weights to obtain first output data.
  • the first weight is a weight matrix.
  • the N first weights refer to N weight matrices, and the N first weights may also be referred to as N first weight copies.
  • The N first weights may be configured in the P calculation units according to the methods shown in FIGS. 5-7. Specifically, the elements in the first weights are respectively configured into the ReRAM cells of the crossbars included in the P calculation units, so that the crossbars in the P calculation units can perform parallel computation on the input data based on the N first weights, making full use of the computing power of the crossbars in the P computing units.
  • the P computing units may perform a neural network operation on the received first input data based on the configured N first weights to obtain the first output data.
  • the crossbar in the P calculation units may perform a matrix multiply-add operation on the first input data and the configured first weight.
  • the Q computing units in the neural network system receive second input data.
  • the Q calculation units are used to perform a second neural network layer operation of the neural network system, and the second input data includes the first output data.
  • the Q calculation units may only perform the operation of the second neural network layer on the first output data of the P calculation units.
  • for example, the P computing units are used to perform the operations of the first layer 302 shown in FIG. 3, and the Q computing units are used to perform the operations of the second layer 304 shown in FIG. 3.
  • in this case, the second input data is the first output data.
  • the Q calculation units may also be used to jointly perform a second neural network operation on the first output data of the first neural network layer and the output data of other neural network layers.
  • for example, the P computing units may be used to perform the neural network operation of the second layer 304 shown in FIG. 3, and the Q computing units may be used to perform the neural network operation of the fifth layer 310 shown in FIG. 3.
  • in this case, the Q computing units are used to perform operations on the output data of the second layer 304 and the fourth layer 308, and the second input data includes the first output data and the output data of the fourth layer 308, as illustrated in the sketch below.
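As an illustration of this case, the following sketch (hypothetical Python with made-up feature-map shapes; not part of the patent) shows how the second input data of the Q computing units can be assembled from the first output data of the second layer 304 and the output data of the fourth layer 308 before the summation operation of the fifth layer 310 is performed.

```python
import numpy as np

# Hypothetical feature maps; the shapes are assumptions for illustration only.
second_layer_output = np.random.rand(64, 64)   # first output data (layer 304)
fourth_layer_output = np.random.rand(64, 64)   # output data of layer 308

# The second input data of the Q computing units includes both outputs.
second_input = (second_layer_output, fourth_layer_output)

# The fifth layer 310 performs a summation operation on the two outputs.
second_output = second_input[0] + second_input[1]
print(second_output.shape)  # (64, 64)
```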
  • the Q calculation units perform calculation on the second input data according to the configured M second weights to obtain second output data.
  • the second weight is also a weight matrix.
  • the M second weights refer to M weight matrices, and the M second weights may also be referred to as M second weight copies.
  • the second weights may be configured into the ReRAM cells of the crossbars included in the Q computing units according to the method shown in FIGS. 5-7.
  • the Q calculation units may perform a neural network operation on the received second input data based on the configured M second weights to obtain the second output data.
  • the crossbar in the Q calculation units may perform a matrix multiply-add operation on the second input data and the configured second weight.
  • the ratio of N and M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
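As a worked example of this proportionality (a sketch under assumed output sizes, not values taken from the patent): if the first layer outputs a 200x200 feature map and the second layer, after pooling, outputs 100x100, then N:M = 40000:10000 = 4:1, so configuring N = 8 first weight copies suggests M = 2 second weight copies.

```python
# Hypothetical output sizes; only the rule N/M = V1/V2 comes from the text.
v1 = 200 * 200            # data volume of the first output data
v2 = 100 * 100            # data volume of the second output data

n = 8                      # chosen number of first weight copies
m = round(n * v2 / v1)     # N/M = V1/V2  =>  M = N * V2 / V1
print(m)                   # 2
```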
  • the weight matrix of j rows and k columns shown in FIG. 9 may be a weight of a neural network layer, and each element in the weight matrix represents a weight value.
  • FIG. 10 is a schematic structural diagram of a ReRAM crossbar in a computing unit provided by an embodiment of the present invention.
  • the ReRAM crossbar may be simply referred to as a crossbar in this embodiment of the present invention.
  • the crossbar includes multiple ReRAM cells, such as G 1,1 , G 2,1, and so on.
  • the multiple ReRAM cells constitute a neural network matrix.
  • for example, the weight element W0,0 in FIG. 9 is configured in G1,1 in FIG. 10, the weight element W1,0 in FIG. 9 is configured in G2,1 in FIG. 10, and so on.
  • each weight element corresponds to one ReRAM cell.
  • input data is input to the crossbar through the crossbar word line (input port 1004 shown in FIG. 10).
  • the input data can be expressed as voltages, so that the input data and the weight values configured in the ReRAM cells can be dot-multiplied, and the calculated result is obtained in the form of output voltages from the output terminal of each column of the crossbar (the output port 1006 shown in FIG. 10).
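The analog computation described above can be mirrored numerically. The sketch below (an idealized Python model; real ReRAM crossbars work with conductances, voltages, and ADCs, none of which are modeled here) shows how a crossbar whose cells G hold a weight matrix produces, in one step, the dot product of an input voltage vector with every column.

```python
import numpy as np

rows, cols = 4, 3
# G[r][c] plays the role of the conductance of the ReRAM cell at
# word line r, bit line c; it stores one weight element (cf. FIG. 9/10).
G = np.random.rand(rows, cols)

# Input data applied as voltages on the word lines (input port 1004).
v_in = np.random.rand(rows)

# Each column accumulates I_c = sum_r v_in[r] * G[r][c]; the whole
# matrix-vector multiply happens in a single step (output port 1006).
i_out = v_in @ G
print(i_out.shape)  # (3,)
```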
  • when the computing units that perform each layer of neural network operations in the neural network system are configured, the amount of data output by adjacent neural network layers is taken into account, so that the computing power of the computing nodes that perform the operations of adjacent neural network layers can be matched. Therefore, the data processing method provided by the embodiment of the present invention can make full use of the computing power of the computing nodes and improve the data processing efficiency of the neural network system.
  • an embodiment of the present invention provides a resource allocation apparatus.
  • the device can be applied to the neural network system shown in FIG. 1 and FIG. 1A, and is used to allocate the computing nodes that perform the operations of different neural network layers, so that the computing power of the computing nodes used to perform the operations of two adjacent neural network layers in the neural network system is matched, which improves the data processing efficiency in the neural network system without wasting computing resources.
  • the resource allocation device may be located in the host, may be implemented by a processor in the host, or may be a physical device that exists independently of the processor. For example, it can be used as a processor-independent compiler.
  • the resource allocation apparatus 1100 may include an acquisition module 1102, a calculation module 1104, and a deployment module 1106.
  • An obtaining module 1102, configured to obtain the data volume of the first output data of the first neural network layer and the data volume of the second output data of the second neural network layer in the neural network system, where the input data of the second neural network layer includes the first output data.
  • the calculation module 1104 is configured to determine N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer according to deployment requirements of the neural network system. Wherein, N and M are both positive integers, and the ratio of N and M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  • the neural network system includes multiple neural network chips, each neural network chip includes multiple computing units, and each computing unit includes at least one resistive random access memory cross matrix ReRAM crossbar .
  • the deployment requirement includes a calculation delay.
  • the calculation module is configured to determine the value of N according to the data volume of the first output data, the calculation delay, and the calculation frequency of the resistive random access memory crossbar (ReRAM crossbar) in the computing unit, and to determine the value of M according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  • the deployment requirement includes the number of chips of the neural network system, and the first neural network layer is the starting layer of the neural network system.
  • in this case, the calculation module is configured to determine the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratio of the output data volumes of adjacent neural network layers, and to determine the value of M according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  • A deployment module 1106, configured to deploy the N first weights to P computing units and deploy the M second weights to Q computing units according to the calculation specifications of the computing units in the neural network system, where P and Q are both positive integers, the P computing units are used to perform the operations of the first neural network layer, and the Q computing units are used to perform the operations of the second neural network layer.
  • the calculation specification of the calculation unit refers to the number of crossbars included in one calculation unit. In practical applications, a computing unit may include one or more crossbars. Specifically, after the calculation module 1104 obtains the number of weights to be configured for each layer of the neural network, the deployment module 1106 may deploy the weights of each layer on the corresponding calculation unit according to the number of crossbars included in each calculation unit.
  • the elements in the weight matrix are respectively configured in the ReRAM cells of the crossbar of the calculation unit.
  • the computing unit may refer to a PE or an engine, one PE may include multiple engines, and one engine may include one or more crossbars. Since the weight of each layer may be different, a weight can be deployed on one or more engines.
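A small sketch of this deployment step (hypothetical Python; the crossbar counts are assumptions): given the number of crossbars one weight copy occupies and the number of crossbars one computing unit (engine) provides, it derives how many engines each weight copy spans and how many computing units the N copies of a layer require.

```python
import math

def units_for_layer(copies, xb_per_weight, xb_per_unit):
    """Number of computing units (e.g. engines) needed to hold `copies`
    weight copies, each occupying `xb_per_weight` crossbars, when one
    unit contains `xb_per_unit` crossbars."""
    units_per_copy = math.ceil(xb_per_weight / xb_per_unit)
    return copies * units_per_copy

# Assumed example: N = 8 first-weight copies, each needing 2 crossbars,
# with 1 crossbar per engine -> one copy spans 2 engines, so P = 16.
print(units_for_layer(copies=8, xb_per_weight=2, xb_per_unit=1))  # 16
```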
  • the neural network system shown in FIG. 1 includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, and each secondary computing node includes multiple computing units.
  • the resource allocation device 1100 may further include a mapping module 1108 for mapping the computing unit to the superior computing node of the computing unit. Specifically, after the calculation module 1104 obtains N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer, the mapping module 1108 is used to establish The mapping relationship between the N first weights and the P computing units, and the mapping relationship between the M second weights and the Q computing units is established.
  • the mapping module 1108 is further configured to map the P computing units and the Q computing units into multiple secondary computing nodes according to the number of computing units included in a secondary computing node in the neural network system, where at least a part of the P computing units and at least a part of the Q computing units are mapped into the same secondary computing node.
  • the mapping module 1108 is further configured to map the multiple secondary computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips according to the number of secondary computing nodes included in each neural network chip, where at least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped into the same neural network chip.
  • for details of how the mapping module 1108 establishes the mapping relationship between the N first weights and the P computing units, establishes the mapping relationship between the M second weights and the Q computing units, and maps the P computing units and the Q computing units to the upper-level computing nodes of the computing units, reference may be made to the foregoing method embodiments.
  • An embodiment of the present invention also provides a computer program product that implements the above resource allocation method, and an embodiment of the present invention also provides a computer program product that implements the above data processing method.
  • each of the above computer program products includes a computer-readable storage medium that stores program code.
  • the instructions included in the program code are used to execute the method flow described in any one of the foregoing method embodiments.
  • Persons of ordinary skill in the art may understand that the foregoing storage medium includes: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (Random-Access Memory, RAM), a solid state drive (Solid State Drive, SSD), or another non-transitory machine-readable medium that can store program code, such as a non-volatile memory.

Abstract

The present application provides a computing resource allocation technology and a neural network system. The neural network system comprises a processor and a plurality of neural network chips connected to the processor, and each neural network chip comprises a plurality of computing units having a computing-in-memory function. The processor configures, for each neural network layer, according to output data volumes of neural network layers in the neural network system, a computing unit for executing operations of this neural network layer, so that the computing power of the computing units for executing the operations of adjacent neural network layers matches each other. The neural network system provided by the present application can be applied to the field of artificial intelligence, and improve the data processing efficiency of the neural network system.

Description

Computing resource allocation technology and neural network system

Technical field

The present application relates to the field of computer technology, and in particular, to a computing resource allocation technology and a neural network system.
Background

Deep learning (DL) is an important branch of artificial intelligence (AI). Deep learning is a neural network constructed to imitate the human brain, and it can achieve better recognition results than traditional shallow learning methods. The convolutional neural network (CNN) is one of the most common deep learning architectures and the most widely studied deep learning method. A typical application field of convolutional neural networks is image processing. Image processing identifies and analyzes an input image and finally outputs a set of classified image content. For example, a convolutional neural network algorithm can be used to extract and classify the body color, license plate number, and model of a motor vehicle in a picture.

A convolutional neural network usually extracts the features of a picture through a three-layer sequence: a convolutional layer, a pooling layer, and rectified linear units (ReLU). The process of extracting picture features is actually a series of matrix operations (for example, matrix multiply-add operations). Therefore, how to process the pictures in the network in parallel and quickly becomes a problem to be studied in convolutional neural networks.
Summary of the invention

The present application provides a computing resource allocation technology and a neural network system, which can improve the data processing speed in a neural network.

In a first aspect, an embodiment of the present invention provides a computing resource allocation method applied in a neural network system. The method may be performed by a host connected to the neural network chips. According to the method, after the data volume of the first output data of a first neural network layer and the data volume of the second output data of a second neural network layer in the neural network system are obtained, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer are determined according to the deployment requirement of the neural network system. Further, according to the calculation specifications of the computing units in the neural network system, the N first weights are deployed on P computing units, and the M second weights are deployed on Q computing units. The input data of the second neural network layer includes the first output data, N and M are both positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data. P and Q are both positive integers, the P computing units are used to perform the operations of the first neural network layer, and the Q computing units are used to perform the operations of the second neural network layer.

The computing resource allocation method provided by the embodiments of the present invention takes into account the amount of data output by adjacent neural network layers when the computing units that perform each layer of neural network operations are configured according to the deployment requirement, so that the computing power of the computing nodes that perform the operations of different neural network layers is matched. Thus, the computing power of the computing nodes that perform each layer of neural network operations can be fully utilized, and the efficiency of data processing is improved.
With reference to the first aspect, in a possible implementation manner, the deployment requirement includes a calculation delay, and the first neural network layer is the starting layer of all neural network layers in the neural network system. The determining of the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer includes: determining the value of N according to the data volume of the first output data, the calculation delay, and the calculation frequency of the resistive random access memory crossbar (ReRAM crossbar) in the computing unit; and determining the value of M according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
Specifically, in a possible implementation manner, when the first neural network layer is the starting layer of all neural network layers in the neural network system, the value of N may be obtained according to the following formula:

$$N_{1} = \left\lceil \frac{H_{1}^{out} \times W_{1}^{out}}{t \times f} \right\rceil$$

where $N_{1}$ indicates the number N of weights to be configured for the first-layer neural network, $H_{1}^{out}$ is the number of rows of the output data of the first-layer neural network, $W_{1}^{out}$ is the number of columns of the output data of the first-layer neural network, t is the set calculation delay, and f is the calculation frequency of the crossbar in the computing unit. In other words, N is the number of output elements the starting layer must produce, divided by the number of computations one crossbar completes within the delay t. The value of M can then be calculated according to the following formula: N/M = (data volume of the first output data)/(data volume of the second output data).
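The following sketch (hypothetical Python, using assumed layer dimensions) applies the delay-based formula reconstructed above: N is the number of first-layer output elements divided by how many computations one crossbar completes within the delay t at frequency f, and M then follows from the output-volume ratio.

```python
import math

def copies_for_delay(h_out, w_out, t, f):
    """N = ceil(H_out * W_out / (t * f)): enough weight copies that the
    layer's whole output can be produced within the delay t."""
    return math.ceil((h_out * w_out) / (t * f))

# Assumed values: 224x224 first-layer output, 1e-4 s delay, 1e8 Hz crossbar.
n = copies_for_delay(224, 224, t=1e-4, f=1e8)

# M from the ratio N/M = V1/V2, with an assumed 112x112 second-layer output.
m = max(1, round(n * (112 * 112) / (224 * 224)))
print(n, m)  # 6 2
```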
With reference to the first aspect, in yet another possible implementation manner, the neural network system includes multiple neural network chips, each neural network chip includes multiple computing units, each computing unit includes at least one resistive random access memory crossbar (ReRAM crossbar), and the deployment requirement includes the number of chips of the neural network system. When the first neural network layer is the starting layer of the neural network system, the determining of the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer includes: determining the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratio of the output data volumes of adjacent neural network layers; and determining the value of M according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
Specifically, in a possible implementation manner, when the deployment requirement is the number of chips required by the neural network system, and the first neural network layer is the starting layer of the neural network system, the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer may be obtained according to the following two formulas, where the value of N is the value of $N_{1}$ and the value of M is the value of $N_{2}$:

$$xb_{1} \times N_{1} + xb_{2} \times N_{2} + \cdots + xb_{n} \times N_{n} \leq K \times L$$

$$\frac{N_{i}}{N_{i-1}} = \frac{H_{i}^{out} \times W_{i}^{out}}{H_{i-1}^{out} \times W_{i-1}^{out}}, \quad i = 2, \ldots, n$$

where $xb_{1}$ represents the number of crossbars required to deploy one weight of the first-layer (also called starting-layer) neural network, $N_{1}$ represents the number of weights required by the starting layer, $xb_{2}$ represents the number of crossbars required to deploy one weight of the second-layer neural network, $N_{2}$ represents the number of weights required by the second-layer neural network, $xb_{n}$ represents the number of crossbars required to deploy one weight of the n-th-layer neural network, and $N_{n}$ represents the number of weights required by the n-th-layer neural network. K is the number of chips of the neural network system required by the deployment requirement, and L is the number of crossbars in each chip; the first formula thus states that the crossbars occupied by all weight copies do not exceed the K × L crossbars the chips provide. In the second formula, $N_{i}$ represents the number of weights required by the i-th layer, $N_{i-1}$ represents the number of weights required by the (i-1)-th layer, $H_{i}^{out}$ and $W_{i}^{out}$ represent the numbers of rows and columns of the output data of the i-th layer, and $H_{i-1}^{out}$ and $W_{i-1}^{out}$ represent the numbers of rows and columns of the output data of the (i-1)-th layer; the value of i can be from 2 to n, where n is the total number of neural network layers in the neural network system.
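A sketch of how these two formulas can be solved together (hypothetical Python with assumed per-layer sizes): the ratio formula fixes every $N_{i}$ as a multiple $r_{i}$ of $N_{1}$, and substituting into the crossbar-budget formula gives $N_{1}$, after which all other copy counts follow.

```python
import math

def copy_counts(out_sizes, xb, K, L):
    """out_sizes[i] = (H_out, W_out) of layer i; xb[i] = crossbars per
    weight copy of layer i; K chips with L crossbars each.
    Returns [N_1, ..., N_n] satisfying N_i/N_1 = V_i/V_1 and
    sum_i xb_i * N_i <= K * L."""
    v = [h * w for h, w in out_sizes]
    r = [vi / v[0] for vi in v]               # N_i = r_i * N_1
    n1 = math.floor(K * L / sum(x * ri for x, ri in zip(xb, r)))
    return [max(1, round(n1 * ri)) for ri in r]

# Assumed 3-layer example: shrinking outputs, two crossbars per copy in
# layer 2, one elsewhere, and a budget of 2 chips x 100 crossbars.
print(copy_counts([(32, 32), (16, 16), (8, 8)], xb=[1, 2, 1], K=2, L=100))
# [128, 32, 8] -> 128 + 2*32 + 8 = 200 crossbars, exactly the budget
```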
With reference to the first aspect, in yet another possible implementation manner, the neural network system includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, and each secondary computing node includes multiple computing units. The method further includes mapping the P computing units and the Q computing units into multiple secondary computing nodes according to the number of computing units included in a secondary computing node in the neural network system, where at least a part of the P computing units and at least a part of the Q computing units are mapped into the same secondary computing node. In this manner, as many as possible of the computing units that perform the operations of adjacent neural network layers are located in the same secondary computing node, which can reduce the amount of data transmitted between computing nodes and increase the speed of data transmission between different neural network layers.
In yet another possible implementation manner, the method further includes mapping the multiple secondary computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips according to the number of secondary computing nodes included in each neural network chip, where at least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped into the same neural network chip. In this manner, as many as possible of the secondary computing nodes that perform the operations of adjacent neural network layers are located in the same neural network chip, which further reduces the amount of data transmitted between computing nodes and increases the speed of data transmission between different neural network layers.
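The following sketch (hypothetical Python; the node capacities are assumptions) illustrates the intent of these two mapping steps: the computing units of adjacent layers are packed into secondary computing nodes (tiles) in order, so the tile where the layer boundary falls holds units of both layers and the output of one layer travels as short a path as possible to the next. The same procedure applies one level up when packing tiles into chips.

```python
def map_units_to_tiles(p_units, q_units, units_per_tile):
    """Map the P units (first layer) and Q units (second layer) into tiles
    in order; the tile where the layer boundary falls holds units of both
    layers, so part of P and part of Q share a secondary computing node."""
    ordered = p_units + q_units
    return [ordered[i:i + units_per_tile]
            for i in range(0, len(ordered), units_per_tile)]

# Assumed: P = 6 first-layer engines, Q = 2 second-layer engines,
# 4 engines per tile -> tile1: [P1..P4], tile2: [P5, P6, Q1, Q2].
p = [f"P{i}" for i in range(1, 7)]
q = [f"Q{i}" for i in range(1, 3)]
for t, units in enumerate(map_units_to_tiles(p, q, 4), start=1):
    print(f"tile{t}: {units}")
```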
With reference to the first aspect and the foregoing possible implementation manner of the first aspect, in yet another possible implementation manner, that the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data includes: the ratio of N to M is the same as the ratio of the data volume of the first output data to the data volume of the second output data.

In a second aspect, the present application provides a neural network system, including a host and multiple neural network chips, where each neural network chip includes multiple computing units, and the host is connected to the multiple neural network chips and is configured to execute the computing resource allocation method described in the first aspect or any possible implementation manner of the first aspect.

In a third aspect, the present application provides a resource allocation apparatus, including functional modules capable of executing the computing resource allocation method described in the first aspect and any possible implementation manner of the first aspect.

In a fourth aspect, the present application further provides a computer program product, including program code, where the instructions included in the program code are executed by a computer to implement the computing resource allocation method described in the first aspect and any possible implementation manner of the first aspect.

In a fifth aspect, the present application further provides a computer-readable storage medium, configured to store program code, where the instructions included in the program code are executed by a computer to implement the computing resource allocation method described in the first aspect and any possible implementation manner of the first aspect.
Brief description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention.
FIG. 1 is a schematic structural diagram of a neural network system according to an embodiment of the present invention;

FIG. 1A is a schematic structural diagram of another neural network system according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a computing node in a neural network chip according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the logical structure of the neural network layers in a neural network system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of sets of computing nodes that process the data of different neural network layers in a neural network system according to an embodiment of the present invention;

FIG. 5 is a flowchart of a computing resource allocation method in a neural network system according to an embodiment of the present invention;

FIG. 6 is a flowchart of another computing resource allocation method according to an embodiment of the present invention;

FIG. 6A is a flowchart of a resource mapping method according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of another computing resource allocation method according to an embodiment of the present invention;

FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a weight according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a resistive random access memory crossbar (ReRAM crossbar) according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a resource allocation apparatus according to an embodiment of the present invention.
Detailed description

In order to enable persons skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them.
Deep learning (DL) is an important branch of artificial intelligence (AI). Deep learning is a neural network constructed to imitate the human brain, and it can achieve better recognition results than traditional shallow learning methods. An artificial neural network (ANN), abbreviated as a neural network (NN) or a neural-like network, is, in the fields of machine learning and cognitive science, a mathematical or computational model that imitates the structure and function of a biological neural network (the central nervous system of animals, in particular the brain) and is used to estimate or approximate functions. Artificial neural networks include convolutional neural networks (CNN), deep neural networks (DNN), multilayer perceptrons (MLP), and other neural networks. FIG. 1 is a schematic structural diagram of an artificial neural network system according to an embodiment of the present invention, taking a convolutional neural network as an example. As shown in FIG. 1, the convolutional neural network system 100 may include a host 105 and a convolutional neural network circuit 110. The convolutional neural network circuit 110 may also be referred to as a neural network accelerator. The convolutional neural network circuit 110 is connected to the host 105 through a host interface. The host interface may include a standard host interface or a network interface; for example, the host interface may include a Peripheral Component Interconnect Express (PCIE) interface. As shown in FIG. 1, the convolutional neural network circuit 110 may be connected to the host 105 through a PCIE bus 106. Therefore, data can be input into the convolutional neural network circuit 110 through the PCIE bus 106, and the data processed by the convolutional neural network circuit 110 can be received through the PCIE bus 106. In addition, the host 105 may monitor the working state of the convolutional neural network circuit 110 through the host interface.
The host 105 may include a processor 1052 and a memory 1054. It should be noted that, in addition to the devices shown in FIG. 1, the host 105 may further include a communication interface and other devices such as a magnetic disk serving as external storage, which is not limited herein.

The processor 1052 is the operation core and control unit of the host 105. The processor 1052 may include multiple processor cores. The processor 1052 may be a very-large-scale integrated circuit. An operating system and other software programs are installed in the processor 1052, so that the processor 1052 can access the memory 1054, the cache, the disk, and peripheral devices (such as the neural network circuit in FIG. 1). It can be understood that, in the embodiments of the present invention, a core of the processor 1052 may be, for example, a central processing unit (CPU) or another application-specific integrated circuit (ASIC).

The memory 1054 is the main memory of the host 105. The memory 1054 is connected to the processor 1052 through a double data rate (DDR) bus. The memory 1054 is generally used to store the various running software of the operating system, input and output data, and information exchanged with external storage. In order to increase the access speed of the processor 1052, the memory 1054 needs to have a fast access speed. In a traditional computer system architecture, a dynamic random access memory (DRAM) is usually used as the memory 1054. The processor 1052 can access the memory 1054 at a high speed through a memory controller (not shown in FIG. 1), and can perform a read operation and a write operation on any storage unit in the memory 1054.
The convolutional neural network (CNN) circuit 110 is a chip array composed of multiple neural network (NN) chips. For example, as shown in FIG. 1, the CNN circuit 110 includes multiple NN chips 115 and multiple routers 120. For convenience of description, the NN chip 115 is referred to as the chip 115 for short in the embodiments of the present invention. The multiple chips 115 are connected to each other through the routers 120; for example, one chip 115 may be connected to one or more routers 120. The multiple routers 120 may form one or more network topologies, and data can be transmitted between the chips 115 through the one or more network topologies. For example, the multiple routers 120 may form a first network 1106 and a second network 1108, where the first network 1106 is a ring network and the second network 1108 is a two-dimensional mesh (2D mesh) network. Thus, the data input from the input port 1102 can be sent to the corresponding chip 115 through the network formed by the multiple routers 120, and the data processed by any chip 115 can also be sent through that network to other chips 115 for processing or sent out from the output port 1104.

Further, FIG. 1 also shows a schematic structural diagram of the chip 115. As shown in FIG. 1, the chip 115 may include multiple neural network processing units 125 and multiple routers 122. FIG. 1 takes a tile as the neural network processing unit for description. In the architecture of the data processing chip 115 shown in FIG. 1, one tile 125 may be connected to one or more routers 122. The multiple routers 122 in the chip 115 may form one or more network topologies, and data can be transmitted between the tiles 125 through these network topologies. For example, the multiple routers 122 may form a first network 1156 and a second network 1158, where the first network 1156 is a ring network and the second network 1158 is a two-dimensional mesh (2D mesh) network. Thus, the data input to the chip 115 from the input port 1152 can be sent to the corresponding tile 125 through the network formed by the multiple routers 122, and the data processed by any tile 125 can also be sent through that network to other tiles 125 or sent out from the output port 1154.

It should be noted that, when the chips 115 are interconnected by routers, the one or more network topologies formed by the multiple routers 120 in the convolutional neural network circuit 110 and the network topologies formed by the multiple routers 122 in the data processing chip 115 may be the same or different, as long as data can be transmitted between the chips 115 or between the tiles 125 through the network topologies, and the chips 115 or the tiles 125 can receive data or output data through the network topologies. The number and types of the networks formed by the multiple routers 120 and 122 are not limited in the embodiments of the present invention. Moreover, in the embodiments of the present invention, the routers 120 and the routers 122 may be the same or different; for clarity of description, in FIG. 1, the routers 120 connected to the chips and the routers 122 connected to the tiles are distinguished by different reference signs. For convenience of description, in the embodiments of the present invention, a chip 115 or a tile 125 in the convolutional neural network system may also be referred to as a computing node.

In practical applications, in another case, the chips 115 may also be interconnected through high-speed interfaces (High Transport IO) instead of through the routers 120. As shown in FIG. 1A, FIG. 1A is a schematic structural diagram of another neural network system according to an embodiment of the present invention. In the neural network system shown in FIG. 1A, the host 105 is connected to multiple PCIE cards 109 through a PCIE interface 107, each PCIE card 109 may include multiple neural network chips 115, and the neural network chips are connected through high-speed interconnection interfaces. The interconnection manner between the chips is not limited here. It can be understood that, in practical applications, the tiles inside a chip may also be connected not through routers but in the high-speed interconnection manner between chips shown in FIG. 1A. In another case, the tiles inside a chip may be connected through the routers shown in FIG. 1 while the chips are interconnected in the high-speed manner shown in FIG. 1A. The embodiments of the present invention do not limit the connection manner between chips or inside a chip.
FIG. 2 is a schematic structural diagram of a computing node in a neural network chip according to an embodiment of the present invention. As shown in FIG. 2, the chip 115 includes multiple routers 120, and each router may be connected to one tile 125; in practical applications, one router 120 may also be connected to multiple tiles 125. As shown in FIG. 2, each tile 125 may include an input/output interface (TxRx) 1252, a switching device (TSW) 1254, and multiple processing elements (PE) 1256. The TxRx 1252 is used to receive the data input to the tile 125 from the router 120 or to output the calculation result of the tile 125; in other words, the TxRx 1252 is used to implement the data transmission between the tile 125 and the router 120. The switching device (TSW) 1254 is connected to the TxRx 1252, and the TSW 1254 is used to implement the data transmission between the TxRx 1252 and the multiple PEs 1256. Each PE 1256 may include one or more computing engines 1258, and the one or more computing engines 1258 are used to perform neural network calculations on the data input into the PE 1256, for example, a multiply-add operation on the data input to the tile 125 and a convolution kernel preset in the tile 125. The calculation result of an engine 1258 may be sent to other tiles 125 through the TSW 1254 and the TxRx 1252. In practical applications, an engine 1258 may include modules that implement convolution, pooling, or other neural network operations; the specific circuit or function of the engine is not limited here. For simplicity of description, in the embodiments of the present invention, the computing engine is referred to as the engine for short.
Those skilled in the art know that the resistive random-access memory (ReRAM), a new type of non-volatile memory, has the advantage of integrating storage and computing, and has therefore also been widely applied to neural network systems in recent years. For example, a resistive random access memory crossbar (ReRAM crossbar) composed of multiple ReRAM cells can be used to perform matrix multiply-add operations in a neural network system. In the embodiments of the present invention, the engine 1258 may include one or more crossbars. The structure of a ReRAM crossbar may be as shown in FIG. 10, and how a matrix multiply-add operation is performed through a ReRAM crossbar will be introduced later. According to the above introduction to the neural network, the neural network circuit provided by the embodiments of the present invention includes multiple NN chips, each NN chip includes multiple tiles, each tile includes multiple processing elements (PE), each PE includes multiple engines, and each engine is implemented by one or more ReRAM crossbars. It can be seen that the neural network system provided by the embodiments of the present invention may include multiple levels of computing nodes, for example, four levels: the first-level computing node is the chip 115, the second-level computing node is a tile within the chip, the third-level computing node is a PE within a tile, and the fourth-level computing node is an engine within a PE.
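The four-level hierarchy can be pictured with a minimal data model (a Python sketch; the fan-out numbers are assumptions, not values from the patent): a chip contains tiles, a tile contains PEs, a PE contains engines, and each engine owns one or more crossbars.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Engine:                 # fourth-level computing node
    crossbars: int = 1

@dataclass
class PE:                     # third-level computing node
    engines: List[Engine] = field(default_factory=list)

@dataclass
class Tile:                   # second-level computing node
    pes: List[PE] = field(default_factory=list)

@dataclass
class Chip:                   # first-level computing node
    tiles: List[Tile] = field(default_factory=list)

# Assumed fan-out: 2 tiles/chip, 2 PEs/tile, 2 engines/PE, 1 crossbar/engine.
chip = Chip(tiles=[Tile(pes=[PE(engines=[Engine(), Engine()])
                             for _ in range(2)]) for _ in range(2)])
total_xbars = sum(e.crossbars for t in chip.tiles for p in t.pes
                  for e in p.engines)
print(total_xbars)  # 8
```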
On the other hand, those skilled in the art know that a neural network system may include multiple neural network layers. In the embodiments of the present invention, a neural network layer is a logical concept: one neural network layer means that one neural network operation is to be performed, and the calculation of each layer is implemented by computing nodes. The neural network layers may include convolutional layers, pooling layers, and the like. As shown in FIG. 3, the neural network system may include n neural network layers (also called an n-layer neural network), where n is an integer greater than or equal to 2. FIG. 3 shows some of the neural network layers in the neural network system. As shown in FIG. 3, the neural network system may include a first layer 302, a second layer 304, a third layer 306, a fourth layer 308, and a fifth layer 310, up to an n-th layer 312. The first layer 302 may perform a convolution operation, the second layer 304 may perform a pooling operation on the output data of the first layer 302, the third layer 306 may perform a convolution operation on the output data of the second layer 304, the fourth layer 308 may perform a convolution operation on the output result of the third layer 306, the fifth layer 310 may perform a summation operation on the output data of the second layer 304 and the output data of the fourth layer 308, and so on. It can be understood that FIG. 3 is only a simple example and illustration of the neural network layers in the neural network system and does not limit the specific operation of each layer; for example, the fourth layer 308 may also be a pooling operation, and the fifth layer 310 may also be another neural network operation such as a convolution operation or a pooling operation.
In an existing neural network system, after the calculation of the i-th layer in the neural network is completed, the calculation result of the i-th layer is temporarily stored in a preset buffer, and when the calculation of the (i+1)-th layer is performed, the computing unit needs to reload the calculation result of the i-th layer and the weights of the (i+1)-th layer from the preset buffer for calculation, where the i-th layer is any layer in the neural network system. In the embodiments of the present invention, because ReRAM crossbars are used in the engines of the neural network system, and because ReRAM has the advantage of integrating storage and computing, the weights can be configured on the ReRAM cells before the calculation, and the calculation results can be directly sent to the next layer for pipelined calculation. Therefore, each neural network layer only needs to buffer very little data; for example, each layer only needs to buffer enough input data for one window calculation. Further, in order to process data in parallel and quickly, the embodiments of the present invention provide a manner of stream-processing data through the neural network. For clarity of description, the stream processing of the neural network system is briefly introduced below with reference to the convolutional neural network system of FIG. 1.

As shown in FIG. 4, in order to process data quickly, the computing nodes in the system can be divided into multiple node sets to perform the calculations of different neural network layers respectively. FIG. 4 takes the division of the tiles 125 in the neural network system shown in FIG. 1 as an example to illustrate the different sets of computing nodes that implement the calculations of different neural network layers in the embodiments of the present invention. As shown in FIG. 4, the multiple tiles 125 in the chip 115 may be divided into multiple node sets, for example, a first node set 402, a second node set 404, a third node set 406, a fourth node set 408, and a fifth node set 410. Each node set includes at least one computing node (for example, a tile 125). The computing nodes of the same node set are used to perform neural network operations on the data entering the same neural network layer, and the data of different neural network layers is processed by the computing nodes of different node sets. The processing result of one computing node is transmitted to the computing nodes in other node sets for processing. This pipelined processing manner allows each neural network layer to buffer only very little data, and enables multiple computing nodes to process the same data stream concurrently, improving the processing efficiency. It should be noted that FIG. 4 takes tiles as an example of the sets of computing nodes used to process different neural network layers (for example, convolutional layers). In practical applications, since a tile contains multiple PEs, each PE contains multiple engines, and different application scenarios require different amounts of calculation, the computing nodes in the neural network system may also be divided at the granularity of PE, engine, or chip according to the actual application, so that the computing nodes in different sets are used to process the operations of different neural network layers. In this manner, a computing node referred to in the embodiments of the present invention may be an engine, a PE, a tile, or a chip.
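A toy model of this pipelined stream processing (a Python sketch; the "operations" are stand-ins, not the patent's actual layers): each node set applies its layer to whatever arrives and immediately forwards the result, so successive inputs are processed by different node sets concurrently in hardware.

```python
# Each node set performs one layer's operation and forwards the result.
node_sets = [
    lambda x: x * 2,      # set 402: stands in for the layer-302 convolution
    lambda x: x - 1,      # set 404: stands in for the layer-304 pooling
    lambda x: x + 10,     # set 406: stands in for the layer-306 convolution
]

def stream(inputs):
    """Feed a stream through the pipeline; in hardware all stages work
    on different elements of the stream at the same time."""
    for x in inputs:
        for layer in node_sets:
            x = layer(x)
        yield x

print(list(stream([1, 2, 3])))  # [11, 13, 15]
```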
In addition, those skilled in the art know that when a computing node (for example, a tile 125) performs a neural network operation (for example, a convolution calculation), it may calculate the data input to the computing node based on the weights of the corresponding neural network layer. For example, a tile 125 may perform a convolution operation on the input data of the tile 125 based on the weights of the corresponding convolutional layer, for example, perform a matrix multiply-add calculation on the weights and the input data. A weight is usually used to indicate the importance of input data to output data, and in a neural network, a weight is usually represented by a matrix. As shown in FIG. 9, the weight matrix of j rows and k columns shown in FIG. 9 may be one weight of a neural network layer, and each element in the weight matrix represents a weight value. In the embodiments of the present invention, since the computing nodes of one node set are used to perform the operation of one neural network layer, the computing nodes of the same node set can share weights, and the weights of the computing nodes in different node sets may be different. In the embodiments of the present invention, the weights in each computing node can be configured in advance. Specifically, each element of a weight matrix is configured in a ReRAM cell of the corresponding crossbar array, so that the matrix multiply-add operation of the input data and the configured weights can be implemented through the crossbar array. How a matrix multiply-add operation is implemented through a crossbar will be briefly introduced later.
According to the above description, in the embodiment of the present invention, in the process of implementing neural network stream processing, the computing nodes in the neural network may be divided into node sets for processing different neural network layers, and corresponding weights are configured, so that the computing nodes of different node sets can perform the corresponding calculations according to the configured weights, and the computing nodes of each node set can send their calculation results to the computing nodes that perform the next layer of neural network operations. Those skilled in the art will know that, in the stream processing of a neural network, if the computing resources performing the operations of different layers do not match, for example, if few computing resources perform the operations of an upper layer while relatively many computing resources perform the operations of the next layer, the computing resources of the next layer's computing nodes will be wasted. In order to make full use of the computing power of the computing nodes and match the computing power of the computing nodes that perform the operations of different neural network layers, an embodiment of the present invention provides a computing resource allocation method for allocating the computing nodes that perform the operations of different neural network layers, so that the computing power of the computing nodes performing the operations of two adjacent neural network layers in the neural network system is matched, which improves the data processing efficiency of the neural network system without wasting computing resources.
FIG. 5 is a flowchart of a computing resource allocation method in a neural network system according to an embodiment of the present invention. The method can be applied to the neural network system shown in FIG. 1. The method may be implemented by the host 105 when the neural network is deployed or when the neural network system is configured; specifically, it may be implemented by the processor 1052 in the host 105. As shown in FIG. 5, the method may include the following steps.
In step 502, the network model information of the neural network system is obtained. The network model information includes the first output data amount of the first neural network layer and the second output data amount of the second neural network layer in the neural network system. The network model information can be determined according to actual application requirements. For example, the total number of neural network layers and the algorithm of each layer can be determined according to the application scenario of the neural network system. The network model information may include the total number of neural network layers in the neural network system, the algorithm of each layer, and the output data amount of each layer. In the embodiments of the present invention, an algorithm refers to a neural network operation that needs to be performed, for example, a convolution operation or a pooling operation. As shown in FIG. 3, the neural network system of the embodiment of the present invention may have n neural network layers, where n is an integer not less than 2. In this step, the first neural network layer and the second neural network layer may be two of the n layers that have an operational dependency. In the embodiment of the present invention, two neural network layers having a dependency relationship means that the input data of one neural network layer includes the output data of the other neural network layer. Two neural network layers with a dependency relationship may also be referred to as adjacent layers. For example, as shown in FIG. 3, the output data of the first layer 302 is the input data of the second layer 304; therefore, the first layer 302 and the second layer 304 have a dependency relationship. The output data of the second layer 304 is the input data of the third layer 306, and the input data of the fifth layer 310 includes the output data of the second layer 304; therefore, the second layer 304 has a dependency relationship with the third layer 306 and also with the fifth layer 310. For clarity of description, the embodiment of the present invention is described by taking the first layer 302 shown in FIG. 3 as the first neural network layer and the second layer 304 as the second neural network layer.
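As a non-limiting illustration of what such network model information might look like in software, the following Python sketch defines one possible structure; the class and field names (LayerModelInfo, algorithm, out_rows, out_cols, out_channels) are hypothetical and are not part of the embodiment itself.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LayerModelInfo:
    algorithm: str     # neural network operation of this layer, e.g. "conv" or "pool"
    out_rows: int      # number of rows of this layer's output data
    out_cols: int      # number of columns of this layer's output data
    out_channels: int  # number of output channels (kernels) of this layer

@dataclass
class NetworkModelInfo:
    layers: List[LayerModelInfo]  # one entry per neural network layer, in order

    @property
    def total_layers(self) -> int:
        return len(self.layers)
```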
In step 504, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer are determined according to the deployment requirements of the neural network system, the first output data amount, and the second output data amount. N and M are both positive integers, and the ratio of N to M corresponds to the ratio of the first output data amount to the second output data amount. In practical applications, the deployment requirements may include the calculation delay of the neural network system, or the number of chips on which the neural network system is to be deployed. Those skilled in the art will know that neural network operations mainly perform matrix multiply-add operations, and the output data of each neural network layer is also a one-dimensional or multi-dimensional real matrix. Therefore, the first output data amount includes the numbers of rows and columns of the output data of the first neural network layer, and the second output data amount includes the numbers of rows and columns of the output data of the second neural network layer.
As mentioned above, when a computing node performs a neural network operation, for example a convolution or pooling operation, it needs to perform multiply-add calculations on the input data and the weights of the corresponding neural network layer. Since the weights are configured on the cells of the crossbars, and the crossbars in the calculation units perform calculations on the input data in parallel, the number of weights determines the parallel computing capability of the calculation units performing the neural network operation. To put it another way, the computing power of the computing nodes performing a neural network operation is determined by the number of weights configured in the calculation units performing that operation. In the embodiment of the present invention, in order to match the computing power of two neural network layers performing adjacent operations, the numbers of weights to be configured for the first neural network layer and the second neural network layer can be determined according to the specific deployment requirements, the first output data amount, and the second output data amount. Since the weights of different neural network layers are not necessarily the same, for clarity of description, in the embodiment of the present invention the weights required for the operation of the first neural network layer are called first weights, and the weights required for the operation of the second neural network layer are called second weights. Performing the first neural network layer operation means that a computing node performs the corresponding calculation on the data input to the first neural network layer based on the first weights, and performing the second neural network layer operation means that a computing node performs the corresponding calculation on the data input to the second neural network layer based on the second weights. The calculations here may be neural network operations such as convolution or pooling.
The following describes in detail, for different deployment requirements, how to determine in this step the number of weights to be configured for each neural network layer. The number of weights to be configured for each layer includes the number N of first weights to be configured for the first neural network layer and the number M of second weights to be configured for the second neural network layer. In the embodiment of the present invention, a weight refers to a weight matrix. The number of weights refers to the number of required weight matrices, that is, the number of copies of the weight. The number of weights can also be understood as how many identical weight matrices need to be configured.
In one case, when the deployment requirement of the neural network system is the calculation delay of the neural network system, in order that the calculation of the entire neural network system does not exceed the set calculation delay, the number of weights to be configured for the first layer (i.e., the starting layer among all neural network layers in the neural network system) can first be determined according to the output data amount of the first layer, the calculation delay, and the calculation frequency of the ReRAM crossbars used in the neural network system; the number of weights to be configured for each layer is then obtained according to the number of weights of the first layer and the output data amount of each layer. Specifically, the number of weights to be configured for the first layer (i.e., the starting layer) can be obtained according to the following Formula 1:
$$W_{num}^{1} = \frac{H_{out}^{1} \times W_{out}^{1}}{t \times f}$$ (Formula 1)
where $W_{num}^{1}$ denotes the number of weights to be configured for the first layer (i.e., the starting layer) of the neural network, $H_{out}^{1}$ is the number of rows of the output data of the first layer, $W_{out}^{1}$ is the number of columns of the output data of the first layer, t is the set calculation delay, and f is the calculation frequency of the crossbars in the calculation units. Those skilled in the art will know that the value of f can be obtained from the configuration parameters of the adopted crossbar. The data volume of the output data of the first-layer neural network can be obtained from the network model information acquired in step 502. It should be noted that, in the embodiment of the present invention, the first-layer neural network is the starting layer among all neural network layers in the neural network system. It can be understood that, when the first neural network layer is the starting layer of all neural network layers in the neural network system, the number N of first weights is the value of $W_{num}^{1}$ calculated according to Formula 1.
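A minimal Python sketch of Formula 1 follows; rounding the result up to a whole number of weight copies is an assumption made here for illustration and is not stated in the embodiment.

```python
import math

def first_layer_weight_count(out_rows: int, out_cols: int,
                             delay: float, crossbar_freq: float) -> int:
    """Formula 1: number of weight copies for the starting layer.

    out_rows, out_cols: H_out^1 and W_out^1, output dimensions of layer 1
    delay: the set calculation delay t (seconds)
    crossbar_freq: the calculation frequency f of the ReRAM crossbar (Hz)
    """
    # delay * crossbar_freq is the number of crossbar computation cycles
    # available within the delay budget; the copies together must cover
    # out_rows * out_cols output positions.
    return math.ceil((out_rows * out_cols) / (delay * crossbar_freq))
```

For example, first_layer_weight_count(224, 224, 1e-3, 100e6) yields 1 copy, while tightening the delay to 1e-4 seconds yields 6 copies.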
After the number of weights required by the first layer is obtained, in order to improve the data processing efficiency of the neural network system, avoid bottlenecks or data waiting in the pipelined parallel processing, and match the processing speeds of adjacent neural network layers, in the embodiment of the present invention the ratio of the numbers of weights required by two adjacent layers can be made to correspond to the ratio of the output data amounts of the two adjacent layers. For example, the two ratios can be equal. Therefore, in the embodiment of the present invention, the number of weights required by each layer can be determined according to the number of weights required by the first layer and the ratio of the output data amounts of adjacent layers. Specifically, the number of weights required by each layer can be calculated according to the following Formula 2:
$$W_{num}^{i} = W_{num}^{i-1} \times \frac{H_{out}^{i} \times W_{out}^{i}}{H_{out}^{i-1} \times W_{out}^{i-1}}$$ (Formula 2)
where $W_{num}^{i}$ denotes the number of weights required by the i-th layer, $W_{num}^{i-1}$ denotes the number of weights required by the (i-1)-th layer, $H_{out}^{i}$ and $W_{out}^{i}$ denote the numbers of rows and columns of the output data of the i-th layer, and $H_{out}^{i-1}$ and $W_{out}^{i-1}$ denote the numbers of rows and columns of the output data of the (i-1)-th layer. The value of i ranges from 2 to n, where n is the total number of neural network layers in the neural network system. To put it another way, in the embodiment of the present invention, the ratio of the number of weights required to perform the (i-1)-th layer operation to the number of weights required to perform the i-th layer operation corresponds to the ratio of the output data amount of the (i-1)-th layer to the output data amount of the i-th layer.
Those skilled in the art will know that the output data of each neural network layer may include multiple channels, where the number of channels refers to the number of kernels in that layer. A kernel represents one feature extraction method and correspondingly produces one feature map; multiple feature maps constitute the output data of the layer. The weight used by a neural network layer includes multiple kernels. Therefore, in practical applications, in another case, the output data amount of each layer may also take into account the number of channels of each layer. Specifically, after the number of weights required by the first neural network layer is obtained according to Formula 1, the number of weights required by each layer can be obtained according to the following Formula 3:
$$W_{num}^{i} = W_{num}^{i-1} \times \frac{H_{out}^{i} \times W_{out}^{i} \times C_{i}}{H_{out}^{i-1} \times W_{out}^{i-1} \times C_{i-1}}$$ (Formula 3)
The difference between Formula 3 and Formula 2 is that Formula 3 further considers, on the basis of Formula 2, the number of channels output by each layer, where $C_{i-1}$ denotes the number of channels of the (i-1)-th layer and $C_{i}$ denotes the number of channels of the i-th layer. The value of i ranges from 2 to n, where n is the total number of neural network layers in the neural network system and n is an integer not less than 2. The number of channels of each layer can be obtained from the network model information.
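The following Python sketch propagates the starting layer's count through Formula 3 (Formula 2 is the special case in which the channel terms cancel); rounding each layer's count to the nearest positive integer is an assumption for illustration. The layers argument reuses the hypothetical LayerModelInfo structure sketched earlier.

```python
def per_layer_weight_counts(w1: int, layers) -> list:
    """Formulas 2/3: weight count of every layer from the starting layer's.

    w1: number of weight copies of layer 1 (e.g. from Formula 1)
    layers: per-layer output shapes, e.g. NetworkModelInfo.layers
    """
    counts = [w1]
    for i in range(1, len(layers)):
        prev, cur = layers[i - 1], layers[i]
        # Formula 3 step: scale by the ratio of output data amounts.
        ratio = (cur.out_rows * cur.out_cols * cur.out_channels) / \
                (prev.out_rows * prev.out_cols * prev.out_channels)
        counts.append(max(1, round(counts[-1] * ratio)))
    return counts
```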
In the embodiment of the present invention, after the number of weights required by the starting layer is obtained according to Formula 1, the number of weights required by each layer can be calculated according to Formula 2 (or Formula 3) and the output data amount of each layer included in the network model information. For example, when the above first neural network layer is the starting layer of all neural network layers in the neural network system, after the number N of first weights is obtained according to Formula 1, the number M of second weights required by the second neural network layer can be obtained according to Formula 2 from the value of N, the first output data amount, and the second output data amount. To put it another way, after the value of N is obtained, the value of M can be calculated according to the following relation: N/M = first output data amount / second output data amount.
In yet another case, when the deployment requirement is the number of chips required by the neural network system, the number of weights required by the first layer can be obtained by combining the following Formula 4 with the foregoing Formula 2, or by combining Formula 4 with the foregoing Formula 3.
$$xb_1 \times W_{num}^{1} + xb_2 \times W_{num}^{2} + \cdots + xb_n \times W_{num}^{n} \le K \times L$$ (Formula 4)
In Formula 4, $xb_1$ denotes the number of crossbars required to deploy one weight of the first layer (also called the starting layer), and $W_{num}^{1}$ denotes the number of weights required by the starting layer; $xb_2$ denotes the number of crossbars required to deploy one weight of the second layer, and $W_{num}^{2}$ denotes the number of weights required by the second layer; $xb_n$ denotes the number of crossbars required to deploy one weight of the n-th layer, and $W_{num}^{n}$ denotes the number of weights required by the n-th layer. K is the number of chips of the neural network system required by the deployment requirement, and L is the number of crossbars in each chip. Formula 4 expresses that the total number of crossbars used over all neural network layers is less than or equal to the total number of crossbars included in the set number of chips of the neural network. For the description of Formula 2 and Formula 3, reference may be made to the foregoing description, which is not repeated here.
Those skilled in the art will know that, once the model of the neural network system is determined, one weight of each neural network layer and the specification of the crossbars used in the neural network system (i.e., the numbers of rows and columns of ReRAM cells in a crossbar) are already determined. To put it another way, the network model information of the neural network system further includes the size of one weight used by each neural network layer and the specification information of the crossbar. Therefore, in the embodiment of the present invention, the $xb_i$ of the i-th layer can be obtained from the size of one weight of that layer (i.e., the numbers of rows and columns of the weight matrix) and the crossbar specification, where i ranges from 1 to n. The value of L can be obtained from the parameters of the chips used by the neural network system. In the embodiment of the present invention, in one case, after the number of weights required by the starting layer (i.e., $W_{num}^{1}$) is obtained according to Formula 4 and Formula 2, the number of weights to be configured for each layer can be obtained according to Formula 2 and the output data amount of each layer obtained from the network model information. In another case, after the number of weights required by the starting layer is obtained according to Formula 4 and Formula 3, the number of weights to be configured for each layer can also be obtained according to Formula 3 and the output data amount of each layer.
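One way to combine Formula 4 with Formula 2 or Formula 3 is to search for the largest starting-layer count whose total crossbar footprint still fits in the K chips; the linear search below, which reuses the per_layer_weight_counts sketch above, is an illustrative assumption rather than the only possible procedure.

```python
def max_first_layer_count(xb_per_weight, layers, K, L):
    """Largest W_num^1 such that sum_i xb_i * W_num^i <= K * L (Formula 4).

    xb_per_weight: xb_i for each layer, crossbars needed for one weight
    layers: per-layer output shapes (as in NetworkModelInfo.layers)
    K, L: number of chips and number of crossbars per chip
    """
    best = 0
    for w1 in range(1, K * L + 1):  # W_num^1 can never exceed the budget
        counts = per_layer_weight_counts(w1, layers)
        total = sum(xb * w for xb, w in zip(xb_per_weight, counts))
        if total <= K * L:
            best = w1
        else:
            break
    return best
```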
In step 506, according to the calculation specifications of the calculation units in the neural network system, the N first weights are deployed on P calculation units, and the M second weights are deployed on Q calculation units, where P and Q are both positive integers, the P calculation units are used to perform the operation of the first neural network layer, and the Q calculation units are used to perform the operation of the second neural network layer. In the embodiment of the present invention, the calculation specification of a calculation unit refers to the number of crossbars included in one calculation unit. In practical applications, a calculation unit may include one or more crossbars. Specifically, as mentioned above, since the network model information of the neural network system further includes the size of one weight used by each neural network layer and the specification information of the crossbar, the deployment relationship between one weight and the crossbars can be obtained. After the number of weights to be configured for each layer is obtained in step 504, the weights of each layer can be deployed on the corresponding number of calculation units according to the number of crossbars included in each calculation unit. Specifically, the elements of a weight matrix are respectively configured into the ReRAM cells of the crossbars of the calculation units. In the embodiment of the present invention, a calculation unit may refer to a PE or an engine; one PE may include multiple engines, and one engine may include one or more crossbars. Since the weight sizes of different layers may differ, one weight may be deployed on one or more engines.
Specifically, in this step, the P calculation units on which the N first weights are to be deployed and the Q calculation units on which the M second weights are to be deployed can be determined according to the deployment relationship between one weight and the crossbars and the number of crossbars included in a calculation unit. For example, the N first weights of the first neural network layer may be deployed on P calculation units, and the M second weights may be deployed on Q calculation units. Specifically, the elements of the N first weights are respectively configured into the ReRAM cells of the corresponding crossbars in the P calculation units, and the elements of the M second weights are respectively configured into the ReRAM cells of the corresponding crossbars in the Q calculation units. Thus, the P calculation units can perform the operation of the first neural network layer on the data input to them based on the configured N first weights, and the Q calculation units can perform the operation of the second neural network layer on the data input to them based on the configured M second weights.
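As a sketch of the deployment arithmetic in step 506, the following function derives how many calculation units a layer needs from the number of crossbars one weight occupies and the number of crossbars one unit provides; the packing rule (a unit holding several whole weights, or one weight spanning several whole units) is an assumption for illustration.

```python
import math

def units_for_layer(weight_count: int, xb_per_weight: int,
                    xb_per_unit: int) -> int:
    """Number of calculation units needed to hold weight_count copies."""
    if xb_per_weight > xb_per_unit:
        # One weight spans several units (e.g. a large weight over engines).
        return weight_count * math.ceil(xb_per_weight / xb_per_unit)
    # Otherwise several whole weights can share one unit's crossbars.
    weights_per_unit = xb_per_unit // xb_per_weight
    return math.ceil(weight_count / weights_per_unit)

# e.g. P = units_for_layer(N, xb_1, crossbars_per_engine)
#      Q = units_for_layer(M, xb_2, crossbars_per_engine)
```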
It can be seen from the above embodiment that the computing resource allocation method provided by the embodiment of the present invention considers the output data amounts of adjacent neural network layers when configuring the calculation units that perform the operation of each layer according to the deployment requirements, so that the computing power of the computing nodes performing the operations of different neural network layers is matched. The computing power of the computing nodes can thus be fully utilized, and the efficiency of data processing is improved.
Further, in the embodiment of the present invention, in order to further reduce the amount of data transmitted between the calculation units performing different neural network layers and save the transmission bandwidth between calculation units or computing nodes, the calculation units can be mapped to their superior computing nodes according to the following method. As mentioned above, the neural network system may include four levels of computing nodes: first-level computing nodes (chips), second-level computing nodes (tiles), third-level computing nodes (PEs), and fourth-level computing nodes (engines). Taking the fourth-level computing node engine as the calculation unit as an example, FIG. 6 describes in detail how to map the P calculation units on which the N first weights are to be deployed and the Q calculation units on which the M second weights are to be deployed to superior computing nodes. The method can still be implemented by the host 105 in the neural network system shown in FIG. 1 and FIG. 1A. As shown in FIG. 6, the method may include the following steps.
In step 602, the network model information of the neural network system is obtained. The network model information includes the first output data amount of the first neural network layer and the second output data amount of the second neural network layer in the neural network system. In step 604, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer are determined according to the deployment requirements of the neural network system, the first output data amount, and the second output data amount. In step 606, the P calculation units on which the N first weights are to be deployed and the Q calculation units on which the M second weights are to be deployed are determined according to the calculation specifications of the calculation units in the neural network system. In the embodiment of the present invention, for steps 602, 604, and 606, reference may be made to the related descriptions of steps 502, 504, and 506, respectively. The difference between step 606 and step 506 is that, in step 606, after the P calculation units for the N first weights and the Q calculation units for the M second weights are determined, the N first weights are not directly deployed to the P calculation units and the M second weights are not directly deployed to the Q calculation units; instead, the method proceeds to step 608.
In step 608, the P calculation units and the Q calculation units are mapped to multiple third-level computing nodes according to the number of calculation units included in a third-level computing node in the neural network system. Specifically, FIG. 6A is a flowchart of a resource mapping method according to an embodiment of the present invention. Taking the calculation unit being the fourth-level computing node engine as an example, FIG. 6A describes how to map engines to the third-level computing nodes PE. As shown in FIG. 6A, the method may include the following steps.
In step 6082, the P calculation units and the Q calculation units are divided into m groups, each group including P/m calculation units for performing the first neural network layer and Q/m calculation units for performing the second neural network layer, where m is an integer not less than 2 and the values of P/m and Q/m are both integers. Specifically, take the P calculation units as the calculation units performing the (i-1)-th layer and the Q calculation units as the calculation units performing the i-th layer as an example. As shown in FIG. 7, suppose the (i-1)-th layer needs 8 calculation units (i.e., P=8), the i-th layer needs 4 calculation units (i.e., Q=4), the (i+1)-th layer needs 4 calculation units, and they are divided into 2 groups (i.e., m=2). Two groups as shown in FIG. 7 are then obtained, where the first group includes 4 calculation units of the (i-1)-th layer, 2 calculation units of the i-th layer, and 2 calculation units of the (i+1)-th layer. Similarly, the second group includes 4 calculation units of the (i-1)-th layer, 2 calculation units of the i-th layer, and 2 calculation units of the (i+1)-th layer.
In step 6084, the calculation units of each group are mapped to third-level computing nodes according to the number of calculation units included in a third-level computing node. During the mapping, the calculation units performing the operations of adjacent neural network layers are mapped to the same third-level node as far as possible. As shown in FIG. 7, suppose that in the neural network system each first-level computing node chip includes 8 second-level computing node tiles, each tile includes 2 third-level computing nodes PE, and each PE includes 4 engines. Then, for the first group, the 4 engines of the (i-1)-th layer can be mapped to one third-level computing node PE (for example, PE1 in FIG. 7), and the 2 engines of the i-th layer together with the 2 engines of the (i+1)-th layer can be mapped to another third-level computing node PE (for example, PE2 in FIG. 7). Similarly, following the mapping of the first group, for the calculation units of the second group, the 4 engines of the (i-1)-th layer can be mapped to PE3, and the 2 engines of the i-th layer together with the 2 engines of the (i+1)-th layer can be mapped to PE4. In practical applications, after the mapping of the calculation units in the first group is completed, the calculation units of the other groups can be mapped in a mirrored manner following the mapping of the first group.
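The grouping and packing of steps 6082 and 6084 might be sketched as follows; the engine labels and the two-layer simplification (the FIG. 7 example additionally packs the (i+1)-th layer's engines into the same PE) are illustrative assumptions.

```python
def group_and_map_to_pes(prev_engines, cur_engines, m, engines_per_pe=4):
    """Divide the engines of two adjacent layers into m groups and pack each
    group into PEs so that engines of adjacent layers share a PE when possible."""
    p_per_group = len(prev_engines) // m   # P/m, assumed to be an integer
    q_per_group = len(cur_engines) // m    # Q/m, assumed to be an integer
    mapping = []
    for g in range(m):
        members = (prev_engines[g * p_per_group:(g + 1) * p_per_group] +
                   cur_engines[g * q_per_group:(g + 1) * q_per_group])
        # Consecutive members land in the same PE, so a PE with spare room
        # after one layer's engines takes the next layer's engines.
        pes = [members[j:j + engines_per_pe]
               for j in range(0, len(members), engines_per_pe)]
        mapping.append(pes)
    return mapping

# FIG. 7 example: 8 engines of layer i-1 and 4 of layer i with m = 2 give,
# per group, one PE with 4 layer-(i-1) engines and one PE with 2 layer-i engines.
```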
According to this mapping, the calculation units performing adjacent neural network layers (for example, the i-th layer and the (i+1)-th layer in FIG. 7) can be mapped to the same third-level computing node as far as possible. Thus, when the output data of the i-th layer is sent to the calculation units of the (i+1)-th layer, the data only needs to be transmitted within the same third-level node (PE) and does not need to occupy bandwidth between third-level nodes, which improves the data transmission speed and reduces the transmission bandwidth consumption between nodes.
Returning to FIG. 6, in step 610, the multiple third-level computing nodes to which the P calculation units and the Q calculation units are mapped are mapped to multiple second-level computing nodes according to the number of third-level computing nodes included in a second-level computing node in the neural network system. In step 612, the multiple second-level computing nodes to which the P calculation units and the Q calculation units are mapped are mapped to the multiple neural network chips according to the number of second-level computing nodes included in each neural network chip. As described above, FIG. 6A is described by taking the mapping of the engines performing the i-th layer operation to third-level computing nodes as an example; similarly, according to the method shown in FIG. 6A, third-level nodes can also be mapped to second-level nodes, and second-level nodes can be mapped to first-level nodes. For example, as shown in FIG. 7, for the first group, PE1 performing the (i-1)-th layer operation and PE2 performing the i-th and (i+1)-th layer operations can further be mapped to the same second-level computing node Tile1. For the second group, PE3 performing the (i-1)-th layer operation and PE4 performing the i-th and (i+1)-th layer operations can further be mapped to the same second-level computing node Tile2. Further, Tile1 and Tile2 performing the operations of the (i-1)-th, i-th, and (i+1)-th layers can both be mapped to the same chip, chip1. In this way, the mapping relationship from the first-level computing node chip to the fourth-level computing node engine in the neural network system can be obtained.
In step 614, the N first weights and the M second weights are respectively deployed to the P calculation units and the Q calculation units corresponding to the multiple third-level nodes, multiple second-level computing nodes, and multiple first-level computing nodes. In the embodiment of the present invention, the mapping relationship from the first-level computing node chip to the fourth-level computing node engine in the neural network system can be obtained according to the method described with reference to FIG. 6A and FIG. 7. For example, the mapping relationships of the P calculation units and the Q calculation units with the multiple third-level nodes, multiple second-level computing nodes, and multiple first-level computing nodes can be obtained. In this step, the weights of the corresponding neural network layers can then be deployed to the calculation units of the computing nodes at each level according to the obtained mapping relationship. For example, as shown in FIG. 7A, the N weights of the (i-1)-th layer can be deployed to the 4 calculation units corresponding to chip1, tile1, and PE1 and the 4 calculation units corresponding to chip1, tile2, and PE3, and the M second weights of the i-th layer can be deployed to the 2 calculation units corresponding to chip1, tile1, and PE2 and the 2 calculation units corresponding to chip1, tile2, and PE4. To put it another way, the N weights of the (i-1)-th layer are deployed to the 4 calculation units (engines) in chip1 -> tile1 -> PE1 and the 4 calculation units in chip1 -> tile2 -> PE3, and the M weights of the i-th layer are deployed to the 2 calculation units in chip1 -> tile1 -> PE2 and the 2 calculation units in chip1 -> tile2 -> PE4.
Through this deployment, not only can the computing power of the calculation units supporting the operations of adjacent neural network layers in the neural network system of the embodiment of the present invention be matched, but also as many as possible of the calculation units performing the operations of adjacent neural network layers are located in the same third-level computing node, as many as possible of the third-level computing nodes performing adjacent neural network layers are located in the same second-level computing node, and as many as possible of the second-level computing nodes performing adjacent neural network layers are located in the same first-level computing node (for example, a neural network chip). This reduces the amount of data transmitted between computing nodes and increases the speed of data transmission between different neural network layers.
It should be noted that the embodiment of the present invention describes, in a neural network system including four levels of computing nodes, the process of allocating the computing resources for performing the operations of different neural network layers with the fourth-level computing node engine as the calculation unit. To put it another way, the above embodiment divides the sets of units performing the operations of different neural network layers at the granularity of engines. In practical applications, the third-level computing node PE can also be used as the calculation unit for allocation; in this case, the mapping between the third-level computing nodes PE and the second-level computing nodes tile as well as the first-level computing nodes chip can be established according to the above method. Of course, when the amount of data to be calculated is very large, the allocation can also be performed at the granularity of the second-level computing node tile. To put it another way, in the embodiment of the present invention, the calculation unit may be an engine, a PE, a tile, or a chip, which is not limited here.
The above describes in detail how the neural network system provided by the embodiment of the present invention configures computing resources. The neural network system is further described below from the perspective of processing data. FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention. The method is applied to the neural network system shown in FIG. 1, which is configured by the methods shown in FIGS. 5-7 to allocate the computing resources for performing the operations of different neural network layers. As shown in FIG. 8, the method may be implemented by the neural network circuit shown in FIG. 1 and may include the following steps.
In step 802, P calculation units in the neural network system receive first input data, where the P calculation units are used to perform the first neural network layer operation of the neural network system. In the embodiment of the present invention, the first neural network layer is any layer in the neural network system, and the first input data is data on which the first neural network layer operation needs to be performed. When the first neural network layer is the first layer 302 of the neural network system shown in FIG. 3, the first input data may be data input to the neural network system for the first time. When the first neural network layer is not the first layer of the neural network system, the first input data may be output data processed by another neural network layer.
In step 804, the P calculation units perform calculation on the first input data according to the configured N first weights to obtain first output data. In the embodiment of the present invention, a first weight is a weight matrix; the N first weights refer to N such weight matrices and may also be called N first weight copies. The N first weights can be configured in the P calculation units according to the methods shown in FIGS. 5-7. Specifically, the elements of the first weights are respectively configured into the ReRAM cells of the crossbars included in the P calculation units, so that the crossbars in the P calculation units can compute on the input data in parallel based on the N first weights, making full use of the computing power of the crossbars in the P calculation units. In the embodiment of the present invention, after receiving the first input data, the P calculation units can perform a neural network operation on the received first input data based on the configured N first weights to obtain the first output data. For example, the crossbars in the P calculation units can perform a matrix multiply-add operation on the first input data and the configured first weights.
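Conceptually, the N identical weight copies let the P calculation units work on different parts of the same data stream at once; the round-robin dispatch below is an illustrative assumption about how input windows might be spread over the copies, not a mechanism stated in the embodiment.

```python
import numpy as np

def layer_forward(windows, weight_copies):
    """Dispatch input windows round-robin over N identical weight copies,
    each copy performing a matrix multiply-add (here a plain dot product)."""
    outputs = []
    for idx, window in enumerate(windows):
        copy = weight_copies[idx % len(weight_copies)]  # pick a copy
        outputs.append(window @ copy)  # crossbars would do this in parallel
    return outputs
```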
In step 806, Q calculation units in the neural network system receive second input data, where the Q calculation units are used to perform the second neural network layer operation of the neural network system and the second input data includes the first output data. Specifically, in one case, the Q calculation units may perform the operation of the second neural network layer only on the first output data of the P calculation units. For example, the P calculation units perform the operation of the first layer 302 shown in FIG. 3, and the Q calculation units perform the operation of the second layer 304 shown in FIG. 3. In this case, the second input data is the first output data. In another case, the Q calculation units may also perform the second neural network layer operation jointly on the first output data of the first neural network layer and the output data of other neural network layers. For example, the P calculation units may perform the neural network operation of the second layer 304 shown in FIG. 3, and the Q calculation units may perform the neural network operation of the fifth layer 310 shown in FIG. 3. In this case, the Q calculation units perform operations on the output data of the second layer 304 and the fourth layer 308, and the second input data includes the first output data and the output data of the fourth layer 308.
In step 808, the Q calculation units perform calculation on the second input data according to the configured M second weights to obtain second output data. In the embodiment of the present invention, a second weight is also a weight matrix; the M second weights refer to M such weight matrices and may also be called M second weight copies. Similar to step 804, the second weights can be configured into the ReRAM cells of the crossbars included in the Q calculation units according to the method shown in FIG. 5. After receiving the second input data, the Q calculation units can perform a neural network operation on the received second input data based on the configured M second weights to obtain the second output data. For example, the crossbars in the Q calculation units can perform a matrix multiply-add operation on the second input data and the configured second weights. It should be noted that, in the embodiment of the present invention, the ratio of N to M corresponds to the ratio of the data amount of the first output data to the data amount of the second output data.
For clarity of description, the following briefly describes how a ReRAM crossbar implements a matrix multiply-add operation. As shown in FIG. 9, the weight matrix of j rows and k columns in FIG. 9 may be one weight of a neural network layer, and each element in the weight matrix represents a weight value. FIG. 10 is a schematic structural diagram of a ReRAM crossbar in a calculation unit according to an embodiment of the present invention. For convenience of description, the ReRAM crossbar may be referred to simply as a crossbar in the embodiment of the present invention. As shown in FIG. 10, a crossbar includes multiple ReRAM cells, such as $G_{1,1}$ and $G_{2,1}$, and these cells constitute a neural network matrix. In the embodiment of the present invention, during the configuration of the neural network, the weight shown in FIG. 9 can be input into the crossbar from the bit lines of the crossbar shown in FIG. 10 (shown as input port 1002 in FIG. 10), so that each element of the weight is configured into the corresponding ReRAM cell. For example, the weight element $W_{0,0}$ in FIG. 9 is configured into $G_{1,1}$ in FIG. 10, the weight element $W_{1,0}$ in FIG. 9 is configured into $G_{2,1}$ in FIG. 10, and so on; each weight element corresponds to one ReRAM cell. When a neural network calculation is performed, the input data is input into the crossbar through the word lines of the crossbar (input port 1004 shown in FIG. 10). It can be understood that the input data can be represented by voltages, so that a dot-product operation of the input data with the weight values configured in the ReRAM cells is realized, and the calculation result is output in the form of output voltages from the output end of each column of the crossbar (output port 1006 shown in FIG. 10).
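An idealized software model of the crossbar's analog dot product follows: cell conductances stand in for the configured weight elements and word-line voltages for the input data. Real crossbars also involve DACs/ADCs and device non-idealities that this sketch ignores, and the matrix sizes below are chosen arbitrarily.

```python
import numpy as np

def crossbar_matvec(G: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Each cell holds a conductance G[r, c] (one weight element); the input
    vector v is applied as word-line voltages, and every bit line (column)
    outputs the accumulated current sum_r v[r] * G[r, c]."""
    return v @ G

# A j x k weight matrix as in FIG. 9 (here j = 3 rows, k = 2 columns):
G = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])  # conductances
v = np.array([1.0, 0.5, 0.25])                      # input as voltages
print(crossbar_matvec(G, v))  # one accumulated output per column
```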
As mentioned above, since the amount of data output by adjacent neural network layers is considered when configuring the calculation units that perform the operation of each neural network layer in the neural network system, the computing power of the computing nodes performing the operations of adjacent neural network layers can be matched. Therefore, the data processing method provided by the embodiment of the present invention can make full use of the computing power of the computing nodes and improve the data processing efficiency of the neural network system.
In yet another case, an embodiment of the present invention provides a resource allocation apparatus. The apparatus can be applied to the neural network system shown in FIG. 1 and FIG. 1A and is used to allocate the computing nodes that perform the operations of different neural network layers, so that the computing power of the computing nodes performing the operations of two adjacent neural network layers in the neural network system is matched, which improves the data processing efficiency of the neural network system without wasting computing resources. It can be understood that the resource allocation apparatus may be located in the host and may be implemented by the processor in the host, or it may exist as a physical device independent of the processor, for example, as a compiler independent of the processor. As shown in FIG. 11, the resource allocation apparatus 1100 may include an acquisition module 1102, a calculation module 1104, and a deployment module 1106.
The acquisition module 1102 is configured to obtain the data amount of the first output data of the first neural network layer and the data amount of the second output data of the second neural network layer in the neural network system, where the input data of the second neural network layer includes the first output data. The calculation module 1104 is configured to determine, according to the deployment requirements of the neural network system, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer, where N and M are both positive integers and the ratio of N to M corresponds to the ratio of the data amount of the first output data to the data amount of the second output data.
As mentioned above, the neural network system of the embodiment of the present invention includes multiple neural network chips, each neural network chip includes multiple calculation units, and each calculation unit includes at least one resistive random access memory crossbar (ReRAM crossbar). In one case, the deployment requirements include a calculation delay. When the first neural network layer is the starting layer of all neural network layers in the neural network system, the calculation module is configured to determine the value of N according to the data amount of the first output data, the calculation delay, and the calculation frequency of the ReRAM crossbars in the calculation units, and to determine the value of M according to the ratio of the data amount of the first output data to the data amount of the second output data and the value of N.
In yet another case, the deployment requirement includes the number of chips in the neural network system, and the first neural network layer is the starting layer of the neural network system. The calculation module is configured to determine the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratios of the output data volumes of adjacent neural network layers, and to determine the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
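One way to read this chip-budget rule is: the number of copies of each layer scales with that layer's output volume, and the combined crossbar budget of all chips bounds how many copies fit. The sketch below encodes that reading; applying the proportionality rule to every layer and the rounding choices are assumptions, since the patent states only the quantities that enter the calculation.

```python
import math

def size_by_chip_budget(num_chips, xbars_per_chip, xbars_per_weight, out_volumes):
    """Sketch: xbars_per_weight[i] is the crossbar cost of one weight copy of
    layer i, out_volumes[i] is that layer's output data volume; returns the
    number of weight copies per layer, e.g. [N, M, ...]."""
    total_xbars = num_chips * xbars_per_chip
    # Copies of layer i relative to layer 0 follow the output-volume ratios.
    scale = [v / out_volumes[0] for v in out_volumes]
    cost_of_one_layer0_copy = sum(s * w for s, w in zip(scale, xbars_per_weight))
    n = max(1, math.floor(total_xbars / cost_of_one_layer0_copy))
    return [max(1, round(n * s)) for s in scale]

# Example: 4 chips x 96 crossbars; weight costs of 8 and 4 crossbars per copy;
# layer outputs of 400,000 and 200,000 elements give N : M = 2 : 1.
print(size_by_chip_budget(4, 96, [8, 4], [400_000, 200_000]))  # -> [38, 19]
```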
The deployment module 1106 is configured to deploy, according to the calculation specification of the computing units in the neural network system, the N first weights onto P computing units and the M second weights onto Q computing units, where P and Q are both positive integers, the P computing units are used to execute the operations of the first neural network layer, and the Q computing units are used to execute the operations of the second neural network layer. The calculation specification of a computing unit refers to the number of crossbars included in one computing unit; in practice, a computing unit may include one or more crossbars. Specifically, after the calculation module 1104 obtains the number of weights to be configured for each neural network layer, the deployment module 1106 may deploy the weights of each layer onto the corresponding computing units according to the number of crossbars each computing unit contains. The elements of a weight matrix are configured into the ReRAM cells of the crossbars of a computing unit. In the embodiments of the present invention, a computing unit may refer to a PE or an engine: one PE may include multiple engines, and one engine may include one or more crossbars. Because the size of each layer's weight may differ, one weight may be deployed on one or more engines.
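The relationship between weight counts and unit counts can be sketched as follows; the first-fit packing policy and the helper name are assumptions made for illustration, since the text above fixes only the per-unit crossbar count and the idea that a weight may span one or more engines.

```python
import math

def units_needed(num_copies: int, xbars_per_weight: int, xbars_per_unit: int) -> int:
    """Sketch: the number of computing units (P or Q) needed to host
    num_copies weight copies, given each unit's crossbar capacity."""
    if xbars_per_weight >= xbars_per_unit:
        # A large weight spans several whole units (one or more engines).
        return num_copies * math.ceil(xbars_per_weight / xbars_per_unit)
    # Several small copies can share one unit.
    return math.ceil(num_copies / (xbars_per_unit // xbars_per_weight))

p = units_needed(38, 8, 4)   # each copy spans 2 units -> P = 76
q = units_needed(19, 4, 4)   # exact fit, 1 unit per copy -> Q = 19
```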
As described above, the neural network system shown in FIG. 1 includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, and each secondary computing node includes multiple computing units. To further reduce the volume of data transferred between computing units executing different neural network layers and to save transmission bandwidth between computing units or computing nodes, the resource allocation apparatus 1100 may further include a mapping module 1108, configured to map computing units to their superior computing nodes. Specifically, after the calculation module 1104 obtains the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer, the mapping module 1108 establishes a mapping between the N first weights and the P computing units and a mapping between the M second weights and the Q computing units. Further, the mapping module 1108 is configured to map the P computing units and the Q computing units to multiple secondary computing nodes according to the number of computing units each secondary computing node in the neural network system contains, where at least a part of the P computing units and at least a part of the Q computing units are mapped to the same secondary computing node.
Further, the mapping module 1108 is configured to map, according to the number of secondary computing nodes each neural network chip contains, the multiple secondary computing nodes onto which the P computing units and the Q computing units have been mapped into the multiple neural network chips, where at least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped to the same neural network chip.
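The co-location constraint in these two mapping steps can be realized, for example, by a proportional interleaving in which every secondary computing node receives units of both layers in roughly the P : Q ratio, so that a first-layer producer and its second-layer consumer tend to share a node; the same interleaving can be repeated one level up when packing nodes into chips. The round-robin policy below is an assumed illustration; the patent requires only that parts of both sets be mapped together.

```python
def interleave(p_units, q_units, units_per_node):
    """Sketch: pack first-layer (P) and second-layer (Q) computing units onto
    secondary computing nodes while keeping the P:Q ratio inside every node."""
    nodes, i, j = [], 0, 0
    while i < len(p_units) or j < len(q_units):
        node = []
        while len(node) < units_per_node and (i < len(p_units) or j < len(q_units)):
            # Take from P while its consumed fraction lags behind Q's.
            take_p = j == len(q_units) or (
                i < len(p_units) and i * len(q_units) <= j * len(p_units))
            if take_p:
                node.append(("P", p_units[i])); i += 1
            else:
                node.append(("Q", q_units[j])); j += 1
        nodes.append(node)
    return nodes

# 4 first-layer units, 2 second-layer units, 3 units per node:
# -> [[('P', 0), ('Q', 0), ('P', 1)], [('P', 2), ('Q', 1), ('P', 3)]]
print(interleave([0, 1, 2, 3], [0, 1], 3))
```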
For how the mapping module 1108 establishes the mapping between the N first weights and the P computing units and the mapping between the M second weights and the Q computing units, and how the P computing units and the Q computing units are respectively mapped to their superior computing nodes, refer to the foregoing descriptions of FIG. 6, FIG. 6A, and FIG. 7; details are not repeated here.
An embodiment of the present invention further provides a computer program product implementing the foregoing resource allocation method, and an embodiment of the present invention also provides a computer program product implementing the foregoing data processing method. Each of these computer program products includes a computer-readable storage medium storing program code, and the instructions included in the program code are used to execute the method flow of any one of the foregoing method embodiments. A person of ordinary skill in the art can understand that the foregoing storage medium includes various non-transitory machine-readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (RAM), a solid state disk (SSD), or a non-volatile memory.
It should be noted that the embodiments provided in this application are merely illustrative. Those skilled in the art will clearly understand that, for convenience and brevity of description, each of the foregoing embodiments emphasizes different aspects; for a part not detailed in one embodiment, refer to the related descriptions of the other embodiments. The features disclosed in the embodiments, claims, and drawings of the present invention may exist independently or in combination. Features described in hardware form in the embodiments of the present invention may be executed by software, and vice versa. This is not limited here.

Claims (17)

  1. A computing resource allocation method applied in a neural network system, comprising:
    obtaining the data volume of first output data of a first neural network layer and the data volume of second output data of a second neural network layer in the neural network system, wherein input data of the second neural network layer comprises the first output data;
    determining, according to a deployment requirement of the neural network system, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer, wherein N and M are both positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data; and
    deploying, according to a calculation specification of computing units in the neural network system, the N first weights onto P computing units and the M second weights onto Q computing units, wherein P and Q are both positive integers, the P computing units are configured to execute operations of the first neural network layer, and the Q computing units are configured to execute operations of the second neural network layer.
  2. The method according to claim 1, wherein the deployment requirement comprises a computation delay, the first neural network layer is the starting layer of all neural network layers in the neural network system, and
    the determining of the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer comprises:
    determining the value of N according to the data volume of the first output data, the computation delay, and the computation frequency of a resistive random access memory crossbar (ReRAM crossbar) in a computing unit; and
    determining the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
  3. The method according to claim 1, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple computing units, each computing unit comprises at least one resistive random access memory crossbar (ReRAM crossbar), the deployment requirement comprises the number of chips of the neural network system, the first neural network layer is the starting layer of the neural network system, and
    the determining of the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer comprises:
    determining the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratios of the output data volumes of adjacent neural network layers; and
    determining the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
  4. The method according to claim 1, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple secondary computing nodes, and each secondary computing node comprises multiple computing units, the method further comprising:
    mapping the P computing units and the Q computing units to multiple secondary computing nodes according to the number of computing units comprised in a secondary computing node in the neural network system, wherein at least a part of the P computing units and at least a part of the Q computing units are mapped to the same secondary computing node.
  5. The method according to claim 4, further comprising:
    mapping, according to the number of secondary computing nodes comprised in each neural network chip, the multiple secondary computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips, wherein at least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped to the same neural network chip.
  6. A neural network system, comprising:
    multiple neural network chips, each neural network chip comprising multiple computing units; and
    a processor, connected to the multiple neural network chips and configured to:
    obtain the data volume of first output data of a first neural network layer and the data volume of second output data of a second neural network layer in the neural network system, wherein input data of the second neural network layer comprises the first output data;
    determine, according to a deployment requirement of the neural network system, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer, wherein N and M are both positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data; and
    deploy, according to a calculation specification of the computing units in the neural network system, the N first weights onto P computing units of the multiple computing units and the M second weights onto Q computing units of the multiple computing units, wherein P and Q are both positive integers, the P computing units are configured to execute operations of the first neural network layer, and the Q computing units are configured to execute operations of the second neural network layer.
  7. The neural network system according to claim 6, wherein the deployment requirement comprises a computation delay, the first neural network layer is the starting layer of all neural network layers in the neural network system, and
    in the step of determining the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer, the processor is configured to:
    determine the value of N according to the data volume of the first output data, the computation delay, and the computation frequency of a resistive random access memory crossbar (ReRAM crossbar) in a computing unit; and
    determine the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
  8. The neural network system according to claim 6, wherein each computing unit comprises at least one resistive random access memory crossbar (ReRAM crossbar), the deployment requirement comprises the number of chips of the neural network system, the first neural network layer is the starting layer of the neural network system, and
    in the step of determining the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer, the processor is configured to:
    determine the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratios of the output data volumes of adjacent neural network layers; and
    determine the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
  9. The neural network system according to claim 6, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple secondary computing nodes, each secondary computing node comprises multiple computing units, and the processor is further configured to:
    map the P computing units and the Q computing units to multiple secondary computing nodes according to the number of computing units comprised in a secondary computing node in the neural network system, wherein at least a part of the P computing units and at least a part of the Q computing units are mapped to the same secondary computing node.
  10. The neural network system according to claim 9, wherein the processor is further configured to:
    map, according to the number of secondary computing nodes comprised in each neural network chip, the multiple secondary computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips, wherein at least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped to the same neural network chip.
  11. A resource allocation apparatus, comprising:
    an obtaining module, configured to obtain the data volume of first output data of a first neural network layer and the data volume of second output data of a second neural network layer in a neural network system, wherein input data of the second neural network layer comprises the first output data;
    a calculation module, configured to determine, according to a deployment requirement of the neural network system, N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer, wherein N and M are both positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data; and
    a deployment module, configured to deploy, according to a calculation specification of computing units in the neural network system, the N first weights onto P computing units and the M second weights onto Q computing units, wherein P and Q are both positive integers, the P computing units are configured to execute operations of the first neural network layer, and the Q computing units are configured to execute operations of the second neural network layer.
  12. The resource allocation apparatus according to claim 11, wherein the deployment requirement comprises a computation delay, the first neural network layer is the starting layer of all neural network layers in the neural network system, and the calculation module is configured to:
    determine the value of N according to the data volume of the first output data, the computation delay, and the computation frequency of a resistive random access memory crossbar (ReRAM crossbar) in a computing unit; and
    determine the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
  13. The resource allocation apparatus according to claim 11, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple computing units, each computing unit comprises at least one resistive random access memory crossbar (ReRAM crossbar), the deployment requirement comprises the number of chips of the neural network system, the first neural network layer is the starting layer of the neural network system, and the calculation module is configured to:
    determine the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratios of the output data volumes of adjacent neural network layers; and
    determine the value of M according to the value of N and the ratio of the data volume of the first output data to the data volume of the second output data.
  14. The resource allocation apparatus according to claim 11, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple secondary computing nodes, and each secondary computing node comprises multiple computing units, the apparatus further comprising:
    a mapping module, configured to map the P computing units and the Q computing units to multiple secondary computing nodes according to the number of computing units comprised in a secondary computing node in the neural network system, wherein at least a part of the P computing units and at least a part of the Q computing units are mapped to the same secondary computing node.
  15. The resource allocation apparatus according to claim 14, wherein the mapping module is further configured to:
    map, according to the number of secondary computing nodes comprised in each neural network chip, the multiple secondary computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips, wherein at least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are mapped to the same neural network chip.
  16. A computer program product, comprising program code, wherein instructions included in the program code are executed by a computer to perform the computing resource allocation method according to any one of claims 1 to 5.
  17. A computer-readable storage medium, comprising computer program instructions, wherein when the computer program instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 5.
PCT/CN2018/125239 2018-12-29 2018-12-29 Computing resource allocation technology and neural network system WO2020133317A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880100574.2A CN113597621A (en) 2018-12-29 2018-12-29 Computing resource allocation technique and neural network system
PCT/CN2018/125239 WO2020133317A1 (en) 2018-12-29 2018-12-29 Computing resource allocation technology and neural network system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125239 WO2020133317A1 (en) 2018-12-29 2018-12-29 Computing resource allocation technology and neural network system

Publications (1)

Publication Number Publication Date
WO2020133317A1 true WO2020133317A1 (en) 2020-07-02

Family ID=71126750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125239 WO2020133317A1 (en) 2018-12-29 2018-12-29 Computing resource allocation technology and neural network system

Country Status (2)

Country Link
CN (1) CN113597621A (en)
WO (1) WO2020133317A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297130A (en) * 2021-12-28 2022-04-08 深圳云天励飞技术股份有限公司 Data transmission processing method in chip system and related device
CN115204380B (en) * 2022-09-15 2022-12-27 之江实验室 Data storage and array mapping method and device of storage and calculation integrated convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871163A (en) * 2016-09-28 2018-04-03 爱思开海力士有限公司 Operation device and method for convolutional neural networks
EP3343465A1 (en) * 2016-12-30 2018-07-04 Intel Corporation Neuromorphic computer with reconfigurable memory mapping for various neural network topologies
CN107622305A (en) * 2017-08-24 2018-01-23 中国科学院计算技术研究所 Processor and processing method for neutral net

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579285A (en) * 2020-12-10 2021-03-30 南京工业大学 Edge network-oriented distributed neural network collaborative optimization method
CN112579285B (en) * 2020-12-10 2023-07-25 南京工业大学 Distributed neural network collaborative optimization method for edge network
WO2022199315A1 (en) * 2021-03-22 2022-09-29 华为技术有限公司 Data processing method and apparatus
CN113158243A (en) * 2021-04-16 2021-07-23 苏州大学 Distributed image recognition model reasoning method and system
CN113238715A (en) * 2021-06-03 2021-08-10 上海新氦类脑智能科技有限公司 Intelligent file system, configuration method thereof, intelligent auxiliary computing equipment and medium
CN113238715B (en) * 2021-06-03 2022-08-30 上海新氦类脑智能科技有限公司 Intelligent file system, configuration method thereof, intelligent auxiliary computing equipment and medium
CN113517009A (en) * 2021-06-10 2021-10-19 上海新氦类脑智能科技有限公司 Storage and calculation integrated intelligent chip, control method and controller
CN116089095A (en) * 2023-02-28 2023-05-09 苏州亿铸智能科技有限公司 Deployment method for ReRAM neural network computing engine network
CN116306811A (en) * 2023-02-28 2023-06-23 苏州亿铸智能科技有限公司 Weight distribution method for deploying neural network for ReRAM
CN116089095B (en) * 2023-02-28 2023-10-27 苏州亿铸智能科技有限公司 Deployment method for ReRAM neural network computing engine network
CN116306811B (en) * 2023-02-28 2023-10-27 苏州亿铸智能科技有限公司 Weight distribution method for deploying neural network for ReRAM

Also Published As

Publication number Publication date
CN113597621A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
WO2020133317A1 (en) Computing resource allocation technology and neural network system
US10445638B1 (en) Restructuring a multi-dimensional array
US20230325348A1 (en) Performing concurrent operations in a processing element
WO2020133463A1 (en) Neural network system and data processing technology
CN110520853B (en) Queue management for direct memory access
CN109102065B (en) Convolutional neural network accelerator based on PSoC
CN111033529B (en) Architecture optimization training of neural networks
US11294599B1 (en) Registers for restricted memory
US11599367B2 (en) Method and system for compressing application data for operations on multi-core systems
US11755683B2 (en) Flexible accelerator for sparse tensors (FAST) in machine learning
US20210303976A1 (en) Flexible accelerator for sparse tensors in convolutional neural networks
Dutta et al. Hdnn-pim: Efficient in memory design of hyperdimensional computing with feature extraction
US11579921B2 (en) Method and system for performing parallel computations to generate multiple output feature maps
WO2020124488A1 (en) Application process mapping method, electronic device, and computer-readable storage medium
CN112835844A (en) Communication sparsization method for load calculation of impulse neural network
WO2021244045A1 (en) Neural network data processing method and apparatus
CN111971692A (en) Convolutional neural network
CN111078624B (en) Network-on-chip processing system and network-on-chip data processing method
CN111078623B (en) Network-on-chip processing system and network-on-chip data processing method
CN111078625B (en) Network-on-chip processing system and network-on-chip data processing method
US20200218960A1 (en) Convolution operator system to perform concurrent convolution operations
WO2020051918A1 (en) Neuronal circuit, chip, system and method therefor, and storage medium
TWI836132B (en) Storage system and method for dynamically scaling sort operation for storage system
TWI753728B (en) Architecture and cluster of processing elements and method of convolution operation
KR102663759B1 (en) System and method for hierarchical sort acceleration near storage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18944653; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18944653; Country of ref document: EP; Kind code of ref document: A1