WO2020133463A1 - Neural network system and data processing technology - Google Patents

Neural network system and data processing technology

Info

Publication number
WO2020133463A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
layer
data
output data
computing
Prior art date
Application number
PCT/CN2018/125761
Other languages
English (en)
French (fr)
Inventor
刘哲
曾重
王铁英
段小祥
张慧敏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2018/125761 priority Critical patent/WO2020133463A1/zh
Priority to EP18944316.1A priority patent/EP3889844A4/en
Priority to CN201880100568.7A priority patent/CN113261015A/zh
Publication of WO2020133463A1 publication Critical patent/WO2020133463A1/zh
Priority to US17/360,459 priority patent/US20210326687A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065Analogue means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • This application relates to the field of computer technology, in particular to a neural network system and data processing technology.
  • Deep learning is an important branch of artificial intelligence (AI). Deep learning uses neural networks constructed to imitate the human brain, and can achieve better recognition results than traditional shallow learning methods.
  • In image processing, a convolutional neural network (CNN) is used to identify and analyze an input image and finally output a set of classified image content. For example, a convolutional neural network algorithm can be used to extract and classify the body color, license plate number, and model of a motor vehicle in a picture.
  • Convolutional neural networks usually use a three-layer sequence: a convolutional layer (Convolutional Layer), a pooling layer (Pooling Layer), and rectified linear units (Rectified Linear Units, ReLU) to extract the features of the picture.
  • The process of extracting picture features is actually a series of matrix operations (for example, matrix multiply-add operations). Therefore, how to process pictures in the network quickly and in parallel becomes a problem to be studied in convolutional neural networks.
  • a neural network system and data processing technology provided by the present application can increase the data processing speed in the neural network.
  • an embodiment of the present invention provides a neural network system.
  • the neural network system includes P computing units for performing operations of the first neural network layer and Q computing units for performing operations of the second neural network layer.
  • the P calculation units are configured to receive first input data and perform calculation on the first input data according to the configured N first weights to obtain first output data.
  • the Q calculation units are used to receive second input data, and perform calculation on the second input data according to the configured M second weights to obtain second output data.
  • the second input data includes the first output data.
  • P, Q, N, and M are all positive integers, and the ratio of N and M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  • Because the ratio of the N weights configured on the P computing units performing the first neural network layer operation to the M weights configured on the Q computing units performing the second neural network layer operation corresponds to the ratio of the data amount of the first output data to the data amount of the second output data, the computing capabilities of the P computing units and the Q computing units match. Therefore, the computing power of the computing nodes that perform each layer of neural network operations can be fully utilized, and the efficiency of data processing improved.
  • The neural network system includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, each secondary computing node includes multiple computing units, and each computing unit includes at least one resistive random-access memory crossbar (ReRAM crossbar).
  • The values of N and M are determined based on the deployment requirements of the neural network system, the first output data amount, and the second output data amount.
  • In a possible implementation, the deployment requirement includes a calculation delay, and the first neural network layer is the starting layer of all neural network layers in the neural network system.
  • The value of N is determined according to the data amount of the first output data, the set calculation delay, and the calculation frequency of the crossbars in the calculation units.
  • The value of M is determined according to the ratio of the data amount of the first output data to the data amount of the second output data and the value of N.
  • The value of N can be obtained according to the following formula:
  N = ⌈(H_1 × W_1) / (t × f)⌉
  where N indicates the number of weights required to be configured for the first-layer neural network, H_1 is the number of rows of the output data of the first-layer neural network, W_1 is the number of columns of the output data of the first-layer neural network, t is the set calculation delay, and f is the calculation frequency of the crossbars in the calculation unit.
  • In a possible implementation, the deployment requirement includes the number of neural network chips, and the first neural network layer is the starting layer of all neural network layers in the neural network system. The value of N is determined based on the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight for each neural network layer, and the ratio of the output data volumes of adjacent neural network layers; the value of M is determined according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  • the deployment requirement is the number of chips required by the neural network system
  • the first neural network layer is the starting layer of the neural network system
  • In this case, the following two formulas are used to obtain the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer:
  xb_1 × N_1 + xb_2 × N_2 + … + xb_n × N_n = K × L
  N_{i-1} / N_i = (H_{i-1} × W_{i-1}) / (H_i × W_i)
  where N_i denotes the number of weights required by the i-th layer neural network, so that the value of N is N_1 and the value of M is N_2; H_i and W_i are the number of rows and columns of the output data of the i-th layer neural network; xb_1 is used to represent the number of crossbars required to deploy one weight of the first-layer (or starting-layer) neural network; xb_2 is used to represent the number of crossbars required to deploy one weight of the second-layer neural network; xb_n is used to represent the number of crossbars required to deploy one weight of the n-th layer neural network; K is the number of chips of the neural network system required by the deployment requirement; and L is the number of crossbars in each chip. The value of i can be from 2 to n, where n is the total number of neural network layers in the neural network system.
  • At least a part of the P computing units and at least a part of the Q computing units are located in the same secondary computing node.
  • That the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data includes: the ratio of N to M is the same as the ratio of the data volume of the first output data to the data volume of the second output data.
  • the present application provides a data processing method applied in a neural network system.
  • P calculation units in the neural network system receive the first input data, and perform calculation on the first input data according to the configured N first weights to obtain first output data, wherein, the P computing units are used to perform the first neural network layer operation.
  • the Q calculation units in the neural network system receive second input data, and perform calculation on the second input data according to the configured M second weights to obtain second output data.
  • The Q calculation units are used to perform a second neural network layer operation, the second input data includes the first output data, P, Q, N, and M are all positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  • In a possible implementation, the first neural network layer is the starting layer of all neural network layers in the neural network system; the value of N is determined based on the data amount of the first output data, the set calculation delay of the neural network system, and the calculation frequency of the ReRAM crossbars in the calculation units; the value of M is determined based on the ratio of the data amount of the first output data to the data amount of the second output data and the value of N.
  • In a possible implementation, the neural network system includes multiple neural network chips, each neural network chip includes multiple computing units, and each computing unit includes at least one resistive random-access memory crossbar (ReRAM crossbar); the first neural network layer is the starting layer of the neural network system.
  • The value of N is determined based on the number of the multiple neural network chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight for each neural network layer, and the ratio of the output data amounts of adjacent neural network layers; the value of M is determined according to the ratio of the data amount of the first output data to the data amount of the second output data and the value of N.
  • In a possible implementation, the neural network system includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, and each secondary computing node includes multiple computing units. At least a part of the P computing units and at least a part of the Q computing units are located in the same secondary computing node.
  • At least a part of the secondary computing nodes to which the P computing units belong and at least a part of the secondary computing nodes to which the Q computing units belong are located in the same neural network chip.
  • The ratio of N to M is the same as the ratio of the data amount of the first output data to the data amount of the second output data.
  • The present application also provides a computer program product, including program code, where the instructions included in the program code are executed by a computer to implement the data processing method described in the first aspect and any one of the possible implementations of the first aspect.
  • The present application also provides a computer-readable storage medium for storing program code, where the instructions included in the program code are executed by a computer to implement the method described in the foregoing first aspect and any possible implementation of the first aspect.
  • FIG. 1 is a schematic structural diagram of a neural network system provided by an embodiment of the present invention.
  • FIG. 1A is a schematic structural diagram of yet another neural network system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a computing node in a neural network chip provided by an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a logical structure of a neural network layer in a neural network system according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a set of computing nodes for processing data of different neural network layers in a neural network system according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method for computing resource allocation in a neural network system according to an embodiment of the present invention
  • FIG. 6 is a flowchart of yet another method for computing resource allocation according to an embodiment of the present invention.
  • FIG. 6A is a flowchart of a resource mapping method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of another computing resource allocation method according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a resistive random-access memory crossbar (ReRAM crossbar) according to an embodiment of the present invention
  • FIG. 11 is a schematic structural diagram of a resource allocation apparatus according to an embodiment of the present invention.
  • Deep learning is an important branch of artificial intelligence (AI). Deep learning uses neural networks constructed to imitate the human brain, and can achieve better recognition results than traditional shallow learning methods.
  • Artificial neural networks can include convolutional neural networks (Convolutional Neural Networks, CNN), deep neural networks (Deep Neural Networks, DNN), multilayer perceptrons (Multilayer Perceptron, MLP) and other neural networks.
  • FIG. 1 is a schematic structural diagram of an artificial neural network system according to an embodiment of the present invention.
  • Figure 1 illustrates the convolutional neural network as an example.
  • the convolutional neural network system 100 may include a host 105 and a convolutional neural network circuit 110.
  • the convolutional neural network circuit 110 may also be referred to as a neural network accelerator.
  • the convolutional neural network circuit 110 is connected to the host 105 through the host interface.
  • the host interface may include a standard host interface and a network interface.
  • the host interface may include a Peripheral Component Interconnect Express (PCIE) interface.
  • the convolutional neural network circuit 110 may be connected to the host 105 through the PCIE bus 106.
  • Data can be input into the convolutional neural network circuit 110 through the PCIE bus 106, and the host 105 can also receive data processed by the convolutional neural network circuit 110 through the PCIE bus 106.
  • the host 105 may also monitor the working state of the convolutional neural network circuit 110 through the host interface.
  • the host 105 may include a processor 1052 and memory 1054. It should be noted that, in addition to the devices shown in FIG. 1, the host 105 may also include a communication interface and other devices such as a magnetic disk as an external storage, which is not limited herein.
  • a processor (Processor) 1052 is an arithmetic core and a control core (Control Unit) of the host 105.
  • the processor 1052 may include multiple processor cores.
  • the processor 1052 may be a very large-scale integrated circuit.
  • An operating system and other software programs are installed in the processor 1052, so that the processor 1052 can achieve access to the memory 1054, cache, disk, and peripheral devices (such as the neural network circuit in FIG. 1).
  • the Core in the processor 1052 may be, for example, a central processing unit (Central Processing Unit, CPU), or other specific integrated circuits (Application Specific Integrated Circuit, ASIC).
  • the memory 1054 is the main memory of the host 105.
  • the memory 1054 is connected to the processor 1052 through a double data rate (DDR) bus.
  • the memory 1054 is generally used to store various running software, input and output data, and information exchanged with external storage in the operating system. In order to improve the access speed of the processor 1052, the memory 1054 needs to have the advantage of fast access speed.
  • The memory 1054 may be, for example, a dynamic random access memory (DRAM).
  • the processor 1052 can access the memory 1054 at a high speed through a memory controller (not shown in FIG. 1), and can perform a read operation and a write operation on any storage unit in the memory 1054.
  • a convolutional neural network (CNN) circuit 110 is a chip array composed of multiple neural network (NN) chips.
  • the CNN circuit 110 includes multiple NN chips 115 and multiple routers 120.
  • the embodiment of the present invention refers to the NN chip 115 in the application as chip 115 for short.
  • the plurality of chips 115 are connected to each other through a router 120.
  • one chip 115 may be connected to one or more routers 120.
  • Multiple routers 120 may constitute one or more network topologies. Data can be transmitted between the chips 115 through the one or more network topologies.
  • the plurality of routers 120 may constitute a first network 1106 and a second network 1108, where the first network 1106 is a ring network and the second network 1108 is a two-dimensional mesh (2D mesh) network. Therefore, the data input from the input port 1102 can be sent to the corresponding chip 115 by the network composed of the plurality of routers 120, and the data processed by any one chip 115 can also be sent to other chips 115 through the network composed of the plurality of routers 120. Process or send out from output port 1104.
  • FIG. 1 also shows a schematic structural diagram of the chip 115.
  • chip 115 may include multiple neural network processing units 125 and multiple routers 122.
  • FIG. 1 takes the neural network processing unit as a tile for example.
  • one tile 125 may be connected to one or more routers 122.
  • the multiple routers 122 in the chip 115 may constitute one or more network topologies. Data can be transmitted between tiles 125 through the various network topologies.
  • the plurality of routers 122 may constitute a first network 1156 and a second network 1158, where the first network 1156 is a ring network and the second network 1158 is a two-dimensional mesh (2D mesh) network.
  • The data input to the chip 115 from the input port 1152 can be sent to the corresponding tile 125 through the network composed of the plurality of routers 122, and the data processed by any one tile 125 can also be sent to other tiles 125 or output from the output port 1154 through the network composed of the plurality of routers 122.
  • In the embodiment of the present invention, the chips 115 are interconnected by routers. The one or more network topologies composed of the multiple routers 120 in the convolutional neural network circuit 110 and the one or more network topologies composed of the multiple routers 122 in the chip 115 may be the same or different, as long as data can be transmitted between the chips 115 or between the tiles 125 through the network topologies, and the chips 115 or the tiles 125 can receive and output data through the network topologies.
  • The number and types of networks composed of the multiple routers 120 and 122 are not limited.
  • The router 120 and the router 122 may be the same or different; for clarity of description, they are labeled separately as router 120 and router 122 in FIG. 1.
  • The chip 115 connected to the router 120 may also be referred to as a computing node.
  • FIG. 1A is a schematic structural diagram of yet another neural network system according to an embodiment of the present invention.
  • the host 105 is connected to multiple PCIE cards 109 through a PCIE interface 107, and each PCIE card 109 may include multiple neural network chips 115, and the neural network chips are connected through a high-speed interconnection interface .
  • the interconnection between chips is not limited here. It can be understood that, in actual applications, the tiles within the chip may not be connected by a router, and the high-speed interconnection method between the chips shown in FIG. 1A is adopted. In another case, it is also possible to use the router connection shown in FIG. 1 between the tiles within the chip, and the high-speed interconnection method shown in FIG. 1A between the chips.
  • the embodiments of the present invention do not limit the connection modes between chips or within chips.
  • each tile 125 may include an input-output interface (TxRx) 1252, a switching device (TSW) 1254, and multiple processing devices (processing elements) 1256.
  • TxRx 1252 is used to receive the data input to the tile 125 from the Router 120 or output the calculation result of the tile 125. To put it another way, TxRx 1252 is used to transfer data between tile 125 and router 120.
  • a switch (TSW) 1254 is connected to TxRx 1252, and the TSW 1254 is used to implement data transmission between the TxRx 1252 and multiple PEs 1256.
  • Each PE 1256 may include one or more computing engines (computing engines) 1258.
  • the one or more computing engines 1258 are used to implement neural network calculations on the data in the input PE 1256. For example, the data input to tile 125 and the convolution kernel preset in tile 125 may be multiplied and added.
  • the calculation result of Engine 1258 can be sent to other tiles 125 through TSW 1254 and TxRx 1252.
  • an Engine 1258 may include modules that implement convolution, pooling, or other neural network operations.
  • The specific circuit or function of the engine is not limited. For simplicity of description, in the embodiment of the present invention, the calculation engine is simply referred to as the engine.
  • In the embodiment of the present invention, the engine 1258 may be implemented based on resistive random-access memory (ReRAM), and an engine 1258 may include one or more ReRAM crossbars.
  • the structure of ReRAM crossbar can be shown in Figure 10. Later, we will introduce how to perform matrix multiply-add operation through ReRAM crossbar.
  • the neural network circuit provided by the embodiment of the present invention includes multiple NN chips, each NN chip includes multiple tile tiles, and each tile includes multiple processing devices PE, and each PE Including multiple engine engines, each engine is realized by one or more ReRAM crossbars.
  • The neural network system provided by the embodiment of the present invention may include multi-level computing nodes, for example, four levels of computing nodes: the first-level computing node is the chip 115, the second-level computing node is a tile within the chip, the third-level computing node is the PE in the tile, and the fourth-level computing node is the engine in the PE.
  • the neural network system may include multiple neural network layers.
  • the neural network layer is a logical layer concept.
  • a neural network layer refers to performing a neural network operation once.
  • Each layer of neural network computing is implemented by computing nodes.
  • the neural network layer may include a convolution layer, a pooling layer, and the like.
  • the neural network system may include n neural network layers (also called n-layer neural networks), where n is an integer greater than or equal to 2.
  • FIG. 3 shows some neural network layers in the neural network system. As shown in FIG. 3, the neural network system may include a first layer 302, a second layer 304, a third layer 306, a fourth layer 308, and a fifth layer 310, up to an nth layer 312.
  • the first layer 302 can perform a convolution operation
  • the second layer 304 can perform a pooling operation on the output data of the first layer 302
  • the third layer 306 can perform a convolution operation on the output data of the second layer 304
  • The fourth layer 308 may perform a convolution operation on the output result of the third layer 306, and the fifth layer 310 may perform a sum operation on the output data of the second layer 304 and the output data of the fourth layer 308, and so on. It can be understood that FIG. 3 is only an example: the fourth layer 308 may also perform a pooling operation, and the fifth layer 310 may also perform other neural network operations such as convolution operations or pooling operations.
  • In some neural network systems, the calculation result of the i-th layer is temporarily stored in a preset buffer, and when the (i+1)-th layer calculation is performed, the calculation unit needs to reload the calculation result of the i-th layer and the weight of the (i+1)-th layer from the preset buffer, where the i-th layer is any layer in the neural network system.
  • In the embodiment of the present invention, a ReRAM crossbar is used in the engine of the neural network system. Because ReRAM has the advantage of integrating storage and calculation, the weights can be configured on the ReRAM before calculation, and the calculation results can be directly sent to the next layer for pipelined calculation.
  • each layer of neural network only needs to cache very little data.
  • each layer of neural network only needs to cache enough input data for one window calculation.
  • an embodiment of the present invention provides a method for streaming data through a neural network. For clarity of description, the following briefly introduces the stream processing of the neural network system in conjunction with the convolutional neural network system of FIG. 1.
  • FIG. 4 takes the division of tiles 125 in the neural network system shown in FIG. 1 as an example to illustrate different sets of computing nodes that implement neural network computing at different layers in the embodiment of the present invention.
  • multiple tiles 125 in the chip 115 may be divided into multiple node sets. For example: first node set 402, second node set 404, third node set 406, fourth node set 408, and fifth node set 410.
  • each node set includes at least one computing node (for example, tile 125).
  • the computing nodes of the same node set are used to perform neural network operations on the data entering the same neural network layer, and the data of different neural network layers are processed by the computing nodes of different node sets.
  • the processing results of a computing node will be transmitted to the computing nodes in other node sets for processing.
  • This pipelined processing method means that each layer of the neural network only needs to cache very little data, and allows multiple computing nodes to concurrently process the same data stream, improving processing efficiency.
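  • As a purely illustrative sketch (not part of the patent) of this pipelined, window-at-a-time processing, the following Python snippet chains two stages so that each stage only buffers one window or a small pooling buffer before passing results downstream; the stage functions, window sizes, and data are invented for illustration.
```python
from typing import Iterable, Iterator

def conv_stage(windows: Iterable[list[float]], kernel: list[float]) -> Iterator[float]:
    """First node set: consumes one input window at a time and emits one result per window."""
    for window in windows:                       # only the current window is buffered
        yield sum(x * w for x, w in zip(window, kernel))

def pool_stage(values: Iterable[float], size: int = 2) -> Iterator[float]:
    """Second node set: pools results as they arrive from the previous stage."""
    buf: list[float] = []
    for v in values:
        buf.append(v)
        if len(buf) == size:
            yield max(buf)                       # pools over a tiny buffer, never the full feature map
            buf.clear()

# Results flow stage to stage; no stage stores the previous stage's full output.
windows = ([1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.5, 0.5, 0.5], [3.0, 1.0, 0.0])
print(list(pool_stage(conv_stage(windows, kernel=[0.2, 0.5, 0.3]))))
```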
  • FIG. 4 uses tiles as an example to illustrate a set of computing nodes used to process different neural network layers (such as convolutional layers). In actual applications, because a tile contains multiple PEs, each PE contains multiple Engines, and different application scenarios require different amounts of calculation.
  • the computing nodes in the neural network system can be divided with the granularity of PE, Engine or chip, so that the computing nodes in different sets are used to handle the operations of different neural network layers.
  • the computing node referred to in the embodiment of the present invention may be Engine, PE, tile, or chip.
  • When a computing node (for example, tile 125) performs a neural network operation (for example, a convolution calculation), it may calculate the data input to the computing node based on the weight of the corresponding neural network layer.
  • a certain tile 125 may perform a convolution operation on the input data input to the tile 125 based on the weight of the corresponding convolution layer, for example, perform a matrix multiply-add calculation on the weight and the input data.
  • the weight is usually used to indicate the importance of the input data to the output data.
  • The weights are usually represented by a matrix. As shown in FIG. 9, a weight matrix of j rows and k columns is illustrated, and each element in the weight matrix represents a weight value.
  • Since the computing nodes of a node set are used to perform the operation of one neural network layer, the computing nodes of the same node set may share weights, and the computing nodes in different node sets may have different weights.
  • the weights in each computing node can be configured in advance. Specifically, each element in a weight matrix is configured in the ReRAM cell in the corresponding crossbar array, so that the matrix multiply-add operation of the input data and the configured weight can be implemented through the crossbar array. In the follow-up, we will briefly introduce how to implement matrix multiply-add operation through crossbar.
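  • As a rough software model of how a crossbar array realizes a matrix multiply-add (an illustrative sketch under simplified assumptions, not the patent's circuit), the snippet below treats each ReRAM cell as a stored conductance holding one weight element; the input vector is applied on the rows, and each column accumulates conductance-times-input products, yielding one element of the result.
```python
import numpy as np

def program_crossbar(weight_matrix: np.ndarray) -> np.ndarray:
    """Configure each weight-matrix element into one ReRAM cell (modeled as stored conductances)."""
    return weight_matrix.astype(float)           # j rows x k columns of cell conductances

def crossbar_multiply_add(conductances: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """Each output column k sums conductances[j, k] * inputs[j] over all rows j,
    which is exactly one multiply-add of the input data with the configured weight."""
    return inputs @ conductances

weights = np.array([[0.1, 0.4],
                    [0.2, 0.5],
                    [0.3, 0.6]])                 # j = 3 rows, k = 2 columns
cells = program_crossbar(weights)
print(crossbar_multiply_add(cells, np.array([1.0, 2.0, 3.0])))   # [1.4  3.2]
```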
  • the computing nodes in the neural network may be divided into a set of nodes for processing different neural network layers, and corresponding weights are configured.
  • computing nodes of different node sets can perform corresponding calculations according to the configured weights.
  • the computing nodes of each node set can send the computing results to the computing nodes used to perform the next layer of neural network operations.
  • A person skilled in the art may know that, in the process of realizing neural network stream processing, if the computing resources for performing the operations of different neural network layers do not match, for example, if there are fewer computing resources for performing the operations of the previous neural network layer and relatively more computing resources for performing the operations of the next neural network layer, the computing resources of the computing nodes of the next layer will be wasted.
  • Embodiments of the present invention provide a computing resource allocation method for allocating computing nodes that perform operations of different neural network layers, so that the computing power of the computing nodes used to perform the operations of two adjacent neural network layers in the neural network system matches. This improves the data processing efficiency in the neural network system without wasting computing resources.
  • FIG. 5 is a flowchart of a method for computing resource allocation in a neural network system according to an embodiment of the present invention. This method can be applied to the neural network system shown in FIG. 1. This method may be implemented by the host computer 105 when deploying a neural network or when configuring a neural network system. Specifically, it may be implemented by the processor 1052 in the host computer 105. As shown in FIG. 5, the method may include the following steps.
  • the network model information of the neural network system is obtained.
  • the network model information includes the first output data amount of the first neural network layer and the second output data amount of the second neural network layer in the neural network system.
  • Network model information can be determined according to actual application requirements. For example, the total number of neural network layers and the algorithm of each layer can be determined according to the application scenario of the neural network system.
  • the network model information may include the total number of neural network layers in the neural network system, the algorithm of each layer, and the data output of each layer of the neural network.
  • the algorithm refers to a neural network operation that needs to be performed.
  • The algorithm may refer to a convolution operation, a pooling operation, and so on. As shown in FIG. 3, the neural network system may have n neural network layers, where n is an integer not less than 2.
  • the first neural network layer and the second neural network layer may be two layers in the n layer that are operationally dependent.
  • the two neural network layers having a dependency relationship mean that the input data of one neural network layer includes the output data of another neural network layer.
  • Two neural network layers with dependencies can also be referred to as adjacent layers.
  • the output data of the first layer 302 is the input data of the second layer 304, therefore, the first layer 302 and the second layer 304 have a dependency relationship.
  • the output data of the second layer 304 is the input data of the third layer 306, the input data of the fifth layer 310 includes the output data of the second layer 304, therefore, the second layer 304 and the third layer 306 have a dependency relationship, the second layer 304 and the fifth layer 310 also have a dependency relationship.
  • In the embodiment of the present invention, the first layer 302 shown in FIG. 3 is taken as the first neural network layer and the second layer 304 is taken as the second neural network layer as an example for description.
  • According to the deployment requirements of the neural network system, the first output data amount, and the second output data amount, the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer are determined.
  • N and M are both positive integers
  • the ratio of N and M corresponds to the ratio of the first output data volume and the second output data volume.
  • the deployment requirements may include the calculation delay of the neural network system, or may include the number of chips required to be deployed by the neural network system.
  • the operation of the neural network is mainly to perform matrix multiply-add operations.
  • The output data of each layer of the neural network is also a one-dimensional or multi-dimensional real matrix. Therefore, the first output data amount includes the number of rows and columns of the output data of the first neural network layer, and the second output data amount includes the number of rows and columns of the output data of the second neural network layer.
  • a computing node when performing a convolution operation or a pooling operation, it is necessary to perform a multiply-add calculation on the input data and the weight of the corresponding neural network layer. Since the weights are configured on the cells in the crossbar, the crossbars in the calculation unit perform calculations on the input data in parallel, so the number of weights can determine the parallel computing capabilities of multiple calculation units that perform neural network operations. In another way of expression, the computing power of the computing node performing the neural network operation is determined by the number of weights configured in the computing unit performing the neural network operation.
  • In the embodiment of the present invention, the number of weights to be configured for the first neural network layer and the second neural network layer may be determined based on the specific deployment requirements, the first output data amount, and the second output data amount. Since the weights of different neural network layers are not necessarily the same, for clarity of description, in the embodiments of the present invention, the weights required for the operation of the first neural network layer are called first weights, and the weights required for the operation of the second neural network layer are called second weights.
  • Performing the first neural network layer operation means that the computing node performs the corresponding calculation on the data input to the first neural network layer based on the first weights, and performing the second neural network layer operation means that the computing node performs the corresponding calculation on the data input to the second neural network layer based on the second weights. The calculations here can be neural network operations such as convolution or pooling operations.
  • the number of weights to be configured for each layer of the neural network includes the number N of first weights to be configured by the first neural network layer and the number M of second weights to be configured by the second neural network layer.
  • the weight refers to a weight matrix.
  • the number of weights refers to the number of weight matrices required, or the number of copies of weights.
  • the number of weights can also be understood as how many identical weight matrices need to be configured.
  • When the deployment requirement includes a calculation delay, the output data amount of the first layer (i.e., the starting layer of all neural network layers in the neural network system), the set calculation delay, and the calculation frequency of the ReRAM crossbars used in the neural network system may be used to determine the number of weights that need to be configured for the first-layer neural network, and then the number of weights that need to be configured for each layer of the neural network is determined according to the number of weights that need to be configured for the first-layer neural network and the output data amount of each layer of the neural network.
  • Specifically, the number of weights required for the first-layer (i.e., the starting-layer) neural network can be obtained according to the following Formula 1:
  N_1 = ⌈(H_1 × W_1) / (t × f)⌉  (Formula 1)
  where N_1 is the number of weights required for the first-layer (i.e., the starting-layer) neural network, H_1 is the number of rows of the output data of the first-layer (i.e., the starting-layer) neural network, W_1 is the number of columns of the output data of the first-layer (i.e., the starting-layer) neural network, t is the set calculation delay, and f is the calculation frequency of the crossbars in the calculation unit.
  • The data volume of the output data of the first-layer neural network can be obtained according to the network model information obtained in step 502. It can be understood that, when the first neural network layer is the starting layer of all neural network layers in the neural network system, the number N of the first weights is the value of N_1 calculated according to Formula 1.
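  • The following is a small numeric sketch of Formula 1 as reconstructed above; the helper name and the example figures (output map size, latency target, crossbar frequency) are illustrative assumptions rather than values from the patent.
```python
import math

def starting_layer_weight_count(out_rows: int, out_cols: int,
                                delay_s: float, crossbar_freq_hz: float) -> int:
    """Formula 1: enough weight copies so that out_rows * out_cols output elements
    can be produced within the delay t when each copy works at frequency f."""
    return math.ceil((out_rows * out_cols) / (delay_s * crossbar_freq_hz))

# Example: a 224 x 224 output map, a 0.1 ms latency target, crossbars at 100 MHz.
print(starting_layer_weight_count(224, 224, delay_s=1e-4, crossbar_freq_hz=100e6))   # 6
```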
  • the ratio of the number of weights required by the adjacent two layers can be made to correspond to the ratio of the output data amount of the adjacent two layers.
  • For example, the ratio can be the same. Therefore, in the embodiment of the present invention, the number of weights required by each layer of the neural network can be determined according to the number of weights required by the starting-layer neural network and the output data amount of each layer of the neural network. Specifically, the number of weights required for each layer of the neural network can be calculated according to the following Formula 2:
  N_{i-1} / N_i = (H_{i-1} × W_{i-1}) / (H_i × W_i)  (Formula 2)
  where the value of i can be from 2 to n, n is the total number of neural network layers in the neural network system, N_i is the number of weights required by the i-th layer neural network, and H_i and W_i are respectively the number of rows and columns of the output data of the i-th layer neural network. That is, the ratio of the number of weights required to perform the operation of the (i-1)-th layer neural network to the number of weights required to perform the operation of the i-th layer neural network corresponds to the ratio of the output data volume of the (i-1)-th layer to the output data volume of the i-th layer.
  • the output data of each neural network layer may include multiple channels (channel), where the channel refers to the number of kernels in each neural network layer.
  • A kernel represents a feature extraction method and corresponds to a feature map; multiple feature maps constitute the output data of this layer.
  • the weight used by a neural network layer includes multiple kernels. Therefore, in practical applications, in another situation, the output data volume of each layer can also consider the number of channels of each layer of the neural network. Specifically, after obtaining the number of weights required for the first neural network layer according to the above formula 1, the number of weights required for each layer of neural network can be obtained according to the following formula 3:
  N_{i-1} / N_i = (H_{i-1} × W_{i-1} × C_{i-1}) / (H_i × W_i × C_i)  (Formula 3)
  • Formula 3 further considers, on the basis of Formula 2, the number of channels output by each layer of the neural network, where C_{i-1} is used to represent the number of channels of the (i-1)-th layer, C_i is used to represent the number of channels of the i-th layer, the value of i is from 2 to n, and n is the total number of neural network layers in the neural network system, where n is an integer not less than 2.
  • the number of channels of each layer of neural network can be obtained from the network model information.
  • After the number of weights required for the starting layer is obtained according to the above Formula 1, the number of weights required for each layer of the neural network can be calculated according to Formula 2 (or Formula 3) and the output data amount of each layer of the neural network included in the network model information.
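  • The per-layer weight counts of Formula 2 (or Formula 3, when channels are included in the volumes) can then be propagated from the starting layer, as in this illustrative sketch; the layer output volumes and starting count are made-up example values.
```python
import math

def weights_per_layer(n_start: int, output_volumes: list[int]) -> list[int]:
    """Formula 2/3: keep each layer's weight count proportional to its output data volume
    relative to the starting layer. output_volumes[i] = rows * cols (optionally * channels)."""
    return [n_start] + [math.ceil(n_start * v / output_volumes[0]) for v in output_volumes[1:]]

# Example: three layers whose output volumes shrink as the network gets deeper.
volumes = [224 * 224 * 64, 112 * 112 * 128, 56 * 56 * 256]
print(weights_per_layer(6, volumes))   # [6, 3, 2]
```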
  • When the above-mentioned first neural network layer is the starting layer of all neural network layers in the neural network system, the number N of first weights is obtained according to Formula 1, and then the number M of second weights required by the second neural network layer can be calculated according to Formula 2, based on the value of N, the first output data amount, and the second output data amount.
  • In another possible case, for example when the deployment requirement is the number of chips required by the neural network system, the number of weights required by the first-layer neural network can be calculated by combining the following Formula 4 with the foregoing Formula 2, or by combining the following Formula 4 with the foregoing Formula 3:
  xb_1 × N_1 + xb_2 × N_2 + … + xb_n × N_n = K × L  (Formula 4)
  where xb_1 is used to represent the number of crossbars required to deploy one weight of the first-layer (or starting-layer) neural network, and N_1 is used to represent the number of weights required for the starting layer; xb_2 is used to represent the number of crossbars required to deploy one weight of the second-layer neural network, and N_2 is used to represent the number of weights required for the second-layer neural network; xb_n is used to represent the number of crossbars required to deploy one weight of the n-th layer neural network, and N_n is used to represent the number of weights required for the n-th layer neural network; K is the number of chips of the neural network system required by the deployment requirement; and L is the number of crossbars in each chip.
  • The network model information of the neural network system also includes the size of one weight used by each neural network layer and the crossbar specification information. Therefore, in the embodiment of the present invention, the xb_i of the i-th layer neural network can be obtained according to the weight of each layer (i.e., the number of rows and columns of the weight matrix) and the specifications of the crossbar, where i takes values from 1 to n.
  • the value of L can be obtained from the parameters of the chip used by the neural network system.
  • After the number of weights required by the starting-layer neural network (i.e., N_1) is obtained according to Formula 4 and Formula 2 above, the number of weights that need to be configured for each layer can be obtained according to Formula 2 and the output data amount of each layer obtained from the network model information.
  • Similarly, after the number of weights required by the starting-layer neural network is obtained according to Formula 4 and Formula 3 above, the number of weights required by each layer can also be obtained according to Formula 3 and the output data amount of each layer.
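  • When the deployment requirement is a chip budget rather than a delay, Formula 4 combined with Formula 2 can be solved for the starting-layer count, roughly as sketched below; all concrete numbers (chips, crossbars per chip, crossbars per weight, output volumes) are invented for illustration.
```python
def starting_count_from_chip_budget(num_chips: int, crossbars_per_chip: int,
                                    xb_per_weight: list[int], output_volumes: list[int]) -> int:
    """Formula 4: xb_1*N_1 + ... + xb_n*N_n <= K*L, with Formula 2 forcing
    N_i = N_1 * volume_i / volume_1; solve for the largest integer N_1."""
    total_crossbars = num_chips * crossbars_per_chip                    # K * L
    crossbars_per_unit_n1 = sum(xb * vol / output_volumes[0]           # crossbars consumed per unit of N_1
                                for xb, vol in zip(xb_per_weight, output_volumes))
    return int(total_crossbars // crossbars_per_unit_n1)

# Example: 4 chips with 1024 crossbars each, and a 3-layer network.
n1 = starting_count_from_chip_budget(4, 1024,
                                     xb_per_weight=[4, 8, 16],
                                     output_volumes=[50176, 12544, 3136])
print(n1)   # 585
```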
  • The N first weights are deployed on P calculation units, and the M second weights are deployed on Q calculation units.
  • P and Q are both positive integers
  • the P computing units are used to perform operations of the first neural network layer
  • the Q computing units are used to perform operations of the second neural network layer.
  • the calculation specification of the calculation unit refers to the number of crossbars included in one calculation unit.
  • a computing unit may include one or more crossbars. Specifically, as mentioned above, since the network model information of the neural network system further includes the size of one weight used by each neural network layer and the specification information of the crossbar, the deployment relationship between one weight and the crossbar can be obtained.
  • the weights of each layer may be deployed on the corresponding number of calculation units according to the number of crossbars included in each calculation unit.
  • the elements in the weight matrix are respectively configured in the ReRAM cells of the crossbar of the calculation unit.
  • the computing unit may refer to a PE or an engine, one PE may include multiple engines, and one engine may include one or more crossbars. Since the weight of each layer may be different, a weight can be deployed on one or more engines.
  • In this way, it can be determined that the N first weights of the first neural network layer are to be deployed on P computing units and the M second weights are to be deployed on Q computing units.
  • the elements in the N first weights are respectively allocated to the corresponding crossbar ReRAM cells in the P calculation units.
  • the elements in the M second weights are respectively allocated to the corresponding crossbar ReRAM cells in the Q calculation units.
  • In this way, the P computing units may perform the operation of the first neural network layer on the input data input to the P computing units based on the configured N first weights, and the Q computing units may perform the operation of the second neural network layer on the input data input to the Q computing units based on the configured M second weights.
  • The resource allocation method provided by the embodiment of the present invention takes into account the amount of data output by adjacent neural network layers when configuring the computing units that perform each layer of neural network operations, so that the computing power of the computing nodes performing the operations of different neural network layers matches. Thus, the computing power of the computing nodes can be fully utilized and the efficiency of data processing improved.
  • the transmission bandwidth between computing units or computing nodes is saved.
  • the computing unit can be mapped to the superior computing node of the computing unit according to the following method.
  • the neural network system may include four-level computing nodes: a first-level computing node chip, a second-level computing node tile, a third-level computing node PE, and a fourth-level computing node engine.
  • FIG. 6 describes in detail how to map the P computing units on which the N first weights need to be deployed and the Q computing units on which the M second weights need to be deployed to their superior computing nodes.
  • This method can still be implemented by the host 105 in the neural network system shown in FIGS. 1 and 1A. As shown in FIG. 6, the method may include the following steps.
  • the network model information of the neural network system is obtained.
  • the network model information includes the first output data amount of the first neural network layer and the second output data amount of the second neural network layer in the neural network system.
  • In step 604, according to the deployment requirements of the neural network system, the first output data amount, and the second output data amount, the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer are determined.
  • In step 606, according to the calculation specifications of the calculation units in the neural network system, the P calculation units on which the N first weights need to be deployed and the Q calculation units on which the M second weights need to be deployed are determined.
  • For steps 602, 604, and 606, reference may be made to the related descriptions of the foregoing steps 502, 504, and 506, respectively.
  • In the method shown in FIG. 6, after the P computing units on which the N first weights are to be deployed and the Q computing units on which the M second weights are to be deployed are determined in step 606, the N first weights are not directly deployed to the P computing units and the M second weights are not directly deployed to the Q computing units; instead, step 608 is entered.
  • In step 608, the P computing units and the Q computing units are mapped to multiple third-level computing nodes according to the number of computing units included in each third-level computing node in the neural network system.
  • FIG. 6A is a flowchart of a resource mapping method according to an embodiment of the present invention. FIG. 6A takes the computing unit being the fourth-level computing node engine as an example, and describes how to map the engines to the third-level computing nodes PE. As shown in FIG. 6A, the method may include the following steps.
  • The P computing units and the Q computing units are divided into m groups, and each group includes P/m computing units for executing the first neural network layer and Q/m computing units for executing the second neural network layer.
  • m is an integer not less than 2, and the values of P/m and Q/m are both integers.
  • For example, the P computing units are taken as the computing units performing the (i-1)-th layer operation, and the Q computing units are taken as the computing units performing the i-th layer operation.
  • each group of computing units is mapped to the third-level computing node.
  • During the mapping process, the computing units that perform the operations of adjacent neural network layers are mapped to the same third-level node as far as possible.
  • each first-level computing node chip includes eight second-level computing node tiles, and each tile includes two third-level computing nodes PE, and each PE includes 4 engines.
  • For the computing units in the first group, the four engines of the (i-1)-th layer can be mapped to one third-level computing node PE (such as PE1 in FIG. 7), and the two engines of the i-th layer and the two engines of the (i+1)-th layer are together mapped to one third-level computing node PE (such as PE2 in FIG. 7).
  • Similar to the mapping method for the computing units in the first group, for the computing units in the second group, the four engines of the (i-1)-th layer can be mapped to PE3, and the two engines of the i-th layer and the two engines of the (i+1)-th layer are together mapped to PE4.
  • the computing units of other groups can be mapped in a mirrored manner according to the mapping method of the first group.
  • In this way, the computing units that execute adjacent neural network layers can be mapped to the same third-level computing node as much as possible. Therefore, when the output data of the i-th layer is sent to the computing units of the (i+1)-th layer, it only needs to be transmitted within the same third-level node (PE) and does not need to occupy the bandwidth between third-level nodes, which can improve the data transmission speed and reduce the transmission bandwidth consumption between nodes.
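  • A schematic sketch of this grouping-and-mapping step follows, under the FIG. 7-style assumption that each PE holds 4 engines and the per-group engine counts divide evenly; the group count, engine names, and layer sizes here are illustrative assumptions only.
```python
def map_engines_to_pes(layer_engines: list[list[str]], m_groups: int,
                       engines_per_pe: int = 4) -> list[list[str]]:
    """Split each layer's engines into m groups, then pack every group's engines
    (adjacent layers kept together) into PEs of a fixed size."""
    pes: list[list[str]] = []
    for g in range(m_groups):
        group: list[str] = []
        for engines in layer_engines:                         # adjacent-layer engines share a group
            per_group = len(engines) // m_groups
            group += engines[g * per_group:(g + 1) * per_group]
        pes += [group[k:k + engines_per_pe] for k in range(0, len(group), engines_per_pe)]
    return pes

# FIG. 7-style example: 8 engines for layer i-1, 4 for layer i, 4 for layer i+1, m = 2 groups.
layers = [[f"Li-1_e{j}" for j in range(8)],
          [f"Li_e{j}" for j in range(4)],
          [f"Li+1_e{j}" for j in range(4)]]
for pe_id, engines in enumerate(map_engines_to_pes(layers, m_groups=2), start=1):
    print(f"PE{pe_id}: {engines}")
```
  • With these example numbers, PE1 and PE3 each hold four layer-(i-1) engines, while PE2 and PE4 each hold two layer-i engines together with two layer-(i+1) engines, mirroring the mapping described above.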
  • In step 610, according to the number of third-level computing nodes included in each second-level computing node in the neural network system, the multiple third-level computing nodes to which the P computing units and the Q computing units are mapped are further mapped to multiple second-level computing nodes.
  • In step 612, according to the number of second-level computing nodes included in each neural network chip, the multiple second-level computing nodes to which the P computing units and the Q computing units are mapped are mapped to multiple neural network chips.
  • FIG. 6A takes mapping the engines that perform the i-th layer operation to the third-level computing nodes as an example. Similarly, according to the method shown in FIG. 6A, the third-level nodes can also be mapped to the second-level nodes, and the second-level nodes can be mapped to the first-level nodes.
  • PE1 performing the operation of the i-1 layer and PE2 performing the operations of the i-th layer and the i+1-th layer may be mapped into the same second-level computing node Tile1.
  • PE3 performing the operation of the i-1 layer and PE4 performing the operations of the i-th layer and the i+1th layer can be further mapped into the same second-level computing node Tile2.
  • the operations Tile1 and Tile2 that perform the i-1th layer, the ith layer, and the i+1th layer can also be mapped into the same chip chip1. In this way, the mapping relationship from the first-level computing node chip to the fourth-level computing node engine in the neural network system can be obtained.
  • Further, according to the mapping relationships to the multiple third-level computing nodes, the multiple second-level computing nodes, and the multiple first-level computing nodes, the N first weights and the M second weights are respectively deployed to the corresponding P calculation units and Q calculation units.
  • the mapping relationship from the first-level computing node chip to the fourth-level computing node engine in the neural network system can be obtained according to the methods described in FIGS. 6A and 7. For example, a mapping relationship between the P computing units and the Q computing units and the multiple third-level nodes, multiple second-level computing nodes, and multiple first-level computing nodes may be obtained, respectively.
  • the weights of the corresponding neural network layer can be deployed to the computing units of the computing nodes at all levels according to the obtained mapping relationship.
  • the N weights of the i-1th layer can be deployed in the four computing units corresponding to chip1, tile1, and PE1 and the four computing units corresponding to chip1, tile2, and PE3, respectively.
  • the M second weights of the i-th layer are respectively deployed to two computing units corresponding to chip1, tile1 and PE2 and two computing units corresponding to chip1, tile2 and PE4.
  • the N weights of the i-1 layer are respectively deployed in the four computing units in chip1—>tile1—>PE1 and the four computing units in chip1—>tile2—>PE3.
  • the M weights of the i-th layer are respectively deployed in two computing units in chip1—>tile1—>PE2 and two computing units in chip1—>tile2—>PE4.
  • In this way, not only can the computing power of the computing units supporting the operations of adjacent neural network layers in the neural network system described in the embodiments of the present invention be matched, but also as many as possible of the computing units performing the operations of adjacent neural network layers can be located in the same third-level computing node, as many as possible of the third-level computing nodes executing adjacent neural network layers can be located in the same second-level computing node, and as many as possible of the second-level computing nodes executing adjacent neural network layers can be located in the same first-level computing node. This can reduce the amount of data transmitted between computing nodes and increase the speed of data transmission between different neural network layers.
  • a fourth-level computing node engine is used as a computing unit to describe a process of allocating computing resources for performing operations of different neural network layers.
  • the above embodiment divides the set of operations that perform different neural network layers with the engine as the granularity.
  • In actual applications, the third-level computing node PE can also be used as the computing unit for allocation.
  • In this case, the mapping between the third-level computing node PE and the second-level computing node tile and the first-level computing node chip can be established according to the above method.
  • the calculation unit may be engine, PE, tile, or chip, which is not limited herein.
  • FIG. 8 is a flowchart of a data processing method according to an embodiment of the present invention. This method is applied to the neural network system shown in FIG. 1, and the neural network system shown in FIG. 1 is configured by the method shown in FIGS. 5-7 to allocate computing resources for performing different neural network layer operations. As shown in FIG. 8, the method may be implemented by the neural network circuit shown in FIG. 1. The method may include the following steps.
  • P computing units in the neural network system receive first input data.
  • the P computing units are used to perform the first neural network layer operation of the neural network system.
  • the first neural network layer is any layer in the neural network system.
  • the first input data is data that needs to perform the operation of the first neural network layer.
  • When the first neural network layer is the first layer of the neural network system (for example, the first layer 302 shown in FIG. 3), the first input data may be the data initially input to the neural network system.
  • When the first neural network layer is not the first layer of the neural network system, the first input data may be output data that has been processed by other neural network layers.
  • the P calculation units perform calculation on the first input data according to the configured N first weights to obtain first output data.
  • the first weight is a weight matrix.
  • the N first weights refer to N weight matrices, and the N first weights may also be referred to as N first weight copies.
  • The N first weights may be configured in the P computing units according to the method shown in FIGS. 5-7. Specifically, the elements of the first weights are respectively configured into the ReRAM cells of the crossbars included in the P computing units, so that the crossbars in the P computing units can compute on the input data in parallel based on the N first weights, making full use of the computing power of the crossbars in the P computing units.
  • After receiving the first input data, the P computing units may perform a neural network operation on the received first input data based on the configured N first weights to obtain the first output data.
  • For example, the crossbars in the P computing units may perform a matrix multiply-add operation on the first input data and the configured first weights.
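  • A minimal functional sketch of this parallelism (the helper parallel_layer and all sizes below are assumptions, and numpy merely stands in for the analog crossbar arithmetic) might look as follows:
```python
import numpy as np

rng = np.random.default_rng(1)

def parallel_layer(inputs, weight, n_copies):
    """The n_copies identical weight replicas let n_copies input vectors be
    multiplied in the same cycle; that concurrency is modelled here simply by
    consuming the batch in chunks of n_copies rows."""
    outputs = []
    for start in range(0, len(inputs), n_copies):
        chunk = inputs[start:start + n_copies]   # rows processed concurrently
        outputs.append(chunk @ weight)           # matrix multiply-add per copy
    return np.concatenate(outputs)

first_input = rng.normal(size=(12, 16))          # 12 input vectors of length 16
W1 = rng.normal(size=(16, 8))                    # one first weight (weight matrix)
first_output = parallel_layer(first_input, W1, n_copies=4)   # N = 4 copies
print(first_output.shape)                        # (12, 8)
```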
  • the Q computing units in the neural network system receive second input data.
  • the Q calculation units are used to perform a second neural network layer operation of the neural network system, and the second input data includes the first output data.
  • Specifically, in one case, the Q computing units may perform the second neural network layer operation only on the first output data of the P computing units.
  • For example, the P computing units are used to perform the operations of the first layer 302 shown in FIG. 3, and the Q computing units are used to perform the operations of the second layer 304 shown in FIG. 3.
  • In this case, the second input data is the first output data.
  • In another case, the Q computing units may also be used to jointly perform the second neural network layer operation on the first output data of the first neural network layer and on the output data of other neural network layers.
  • For example, the P computing units may be used to perform the neural network operation of the second layer 304 shown in FIG. 3, and the Q computing units may be used to perform the neural network operation of the fifth layer 310 shown in FIG. 3.
  • In this case, the Q computing units are used to perform operations on the output data of the second layer 304 and of the fourth layer 308, and the second input data includes the first output data and the output data of the fourth layer 308.
  • the Q calculation units perform calculation on the second input data according to the configured M second weights to obtain second output data.
  • the second weight is also a weight matrix.
  • The M second weights refer to M weight matrices, and the M second weights may also be referred to as M second weight copies.
  • Similar to step 804, the second weights may be configured into the ReRAM cells of the crossbars included in the Q computing units according to the method shown in FIG. 5.
  • the Q calculation units may perform a neural network operation on the received second input data based on the configured M second weights to obtain the second output data.
  • the crossbar in the Q calculation units may perform a matrix multiply-add operation on the second input data and the configured second weight.
  • the ratio of N and M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
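  • The following sketch shows one way such proportional copy counts could be derived. It is only an illustration under two assumptions: the starting layer gets enough copies to emit its output volume within the latency budget, and each later layer's count is scaled by the ratio of adjacent output volumes; copies_for_layers and the example numbers are hypothetical.
```python
from math import ceil

def copies_for_layers(out_rows, out_cols, latency_s, crossbar_freq_hz):
    """Index 0 is the starting layer. Its copy count covers the output volume
    within the latency budget; every later layer is scaled by the ratio of
    adjacent layers' output volumes, so N/M tracks the output data volumes."""
    copies = [ceil(out_rows[0] * out_cols[0] / (latency_s * crossbar_freq_hz))]
    for i in range(1, len(out_rows)):
        ratio = (out_rows[i] * out_cols[i]) / (out_rows[i - 1] * out_cols[i - 1])
        copies.append(max(1, round(copies[i - 1] * ratio)))
    return copies

# Hypothetical numbers: 224x224 then 112x112 outputs, a 1 ms budget, 10 MHz crossbars.
print(copies_for_layers([224, 112], [224, 112], 1e-3, 10e6))   # -> [6, 2]
```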
  • For clarity, the following briefly describes how a ReRAM crossbar implements the matrix multiply-add operation. As shown in FIG. 9, the weight matrix of j rows and k columns may be one weight of a neural network layer, and each element in the weight matrix represents a weight value.
  • FIG. 10 is a schematic structural diagram of a ReRAM crossbar in a computing unit provided by an embodiment of the present invention.
  • the ReRAM crossbar may be simply referred to as a crossbar in this embodiment of the present invention.
  • As shown in FIG. 10, the crossbar includes multiple ReRAM cells, such as G1,1, G2,1, and so on, and the multiple ReRAM cells constitute a neural network matrix.
  • During configuration of the neural network, the weight shown in FIG. 9 can be written into the crossbar through the crossbar bit lines (input port 1002 shown in FIG. 10), so that each element of the weight is configured into the corresponding ReRAM cell. For example, the weight element W0,0 in FIG. 9 is configured in G1,1 in FIG. 10, the weight element W1,0 in FIG. 9 is configured in G2,1 in FIG. 10, and so on.
  • Each weight element corresponds to one ReRAM cell.
  • When a neural network computation is performed, the input data is fed into the crossbar through the crossbar word lines (input port 1004 shown in FIG. 10).
  • The input data can be expressed as voltages, so that the input data and the weight values configured in the ReRAM cells are dot-multiplied, and the calculated result is obtained in the form of output voltages from the output terminal of each column of the crossbar (output port 1006 shown in FIG. 10).
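  • A minimal numeric sketch of this behaviour (illustrative values only; in the device the dot product arises from currents through the cell conductances, while numpy only reproduces the arithmetic):
```python
import numpy as np

# The weight matrix is held as cell conductances G; the input is applied as
# word-line voltages V; each column's output is the dot product of V with
# that column, which numpy reproduces as a vector-matrix product.
G = np.array([[0.2, 0.5],          # G[r, c] plays the role of W r,c in FIG. 9
              [0.1, 0.3],
              [0.4, 0.7]])
V = np.array([1.0, 0.5, 0.25])     # one input vector, expressed as voltages

I = V @ G                          # per-column outputs read at port 1006
print(I)                           # [0.35  0.825]
```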
  • Because the amount of data output by adjacent neural network layers is taken into account when configuring the computing units that perform each neural network layer's operations in the neural network system, the computing power of the computing nodes that perform different neural network layer operations can be matched. Therefore, the data processing method provided by the embodiment of the present invention can make full use of the computing power of the computing nodes and improve the data processing efficiency of the neural network system.
  • an embodiment of the present invention provides a resource allocation apparatus.
  • The device can be applied to the neural network system shown in FIG. 1 and FIG. 1A and is used to allocate the computing nodes that perform the operations of different neural network layers, so that the computing power of the computing nodes used to perform the operations of two adjacent neural network layers in the neural network system is matched, which improves the data processing efficiency in the neural network system and avoids wasting computing resources.
  • the resource allocation device may be located in the host, may be implemented by a processor in the host, or may be a physical device that exists independently of the processor. For example, it can be used as a processor-independent compiler.
  • the resource allocation apparatus 1100 may include an acquisition module 1102, a calculation module 1104, and a deployment module 1106.
  • An obtaining module 1102, configured to obtain the data amount of the first output data of the first neural network layer and the data amount of the second output data of the second neural network layer in the neural network system, where the input data of the second neural network layer includes the first output data.
  • the calculation module 1104 is configured to determine N first weights to be configured for the first neural network layer and M second weights to be configured for the second neural network layer according to deployment requirements of the neural network system. Wherein, N and M are both positive integers, and the ratio of N and M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  • the neural network system includes multiple neural network chips, each neural network chip includes multiple computing units, and each computing unit includes at least one resistive random access memory cross matrix ReRAM crossbar .
  • In one case, the deployment requirement includes a computation latency. When the first neural network layer is the starting layer of all neural network layers in the neural network system, the calculation module is configured to determine the value of N according to the data volume of the first output data, the computation latency, and the computation frequency of the ReRAM crossbar in the computing unit, and to determine the value of M according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  • In another case, the deployment requirement includes the number of chips of the neural network system, and the first neural network layer is the starting layer of the neural network system. In this case, the calculation module is configured to determine the value of N according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratio of the output data volumes of adjacent neural network layers, and to determine the value of M according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
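  • As a sketch of how such a chip-count constraint could be evaluated (an illustration under assumptions, not the embodiment's algorithm: copies_from_chip_budget and all numbers are hypothetical, and the only requirement imposed is that the total crossbar demand fit within chips times crossbars per chip):
```python
def copies_from_chip_budget(xb_per_copy, out_volumes, chips, xbars_per_chip):
    """Pick the largest copy count for the starting layer such that, with every
    later layer's copy count scaled by the ratio of output volumes, the total
    crossbar demand fits in the chips * xbars_per_chip budget. Assumes the
    budget can hold at least one copy per layer."""
    def plan(n1):
        copies = [max(1, round(n1 * v / out_volumes[0])) for v in out_volumes]
        demand = sum(xb * n for xb, n in zip(xb_per_copy, copies))
        return demand, copies
    n1 = 1
    while plan(n1 + 1)[0] <= chips * xbars_per_chip:
        n1 += 1
    return plan(n1)[1]

# Hypothetical network: three layers needing 2/4/3 crossbars per weight copy,
# output volumes 50176/12544/3136, and a budget of 1 chip with 96 crossbars.
print(copies_from_chip_budget([2, 4, 3], [50176, 12544, 3136], 1, 96))  # [29, 7, 2]
```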
  • A deployment module 1106, configured to deploy the N first weights to P computing units according to the calculation specifications of the computing units in the neural network system, and to deploy the M second weights to Q computing units, where P and Q are both positive integers, the P computing units are used to perform the operations of the first neural network layer, and the Q computing units are used to perform the operations of the second neural network layer.
  • the calculation specification of the calculation unit refers to the number of crossbars included in one calculation unit. In practical applications, a computing unit may include one or more crossbars. Specifically, after the calculation module 1104 obtains the number of weights to be configured for each layer of the neural network, the deployment module 1106 may deploy the weights of each layer on the corresponding calculation unit according to the number of crossbars included in each calculation unit.
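  • For instance, a rough way to turn a copy count into a number of engines could look like the following sketch (illustrative only; it assumes copies can be packed across engines purely by crossbar count, which the deployment module may refine):
```python
from math import ceil

def engines_needed(copies, xbars_per_copy, xbars_per_engine):
    """Number of engines a layer occupies if each engine provides
    xbars_per_engine crossbars and one weight copy spans xbars_per_copy
    crossbars; copies are assumed to be packable across engine boundaries."""
    return ceil(copies * xbars_per_copy / xbars_per_engine)

# Hypothetical: N = 6 first-weight copies, each spanning 2 crossbars,
# with 4 crossbars per engine -> the first layer occupies P = 3 engines.
print(engines_needed(6, 2, 4))   # 3
```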
  • the elements in the weight matrix are respectively configured in the ReRAM cells of the crossbar of the calculation unit.
  • the computing unit may refer to a PE or an engine, one PE may include multiple engines, and one engine may include one or more crossbars. Since the weight of each layer may be different, a weight can be deployed on one or more engines.
  • the neural network system shown in FIG. 1 includes multiple neural network chips, each neural network chip includes multiple secondary computing nodes, and each secondary computing node includes multiple computing units.
  • The resource allocation device 1100 may further include a mapping module 1108 for mapping computing units to their upper-level computing nodes. Specifically, after the calculation module 1104 obtains the N first weights to be configured for the first neural network layer and the M second weights to be configured for the second neural network layer, the mapping module 1108 is used to establish the mapping relationship between the N first weights and the P computing units, and to establish the mapping relationship between the M second weights and the Q computing units.
  • The mapping module 1108 is further configured to map the P computing units and the Q computing units into multiple second-level computing nodes according to the number of computing units included in a second-level computing node in the neural network system, where at least a part of the P computing units and at least a part of the Q computing units are mapped into the same second-level computing node.
  • The mapping module 1108 is further configured to map the multiple second-level computing nodes to which the P computing units and the Q computing units are mapped into the multiple neural network chips according to the number of second-level computing nodes included in each neural network chip, where at least a part of the second-level computing nodes to which the P computing units belong and at least a part of the second-level computing nodes to which the Q computing units belong are mapped into the same neural network chip.
  • For how the mapping module 1108 establishes the mapping relationship between the N first weights and the P computing units, establishes the mapping relationship between the M second weights and the Q computing units, and maps the P computing units and the Q computing units to the upper-level computing nodes of the computing units, reference may be made to the foregoing descriptions of FIG. 6, FIG. 6A, and FIG. 7, and details are not described herein again.
  • An embodiment of the present invention also provides a computer program product that implements the above resource allocation method, and an embodiment of the present invention also provides a computer program product that implements the above data processing method.
  • Each of the above computer program products includes a computer-readable storage medium storing program code, and the instructions included in the program code are used to execute the method flow described in any one of the foregoing method embodiments.
  • Persons of ordinary skill in the art may understand that the foregoing storage medium includes various non-transitory machine-readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (Random-Access Memory, RAM), a solid state disk (Solid State Disk, SSD), or a non-volatile memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

This application provides a neural network system and a data processing method. The neural network system includes P computing units for performing a first neural network layer operation and Q computing units for performing a second neural network layer operation. The P computing units are configured to, after receiving first input data, perform computation on the first input data according to N configured first weights to obtain first output data. The Q computing units are configured to, after receiving second input data, perform computation on the second input data according to M configured second weights to obtain second output data. The second input data includes the first output data. The ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data. The neural network system provided in this application can be applied to the field of artificial intelligence and improves the data processing efficiency of a neural network system.

Description

神经网络系统及数据处理技术 技术领域
本申请涉及计算机技术领域,尤其涉及一种神经网络系统及数据处理技术。
背景技术
深度学习(Deep Learning,DL)是人工智能(Artificial Intelligence,AI)的一个重要分支,深度学习是为了模仿人脑构造的一种神经网络,可以达到比传统的浅层学习方式更好的识别效果。卷积神经网络(Convolutional Neural Network,CNN)是一种最常见的深度学习架构,也是研究最广泛的深度学习方法。典型的卷积神经网络处理领域是图像处理。图像处理是对输入的图像进行识别和分析的应用,最终输出一组分类完成的图像内容。例如,我们可以利用卷积神经网络算法对一张图片上机动车的车身颜色、车牌号码及车型进行提取并分类输出。
卷积神经网络通常通过一个三层序列:卷积层(Convolutional Layer)、池化层(Pooling Layer)和修正线性单元(Rectified Liner Units,ReLU),对图片的特征进行提取。提取图片特征的过程实际上是一系列矩阵操作(例如,矩阵乘加操作)的过程。因此,如何对网络中的图片并行、快速的处理成为一个卷积神经网络需要研究的问题。
发明内容
本申请提供的一种神经网络系统及数据处理技术,能够提升神经网络中的数据处理速度。
第一方面,本发明实施例提供了一种神经网络系统。所述神经网络系统包括用于执行第一神经网络层操作的P个计算单元和用于执行第二神经网络层操作的Q个计算单元。所述P个计算单元用于接收第一输入数据,并根据配置的N个第一权重对所述第一输入数据执行计算以得到第一输出数据。所述Q个计算单元用于接收第二输入数据,并根据配置的M个第二权重对所述第二输入数据执行计算以得到第二输出数据。其中,所述第二输入数据包括所述第一输出数据。P、Q、N和M都是正整数,N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值对应。
本发明实施例提供的神经网络系统,由于执行第一神经网络层操作的P个计算单元上配置的N个权重与执行第二神经网络层操作的Q个计算单元上配置的M个权重的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值对应,从而使得P个计算单元和Q个计算单元的计算能力相匹配。从而能够充分利用执行每一层神经网络操作的计算节点的计算能力,提升数据处理的效率。
结合第一方面,在第一种可能的实现方式中,所述神经网络系统包括多个神经网络芯片,每个神经网络芯片包括多个二级计算节点,每个二级计算节点包括多个计算单元,每个计算单元包括至少一个阻变式随机访问存储器交叉矩阵ReRAM crossbar。
结合上述第一方面或第一种可能的实现方式,在第二种可能的实现方式中,所述N和M的值根据所述神经网络系统的部署需求、所述第一输出数据量以及所述第二输出数据量来确定。
结合第二种可能的实现方式,在第三种可能的实现方式中,所述部署需求包括计算时延,所述第一神经网络层为所述神经网络系统的中所有神经网络层的起始层,所述N的值根据所述第一输出数据的数据量、所述计算时延以及所述ReRAM crossbar的计算频率来确定,所述M的值根据所述第一输出数据的数据量和所述第二输出数据的数据量的比值以及所述N的值来确定。
在一种可能的实现方式中,当所述第一神经网络层为所述神经网络系统中所有神经网络层的起始层时,所述N的值可以根据下面的公式获得:
Figure PCTCN2018125761-appb-000001
其中,
Figure PCTCN2018125761-appb-000002
用于指示第一层神经网络所需配置的权重的数量N,
Figure PCTCN2018125761-appb-000003
为第一层神经网络的输出数据的行数,
Figure PCTCN2018125761-appb-000004
为所述第一层神经网络输出数据的列数。t为设置的计算时延,f为计算单元中的CrossBar的计算频率。所述M的值可以根据下面的公式来计算:N/M=第一输出数据量/第二输出数据量。
结合第二种可能的实现方式,在第四种可能的实现方式中,所述部署需求包括所述神经网络芯片的数量,所述第一神经网络层为所述神经网络系统的中所有神经网络层的起始层,所述N的值根据所述芯片的数量、每个芯片中的ReRAM crossbar的数量、部署每一层神经网络的一个权重所需的ReRAM crossbar的数量、以及相邻神经网络层的输出数据的数据量的比值来确定,所述M的值根据所述第一输出数据的数据量和所述第二输出数据的数据量的比值以及所述N的值来确定。
具体的,在一种可能的实现方式中,当部署需求为所述神经网络系统所需的芯片数量,且所述第一神经网络层为所述神经网络系统的起始层时,可以根据下述两个公式获得所述第一神经网络层需配置的N个第一权重以及所述第二神经网络层需配置的M个第二权重,其中N的值即为
Figure PCTCN2018125761-appb-000005
的值,M的值即为
Figure PCTCN2018125761-appb-000006
的值。
Figure PCTCN2018125761-appb-000007
Figure PCTCN2018125761-appb-000008
其中,xb 1用于表示部署第一层(或称为起始层)神经网络的一个权重所需的crossbar的数量,
Figure PCTCN2018125761-appb-000009
用于表示起始层所需的权重数量,xb 2用于表示部署第 二层神经网络中的一个权重所需的crossbar的数量,
Figure PCTCN2018125761-appb-000010
用于表示第二层神经网络所需的权重的数量。xb n用于表示部署第n层神经网络中的一份权重所需的crossbar的数量,
Figure PCTCN2018125761-appb-000011
用于表示第n层神经网络所需的权重的数量,K为部署需求所要求的神经网络系统的芯片的数量,L为每个芯片中的crossbar的数量。
Figure PCTCN2018125761-appb-000012
用于表示第i层所需的权重的数量;
Figure PCTCN2018125761-appb-000013
用于表示第i-1层所需的权重的数量,
Figure PCTCN2018125761-appb-000014
用于表示第i层输出数据的行数,
Figure PCTCN2018125761-appb-000015
用于表示第i层输出数据的列数,
Figure PCTCN2018125761-appb-000016
用于表示第i-1层输出数据的行数,
Figure PCTCN2018125761-appb-000017
用于表示第i-1层输出数据的列数,i的值可以从2到n,n为所述神经网络系统中神经网络层的总层数。
结合上述第一方面的任意一种可能的实现方式,在第五种可能的实现方式中,所述P个计算单元中的至少一部分计算单元和所述Q个计算单元中的至少一部分计算单元位于同一个二级计算节点中。
结合上述第一方面的任意一种可能的实现方式,在第六种可能的实现方式中,所述P个计算单元所属的二级计算节点中的至少一部分二级计算节点与所述Q个计算单元所属的二级计算节点中的至少一部分二级计算节点位于同一个神经网络芯片中。
结合上述第一方面或第一方面的任意一种可能的实现方式,在又一种可能的实现方式中,所述N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值对应包括:所述N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值相同。
第二方面,本申请提供了一种应用于神经网络系统中的数据处理方法。根据该方法,所述神经网络系统中的P个计算单元接收第一输入数据,并根据配置的N个第一权重对所述第一输入数据执行计算以得到第一输出数据,其中,所述P个计算单元用于执行第一神经网络层操作。所述神经网络系统中的Q个计算单元接收第二输入数据,并根据配置的M个第二权重对所述第二输入数据执行计算以得到第二输出数据。其中,所述Q个计算单元用于执行第二神经网络层操作,所述第二输入数据包括所述第一输出数据,P、Q、N和M都是正整数,N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值对应。
结合第二方面,在第一种可能的实现方式中,所述第一神经网络层为所述神经网络系统的中所有神经网络层的起始层;所述N的值根据所述第一输出数据的数据量、设置的神经网络系统的计算时延以及所述计算单元中的ReRAM crossbar的计算频率来确定;所述M的值根据所述第一输出数据的数据量和所述第二输出数据的数据量的比值以及所述N的值来确定。
结合第二方面,在第二种可能的实现方式中,所述神经网络系统包括多个神经网络芯片,每个神经网络芯片包括多个计算单元,每个计算单元包括至少一个阻变式随机访问存储器交叉矩阵ReRAM crossbar,所述第一神经网络层为所述神经网络系统的起始层。所述N的值根据所述多个神经网络芯片的数 量、每个芯片中的ReRAM crossbar的数量、部署每一层神经网络的一个权重所需的ReRAM crossbar的数量、以及相邻神经网络层的输出数据量的比值来确定;所述M的值根据所述第一输出数据的数据量和所述第二输出数据的数据量的比值以及所述N的值来确定。
结合第二方面或第二方面的任意一种可能的实现方式,在第三种可能的实现方式中,所述神经网络系统包括多个神经网络芯片,每个神经网络芯片包括多个二级计算节点,每个二级计算节点包括多个计算单元。所述P个计算单元中的至少一部分计算单元和所述Q个计算单元中的至少一部分计算单元位于同一个二级计算节点中。
结合第二方面的第三种可能的实现方式,在又一种可能的实现方式中,所述P个计算单元所属的二级计算节点中的至少一部分二级计算节点与所述Q个计算单元所属的二级计算节点中的至少一部分二级计算节点位于同一个神经网络芯片中。
结合第二方面或上述第二方面的任意一种可能的实现方式,在又一种可能的实现方式中,所述N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值相同。
第三方面,本申请还提供了一种计算机程序产品,包括程序代码,所述程序代码包括的指令被计算机所执行,以实现所述第一方面以及所述第一方面的任意一种可能的实现方式中所述的数据处理方法。
第四方面,本申请还提供了一种计算机可读存储介质,所述计算机可读存储介质用于存储程序代码,所述程序代码包括的指令被计算机所执行,以实现前述第一方面以及所述第一方面的任意一种可能的实现方式中所述的方法。
附图说明
为了更清楚的说明本发明实施例或现有技术中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例。
图1为本发明实施例提供的一种神经网络系统的结构示意图;
图1A为本发明实施例提供的又一种神经网络系统的结构示意图;
图2为本发明实施例提供的一种神经网络芯片中的计算节点的结构示意图;
图3为本发明实施例提供的一种神经网络系统中的神经网络层的逻辑结构示意图;
图4为本发明实施例提供的一种神经网络系统中处理不同神经网络层数据的计算节点集合示意图;
图5为本发明实施例提供的一种神经网络系统中的计算资源分配方法流程图;
图6为本发明实施例提供的又一种计算资源分配方法流程图;
图6A为本发明实施例提供的一种资源映射方法流程图;
图7为本发明实施例提供的又一种计算资源分配方法示意图;
图8为本发明实施例提供的一种数据处理方法流程图;
图9为本发明实施例提供的一个权重示意图;
图10为本发明实施例提供的一种阻变式随机访问存储器交叉矩阵(ReRAM crossbar)的结构示意图;
图11为本发明实施例提供的一种资源分配装置的结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚的描述。显然,所描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。
深度学习(Deep Learning,DL)是人工智能(Artificial Intelligence,AI)的一个重要分支,深度学习是为了模仿人脑构造的一种神经网络,可以达到比传统的浅层学习方式更好的识别效果。人工神经网络(Artificial Neural Network,ANN),简称为神经网络(Neural Network,NN)或类神经网络,在机器学习和认知科学领域,是一种模仿生物神经网络(动物的中枢神经系统,特别是大脑)的结构和功能的数学模型或计算模型,用于对函数进行估计或近似。人工神经网络可以包括卷积神经网络(Convolutional Neural Network,CNN)、深度神经网络(Deep Neural Networks,DNN)、多层感知器(Multilayer Perceptron,MLP)等神经网络。图1为本发明实施例提供的一种人工神经网络系统的结构示意图。图1以卷积神经网络为例进行图示。如图1所示,卷积神经网络系统100可以包括主机105以及卷积神经网络电路110。卷积神经网络电路110也可以被称为神经网络加速器。卷积神经网络电路110通过主机接口与主机105连接。主机接口可以包括标准的主机接口以及网络接口(network interface)。例如,主机接口可以包括快捷外设互联标准(Peripheral Component Interconnect Express,PCIE)接口。如图1所示,卷积神经网络电路110可以通过PCIE总线106与主机105连接。因此,数据可以通过PCIE总线106输入至卷积神经网络电路110中,并通过PCIE总线106接收卷积神经网络电路110处理完成后的数据。并且,主机105也可以通过主机接口监测卷积神经网络电路110的工作状态。
主机105可以包括处理器1052以及内存1054。需要说明的是,除了图1所示的器件外,主机105还可以包括通信接口以及作为外存的磁盘等其他器件,在此不做限制。
处理器(Processor)1052是主机105的运算核心和控制核心(Control Unit)。处理器1052中可以包括多个处理器核(core)。处理器1052可以是一块超大规模的集成电路。在处理器1052中安装有操作系统和其他软件程序,从而处理器1052能够实现对内存1054、缓存、磁盘及外设设备(如图1中的神经网络电路)的访问。可以理解的是,在本发明实施例中,处理器1052中的Core例 如可以是中央处理器(Central Processing unit,CPU),还可以是其他特定集成电路(Application Specific Integrated Circuit,ASIC)。
内存1054是主机105的主存。内存1054通过双倍速率(double data rate,DDR)总线和处理器1052相连。内存1054通常用来存放操作系统中各种正在运行的软件、输入和输出数据以及与外存交换的信息等。为了提高处理器1052的访问速度,内存1054需要具备访问速度快的优点。在传统的计算机系统架构中,通常采用动态随机存取存储器(Dynamic Random Access Memory,DRAM)作为内存1054。处理器1052能够通过内存控制器(图1中未示出)高速访问内存1054,对内存1054中的任意一个存储单元进行读操作和写操作。
卷积神经网络(CNN)电路110是由多个神经网络(NN)芯片(chip)组成的芯片阵列。例如,如图1所示,CNN电路110包括多个NN芯片115和多个路由器120。为了描述方便,本发明实施例将申请中的NN芯片115简称为芯片115。所述多个芯片115通过路由器120相互连接。例如,一个芯片115可以与一个或多个路由器120连接。多个路由器120可以组成一种或多种网络拓扑。芯片115之间可以通过所述一种或多种网络拓扑进行数据传输。例如,所述多个路由器120可以组成第一网络1106以及第二网络1108,其中,第一网络1106为环形网络,第二网络1108为二维网状(2D mesh)网络。从而,从输入端口1102输入的数据能够所述多个路由器120组成的网络发送给相应的chip115,任意一个芯片115处理后的数据也可以通过所述多个路由器120组成的网络发送给其他芯片115处理或从输出端口1104发送出去。
进一步的,图1也示出了芯片115的结构示意图。如图1所示,chip115可以包括多个神经网络处理单元125以及多个路由器122。图1以神经网络处理单元为瓦片(tile)为例进行描述。在图1所示的数据处理芯片115的架构中,一个tile 125可以与一个或多个路由器122相连。芯片115中的所述多个路由器122可以组成一种或多种网络拓扑。Tile 125之间可以通过所述多种网络拓扑进行数据传输。例如,所述多个路由器122可以组成第一网络1156以及第二网络1158,其中,第一网络1156为环形网络,第二网络1158为二维网状(2D mesh)网络。从而,从输入端口1152输入芯片115的数据能够根据所述多个路由器122组成的网络发送给相应的tile 125,任意一个tile 125处理后的数据也可以通过所述多个路由器122组成的网络发送给其他tile 125或从输出端口1154发送出去。
需要说明的是,当芯片115之间通过路由器互联时,卷积神经网络电路110中的多个路由器120组成的一种或多种网络拓扑与数据处理芯片115中的多个路由器122组成的网络拓扑可以相同也可以不相同,只要芯片115之间或tile 125之间能够通过网络拓扑进行数据传输,且芯片115或tile 125能够通过所述网络拓扑接收数据或输出数据即可。在本发明实施例中并不对多个路由器120和122组成的网络的数量和类型进行限制。并且,本发明实施例中,路由器120和路由器122可以相同也可以不同。为了描述清楚,在图1中将连接芯片的路由器120和连接tile的路由器122在标识上进行了区分。为了描述方便,在本发明 实施例中,也可以将卷积神经网络系统中的芯片115或tile 125称为计算节点(computing node)。
实际应用中,在另一种情形下,芯片115之间还可以通过高速接口(High Transport IO)互联,而不通过路由器120进行互联。如图1A所示,图1A为本发明实施例提供的又一种神经网络系统的结构示意图。在图1A所示的神经网络系统中,主机105通过PCIE接口107与多个PCIE卡109连接,每个PCIE卡109上可以包括多个神经网络芯片115,神经网络芯片之间通过高速互联接口连接。在此不对芯片之间的互联方式进行限定。可以理解的是,实际应用中,芯片内部的tile之间也可以不通过路由器连接,而采用图1A所示的芯片之间的高速互联方式。在另一种情形下,还可以是芯片内部的tile之间采用图1所示的路由器连接,而芯片之间采用图1A所示的高速互联方式。本发明实施例并不对芯片之间或芯片内部的连接方式进行限定。
图2为本发明实施例提供的一种神经网络芯片中的计算节点的结构示意图。如图2所示,芯片115中包括多个路由器120,每个路由器可以连接一个tile 125。实际应用中,一个路由器120还可以连接多个tile 125。如图2所示,每个tile 125可以包括输入输出接口(TxRx)1252、交换装置(TSW)1254以及多个处理器件(processing element,PE)1256。所述TxRx 1252用于接收从Router120输入tile125的数据,或者输出tile125的计算结果。换一种表达方式,TxRx 1252用于实现tile 125和router120之间的数据传输。交换机(TSW)1254连接TxRx 1252,所述TSW 1254用于实现所述TxRx 1252以及多个PE 1256之间的数据传输。每个PE 1256中可以包括一个或多个计算引擎(computing engine)1258,所述一个或多个计算引擎1258用于实现对输入PE 1256中的数据进行神经网络计算。例如,可以对输入tile 125的数据与tile125中预设的卷积核进行乘加运算。Engine 1258的计算结果可以通过TSW 1254以及TxRx 1252发送给其他tile 125。实际应用中,一个Engine 1258可以包括实现卷积、池化pooling或其他神经网络操作的模块。在此,不对Engine的具体电路或功能进行限定。为了描述简便,在本发明实施例中,将计算引擎简称为引擎engine。
本领域技术人员可以知道,由于阻变式随机访问存储器(Resistive random-access memory,ReRAM)这种新型的非易失性存储器具有集存储和计算于一体的优势,近年来,也被广泛应用于神经网络系统中。例如,多个忆阻器单元(ReRAM cell)组成的阻变式随机访问存储器交叉矩阵(ReRAM crossbar)可用于在神经网络系统中执行矩阵乘加运算。在本发明实施例中,Engine 1258可以包括一个或多个crossbar。ReRAM crossbar的结构可以如图10所示,后面会对如何通过ReRAM crossbar进行矩阵乘加运算进行介绍。根据上述对神经网络的介绍可以看出,本发明实施例提供的神经网络电路包括多个NN芯片,每个NN芯片包括多个瓦片tile,每个tile包括多个处理器件PE,每个PE包括多个引擎engine,每个engine由一个或多个ReRAM crossbar来实现。由此可见,本发明实施例提供的神经网络系统可以包括多级计算节点,例如,可以包括四级计算 节点:第一级计算节点为芯片115,第二级计算节点为芯片内的tile,第三级计算节点为tile内的PE,第四级计算节点为PE内的Engine。
另一方面,本领域技术人员可以知道,神经网络系统可以包括多个神经网络层。在本发明实施例中,神经网络层为逻辑的层概念,一个神经网络层是指要执行一次神经网络操作。每一层神经网络计算均是由计算节点来实现。神经网络层可以包括卷积层、池化层等。如图3所示,神经网络系统中可以包括n个神经网络层(又可以被称为n层神经网络),其中,n为大于或等于2的整数。图3示出了神经网络系统中的部分神经网络层,如图3所示,神经网络系统可以包括第一层302、第二层304、第三层306、第四层308、第五层310至第n层312。其中,第一层302可以执行卷积操作,第二层304可以是对第一层302的输出数据执行池化操作,第三层306可以是对第二层304的输出数据执行卷积操作,第四层308可以对第三层306的输出结果执行卷积操作,第五层310可以对第二层304的输出数据以及第四层308的输出数据执行求和操作等等。可以理解的是,图4只是对神经网络系统中的神经网络层的一个简单示例和说明,并不对每一层神经网络的具体操作进行限制,例如,第四层308也可以是池化运算,第五层310也可以是做卷积操作或池化操作等其他的神经网络操作。
在现有的神经网络系统中,当神经网络中的第i层计算完成后,会将第i层的计算结果暂存在预设的缓存中,在执行第i+1层的计算时,计算单元需要重新从预设的缓存中加载第i层的计算结果和第i+1层的权重进行计算。其中,第i层为神经网络系统中的任意一层。在本发明实施例中,由于神经网络系统的Engine中采用了ReRAM crossbar,又由于ReRAM具有存算一体的优势,权重可以在计算之前配置到ReRAM cell上,而计算结果可以直接发送给下一层进行流水线计算。因此,每层神经网络只用缓存很少的数据,例如,每层神经网络只需要缓存够一次窗口计算的输入数据即可。进一步的,为了实现对数据的并行、快速处理,本发明实施例提供了一种通过神经网络对数据进行流处理的方式。为了描述清楚,下面结合图1的卷积神经网络系统简要介绍一下神经网络系统的流处理。
如图4所示,为了实现对数据的快速处理,可以将系统中的计算节点分为多个节点集合以分别执行不同神经网络层的计算。图4以对图1所示神经网络系统中的tile 125划分为例,对本发明实施例中实现不同层的神经网络计算的不同计算节点集合进行示例。如图4所示,可以将芯片115中的多个tile 125划分为多个节点集合。例如:第一节点集合402、第二节点集合404、第三节点集合406、第四节点集合408以及第五节点集合410。其中,每个节点集合中包括至少一个计算节点(例如tile 125)。同一节点集合的计算节点用于对进入同一神经网络层的数据执行神经网络操作,不同神经网络层的数据由不同节点集合的计算节点进行处理。一个计算节点处理后的处理结果将传输给其他节点集合中的计算节点进行处理,这种流水线式的处理方式使每一层神经网络只需要缓存很少的数据,并能够使得多个计算节点并发处理同一条数据流,提高处理效率。需要 说明的是,图4是以tile为例对用于处理不同神经网络层(例如卷积层)的计算节点集合进行示例。实际应用中,由于tile中包含多个PE,每个PE中包含多个Engine,并且不同应用场景所需的计算量不同。因此,也可以根据实际的应用情况,以PE、Engine或chip为粒度对神经网络系统中的计算节点进行划分,使得不同的集合中的计算节点用于处理不同神经网络层的操作。根据这种方式,本发明实施例所指的计算节点可以是Engine、PE、tile、或芯片chip。
此外,本领域技术人员可以知道,计算节点(例如,tile125)在执行神经网络操作(例如,卷积计算)时,可以基于对应神经网络层的权重(weight)对输入计算节点的数据进行计算,例如,某个tile 125可以基于对应卷积层的权重对输入该tile125的输入数据执行卷积操作,例如,对权重与输入数据执行矩阵乘加计算。权重通常用于表示输入数据对于输出数据的重要程度。在神经网络中,权重通常用一个矩阵表示。如图9所示,图9所示的j行k列的权重矩阵可以是一个神经网络层的一个权重,该权重矩阵中的每一个元素代表一个权重值。在本发明实施例中,由于一个节点集合的计算节点用于执行一个神经网络层的操作,因此,同一节点集合的计算节点可以共享权重,不同节点集合中的计算节点的权重可以不相同。在本发明实施例中,各计算节点中的权重可以通过事先配置完成。具体的,一个权重矩阵中的每个元素被配置在对应的crossbar阵列中的ReRAM cell中,从而,可以通过crossbar阵列实现输入数据与配置的权重的矩阵乘加操作。后续将对如何通过crossbar实现矩阵乘加运算进行简要介绍。
根据上述的描述可知,在本发明实施例中,在实现神经网络流处理的过程中,可以将神经网络中的计算节点划分为用于处理不同神经网络层的节点集合,并配置对应的权重。从而,不同节点集合的计算节点能够根据配置的权重执行相应的计算。并且,每个节点集合的计算节点能够将计算结果发送给用于执行下一层神经网络操作的计算节点。本领域技术人员可以知道,在实现神经网络的流处理过程中,如果执行不同层神经网络操作的计算资源不匹配,例如,执行上层神经网络操作的计算资源少,而执行下一层神经网络操作的计算资源相对较多,则会导致下一层计算节点的计算资源浪费的情况。为了充分利用计算节点的计算能力,使执行不同神经网络层操作的计算节点的计算能力相匹配,本发明实施例提供了一种计算资源分配方法,用于分配执行不同神经网络层操作的计算节点,使得神经网络系统中用于执行相邻两层神经网络操作的计算节点的计算能力匹配,提高了神经网络系统中的数据处理效率,并且不浪费计算资源。
图5为本发明实施例提供的一种神经网络系统中的计算资源分配方法的流程图。该方法可以应用于图1所示的神经网络系统。该方法可以在部署神经网络时或在配置神经网络系统时,由主机105来实现,具体的,可以由主机105中的处理器1052来实现。如图5所示,该方法可以包括下述步骤。
在步骤502中,获取神经网络系统的网络模型信息。所述网络模型信息包括所述神经网络系统中第一神经网络层的第一输出数据量和第二神经网络层的第二输出数据量。网络模型信息可以根据实际的应用需求进行确定。例如, 可以根据神经网络系统的应用场景来确定神经网络层的总层数及每一层的算法。网络模型信息中可以包括神经网络系统中神经网络层的总层数、每一层的算法、以及每一层神经网络的数据输出量。在本发明实施例中,算法是指需要执行的神经网络操作,例如,算法可以是指卷积操作、池化操作等。如图3所示,本发明实施例的神经网络系统的神经网络层可以有n层,其中,n为不小于2的整数。在本步骤中,第一神经网络层和第二神经网络层可以是n层中在操作上有依赖关系的两层。在本发明实施例中,具有依赖关系的两个神经网络层是指一个神经网络层的输入数据包括另一神经网络层的输出数据。具有依赖关系的两个神经网络层也可以被称为是相邻层。例如,如图3所示,第一层302的输出数据为第二层304的输入数据,因此,第一层302和第二层304有依赖关系。第二层304的输出数据为第三层306的输入数据,第五层310的输入数据包括第二层304的输出数据,因此,第二层304和第三层306有依赖关系,第二层304和第五层310也具有依赖关系。为了描述清楚,在本发明实施例中,以图3中所示的第一层302为第一神经网络层,第二层304为第二神经网络层为例进行描述。
在步骤504中,根据所述神经网络系统的部署需求、所述第一输出数据量以及所述第二输出数据量确定所述第一神经网络层需配置的N个第一权重以及所述第二神经网络层需配置的M个第二权重。其中,N和M都是正整数,且N和M的比值与所述第一输出数据量与所述第二输出数据量的比值对应。实际应用中,部署需求可以包括神经网络系统的计算时延,也可以包括所述神经网络系统所需部署的芯片的数量。本领域技术人员可以知道,神经网络操作主要是执行矩阵乘加操作,每一层神经网络的输出数据也是一个一维或多维的实数矩阵,因此,第一输出数据量包括第一神经网络层的输出数据的行数和列数,第二输出数据量包括第二神经网络层的输出数据的行数和列数。
如前所述,计算节点执行神经网络操作时,例如执行卷积操作或池化操作时,需要对输入数据与对应神经网络层的权重执行乘加计算。由于权重配置在crossbar中的cell上,计算单元中的crossbar并行对输入数据执行计算,因此,权重的数量可以确定执行神经网络操作的多个计算单元的并行计算能力。换一种表达方式,执行神经网络操作的计算节点的计算能力是由执行所述神经网络操作的计算单元中配置的权重的数量决定的。在本发明实施例中,为了使神经网络系统中执行相邻操作的两层神经网络的计算能力相匹配,可以根据具体的部署需求以及所述第一输出数据量以及所述第二输出数据量确定第一神经网络层和第二神经网络层需配置的权重的数量。由于不同神经网络层的权重不一定相同,为了描述清楚,在本发明实施例中,将第一神经网络层操作所需的权重称为第一权重,将第二神经网络层操作所需的权重称为第二权重。执行第一神经网络层操作就是指计算节点基于第一权重对输入第一神经网络层的数据执行相应的计算,执行第一神经网络层操作就是指计算节点基于第二权重对输入第二神经网络层的数据执行相应的计算。这里的计算可以是执行卷积或池化运算等神经网络操作。
下面将分别根据不同的部署需求详细描述在本步骤中如何确定每一层神经网络需配置的权重的数量。其中每一层神经网络需配置的权重的数量包括所述第一神经网络层需配置的第一权重的数量N以及所述第二神经网络层需配置的第二权重的数量M。在本发明实施例中,权重是指的权重矩阵。权重的数量是指所需的权重矩阵的个数,或者说权重的副本数。权重的数量也可以被理解为需要配置多少个相同的权重矩阵。
在一种情形下,当所述神经网络系统的部署需求为所述神经网络系统的计算时延的情况下,为了使整个神经网络系统的计算不超过设置的计算时延,可以先根据第一层(即,所述神经网络系统中所有神经网络层中的起始层)神经网络的数据输出量、所述计算时延以及所述神经网络系统中使用的ReRAM crossbar的计算频率来确定所述第一层神经网络所需要配置的权重的数量,再根据所述第一层神经网络需要配置的权重的数量以及每一层神经网络的输出数据量获得每一层神经网络需要配置的权重的数量。具体的,所述第一层(即,所述起始层)神经网络所需配置的权重的数量可以按照下述公式一获得:
Figure PCTCN2018125761-appb-000018
其中,
Figure PCTCN2018125761-appb-000019
用于指示第一层(即,所述起始层)神经网络所需配置的权重的数量,
Figure PCTCN2018125761-appb-000020
为第一层(即,所述起始层)神经网络的输出数据的行数,
Figure PCTCN2018125761-appb-000021
为所述第一层(即,所述起始层)神经网络输出数据的列数。t为设置的计算时延,f为计算单元中的CrossBar的计算频率。本领域技人员可以知道,f的值可以根据采用的crossbar的配置参数获得。第一层神经网络输出数据的数据量可以根据步骤502中获取的网络模型信息获得。可以理解的是,当第一神经网络层所述神经网络系统中所有神经网络层的起始层时,第一权重的数量N的数量即为根据公式一计算的
Figure PCTCN2018125761-appb-000022
的值。
在获得起始层神经网络所需的权重的数量后,为了提高神经网络系统中数据处理效率,避免流水线式的并行处理方式出现瓶颈或数据等待,使相邻神经网络层的处理速度匹配,在本发明实施例中,可以使相邻两层所需的权重的数量的比值与相邻两层的输出数据量的比值相对应。例如,比值可以相同。因此,在本发明实施例中,可以根据起始层神经网络所需的权重的数量以及每一层神经网络的输出数据量的比值确定每一层神经网络所需的权重的数量。具体可以按照下述公式(二)计算每一层神经网络所需的权重的数量:
Figure PCTCN2018125761-appb-000023
其中,
Figure PCTCN2018125761-appb-000024
用于表示第i层所需的权重的数量;
Figure PCTCN2018125761-appb-000025
用于表示第i-1层所需的权重的数量,
Figure PCTCN2018125761-appb-000026
用于表示第i层输出数据的行数,
Figure PCTCN2018125761-appb-000027
用于表示第i层输出数据的列数,
Figure PCTCN2018125761-appb-000028
用于表示第i-1层输出数据的行数,
Figure PCTCN2018125761-appb-000029
用 于表示第i-1层输出数据的列数,i的值可以从2到n,n为所述神经网络系统中神经网络层的总层数。换一种表达方式,在本发明实施例中,执行第i-1层神经网络操作所需的权重的数量与执行第i层神经网络操作所需的权重的数量的比值与第i-1层的输出数据量和第i层的输出数据量的比值对应。
本领域技术人员可以知道,每一个神经网络层的输出数据可以包括多个通道(channel),其中,通道是指每个神经网络层中kernel的数量。一个Kernel代表一种特征提取方式,对应产生一个特征图(feature map),多个特征图构成该层的输出数据。一个神经网络层使用的权重包括多个kernel。因此,在实际应用中,在又一种情形下,每一层的输出数据量还可以考虑每一层神经网络的通道数。具体的,在根据上述公式一获得第一神经网络层所需的权重的数量后,可以根据下述公式三获得每一层神经网络所需的权重的数量:
Figure PCTCN2018125761-appb-000030
公式三与公式二的区别在于,公式三在公式二的基础上进一步考虑了每一层神经网络输出的通道数。其中,C i-1用于表示第i-1层的通道数,C i用于表示第i层的通道数,i的值从2到n,n为所述神经网络系统中神经网络层的总层数,n为不小于2的整数。每一层神经网络的通道数可以从网络模型信息中获得。
在本发明实施例中,在根据上述公式一获得起始层所需的权重的数量后,可以按照公式二(或公式三)以及网络模型信息中包含的每一层神经网络的输出数据量计算每一层神经网络所需的权重的数量。例如,当上述第一神经网络层为所述神经网络系统中所有神经网络层的起始层时,则在根据公式一计算获得第一权重的数N后,可以按照公式二,根据N的值以及设置的第一输出数据量和第二输出数据量获得第二神经网络层所需的第二权重的数量M。换一种表达方式,在获得N的值后,可以根据下述公式来计算M的值:N/M=第一输出数据量/第二输出数据量。
在又一种情况下,当部署需求为所述神经网络系统所需的芯片数量时,可以结合下述公式四和前述公式二计算获得第一层神经网络所需的权重的数量,也可以结合下述公式四和前述公式三计算获得第一层神经网络所需的权重的数量。
Figure PCTCN2018125761-appb-000031
在上述公式四中,xb 1用于表示部署第一层(或称为起始层)神经网络的一个权重所需的crossbar的数量,
Figure PCTCN2018125761-appb-000032
用于表示起始层所需的权重数量,xb 2用于表示部署第二层神经网络中的一个权重所需的crossbar的数量,
Figure PCTCN2018125761-appb-000033
用于表示第二层神经网络所需的权重的数量。xb n用于表示部署第n层神经网络中的一份权重所需的crossbar的数量,
Figure PCTCN2018125761-appb-000034
用于表示第n层神经网络所需的权重的数 量,K为部署需求所要求的神经网络系统的芯片的数量,L为每个芯片中的crossbar的数量。上述公式四表示各神经网络层的crossbar的数量的总和小于等于设置的神经网络中芯片中包括的crossbar的总数。对公式二和公式三的描述可以参考前面的描述,在此不再赘述。
本领域技术人员可以知道,在确定神经网络系统的模型后,该神经网络系统的每一个神经网络层的一个权重以及神经网络系统中采用的crossbar的规格(即crossbar中ReRAM cell的行数和列数)就已经确定。换一种表达方式,神经网络系统的网络模型信息还包括每一个神经网络层所使用的一个权重的大小和crossbar的规格信息。因此,在本发明实施例中,可以根据每一层的权重的大小(即权重矩阵的行数和列数)以及crossbar的规格分别获得第i层神经网络的xb i,其中i的取值从2到n。L的值可以从所述神经网络系统采用的芯片的参数获得。在本发明实施例中,一种情形下,在根据上述公式四和公式二获得所述起始层神经网络所需的权重的数量(即
Figure PCTCN2018125761-appb-000035
)后,可以根据公式二以及从网络模型信息中获得的每一层的输出数据量获得每一层需要配置的权重的数量。在另一种情形下,在根据上述公式四和公式三获得所述起始层神经网络所需的权重的数量(即
Figure PCTCN2018125761-appb-000036
)后,也可以根据公式三以及每一层的输出数据量获得每一层需要配置的权重的数量。
在步骤506中,根据所述神经网络系统中的计算单元的计算规格,将N个所述第一权重部署到P个计算单元上,并将M个所述第二权重部署到Q个计算单元上。其中,P和Q都是正整数,所述P个计算单元用于执行所述第一神经网络层的操作,所述Q个计算单元用于执行所述第二神经网络层的操作。在本发明实施例中,所述计算单元的计算规格是指一个计算单元中包含的crossbar的数量。实际应用中,一个计算单元可以包括一个或多个crossbar。具体的,如前所述,由于神经网络系统的网络模型信息还包括每一个神经网络层所使用的一个权重的大小和crossbar的规格信息,因此,可以获得一个权重和crossbar的部署关系。在步骤504中获得每一层神经网络需要配置的权重的数量后,可以根据每个计算单元包含的crossbar的数量,将每一层的权重部署在对应数量的计算单元上。具体的,权重矩阵中的元素被分别配置在计算单元的crossbar的ReRAM cell中。在本发明实施例中,计算单元可以指PE或engine,一个PE可以包括多个engine,一个engine可以包括一个或多个crossbar。由于每一层的权重的大小可能不同,因此,一个权重可以部署在一个或多个engine上。
具体的,在本步骤中,可以根据一个权重和crossbar的部署关系以及计算单元中包含的crossbar的数量,确定所述N个第一权重需要部署的P个计算单元以及所述M个所述第二权重需要部署的Q个计算单元。例如,可以将第一神经网络层的N个所述第一权重部署到P个计算单元上,将M个所述第二权重部署到Q个计算单元上。具体的,N个第一权重中的元素被分别配置到P个计算单元中对应的crossbar的ReRAM cell中。M个第二权重中的元素被分别配置到Q个计算单元中对应的crossbar的ReRAM cell中。从而,所述P个计算单 元可以基于配置的N个第一权重对输入所述P个计算单元的输入数据执行第一神经网络层的操作,所述Q个计算单元可以基于配置的Q个第二权重对输入所述Q个计算单元的输入数据执行第二神经网络层的操作。
从上述实施例可知,本发明实施例提供的资源分配方法,在配置执行每一层神经网络操作的计算单元时,考虑了相邻神经网络层输出的数据量,使执行不同神经网络层操作的计算节点的计算能力相匹配,从而能够充分利用计算节点的计算能力,提升数据处理的效率。
进一步的,在本发明实施例中,为了进一步减少执行不同神经网络层的计算单元之间数据的传输量,节省计算单元或计算节点间的传输带宽。可以按照下述的方法将计算单元映射到计算单元的上级计算节点中。如前所述,神经网络系统中可以包括四级计算节点:第一级计算节点chip、第二级计算节点tile、第三级计算节点PE和第四级计算节点engine。图6以第四级计算节点engine为计算单元为例,详细描述了如何将需要部署所述N个第一权重的P个计算单元以及需要部署所述M个第二权重的Q个计算单元映射到上级计算节点。该方法仍然可以由图1和图1A所示的神经网络系统中的主机105来实现。如图6所示,该方法可以包括下述步骤。
在步骤602中,获取神经网络系统的网络模型信息。所述网络模型信息包括所述神经网络系统中第一神经网络层的第一输出数据量和第二神经网络层的第二输出数据量。在步骤604中,根据所述神经网络系统的部署需求、所述第一输出数据量以及所述第二输出数据量确定所述第一神经网络层需配置的N个第一权重以及所述第二神经网络层需配置的M个第二权重。在步骤606中,根据所述神经网络系统中的计算单元的计算规格,确定所述N个第一权重需要部署的P个计算单元,以及所述M个所述第二权重需要部署的Q个计算单元。在本发明实施例中,步骤602、604和606可以分别参见前述步骤502、504和506中的相关描述。步骤606和步骤506的不同在于,在步骤606中,在确定所述N个第一权重需要部署的P个计算单元以及所述M个所述第二权重需要部署的Q个计算单元后,并不直接将所述N个第一权重部署到P个计算单元,将M个第二权重部署到Q个计算单元中。而是进入步骤608。
在步骤608中,根据所述神经网络系统中的三级计算节点包含的计算单元的数量,将所述P个计算单元以及所述Q个计算单元映射到多个三级计算节点中。具体的,如图6A所示,图6A为本发明实施例提供的一种资源映射方法流程图。图6A以计算单元为第四级计算节点engine为例,描述了如何将engine映射到第三级计算节点PE中。如图6A所示,该方法可以包括下述步骤。
在步骤6082中,将所述P个计算单元和所述Q个计算单元分为m个组,每一组中包括用于执行第一神经网络层的P/m个计算单元以及用于执行第二神经网络层的Q/m个计算单元。其中,m为不小于2的整数,P/m和Q/m的值均为整数。具体的,以所述P个计算单元为执行第i-1层的计算单元,所述Q个计算单元为执行第i-1层的计算单元为例。如图7所示,以第i-1层需要分 配8个计算单元(即P=8),第i层需要分配4个计算单元(即Q=4),第i+1层需要分配4个计算单元,且分成2组(即m=2)为例。则可以得到如图7所示的两个组,其中,第1组包括第i-1层的4个计算单元、第i层的2个计算单元、以及第i+1层的2个计算单元。类似的,第2组包括第i-1层的4个计算单元、第i层的2个计算单元、以及第i+1层的2个计算单元。
在步骤6084中,按照三级计算节点包含的计算单元的数量,分别将每一组的计算单元映射到三级计算节点中。在映射的过程中,尽量使执行相邻神经网络层操作的计算单元映射到同一个三级节点中。如图7所示,假设所述神经网络系统中,每个第一级计算节点chip中包括8个第二级计算节点tile,每个tile包括2个第三级计算节点PE,每个PE包括4个engine。则对于第1组,可以将第i-1层的4个engine映射到一个第三级计算节点PE(例如图7中的PE1)上,将第i层的2个engine以及第i+1层的2个engine一起映射到一个第三级计算节点PE(例如图7中的PE2)上。类似的,按照对第一组中的计算单元的映射方式,对于第二组的计算单元,可以将第i-1层的4个engine映射到PE3上,将第i层的2个engine以及第i+1层的2个engine一起映射到一个PE4上。实际应用中,在完成第1组中的计算单元的映射后,可以以镜像的方式,按照第1组的映射方式进行其他组的计算单元的映射。
根据这种映射方式,可以尽量将执行相邻神经网络层(例如,图7中的第i层和第i+1层)的计算单元映射到同一个三级计算节点中。从而,使得第i层的输出数据被发送到第i+1层的计算单元中时,只需要在同一个三级节点(PE)间传输,不需要占用三级节点间的带宽,进而能够提高数据传输速度,减少节点间的传输带宽消耗。
回到图6,在步骤610中,根据所述神经网络系统中的二级计算节点包含的三级计算节点的数量,将所述P个计算单元以及所述Q个计算单元映射的多个三级计算节点映射到多个二级计算节点中。在步骤612中,根据每个神经网络芯片包含的二级计算节点的数量,将所述P个计算单元以及所述Q个计算单元映射的所述多个二级计算节点映射到所述多个神经网络芯片中。如上所述,图6A是以将执行第i层操作的engine映射到第三级计算节点为例进行描述,类似的,按照图6A所示的方法,还可以将三级节点映射到二级节点,并将二级节点映射到一级节点中。例如,如图7所示,对于第1组,可以进一步将执行i-1层操作的PE1以及执行第i层和第i+1层操作的PE2均映射到同一个二级计算节点Tile1中。对于第2组,可以进一步将执行i-1层操作的PE3以及执行第i层和第i+1层操作的PE4均映射到同一个二级计算节点Tile2中。进一步的,还可以将执行第i-1层、第i层和第i+1层的操作Tile 1和Tile2均映射到同一个芯片chip1中。根据这种方式,可以获得神经网络系统中从第一级计算节点chip到第四级计算节点engine的映射关系。
在步骤614中,将所述N个第一权重部署和所述M个第二权重分别部署到与所述多个三级节点、多个二级计算节点以及多个一级计算节点对应的P 个计算单元和Q个计算单元中。在本发明实施例中,按照图6A和图7所述的方法可以获得神经网络系统中从第一级计算节点chip到第四级计算节点engine的映射关系。例如,可以获得所述P个计算单元和所述Q个计算单元分别与所述所述多个三级节点、多个二级计算节点以及多个一级计算节点的映射关系。进而,在本步骤中,可以按照获得的映射关系将对应神经网络层的权重分别部署到各级计算节点的计算单元中。例如,如图7A所示,可以将第i-1层的N个权重分别部署到与chip1、tile1和PE1对应的4个计算单元以及与chip1、tile2和PE3对应的4个计算单元中,将所述第i层的M个第二权重分别部署到与chip1、tile1和PE2对应的2个计算单元以及与chip1、tile2和PE4对应的2个计算单元。换一种表达方式,将第i-1层的N个权重分别部署chip1—>tile1—>PE1中的4个计算单元(engine)以及chip1—>tile2—>PE3中的4个计算单元中。将第i层的M个权重分别部署到chip1—>tile1—>PE2中的2个计算单元以及chip1—>tile2—>PE4中的2个计算单元中。
通过这种部署方式,不仅能够使本发明实施例所述的神经网络系统中支持相邻神经网络层操作的计算单元的计算能力相匹配,还能够使执行相邻神经网络层操作的计算单元尽量多的位于同一个三级计算节点中,执行相邻神经网络层的三级计算节点尽量多的位于同一个二级计算节点中,执行相邻神经网络层的二级计算节点尽量多的位于同一个一级计算节点,从而能够减少计算节点间传输的数据量,提高不同神经网络层间数据传输的速度。
需要说明的是,本发明实施例是在包含四级计算节点的网络神经系统中,以第四级计算节点engine为计算单元来描述用于执行不同神经网络层的操作的计算资源的分配过程。换一种表达方式,上述实施例是以engine为粒度来划分执行不同神经网络层的操作的集合。实际应用中,还可以以第三级计算节点PE为计算单元进行分配,在这种情况下,可以按照上述方法建立第三级计算节点PE与第二级计算节点tile以及第一级计算节点chip的映射。当然,在需要计算的数据量很大的情况下,也可以以第二级计算节点tile为粒度来进行分配。换一种表达方式,在本发明实施例中,计算单元可以是engine、PE、tile或chip,在此不做限定。
上面对本发明实施例提供的神经网络系统如何配置计算资源进行了详细的描述。下面将从处理数据的角度对所述神经网络系统进行进一步的描述。图8为本发明实施例提供的一种数据处理方法流程图。该方法应用于图1所示的神经网络系统中,图1所示的神经网络系统通过图5-7所示的方法进行配置,分配用于执行不同神经网络层操作的计算资源。如图8所示,该方法可以由图1中所示的神经网络电路来实现,该方法可以包括下述步骤。
在步骤802中,所述神经网络系统中的P个计算单元接收第一输入数据。其中,所述P个计算单元用于执行所述神经网络系统的第一神经网络层操作。在本发明实施例中,所述第一神经网络层为所述神经网络系统中的任意一层。所述第一输入数据为需要执行所述第一神经网络层操作的数据。当所述第一神经 网络层为图3所示的所述神经网络系统中的第1层302时,所述第一输入数据可以为首次输入神经网络系统的数据。当所述第一神经网络层不是所述神经网络系统的第1层时,所述第一输入数据可以为其他神经网络层处理后的输出数据。
在步骤804中,所述P个计算单元根据配置的N个第一权重对所述第一输入数据执行计算以得到第一输出数据。在本发明实施例中,第一权重是一个权重矩阵。所述N个第一权重是指有N个权重矩阵,所述N个第一权重也可以被称为N个第一权重副本。所述N个第一权重可以按照图5-7所示的方法配置在所述P个计算单元中。具体的,第一权重中的元素被分别配置到所述P个计算单元包括的crossbar的ReRAM cell中,从而使得所述P个计算单元中的crossbar可以基于所述N个第一权重对输入数据并行计算,充分利用P个计算单元中的crossbar的计算能力。在本发明实施例中,在接收到所述第一输入数据后,所述P个计算单元可以基于配置的所述N个第一权重对接收的所述第一输入数据执行神经网络操作,得到所述第一输出数据。例如,所述P个计算单元中的crossbar可以将所述第一输入数据与配置的第一权重执行矩阵乘加运算。
在步骤806中,所述神经网络系统中的Q个计算单元接收第二输入数据。其中,所述Q个计算单元用于执行所述神经网络系统的第二神经网络层操作,所述第二输入数据包括所述第一输出数据。具体的,在一种情况下,所述Q个计算单元可以只对所述P个计算单元的第一输出数据执行第二神经网络层的操作。例如,所述P个计算单元用于执行图3所示的第一层302的操作,所述Q个计算单元用于执行图3所示的第二层302的操作。在这种情况下,所述第二输入数据为所述第一输出数据。在又一种情况下,所述Q个计算单元还可以用于对所述第一神经网络层的第一输出数据以及其他神经网络层的输出数据共同执行第二神经网络操作。例如,所述P个计算单元可以用于执行图3所示的第二层304的神经网络操作,所述Q个计算单元可以用于执行图3所示的第五层310的神经网络操作。在这种情况下,所述Q个计算单元用于对所述第二层304以及第四层308的输出数据执行操作,所述第二输入数据包括所述第一输出数据以及所述第四层308的输出数据。
在步骤808中,所述Q个计算单元根据配置的M个第二权重对所述第二输入数据执行计算以得到第二输出数据。在本发明实施例中,第二权重也是一个权重矩阵。所述M个第二权重是指有M个权重矩阵,所述M个第二权重也可以被称为M个第二权重副本。与步骤804类似,所述第二权重可以按照图5所示的方法配置到所述Q个计算单元包括的crossbar的ReRAM cell中。在接收到所述第二输入数据后,所述Q个计算单元可以基于配置的所述M个第二权重对接收的所述第二输入数据执行神经网络操作,得到所述第二输出数据。例如,所述Q个计算单元中的crossbar可以将所述第二输入数据与配置的第二权重执行矩阵乘加运算。需要说明的是,在本发明实施例中,所述N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值对应。
为了描述清楚,下面对ReRAM crossbar如何实现矩阵乘加操作进行 简单的描述。如图9所示,图9所示的j行k列的权重矩阵可以是一个神经网络层的一个权重,该权重矩阵中的每一个元素代表一个权重值。图10是本发明实施例提供的计算单元中的一个ReRAM crossbar的结构示意图。为了描述方便,本发明实施例可以将ReRAM crossbar简称为crossbar。如图10所示,crossbar包括多个ReRAM cell,如G 1,1、G 2,1等。所述多个ReRAM cell构成一个神经网络矩阵。在本发明实施例中,可以在配置神经网络的过程中,将图9所示的权重从图10所示的crossbar的位线(如图10中输入端口1002所示)输入crossbar中,使得权重中的每个元素被配置到相应的ReRAM cell中。例如,图9中的权重元素W 0,0被配置到图10的G 1,1中,图9中的权重元素W 1,0被配置到图10的G 2,1中等。每一个权重元素对应一个ReRAM cell。在执行神经网络计算时,输入数据通过crossbar的字线(如图10所示的输入端口1004)输入crossbar。可以理解的是,输入数据可以通过电压表示,从而使得输入数据与ReRAM cell中配置的权重值实现点乘运算,得到的计算结果以输出电压的形式从crossbar每一列的输出端(如图10所示的输出端口1006)输出。
如前所述,由于在配置神经网络系统中执行每一层神经网络操作的计算单元时,考虑了相邻神经网络层输出的数据量,使执行不同神经网络层操作的计算节点的计算能力能够相匹配。因此通过本发明实施例提供的数据处理方法,能够充分利用计算节点的计算能力,提升神经网络系统的数据处理效率。
在又一种情形下,本发明实施例提供了一种资源分配装置。该装置可以应用于图1及图1A所示的神经网络系统中,用于分配执行不同神经网络层操作的计算节点,使得神经网络系统中用于执行相邻两层神经网络操作的计算节点的计算能力匹配,提高了神经网络系统中的数据处理效率,并且不浪费计算资源。可以理解的是,所述资源分配装置可以位于host中,可以由host中的处理器来实现,也可以作为一个物理器件,独立于处理器而单独存在。例如,可以作为一个独立于处理器的编译器。如图11所示,该资源分配装置1100可以包括获取模块1102、计算模块1104以及部署模块1106。
获取模块1102,用于获取所述神经网络系统中第一神经网络层的第一输出数据的数据量和第二神经网络层的第二输出数据的数据量,所述第二神经网络层的输入数据包括所述第一输出数据。计算模块1104,用于根据所述神经网络系统的部署需求确定所述第一神经网络层需配置的N个第一权重以及所述第二神经网络层需配置的M个第二权重。其中,N和M都是正整数,且N和M的比值与所述第一输出数据的数据量与所述第二输出数据的数据量的比值对应。
如前所述,本发明实施例所述的神经网络系统包括多个神经网络芯片,每个神经网络芯片包括多个计算单元,每个计算单元包括至少一个阻变式随机访问存储器交叉矩阵ReRAM crossbar。在一种情形下,所述部署需求包括计算时延,当所述第一神经网络层为所述神经网络系统中所有神经网络层的起始层时,所述计算模块用于根据所述第一输出数据的数据量、所述计算时延以及计算单元中的阻变式随机访问存储器交叉矩阵ReRAM crossbar的计算频率确定所述 N的值,并根据所述第一输出数据的数据量和所述第二输出数据的数据量的比值以及所述N的值确定所述M的值。
在又一种情形下,所述部署需求包括所述神经网络系统的芯片的数量,所述第一神经网络层为所述神经网络系统的起始层,所述计算模块用于根据所述芯片的数量、每个芯片中的ReRAM crossbar的数量、部署每一层神经网络的一个权重所需的ReRAM crossbar的数量、以及相邻神经网络层的输出数据量的比值确定所述N的值,并根据所述第一输出数据的数据量和所述第二输出数据的数据量的比值以及所述N的值确定所述M的值。
部署模块1106,用于根据所述神经网络系统中的计算单元的计算规格,将N个所述第一权重部署到P个计算单元上,并将M个所述第二权重部署到Q个计算单元上,其中,P和Q都是正整数,所述P个计算单元用于执行所述第一神经网络层的操作,所述Q个计算单元用于执行所述第二神经网络层的操作。其中,计算单元的计算规格是指一个计算单元中包括的crossbar的数量。实际应用中,一个计算单元可以包括一个或多个crossbar。具体的,在计算模块1104获得每一层神经网络需要配置的权重的数量后,部署模块1106可以根据每个计算单元包含的crossbar的数量,将每一层的权重部署在对应的计算单元上。具体的,权重矩阵中的元素被分别配置计算单元的crossbar的ReRAM cell中。在本发明实施例中,计算单元可以指PE或engine,一个PE可以包括多个engine,一个engine可以包括一个或多个crossbar。由于每一层的权重的大小可能不同,因此,一个权重可以部署在一个或多个engine上。
如前所述,图1所示的神经网络系统包括多个神经网络芯片,每个神经网络芯片包括多个二级计算节点,每个二级计算节点包括多个计算单元。为了进一步减少执行不同神经网络层的计算单元之间数据的传输量,节省计算单元或计算节点间的传输带宽。所述资源分配装置1100还可以包括映射模块1108,用于将计算单元映射到计算单元的上级计算节点中。具体的,在计算模块1104获得所述第一神经网络层需配置的N个第一权重以及所述第二神经网络层需配置的M个第二权重后,所述映射模块1108用于建立所述N个第一权重和所述P个计算单元的映射关系,以及建立所述M个第二权重和所述Q个计算单元的映射关系。进一步的,所述映射模块1108还用于将根据所述神经网络系统中的二级计算节点包含的计算单元的数量,将所述P个计算单元以及所述Q个计算单元映射到多个二级计算节点中,其中,所述P个计算单元中的至少一部分计算单元和所述Q个计算单元中的至少一部分计算单元被映射到同一个二级计算节点中。
进一步的,所述映射模块1108还用于根据每个神经网络芯片包含的二级计算节点的数量,将所述P个计算单元以及所述Q个计算单元映射的所述多个二级计算节点映射到所述多个神经网络芯片中。其中,所述P个计算单元所属的二级计算节点中的至少一部分二级计算节点与所述Q个计算单元所属的二级计算节点中的至少一部分二级计算节点被映射到同一个神经网络芯片中。
在本发明实施例中,映射模块1108如何建立N个第一权重和所述P个计算单元的映射关系,建立所述M个第二权重和所述Q个计算单元的映射关系,以及如何将P个计算单元和Q个计算单元分别映射到计算单元的上级计算节点中,可以参见前述对图6、图6A和图7的相应描述,在此不再赘述。
本发明实施例还提供一种实现上述资源分配方法的计算机程序产品,并且,本发明实施例也提供了一种实现上述数据处理方法的计算程序产品,上述计算机程序产品均包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令用于执行前述任意一个方法实施例所述的方法流程。本领域普通技术人员可以理解,前述的存储介质包括:U盘、移动硬盘、磁碟、光盘、随机存储器(Random-Access Memory,RAM)、固态硬盘(Solid State Disk,SSD)或者非易失性存储器(non-volatile memory)等各种可以存储程序代码的非短暂性的(non-transitory)机器可读介质。
需要说明的是,本申请所提供的实施例仅仅是示意性的。所属领域的技术人员可以清楚的了解到,为了描述的方便和简洁,在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。在本发明实施例、权利要求以及附图中揭示的特征可以独立存在也可以组合存在。在本发明实施例中以硬件形式描述的特征可以通过软件来执行,反之亦然。在此不做限定。

Claims (12)

  1. A neural network system, comprising:
    P computing units for performing a first neural network layer operation, configured to receive first input data and perform computation on the first input data according to N configured first weights to obtain first output data; and
    Q computing units for performing a second neural network layer operation, configured to receive second input data and perform computation on the second input data according to M configured second weights to obtain second output data, wherein the second input data comprises the first output data;
    wherein P, Q, N, and M are all positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  2. The neural network system according to claim 1, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple second-level computing nodes, each second-level computing node comprises multiple computing units, and each computing unit comprises at least one resistive random-access memory crossbar (ReRAM crossbar).
  3. The neural network system according to claim 2, wherein the values of N and M are determined according to a deployment requirement of the neural network system, the data volume of the first output data, and the data volume of the second output data.
  4. The neural network system according to claim 3, wherein the deployment requirement comprises a computation latency, the first neural network layer is the starting layer of all neural network layers in the neural network system, the value of N is determined according to the data volume of the first output data, the computation latency, and the computation frequency of the ReRAM crossbar, and the value of M is determined according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  5. The neural network system according to claim 3, wherein the deployment requirement comprises the number of the neural network chips, the first neural network layer is the starting layer of all neural network layers in the neural network system, the value of N is determined according to the number of chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratio of the data volumes of the output data of adjacent neural network layers, and the value of M is determined according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  6. The neural network system according to any one of claims 2-5, wherein:
    at least a part of the P computing units and at least a part of the Q computing units are located in the same second-level computing node.
  7. The neural network system according to any one of claims 2-5, wherein:
    at least a part of the second-level computing nodes to which the P computing units belong and at least a part of the second-level computing nodes to which the Q computing units belong are located in the same neural network chip.
  8. A data processing method, applied to a neural network system, the method comprising:
    receiving, by P computing units in the neural network system, first input data, wherein the P computing units are configured to perform a first neural network layer operation;
    performing, by the P computing units, computation on the first input data according to N configured first weights to obtain first output data;
    receiving, by Q computing units in the neural network system, second input data, wherein the Q computing units are configured to perform a second neural network layer operation, and the second input data comprises the first output data; and
    performing, by the Q computing units, computation on the second input data according to M configured second weights to obtain second output data;
    wherein P, Q, N, and M are all positive integers, and the ratio of N to M corresponds to the ratio of the data volume of the first output data to the data volume of the second output data.
  9. The data processing method according to claim 8, wherein the first neural network layer is the starting layer of all neural network layers in the neural network system;
    the value of N is determined according to the data volume of the first output data, a set computation latency of the neural network system, and the computation frequency of the ReRAM crossbars in the computing units; and
    the value of M is determined according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  10. The data processing method according to claim 8, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple computing units, each computing unit comprises at least one resistive random-access memory crossbar (ReRAM crossbar), and the first neural network layer is the starting layer of the neural network system;
    the value of N is determined according to the number of the multiple neural network chips, the number of ReRAM crossbars in each chip, the number of ReRAM crossbars required to deploy one weight of each neural network layer, and the ratio of the output data volumes of adjacent neural network layers; and
    the value of M is determined according to the ratio of the data volume of the first output data to the data volume of the second output data and the value of N.
  11. The data processing method according to any one of claims 8-10, wherein the neural network system comprises multiple neural network chips, each neural network chip comprises multiple second-level computing nodes, and each second-level computing node comprises multiple computing units; and
    at least a part of the P computing units and at least a part of the Q computing units are located in the same second-level computing node.
  12. The data processing method according to claim 11, wherein:
    at least a part of the second-level computing nodes to which the P computing units belong and at least a part of the second-level computing nodes to which the Q computing units belong are located in the same neural network chip.
PCT/CN2018/125761 2018-12-29 2018-12-29 神经网络系统及数据处理技术 WO2020133463A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2018/125761 WO2020133463A1 (zh) 2018-12-29 2018-12-29 神经网络系统及数据处理技术
EP18944316.1A EP3889844A4 (en) 2018-12-29 2018-12-29 Neural network system and data processing technology
CN201880100568.7A CN113261015A (zh) 2018-12-29 2018-12-29 神经网络系统及数据处理技术
US17/360,459 US20210326687A1 (en) 2018-12-29 2021-06-28 Neural Network System and Data Processing Technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125761 WO2020133463A1 (zh) 2018-12-29 2018-12-29 神经网络系统及数据处理技术

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/360,459 Continuation US20210326687A1 (en) 2018-12-29 2021-06-28 Neural Network System and Data Processing Technology

Publications (1)

Publication Number Publication Date
WO2020133463A1 true WO2020133463A1 (zh) 2020-07-02

Family

ID=71128014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125761 WO2020133463A1 (zh) 2018-12-29 2018-12-29 神经网络系统及数据处理技术

Country Status (4)

Country Link
US (1) US20210326687A1 (zh)
EP (1) EP3889844A4 (zh)
CN (1) CN113261015A (zh)
WO (1) WO2020133463A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210133541A1 (en) * 2019-10-31 2021-05-06 Micron Technology, Inc. Spike Detection in Memristor Crossbar Array Implementations of Spiking Neural Networks

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807519B (zh) * 2019-11-07 2023-01-17 清华大学 基于忆阻器的神经网络的并行加速方法及处理器、装置
US11668797B2 (en) 2019-12-18 2023-06-06 Micron Technology, Inc. Intelligent radar electronic control units in autonomous vehicles
US11947359B2 (en) 2020-02-14 2024-04-02 Micron Technology, Inc. Intelligent lidar sensors for autonomous vehicles
CN116151316A (zh) * 2021-11-15 2023-05-23 平头哥(上海)半导体技术有限公司 适用于类神经网络模型的计算系统及实现类神经网络模型的方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107085562A (zh) * 2017-03-23 2017-08-22 中国科学院计算技术研究所 一种基于高效复用数据流的神经网络处理器及设计方法
CN107871163A (zh) * 2016-09-28 2018-04-03 爱思开海力士有限公司 用于卷积神经网络的操作装置及方法
CN108470009A (zh) * 2018-03-19 2018-08-31 上海兆芯集成电路有限公司 处理电路及其神经网络运算方法
US10096134B2 (en) * 2017-02-01 2018-10-09 Nvidia Corporation Data compaction and memory bandwidth reduction for sparse neural networks
CN108805262A (zh) * 2017-04-27 2018-11-13 美国飞通计算解决方案有限公司 用于根据高级程序进行脉动阵列设计的系统及方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017171771A1 (en) * 2016-03-31 2017-10-05 Hewlett Packard Enterprise Development Lp Data processing using resistive memory arrays

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871163A (zh) * 2016-09-28 2018-04-03 爱思开海力士有限公司 用于卷积神经网络的操作装置及方法
US10096134B2 (en) * 2017-02-01 2018-10-09 Nvidia Corporation Data compaction and memory bandwidth reduction for sparse neural networks
CN107085562A (zh) * 2017-03-23 2017-08-22 中国科学院计算技术研究所 一种基于高效复用数据流的神经网络处理器及设计方法
CN108805262A (zh) * 2017-04-27 2018-11-13 美国飞通计算解决方案有限公司 用于根据高级程序进行脉动阵列设计的系统及方法
CN108470009A (zh) * 2018-03-19 2018-08-31 上海兆芯集成电路有限公司 处理电路及其神经网络运算方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3889844A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210133541A1 (en) * 2019-10-31 2021-05-06 Micron Technology, Inc. Spike Detection in Memristor Crossbar Array Implementations of Spiking Neural Networks
US11681903B2 (en) * 2019-10-31 2023-06-20 Micron Technology, Inc. Spike detection in memristor crossbar array implementations of spiking neural networks

Also Published As

Publication number Publication date
EP3889844A4 (en) 2021-12-29
US20210326687A1 (en) 2021-10-21
EP3889844A1 (en) 2021-10-06
CN113261015A (zh) 2021-08-13

Similar Documents

Publication Publication Date Title
WO2020133317A1 (zh) 计算资源分配技术及神经网络系统
WO2020133463A1 (zh) 神经网络系统及数据处理技术
US10943167B1 (en) Restructuring a multi-dimensional array
US11720523B2 (en) Performing concurrent operations in a processing element
CN111033529B (zh) 神经网络的架构优化训练
TWI836132B (zh) 儲存系統以及用於動態地擴縮儲存系統的排序操作的方法
EP3855367A1 (en) Operation accelerator, processing method, and related device
US11599367B2 (en) Method and system for compressing application data for operations on multi-core systems
US11755683B2 (en) Flexible accelerator for sparse tensors (FAST) in machine learning
US11579921B2 (en) Method and system for performing parallel computations to generate multiple output feature maps
Dutta et al. Hdnn-pim: Efficient in memory design of hyperdimensional computing with feature extraction
WO2018010244A1 (en) Systems, methods and devices for data quantization
US20210303976A1 (en) Flexible accelerator for sparse tensors in convolutional neural networks
WO2020124488A1 (zh) 应用进程映射方法、电子装置及计算机可读存储介质
CN112835844A (zh) 一种脉冲神经网络计算负载的通信稀疏化方法
US11467973B1 (en) Fine-grained access memory controller
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
US11501134B2 (en) Convolution operator system to perform concurrent convolution operations
Ortega-Cisneros Design and Implementation of a NoC-Based Convolution Architecture With GEMM and Systolic Arrays
CN111078623B (zh) 片上网络处理系统和片上网络数据处理方法
CN111078625B (zh) 片上网络处理系统和片上网络数据处理方法
WO2024130830A1 (zh) 数据处理装置、计算机系统及其操作方法
TWI753728B (zh) 運算單元架構、運算單元叢集及卷積運算的執行方法
US12126367B2 (en) Method and system for compressing application data for operations on multi-core systems
US12067484B2 (en) Learning neural networks of programmable device blocks directly with backpropagation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18944316

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018944316

Country of ref document: EP

Effective date: 20210630