WO2021243489A1 - Method and apparatus for processing data of a neural network

Method and apparatus for processing data of a neural network

Info

Publication number
WO2021243489A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
data
batch
inter-layer
Prior art date
Application number
PCT/CN2020/093624
Other languages
English (en)
Chinese (zh)
Inventor
袁宏辉
高山青
高立稳
熊乐进
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2020/093624 priority Critical patent/WO2021243489A1/fr
Priority to PCT/CN2021/073691 priority patent/WO2021244045A1/fr
Priority to CN202180037755.7A priority patent/CN115668222A/zh
Publication of WO2021243489A1 publication Critical patent/WO2021243489A1/fr


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a neural network data processing method and device.
  • With the development of artificial intelligence (AI) technology, the performance of processors continues to improve, and computer systems are equipped with a multi-level cache structure with higher bandwidth and smaller capacity to match it.
  • After each layer of the neural network has processed its input data, the data enters the next layer. If the amount of input data is large, the inter-layer data produced by the layers of the neural network may also be too large for the cache to hold, so the inter-layer data is stored in an external memory. Because the cache cannot be used effectively, the computational efficiency of the processor is reduced.
  • To make better use of the cache, the traditional technique groups the input data according to the inter-layer data caching requirement of each layer, obtaining multiple sets of batches of the same batch size.
  • The batch size is limited by the layer with the largest cache demand.
  • The neural network processes one set of batches before processing the next set. By reducing the amount of data processed by each layer, the inter-layer data is reduced so that as much of it as possible can be stored in the cache.
  • However, the size of the inter-layer data differs from layer to layer. For example, if a layer of the neural network enlarges a picture, the inter-layer data it generates is relatively large; if a layer shrinks a picture, the inter-layer data it generates is relatively small. For a layer that outputs small inter-layer data, a small batch size means small inter-layer data and a large amount of unused cache capacity; for a layer that outputs large inter-layer data, a large batch size means large inter-layer data, little remaining cache capacity, and possibly a cache that cannot hold the inter-layer data at all.
  • As a result, the utilization of the cache is still low, which affects the computational efficiency of the hardware running the neural network.
  • On the other hand, reducing the batch size increases the head overhead of processing each batch in each layer of the neural network, which in turn reduces the computational efficiency of the hardware running the neural network. Therefore, how to improve the utilization of the cache while ensuring the computational efficiency of the hardware running the neural network is an urgent problem to be solved.
  • the present application provides a neural network data processing method and device, which can improve the utilization rate of the cache and ensure the computational efficiency of the hardware running the neural network.
  • this application adopts the following technical solutions.
  • this application provides a neural network data processing method.
  • The method includes: the processor groups the input data according to the amount of the input data, a first feature of the internal memory in the chip running the neural network and a second feature of the multiple layers of the neural network, and determines the batch size of each layer in the neural network, where the batch sizes of at least two of the multiple layers are different.
  • The batch sizes of the layers in the neural network may all be different from one another.
  • Alternatively, the neural network may include both layers with the same batch size and layers with different batch sizes.
  • the first feature includes at least one of the distribution feature of the internal memory in the chip and the capacity of the internal memory.
  • the second feature includes the connection relationship between the plurality of layers and the calculation-related parameters of each of the plurality of layers.
  • The batch corresponding to a batch size is one picture, multiple pictures, or a partial image of one picture.
  • the so-called internal memory refers to the memory in the chip running the neural network.
  • the memory on the chip that runs the neural network is a cache.
  • the so-called external memory refers to the memory outside the chip that runs the neural network.
  • Internal memory can also be called on-chip memory.
  • External memory can also be called off-chip memory.
  • The neural network data processing method segments the input data with comprehensive reference to the amount of the input data, the first feature and the second feature, and sets different batch sizes for the layers of the neural network. By setting a reasonable batch size for each layer, the internal memory is fully used to store the inter-layer data of the neural network during inference, which reduces the interaction between the chip running the neural network and the external memory, thereby improving the utilization of the internal memory and ensuring the computational efficiency of the chip running the neural network.
  • Determining the batch sizes of the multiple layers according to the amount of data, the first feature and the second feature includes: determining, according to the amount of data, the first feature and the second feature, the batch size of each layer, N subgraphs, M graphs and the storage locations of the inter-layer data, where N is an integer greater than or equal to 2, M is an integer greater than or equal to 1, N ≥ M, and the storage location of the inter-layer data includes at least one of the internal memory and the external memory.
  • The inter-layer data of the multiple layers included in a subgraph is stored in the internal memory.
  • The inter-layer data between subgraphs is stored in the internal memory.
  • The inter-layer data between graphs is stored in the external memory.
  • A subgraph contains one or more layers of the same batch size.
  • The number of layers included in different subgraphs may be the same or different.
  • A subgraph may also be referred to as a first-type layer group.
  • A graph includes one or more subgraphs.
  • The number of subgraphs contained in different graphs may be the same or different.
  • A graph may also be referred to as a second-type layer group.
  • The processor may use an iterative algorithm to determine, based on the amount of data, the first feature and the second feature, the batch sizes of the multiple layers, the N subgraphs, the M graphs and the storage locations of the inter-layer data. It is understandable that the processor does not obtain these results from a single calculation; instead, it runs multiple iterative trials and selects, from the trial results, the batch sizes of the multiple layers, the N subgraphs, the M graphs and the storage locations of the inter-layer data that ensure both the utilization of the internal memory and the computational efficiency of the chip running the neural network.
  • the optimization algorithm can be a dynamic programming algorithm, a greedy algorithm or a genetic algorithm.
  • The basic idea of the dynamic programming algorithm is to decompose the problem to be solved into several sub-problems, solve the sub-problems first, and then obtain the solution of the original problem from the solutions of these sub-problems.
  • The greedy algorithm proceeds step by step from an initial solution of the problem. According to some optimization measure, each step must yield a locally optimal solution. Only one item of data is considered at each step, and its selection must satisfy the condition of local optimization. If the next item of data combined with the partial solution is no longer a feasible solution, that item is not added to the partial solution; the algorithm stops when all the data have been enumerated or no more data can be added.
  • Genetic algorithm is a type of algorithm designed based on the evolutionary laws of the biological world, and is used to simulate natural evolution to search for the optimal solution.
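  • As an illustration of the iterative selection described above, the following is a minimal, hypothetical sketch (not the patent's actual algorithm): it enumerates candidate per-layer batch sizes, keeps only the candidates whose inter-layer data fits the internal memory, and selects the one with the lowest estimated head-overhead cost. All function and parameter names are assumptions for illustration only.

```python
from itertools import product

def fits_memory(batch_sizes, per_batch_cache, capacity):
    # A candidate fits if every layer's inter-layer data fits the internal memory.
    return all(b * c <= capacity for b, c in zip(batch_sizes, per_batch_cache))

def estimated_cost(batch_sizes, head_overhead=1.0):
    # Fewer, larger batches per layer mean fewer per-batch head overheads.
    return sum(head_overhead / b for b in batch_sizes)

def search(per_batch_cache, capacity, candidates=(1, 2, 4)):
    best, best_cost = None, float("inf")
    for batch_sizes in product(candidates, repeat=len(per_batch_cache)):
        if not fits_memory(batch_sizes, per_batch_cache, capacity):
            continue
        cost = estimated_cost(batch_sizes)
        if cost < best_cost:
            best, best_cost = batch_sizes, cost
    return best

print(search(per_batch_cache=[60, 30, 15], capacity=100))  # -> (1, 2, 4)
```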
  • The scheduling order of the layers in a graph is determined according to the scheduling order of the subgraphs contained in the graph and the scheduling order of the layers within each subgraph.
  • The scheduling order of the layers in a subgraph is the same as the scheduling order of those layers in the neural network. For example, the batches corresponding to the batch size of the layers included in a subgraph are processed in the order of the layers included in the subgraph.
  • The scheduling order of the subgraphs included in a graph is determined according to the batch sizes and the scheduling order of the first and last layers of each subgraph.
  • The inter-layer data between the subgraphs contained in a graph is aggregated or scattered.
  • the embodiment of the present application also provides a neural network data processing device, and the beneficial effects can be referred to the description of the first aspect and will not be repeated here.
  • the data processing device of the neural network has the function of realizing the behavior of the processor in the method example of the first aspect described above.
  • the functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the data processing device of the neural network includes: an acquisition unit and a processing unit.
  • the acquiring unit is used to acquire the data amount of the input data of the neural network, the first feature of the internal memory in the chip running the neural network, and the second feature of multiple layers in the neural network.
  • the processing unit is configured to determine the batch size of each layer in the multiple layers according to the data amount, the first characteristic, and the second characteristic, and the batch sizes of at least two layers in the multiple layers are different.
  • The neural network data processing device may be a processor, for example a graphics processing unit (GPU), a neural-network processing unit (NPU) or an advanced RISC machine (ARM) processor.
  • The neural network data processing device also includes a memory.
  • the memory is used to store computer programs or instructions
  • the processor is coupled with the memory.
  • a computer program product includes: computer program code, which when the computer program code runs, causes the method executed by the processor in the first aspect to be executed.
  • the present application provides a chip system, the chip system includes a processor, and is configured to implement the function of the processor in the method of the first aspect.
  • the chip system further includes a memory for storing at least one of program instructions or data.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • the present application provides a computer-readable storage medium that stores a computer program, and when the computer program is executed, the method executed by the processor in the first aspect described above is implemented.
  • the names of the processor and the data processing device of the neural network do not constitute a limitation on the device itself. In actual implementation, these devices may appear under other names. As long as the function of each device is similar to that of this application, it falls within the scope of the claims of this application and its equivalent technologies.
  • FIG. 1 is a schematic diagram of the principle of a neural network provided by an embodiment of this application.
  • FIG. 2 is a schematic structural diagram of a neural network system provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of the structure of a neural network chip provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a processing device provided by an embodiment of this application.
  • FIG. 5 is a schematic diagram of the structure of layers in a neural network provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of the overlap problem provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of a subgraph provided by an embodiment of this application.
  • FIG. 8 is a schematic diagram of a graph provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of aggregation processing of inter-layer data between subgraphs according to an embodiment of the application.
  • FIG. 10 is a schematic diagram of scattering processing of inter-layer data between subgraphs provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of the processing of a graph provided by an embodiment of this application.
  • FIG. 12 is a flowchart of a neural network data processing method provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of a process of neural network processing data provided by an embodiment of this application.
  • FIG. 14 is a schematic diagram of a process of neural network processing data provided by an embodiment of this application.
  • FIG. 15 is a schematic diagram of a process of neural network processing data provided by an embodiment of this application.
  • FIG. 16 is a schematic structural diagram of a neural network data processing device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a neural network data processing device provided by an embodiment of the application.
  • A neural network may also be called an artificial neural network (ANN) or a neural-like network.
  • a neural network is a mathematical model or calculation model that imitates the structure and function of a biological neural network (an animal's central nervous system, especially the brain), and is used to estimate or approximate functions.
  • Neural networks include convolutional neural networks (CNN), deep neural networks (DNN), multilayer perceptrons (MLP), recurrent neural networks (RNN), and so on.
  • A neural network can be composed of neural units, and a neural unit can be an arithmetic unit that takes x_s and an intercept of 1 as inputs. The output of this arithmetic unit satisfies the following formula (1): output = f( ∑_{s=1}^{n} W_s·x_s + b ).    (1)
  • s 1, 2,...n, n is a natural number greater than 1
  • W s is the weight of x s
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next layer, and the activation function can be a sigmoid function.
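  • A minimal numeric illustration of formula (1) with a sigmoid activation (the values are arbitrary and only for illustration):

```python
import math

def neuron(x, w, b):
    # output = f( sum_s( W_s * x_s ) + b ), with f the sigmoid activation
    z = sum(ws * xs for ws, xs in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

print(neuron(x=[0.5, -1.0], w=[0.8, 0.2], b=0.1))  # ~0.574
```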
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • The neural network 100 has N processing layers, where N ≥ 3 and N is a natural number.
  • the first layer of the neural network is the input layer 110, which is responsible for receiving input signals
  • the last layer of the neural network is the output layer 130, which outputs the processing results of the neural network.
  • the other layers excluding the first and last layers are intermediate layers 140. These intermediate layers 140 collectively form the hidden layer 120.
  • Each intermediate layer 140 in the hidden layer 120 can receive input signals and output signals.
  • the hidden layer 120 is responsible for the processing of the input signal.
  • Each layer represents a logic level of signal processing. Through multiple layers, data signals can be processed by multiple levels of logic.
  • the input signal of the neural network may be a signal in various forms such as a video signal, a voice signal, a text signal, an image signal, and a temperature signal.
  • The processed image signal may be any of various sensor signals, such as a landscape signal captured by a camera (image sensor), an image signal of a community environment captured by a surveillance device, or a facial signal obtained by an access control system.
  • the input signal of the neural network also includes various other engineering signals that can be processed by computers, which will not be listed here. If the neural network is used for deep learning of the image signal, the image quality can be improved.
  • Deep neural network is also called multi-layer neural network, which can be understood as a neural network with multiple hidden layers.
  • the deep neural network is divided according to the position of different layers.
  • the neural network inside the deep neural network can be divided into three categories: input layer, hidden layer and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the number of layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer is connected to any neuron in the i+1-th layer.
  • The coefficient from the kth neuron in the (L-1)th layer to the jth neuron in the Lth layer is defined as W_jk^L.
  • Convolutional neural network is a deep neural network with convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • Weight sharing can be understood to mean that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size. During the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • the neural network system 200 includes a host 210 and a neural network circuit 220.
  • the neural network circuit 220 is connected to the host 210 through a host interface.
  • the host interface may include a standard host interface and a network interface (network interface).
  • the host interface may include a peripheral component interconnect express (PCIe) interface.
  • PCIe peripheral component interconnect express
  • the neural network circuit 220 may be connected to the host 210 through the PCIe bus 230. Therefore, data can be input to the neural network circuit 220 via the PCIe bus 230, and data processed by the neural network circuit 220 can be received via the PCIe bus 230.
  • the host 210 can also monitor the working status of the neural network circuit 220 through the host interface.
  • the host 210 includes a processor 211 and a memory 212. It should be noted that, in addition to the devices shown in FIG. 2, the host 210 may also include other devices such as a communication interface and a magnetic disk as an external memory, which is not limited here. The host 210 can be considered as an integrated circuit or an independent device.
  • the processor 211 is the computing core and control unit of the host 210.
  • the processor 211 may include multiple processor cores (cores).
  • the processor 211 may be a very large-scale integrated circuit.
  • An operating system and other software programs are installed in the processor 211, so that the processor 211 can implement access to the memory 212, cache, disk, and peripheral devices (such as the neural network circuit in FIG. 2).
  • the processor core in the processor 211 may be a central processing unit (CPU), or may also be other application specific integrated circuits (ASICs).
  • the memory 212 is the main memory of the host 210.
  • the memory 212 is connected to the processor 211 via a double data rate (DDR) bus.
  • the memory 212 is generally used to store various running software in the operating system, input and output data, and information exchanged with an external memory. In order to increase the access speed of the processor 211, the memory 212 needs to have the advantage of fast access speed. In a traditional computer system architecture, a dynamic random access memory (DRAM) is usually used as the memory 212.
  • DRAM dynamic random access memory
  • the processor 211 can access the memory 212 at a high speed through a memory controller (not shown in FIG. 2), and perform a read operation and a write operation on any storage unit in the memory 212.
  • the neural network circuit 220 may be a chip that runs a neural network.
  • the neural network circuit 220 is a chip array composed of a plurality of neural network chips.
  • the neural network circuit 220 includes a plurality of neural network chips 221 for data processing and a plurality of routers 222.
  • the neural network chip 221 is referred to as the chip 221 for short in the embodiment of the present application.
  • the multiple chips 221 are connected to each other through a router 222.
  • one chip 221 may be connected to one or more routers 222.
  • Multiple routers 222 can form one or more network topologies.
  • the chips 221 can transmit data through the multiple network topologies described above.
  • the neural network circuit 220 may also include other devices such as a memory 223, an input port 224, and an output port 225.
  • the memory is used to store data, computer programs and instructions.
  • FIG. 3 is a schematic structural diagram of a neural network chip provided by an embodiment of the application.
  • the chip 221 includes a plurality of routers 310, and each router 310 can be connected to a tile 320. In practical applications, one router 310 can also connect multiple tiles 320.
  • each tile 320 may include an input/output interface (TxRx) 321, a switching device 322, multiple processing elements (PE) 323, and a memory 324.
  • the input/output interface 321 is used to receive data input from the router 310 to the tile 320, or to output the calculation result of the tile 320. To put it another way, the input/output interface 321 is used to implement data transmission between the tile 320 and the router 310.
  • the switching device 322 connects the input/output interface 321 and a plurality of processing devices 323.
  • the switching device 322 is used to implement data transmission between the input/output interface 321 and the multiple processing devices 323.
  • the memory 324 is used to store data, computer programs and instructions.
  • Each tile 320 may also include a controller 325, which is used to control the input/output interface 321 and multiple processing devices 323 to make the system work normally.
  • Each processing device 323 may include one or more computing engines 326.
  • One or more calculation engines 326 are used to implement neural network calculations on the data input to the calculation engine 326. For example, the data input to the tile 320 and the preset convolution kernel in the tile 320 may be multiplied and added.
  • the calculation result of the calculation engine 326 can be sent to other tiles 320 through the switching device 322 and the input/output interface 321.
  • a calculation engine 326 may include modules that implement convolution, pooling, or other neural network operations.
  • the specific circuit or function of the calculation engine 326 is not limited.
  • the calculation engine is referred to as engine for short.
  • FIG. 4 it is a schematic structural diagram of a processing device provided by an embodiment of this application.
  • the processing device 323 may also include a controller 327 and a bus 328.
  • the controller 327 is used for receiving data, and scheduling one or more engines 326 in the processing device 323 to process the data, so that the system works normally.
  • the multiple engines 326 perform data transmission through the bus 328.
  • the engine 326 is connected to one or more exclusive memories 3210.
  • multiple engines 326 may also share one or more memories 329.
  • the memory in the neural network circuit 220 may be a cache memory, that is, a cache.
  • the memory 223, the memory 324, the memory 329, and the memory 3210 may all be cache memories.
  • the cache memory in the neural network circuit 220 is composed of a static random access memory (SRAM), which has a relatively small capacity but a speed much higher than that of the main memory, which is close to the speed of the CPU.
  • the cache memory may be an L1 level cache memory, an L2 level cache memory, or an L3 level cache memory.
  • the memory 3210 is an L1 level cache memory.
  • the memory 329 is an L2 level cache memory or an L3 level cache memory.
  • the memory 223 is an L2 level cache memory or an L3 level cache memory.
  • the memory 324 is an L2 level cache memory or an L3 level cache memory.
  • The neural network circuit 220 provided by the embodiment of the present application includes a plurality of neural network chips 221, each neural network chip 221 includes a plurality of tiles 320, each tile 320 includes a plurality of processing devices 323, and each processing device 323 includes a plurality of engines 326.
  • the neural network system may include multi-level computing nodes, for example, may include four-level computing nodes: the first-level computing node is the chip 221, and the second-level computing node is the tile in the chip 221 320, the third-level computing node is the processing device 323 in the tile 320, and the fourth-level computing node is the engine 326 in the processing device 323.
  • the neural network system provided by the embodiments of the present application can be applied to a mobile terminal, a monitoring terminal, or a server, etc., to implement related neural network operations.
  • the neural network includes multiple neural network layers.
  • the neural network layer is a logical layer concept, and a neural network layer refers to a neural network operation to be performed once.
  • the neural network may include n neural network layers (also called n-layer neural network), where n is an integer greater than or equal to 2.
  • the first neural network layer and the second neural network layer may be two of the n layers that have a dependency relationship in operation.
  • two neural network layers with a dependency relationship means that the input data of one neural network layer includes the output data of the other neural network layer.
  • Two neural network layers with dependencies can also be referred to as adjacent layers.
  • The input of a neural network layer may come from more than one neural network layer, for example from the previous m neural network layers; similarly, the output of a neural network layer may be output not only to the next neural network layer but also to the subsequent m neural network layers.
  • FIG. 5 shows part of the neural network layers in the neural network.
  • the neural network layers may include convolutional layers, pooling layers, and so on.
  • the neural network 500 may include a first layer 502, a second layer 504, a third layer 506, a fourth layer 508, and a fifth layer 510 to an nth layer 512.
  • the first layer 502 can perform a convolution operation
  • the second layer 504 can perform a pooling operation on the output data of the first layer 502
  • the third layer 506 can perform a convolution operation on the output data of the second layer 504.
  • the fourth layer 508 can perform a convolution operation on the output result of the third layer 506, and the fifth layer 510 can perform a summation operation on the output data of the second layer 504 and the output data of the fourth layer 508, and so on.
  • Figure 5 is only a simple example and description of the neural network layers in the neural network, and does not limit the specific operations of each layer of the neural network.
  • the fourth layer 508 can also be a pooling operation.
  • The fifth layer 510 may also perform other neural network operations such as a convolution operation or a pooling operation.
  • the output data of the first layer 502 is the input data of the second layer 504. Therefore, the first layer 502 and the second layer 504 have a dependency relationship.
  • the output data of the second layer 504 is the input data of the third layer 506, and the second layer 504 and the third layer 506 have a dependency relationship.
  • the output data of the third layer 506 is the input data of the fourth layer 508, and the third layer 506 and the fourth layer 508 have a dependency relationship.
  • The input data of the fifth layer 510 includes the output data of the second layer 504 and the output data of the fourth layer 508. Therefore, the second layer 504 and the fifth layer 510 have a dependency relationship, and the fourth layer 508 and the fifth layer 510 also have a dependency relationship.
  • Each layer of calculation in the neural network is realized by a computing node.
  • the computing nodes in the neural network system can be divided with the granularity of chips, tiles, processing devices, or engines according to actual application conditions, so that computing nodes in different sets are used to process operations of different neural network layers.
  • the computing node referred to in the embodiment of the present application may be a chip 221, a tile 320, a processing device 323, or an engine 326.
  • the computing node reloads the calculation result of the i-th layer and the weight of the i+1th layer from the preset cache for calculation.
  • the i-th layer is any layer in the neural network.
  • The output data (inter-layer data) of the second layer 504 is temporarily stored in the preset memory 329 until the fifth layer 510 is executed.
  • the computing node reloads the calculation result of the second layer 504 and the weight of the fifth layer 510 from the preset memory 329 for calculation.
  • the preset cache is also different.
  • the preset cache may be the memory 329 or the memory 3210.
  • the preset cache may be the memory 324.
  • the preset cache may be a memory in the tile 320.
  • the preset cache may be the memory 223.
  • the memory outside the neural network circuit 220 is called an external memory.
  • the external memory is the memory 212 shown in FIG. 2.
  • the memory in the neural network circuit 220 is called an internal memory.
  • the internal memory is the memory 223 shown in FIG. 2.
  • the internal memory is the memory 324 shown in FIG. 3.
  • the internal memories are the memory 329 and the memory 3210 shown in FIG. 4.
  • the so-called external memory refers to the memory outside the chip that runs the neural network.
  • the external memory may be a magnetic disk or the memory 212 shown in FIG. 2.
  • the amount of data that can be processed by each layer in the neural network is the batch size corresponding to that layer.
  • The batch corresponding to a batch size can be one picture, multiple pictures, or a partial image of one picture. For example, suppose the capacity of the internal memory is 100. If layer 1 (L1) requires a cache of 60 to process one picture, then each time layer 1 is scheduled it can process at most one picture, and the batch size corresponding to layer 1 is one picture. If layer 2 requires a data cache of 30 to process one picture, then each time layer 2 is scheduled it can process at most three pictures, and the batch size corresponding to layer 2 is three pictures.
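  • A small sketch of the capacity check in the example above (a hypothetical helper, not part of the patent): the batch size of a layer is the number of pictures whose cache requirement still fits the internal memory.

```python
def batch_size(capacity, cache_per_picture):
    # Largest number of pictures the layer can process per schedule.
    return capacity // cache_per_picture

print(batch_size(capacity=100, cache_per_picture=60))  # layer 1 -> 1 picture
print(batch_size(capacity=100, cache_per_picture=30))  # layer 2 -> 3 pictures
```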
  • the batch size not only affects the usage of the internal memory of the chip running the neural network, but also affects the optimization degree and processing speed of the neural network.
  • the convolutional layer can use a filling algorithm to process the input data of a non-integral image. That is, before the calculation by the convolution kernel, the size of the input data is artificially increased by means of the filling algorithm to offset the influence caused by the size shrinkage in the calculation.
  • the filling algorithm can be, for example, zero filling, repeated boundary value filling, or other methods. That is to say, if the input data is non-integral image data, it is necessary to process the input data with a filling algorithm; if the input data is an entire image data, it is not necessary to use the filling algorithm to process the input data.
  • the input data needs to be filled first, and then flattened.
  • When the stride of the convolution kernel is smaller than the side length of the convolution kernel (the kernel is usually square), adjacent convolution windows overlap; when the stride of the convolution kernel equals its side length, there is no overlap.
  • the input data size is (w*w)
  • the filled data size is (w+k-s)*(w+k-s).
  • k represents the side length of the convolution kernel
  • s represents the step length of the convolution kernel movement
  • the filling data is (k-s).
  • the layers in a neural network include layer 0, layer 1, layer 2, and layer 3.
  • The size of each convolution kernel is 3*3 and the stride of the convolution kernel is 1. Since the stride is smaller than the side length of the convolution kernel, the overlap problem occurs when the filling algorithm is used to process the input data.
  • The size of the entire picture is 56*56, and the rows of the picture are divided into 4 parts for processing. If layer 0, layer 1 and layer 2 are scheduled as one layer group, it must be ensured that layer 2 outputs 14 rows of data, that is, the output data size of the layer group is 14*56, so that layer 3 can process a quarter of the rows of the picture.
  • the input data of layer 2 needs to be filled with 2 rows of data, that is, the input data size is 16*56.
  • the input data size corresponding to layer 1 is 18*56
  • The input data size corresponding to layer 0 is 20*56. That is to say, when the entire picture is segmented, the cache demand of the layers in the layer group increases in order to guarantee the output data size, and the more layers the layer group contains, the more data the earlier layers need to fill. If the internal memory capacity is small, the size of the layer group is therefore limited.
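  • A sketch of the overlap calculation in the example above, assuming a 3*3 kernel and stride 1 for every layer in the layer group (so each layer adds k - s = 2 input rows):

```python
def required_input_rows(output_rows, kernel, stride, num_layers):
    # Walk backwards through the layer group: each layer needs (kernel - stride)
    # extra rows of input, which is the filling/overlap described above.
    sizes = [output_rows]
    for _ in range(num_layers):
        sizes.append(sizes[-1] + (kernel - stride))
    return sizes

print(required_input_rows(output_rows=14, kernel=3, stride=1, num_layers=3))
# -> [14, 16, 18, 20]: rows needed at the inputs of layer 2, layer 1 and layer 0
```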
  • a neural network includes multiple layers, which can be described as a neural network including multiple layers arranged in a directed graph, and each layer can have a corresponding set of parameters.
  • the subgraph is obtained by dividing the layers included in the neural network according to the batch size of each layer.
  • the subgraph contains one or more layers of the same batch size.
  • the subgraph can also be described as a super layer or a layer group, etc., which means that it contains one layer or continuous multiple layers in the neural network.
  • the neural network is scheduled to process the input data with the sub-graph as a unit, and the scheduling order of the layers in the sub-graph is the same as the scheduling order of the layers in the neural network.
  • the batches corresponding to the batch size of the layers contained in the subgraph are processed in the order of the layers contained in the subgraph.
  • The inter-layer data of the multiple layers included in a subgraph is stored in the internal memory.
  • The inter-layer data between subgraphs is stored in the internal memory.
  • FIG. 7 is a schematic diagram of a subgraph provided in an embodiment of this application.
  • the sub-picture includes layer 0 and layer 1.
  • the batch size of layer 0 and the batch size of layer 1 are both 1.
  • a batch corresponding to a batch size of 1 can be one picture, multiple pictures, or part of images in one picture.
  • Layer 0 processes one batch at a time.
  • Layer 1 processes one batch at a time.
  • layer 0 and layer 1 in the subgraph process batch A0 and batch A1.
  • Batch A0 and batch A1 may be batches in the input data to be processed by the neural network.
  • batch A0 and batch A1 may be inter-layer data that has been processed by layers in the neural network.
  • the batch size of batch A0 and batch A1 are both 1.
  • The execution sequence of the batches processed in the subgraph is shown by the bold arrows in the figure. For ease of understanding, the processing of batch A0 and batch A1 by layer 0 and layer 1 is shown separately.
  • layer 0 processes batch A0 first to obtain inter-layer data B0, and layer 1 processes inter-layer data B0 to obtain inter-layer data C0. Then, layer 0 processes batch A1 to obtain inter-layer data B1, and layer 1 processes inter-layer data B1 to obtain inter-layer data C1.
  • the inter-layer data C0 and the inter-layer data C1 can be stored in the internal memory.
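  • A minimal sketch of the FIG. 7 schedule (the layer functions are placeholders): within a subgraph whose layers have batch size 1, each batch flows through layer 0 and then layer 1 before the next batch starts, and the results stay in a simulated internal memory.

```python
def run_subgraph(batches, layer0, layer1):
    internal_memory = []              # stands in for the on-chip cache
    for a in batches:                 # A0, then A1
        b = layer0(a)                 # B0 / B1
        c = layer1(b)                 # C0 / C1
        internal_memory.append(c)
    return internal_memory

print(run_subgraph(["A0", "A1"],
                   layer0=lambda a: a.replace("A", "B"),
                   layer1=lambda b: b.replace("B", "C")))  # -> ['C0', 'C1']
```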
  • the graph includes one or more subgraphs. Among them, the graph can also be described as a super layer or a layer group, which means a layer or a continuous multi-layer in the neural network.
  • each subgraph in the graph contains layers that can handle the same batch size.
  • For example, assume that the graph includes subgraph 1 and subgraph 2.
  • subgraph 1 includes layer 0 and layer 1
  • the batch size of layer 0 is the same as the batch size of layer 1.
  • Subgraph 2 includes layer 2 and layer 3.
  • the batch size of layer 2 is the same as the batch size of layer 3.
  • the batch size of layer 0 and the batch size of layer 1 are both one batch.
  • the batch size of layer 2 and the batch size of layer 3 are both one batch.
  • all sub-graphs included in the graph contain layers of the same batch size.
  • At least two sub-graphs in all sub-graphs included in the graph include layers of different batch sizes.
  • As another example, assume that the graph includes subgraph 1, subgraph 2 and subgraph 3.
  • subgraph 1 includes layer 0 and layer 1
  • the batch size of layer 0 is the same as the batch size of layer 1.
  • Subgraph 2 includes layer 2 and layer 3.
  • the batch size of layer 2 is the same as the batch size of layer 3.
  • Subgraph 3 includes layer 4 and layer 5, and the batch size of layer 4 is the same as the batch size of layer 5.
  • the batch size of layer 0 and the batch size of layer 1 are both one batch.
  • the batch size of layer 2 and the batch size of layer 3 are both one batch.
  • the batch size of layer 4 and the batch size of layer 5 are both two batches.
  • The batch size of the layers included in subgraph 3 of the graph is different from the batch size of the layers included in subgraph 1.
  • The batch size of the layers included in subgraph 3 of the graph is different from the batch size of the layers included in subgraph 2.
  • the neural network is scheduled to process input data in units of graphs, and the scheduling order of the layers in the graph is the same as the scheduling order of the layers in the neural network.
  • The scheduling order of the subgraphs included in the graph is determined according to the batch sizes and the scheduling order of the first and last layers of each subgraph.
  • a part of the data is retained in the cache space of the internal memory, thereby generating additional internal memory cache requirements.
  • the inter-layer data between the pictures is stored in the external memory.
  • The scheduling process of the layers in the graph is described below in terms of the aggregation case and the scattering case.
  • the inter-layer data between the sub-graphs contained in the graph is aggregated.
  • FIG. 9 is a schematic diagram of aggregation processing of inter-layer data between subgraphs provided in an embodiment of this application.
  • the graph includes subgraph 0 and subgraph 1.
  • Subgraph 0 includes layer 0 and layer 1.
  • the batch size of layer 0 and the batch size of layer 1 are both 1.
  • Layer 0 processes one batch at a time.
  • Layer 1 processes one batch at a time.
  • Subgraph 1 includes layer 2 and layer 3.
  • the batch size of layer 2 and the batch size of layer 3 are both 2.
  • a batch corresponding to a batch size of 2 can be two pictures, multiple pictures, or partial images in one picture.
  • Layer 2 processes two batches at a time.
  • Layer 3 processes two batches at a time.
  • the graph processes batch A0 and batch A1.
  • Batch A0 and batch A1 may be batches in the input data to be processed by the neural network.
  • Batch A0 and batch A1 may be inter-layer data that has been processed by layers in the neural network.
  • The batch sizes of batch A0 and batch A1 are both 1. Layer 0 and layer 1 included in subgraph 0 process one batch at a time, while layer 2 and layer 3 included in subgraph 1 process two batches at a time. Therefore, only after subgraph 0 has processed batch A0 and batch A1 can subgraph 1 process the inter-layer data of batch A0 and the inter-layer data of batch A1 output by subgraph 0.
  • The execution sequence of the batches processed in the graph is shown by the bold arrows in the figure. For ease of understanding, the processing of batch A0 and batch A1 by layer 0 and layer 1 is shown separately.
  • layer 0 first processes batch A0 to obtain inter-layer data B0, and layer 1 processes inter-layer data B0 to obtain inter-layer data C0. Then, layer 0 processes batch A1 to obtain inter-layer data B1, and layer 1 processes inter-layer data B1 to obtain inter-layer data C1.
  • the inter-layer data C0 and the inter-layer data C1 can be stored in the internal memory.
  • layer 2 can obtain inter-layer data C0 and inter-layer data C1 from the internal memory.
  • inter-layer data C0 and inter-layer data C1 can be combined into inter-layer data (C0, C1).
  • Layer 2 processes (C0, C1) to obtain inter-layer data (D0, D1)
  • layer 3 processes inter-layer data (D0, D1) to obtain inter-layer data (E0, E1).
  • the inter-layer data (E0, E1) can be stored in the internal memory.
  • The inter-layer data between the subgraphs contained in the graph is scattered.
  • FIG. 10 is a schematic diagram of scattering processing of inter-layer data between subgraphs provided in an embodiment of this application.
  • the graph includes sub graph 1 and sub graph 2.
  • Subgraph 1 includes layer 2 and layer 3.
  • The batch size of layer 2 and the batch size of layer 3 are both two batches.
  • Layer 2 processes two batches at a time.
  • Layer 3 processes two batches at a time.
  • Subgraph 2 includes layer 4 and layer 5.
  • the batch size of layer 4 and the batch size of layer 5 are both one batch.
  • Layer 4 processes one batch at a time.
  • Layer 5 processes one batch at a time.
  • Layer 2 and layer 3 included in subgraph 1 process two batches at a time.
  • Layer 4 and layer 5 included in subgraph 2 process one batch at a time.
  • The execution sequence of the batches processed in the graph is shown by the bold arrows in the figure. For ease of understanding, the processing of the inter-layer data E0 and the inter-layer data E1 by layer 4 and layer 5 is shown separately.
  • layer 2 can obtain the inter-layer data (C0, C1) of batch A0 and batch A1 from the internal memory.
  • Layer 2 processes (C0, C1) to obtain inter-layer data (D0, D1)
  • layer 3 processes inter-layer data (D0, D1) to obtain inter-layer data (E0, E1).
  • the inter-layer data (E0, E1) can be stored in the internal memory.
  • layer 4 first obtains the inter-layer data (E0, E1) from the internal memory, and divides the inter-layer data (E0, E1) into the inter-layer data E0 and the inter-layer data E1.
  • Layer 4 first processes the inter-layer data E0 in the inter-layer data (E0, E1) to obtain the inter-layer data F0, and layer 5 processes the inter-layer data F0 to obtain the inter-layer data G0.
  • the layer 4 processes the inter-layer data E1 in the inter-layer data (E0, E1) to obtain the inter-layer data F1
  • the layer 5 processes the inter-layer data F1 to obtain the inter-layer data G1.
  • the inter-layer data G0 and the inter-layer data G1 can be stored in the internal memory.
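  • A sketch of the aggregation (FIG. 9) and scattering (FIG. 10) between subgraphs of different batch sizes; gather and scatter are hypothetical helpers that only rearrange batches:

```python
def gather(chunks, group):
    # Combine `group` consecutive single-batch outputs into one larger batch.
    return [tuple(chunks[i:i + group]) for i in range(0, len(chunks), group)]

def scatter(batches):
    # Split multi-batch outputs back into single batches.
    return [item for batch in batches for item in batch]

c = ["C0", "C1"]              # outputs of subgraph 0 (batch size 1)
print(gather(c, group=2))     # -> [('C0', 'C1')], fed to layer 2 (batch size 2)
e = [("E0", "E1")]            # output of subgraph 1 (batch size 2)
print(scatter(e))             # -> ['E0', 'E1'], fed one at a time to layer 4
```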
  • Multiple graphs are scheduled for processing in the order of the layers of the neural network. It should be noted that the data processed by a later graph is the data output by the previous graph. Dividing the layers of the neural network into multiple graphs and processing batches in graph order improves the utilization of the internal memory and the processing performance of the entire neural network.
  • FIG. 11 is a schematic diagram of the processing of a graph provided by an embodiment of this application.
  • the abscissa represents the layer of the neural network
  • the ordinate represents the batch.
  • the neural network includes 12 layers.
  • the batch sizes of layer 0, layer 1, layer 4, layer 5, layer 10, and layer 11 are all one batch, that is, layer 0, layer 1, layer 4, layer 5, layer 10, and layer 11 are processed one batch at a time.
  • the batch sizes of layer 2, layer 3, layer 6 and layer 7 are all two batches, that is, layer 2, layer 3, layer 6 and layer 7 process two batches at a time.
  • the batch sizes of layer 8 and layer 9 are both four batches, that is, layer 8 and layer 9 process four batches each time.
  • Graph 1 includes layer 6 to layer 11.
  • The numbers in the boxes indicate the execution order of the batches, from small to large. After graph 0 has been processed, graph 1 is processed.
  • layer 0 processes batch 0, and layer 1 processes the inter-layer data of batch 0 output by layer 0, obtains the inter-layer data of batch 0 output by layer 1, and stores the inter-layer data of batch 0 in the internal memory.
  • layer 0 processes batch 1
  • layer 1 processes the inter-layer data of batch 1 output by layer 0, obtains the inter-layer data of batch 1 output from layer 1, and stores the inter-layer data of batch 1 in the internal memory.
  • layer 2 processes the inter-layer data of batch 0 and the inter-layer data of batch 1 from the internal memory
  • Layer 3 processes the inter-layer data of batch 0 and the inter-layer data of batch 1 output by layer 2, obtaining the inter-layer data of batch 0 and the inter-layer data of batch 1 output by layer 3.
  • Layer 4 processes the inter-layer data of batch 0 output by layer 3, and layer 5 processes the inter-layer data of batch 0 output by layer 4;
  • Layer 4 processes the inter-layer data of batch 1 output by layer 3, and layer 5 processes the inter-layer data of batch 1 output by layer 4.
  • layer 0 to layer 5 process batch 2 and batch 3 in the order of processing batch 0 and batch 1.
  • The batches processed by graph 1 are the data output by graph 0.
  • Layer 6 processes the inter-layer data of batch 0 and batch 1 output by layer 5; layer 7 processes the inter-layer data of batch 0 and batch 1 output by layer 6, and the inter-layer data of batch 0 and the inter-layer data of batch 1 output by layer 7 are stored in the internal memory.
  • Layer 6 processes the inter-layer data of batch 2 and batch 3 output by layer 5; layer 7 processes the inter-layer data of batch 2 and batch 3 output by layer 6, and the inter-layer data of batch 2 and the inter-layer data of batch 3 output by layer 7 are stored in the internal memory.
  • Layer 8 processes the inter-layer data of batch 0, batch 1, batch 2 and batch 3 output by layer 7; layer 9 processes the inter-layer data of batch 0, batch 1, batch 2 and batch 3 output by layer 8, and the inter-layer data of batch 0, batch 1, batch 2 and batch 3 output by layer 9 is stored in the internal memory.
  • Layer 10 processes the batch 0 inter-layer data output by layer 9.
  • Layer 11 processes the inter-layer data of batch 0 output by layer 10.
  • Layer 10 processes the inter-layer data of batch 1 output by layer 9.
  • Layer 11 processes the inter-layer data of batch 1 output by layer 10.
  • Layer 10 processes the batch 2 inter-layer data output by layer 9.
  • Layer 11 processes the inter-layer data of batch 2 output by layer 10.
  • Layer 10 processes the batch 3 inter-layer data output by layer 9.
  • Layer 11 processes the inter-layer data of batch 3 output by layer 10.
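  • The execution order of FIG. 11 for graph 0 can be reproduced with the following sketch, assuming each subgraph is described by its layers and batch size (a hypothetical representation): batches are grouped per subgraph, and each group passes through the subgraph's layers in order.

```python
def schedule_graph(subgraphs, batch_ids):
    # subgraphs: list of (layer_ids, batch_size); returns (layer, batches) steps.
    steps = []
    for layers, size in subgraphs:
        for i in range(0, len(batch_ids), size):
            group = batch_ids[i:i + size]
            for layer in layers:
                steps.append((layer, group))
    return steps

graph0 = [([0, 1], 1), ([2, 3], 2), ([4, 5], 1)]
# Graph 0 handles two batches per pass, its largest subgraph batch size.
for pass_batches in ([0, 1], [2, 3]):
    print(schedule_graph(graph0, pass_batches))
# First pass: layers 0 and 1 per batch, then layers 2 and 3 on both batches,
# then layers 4 and 5 per batch, matching the numbered order in FIG. 11.
```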
  • the internal memory includes a memory 223, a memory 324, a memory 329, and a memory 3210.
  • the external memory is the memory 212.
  • the calculation node completes the calculation of the neural network according to the determined batch size.
  • the computing node includes a chip 221, a tile 320, a processing device 323, or an engine 326.
  • the neural network data processing method includes S1201 and S1202.
  • the processor 211 obtains the data amount of the input data, the first feature of the internal memory in the chip running the neural network, and the second feature of the multiple layers in the neural network.
  • the input data is the data received by the input layer of the neural network.
  • the input data is data in a data set. Take image processing as an example.
  • the input data is 32 pictures in the data set.
  • the first feature includes at least one of the distribution feature of the internal memory in the chip and the capacity of the internal memory.
  • the distribution characteristics of internal memory in the chip include the number of memories in the chip running the neural network and the connection relationship between the memory and the computing node.
  • Although the memories in the chip are large in capacity and number, these storage resources are not all used for neural network calculation every time, and the storage resources allocated to the neural network calculation vary. Therefore, the neural network configuration needs to be dynamically optimized according to the number of memories and the connection relationship between the memories and the computing nodes, that is, according to the distribution feature.
  • the neural network circuit 220 includes the number of memories 223, the number of memories 324, the number of memories 329, and the number of memories 3210, as well as the connection relationship between the memory 223 and the chip 221, and the connection between the memory 324 and the processing device 323 The relationship, the connection relationship between the storage 329 and the engine 326, and the connection relationship between the storage 3210 and the engine 326.
  • the capacity of internal memory includes the capacity of all memories in the chip running the neural network.
  • Although the memories in the chip are large in capacity and number, these storage resources are not all used for neural network calculation every time, and the storage resources allocated to the neural network calculation vary, so the neural network configuration needs to be dynamically optimized according to the capacity.
  • the neural network circuit 220 includes the capacity of the memory 223, the capacity of the memory 324, the capacity of the memory 329, and the capacity of the memory 3210. It is understandable that the capacity of the internal memory may refer to the available capacity of the internal memory.
  • the second feature includes the connection relationship between the multiple layers and the calculation-related parameters of each of the multiple layers.
  • The computing resources in the chip change and are not all used for neural network calculation every time, so the connection relationship between the multiple layers and the calculation-related parameters of each layer also change with the requirements, and the neural network configuration needs to be dynamically optimized according to these changes.
  • the connection relationship between multiple layers includes the connection relationship between each layer in the neural network and at least one layer in other layers. According to the different functions performed by the neural network, the connection relationship of the layers in the neural network is also different, and this application does not limit the connection relationship of the layers in the neural network.
  • the calculation-related parameters of each layer include the dimensionality of the input data and the dimensionality of the output data, offset parameters, convolution kernels, quantization parameters, or normalization parameters.
  • the first feature and the second feature may be stored in the memory 212 in the host 210.
  • the processor 211 may obtain the characteristics of the internal memory and the characteristics of multiple layers in the neural network from the memory 212 in the host 210.
  • The processor 211 determines the batch sizes of the multiple layers, the N subgraphs, the M graphs and the storage locations of the inter-layer data according to the amount of data, the first feature and the second feature.
  • The batch sizes of at least two layers among these batch sizes are different.
  • The processor 211 may use an iterative algorithm to determine the batch sizes of the multiple layers, the N subgraphs, the M graphs and the storage locations of the inter-layer data according to the amount of data, the first feature and the second feature.
  • The optimization algorithm can be a dynamic programming algorithm, a greedy algorithm or a genetic algorithm. It is understandable that the processor does not obtain the batch sizes of the multiple layers, the N subgraphs, the M graphs and the storage locations of the inter-layer data from a single calculation based on the amount of data, the first feature and the second feature; instead, it uses an iterative algorithm to run multiple iterative trials.
  • N is an integer greater than or equal to 2
  • M is an integer greater than or equal to 1
  • for example, if N=2 and M=1, it means that the layers of the neural network are divided into 2 subgraphs, and the 2 subgraphs are divided into one graph.
  • the processor 211 first determines the batch size of each layer in the neural network based on the capacity of the internal memory, and then merges layers with the same batch size into subgraphs; based on the cache requirements of the subgraphs and the capacity of the internal memory, multiple subgraphs are then merged into a graph, and the resulting graph may contain subgraphs with different batch sizes. That is to say, when the neural network is scheduled in units of graphs, the input data is processed with different batch sizes, so the cache requirement of each graph does not exceed the capacity of the internal memory, while the utilization of the on-chip memory and the operating performance of the hardware are improved (a code sketch of this division procedure is given after the description of the subgraphs and graphs below).
  • the layers in the N subgraphs are connected to form a complete neural network.
  • Each of the N subgraphs contains one or more layers of the same batch size.
  • One or more layers of the same batch size are consecutive layers in the neural network.
  • the number of layers included in different subgraphs may be the same or different.
  • the sub-graphs in the M graphs are connected to form a complete neural network.
  • Each of the M graphs includes one or more subgraphs.
  • the number of subgraphs included in different graphs may be the same or different.
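  • The division procedure described above can be sketched as follows. The candidate batch sizes, the per-layer cache-cost model, and the greedy merging strategy below are simplifying assumptions for illustration only; they are not the exact iterative optimization (dynamic programming, greedy, or genetic) referred to in this application.

```python
def layer_cache_need(layer, batch):
    """Assumed cost model: bytes needed to hold one batch of input plus output for a layer."""
    return batch * (layer["in_bytes_per_unit"] + layer["out_bytes_per_unit"])

def choose_batch_sizes(layers, internal_capacity, candidates=(4, 2, 1, 0.5, 0.25)):
    """Pick, per layer, the largest candidate batch size whose cache need fits in internal memory."""
    sizes = []
    for layer in layers:
        for b in candidates:
            if layer_cache_need(layer, b) <= internal_capacity:
                sizes.append(b)
                break
        else:
            raise ValueError("no candidate batch size fits the internal memory")
    return sizes

def split_subgraphs(layers, sizes):
    """Merge consecutive layers that share a batch size into subgraphs (lists of layer indices)."""
    subgraphs, current = [], [0]
    for i in range(1, len(layers)):
        if sizes[i] == sizes[i - 1]:
            current.append(i)
        else:
            subgraphs.append(current)
            current = [i]
    subgraphs.append(current)
    return subgraphs

def merge_graphs(subgraphs, layers, sizes, internal_capacity):
    """Greedily merge consecutive subgraphs into graphs while the combined
    cache requirement stays within the internal-memory capacity."""
    def need(group):
        return sum(layer_cache_need(layers[i], sizes[i]) for sg in group for i in sg)

    graphs, current = [], [subgraphs[0]]
    for sg in subgraphs[1:]:
        if need(current + [sg]) <= internal_capacity:
            current.append(sg)
        else:
            graphs.append(current)
            current = [sg]
    graphs.append(current)
    return graphs
```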
  • an exemplary process of processing data in a neural network is given, including the data import process (that is, the process of reading input data), the calculation process, and the data export process (that is, the process of storing output data).
  • before the neural network processes a batch of data, part of the data needs to be moved in first, that is, the data move-in process; the overhead generated in this process is the head overhead.
  • the data import process, the calculation process, and the data export process are parallel.
  • after the calculation, the neural network executes the data move-out process for the last piece of calculated data and stores it in the storage space.
  • the overhead generated by this process is the tail overhead.
  • the layer processes data in units of batch size.
  • calculation time = calculation amount of this layer / computing power of the chip running the neural network
  • data transfer time = (input data volume + output data volume) / (internal memory bandwidth or external memory bandwidth)
  • total time overhead = head overhead + max(calculation time, data transfer time) + tail overhead.
  • the time overhead of a certain layer in the neural network can be obtained according to the storage location of at least one of the input data or output data of the current layer and the computing power of the chip equipped with the neural network.
  • the storage location of data includes internal memory and external memory.
  • the external memory and the internal memory are jointly planned to store the inter-layer data, which reduces the internal memory space occupied by the inter-layer data.
  • because the inter-layer data can be stored in the external memory, a larger batch size can be set for the layers in the neural network, thereby reducing the head overhead of processing each batch in these layers and improving the computational efficiency of the processor.
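  • As a minimal illustration, the three overhead formulas above can be transcribed directly into a helper function; the function signature and the use of a single bandwidth value (internal or external, depending on where the data is stored) are assumptions made for this sketch, not part of the original method.

```python
def layer_time_overhead(calc_amount, chip_compute_power,
                        in_bytes, out_bytes,
                        bandwidth, head_overhead, tail_overhead):
    """Estimate the total time overhead of one layer processing one batch.

    calc_amount        : operations required by this layer
    chip_compute_power : operations per second of the chip running the network
    in_bytes/out_bytes : input / output data volume of the batch
    bandwidth          : internal-memory bandwidth if the data lives on chip,
                         external-memory bandwidth otherwise (assumption)
    """
    calc_time = calc_amount / chip_compute_power
    transfer_time = (in_bytes + out_bytes) / bandwidth
    # Data movement and calculation run in parallel, so only the longer of the
    # two determines the steady-state cost; head/tail overheads are added once.
    return head_overhead + max(calc_time, transfer_time) + tail_overhead
```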
  • the scheduling order of the layers in the graph is determined according to the scheduling order of each subgraph contained in the graph and the scheduling order of the layers within each subgraph.
  • the scheduling order of the layers in the subgraph is the same as the scheduling order of the layers in the neural network.
  • the batches corresponding to the batch sizes of the layers included in a subgraph are processed in the order of the layers included in that subgraph.
  • the scheduling order of each subgraph included in the graph is determined according to the batch sizes and the scheduling order of the first and last layers in the subgraphs.
  • the inter-layer data of the subgraphs contained in the graph are aggregated or scattered; for the explanation of subgraphs and graphs, please refer to the above description.
  • the neural network includes 6 layers, and the layer sequence is layer 0–layer 5 (layer0–layer5, L0–L5).
  • the batch size corresponding to L0, L1, L4, and L5 is 1, and the batch size corresponding to L2 and L3 is 2.
  • the layers with the same batch size form a subgraph, that is, L0 and L1 form subgraph 0.
  • L2 and L3 form subgraph 1.
  • L4 and L5 form subgraph 2.
  • the graph is composed of the subgraphs, that is, subgraph 0, subgraph 1, and subgraph 2 compose one graph.
  • the batch size corresponding to L0 and L1 is 1, so subgraph 0 can process input data with a data size of 1 each time, that is, batch 0 and batch 1 are processed separately.
  • the output data of L1 is C0.
  • the batch size corresponding to L2 is 2.
  • C0 only corresponds to batch 0, which does not meet the processing requirements of L2, and C0 needs to be temporarily stored in the internal memory.
  • batch 1 is then input to L0 for processing; after L0 and L1 process it, the output data of L1 is C1.
  • L1 outputs two batches of data to meet the processing requirements of L2.
  • the internal memory contains two sets of C0 and C1 data.
  • L2 can then process the aggregated C0 and C1. Therefore, if subgraph 0 and subgraph 1 are divided into one graph, then while L0 and L1 are scheduled to process batch 1, C0 occupies cache space in the internal memory, and the amount of data corresponding to C0 is an additional internal-memory cache requirement of L0 and L1.
  • the cache requirement of the input data of L0 is the amount of data corresponding to (C0+A1)
  • the cache requirement of the output data of L0 is the amount of data corresponding to (C0+B1)
  • the cache requirement of the input data of L1 is the amount of data corresponding to (C0+B1)
  • the cache requirement of the output data of L1 is the amount of data corresponding to (C0+C1).
  • the cache requirement of the input data of L4 is the amount of data corresponding to (E1+E0)
  • the cache requirement of the output data of L4 is the amount of data corresponding to (E1+F0)
  • the cache requirement of the input data of L5 is the amount of data corresponding to (E1+F0)
  • the cache requirement of the output data of L5 is the amount of data corresponding to (E1+G0).
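  • A small numeric sketch may make the effect of the held inter-layer data concrete; the same reasoning applies analogously to L4/L5 with E1 held. All byte counts below are invented for illustration and are not data from this application.

```python
# Illustrative only: data volumes are assumed values, chosen to show how a held
# inter-layer tensor (C0) adds to the cache requirements of L0/L1 while batch 1 runs.
sizes = {"A1": 100, "B1": 120, "C0": 80, "C1": 80}   # bytes (assumed)

l0_input_need  = sizes["C0"] + sizes["A1"]   # (C0 + A1)
l0_output_need = sizes["C0"] + sizes["B1"]   # (C0 + B1)
l1_input_need  = sizes["C0"] + sizes["B1"]   # (C0 + B1)
l1_output_need = sizes["C0"] + sizes["C1"]   # (C0 + C1)

# The peak of these values is what the internal memory must accommodate at this point.
peak = max(l0_input_need, l0_output_need, l1_input_need, l1_output_need)
print(peak)
```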
  • because the inter-layer data of the multiple layers contained in a subgraph and the inter-layer data between subgraphs are stored in the internal memory and occupy its storage space, the batch sizes of the multiple layers and the storage locations of the inter-layer data are also affected by the division of subgraphs and graphs.
  • for example, while the inter-layer data E0 is processed by layer 4 and layer 5, the inter-layer data E1 is stored in the cache and occupies cache space, so that less cache is available to layer 4 and layer 5, which affects the segmentation of the input data.
  • the neural network data processing method comprehensively refers to the data amount of the input data, the first feature, and the second feature to segment the input data and to set different batch sizes for the layers in the neural network. Therefore, by setting a reasonable batch size for each layer in the neural network, the internal memory is fully utilized to store the inter-layer data of the neural network during neural network inference, which reduces the interaction between the chip running the neural network and the external memory, thereby improving the utilization of the internal memory and ensuring the computational efficiency of the chip running the neural network.
  • another computer can execute S1201 and S1202 offline to generate the segmentation strategy and the execution order for scheduling the layers of the neural network.
  • the segmentation strategy and the execution order for scheduling the layers of the neural network are configured in the controller of the neural network system, and the controller of the neural network system controls the execution of the segmentation strategy and of the scheduling order of the layers of the neural network.
  • alternatively, the controller in the neural network system can execute S1201 and S1202 to generate the segmentation strategy and the execution order for scheduling the layers of the neural network, and the controller uniformly manages the scheduling of the layers of the neural network and the segmented batches.
  • Example 1: the input data is whole-image data.
  • the batch size corresponding to L0 and L1 is 1 picture
  • the batch size corresponding to L2, L3 and L4 is 2 pictures
  • the batch size corresponding to L5 and L6 is 4 pictures.
  • L0 and L1 are divided into subgraph 0
  • L2-L4 are divided into subgraph 1
  • L5 and L6 are divided into subgraph 2.
  • the 3 subgraphs are divided into one graph, that is, L0-L6 are divided into one graph.
  • the cache requirement of the graph is less than or equal to the capacity of the internal memory.
  • the graph contains layers with different batch sizes; in the process of scheduling the subgraphs of the neural network to process the input data, this improves the utilization of the internal memory and the operating performance of the chip running the neural network.
  • the data set contains 8 pictures
  • L0 is the first layer of the picture
  • the batch size is 1 picture
  • the data set is divided into 8 batches of input data (batch 0-batch 7 shown in Fig. 14 )
  • each batch of input data is the whole-image data corresponding to 1 picture, and is input to L0 in batches.
  • scheduling subgraph 0 twice corresponds to scheduling subgraph 1 once, that is, the scheduling sequence is L0→L1→L0→L1→L2→L3→L4; scheduling subgraph 1 twice corresponds to scheduling subgraph 2 once, that is, the scheduling sequence is L2→L3→L4→L2→L3→L4→L5→L6.
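  • The interleaved scheduling sequences above follow mechanically from the ratio of adjacent subgraphs' batch sizes. The short recursive sketch below reproduces the order of Example 1; the recursion itself is an illustrative reading of the scheduling rule, not a verbatim algorithm from this application.

```python
def schedule(subgraphs, batch_sizes, idx=None):
    """subgraphs  : list of layer-name lists, in network order
       batch_sizes: batch size (in pictures) of each subgraph
       Returns the layer scheduling order needed to feed one batch of the last subgraph."""
    if idx is None:
        idx = len(subgraphs) - 1
    if idx == 0:
        return list(subgraphs[0])
    repeats = batch_sizes[idx] // batch_sizes[idx - 1]   # predecessor runs needed per run of this subgraph
    order = []
    for _ in range(repeats):
        order += schedule(subgraphs, batch_sizes, idx - 1)
    return order + list(subgraphs[idx])

# Example 1 from the text: L0/L1 (1 picture), L2-L4 (2 pictures), L5/L6 (4 pictures)
subgraphs   = [["L0", "L1"], ["L2", "L3", "L4"], ["L5", "L6"]]
batch_sizes = [1, 2, 4]
print(schedule(subgraphs, batch_sizes))
# -> L0,L1,L0,L1,L2,L3,L4,L0,L1,L0,L1,L2,L3,L4,L5,L6
```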
  • Example 2: the input data is non-integral image data.
  • the batch size corresponding to L0 and L1 is 1/4 pictures
  • the batch size corresponding to L2, L3, and L4 is 1/2 pictures.
  • L0 and L1 are divided into subgraph 0, and the L2-L4 sequence is divided into subgraph 1.
  • since the input data is non-integral image data, the input data needs to be processed with a padding (filling) algorithm; the padding data is the shaded part.
  • the two subgraphs are divided into one graph, that is, L0-L4 are divided into one graph.
  • the cache requirement of the graph is less than or equal to the capacity of the internal memory.
  • the graph contains layers with different batch sizes; in the process of scheduling the subgraphs of the neural network to process the input data, this improves the utilization of the internal memory and the operating performance of the chip running the neural network.
  • each batch of input data is the non-integral image data corresponding to 1/4 of a picture, and is input to L0 in batches.
  • scheduling subgraph 0 twice corresponds to scheduling subgraph 1 once, that is, the scheduling sequence is L0→L1→L0→L1→L2→L3→L4. Processing the input data of the current data set requires scheduling subgraph 0 eight times and subgraph 1 four times.
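  • As a quick consistency check of these counts: if the data set corresponds to 2 pictures (an assumption inferred from the counts, since the data-set size is not restated here), the number of schedulings follows directly from the batch sizes.

```python
# Consistency check for Example 2's scheduling counts.
# The data-set size of 2 pictures is an assumption inferred from the text.
dataset_pictures = 2
sg0_batch = 1 / 4      # batch size of subgraph 0, in pictures
sg1_batch = 1 / 2      # batch size of subgraph 1, in pictures

sg0_runs = int(dataset_pictures / sg0_batch)   # schedulings of subgraph 0
sg1_runs = int(dataset_pictures / sg1_batch)   # schedulings of subgraph 1
print(sg0_runs, sg1_runs)                      # -> 8 4
```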
  • the neural network system includes at least one of a hardware structure or a software module corresponding to each function.
  • FIG. 16 and FIG. 17 are schematic diagrams of the structure of a possible neural network data processing device provided by an embodiment of the application. These neural network data processing devices can be used to implement the functions of the processor 211 in the foregoing method embodiment, and therefore can also achieve the beneficial effects of the foregoing method embodiment.
  • the data processing device of the neural network in FIG. 16 may be the processor 211 shown in FIG. 2 or a device formed by running software on it.
  • the data processing device 1600 of the neural network includes an acquisition unit 1610 and a processing unit 1620.
  • the neural network data processing device 1600 is used to implement the function of the processor 211 in the method embodiment shown in FIG. 12 above.
  • the acquiring unit 1610 is used to perform S1201; the processing unit 1620 is used to perform S1202. More detailed descriptions of the above-mentioned acquisition unit 1610 and processing unit 1620 can be obtained directly by referring to the relevant description in the method embodiment shown in FIG. 12, and will not be repeated here.
  • the data processing device of the neural network may also be a module (such as a chip) of other equipment connected to the neural network system 200.
  • the data processing device 1700 of the neural network includes a processor 1710 and an interface circuit 1720.
  • the processor 1710 and the interface circuit 1720 are coupled to each other.
  • the interface circuit 1720 may be a transceiver or an input/output interface.
  • the neural network data processing device 1700 may further include a memory 1730 for storing instructions executed by the processor 1710 or storing input data required by the processor 1710 to run the instructions or storing data generated after the processor 1710 runs the instructions.
  • the data processing device of the neural network may include the host 210 shown in FIG.
  • the processor 1710 may include the processor 211, and the memory 1730 is the memory 212.
  • the above scheme is used to configure the batch size for the neural network chip so that the neural network can work efficiently.
  • the batch size, the processing of graphs and subgraphs, and the operation of related algorithms are all executed by the processor 211.
  • alternatively, the processing method can be executed by other types of processors or devices; for example, another controller or processor located inside the neural network chip may execute the related solutions to complete the configuration of the neural network.
  • one or more types of processors can be included in the neural network chip, and such a processor can run the related neural network configuration schemes in order to obtain suitable batch sizes and the division of graphs and subgraphs. After configuring the parameters of the neural network, the processor can run the neural network calculations accordingly, thereby realizing self-configuration, which is not limited in this embodiment.
  • the processor 1710 is used to perform the functions of the above-mentioned processing unit 1620, and the interface circuit 1720 is used to perform the functions of the above-mentioned obtaining unit 1610.
  • the processor in the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or a field programmable gate array (Field Programmable Gate Array, FPGA).
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application can be implemented by hardware, and can also be implemented by a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, which can be stored in random access memory (Random Access Memory, RAM), flash memory, read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist as discrete components in the network device or the terminal device.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instruction may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program or instruction may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that integrates one or more available media.
  • the usable medium may be a magnetic medium, such as a floppy disk, a hard disk, and a magnetic tape; it may also be an optical medium, such as a digital video disc (digital video disc, DVD); and it may also be a semiconductor medium, such as a solid state drive (solid state drive, SSD).

Abstract

Disclosed herein are a neural network data processing method and apparatus, which relate to the field of artificial intelligence. The method comprises: dynamically segmenting input data according to the data amount of the input data, a first feature of an internal memory in a chip that runs a neural network, and a second feature of multiple layers in the neural network, and configuring different batch sizes for the layers in the neural network. By configuring a rational batch size for each layer in a neural network, during a neural network inference procedure an internal memory can be fully used to store inter-layer data of the neural network, thereby improving the utilization rate of the internal memory and guaranteeing the computational efficiency of the hardware that runs the neural network.
PCT/CN2020/093624 2020-05-30 2020-05-30 Procédé et appareil de traitement de données d'un réseau neuronal WO2021243489A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2020/093624 WO2021243489A1 (fr) 2020-05-30 2020-05-30 Procédé et appareil de traitement de données d'un réseau neuronal
PCT/CN2021/073691 WO2021244045A1 (fr) 2020-05-30 2021-01-26 Procédé et appareil de traitement de données en réseau neuronal
CN202180037755.7A CN115668222A (zh) 2020-05-30 2021-01-26 一种神经网络的数据处理方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/093624 WO2021243489A1 (fr) 2020-05-30 2020-05-30 Procédé et appareil de traitement de données d'un réseau neuronal

Publications (1)

Publication Number Publication Date
WO2021243489A1 true WO2021243489A1 (fr) 2021-12-09

Family

ID=78831421

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2020/093624 WO2021243489A1 (fr) 2020-05-30 2020-05-30 Procédé et appareil de traitement de données d'un réseau neuronal
PCT/CN2021/073691 WO2021244045A1 (fr) 2020-05-30 2021-01-26 Procédé et appareil de traitement de données en réseau neuronal

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073691 WO2021244045A1 (fr) 2020-05-30 2021-01-26 Procédé et appareil de traitement de données en réseau neuronal

Country Status (2)

Country Link
CN (1) CN115668222A (fr)
WO (2) WO2021243489A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116382880B (zh) * 2023-06-07 2023-08-11 成都登临科技有限公司 任务执行方法、装置、处理器、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140142929A1 (en) * 2012-11-20 2014-05-22 Microsoft Corporation Deep neural networks training for speech and pattern recognition
CN107454965A (zh) * 2015-05-21 2017-12-08 谷歌公司 神经网络处理器中的批处理
CN108885571A (zh) * 2016-04-05 2018-11-23 谷歌有限责任公司 分批处理机器学习模型的输入
CN109492754A (zh) * 2018-11-06 2019-03-19 深圳市友杰智新科技有限公司 一种基于深度神经网络模型压缩和加速方法
CN110389910A (zh) * 2018-04-17 2019-10-29 英特尔公司 用于管理级联神经网络中的存储器的方法和安排

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018018451A (ja) * 2016-07-29 2018-02-01 富士通株式会社 機械学習方法、機械学習プログラム及び情報処理装置
CN108268941B (zh) * 2017-01-04 2022-05-31 意法半导体股份有限公司 深度卷积网络异构架构
US20180341852A1 (en) * 2017-05-24 2018-11-29 International Business Machines Corporation Balancing memory consumption of multiple graphics processing units in deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140142929A1 (en) * 2012-11-20 2014-05-22 Microsoft Corporation Deep neural networks training for speech and pattern recognition
CN107454965A (zh) * 2015-05-21 2017-12-08 谷歌公司 神经网络处理器中的批处理
CN108885571A (zh) * 2016-04-05 2018-11-23 谷歌有限责任公司 分批处理机器学习模型的输入
CN110389910A (zh) * 2018-04-17 2019-10-29 英特尔公司 用于管理级联神经网络中的存储器的方法和安排
CN109492754A (zh) * 2018-11-06 2019-03-19 深圳市友杰智新科技有限公司 一种基于深度神经网络模型压缩和加速方法

Also Published As

Publication number Publication date
WO2021244045A1 (fr) 2021-12-09
CN115668222A (zh) 2023-01-31

Similar Documents

Publication Publication Date Title
CN109102065B (zh) 一种基于PSoC的卷积神经网络加速器
JP7366274B2 (ja) ニューラル・ネットワークのための適応的探索方法および装置
JP7451614B2 (ja) オンチップの計算ネットワーク
CN111105023B (zh) 数据流重构方法及可重构数据流处理器
WO2021051987A1 (fr) Procédé et appareil d'entraînement de modèle de réseau neuronal
KR20230084449A (ko) 신경 프로세싱 유닛
CN117501245A (zh) 神经网络模型训练方法和装置、数据处理方法和装置
WO2021244045A1 (fr) Procédé et appareil de traitement de données en réseau neuronal
JP2022137247A (ja) 複数の入力データセットのための処理
CN112789627B (zh) 一种神经网络处理器、数据处理方法及相关设备
Véstias Processing systems for deep learning inference on edge devices
Kim et al. Efficient multi-GPU memory management for deep learning acceleration
US11461662B1 (en) Compilation time reduction for memory and compute bound neural networks
Sun et al. Multi-node acceleration for large-scale GCNs
Yan et al. Acceleration and optimization of artificial intelligence CNN image recognition based on FPGA
US10990525B2 (en) Caching data in artificial neural network computations
US11921667B2 (en) Reconfigurable computing chip
Zhou et al. Design and implementation of YOLOv3-Tiny accelerator based on PYNQ-Z2 heterogeneous platform
WO2021120036A1 (fr) Appareil de traitement de données et procédé de traitement de données
WO2020051918A1 (fr) Circuit neuronal, puce, système et procédé associé, et support de stockage
EP3895024A1 (fr) Mise en mémoire cache de données dans des calculs de réseau neuronal artificiel
RamaDevi et al. Machine learning techniques for the energy and performance improvement in Network-on-Chip (NoC)
WO2021237755A1 (fr) Procédé et appareil de planification de réseau neuronal
CN113780529B (zh) 一种面向fpga的稀疏卷积神经网络多级存储计算系统
WO2023115529A1 (fr) Procédé de traitement de données dans une puce, et puce

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20939062

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20939062

Country of ref document: EP

Kind code of ref document: A1