CN116306811A - Weight distribution method for deploying a neural network on ReRAM

Weight distribution method for deploying a neural network on ReRAM

Info

Publication number
CN116306811A
Authority
CN
China
Prior art keywords
neural network
storage
weight
reram
cubes
Legal status
Granted
Application number
CN202310178399.1A
Other languages
Chinese (zh)
Other versions
CN116306811B (en)
Inventor
董光达 (Dong Guangda)
余少华 (Yu Shaohua)
伍骏 (Wu Jun)
熊大鹏 (Xiong Dapeng)
李涛 (Li Tao)
Current Assignee
Suzhou Yizhu Intelligent Technology Co., Ltd.
Original Assignee
Suzhou Yizhu Intelligent Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Suzhou Yizhu Intelligent Technology Co., Ltd.
Priority to CN202310178399.1A
Publication of CN116306811A
Application granted
Publication of CN116306811B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a weight distribution method for deploying a neural network on ReRAM, belonging to the technical field of neural networks and comprising the following steps: traversing the neural network structure and converting the weights to fit the in-memory neural network computing engine, so that the weights of each layer form a small 3-D cube; then traversing all the weight cubes with a 3D knapsack algorithm, placing the small cubes into the large cube converted from the storage-and-compute array, and obtaining the corresponding coordinate information. By changing the perspective on the storage-and-compute array, the invention abstracts the weight-storage problem into the classical 3D knapsack problem, so that an unfamiliar problem can be solved with a mature algorithm and the weight-storage efficiency is improved; moreover, the method is applicable to weight storage for most neural networks, is highly reusable, and facilitates both the deployment of neural networks and the iterative improvement of the in-memory neural network computing engine.

Description

Weight distribution method for deploying a neural network on ReRAM
Technical Field
The invention relates to the technical field of neural networks, and in particular to a weight distribution method for deploying a neural network on ReRAM.
Background
Weight loading occupies a large amount of data bandwidth when a neural network algorithm executes. A ReRAM-based in-memory neural network computing engine writes the weights into the compute units in advance, so that only the feature vectors need to be loaded at execution time; the weight-loading step is eliminated, which greatly relieves bandwidth pressure and frees compute capacity.
At present, ReRAM-based in-memory neural network computing engines are still at an early stage, and no mature, complete solution exists.
The structure of the storage-and-compute granule used by the algorithm is shown in figure 1: a bank consists of 256 rows of 144 B storage units; a macro consists of 4 banks, where the input data/addresses are shared among the banks, which provide 4 different outputs; a group consists of 16 macros. The algorithm describes the case of 4 groups, i.e. the granule contains 256 banks in total. rowBank denotes the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time. The weight format of a neural network is likewise a multidimensional array, i.e. (number, width, channel). Because both the storage-and-compute array and the weights are multidimensional, and typical networks have many layers, placing the weights of a whole network onto the storage-and-compute array purely by hand causes many inconveniences (the granule geometry is summarized in the sketch after the following list), for example:
(1) Multidimensional data are hard to place, and data-overwrite problems occur easily.
(2) There is no reusability: every neural network needs its own independent handling, so the workload is enormous.
(3) Iteration is difficult: when the storage rules change, manually changing the weight placement is hard.
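For concreteness, the granule geometry described above can be restated in a few constants. The sketch below is only an illustration of the arithmetic in this section; the constant names are assumptions, not terms from the patent.

```python
# A minimal sketch of the storage-and-compute granule geometry described
# above; only the numbers come from this section, the names are assumed.

BYTES_PER_ROW = 144        # a bank row stores 144 B of weight data
ROWS_PER_BANK = 256        # a bank has 256 rows
BANKS_PER_MACRO = 4        # banks in a macro share input data/addresses
MACROS_PER_GROUP = 16
GROUPS = 4                 # the case described by the algorithm

BANK_CAPACITY = BYTES_PER_ROW * ROWS_PER_BANK              # 36864 B = 36 KB
TOTAL_BANKS = BANKS_PER_MACRO * MACROS_PER_GROUP * GROUPS  # 256 banks

if __name__ == "__main__":
    print(f"bank capacity: {BANK_CAPACITY} B ({BANK_CAPACITY // 1024} KB)")
    print(f"banks per granule: {TOTAL_BANKS}")
```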
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a weight distribution method for deploying a neural network on ReRAM.
The aim of the invention is achieved by the following technical solution:
a weight distribution method for deploying a neural network for a ReRAM comprises the following steps:
step 1: reading in a neural network;
step 2: reading in neural network layer information;
step 3: judging whether weight data are included or not;
step 4: if yes, weight conversion is carried out to form a 3-dimensional small cube; if not, jumping to the step 5;
step 5: judging whether the neural network layer is the last layer;
step 6: if yes, a 3d knapsack algorithm is used for storing the 3-dimensional small cubes into a large cube converted by the storage array; if not, reading in the next layer of the neural network layer, and jumping to the step 2;
step 7: and outputting the storage coordinates.
Further, the weight conversion in step 4 is specifically: converting the neural network weights into small 3-D cubes supported by the in-memory neural network computing engine.
Further, the mapping relation of the weight conversion is as follows: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
Further, the mapping relation of the storage-and-compute array conversion is as follows: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
Further, storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm in step 6 specifically comprises the following steps:
step 601: sorting the 3-D cubes in descending order;
step 602: placing the sorted small cubes one by one: first place along the columns direction, and if a cube's columns would exceed the boundary of the large cube of the storage granule, advance by the maximum rowBank to the next row of positions; then place along the rowBank direction, and if the current plane is full, advance along rows to the next layer of space; finally, place along the rows direction;
step 603: after storage is finished, collecting the coordinates of all stored small cubes and generating the weight distribution information necessary for deploying the neural network.
Further, the descending order in step 601 is a descending sort with priority rowbank > columns > rows.
The invention has the following beneficial effects:
1. by changing the perspective on the storage-and-compute array, the invention abstracts the weight-storage problem into the classical 3D knapsack problem, so that an unfamiliar problem can be solved with a mature algorithm and the weight-storage efficiency is improved;
2. the method is applicable to weight storage for most neural networks, is highly reusable, and facilitates both the deployment of neural networks and the iterative improvement of the in-memory neural network computing engine.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a block diagram of the storage-and-compute granule.
Fig. 2 is a flow chart of the method of the present invention.
Fig. 3 is a schematic diagram of a 3-dimensional cube structure.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In this embodiment, as shown in fig. 2, a weight distribution method for deploying a neural network on ReRAM comprises the following steps:
step 1: reading in a neural network;
step 2: reading in the information of a neural network layer;
step 3: judging whether the layer contains weight data;
step 4: if yes, performing weight conversion to form a small 3-D cube; if not, jumping to step 5;
step 5: judging whether the current layer is the last layer;
step 6: if yes, storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm; if not, reading in the next layer and jumping to step 2;
step 7: outputting the storage coordinates.
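Purely as an illustration of the control flow of steps 1 to 7, the following sketch walks the layers once, converts each weight-bearing layer into a cube, and packs all cubes at the end. The Layer/Network types and the value parallel_banks=4 are assumptions; convert_weights, Cube and pack_3d_knapsack are the helper sketches given after the mapping relations below, not the patent's own API.

```python
# A minimal sketch of the step 1-7 flow, assuming placeholder types;
# convert_weights, Cube and pack_3d_knapsack are sketched further below.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Layer:
    name: str
    weights: Optional[bytes] = None   # None: the layer carries no weights

@dataclass
class Network:
    layers: list

def allocate_weights(net: Network) -> dict:
    cubes = []
    for layer in net.layers:                      # steps 2/5/6: layer loop
        if layer.weights is not None:             # step 3: has weight data?
            # step 4: weight conversion into a small 3-D cube
            # (parallel_banks=4 is an assumed example value)
            cubes.append(convert_weights(layer.name, len(layer.weights),
                                         parallel_banks=4))
    # step 6: pack every cube into the large cube of the granule
    big = Cube("granule", rowbank=64, rows=256, columns=144)  # assumed size
    return pack_3d_knapsack(cubes, big)           # step 7: the coordinates
```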
In this embodiment, the mapping relation of the weight conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B (column direction) × 256 rows (row direction), i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
In this embodiment, the mapping relation of the storage-and-compute array conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B (column direction) × 256 rows (row direction), i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
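A sketch of the weight-to-cube conversion under these mapping relations follows. The patent does not spell out the engine's exact reshaping constraint, so the policy below (spread a layer's weight bytes over a given number of parallel banks, 144 B per row) is an assumption for illustration, not the engine's documented rule.

```python
# A hedged sketch of weight conversion: one layer's weight bytes become a
# (rowbank, rows, columns) cube. The spreading policy is an assumption.
import math
from dataclasses import dataclass

BYTES_PER_ROW = 144   # column direction
ROWS_PER_BANK = 256   # row direction

@dataclass
class Cube:
    name: str
    rowbank: int      # banks used in parallel
    rows: int         # bank rows used (<= 256)
    columns: int      # bytes used per row (<= 144)

def convert_weights(name: str, weight_bytes: int,
                    parallel_banks: int) -> Cube:
    per_bank = math.ceil(weight_bytes / parallel_banks)   # bytes per bank
    rows = math.ceil(per_bank / BYTES_PER_ROW)            # rows needed
    if rows > ROWS_PER_BANK:
        raise ValueError(f"layer {name!r} exceeds one bank slice")
    columns = min(per_bank, BYTES_PER_ROW)
    return Cube(name, parallel_banks, rows, columns)
```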
In this embodiment, the weight allocation algorithm for deploying a neural network on ReRAM mainly comprises: traversing the neural network structure and converting the weights to fit the in-memory neural network computing engine, so that the weights of each layer form a small 3-D cube; then traversing all the weight cubes with a 3D knapsack algorithm, placing the small cubes into the large cube converted from the storage-and-compute array, and obtaining the corresponding coordinate information. The method comprises the following steps:
1. parse the neural network, e.g. mobileNet-v3-uint8.tflite, traverse each layer of the network, put the ordinary convolution, depthwise convolution and fully connected layers into one array (denoted weight_layer_array), and ignore the other layers, which carry no weights;
2. traverse weight_layer_array and reshape the weight dimensions according to the constraints of the in-memory neural network computing engine, obtaining a new array;
3. pass the new array obtained in the previous step into the 3D knapsack algorithm, which places the small cubes; the specific steps are:
(1) sort the cubes in descending order (descending priority: rowbank > columns > rows);
(2) store the sorted cubes one by one and obtain their coordinates; taking fig. 3 as an example, the specific storage process is as follows:
(a) first place along the columns direction; if a cube's columns would exceed the boundary of the large cube of the storage granule, advance by the maximum rowBank to the next row of positions and continue placing;
(b) then place along the rowBank direction; if the current plane is full, advance along rows to the next layer of space and continue placing;
(c) finally, place along the rows direction;
(3) after storage is finished, collect the coordinates of all stored small cubes and generate the weight distribution information necessary for deploying the neural network.
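Reading steps (1)-(3) as a greedy shelf-packing pass gives the sketch below (using the Cube type from the conversion sketch above). Where the text is silent, e.g. on tie-breaking and exact overflow handling, the choices here are assumptions, not the patent's definitive algorithm.

```python
# A greedy 3D-knapsack placement sketch following the column -> rowBank ->
# rows order of steps (a)-(c); boundary handling is an assumed reading.
def pack_3d_knapsack(cubes, big):
    # step (1): descending sort with priority rowbank > columns > rows
    cubes = sorted(cubes, key=lambda c: (c.rowbank, c.columns, c.rows),
                   reverse=True)
    coords = {}
    col = bank = row = 0      # placement cursor inside the large cube
    strip_h = 0               # max rowbank in the current column strip
    layer_d = 0               # max rows in the current plane
    for c in cubes:
        # (a) place along columns; on overflow advance by the maximum
        #     rowBank of the strip to the next row of positions
        if col + c.columns > big.columns:
            col, bank, strip_h = 0, bank + strip_h, 0
        # (b) place along rowBank; when the plane is full, advance along
        #     rows to the next layer of space
        if bank + c.rowbank > big.rowbank:
            bank, row = 0, row + layer_d
            strip_h = layer_d = 0
        # (c) the rows direction bounds the whole placement
        if row + c.rows > big.rows:
            raise ValueError(f"cube {c.name!r} does not fit in the granule")
        coords[c.name] = (bank, row, col)   # step (3): record coordinates
        col += c.columns
        strip_h = max(strip_h, c.rowbank)
        layer_d = max(layer_d, c.rows)
    return coords
```

The returned dictionary maps each layer name to its (rowBank, row, column) origin, i.e. the weight distribution information mentioned in step (3).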
It should be noted that each of the foregoing embodiments is described with its own emphasis; for parts of an embodiment that are not described in detail, reference may be made to the related descriptions of the other embodiments.
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a ROM, a RAM, etc.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (6)

1. A weight distribution method for deploying a neural network on ReRAM, characterized by comprising the following steps:
step 1: reading in a neural network;
step 2: reading in the information of a neural network layer;
step 3: judging whether the layer contains weight data;
step 4: if yes, performing weight conversion to form a small 3-D cube; if not, jumping to step 5;
step 5: judging whether the current layer is the last layer;
step 6: if yes, storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm; if not, reading in the next layer and jumping to step 2;
step 7: outputting the storage coordinates.
2. The weight distribution method for deploying a neural network on ReRAM according to claim 1, wherein the weight conversion in step 4 is specifically: converting the neural network weights into small 3-D cubes supported by the in-memory neural network computing engine.
3. The weight distribution method for deploying a neural network on ReRAM according to claim 2, wherein the mapping relation of the weight conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
4. The weight distribution method for deploying a neural network on ReRAM according to claim 1, wherein the mapping relation of the storage-and-compute array conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
5. The weight distribution method for deploying a neural network on ReRAM according to claim 1, wherein storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm in step 6 specifically comprises the following steps:
step 601: sorting the 3-D cubes in descending order;
step 602: placing the sorted small cubes one by one: first place along the columns direction, and if a cube's columns would exceed the boundary of the large cube of the storage granule, advance by the maximum rowBank to the next row of positions; then place along the rowBank direction, and if the current plane is full, advance along rows to the next layer of space; finally, place along the rows direction;
step 603: after storage is finished, collecting the coordinates of all stored small cubes and generating the weight distribution information necessary for deploying the neural network.
6. The method according to claim 5, wherein the descending order in step 601 is a descending sort with priority rowbank > columns > rows.
CN202310178399.1A 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM (active, granted as CN116306811B)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310178399.1A CN116306811B (en) 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310178399.1A CN116306811B (en) 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM

Publications (2)

Publication Number Publication Date
CN116306811A 2023-06-23
CN116306811B 2023-10-27

Family

ID=86823387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310178399.1A (active, granted as CN116306811B) 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM

Country Status (1)

Country Link
CN (1) CN116306811B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504320A (en) * 2016-11-02 2017-03-15 华东师范大学 GPU-based real-time three-dimensional reconstruction method for depth images
WO2020133317A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Computing resource allocation technology and neural network system
CN114008677A (en) * 2019-04-26 2022-02-01 韦尔特布雷公司 Three-dimensional model optimization
WO2021078486A1 (en) * 2019-10-24 2021-04-29 International Business Machines Corporation 3d neural inference processing unit architectures
WO2021142713A1 (en) * 2020-01-16 2021-07-22 北京比特大陆科技有限公司 Neural network processing method, device and system
CN112000772A (en) * 2020-08-24 2020-11-27 齐鲁工业大学 Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer
WO2022134465A1 (en) * 2020-12-24 2022-06-30 北京清微智能科技有限公司 Sparse data processing method for accelerating operation of re-configurable processor, and device
CN112990444A (en) * 2021-05-13 2021-06-18 电子科技大学 Hybrid neural network training method, system, equipment and storage medium
CN113128134A (en) * 2021-06-17 2021-07-16 中国矿业大学(北京) Mining area ecological environment evolution driving factor weight quantitative analysis method
CN113487020A (en) * 2021-07-08 2021-10-08 中国科学院半导体研究所 Stagger storage structure for neural network calculation and neural network calculation method
CN113705784A (en) * 2021-08-20 2021-11-26 江南大学 Neural network weight coding method based on matrix sharing and hardware system
CN114723024A (en) * 2022-03-08 2022-07-08 北京知存科技有限公司 Linear programming-based neural network mapping method for storage and calculation integrated chip
CN115420578A (en) * 2022-06-30 2022-12-02 吉林大学 Omicron virus detection method based on microscopic hyperspectral imaging system
CN115687181A (en) * 2022-11-07 2023-02-03 上海亿铸智能科技有限公司 Addressing method for storage processing unit

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AQEEB IQBAL ARKA et al.: "ReGraphX: NoC-enabled 3D Heterogeneous ReRAM Architecture for Training Graph Neural Networks", arXiv, pages 1-6 *
ZHOU Chuanbo: "Overview of the development of ReRAM-based neural network accelerators" (in Chinese), West China Broadcasting TV, no. 24, pages 246-251 *
GONG Zheng: "Research on semantic segmentation and object detection methods for indoor scenes based on backpack LiDAR point clouds" (in Chinese), China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 3, pages 136-79 *

Also Published As

Publication number Publication date
CN116306811B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN111178519B (en) Convolutional neural network acceleration engine, convolutional neural network acceleration system and method
CN109948774B (en) Neural network accelerator based on network layer binding operation and implementation method thereof
CN110222818B (en) Multi-bank row-column interleaving read-write method for convolutional neural network data storage
WO2020238843A1 (en) Neural network computing device and method, and computing device
Ibrahim et al. Intelligent data placement mechanism for replicas distribution in cloud storage systems
CN110750957B (en) Cache system verification method of high-efficiency multi-core RISC-V processor
CN108197075B (en) Multi-core implementation method of Inceptation structure
CN112633490A (en) Data processing device and method for executing neural network model and related products
CN111415003B (en) Three-dimensional stacked storage optimization method and device for neural network acceleration chip
CN110336875B (en) Method for improving computing and storing speed of Internet of things application
CN116306811B (en) Weight distribution method for deploying neural network for ReRAM
CN106484532B (en) GPGPU parallel calculating method towards SPH fluid simulation
CN111429974A (en) Molecular dynamics simulation short-range force parallel optimization method on super computer platform
CN109446478B (en) Complex covariance matrix calculation system based on iteration and reconfigurable mode
CN113419861B (en) GPU card group-oriented graph traversal hybrid load balancing method
CN115221102B (en) Method for optimizing convolution operation of system-on-chip and related product
CN110084865A (en) A kind of method of discrete point classification weighted fitting regular grid
CN116956756B (en) Model deployment method, task processing method, device, equipment and storage medium
CN109190450A (en) Artificial intelligence remote sensing image data extraction method based on distributed computing platform
CN105138607A (en) Hybrid granularity distributional memory grid index-based KNN query method
CN116993555A (en) Partition method, system and storage medium for identifying territory space planning key region
CN114124973B (en) Mirror image synchronization method and device for multi-cloud scene
CN112527463B (en) Container mirror image downloading acceleration method based on object storage
CN116089095B (en) Deployment method for ReRAM neural network computing engine network
CN114021070A (en) Deep convolution calculation method and system based on micro-architecture processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant