CN116306811A - Weight distribution method for deploying a neural network on ReRAM

Weight distribution method for deploying a neural network on ReRAM

Info

Publication number
CN116306811A
Authority
CN
China
Prior art keywords
neural network
storage
weight
reram
cubes
Legal status
Granted
Application number
CN202310178399.1A
Other languages
Chinese (zh)
Other versions
CN116306811B (en)
Inventor
董光达 (Dong Guangda)
余少华 (Yu Shaohua)
伍骏 (Wu Jun)
熊大鹏 (Xiong Dapeng)
李涛 (Li Tao)
Current Assignee
Suzhou Yizhu Intelligent Technology Co., Ltd.
Original Assignee
Suzhou Yizhu Intelligent Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Suzhou Yizhu Intelligent Technology Co., Ltd.
Priority to CN202310178399.1A
Publication of CN116306811A
Application granted
Publication of CN116306811B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a weight distribution method for deploying a neural network on ReRAM, belonging to the technical field of neural networks and comprising the following steps: traversing the neural network structure and converting the weights to fit the in-memory neural network computing engine, so that the weights of each layer form a small 3-D cube; then traversing all the weight cubes with a 3D knapsack algorithm, placing the small cubes into the large cube converted from the storage-and-compute array, and obtaining the corresponding coordinate information. By changing the perspective on the storage-and-compute array, the invention abstracts the weight-storage problem into the classical 3D knapsack problem, so that an unfamiliar problem can be solved with a mature algorithm and the weight-storage efficiency is improved; moreover, the method is applicable to weight storage for most neural networks, is highly reusable, and facilitates both the deployment of neural networks and the iterative improvement of the in-memory neural network computing engine.

Description

Weight distribution method for deploying a neural network on ReRAM
Technical Field
The invention relates to the technical field of neural networks, and in particular to a weight distribution method for deploying a neural network on ReRAM.
Background
Weight loading occupies a large amount of data bandwidth when a neural network algorithm executes. A ReRAM-based in-memory neural network computing engine writes the weights into the compute units in advance, so that only the feature vectors need to be loaded at execution time; the weight-loading step is eliminated, which greatly relieves bandwidth pressure and frees compute capacity.
At present, ReRAM-based in-memory neural network computing engines are still at an early stage, and no mature, complete solution exists.
The structure of the storage-and-compute granule used by the algorithm is shown in figure 1: a bank consists of 256 rows of 144 B storage units; a macro consists of 4 banks, where the input data/addresses are shared among the banks, which provide 4 different outputs; a group consists of 16 macros. The algorithm describes the case of 4 groups, i.e. the granule contains 256 banks in total. rowBank denotes the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time. The weight format of a neural network is likewise a multidimensional array, i.e. (number, width, channel). Because both the storage-and-compute array and the weights are multidimensional, and typical networks have many layers, placing the weights of a whole network onto the storage-and-compute array purely by hand causes many inconveniences (the granule geometry is summarized in the sketch after the following list), for example:
(1) Multidimensional data are hard to place, and data-overwrite problems occur easily.
(2) There is no reusability: every neural network needs its own independent handling, so the workload is enormous.
(3) Iteration is difficult: when the storage rules change, manually changing the weight placement is hard.
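For concreteness, the granule geometry described above can be restated in a few constants. The sketch below is only an illustration of the arithmetic in this section; the constant names are assumptions, not terms from the patent.

```python
# A minimal sketch of the storage-and-compute granule geometry described
# above; only the numbers come from this section, the names are assumed.

BYTES_PER_ROW = 144        # a bank row stores 144 B of weight data
ROWS_PER_BANK = 256        # a bank has 256 rows
BANKS_PER_MACRO = 4        # banks in a macro share input data/addresses
MACROS_PER_GROUP = 16
GROUPS = 4                 # the case described by the algorithm

BANK_CAPACITY = BYTES_PER_ROW * ROWS_PER_BANK              # 36864 B = 36 KB
TOTAL_BANKS = BANKS_PER_MACRO * MACROS_PER_GROUP * GROUPS  # 256 banks

if __name__ == "__main__":
    print(f"bank capacity: {BANK_CAPACITY} B ({BANK_CAPACITY // 1024} KB)")
    print(f"banks per granule: {TOTAL_BANKS}")
```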
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a weight distribution method for deploying a neural network on ReRAM.
The aim of the invention is achieved by the following technical solution:
a weight distribution method for deploying a neural network for a ReRAM comprises the following steps:
step 1: reading in a neural network;
step 2: reading in neural network layer information;
step 3: judging whether weight data are included or not;
step 4: if yes, weight conversion is carried out to form a 3-dimensional small cube; if not, jumping to the step 5;
step 5: judging whether the neural network layer is the last layer;
step 6: if yes, a 3d knapsack algorithm is used for storing the 3-dimensional small cubes into a large cube converted by the storage array; if not, reading in the next layer of the neural network layer, and jumping to the step 2;
step 7: and outputting the storage coordinates.
Further, the weight conversion in step 4 is specifically: converting the neural network weights into small 3-D cubes supported by the in-memory neural network computing engine.
Further, the mapping relation of the weight conversion is as follows: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
Further, the mapping relation of the storage-and-compute array conversion is as follows: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
Further, storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm in step 6 specifically comprises the following steps:
step 601: sorting the 3-D cubes in descending order;
step 602: placing the sorted small cubes one by one: first place along the columns direction, and if a cube's columns would exceed the boundary of the large cube of the storage granule, advance by the maximum rowBank to the next row of positions; then place along the rowBank direction, and if the current plane is full, advance along rows to the next layer of space; finally, place along the rows direction;
step 603: after storage is finished, collecting the coordinates of all stored small cubes and generating the weight distribution information necessary for deploying the neural network.
Further, the descending order in step 601 is a descending sort with priority rowbank > columns > rows.
The invention has the following beneficial effects:
1. by changing the perspective on the storage-and-compute array, the invention abstracts the weight-storage problem into the classical 3D knapsack problem, so that an unfamiliar problem can be solved with a mature algorithm and the weight-storage efficiency is improved;
2. the method is applicable to weight storage for most neural networks, is highly reusable, and facilitates both the deployment of neural networks and the iterative improvement of the in-memory neural network computing engine.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a block diagram of the storage-and-compute granule.
Fig. 2 is a flow chart of the method of the present invention.
Fig. 3 is a schematic diagram of a 3-dimensional cube structure.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In this embodiment, as shown in fig. 2, a weight distribution method for deploying a neural network on ReRAM comprises the following steps:
step 1: reading in a neural network;
step 2: reading in the information of a neural network layer;
step 3: judging whether the layer contains weight data;
step 4: if yes, performing weight conversion to form a small 3-D cube; if not, jumping to step 5;
step 5: judging whether the current layer is the last layer;
step 6: if yes, storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm; if not, reading in the next layer and jumping to step 2;
step 7: outputting the storage coordinates.
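Purely as an illustration of the control flow of steps 1 to 7, the following sketch walks the layers once, converts each weight-bearing layer into a cube, and packs all cubes at the end. The Layer/Network types and the value parallel_banks=4 are assumptions; convert_weights, Cube and pack_3d_knapsack are the helper sketches given after the mapping relations below, not the patent's own API.

```python
# A minimal sketch of the step 1-7 flow, assuming placeholder types;
# convert_weights, Cube and pack_3d_knapsack are sketched further below.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Layer:
    name: str
    weights: Optional[bytes] = None   # None: the layer carries no weights

@dataclass
class Network:
    layers: list

def allocate_weights(net: Network) -> dict:
    cubes = []
    for layer in net.layers:                      # steps 2/5/6: layer loop
        if layer.weights is not None:             # step 3: has weight data?
            # step 4: weight conversion into a small 3-D cube
            # (parallel_banks=4 is an assumed example value)
            cubes.append(convert_weights(layer.name, len(layer.weights),
                                         parallel_banks=4))
    # step 6: pack every cube into the large cube of the granule
    big = Cube("granule", rowbank=64, rows=256, columns=144)  # assumed size
    return pack_3d_knapsack(cubes, big)           # step 7: the coordinates
```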
In this embodiment, the mapping relation of the weight conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B (column direction) × 256 rows (row direction), i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
In this embodiment, the mapping relation of the storage-and-compute array conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B (column direction) × 256 rows (row direction), i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
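A sketch of the weight-to-cube conversion under these mapping relations follows. The patent does not spell out the engine's exact reshaping constraint, so the policy below (spread a layer's weight bytes over a given number of parallel banks, 144 B per row) is an assumption for illustration, not the engine's documented rule.

```python
# A hedged sketch of weight conversion: one layer's weight bytes become a
# (rowbank, rows, columns) cube. The spreading policy is an assumption.
import math
from dataclasses import dataclass

BYTES_PER_ROW = 144   # column direction
ROWS_PER_BANK = 256   # row direction

@dataclass
class Cube:
    name: str
    rowbank: int      # banks used in parallel
    rows: int         # bank rows used (<= 256)
    columns: int      # bytes used per row (<= 144)

def convert_weights(name: str, weight_bytes: int,
                    parallel_banks: int) -> Cube:
    per_bank = math.ceil(weight_bytes / parallel_banks)   # bytes per bank
    rows = math.ceil(per_bank / BYTES_PER_ROW)            # rows needed
    if rows > ROWS_PER_BANK:
        raise ValueError(f"layer {name!r} exceeds one bank slice")
    columns = min(per_bank, BYTES_PER_ROW)
    return Cube(name, parallel_banks, rows, columns)
```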
In this embodiment, the weight allocation algorithm for deploying a neural network on ReRAM mainly comprises: traversing the neural network structure and converting the weights to fit the in-memory neural network computing engine, so that the weights of each layer form a small 3-D cube; then traversing all the weight cubes with a 3D knapsack algorithm, placing the small cubes into the large cube converted from the storage-and-compute array, and obtaining the corresponding coordinate information. The method comprises the following steps:
1. parse the neural network, e.g. mobileNet-v3-uint8.tflite, traverse each layer of the network, put the ordinary convolution, depthwise convolution and fully connected layers into one array (denoted weight_layer_array), and ignore the other layers, which carry no weights;
2. traverse weight_layer_array and reshape the weight dimensions according to the constraints of the in-memory neural network computing engine, obtaining a new array;
3. pass the new array obtained in the previous step into the 3D knapsack algorithm, which places the small cubes; the specific steps are:
(1) sort the cubes in descending order (descending priority: rowbank > columns > rows);
(2) store the sorted cubes one by one and obtain their coordinates; taking fig. 3 as an example, the specific storage process is as follows:
(a) first place along the columns direction; if a cube's columns would exceed the boundary of the large cube of the storage granule, advance by the maximum rowBank to the next row of positions and continue placing;
(b) then place along the rowBank direction; if the current plane is full, advance along rows to the next layer of space and continue placing;
(c) finally, place along the rows direction;
(3) after storage is finished, collect the coordinates of all stored small cubes and generate the weight distribution information necessary for deploying the neural network.
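Reading steps (1)-(3) as a greedy shelf-packing pass gives the sketch below (using the Cube type from the conversion sketch above). Where the text is silent, e.g. on tie-breaking and exact overflow handling, the choices here are assumptions, not the patent's definitive algorithm.

```python
# A greedy 3D-knapsack placement sketch following the column -> rowBank ->
# rows order of steps (a)-(c); boundary handling is an assumed reading.
def pack_3d_knapsack(cubes, big):
    # step (1): descending sort with priority rowbank > columns > rows
    cubes = sorted(cubes, key=lambda c: (c.rowbank, c.columns, c.rows),
                   reverse=True)
    coords = {}
    col = bank = row = 0      # placement cursor inside the large cube
    strip_h = 0               # max rowbank in the current column strip
    layer_d = 0               # max rows in the current plane
    for c in cubes:
        # (a) place along columns; on overflow advance by the maximum
        #     rowBank of the strip to the next row of positions
        if col + c.columns > big.columns:
            col, bank, strip_h = 0, bank + strip_h, 0
        # (b) place along rowBank; when the plane is full, advance along
        #     rows to the next layer of space
        if bank + c.rowbank > big.rowbank:
            bank, row = 0, row + layer_d
            strip_h = layer_d = 0
        # (c) the rows direction bounds the whole placement
        if row + c.rows > big.rows:
            raise ValueError(f"cube {c.name!r} does not fit in the granule")
        coords[c.name] = (bank, row, col)   # step (3): record coordinates
        col += c.columns
        strip_h = max(strip_h, c.rowbank)
        layer_d = max(layer_d, c.rows)
    return coords
```

The returned dictionary maps each layer name to its (rowBank, row, column) origin, i.e. the weight distribution information mentioned in step (3).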
It should be noted that each of the foregoing embodiments is described with its own emphasis; for parts of an embodiment that are not described in detail, reference may be made to the related descriptions of the other embodiments.
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a ROM, a RAM, etc.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (6)

1. A weight distribution method for deploying a neural network on ReRAM, characterized by comprising the following steps:
step 1: reading in a neural network;
step 2: reading in the information of a neural network layer;
step 3: judging whether the layer contains weight data;
step 4: if yes, performing weight conversion to form a small 3-D cube; if not, jumping to step 5;
step 5: judging whether the current layer is the last layer;
step 6: if yes, storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm; if not, reading in the next layer and jumping to step 2;
step 7: outputting the storage coordinates.
2. The weight distribution method for deploying a neural network on ReRAM according to claim 1, wherein the weight conversion in step 4 is specifically: converting the neural network weights into small 3-D cubes supported by the in-memory neural network computing engine.
3. The weight distribution method for deploying a neural network on ReRAM according to claim 2, wherein the mapping relation of the weight conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
4. The weight distribution method for deploying a neural network on ReRAM according to claim 1, wherein the mapping relation of the storage-and-compute array conversion is: the rowBank dimension of the small 3-D cube represents the number of banks placed in parallel, i.e. the maximum number of banks that can be used at one time; a bank consists of 144 B × 256 rows, i.e. a bank has a storage capacity of 144 B × 256 = 36 KB; a plane consists of rowBanks × columns and represents the currently available computing resources.
5. The weight distribution method for deploying a neural network on ReRAM according to claim 1, wherein storing the small 3-D cubes into the large cube converted from the storage-and-compute array using a 3D knapsack algorithm in step 6 specifically comprises the following steps:
step 601: sorting the 3-D cubes in descending order;
step 602: placing the sorted small cubes one by one: first place along the columns direction, and if a cube's columns would exceed the boundary of the large cube of the storage granule, advance by the maximum rowBank to the next row of positions; then place along the rowBank direction, and if the current plane is full, advance along rows to the next layer of space; finally, place along the rows direction;
step 603: after storage is finished, collecting the coordinates of all stored small cubes and generating the weight distribution information necessary for deploying the neural network.
6. The method according to claim 5, wherein the descending order in step 601 is a descending sort with priority rowbank > columns > rows.
CN202310178399.1A 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM (active, granted as CN116306811B)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310178399.1A CN116306811B (en) 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310178399.1A CN116306811B (en) 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM

Publications (2)

Publication Number Publication Date
CN116306811A 2023-06-23
CN116306811B 2023-10-27

Family

ID=86823387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310178399.1A (active, granted as CN116306811B) 2023-02-28 2023-02-28 Weight distribution method for deploying a neural network on ReRAM

Country Status (1)

Country Link
CN (1) CN116306811B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504320A (en) * 2016-11-02 2017-03-15 华东师范大学 GPU-based real-time three-dimensional reconstruction method for depth images
WO2020133317A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Computing resource allocation technology and neural network system
CN114008677A (en) * 2019-04-26 2022-02-01 韦尔特布雷公司 Three-dimensional model optimization
WO2021078486A1 (en) * 2019-10-24 2021-04-29 International Business Machines Corporation 3d neural inference processing unit architectures
WO2021142713A1 (en) * 2020-01-16 2021-07-22 北京比特大陆科技有限公司 Neural network processing method, device and system
CN112000772A (en) * 2020-08-24 2020-11-27 齐鲁工业大学 Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer
WO2022134465A1 (en) * 2020-12-24 2022-06-30 北京清微智能科技有限公司 Sparse data processing method for accelerating operation of re-configurable processor, and device
CN112990444A (en) * 2021-05-13 2021-06-18 电子科技大学 Hybrid neural network training method, system, equipment and storage medium
CN113128134A (en) * 2021-06-17 2021-07-16 中国矿业大学(北京) Mining area ecological environment evolution driving factor weight quantitative analysis method
CN113487020A (en) * 2021-07-08 2021-10-08 中国科学院半导体研究所 Stagger storage structure for neural network calculation and neural network calculation method
CN113705784A (en) * 2021-08-20 2021-11-26 江南大学 Neural network weight coding method based on matrix sharing and hardware system
CN114723024A (en) * 2022-03-08 2022-07-08 北京知存科技有限公司 Linear programming-based neural network mapping method for storage and calculation integrated chip
CN115420578A (en) * 2022-06-30 2022-12-02 吉林大学 Omicron virus detection method based on microscopic hyperspectral imaging system
CN115687181A (en) * 2022-11-07 2023-02-03 上海亿铸智能科技有限公司 Addressing method for storage processing unit

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AQEEB IQBAL ARKA et al.: "ReGraphX: NoC-enabled 3D Heterogeneous ReRAM Architecture for Training Graph Neural Networks", arXiv, pages 1-6 *
ZHOU Chuanbo: "Overview of the development of ReRAM-based neural network accelerators" (in Chinese), West China Broadcasting TV, no. 24, pages 246-251 *
GONG Zheng: "Research on semantic segmentation and object detection methods for indoor scenes based on backpack LiDAR point clouds" (in Chinese), China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 3, pages 136-79 *

Also Published As

Publication number Publication date
CN116306811B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN111178519B (en) Convolutional neural network acceleration engine, convolutional neural network acceleration system and method
CN109948774B (en) Neural network accelerator based on network layer binding operation and implementation method thereof
CN110222818B (en) Multi-bank row-column interleaving read-write method for convolutional neural network data storage
WO2020238843A1 (en) Neural network computing device and method, and computing device
Ibrahim et al. Intelligent data placement mechanism for replicas distribution in cloud storage systems
CN110750957B (en) Cache system verification method of high-efficiency multi-core RISC-V processor
CN108197075B (en) Multi-core implementation method of Inceptation structure
CN112633490A (en) Data processing device and method for executing neural network model and related products
CN111415003B (en) Three-dimensional stacked storage optimization method and device for neural network acceleration chip
CN110336875B (en) Method for improving computing and storing speed of Internet of things application
CN116306811B (en) Weight distribution method for deploying neural network for ReRAM
CN106484532B (en) GPGPU parallel calculating method towards SPH fluid simulation
CN111429974A (en) Molecular dynamics simulation short-range force parallel optimization method on super computer platform
CN109446478B (en) Complex covariance matrix calculation system based on iteration and reconfigurable mode
CN113419861B (en) GPU card group-oriented graph traversal hybrid load balancing method
CN115221102B (en) Method for optimizing convolution operation of system-on-chip and related product
CN110084865A (en) A kind of method of discrete point classification weighted fitting regular grid
CN116956756B (en) Model deployment method, task processing method, device, equipment and storage medium
CN109190450A (en) Artificial intelligence remote sensing image data extraction method based on distributed computing platform
CN105138607A (en) Hybrid granularity distributional memory grid index-based KNN query method
CN116993555A (en) Partition method, system and storage medium for identifying territory space planning key region
CN114124973B (en) Mirror image synchronization method and device for multi-cloud scene
CN112527463B (en) Container mirror image downloading acceleration method based on object storage
CN116089095B (en) Deployment method for ReRAM neural network computing engine network
CN114021070A (en) Deep convolution calculation method and system based on micro-architecture processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant