CN115879530B - Array structure optimization method for an RRAM-oriented in-memory computing system - Google Patents

Array structure optimization method for an RRAM-oriented in-memory computing system

Info

Publication number
CN115879530B
Authority
CN
China
Prior art keywords
quantization
array
data
rram
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310186971.9A
Other languages
Chinese (zh)
Other versions
CN115879530A (en)
Inventor
王浩
郑精
吕琳
汪汉斌
万厚钊
马国坤
袁晓旭
高浩浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Hubei Jiangcheng Laboratory
Original Assignee
Hubei University
Hubei Jiangcheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University, Hubei Jiangcheng Laboratory filed Critical Hubei University
Priority to CN202310186971.9A priority Critical patent/CN115879530B/en
Publication of CN115879530A publication Critical patent/CN115879530A/en
Application granted granted Critical
Publication of CN115879530B publication Critical patent/CN115879530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for optimizing the array structure of an RRAM-oriented in-memory computing system, which mainly uses the corresponding formulas of a post-training quantization algorithm to optimize the array structure of an RRAM-based in-memory computing system, reducing the array area and the system power consumption while preserving the computing accuracy and precision. The beneficial effects of the invention are as follows: the invention is applicable to multiple neural networks such as the multilayer perceptron and the convolutional neural network; under the same calculation conditions, halving the 1T1R array scale effectively reduces the system area, lowers the system energy consumption and improves the computing efficiency, and, given that the RRAM device fabrication process is not yet mature, makes the system better suited to commercial deployment; as the number of convolution kernels of a CNN convolutional layer increases, the array scale of the invention remains half that of the conventional techniques while the number of additional adders and multipliers used to compute X_Z·W_Q and X_Q·W_Z stays unchanged, so the overall performance advantage of the system is quite remarkable.

Description

Array structure optimization method for an RRAM-oriented in-memory computing system
Technical Field
The invention relates to the technical field of in-memory computing, and in particular to a method for optimizing the array structure of an RRAM in-memory computing system.
Background
With the rapid development of science and technology we have entered the "big data" era, and information data of all kinds are growing explosively, which places higher demands on storage and computing technology. Traditional computers adopt the von Neumann architecture, in which the memory and the processor are independent of each other. Data exchange between the two is frequent, but the exchange channel is narrow and power-hungry, forming a "memory wall" between computation and storage that greatly limits the performance of advanced processors. Developing new memory-computing systems is therefore important; the recently proposed in-memory computing architecture combines storage and computation, which can effectively reduce the data-transfer load, lower the energy consumed by computation, and improve information-processing efficiency.
In-memory computing technology typically exploits the physical and electrical characteristics of non-volatile memory to perform computation directly inside the memory while preserving non-volatile storage. This avoids the high-frequency data exchange between storage and computation, thereby breaking through the memory-wall limit and greatly improving data-processing capability. Among non-volatile memories, resistive random-access memory (RRAM) has attracted wide attention for its simple structure, fast read/write, low power consumption and good CMOS-process compatibility. An RRAM device switches resistance states under specific voltage excitation; using this electrical property, high and low voltage levels represent the digits 1 and 0, the high- and low-resistance states of the device represent the digits 0 and 1, and by Ohm's law the current flowing through the device can be collected and quantized to obtain the result of a digital multiplication. When RRAM devices are extended into a crossbar array structure, multiply-accumulate (MAC) operations and hence matrix operations can be performed. The matrix storage-and-compute capability of the RRAM array fits the compute-intensive requirements of neural networks very well, so it has broad application prospects in the field of neural-network accelerators.
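As an illustration of the crossbar principle just described, the following minimal sketch (not taken from the patent; the conductance values, voltage encoding and array size are assumptions) shows how column currents obtained from Ohm's law and Kirchhoff's current law amount to a vector-matrix product:

```python
# Minimal sketch of crossbar multiply-accumulate via Ohm's law (illustrative values only).
import numpy as np

G = np.array([[1.0, 0.0, 1.0],   # device conductances (arbitrary units);
              [0.0, 1.0, 1.0],   # low resistance acts as "1", high resistance as "0"
              [1.0, 1.0, 0.0]])

V = np.array([1.0, 0.0, 1.0])    # read voltages on the rows encode the input bits

# Each column accumulates the currents of its devices: I_j = sum_i V_i * G_ij,
# which is one entry of the vector-matrix product V @ G.
I = V @ G
print(I)                          # quantizing these currents yields the MAC results
```

Quantizing each column current then gives one multiply-accumulate result, which is the operation the 1T1R crossbar performs in analog form.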
However, in current RRAM-based in-memory computing neural-network accelerators, a 1T1R array can only store unsigned numbers when storing the weight matrix, so two device structures are usually combined to represent a signed number, i.e. the positive and negative weights of the neural network. The common processing methods are as follows:
1. In the 2T2R structure array shown in FIG. 1, two 1T1R cells are combined into a pair storing the positive and negative parts of a signed value to represent positive and negative weights; equal-magnitude positive and negative voltage pulse signals are applied, and the accumulated current collected at the last column is quantized to obtain the final multiply-accumulate result;
2. In the 1T1R positive/negative-row structure array shown in FIG. 2, all the weights of one weight matrix are mapped onto two 1T1R lines: one line holds the positive weights and receives the positive pulse input, the other holds the negative weights and receives an equal-magnitude negative pulse input; after the coded pulses are applied to the bit lines, the accumulated output currents of the two lines are collected and subtracted to obtain the calculation result;
3. In the 1T1R positive/negative dual-array structure shown in FIG. 3, two 1T1R arrays are built to store the positive and negative weights respectively, equal-magnitude voltage signals are input, and the two results are finally subtracted to obtain the final calculation result.
Because RRAM device technology is not yet mature, fabricating large-scale arrays still faces many challenges. To suppress crosstalk, storage and computation are carried out with one of the three array structures above, but representing a single signed number then requires twice the 1T1R resources, which greatly increases the area and energy consumption of the in-memory computing system.
Disclosure of Invention
The invention aims to provide a method for optimizing an array structure of an RRAM-oriented in-memory computing system, which aims to solve the problems in the background technology.
In order to achieve the above purpose, the present invention provides a method for optimizing an array structure of an RRAM-oriented in-memory computing system, comprising the following steps:
Step one: perform post-training quantization on the image data and the neural-network weight data, and use the quantization formula to obtain the quantized image data X_Q, the image zero offset X_Z, the quantized weight data W_Q and the weight zero offset W_Z; X_Z and W_Z are fixed after quantization, and the value X_Z·W_Z is computed in software;
Step two: the signed forward-propagation calculation formula of the neural network is

Y = Σ S_X·(X_Q − X_Z) · S_W·(W_Q − W_Z)

which expands to

Y = S_X·S_W · Σ (X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z)

The positive-integer terms X_Q·W_Z, X_Z·W_Q and X_Q·W_Q are calculated with adders, multipliers and the RRAM array circuit respectively; finally, together with the X_Z·W_Z value obtained in step one, they are substituted into the expanded formula to calculate the Y value. This splitting method lets the RRAM array store and compute only positive integers, avoiding the excessive consumption of RRAM device resources caused by directly computing signed numbers;
Step three: store the Y-value calculation results in a buffer, carry out the subsequent activation function and the remaining quantization operations, and after processing use the complete feature-map data obtained as the input of the next network layer.
As a preferred technical solution of the present invention, the quantization formula in the first step is:
Q = round(R / S) + Z
where R is the original data value, Q is the quantized data value, S is the scale factor, which represents the proportional relation between the original data and the quantized data, and Z is the zero offset, i.e. the quantized integer to which the value 0 of the original data is mapped;
with the quantization precision set to n bits, part of the test-set data is randomly extracted, the quantization scale factor S and the zero offset Z of each network layer are calculated with the following formulas, and substituting them into the quantization formula yields the quantized data values;
the calculation formula of the scaling factor S is as follows:
S = (R_max − R_min) / (Q_max − Q_min)
the zero offset Z calculation formula is:
Z = Q_max − round(R_max / S)
where R_max and R_min, Q_max and Q_min are the maximum and minimum values of the original data and of the quantized data respectively; Q_max and Q_min are determined by the quantization precision n, while R_max and R_min are determined from the randomly drawn sample data.
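For clarity, here is a minimal numerical sketch of the calibration and quantization formulas above (an illustration under assumptions, not the patent's implementation): it assumes unsigned n-bit quantization with Q_min = 0 and Q_max = 2^n − 1, and the function names are hypothetical.

```python
# Minimal sketch, assuming unsigned asymmetric quantization to n bits (Q_min = 0, Q_max = 2^n - 1);
# variable names follow the formulas above.
import numpy as np

def calibrate(samples, n_bits=8):
    """Compute scale S and zero offset Z from randomly drawn sample data."""
    q_min, q_max = 0, 2 ** n_bits - 1
    r_min, r_max = samples.min(), samples.max()
    S = (r_max - r_min) / (q_max - q_min)          # scale factor
    Z = int(round(q_max - r_max / S))              # zero offset (integer)
    return S, Z

def quantize(R, S, Z, n_bits=8):
    Q = np.round(R / S) + Z                        # Q = round(R/S) + Z
    return np.clip(Q, 0, 2 ** n_bits - 1).astype(np.int32)

def dequantize(Q, S, Z):
    return S * (Q - Z)                             # R is approximately S * (Q - Z)

rng = np.random.default_rng(0)
calib = rng.normal(size=1000).astype(np.float32)   # randomly drawn calibration samples
S, Z = calibrate(calib)
x = np.array([-1.5, 0.0, 1.5], dtype=np.float32)
print(quantize(x, S, Z), dequantize(quantize(x, S, Z), S, Z))
```

As described above, S and Z are fixed once from the sample statistics; afterwards quantization is a single round, scale and offset operation performed during inference.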
In the second step, the RRAM uses devices with only two stable states, high resistance and low resistance, each combined with an NMOS transistor into a 1T1R structure so as to build a 1T1R crossbar array; a row of n 1T1R structures forms an n-bit weight value, and the corresponding shift weighting is applied when the output currents are collected to obtain the calculation result.
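The bit slicing and shift weighting described in this step can be illustrated with the short sketch below (an illustrative model rather than the patent's circuit; the specific weights, inputs and helper names are assumptions):

```python
# Minimal sketch: an 8-bit unsigned weight is bit-sliced across eight binary 1T1R cells;
# the eight column currents are then shift-weighted (x 2^k) to recover the dot product X_Q . W_Q.
import numpy as np

def bit_slice(weights, n_bits=8):
    """Decompose unsigned integer weights into a (len(weights) x n_bits) 0/1 matrix."""
    return np.array([[(w >> k) & 1 for k in range(n_bits)] for w in weights])

W_Q = np.array([200, 3, 87, 19, 255, 0, 64, 128, 11])   # one 3x3 kernel, quantized, unsigned
X_Q = np.array([10, 0, 4, 7, 1, 9, 2, 5, 3])            # quantized inputs applied as read pulses

G = bit_slice(W_Q)                 # conductance pattern: one column per bit plane
I = X_Q @ G                        # per-column accumulated "currents"
result = sum(I[k] << k for k in range(G.shape[1]))       # shift weighting of the columns

assert result == int(X_Q @ W_Q)    # matches the full-precision dot product
print(result)
```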
As a preferred technical solution of the invention, the weight data W_Q are written row by row into the 1T1R array for storage under the control of a peripheral digital logic circuit; the quantization parameter X_Z is then converted into a read-voltage signal and input to the 1T1R array, and the output currents are collected, quantized and shift-weighted to obtain the calculation result X_Z·W_Q.
As a preferred technical solution of the invention, under the scheduling of a digital control circuit the corresponding image data are taken from the buffer in sequence, summed and multiplied by the corresponding W_Z to obtain X_Q·W_Z; at the same time, the image data are converted into voltage signals and input at the row heads of the 1T1R array, the output currents at the column ends are read, and X_Q·W_Q is obtained after quantization and shift weighting.
Compared with the prior art, the beneficial effects of the invention are as follows: under the same calculation conditions the invention greatly reduces the number of RRAM devices required, which, given that the RRAM fabrication process is not yet mature, makes it better suited to commercial deployment; the splitting method lets the RRAM array store and compute only positive integers, avoiding the excessive consumption of RRAM device resources caused by directly computing signed numbers; by halving the 1T1R array scale, the system area is effectively reduced, the system energy consumption is lowered, and the computing efficiency is improved; the invention is applicable to multiple neural networks such as the multilayer perceptron and the convolutional neural network, and because the local receptive field of a CNN convolutional layer is small, few adders are needed to compute the X_Q·W_Z term, so the advantages of the technique are even more pronounced in CNNs; as the number of convolution kernels in a CNN convolutional layer grows, the array scale of the invention remains half that of the conventional techniques while the number of additional adders and multipliers used to compute X_Z·W_Q and X_Q·W_Z stays unchanged, so the impact of the extra computing units is limited even for large CNNs and the overall performance advantage of the system is quite remarkable.
Drawings
FIG. 1 is a diagram showing the positive and negative weights of a background art 2T2R structure;
FIG. 2 is a diagram showing positive and negative weights of a positive and negative row structure of a background art 1T 1R;
FIG. 3 is a diagram showing the positive and negative weights of a 1T1R positive and negative double array structure in the prior art;
FIG. 4 is a graph showing the quantization weights of the optimized 1T1R single array according to the present invention;
FIG. 5 is a computational block diagram of the present invention optimized using PTQ formulas;
FIG. 6 is a diagram of a single-layer network accelerator architecture based on a 1T1R array in accordance with the present invention;
Detailed Description
Example 1
As shown in FIGS. 4 to 6, the invention discloses a method for optimizing the array structure of an RRAM (resistive random access memory)-oriented in-memory computing system, which mainly uses the corresponding formulas of a post-training quantization algorithm to optimize the array structure of an RRAM-based in-memory computing system, reducing the array area and the system power consumption while preserving the computing accuracy and precision, and providing a reliable solution for accelerating the computation of large-scale neural networks.
The quantization operation quantizes the 32-bit floating-point numbers in the neural network to 8-bit or other low-bit fixed-point numbers, which greatly reduces the calculation cost and helps integrate the neural network into edge intelligent devices with strict power and latency requirements. The post-training quantization (PTQ) algorithm used by the technique of the invention obtains the quantization parameters of the network without retraining it (i.e. without updating the network weights). Taking a convolutional neural network (CNN) as an example, after the network has been trained normally, the data produced by each convolution, pooling and fully connected layer are quantized in the inference stage, which effectively saves storage and computation cost.
The quantization formula is:
Q = round(R / S) + Z
where R is the original data value, Q is the quantized data value, S is the scale factor, which represents the proportional relation between the original data and the quantized data, and Z is the zero offset, i.e. the quantized integer to which the value 0 of the original data is mapped.
The calculation formula of the scaling factor S is as follows:
S = (R_max − R_min) / (Q_max − Q_min)
the zero offset Z calculation formula is:
Z = Q_max − round(R_max / S)
where Q_max and Q_min are determined by the quantization bit precision, while R_max and R_min are determined from the randomly extracted sample data.
Therefore, all quantization parameters required by the post-training quantization process can be determined after the network has been trained and before inference; during inference, quantization of the data is completed simply by substituting the corresponding parameters into the quantization formula.
The technique of the invention mainly uses the quantization calculation formula to optimize the array structure. Taking the convolution calculation in the first convolutional layer of a CNN as an example, the actual calculation formula after the original image data and the convolution kernel have been quantized is:
Y = Σ S_X·(X_Q − X_Z) · S_W·(W_Q − W_Z)

which expands to:

Y = S_X·S_W · Σ (X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z)
where X_Z and W_Z are fixed values.
Therefore, throughout the convolution calculation X_Z·W_Z is a known, fixed value, and X_Z·W_Q likewise only needs to be calculated once, after the 1T1R array weights have been initialized and before the convolution sliding operation begins; during each sliding step only the two terms X_Q·W_Z and X_Q·W_Q actually have to be calculated. The X_Q·W_Z term sums the picture input data of the current sliding window and multiplies the sum by W_Z; since a convolution kernel is typically 3×3 or 5×5, only a few adders and multipliers are required to complete this computation. The X_Q·W_Q term is the key computation: the quantized, positive-valued convolution-kernel weight matrix W_Q is stored in the 1T1R array, and during each convolution sliding step the corresponding feature-map data X_Q are input, so the matrix multiplication is performed in memory. The structure of this calculation method is shown in FIG. 5. In summary, the invention can halve the 1T1R array resources of the current techniques at the cost of a few additional multipliers and adders. In particular, the number of additional adders and multipliers depends only on the size of the convolution kernels and not on their number. When the number of convolution kernels in a CNN is very large, the required array scale grows greatly while the number of additional adders and multipliers stays the same, so the advantage of the technique in array-scale optimization becomes even more remarkable, making it very suitable for accelerating large-scale neural networks.
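The term splitting described above can be checked numerically with the following sketch (illustrative only; the scale factors, zero offsets and data are assumed values, not results from the patent):

```python
# Minimal sketch of the term splitting for a single convolution-window MAC with quantized data:
# the 1T1R array only has to evaluate the unsigned product X_Q . W_Q.
import numpy as np

S_X, X_Z = 0.05, 12        # activation scale and zero offset (assumed calibration results)
S_W, W_Z = 0.02, 9         # weight scale and zero offset (assumed)

X_Q = np.random.default_rng(1).integers(0, 256, size=9)   # one 3x3 input window, unsigned
W_Q = np.random.default_rng(2).integers(0, 256, size=9)   # one 3x3 kernel, unsigned

# Reference: dequantize first, then multiply-accumulate.
Y_ref = np.sum(S_X * (X_Q - X_Z) * S_W * (W_Q - W_Z))

# Split form: X_Q.W_Q comes from the 1T1R array; X_Q.W_Z needs a few adders and a multiplier
# per window; X_Z.W_Q and X_Z.W_Z are computed once after initialization.
xq_wq = int(X_Q @ W_Q)                 # in-memory MAC on the array
xq_wz = int(X_Q.sum()) * W_Z           # adders plus one multiplier per sliding step
xz_wq = X_Z * int(W_Q.sum())           # computed once after the weights are written
xz_wz = len(X_Q) * X_Z * W_Z           # known constant, computed in software

Y_split = S_X * S_W * (xq_wq - xq_wz - xz_wq + xz_wz)
assert np.isclose(Y_ref, Y_split)
print(Y_ref, Y_split)
```

The assertion confirms that the split form is mathematically identical to dequantizing first and then multiplying, which is why only the unsigned term X_Q·W_Q needs to be mapped onto the 1T1R array.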
Taking the first convolutional layer as an example and assuming 200 convolution kernels of size 3×3, the specific implementation steps of the method are as follows:
1. Design and train the neural-network system. Set the quantization bit width (for example 8 bits), randomly extract part of the test-set data, and calculate the quantization scale factor S and the zero offset Z of each network layer:

S = (R_max − R_min) / (Q_max − Q_min), Z = Q_max − round(R_max / S);
2. According to the formula

Q = round(R / S) + Z

quantize the original image data, the weights of all network layers and the intermediate calculation results of each layer, obtaining the quantized data and parameters X_Q, W_Q, X_Z and W_Z of each network layer, and calculate X_Z·W_Z for each layer;
3. The RRAM uses devices with only two stable states, high resistance and low resistance, each combined with an NMOS transistor into a 1T1R structure. Because an RRAM cell can only represent the binary logic values "0" and "1", a row of eight 1T1R structures is required to form one 8-bit weight value, and the corresponding shift weighting must be applied when the output currents are collected to obtain the actual calculation result. The convolution kernels are handled with the existing conventional technique: each kernel is unrolled row by row into a column vector and stored in a column of 1T1R structures, the output currents at the column ends are collected, and a single convolution result is obtained after quantization. Building the array according to this logic, the array size for 200 convolution kernels of size 3×3 is 9 × 1600 (only 14400 1T1R structures are needed with the technique of the invention, whereas the prior art needs 28800 1T1R structures because of the negative-weight problem);
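The array-size figures quoted in this step follow from a short calculation (a sketch of the arithmetic; the factor of two for the prior-art schemes is taken from the comparison with the structures of FIGS. 1 to 3):

```python
# Worked arithmetic for the example above: 200 kernels of size 3x3 at 8-bit precision.
# Each kernel column is bit-sliced into 8 physical columns, so the optimized single array
# needs kernel_size x (num_kernels x n_bits) cells; the signed-weight schemes need twice as many.
kernel_size, num_kernels, n_bits = 3 * 3, 200, 8

rows = kernel_size                        # 9
cols = num_kernels * n_bits               # 1600
optimized_cells = rows * cols             # 14400 1T1R cells (this method)
conventional_cells = 2 * optimized_cells  # 28800 1T1R cells (positive/negative dual representation)

print(rows, cols, optimized_cells, conventional_cells)
```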
4. Under the control of the peripheral digital logic circuit, write the quantized convolution-kernel weight data from step 2 into the array row by row for storage; then convert the quantization parameter X_Z into a read-voltage signal and input it to the 1T1R array, collect the output currents, and obtain the calculation result X_Z·W_Q after quantization and shift weighting;
5. Store the original image data in a buffer and control the sliding operation with the digital control circuit. According to the convolution sliding setting, take the 9 image data of the current sliding window from the buffer in sequence, sum them and multiply by the corresponding W_Z to obtain X_Q·W_Z. At the same time, convert the 9 image data into read voltages applied at the row heads of the array, read the output currents, and obtain X_Q·W_Q after quantization and shift weighting;
6. Connect a digital logic circuit at the output of the array and compute the final convolution result:

Y = S_X·S_W·(X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z)
Store the single convolution result in a register, carry out the subsequent activation-function and pooling operations, and after the convolution sliding calculation is finished use the complete feature-map data obtained as the input of the next network layer.
The steps above describe the calculation procedure of the first convolutional layer of the CNN; the other network layers operate similarly, and the quantized calculation output of each network layer becomes the data input of the next layer. As can be seen from FIG. 6, the 1T1R array of the invention only needs to store the quantized positive weight values, requiring only half the number of 1T1R cells of the prior art for the same weights. Although additional multipliers and adders are introduced into the calculation, only a small number is required: in the example above, only 1 multiplier and 3×200+9 adders are added, while 14400 1T1R cells are saved. Considering that the RRAM process is far less mature than the conventional CMOS process, the technical advantage of the invention is obvious.
Although specific embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments; various changes and modifications that require no inventive effort and do not depart from the spirit of the present invention fall within the scope of the present invention.

Claims (2)

1. A method for optimizing the array structure of an RRAM-oriented in-memory computing system, characterized by comprising the following steps:
Step one: perform post-training quantization on the image data and the neural-network weight data, and calculate through the quantization formula the quantized image data X_Q, the image zero offset X_Z, the quantized weight data W_Q and the weight zero offset W_Z; X_Z and W_Z are fixed after quantization, and the value X_Z·W_Z is computed in software;
Step two: the signed forward-propagation calculation formula of the neural network is:

Y = Σ S_X·(X_Q − X_Z) · S_W·(W_Q − W_Z)

which expands to:

Y = S_X·S_W · Σ (X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z)

the positive-integer terms X_Q·W_Z, X_Z·W_Q and X_Q·W_Q are calculated with adders, multipliers and the RRAM array circuit respectively, and finally, together with the X_Z·W_Z value calculated in step one, they are substituted into the expanded formula to calculate the Y value; this splitting method lets the RRAM array store and compute only positive integers, avoiding the excessive consumption of RRAM device resources caused by directly computing signed numbers; the RRAM uses devices with only two stable states, high resistance and low resistance, each combined with an NMOS transistor into a 1T1R structure so as to build a 1T1R crossbar array; n 1T1R structures in a row form an n-bit weight value, and the corresponding shift weighting is applied when the output currents are collected to obtain the calculation result; the weight data W_Q are written row by row into the 1T1R array for storage under the control of a peripheral digital logic circuit, the quantization parameter X_Z is then converted into a read-voltage signal and input to the 1T1R array, and the output currents are collected, quantized and shift-weighted to obtain the calculation result X_Z·W_Q; under the scheduling of a digital control circuit, the corresponding image data are taken from the buffer in sequence, summed and multiplied by the corresponding W_Z to obtain X_Q·W_Z; at the same time, the image data are converted into voltage signals and input at the row heads of the 1T1R array, the output currents at the column ends are read, and X_Q·W_Q is obtained after quantization and shift weighting;
Step three: store the Y-value calculation results in a buffer, carry out the subsequent activation function and the remaining quantization operations, and after processing use the complete feature-map data obtained as the input of the next network layer.
2. The method for optimizing the array structure of an RRAM-oriented in-memory computing system according to claim 1, characterized in that the quantization formula in step one is:
Q = round(R / S) + Z
where R is the original data value, Q is the quantized data value, S is the scale factor, which represents the proportional relation between the original data and the quantized data, and Z is the zero offset, i.e. the quantized integer to which the value 0 of the original data is mapped;
with the quantization precision set to n bits, part of the test-set data is randomly extracted, the quantization scale factor S and the zero offset Z of each network layer are calculated with the following formulas, and substituting them into the quantization formula yields the quantized data values;
the calculation formula of the scaling factor S is as follows:
S = (R_max − R_min) / (Q_max − Q_min)
the zero offset Z calculation formula is:
Z = Q_max − round(R_max / S)
where R_max and R_min, Q_max and Q_min are the maximum and minimum values of the original data and of the quantized data respectively; Q_max and Q_min are determined by the quantization precision n, while R_max and R_min are determined from the randomly drawn sample data.
CN202310186971.9A 2023-03-02 2023-03-02 Array structure optimization method for an RRAM-oriented in-memory computing system Active CN115879530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310186971.9A CN115879530B (en) 2023-03-02 2023-03-02 Array structure optimization method for an RRAM-oriented in-memory computing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310186971.9A CN115879530B (en) 2023-03-02 2023-03-02 Array structure optimization method for an RRAM-oriented in-memory computing system

Publications (2)

Publication Number Publication Date
CN115879530A (en) 2023-03-31
CN115879530B (en) 2023-05-05

Family

ID=85761720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310186971.9A Active CN115879530B (en) 2023-03-02 2023-03-02 Array structure optimization method for an RRAM-oriented in-memory computing system

Country Status (1)

Country Link
CN (1) CN115879530B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561050A * 2023-04-07 2023-08-08 清华大学 Fine granularity mapping method and device for RRAM (resistive random access memory) integrated chip

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146070B (en) * 2017-06-16 2021-10-22 华为技术有限公司 Peripheral circuit and system for supporting neural network training based on RRAM
US11450385B2 (en) * 2018-09-20 2022-09-20 University Of Utah Research Foundation Digital RRAM-based convolutional block
CN109472353B (en) * 2018-11-22 2020-11-03 浪潮集团有限公司 Convolutional neural network quantization circuit and method
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110378468B (en) * 2019-07-08 2020-11-20 浙江大学 Neural network accelerator based on structured pruning and low bit quantization
CN110569962B (en) * 2019-08-08 2022-02-15 华中科技大学 Convolution calculation accelerator based on 1T1R memory array and operation method thereof
CN110427171B (en) * 2019-08-09 2022-10-18 复旦大学 In-memory computing device and method for expandable fixed-point matrix multiply-add operation
CN110647983B (en) * 2019-09-30 2023-03-24 南京大学 Self-supervision learning acceleration system and method based on storage and calculation integrated device array
CN110852429B (en) * 2019-10-28 2022-02-18 华中科技大学 1T 1R-based convolutional neural network circuit and operation method thereof
KR20210076691A (en) * 2019-12-16 2021-06-24 삼성전자주식회사 Method and apparatus for verifying the learning of neural network between frameworks
KR20210085461A (en) * 2019-12-30 2021-07-08 삼성전자주식회사 Processing apparatus and method for processing floating point operation thereof
CN111242289B (en) * 2020-01-19 2023-04-07 清华大学 Convolutional neural network acceleration system and method with expandable scale
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN111738427B (en) * 2020-08-14 2020-12-29 电子科技大学 Operation circuit of neural network
CN112183739B (en) * 2020-11-02 2022-10-04 中国科学技术大学 Hardware architecture of memristor-based low-power-consumption pulse convolution neural network
CN112633477A (en) * 2020-12-28 2021-04-09 电子科技大学 Quantitative neural network acceleration method based on field programmable array
CN113033794B (en) * 2021-03-29 2023-02-28 重庆大学 Light weight neural network hardware accelerator based on deep separable convolution
CN113762491B (en) * 2021-08-10 2023-06-30 南京工业大学 Convolutional neural network accelerator based on FPGA
CN113705803A (en) * 2021-08-31 2021-11-26 南京大学 Image hardware identification system based on convolutional neural network and deployment method

Also Published As

Publication number Publication date
CN115879530A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
Wang et al. Low power convolutional neural networks on a chip
CN110647983B (en) Self-supervision learning acceleration system and method based on storage and calculation integrated device array
CN107169563B (en) Processing system and method applied to two-value weight convolutional network
CN108665063B (en) Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN107229967A (en) A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
WO2021088248A1 (en) Memristor-based neural network parallel acceleration method, processor and device
CN115879530B (en) Array structure optimization method for an RRAM-oriented in-memory computing system
CN113222133B (en) FPGA-based compressed LSTM accelerator and acceleration method
CN110569962B (en) Convolution calculation accelerator based on 1T1R memory array and operation method thereof
CN112636745B (en) Logic unit, adder and multiplier
CN115423081A (en) Neural network accelerator based on CNN _ LSTM algorithm of FPGA
CN115390789A (en) Magnetic tunnel junction calculation unit-based analog domain full-precision memory calculation circuit and method
CN111048135A (en) CNN processing device based on memristor memory calculation and working method thereof
CN111931925A (en) FPGA-based binary neural network acceleration system
US20220269483A1 (en) Compute in memory accumulator
WO2022062391A1 (en) System and method for accelerating rnn network, and storage medium
CN113378115B (en) Near-memory sparse vector multiplier based on magnetic random access memory
TWI771014B (en) Memory circuit and operating method thereof
CN114758699A (en) Data processing method, system, device and medium
CN113988279A (en) Output current reading method and system of storage array supporting negative value excitation
Chen et al. An efficient ReRAM-based inference accelerator for convolutional neural networks via activation reuse
CN109416757B (en) Method, apparatus and computer-readable storage medium for processing numerical data
CN217933180U (en) Memory computing circuit
Chang et al. HDSuper: Algorithm-Hardware Co-design for Light-weight High-quality Super-Resolution Accelerator
CN115858999B (en) Combined optimization problem processing circuit based on improved simulated annealing algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant