CN115879530A - Method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system - Google Patents

Method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system

Info

Publication number
CN115879530A
Authority
CN
China
Prior art keywords
data
rram
array
quantization
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310186971.9A
Other languages
Chinese (zh)
Other versions
CN115879530B (en)
Inventor
王浩
郑精
吕琳
汪汉斌
万厚钊
马国坤
袁晓旭
高浩浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Hubei Jiangcheng Laboratory
Original Assignee
Hubei University
Hubei Jiangcheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University, Hubei Jiangcheng Laboratory filed Critical Hubei University
Priority to CN202310186971.9A priority Critical patent/CN115879530B/en
Publication of CN115879530A publication Critical patent/CN115879530A/en
Application granted granted Critical
Publication of CN115879530B publication Critical patent/CN115879530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system. The method mainly uses the corresponding formulas of a post-training quantization algorithm to optimize the array structure of an RRAM-based in-memory computing system, so that the array area and the system power consumption are reduced while calculation accuracy and precision are maintained. Beneficial effects of the invention: the method is applicable to multiple neural networks such as the multilayer perceptron and the convolutional neural network; by halving the 1T1R array scale under the same computing conditions, it effectively reduces the system area, lowers the system energy consumption and improves the computing efficiency, and, given that the RRAM device fabrication process is not yet mature, it is better suited to commercial deployment. In the invention, when the number of convolution kernels in a CNN convolutional layer increases, the array scale is half that of the conventional technique, while the adders and multipliers added for the extra terms X_Z·W_Q and X_Q·W_Z remain unchanged in number, so the overall performance advantage of the system is very significant.

Description

Method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system
Technical Field
The invention relates to the technical field of in-memory computing, and in particular to a method for optimizing the array structure of an RRAM in-memory computing system.
Background
With the rapid development of science and technology, the era of "big data" has arrived: all kinds of information data keep growing explosively, placing higher demands on storage and computing technology. Traditional computers adopt the von Neumann architecture, in which the memory and the processor are independent of each other. Data are exchanged between them frequently, but the data exchange link is narrow and its power consumption is high, so a "memory wall" forms between computation and storage and severely limits the performance of advanced processors. It is therefore important to develop new storage-and-computing systems. The in-memory computing architecture proposed in recent years merges storage and computation, which effectively reduces the load of data transfer, lowers the energy consumption of data computation and improves the efficiency of information processing.
In-memory computing techniques generally use the physical electrical characteristics of non-volatile memory to perform computation directly inside the memory while still providing non-volatile storage. This avoids the high-frequency data exchange between storage and computation, thereby breaking the memory-wall limitation and greatly improving data processing capability. Among non-volatile memories, the resistive random access memory (RRAM) has attracted attention for its simple structure, fast read/write, low power consumption and good CMOS process compatibility. An RRAM cell switches its resistance state under a specific voltage excitation signal; exploiting this electrical characteristic, high and low voltage levels represent the digital values 1 and 0, the high and low resistance states of the device represent 0 and 1 respectively, and, using Ohm's law, the current flowing through the device is collected and quantized to obtain the result of a digital multiplication. When RRAM devices are extended into a crossbar array structure, multiply-accumulate (MAC) operations and hence matrix operations can be performed. The matrix storage-and-computing capability of the RRAM array matches the compute-intensive requirements of neural networks very well, so it has broad application prospects in the field of neural network accelerators.
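As a minimal illustrative sketch (not taken from the patent; all values are hypothetical), the multiply-accumulate principle described above amounts to a dot product of input voltages and cell conductances:

```python
import numpy as np

# Illustrative only: a crossbar column computes a MAC result because each cell
# contributes I = G * V (Ohm's law) and the column current is their sum.
voltages = np.array([0.2, 0.0, 0.2, 0.2])           # encoded input pulses (V)
conductances = np.array([1e-5, 5e-5, 1e-5, 5e-5])   # programmed RRAM states (S)

column_current = np.sum(conductances * voltages)    # analog accumulation (A)
print(column_current)  # an ADC quantizes this current into the digital MAC result
```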
However, in current RRAM-based in-memory computing neural network accelerators, a 1T1R array can only store unsigned numbers when the weight matrix is mapped, so two device structures are usually combined to represent signed numbers, i.e. the positive and negative weights of the neural network. Three common approaches are used:
1. In the 2T2R structure array shown in FIG. 1, two 1T1R cells form a positive-negative pair that stores a signed number, thereby representing positive and negative weight values; positive and negative voltage pulse signals of equal magnitude are applied to the pair, and the accumulated current collected at the end of the row is quantized to obtain the final multiply-accumulate result;
2. In the 1T1R positive-and-negative-row structure array shown in FIG. 2, all weights of the weight matrix are mapped onto two 1T1R conductance rows, one row holding the positive weights with a positive pulse input and the other holding the negative weights with an equal negative pulse input; after the encoded pulses are applied to the bit lines, the accumulated output currents of the two rows are collected and subtracted to obtain the calculation result;
3. In the 1T1R positive-and-negative dual-array structure shown in FIG. 3, two 1T1R arrays are built to store the positive and negative weights respectively, equal voltage signals are input to both, and the two results are finally subtracted to obtain the final calculation result.
Because the fabrication process of RRAM devices is not yet mature, building large-scale arrays still faces many challenges; meanwhile, to suppress crosstalk, in-memory computing commonly adopts the three array structures above, but representing signed numbers then requires twice the 1T1R resources, which greatly increases the area and energy consumption of the in-memory computing system.
Disclosure of Invention
The present invention is directed to a method for optimizing the array structure of an RRAM in-memory computing system, so as to solve the problems raised in the background art.
In order to achieve the above object, the present invention provides the following technical solution: a method for optimizing the array structure of an RRAM in-memory computing system, comprising the following steps:
Step one: post-training quantization is performed on the image data and the neural network weight data. Using the quantization formula, the image quantization data X_Q, the image zero offset X_Z, the weight quantization data W_Q and the weight zero offset W_Z are obtained by calculation; X_Z and W_Z are fixed after quantization, and the value X_Z·W_Z is calculated in software.
Step two: the signed-number calculation formula of the forward propagation of the neural network is
Y = X * W = (X_Q - X_Z) * (W_Q - W_Z)
which is expanded to obtain Y = X_Q·W_Q - X_Q·W_Z - X_Z·W_Q + X_Z·W_Z. The positive-integer terms X_Q·W_Z, X_Z·W_Q and X_Q·W_Q are calculated with adders, a multiplier and the RRAM array circuit respectively, and finally the X_Z·W_Z calculated in step one is substituted into the expansion to obtain the Y value. With this split calculation method, the RRAM array stores and computes only positive integers, which solves the excessive consumption of RRAM device resources caused by directly computing signed numbers.
Step three: the Y value calculation results are stored in a buffer, subsequent operations such as the activation function and quantization are performed, and when processing is finished the complete feature map data are obtained and used as the input of the next network layer.
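As a minimal illustrative sketch (not part of the claimed circuit; all names and values are hypothetical, and scalar per-tensor zero offsets are assumed), the split computation of steps one and two can be checked in Python for a single fully connected layer:

```python
import numpy as np

def split_forward(x_q, x_z, w_q, w_z):
    """Compute Y = (X_Q - X_Z) @ (W_Q - W_Z) from four positive-integer terms."""
    xq_wq = x_q @ w_q                      # step two: done in the 1T1R array
    xq_wz = w_z * x_q.sum()                # step two: adders plus one multiplier
    xz_wq = x_z * w_q.sum(axis=0)          # step two: computed once after weight loading
    xz_wz = x_z * w_z * w_q.shape[0]       # step one: known constant from software
    return xq_wq - xq_wz - xz_wq + xz_wz   # step three: combined, then activation/requantization

x_q = np.array([12, 0, 255, 7])                         # quantized inputs X_Q
w_q = np.array([[3, 200], [15, 1], [255, 0], [9, 40]])  # quantized weights W_Q
x_z, w_z = 5, 17                                        # fixed zero offsets X_Z, W_Z

y = split_forward(x_q, x_z, w_q, w_z)
assert np.array_equal(y, (x_q - x_z) @ (w_q - w_z))     # identical to the signed product
```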
As a preferred technical solution of the present invention, in the first step, a quantization formula is:
Q = round(R / S) + Z
wherein R is the original data value, Q is the quantized data value, S is the scale factor (scale), representing the proportional relation between the original data and the quantized data, and Z is the zero offset, representing the integer to which 0 in the original data corresponds after quantization;
the quantization precision is set to n bits, part of the test set data is randomly extracted, the quantization scale factor S and the zero offset Z of each network layer are calculated according to the following formulas, and they are substituted into the above formula to obtain the quantized data values;
the formula for calculating the scaling factor S is as follows:
S = (R_max - R_min) / (Q_max - Q_min)
the zero offset Z is calculated as:
Z = Q_max - round(R_max / S)
wherein R_max and R_min, Q_max and Q_min are the maximum and minimum values of the original data values and the quantized data values respectively; Q_max and Q_min are determined by the quantization precision n, and R_max and R_min are determined by the randomly drawn sample data.
As a preferred technical solution of the invention, in step two the RRAM uses devices with only two stable states (high and low resistance), each forming a 1T1R structure with an NMOS transistor, from which a 1T1R crossbar array is built; n 1T1R structures in a row form one n-bit weight value, and the corresponding shift weighting is applied when the output current is collected to obtain the calculation result.
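A minimal Python sketch of this bit-slicing and shift weighting is given below (illustrative only; the values and function names are hypothetical, and n = 8 unsigned bits per weight is assumed): each weight is split into eight binary cells, every bit plane accumulates its own partial sum, and the partial sums are recombined with powers of two.

```python
import numpy as np

N_BITS = 8

def to_bit_slices(w_q):
    """Split each unsigned 8-bit weight into 8 binary 1T1R cells (MSB first)."""
    return np.array([(w_q >> b) & 1 for b in range(N_BITS - 1, -1, -1)])

def shift_weighted_sum(per_bit_results):
    """Recombine the per-bit accumulations: result = sum_b 2^b * partial_b."""
    weights = 2 ** np.arange(N_BITS - 1, -1, -1)
    return np.sum(weights * per_bit_results)

w_q = np.array([200, 37, 5])        # hypothetical quantized weights in one column
x_q = np.array([3, 1, 2])           # hypothetical quantized inputs on the rows

slices = to_bit_slices(w_q)         # shape (8, 3): one binary cell per weight bit
per_bit = slices @ x_q              # each bit plane accumulates its own current
assert shift_weighted_sum(per_bit) == x_q @ w_q   # shift weighting recovers the MAC
```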
As a preferred embodiment of the present invention, the weight data W_Q are written into the 1T1R array row by row under the control of the peripheral digital logic circuit; the quantization parameter X_Z is then converted into a read voltage signal and input to the 1T1R array, and the output currents are collected, quantized and shift-weighted to obtain the calculation result X_Z·W_Q.
As a preferred technical solution of the invention, the digital control circuit performs the scheduling: the corresponding picture data are taken from the buffer in turn, summed and then multiplied by the corresponding W_Z to obtain X_Q·W_Z; meanwhile, the picture data are converted into voltage signals and input at the row heads of the 1T1R array, the output currents are read at the column ends, and X_Q·W_Q is obtained after quantization and shift weighting.
Compared with the prior art, the invention has the following beneficial effects: under the same computing conditions, the invention greatly reduces the number of RRAM devices required, and, given that the RRAM device fabrication process is not yet mature, is better suited to commercial deployment; with the split calculation method, the RRAM array stores and computes only positive integers, which solves the excessive consumption of RRAM device resources caused by directly computing signed numbers; by halving the 1T1R array scale, the system area is effectively reduced, the system energy consumption is lowered, and the computing efficiency is improved; the invention is applicable to multiple neural networks such as the multilayer perceptron and the convolutional neural network, and since the local receptive field of a CNN convolutional layer is small, few adders are needed to compute the X_Q·W_Z term, so the advantage of this technique is more pronounced in CNNs; in the invention, when the number of convolution kernels in a CNN convolutional layer increases, the array scale is half that of the conventional technique while the adders and multipliers added for the extra terms X_Z·W_Q and X_Q·W_Z remain unchanged in number; when the CNN is large, the impact of the introduced computing units is limited and the overall performance advantage of the system is very significant.
Drawings
FIG. 1 is a diagram of a 2T2R structure representing positive and negative weights in the background art;
FIG. 2 is a diagram showing the structure of positive and negative rows of 1T1R in the background art to represent positive and negative weights;
FIG. 3 is a diagram showing the positive and negative weights of a 1T1R positive and negative double array structure in the background art;
FIG. 4 shows the optimized 1T1R single array of the present invention representing the quantized weights;
FIG. 5 is a diagram of a computational architecture optimized using PTQ equations in accordance with the present invention;
FIG. 6 is a diagram of a single-layer network accelerator architecture based on a 1T1R array according to the present invention;
Detailed Description of the Preferred Embodiments
Example 1
As shown in FIG. 4 to FIG. 6, the invention discloses a method for optimizing the array structure of an RRAM-based in-memory computing system. It mainly uses the corresponding formulas of a post-training quantization algorithm to optimize the array structure of the RRAM-based in-memory computing system, reducing the array area and the system power consumption while ensuring calculation accuracy and precision, and providing a reliable solution for accelerating large-scale neural network computation.
The quantization operation quantizes the 32-bit floating-point numbers in the neural network into 8-bit or other low-bit fixed-point numbers, which greatly reduces the calculation cost and helps deploy the neural network in edge intelligent devices with strict power and latency requirements. The post-training quantization (PTQ) algorithm used by this invention obtains the quantization parameters of the network without retraining it (i.e. without updating the network weights). Taking a convolutional neural network (CNN) as an example, after the network has been trained normally, the data produced by each convolution, pooling and fully connected layer are quantized in the inference stage, effectively saving storage and computation cost.
The quantization formula is:
Q = round(R / S) + Z
wherein R is the original data value, Q is the quantized data value, S is the scale factor (scale), representing the proportional relation between the original data and the quantized data, and Z is the zero offset, representing the integer to which 0 in the original data corresponds after quantization.
The formula for calculating the scaling factor S is as follows:
S = (R_max - R_min) / (Q_max - Q_min)
the zero offset Z is calculated as:
Z = Q_max - round(R_max / S)
wherein Q_max and Q_min are determined by the quantization bit precision, and R_max and R_min are determined by the randomly drawn sample data.
Therefore, all quantization parameters required for post-training quantization can be determined after training and before inference, and during inference the data are quantized simply by substituting the corresponding parameters into the quantization formula.
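For illustration, the parameter calculation and quantization described above can be sketched in Python as follows; the calibration data and function names are hypothetical and simply follow the formulas given earlier:

```python
import numpy as np

def ptq_params(samples, n_bits=8):
    """Scale S and zero offset Z from randomly drawn calibration samples."""
    q_min, q_max = 0, 2 ** n_bits - 1
    r_min, r_max = float(samples.min()), float(samples.max())
    s = (r_max - r_min) / (q_max - q_min)     # S = (R_max - R_min) / (Q_max - Q_min)
    z = int(round(q_max - r_max / s))         # Z = Q_max - round(R_max / S)
    return s, z

def quantize(r, s, z, n_bits=8):
    """Q = round(R/S) + Z, clipped to the unsigned n-bit range."""
    q = np.round(r / s) + z
    return np.clip(q, 0, 2 ** n_bits - 1).astype(np.int64)

calib = np.random.randn(1000).astype(np.float32)   # randomly drawn sample data
s, z = ptq_params(calib)                           # fixed after calibration
x_q = quantize(calib, s, z)                        # X_Q; the zero offset z plays the role of X_Z
```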
The invention mainly uses the quantized calculation formula to optimize the array structure. Taking the convolution calculation of the first CNN convolutional layer as an example, after quantizing the original picture data and the convolution kernel, the actual calculation formula is:
Y = X * W = (X_Q - X_Z) * (W_Q - W_Z)
Expanding it gives: Y = X_Q·W_Q - X_Q·W_Z - X_Z·W_Q + X_Z·W_Z
where X_Z and W_Z are fixed values.
Therefore, during the entire convolution calculation, X_Z·W_Z is a known, fixed value, and X_Z·W_Q only needs to be calculated once, after the 1T1R array weights have been initialized and before the convolution sliding operation begins; at each sliding step, only the two terms X_Q·W_Z and X_Q·W_Q actually need to be calculated. The X_Q·W_Z term is the sum of the picture input data in the current window multiplied by W_Z, and since the convolution kernel is typically 3 × 3 or 5 × 5, only a few adders and multipliers are required. The X_Q·W_Q term is the core of the calculation: the quantized, all-positive convolution kernel weight matrix W_Q is stored in the 1T1R array, and the corresponding feature map data X_Q are input at each convolution sliding step, so that the matrix multiplication is carried out in memory. The structure of this calculation scheme is shown in FIG. 5. In summary, the invention reduces the 1T1R array resources of current common techniques at the cost of a few additional multipliers and adders. In particular, the number of additional adders and multipliers depends only on the size of the convolution kernels, not on their number. When the number of convolution kernels in a CNN is very large, the required array scale grows greatly while the number of additional adders and multipliers stays the same, so the advantage of this technique in array-scale optimization becomes even more pronounced, making it well suited to accelerating large-scale neural networks.
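The exactness of this four-term split for one 3 × 3 convolution window can be checked with the short Python sketch below (illustrative values only; scalar per-tensor zero offsets are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                           # 3 x 3 convolution kernel
x_q = rng.integers(0, 256, size=(K, K))         # quantized window data X_Q
w_q = rng.integers(0, 256, size=(K, K))         # quantized kernel weights W_Q
x_z, w_z = 7, 11                                # fixed zero offsets X_Z, W_Z

direct = np.sum((x_q - x_z) * (w_q - w_z))      # signed-number convolution result

xq_wq = np.sum(x_q * w_q)                       # computed in the 1T1R array
xq_wz = w_z * np.sum(x_q)                       # a few adders and one multiplier
xz_wq = x_z * np.sum(w_q)                       # computed once after weight loading
xz_wz = x_z * w_z * K * K                       # known constant, precomputed

assert direct == xq_wq - xq_wz - xz_wq + xz_wz  # the split computation is exact
```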
Taking the first convolutional layer as an example and assuming 200 convolution kernels of size 3 × 3, the specific implementation steps of the invention are as follows:
1. Design and train the neural network system. Set the quantization bit width (e.g. 8 bits), randomly extract part of the test set data, and calculate the quantization scale factor S and the zero offset Z of each network layer using S = (R_max - R_min) / (Q_max - Q_min) and Z = Q_max - round(R_max / S);
2. According to the formula Q = round(R / S) + Z, quantize the original picture data, the weights of all network layers and the intermediate calculation results of each layer, obtaining the quantized data and parameters X_Q, W_Q, X_Z and W_Z for every layer, and calculate X_Z·W_Z for each layer;
3. The RRAM uses devices with two stable states (high and low resistance) and forms a 1T1R structure with an NMOS transistor. Because each RRAM here can only represent the binary logic values "0" and "1", a row of eight 1T1R structures is needed to form one 8-bit weight value, and the corresponding shift weighting is applied when the output current is collected to obtain the actual calculation result. The convolution kernels are handled as in the conventional technique: each kernel is unrolled row by row into a column vector and stored in a column of 1T1R structures, the output current at the end of the column is collected, and the result of a single convolution operation is obtained after quantization. The corresponding 1T1R array is built according to this logic; for 200 convolution kernels of size 3 × 3 the array size is 9 × 1600 (only 14400 1T1R structures are needed in this technique, whereas the conventional technique needs 28800 because of the negative-weight problem);
4. Write the quantized convolution kernel weight data from step 2 into the array row by row under the control of the peripheral digital logic circuit; then convert the quantization parameter X_Z into a read voltage signal, input it to the 1T1R array, collect the output currents, and quantize and shift-weight them to obtain the calculation result X_Z·W_Q;
5. Store the original picture data in a buffer and control the sliding operation with the digital control circuit. According to the convolution sliding settings, take the 9 picture values of the current sliding window from the buffer in turn, sum them and multiply by the corresponding W_Z to obtain X_Q·W_Z. At the same time, convert the 9 picture values into read voltages, input them at the row heads of the array, read the output currents, and obtain X_Q·W_Q after quantization and shift weighting;
6. A digital logic circuit connected at the output of the array computes the final convolution result:
Y = X_Q·W_Q - X_Q·W_Z - X_Z·W_Q + X_Z·W_Z
and storing the single convolution calculation result into a register, performing subsequent activation function and pooling operation, and obtaining complete characteristic diagram data as the input of the next layer of network after the convolution sliding calculation is finished.
The steps above describe the calculation for the first CNN convolutional layer; the other network layers operate similarly, and the quantized output of each layer serves as the data input of the next layer. As can be seen from FIG. 6, the 1T1R array of the present invention only needs to store the quantized positive weight values, and for the same weights requires only half the number of 1T1R cells of the conventional technique. Although additional multipliers and adders are introduced into the calculation, only a few are needed: in the example above, only 1 multiplier and 3 + 200 + 9 adders are added, while 14400 1T1R cells are saved. Considering that the conventional CMOS process is far more mature than the RRAM process, the technical advantage of the invention is very clear.
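The cell-count bookkeeping of the example above can be reproduced with the following short sketch (an illustration of the figures already stated, not additional data):

```python
n_kernels, kernel_size, n_bits = 200, 3, 8

rows = kernel_size * kernel_size            # 9 word lines (one per kernel weight)
cols = n_kernels * n_bits                   # 1600 bit lines (8 cells per 8-bit weight)

cells_this_method = rows * cols             # 9 * 1600 = 14400 1T1R structures
cells_conventional = 2 * cells_this_method  # 28800 with duplicated signed storage
print(rows, cols, cells_this_method, cells_conventional)
```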
Although the present invention has been described in detail with reference to specific embodiments, it is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist and scope of the present invention.

Claims (5)

1. A method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system, characterized by comprising the following steps:
step one, post-training quantization is performed on the image data and the neural network weight data; using the quantization formula, the image quantization data X_Q, the image zero offset X_Z, the weight quantization data W_Q and the weight zero offset W_Z are obtained by calculation; X_Z and W_Z are fixed after quantization, and the value X_Z·W_Z is calculated in software;
step two, the signed-number calculation formula of the forward propagation of the neural network is:
Y = X * W = (X_Q - X_Z) * (W_Q - W_Z)
which is expanded to obtain Y = X_Q·W_Q - X_Q·W_Z - X_Z·W_Q + X_Z·W_Z; the positive-integer terms X_Q·W_Z, X_Z·W_Q and X_Q·W_Q are calculated with adders, a multiplier and the RRAM array circuit respectively, and finally the X_Z·W_Z calculated in step one is substituted into the expansion to obtain the Y value; with this split calculation method, the RRAM array stores and computes only positive integers, which solves the excessive consumption of RRAM device resources caused by directly computing signed numbers;
and step three, the Y value calculation results are stored in a buffer, subsequent operations such as the activation function and quantization are performed, and after processing is finished the complete feature map data are obtained and used as the input of the next network layer.
2. The method for optimizing the array structure of an RRAM in-memory computing system of claim 1, wherein the quantization formula in step one is:
Q = round(R / S) + Z
wherein R is the original data value, Q is the quantized data value, S is the scale factor (scale), representing the proportional relation between the original data and the quantized data, and Z is the zero offset, representing the integer to which 0 in the original data corresponds after quantization;
the quantization precision is set to n bits, part of the test set data is randomly extracted, the quantization scale factor S and the zero offset Z of each network layer are calculated according to the following formulas, and they are substituted into the above formula to obtain the quantized data values;
the formula for calculating the scaling factor S is as follows:
S = (R_max - R_min) / (Q_max - Q_min)
the zero offset Z is calculated as:
Z = Q_max - round(R_max / S)
wherein R_max and R_min, Q_max and Q_min are the maximum and minimum values of the original data values and the quantized data values respectively; Q_max and Q_min are determined by the quantization precision n, and R_max and R_min are determined by the randomly drawn sample data.
3. The method for optimizing the array structure of an RRAM in-memory computing system of claim 1, wherein in step two the RRAM uses devices with only two stable states (high and low resistance), each forming a 1T1R structure with an NMOS transistor, from which a 1T1R crossbar array is built; n 1T1R structures in a row form one n-bit weight value, and the corresponding shift weighting is applied when the output current is collected to obtain the calculation result.
4. The method for optimizing the array structure of an RRAM in-memory computing system of claim 3, wherein the weight data W_Q are written into the 1T1R array row by row under the control of a peripheral digital logic circuit; the quantization parameter X_Z is then converted into a read voltage signal and input to the 1T1R array, and the output currents are collected, quantized and shift-weighted to obtain the calculation result X_Z·W_Q.
5. The method for optimizing the array structure of an RRAM in-memory computing system of claim 3, wherein scheduling is performed by a digital control circuit: the corresponding picture data are taken from the buffer in turn, summed and multiplied by the corresponding W_Z to obtain X_Q·W_Z; meanwhile, the picture data are converted into voltage signals and input at the row heads of the 1T1R array, the output currents are read at the column ends, and X_Q·W_Q is obtained after quantization and shift weighting.
CN202310186971.9A 2023-03-02 2023-03-02 RRAM (resistive random access memory) in-memory computing system array structure optimization method Active CN115879530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310186971.9A CN115879530B (en) 2023-03-02 2023-03-02 RRAM (resistive random access memory) in-memory computing system array structure optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310186971.9A CN115879530B (en) 2023-03-02 2023-03-02 RRAM (resistive random access memory) in-memory computing system array structure optimization method

Publications (2)

Publication Number Publication Date
CN115879530A true CN115879530A (en) 2023-03-31
CN115879530B CN115879530B (en) 2023-05-05

Family

ID=85761720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310186971.9A Active CN115879530B (en) 2023-03-02 2023-03-02 RRAM (resistive random access memory) in-memory computing system array structure optimization method

Country Status (1)

Country Link
CN (1) CN115879530B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561050A (en) * 2023-04-07 2023-08-08 清华大学 Fine granularity mapping method and device for RRAM (resistive random access memory) integrated chip

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146070A (en) * 2017-06-16 2019-01-04 华为技术有限公司 A kind of peripheral circuit and system of neural network training of the support based on RRAM
CN109472353A (en) * 2018-11-22 2019-03-15 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks sample circuit and quantization method
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110378468A (en) * 2019-07-08 2019-10-25 浙江大学 A kind of neural network accelerator quantified based on structuring beta pruning and low bit
CN110427171A (en) * 2019-08-09 2019-11-08 复旦大学 Expansible fixed-point number matrix multiply-add operation deposits interior calculating structures and methods
CN110569962A (en) * 2019-08-08 2019-12-13 华中科技大学 Convolution calculation accelerator based on 1T1R memory array and operation method thereof
CN110647983A (en) * 2019-09-30 2020-01-03 南京大学 Self-supervision learning acceleration system and method based on storage and calculation integrated device array
CN110852429A (en) * 2019-10-28 2020-02-28 华中科技大学 Convolutional neural network based on 1T1R and operation method thereof
US20200098428A1 (en) * 2018-09-20 2020-03-26 University Of Utah Research Foundation Digital rram-based convolutional block
CN111242289A (en) * 2020-01-19 2020-06-05 清华大学 Convolutional neural network acceleration system and method with expandable scale
CN111738427A (en) * 2020-08-14 2020-10-02 电子科技大学 Operation circuit of neural network
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN112183739A (en) * 2020-11-02 2021-01-05 中国科学技术大学 Hardware architecture of memristor-based low-power-consumption pulse convolution neural network
CN112633477A (en) * 2020-12-28 2021-04-09 电子科技大学 Quantitative neural network acceleration method based on field programmable array
CN113033794A (en) * 2021-03-29 2021-06-25 重庆大学 Lightweight neural network hardware accelerator based on deep separable convolution
US20210200513A1 (en) * 2019-12-30 2021-07-01 Samsung Electronics Co., Ltd. Method and apparatus with floating point processing
CN113065632A (en) * 2019-12-16 2021-07-02 三星电子株式会社 Method and apparatus for validating training of neural networks for image recognition
CN113705803A (en) * 2021-08-31 2021-11-26 南京大学 Image hardware identification system based on convolutional neural network and deployment method
CN113762491A (en) * 2021-08-10 2021-12-07 南京工业大学 Convolutional neural network accelerator based on FPGA

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146070A (en) * 2017-06-16 2019-01-04 华为技术有限公司 A kind of peripheral circuit and system of neural network training of the support based on RRAM
US20200098428A1 (en) * 2018-09-20 2020-03-26 University Of Utah Research Foundation Digital rram-based convolutional block
CN109472353A (en) * 2018-11-22 2019-03-15 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks sample circuit and quantization method
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110378468A (en) * 2019-07-08 2019-10-25 浙江大学 A kind of neural network accelerator quantified based on structuring beta pruning and low bit
CN110569962A (en) * 2019-08-08 2019-12-13 华中科技大学 Convolution calculation accelerator based on 1T1R memory array and operation method thereof
CN110427171A (en) * 2019-08-09 2019-11-08 复旦大学 Expansible fixed-point number matrix multiply-add operation deposits interior calculating structures and methods
CN110647983A (en) * 2019-09-30 2020-01-03 南京大学 Self-supervision learning acceleration system and method based on storage and calculation integrated device array
CN110852429A (en) * 2019-10-28 2020-02-28 华中科技大学 Convolutional neural network based on 1T1R and operation method thereof
CN113065632A (en) * 2019-12-16 2021-07-02 三星电子株式会社 Method and apparatus for validating training of neural networks for image recognition
US20210200513A1 (en) * 2019-12-30 2021-07-01 Samsung Electronics Co., Ltd. Method and apparatus with floating point processing
CN113126953A (en) * 2019-12-30 2021-07-16 三星电子株式会社 Method and apparatus for floating point processing
CN111242289A (en) * 2020-01-19 2020-06-05 清华大学 Convolutional neural network acceleration system and method with expandable scale
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN111738427A (en) * 2020-08-14 2020-10-02 电子科技大学 Operation circuit of neural network
CN112183739A (en) * 2020-11-02 2021-01-05 中国科学技术大学 Hardware architecture of memristor-based low-power-consumption pulse convolution neural network
CN112633477A (en) * 2020-12-28 2021-04-09 电子科技大学 Quantitative neural network acceleration method based on field programmable array
CN113033794A (en) * 2021-03-29 2021-06-25 重庆大学 Lightweight neural network hardware accelerator based on deep separable convolution
CN113762491A (en) * 2021-08-10 2021-12-07 南京工业大学 Convolutional neural network accelerator based on FPGA
CN113705803A (en) * 2021-08-31 2021-11-26 南京大学 Image hardware identification system based on convolutional neural network and deployment method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PENG YAO et al.: "Fully hardware-implemented memristor convolutional neural network", Nature *
季渊 et al.: "Stochastic logic with a two-dimensional state transition structure and its application in neural networks", Journal of Electronics & Information Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561050A (en) * 2023-04-07 2023-08-08 清华大学 Fine granularity mapping method and device for RRAM (resistive random access memory) integrated chip

Also Published As

Publication number Publication date
CN115879530B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN109063825B (en) Convolutional neural network accelerator
CN107169563B (en) Processing system and method applied to two-value weight convolutional network
Wang et al. Low power convolutional neural networks on a chip
CN108665063B (en) Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN107256424B (en) Three-value weight convolution network processing system and method
WO2022037257A1 (en) Convolution calculation engine, artificial intelligence chip, and data processing method
CN113222133B (en) FPGA-based compressed LSTM accelerator and acceleration method
CN111652360B (en) Convolution operation device based on pulsation array
CN115879530B (en) RRAM (resistive random access memory) in-memory computing system array structure optimization method
CN112636745B (en) Logic unit, adder and multiplier
CN115423081A (en) Neural network accelerator based on CNN _ LSTM algorithm of FPGA
KR20220114519A (en) Quantum error correction decoding system and method, fault-tolerant quantum error correction system and chip
WO2023116923A1 (en) Storage and calculation integrated device and calculation method
CN111931925A (en) FPGA-based binary neural network acceleration system
CN113762493A (en) Neural network model compression method and device, acceleration unit and computing system
Shahshahani et al. Memory optimization techniques for fpga based cnn implementations
TWI737228B (en) Quantization method based on hardware of in-memory computing and system thereof
KR20230084449A (en) Neural processing unit
Guan et al. Recursive binary neural network training model for efficient usage of on-chip memory
WO2022062391A1 (en) System and method for accelerating rnn network, and storage medium
CN111627479B (en) Coding type flash memory device, system and coding method
CN113378115B (en) Near-memory sparse vector multiplier based on magnetic random access memory
CN115495152A (en) Memory computing circuit with variable length input
US20230047364A1 (en) Partial sum management and reconfigurable systolic flow architectures for in-memory computation
CN115222028A (en) One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant