CN115879530A - Method for optimizing array structure of RRAM (resistive random access memory) memory computing system - Google Patents
- Publication number
- CN115879530A (application CN202310186971.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- rram
- array
- quantization
- calculation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
- Y02 — Technologies or applications for mitigation or adaptation against climate change
- Y02D — Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system. Corresponding formulas of a post-training quantization algorithm are used to optimize the array structure of the RRAM-based in-memory computing system, reducing the array area and the system power consumption while maintaining computing accuracy and precision. The beneficial effects of the invention are: the method applies to multiple neural networks such as the multilayer perceptron and the convolutional neural network; by halving the 1T1R array scale under identical computing conditions, it effectively reduces system area, lowers system energy consumption and improves computing efficiency, and given that the RRAM device fabrication process is not yet mature, it is better suited to commercial deployment. When the number of convolution kernels in a CNN convolutional layer grows, the array scale is half that of the conventional technique while the extra X_Z·W_Q and X_Q·W_Z terms require an unchanged number of multipliers, so the overall performance advantage of the system is very significant.
Description
Technical Field
The invention relates to the technical field of in-memory computing, and in particular to a method for optimizing the array structure of an RRAM in-memory computing system.
Background
With the rapid development of science and technology, the era of big data has arrived; all kinds of information data grow explosively, placing higher demands on storage and computing technology. Traditional computers adopt the von Neumann architecture, in which memory and processor are independent of each other. Data exchange between the two is frequent, but the exchange channel is narrow and power-hungry, forming a "memory wall" between computation and storage that severely limits the performance of advanced processors. It is therefore important to develop new storage-and-computing systems: the in-memory computing architecture proposed in recent years merges storage and computation, effectively reducing the load of data transfer, lowering the energy cost of data computation and improving the efficiency of information processing.
In-memory computing techniques generally exploit the physical and electrical characteristics of non-volatile memory to perform computation directly inside the memory while preserving non-volatile storage. This avoids the high-frequency data exchange between storage and computation, breaking through the memory-wall limitation and greatly improving data-processing capability. Among non-volatile memories, resistive random access memory (RRAM) has attracted attention for its simple structure, fast read/write, low power consumption and good CMOS process compatibility. An RRAM cell switches resistance state under a specific voltage stimulus; exploiting this, high and low voltage levels represent digital 1 and 0, the device's high and low resistance states represent 0 and 1, and by Ohm's law the current through the device can be collected and quantized to yield a digital multiplication result. Extended to a crossbar array structure, RRAM can perform multiply-accumulate (MAC) operations and hence matrix operations. The matrix storage-and-computing capability of RRAM arrays fits the compute-intensive requirements of neural networks very well, giving RRAM broad application prospects in the field of neural-network accelerators.
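The crossbar MAC principle described above can be sketched in a few lines of idealized Python (an illustration, not part of the patent: perfect devices, binary conductances, no wire resistance or read noise):

```python
# Illustrative sketch: an ideal crossbar computes a matrix-vector product
# because each column current is the sum of V_i * G_ij
# (Ohm's law for each cell, Kirchhoff's current law along the column).
import numpy as np

def crossbar_mac(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    return voltages @ conductances

# Binary example: low/high resistance states encode weight bits 1/0.
G = np.array([[1, 0],
              [0, 1],
              [1, 1]], dtype=float)   # 3 rows (inputs) x 2 columns (outputs)
V = np.array([1.0, 0.0, 1.0])         # read voltages encode input bits
print(crossbar_mac(V, G))             # column currents = dot products
```

In hardware the column currents would then be digitized by ADCs; here the floating-point dot product stands in for that step.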
However, in current RRAM-based in-memory-computing neural-network accelerators, a 1T1R array can store only unsigned numbers when holding the weight matrix, so two device structures are usually combined to represent signed numbers, i.e. the positive and negative weights of a neural network. Three treatments are common:
1. In the 2T2R structure array shown in FIG. 1, two 1T1R cells form a positive/negative pair that together stores a signed number, thus representing positive and negative weight values; equal-magnitude positive and negative voltage pulses are applied to the pair, and the accumulated current collected at the end of the column is quantized to obtain the final multiply-accumulate result;
2. In the 1T1R positive/negative-row structure array shown in FIG. 2, all weights of the weight matrix are mapped to two 1T1R conductance rows: one row holds the positive weights and receives a positive pulse input, the other holds the negative weights and receives an equal-magnitude negative pulse input; after the encoded pulses are applied to the bit lines, the two rows' accumulated output currents are collected and subtracted to obtain the computation result;
3. In the 1T1R positive/negative dual-array structure shown in FIG. 3, two 1T1R arrays are built to store the positive and negative weights separately; equal-magnitude voltage signals are input, and the two partial results are finally subtracted to obtain the final computation result.
Because RRAM device fabrication technology is not yet mature, building large-scale arrays still faces many challenges, and the three array structures above are also adopted to suppress crosstalk; however, representing signed numbers this way requires double the 1T1R resources, greatly increasing the area and energy consumption of the in-memory computing system.
Disclosure of Invention
The present invention aims to provide a method for optimizing the array structure of an RRAM in-memory computing system, so as to solve the problems raised in the background art.

To achieve the above object, the present invention provides the following technical solution: a method for optimizing the array structure of an RRAM in-memory computing system, comprising the following steps:
Step one: perform post-training quantization on the image data and the neural-network weight data; through the quantization formula, compute the image quantized data X_Q, the image zero offset X_Z, the weight quantized data W_Q and the weight zero offset W_Z. X_Z and W_Z are fixed once quantization is complete, and the value X_Z·W_Z is computed in software;
Step two: the signed-number calculation formula of neural-network forward propagation is

Y = X·W = (X_Q − X_Z)·(W_Q − W_Z)
Expanding it gives Y = X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z. The positive-integer terms X_Q·W_Z, X_Z·W_Q and X_Q·W_Q are computed with adders, multipliers and the RRAM array circuit respectively, and finally the value X_Z·W_Z computed in step one is substituted into the expansion to obtain Y. Through this split computation the RRAM array stores and computes only positive integers, avoiding the excessive RRAM device consumption that direct computation of signed numbers would cause;
Step three: store the Y-value results in a buffer and perform the subsequent activation-function, quantization and other operations; when processing is finished, the complete feature-map data are obtained and serve as the input of the next network layer.
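The split computation of steps one and two can be checked numerically. The sketch below uses hypothetical uint8 data (the names X_Q, W_Q, X_Z, W_Z follow the patent's formula; the values are made up) to verify that the four-term expansion equals the direct signed product:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical quantized data and per-tensor zero offsets.
XQ = rng.integers(0, 256, size=(4, 9)).astype(np.int64)
WQ = rng.integers(0, 256, size=(9, 3)).astype(np.int64)
XZ, WZ = 128, 120                     # fixed after quantization

# Direct signed computation: Y = (X_Q - X_Z)(W_Q - W_Z).
Y_direct = (XQ - XZ) @ (WQ - WZ)

# Split computation: only the positive-integer term X_Q W_Q would go to
# the RRAM array; the other terms need only adders and multipliers.
n = XQ.shape[1]                       # inner dimension of the matmul
Y_split = (XQ @ WQ                                    # array term X_Q W_Q
           - WZ * XQ.sum(axis=1, keepdims=True)       # X_Q W_Z: row sums * W_Z
           - XZ * WQ.sum(axis=0, keepdims=True)       # X_Z W_Q: column sums * X_Z
           + n * XZ * WZ)                             # constant X_Z W_Z term
assert np.array_equal(Y_direct, Y_split)
```

Note how X_Q·W_Z reduces to a row sum times a scalar and X_Z·W_Q to a column sum times a scalar, which is why cheap adders and a multiplier suffice for them.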
As a preferred technical solution of the present invention, the quantization formula in step one is: Q = round(R/S) + Z,
where R is the original data value, Q is the quantized data value, S is the scale factor, representing the proportional relation between the original and quantized data, and Z is the zero offset, the integer to which 0 in the original data maps after quantization;
setting the quantization precision as n bits, randomly extracting part of test set data, calculating a quantization scale factor S and a zero offset Z in each layer of network according to the following formula, and substituting the quantization scale factor S and the zero offset Z into the formula to obtain a quantized data value;
where R_max and R_min, Q_max and Q_min are the maximum and minimum of the original data values and of the quantized data values respectively; Q_max and Q_min are determined by the quantization precision n, while R_max and R_min are determined by the randomly drawn sample data.
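A minimal calibration sketch, assuming the common asymmetric-quantization convention Q = round(R/S) + Z (the patent's own formula images are not reproduced in this text, so the exact convention is an assumption):

```python
import numpy as np

def calibrate(samples, n_bits=8):
    """Compute scale S and zero offset Z from sample data
    (common asymmetric uint quantization: Q in [0, 2^n - 1])."""
    q_min, q_max = 0, 2**n_bits - 1
    r_min, r_max = float(samples.min()), float(samples.max())
    S = (r_max - r_min) / (q_max - q_min)
    Z = int(round(q_min - r_min / S))
    return S, Z

def quantize(r, S, Z, n_bits=8):
    q = np.round(r / S) + Z
    return np.clip(q, 0, 2**n_bits - 1).astype(np.int64)

x = np.linspace(-1.0, 3.0, 11)        # made-up calibration samples
S, Z = calibrate(x)
# 0 in the original data maps to the zero offset Z, as the text defines.
assert quantize(np.array([0.0]), S, Z)[0] == Z
```

With these samples, S = 4/255 and Z = 64; any real value is then stored as an unsigned integer, which is what lets the 1T1R array hold only non-negative weights.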
As a preferred technical solution of the present invention, in step two the RRAM uses devices with only two stable states, high resistance and low resistance, each forming a 1T1R structure with an NMOS transistor, from which a 1T1R crossbar array is built; n 1T1R structures in a row form one n-bit weight value, and the corresponding shift weighting is applied when the output current is collected to obtain the computation result.
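The row-of-n-cells encoding can be illustrated as follows (a sketch under ideal-device assumptions, not the patent's circuit): each 8-bit weight is split into binary bit planes, one 1T1R cell per bit, and the per-bit dot products are recombined by shift weighting:

```python
import numpy as np

def bit_slice(w, n_bits=8):
    """Split unsigned weights into binary planes (one 1T1R cell per bit)."""
    return [(w >> b) & 1 for b in range(n_bits)]   # LSB-first bit planes

def shift_weighted_sum(x, planes):
    """Collect the per-bit column results, then shift-weight:
    sum_b 2^b * (x . w_b), which reconstructs x . w exactly."""
    return sum(int(x @ plane) << b for b, plane in enumerate(planes))

rng = np.random.default_rng(2)
w = rng.integers(0, 256, size=9)      # one quantized 3x3 kernel column
x = rng.integers(0, 256, size=9)      # one input window
assert shift_weighted_sum(x, bit_slice(w)) == int(x @ w)
```

Because w = Σ_b 2^b·w_b, the shift-weighted recombination is exact; in hardware each plane's dot product is a quantized column current.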
As a preferred embodiment of the present invention, the weight data W_Q are written row by row into the 1T1R array for storage under the control of a peripheral digital logic circuit; the quantization parameter X_Z is then converted into a read-voltage signal and input to the 1T1R array, and the output current is collected, quantized and shift-weighted to obtain the computation result X_Z·W_Q.
As a preferred technical solution of the present invention, a digital control circuit performs the scheduling: the corresponding picture data are taken in turn from the buffer, summed, and multiplied by the corresponding W_Z to obtain X_Q·W_Z; meanwhile, the picture data are converted into voltage signals and input at the row heads of the 1T1R array, the output currents are read at the column ends, and X_Q·W_Q is obtained after quantization and shift weighting.
Compared with the prior art, the beneficial effects of the invention are: under identical computing conditions the invention greatly reduces the number of RRAM devices, and given that the RRAM fabrication process is not yet mature, it is better suited to commercial deployment; through the split computation the RRAM array stores and computes only positive integers, avoiding the excessive RRAM device consumption that direct computation of signed numbers would cause; halving the 1T1R array scale effectively reduces system area, lowers system energy consumption and improves computing efficiency; the invention applies to multiple neural networks such as the multilayer perceptron and the convolutional neural network, and because the local receptive field of a CNN convolutional layer is small, the X_Q·W_Z term needs few adders, so the advantage is more pronounced in CNNs; when the number of convolution kernels in a CNN convolutional layer grows, the array scale remains half that of the conventional technique while the number of multipliers added for the extra X_Z·W_Q and X_Q·W_Z terms stays unchanged, so for large CNNs the impact of the introduced computing units is limited and the overall performance advantage of the system is very significant.
Drawings
FIG. 1 is a diagram of a 2T2R structure representing positive and negative weights in the background art;
FIG. 2 is a diagram showing the structure of positive and negative rows of 1T1R in the background art to represent positive and negative weights;
FIG. 3 is a diagram showing the positive and negative weights of a 1T1R positive and negative double array structure in the background art;
FIG. 4 shows the optimized 1T1R single array representing quantized weights according to the present invention;
FIG. 5 is a diagram of the computation architecture optimized with the PTQ formulas according to the present invention;
FIG. 6 is a diagram of the single-layer network accelerator architecture based on a 1T1R array according to the present invention.
Detailed Description of the Preferred Embodiments
Example 1
As shown in FIGS. 4 to 6, the invention discloses a method for optimizing the array structure of an RRAM-based in-memory computing system. It mainly uses the corresponding formulas of a post-training quantization algorithm to optimize the array structure, reducing the array area and the system power consumption while maintaining computing accuracy and precision, and providing a reliable solution for accelerating large-scale neural-network computation.
Quantization converts the 32-bit floating-point numbers in a neural network to 8-bit or other low-bit fixed-point numbers, greatly reducing the computational cost and helping deploy the network on edge intelligent devices with strict power and latency requirements. The post-training quantization (PTQ) algorithm used by this invention obtains the network's quantization parameters without retraining (i.e., without updating the network weights). Taking a convolutional neural network (CNN) as an example, after the network is trained normally, the data produced by each convolution, pooling and fully-connected layer are quantized in the inference stage, effectively saving storage and computation cost.
The quantization formula is Q = round(R/S) + Z, where R is the original data value, Q is the quantized data value, S is the scale, representing the proportional relation between the original and quantized data, and Z is the zero offset, the integer to which 0 in the original data maps after quantization.
The calibration formulas are S = (R_max − R_min)/(Q_max − Q_min) and Z = round(Q_min − R_min/S), where Q_max and Q_min are determined by the quantization bit precision, and R_max and R_min are determined by the randomly drawn sample data.
Thus all the quantization parameters needed for post-training quantization can be determined after training and before inference; during inference the data are quantized simply by substituting the corresponding parameters into the quantization formula.
The invention mainly uses the quantization formulas to optimize the array structure. Taking the convolution computation of the first CNN convolutional layer as an example, the actual computation formula after quantizing the original picture data and the convolution kernels is:

Y = X·W = (X_Q − X_Z)·(W_Q − W_Z),

and expanding it gives:

Y = X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z,

where X_Z and W_Z are fixed values.
During the entire convolution computation, X_Z·W_Z is therefore a known, fixed value, and X_Z·W_Q needs to be computed only once, after the 1T1R array weights are initialized and before the convolution sliding begins; at each sliding step only the two terms X_Q·W_Z and X_Q·W_Q actually need to be computed. The X_Q·W_Z term is the sum of the picture input data in each sliding window multiplied by W_Z; since a convolution kernel is typically 3×3 or 5×5, only a few adders and multipliers are required. The X_Q·W_Q term is the core of the computation: the quantized, positive convolution-kernel weight matrix W_Q is stored in the 1T1R array, and in each convolution sliding step the corresponding feature-map data X_Q are input, realizing the in-memory matrix multiplication. The structure of this computation scheme is shown in FIG. 5. In summary, the invention trades a few additional multipliers and adders for a large reduction in the 1T1R array resources of current common techniques. In particular, the number of additional adders and multipliers depends only on the size of the convolution kernel, not on the number of kernels. When the number of convolution kernels in a CNN is very large, the required array scale grows greatly while the number of additional adders and multipliers stays unchanged, so the array-scale advantage of this technique becomes even more pronounced, making it well suited to accelerating large-scale neural networks.
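As a numerical check of the claim that the extra arithmetic depends only on the kernel size, the sketch below (hypothetical uint8 data; the 200-kernel 3×3 configuration is the patent's own example) computes one sliding-window step both ways — note that the X_Q·W_Z correction is a single scalar shared by all 200 output channels:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n_kernels = 3, 200                  # 3x3 kernels, 200 of them
XQ_win = rng.integers(0, 256, size=k * k)           # one 9-element window
WQ = rng.integers(0, 256, size=(k * k, n_kernels))  # flattened kernels as columns
XZ, WZ = 128, 120                                   # hypothetical zero offsets

# X_Q W_Z: sum the 9 window inputs once, multiply once by W_Z --
# a scalar shared by all 200 columns, independent of n_kernels.
xqwz = XQ_win.sum() * WZ

# X_Z W_Q: computed once per channel after array initialization.
xzwq = XZ * WQ.sum(axis=0)             # shape (200,)

Y = XQ_win @ WQ - xqwz - xzwq + k * k * XZ * WZ
Y_ref = (XQ_win - XZ) @ (WQ - WZ)      # direct signed reference
assert np.array_equal(Y, Y_ref)
```

Doubling `n_kernels` doubles the array columns but leaves the window-sum adder tree and the single W_Z multiplier untouched, which is the scaling argument made above.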
Taking the first convolutional layer as an example and assuming 200 convolution kernels of size 3×3, the specific implementation steps of the invention are as follows:
1. Design and train the neural-network system. Set the quantization bit width (for example, 8 bits), randomly draw part of the test-set data, and compute the quantization scale factor S and zero offset Z of each network layer;
2. Using the quantization formula, quantize the original picture data, the weights of all network layers and the intermediate results of each layer to obtain the quantized data and parameters X_Q, W_Q, X_Z and W_Z of every network layer, and compute X_Z·W_Z for each layer;
3. The RRAM uses devices with two stable states, high resistance and low resistance, each forming a 1T1R structure with an NMOS transistor. Because each RRAM cell here can only represent the binary logic values "0" and "1", a row of eight 1T1R structures is needed to form one 8-bit weight value, and a corresponding shift weighting must be applied when the output current is collected to obtain the actual computation result. The convolution kernels are handled with the conventional technique: each kernel is unrolled row by row into a column vector and stored in a column of 1T1R structures; the column-end output current is collected and quantized to give a single convolution result. Building the 1T1R array by this logic, the array for 200 kernels of size 3×3 has size 9×1600 (this technique needs only 14,400 1T1R structures, whereas the conventional technique needs 28,800 because of the negative-weight problem);
4. Under the control of the peripheral digital logic circuit, write the quantized convolution-kernel weight data of step 2 into the array row by row for storage; then convert the quantization parameter X_Z into a read-voltage signal, input it to the 1T1R array, collect the output current, and obtain the computation result X_Z·W_Q after quantization and shift weighting;
5. Store the original picture data in a buffer and control the sliding operation through the digital control circuit. According to the convolution sliding settings, take the 9 picture data of the current sliding window from the buffer in turn, sum them, and multiply by the corresponding W_Z to obtain X_Q·W_Z. Meanwhile, convert the 9 picture data into read voltages, input them at the array row heads, read the output currents, and obtain X_Q·W_Q after quantization and shift weighting;
6. Connect a digital logic circuit at the array output to compute the final convolution result:

Y = X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z
Store each single convolution result in a register and perform the subsequent activation-function and pooling operations; after the convolution sliding computation finishes, the complete feature-map data are obtained as the input of the next network layer.
The steps above describe the computation of the first CNN convolutional layer; the other network layers operate similarly, and each layer's quantized output serves as the next layer's input. As can be seen from FIG. 6, the 1T1R array of the invention stores only the quantized positive weight values and, for the same weights, needs only half the number of 1T1R cells of the conventional technique. Although additional multipliers and adders are introduced, few are needed: in the example above only 1 multiplier and 3 + 200 + 9 adders are added, while 14,400 1T1R cells are saved. Considering that the conventional CMOS process is far more mature than the RRAM process, the technical advantage of the invention is very clear.
Although the present invention has been described in detail with reference to specific embodiments, it is not limited to the above embodiments; various changes may be made within the knowledge of those skilled in the art without departing from the gist of the invention.
Claims (5)
1. A method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system, characterized by comprising the following steps:
Step one: perform post-training quantization on the image data and the neural-network weight data; through the quantization formula, compute the image quantized data X_Q, the image zero offset X_Z, the weight quantized data W_Q and the weight zero offset W_Z. X_Z and W_Z are fixed once quantization is complete, and the value X_Z·W_Z is computed in software;
Step two: the signed-number calculation formula of neural-network forward propagation is:

Y = X·W = (X_Q − X_Z)·(W_Q − W_Z)
Expanding it gives Y = X_Q·W_Q − X_Q·W_Z − X_Z·W_Q + X_Z·W_Z. The positive-integer terms X_Q·W_Z, X_Z·W_Q and X_Q·W_Q are computed with adders, multipliers and the RRAM array circuit respectively, and finally the value X_Z·W_Z computed in step one is substituted into the expansion to obtain Y. Through this split computation the RRAM array stores and computes only positive integers, avoiding the excessive RRAM device consumption that direct computation of signed numbers would cause;
Step three: store the Y-value results in a buffer and perform the subsequent activation-function, quantization and other operations; when processing is finished, the complete feature-map data are obtained and serve as the input of the next network layer.
2. The method for optimizing the array structure of an RRAM in-memory computing system of claim 1, wherein the quantization formula in step one is: Q = round(R/S) + Z,
where R is the original data value, Q is the quantized data value, S is the scale factor, representing the proportional relation between the original and quantized data, and Z is the zero offset, the integer to which 0 in the original data maps after quantization;
setting the quantization precision as n bits, randomly extracting part of test set data, calculating a quantization scale factor S and a zero offset Z in each layer of network according to the following formula, and substituting the quantization scale factor S and the zero offset Z into the formula to obtain a quantized data value;
where R_max and R_min, Q_max and Q_min are the maximum and minimum of the original data values and of the quantized data values respectively; Q_max and Q_min are determined by the quantization precision n, while R_max and R_min are determined by the randomly drawn sample data.
3. The method for optimizing the array structure of an RRAM in-memory computing system of claim 1, wherein in step two the RRAM uses devices with only two stable states, high resistance and low resistance, each forming a 1T1R structure with an NMOS transistor, from which a 1T1R crossbar array is built; n 1T1R structures in a row form one n-bit weight value, and the corresponding shift weighting is applied when the output current is collected to obtain the computation result.
4. The method of claim 3, wherein the weight data W_Q are written row by row into the 1T1R array for storage under the control of a peripheral digital logic circuit; the quantization parameter X_Z is then converted into a read-voltage signal and input to the 1T1R array, and the output current is collected, quantized and shift-weighted to obtain the computation result X_Z·W_Q.
5. The method of claim 3, wherein a digital control circuit performs the scheduling: the corresponding picture data are taken in turn from the buffer, summed, and multiplied by the corresponding W_Z to obtain X_Q·W_Z; meanwhile, the picture data are converted into voltage signals and input at the row heads of the 1T1R array, the output currents are read at the column ends, and X_Q·W_Q is obtained after quantization and shift weighting.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310186971.9A (granted as CN115879530B) | 2023-03-02 | 2023-03-02 | Method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN115879530A | 2023-03-31 |
| CN115879530B | 2023-05-05 |
Family

ID=85761720

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310186971.9A (CN115879530B, active) | Method for optimizing the array structure of an RRAM (resistive random access memory) in-memory computing system | 2023-03-02 | 2023-03-02 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN115879530B |
Cited By (1)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN116561050A | 2023-04-07 | 2023-08-08 | Fine-grained mapping method and device for an RRAM (resistive random access memory) compute-in-memory integrated chip |
Patent Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109146070A (en) * | 2017-06-16 | 2019-01-04 | Huawei Technologies Co., Ltd. | Peripheral circuit and system supporting RRAM-based neural network training |
US20200098428A1 (en) * | 2018-09-20 | 2020-03-26 | University Of Utah Research Foundation | Digital RRAM-based convolutional block |
CN109472353A (en) * | 2018-11-22 | 2019-03-15 | Jinan Inspur Hi-Tech Investment and Development Co., Ltd. | Convolutional neural network sampling circuit and quantization method |
CN109993297A (en) * | 2019-04-02 | 2019-07-09 | Nanjing Jixiang Sensing and Imaging Technology Research Institute Co., Ltd. | Load-balanced sparse convolutional neural network accelerator and acceleration method |
CN110378468A (en) * | 2019-07-08 | 2019-10-25 | Zhejiang University | Neural network accelerator based on structured pruning and low-bit quantization |
CN110569962A (en) * | 2019-08-08 | 2019-12-13 | Huazhong University of Science and Technology | Convolution calculation accelerator based on a 1T1R memory array and operation method thereof |
CN110427171A (en) * | 2019-08-09 | 2019-11-08 | Fudan University | Scalable in-memory computing structure and method for fixed-point matrix multiply-accumulate operations |
CN110647983A (en) * | 2019-09-30 | 2020-01-03 | Nanjing University | Self-supervised learning acceleration system and method based on a computing-in-memory device array |
CN110852429A (en) * | 2019-10-28 | 2020-02-28 | Huazhong University of Science and Technology | Convolutional neural network based on 1T1R and operation method thereof |
CN113065632A (en) * | 2019-12-16 | 2021-07-02 | Samsung Electronics Co., Ltd. | Method and apparatus for validating training of neural networks for image recognition |
US20210200513A1 (en) * | 2019-12-30 | 2021-07-01 | Samsung Electronics Co., Ltd. | Method and apparatus with floating point processing |
CN113126953A (en) * | 2019-12-30 | 2021-07-16 | Samsung Electronics Co., Ltd. | Method and apparatus for floating point processing |
CN111242289A (en) * | 2020-01-19 | 2020-06-05 | Tsinghua University | Scalable convolutional neural network acceleration system and method |
CN111832719A (en) * | 2020-07-28 | 2020-10-27 | University of Electronic Science and Technology of China | Computation circuit for a fixed-point quantized convolutional neural network accelerator |
CN111738427A (en) * | 2020-08-14 | 2020-10-02 | University of Electronic Science and Technology of China | Operation circuit of a neural network |
CN112183739A (en) * | 2020-11-02 | 2021-01-05 | University of Science and Technology of China | Hardware architecture of a memristor-based low-power spiking convolutional neural network |
CN112633477A (en) * | 2020-12-28 | 2021-04-09 | University of Electronic Science and Technology of China | Quantized neural network acceleration method based on a field-programmable gate array |
CN113033794A (en) * | 2021-03-29 | 2021-06-25 | Chongqing University | Lightweight neural network hardware accelerator based on depthwise separable convolution |
CN113762491A (en) * | 2021-08-10 | 2021-12-07 | Nanjing Tech University | FPGA-based convolutional neural network accelerator |
CN113705803A (en) * | 2021-08-31 | 2021-11-26 | Nanjing University | Hardware image recognition system based on a convolutional neural network, and deployment method |
Non-Patent Citations (2)
Title |
---|
PENG YAO et al.: "Fully hardware-implemented memristor convolutional neural network", Nature *
JI Yuan et al.: "Stochastic logic with a two-dimensional state transition structure and its application in neural networks", Journal of Electronics & Information Technology *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116561050A (en) * | 2023-04-07 | 2023-08-08 | Tsinghua University | Fine-grained mapping method and device for an RRAM (resistive random access memory) integrated chip |
Also Published As
Publication number | Publication date |
---|---|
CN115879530B (en) | 2023-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109063825B (en) | Convolutional neural network accelerator | |
CN107169563B (en) | Processing system and method applied to binary-weight convolutional networks | |
Wang et al. | Low power convolutional neural networks on a chip | |
CN108665063B (en) | Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator | |
CN107256424B (en) | Ternary-weight convolutional network processing system and method | |
WO2022037257A1 (en) | Convolution calculation engine, artificial intelligence chip, and data processing method | |
CN113222133B (en) | FPGA-based compressed LSTM accelerator and acceleration method | |
CN111652360B (en) | Convolution operation device based on a systolic array | |
CN115879530B (en) | Method for optimizing array structure of RRAM (resistive random access memory) memory computing system | |
CN112636745B (en) | Logic unit, adder and multiplier | |
CN115423081A (en) | FPGA-based neural network accelerator for the CNN_LSTM algorithm | |
KR20220114519A (en) | Quantum error correction decoding system and method, fault-tolerant quantum error correction system and chip | |
WO2023116923A1 (en) | Computing-in-memory device and computing method | |
CN111931925A (en) | FPGA-based binary neural network acceleration system | |
CN113762493A (en) | Neural network model compression method and device, acceleration unit and computing system | |
Shahshahani et al. | Memory optimization techniques for FPGA-based CNN implementations | |
TWI737228B (en) | Quantization method based on hardware of in-memory computing and system thereof | |
KR20230084449A (en) | Neural processing unit | |
Guan et al. | Recursive binary neural network training model for efficient usage of on-chip memory | |
WO2022062391A1 (en) | System and method for accelerating rnn network, and storage medium | |
CN111627479B (en) | Coding type flash memory device, system and coding method | |
CN113378115B (en) | Near-memory sparse vector multiplier based on magnetic random access memory | |
CN115495152A (en) | Memory computing circuit with variable length input | |
US20230047364A1 (en) | Partial sum management and reconfigurable systolic flow architectures for in-memory computation | |
CN115222028A (en) | One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||