
Memory computing architecture supporting weight sparsity and data output method thereof

Info

Publication number
CN111079919A
Authority
CN
China
Prior art keywords
sub, memory cell, analog, weight, digital conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911151228.XA
Other languages
Chinese (zh)
Other versions
CN111079919B (en)
Inventor
刘勇攀
岳金山
袁哲
孙文钰
李学清
杨华中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201911151228.XA
Publication of CN111079919A
Application granted
Publication of CN111079919B
Active legal status (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Memory System (AREA)

Abstract

The embodiment of the invention provides a memory computing architecture supporting weight sparsity and a data output method thereof. The architecture comprises: a memory cell array comprising a plurality of sub memory cell blocks, with an analog-to-digital conversion unit correspondingly arranged at the output port of each column of sub memory cell blocks; an operation module used for performing block-wise sparse training of the weights of the neural network model stored in the memory cell array, so that the weights stored in each sub memory cell block are trained to be either all zero or not all zero; and a detection module used for turning off an analog-to-digital conversion unit and setting its output to zero when the sub memory cell block corresponding to that analog-to-digital conversion unit is detected to be in the working state and the stored weights are all zero. The embodiment of the invention can effectively reduce the power consumption of in-memory computing in weight-sparse neural network applications and improve the feasibility of such applications.

Description

Memory computing architecture supporting weight sparsity and data output method thereof
Technical Field
The invention relates to the technical field of circuit design, in particular to a memory computing architecture supporting weight sparsity and a data output method thereof.
Background
In-memory computing is an emerging circuit architecture. Unlike the traditional von Neumann architecture, in which storage and computation are separated, in-memory computing integrates storage and computation and completes the computation inside the memory cells. Compared with the traditional structure, in-memory computing offers high parallelism and high energy efficiency, making it an attractive alternative for algorithms that require a large number of parallel matrix-vector multiplications, especially neural network algorithms.
The neural network algorithm is a key algorithm of current artificial intelligence technology; it consists of a large number of matrix-vector multiplications and is therefore well suited to energy-efficient implementation with in-memory computing circuits. When a conventional in-memory computing architecture is applied to a neural network algorithm, the architecture comprises a memory cell array of M rows and N columns: the input image is fed into the memory cells through a digital-to-analog converter (DAC) on each row, and is then multiplied and accumulated with the neural network weights stored in the memory cells (every n adjacent columns in a row store one n-bit weight).
In each clock cycle, m rows of DACs in the memory cell array are turned on, and the multiply-accumulate result of these m rows is converted into a digital signal by the analog-to-digital converter (ADC) of each column. That is, the result obtained from the in-memory computation must be converted into a digital signal by an ADC or a similar module before it can be stored and processed in a digital circuit. Let the input on row i be a_i, and let the n-bit weight stored in row i, columns j×n to j×n+n-1, be w_ij; the multiply-accumulate result output through the ADC of column block j is then

∑_{i=1}^{m} a_i · w_ij
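For illustration only (this sketch is not part of the patent disclosure), the multiply-accumulate just defined can be modelled as follows; the array sizes and the names a, W and column_block_mac are assumptions made for the example.

```python
import numpy as np

# Illustrative model of the per-column-block multiply-accumulate described above:
# a_i are the m row inputs, w_ij is the packed n-bit weight of row i and column
# block j, and each ADC would digitize sum_i a_i * w_ij for its block.
def column_block_mac(a, W):
    """a: shape (m,) row inputs; W: shape (m, num_blocks) weights w_ij."""
    return a @ W

a = np.array([1, 2, 3])                  # inputs a_i for m = 3 active rows
W = np.array([[0, 5], [0, 1], [0, 2]])   # two column blocks; block 0 stores only zeros
print(column_block_mac(a, W))            # -> [ 0 13]; block 0 needs no ADC conversion
```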
In practical applications, given the redundancy present in neural network algorithms, a large fraction of the weight data can be set to 0 by sparsification techniques, thereby reducing the computational cost of the neural network. However, the distribution of zeros seen by in-memory computation tends to be scattered and irregular. Since in-memory computation is usually highly parallel, even if most weights are 0, the ADC of the corresponding output still has to be turned on as long as a single non-zero weight remains; this generates a large amount of power consumption and may account for up to 95% of the power budget of the whole in-memory computing module.
Disclosure of Invention
In order to overcome the above problems or at least partially solve the above problems, embodiments of the present invention provide a memory computing architecture supporting weight sparsity and a data output method thereof, so as to effectively reduce power consumption of memory computing in neural network model weight sparsity application and improve feasibility of application.
In a first aspect, an embodiment of the present invention provides an in-memory computing architecture supporting weight sparseness, including:
the memory cell array comprises a plurality of sub memory cell blocks, and an analog-to-digital conversion unit is correspondingly arranged at the output port of each column of sub memory cell blocks;
the operation module is used for carrying out sparse training on the weight of the neural network model stored in the storage unit array according to each sub storage unit block, so that the weight stored in each sub storage unit block is trained to be an all-zero value or a non-all-zero value;
and the detection module is used for turning off the analog-to-digital conversion unit and setting the output of the analog-to-digital conversion unit to be zero when the sub-storage unit block corresponding to the analog-to-digital conversion unit is detected to be in a working state and the stored weight is all zero.
Further, the operation module is further configured to adaptively adjust the number of rows and the number of columns of the sub-memory cell block in the process of performing sparse training, so as to adapt to the total number of rows and the total number of columns of the memory cell array.
Further, the operation module is further configured to mark the sub-memory cell block as a sparse block after training the weight stored in the sub-memory cell block to an all-zero value;
correspondingly, the detection module is further configured to detect whether the weight stored in each sub-memory cell block is all zero by detecting whether each sub-memory cell block includes a sparse block flag.
Optionally, the analog-to-digital conversion unit is specifically an analog-to-digital converter (ADC), a sampling amplifier circuit (SA), or an in-memory computing processing unit (PU).
Optionally, the operation module is specifically configured to mark whether the sub memory cell block is a sparse block by using a 1-bit sparse flag (sparse index).
Further, the operation module is further configured so that, in each clock cycle, the number of rows and the number of columns of the memory cell array that are turned on match the number of rows and the number of columns of the sub memory cell block, respectively.
In a second aspect, an embodiment of the present invention provides a data output method based on the memory computing architecture supporting weight sparsity as described in the first aspect, including:
according to each sub-storage unit block, carrying out sparse training on weights of the neural network model stored in the storage unit array, so that the weights stored in each sub-storage unit block are trained to be all-zero values or non-all-zero values;
if the sub-memory cell block corresponding to the analog-digital conversion unit is detected to be in a working state and the stored weight is all zero, the analog-digital conversion unit is turned off, the output of the analog-digital conversion unit is set to be zero, otherwise, multiplication and addition operation is carried out according to the input of the sub-memory cell block corresponding to the analog-digital conversion unit in the working state and the weight stored in the sub-memory cell block in the working state, and a multiplication and addition operation result is output by turning on the analog-digital conversion unit.
According to the memory computing architecture supporting weight sparsity and the data output method thereof provided by the embodiments of the invention, the weights in the in-memory-computing memory cell array are sparsely trained block by block and the memory cell array is divided into sub memory cell blocks, realizing block-wise weight sparsity of the neural network model; at the same time, by turning off the analog-to-digital conversion units corresponding to the sparse blocks, the power consumption of in-memory computing in weight-sparse neural network applications can be effectively reduced and the feasibility of such applications improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a memory computing architecture supporting weight sparseness according to an embodiment of the present invention;
FIG. 2 is a schematic circuit diagram of a memory computing architecture supporting weight sparseness according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a data output method based on a memory computing architecture supporting weight sparseness according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without any creative efforts belong to the protection scope of the embodiments of the present invention.
To address the problem in the prior art of excessive power consumption of in-memory computing in neural network applications, the embodiments of the invention perform block-wise sparse training of the weights stored in the in-memory-computing memory cell array and divide that memory cell array into sub memory cell blocks, thereby realizing block-wise weight sparsity of the neural network model; by turning off the analog-to-digital conversion units corresponding to the sparse blocks, the power consumption of in-memory computing in weight-sparse neural network applications can be effectively reduced and the feasibility of such applications improved. Embodiments of the present invention will be described and illustrated below with reference to several embodiments.
Fig. 1 is a schematic structural diagram of a memory computing architecture supporting weight sparsity according to an embodiment of the present invention. The architecture supports a regular (block-wise) weight-sparsity pattern that allows the corresponding analog-to-digital conversion circuit units to be turned off, thereby reducing the power consumption of the in-memory computing circuit system. As shown in fig. 1, the architecture includes a memory cell array 101, an operation module 102, and a detection module 103. Wherein:
the memory cell array 101 comprises a plurality of sub-memory cell blocks, and an analog-to-digital conversion unit is correspondingly arranged at an output port of each column of sub-memory cell blocks; the operation module 102 is configured to perform sparse training on weights of the neural network model stored in the memory cell array according to each sub-memory cell block, so that the weights stored in each sub-memory cell block are trained to be all-zero values or non-all-zero values; the detection module 103 is configured to turn off the analog-to-digital conversion unit and set an output of the analog-to-digital conversion unit to zero when it is detected that the sub-memory cell block corresponding to the analog-to-digital conversion unit is in the working state and the stored weight is a full zero value.
It can be understood that, in the memory computing architecture supporting weight sparsity according to the embodiment of the present invention, the weights of the neural network are trained in a regular, block-wise sparse manner, so that the weights stored in the in-memory computing circuit are sparse block by block; when the computation corresponding to a sparse weight block is performed, the architecture can directly skip it and turn off the corresponding analog-to-digital conversion circuit unit to save power. To this end, the architecture comprises at least a memory cell array 101, an operation module 102 and a detection module 103, which respectively store the weights of the neural network model, perform block-wise sparse training of those weights, and implement the power-saving flow of detecting sparse blocks and turning off the corresponding analog-to-digital conversion units.
Specifically, fig. 2 is a schematic circuit diagram of a memory computing architecture supporting weight sparsity according to an embodiment of the present invention. The memory cell array 101 contains M rows and N columns of memory cells, which are divided into small blocks of m rows and n columns; each block of m rows and n columns forms a sub memory cell block. During algorithm training, the weights of each small block are trained to be either all 0 (sparse) or not all 0. Meanwhile, the multiply-add output of each column of sub memory cell blocks is provided with a corresponding analog-to-digital conversion unit, which converts the result obtained by the in-memory computation, e.g. an analog voltage or current signal, into a digital signal to be stored and processed in the digital circuit.
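The block partitioning just described can be pictured with the following sketch; the variable names and the divisibility assumption are illustrative and not taken from the patent.

```python
import numpy as np

# Sketch of dividing an M x N memory cell array into sub memory cell blocks of
# m rows x n columns; assumes M and N are multiples of m and n for simplicity.
def partition_into_blocks(weights, m, n):
    M, N = weights.shape
    assert M % m == 0 and N % n == 0, "illustrative assumption: divisible sizes"
    # Resulting shape (M//m, N//n, m, n): one m x n tile per sub memory cell block.
    return weights.reshape(M // m, m, N // n, n).transpose(0, 2, 1, 3)

W = np.arange(64).reshape(8, 8)          # example 8 x 8 memory cell array
blocks = partition_into_blocks(W, 4, 4)  # four sub memory cell blocks of 4 x 4
print(blocks.shape)                      # -> (2, 2, 4, 4)
```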
Optionally, the analog-to-digital conversion unit is specifically an analog-to-digital converter (ADC), a sampling amplifier circuit (SA), or an in-memory computing processing unit (PU). That is, in different in-memory computing architectures, the ADC may be replaced by a sampling amplifier circuit (SA), a processing unit (PU), or the like. In any case, the function implemented is to convert the result of the in-memory computation from an analog voltage or current into a digital-circuit representation.
The operation module 102 mainly implements the computation function in the in-memory computing architecture. Specifically, through the neural network algorithm, it trains the weights of the neural network model into a block-wise sparse form corresponding to the m-row, n-column sub memory cell blocks of the in-memory-computing memory cell array. That is, after block-wise sparse training, the weights stored in each sub memory cell block are either all zero (a sparse block) or not all zero (a non-sparse block).
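As a hedged sketch of what such block-wise sparse training could look like (the magnitude-based pruning criterion, the sparsity target and the retraining loop around it are assumptions; the patent only requires that each block end up all zero or not all zero):

```python
import numpy as np

# One pruning step of block-wise sparse training: score each m x n block, zero the
# weakest blocks entirely, keep the rest. In practice this step would be interleaved
# with retraining; the magnitude criterion and target used here are assumptions.
def prune_blocks(weights, m, n, target_sparsity=0.95):
    M, N = weights.shape
    blocks = weights.reshape(M // m, m, N // n, n).transpose(0, 2, 1, 3)
    scores = np.abs(blocks).sum(axis=(2, 3))            # one score per sub block
    k = int(target_sparsity * scores.size)              # number of blocks to zero
    cutoff = np.sort(scores, axis=None)[k - 1] if k else -np.inf
    keep = scores > cutoff                               # True only for the strongest blocks
    blocks = blocks * keep[:, :, None, None]             # sparse blocks become all zero
    return blocks.transpose(0, 2, 1, 3).reshape(M, N)
```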
The detection module 103 detects, on the basis of the sparse training, the weights stored in each sub memory cell block in order to determine the state of the sub memory cell block corresponding to each analog-to-digital conversion unit: whether it is idle or in the working state, and, for a block in the working state, whether the stored weights are all zero or not all zero. If the weights stored in the sub memory cell block in the working state corresponding to a given analog-to-digital conversion unit are all zero, that analog-to-digital conversion unit is turned off. Moreover, since the in-memory-computing memory cell array operates on m rows per clock cycle and all data stored in a sparse block is 0, the corresponding multiply-accumulate result is known to be 0. The detection module 103 therefore also sets the output of the corresponding analog-to-digital conversion unit to zero when turning it off.
According to the memory computing architecture supporting weight sparsity provided by the embodiment of the invention, the weights in the in-memory-computing memory cell array are sparsely trained block by block and the memory cell array is divided into sub memory cell blocks, realizing block-wise weight sparsity of the neural network model; at the same time, by turning off the analog-to-digital conversion units corresponding to the sparse blocks, the power consumption of in-memory computing in weight-sparse neural network applications can be effectively reduced and the feasibility of such applications improved.
In addition, on the basis of the foregoing embodiments, the operation module may further be configured to adaptively adjust the number of rows and the number of columns of the sub-memory cell block in the process of performing the sparse training, so as to adapt to the total number of rows and the total number of columns of the memory cell array.
It can be understood that, during the block-wise sparse training through the neural network, the number of rows and columns of each block may be adaptively adjusted, so that the number of rows and columns of the sub-memory cell blocks is also adaptively divided to adapt to the total number of rows and the total number of columns of the memory cell array.
Furthermore, the operation module is further configured so that, in each clock cycle, the number of rows and the number of columns of the memory cell array that are turned on match the number of rows and the number of columns of the sub memory cell block, respectively. That is, in each clock cycle, the number of rows turned on in the memory cell array is consistent with the number of rows of the sub memory cell blocks, and the number of columns turned on is consistent with the number of bits of the weight.
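The bit-level mapping implied here can be illustrated with a hedged sketch; the shift-add recombination shown below is one common possibility and is an assumption, not something the patent specifies:

```python
import numpy as np

# Hedged sketch: one common way an n-bit weight can be spread over n columns and the
# per-column partial sums recombined with bit significance afterwards.
def unpack_bits(w, n):
    """Split non-negative integer weights w (shape (m,)) into n bit columns, LSB first."""
    return np.array([(w >> b) & 1 for b in range(n)]).T           # shape (m, n)

def block_mac_bitwise(a, w, n):
    """Multiply-accumulate of inputs a (m,) with n-bit weights w (m,), column by column."""
    bit_cols = unpack_bits(w, n)                                   # m rows x n bit columns
    per_column = a @ bit_cols                                      # one partial sum per column
    return int(sum(per_column[b] << b for b in range(n)))          # shift-add recombination

a = np.array([1, 2, 3])
w = np.array([5, 1, 2])                                            # three weights, n = 4 bits
print(block_mac_bitwise(a, w, 4), int(a @ w))                      # -> 13 13 (they agree)
```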
Furthermore, the operation module is further used for marking the sub-storage unit block as a sparse block after training the weight stored in the sub-storage unit block to be an all-zero value; correspondingly, the detection module is further configured to detect whether the weight stored in the sub-memory cell block is all zero values by detecting whether each sub-memory cell block includes the sparse block flag.
It can be understood that, after all the weights stored in a given sub memory cell block have been trained to zero by the neural network algorithm, that sub memory cell block is marked as a sparse block, yielding a corresponding sparse block flag. Such a sparse block is called a Sparse Weight Block (SWB). Optionally, the operation module marks whether a sub memory cell block is a sparse block with a 1-bit sparse flag (sparse index). That is, a 1-bit sparse flag can mark whether the weight data block corresponding to the current ADC is an SWB. If it is an SWB, the ADC is powered off and the subsequent circuit directly outputs 0, which reduces the operating power consumption of the ADC.
Accordingly, when detecting whether each sub-memory cell block is a sparse block, the detection module only needs to detect whether each sub-memory cell block includes a corresponding sparse block flag to detect whether all the weights stored in each sub-memory cell block are zero values. For example, when a certain sub-memory cell block includes a corresponding sparse block flag, it indicates that the weights stored therein are all zero.
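A minimal sketch of deriving the 1-bit sparse flags, assuming the block layout from the earlier sketches (the flag storage format is an assumption, not the patent's exact implementation):

```python
import numpy as np

# Compute the 1-bit sparse flag (sparse index) for every sub memory cell block:
# flag = 1 marks an SWB whose ADC can be powered off and whose output is forced to 0.
def sparse_index(blocks):
    """blocks: shape (block_rows, block_cols, m, n), e.g. from partition_into_blocks."""
    return np.all(blocks == 0, axis=(2, 3)).astype(np.uint8)
```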
Based on the same inventive concept, the embodiment of the invention also provides a data output method based on the memory computing architecture supporting weight sparsity, which is based on the above embodiments. Therefore, the description and definition in the memory computing architecture supporting weight sparsity in the above embodiments may be used for understanding the processing steps in the embodiments of the present invention, and reference may be made to the above embodiments specifically, and details are not repeated here.
As an embodiment of the present invention, a data output method based on the memory computing architecture supporting weight sparsity according to the above embodiments is shown in fig. 3, which is a schematic flow chart of the data output method based on the memory computing architecture supporting weight sparsity according to the embodiment of the present invention, and includes the following processing procedures:
s301, according to each sub-storage unit block, sparse training is carried out on the weight of the neural network model stored in the storage unit array, so that the weight stored in each sub-storage unit block is trained to be all-zero or non-all-zero.
It can be understood that this step mainly implements the data computation in the memory cell array. Specifically, through the neural network algorithm, the weights of the neural network model are trained into a block-wise sparse form corresponding to the m-row, n-column sub memory cell blocks of the in-memory-computing memory cell array. That is, after block-wise sparse training, the weights stored in each sub memory cell block are either all zero or not all zero.
S302, if the sub-memory cell block corresponding to the analog-to-digital conversion unit is detected to be in the working state and the stored weight is all zero, the analog-to-digital conversion unit is turned off, and the output of the analog-to-digital conversion unit is set to be zero, otherwise, multiplication and addition operation is carried out according to the input of the sub-memory cell block corresponding to the analog-to-digital conversion unit in the working state and the weight stored in the sub-memory cell block in the working state, and a multiplication and addition operation result is output by turning on the analog-to-digital conversion unit.
It can be understood that, in this step, on the basis of the sparse training, the weights stored in each sub memory cell block are detected to determine the state of the sub memory cell block corresponding to each analog-to-digital conversion unit: whether it is idle or in the working state, and, for a block in the working state, whether the stored weights are all zero or not all zero.
And if the weights stored in the sub-memory cell blocks in the working state corresponding to a certain analog-to-digital conversion unit are all zero values, correspondingly turning off the analog-to-digital conversion unit. In addition, since the memory cell array of the memory calculation uses m rows as an operation unit in one clock cycle, and since all stored data of one sparse block is 0, the corresponding multiply-accumulate result can be directly determined to be 0. Therefore, the output of the corresponding analog-to-digital conversion unit is set to zero on the basis of turning off the analog-to-digital conversion unit.
In addition, if the weights stored in the sub memory cell blocks in the working state corresponding to a certain analog-to-digital conversion unit are not all zero values, the analog-to-digital conversion unit is correspondingly turned on. And meanwhile, performing multiply-add operation on the input of the sub-storage unit block in the working state corresponding to the analog-digital conversion unit and the weight stored in the sub-storage unit block, and outputting a corresponding multiply-add operation result.
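A behavioural sketch of step S302 under the same illustrative assumptions (the names W_row, flags_row and the ideal adc callable are hypothetical, not the patent's circuit):

```python
import numpy as np

# Per clock cycle: for every active column block, skip the ADC and output 0 when the
# block's sparse flag is set; otherwise multiply-accumulate and convert the result.
def cycle_output(a, W_row, flags_row, adc):
    """a: (m,) inputs of the m active rows; W_row: (num_blocks, m) packed weights w_ij
    of the active block row; flags_row: (num_blocks,) 1-bit sparse flags; adc: quantizer."""
    outputs = []
    for w_j, flag in zip(W_row, flags_row):
        if flag:                                  # SWB: ADC stays off, result known to be 0
            outputs.append(0)
        else:                                     # non-sparse block: MAC, then A/D conversion
            outputs.append(adc(float(a @ w_j)))
    return outputs

print(cycle_output(np.array([1, 2]),              # inputs a_i for m = 2 active rows
                   np.array([[0, 0], [3, 1]]),    # block 0 all zero, block 1 dense
                   np.array([1, 0]),              # sparse flags: block 0 is an SWB
                   adc=round))                    # -> [0, 5]
```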
According to the data output method based on the memory computing architecture supporting weight sparsity provided by the embodiments of the invention, the weights in the in-memory-computing memory cell array are sparsely trained block by block and the memory cell array is divided into sub memory cell blocks, realizing block-wise weight sparsity of the neural network model; at the same time, by turning off the analog-to-digital conversion units corresponding to the sparse blocks, the power consumption of in-memory computing in weight-sparse neural network applications can be effectively reduced and the feasibility of such applications improved.
To further illustrate the technical solutions of the embodiments of the present invention, the embodiments of the present invention provide the following specific processes according to the above embodiments, but do not limit the scope of the embodiments of the present invention.
According to the embodiment of the invention, an integrated circuit chip containing an example of the in-memory computing architecture of the embodiment was obtained through front-end design, back-end design and wafer fabrication of the digital and analog circuits. The chip was fabricated in a TSMC 65 nm process, and power consumption and performance were tested after packaging. The chip area is 3.0 mm x 3.0 mm and contains 4 identical instances of the invention example, each with an area of 0.37 mm x 0.40 mm. The tested operating frequency is 50-100 MHz, at a corresponding supply voltage of 0.90-1.05 V.
The data storage and operation process comprises the following steps:
and training the weight into a sparse form through a neural network algorithm, and calculating the SWB of m rows and n columns of the array corresponding to the memory.
The number of rows of the in-memory computing array turned on in each cycle is consistent with m, and n is consistent with the number of bits of the weight. Both m and n can be adjusted flexibly during algorithm training to fit the actual in-memory computing array.
Dynamic shutdown of the ADCs based on SWBs: each SWB of m rows and n columns corresponds to a 1-bit sparse flag, and the ADC is dynamically turned on or off when the multiply-accumulate operation of the current block is executed.
Experiments show that the embodiment of the invention reduces the power consumption overhead of the in-memory computing architecture by supporting weight sparsity and dynamically turning off the ADCs. Sparse training and chip testing were performed on different neural network algorithms: using the VGG16 and ResNet18 models on two image recognition test sets, MNIST and Cifar-10, weight data block compression of 20-39 times was achieved, i.e. SWBs account for 95%-97.4% of all weights (a 95% SWB fraction leaves only 5% of the blocks, about 20x compression, and 97.4% corresponds to about 39x). For a VGG16 network on Cifar-10 with 4-bit input images and 4-bit weights, the invention-example portion of the actual chip achieves a power saving of 2.4-13.6 times (varying with the SWB fraction of different layers of the network), 10.1 times on average.
It will be appreciated that the above described embodiments of in-memory computing architectures are merely illustrative, in which the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over different network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on the understanding, the above technical solutions may be embodied in software products, or hardware products, which may be stored in a computer-readable storage medium, such as a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and include instructions for causing a computer device (such as a personal computer, a server, or a network device) to execute the method described in the method embodiments or some parts of the method embodiments.
In addition, it should be understood by those skilled in the art that in the specification of the embodiments of the present invention, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the description of the embodiments of the invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention, and not to limit the same; although embodiments of the present invention have been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. An in-memory computing architecture that supports weight sparseness, comprising:
the memory cell array comprises a plurality of sub memory cell blocks, and an analog-to-digital conversion unit is correspondingly arranged at the output port of each column of sub memory cell blocks;
the operation module is used for carrying out sparse training on the weight of the neural network model stored in the storage unit array according to each sub storage unit block, so that the weight stored in each sub storage unit block is trained to be an all-zero value or a non-all-zero value;
and the detection module is used for turning off the analog-to-digital conversion unit and setting the output of the analog-to-digital conversion unit to be zero when the sub-storage unit block corresponding to the analog-to-digital conversion unit is detected to be in a working state and the stored weight is all zero.
2. The memory computing architecture of claim 1, wherein the computing module is further configured to adaptively adjust the number of rows and the number of columns of the sub-memory cell blocks to adapt to the total number of rows and the total number of columns of the memory cell array during the sparse training.
3. The memory computing architecture supporting weight sparsity according to claim 1 or 2, wherein the operation module is further configured to mark the sub-memory cell blocks as sparse blocks after training weights stored in the sub-memory cell blocks to all-zero values;
correspondingly, the detection module is further configured to detect whether the weight stored in each sub-memory cell block is all zero by detecting whether each sub-memory cell block includes a sparse block flag.
4. The memory computing architecture supporting weight sparseness of claim 1 or 2, wherein the analog-to-digital conversion unit is specifically an analog/digital converter (ADC), a sampling amplification circuit (SA), or a memory computing Processing Unit (PU).
5. The memory computing architecture of claim 3, wherein the operation module is specifically configured to mark whether the sub-memory cell block is a sparse block by using a 1-bit sparse flag (sparse index).
6. The memory computing architecture of claim 2, wherein the operation module is further configured to, in each clock cycle, match the number of rows and the number of columns of the memory cell array that are turned on with the number of rows and the number of columns of the sub-memory cell blocks, respectively.
7. A data output method based on the in-memory computing architecture supporting weight sparsity of any one of claims 1 to 6, comprising:
according to each sub-storage unit block, carrying out sparse training on weights of the neural network model stored in the storage unit array, so that the weights stored in each sub-storage unit block are trained to be all-zero values or non-all-zero values;
if the sub-memory cell block corresponding to the analog-digital conversion unit is detected to be in a working state and the stored weight is all zero, the analog-digital conversion unit is turned off, the output of the analog-digital conversion unit is set to be zero, otherwise, multiplication and addition operation is carried out according to the input of the sub-memory cell block corresponding to the analog-digital conversion unit in the working state and the weight stored in the sub-memory cell block in the working state, and a multiplication and addition operation result is output by turning on the analog-digital conversion unit.
CN201911151228.XA 2019-11-21 2019-11-21 Memory computing architecture supporting weight sparseness and data output method thereof Active CN111079919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911151228.XA CN111079919B (en) 2019-11-21 2019-11-21 Memory computing architecture supporting weight sparseness and data output method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911151228.XA CN111079919B (en) 2019-11-21 2019-11-21 Memory computing architecture supporting weight sparseness and data output method thereof

Publications (2)

Publication Number Publication Date
CN111079919A (en) 2020-04-28
CN111079919B (en) 2022-05-20

Family

ID=70311698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911151228.XA Active CN111079919B (en) 2019-11-21 2019-11-21 Memory computing architecture supporting weight sparseness and data output method thereof

Country Status (1)

Country Link
CN (1) CN111079919B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111431536A (en) * 2020-05-18 2020-07-17 深圳市九天睿芯科技有限公司 Subunit, MAC array and analog-digital mixed memory computing module with reconfigurable bit width
CN111709872A (en) * 2020-05-19 2020-09-25 北京航空航天大学 Spin memory computing architecture of graph triangle counting algorithm
CN112214326A (en) * 2020-10-22 2021-01-12 南京博芯电子技术有限公司 Equalization operation acceleration method and system for sparse recurrent neural network
CN112529171A (en) * 2020-12-04 2021-03-19 中国科学院深圳先进技术研究院 Memory computing accelerator and optimization method thereof
CN113313247A (en) * 2021-02-05 2021-08-27 中国科学院计算技术研究所 Operation method of sparse neural network based on data flow architecture
WO2022029790A1 (en) * 2020-08-04 2022-02-10 Indian Institute Of Technology, Madras A flash adc based method and process for in-memory computation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106851076A (en) * 2017-04-01 2017-06-13 重庆大学 Compressed sensing video image acquisition circuit based on address decoding
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN108988865A (en) * 2018-07-11 2018-12-11 西安空间无线电技术研究所 A kind of optimum design method of compressed sensing observing matrix
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A kind of neural network acceleration system based on block circulation sparse matrix
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110427171A (en) * 2019-08-09 2019-11-08 复旦大学 Expansible fixed-point number matrix multiply-add operation deposits interior calculating structures and methods

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN106851076A (en) * 2017-04-01 2017-06-13 重庆大学 Compressed sensing video image acquisition circuit based on address decoding
CN108988865A (en) * 2018-07-11 2018-12-11 西安空间无线电技术研究所 A kind of optimum design method of compressed sensing observing matrix
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A kind of neural network acceleration system based on block circulation sparse matrix
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110427171A (en) * 2019-08-09 2019-11-08 复旦大学 Expansible fixed-point number matrix multiply-add operation deposits interior calculating structures and methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANGSHUMAN PARASHAR et al.: "SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks", 《ARXIV:1708.04485V1》 *
PEIQI WANG et al.: "SNrram: An Efficient Sparse Neural Network Computation Architecture Based on Resistive Random-Access Memory", 《2018 55TH ACM/ESDA/IEEE DESIGN AUTOMATION CONFERENCE》 *
ZHE YUAN et al.: "A Sparse-Adaptive CNN Processor with Area/Performance balanced N-Way Set-Associate PE Arrays Assisted by a Collision-Aware Scheduler", 《IEEE ASIAN SOLID-STATE CIRCUITS CONFERENCE》 *
陈桂林 et al.: "Survey of hardware-accelerated neural networks" (硬件加速神经网络综述), 《计算机研究与发展》 (Journal of Computer Research and Development) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111431536A (en) * 2020-05-18 2020-07-17 深圳市九天睿芯科技有限公司 Subunit, MAC array and analog-digital mixed memory computing module with reconfigurable bit width
US11948659B2 (en) 2020-05-18 2024-04-02 Reexen Technology Co., Ltd. Sub-cell, mac array and bit-width reconfigurable mixed-signal in-memory computing module
CN111709872A (en) * 2020-05-19 2020-09-25 北京航空航天大学 Spin memory computing architecture of graph triangle counting algorithm
CN111709872B (en) * 2020-05-19 2022-09-23 北京航空航天大学 Spin memory computing architecture of graph triangle counting algorithm
WO2022029790A1 (en) * 2020-08-04 2022-02-10 Indian Institute Of Technology, Madras A flash adc based method and process for in-memory computation
CN112214326A (en) * 2020-10-22 2021-01-12 南京博芯电子技术有限公司 Equalization operation acceleration method and system for sparse recurrent neural network
CN112529171A (en) * 2020-12-04 2021-03-19 中国科学院深圳先进技术研究院 Memory computing accelerator and optimization method thereof
CN112529171B (en) * 2020-12-04 2024-01-05 中国科学院深圳先进技术研究院 In-memory computing accelerator and optimization method thereof
CN113313247A (en) * 2021-02-05 2021-08-27 中国科学院计算技术研究所 Operation method of sparse neural network based on data flow architecture
CN113313247B (en) * 2021-02-05 2023-04-07 中国科学院计算技术研究所 Operation method of sparse neural network based on data flow architecture

Also Published As

Publication number Publication date
CN111079919B (en) 2022-05-20

Similar Documents

Publication Publication Date Title
CN111079919B (en) Memory computing architecture supporting weight sparseness and data output method thereof
Tang et al. Binary convolutional neural network on RRAM
CN111026700B (en) Memory computing architecture for realizing acceleration and acceleration method thereof
Zhou et al. Cambricon-S: Addressing irregularity in sparse neural networks through a cooperative software/hardware approach
Gupta et al. Masr: A modular accelerator for sparse rnns
CN109543830B (en) Splitting accumulator for convolutional neural network accelerator
CN108491926B (en) Low-bit efficient depth convolution neural network hardware accelerated design method, module and system based on logarithmic quantization
CN112070204B (en) Neural network mapping method and accelerator based on resistive random access memory
CN110991631A (en) Neural network acceleration system based on FPGA
Qin et al. Diagonalwise refactorization: An efficient training method for depthwise convolutions
CN113283587A (en) Winograd convolution operation acceleration method and acceleration module
Yang et al. Fusekna: Fused kernel convolution based accelerator for deep neural networks
Jiang et al. A low-latency LSTM accelerator using balanced sparsity based on FPGA
CN112529171B (en) In-memory computing accelerator and optimization method thereof
Saxena et al. Towards adc-less compute-in-memory accelerators for energy efficient deep learning
CN111381968A (en) Convolution operation optimization method and system for efficiently running deep learning task
Wu et al. A 3.89-GOPS/mW scalable recurrent neural network processor with improved efficiency on memory and computation
KR102541461B1 (en) Low power high performance deep-neural-network learning accelerator and acceleration method
Basumallik et al. Adaptive block floating-point for analog deep learning hardware
Chen et al. CompRRAE: RRAM-based convolutional neural network accelerator with r educed computations through ar untime a ctivation e stimation
He et al. Infox: An energy-efficient reram accelerator design with information-lossless low-bit adcs
WO2023146613A1 (en) Reduced power consumption analog or hybrid mac neural network
CN114897159A (en) Method for rapidly deducing incident angle of electromagnetic signal based on neural network
Zhu et al. Exploiting parallelism with vertex-clustering in processing-in-memory-based GCN accelerators
JP2023545575A (en) Quantization for neural network calculations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant