CN113592072A - Sparse convolution neural network accelerator oriented to memory access optimization - Google Patents
Sparse convolution neural network accelerator oriented to memory access optimization
- Publication number
- CN113592072A (application CN202110845980.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- convolution operation
- activation
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
A memory-access-optimization-oriented sparse convolutional neural network accelerator, comprising: a sparse activation value processing module SSG, used for removing zero-value activation data and screening out valid non-zero activation values; a buffer module CBUF, used for storing input neuron data and enabling the reuse of repeated activation data; a buffer module PB, used for storing weight data read in parallel; and an operation module CMAC, used for performing the multiply-add operations of the convolution. In the data-reading stage, the neuron data required by the current convolution operation are read into the buffer module CBUF and the weight data into the buffer module PB; in the screening-and-reuse stage, the sparse activation value processing module screens out the non-zero activation data in the buffer module and simultaneously checks whether reusable activation data exist; in the operation stage, the screened non-zero activation data are transmitted to the operation module for the convolution calculation. The invention has the advantages of a simple principle, easy implementation, and a marked improvement in computation and memory-access efficiency.
Description
Technical Field
The invention relates generally to the technical field of neural network applications, and in particular to a sparse convolutional neural network accelerator oriented to memory-access optimization.
Background
Deep learning technology is currently developing rapidly and is widely applied in many fields, while also remaining a hot area of academic research. Within deep learning, deep neural network models attract the most attention and perform well in many artificial intelligence applications, including computer vision, natural language processing, machine translation, and image recognition. However, the training and inference of neural network models place high demands on computing power, and ordinary CPUs and embedded processors can no longer supply the computing power these models require.
This raises the problem of rapidly growing computational complexity as neural network models develop. At present, those skilled in the art offer two mainstream schemes for supplying the computing power that neural network models need. One approach uses a GPU with a large number of parallel threads to perform the model computation; the other develops a dedicated neural network accelerator based on an FPGA or ASIC design. Although GPUs can provide intensive computational support for neural network models, their high power consumption is a concern. A low-power, high-throughput dedicated neural network accelerator therefore becomes an effective way to serve neural network model computation.
NVDLA is a representative dedicated neural network accelerator: an open-source accelerator platform introduced by NVIDIA for the deep learning inference process. However, two problems remain during its convolution operation:
First, a large amount of zero-valued activation data occupies considerable memory space and many operation units; any calculation in which a zero-valued activation participates is an invalid calculation. Because NVDLA does not support processing of sparse activation values, efficiently removing this zero-valued activation data can improve the efficiency of the convolution operation.
Second, a large amount of activation data is repeated across the sliding windows of a convolutional layer: data at the same positions in adjacent convolution operations must repeatedly participate in the calculation, which sharply increases the memory-access volume of the NVDLA accelerator.
In summary, a method is needed that can process sparse activation data within a convolution operation and reuse the activation data repeated across different convolution operations, thereby improving the computation and memory-access efficiency of the NVDLA convolution process.
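The scale of the redundancy behind the second problem is easy to quantify: two adjacent K × K windows at stride S share a fraction (K − S)/K of their activations along the sliding direction. A minimal illustrative sketch in Python (the function name is ours, not part of NVDLA):

```python
def shared_fraction(k: int, s: int) -> float:
    """Fraction of activations shared by two adjacent k x k convolution
    windows sliding with stride s (overlap along the sliding direction)."""
    return max(k - s, 0) / k

# A 4 x 4 window at stride 1 re-reads 75% of its activations between
# adjacent positions -- the reuse figure used in the embodiments below.
assert shared_fraction(4, 1) == 0.75
```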
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a memory-access-optimization-oriented sparse convolutional neural network accelerator that is simple in principle, easy to implement, and markedly improves computation and memory-access efficiency.
In order to solve the technical problems, the invention adopts the following technical scheme:
a memory-oriented optimization sparse convolutional neural network accelerator, comprising:
the sparse activation value processing module SSG is used for removing zero-value activation data and screening out effective non-zero activation values;
the buffer module CBUF is used for storing input neuron data and realizing repeated activation data multiplexing;
the cache module PB is used for storing the weight data read in parallel;
the operation module CMAC is used for completing the multiply-add operation of the convolution operation;
in the data reading stage, reading neuron data required by the current convolution operation into a buffer module CBUF, and reading weight data into a buffer module PB;
in the screening and multiplexing stage, the sparse activation value processing module SSG screens out the non-zero activation data in the cache module CBUF, and simultaneously checks whether multiplexed activation data exist or not;
and in the operation stage, transmitting the screened non-zero activation data to an operation module CMAC for convolution calculation.
As a further improvement of the invention: the sparse activation value processing module SSG includes:
an input neuron channel and a weight channel, used for storing the input neuron and weight data required by each convolution operation;
an index table, used for recording the positions of the non-zero activation value data in memory;
and a threshold setting module, used for setting the threshold parameter for data screening.
As a further improvement of the invention: the input neuron and weight data held in the input neuron and weight channels are each sized at 16 × 1 × 128 bytes.
As a further improvement of the invention: the data-screening threshold T in the threshold setting module is set to zero and is used to screen out the non-zero activation value data.
As a further improvement of the invention: the buffer module CBUF includes:
a counting module, used for determining when the accelerator initiates a memory-access operation; the counter is set to 2 bits to identify when the accelerator starts to execute a memory-access operation;
and an identification module, used for identifying whether the current convolution operation is finished: when the flag is 0, the current convolution operation is not finished; otherwise, it is finished.
As a further improvement of the invention: the counting module comprises a counting component Stripe Count used to implement the counting function across convolution operations, i.e., distinguishing the first convolution operation, the second convolution operation, and so on; the first convolution operation has no data reuse, and after it completes, the front portion of its data segment is discarded while the rear portion is retained for reuse by the next convolution operation, and so on.
as a further improvement of the invention: the identification module comprises an identification component C _ Flag, the convolution identification component is set to be 0 or 1, the value of 0 indicates that the current convolution operation is not completed due to the lack of data quantity, otherwise, the current convolution operation is completed; the convolution identification bit of the second convolution operation is set to be 0, and the convolution identification bit of the third convolution operation is also set to be 0 as well as the second convolution operation; until the fourth convolution operation, the total amount of the missing active data segments is accumulated to the required active data amount of one convolution operation; at the moment, a read operation is initiated, and the new data are all read into the Buffer; and the data is stored in the cache in the CBUF according to the sequence of use.
As a further improvement of the invention: the operation module CMAC is configured to execute the multiply-add operations of the convolution; there are 16 CMAC groups in total, each with a 64-bit data input; the input neuron data are the same for every group, while the input weight data differ.
As a further improvement of the invention: the operation module CMAC comprises a three-stage pipeline:
the first stage is a multiplier layer containing 16 multipliers, each with a 64-bit input;
the second stage is an adder layer with 16 input ports;
the third stage is a processing unit used for completing the non-linear processing of the activation data;
and the multiplier layer is mapped in a weight-stationary dataflow, i.e., a weight channel updates its weight elements only after all the activation data corresponding to that group of weights have been traversed.
As a further improvement of the invention: the buffer module PB stores the input weight data; 16 PB modules are provided in total, respectively distributed across different PE units; the weight storage totals 256 KB of SRAM, with each PB consisting of 16 banks of 1 KB; each bank consists of a 64-bit-wide, 128-entry dual-port SRAM.
Compared with the prior art, the invention has the following advantages:
The memory-access-optimization-oriented sparse convolutional neural network accelerator of the invention is simple in structure, easy to implement, and markedly improves computation and memory-access efficiency. The sparse activation value processing module lets the accelerator dynamically skip zero activation values, which raises the computational efficiency of the compute array during convolution. Further, by providing multiple weight storage modules PB and improving the CBUF, the accelerator can efficiently reuse the activation data repeated across adjacent convolution operations.
Drawings
Fig. 1 is a schematic diagram of the topology of the present invention.
Fig. 2 is a schematic diagram of the structure of a sparse activation value processing module in a specific application example of the present invention.
FIG. 3 is a schematic diagram of the workflow of the sparse activation value processing module in a specific application example of the present invention.
Fig. 4 is a schematic diagram of a concrete calculation example of the sparse activation value processing module in a specific application example of the present invention.
FIG. 5 is a schematic diagram of the structure of an arithmetic element in a specific application example of the present invention.
Fig. 6 is a schematic diagram illustrating a structure of a cache block PB in a specific application example of the present invention.
Fig. 7 is a schematic diagram of the structure of the buffer module CBUF in the embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific examples.
As shown in fig. 1, the sparse convolutional neural network accelerator oriented to memory-access optimization of the present invention includes:
the sparse activation value processing module SSG, used for removing zero-value activation data and screening out valid non-zero activation values;
the buffer module CBUF, used for storing input neuron data and enabling the reuse of repeated activation data;
the buffer module PB, used for storing weight data read in parallel;
and the operation module CMAC, used for performing the multiply-add operations of the convolution.
In the data-reading stage, the accelerator reads the neuron data required by the current convolution operation into the buffer module CBUF and reads the weight data into the buffer module PB.
In the screening-and-reuse stage, the sparse activation value processing module SSG screens out the non-zero activation data in the buffer module CBUF and simultaneously checks whether reusable activation data exist; and in the operation stage, the screened non-zero activation data are transmitted to the operation module CMAC for the convolution calculation.
Referring to fig. 2, in a specific application example, the sparse activation value processing module SSG includes:
an input neuron channel and a weight channel, which store the input neuron and weight data required by each convolution operation;
an index table, which records the positions of the non-zero activation value data in memory;
and a threshold setting module, used for setting the threshold parameter for data screening.
In a specific application example, the input neuron and weight channels store the input neuron and weight data required by each convolution operation, and their size is generally set to 16 × 1 × 128 bytes.
In a specific application example, the threshold setting module sets the threshold for data screening. The size of this threshold bears directly on the accuracy of the network model; here the threshold T is generally set to zero, so that the non-zero activation value data can be screened out.
As shown in fig. 3, in a specific application example, the accelerator completes the screening of non-zero activation values in the sparse activation value processing module as follows:
firstly, the neuron data and weight data required by the current convolution operation are read in;
secondly, the read input neuron data are compared with the threshold, and the positions of the neuron data whose values are greater than zero are recorded in the index table (Indexing Result);
and thirdly, by looking up the positions of the non-zero neuron data in the index table, the valid neuron data and weight data are transmitted to the operation module CMAC for calculation.
As shown in fig. 4, which illustrates the sparse activation value processing module on a concrete calculation example, let the neuron input of one convolution operation have dimension 4 × 4 and let the convolution kernel have dimension 1 × 16. In this example the accelerator needs three steps in total to complete the screening of the non-zero activation value data:
firstly, the input neuron data of dimension 4 × 4 are transmitted to the input neuron channels;
secondly, the input neuron data are compared with the threshold while the relative positions of the non-zero neuron data are recorded in the index table (Indexing Result); meanwhile, the corresponding weight data are screened out through the relative positions of the non-zero activation elements in the index table;
and thirdly, the non-zero neurons are transmitted in sequence, by broadcast, to the CMAC ports of the compute array; the input neuron data in every CMAC are the same, while the weight data differ.
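The three steps above can be condensed into a behavioral sketch (Python with NumPy; screen_nonzero and convolution_pass are illustrative names, and the model captures only the dataflow of the SSG and CMAC, not the hardware timing):

```python
import numpy as np

def screen_nonzero(activations, threshold=0.0):
    """SSG behavior: keep activations above the threshold (T = 0 here) and
    record their flat positions in an index table (Indexing Result)."""
    flat = activations.reshape(-1)
    index_table = np.flatnonzero(flat > threshold)
    return flat[index_table], index_table

def convolution_pass(activations, kernel):
    """One convolution window: screen out zero activations, then multiply-add
    only the surviving activation/weight pairs, as the CMAC array would."""
    values, index_table = screen_nonzero(activations)
    weights = kernel.reshape(-1)[index_table]  # weights picked via the index table
    return float(np.dot(values, weights))

# The Fig. 4 setting: a 4 x 4 neuron input against a 1 x 16 kernel.
act = np.array([[1., 0., 2., 0.],
                [0., 3., 0., 0.],
                [4., 0., 0., 5.],
                [0., 0., 6., 0.]])
ker = np.arange(16, dtype=float)
# Skipping zeros does not change the result, only the work performed.
assert convolution_pass(act, ker) == float(np.dot(act.reshape(-1), ker))
```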
In a specific application example, the buffer module CBUF includes: a counting module, used for determining when the accelerator initiates a memory-access operation; since the reuse of repeated activation data reduces the number of memory accesses the accelerator makes, the counting component is set to 2 bits to identify when the accelerator starts to execute a memory-access operation; and an identification module, used for identifying whether the current convolution operation is finished: when the flag is 0, the current convolution operation is not finished; otherwise, it is finished.
The buffer modules PB are used to store the weight data. There are 16 PB modules in total, distributed across the different CMAC units, so that the accelerator can read the weight values in parallel.
In a specific application example, the operation module CMAC is configured to execute the multiply-add operations of the convolution. There are 16 CMAC groups in total, each with a 64-bit data input; the input neuron data of every group are the same, while the input weight data differ.
As shown in fig. 5, in a specific application example, the operation module CMAC of the present invention comprises a three-stage pipeline:
the first stage is a multiplier layer containing 16 multipliers, each with a 64-bit input;
the second stage is an adder layer with 16 input ports;
the third stage is a processing unit that completes the non-linear processing of the activation data.
The multiplier layer is mapped in a weight-stationary dataflow: a weight channel updates its weight elements only after all the activation data corresponding to that group of weights have been traversed.
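The pipeline and its weight-stationary schedule can be sketched behaviorally as follows (Python with NumPy; treating each 64-bit group input as a vector of 16 values and using ReLU as the non-linear stage are modeling assumptions made for illustration, not statements about the hardware):

```python
import numpy as np

def cmac_group(activations, weights):
    """One CMAC group: 16 multipliers (stage 1) feed a 16-input adder
    (stage 2), whose sum passes through the non-linear unit (stage 3,
    ReLU assumed here)."""
    products = np.asarray(activations, dtype=float) * np.asarray(weights, dtype=float)
    acc = products.sum()
    return max(acc, 0.0)

def weight_stationary_schedule(activation_blocks, group_weights):
    """Weight-stationary mapping: each group's weights stay fixed while
    every activation block mapped to them streams past; only then would
    the weight channel update its elements."""
    return [[cmac_group(a, w) for w in group_weights]   # same activations,
            for a in activation_blocks]                 # different weights per group

# 16 groups share each broadcast activation block but hold different weights.
rng = np.random.default_rng(0)
blocks = rng.random((8, 16))                 # 8 activation blocks of 16 values
group_weights = list(rng.random((16, 16)))   # one 16-element vector per group
out = weight_stationary_schedule(blocks, group_weights)  # 8 x 16 results
```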
As shown in fig. 6, in a specific application example, 16 buffer modules PB are provided in total, distributed across the different PE units. The weight storage totals 256 KB of SRAM: each PB consists of 16 banks of 1 KB, and each bank is a 64-bit-wide, 128-entry dual-port SRAM. Setting the number of PB modules to 16 shortens the time needed to read the weight data, which in turn facilitates the reuse of the repeated activation data in adjacent convolution windows.
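Under this reading of the sizes (16 PB modules × 16 banks × 1 KB = 256 KB, with 1 KB = 128 entries × 64 bits per bank), the weight-address decomposition can be sketched as below; the word-interleaved ordering is an assumption made for illustration, since the description does not specify the mapping:

```python
BANK_WIDTH_BYTES = 8    # 64-bit-wide dual-port SRAM
BANK_ENTRIES = 128      # 128 entries  -> 1 KB per bank
BANKS_PER_PB = 16       # 16 x 1 KB    -> 16 KB per PB module
NUM_PB = 16             # 16 modules   -> 256 KB of weight SRAM in total

def pb_location(byte_addr: int) -> tuple[int, int, int]:
    """Decompose a flat weight byte address into (pb, bank, entry),
    assuming 64-bit words are interleaved first across PB modules and
    then across the banks inside each PB."""
    word = byte_addr // BANK_WIDTH_BYTES
    pb = word % NUM_PB
    bank = (word // NUM_PB) % BANKS_PER_PB
    entry = word // (NUM_PB * BANKS_PER_PB)
    assert entry < BANK_ENTRIES, "address beyond the 256 KB weight store"
    return pb, bank, entry

# Consecutive 64-bit words land in different PB modules, which is what
# lets the 16 CMAC groups fetch their weights in parallel.
assert [pb_location(a)[0] for a in range(0, 32, 8)] == [0, 1, 2, 3]
```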
As shown in fig. 7, in a specific application example, the buffer module CBUF of the present invention includes:
a Buffer unit, which stores the non-repeated input neuron data under data reuse; its size is set to 256 KB of SRAM, organized as 16 banks of 16 KB, each bank consisting of a 512-bit-wide, 256-entry dual-port SRAM;
a counting component Stripe Count, which implements the counting function across convolution operations, i.e., distinguishing the first convolution operation, the second, and so on. Taking the case where the activation elements repeated between adjacent convolution operations account for 75% of the activation data of a whole convolution: the first convolution operation has no data reuse, and after it completes, the first 25% of the data segment is discarded while the last 75% is retained for reuse by the next convolution operation, and so on. Adding the counter makes it possible to quantify the amount of data reuse in each convolution operation;
and an identification component C_Flag, set to 0 or 1; a value of 0 indicates that the current convolution operation cannot complete because data are missing, and otherwise the current convolution operation is completed. The second convolution operation has its flag set to 0 because it lacks the last 25% of its activation data, and, like the second, the third convolution operation also has its flag set to 0 because it lacks the last 25% of its data segment. By the fourth convolution operation, the total amount of missing activation data has accumulated to the activation data amount required by one convolution operation. At this moment a read operation is initiated to read all of the new data into the Buffer, where they are stored in the CBUF cache in order of use.
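For the 75%-reuse case, the interplay of Stripe Count and C_Flag can be modeled in a few lines (Python; which exact convolution the full read is charged to is a modeling choice, the description counting it as the fourth convolution operation):

```python
def cbuf_schedule(num_convs: int, reuse: float = 0.75):
    """Sketch of the Stripe Count / C_Flag bookkeeping: each convolution
    after a full read lacks a growing tail of fresh activations, and a new
    full read of the Buffer is initiated once the missing data accumulate
    to one convolution's worth (every 4th convolution at 75% reuse)."""
    convs_per_read = round(1 / (1 - reuse))       # 4 at 75% reuse
    schedule = []
    for conv in range(num_convs):
        stripe_count = conv % convs_per_read      # the 2-bit counter
        full_read = stripe_count == 0             # missing data == 1 window
        c_flag = 1 if full_read else 0            # 0: waiting on missing data
        schedule.append((conv, stripe_count, c_flag, full_read))
    return schedule

# Full Buffer reads land once per 4-convolution cycle; in between, the
# C_Flag stays 0 while only the reused 75% of the window is resident.
assert [c for c, _, _, r in cbuf_schedule(9) if r] == [0, 4, 8]
```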
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions falling under the idea of the present invention belong to the protection scope of the present invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention.
Claims (10)
1. A memory-access-optimization-oriented sparse convolutional neural network accelerator, comprising:
the sparse activation value processing module SSG, used for removing zero-value activation data and screening out valid non-zero activation values;
the buffer module CBUF, used for storing input neuron data and enabling the reuse of repeated activation data;
the buffer module PB, used for storing weight data read in parallel;
and the operation module CMAC, used for performing the multiply-add operations of the convolution;
wherein, in the data-reading stage, the neuron data required by the current convolution operation are read into the buffer module CBUF, and the weight data are read into the buffer module PB;
in the screening-and-reuse stage, the sparse activation value processing module SSG screens out the non-zero activation data in the buffer module CBUF and simultaneously checks whether reusable activation data exist;
and in the operation stage, the screened non-zero activation data are transmitted to the operation module CMAC for the convolution calculation.
2. The memory-access-optimization-oriented sparse convolutional neural network accelerator of claim 1, wherein the sparse activation value processing module SSG comprises:
an input neuron channel and a weight channel, used for storing the input neuron and weight data required by each convolution operation;
an index table, used for recording the positions of the non-zero activation value data in memory;
and a threshold setting module, used for setting the threshold parameter for data screening.
3. The memory-access-optimization-oriented sparse convolutional neural network accelerator of claim 2, wherein the input neuron and weight data in the input neuron and weight channels are each sized at 16 × 1 × 128 bytes.
4. The memory-access-optimization-oriented sparse convolutional neural network accelerator of claim 2, wherein the data-screening threshold T in the threshold setting module is set to zero and is used to screen out the non-zero activation value data.
5. The memory-access-optimization-oriented sparse convolutional neural network accelerator of any one of claims 1-4, wherein the buffer module CBUF comprises:
a counting module, used for determining when the accelerator initiates a memory-access operation; the counter is set to 2 bits to identify when the accelerator starts to execute a memory-access operation;
and an identification module, used for identifying whether the current convolution operation is finished: when the flag is 0, the current convolution operation is not finished; otherwise, it is finished.
6. The memory-access-optimization-oriented sparse convolutional neural network accelerator of claim 5, wherein the counting module comprises a counting component Stripe Count used to implement the counting function across convolution operations, i.e., distinguishing the first convolution operation, the second convolution operation, and so on; the first convolution operation has no data reuse, and after it completes, the front portion of its data segment is discarded while the rear portion is retained for reuse by the next convolution operation, and so on.
7. The memory-access-optimization-oriented sparse convolutional neural network accelerator of claim 6, wherein the identification module comprises an identification component C_Flag set to 0 or 1; a value of 0 indicates that the current convolution operation cannot complete because data are missing, and otherwise the current convolution operation is completed; the convolution flag of the second convolution operation is set to 0, and, like the second, the convolution flag of the third convolution operation is also set to 0; by the fourth convolution operation, the total amount of missing activation data has accumulated to the activation data amount required by one convolution operation; at that moment a read operation is initiated and the new data are all read into the Buffer, where they are stored in the CBUF cache in order of use.
8. The memory-access-optimization-oriented sparse convolutional neural network accelerator of any one of claims 1-4, wherein the operation module CMAC is configured to perform the multiply-add operations of the convolution; there are 16 CMAC groups in total, each with a 64-bit data input; the input neuron data are the same for every group, while the input weight data differ.
9. The memory-access-optimization-oriented sparse convolutional neural network accelerator of claim 8, wherein the operation module CMAC comprises a three-stage pipeline:
the first stage is a multiplier layer containing 16 multipliers, each with a 64-bit input;
the second stage is an adder layer with 16 input ports;
the third stage is a processing unit used for completing the non-linear processing of the activation data;
and the multiplier layer is mapped in a weight-stationary dataflow, i.e., a weight channel updates its weight elements only after all the activation data corresponding to that group of weights have been traversed.
10. The memory-access-optimization-oriented sparse convolutional neural network accelerator of any one of claims 1-4, wherein the buffer module PB stores the input weight data; 16 PB modules are provided in total, distributed across different PE units; the weight storage totals 256 KB of SRAM, with each PB consisting of 16 banks of 1 KB; and each bank consists of a 64-bit-wide, 128-entry dual-port SRAM.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110845980.5A CN113592072B (en) | 2021-07-26 | 2021-07-26 | Sparse convolutional neural network accelerator for memory optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110845980.5A CN113592072B (en) | 2021-07-26 | 2021-07-26 | Sparse convolutional neural network accelerator for memory optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113592072A true CN113592072A (en) | 2021-11-02 |
CN113592072B CN113592072B (en) | 2024-05-14 |
Family
ID=78250125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110845980.5A Active CN113592072B (en) | 2021-07-26 | 2021-07-26 | Sparse convolutional neural network accelerator for memory optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113592072B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180046900A1 (en) * | 2016-08-11 | 2018-02-15 | Nvidia Corporation | Sparse convolutional neural network accelerator |
CN111126569A (en) * | 2019-12-18 | 2020-05-08 | 中电海康集团有限公司 | Convolutional neural network device supporting pruning sparse compression and calculation method |
CN112418396A (en) * | 2020-11-20 | 2021-02-26 | 北京工业大学 | Sparse activation perception type neural network accelerator based on FPGA |
Non-Patent Citations (1)
Title |
---|
Xu Rui, Ma Sheng, Guo Yang, Huang You, Li Yihuang: "Design and Research of a Convolutional Neural Network Accelerator Based on the Winograd Sparse Algorithm", Computer Engineering and Science, vol. 41, no. 9 *
Also Published As
Publication number | Publication date |
---|---|
CN113592072B (en) | 2024-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021004366A1 (en) | Neural network accelerator based on structured pruning and low-bit quantization, and method | |
JP7469407B2 (en) | Exploiting sparsity of input data in neural network computation units | |
JP6857286B2 (en) | Improved performance of neural network arrays | |
Yin et al. | An energy-efficient reconfigurable processor for binary-and ternary-weight neural networks with flexible data bit width | |
CN105843775B (en) | On piece data divide reading/writing method, system and its apparatus | |
Chen et al. | Persistent homology computation with a twist | |
WO2018205708A1 (en) | Processing system and method for binary weight convolutional network | |
CN112465110B (en) | Hardware accelerator for convolution neural network calculation optimization | |
CN110738308B (en) | Neural network accelerator | |
JP2024052988A5 (en) | ||
WO2022037257A1 (en) | Convolution calculation engine, artificial intelligence chip, and data processing method | |
JP2018142049A (en) | Information processing apparatus, image recognition apparatus and method of setting parameter for convolution neural network | |
CN107256424A (en) | Three value weight convolutional network processing systems and method | |
CN112734020B (en) | Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network | |
CN110580519A (en) | Convolution operation structure and method thereof | |
CN111507465A (en) | Configurable convolutional neural network processor circuit | |
CN116720549A (en) | FPGA multi-core two-dimensional convolution acceleration optimization method based on CNN input full cache | |
Li et al. | Winograd algorithm for addernet | |
CN112200310B (en) | Intelligent processor, data processing method and storage medium | |
CN113592072A (en) | Sparse convolution neural network accelerator oriented to memory access optimization | |
Yang et al. | A parallel processing cnn accelerator on embedded devices based on optimized mobilenet | |
CN110766136B (en) | Compression method of sparse matrix and vector | |
CN113592075B (en) | Convolution operation device, method and chip | |
TW202215300A (en) | Convolutional neural network operation method and device | |
CN112836793A (en) | Floating point separable convolution calculation accelerating device, system and image processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |