CN108510063B - Acceleration method and accelerator applied to convolutional neural network - Google Patents

Acceleration method and accelerator applied to convolutional neural network

Info

Publication number
CN108510063B
Authority
CN
China
Prior art keywords
feature map
preset threshold
density
neural network
convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810306577.3A
Other languages
Chinese (zh)
Other versions
CN108510063A (en
Inventor
刘勇攀
袁哲
岳金山
杨华中
李学清
王智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810306577.3A priority Critical patent/CN108510063B/en
Priority to PCT/CN2018/095365 priority patent/WO2019196223A1/en
Publication of CN108510063A publication Critical patent/CN108510063A/en
Application granted granted Critical
Publication of CN108510063B publication Critical patent/CN108510063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method comprises the following steps: S1, for any layer in the convolutional neural network, calculating the density of each feature map output by the layer; S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes; and S3, in the convolutional layer of the next layer, convolving each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance. The invention reduces the amount of computation required for convolution operations in the convolutional neural network and improves the operation speed.

Description

Acceleration method and accelerator applied to convolutional neural network
Technical Field
The invention belongs to the technical field of operation optimization, and particularly relates to an acceleration method and an accelerator applied to a convolutional neural network.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited receptive field, making it well suited to processing large images. Convolutional neural networks are widely applied in fields such as image recognition and speech recognition, but their computational cost is very high.
The activation function ReLU (Rectified Linear Unit) in a convolutional neural network produces a large number of sparse feature maps; meanwhile, training the convolutional neural network with methods such as pruning produces a large amount of sparse weight data. Exploiting the sparsity of the feature maps and the weight data can greatly improve the computational efficiency of the convolutional neural network. At present, many methods improve the computation speed based on the sparsity of feature maps and weight data in a convolutional neural network. These methods can be roughly divided into two categories. One category removes the zero values from the input, so that no invalid computation is performed for inputs of 0. The other category keeps the zero values but skips the multiplication whenever an input operand is 0, thereby reducing the number of operations. However, all of these methods focus on processing neural networks that are already sparse, taking sparsity as a premise. In practice, the feature maps output by the various layers of a convolutional neural network may or may not be sparse; in practical applications, the density of the weight data and feature maps of each layer is generally distributed between 5% and 90%.
A sparse matrix is a matrix in which the number of elements with value 0 is far greater than the number of non-zero elements, and the non-zero elements are distributed irregularly. The prior art, on the one hand, can only process sparse convolutional neural networks, and when the convolutional neural network is not sparse the amount of computation is large and the operation speed is low; on the other hand, the prior art can only handle the case where either the weight data or the feature maps of the convolutional neural network are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.
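As an illustration of the sparse matrix storage formats referred to below, the following sketch (an illustrative example using NumPy and a simple coordinate (COO) encoding, not the patent's actual on-chip format) shows how a mostly-zero matrix can be stored as its non-zero values plus their indices.

    import numpy as np

    # A small matrix in which most elements are 0, i.e. a sparse matrix.
    feature_map = np.array([
        [0, 0, 3, 0],
        [0, 0, 0, 0],
        [7, 0, 0, 0],
        [0, 0, 0, 2],
    ])

    # Coordinate (COO) encoding: keep only the non-zero values together with
    # their (row, col) indices instead of storing all 16 elements.
    rows, cols = np.nonzero(feature_map)
    values = feature_map[rows, cols]
    print(list(zip(rows.tolist(), cols.tolist(), values.tolist())))
    # [(0, 2, 3), (2, 0, 7), (3, 3, 2)] -- 3 stored entries instead of 16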
Disclosure of Invention
In order to overcome the problem of low operation speed of the convolutional neural network or at least partially solve the problem, the invention provides an acceleration method and an accelerator applied to the convolutional neural network.
According to a first aspect of the present invention, there is provided an acceleration method applied to a convolutional neural network, comprising:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
and S3, convolving, in the convolutional layer of the next layer, each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
Specifically, the step S1 specifically includes:
for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map;
and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
Specifically, the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the step S2 specifically includes:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
and if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map.
Specifically, the step S3 is preceded by:
calculating the density of each convolution kernel in the trained convolutional neural network;
if the density of each convolution kernel is smaller than the first preset threshold value, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the step S3 specifically includes:
when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
According to another aspect of the present invention, there is provided an accelerator applied to a convolutional neural network, comprising a neural network computing array module and a dynamic sparse adjustment module;
the dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer of the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
the neural network computing array module is used for performing the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
Specifically, the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic encoding module and a dynamic sparse control module;
the on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements in the feature map to the total number of all elements in the feature map as the density of the feature map;
the output temporary register module is used for storing each feature map output by each layer in the convolutional neural network;
the dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds;
and the dynamic encoding module is used for sparsely encoding each feature map in the output temporary register module according to the comparison result.
Specifically, the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic encoding module is specifically configured to:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
and if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map.
Specifically, the dynamic encoding module is further configured to:
if the pre-calculated density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the neural network computational array module is specifically configured to:
when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
The invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, sparsely encodes feature maps in different sparse states in different modes, and then, in the convolutional layer of the next layer, convolves each sparsely encoded feature map with the convolution kernels of the convolutional neural network that have been sparsely encoded in advance, thereby reducing the amount of computation required for convolution operations in the convolutional neural network and improving the operation speed.
Drawings
Fig. 1 is a schematic overall flowchart of an acceleration method applied to a convolutional neural network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the overall structure of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
Fig. 3 is a diagram illustrating peak energy efficiency test results of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating a comparison of peak energy efficiency test results of the accelerator applied to a convolutional neural network according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In an embodiment of the present invention, an acceleration method applied to a convolutional neural network is provided, and fig. 1 is a schematic flowchart of an overall acceleration method applied to a convolutional neural network, provided in an embodiment of the present invention, and the method includes:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is finished the convolution kernels no longer change, so they do not need online dynamic sparse coding and are sparsely encoded offline once. Here, online means on the accelerator chip and offline means off the accelerator chip. In each convolution operation, the sparsely encoded convolution kernels are read directly for the convolution calculation. When original image data is input, it is sparsely encoded, and the sparsely encoded data and the sparsely encoded convolution kernels are then fed into the first convolutional layer of the convolutional neural network for the convolution calculation. Because original image data is generally not sparse, it may also be input directly without sparse coding. Sparse coding means storing the data in a sparse format.
In S1, the feature maps output by the layers of the convolutional neural network have different densities, and the feature maps output by a given layer also change dynamically, so their densities change dynamically as well. The density characterizes the degree of sparsity of each feature map. To better improve the operation speed of the convolutional neural network, the density of each feature map output by each layer is calculated, and each feature map is then sparsely encoded according to its density.
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
In S2, the prior art sparsely encodes all the feature maps output by each layer, which requires a large amount of computation. In this embodiment, the sparse state of each feature map output by the layer is instead determined from the preset thresholds, so that feature maps in different sparse states receive different forms of sparse coding.
And S3, convolving, in the convolutional layer of the next layer, each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
In S3, each sparsely encoded feature map and each pre-sparsely-encoded convolution kernel are taken as the input of the convolutional layer following the current layer, and the convolution operation is performed. The result of the convolution operation then serves as the input of the layer after that convolutional layer, and sparse coding and convolution continue on the feature maps output by each subsequent layer until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not limit the sparse coding scheme used for the convolution kernels.
In this embodiment, the density of each feature map output by each layer of the convolutional neural network is compared with a plurality of preset thresholds to obtain the sparse state of each feature map, feature maps in different sparse states are sparsely encoded in different modes, and each sparsely encoded feature map is then convolved, in the convolutional layer of the next layer, with the convolution kernels of the convolutional neural network that have been sparsely encoded in advance. This reduces the amount of computation required for convolution operations in the convolutional neural network and improves the operation speed.
On the basis of the foregoing embodiment, in this embodiment, the step S1 specifically includes: for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map; and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
Specifically, the density of each feature map is a ratio of the number of elements other than 0 in each feature map to the total number of all elements in each feature map. For example, if the number of non-0 elements in a feature map is 10 and the total number of all elements in the feature map is 100, the density of the feature map is 0.1.
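A minimal sketch of this density calculation, assuming the feature map is held as a NumPy array (the on-chip density identification module described later performs the equivalent counting in hardware):

    import numpy as np

    def feature_map_density(fmap: np.ndarray) -> float:
        # Density = number of non-zero elements / total number of elements.
        return np.count_nonzero(fmap) / fmap.size

    fmap = np.zeros((10, 10))
    fmap.flat[:10] = 1.0                 # 10 non-zero elements out of 100
    print(feature_map_density(fmap))     # 0.1, matching the example above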
On the basis of the above embodiment, in this embodiment, the preset threshold includes a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold; correspondingly, the step S2 specifically includes: if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format; if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in each feature map; and if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map.
Specifically, the preset threshold in this embodiment includes a first preset threshold th1 and a second preset threshold th2. According to these two thresholds, the feature state AS of each feature map is divided into three states: a feature map with density smaller than the first preset threshold is in the fully sparse state S, a feature map with density greater than or equal to the first preset threshold and less than the second preset threshold is in the medium sparse state M, and a feature map with density greater than or equal to the second preset threshold is in the fully non-sparse state D. If a feature map is in the fully sparse state S, it is encoded into a sparse matrix storage format, which stores the non-zero data (activations) of the feature map together with their sparse indices, for example coordinate encoding or compressed sparse row encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space while also saving a large amount of computing time. If a feature map is in the medium sparse state M, a guard mark is added to the 0 elements in the feature map to identify them; the marked elements need not participate in computation or storage, which reduces power consumption. Marking the 0 elements in each feature map is also a form of sparse coding. If a feature map is in the fully non-sparse state D, no dynamic coding is needed and its non-sparse data is output directly.
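The three-way decision can be sketched as follows; the threshold values th1 = 0.3 and th2 = 0.7 are hypothetical placeholders chosen only for illustration, since the patent does not fix them:

    import numpy as np

    TH1, TH2 = 0.3, 0.7   # hypothetical first and second preset thresholds

    def encode_feature_map(fmap: np.ndarray):
        density = np.count_nonzero(fmap) / fmap.size
        if density < TH1:                        # fully sparse state S
            rows, cols = np.nonzero(fmap)
            indices = np.stack([rows, cols], axis=1)
            return ("S", fmap[rows, cols], indices)   # sparse matrix storage format
        elif density < TH2:                      # medium sparse state M
            guard = (fmap == 0)                  # guard marks on the 0 elements
            return ("M", fmap, guard)
        else:                                    # fully non-sparse state D
            return ("D", fmap, None)             # output the dense data unchanged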
On the basis of the foregoing embodiment, in this embodiment, the step S3 is preceded by: calculating the density of each convolution kernel in the trained convolutional neural network; if the density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format; if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in each convolution kernel; and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the density of each convolution kernel is the ratio of the number of non-zero elements in the convolution kernel to the total number of all elements in the convolution kernel. The state WS of each convolution kernel is classified into the same three states as the feature maps, and each state corresponds to a different sparse coding mode. Since the feature map and the convolution kernel each have three states, there are 9 combined states in total, so the density of the convolutional neural network is partitioned at a finer granularity.
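The combined control state can be pictured with the small enumeration below (an illustrative sketch only; how the hardware selects a processing mode for each (AS, WS) pair is not detailed here):

    FEATURE_STATES = ("S", "M", "D")   # fully sparse / medium sparse / non-sparse
    KERNEL_STATES = ("S", "M", "D")

    # 3 feature-map states x 3 convolution-kernel states = 9 combined modes.
    COMBINED_MODES = [(a, w) for a in FEATURE_STATES for w in KERNEL_STATES]
    print(len(COMBINED_MODES))  # 9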
On the basis of the foregoing embodiments, in this embodiment, the step S3 specifically includes: when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, its 0 elements are removed before input, which reduces storage space and means the 0 elements need not be computed; when a feature map or convolution kernel is in the medium sparse state M, the 0 elements are still stored, but the elements carrying the guard mark are not computed, which reduces the amount of computation.
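A sketch of how the guard marks suppress multiply-accumulate work in the medium sparse state, assuming NumPy operands and boolean guard masks (the processing-element array gates these operations in hardware; this software loop is only illustrative):

    import numpy as np

    def masked_dot(activations, weights, act_guard=None, wt_guard=None):
        # Accumulate products, skipping any element whose guard mark is set.
        total = 0.0
        for i in range(activations.size):
            if act_guard is not None and act_guard.flat[i]:
                continue                 # feature-map element marked as 0: skip
            if wt_guard is not None and wt_guard.flat[i]:
                continue                 # convolution-kernel element marked as 0: skip
            total += float(activations.flat[i]) * float(weights.flat[i])
        return total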
In another embodiment of the present invention, an accelerator applied to a convolutional neural network is provided. Fig. 2 is a schematic diagram of the overall structure of the accelerator applied to a convolutional neural network provided by the embodiment of the present invention, which comprises a neural network computing array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer of the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes. The neural network computing array module is used for performing the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is finished the convolution kernels no longer change, so they do not need online dynamic sparse coding and can be sparsely encoded offline once. In each convolution operation, the neural network computing array module directly reads the offline sparsely encoded convolution kernels for the convolution calculation. When original image data is input to the convolutional neural network, the dynamic sparse adjustment module sparsely encodes the original image data, and the neural network computing array module then performs the convolution calculation with the sparsely encoded data and the sparsely encoded convolution kernels. Because original image data is generally not sparse, it may also be input directly without sparse coding. Sparse coding means storing the data in a sparse format.
Because the densities of the feature maps output by the layers of the convolutional neural network differ, and the feature maps output by the different layers change dynamically, the densities also change dynamically. The density characterizes the degree of sparsity of each feature map. To better improve the operation speed of the convolutional neural network, the dynamic sparse adjustment module calculates the density of each feature map output by each layer, so that each feature map can be sparsely encoded according to its density.
The dynamic sparse adjustment module obtains the sparse state of each feature map output by the layer according to the plurality of preset thresholds, so that feature maps in different sparse states receive different forms of sparse coding rather than a single type of sparse coding. In the prior art, all the feature maps output by each layer are sparsely encoded, which requires a large amount of computation.
The neural network computing array module performs the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance. If a pooling module is included, the pooling module performs the pooling operation on the result of the convolution operation. In addition, the accelerator further includes an intermediate data memory module, a main chip controller and an on-chip/off-chip data communication module. The main controller controls the operation and timing of the whole accelerator chip. The on-chip/off-chip data communication module is used for reading data from the off-chip memory or writing data computed by the chip to the off-chip memory. For example, after initialization, the chip reads the original image data and the initial convolution kernels from the off-chip memory through the on-chip/off-chip data communication module under the control of the main controller. The intermediate data memory module is used for storing intermediate results produced during computation by the neural network computing array module.
In this embodiment, the dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, and sparsely encodes feature maps in different sparse states in different modes, so that the neural network computing array module can perform the convolution operation on each sparsely encoded feature map and the pre-sparsely-encoded convolution kernels of the convolutional neural network. On the one hand, this reduces the amount of computation required for convolution operations and improves the operation speed; on the other hand, the processing state of the accelerator is dynamically switched according to the different sparse states, which improves the flexibility of the accelerator.
On the basis of the above embodiment, the dynamic sparse adjustment module in this embodiment includes an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparse control module. The on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements to the total number of all elements as the density of the feature map. The output temporary register module is used for storing each feature map output by each layer of the convolutional neural network. The dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds. The dynamic encoding module is used for sparsely encoding each feature map in the output temporary register module according to the comparison result.
Specifically, the dynamic sparse adjustment module consists of these four modules. The on-line density identification module counts the number of non-zero elements in each feature map during computation in order to calculate the density of each feature map. The output temporary register module temporarily stores the feature maps output by each layer of the convolutional neural network in a non-sparse format. The dynamic sparse control module determines the sparse state of each feature map using the plurality of preset thresholds. The dynamic encoding module sparsely encodes each feature map in the output temporary register module according to its sparse state, which speeds up the convolution operation.
On the basis of the above embodiment, in this embodiment, the preset threshold includes a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold. Correspondingly, the dynamic encoding module is specifically configured to: encode each feature map into a sparse matrix storage format if its density is smaller than the first preset threshold; mark the 0 elements in each feature map if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not encode each feature map if its density is greater than or equal to the second preset threshold.
Specifically, the preset threshold in this embodiment includes a first preset threshold th1 and a second preset threshold th2. The dynamic sparse control module divides the feature state AS of each feature map into three states according to these two thresholds: a feature map with density smaller than the first preset threshold is in the fully sparse state S, a feature map with density greater than or equal to the first preset threshold and less than the second preset threshold is in the medium sparse state M, and a feature map with density greater than or equal to the second preset threshold is in the fully non-sparse state D.
If a feature map is in the fully sparse state S, the dynamic encoding module encodes the feature map in the output temporary register module into a sparse matrix storage format, which stores the non-zero data (activations) of the feature map together with their sparse indices, for example coordinate encoding or compressed sparse row encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space while also saving a large amount of computing time. If a feature map is in the medium sparse state M, the dynamic encoding module adds a guard mark to the 0 elements of the feature map in the output temporary register module, and the marked elements do not participate in computation or storage, which reduces power consumption. If a feature map is in the fully non-sparse state D, no dynamic coding is needed and the dynamic encoding module outputs the non-sparse data of the feature map directly.
On the basis of the foregoing embodiment, in this embodiment, the dynamic encoding module is further configured to: encode each convolution kernel into a sparse matrix storage format if its pre-calculated density is smaller than the first preset threshold; mark the 0 elements in each convolution kernel if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not encode each convolution kernel if its density is greater than or equal to the second preset threshold.
Specifically, the density of each convolution kernel is the ratio of the number of non-zero elements in the convolution kernel to the total number of all elements in the convolution kernel. The state WS of each convolution kernel has the same three states as the feature maps, and each state corresponds to a different sparse coding mode. Since the feature map and the convolution kernel each have three states, there are 9 combined states in total, so the density of the convolutional neural network is partitioned at a finer granularity.
On the basis of the foregoing embodiments, the neural network computational array module in this embodiment is specifically configured to: when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, its 0 elements are removed before it is input to the neural network computing array module, which reduces storage space and means the 0 elements need not be computed; when a feature map or convolution kernel is in the medium sparse state M, the 0 elements are still stored, but the elements carrying the guard mark are not computed, which reduces the amount of computation.
For example, the chip of the accelerator is fabricated in a 65 nm process, the chip area is 3 mm by 4 mm, the operating frequency is 20-200 MHz, and the power consumption is 20.5-248.4 milliwatts. As shown in Fig. 3, the peak energy efficiency of this embodiment rises rapidly as the density of the feature maps and convolution kernels decreases. When the density of the feature maps and convolution kernels is 5%, the peak energy efficiency reaches 62.1 TOPS/W, which is 6.2 times the peak energy efficiency obtained without the proposed acceleration. As shown in Fig. 4, compared with an implementation that supports only feature-data sparsity, the energy efficiency of this embodiment can be improved by 4.3 times; compared with an implementation without adaptive sparse control, the energy efficiency can be improved by 2.8 times; and compared with an implementation without density control but with variable quantization precision, the energy efficiency can be improved by 2 times.
Finally, it should be noted that the above embodiments are only preferred embodiments of the present application and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An acceleration method applied to a convolutional neural network, comprising:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
S3, convolving, in the convolutional layer of the next layer, each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance;
the preset threshold comprises a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the step S2 specifically includes:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map;
the step S3 specifically includes:
and when a mark exists in each feature map, not calculating the elements corresponding to the mark in each feature map.
2. The method according to claim 1, wherein the step S1 specifically includes:
for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map;
and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
3. The method according to claim 1, wherein the step S3 is preceded by:
calculating the density of each convolution kernel in the trained convolutional neural network;
if the density of each convolution kernel is smaller than the first preset threshold value, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
4. The method according to claim 3, wherein the step S3 specifically includes:
and when the mark exists in each convolution kernel, not calculating the element corresponding to the mark in each convolution kernel.
5. An accelerator for application to a convolutional neural network, comprising: the device comprises a neural network computing array module and a dynamic sparse adjustment module;
the dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer in the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
the neural network computing array module is used for performing the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance;
the preset threshold comprises a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic sparsity adjustment module includes a dynamic encoding module, and the dynamic encoding module is specifically configured to:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map;
the neural network computing array module is specifically configured to:
and when a mark exists in each feature map, not calculating the elements corresponding to the mark in each feature map.
6. The accelerator according to claim 5, wherein the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparse control module;
the on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements in the feature map to the total number of all elements in the feature map as the density of the feature map;
the output temporary register module is used for storing each feature map output by each layer in the convolutional neural network;
the dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds;
and the dynamic encoding module is used for sparsely encoding each feature map in the output temporary register module according to the comparison result.
7. The accelerator of claim 5, wherein the dynamic encoding module is further to:
if the pre-calculated density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
8. The accelerator of claim 7, wherein the neural network computational array module is specifically configured to:
and when the mark exists in each convolution kernel, not calculating the element corresponding to the mark in each convolution kernel.
CN201810306577.3A 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network Active CN108510063B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network
PCT/CN2018/095365 WO2019196223A1 (en) 2018-04-08 2018-07-12 Acceleration method and accelerator used for convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network

Publications (2)

Publication Number Publication Date
CN108510063A CN108510063A (en) 2018-09-07
CN108510063B true CN108510063B (en) 2020-03-20

Family

ID=63380995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810306577.3A Active CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network

Country Status (2)

Country Link
CN (1) CN108510063B (en)
WO (1) WO2019196223A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389043B (en) * 2018-09-10 2021-11-23 中国人民解放军陆军工程大学 Crowd density estimation method for aerial picture of unmanned aerial vehicle
CN109409518B (en) * 2018-10-11 2021-05-04 北京旷视科技有限公司 Neural network model processing method and device and terminal
CN109784484A (en) * 2019-01-31 2019-05-21 深兰科技(上海)有限公司 Neural network accelerated method, device, neural network accelerate chip and storage medium
CN110097172B (en) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 Convolutional neural network data processing method and device based on Winograd convolutional operation
CN109858575B (en) * 2019-03-19 2024-01-05 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural network
CN110443357B (en) * 2019-08-07 2020-09-15 上海燧原智能科技有限公司 Convolutional neural network calculation optimization method and device, computer equipment and medium
CN110909801B (en) * 2019-11-26 2020-10-09 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN111291230B (en) * 2020-02-06 2023-09-15 北京奇艺世纪科技有限公司 Feature processing method, device, electronic equipment and computer readable storage medium
CN111401554B (en) * 2020-03-12 2023-03-24 交叉信息核心技术研究院(西安)有限公司 Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization
CN113537465A (en) * 2021-07-07 2021-10-22 深圳市易成自动驾驶技术有限公司 LSTM model optimization method, accelerator, device and medium
WO2023164855A1 (en) * 2022-03-03 2023-09-07 Intel Corporation Apparatus and method for 3d dynamic sparse convolution

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184362B (en) * 2015-08-21 2018-02-02 中国科学院自动化研究所 The acceleration of the depth convolutional neural networks quantified based on parameter and compression method
US10380479B2 (en) * 2015-10-08 2019-08-13 International Business Machines Corporation Acceleration of convolutional neural network training using stochastic perforation
CN107679617B (en) * 2016-08-22 2021-04-09 赛灵思电子科技(北京)有限公司 Multi-iteration deep neural network compression method
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN107609641B (en) * 2017-08-30 2020-07-03 清华大学 Sparse neural network architecture and implementation method thereof

Also Published As

Publication number Publication date
WO2019196223A1 (en) 2019-10-17
CN108510063A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN108510063B (en) Acceleration method and accelerator applied to convolutional neural network
Liu et al. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm
CN111368662B (en) Method, device, storage medium and equipment for editing attribute of face image
CN111160523B (en) Dynamic quantization method, system and medium based on characteristic value region
WO2019127362A1 (en) Neural network model block compression method, training method, computing device and system
CN105718943A (en) Character selection method based on particle swarm optimization algorithm
CN109886391B (en) Neural network compression method based on space forward and backward diagonal convolution
Bruske et al. Dynamic cell structures
CN112990420A (en) Pruning method for convolutional neural network model
US20210192327A1 (en) Apparatus and method for neural network computation
Liu et al. RB-Net: Training highly accurate and efficient binary neural networks with reshaped point-wise convolution and balanced activation
CN111814973A (en) Memory computing system suitable for neural ordinary differential equation network computing
CN112215298A (en) Model training method, device, equipment and readable storage medium
CN112257844A (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN112288046B (en) Mixed granularity-based joint sparse method for neural network
CN109670582A (en) A kind of design method of full fixed point neural network
US20210397962A1 (en) Effective network compression using simulation-guided iterative pruning
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
CN114298291A (en) Model quantization processing system and model quantization processing method
CN114065920A (en) Image identification method and system based on channel-level pruning neural network
CN112906871A (en) Temperature prediction method and system based on hybrid multilayer neural network model
CN113222142A (en) Channel pruning and quick connection layer pruning method and system
Ho et al. MiCE: an ANN-to-SNN conversion technique to Enable High Accuracy and Low Latency
EP4007173A1 (en) Data storage method, and data acquisition method and apparatus therefor
Huang et al. BWA-NIMC: Budget-based Workload Allocation for Hybrid Near/In-Memory-Computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant