CN108510063B - Acceleration method and accelerator applied to convolutional neural network - Google Patents

Acceleration method and accelerator applied to convolutional neural network

Info

Publication number
CN108510063B
Authority
CN
China
Prior art keywords
feature map
preset threshold
density
neural network
convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810306577.3A
Other languages
Chinese (zh)
Other versions
CN108510063A (en
Inventor
刘勇攀
袁哲
岳金山
杨华中
李学清
王智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810306577.3A priority Critical patent/CN108510063B/en
Priority to PCT/CN2018/095365 priority patent/WO2019196223A1/en
Publication of CN108510063A publication Critical patent/CN108510063A/en
Application granted granted Critical
Publication of CN108510063B publication Critical patent/CN108510063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method comprises the following steps: S1, for any layer in the convolutional neural network, calculating the density of each feature map output by the layer; S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes; and S3, in the convolutional layer of the next layer, convolving each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance. The invention reduces the amount of computation required for convolution operations in the convolutional neural network and improves the operation speed.

Description

Acceleration method and accelerator applied to convolutional neural network
Technical Field
The invention belongs to the technical field of operation optimization, and particularly relates to an acceleration method and an accelerator applied to a convolutional neural network.
Background
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited receptive field, making it well suited to processing large images. Convolutional neural networks are widely applied in fields such as image recognition and speech recognition, but their computational cost is very high.
The activation function ReLU (Rectified Linear Unit) in a convolutional neural network produces a large number of sparse feature maps; meanwhile, training the convolutional neural network with methods such as pruning produces a large amount of sparse weight data. Exploiting the sparsity of the feature maps and the weight data can greatly improve the computational efficiency of the convolutional neural network. At present, many methods improve the computation speed based on the sparsity of feature maps and weight data in a convolutional neural network. These methods can be roughly divided into two categories. One category removes the zero values from the input, so that no invalid computation is performed for inputs of 0. The other category keeps the zero values but skips the multiplication whenever an input operand is 0, thereby reducing the number of operations. However, all of these methods focus on processing neural networks that are already sparse, taking sparsity as a premise. In practice, the feature maps output by the various layers of a convolutional neural network may or may not be sparse; in practical applications, the density of the weight data and feature maps of each layer is generally distributed between 5% and 90%.
A sparse matrix is a matrix in which the number of elements with value 0 is far greater than the number of non-zero elements, and the non-zero elements are distributed irregularly. The prior art, on the one hand, can only process sparse convolutional neural networks, and when the convolutional neural network is not sparse the amount of computation is large and the operation speed is low; on the other hand, the prior art can only handle the case where either the weight data or the feature maps of the convolutional neural network are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.
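As an illustration of the sparse matrix storage formats referred to below, the following sketch (an illustrative example using NumPy and a simple coordinate (COO) encoding, not the patent's actual on-chip format) shows how a mostly-zero matrix can be stored as its non-zero values plus their indices.

    import numpy as np

    # A small matrix in which most elements are 0, i.e. a sparse matrix.
    feature_map = np.array([
        [0, 0, 3, 0],
        [0, 0, 0, 0],
        [7, 0, 0, 0],
        [0, 0, 0, 2],
    ])

    # Coordinate (COO) encoding: keep only the non-zero values together with
    # their (row, col) indices instead of storing all 16 elements.
    rows, cols = np.nonzero(feature_map)
    values = feature_map[rows, cols]
    print(list(zip(rows.tolist(), cols.tolist(), values.tolist())))
    # [(0, 2, 3), (2, 0, 7), (3, 3, 2)] -- 3 stored entries instead of 16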
Disclosure of Invention
In order to overcome the problem of low operation speed of the convolutional neural network or at least partially solve the problem, the invention provides an acceleration method and an accelerator applied to the convolutional neural network.
According to a first aspect of the present invention, there is provided an acceleration method applied to a convolutional neural network, comprising:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
and S3, convolving, in the convolutional layer of the next layer, each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
Specifically, the step S1 specifically includes:
for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map;
and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
Specifically, the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the step S2 specifically includes:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
and if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map.
Specifically, the step S3 is preceded by:
calculating the density of each convolution kernel in the trained convolutional neural network;
if the density of each convolution kernel is smaller than the first preset threshold value, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the step S3 specifically includes:
when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
According to another aspect of the present invention, there is provided an accelerator applied to a convolutional neural network, comprising a neural network computing array module and a dynamic sparse adjustment module;
the dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer of the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
the neural network computing array module is used for performing the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
Specifically, the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic encoding module and a dynamic sparse control module;
the on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements in the feature map to the total number of all elements in the feature map as the density of the feature map;
the output temporary register module is used for storing each feature map output by each layer in the convolutional neural network;
the dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds;
and the dynamic encoding module is used for sparsely encoding each feature map in the output temporary register module according to the comparison result.
Specifically, the preset threshold includes a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic encoding module is specifically configured to:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
and if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map.
Specifically, the dynamic encoding module is further configured to:
if the pre-calculated density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the neural network computational array module is specifically configured to:
when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
The invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, sparsely encodes feature maps in different sparse states in different modes, and then, in the convolutional layer of the next layer, convolves each sparsely encoded feature map with the convolution kernels of the convolutional neural network that have been sparsely encoded in advance, thereby reducing the amount of computation required for convolution operations in the convolutional neural network and improving the operation speed.
Drawings
Fig. 1 is a schematic overall flowchart of an acceleration method applied to a convolutional neural network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the overall structure of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
Fig. 3 is a diagram illustrating peak energy efficiency test results of an accelerator applied to a convolutional neural network according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating a comparison of peak energy efficiency test results of the accelerator applied to a convolutional neural network according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In an embodiment of the present invention, an acceleration method applied to a convolutional neural network is provided, and fig. 1 is a schematic flowchart of an overall acceleration method applied to a convolutional neural network, provided in an embodiment of the present invention, and the method includes:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is finished the convolution kernels no longer change, so they do not need online dynamic sparse coding and are sparsely encoded offline once. Here, online means on the accelerator chip and offline means off the accelerator chip. In each convolution operation, the sparsely encoded convolution kernels are read directly for the convolution calculation. When original image data is input, it is sparsely encoded, and the sparsely encoded data and the sparsely encoded convolution kernels are then fed into the first convolutional layer of the convolutional neural network for the convolution calculation. Because original image data is generally not sparse, it may also be input directly without sparse coding. Sparse coding means storing the data in a sparse format.
In S1, the feature maps output by the layers of the convolutional neural network have different densities, and the feature maps output by a given layer also change dynamically, so their densities change dynamically as well. The density characterizes the degree of sparsity of each feature map. To better improve the operation speed of the convolutional neural network, the density of each feature map output by each layer is calculated, and each feature map is then sparsely encoded according to its density.
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
In S2, the prior art sparsely encodes all the feature maps output by each layer, which requires a large amount of computation. In this embodiment, the sparse state of each feature map output by the layer is instead determined from the preset thresholds, so that feature maps in different sparse states receive different forms of sparse coding.
And S3, convolving, in the convolutional layer of the next layer, each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
In S3, each sparsely encoded feature map and each pre-sparsely-encoded convolution kernel are taken as the input of the convolutional layer following the current layer, and the convolution operation is performed. The result of the convolution operation then serves as the input of the layer after that convolutional layer, and sparse coding and convolution continue on the feature maps output by each subsequent layer until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not limit the sparse coding scheme used for the convolution kernels.
In this embodiment, the density of each feature map output by each layer of the convolutional neural network is compared with a plurality of preset thresholds to obtain the sparse state of each feature map, feature maps in different sparse states are sparsely encoded in different modes, and each sparsely encoded feature map is then convolved, in the convolutional layer of the next layer, with the convolution kernels of the convolutional neural network that have been sparsely encoded in advance. This reduces the amount of computation required for convolution operations in the convolutional neural network and improves the operation speed.
On the basis of the foregoing embodiment, in this embodiment, the step S1 specifically includes: for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map; and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
Specifically, the density of each feature map is a ratio of the number of elements other than 0 in each feature map to the total number of all elements in each feature map. For example, if the number of non-0 elements in a feature map is 10 and the total number of all elements in the feature map is 100, the density of the feature map is 0.1.
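A minimal sketch of this density calculation, assuming the feature map is held as a NumPy array (the on-chip density identification module described later performs the equivalent counting in hardware):

    import numpy as np

    def feature_map_density(fmap: np.ndarray) -> float:
        # Density = number of non-zero elements / total number of elements.
        return np.count_nonzero(fmap) / fmap.size

    fmap = np.zeros((10, 10))
    fmap.flat[:10] = 1.0                 # 10 non-zero elements out of 100
    print(feature_map_density(fmap))     # 0.1, matching the example above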
On the basis of the above embodiment, in this embodiment, the preset threshold includes a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold; correspondingly, the step S2 specifically includes: if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format; if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in each feature map; and if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map.
Specifically, the preset threshold in this embodiment includes a first preset threshold th1 and a second preset threshold th2. According to these two thresholds, the feature state AS of each feature map is divided into three states: a feature map with density smaller than the first preset threshold is in the fully sparse state S, a feature map with density greater than or equal to the first preset threshold and less than the second preset threshold is in the medium sparse state M, and a feature map with density greater than or equal to the second preset threshold is in the fully non-sparse state D. If a feature map is in the fully sparse state S, it is encoded into a sparse matrix storage format, which stores the non-zero data (activations) of the feature map together with their sparse indices, for example coordinate encoding or compressed sparse row encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space while also saving a large amount of computing time. If a feature map is in the medium sparse state M, a guard mark is added to the 0 elements in the feature map to identify them; the marked elements need not participate in computation or storage, which reduces power consumption. Marking the 0 elements in each feature map is also a form of sparse coding. If a feature map is in the fully non-sparse state D, no dynamic coding is needed and its non-sparse data is output directly.
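The three-way decision can be sketched as follows; the threshold values th1 = 0.3 and th2 = 0.7 are hypothetical placeholders chosen only for illustration, since the patent does not fix them:

    import numpy as np

    TH1, TH2 = 0.3, 0.7   # hypothetical first and second preset thresholds

    def encode_feature_map(fmap: np.ndarray):
        density = np.count_nonzero(fmap) / fmap.size
        if density < TH1:                        # fully sparse state S
            rows, cols = np.nonzero(fmap)
            indices = np.stack([rows, cols], axis=1)
            return ("S", fmap[rows, cols], indices)   # sparse matrix storage format
        elif density < TH2:                      # medium sparse state M
            guard = (fmap == 0)                  # guard marks on the 0 elements
            return ("M", fmap, guard)
        else:                                    # fully non-sparse state D
            return ("D", fmap, None)             # output the dense data unchanged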
On the basis of the foregoing embodiment, in this embodiment, the step S3 is preceded by: calculating the density of each convolution kernel in the trained convolutional neural network; if the density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format; if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in each convolution kernel; and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
Specifically, the density of each convolution kernel is the ratio of the number of non-zero elements in the convolution kernel to the total number of all elements in the convolution kernel. The state WS of each convolution kernel is classified into the same three states as the feature maps, and each state corresponds to a different sparse coding mode. Since the feature map and the convolution kernel each have three states, there are 9 combined states in total, so the density of the convolutional neural network is partitioned at a finer granularity.
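The combined control state can be pictured with the small enumeration below (an illustrative sketch only; how the hardware selects a processing mode for each (AS, WS) pair is not detailed here):

    FEATURE_STATES = ("S", "M", "D")   # fully sparse / medium sparse / non-sparse
    KERNEL_STATES = ("S", "M", "D")

    # 3 feature-map states x 3 convolution-kernel states = 9 combined modes.
    COMBINED_MODES = [(a, w) for a in FEATURE_STATES for w in KERNEL_STATES]
    print(len(COMBINED_MODES))  # 9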
On the basis of the foregoing embodiments, in this embodiment, the step S3 specifically includes: when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, its 0 elements are removed before input, which reduces storage space and means the 0 elements need not be computed; when a feature map or convolution kernel is in the medium sparse state M, the 0 elements are still stored, but the elements carrying the guard mark are not computed, which reduces the amount of computation.
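A sketch of how the guard marks suppress multiply-accumulate work in the medium sparse state, assuming NumPy operands and boolean guard masks (the processing-element array gates these operations in hardware; this software loop is only illustrative):

    import numpy as np

    def masked_dot(activations, weights, act_guard=None, wt_guard=None):
        # Accumulate products, skipping any element whose guard mark is set.
        total = 0.0
        for i in range(activations.size):
            if act_guard is not None and act_guard.flat[i]:
                continue                 # feature-map element marked as 0: skip
            if wt_guard is not None and wt_guard.flat[i]:
                continue                 # convolution-kernel element marked as 0: skip
            total += float(activations.flat[i]) * float(weights.flat[i])
        return total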
In another embodiment of the present invention, an accelerator applied to a convolutional neural network is provided. Fig. 2 is a schematic diagram of the overall structure of the accelerator applied to a convolutional neural network provided by the embodiment of the present invention, which comprises a neural network computing array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer of the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse coding modes. The neural network computing array module is used for performing the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance.
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is finished the convolution kernels no longer change, so they do not need online dynamic sparse coding and can be sparsely encoded offline once. In each convolution operation, the neural network computing array module directly reads the offline sparsely encoded convolution kernels for the convolution calculation. When original image data is input to the convolutional neural network, the dynamic sparse adjustment module sparsely encodes the original image data, and the neural network computing array module then performs the convolution calculation with the sparsely encoded data and the sparsely encoded convolution kernels. Because original image data is generally not sparse, it may also be input directly without sparse coding. Sparse coding means storing the data in a sparse format.
Because the densities of the feature maps output by the layers of the convolutional neural network differ, and the feature maps output by the different layers change dynamically, the densities also change dynamically. The density characterizes the degree of sparsity of each feature map. To better improve the operation speed of the convolutional neural network, the dynamic sparse adjustment module calculates the density of each feature map output by each layer, so that each feature map can be sparsely encoded according to its density.
The dynamic sparse adjustment module obtains the sparse state of each feature map output by the layer according to the plurality of preset thresholds, so that feature maps in different sparse states receive different forms of sparse coding rather than a single type of sparse coding. In the prior art, all the feature maps output by each layer are sparsely encoded, which requires a large amount of computation.
The neural network computing array module performs the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance. If a pooling module is included, the pooling module performs the pooling operation on the result of the convolution operation. In addition, the accelerator further includes an intermediate data memory module, a main chip controller and an on-chip/off-chip data communication module. The main controller controls the operation and timing of the whole accelerator chip. The on-chip/off-chip data communication module is used for reading data from the off-chip memory or writing data computed by the chip to the off-chip memory. For example, after initialization, the chip reads the original image data and the initial convolution kernels from the off-chip memory through the on-chip/off-chip data communication module under the control of the main controller. The intermediate data memory module is used for storing intermediate results produced during computation by the neural network computing array module.
In this embodiment, the dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with a plurality of preset thresholds to obtain the sparse state of each feature map, and sparsely encodes feature maps in different sparse states in different modes, so that the neural network computing array module can perform the convolution operation on each sparsely encoded feature map and the pre-sparsely-encoded convolution kernels of the convolutional neural network. On the one hand, this reduces the amount of computation required for convolution operations and improves the operation speed; on the other hand, the processing state of the accelerator is dynamically switched according to the different sparse states, which improves the flexibility of the accelerator.
On the basis of the above embodiment, the dynamic sparse adjustment module in this embodiment includes an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparse control module. The on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements to the total number of all elements as the density of the feature map. The output temporary register module is used for storing each feature map output by each layer of the convolutional neural network. The dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds. The dynamic encoding module is used for sparsely encoding each feature map in the output temporary register module according to the comparison result.
Specifically, the dynamic sparse adjustment module consists of these four modules. The on-line density identification module counts the number of non-zero elements in each feature map during computation in order to calculate the density of each feature map. The output temporary register module temporarily stores the feature maps output by each layer of the convolutional neural network in a non-sparse format. The dynamic sparse control module determines the sparse state of each feature map using the plurality of preset thresholds. The dynamic encoding module sparsely encodes each feature map in the output temporary register module according to its sparse state, which speeds up the convolution operation.
On the basis of the above embodiment, in this embodiment, the preset threshold includes a first preset threshold and a second preset threshold, wherein the first preset threshold is smaller than the second preset threshold. Correspondingly, the dynamic encoding module is specifically configured to: encode each feature map into a sparse matrix storage format if its density is smaller than the first preset threshold; mark the 0 elements in each feature map if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not encode each feature map if its density is greater than or equal to the second preset threshold.
Specifically, the preset threshold in this embodiment includes a first preset threshold th1 and a second preset threshold th2. The dynamic sparse control module divides the feature state AS of each feature map into three states according to these two thresholds: a feature map with density smaller than the first preset threshold is in the fully sparse state S, a feature map with density greater than or equal to the first preset threshold and less than the second preset threshold is in the medium sparse state M, and a feature map with density greater than or equal to the second preset threshold is in the fully non-sparse state D.
If a feature map is in the fully sparse state S, the dynamic encoding module encodes the feature map in the output temporary register module into a sparse matrix storage format, which stores the non-zero data (activations) of the feature map together with their sparse indices, for example coordinate encoding or compressed sparse row encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space while also saving a large amount of computing time. If a feature map is in the medium sparse state M, the dynamic encoding module adds a guard mark to the 0 elements of the feature map in the output temporary register module, and the marked elements do not participate in computation or storage, which reduces power consumption. If a feature map is in the fully non-sparse state D, no dynamic coding is needed and the dynamic encoding module outputs the non-sparse data of the feature map directly.
On the basis of the foregoing embodiment, in this embodiment, the dynamic encoding module is further configured to: encode each convolution kernel into a sparse matrix storage format if its pre-calculated density is smaller than the first preset threshold; mark the 0 elements in each convolution kernel if its density is greater than or equal to the first preset threshold and less than the second preset threshold; and not encode each convolution kernel if its density is greater than or equal to the second preset threshold.
Specifically, the density of each convolution kernel is the ratio of the number of non-zero elements in the convolution kernel to the total number of all elements in the convolution kernel. The state WS of each convolution kernel has the same three states as the feature maps, and each state corresponds to a different sparse coding mode. Since the feature map and the convolution kernel each have three states, there are 9 combined states in total, so the density of the convolutional neural network is partitioned at a finer granularity.
On the basis of the foregoing embodiments, the neural network computational array module in this embodiment is specifically configured to: when the mark exists in each feature map or each convolution kernel, the element corresponding to the mark in each feature map or each convolution kernel is not calculated.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, its 0 elements are removed before it is input to the neural network computing array module, which reduces storage space and means the 0 elements need not be computed; when a feature map or convolution kernel is in the medium sparse state M, the 0 elements are still stored, but the elements carrying the guard mark are not computed, which reduces the amount of computation.
For example, the chip of the accelerator is fabricated in a 65 nm process, the chip area is 3 mm by 4 mm, the operating frequency is 20-200 MHz, and the power consumption is 20.5-248.4 milliwatts. As shown in Fig. 3, the peak energy efficiency of this embodiment rises rapidly as the density of the feature maps and convolution kernels decreases. When the density of the feature maps and convolution kernels is 5%, the peak energy efficiency reaches 62.1 TOPS/W, which is 6.2 times the peak energy efficiency obtained without the proposed acceleration. As shown in Fig. 4, compared with an implementation that supports only feature-data sparsity, the energy efficiency of this embodiment can be improved by 4.3 times; compared with an implementation without adaptive sparse control, the energy efficiency can be improved by 2.8 times; and compared with an implementation without density control but with variable quantization precision, the energy efficiency can be improved by 2 times.
Finally, it should be noted that the above embodiments are only preferred embodiments of the present application and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An acceleration method applied to a convolutional neural network, comprising:
S1, for any layer in the convolutional neural network, respectively calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
S3, convolving, in the convolutional layer of the next layer, each sparsely encoded feature map with each convolution kernel of the convolutional neural network that has been sparsely encoded in advance;
the preset threshold comprises a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the step S2 specifically includes:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map;
the step S3 specifically includes:
and when a mark exists in each feature map, not calculating the elements corresponding to the mark in each feature map.
2. The method according to claim 1, wherein the step S1 specifically includes:
for any one feature map, counting the number of non-0 elements in the feature map and the total number of all elements in the feature map;
and taking the ratio of the number of the elements which are not 0 in the feature map to the total number of all the elements in the feature map as the density of the feature map.
3. The method according to claim 1, wherein the step S3 is preceded by:
calculating the density of each convolution kernel in the trained convolutional neural network;
if the density of each convolution kernel is smaller than the first preset threshold value, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
4. The method according to claim 3, wherein the step S3 specifically includes:
and when the mark exists in each convolution kernel, not calculating the element corresponding to the mark in each convolution kernel.
5. An accelerator for application to a convolutional neural network, comprising: the device comprises a neural network computing array module and a dynamic sparse adjustment module;
the dynamic sparse adjustment module is used for calculating the density of each feature map output by each layer in the convolutional neural network, comparing the density of each feature map with a plurality of preset thresholds, and sparsely encoding each feature map according to the comparison result; wherein different comparison results correspond to different sparse coding modes;
the neural network computing array module is used for performing the convolution operation on each sparsely encoded feature map and each convolution kernel of the convolutional neural network that has been sparsely encoded in advance;
the preset threshold comprises a first preset threshold and a second preset threshold; wherein the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic sparsity adjustment module includes a dynamic encoding module, and the dynamic encoding module is specifically configured to:
if the density of each feature map is smaller than the first preset threshold, encoding each feature map into a sparse matrix storage format;
if the density of each feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 element in each feature map;
if the density of each feature map is greater than or equal to the second preset threshold, not performing sparse coding on each feature map;
the neural network computing array module is specifically configured to:
and when a mark exists in each feature map, not calculating the elements corresponding to the mark in each feature map.
6. The accelerator according to claim 5, wherein the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparse control module;
the on-line density identification module is used for counting, for any feature map, the number of non-zero elements in the feature map and the total number of all elements in the feature map, and taking the ratio of the number of non-zero elements in the feature map to the total number of all elements in the feature map as the density of the feature map;
the output temporary register module is used for storing each feature map output by each layer in the convolutional neural network;
the dynamic sparse control module is used for comparing the density of each feature map output by the on-line density identification module with a plurality of preset thresholds;
and the dynamic encoding module is used for sparsely encoding each feature map in the output temporary register module according to the comparison result.
7. The accelerator of claim 5, wherein the dynamic encoding module is further to:
if the pre-calculated density of each convolution kernel is smaller than the first preset threshold, encoding each convolution kernel into a sparse matrix storage format;
if the density of each convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking 0 elements in each convolution kernel;
and if the density of each convolution kernel is greater than or equal to the second preset threshold, not performing sparse coding on each convolution kernel.
8. The accelerator of claim 7, wherein the neural network computational array module is specifically configured to:
and when the mark exists in each convolution kernel, not calculating the element corresponding to the mark in each convolution kernel.
CN201810306577.3A 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network Active CN108510063B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network
PCT/CN2018/095365 WO2019196223A1 (en) 2018-04-08 2018-07-12 Acceleration method and accelerator used for convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network

Publications (2)

Publication Number Publication Date
CN108510063A CN108510063A (en) 2018-09-07
CN108510063B true CN108510063B (en) 2020-03-20

Family

ID=63380995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810306577.3A Active CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network

Country Status (2)

Country Link
CN (1) CN108510063B (en)
WO (1) WO2019196223A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389043B (en) * 2018-09-10 2021-11-23 中国人民解放军陆军工程大学 Crowd density estimation method for aerial picture of unmanned aerial vehicle
CN109409518B (en) * 2018-10-11 2021-05-04 北京旷视科技有限公司 Neural network model processing method and device and terminal
CN109784484A (en) * 2019-01-31 2019-05-21 深兰科技(上海)有限公司 Neural network accelerated method, device, neural network accelerate chip and storage medium
CN110097172B (en) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 Convolutional neural network data processing method and device based on Winograd convolutional operation
CN109858575B (en) * 2019-03-19 2024-01-05 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural network
CN110443357B (en) * 2019-08-07 2020-09-15 上海燧原智能科技有限公司 Convolutional neural network calculation optimization method and device, computer equipment and medium
CN110909801B (en) * 2019-11-26 2020-10-09 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN111291230B (en) * 2020-02-06 2023-09-15 北京奇艺世纪科技有限公司 Feature processing method, device, electronic equipment and computer readable storage medium
CN111401554B (en) * 2020-03-12 2023-03-24 交叉信息核心技术研究院(西安)有限公司 Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization
CN113537465A (en) * 2021-07-07 2021-10-22 深圳市易成自动驾驶技术有限公司 LSTM model optimization method, accelerator, device and medium
WO2023164855A1 (en) * 2022-03-03 2023-09-07 Intel Corporation Apparatus and method for 3d dynamic sparse convolution

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184362B (en) * 2015-08-21 2018-02-02 中国科学院自动化研究所 The acceleration of the depth convolutional neural networks quantified based on parameter and compression method
US10380479B2 (en) * 2015-10-08 2019-08-13 International Business Machines Corporation Acceleration of convolutional neural network training using stochastic perforation
CN107679617B (en) * 2016-08-22 2021-04-09 赛灵思电子科技(北京)有限公司 Multi-iteration deep neural network compression method
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN107609641B (en) * 2017-08-30 2020-07-03 清华大学 Sparse neural network architecture and implementation method thereof

Also Published As

Publication number Publication date
WO2019196223A1 (en) 2019-10-17
CN108510063A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN108510063B (en) Acceleration method and accelerator applied to convolutional neural network
Liu et al. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm
CN111368662B (en) Method, device, storage medium and equipment for editing attribute of face image
CN111160523B (en) Dynamic quantization method, system and medium based on characteristic value region
WO2019127362A1 (en) Neural network model block compression method, training method, computing device and system
CN105718943A (en) Character selection method based on particle swarm optimization algorithm
CN109886391B (en) Neural network compression method based on space forward and backward diagonal convolution
Bruske et al. Dynamic cell structures
CN112990420A (en) Pruning method for convolutional neural network model
US20210192327A1 (en) Apparatus and method for neural network computation
Liu et al. RB-Net: Training highly accurate and efficient binary neural networks with reshaped point-wise convolution and balanced activation
CN111814973A (en) Memory computing system suitable for neural ordinary differential equation network computing
CN112215298A (en) Model training method, device, equipment and readable storage medium
CN112257844A (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN112288046B (en) Mixed granularity-based joint sparse method for neural network
CN109670582A (en) A kind of design method of full fixed point neural network
US20210397962A1 (en) Effective network compression using simulation-guided iterative pruning
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
CN114298291A (en) Model quantization processing system and model quantization processing method
CN114065920A (en) Image identification method and system based on channel-level pruning neural network
CN112906871A (en) Temperature prediction method and system based on hybrid multilayer neural network model
CN113222142A (en) Channel pruning and quick connection layer pruning method and system
Ho et al. MiCE: an ANN-to-SNN conversion technique to Enable High Accuracy and Low Latency
EP4007173A1 (en) Data storage method, and data acquisition method and apparatus therefor
Huang et al. BWA-NIMC: Budget-based Workload Allocation for Hybrid Near/In-Memory-Computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant