CN108510063A - Acceleration method and accelerator for convolutional neural networks - Google Patents
- Publication number: CN108510063A (application CN201810306577.3A)
- Authority: CN (China)
- Prior art keywords
- feature map
- preset threshold
- threshold value
- density
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06N3/048 — Activation functions
- G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

(All under G—Physics; G06—Computing, calculating or counting; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks.)
Abstract
The present invention provides an acceleration method and accelerator for convolutional neural networks. The method includes: S1, for any layer in the convolutional neural network, separately computing the density of each feature map output by that layer; S2, comparing the density of each feature map output by that layer with multiple preset thresholds and sparse-coding each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes; S3, performing convolution in the convolutional layer following that layer between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network. The invention reduces the amount of convolution computation in convolutional neural networks and improves computation speed.
Description
Technical field
The invention belongs to the field of computation optimization techniques, and more particularly relates to an acceleration method and accelerator for convolutional neural networks.
Background technology
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited coverage area, making it well suited to processing large images. Convolutional neural networks are widely used in fields such as image recognition and speech recognition, but their computational cost is very high.
The ReLU (rectified linear unit) activation function in convolutional neural networks produces a large number of sparse feature maps; meanwhile, training methods such as pruning produce a large amount of sparse weight data. Exploiting the sparsity of feature maps and weight data can greatly improve the computational efficiency of convolutional neural networks. Many existing methods improve computation speed based on the sparsity of feature maps and weight data in convolutional neural networks. These methods fall roughly into two classes. One class focuses on skipping zero values: for example, some methods remove the zero values from the input, eliminating invalid computations whose input is 0. The other class ignores zeros: for example, some methods do not execute the multiplication when an input operand is 0, reducing the number of operations. However, all of these methods focus on processing a sparse neural network itself and take sparsity as a premise. In practice, the feature maps output by each layer of a convolutional neural network may be sparse or non-sparse; the densities of the weight data and feature maps of each layer are generally distributed between 5% and 90%.
A sparse matrix is a matrix in which the number of zero elements far exceeds the number of non-zero elements and the non-zero elements are distributed irregularly. On the one hand, the prior art can only handle sparse convolutional neural networks; when the network is not sparse, the amount of computation is very large and the computation speed is low. On the other hand, the prior art can only handle the case where either the weight data or the feature maps are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.
Summary of the invention
To overcome the above problem of low computation speed in convolutional neural networks, or to at least partly solve it, the present invention provides an acceleration method and accelerator for convolutional neural networks.
According to a first aspect of the invention, an acceleration method for convolutional neural networks is provided, including:
S1, for any layer in the convolutional neural network, separately computing the density of each feature map output by that layer;
S2, comparing the density of each feature map output by that layer with multiple preset thresholds and sparse-coding each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes;
S3, performing convolution in the convolutional layer following that layer between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
Specifically, step S1 includes:
for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map;
taking the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map.
Specifically, the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is less than the second preset threshold.
Correspondingly, step S2 includes:
if the density of a feature map is less than the first preset threshold, encoding the feature map in a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map;
if the density of a feature map is greater than or equal to the second preset threshold, not sparse-coding the feature map.
Specifically, before step S3 the method further includes:
computing the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is less than the first preset threshold, encoding the convolution kernel in a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparse-coding the convolution kernel.
Specifically, step S3 includes:
when a mark is present in a feature map or a convolution kernel, not computing the elements corresponding to the mark in that feature map or convolution kernel.
According to another aspect of the invention, an accelerator for convolutional neural networks is provided, including a neural network computing array module and a dynamic sparse adjustment module.
The dynamic sparse adjustment module computes the density of each feature map output by each layer of the convolutional neural network, compares the density of each feature map with multiple preset thresholds, and sparse-codes each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes.
The neural network computing array module performs the convolution operation between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
Specifically, the dynamic sparse adjustment module includes an on-line density identification module, an output temporary-register module, a dynamic coding module, and a dynamic sparse control module.
The on-line density identification module, for any feature map, counts the number of non-zero elements in the feature map and the total number of elements in the feature map, and takes the ratio of the number of non-zero elements to the total number of elements as the density of the feature map.
The output temporary-register module stores each feature map output by each layer of the convolutional neural network.
The dynamic sparse control module compares the density of each feature map output by the on-line density identification module with the multiple preset thresholds.
The dynamic coding module sparse-codes each feature map in the output temporary-register module according to the comparison result.
Specifically, the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is less than the second preset threshold.
Correspondingly, the dynamic coding module is specifically configured to:
if the density of a feature map is less than the first preset threshold, encode the feature map in a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the feature map;
if the density of a feature map is greater than or equal to the second preset threshold, not sparse-code the feature map.
Specifically, the dynamic coding module is further configured to:
if the precomputed density of a convolution kernel is less than the first preset threshold, encode the convolution kernel in a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparse-code the convolution kernel.
Specifically, the neural network computing array module is configured to:
when a mark is present in a feature map or a convolution kernel, not compute the elements corresponding to the mark in that feature map or convolution kernel.
The present invention provides an acceleration method and accelerator for convolutional neural networks. The method compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map, applies a different sparse coding mode to feature maps in different sparsity states, and then, in the convolutional layer following each layer, performs the convolution operation between each sparse-coded feature map and the pre-sparse-coded convolution kernels of the convolutional neural network, reducing the amount of convolution computation in the convolutional neural network and improving computation speed.
Description of the drawings
Fig. 1 is an overall flow diagram of the acceleration method for convolutional neural networks provided by an embodiment of the present invention;
Fig. 2 is an overall structural diagram of the accelerator for convolutional neural networks provided by an embodiment of the present invention;
Fig. 3 is a diagram of energy efficiency test results of the accelerator for convolutional neural networks provided by an embodiment of the present invention;
Fig. 4 is a comparison diagram of energy efficiency test results of the accelerator for convolutional neural networks provided by an embodiment of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
One embodiment of the present invention provides an acceleration method for convolutional neural networks. Fig. 1 is an overall flow diagram of the acceleration method for convolutional neural networks provided by this embodiment. The method includes:
S1, for any layer in the convolutional neural network, separately computing the density of each feature map output by that layer;
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is complete, the convolution kernels in the convolutional neural network no longer change, so they do not require dynamic on-line sparse coding and can be sparse-coded once, directly off-line. Here, on-line means on the chip of the accelerator, and off-line means not on the chip of the accelerator. In each convolution operation, the sparse-coded convolution kernels are read directly for the convolution computation. When raw image data is input, the raw image data is sparse-coded, and then the sparse-coded raw data and the sparse-coded convolution kernels are fed to the first convolutional layer of the convolutional neural network for convolution computation. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse coding. Sparse coding means storing data in a sparse format.
In S1, since the density of each feature map output by each layer of the convolutional neural network is different, and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates how sparse each feature map is. To better improve the computation speed of the convolutional neural network, the density of each feature map output by each layer is computed, and each feature map output by each layer is sparse-coded according to its density.
S2, comparing the density of each feature map output by that layer with multiple preset thresholds and sparse-coding each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes;
In S2, the prior art sparse-codes every feature map output by each layer, which is computationally expensive. This embodiment obtains the sparsity state of each feature map output by the layer according to the preset thresholds, and thus applies different forms of sparse coding to feature maps in different sparsity states.
S3, performing convolution in the convolutional layer following that layer between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
In S3, each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network are taken as the input of the convolutional layer following that layer for the convolution operation. The result of the convolution operation, as the feature maps output by that convolutional layer, is then taken as the input of the next layer, and the above sparse coding and convolution operations are repeated until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not limit the sparse coding mode of the convolution kernels.
This embodiment compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map, applies a different sparse coding mode to feature maps in different sparsity states, and then, in the convolutional layer following each layer, performs the convolution operation between each sparse-coded feature map and the pre-sparse-coded convolution kernels of the convolutional neural network, reducing the amount of convolution computation in the convolutional neural network and improving computation speed.
On the basis of the above embodiment, step S1 in this embodiment specifically includes: for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map, and taking the ratio of the number of non-zero elements to the total number of elements as the density of the feature map.
Specifically, the density of a feature map is the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map. For example, if a feature map has 10 non-zero elements and 100 elements in total, its density is 0.1.
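The density computation described above can be sketched as follows. This is a minimal illustration only, not the patented hardware implementation; the function and array names are invented for the example:

```python
import numpy as np

def density(feature_map):
    """Density = number of non-zero elements / total number of elements."""
    return np.count_nonzero(feature_map) / feature_map.size

# A 10x10 feature map with 10 non-zero elements, matching the example above.
fm = np.zeros((10, 10))
fm.flat[:10] = 1.0
print(density(fm))  # 0.1
```

A fully dense map has density 1.0 and an all-zero map has density 0.0, so the value always falls in [0, 1].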
On the basis of the above embodiment, the preset thresholds in this embodiment include a first preset threshold and a second preset threshold. Correspondingly, step S2 specifically includes: if the density of a feature map is less than the first preset threshold, encoding the feature map in a sparse matrix storage format; if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map; if the density of a feature map is greater than or equal to the second preset threshold, not sparse-coding the feature map.
Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. According to the first preset threshold and the second preset threshold, the activation state AS of each feature map is divided into three states: feature maps whose density is less than the first preset threshold are classified as the fully sparse state S; feature maps whose density is greater than or equal to the first preset threshold and less than the second preset threshold are classified as the moderately sparse state M; and feature maps whose density is greater than or equal to the second preset threshold are classified as the fully non-sparse state D. If a feature map is in the sparse state S, it is encoded in a sparse matrix storage format, which includes the non-zero data activ in the feature map and a sparse index index, for example coordinate encoding or compressed sparse row encoding. Encoding a feature map in a sparse matrix storage format saves a large amount of storage space and, at the same time, a large amount of computation time. If a feature map is in the moderately sparse state M, a mark guard is added to the 0 elements in the feature map; the mark identifies the 0 elements. Marked elements can be excluded from computation and storage, reducing power consumption. Marking the 0 elements in a feature map is also a form of sparse coding. If a feature map is in the fully non-sparse state D, no dynamic coding is needed, and the non-sparse data of the feature map is output directly.
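The three-way encoding rule can be sketched as below. The threshold values th1 = 0.3 and th2 = 0.7 are illustrative only (the patent does not fix them), and the function name and tuple layout are invented for the example; the S branch uses a coordinate-style sparse storage as one of the formats the embodiment mentions:

```python
import numpy as np

def encode(fm, th1=0.3, th2=0.7):
    """Classify a feature map as S / M / D by density and encode it accordingly."""
    d = np.count_nonzero(fm) / fm.size
    if d < th1:
        # Fully sparse state S: store only the non-zero data plus a sparse index
        # (coordinate encoding here; compressed sparse row would also fit).
        idx = np.nonzero(fm)
        return ("S", fm[idx], np.stack(idx, axis=-1))
    elif d < th2:
        # Moderately sparse state M: keep the dense data but mark the 0 elements.
        return ("M", fm, fm == 0)
    else:
        # Fully non-sparse state D: output the data directly, no coding.
        return ("D", fm)

demo = np.zeros((8, 8))
demo[0, 0] = 5.0
print(encode(demo)[0])  # S  (density 1/64)
```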
On the basis of the above embodiment, before step S3 this embodiment further includes: computing the density of each convolution kernel in the trained convolutional network; if the density of a convolution kernel is less than the first preset threshold, encoding the convolution kernel in a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the convolution kernel; if the density of a convolution kernel is greater than or equal to the second preset threshold, not encoding the convolution kernel.
Specifically, the density of a convolution kernel is the ratio of the number of non-zero elements in the kernel to the total number of elements in the kernel. Like the feature maps, the state WS of each convolution kernel is divided into three states, each corresponding to a different sparse coding mode. Since feature maps and convolution kernels each have three states, there are 9 combined states in total, giving a finer-grained division of the density of the convolutional neural network.
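The 9 combined states can be enumerated directly. A small sketch under assumed names (the patent labels the states AS and WS but does not give code):

```python
from itertools import product

FEATURE_STATES = ("S", "M", "D")   # activation state AS of a feature map
KERNEL_STATES = ("S", "M", "D")    # state WS of a convolution kernel

# Each (AS, WS) pair selects a different processing mode in the accelerator.
combined = list(product(FEATURE_STATES, KERNEL_STATES))
print(len(combined))  # 9
```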
On the basis of the above embodiments, step S3 in this embodiment specifically includes: when a mark is present in a feature map or a convolution kernel, not computing the elements corresponding to the mark in that feature map or convolution kernel.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, the zeros are removed before input, reducing storage space while avoiding computation on the 0 elements. When a feature map or convolution kernel is in the moderately sparse state M, the 0 elements are still stored, but the elements corresponding to the marks are not computed, reducing computation.
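The mark-aware multiply-accumulate for state M can be sketched as below: any position marked as a 0 element in either operand is skipped entirely. Function and variable names are invented for illustration; the actual accelerator realizes this gating in hardware:

```python
def masked_dot(activations, weights, act_mask, w_mask):
    """Multiply-accumulate that skips any position marked as a 0 element."""
    acc = 0.0
    for a, w, a_marked, w_marked in zip(activations, weights, act_mask, w_mask):
        if a_marked or w_marked:
            continue  # marked element: the multiplication is never performed
        acc += a * w
    return acc

acts = [1.0, 0.0, 2.0]
wts = [3.0, 4.0, 0.0]
# Marks derived from the zeros; only index 0 contributes to the sum.
print(masked_dot(acts, wts, [a == 0 for a in acts], [w == 0 for w in wts]))  # 3.0
```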
Another embodiment of the present invention provides an accelerator for convolutional neural networks. Fig. 2 is an overall structural diagram of the accelerator for convolutional neural networks provided by this embodiment. The accelerator includes a neural network computing array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module computes the density of each feature map output by each layer of the convolutional neural network, compares the density of each feature map with multiple preset thresholds, and sparse-codes each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes. The neural network computing array module performs the convolution operation between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is complete, the convolution kernels no longer change, so they do not require dynamic on-line sparse coding and are sparse-coded once, directly off-line. In each convolution operation, the neural network computing array module reads the off-line sparse-coded convolution kernels directly for the convolution computation. When raw image data is input to the convolutional neural network, the dynamic sparse adjustment module sparse-codes the raw image data, and the neural network computing array module then performs the convolution computation with the sparse-coded raw data and the sparse-coded convolution kernels. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse coding. Sparse coding means storing data in a sparse format.
Since the density of each feature map output by each layer of the convolutional neural network is different, and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates how sparse each feature map is. To better improve the computation speed of the convolutional neural network, the dynamic sparse adjustment module computes the density of each feature map output by each layer and sparse-codes each feature map according to its density.
The dynamic sparse adjustment module obtains the sparsity state of each feature map output by the layer according to the multiple preset thresholds, and thus applies different forms of sparse coding to feature maps in different sparsity states rather than being limited to a single sparse coding. In the prior art, every feature map output by each layer is sparse-coded, which is computationally expensive.
The neural network computing array module performs the convolution operation between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network. If a pooling module is included, the pooling module performs a pooling operation on the result of the convolution operation. In addition, the accelerator further includes an intermediate data storage module, a main chip controller, and an on/off-chip data exchange module. The main controller controls the run behavior and timing of the entire accelerator chip. The on/off-chip data exchange module reads data from memory outside the chip, or writes data computed by the chip to external storage. For example, after initialization, the chip reads the raw image data and the initial convolution kernels from external memory through the on/off-chip data exchange module under the control of the main controller. The intermediate data storage module stores intermediate results produced during computation by the neural network computing array module.
In this embodiment, the dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map and applies a different sparse coding mode to feature maps in different sparsity states, and the neural network computing array module performs the convolution operation between each sparse-coded feature map and the pre-sparse-coded convolution kernels of the convolutional neural network. On the one hand, this reduces the amount of convolution computation in the convolutional neural network and improves computation speed; on the other hand, the processing state of the accelerator is switched dynamically according to the sparsity state, improving the accelerator's flexibility.
On the basis of the above embodiment, the dynamic sparse adjustment module in this embodiment includes an on-line density identification module, an output temporary-register module, a dynamic coding module, and a dynamic sparse control module. The on-line density identification module, for any feature map, counts the number of non-zero elements in the feature map and the total number of elements in the feature map, and takes the ratio of the number of non-zero elements to the total number of elements as the density of the feature map. The output temporary-register module stores each feature map output by each layer of the convolutional neural network. The dynamic sparse control module compares the density of each feature map output by the on-line density identification module with the multiple preset thresholds. The dynamic coding module sparse-codes each feature map in the output temporary-register module according to the comparison result.
Specifically, the dynamic sparse adjustment module comprises the above four modules. The on-line density identification module counts the number of non-zero elements in each feature map during computation, so as to calculate the density of each feature map. The output temporary register module temporarily holds the feature maps output by each layer of the convolutional neural network in a non-sparse format. The dynamic sparsity control module controls the sparsity state of each feature map through the preset multiple predetermined thresholds. According to the sparsity state of each feature map, the dynamic encoding module sparsely encodes each feature map in the output temporary register module, so as to improve the speed of the convolution operation.
On the basis of the above embodiment, the predetermined thresholds of this embodiment include a first predetermined threshold and a second predetermined threshold. Correspondingly, the dynamic encoding module is specifically configured to: if the density of a feature map is less than the first predetermined threshold, encode the feature map into a sparse matrix storage format; if the density of a feature map is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the feature map; and if the density of a feature map is greater than or equal to the second predetermined threshold, leave the feature map unencoded.
Specifically, the predetermined thresholds of this embodiment include a first predetermined threshold th1 and a second predetermined threshold th2. According to th1 and th2, the dynamic sparsity control module divides the sparsity state AS of each feature map into three states: a feature map whose density is less than the first predetermined threshold is assigned the fully sparse state S; a feature map whose density is greater than or equal to the first predetermined threshold and less than the second predetermined threshold is assigned the moderately sparse state M; and a feature map whose density is greater than or equal to the second predetermined threshold is assigned the fully dense state D.
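The three-way classification above can be sketched as a simple threshold comparison; `sparsity_state` and the example threshold values are hypothetical illustrations:

```python
def sparsity_state(density: float, th1: float, th2: float) -> str:
    """Map a feature-map density to one of the three states S, M, D.

    Assumes th1 < th2, as stated in claim 3 of the patent.
    """
    if density < th1:
        return "S"  # fully sparse: encode into a sparse matrix storage format
    if density < th2:
        return "M"  # moderately sparse: mark the zero elements
    return "D"      # fully dense: leave unencoded

# Example with hypothetical thresholds th1 = 0.2, th2 = 0.6
print(sparsity_state(0.05, 0.2, 0.6))  # S
print(sparsity_state(0.40, 0.2, 0.6))  # M
print(sparsity_state(0.90, 0.2, 0.6))  # D
```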
If a feature map is in the fully sparse state S, the dynamic encoding module encodes the feature map in the output temporary register module into a sparse matrix storage format, which comprises the non-zero data activ of the feature map and a sparse index, for example a coordinate (COO) encoding or a compressed sparse row (CSR) encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space while also saving a large amount of computation time. If a feature map is in the moderately sparse state M, the dynamic encoding module attaches a mark to the zero elements of the feature map in the output temporary register module; the marked elements take no part in computation or storage, which reduces power consumption. If a feature map is in the fully dense state D, no dynamic encoding is needed, and the dynamic encoding module directly outputs the non-sparse data of the feature map.
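As an illustration of the sparse matrix storage format, a minimal coordinate (COO) encoding might look like the following; the names `activ` and `index` follow the patent text, while `encode_coo` is a hypothetical helper, not the accelerator's actual hardware encoder:

```python
import numpy as np

def encode_coo(feature_map: np.ndarray):
    """Coordinate (COO) encoding: keep only the non-zero data plus its indices."""
    rows, cols = np.nonzero(feature_map)
    activ = feature_map[rows, cols]                  # non-zero data
    index = list(zip(rows.tolist(), cols.tolist()))  # sparse index
    return activ, index

fm = np.array([[0, 7, 0],
               [0, 0, 0],
               [4, 0, 0]])
activ, index = encode_coo(fm)
print(activ.tolist())  # [7, 4]
print(index)           # [(0, 1), (2, 0)]
```

For a fully sparse (state S) map, only `activ` and `index` need to be stored and processed, which is the source of the storage and computation savings described above.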
On the basis of the above embodiment, the dynamic encoding module of this embodiment is further configured to: if the precalculated density of a convolution kernel is less than the first predetermined threshold, encode the convolution kernel into a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the convolution kernel; and if the density of a convolution kernel is greater than or equal to the second predetermined threshold, leave the convolution kernel unencoded.
Specifically, the density of a convolution kernel is the ratio between the number of non-zero elements in the kernel and the total number of elements in the kernel. Like the feature maps, the state WS of each convolution kernel takes one of three states, and each state corresponds to a different sparse encoding mode. Since the feature maps and the convolution kernels each have three states, there are nine states in total after combination, which gives a finer-grained division of the density of the convolutional neural network.
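The nine combined states are simply the Cartesian product of the feature-map states and the kernel states; a one-line sketch:

```python
from itertools import product

feature_states = ["S", "M", "D"]  # feature-map sparsity states
kernel_states = ["S", "M", "D"]   # convolution-kernel sparsity states

combined = list(product(feature_states, kernel_states))
print(len(combined))  # 9 combined (feature map, kernel) states
```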
On the basis of the above embodiments, the neural network computing array module of this embodiment is specifically configured to: when a mark is present in a feature map or a convolution kernel, perform no computation on the marked elements of that feature map or convolution kernel.
Specifically, when a feature map or a convolution kernel is in the fully sparse state S, the zeros are removed before the feature map or convolution kernel is fed into the neural network computing array module, which reduces storage space while also avoiding any computation on zero elements. When a feature map or a convolution kernel is in the moderately sparse state M, the zero elements of the feature map or convolution kernel are still stored, but the marked elements take no part in computation, which reduces the amount of computation.
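The skip-on-mark behavior for moderately sparse (state M) operands can be sketched as follows; `masked_mac` and the explicit mask lists are hypothetical illustrations of the idea, not the patent's actual hardware interface:

```python
def masked_mac(activations, weights, act_marks, w_marks):
    """Multiply-accumulate that skips any element pair where either
    operand carries a zero mark (moderately sparse state M)."""
    total = 0
    for a, w, ma, mw in zip(activations, weights, act_marks, w_marks):
        if ma or mw:   # marked zero element: no multiplication performed
            continue
        total += a * w
    return total

acts = [3, 0, 2, 5]
wts  = [1, 4, 0, 2]
a_mk = [False, True, False, False]  # marks the zeros in the activations
w_mk = [False, False, True, False]  # marks the zeros in the weights
print(masked_mac(acts, wts, a_mk, w_mk))  # 3*1 + 5*2 = 13
```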
For example, a chip of the accelerator was fabricated in a TSMC 65 nm process. The chip area is 3 mm x 4 mm, the operating frequency is 20-200 MHz, and the power consumption is 20.5-248.4 mW. In this embodiment, the peak energy efficiency rises rapidly as the density of the feature maps and convolution kernels falls, as shown in Figure 3. When the density of the feature maps and convolution kernels is 5%, the peak energy efficiency reaches 62.1 TOPS/W, 6.2 times the peak energy efficiency obtained without the accelerator of this embodiment. As shown in Figure 4, compared with an implementation that supports only feature-map sparsity, the energy efficiency of this embodiment is improved by 4.3 times; compared with an implementation without adaptive sparsity control, by 2.8 times; and compared with an implementation without density control but with variable quantization precision, by 2 times.
Finally, the above methods are only preferred embodiments and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. An acceleration method applied to a convolutional neural network, characterized by comprising:
S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with multiple predetermined thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse encoding modes;
S3, performing, by the convolutional layer next after the layer, convolution between each sparsely encoded feature map and each pre-encoded convolution kernel of the convolutional neural network.
2. The method according to claim 1, characterized in that step S1 specifically comprises:
for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map;
taking the ratio between the number of non-zero elements in the feature map and the total number of elements in the feature map as the density of the feature map.
3. The method according to claim 1, characterized in that the predetermined thresholds include a first predetermined threshold and a second predetermined threshold, wherein the first predetermined threshold is less than the second predetermined threshold;
correspondingly, step S2 specifically comprises:
if the density of a feature map is less than the first predetermined threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, marking the zero elements in the feature map;
if the density of a feature map is greater than or equal to the second predetermined threshold, not sparsely encoding the feature map.
4. The method according to claim 3, characterized by further comprising, before step S3:
calculating the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is less than the first predetermined threshold, encoding the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, marking the zero elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second predetermined threshold, not sparsely encoding the convolution kernel.
5. The method according to claim 3 or 4, characterized in that step S3 specifically comprises:
when the mark is present in a feature map or a convolution kernel, performing no computation on the elements corresponding to the mark in the feature map or convolution kernel.
6. An accelerator applied to a convolutional neural network, characterized by comprising a neural network computing array module and a dynamic sparse adjustment module;
wherein the dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with multiple predetermined thresholds, and sparsely encode each feature map according to the comparison result, wherein different comparison results correspond to different sparse encoding modes;
the neural network computing array module is configured to perform a convolution operation between each sparsely encoded feature map and each pre-encoded convolution kernel of the convolutional neural network.
7. The accelerator according to claim 6, characterized in that the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparsity control module;
wherein the on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of elements in the feature map, and take the ratio between the number of non-zero elements in the feature map and the total number of elements in the feature map as the density of the feature map;
the output temporary register module is configured to store each feature map output by each layer of the convolutional neural network;
the dynamic sparsity control module is configured to compare the density of each feature map output by the on-line density identification module with the multiple predetermined thresholds;
the dynamic encoding module is configured to sparsely encode each feature map in the output temporary register module according to the comparison result.
8. The accelerator according to claim 7, characterized in that the predetermined thresholds include a first predetermined threshold and a second predetermined threshold, wherein the first predetermined threshold is less than the second predetermined threshold;
correspondingly, the dynamic encoding module is specifically configured to:
if the density of a feature map is less than the first predetermined threshold, encode the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the feature map;
if the density of a feature map is greater than or equal to the second predetermined threshold, not sparsely encode the feature map.
9. The accelerator according to claim 8, characterized in that the dynamic encoding module is further configured to:
if the precalculated density of a convolution kernel is less than the first predetermined threshold, encode the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second predetermined threshold, not sparsely encode the convolution kernel.
10. The accelerator according to claim 7 or 8, characterized in that the neural network computing array module is specifically configured to:
when the mark is present in a feature map or a convolution kernel, perform no computation on the elements corresponding to the mark in the feature map or convolution kernel.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306577.3A CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
PCT/CN2018/095365 WO2019196223A1 (en) | 2018-04-08 | 2018-07-12 | Acceleration method and accelerator used for convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306577.3A CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108510063A true CN108510063A (en) | 2018-09-07 |
CN108510063B CN108510063B (en) | 2020-03-20 |
Family
ID=63380995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810306577.3A Active CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108510063B (en) |
WO (1) | WO2019196223A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389043A (en) * | 2018-09-10 | 2019-02-26 | 中国人民解放军陆军工程大学 | A kind of crowd density estimation method of unmanned plane picture |
CN109409518A (en) * | 2018-10-11 | 2019-03-01 | 北京旷视科技有限公司 | Neural network model processing method, device and terminal |
CN109784484A (en) * | 2019-01-31 | 2019-05-21 | 深兰科技(上海)有限公司 | Neural network accelerated method, device, neural network accelerate chip and storage medium |
CN109858575A (en) * | 2019-03-19 | 2019-06-07 | 苏州市爱生生物技术有限公司 | Data classification method based on convolutional neural networks |
CN110097172A (en) * | 2019-03-18 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of convolutional neural networks data processing method and device based on winograd convolution algorithm |
CN110443357A (en) * | 2019-08-07 | 2019-11-12 | 上海燧原智能科技有限公司 | Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium |
CN110909801A (en) * | 2019-11-26 | 2020-03-24 | 山东师范大学 | Data classification method, system, medium and device based on convolutional neural network |
CN111291230A (en) * | 2020-02-06 | 2020-06-16 | 北京奇艺世纪科技有限公司 | Feature processing method and device, electronic equipment and computer-readable storage medium |
CN113537465A (en) * | 2021-07-07 | 2021-10-22 | 深圳市易成自动驾驶技术有限公司 | LSTM model optimization method, accelerator, device and medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401554B (en) * | 2020-03-12 | 2023-03-24 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization |
WO2023164855A1 (en) * | 2022-03-03 | 2023-09-07 | Intel Corporation | Apparatus and method for 3d dynamic sparse convolution |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239825A (en) * | 2016-08-22 | 2017-10-10 | 北京深鉴智能科技有限公司 | Consider the deep neural network compression method of load balancing |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184362B (en) * | 2015-08-21 | 2018-02-02 | 中国科学院自动化研究所 | The acceleration of the depth convolutional neural networks quantified based on parameter and compression method |
US10380479B2 (en) * | 2015-10-08 | 2019-08-13 | International Business Machines Corporation | Acceleration of convolutional neural network training using stochastic perforation |
CN107609641B (en) * | 2017-08-30 | 2020-07-03 | 清华大学 | Sparse neural network architecture and implementation method thereof |
2018
- 2018-04-08 CN CN201810306577.3A patent/CN108510063B/en active Active
- 2018-07-12 WO PCT/CN2018/095365 patent/WO2019196223A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019196223A1 (en) | 2019-10-17 |
CN108510063B (en) | 2020-03-20 |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||