CN108510063A - Acceleration method and accelerator for convolutional neural networks - Google Patents
- Publication number: CN108510063A (application CN201810306577.3A)
- Authority: CN (China)
- Prior art keywords
- feature map
- preset threshold
- threshold value
- density
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06N3/048 — Activation functions
- G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

(All under G—Physics; G06—Computing, calculating or counting; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks.)
Abstract
The present invention provides an acceleration method and accelerator for convolutional neural networks. The method includes: S1, for any layer in the convolutional neural network, separately computing the density of each feature map output by that layer; S2, comparing the density of each feature map output by that layer with multiple preset thresholds and sparse-coding each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes; S3, performing convolution in the convolutional layer following that layer between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network. The invention reduces the amount of convolution computation in convolutional neural networks and improves computation speed.
Description
Technical field
The invention belongs to the field of computation optimization techniques, and more particularly relates to an acceleration method and accelerator for convolutional neural networks.
Background technology
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited coverage area, making it well suited to processing large images. Convolutional neural networks are widely used in fields such as image recognition and speech recognition, but their computational cost is very high.
The ReLU (rectified linear unit) activation function in convolutional neural networks produces a large number of sparse feature maps; meanwhile, training methods such as pruning produce a large amount of sparse weight data. Exploiting the sparsity of feature maps and weight data can greatly improve the computational efficiency of convolutional neural networks. Many existing methods improve computation speed based on the sparsity of feature maps and weight data in convolutional neural networks. These methods fall roughly into two classes. One class focuses on skipping zero values: for example, some methods remove the zero values from the input, eliminating invalid computations whose input is 0. The other class ignores zeros: for example, some methods do not execute the multiplication when an input operand is 0, reducing the number of operations. However, all of these methods focus on processing a sparse neural network itself and take sparsity as a premise. In practice, the feature maps output by each layer of a convolutional neural network may be sparse or non-sparse; the densities of the weight data and feature maps of each layer are generally distributed between 5% and 90%.
A sparse matrix is a matrix in which the number of zero elements far exceeds the number of non-zero elements and the non-zero elements are distributed irregularly. On the one hand, the prior art can only handle sparse convolutional neural networks; when the network is not sparse, the amount of computation is very large and the computation speed is low. On the other hand, the prior art can only handle the case where either the weight data or the feature maps are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.
Summary of the invention
To overcome the above problem of low computation speed in convolutional neural networks, or to at least partly solve it, the present invention provides an acceleration method and accelerator for convolutional neural networks.
According to a first aspect of the invention, an acceleration method for convolutional neural networks is provided, including:
S1, for any layer in the convolutional neural network, separately computing the density of each feature map output by that layer;
S2, comparing the density of each feature map output by that layer with multiple preset thresholds and sparse-coding each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes;
S3, performing convolution in the convolutional layer following that layer between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
Specifically, step S1 includes:
for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map;
taking the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map.
Specifically, the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is less than the second preset threshold.
Correspondingly, step S2 includes:
if the density of a feature map is less than the first preset threshold, encoding the feature map in a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map;
if the density of a feature map is greater than or equal to the second preset threshold, not sparse-coding the feature map.
Specifically, before step S3 the method further includes:
computing the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is less than the first preset threshold, encoding the convolution kernel in a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparse-coding the convolution kernel.
Specifically, step S3 includes:
when a mark is present in a feature map or a convolution kernel, not computing the elements corresponding to the mark in that feature map or convolution kernel.
According to another aspect of the invention, an accelerator for convolutional neural networks is provided, including a neural network computing array module and a dynamic sparse adjustment module.
The dynamic sparse adjustment module computes the density of each feature map output by each layer of the convolutional neural network, compares the density of each feature map with multiple preset thresholds, and sparse-codes each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes.
The neural network computing array module performs the convolution operation between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
Specifically, the dynamic sparse adjustment module includes an on-line density identification module, an output temporary-register module, a dynamic coding module, and a dynamic sparse control module.
The on-line density identification module, for any feature map, counts the number of non-zero elements in the feature map and the total number of elements in the feature map, and takes the ratio of the number of non-zero elements to the total number of elements as the density of the feature map.
The output temporary-register module stores each feature map output by each layer of the convolutional neural network.
The dynamic sparse control module compares the density of each feature map output by the on-line density identification module with the multiple preset thresholds.
The dynamic coding module sparse-codes each feature map in the output temporary-register module according to the comparison result.
Specifically, the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is less than the second preset threshold.
Correspondingly, the dynamic coding module is specifically configured to:
if the density of a feature map is less than the first preset threshold, encode the feature map in a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the feature map;
if the density of a feature map is greater than or equal to the second preset threshold, not sparse-code the feature map.
Specifically, the dynamic coding module is further configured to:
if the precomputed density of a convolution kernel is less than the first preset threshold, encode the convolution kernel in a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, mark the 0 elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparse-code the convolution kernel.
Specifically, the neural network computing array module is configured to:
when a mark is present in a feature map or a convolution kernel, not compute the elements corresponding to the mark in that feature map or convolution kernel.
The present invention provides an acceleration method and accelerator for convolutional neural networks. The method compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map, applies a different sparse coding mode to feature maps in different sparsity states, and then, in the convolutional layer following each layer, performs the convolution operation between each sparse-coded feature map and the pre-sparse-coded convolution kernels of the convolutional neural network, reducing the amount of convolution computation in the convolutional neural network and improving computation speed.
Description of the drawings
Fig. 1 is an overall flow diagram of the acceleration method for convolutional neural networks provided by an embodiment of the present invention;
Fig. 2 is an overall structural diagram of the accelerator for convolutional neural networks provided by an embodiment of the present invention;
Fig. 3 is a diagram of energy efficiency test results of the accelerator for convolutional neural networks provided by an embodiment of the present invention;
Fig. 4 is a comparison diagram of energy efficiency test results of the accelerator for convolutional neural networks provided by an embodiment of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
One embodiment of the present invention provides an acceleration method for convolutional neural networks. Fig. 1 is an overall flow diagram of the acceleration method for convolutional neural networks provided by this embodiment. The method includes:
S1, for any layer in the convolutional neural network, separately computing the density of each feature map output by that layer;
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is complete, the convolution kernels in the convolutional neural network no longer change, so they do not require dynamic on-line sparse coding and can be sparse-coded once, directly off-line. Here, on-line means on the chip of the accelerator, and off-line means not on the chip of the accelerator. In each convolution operation, the sparse-coded convolution kernels are read directly for the convolution computation. When raw image data is input, the raw image data is sparse-coded, and then the sparse-coded raw data and the sparse-coded convolution kernels are fed to the first convolutional layer of the convolutional neural network for convolution computation. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse coding. Sparse coding means storing data in a sparse format.
In S1, since the density of each feature map output by each layer of the convolutional neural network is different, and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates how sparse each feature map is. To better improve the computation speed of the convolutional neural network, the density of each feature map output by each layer is computed, and each feature map output by each layer is sparse-coded according to its density.
S2, comparing the density of each feature map output by that layer with multiple preset thresholds and sparse-coding each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes;
In S2, the prior art sparse-codes every feature map output by each layer, which is computationally expensive. This embodiment obtains the sparsity state of each feature map output by the layer according to the preset thresholds, and thus applies different forms of sparse coding to feature maps in different sparsity states.
S3, performing convolution in the convolutional layer following that layer between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
In S3, each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network are taken as the input of the convolutional layer following that layer for the convolution operation. The result of the convolution operation, as the feature maps output by that convolutional layer, is then taken as the input of the next layer, and the above sparse coding and convolution operations are repeated until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not limit the sparse coding mode of the convolution kernels.
This embodiment compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map, applies a different sparse coding mode to feature maps in different sparsity states, and then, in the convolutional layer following each layer, performs the convolution operation between each sparse-coded feature map and the pre-sparse-coded convolution kernels of the convolutional neural network, reducing the amount of convolution computation in the convolutional neural network and improving computation speed.
On the basis of the above embodiment, step S1 in this embodiment specifically includes: for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map, and taking the ratio of the number of non-zero elements to the total number of elements as the density of the feature map.
Specifically, the density of a feature map is the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map. For example, if a feature map has 10 non-zero elements and 100 elements in total, its density is 0.1.
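The density computation described above can be sketched as follows. This is a minimal illustration only, not the patented hardware implementation; the function and array names are invented for the example:

```python
import numpy as np

def density(feature_map):
    """Density = number of non-zero elements / total number of elements."""
    return np.count_nonzero(feature_map) / feature_map.size

# A 10x10 feature map with 10 non-zero elements, matching the example above.
fm = np.zeros((10, 10))
fm.flat[:10] = 1.0
print(density(fm))  # 0.1
```

A fully dense map has density 1.0 and an all-zero map has density 0.0, so the value always falls in [0, 1].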
On the basis of the above embodiment, the preset thresholds in this embodiment include a first preset threshold and a second preset threshold. Correspondingly, step S2 specifically includes: if the density of a feature map is less than the first preset threshold, encoding the feature map in a sparse matrix storage format; if the density of a feature map is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the feature map; if the density of a feature map is greater than or equal to the second preset threshold, not sparse-coding the feature map.
Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. According to the first preset threshold and the second preset threshold, the activation state AS of each feature map is divided into three states: feature maps whose density is less than the first preset threshold are classified as the fully sparse state S; feature maps whose density is greater than or equal to the first preset threshold and less than the second preset threshold are classified as the moderately sparse state M; and feature maps whose density is greater than or equal to the second preset threshold are classified as the fully non-sparse state D. If a feature map is in the sparse state S, it is encoded in a sparse matrix storage format, which includes the non-zero data activ in the feature map and a sparse index index, for example coordinate encoding or compressed sparse row encoding. Encoding a feature map in a sparse matrix storage format saves a large amount of storage space and, at the same time, a large amount of computation time. If a feature map is in the moderately sparse state M, a mark guard is added to the 0 elements in the feature map; the mark identifies the 0 elements. Marked elements can be excluded from computation and storage, reducing power consumption. Marking the 0 elements in a feature map is also a form of sparse coding. If a feature map is in the fully non-sparse state D, no dynamic coding is needed, and the non-sparse data of the feature map is output directly.
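The three-way encoding rule can be sketched as below. The threshold values th1 = 0.3 and th2 = 0.7 are illustrative only (the patent does not fix them), and the function name and tuple layout are invented for the example; the S branch uses a coordinate-style sparse storage as one of the formats the embodiment mentions:

```python
import numpy as np

def encode(fm, th1=0.3, th2=0.7):
    """Classify a feature map as S / M / D by density and encode it accordingly."""
    d = np.count_nonzero(fm) / fm.size
    if d < th1:
        # Fully sparse state S: store only the non-zero data plus a sparse index
        # (coordinate encoding here; compressed sparse row would also fit).
        idx = np.nonzero(fm)
        return ("S", fm[idx], np.stack(idx, axis=-1))
    elif d < th2:
        # Moderately sparse state M: keep the dense data but mark the 0 elements.
        return ("M", fm, fm == 0)
    else:
        # Fully non-sparse state D: output the data directly, no coding.
        return ("D", fm)

demo = np.zeros((8, 8))
demo[0, 0] = 5.0
print(encode(demo)[0])  # S  (density 1/64)
```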
On the basis of the above embodiment, before step S3 this embodiment further includes: computing the density of each convolution kernel in the trained convolutional network; if the density of a convolution kernel is less than the first preset threshold, encoding the convolution kernel in a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first preset threshold and less than the second preset threshold, marking the 0 elements in the convolution kernel; if the density of a convolution kernel is greater than or equal to the second preset threshold, not encoding the convolution kernel.
Specifically, the density of a convolution kernel is the ratio of the number of non-zero elements in the kernel to the total number of elements in the kernel. Like the feature maps, the state WS of each convolution kernel is divided into three states, each corresponding to a different sparse coding mode. Since feature maps and convolution kernels each have three states, there are 9 combined states in total, giving a finer-grained division of the density of the convolutional neural network.
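The 9 combined states can be enumerated directly. A small sketch under assumed names (the patent labels the states AS and WS but does not give code):

```python
from itertools import product

FEATURE_STATES = ("S", "M", "D")   # activation state AS of a feature map
KERNEL_STATES = ("S", "M", "D")    # state WS of a convolution kernel

# Each (AS, WS) pair selects a different processing mode in the accelerator.
combined = list(product(FEATURE_STATES, KERNEL_STATES))
print(len(combined))  # 9
```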
On the basis of the above embodiments, step S3 in this embodiment specifically includes: when a mark is present in a feature map or a convolution kernel, not computing the elements corresponding to the mark in that feature map or convolution kernel.
Specifically, when a feature map or convolution kernel is in the fully sparse state S, the zeros are removed before input, reducing storage space while avoiding computation on the 0 elements. When a feature map or convolution kernel is in the moderately sparse state M, the 0 elements are still stored, but the elements corresponding to the marks are not computed, reducing computation.
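The mark-aware multiply-accumulate for state M can be sketched as below: any position marked as a 0 element in either operand is skipped entirely. Function and variable names are invented for illustration; the actual accelerator realizes this gating in hardware:

```python
def masked_dot(activations, weights, act_mask, w_mask):
    """Multiply-accumulate that skips any position marked as a 0 element."""
    acc = 0.0
    for a, w, a_marked, w_marked in zip(activations, weights, act_mask, w_mask):
        if a_marked or w_marked:
            continue  # marked element: the multiplication is never performed
        acc += a * w
    return acc

acts = [1.0, 0.0, 2.0]
wts = [3.0, 4.0, 0.0]
# Marks derived from the zeros; only index 0 contributes to the sum.
print(masked_dot(acts, wts, [a == 0 for a in acts], [w == 0 for w in wts]))  # 3.0
```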
Another embodiment of the present invention provides an accelerator for convolutional neural networks. Fig. 2 is an overall structural diagram of the accelerator for convolutional neural networks provided by this embodiment. The accelerator includes a neural network computing array module and a dynamic sparse adjustment module. The dynamic sparse adjustment module computes the density of each feature map output by each layer of the convolutional neural network, compares the density of each feature map with multiple preset thresholds, and sparse-codes each feature map according to the comparison result, where different comparison results correspond to different sparse coding modes. The neural network computing array module performs the convolution operation between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network.
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is first trained; after training is complete, the convolution kernels no longer change, so they do not require dynamic on-line sparse coding and are sparse-coded once, directly off-line. In each convolution operation, the neural network computing array module reads the off-line sparse-coded convolution kernels directly for the convolution computation. When raw image data is input to the convolutional neural network, the dynamic sparse adjustment module sparse-codes the raw image data, and the neural network computing array module then performs the convolution computation with the sparse-coded raw data and the sparse-coded convolution kernels. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse coding. Sparse coding means storing data in a sparse format.
Since the density of each feature map output by each layer of the convolutional neural network is different, and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates how sparse each feature map is. To better improve the computation speed of the convolutional neural network, the dynamic sparse adjustment module computes the density of each feature map output by each layer and sparse-codes each feature map according to its density.
The dynamic sparse adjustment module obtains the sparsity state of each feature map output by the layer according to the multiple preset thresholds, and thus applies different forms of sparse coding to feature maps in different sparsity states rather than being limited to a single sparse coding. In the prior art, every feature map output by each layer is sparse-coded, which is computationally expensive.
The neural network computing array module performs the convolution operation between each sparse-coded feature map and each pre-sparse-coded convolution kernel of the convolutional neural network. If a pooling module is included, the pooling module performs a pooling operation on the result of the convolution operation. In addition, the accelerator further includes an intermediate data storage module, a main chip controller, and an on/off-chip data exchange module. The main controller controls the run behavior and timing of the entire accelerator chip. The on/off-chip data exchange module reads data from memory outside the chip, or writes data computed by the chip to external storage. For example, after initialization, the chip reads the raw image data and the initial convolution kernels from external memory through the on/off-chip data exchange module under the control of the main controller. The intermediate data storage module stores intermediate results produced during computation by the neural network computing array module.
In this embodiment, the dynamic sparse adjustment module compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map and applies a different sparse coding mode to feature maps in different sparsity states, and the neural network computing array module performs the convolution operation between each sparse-coded feature map and the pre-sparse-coded convolution kernels of the convolutional neural network. On the one hand, this reduces the amount of convolution computation in the convolutional neural network and improves computation speed; on the other hand, the processing state of the accelerator is switched dynamically according to the sparsity state, improving the accelerator's flexibility.
On the basis of the above embodiment, the dynamic sparse adjustment module in this embodiment includes an on-line density identification module, an output temporary-register module, a dynamic coding module, and a dynamic sparse control module. The on-line density identification module, for any feature map, counts the number of non-zero elements in the feature map and the total number of elements in the feature map, and takes the ratio of the number of non-zero elements to the total number of elements as the density of the feature map. The output temporary-register module stores each feature map output by each layer of the convolutional neural network. The dynamic sparse control module compares the density of each feature map output by the on-line density identification module with the multiple preset thresholds. The dynamic coding module sparse-codes each feature map in the output temporary-register module according to the comparison result.
Specifically, the dynamic sparse adjustment module comprises the above four modules. The on-line density identification module counts the number of non-zero elements in each feature map during computation, so as to calculate the density of each feature map. The output temporary register module temporarily holds the feature maps output by each layer of the convolutional neural network in a non-sparse format. The dynamic sparsity control module controls the sparsity state of each feature map through the preset multiple predetermined thresholds. According to the sparsity state of each feature map, the dynamic encoding module sparsely encodes each feature map in the output temporary register module, so as to improve the speed of the convolution operation.
On the basis of the above embodiment, the predetermined thresholds of this embodiment include a first predetermined threshold and a second predetermined threshold. Correspondingly, the dynamic encoding module is specifically configured to: if the density of a feature map is less than the first predetermined threshold, encode the feature map into a sparse matrix storage format; if the density of a feature map is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the feature map; and if the density of a feature map is greater than or equal to the second predetermined threshold, leave the feature map unencoded.
Specifically, the predetermined thresholds of this embodiment include a first predetermined threshold th1 and a second predetermined threshold th2. According to th1 and th2, the dynamic sparsity control module divides the sparsity state AS of each feature map into three states: a feature map whose density is less than the first predetermined threshold is assigned the fully sparse state S; a feature map whose density is greater than or equal to the first predetermined threshold and less than the second predetermined threshold is assigned the moderately sparse state M; and a feature map whose density is greater than or equal to the second predetermined threshold is assigned the fully dense state D.
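The three-way classification above can be sketched as a simple threshold comparison; `sparsity_state` and the example threshold values are hypothetical illustrations:

```python
def sparsity_state(density: float, th1: float, th2: float) -> str:
    """Map a feature-map density to one of the three states S, M, D.

    Assumes th1 < th2, as stated in claim 3 of the patent.
    """
    if density < th1:
        return "S"  # fully sparse: encode into a sparse matrix storage format
    if density < th2:
        return "M"  # moderately sparse: mark the zero elements
    return "D"      # fully dense: leave unencoded

# Example with hypothetical thresholds th1 = 0.2, th2 = 0.6
print(sparsity_state(0.05, 0.2, 0.6))  # S
print(sparsity_state(0.40, 0.2, 0.6))  # M
print(sparsity_state(0.90, 0.2, 0.6))  # D
```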
If a feature map is in the fully sparse state S, the dynamic encoding module encodes the feature map in the output temporary register module into a sparse matrix storage format, which comprises the non-zero data activ of the feature map and a sparse index, for example a coordinate (COO) encoding or a compressed sparse row (CSR) encoding. Encoding the feature map into a sparse matrix storage format saves a large amount of storage space while also saving a large amount of computation time. If a feature map is in the moderately sparse state M, the dynamic encoding module attaches a mark to the zero elements of the feature map in the output temporary register module; the marked elements take no part in computation or storage, which reduces power consumption. If a feature map is in the fully dense state D, no dynamic encoding is needed, and the dynamic encoding module directly outputs the non-sparse data of the feature map.
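As an illustration of the sparse matrix storage format, a minimal coordinate (COO) encoding might look like the following; the names `activ` and `index` follow the patent text, while `encode_coo` is a hypothetical helper, not the accelerator's actual hardware encoder:

```python
import numpy as np

def encode_coo(feature_map: np.ndarray):
    """Coordinate (COO) encoding: keep only the non-zero data plus its indices."""
    rows, cols = np.nonzero(feature_map)
    activ = feature_map[rows, cols]                  # non-zero data
    index = list(zip(rows.tolist(), cols.tolist()))  # sparse index
    return activ, index

fm = np.array([[0, 7, 0],
               [0, 0, 0],
               [4, 0, 0]])
activ, index = encode_coo(fm)
print(activ.tolist())  # [7, 4]
print(index)           # [(0, 1), (2, 0)]
```

For a fully sparse (state S) map, only `activ` and `index` need to be stored and processed, which is the source of the storage and computation savings described above.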
On the basis of the above embodiment, the dynamic encoding module of this embodiment is further configured to: if the precalculated density of a convolution kernel is less than the first predetermined threshold, encode the convolution kernel into a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the convolution kernel; and if the density of a convolution kernel is greater than or equal to the second predetermined threshold, leave the convolution kernel unencoded.
Specifically, the density of a convolution kernel is the ratio between the number of non-zero elements in the kernel and the total number of elements in the kernel. Like the feature maps, the state WS of each convolution kernel takes one of three states, and each state corresponds to a different sparse encoding mode. Since the feature maps and the convolution kernels each have three states, there are nine states in total after combination, which gives a finer-grained division of the density of the convolutional neural network.
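The nine combined states are simply the Cartesian product of the feature-map states and the kernel states; a one-line sketch:

```python
from itertools import product

feature_states = ["S", "M", "D"]  # feature-map sparsity states
kernel_states = ["S", "M", "D"]   # convolution-kernel sparsity states

combined = list(product(feature_states, kernel_states))
print(len(combined))  # 9 combined (feature map, kernel) states
```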
On the basis of the above embodiments, the neural network computing array module of this embodiment is specifically configured to: when a mark is present in a feature map or a convolution kernel, perform no computation on the marked elements of that feature map or convolution kernel.
Specifically, when a feature map or a convolution kernel is in the fully sparse state S, the zeros are removed before the feature map or convolution kernel is fed into the neural network computing array module, which reduces storage space while also avoiding any computation on zero elements. When a feature map or a convolution kernel is in the moderately sparse state M, the zero elements of the feature map or convolution kernel are still stored, but the marked elements take no part in computation, which reduces the amount of computation.
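The skip-on-mark behavior for moderately sparse (state M) operands can be sketched as follows; `masked_mac` and the explicit mask lists are hypothetical illustrations of the idea, not the patent's actual hardware interface:

```python
def masked_mac(activations, weights, act_marks, w_marks):
    """Multiply-accumulate that skips any element pair where either
    operand carries a zero mark (moderately sparse state M)."""
    total = 0
    for a, w, ma, mw in zip(activations, weights, act_marks, w_marks):
        if ma or mw:   # marked zero element: no multiplication performed
            continue
        total += a * w
    return total

acts = [3, 0, 2, 5]
wts  = [1, 4, 0, 2]
a_mk = [False, True, False, False]  # marks the zeros in the activations
w_mk = [False, False, True, False]  # marks the zeros in the weights
print(masked_mac(acts, wts, a_mk, w_mk))  # 3*1 + 5*2 = 13
```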
For example, a chip of the accelerator was fabricated in a TSMC 65 nm process. The chip area is 3 mm x 4 mm, the operating frequency is 20-200 MHz, and the power consumption is 20.5-248.4 mW. In this embodiment, the peak energy efficiency rises rapidly as the density of the feature maps and convolution kernels falls, as shown in Figure 3. When the density of the feature maps and convolution kernels is 5%, the peak energy efficiency reaches 62.1 TOPS/W, 6.2 times the peak energy efficiency obtained without the accelerator of this embodiment. As shown in Figure 4, compared with an implementation that supports only feature-map sparsity, the energy efficiency of this embodiment is improved by 4.3 times; compared with an implementation without adaptive sparsity control, by 2.8 times; and compared with an implementation without density control but with variable quantization precision, by 2 times.
Finally, the above methods are only preferred embodiments and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. An acceleration method applied to a convolutional neural network, characterized by comprising:
S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by the layer;
S2, comparing the density of each feature map output by the layer with multiple predetermined thresholds, and sparsely encoding each feature map according to the comparison result, wherein different comparison results correspond to different sparse encoding modes;
S3, performing, by the convolutional layer next after the layer, convolution between each sparsely encoded feature map and each pre-encoded convolution kernel of the convolutional neural network.
2. The method according to claim 1, characterized in that step S1 specifically comprises:
for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map;
taking the ratio between the number of non-zero elements in the feature map and the total number of elements in the feature map as the density of the feature map.
3. The method according to claim 1, characterized in that the predetermined thresholds include a first predetermined threshold and a second predetermined threshold, wherein the first predetermined threshold is less than the second predetermined threshold;
correspondingly, step S2 specifically comprises:
if the density of a feature map is less than the first predetermined threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, marking the zero elements in the feature map;
if the density of a feature map is greater than or equal to the second predetermined threshold, not sparsely encoding the feature map.
4. The method according to claim 3, characterized by further comprising, before step S3:
calculating the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is less than the first predetermined threshold, encoding the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, marking the zero elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second predetermined threshold, not sparsely encoding the convolution kernel.
5. The method according to claim 3 or 4, characterized in that step S3 specifically comprises:
when the mark is present in a feature map or a convolution kernel, performing no computation on the elements corresponding to the mark in the feature map or convolution kernel.
6. An accelerator applied to a convolutional neural network, characterized by comprising a neural network computing array module and a dynamic sparse adjustment module;
wherein the dynamic sparse adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with multiple predetermined thresholds, and sparsely encode each feature map according to the comparison result, wherein different comparison results correspond to different sparse encoding modes;
the neural network computing array module is configured to perform a convolution operation between each sparsely encoded feature map and each pre-encoded convolution kernel of the convolutional neural network.
7. The accelerator according to claim 6, characterized in that the dynamic sparse adjustment module comprises an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparsity control module;
wherein the on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of elements in the feature map, and take the ratio between the number of non-zero elements in the feature map and the total number of elements in the feature map as the density of the feature map;
the output temporary register module is configured to store each feature map output by each layer of the convolutional neural network;
the dynamic sparsity control module is configured to compare the density of each feature map output by the on-line density identification module with the multiple predetermined thresholds;
the dynamic encoding module is configured to sparsely encode each feature map in the output temporary register module according to the comparison result.
8. The accelerator according to claim 7, characterized in that the predetermined thresholds include a first predetermined threshold and a second predetermined threshold, wherein the first predetermined threshold is less than the second predetermined threshold;
correspondingly, the dynamic encoding module is specifically configured to:
if the density of a feature map is less than the first predetermined threshold, encode the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the feature map;
if the density of a feature map is greater than or equal to the second predetermined threshold, not sparsely encode the feature map.
9. The accelerator according to claim 8, characterized in that the dynamic encoding module is further configured to:
if the precalculated density of a convolution kernel is less than the first predetermined threshold, encode the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first predetermined threshold and less than the second predetermined threshold, mark the zero elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second predetermined threshold, not sparsely encode the convolution kernel.
10. The accelerator according to claim 7 or 8, characterized in that the neural network computing array module is specifically configured to:
when the mark is present in a feature map or a convolution kernel, perform no computation on the elements corresponding to the mark in the feature map or convolution kernel.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306577.3A CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
PCT/CN2018/095365 WO2019196223A1 (en) | 2018-04-08 | 2018-07-12 | Acceleration method and accelerator used for convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810306577.3A CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108510063A true CN108510063A (en) | 2018-09-07 |
CN108510063B CN108510063B (en) | 2020-03-20 |
Family
ID=63380995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810306577.3A Active CN108510063B (en) | 2018-04-08 | 2018-04-08 | Acceleration method and accelerator applied to convolutional neural network |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108510063B (en) |
WO (1) | WO2019196223A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389043A (en) * | 2018-09-10 | 2019-02-26 | 中国人民解放军陆军工程大学 | A kind of crowd density estimation method of unmanned plane picture |
CN109409518A (en) * | 2018-10-11 | 2019-03-01 | 北京旷视科技有限公司 | Neural network model processing method, device and terminal |
CN109784484A (en) * | 2019-01-31 | 2019-05-21 | 深兰科技(上海)有限公司 | Neural network accelerated method, device, neural network accelerate chip and storage medium |
CN109858575A (en) * | 2019-03-19 | 2019-06-07 | 苏州市爱生生物技术有限公司 | Data classification method based on convolutional neural networks |
CN110097172A (en) * | 2019-03-18 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of convolutional neural networks data processing method and device based on winograd convolution algorithm |
CN110443357A (en) * | 2019-08-07 | 2019-11-12 | 上海燧原智能科技有限公司 | Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium |
CN110909801A (en) * | 2019-11-26 | 2020-03-24 | 山东师范大学 | Data classification method, system, medium and device based on convolutional neural network |
CN111291230A (en) * | 2020-02-06 | 2020-06-16 | 北京奇艺世纪科技有限公司 | Feature processing method and device, electronic equipment and computer-readable storage medium |
CN113537465A (en) * | 2021-07-07 | 2021-10-22 | 深圳市易成自动驾驶技术有限公司 | LSTM model optimization method, accelerator, device and medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401554B (en) * | 2020-03-12 | 2023-03-24 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization |
WO2023164855A1 (en) * | 2022-03-03 | 2023-09-07 | Intel Corporation | Apparatus and method for 3d dynamic sparse convolution |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239825A (en) * | 2016-08-22 | 2017-10-10 | 北京深鉴智能科技有限公司 | Consider the deep neural network compression method of load balancing |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184362B (en) * | 2015-08-21 | 2018-02-02 | 中国科学院自动化研究所 | The acceleration of the depth convolutional neural networks quantified based on parameter and compression method |
US10380479B2 (en) * | 2015-10-08 | 2019-08-13 | International Business Machines Corporation | Acceleration of convolutional neural network training using stochastic perforation |
CN107609641B (en) * | 2017-08-30 | 2020-07-03 | 清华大学 | Sparse neural network architecture and implementation method thereof |
2018
- 2018-04-08 CN CN201810306577.3A patent/CN108510063B/en active Active
- 2018-07-12 WO PCT/CN2018/095365 patent/WO2019196223A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019196223A1 (en) | 2019-10-17 |
CN108510063B (en) | 2020-03-20 |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||