CN108510063A - Acceleration method and accelerator applied to a convolutional neural network - Google Patents

Acceleration method and accelerator applied to a convolutional neural network Download PDF

Info

Publication number
CN108510063A
CN108510063A CN201810306577.3A
Authority
CN
China
Prior art keywords
feature map
preset threshold
threshold
density
convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810306577.3A
Other languages
Chinese (zh)
Other versions
CN108510063B (en)
Inventor
刘勇攀
袁哲
岳金山
杨华中
李学清
王智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810306577.3A priority Critical patent/CN108510063B/en
Priority to PCT/CN2018/095365 priority patent/WO2019196223A1/en
Publication of CN108510063A publication Critical patent/CN108510063A/en
Application granted granted Critical
Publication of CN108510063B publication Critical patent/CN108510063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method includes: S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by that layer; S2, comparing the density of each feature map output by the layer with multiple preset thresholds, and sparsely encoding each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes; S3, based on the next convolutional layer after the layer, performing convolution on each sparsely encoded feature map and each convolution kernel in the convolutional neural network that has been sparsely encoded in advance. The present invention reduces the amount of convolution computation in the convolutional neural network and improves the computation speed.

Description

Acceleration method and accelerator applied to a convolutional neural network
Technical field
The invention belongs to the technical field of computation optimization, and more particularly relates to an acceleration method and an accelerator applied to a convolutional neural network.
Background technology
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a local receptive field, which makes it well suited to processing large-scale images. Convolutional neural networks are widely used in fields such as image recognition and speech recognition, but their computational cost is very large.
The activation function ReLU (Rectified Linear Unit) in a convolutional neural network produces a large number of sparse feature maps; meanwhile, training the convolutional neural network with methods such as pruning produces a large amount of sparse weight data. Exploiting the sparsity of the feature maps and the weight data can greatly improve the computational efficiency of a convolutional neural network. At present, many methods improve computation speed based on the sparsity of feature maps and weight data in convolutional neural networks. These methods fall roughly into two classes. One class focuses on skipping zero values; for example, some methods remove the zero values in the input to eliminate invalid computations whose input is zero. The other class gates zeros; for example, some methods do not execute the multiplication when the input data is zero, thereby reducing computation. However, these methods all focus on processing a sparse neural network itself and take a sparse neural network as a premise. In practice, the feature maps output by each layer of a convolutional neural network may be sparse or non-sparse, and the density of the weight data and the feature maps of each layer is generally distributed between 5% and 90%.
A sparse matrix is a matrix in which the number of zero-valued elements far exceeds the number of non-zero elements and the non-zero elements are distributed irregularly. On the one hand, the prior art can only handle convolutional neural networks that are sparse; when the convolutional neural network is not sparse, the computational cost is very large and the computation speed is low. On the other hand, the prior art can only handle the case where either the weight data or the feature maps of the convolutional neural network are sparse, and cannot handle the case where both the weight data and the feature maps are sparse.
Summary of the invention
To overcome the above problem of low computation speed of convolutional neural networks, or to at least partly solve the above problem, the present invention provides an acceleration method and an accelerator applied to a convolutional neural network.
According to a first aspect of the present invention, an acceleration method applied to a convolutional neural network is provided, including:
S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by that layer;
S2, comparing the density of each feature map output by the layer with multiple preset thresholds, and sparsely encoding each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes;
S3, based on the next convolutional layer after the layer, performing convolution on each sparsely encoded feature map and each convolution kernel in the convolutional neural network that has been sparsely encoded in advance.
Specifically, step S1 specifically includes:
for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map;
taking the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map.
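For illustration only, the density computation of step S1 can be sketched in a few lines of Python with NumPy; the function name and the array representation of the feature map are assumptions made for this sketch and are not part of the patent.

    import numpy as np

    def feature_map_density(feature_map: np.ndarray) -> float:
        # Ratio of the number of non-zero elements to the total number of elements.
        return np.count_nonzero(feature_map) / feature_map.size

    # A feature map with 10 non-zero elements out of 100 elements has density 0.1.
    fm = np.zeros((10, 10))
    fm[:2, :5] = 1.0
    print(feature_map_density(fm))  # 0.1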
Specifically, the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is smaller than the second preset threshold.
Correspondingly, step S2 specifically includes:
if the density of a feature map is smaller than the first preset threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in the feature map;
if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encoding the feature map.
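A minimal software sketch of this three-way comparison is given below, assuming the two thresholds th1 < th2; the enum and function names are illustrative and are not taken from the patent.

    from enum import Enum

    class SparsityState(Enum):
        FULLY_SPARSE = "S"       # encode into a sparse matrix storage format
        MODERATELY_SPARSE = "M"  # keep the dense layout but mark the zero elements
        DENSE = "D"              # no sparse encoding

    def classify(density: float, th1: float, th2: float) -> SparsityState:
        # Map a feature-map (or convolution-kernel) density to an encoding mode.
        assert th1 < th2
        if density < th1:
            return SparsityState.FULLY_SPARSE
        if density < th2:
            return SparsityState.MODERATELY_SPARSE
        return SparsityState.DENSE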
Specifically, before step S3, the method further includes:
calculating the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is smaller than the first preset threshold, encoding the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely encoding the convolution kernel.
Specifically, step S3 specifically includes:
when the marks exist in a feature map or a convolution kernel, performing no computation on the marked elements in the feature map or the convolution kernel.
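As a purely software-level illustration of skipping marked elements (the patent describes a hardware computing array; the boolean-mask representation below is an assumption for this sketch), a 2-D multiply-accumulate loop could skip the flagged positions as follows.

    import numpy as np

    def conv2d_skip_marked(fm, fm_mark, kernel, k_mark):
        # Valid-mode 2-D convolution (CNN-style correlation) that skips marked elements.
        # fm_mark and k_mark are boolean masks in which True means "zero element,
        # marked, do not compute" (the moderately sparse case described above).
        H, W = fm.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                acc = 0.0
                for u in range(kh):
                    for v in range(kw):
                        if fm_mark[i + u, j + v] or k_mark[u, v]:
                            continue  # marked zero: the multiplication is skipped
                        acc += fm[i + u, j + v] * kernel[u, v]
                out[i, j] = acc
        return out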
According to a further aspect of the present invention, an accelerator applied to a convolutional neural network is provided, including: a neural network computing array module and a dynamic sparsity adjustment module;
where the dynamic sparsity adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with multiple preset thresholds, and sparsely encode each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes;
the neural network computing array module is configured to perform convolution operations on each sparsely encoded feature map and each convolution kernel in the convolutional neural network that has been sparsely encoded in advance.
Specifically, the dynamic sparsity adjustment module includes an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparsity control module;
where the on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of elements in the feature map, and take the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map;
the output temporary register module is configured to store each feature map output by each layer of the convolutional neural network;
the dynamic sparsity control module is configured to compare the density of each feature map output by the on-line density identification module with the multiple preset thresholds;
the dynamic encoding module is configured to sparsely encode each feature map in the output temporary register module according to the comparison result.
Specifically, the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is smaller than the second preset threshold.
Correspondingly, the dynamic encoding module is specifically configured to:
encode a feature map into a sparse matrix storage format if its density is smaller than the first preset threshold;
mark the zero elements in a feature map if its density is greater than or equal to the first preset threshold and smaller than the second preset threshold;
perform no sparse encoding on a feature map if its density is greater than or equal to the second preset threshold.
Specifically, the dynamic encoding module is further configured to:
encode a convolution kernel into a sparse matrix storage format if its precomputed density is smaller than the first preset threshold;
mark the zero elements in a convolution kernel if its density is greater than or equal to the first preset threshold and smaller than the second preset threshold;
perform no sparse encoding on a convolution kernel if its density is greater than or equal to the second preset threshold.
Specifically, the neural network computing array module is specifically configured to:
when the marks exist in a feature map or a convolution kernel, perform no computation on the marked elements in the feature map or the convolution kernel.
The present invention provides an acceleration method and an accelerator applied to a convolutional neural network. The method compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map, applies different sparse encoding modes to feature maps in different sparsity states, and then, based on the next convolutional layer after each layer, performs convolution operations on the sparsely encoded feature maps and the convolution kernels of the convolutional neural network that have been sparsely encoded in advance. This reduces the amount of convolution computation in the convolutional neural network and improves the computation speed.
Description of the drawings
Fig. 1 is an overall flow diagram of the acceleration method applied to a convolutional neural network provided by an embodiment of the present invention;
Fig. 2 is an overall structural diagram of the accelerator applied to a convolutional neural network provided by an embodiment of the present invention;
Fig. 3 is a diagram of the peak energy efficiency test results of the accelerator applied to a convolutional neural network provided by an embodiment of the present invention;
Fig. 4 is a comparison diagram of the peak energy efficiency test results of the accelerator applied to a convolutional neural network provided by an embodiment of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention, not to limit its scope.
An embodiment of the present invention provides an acceleration method applied to a convolutional neural network. Fig. 1 is an overall flow diagram of the acceleration method applied to a convolutional neural network provided by this embodiment. The method includes:
S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by that layer;
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is trained first; the convolution kernels in the trained convolutional neural network no longer change, so the convolution kernels do not need dynamic on-line sparse encoding and can be sparsely encoded once off-line. Here, on-line refers to processing on the accelerator chip, and off-line refers to processing not on the accelerator chip. In each convolution operation, the sparsely encoded convolution kernels are read directly for convolution computation. When raw image data is input, the raw image data is sparsely encoded, and then the sparsely encoded raw data and the sparsely encoded convolution kernels are input to the first convolutional layer of the convolutional neural network for convolution computation. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse encoding. Sparse encoding means storing the data in a sparse format.
In S1, because the density of the feature maps output by each layer of the convolutional neural network differs and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates the degree of sparsity of each feature map. To better improve the computation speed of the convolutional neural network, the density of each feature map output by each layer is calculated, so that each feature map output by each layer can be sparsely encoded according to its density.
S2, comparing the density of each feature map output by the layer with multiple preset thresholds, and sparsely encoding each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes;
In S2, the prior art sparsely encodes all feature maps output by each layer, which is computationally expensive. In this embodiment, the sparsity state of each feature map output by the layer is obtained according to the preset thresholds, so that feature maps in different sparsity states are sparsely encoded in different forms.
S3, based on the next convolutional layer after the layer, performing convolution on each sparsely encoded feature map and each convolution kernel in the convolutional neural network that has been sparsely encoded in advance.
In S3, each sparsely encoded feature map and each pre-encoded convolution kernel in the convolutional neural network are used as the input of the next convolutional layer after the layer to perform convolution operations. Then, the result of the convolution operation is used as the feature map output by that next convolutional layer and as the input of the layer after it, and the above sparse encoding and convolution operations are repeated until the last layer of the convolutional neural network outputs its feature maps. This embodiment does not limit the sparse encoding mode of the convolution kernels.
This embodiment compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map, applies different sparse encoding modes to feature maps in different sparsity states, and then, based on the next convolutional layer after each layer, performs convolution operations on the sparsely encoded feature maps and the pre-encoded convolution kernels in the convolutional neural network, thereby reducing the amount of convolution computation in the convolutional neural network and improving the computation speed.
On the basis of the above embodiment, in this embodiment step S1 specifically includes: for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map; and taking the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map.
Specifically, the density of a feature map is the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map. For example, if a feature map has 10 non-zero elements and 100 elements in total, the density of the feature map is 0.1.
On the basis of the above embodiment, in this embodiment the preset thresholds include a first preset threshold and a second preset threshold. Correspondingly, step S2 specifically includes: if the density of a feature map is smaller than the first preset threshold, encoding the feature map into a sparse matrix storage format; if the density of a feature map is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in the feature map; if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encoding the feature map.
Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. The sparsity state AS of each feature map is divided into three states according to the first preset threshold and the second preset threshold: a feature map whose density is smaller than the first preset threshold is classified as fully sparse (state S), a feature map whose density is greater than or equal to the first preset threshold and smaller than the second preset threshold is classified as moderately sparse (state M), and a feature map whose density is greater than or equal to the second preset threshold is classified as fully dense (state D). If a feature map is in state S, the feature map is encoded into a sparse matrix storage format, which includes the non-zero data (activ) and the sparse index (index) of the feature map, for example coordinate encoding or compressed sparse row encoding. Encoding a feature map into a sparse matrix storage format saves a large amount of storage space and a large amount of computation time. If a feature map is in the moderately sparse state M, a mark (guard) is added to the zero elements in the feature map; the mark identifies the zero elements, and the marked elements need not participate in computation or storage, which reduces power consumption. Marking the zero elements in a feature map is also a form of sparse encoding. If a feature map is in the fully dense state D, no dynamic encoding is needed, and the non-sparse data of the feature map is output directly.
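As an illustration of the fully sparse case, the following sketch encodes a feature map into a coordinate-style sparse storage format holding only the non-zero data and its indices; the dictionary layout and field names are assumptions made for illustration, not the accelerator's on-chip format.

    import numpy as np

    def encode_coordinate_format(fm):
        # Keep only the non-zero values ('activ') and their coordinates ('index').
        rows, cols = np.nonzero(fm)
        return {
            "shape": fm.shape,
            "activ": fm[rows, cols],          # non-zero data
            "index": np.stack([rows, cols]),  # sparse index (coordinate encoding)
        }

    def decode_coordinate_format(enc):
        # Rebuild the dense feature map from the encoded form.
        fm = np.zeros(enc["shape"])
        fm[enc["index"][0], enc["index"][1]] = enc["activ"]
        return fm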
On the basis of the above embodiment, in this embodiment the method further includes, before step S3: calculating the density of each convolution kernel in the trained convolutional neural network; if the density of a convolution kernel is smaller than the first preset threshold, encoding the convolution kernel into a sparse matrix storage format; if the density of a convolution kernel is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in the convolution kernel; if the density of a convolution kernel is greater than or equal to the second preset threshold, not encoding the convolution kernel.
Specifically, the density of a convolution kernel is the ratio of the number of non-zero elements in the convolution kernel to the total number of elements in the convolution kernel. The state WS of each convolution kernel is divided into the same three states as the feature maps, and each state corresponds to a different sparse encoding mode. Since the feature maps and the convolution kernels each have three states, there are nine states in total after combination, which provides a finer-grained division of the density of the convolutional neural network.
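Purely for illustration, the nine combined states can be enumerated as pairs of the feature-map state AS and the convolution-kernel state WS; the listing below is not part of the patent text.

    from itertools import product

    # Combining the feature-map state (AS) with the kernel state (WS)
    # gives 3 x 3 = 9 processing states for the accelerator.
    for activation_state, weight_state in product(["S", "M", "D"], repeat=2):
        print(f"AS={activation_state}, WS={weight_state}")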
On the basis of the above embodiments, in this embodiment step S3 specifically includes: when the marks exist in a feature map or a convolution kernel, performing no computation on the marked elements in the feature map or the convolution kernel.
Specifically, when a feature map or a convolution kernel is in the fully sparse state S, the zeros are removed before input, which reduces storage space and avoids computing the zero elements. When a feature map or a convolution kernel is in the moderately sparse state M, although the zero elements in the feature map or the convolution kernel are stored, no computation is performed on the marked elements, which reduces computation.
Another embodiment of the present invention provides an accelerator applied to a convolutional neural network. Fig. 2 is an overall structural diagram of the accelerator applied to a convolutional neural network provided by this embodiment. The accelerator includes a neural network computing array module and a dynamic sparsity adjustment module. The dynamic sparsity adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with multiple preset thresholds, and sparsely encode each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes. The neural network computing array module is configured to perform convolution operations on each sparsely encoded feature map and each pre-encoded convolution kernel in the convolutional neural network.
Specifically, the convolutional neural network may or may not include pooling layers. The convolutional neural network is trained first; the convolution kernels in the trained convolutional neural network no longer change, so the convolution kernels do not need dynamic on-line sparse encoding and can be sparsely encoded once off-line. In each convolution operation, the neural network computing array module directly reads the convolution kernels sparsely encoded off-line for convolution computation. When raw image data is input to the convolutional neural network, the dynamic sparsity adjustment module sparsely encodes the raw image data, and then the neural network computing array module performs convolution computation using the sparsely encoded raw data and the sparsely encoded convolution kernels. Since raw image data is generally not sparse, the raw image data may also be input directly without sparse encoding. Sparse encoding means storing the data in a sparse format.
Because the density of the feature maps output by each layer of the convolutional neural network differs and the feature maps output by different layers change dynamically, the density also changes dynamically. The density indicates the degree of sparsity of each feature map. To better improve the computation speed of the convolutional neural network, the dynamic sparsity adjustment module calculates the density of each feature map output by each layer, so that each feature map output by each layer can be sparsely encoded according to its density.
The dynamic sparsity adjustment module obtains the sparsity state of each feature map output by the layer according to the multiple preset thresholds, so that feature maps in different sparsity states are sparsely encoded in different forms rather than being limited to a single sparse encoding. The prior art sparsely encodes all feature maps output by each layer, which is computationally expensive.
The neural network computing array module performs convolution operations on each sparsely encoded feature map and each pre-encoded convolution kernel in the convolutional neural network. If a pooling module is included, the pooling module performs a pooling operation on the result of the convolution operation. In addition, the accelerator further includes an intermediate data storage module, a main chip controller, and an on-chip/off-chip data exchange module. The main controller controls the operation and timing of the entire accelerator chip. The on-chip/off-chip data exchange module is configured to read data from the external memory of the chip or write data computed by the chip to the external memory. For example, after initialization, under the control of the main controller, the chip reads the raw image data and the initial convolution kernels from the external memory through the on-chip/off-chip data exchange module. The intermediate data storage module is configured to store intermediate results produced by the neural network computing array module during computation.
In this embodiment, the dynamic sparsity adjustment module compares the density of each feature map output by each layer of the convolutional neural network with multiple preset thresholds to obtain the sparsity state of each feature map and applies different sparse encoding modes to feature maps in different sparsity states, and the neural network computing array module performs convolution operations on the sparsely encoded feature maps and the pre-encoded convolution kernels in the convolutional neural network. On the one hand, this reduces the amount of convolution computation in the convolutional neural network and improves the computation speed; on the other hand, the accelerator switches its processing state dynamically according to the different sparsity states, which improves the flexibility of the accelerator.
On the basis of the above embodiment, the dynamic sparsity adjustment module in this embodiment includes an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparsity control module. The on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of elements in the feature map, and take the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map. The output temporary register module is configured to store each feature map output by each layer of the convolutional neural network. The dynamic sparsity control module is configured to compare the density of each feature map output by the on-line density identification module with the multiple preset thresholds. The dynamic encoding module is configured to sparsely encode each feature map in the output temporary register module according to the comparison result.
Specifically, the dynamic sparsity adjustment module consists of four modules. The on-line density identification module counts the number of non-zero elements in each feature map during computation, so as to calculate the density of each feature map. The output temporary register module temporarily stores the feature maps output by each layer of the convolutional neural network in a non-sparse format. The dynamic sparsity control module controls the sparsity state of the feature maps through the preset multiple thresholds. The dynamic encoding module sparsely encodes each feature map in the output temporary register module according to its sparsity state, so as to improve the speed of the convolution operation.
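A behavioural software model of this four-module pipeline might look as follows; it reuses the feature_map_density, classify, SparsityState, and encode_coordinate_format helpers sketched earlier, and all names are assumptions rather than the accelerator's actual interfaces.

    class DynamicSparsityAdjustmentModule:
        # Illustrative software model of the dynamic sparsity adjustment module.

        def __init__(self, th1, th2):
            self.th1, self.th2 = th1, th2
            self.output_buffer = None  # output temporary register module

        def process(self, feature_map):
            # on-line density identification module
            density = feature_map_density(feature_map)
            # dynamic sparsity control module: compare with the preset thresholds
            state = classify(density, self.th1, self.th2)
            # output temporary register module holds the non-sparse feature map
            self.output_buffer = feature_map
            # dynamic encoding module: encode according to the comparison result
            if state is SparsityState.FULLY_SPARSE:
                return encode_coordinate_format(self.output_buffer)
            if state is SparsityState.MODERATELY_SPARSE:
                return self.output_buffer, (self.output_buffer == 0)  # mark zero elements
            return self.output_buffer  # fully dense: output the data directly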
On the basis of the above embodiment, the preset thresholds in this embodiment include a first preset threshold and a second preset threshold. Correspondingly, the dynamic encoding module is specifically configured to: encode a feature map into a sparse matrix storage format if its density is smaller than the first preset threshold; mark the zero elements in a feature map if its density is greater than or equal to the first preset threshold and smaller than the second preset threshold; and perform no encoding on a feature map if its density is greater than or equal to the second preset threshold.
Specifically, the preset thresholds in this embodiment include a first preset threshold th1 and a second preset threshold th2. The dynamic sparsity control module divides the sparsity state AS of each feature map into three states according to the first preset threshold and the second preset threshold: a feature map whose density is smaller than the first preset threshold is classified as fully sparse (state S), a feature map whose density is greater than or equal to the first preset threshold and smaller than the second preset threshold is classified as moderately sparse (state M), and a feature map whose density is greater than or equal to the second preset threshold is classified as fully dense (state D).
If a feature map is in state S, the dynamic encoding module encodes the feature map in the output temporary register module into a sparse matrix storage format, which includes the non-zero data (activ) and the sparse index (index) of the feature map, for example coordinate encoding or compressed sparse row encoding. Encoding a feature map into a sparse matrix storage format saves a large amount of storage space and a large amount of computation time. If a feature map is in the moderately sparse state M, the dynamic encoding module adds a mark (guard) to the zero elements in the feature map in the output temporary register module; the marked elements need not participate in computation or storage, which reduces power consumption. If a feature map is in the fully dense state D, no dynamic encoding is needed, and the dynamic encoding module directly outputs the non-sparse data of the feature map.
On the basis of the above embodiment, the dynamic encoding module in this embodiment is further configured to: encode a convolution kernel into a sparse matrix storage format if its precomputed density is smaller than the first preset threshold; mark the zero elements in a convolution kernel if its density is greater than or equal to the first preset threshold and smaller than the second preset threshold; and perform no encoding on a convolution kernel if its density is greater than or equal to the second preset threshold.
Specifically, the density of a convolution kernel is the ratio of the number of non-zero elements in the convolution kernel to the total number of elements in the convolution kernel. Like the feature maps, the state WS of each convolution kernel has three states, and each state corresponds to a different sparse encoding mode. Since the feature maps and the convolution kernels each have three states, there are nine states in total after combination, which provides a finer-grained division of the density of the convolutional neural network.
On the basis of the above embodiments, the neural network computing array module in this embodiment is specifically configured to: when the marks exist in a feature map or a convolution kernel, perform no computation on the marked elements in the feature map or the convolution kernel.
Specifically, when a feature map or a convolution kernel is in the fully sparse state S, the zeros are removed before the feature map or the convolution kernel is input to the neural network computing array module, which reduces storage space and avoids computing the zero elements. When a feature map or a convolution kernel is in the moderately sparse state M, although the zero elements in the feature map or the convolution kernel are stored, no computation is performed on the marked elements, which reduces computation.
For example, a chip of the accelerator was fabricated in a TSMC 65 nm process. The area of the chip is 3 mm x 4 mm, the operating frequency is 20-200 MHz, and the power consumption is 20.5-248.4 mW. In this embodiment, the peak energy efficiency rises rapidly as the density of the feature maps and the convolution kernels decreases, as shown in Fig. 3. When the density of the feature maps and the convolution kernels is 5%, the peak energy efficiency can reach 62.1 TOPS/W, which is 6.2 times the peak energy efficiency achieved without the accelerator of this embodiment. As shown in Fig. 4, compared with an implementation that only supports feature-map sparsity, the energy efficiency of this embodiment can be improved by 4.3 times; compared with an implementation without adaptive sparsity control, the energy efficiency of the present invention can be improved by 2.8 times; and compared with an implementation without density control but with variable quantization precision, the energy efficiency of the present invention can be improved by 2 times.
Finally, the methods of the present application are merely preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An acceleration method applied to a convolutional neural network, characterized by including:
S1, for any layer in the convolutional neural network, separately calculating the density of each feature map output by that layer;
S2, comparing the density of each feature map output by the layer with multiple preset thresholds, and sparsely encoding each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes;
S3, based on the next convolutional layer after the layer, performing convolution on each sparsely encoded feature map and each convolution kernel in the convolutional neural network that has been sparsely encoded in advance.
2. The method according to claim 1, characterized in that step S1 specifically includes:
for any feature map, counting the number of non-zero elements in the feature map and the total number of elements in the feature map;
taking the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map.
3. The method according to claim 1, characterized in that the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is smaller than the second preset threshold;
correspondingly, step S2 specifically includes:
if the density of a feature map is smaller than the first preset threshold, encoding the feature map into a sparse matrix storage format;
if the density of a feature map is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in the feature map;
if the density of a feature map is greater than or equal to the second preset threshold, not sparsely encoding the feature map.
4. The method according to claim 3, characterized in that before step S3 the method further includes:
calculating the density of each convolution kernel in the trained convolutional network;
if the density of a convolution kernel is smaller than the first preset threshold, encoding the convolution kernel into a sparse matrix storage format;
if the density of a convolution kernel is greater than or equal to the first preset threshold and smaller than the second preset threshold, marking the zero elements in the convolution kernel;
if the density of a convolution kernel is greater than or equal to the second preset threshold, not sparsely encoding the convolution kernel.
5. The method according to claim 3 or 4, characterized in that step S3 specifically includes:
when the marks exist in a feature map or a convolution kernel, performing no computation on the marked elements in the feature map or the convolution kernel.
6. An accelerator applied to a convolutional neural network, characterized by including: a neural network computing array module and a dynamic sparsity adjustment module;
wherein the dynamic sparsity adjustment module is configured to calculate the density of each feature map output by each layer of the convolutional neural network, compare the density of each feature map with multiple preset thresholds, and sparsely encode each feature map according to the comparison result, where different comparison results correspond to different sparse encoding modes;
the neural network computing array module is configured to perform convolution operations on each sparsely encoded feature map and each convolution kernel in the convolutional neural network that has been sparsely encoded in advance.
7. The accelerator according to claim 6, characterized in that the dynamic sparsity adjustment module includes an on-line density identification module, an output temporary register module, a dynamic encoding module, and a dynamic sparsity control module;
wherein the on-line density identification module is configured to, for any feature map, count the number of non-zero elements in the feature map and the total number of elements in the feature map, and take the ratio of the number of non-zero elements in the feature map to the total number of elements in the feature map as the density of the feature map;
the output temporary register module is configured to store each feature map output by each layer of the convolutional neural network;
the dynamic sparsity control module is configured to compare the density of each feature map output by the on-line density identification module with the multiple preset thresholds;
the dynamic encoding module is configured to sparsely encode each feature map in the output temporary register module according to the comparison result.
8. The accelerator according to claim 7, characterized in that the preset thresholds include a first preset threshold and a second preset threshold, where the first preset threshold is smaller than the second preset threshold;
correspondingly, the dynamic encoding module is specifically configured to:
encode a feature map into a sparse matrix storage format if its density is smaller than the first preset threshold;
mark the zero elements in a feature map if its density is greater than or equal to the first preset threshold and smaller than the second preset threshold;
perform no sparse encoding on a feature map if its density is greater than or equal to the second preset threshold.
9. The accelerator according to claim 8, characterized in that the dynamic encoding module is further configured to:
encode a convolution kernel into a sparse matrix storage format if its precomputed density is smaller than the first preset threshold;
mark the zero elements in a convolution kernel if its density is greater than or equal to the first preset threshold and smaller than the second preset threshold;
perform no sparse encoding on a convolution kernel if its density is greater than or equal to the second preset threshold.
10. The accelerator according to claim 7 or 8, characterized in that the neural network computing array module is specifically configured to:
when the marks exist in a feature map or a convolution kernel, perform no computation on the marked elements in the feature map or the convolution kernel.
CN201810306577.3A 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network Active CN108510063B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network
PCT/CN2018/095365 WO2019196223A1 (en) 2018-04-08 2018-07-12 Acceleration method and accelerator used for convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810306577.3A CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network

Publications (2)

Publication Number Publication Date
CN108510063A true CN108510063A (en) 2018-09-07
CN108510063B CN108510063B (en) 2020-03-20

Family

ID=63380995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810306577.3A Active CN108510063B (en) 2018-04-08 2018-04-08 Acceleration method and accelerator applied to convolutional neural network

Country Status (2)

Country Link
CN (1) CN108510063B (en)
WO (1) WO2019196223A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389043A (en) * 2018-09-10 2019-02-26 中国人民解放军陆军工程大学 A kind of crowd density estimation method of unmanned plane picture
CN109409518A (en) * 2018-10-11 2019-03-01 北京旷视科技有限公司 Neural network model processing method, device and terminal
CN109784484A (en) * 2019-01-31 2019-05-21 深兰科技(上海)有限公司 Neural network accelerated method, device, neural network accelerate chip and storage medium
CN109858575A (en) * 2019-03-19 2019-06-07 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural networks
CN110097172A (en) * 2019-03-18 2019-08-06 中国科学院计算技术研究所 A kind of convolutional neural networks data processing method and device based on winograd convolution algorithm
CN110443357A (en) * 2019-08-07 2019-11-12 上海燧原智能科技有限公司 Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium
CN110909801A (en) * 2019-11-26 2020-03-24 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN111291230A (en) * 2020-02-06 2020-06-16 北京奇艺世纪科技有限公司 Feature processing method and device, electronic equipment and computer-readable storage medium
CN113537465A (en) * 2021-07-07 2021-10-22 深圳市易成自动驾驶技术有限公司 LSTM model optimization method, accelerator, device and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401554B (en) * 2020-03-12 2023-03-24 交叉信息核心技术研究院(西安)有限公司 Accelerator of convolutional neural network supporting multi-granularity sparsity and multi-mode quantization
WO2023164855A1 (en) * 2022-03-03 2023-09-07 Intel Corporation Apparatus and method for 3d dynamic sparse convolution

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Consider the deep neural network compression method of load balancing
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184362B (en) * 2015-08-21 2018-02-02 中国科学院自动化研究所 The acceleration of the depth convolutional neural networks quantified based on parameter and compression method
US10380479B2 (en) * 2015-10-08 2019-08-13 International Business Machines Corporation Acceleration of convolutional neural network training using stochastic perforation
CN107609641B (en) * 2017-08-30 2020-07-03 清华大学 Sparse neural network architecture and implementation method thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Consider the deep neural network compression method of load balancing
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389043A (en) * 2018-09-10 2019-02-26 中国人民解放军陆军工程大学 A kind of crowd density estimation method of unmanned plane picture
CN109389043B (en) * 2018-09-10 2021-11-23 中国人民解放军陆军工程大学 Crowd density estimation method for aerial picture of unmanned aerial vehicle
CN109409518B (en) * 2018-10-11 2021-05-04 北京旷视科技有限公司 Neural network model processing method and device and terminal
CN109409518A (en) * 2018-10-11 2019-03-01 北京旷视科技有限公司 Neural network model processing method, device and terminal
CN109784484A (en) * 2019-01-31 2019-05-21 深兰科技(上海)有限公司 Neural network accelerated method, device, neural network accelerate chip and storage medium
CN110097172A (en) * 2019-03-18 2019-08-06 中国科学院计算技术研究所 A kind of convolutional neural networks data processing method and device based on winograd convolution algorithm
CN109858575A (en) * 2019-03-19 2019-06-07 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural networks
CN109858575B (en) * 2019-03-19 2024-01-05 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural network
CN110443357A (en) * 2019-08-07 2019-11-12 上海燧原智能科技有限公司 Convolutional neural networks calculation optimization method, apparatus, computer equipment and medium
CN110909801B (en) * 2019-11-26 2020-10-09 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN110909801A (en) * 2019-11-26 2020-03-24 山东师范大学 Data classification method, system, medium and device based on convolutional neural network
CN111291230A (en) * 2020-02-06 2020-06-16 北京奇艺世纪科技有限公司 Feature processing method and device, electronic equipment and computer-readable storage medium
CN111291230B (en) * 2020-02-06 2023-09-15 北京奇艺世纪科技有限公司 Feature processing method, device, electronic equipment and computer readable storage medium
CN113537465A (en) * 2021-07-07 2021-10-22 深圳市易成自动驾驶技术有限公司 LSTM model optimization method, accelerator, device and medium

Also Published As

Publication number Publication date
WO2019196223A1 (en) 2019-10-17
CN108510063B (en) 2020-03-20

Similar Documents

Publication Publication Date Title
CN108510063A (en) A kind of accelerated method and accelerator applied to convolutional neural networks
CN108009594B (en) A kind of image-recognizing method based on change grouping convolution
CN107169504B (en) A kind of hand-written character recognition method based on extension Non-linear Kernel residual error network
CN110245741A (en) Optimization and methods for using them, device and the storage medium of multilayer neural network model
CN106485317A (en) A kind of neutral net accelerator and the implementation method of neural network model
CN108416427A (en) Convolution kernel accumulates data flow, compressed encoding and deep learning algorithm
CN110390383A (en) A kind of deep neural network hardware accelerator based on power exponent quantization
CN106778682A (en) A kind of training method and its equipment of convolutional neural networks model
CN109063825A (en) Convolutional neural networks accelerator
CN110363188A (en) Cervical cell image classification method based on convolutional neural networks
CN106447034A (en) Neutral network processor based on data compression, design method and chip
CN110321997A (en) High degree of parallelism computing platform, system and calculating implementation method
CN103593674B (en) A kind of cervical lymph node ultrasonoscopy feature selection method
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
CN108510058A (en) Weight storage method in neural network and the processor based on this method
CN107944545A (en) Computational methods and computing device applied to neutral net
CN108875915B (en) A kind of depth confrontation network optimized approach of Embedded application
CN112529165B (en) Deep neural network pruning method, device, terminal and storage medium
CN109409509A (en) A kind of data structure and accelerated method for the convolutional neural networks accelerator based on FPGA
CN107463533A (en) A kind of three-dimensional CAD physical model manufacturing feature recognition methods based on PCA and CNN
CN109472352A (en) A kind of deep neural network model method of cutting out based on characteristic pattern statistical nature
CN109886391A (en) A kind of neural network compression method based on the positive and negative diagonal convolution in space
CN107395211A (en) A kind of data processing method and device based on convolutional neural networks model
CN111291861A (en) Input pulse coding method applied to pulse neural network
CN110110852A (en) A kind of method that deep learning network is transplanted to FPAG platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant