CN110288086A - Configurable convolution array accelerator structure based on Winograd - Google Patents
- Publication number: CN110288086A
- Application number: CN201910511987.6A
- Authority
- CN
- China
- Prior art keywords
- matrix
- winograd
- module
- weight
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A configurable convolution array accelerator structure based on Winograd, comprising: an activation-value cache module, a weight cache module, an output cache module, a controller, a weight preprocessing module, an activation-value preprocessing module, a weight conversion module, an activation-value matrix conversion module, a dot-product module, a result-matrix conversion module, an accumulation module, a pooling module and an activation module. According to the operational characteristics of the Winograd convolution algorithm with a fixed output paradigm, the configurable convolution array accelerator structure of the invention provides a bit-width-configurable convolution array accelerator that flexibly meets the bit-width requirements of different neural networks and different convolutional layers. In addition, a multiplier unit with configurable data bit width is designed, which improves the computational efficiency of neural network convolution and reduces computing power consumption.
Description
Technical field
The present invention relates to configurable convolution array accelerator structures, and more particularly to a configurable convolution array accelerator structure based on Winograd.
Background technique
Neural networks excel in many application areas, especially image-related tasks: in computer vision problems such as image classification, semantic segmentation, image retrieval and object detection they have begun to replace most traditional algorithms and are gradually being deployed on terminal devices.

However, the computational load of neural networks is enormous, so neural network processing suffers from slow speed and high power consumption. A neural network workflow mainly comprises a training stage and an inference stage. To obtain high-precision results, the weights must be computed during training by repeated iteration over massive amounts of data. In the inference stage, the processing of the input data must be completed within an extremely short response time (usually milliseconds), especially when the network is applied in a real-time system such as autonomous driving. The computations involved in a neural network mainly include convolution, activation and pooling operations.
Existing research shows that more than 90% of a neural network's computation time is occupied by the convolution process. Traditional convolution computes each element of the output feature map separately through repeated multiply-accumulate operations. Although solutions based on that algorithm have achieved preliminary success, efficiency can be higher when the algorithm itself is more efficient. Researchers have therefore proposed the Winograd convolution algorithm, which applies a specific data-domain transformation to the input feature map and the weights, completing an equivalent convolution task while reducing the number of multiplications in the convolution process. Since the prediction process of most neural network processor chips in practice uses a fixed network model, the Winograd convolution output paradigm employed is generally also fixed, the computation flow is very clear, and there is considerable room for optimization. How to design and optimize a Winograd-based neural network accelerator structure has therefore become a research focus.

In addition, for the great majority of neural network applications, fixed-point input data achieves good experimental results while further improving speed and reducing power consumption. However, the convolution data bit width in existing fixed-point neural networks is fixed and cannot be configured flexibly, which limits applicability. In general, a 16-bit data width meets the accuracy requirements of a neural network, while for some networks and scenarios with lower accuracy requirements an 8-bit data width is also sufficient. Making the data bit width configurable in a neural network therefore enables better optimization.
Summary of the invention
The technical problem to be solved by the invention is to provide a configurable convolution array accelerator structure based on Winograd that can improve the computational efficiency of neural network convolution.
The technical scheme adopted by the invention is: a configurable convolution array accelerator structure based on Winograd, comprising an activation-value cache module, a weight cache module, an output cache module, a controller, a weight preprocessing module, an activation-value preprocessing module, a weight conversion module, an activation-value matrix conversion module, a dot-product module, a result-matrix conversion module, an accumulation module, a pooling module and an activation module, wherein:

The activation-value cache module, connected to the controller, stores input pixel values or input feature-map values and provides activation data to the activation-value preprocessing module;

The weight cache module, connected to the controller, stores the trained weights and provides weight data to the weight preprocessing module;

The output cache module, connected to the controller, stores the result of one convolutional layer; after the activation module finishes outputting data, the data are passed to the output cache module for use by the next convolutional layer;

The controller controls the transmission of the activation data, weight data and convolutional-layer data to be processed according to the computation flow;

The weight preprocessing module receives the operand data transmitted by the weight cache module and divides the convolution kernel, obtaining the time-domain weight matrices K;

The activation-value preprocessing module receives the operand data transmitted by the activation-value cache module, takes the activation values out of the cache and divides them, obtaining the time-domain activation-value matrices I;

The weight conversion module receives the operand data transmitted by the weight preprocessing module and converts the weight data from the time domain to the Winograd domain, obtaining the Winograd-domain weight matrix U;

The activation-value matrix conversion module receives the operand data transmitted by the activation-value preprocessing module and converts the activation values from the time domain to the Winograd domain, obtaining the Winograd-domain activation-value matrix V;

The dot-product module receives the operand data transmitted by the weight conversion module and the activation-value matrix conversion module and performs the dot-product operation of the Winograd-domain activation-value matrix and the Winograd-domain weight matrix, obtaining the Winograd-domain dot-product result matrix M;

The result-matrix conversion module receives the operand data transmitted by the dot-product module and converts the dot-product result matrix from the Winograd domain back to the time domain, obtaining the converted time-domain dot-product result matrix F;

The accumulation module receives the operand data transmitted by the result-matrix conversion module and accumulates the received data to obtain the final convolution result;

The pooling module receives the operand data transmitted by the accumulation module and pools the final convolution result matrix;

The activation module receives the operand data transmitted by the pooling module, applies the ReLU activation function to the pooled result, obtains the activated result and transfers it to the output cache module.
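The back half of this pipeline can be sketched behaviorally. In the sketch below the function name, the max-pooling choice and the 2*2 window are illustrative assumptions (the text only says "pooling"); it shows the accumulation module summing partial convolution results, the pooling module reducing the map, and the activation module applying ReLU:

```python
import numpy as np

def accumulate_pool_relu(partial_results, pool=2):
    """Behavioral sketch of the last three modules: the accumulation module
    sums the per-kernel convolution results, the pooling module applies
    pool x pool max pooling (window size and max-pooling are assumptions),
    and the activation module applies ReLU."""
    conv = np.sum(partial_results, axis=0)                   # accumulation module
    h, w = conv.shape
    pooled = conv[:h - h % pool, :w - w % pool].reshape(
        h // pool, pool, w // pool, pool).max(axis=(1, 3))   # pooling module
    return np.maximum(pooled, 0)                             # ReLU activation module
```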
The weight preprocessing module: (1) extends a 5*5 convolution kernel to a 6*6 convolution matrix by zero padding; (2) divides the 6*6 convolution matrix into four 3*3 convolution kernels.

The specific division is as follows, where K_input denotes the 5*5 weight matrix and below it are the four corresponding time-domain weight matrices to be processed after division, K1, K2, K3, K4. In computing U = GKG^T, K takes the values K1, K2, K3, K4 in turn.

The activation-value preprocessing module divides the 6*6 activation-value matrix into four overlapping matrices of size 4*4. The division is as follows, where I_input denotes the 6*6 activation-value matrix and below it are the four time-domain activation-value matrices to be processed of size 4*4 after division, I1, I2, I3, I4. In computing V = B^T I B, I takes the values I1, I2, I3, I4 in turn.
The weight conversion module completes the matrix multiplications in the computation by row/column vector addition and subtraction, thereby performing the weight-matrix conversion in Winograd convolution and obtaining the Winograd-domain weight matrix U = GKG^T, where K denotes the time-domain weight matrix, G is the weight conversion auxiliary matrix and U is the Winograd-domain weight matrix.

Concrete operations: take the first row vector of the weight matrix K as the first row of the provisional matrix C2, where C2 = GK. The divisions by two are completed by right shifts: when a value is positive it is shifted right with 0 filled on the left, and when it is negative it is shifted right with 1 filled on the left. Take the vector obtained by adding the first, second and third rows of K and then shifting right by one as the second row of C2; take the vector obtained by adding the first and third rows of K, subtracting the second row, and then shifting right by one as the third row of C2; take the third row vector of K as the fourth row of C2. Then take the first column vector of C2 as the first column of the Winograd-domain weight matrix U; take the vector obtained by adding the first, second and third columns of C2 and then shifting right by one as the second column of U; take the vector obtained by adding the first and third columns of C2, subtracting the second column, and then shifting right by one as the third column of U; take the third column vector of C2 as the fourth column of U, finally obtaining the Winograd-domain weight matrix U.
The activation-value matrix conversion module completes the matrix multiplications in the computation by row/column vector addition and subtraction, thereby performing the conversion operation for the time-domain activation-value matrix in Winograd convolution, obtaining the matrix V = B^T I B, where I is the time-domain activation-value matrix, B is the activation-value conversion auxiliary matrix and V is the Winograd-domain activation-value matrix.

Concrete operations: take the vector difference of the first row of the time-domain activation-value matrix I minus its third row as the first row of the provisional matrix C1, where C1 = B^T I; take the sum of the second and third rows of I as the second row of C1; take the vector difference of the third row of I minus its second row as the third row of C1; take the vector difference of the second row of I minus its fourth row as the fourth row of C1. Then take the vector difference of the first column of C1 minus its third column as the first column of the Winograd-domain activation-value matrix V; take the sum of the second and third columns of C1 as the second column of V; take the vector difference of the third column of C1 minus its second column as the third column of V; take the vector difference of the second column of C1 minus its fourth column as the fourth column of V, finally obtaining the Winograd-domain activation-value matrix V.
The dot-product module performs the dot-product operation of the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, obtaining the Winograd-domain dot-product result matrix M, expressed as M = U ⊙ V, where U is the Winograd-domain weight matrix and V is the Winograd-domain activation-value matrix. To realize a configurable data bit width for the dot product, the module has two operating modes, 8-bit multipliers and 16-bit multipliers, corresponding to operations on the two data widths of 8 bit and 16 bit, realizing 8*8-bit and 16*16-bit fixed-point multiplication.
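A behavioral model of the dot-product stage's two width modes might look as follows; the saturation-based width emulation is an illustration of selecting the 8-bit or 16-bit mode, not the patent's gate-level scheme:

```python
import numpy as np

def dot_product(U, V, width=16):
    """M = U ⊙ V with a configurable data bit width: operands are saturated
    to the signed 8-bit or 16-bit range before the element-wise multiply, as
    a stand-in for routing to the 8-bit or 16-bit multiplier mode."""
    assert width in (8, 16)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return np.clip(U, lo, hi).astype(np.int64) * np.clip(V, lo, hi).astype(np.int64)
```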
The 8-bit multiplier comprises, connected in sequence, a first gating unit, a first negation unit, a first shift unit, a first accumulation unit, a second gating unit, a second negation unit and a third gating unit, wherein:

The first gating unit receives the data from the weight conversion module and the activation-value matrix conversion module, together with the sign control signal of the weight conversion module;

The first negation unit receives the data from the first gating unit and negates the received data;

The first shift unit receives the data from the first negation unit together with the sign-bit information from the first gating unit, and shifts the received data according to the sign information;

The first accumulation unit receives the data from the first shift unit and accumulates the received data;

The second gating unit receives the data from the first accumulation unit and the sign-bit information from the first gating unit, and passes them to the second negation unit;

The second negation unit receives the data from the second gating unit and negates the received data;

The third gating unit receives the data from the second negation unit and from the first accumulation unit, and outputs the result.
The 16-bit multiplier comprises, connected in sequence, a fourth gating unit, a third negation unit, 8-bit multipliers, a second shift unit, a second accumulation unit, a fifth gating unit, a fourth negation unit and a sixth gating unit, wherein:

The fourth gating unit receives the data from the weight conversion module and the activation-value matrix conversion module, together with the sign control signal of the weight conversion module;

The third negation unit receives the data from the fourth gating unit and negates the received data;

The 8-bit multipliers perform the 8-bit-wide operations, realizing 8*8-bit fixed-point multiplication;

The second shift unit receives the data from the 8-bit multipliers and shifts the received data;

The second accumulation unit receives the data from the second shift unit and accumulates the received data;

The fifth gating unit receives the data from the second accumulation unit and the sign-bit information from the fourth gating unit, and passes them to the fourth negation unit;

The fourth negation unit receives the data from the fifth gating unit and negates the received data;

The sixth gating unit receives the data from the fourth negation unit and outputs the result.
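The decomposition of a 16*16 multiply into 8*8 multiplies plus shifts and additions, as the unit list above implies, can be sketched as follows. Sign handling is modeled abstractly (XOR of sign bits, magnitudes, sign reapplied at the output), and `mul8u` is an illustrative helper, not a unit named in the patent:

```python
def mul8u(x, y):
    """Unsigned 8x8 shift-add multiply: one partial product per set bit."""
    acc = 0
    for bit in range(8):
        if (y >> bit) & 1:
            acc += x << bit
    return acc

def mul16(a, b):
    """16x16 signed multiply from four 8x8 multiplies: sign bits are XORed
    up front (gating + negation units), the partial products of the
    magnitude halves are shifted (second shift unit) and summed (second
    accumulation unit), and the sign is reapplied at the output."""
    sign = (a < 0) ^ (b < 0)
    ma, mb = abs(a), abs(b)
    a_hi, a_lo = ma >> 8, ma & 0xFF
    b_hi, b_lo = mb >> 8, mb & 0xFF
    acc = ((mul8u(a_hi, b_hi) << 16)
           + ((mul8u(a_hi, b_lo) + mul8u(a_lo, b_hi)) << 8)
           + mul8u(a_lo, b_lo))
    return -acc if sign else acc
```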
The result-matrix conversion module performs, by shift, addition and subtraction operations on the row/column vectors of the Winograd-domain dot-product result matrix M, the conversion operation F = A^T M A for M, where M is the Winograd-domain dot-product result matrix, A is the conversion auxiliary matrix of M, and F is the time-domain dot-product result matrix.

Concrete operations: take the vector obtained by adding the first, second and third rows of the Winograd-domain dot-product result matrix M as the first row of the provisional matrix C3, where C3 = A^T M; take the vector obtained from the second row of M minus its third and fourth rows as the second row of C3; take the vector obtained by adding the first, second and third columns of C3 as the first column of the converted time-domain dot-product result matrix F; take the vector obtained from the second column of C3 minus its third and fourth columns as the second column of F, finally obtaining the converted time-domain dot-product result matrix F.
According to the operational characteristics of the Winograd convolution algorithm with a fixed output paradigm, the configurable convolution array accelerator structure based on Winograd of the invention provides a bit-width-configurable convolution array accelerator that flexibly meets the bit-width requirements of different neural networks and different convolutional layers. In addition, a multiplier unit with configurable data bit width is designed, which improves the computational efficiency of neural network convolution and reduces computing power consumption.
Detailed description of the invention
Fig. 1 is the overall framework diagram of the Winograd convolution array accelerator;
Fig. 2 is a composition schematic diagram of the configurable convolution array accelerator structure based on Winograd of the present invention;
Fig. 3 is a schematic diagram of the 8-bit multiplier with configurable data bit width;
Fig. 4 is a schematic diagram of the 16-bit multiplier with configurable data bit width.
Specific embodiment
The configurable convolution array accelerator structure based on Winograd of the invention is described in detail below with reference to embodiments and the attached drawings.
In the convolutional calculation of a neural network, the Winograd conversion formula is

Out = A^T[(G K G^T) ⊙ (B^T I B)]A (1)

where K denotes the time-domain weight matrix, I denotes the time-domain activation-value matrix, and A, G, B denote the conversion matrices corresponding respectively to the dot-product result matrix [(G K G^T) ⊙ (B^T I B)], the time-domain weight matrix K and the time-domain activation-value matrix I. For the paradigm used here these are the standard F(2*2,3*3) constants:

A^T = [[1, 1, 1, 0], [0, 1, -1, -1]]
G = [[1, 0, 0], [1/2, 1/2, 1/2], [1/2, -1/2, 1/2], [0, 0, 1]]
B^T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
The output paradigm of the Winograd convolution used in the present invention is F(2*2,3*3): the first parameter, 2*2, indicates the size of the output feature map, and the second parameter, 3*3, indicates the size of the convolution kernel.
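The F(2*2,3*3) paradigm can be checked end to end against direct convolution. The matrices below are the standard F(2*2,3*3) constants consistent with the row/column operations described in this patent; the function names are illustrative:

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd transform matrices.
G = np.array([[1, 0, 0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0, 0, 1]])
Bt = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]])
At = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]])

def winograd_2x2_3x3(I, K):
    """Out = A^T [ (G K G^T) ⊙ (B^T I B) ] A for a 4x4 tile and 3x3 kernel."""
    U = G @ K @ G.T        # Winograd-domain weights, 4x4
    V = Bt @ I @ Bt.T      # Winograd-domain activations, 4x4
    M = U * V              # element-wise product: 16 multiplies instead of 36
    return At @ M @ At.T   # back to the time domain, 2x2 output

def direct_conv(I, K):
    """Reference: valid 3x3 convolution (cross-correlation) of a 4x4 tile."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(I[i:i+3, j:j+3] * K)
    return out

rng = np.random.default_rng(0)
I = rng.integers(-8, 8, (4, 4)).astype(float)
K = rng.integers(-4, 4, (3, 3)).astype(float)
assert np.allclose(winograd_2x2_3x3(I, K), direct_conv(I, K))
```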
As shown in Fig. 1, Winograd convolution can be divided into three stages of execution. In the first stage, the time-domain weight matrix K and the time-domain activation-value matrix I read from the caches are converted from the time domain to the Winograd domain; the concrete operations are matrix multiplications, and the results are denoted U and V, where U = GKG^T and V = B^T I B. In the second stage, the dot-product operation "⊙" is performed on the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, obtaining the Winograd-domain dot-product result matrix M = U ⊙ V. In the third stage, the dot-product result is converted from the Winograd domain back to the time domain.
As shown in Fig. 2, the configurable convolution array accelerator structure based on Winograd of the invention comprises: an activation-value cache module 1, a weight cache module 2, an output cache module 3, a controller 4, a weight preprocessing module 5, an activation-value preprocessing module 6, a weight conversion module 7, an activation-value matrix conversion module 8, a dot-product module 9, a result-matrix conversion module 10, an accumulation module 11, a pooling module 12 and an activation module 13, wherein:

1) The activation-value cache module 1, connected to the controller 4, stores input pixel values or input feature-map values and provides activation data to the activation-value preprocessing module 6;

2) The weight cache module 2, connected to the controller 4, stores the trained weights and provides weight data to the weight preprocessing module 5;

3) The output cache module 3, connected to the controller 4, stores the result of one convolutional layer; after the activation module 13 finishes outputting data, the data are passed to the output cache module 3 for use by the next convolutional layer;

4) The controller 4 controls the transmission of the activation data, weight data and convolutional-layer data to be processed according to the computation flow;
5) The weight preprocessing module 5 receives the operand data transmitted by the weight cache module 2 and divides the convolution kernel, respectively obtaining the four time-domain weight matrices to be processed K1, K2, K3, K4.

The weight preprocessing module 5: (1) extends a 5*5 convolution kernel to a 6*6 convolution matrix by zero padding; (2) divides the 6*6 convolution matrix into four 3*3 convolution kernels. The 3*3 Winograd output paradigm can thus be used to realize a 5*5 convolution efficiently, without increasing the number of multiplications or the power consumption.

The specific division is as follows, where K_input denotes the 5*5 time-domain input weight matrix, to its right is the 6*6 time-domain weight matrix after the input weight matrix is extended, and below are the four corresponding time-domain weight matrices to be processed after division, K1, K2, K3, K4. In computing U = GKG^T, K takes the values K1, K2, K3, K4 in turn.
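A minimal sketch of steps (1) and (2), assuming the zero padding adds the extra row and column at the bottom and right (the exact padding position is not recoverable from the text here):

```python
import numpy as np

def split_kernel_5x5(K_input):
    """Zero-pad a 5*5 kernel to 6*6, then cut the 6*6 matrix into four 3*3
    sub-kernels K1..K4 (the four quadrants), each fed to U = G K G^T in turn."""
    K6 = np.zeros((6, 6), dtype=K_input.dtype)
    K6[:5, :5] = K_input                  # zero padding (position assumed)
    return (K6[0:3, 0:3], K6[0:3, 3:6],   # K1, K2
            K6[3:6, 0:3], K6[3:6, 3:6])   # K3, K4
```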
6) The activation-value preprocessing module 6 receives the operand data transmitted by the activation-value cache module 1, takes the activation values out of the activation-value cache module 1 and divides them, respectively obtaining the four time-domain activation-value matrices to be processed I1, I2, I3, I4.

The activation-value preprocessing module 6 realizes the reading and preprocessing of the activation values. In the Winograd algorithm the activation values must correspond to the weights, and many of the data are reused, so an overlapping division is used. The module divides the 6*6 activation-value matrix into four overlapping matrices of size 4*4, corresponding respectively to the four 3*3 convolution kernels. The division is as follows, where I_input denotes the 6*6 time-domain input activation-value matrix and below it are the four time-domain activation-value matrices to be processed of size 4*4 after division, I1, I2, I3, I4. In computing V = B^T I B, I takes the values I1, I2, I3, I4 in turn.
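The overlapping division can be sketched as follows; the assumption here is that the four 4*4 tiles start at offsets 0 and 2 in each dimension, so they overlap by two rows/columns and share the reused interior data:

```python
import numpy as np

def split_activation_6x6(I_input):
    """Cut a 6*6 activation tile into four overlapping 4*4 tiles I1..I4
    (offsets 0 and 2 per dimension), reusing the shared interior data."""
    return (I_input[0:4, 0:4], I_input[0:4, 2:6],   # I1, I2
            I_input[2:6, 0:4], I_input[2:6, 2:6])   # I3, I4
```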
7) The weight conversion module 7 receives the operand data transmitted by the weight preprocessing module 5 and converts the weight data from the time domain to the Winograd domain, obtaining the Winograd-domain weight matrix U.

The weight conversion module 7 completes the matrix multiplications in the computation by row/column vector addition and subtraction, thereby performing the weight-matrix conversion in Winograd convolution and obtaining the Winograd-domain weight matrix U = GKG^T, where K denotes the time-domain weight matrix, G is the weight conversion auxiliary matrix and U is the Winograd-domain weight matrix.

Concrete operations: take the first row vector of the time-domain weight matrix K as the first row of the provisional matrix C2, where C2 = GK. Because the value 1/2 appears in the conversion matrix, the division by two is completed simply by shifting right: when a value is positive it is shifted right with 0 filled on the left, and when it is negative it is shifted right with 1 filled on the left. Take the vector obtained by adding the first, second and third rows of K and then shifting right by one as the second row of C2; take the vector obtained by adding the first and third rows of K, subtracting the second row, and then shifting right by one as the third row of C2; take the third row vector of K as the fourth row of C2. Then take the first column vector of C2 as the first column of the Winograd-domain weight matrix U; take the vector obtained by adding the first, second and third columns of C2 and then shifting right by one as the second column of U; take the vector obtained by adding the first and third columns of C2, subtracting the second column, and then shifting right by one as the third column of U; take the third column vector of C2 as the fourth column of U, finally obtaining the Winograd-domain weight matrix U.
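The operations above, with division by two realized as an arithmetic (sign-filled) right shift, can be sketched as follows; the result is exact whenever the shifted sums are even and truncates otherwise, as the hardware description implies:

```python
import numpy as np

def weight_transform(K):
    """U = G K G^T using only adds, subtracts and arithmetic right shifts
    (numpy's >> on signed ints is arithmetic, i.e. sign-bit fill)."""
    K = K.astype(np.int32)
    C2 = np.empty((4, 3), dtype=np.int32)        # C2 = G K
    C2[0] = K[0]
    C2[1] = (K[0] + K[1] + K[2]) >> 1            # (r1 + r2 + r3) / 2
    C2[2] = (K[0] - K[1] + K[2]) >> 1            # (r1 - r2 + r3) / 2
    C2[3] = K[2]
    U = np.empty((4, 4), dtype=np.int32)         # U = C2 G^T
    U[:, 0] = C2[:, 0]
    U[:, 1] = (C2[:, 0] + C2[:, 1] + C2[:, 2]) >> 1
    U[:, 2] = (C2[:, 0] - C2[:, 1] + C2[:, 2]) >> 1
    U[:, 3] = C2[:, 2]
    return U
```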
8) The activation-value matrix conversion module 8 receives the operand data transmitted by the activation-value preprocessing module 6 and converts the activation values from the time domain to the Winograd domain, obtaining the Winograd-domain activation-value matrix V.

The activation-value matrix conversion module 8 completes the matrix multiplications in the computation by row/column vector addition and subtraction, thereby performing the conversion operation for the time-domain activation-value matrix in Winograd convolution, obtaining the Winograd-domain activation-value matrix V = B^T I B, where I is the time-domain activation-value matrix, B is the activation-value conversion auxiliary matrix and V is the Winograd-domain activation-value matrix.

Concrete operations: take the vector difference of the first row of the time-domain activation-value matrix I minus its third row as the first row of the provisional matrix C1, where C1 = B^T I; take the sum of the second and third rows of I as the second row of C1; take the vector difference of the third row of I minus its second row as the third row of C1; take the vector difference of the second row of I minus its fourth row as the fourth row of C1. Then take the vector difference of the first column of C1 minus its third column as the first column of the Winograd-domain activation-value matrix V; take the sum of the second and third columns of C1 as the second column of V; take the vector difference of the third column of C1 minus its second column as the third column of V; take the vector difference of the second column of C1 minus its fourth column as the fourth column of V, finally obtaining the Winograd-domain activation-value matrix V.
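The add/subtract sequence above is multiplication by B^T = [[1,0,-1,0],[0,1,1,0],[0,-1,1,0],[0,1,0,-1]]; a direct sketch, checked against the closed form:

```python
import numpy as np

def activation_transform(I):
    """V = B^T I B with no multiplications: C1 = B^T I over rows, then
    V = C1 B over columns, mirroring the operations described above."""
    C1 = np.empty((4, 4), dtype=np.int64)        # C1 = B^T I
    C1[0] = I[0] - I[2]
    C1[1] = I[1] + I[2]
    C1[2] = I[2] - I[1]
    C1[3] = I[1] - I[3]
    V = np.empty((4, 4), dtype=np.int64)         # V = C1 B
    V[:, 0] = C1[:, 0] - C1[:, 2]
    V[:, 1] = C1[:, 1] + C1[:, 2]
    V[:, 2] = C1[:, 2] - C1[:, 1]
    V[:, 3] = C1[:, 1] - C1[:, 3]
    return V

# Check against the closed form B^T I B.
Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]])
I = np.arange(16).reshape(4, 4)
assert np.array_equal(activation_transform(I), Bt @ I @ Bt.T)
```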
9) The dot-product module 9 receives the operand data transmitted by the weight conversion module 7 and the activation-value matrix conversion module 8 and performs the dot-product operation of the Winograd-domain activation-value matrix and the Winograd-domain weight matrix, obtaining the Winograd-domain dot-product result matrix M. This is the module that consumes the most computation time and resources in the convolution.

The dot-product module 9 performs the dot-product operation of the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, obtaining the Winograd-domain dot-product result matrix M, expressed as M = U ⊙ V, where U is the Winograd-domain weight matrix and V is the Winograd-domain activation-value matrix. To realize a configurable data bit width for the dot product, the module 9 has two operating modes, 8-bit multipliers and 16-bit multipliers, corresponding to operations on the two data widths of 8 bit and 16 bit, realizing 8*8-bit and 16*16-bit fixed-point multiplication. Wherein:
(1) As shown in Fig. 3, the 8-bit multiplier comprises, connected in sequence, a first gating unit 14, a first negation unit 15, a first shift unit 16, a first accumulation unit 17, a second gating unit 18, a second negation unit 19 and a third gating unit 20, wherein:

The first gating unit 14 receives the data from the weight conversion module 7 and the activation-value matrix conversion module 8, together with the sign control signal of the weight conversion module 7;

The first negation unit 15 receives the data from the first gating unit 14 and negates the received data;

The first shift unit 16 receives the data from the first negation unit 15 together with the sign-bit information from the first gating unit 14, and shifts the received data according to the sign information;

The first accumulation unit 17 receives the data from the first shift unit 16 and accumulates the received data;

The second gating unit 18 receives the data from the first accumulation unit 17 and the sign-bit information from the first gating unit 14, and passes them to the second negation unit 19;

The second negation unit 19 receives the data from the second gating unit 18 and negates the received data;

The third gating unit 20 receives the data from the second negation unit 19 and from the first accumulation unit 17, and outputs the result.
Concrete operation of the 8-bit multiplier: the sign bits of the two operands are XORed to obtain the sign bit of the result, and each operand is then examined according to its own sign bit: if it is negative, the sign bit is stripped and the lower seven bits are inverted and incremented by 1; if it is positive, the lower seven bits are kept unchanged. After this sign handling, each binary bit of operand B1 is examined in turn: if the bit is 1, the corresponding intermediate value is the seven-bit magnitude of operand A1 shifted left to that bit position; if the bit is 0, the corresponding intermediate value is 0. After the lower seven bits of B1 have all been examined, the intermediate values are added together to give the product H2. Whether to invert and add 1 is then decided according to the result sign bit: if the result sign bit is 1, the product H2 is inverted and incremented by 1; if the result sign bit is 0, it is kept unchanged, giving the product H3. Finally the result sign bit is placed in the eighth bit of H3 to obtain the final result. An unsigned 8-bit multiplication needs no sign handling; the result is obtained directly by shifting and adding over the eight data bits of B1.
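The sign-then-shift-add flow above can be modelled behaviourally in Python. This is a functional sketch under the assumption that negative operands are reduced to magnitudes by the invert-and-add-1 step; Python integers stand in for the 8-bit registers, and the sign is reapplied arithmetically rather than as an explicit top bit:

```python
def mul8_shift_add(a: int, b: int) -> int:
    """Signed 8-bit multiply via sign handling plus shift-and-add (a
    behavioural sketch of the datapath, not the gate-level circuit)."""
    assert -128 <= a <= 127 and -128 <= b <= 127
    sign = (a < 0) ^ (b < 0)       # XOR of the two sign bits
    ma, mb = abs(a), abs(b)        # magnitudes (invert-and-add-1 in hardware)
    acc = 0
    for bit in range(8):           # examine each bit of the multiplier
        if (mb >> bit) & 1:
            acc += ma << bit       # partial product: shifted multiplicand
    return -acc if sign else acc   # reapply the result sign
```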
(2) As shown in figure 4, the 16-bit multiplier comprises a sequentially connected fourth gating unit 21, third negation unit 22, 8-bit multipliers 23, second shift unit 24, second accumulation unit 25, fifth gating unit 26, fourth negation unit 27 and sixth gating unit 28, wherein
the fourth gating unit 21 receives, respectively, the data information from the weight conversion module 7 and the activation value matrix conversion module 8, as well as the sign control signal from the weight conversion module 7;
the third negation unit 22 receives the data information from the fourth gating unit 21 and negates the received data;
the 8-bit multipliers 23 carry out operations at the 8-bit data width, realizing 8×8-bit fixed-point multiplication;
the second shift unit 24 receives the data information from the 8-bit multipliers 23 and shifts the received data;
the second accumulation unit 25 receives the data information from the second shift unit 24 and accumulates the received data;
the fifth gating unit 26 receives the data information from the second accumulation unit 25 and the sign bit information from the fourth gating unit 21, and passes them on to the fourth negation unit 27;
the fourth negation unit 27 receives the data information from the fifth gating unit 26 and negates the received data;
the sixth gating unit 28 receives the data information from the fourth negation unit 27 and outputs it.
The 16-bit multiplier is realized from four 8-bit multiplier instances whose gating signal is set to 0, i.e. they operate as unsigned multipliers. First, each of the two 16-bit operands is examined according to its sign bit: if positive it is kept unchanged; if negative it is inverted and incremented by 1. Second, each resulting 16-bit magnitude is split into a high byte and a low byte, and the corresponding bytes are multiplied. Then the product of the two high bytes is shifted left by 16 bits; the product of the high byte of operand D and the low byte of operand E and the product of the low byte of D and the high byte of E are added together and shifted left by 8 bits; the shifted values are then added to the product of the low byte of D and the low byte of E, giving the product L. Finally, whether to invert and add 1 is decided according to the result sign bit: if the sign bit of L is 1, the product is inverted and incremented by 1; if the sign bit of L is 0, it is kept unchanged, and the result sign bit is placed in the most significant position to give the final output.
10) The result matrix conversion module 10 receives the data to be operated on transmitted by the dot product module 9 and converts the dot product result matrix from the Winograd domain to the time domain, obtaining the converted time-domain dot product result matrix F;
the result matrix conversion module 10 performs, by shift, addition and subtraction operations on the row and column vectors of the Winograd-domain dot product result matrix M, the conversion operation F=A^T MA for the Winograd-domain dot product result matrix M, where M is the Winograd-domain dot product result matrix, A is the conversion auxiliary matrix of the Winograd-domain dot product result matrix M, and F is the time-domain dot product result matrix;
concrete operations: the vector obtained by adding the first, second and third rows of the Winograd-domain dot product result matrix M serves as the first row of a provisional matrix C3, where C3=A^T M; the vector obtained by subtracting the third and fourth rows of the Winograd-domain dot product result matrix M from its second row serves as the second row of the provisional matrix C3; the vector obtained by adding the first, second and third columns of the provisional matrix C3 serves as the first column of the converted time-domain dot product result matrix F; the vector obtained by subtracting the third and fourth columns of the provisional matrix C3 from its second column serves as the second column of the converted time-domain dot product result matrix F, finally yielding the time-domain dot product result matrix F.
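The row/column arithmetic of this stage can be sketched in Python using the standard F(2×2, 3×3) inverse-transform matrix A^T = [[1,1,1,0],[0,1,-1,-1]] (a sketch; the signs follow the standard Winograd formulation, so only adds and subtracts are needed):

```python
def output_transform(M):
    """F = At·M·A for F(2x2,3x3): converts the 4x4 Winograd-domain dot
    product result M into the 2x2 time-domain result F without multiplies."""
    # C3 = At·M: row 0 = m1+m2+m3, row 1 = m2-m3-m4 (1-based rows)
    C3 = [[M[0][j] + M[1][j] + M[2][j] for j in range(4)],
          [M[1][j] - M[2][j] - M[3][j] for j in range(4)]]
    # F = C3·A: the same add/sub pattern applied to the columns of C3
    return [[r[0] + r[1] + r[2], r[1] - r[2] - r[3]] for r in C3]
```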
11) The accumulation module 11 receives the data to be operated on transmitted by the result matrix conversion module 10 and accumulates the received data to obtain the final convolution result, a result matrix of size 2×2;
12) The pooling module 12 receives the data to be operated on transmitted by the accumulation module 11 and performs pooling on the final convolution result matrix. Different pooling methods, including taking the maximum, averaging and taking the minimum, can be used to perform the pooling operation on the input neurons. Since the result matrix finally output by the Winograd convolution F(2×2, 3×3) is of size 2×2, a 2×2 pooling operation can be performed directly; the pooling result is obtained by three size comparisons: the first compares the two numbers in the first row of the result matrix, the second compares the two numbers in the second row, and the third compares the results of the previous two comparisons, yielding the maximum-pooling result of the result matrix.
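The three-comparison maximum pooling can be sketched directly:

```python
def maxpool_2x2(F):
    """Max pooling of the 2x2 result tile via three comparisons:
    the two numbers of row 0, the two numbers of row 1, then the winners."""
    top = F[0][0] if F[0][0] > F[0][1] else F[0][1]   # first comparison
    bot = F[1][0] if F[1][0] > F[1][1] else F[1][1]   # second comparison
    return top if top > bot else bot                  # third comparison
```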
13) The activation module 13 receives the data to be operated on transmitted by the pooling module 12 and applies the ReLU activation function to the pooling result to obtain the activated result, which is transferred to the output buffer module 3.
Claims (9)
1. A configurable convolution array accelerator structure based on Winograd, characterized by comprising: an activation value cache module (1), a weight cache module (2), an output buffer module (3), a controller (4), a weight preprocessing module (5), an activation value preprocessing module (6), a weight conversion module (7), an activation value matrix conversion module (8), a dot product module (9), a result matrix conversion module (10), an accumulation module (11), a pooling module (12) and an activation module (13), wherein
the activation value cache module (1) is connected with the controller (4), stores input pixel values or input feature map values, and provides activation value data to the activation value preprocessing module (6);
the weight cache module (2) is connected with the controller (4), stores the trained weights, and provides weight data to the weight preprocessing module (5);
the output buffer module (3) is connected with the controller (4) and stores the result of one convolutional layer; when the activation module (13) has finished outputting data, the data are passed to the output buffer module (3) for the next layer of convolution;
the controller (4) controls, according to the calculation flow, the transfer of the activation value data, the weight data and the convolutional-layer data to be processed;
the weight preprocessing module (5) receives the data to be operated on transmitted by the weight cache module (2) and divides the convolution kernel to obtain the time-domain weight matrix K;
the activation value preprocessing module (6) receives the data to be operated on transmitted by the activation value cache module (1), takes the activation values out of the activation value cache module (1) and divides them to obtain the time-domain activation value matrix I;
the weight conversion module (7) receives the data to be operated on transmitted by the weight preprocessing module (5) and converts the weight data from the time domain to the Winograd domain, obtaining the Winograd-domain weight matrix U;
the activation value matrix conversion module (8) receives the data to be operated on transmitted by the activation value preprocessing module (6) and converts the activation values from the time domain to the Winograd domain, obtaining the Winograd-domain activation value matrix V;
the dot product module (9) receives the data to be operated on transmitted respectively by the weight conversion module (7) and the activation value matrix conversion module (8), and performs the dot product operation on the Winograd-domain activation value matrix and the Winograd-domain weight matrix, obtaining the Winograd-domain dot product result matrix M;
the result matrix conversion module (10) receives the data to be operated on transmitted by the dot product module (9) and converts the dot product result matrix from the Winograd domain to the time domain, obtaining the converted time-domain dot product result matrix F;
the accumulation module (11) receives the data to be operated on transmitted by the result matrix conversion module (10) and accumulates the received data to obtain the final convolution result;
the pooling module (12) receives the data to be operated on transmitted by the accumulation module (11) and performs pooling on the final convolution result matrix;
the activation module (13) receives the data to be operated on transmitted by the pooling module (12) and applies the ReLU activation function to the pooling result to obtain the activated result, which is transferred to the output buffer module (3).
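The module chain of claim 1 can be checked end to end for one tile in Python. This sketch assumes the standard Winograd F(2×2, 3×3) transform matrices G, B^T and A^T, and verifies the Winograd path against direct convolution:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def T(X):  # transpose
    return [list(r) for r in zip(*X)]

# Standard F(2x2,3x3) transform matrices (assumed here)
G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
Bt = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
At = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_tile(I, K):
    """One accelerator pass: I is a 4x4 activation tile, K a 3x3 kernel."""
    U = matmul(matmul(G, K), T(G))                      # weight conversion
    V = matmul(matmul(Bt, I), T(Bt))                    # activation conversion
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # dot product
    return matmul(matmul(At, M), T(At))                 # 2x2 result tile

def direct_conv(I, K):
    """Reference: direct 'valid' convolution (no kernel flip, as in CNNs)."""
    return [[sum(I[i + a][j + b] * K[a][b] for a in range(3) for b in range(3))
             for j in range(2)] for i in range(2)]
```

For integer inputs the two paths agree exactly, since the 1/2 factors in G are exact binary fractions.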
2. The configurable convolution array accelerator structure based on Winograd according to claim 1, characterized in that the weight preprocessing module (5):
(1) extends a convolution kernel of size 5×5 into a 6×6 convolution matrix by zero padding;
(2) divides the 6×6 convolution matrix into four 3×3 convolution kernels;
the specific division is as follows, where K_input denotes the 5×5 weight matrix, and below it are the four corresponding time-domain weight matrices to be processed after division, K1, K2, K3 and K4; in calculating U=GKG^T, K takes the values K1, K2, K3 and K4 in turn:
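The zero-padding and quadrant split can be sketched in Python (the quadrant ordering of K1..K4 is an assumption here). The four 3×3 convolution results are later summed by the accumulation module to reproduce the 5×5 convolution:

```python
def split_kernel_5x5(K5):
    """Zero-pad a 5x5 kernel to 6x6, then cut it into four 3x3
    sub-kernels K1..K4 (the four quadrants; ordering assumed)."""
    K6 = [row + [0] for row in K5] + [[0] * 6]          # pad right and bottom
    return [[[K6[r + dr][c + dc] for dc in range(3)] for dr in range(3)]
            for r in (0, 3) for c in (0, 3)]
```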
3. The configurable convolution array accelerator structure based on Winograd according to claim 1, characterized in that the activation value preprocessing module (6) divides an activation value matrix of size 6×6 into four overlapping matrices of size 4×4; the division is as follows, where I_input denotes the 6×6 activation value matrix, and below it are the four time-domain activation value matrices of size 4×4 to be processed after division, I1, I2, I3 and I4; in calculating V=B^T IB, I takes the values I1, I2, I3 and I4 in turn:
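The overlapped tiling can be sketched in Python. Tile origins (0,0), (0,2), (2,0) and (2,2) are assumed, i.e. adjacent tiles share two rows or columns, the standard stride for F(2×2, 3×3):

```python
def split_activation_6x6(I6):
    """Split a 6x6 activation matrix into four overlapping 4x4 tiles
    I1..I4 (tile origins assumed at (0,0), (0,2), (2,0), (2,2))."""
    return [[[I6[r + dr][c + dc] for dc in range(4)] for dr in range(4)]
            for r in (0, 2) for c in (0, 2)]
```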
4. The configurable convolution array accelerator structure based on Winograd according to claim 1, characterized in that the weight conversion module (7) completes the matrix multiplications in the calculation by row/column vector additions, subtractions and shifts, thereby performing the conversion of the weight matrix in the Winograd convolution and obtaining the Winograd-domain weight matrix U=[GKG^T], where K denotes the time-domain weight matrix, G is the weight conversion auxiliary matrix and U is the Winograd-domain weight matrix;
concrete operations: the first row vector of the weight matrix K serves as the first row of a provisional matrix C2, where the provisional matrix C2=GK; the division by two in the transform is implemented as a one-bit arithmetic right shift: when a weight value is positive it is shifted right with a 0 filled in on the left, and when it is negative it is shifted right with a 1 filled in on the left; the vector obtained by adding the first, second and third row elements of the weight matrix K and then shifting right by one bit serves as the second row of the provisional matrix C2; the vector obtained by adding the first and third row elements of the weight matrix K, subtracting the second row, and then shifting right by one bit serves as the third row of the provisional matrix C2; the third row vector of the weight matrix K serves as the fourth row of the provisional matrix C2; the first column vector of the provisional matrix C2 serves as the first column of the Winograd-domain weight matrix U; the vector obtained by adding the first, second and third columns of the provisional matrix C2 and then shifting right by one bit serves as the second column of the Winograd-domain weight matrix U; the vector obtained by adding the first and third columns of the provisional matrix C2, subtracting the second column, and then shifting right by one bit serves as the third column of the Winograd-domain weight matrix U; the third column vector of the provisional matrix C2 serves as the fourth column of the Winograd-domain weight matrix U, finally yielding the Winograd-domain weight matrix U.
5. The configurable convolution array accelerator structure based on Winograd according to claim 1, characterized in that the activation value matrix conversion module (8) completes the matrix multiplications in the calculation by row/column vector additions and subtractions, thereby performing the conversion operation for the time-domain activation value matrix in the Winograd convolution and obtaining the matrix V=[B^T IB], where I is the time-domain activation value matrix, B is the activation value conversion auxiliary matrix and V is the Winograd-domain activation value matrix;
concrete operations: the vector difference of the first row of the time-domain activation value matrix I minus its third row serves as the first row of a provisional matrix C1, where the provisional matrix C1=B^T I; the sum of the second and third rows of the time-domain activation value matrix I serves as the second row of the provisional matrix C1; the vector difference of the third row of the time-domain activation value matrix I minus its second row serves as the third row of the provisional matrix C1; the vector difference of the second row of the time-domain activation value matrix I minus its fourth row serves as the fourth row of the provisional matrix C1; the vector difference of the first column of the provisional matrix C1 minus its third column serves as the first column of the Winograd-domain activation value matrix V; the sum of the second and third columns of the provisional matrix C1 serves as the second column of the Winograd-domain activation value matrix V; the vector difference of the third column of the provisional matrix C1 minus its second column serves as the third column of the Winograd-domain activation value matrix V; the vector difference of the second column of the provisional matrix C1 minus its fourth column serves as the fourth column of the Winograd-domain activation value matrix V, finally yielding the Winograd-domain activation value matrix V.
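The add/subtract sequence of claim 5 maps directly to Python (this is the standard B^T for F(2×2, 3×3), matching the operations listed above):

```python
def activation_transform(I):
    """V = Bt·I·B for F(2x2,3x3), realized purely with the row/column
    adds and subtracts of claim 5 (no multiplications)."""
    # C1 = Bt·I: r1=i1-i3, r2=i2+i3, r3=i3-i2, r4=i2-i4 (1-based rows)
    C1 = [[I[0][j] - I[2][j] for j in range(4)],
          [I[1][j] + I[2][j] for j in range(4)],
          [I[2][j] - I[1][j] for j in range(4)],
          [I[1][j] - I[3][j] for j in range(4)]]
    # V = C1·B: the same pattern applied to C1's columns
    return [[r[0] - r[2], r[1] + r[2], r[2] - r[1], r[1] - r[3]] for r in C1]
```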
6. The configurable convolution array accelerator structure based on Winograd according to claim 1, characterized in that the dot product module (9) performs the dot product operation on the Winograd-domain weight matrix U and the Winograd-domain activation value matrix V to obtain the Winograd-domain dot product result matrix M, expressed by the formula M=U ⊙ V, where U is the Winograd-domain weight matrix and V is the Winograd-domain activation value matrix; to realize a configurable data bit width, the dot product module (9) has two operating modes, an 8-bit multiplier mode and a 16-bit multiplier mode, which respectively carry out operations at the 8-bit and 16-bit data widths, realizing 8×8-bit and 16×16-bit fixed-point multiplication.
7. The configurable convolution array accelerator structure based on Winograd according to claim 6, characterized in that the 8-bit multiplier comprises a sequentially connected first gating unit (14), first negation unit (15), first shift unit (16), first accumulation unit (17), second gating unit (18), second negation unit (19) and third gating unit (20), wherein
the first gating unit (14) receives, respectively, the data information from the weight conversion module (7) and the activation value matrix conversion module (8), as well as the sign control signal from the weight conversion module (7);
the first negation unit (15) receives the data information from the first gating unit (14) and negates the received data;
the first shift unit (16) receives the data information from the first negation unit (15) and the sign bit information from the first gating unit (14), and shifts the received data according to the sign information;
the first accumulation unit (17) receives the data information from the first shift unit (16) and accumulates the received data;
the second gating unit (18) receives the data information from the first accumulation unit (17) and the sign bit information from the first gating unit (14), and passes them on to the second negation unit (19);
the second negation unit (19) receives the data information from the second gating unit (18) and negates the received data;
the third gating unit (20) receives the data information from the second negation unit (19) and from the first accumulation unit (17), respectively, and outputs the result.
8. The configurable convolution array accelerator structure based on Winograd according to claim 6, characterized in that the 16-bit multiplier comprises a sequentially connected fourth gating unit (21), third negation unit (22), 8-bit multipliers (23), second shift unit (24), second accumulation unit (25), fifth gating unit (26), fourth negation unit (27) and sixth gating unit (28), wherein
the fourth gating unit (21) receives, respectively, the data information from the weight conversion module (7) and the activation value matrix conversion module (8), as well as the sign control signal from the weight conversion module (7);
the third negation unit (22) receives the data information from the fourth gating unit (21) and negates the received data;
the 8-bit multipliers (23) carry out operations at the 8-bit data width, realizing 8×8-bit fixed-point multiplication;
the second shift unit (24) receives the data information from the 8-bit multipliers (23) and shifts the received data;
the second accumulation unit (25) receives the data information from the second shift unit (24) and accumulates the received data;
the fifth gating unit (26) receives the data information from the second accumulation unit (25) and the sign bit information from the fourth gating unit (21), and passes them on to the fourth negation unit (27);
the fourth negation unit (27) receives the data information from the fifth gating unit (26) and negates the received data;
the sixth gating unit (28) receives the data information from the fourth negation unit (27) and outputs it.
9. The configurable convolution array accelerator structure based on Winograd according to claim 1, characterized in that the result matrix conversion module (10) performs, by shift, addition and subtraction operations on the row and column vectors of the Winograd-domain dot product result matrix M, the conversion operation F=A^T MA for the Winograd-domain dot product result matrix M, where M is the Winograd-domain dot product result matrix, A is the conversion auxiliary matrix of the Winograd-domain dot product result matrix M, and F is the time-domain dot product result matrix;
concrete operations: the vector obtained by adding the first, second and third rows of the Winograd-domain dot product result matrix M serves as the first row of a provisional matrix C3, where the provisional matrix C3=A^T M; the vector obtained by subtracting the third and fourth rows of the Winograd-domain dot product result matrix M from its second row serves as the second row of the provisional matrix C3; the vector obtained by adding the first, second and third columns of the provisional matrix C3 serves as the first column of the converted time-domain dot product result matrix F; the vector obtained by subtracting the third and fourth columns of the provisional matrix C3 from its second column serves as the second column of the converted time-domain dot product result matrix F, finally yielding the converted time-domain dot product result matrix F.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910511987.6A CN110288086B (en) | 2019-06-13 | 2019-06-13 | Winograd-based configurable convolution array accelerator structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110288086A true CN110288086A (en) | 2019-09-27 |
CN110288086B CN110288086B (en) | 2023-07-21 |
Family
ID=68004097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910511987.6A Active CN110288086B (en) | 2019-06-13 | 2019-06-13 | Winograd-based configurable convolution array accelerator structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110288086B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325332A (en) * | 2020-02-18 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | Convolutional neural network processing method and device |
CN112580793A (en) * | 2020-12-24 | 2021-03-30 | 清华大学 | Neural network accelerator based on time domain memory computing and acceleration method |
CN112639839A (en) * | 2020-05-22 | 2021-04-09 | 深圳市大疆创新科技有限公司 | Arithmetic device of neural network and control method thereof |
CN112734827A (en) * | 2021-01-07 | 2021-04-30 | 京东鲲鹏(江苏)科技有限公司 | Target detection method and device, electronic equipment and storage medium |
WO2021083097A1 (en) * | 2019-11-01 | 2021-05-06 | 中科寒武纪科技股份有限公司 | Data processing method and apparatus, and computer device and storage medium |
WO2021082747A1 (en) * | 2019-11-01 | 2021-05-06 | 中科寒武纪科技股份有限公司 | Operational apparatus and related product |
CN112862091A (en) * | 2021-01-26 | 2021-05-28 | 合肥工业大学 | Resource multiplexing type neural network hardware accelerating circuit based on quick convolution |
CN112949845A (en) * | 2021-03-08 | 2021-06-11 | 内蒙古大学 | Deep convolutional neural network accelerator based on FPGA |
CN113269302A (en) * | 2021-05-11 | 2021-08-17 | 中山大学 | Winograd processing method and system for 2D and 3D convolutional neural networks |
CN113283591A (en) * | 2021-07-22 | 2021-08-20 | 南京大学 | Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier |
CN113407904A (en) * | 2021-06-09 | 2021-09-17 | 中山大学 | Winograd processing method, system and medium compatible with multi-dimensional convolutional neural network |
CN113554163A (en) * | 2021-07-27 | 2021-10-26 | 深圳思谋信息科技有限公司 | Convolutional neural network accelerator |
CN113656751A (en) * | 2021-08-10 | 2021-11-16 | 上海新氦类脑智能科技有限公司 | Method, device, equipment and medium for realizing signed operation of unsigned DAC (digital-to-analog converter) |
CN114399036A (en) * | 2022-01-12 | 2022-04-26 | 电子科技大学 | Efficient convolution calculation unit based on one-dimensional Winograd algorithm |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103793199A (en) * | 2014-01-24 | 2014-05-14 | 天津大学 | Rapid RSA cryptography coprocessor capable of supporting dual domains |
US20160342893A1 (en) * | 2015-05-21 | 2016-11-24 | Google Inc. | Rotating data for neural network computations |
CN107862374A (en) * | 2017-10-30 | 2018-03-30 | 中国科学院计算技术研究所 | Processing with Neural Network system and processing method based on streamline |
US20180157969A1 (en) * | 2016-12-05 | 2018-06-07 | Beijing Deephi Technology Co., Ltd. | Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network |
CN109190755A (en) * | 2018-09-07 | 2019-01-11 | 中国科学院计算技术研究所 | Matrix conversion device and method towards neural network |
CN109190756A (en) * | 2018-09-10 | 2019-01-11 | 中国科学院计算技术研究所 | Arithmetic unit based on Winograd convolution and the neural network processor comprising the device |
CN109325591A (en) * | 2018-09-26 | 2019-02-12 | 中国科学院计算技术研究所 | Neural network processor towards Winograd convolution |
CN109359730A (en) * | 2018-09-26 | 2019-02-19 | 中国科学院计算技术研究所 | Neural network processor towards fixed output normal form Winograd convolution |
CN109447241A (en) * | 2018-09-29 | 2019-03-08 | 西安交通大学 | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field |
Non-Patent Citations (4)
Title |
---|
ANDREW LAVIN et al.: "Fast Algorithms for Convolutional Neural Networks", 2016 IEEE Conference on Computer Vision and Pattern Recognition * |
LINGCHUAN MENG et al.: "Efficient Winograd Convolution via Integer Arithmetic", arXiv * |
LIQIANG LU et al.: "SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs", ACM * |
Y HUANG et al.: "A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm", Journal of Physics: Conference Series * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112765538A (en) * | 2019-11-01 | 2021-05-07 | 中科寒武纪科技股份有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN112765538B (en) * | 2019-11-01 | 2024-03-29 | 中科寒武纪科技股份有限公司 | Data processing method, device, computer equipment and storage medium |
WO2021083097A1 (en) * | 2019-11-01 | 2021-05-06 | 中科寒武纪科技股份有限公司 | Data processing method and apparatus, and computer device and storage medium |
WO2021082747A1 (en) * | 2019-11-01 | 2021-05-06 | 中科寒武纪科技股份有限公司 | Operational apparatus and related product |
CN111325332B (en) * | 2020-02-18 | 2023-09-08 | 百度在线网络技术(北京)有限公司 | Convolutional neural network processing method and device |
CN111325332A (en) * | 2020-02-18 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | Convolutional neural network processing method and device |
WO2021232422A1 (en) * | 2020-05-22 | 2021-11-25 | 深圳市大疆创新科技有限公司 | Neural network arithmetic device and control method thereof |
CN112639839A (en) * | 2020-05-22 | 2021-04-09 | 深圳市大疆创新科技有限公司 | Arithmetic device of neural network and control method thereof |
CN112580793B (en) * | 2020-12-24 | 2022-08-12 | 清华大学 | Neural network accelerator based on time domain memory computing and acceleration method |
CN112580793A (en) * | 2020-12-24 | 2021-03-30 | 清华大学 | Neural network accelerator based on time domain memory computing and acceleration method |
CN112734827A (en) * | 2021-01-07 | 2021-04-30 | 京东鲲鹏(江苏)科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN112862091A (en) * | 2021-01-26 | 2021-05-28 | 合肥工业大学 | Resource multiplexing type neural network hardware accelerating circuit based on quick convolution |
CN112949845A (en) * | 2021-03-08 | 2021-06-11 | 内蒙古大学 | Deep convolutional neural network accelerator based on FPGA |
CN113269302A (en) * | 2021-05-11 | 2021-08-17 | 中山大学 | Winograd processing method and system for 2D and 3D convolutional neural networks |
CN113407904A (en) * | 2021-06-09 | 2021-09-17 | 中山大学 | Winograd processing method, system and medium compatible with multi-dimensional convolutional neural network |
CN113283591B (en) * | 2021-07-22 | 2021-11-16 | 南京大学 | Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier |
CN113283591A (en) * | 2021-07-22 | 2021-08-20 | 南京大学 | Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier |
CN113554163A (en) * | 2021-07-27 | 2021-10-26 | 深圳思谋信息科技有限公司 | Convolutional neural network accelerator |
CN113554163B (en) * | 2021-07-27 | 2024-03-29 | 深圳思谋信息科技有限公司 | Convolutional neural network accelerator |
CN113656751A (en) * | 2021-08-10 | 2021-11-16 | 上海新氦类脑智能科技有限公司 | Method, device, equipment and medium for realizing signed operation of unsigned DAC (digital-to-analog converter) |
CN113656751B (en) * | 2021-08-10 | 2024-02-27 | 上海新氦类脑智能科技有限公司 | Method, apparatus, device and medium for realizing signed operation by unsigned DAC |
CN114399036A (en) * | 2022-01-12 | 2022-04-26 | 电子科技大学 | Efficient convolution calculation unit based on one-dimensional Winograd algorithm |
CN114399036B (en) * | 2022-01-12 | 2023-08-22 | 电子科技大学 | Efficient convolution calculation unit based on one-dimensional Winograd algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN110288086B (en) | 2023-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110288086A (en) | A kind of configurable convolution array accelerator structure based on Winograd | |
CN105681628B (en) | A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing | |
CN108108809B (en) | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof | |
CN109478144B (en) | Data processing device and method | |
CN109598338A (en) | A kind of convolutional neural networks accelerator of the calculation optimization based on FPGA | |
CN108665059A (en) | Convolutional neural networks acceleration system based on field programmable gate array | |
CN105512723B (en) | A kind of artificial neural networks apparatus and method for partially connected | |
CN108665063B (en) | Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator | |
CN107862374A (en) | Processing with Neural Network system and processing method based on streamline | |
CN108256636A (en) | A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing | |
CN110222760B (en) | Quick image processing method based on winograd algorithm | |
CN106127302A (en) | Process the circuit of data, image processing system, the method and apparatus of process data | |
CN104915322A (en) | Method for accelerating convolution neutral network hardware and AXI bus IP core thereof | |
CN110390383A (en) | A kind of deep neural network hardware accelerator based on power exponent quantization | |
CN107203808B (en) | A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor | |
CN108537330A (en) | Convolutional calculation device and method applied to neural network | |
CN103020890A (en) | Visual processing device based on multi-layer parallel processing | |
CN111626403B (en) | Convolutional neural network accelerator based on CPU-FPGA memory sharing | |
CN108009126A (en) | A kind of computational methods and Related product | |
CN117933314A (en) | Processing device, processing method, chip and electronic device | |
CN110991630A (en) | Convolutional neural network processor for edge calculation | |
CN115880132B (en) | Graphics processor, matrix multiplication task processing method, device and storage medium | |
CN109885406B (en) | Operator calculation optimization method, device, equipment and storage medium | |
CN110580519B (en) | Convolution operation device and method thereof | |
CN108334944A (en) | A kind of device and method of artificial neural network operation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||