CN110288086A - Winograd-based configurable convolutional array accelerator architecture - Google Patents

Winograd-based configurable convolutional array accelerator architecture

Info

Publication number
CN110288086A
Authority
CN
China
Prior art keywords
matrix
winograd
module
weight
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910511987.6A
Other languages
Chinese (zh)
Other versions
CN110288086B (en)
Inventor
魏继增
徐文富
王宇吉
郭炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201910511987.6A
Publication of CN110288086A
Application granted
Publication of CN110288086B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A Winograd-based configurable convolutional array accelerator architecture, comprising: an activation-value cache module, a weight cache module, an output buffer module, a controller, a weight preprocessing module, an activation-value preprocessing module, a weight conversion module, an activation-value matrix conversion module, a dot-product module, a result-matrix conversion module, an accumulation module, a pooling module and an activation module. Based on the computation pattern of the fixed-paradigm Winograd convolution algorithm, the invention provides a convolutional array accelerator with configurable bit width, flexibly meeting the bit-width requirements of different neural networks and different convolutional layers. In addition, a dedicated multiplier unit with configurable data bit width is designed, which improves the computational efficiency of neural-network convolution and reduces computation power consumption.

Description

Winograd-based configurable convolutional array accelerator architecture
Technical field
The present invention relates to configurable convolutional array accelerator architectures, and more particularly to a Winograd-based configurable convolutional array accelerator architecture.
Background technique
Neural networks excel in many application areas, especially image-related tasks. In computer vision problems such as image classification, semantic segmentation, image retrieval and object detection they have begun to replace most traditional algorithms and are gradually being deployed on terminal devices.
However, the computational load of neural networks is enormous, which leads to problems such as slow processing speed and high power consumption. A neural network workload consists mainly of a training phase and an inference phase. To obtain high-precision results, the weights must be computed during training by iterating over massive amounts of data. In the inference phase, the computation on the input data must be completed within an extremely short response time (usually milliseconds), especially when the neural network is applied in real-time systems such as autonomous driving. The computations involved in a neural network mainly include convolution, activation and pooling operations.
Existing research shows that more than 90% of the computation time of a neural network is spent on convolution. Traditional convolution computes each element of the output feature map separately through repeated multiply-accumulate operations. Although solutions based on this algorithm have achieved preliminary success, efficiency can be higher when the algorithm itself is more efficient. Researchers have therefore proposed the Winograd convolution algorithm, which applies a specific domain transformation to the input feature map and the weights, completing an equivalent convolution task while reducing the number of multiplications in the convolution process. Since most neural-network processor chips in practical applications perform inference with a fixed network model, the Winograd convolution output paradigm used is generally also fixed, the computation flow is very clear, and there is considerable room for optimization. How to design and optimize a Winograd-based neural network accelerator architecture has therefore become a research focus.
In addition, for the vast majority of neural network applications, fixed-point input data can achieve good experimental results while further improving speed and reducing power consumption. However, the convolution data bit width in existing fixed-point neural networks is fixed and cannot be configured flexibly, which limits applicability. In general, a 16-bit data width satisfies the accuracy requirements of a neural network, while for some networks and scenarios with lower precision requirements an 8-bit data width is also sufficient. Making the data bit width configurable in a neural network therefore enables better optimization.
Summary of the invention
The technical problem to be solved by the invention is to provide a Winograd-based configurable convolutional array accelerator architecture that improves the computational efficiency of neural-network convolution.
The technical solution adopted by the invention is a Winograd-based configurable convolutional array accelerator architecture, comprising: an activation-value cache module, a weight cache module, an output buffer module, a controller, a weight preprocessing module, an activation-value preprocessing module, a weight conversion module, an activation-value matrix conversion module, a dot-product module, a result-matrix conversion module, an accumulation module, a pooling module and an activation module, wherein
the activation-value cache module is connected to the controller, stores input pixel values or input feature-map values, and provides activation-value data to the activation-value preprocessing module;
the weight cache module is connected to the controller, stores trained weights, and provides weight data to the weight preprocessing module;
the output buffer module is connected to the controller and stores the result of one convolutional layer; when the activation module finishes outputting data, the data are passed to the output buffer module for use by the next convolutional layer;
the controller controls the transmission of the activation-value data, weight data and convolutional-layer data to be processed according to the computation flow;
the weight preprocessing module receives the operand data transmitted by the weight cache module and divides the convolution kernel to obtain the time-domain weight matrix K;
the activation-value preprocessing module receives the operand data transmitted by the activation-value cache module, fetches activation values from the activation-value cache module and divides them to obtain the time-domain activation-value matrix I;
the weight conversion module receives the operand data transmitted by the weight preprocessing module and converts the weight data from the time domain to the Winograd domain, obtaining the Winograd-domain weight matrix U;
the activation-value matrix conversion module receives the operand data transmitted by the activation-value preprocessing module and converts the activation values from the time domain to the Winograd domain, obtaining the Winograd-domain activation-value matrix V;
the dot-product module receives the operand data transmitted by the weight conversion module and the activation-value matrix conversion module respectively, and performs the dot-product operation between the Winograd-domain activation-value matrix and the Winograd-domain weight matrix, obtaining the Winograd-domain dot-product result matrix M;
the result-matrix conversion module receives the operand data transmitted by the dot-product module and converts the dot-product result matrix from the Winograd domain back to the time domain, obtaining the converted time-domain dot-product result matrix F;
the accumulation module receives the operand data transmitted by the result-matrix conversion module and accumulates the received data, obtaining the final convolution result;
the pooling module receives the operand data transmitted by the accumulation module and pools the final convolution result matrix;
the activation module receives the operand data transmitted by the pooling module, applies the ReLU activation function to the pooling result, and transfers the activated result to the output buffer module.
The weight preprocessing module:
(1) zero-pads a 5*5 convolution kernel, extending it into a 6*6 convolution matrix;
(2) divides the 6*6 convolution matrix into four 3*3 convolution kernels.
The specific division is as follows, where K_input denotes a 5*5 weight matrix and below it are the four time-domain weight matrices K1, K2, K3, K4 to be processed after division. In the calculation U = G K G^T, K takes the values K1, K2, K3, K4 in turn:
The activation-value preprocessing module divides a 6*6 activation-value matrix into four overlapping 4*4 matrices. The division is as follows, where I_input denotes a 6*6 activation-value matrix and below it are the four 4*4 time-domain activation-value matrices I1, I2, I3, I4 to be processed after division. In the calculation V = B^T I B, I takes the values I1, I2, I3, I4 in turn:
The weight conversion module replaces the matrix multiplications in the calculation with row/column vector additions and shifts, thereby performing the weight-matrix transformation of the Winograd convolution and obtaining the Winograd-domain weight matrix U = G K G^T, where K denotes the time-domain weight matrix, G is the weight transform auxiliary matrix and U is the Winograd-domain weight matrix.
Concrete operations: the first row vector of the weight matrix K is taken as the first row of the provisional matrix C2, where C2 = GK. The division by two is realized by a right shift: a positive weight is shifted right with a 0 filled in on the left, and a negative weight is shifted right with a 1 filled in on the left. The elements of the first, second and third rows of K are added and the result is shifted right by one bit to form the second row of C2; the elements of the first, second and third rows of K are added and the result is shifted right by one bit to form the third row of C2; the third row vector of K forms the fourth row of C2. The first column vector of C2 forms the first column of the Winograd-domain weight matrix U; the first, second and third columns of C2 are added and shifted right by one bit to form the second column of U; the first, second and third columns of C2 are added and shifted right by one bit to form the third column of U; the third column vector of C2 forms the fourth column of U, finally yielding the Winograd-domain weight matrix U.
The activation-value matrix conversion module replaces the matrix multiplications in the calculation with row/column vector additions and subtractions, thereby performing the transformation of the time-domain activation-value matrix in the Winograd convolution and obtaining the matrix V = B^T I B, where I is the time-domain activation-value matrix, B is the activation-value transform auxiliary matrix and V is the Winograd-domain activation-value matrix.
Concrete operations: the first row of the time-domain activation-value matrix I minus its third row forms the first row of the provisional matrix C1, where C1 = B^T I; the second row of I plus its third row forms the second row of C1; the third row of I minus its second row forms the third row of C1; the second row of I minus its fourth row forms the fourth row of C1. The first column of C1 minus its third column forms the first column of the Winograd-domain activation-value matrix V; the second column of C1 plus its third column forms the second column of V; the third column of C1 minus its second column forms the third column of V; the second column of C1 minus its fourth column forms the fourth column of V, finally yielding the Winograd-domain activation-value matrix V.
The dot-product module performs the dot-product operation between the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, obtaining the Winograd-domain dot-product result matrix M; the formula is M = U ⊙ V, where U is the Winograd-domain weight matrix and V is the Winograd-domain activation-value matrix. To make the data bit width of the dot product configurable, the dot-product module has two operating modes, an 8-bit multiplier mode and a 16-bit multiplier mode, which respectively perform operations with 8-bit and 16-bit data widths, realizing 8*8-bit and 16*16-bit fixed-point multiplication.
The 8-bit multiplier comprises a first gating unit, a first negation unit, a first shift unit, a first summing unit, a second gating unit, a second negation unit and a third gating unit connected in sequence, wherein
the first gating unit receives the data of the weight conversion module and the activation-value matrix conversion module as well as the sign-control signal of the weight conversion module;
the first negation unit receives the data of the first gating unit and negates the received data;
the first shift unit receives the data of the first negation unit as well as the sign-bit information of the first gating unit, and shifts the received data according to the sign information;
the first summing unit receives the data of the first shift unit and accumulates the received data;
the second gating unit receives the data of the first summing unit and the sign-bit information of the first gating unit and passes them to the second negation unit;
the second negation unit receives the data of the second gating unit and negates the received data;
the third gating unit receives the data of the second negation unit and of the first summing unit respectively, and outputs the result.
The 16-bit multiplier comprises a fourth gating unit, a third negation unit, 8-bit multipliers, a second shift unit, a second summing unit, a fifth gating unit, a fourth negation unit and a sixth gating unit connected in sequence, wherein
the fourth gating unit receives the data of the weight conversion module and the activation-value matrix conversion module as well as the sign-control signal of the weight conversion module;
the third negation unit receives the data of the fourth gating unit and negates the received data;
the 8-bit multipliers perform operations with 8-bit data width, realizing 8*8-bit fixed-point multiplication;
the second shift unit receives the data of the 8-bit multipliers and shifts the received data;
the second summing unit receives the data of the second shift unit and accumulates the received data;
the fifth gating unit receives the data of the second summing unit and the sign-bit information of the fourth gating unit and passes them to the fourth negation unit;
the fourth negation unit receives the data of the fifth gating unit and negates the received data;
the sixth gating unit receives the data of the fourth negation unit and outputs the result.
The result-matrix conversion module performs the transformation F = A^T M A of the Winograd-domain dot-product result matrix M through row/column vector shift, addition and subtraction operations, where M is the Winograd-domain dot-product result matrix, A is the transform auxiliary matrix of M, and F is the time-domain dot-product result matrix.
Concrete operations: the first, second and third rows of the Winograd-domain dot-product result matrix M are added to form the first row of the provisional matrix C3, where C3 = A^T M; the second, third and fourth rows of M are added to form the second row of C3. The first, second and third columns of C3 are added to form the first column of the converted time-domain dot-product result matrix F; the second, third and fourth columns of C3 are added to form the second column of F, finally yielding the converted time-domain dot-product result matrix F.
With the Winograd-based configurable convolutional array accelerator architecture of the invention, a bit-width-configurable convolutional array accelerator is designed according to the computation pattern of the fixed-paradigm Winograd convolution algorithm, flexibly meeting the bit-width requirements of different neural networks and different convolutional layers. In addition, a dedicated multiplier unit with configurable data bit width is designed, which improves the computational efficiency of neural-network convolution and reduces computation power consumption.
Detailed description of the invention
Fig. 1 is the overall block diagram of the Winograd convolution array accelerator;
Fig. 2 is a schematic diagram of the composition of the Winograd-based configurable convolutional array accelerator architecture of the invention;
Fig. 3 is a schematic diagram of the 8-bit multiplier in the configurable-data-bit-width design;
Fig. 4 is a schematic diagram of the 16-bit multiplier in the configurable-data-bit-width design.
Specific embodiment
The Winograd-based configurable convolutional array accelerator architecture of the invention is described in detail below with reference to the embodiments and the accompanying drawings.
In the convolution computation of a neural network, the Winograd transformation formula is
Out = A^T [(G K G^T) ⊙ (B^T I B)] A    (1)
where K denotes the time-domain weight matrix, I denotes the time-domain activation-value matrix, and A, G, B are the transform matrices corresponding to the dot-product result matrix [(G K G^T) ⊙ (B^T I B)], the time-domain weight matrix K and the time-domain activation-value matrix I, respectively. The transform matrices A, G, B are as follows:
The output paradigm of the Winograd convolution used in the invention is F(2*2, 3*3), where the first parameter 2*2 denotes the size of the output feature-map tile and the second parameter 3*3 denotes the size of the convolution kernel.
As shown in Fig. 1, the Winograd convolution is executed in three phases. In the first phase, the time-domain weight matrix K and the time-domain activation-value matrix I read from the caches are converted from the time domain to the Winograd domain; the concrete operations are matrix multiplications, and the results are denoted U and V, where U = G K G^T and V = B^T I B. In the second phase, the dot-product operation "⊙" is performed on the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, giving the Winograd-domain dot-product result matrix M = U ⊙ V. In the third phase, the dot-product result is converted from the Winograd domain back to the time domain.
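For reference, the three phases can be reproduced in a few lines of NumPy. The transform matrices G, B^T and A^T below are the standard F(2*2, 3*3) matrices from the Winograd-convolution literature (Lavin et al.); the patent's own matrix figures are not reproduced in this text, so the exact sign conventions shown here are an assumption.

```python
import numpy as np

# Standard F(2x2, 3x3) transform matrices from the Winograd literature
# (Lavin et al.); the patent's matrix figures are not reproduced, so the
# exact sign conventions here are an assumption.
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
Bt = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=float)
At = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(K, I):
    """One Winograd tile: 3x3 kernel K and 4x4 input tile I give a 2x2 output."""
    U = G @ K @ G.T            # phase 1: weight transform into the Winograd domain
    V = Bt @ I @ Bt.T          # phase 1: activation transform
    M = U * V                  # phase 2: element-wise dot product
    return At @ M @ At.T       # phase 3: inverse transform back to the time domain

# Sanity check against a direct sliding-window (correlation-form) convolution
K = np.random.randn(3, 3)
I = np.random.randn(4, 4)
direct = np.array([[np.sum(I[r:r + 3, c:c + 3] * K) for c in range(2)]
                   for r in range(2)])
assert np.allclose(winograd_f2x2_3x3(K, I), direct)
```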
As shown in Fig. 2, the Winograd-based configurable convolutional array accelerator architecture of the invention comprises: an activation-value cache module 1, a weight cache module 2, an output buffer module 3, a controller 4, a weight preprocessing module 5, an activation-value preprocessing module 6, a weight conversion module 7, an activation-value matrix conversion module 8, a dot-product module 9, a result-matrix conversion module 10, an accumulation module 11, a pooling module 12 and an activation module 13, wherein
1) the activation-value cache module 1 is connected to the controller 4, stores input pixel values or input feature-map values, and provides activation-value data to the activation-value preprocessing module 6;
2) the weight cache module 2 is connected to the controller 4, stores trained weights, and provides weight data to the weight preprocessing module 5;
3) the output buffer module 3 is connected to the controller 4 and stores the result of one convolutional layer; when the activation module 13 finishes outputting data, the data are passed to the output buffer module 3 for use by the next convolutional layer;
4) the controller 4 controls the transmission of the activation-value data, weight data and convolutional-layer data to be processed according to the computation flow;
5) the weight preprocessing module 5 receives the operand data transmitted by the weight cache module 2 and divides the convolution kernel, obtaining the four time-domain weight matrices K1, K2, K3, K4 to be processed.
The weight preprocessing module 5: (1) zero-pads a 5*5 convolution kernel, extending it into a 6*6 convolution matrix; (2) divides the 6*6 convolution matrix into four 3*3 convolution kernels. In this way the 5*5 convolution can be realized with the 3*3 Winograd output paradigm, efficiently and without increasing the number of multiplications or the power consumption; a sketch of this step is given after the division below.
The specific division is as follows, where K_input denotes a 5*5 time-domain input weight matrix, on the right is the 6*6 time-domain weight matrix obtained after padding, and below are the four time-domain weight matrices K1, K2, K3, K4 to be processed after division. In the calculation U = G K G^T, K takes the values K1, K2, K3, K4 in turn:
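A minimal software sketch of this padding-and-splitting step follows; since the figure showing the division is not reproduced here, the padding position and the quadrant order K1..K4 are assumptions.

```python
import numpy as np

def split_5x5_kernel(K_input):
    """Zero-pad a 5x5 kernel to 6x6 and split it into four 3x3 sub-kernels.

    The padding position and the quadrant order K1..K4 are assumptions;
    the figure showing the exact division is not reproduced here.
    """
    K6 = np.zeros((6, 6), dtype=K_input.dtype)
    K6[:5, :5] = K_input                             # one extra zero row and column
    return [K6[0:3, 0:3], K6[0:3, 3:6],              # K1, K2
            K6[3:6, 0:3], K6[3:6, 3:6]]              # K3, K4
```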
6) the activation-value preprocessing module 6 receives the operand data transmitted by the activation-value cache module 1, fetches activation values from the activation-value cache module 1 and divides them, obtaining the four time-domain activation-value matrices I1, I2, I3, I4 to be processed. In the calculation V = B^T I B, I takes the values I1, I2, I3, I4 in turn.
The activation-value preprocessing module 6 reads the activation values and preprocesses them. In the Winograd algorithm the activation values must correspond to the weights, and many of the data are reused, so an overlapping division is used. The activation-value preprocessing module 6 divides the 6*6 activation-value matrix into four overlapping 4*4 matrices, corresponding to the four 3*3 convolution kernels. The division is as follows, where I_input denotes a 6*6 time-domain input activation-value matrix and below are the four 4*4 time-domain activation-value matrices I1, I2, I3, I4 to be processed after division. In the calculation V = B^T I B, I takes the values I1, I2, I3, I4 in turn:
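A corresponding sketch of the overlapping activation split is given below; reading the overlap as a stride of 2 (the F(2*2, 3*3) output tile size) is an assumption, as is the tile order I1..I4, since the division figure is not reproduced here.

```python
def split_6x6_activation(I_input):
    """Split a 6x6 activation tile (NumPy array) into four overlapping 4x4
    tiles. A stride of 2, the F(2x2,3x3) output tile size, is assumed for
    the overlap, as is the tile order I1..I4."""
    return [I_input[r:r + 4, c:c + 4] for r in (0, 2) for c in (0, 2)]
```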
7) the weight conversion module 7 receives the operand data transmitted by the weight preprocessing module 5 and converts the weight data from the time domain to the Winograd domain, obtaining the Winograd-domain weight matrix U.
The weight conversion module 7 replaces the matrix multiplications in the calculation with row/column vector additions and shifts, thereby performing the weight-matrix transformation of the Winograd convolution and obtaining the Winograd-domain weight matrix U = G K G^T, where K denotes the time-domain weight matrix, G is the weight transform auxiliary matrix and U is the Winograd-domain weight matrix.
Concrete operations: the first row vector of the time-domain weight matrix K is taken as the first row of the provisional matrix C2, where C2 = GK. Because the value 1/2 appears in the transform matrix, the division by two is realized by a right shift: a positive weight is shifted right with a 0 filled in on the left, and a negative weight is shifted right with a 1 filled in on the left. The elements of the first, second and third rows of K are added and the result is shifted right by one bit to form the second row of C2; the elements of the first, second and third rows of K are added and the result is shifted right by one bit to form the third row of C2; the third row vector of K forms the fourth row of C2. The first column vector of C2 forms the first column of the Winograd-domain weight matrix U; the first, second and third columns of C2 are added and shifted right by one bit to form the second column of U; the first, second and third columns of C2 are added and shifted right by one bit to form the third column of U; the third column vector of C2 forms the fourth column of U, finally yielding the Winograd-domain weight matrix U.
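The shift-and-add weight transform can be modelled in software as follows. The signs follow the standard F(2*2, 3*3) matrix G (the third combination uses row1 - row2 + row3), which is an assumption since the patent's matrix figure is not reproduced; the right shift by one bit plays the role of the 1/2 factors, with the sign bit shifted in for negative values as described above.

```python
def weight_transform_shift_add(K):
    """Model of the add/shift weight transform U = G K G^T for F(2x2,3x3).

    Signs follow the standard Winograd matrix G (assumed); the >> 1
    arithmetic right shift realizes the 1/2 factors, shifting in the sign
    bit for negatives and truncating odd sums as a hardware shift would.
    """
    def g_combine(M):                              # apply G to the rows of M
        r1, r2, r3 = M
        return [list(r1),
                [(a + b + c) >> 1 for a, b, c in zip(r1, r2, r3)],
                [(a - b + c) >> 1 for a, b, c in zip(r1, r2, r3)],
                list(r3)]
    C2 = g_combine(K)                              # C2 = G K, a 4x3 matrix
    C2_cols = list(zip(*C2))                       # work column-wise for C2 G^T
    return [list(row) for row in zip(*g_combine(C2_cols))]   # U, a 4x4 matrix
```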
8) the activation-value matrix conversion module 8 receives the operand data transmitted by the activation-value preprocessing module 6 and converts the activation values from the time domain to the Winograd domain, obtaining the Winograd-domain activation-value matrix V.
The activation-value matrix conversion module 8 replaces the matrix multiplications in the calculation with row/column vector additions and subtractions, thereby performing the transformation of the time-domain activation-value matrix in the Winograd convolution and obtaining the Winograd-domain activation-value matrix V = B^T I B, where I is the time-domain activation-value matrix, B is the activation-value transform auxiliary matrix and V is the Winograd-domain activation-value matrix.
Concrete operations: the first row of the time-domain activation-value matrix I minus its third row forms the first row of the provisional matrix C1, where C1 = B^T I; the second row of I plus its third row forms the second row of C1; the third row of I minus its second row forms the third row of C1; the second row of I minus its fourth row forms the fourth row of C1. The first column of C1 minus its third column forms the first column of the Winograd-domain activation-value matrix V; the second column of C1 plus its third column forms the second column of V; the third column of C1 minus its second column forms the third column of V; the second column of C1 minus its fourth column forms the fourth column of V, finally yielding the Winograd-domain activation-value matrix V.
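A software model of this add/subtract-only activation transform, following the row and column combinations exactly as described above:

```python
def activation_transform(I):
    """Model of the add/subtract activation transform V = B^T I B for
    F(2x2,3x3), following the row/column combinations described above."""
    def b_combine(M):                              # apply B^T to the rows of M
        r1, r2, r3, r4 = M
        return [[a - b for a, b in zip(r1, r3)],   # row1 - row3
                [a + b for a, b in zip(r2, r3)],   # row2 + row3
                [a - b for a, b in zip(r3, r2)],   # row3 - row2
                [a - b for a, b in zip(r2, r4)]]   # row2 - row4
    C1 = b_combine(I)                              # C1 = B^T I
    C1_cols = list(zip(*C1))                       # columns of C1
    return [list(row) for row in zip(*b_combine(C1_cols))]   # V = C1 B
```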
9) the dot-product module 9 receives the operand data transmitted by the weight conversion module 7 and the activation-value matrix conversion module 8 respectively, and performs the dot-product operation between the Winograd-domain activation-value matrix and the Winograd-domain weight matrix, obtaining the Winograd-domain dot-product result matrix M; this is the module that consumes the most computation time and resources in the convolution.
The dot-product module 9 performs the dot-product operation between the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, obtaining the Winograd-domain dot-product result matrix M; the formula is M = U ⊙ V, where U is the Winograd-domain weight matrix and V is the Winograd-domain activation-value matrix. To make the data bit width of the dot product configurable, the dot-product module 9 has two operating modes, an 8-bit multiplier mode and a 16-bit multiplier mode, which respectively perform operations with 8-bit and 16-bit data widths, realizing 8*8-bit and 16*16-bit fixed-point multiplication. Specifically:
(1) As shown in Fig. 3, the 8-bit multiplier comprises a first gating unit 14, a first negation unit 15, a first shift unit 16, a first summing unit 17, a second gating unit 18, a second negation unit 19 and a third gating unit 20 connected in sequence, wherein
the first gating unit 14 receives the data of the weight conversion module 7 and the activation-value matrix conversion module 8 as well as the sign-control signal of the weight conversion module 7;
the first negation unit 15 receives the data of the first gating unit 14 and negates the received data;
the first shift unit 16 receives the data of the first negation unit 15 as well as the sign-bit information of the first gating unit 14, and shifts the received data according to the sign information;
the first summing unit 17 receives the data of the first shift unit 16 and accumulates the received data;
the second gating unit 18 receives the data of the first summing unit 17 and the sign-bit information of the first gating unit 14 and passes them to the second negation unit 19;
the second negation unit 19 receives the data of the second gating unit 18 and negates the received data;
the third gating unit 20 receives the data of the second negation unit 19 and of the first summing unit 17 respectively, and outputs the result.
Concrete operations of the 8-bit multiplier: the sign bits of the two operands are XORed to obtain the sign bit of the result. Each operand is then checked for its sign: if it is negative, the sign bit is removed and the remaining seven bits are inverted and incremented by 1; if it is positive, the lower seven bits are kept unchanged. After the signs have been handled, each binary digit of the multiplier B1 is examined in turn: if the digit is 1, the corresponding partial product is the seven-bit multiplicand A1 shifted left by the corresponding position; if the digit is 0, the corresponding partial product is 0. After all seven digits of B1 have been examined, all partial products are added to give the product H2. The product is then adjusted according to the result sign bit: if the result sign bit is 1, H2 is inverted and incremented by 1; if it is 0, it is kept unchanged, giving the product H3. Finally the result sign bit is written into the sign position of H3 to obtain the final result. The unsigned 8-bit multiply does not consider sign bits and simply shifts and adds the multiplicand according to the eight data bits of B1.
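A behavioural model of this signed 8-bit shift-and-add multiply is sketched below (bit widths as described; this is a software illustration under those assumptions, not the hardware circuit).

```python
def mul8_signed(a, b):
    """Behavioural model of the signed 8-bit shift-and-add multiplier:
    result sign = XOR of the operand signs, magnitudes multiplied by adding
    shifted copies of |a|, sign re-applied at the end. Note that -128 is not
    representable in this 7-bit sign-magnitude scheme."""
    a &= 0xFF
    b &= 0xFF
    sign = ((a >> 7) ^ (b >> 7)) & 1                    # XOR of the sign bits
    mag_a = (~a + 1) & 0x7F if a & 0x80 else a & 0x7F   # |a| on 7 bits
    mag_b = (~b + 1) & 0x7F if b & 0x80 else b & 0x7F   # |b| on 7 bits
    acc = 0
    for i in range(7):                                  # shift-and-add over |b|
        if (mag_b >> i) & 1:
            acc += mag_a << i                           # partial product
    return (~acc + 1) & 0xFFFF if sign else acc         # re-apply the sign
```

For example, mul8_signed(-3, 5) returns 0xFFF1, the 16-bit two's-complement encoding of -15.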
(2) As shown in Fig. 4, the 16-bit multiplier comprises a fourth gating unit 21, a third negation unit 22, 8-bit multipliers 23, a second shift unit 24, a second summing unit 25, a fifth gating unit 26, a fourth negation unit 27 and a sixth gating unit 28 connected in sequence, wherein
the fourth gating unit 21 receives the data of the weight conversion module 7 and the activation-value matrix conversion module 8 as well as the sign-control signal of the weight conversion module 7;
the third negation unit 22 receives the data of the fourth gating unit 21 and negates the received data;
the 8-bit multipliers 23 perform operations with 8-bit data width, realizing 8*8-bit fixed-point multiplication;
the second shift unit 24 receives the data of the 8-bit multipliers 23 and shifts the received data;
the second summing unit 25 receives the data of the second shift unit 24 and accumulates the received data;
the fifth gating unit 26 receives the data of the second summing unit 25 and the sign-bit information of the fourth gating unit 21 and passes them to the fourth negation unit 27;
the fourth negation unit 27 receives the data of the fifth gating unit 26 and negates the received data;
the sixth gating unit 28 receives the data of the fourth negation unit 27 and outputs the result.
The 16-bit multiplier is realized with four 8-bit multiplier devices, whose gating signal is 0, i.e. they operate as unsigned multipliers. First, each of the two 16-bit operands is examined according to its sign bit: a positive operand is kept unchanged, a negative operand is inverted and incremented by 1. Next, the resulting 16-bit numbers are each split into a high byte and a low byte, and the corresponding bytes are multiplied. The product of the two high bytes is shifted left by 16 bits; the product of the high byte of operand D and the low byte of operand E is added to the product of the low byte of D and the high byte of E, and the sum is shifted left by 8 bits; the shifted results are added to the product of the two low bytes, giving the product L. Finally, the sign of the result decides whether to negate: if the result sign bit is 1, L is inverted and incremented by 1; if it is 0, it is kept unchanged. The sign bit is then placed in the most significant position of the product L to give the final output.
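A behavioural model of the 16-bit multiply built from four unsigned 8*8 products, following the split/shift/sum/sign steps described above (a sketch, not the hardware circuit):

```python
def mul16_signed(d, e):
    """Behavioural model of the 16-bit multiplier built from four unsigned
    8x8 products: |d| and |e| are split into high/low bytes, the partial
    products are shifted and summed, and the sign is re-applied."""
    sign = (d < 0) ^ (e < 0)
    mag_d, mag_e = abs(d) & 0xFFFF, abs(e) & 0xFFFF
    dh, dl = mag_d >> 8, mag_d & 0xFF                # high / low byte of |d|
    eh, el = mag_e >> 8, mag_e & 0xFF                # high / low byte of |e|
    # four unsigned 8x8 partial products (the gating signal selects unsigned mode)
    acc = ((dh * eh) << 16) + ((dh * el + dl * eh) << 8) + dl * el
    return -acc if sign else acc                     # re-apply the sign
```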
10) the result-matrix conversion module 10 receives the operand data transmitted by the dot-product module 9 and converts the dot-product result matrix from the Winograd domain back to the time domain, obtaining the converted time-domain dot-product result matrix F.
The result-matrix conversion module 10 performs the transformation F = A^T M A of the Winograd-domain dot-product result matrix M through row/column vector shift, addition and subtraction operations, where M is the Winograd-domain dot-product result matrix, A is the transform auxiliary matrix of M, and F is the time-domain dot-product result matrix.
Concrete operations: the first, second and third rows of the Winograd-domain dot-product result matrix M are added to form the first row of the provisional matrix C3, where C3 = A^T M; the second, third and fourth rows of M are added to form the second row of C3. The first, second and third columns of C3 are added to form the first column of the converted time-domain dot-product result matrix F; the second, third and fourth columns of C3 are added to form the second column of F, finally yielding the time-domain dot-product result matrix F.
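A software model of the inverse transform F = A^T M A is sketched below; the signs follow the standard A^T matrix for F(2*2, 3*3) from the Winograd literature, which is an assumption since the patent's matrix figure is not reproduced here.

```python
def output_transform(M):
    """Model of the inverse transform F = A^T M A for F(2x2,3x3).
    Signs follow the standard A^T = [[1,1,1,0],[0,1,-1,-1]] (assumed)."""
    def a_combine(X):                              # apply A^T to the rows of X
        r1, r2, r3, r4 = X
        return [[a + b + c for a, b, c in zip(r1, r2, r3)],   # r1 + r2 + r3
                [b - c - d for b, c, d in zip(r2, r3, r4)]]   # r2 - r3 - r4
    C3 = a_combine(M)                              # C3 = A^T M, a 2x4 matrix
    C3_cols = list(zip(*C3))                       # columns of C3
    return [list(row) for row in zip(*a_combine(C3_cols))]   # F = C3 A, 2x2
```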
11) the accumulation module 11 receives the operand data transmitted by the result-matrix conversion module 10 and accumulates the received data, obtaining the final convolution result, a 2*2 result matrix;
12) the pooling module 12 receives the operand data transmitted by the accumulation module 11 and pools the final convolution result matrix. Different pooling methods can be applied to the input neurons, including taking the maximum, the average or the minimum. Since the result matrix finally output by the Winograd convolution F(2*2, 3*3) is of size 2*2, a 2*2 pooling operation can be performed directly, and the pooling result is obtained with three comparisons: the first comparison is between the two numbers in the first row of the result matrix, the second is between the two numbers in the second row, and the third compares the results of the previous two comparisons, giving the maximum-pooling result of the result matrix.
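The three-comparison 2*2 max pooling can be written directly as:

```python
def max_pool_2x2(F):
    """2x2 max pooling of the 2x2 output tile with three comparisons."""
    top = F[0][0] if F[0][0] > F[0][1] else F[0][1]       # compare first row
    bottom = F[1][0] if F[1][0] > F[1][1] else F[1][1]    # compare second row
    return top if top > bottom else bottom                # compare the two maxima
```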
13) the activation module 13 receives the operand data transmitted by the pooling module 12, applies the ReLU activation function to the pooling result, and transfers the activated result to the output buffer module 3.

Claims (9)

1. A Winograd-based configurable convolutional array accelerator architecture, characterized by comprising: an activation-value cache module (1), a weight cache module (2), an output buffer module (3), a controller (4), a weight preprocessing module (5), an activation-value preprocessing module (6), a weight conversion module (7), an activation-value matrix conversion module (8), a dot-product module (9), a result-matrix conversion module (10), an accumulation module (11), a pooling module (12) and an activation module (13), wherein
the activation-value cache module (1) is connected to the controller (4), stores input pixel values or input feature-map values, and provides activation-value data to the activation-value preprocessing module (6);
the weight cache module (2) is connected to the controller (4), stores trained weights, and provides weight data to the weight preprocessing module (5);
the output buffer module (3) is connected to the controller (4) and stores the result of one convolutional layer; when the activation module (13) finishes outputting data, the data are passed to the output buffer module (3) for use by the next convolutional layer;
the controller (4) controls the transmission of the activation-value data, weight data and convolutional-layer data to be processed according to the computation flow;
the weight preprocessing module (5) receives the operand data transmitted by the weight cache module (2) and divides the convolution kernel to obtain the time-domain weight matrix K;
the activation-value preprocessing module (6) receives the operand data transmitted by the activation-value cache module (1), fetches activation values from the activation-value cache module (1) and divides them to obtain the time-domain activation-value matrix I;
the weight conversion module (7) receives the operand data transmitted by the weight preprocessing module (5) and converts the weight data from the time domain to the Winograd domain, obtaining the Winograd-domain weight matrix U;
the activation-value matrix conversion module (8) receives the operand data transmitted by the activation-value preprocessing module (6) and converts the activation values from the time domain to the Winograd domain, obtaining the Winograd-domain activation-value matrix V;
the dot-product module (9) receives the operand data transmitted by the weight conversion module (7) and the activation-value matrix conversion module (8) respectively, and performs the dot-product operation between the Winograd-domain activation-value matrix and the Winograd-domain weight matrix, obtaining the Winograd-domain dot-product result matrix M;
the result-matrix conversion module (10) receives the operand data transmitted by the dot-product module (9) and converts the dot-product result matrix from the Winograd domain back to the time domain, obtaining the converted time-domain dot-product result matrix F;
the accumulation module (11) receives the operand data transmitted by the result-matrix conversion module (10) and accumulates the received data, obtaining the final convolution result;
the pooling module (12) receives the operand data transmitted by the accumulation module (11) and pools the final convolution result matrix;
the activation module (13) receives the operand data transmitted by the pooling module (12), applies the ReLU activation function to the pooling result, and transfers the activated result to the output buffer module (3).
2. The Winograd-based configurable convolutional array accelerator architecture according to claim 1, characterized in that the weight preprocessing module (5):
(1) zero-pads a 5*5 convolution kernel, extending it into a 6*6 convolution matrix;
(2) divides the 6*6 convolution matrix into four 3*3 convolution kernels;
the specific division is as follows, where K_input denotes a 5*5 weight matrix and below it are the four time-domain weight matrices K1, K2, K3, K4 to be processed after division; in the calculation U = G K G^T, K takes the values K1, K2, K3, K4 in turn:
3. The Winograd-based configurable convolutional array accelerator architecture according to claim 1, characterized in that the activation-value preprocessing module (6) divides a 6*6 activation-value matrix into four overlapping 4*4 matrices; the division is as follows, where I_input denotes a 6*6 activation-value matrix and below it are the four 4*4 time-domain activation-value matrices I1, I2, I3, I4 to be processed after division; in the calculation V = B^T I B, I takes the values I1, I2, I3, I4 in turn:
4. The Winograd-based configurable convolutional array accelerator architecture according to claim 1, characterized in that the weight conversion module (7) replaces the matrix multiplications in the calculation with row/column vector additions and shifts, thereby performing the weight-matrix transformation of the Winograd convolution and obtaining the Winograd-domain weight matrix U = G K G^T, where K denotes the time-domain weight matrix, G is the weight transform auxiliary matrix and U is the Winograd-domain weight matrix;
concrete operations: the first row vector of the weight matrix K is taken as the first row of the provisional matrix C2, where C2 = GK; the division by two is realized by a right shift, a positive weight being shifted right with a 0 filled in on the left and a negative weight being shifted right with a 1 filled in on the left; the elements of the first, second and third rows of K are added and the result is shifted right by one bit to form the second row of C2; the elements of the first, second and third rows of K are added and the result is shifted right by one bit to form the third row of C2; the third row vector of K forms the fourth row of C2; the first column vector of C2 forms the first column of the Winograd-domain weight matrix U; the first, second and third columns of C2 are added and shifted right by one bit to form the second column of U; the first, second and third columns of C2 are added and shifted right by one bit to form the third column of U; the third column vector of C2 forms the fourth column of U, finally yielding the Winograd-domain weight matrix U.
5. The Winograd-based configurable convolutional array accelerator architecture according to claim 1, characterized in that the activation-value matrix conversion module (8) replaces the matrix multiplications in the calculation with row/column vector additions and subtractions, thereby performing the transformation of the time-domain activation-value matrix in the Winograd convolution and obtaining the matrix V = B^T I B, where I is the time-domain activation-value matrix, B is the activation-value transform auxiliary matrix and V is the Winograd-domain activation-value matrix;
concrete operations: the first row of the time-domain activation-value matrix I minus its third row forms the first row of the provisional matrix C1, where C1 = B^T I; the second row of I plus its third row forms the second row of C1; the third row of I minus its second row forms the third row of C1; the second row of I minus its fourth row forms the fourth row of C1; the first column of C1 minus its third column forms the first column of the Winograd-domain activation-value matrix V; the second column of C1 plus its third column forms the second column of V; the third column of C1 minus its second column forms the third column of V; the second column of C1 minus its fourth column forms the fourth column of V, finally yielding the Winograd-domain activation-value matrix V.
6. The Winograd-based configurable convolutional array accelerator architecture according to claim 1, characterized in that the dot-product module (9) performs the dot-product operation between the Winograd-domain weight matrix U and the Winograd-domain activation-value matrix V, obtaining the Winograd-domain dot-product result matrix M, expressed by the formula M = U ⊙ V, where U is the Winograd-domain weight matrix and V is the Winograd-domain activation-value matrix; to make the data bit width of the dot product configurable, the dot-product module (9) has two operating modes, an 8-bit multiplier mode and a 16-bit multiplier mode, which respectively perform operations with 8-bit and 16-bit data widths, realizing 8*8-bit and 16*16-bit fixed-point multiplication.
7. The Winograd-based configurable convolutional array accelerator architecture according to claim 6, characterized in that the 8-bit multiplier comprises a first gating unit (14), a first negation unit (15), a first shift unit (16), a first summing unit (17), a second gating unit (18), a second negation unit (19) and a third gating unit (20) connected in sequence, wherein
the first gating unit (14) receives the data of the weight conversion module (7) and the activation-value matrix conversion module (8) as well as the sign-control signal of the weight conversion module (7);
the first negation unit (15) receives the data of the first gating unit (14) and negates the received data;
the first shift unit (16) receives the data of the first negation unit (15) as well as the sign-bit information of the first gating unit (14), and shifts the received data according to the sign information;
the first summing unit (17) receives the data of the first shift unit (16) and accumulates the received data;
the second gating unit (18) receives the data of the first summing unit (17) and the sign-bit information of the first gating unit (14) and passes them to the second negation unit (19);
the second negation unit (19) receives the data of the second gating unit (18) and negates the received data;
the third gating unit (20) receives the data of the second negation unit (19) and of the first summing unit (17) respectively, and outputs the result.
8. The Winograd-based configurable convolutional array accelerator architecture according to claim 6, characterized in that the 16-bit multiplier comprises a fourth gating unit (21), a third negation unit (22), 8-bit multipliers (23), a second shift unit (24), a second summing unit (25), a fifth gating unit (26), a fourth negation unit (27) and a sixth gating unit (28) connected in sequence, wherein
the fourth gating unit (21) receives the data of the weight conversion module (7) and the activation-value matrix conversion module (8) as well as the sign-control signal of the weight conversion module (7);
the third negation unit (22) receives the data of the fourth gating unit (21) and negates the received data;
the 8-bit multipliers (23) perform operations with 8-bit data width, realizing 8*8-bit fixed-point multiplication;
the second shift unit (24) receives the data of the 8-bit multipliers (23) and shifts the received data;
the second summing unit (25) receives the data of the second shift unit (24) and accumulates the received data;
the fifth gating unit (26) receives the data of the second summing unit (25) and the sign-bit information of the fourth gating unit (21) and passes them to the fourth negation unit (27);
the fourth negation unit (27) receives the data of the fifth gating unit (26) and negates the received data;
the sixth gating unit (28) receives the data of the fourth negation unit (27) and outputs the result.
9. The Winograd-based configurable convolutional array accelerator architecture according to claim 1, characterized in that the result-matrix conversion module (10) performs the transformation F = A^T M A of the Winograd-domain dot-product result matrix M through row/column vector shift, addition and subtraction operations, where M is the Winograd-domain dot-product result matrix, A is the transform auxiliary matrix of M, and F is the time-domain dot-product result matrix;
concrete operations: the first, second and third rows of the Winograd-domain dot-product result matrix M are added to form the first row of the provisional matrix C3, where C3 = A^T M; the second, third and fourth rows of M are added to form the second row of C3; the first, second and third columns of C3 are added to form the first column of the converted time-domain dot-product result matrix F; the second, third and fourth columns of C3 are added to form the second column of F, finally yielding the converted time-domain dot-product result matrix F.
CN201910511987.6A 2019-06-13 2019-06-13 Winograd-based configurable convolution array accelerator structure Active CN110288086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910511987.6A CN110288086B (en) 2019-06-13 2019-06-13 Winograd-based configurable convolution array accelerator structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910511987.6A CN110288086B (en) 2019-06-13 2019-06-13 Winograd-based configurable convolution array accelerator structure

Publications (2)

Publication Number Publication Date
CN110288086A true CN110288086A (en) 2019-09-27
CN110288086B CN110288086B (en) 2023-07-21

Family

ID=68004097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910511987.6A Active CN110288086B (en) 2019-06-13 2019-06-13 Winograd-based configurable convolution array accelerator structure

Country Status (1)

Country Link
CN (1) CN110288086B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325332A (en) * 2020-02-18 2020-06-23 百度在线网络技术(北京)有限公司 Convolutional neural network processing method and device
CN112580793A (en) * 2020-12-24 2021-03-30 清华大学 Neural network accelerator based on time domain memory computing and acceleration method
CN112639839A (en) * 2020-05-22 2021-04-09 深圳市大疆创新科技有限公司 Arithmetic device of neural network and control method thereof
CN112734827A (en) * 2021-01-07 2021-04-30 京东鲲鹏(江苏)科技有限公司 Target detection method and device, electronic equipment and storage medium
WO2021083097A1 (en) * 2019-11-01 2021-05-06 中科寒武纪科技股份有限公司 Data processing method and apparatus, and computer device and storage medium
WO2021082747A1 (en) * 2019-11-01 2021-05-06 中科寒武纪科技股份有限公司 Operational apparatus and related product
CN112862091A (en) * 2021-01-26 2021-05-28 合肥工业大学 Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
CN112949845A (en) * 2021-03-08 2021-06-11 内蒙古大学 Deep convolutional neural network accelerator based on FPGA
CN113269302A (en) * 2021-05-11 2021-08-17 中山大学 Winograd processing method and system for 2D and 3D convolutional neural networks
CN113283591A (en) * 2021-07-22 2021-08-20 南京大学 Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
CN113407904A (en) * 2021-06-09 2021-09-17 中山大学 Winograd processing method, system and medium compatible with multi-dimensional convolutional neural network
CN113554163A (en) * 2021-07-27 2021-10-26 深圳思谋信息科技有限公司 Convolutional neural network accelerator
CN113656751A (en) * 2021-08-10 2021-11-16 上海新氦类脑智能科技有限公司 Method, device, equipment and medium for realizing signed operation of unsigned DAC (digital-to-analog converter)
CN114399036A (en) * 2022-01-12 2022-04-26 电子科技大学 Efficient convolution calculation unit based on one-dimensional Winograd algorithm

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793199A (en) * 2014-01-24 2014-05-14 天津大学 Rapid RSA cryptography coprocessor capable of supporting dual domains
US20160342893A1 (en) * 2015-05-21 2016-11-24 Google Inc. Rotating data for neural network computations
CN107862374A (en) * 2017-10-30 2018-03-30 中国科学院计算技术研究所 Processing with Neural Network system and processing method based on streamline
US20180157969A1 (en) * 2016-12-05 2018-06-07 Beijing Deephi Technology Co., Ltd. Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN109190755A (en) * 2018-09-07 2019-01-11 中国科学院计算技术研究所 Matrix conversion device and method towards neural network
CN109190756A (en) * 2018-09-10 2019-01-11 中国科学院计算技术研究所 Arithmetic unit based on Winograd convolution and the neural network processor comprising the device
CN109325591A (en) * 2018-09-26 2019-02-12 中国科学院计算技术研究所 Neural network processor towards Winograd convolution
CN109359730A (en) * 2018-09-26 2019-02-19 中国科学院计算技术研究所 Neural network processor towards fixed output normal form Winograd convolution
CN109447241A (en) * 2018-09-29 2019-03-08 西安交通大学 A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793199A (en) * 2014-01-24 2014-05-14 天津大学 Rapid RSA cryptography coprocessor capable of supporting dual domains
US20160342893A1 (en) * 2015-05-21 2016-11-24 Google Inc. Rotating data for neural network computations
US20180157969A1 (en) * 2016-12-05 2018-06-07 Beijing Deephi Technology Co., Ltd. Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network
CN107862374A (en) * 2017-10-30 2018-03-30 中国科学院计算技术研究所 Processing with Neural Network system and processing method based on streamline
CN109190755A (en) * 2018-09-07 2019-01-11 中国科学院计算技术研究所 Matrix conversion device and method towards neural network
CN109190756A (en) * 2018-09-10 2019-01-11 中国科学院计算技术研究所 Arithmetic unit based on Winograd convolution and the neural network processor comprising the device
CN109325591A (en) * 2018-09-26 2019-02-12 中国科学院计算技术研究所 Neural network processor towards Winograd convolution
CN109359730A (en) * 2018-09-26 2019-02-19 中国科学院计算技术研究所 Neural network processor towards fixed output normal form Winograd convolution
CN109447241A (en) * 2018-09-29 2019-03-08 西安交通大学 A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDREW LAVIN et al.: "Fast Algorithms for Convolutional Neural Networks", 2016 IEEE Conference on Computer Vision and Pattern Recognition *
LINGCHUAN MENG et al.: "Efficient Winograd Convolution via Integer Arithmetic", arXiv *
LIQIANG LU et al.: "SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs", ACM *
Y HUANG et al.: "A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm", Journal of Physics: Conference Series *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765538A (en) * 2019-11-01 2021-05-07 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112765538B (en) * 2019-11-01 2024-03-29 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
WO2021083097A1 (en) * 2019-11-01 2021-05-06 中科寒武纪科技股份有限公司 Data processing method and apparatus, and computer device and storage medium
WO2021082747A1 (en) * 2019-11-01 2021-05-06 中科寒武纪科技股份有限公司 Operational apparatus and related product
CN111325332B (en) * 2020-02-18 2023-09-08 百度在线网络技术(北京)有限公司 Convolutional neural network processing method and device
CN111325332A (en) * 2020-02-18 2020-06-23 百度在线网络技术(北京)有限公司 Convolutional neural network processing method and device
WO2021232422A1 (en) * 2020-05-22 2021-11-25 深圳市大疆创新科技有限公司 Neural network arithmetic device and control method thereof
CN112639839A (en) * 2020-05-22 2021-04-09 深圳市大疆创新科技有限公司 Arithmetic device of neural network and control method thereof
CN112580793B (en) * 2020-12-24 2022-08-12 清华大学 Neural network accelerator based on time domain memory computing and acceleration method
CN112580793A (en) * 2020-12-24 2021-03-30 清华大学 Neural network accelerator based on time domain memory computing and acceleration method
CN112734827A (en) * 2021-01-07 2021-04-30 京东鲲鹏(江苏)科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112862091A (en) * 2021-01-26 2021-05-28 合肥工业大学 Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
CN112949845A (en) * 2021-03-08 2021-06-11 内蒙古大学 Deep convolutional neural network accelerator based on FPGA
CN113269302A (en) * 2021-05-11 2021-08-17 中山大学 Winograd processing method and system for 2D and 3D convolutional neural networks
CN113407904A (en) * 2021-06-09 2021-09-17 中山大学 Winograd processing method, system and medium compatible with multi-dimensional convolutional neural network
CN113283591B (en) * 2021-07-22 2021-11-16 南京大学 Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
CN113283591A (en) * 2021-07-22 2021-08-20 南京大学 Efficient convolution implementation method and device based on Winograd algorithm and approximate multiplier
CN113554163A (en) * 2021-07-27 2021-10-26 深圳思谋信息科技有限公司 Convolutional neural network accelerator
CN113554163B (en) * 2021-07-27 2024-03-29 深圳思谋信息科技有限公司 Convolutional neural network accelerator
CN113656751A (en) * 2021-08-10 2021-11-16 上海新氦类脑智能科技有限公司 Method, device, equipment and medium for realizing signed operation of unsigned DAC (digital-to-analog converter)
CN113656751B (en) * 2021-08-10 2024-02-27 上海新氦类脑智能科技有限公司 Method, apparatus, device and medium for realizing signed operation by unsigned DAC
CN114399036A (en) * 2022-01-12 2022-04-26 电子科技大学 Efficient convolution calculation unit based on one-dimensional Winograd algorithm
CN114399036B (en) * 2022-01-12 2023-08-22 电子科技大学 Efficient convolution calculation unit based on one-dimensional Winograd algorithm

Also Published As

Publication number Publication date
CN110288086B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN110288086A (en) A kind of configurable convolution array accelerator structure based on Winograd
CN105681628B (en) A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing
CN108108809B (en) Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
CN109478144B (en) Data processing device and method
CN109598338A (en) A kind of convolutional neural networks accelerator of the calculation optimization based on FPGA
CN108665059A (en) Convolutional neural networks acceleration system based on field programmable gate array
CN105512723B (en) A kind of artificial neural networks apparatus and method for partially connected
CN108665063B (en) Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN107862374A (en) Processing with Neural Network system and processing method based on streamline
CN108256636A (en) A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing
CN110222760B (en) Quick image processing method based on winograd algorithm
CN106127302A (en) Process the circuit of data, image processing system, the method and apparatus of process data
CN104915322A (en) Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
CN110390383A (en) A kind of deep neural network hardware accelerator based on power exponent quantization
CN107203808B (en) A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor
CN108537330A (en) Convolutional calculation device and method applied to neural network
CN103020890A (en) Visual processing device based on multi-layer parallel processing
CN111626403B (en) Convolutional neural network accelerator based on CPU-FPGA memory sharing
CN108009126A (en) A kind of computational methods and Related product
CN117933314A (en) Processing device, processing method, chip and electronic device
CN110991630A (en) Convolutional neural network processor for edge calculation
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN109885406B (en) Operator calculation optimization method, device, equipment and storage medium
CN110580519B (en) Convolution operation device and method thereof
CN108334944A (en) A kind of device and method of artificial neural network operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant