CN107578098A - Neural network processor based on systolic arrays - Google Patents

Neural network processor based on systolic arrays

Info

Publication number
CN107578098A
Authority
CN
China
Prior art keywords
data
weight
array
processing unit
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710777741.4A
Other languages
Chinese (zh)
Other versions
CN107578098B (en)
Inventor
韩银和 (Han Yinhe)
许浩博 (Xu Haobo)
王颖 (Wang Ying)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201710777741.4A priority Critical patent/CN107578098B/en
Publication of CN107578098A publication Critical patent/CN107578098A/en
Application granted granted Critical
Publication of CN107578098B publication Critical patent/CN107578098B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

The present invention provides a neural network processor comprising a control unit, a computing unit, a data storage unit and a weight storage unit. Under the control of the control unit, the computing unit obtains data and weights from the data storage unit and the weight storage unit respectively and performs the neural-network-related operations. The computing unit comprises an array controller and multiple processing units connected in the manner of a systolic array: the array controller feeds data and weights from different directions into the systolic array formed by the processing units, and all processing units process the data flowing through them simultaneously and in parallel, so the neural network processor can reach a very high processing speed. At the same time the input data are reused many times, so a high computational throughput can be achieved while consuming less memory bandwidth.

Description

Neural network processor based on systolic arrays
Technical field
The present invention relates to neural network technology, and more particularly to neural network processor architectures.
Background art
Deep learning has achieved important breakthroughs in recent years, and neural network models trained with deep learning algorithms have attained remarkable results in application fields such as image recognition, speech processing and intelligent robotics. A deep neural network simulates the neural connection structure of the human brain by building a model that, when processing signals such as images, sound and text, describes data features hierarchically through multiple transformation stages. With the continuous increase in the complexity of neural networks, neural network techniques suffer in practical applications from problems such as high resource consumption, slow computation speed and high energy consumption. Using hardware accelerators in place of traditional software computation has become an effective way to improve the efficiency of neural network computing, for example neural network processors implemented with graphics processing units, application-specific processor chips and field-programmable gate arrays (FPGAs).
However, a neural network processor is both computation-intensive and memory-access-intensive. On the one hand, a neural network model contains a large number of multiply-add operations and other nonlinear operations, requiring the neural network processor to run at high utilization to satisfy the computational demands of the model; on the other hand, neural network computation involves a large number of parameter iterations, so the computing units must access memory frequently, which significantly increases the bandwidth requirements of the processor design and also increases memory-access power consumption.
Therefore, existing neural network processors need to be improved so as to increase their computational efficiency while reducing hardware overhead.
Summary of the invention
The object of the present invention is therefore to overcome the above defects of the prior art and to provide a neural network processor based on systolic arrays.
This object of the present invention is achieved through the following technical solutions:
According to one embodiment of the present invention, a neural network processor is provided, comprising a control unit, a computing unit, a data storage unit and a weight storage unit, the computing unit obtaining, under the control of the control unit, data and weights from the data storage unit and the weight storage unit respectively to perform the neural-network-related operations,
wherein the computing unit comprises an array controller and multiple processing units connected in the manner of a systolic array; the array controller loads the weights and data into the processing unit array from different directions, and each processing unit performs operations on the data and weight it receives and passes the data and weight along their respective directions to the next processing unit.
In the above technical solution, the processing unit array may be a one-dimensional systolic array or a two-dimensional systolic array.
In the above technical solution, the processing unit may comprise a data register, a weight register, a multiplier and an accumulator;
wherein the weight register receives a weight from the previous processing unit in the column direction of the processing unit array, sends it to the multiplier and passes it on to the next processing unit in that direction;
the data register receives a datum from the previous processing unit in the row direction of the processing unit array, sends it to the multiplier and passes it on to the next processing unit in that direction;
the multiplier multiplies the input datum and weight, and its output enters the accumulator, where it is either accumulated with the data held in the accumulator or added to the partial-sum input signal, the result being output as a partial sum.
In the above technical solution, the array controller may load the data from the row direction of the processing unit array and load the weights from the column direction of the processing unit array.
In the above technical solution, the control unit may load the data sequences participating in the operation from the storage unit as row vectors, and load the weight sequences corresponding to the data sequences as column vectors.
In the above technical solution, the array controller may load the data sequences and weight sequences into the corresponding rows and columns of the processing unit array in order of increasing row number and column number respectively, adjacent rows and adjacent columns entering the array 1 clock cycle apart in time, ensuring that each weight and the datum it is to be calculated with enter the processing unit array in the same clock cycle.
Compared with the prior art, the advantages of the invention are:
using a systolic array structure in the computing unit of the neural network processor improves the operational efficiency of the neural network processor and relieves the bandwidth demands of the processor design.
Brief description of the drawings
Embodiments of the present invention are further illustrated below with reference to the drawings, in which:
Fig. 1 shows a schematic diagram of a common neural network topology;
Fig. 2 shows a schematic block diagram of a neural network convolution operation;
Fig. 3 shows a schematic block diagram of the structure of a neural network processor according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the structure of the computing unit of a neural network processor according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the structure of the computing unit of a neural network processor according to another embodiment of the present invention;
Fig. 6 shows a schematic diagram of the structure of a processing unit in the systolic array structure according to an embodiment of the present invention;
Fig. 7 shows a schematic diagram of the calculation process of a computing unit according to an embodiment of the present invention;
Fig. 8 shows a schematic flow diagram of the execution of a neural network processor according to an embodiment of the present invention.
Detailed description of the embodiments
In order to make the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
A neural network is a mathematical model formed by modelling the structure and behaviour of the human brain. It is generally divided into an input layer, hidden layers, an output layer and similar structures, each layer consisting of multiple neuron nodes; the output values of the neuron nodes of one layer are passed as inputs to the neuron nodes of the next layer, layer after layer. The neural network itself has bionic characteristics, and its process of multi-layer abstraction and iteration processes information in a manner similar to the human brain and other perceptual organs.
Fig. 1 shows a schematic diagram of a common neural network topology. The first-layer input values of the multi-layer neural network structure are the original image ("original image" in the present invention refers to the raw data to be processed, not merely an image in the narrow sense obtained by taking a photograph). Typically, for each layer of the neural network, the node values of the next layer can be computed from the neuron node values of the current layer (also referred to herein as data) and their corresponding weight values. For example, suppose x1, x2, ..., xn are several neuron nodes of a certain layer of the neural network, all connected to a node y of the next layer, and w1, w2, ..., wn are the weights of the corresponding connections; then the value of y is defined as y = x1·w1 + x2·w2 + ... + xn·wn. Each layer of the neural network therefore involves a large number of convolution operations built from such multiply-add operations. The convolution operation process in a neural network is generally as shown in Fig. 2: a two-dimensional weight convolution kernel of size K×K scans a feature map; at each position the weights form an inner product with the corresponding feature elements of the feature map, and the inner product values are summed to obtain one output-layer feature element. When a convolutional layer has N feature map layers, N convolution kernels of size K×K perform convolution operations with the feature maps of that layer, and the N inner product values are summed to obtain one output-layer feature element. With the continuous increase of neural network complexity, such computation undoubtedly consumes a large amount of resources; a dedicated neural network processor is therefore commonly used to realize neural network computation.
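To make these two operations concrete (the patent itself contains no code, so the function names and NumPy formulation below are our own minimal sketch), the per-node inner product and the K×K kernel scan can be written as:

```python
import numpy as np

def neuron_output(x, w):
    """Next-layer node value y = sum_i x_i * w_i: the inner product of
    this layer's node values x with the connection weights w."""
    return float(np.dot(x, w))

def conv2d_single(feature_map, kernel):
    """Scan a KxK weight kernel over a feature map (stride 1, no padding);
    each output element is the inner product of the kernel with the KxK
    patch it currently covers, summed as described in the text."""
    H, W = feature_map.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            out[i, j] = np.sum(feature_map[i:i + K, j:j + K] * kernel)
    return out
```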
A common neural network processor is based on a storage-control-computation structure. The storage structure stores the data participating in the computation, the neural network weights and the operational instructions of the processor; the control structure parses the operational instructions and generates control signals to control the scheduling and storage of data within the processor and the computation process of the neural network; the computation structure is responsible for the neural network computation operations. The storage unit may store data transmitted from outside the neural network processor (for example, original feature map data), the trained neural network weights, results or intermediate results produced during computation, instruction information participating in the computation, and so on.
Fig. 3 shows a schematic diagram of the structure of a neural network processor according to an embodiment of the present invention. As shown in Fig. 3, the storage unit is further subdivided into an input data storage unit 311, a weight storage unit 312, an instruction storage unit 313 and an output data storage unit 314. The input data storage unit 311 stores the data participating in the computation, including original feature map data and the data participating in intermediate-layer computation; the weight storage unit 312 stores the trained neural network weights; the instruction storage unit 313 stores the instruction information participating in the computation, the instructions being parsed by the control unit 320 into a control flow that schedules the neural network computation; the output data storage unit 314 stores the computed neuron responses. By subdividing the storage unit in this way, data of essentially the same type can be stored centrally, so that a suitable storage medium can be selected and operations such as data addressing can be simplified. It should be understood that the input data storage unit 311 and the output data storage unit 314 may also be the same storage unit.
The control unit 320 is responsible for instruction decoding, data scheduling, process control and similar work, for example obtaining the instruction stored in the instruction storage unit and parsing it, then scheduling data according to the control signals obtained from parsing and controlling the computing unit to perform the relevant neural network operations. In an embodiment of the present invention, the feature map data participating in the neural network computation are divided into different regions, each region being treated as a matrix, so that the operations between data and weights are decomposed into multiple matrix operations (for example as shown in Fig. 2). In this way, the control unit loads the weight sequences and data sequences participating in the computation from the storage unit in the form of the row vectors or column vectors suited to matrix operations.
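The patent does not spell out how the regions are recast as matrices; one common technique consistent with this description, offered here only as an assumption rather than the inventors' exact scheme, is an im2col-style flattening, in which every K×K patch becomes one row of a matrix so that the whole convolution reduces to a single matrix product:

```python
import numpy as np

def im2col(feature_map, K):
    """Flatten every KxK patch of the feature map into one row, so that
    the convolution becomes a matrix product with the flattened kernel."""
    H, W = feature_map.shape
    rows = []
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            rows.append(feature_map[i:i + K, j:j + K].ravel())
    return np.array(rows)            # shape: (number of patches, K*K)

fm = np.arange(16, dtype=float).reshape(4, 4)
kern = np.ones((3, 3))
cols = im2col(fm, 3)                 # shape (4, 9)
out = cols @ kern.ravel()            # one inner product per patch;
                                     # matches conv2d_single(fm, kern).ravel()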
A neural network processor may contain one or more computing units (for example, computing units 330, 331, etc.). Each computing unit can perform the corresponding neural network computation according to control signals from the control unit 320, obtaining data from the storage units, performing the calculation and writing the calculation results back to the storage unit. The computing units may use the same structure or different structures, and may perform the same calculation or different calculations. The computing unit provided in one embodiment of the present invention comprises an array controller and multiple processing units organized in the form of a systolic array, each processing unit having the same internal structure. The array controller is responsible for loading data into the systolic array, and each processing unit is responsible for computing on the data: weights are input from the top of the systolic array and propagate from top to bottom, data are input from the left side of the systolic array and propagate from left to right; each processing unit operates on the data and weight it receives, and results are output from the right side of the systolic array. The systolic array may have a one-dimensional or a two-dimensional structure. It should be understood that the neural network processor may also include computing units that compute in other ways, and the control unit may select different computing units to process the data according to actual requirements.
Fig. 4 shows a schematic diagram of the structure of the computing unit in a neural network processor according to an embodiment of the present invention. As shown in Fig. 4, the systolic array here has a one-dimensional structure, the processing units being connected in series. For a weight sequence and data sequence awaiting computation, the array controller loads each weight of the weight sequence into a different processing unit, where it remains until the last element of the corresponding data sequence has completed its calculation with that weight, after which the next group of weights is loaded. At the same time, the data of the data sequence are loaded one by one into the systolic array from the left side, and the processed data are passed back to the array controller from the opposite side of the systolic array. In such a computing unit structure, the first datum first enters the first processing unit and, after being processed, is passed to the next processing unit while the second datum enters the first processing unit. Continuing in this way, by the time the first datum reaches the last processing unit it has been processed many times. This systolic architecture thus reuses the input data many times, so a higher computational throughput can be achieved while consuming less memory bandwidth.
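How the per-unit products are then combined is determined by the partial-sum path of the processing unit (Fig. 6, described below); the sketch here (our own timing model, with the combination network deliberately left abstract) simulates only the flow itself, to make the reuse claim concrete: each datum is fetched once but is multiplied by every resident weight as it travels down the chain.

```python
def systolic_1d(weights, data):
    """Logical simulation of the one-dimensional computing unit of Fig. 4:
    each PE holds one resident weight; data enter from the left, one new
    element per cycle, and shift one PE to the right each cycle.  Every PE
    multiplies the datum currently passing through it by its resident
    weight; the products are gathered into a table rather than routed."""
    n_pe = len(weights)
    pipeline = [None] * n_pe            # datum currently held by each PE
    products = {}                       # (PE index, data index) -> product
    stream = list(enumerate(data))      # tag each datum with its index
    for cycle in range(len(data) + n_pe - 1):
        # a new datum enters PE 0 from the left; everything shifts right
        pipeline = [stream.pop(0) if stream else None] + pipeline[:-1]
        for k, slot in enumerate(pipeline):
            if slot is not None:
                j, x = slot
                products[(k, j)] = weights[k] * x
    return products

prods = systolic_1d([1, 2, 3], [10, 20, 30, 40])
# each of the 4 data, read from memory once, met all 3 resident weights
assert prods[(2, 0)] == 3 * 10 and len(prods) == 3 * 4
```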
Fig. 5 shows a schematic diagram of the structure of the computing unit in a neural network processor according to another embodiment of the present invention. In this embodiment, the multiple processing units in the computing unit are organized in a two-dimensional array comprising row arrays and column arrays, and each processing unit is connected only to its adjacent processing units, i.e. a processing unit communicates only with its neighbours. The array controller is responsible for scheduling the data: it controls the input of the relevant data into the processing units from the top and the left side of the computing unit's systolic array, different data being input into the processing units from different directions. For example, the array controller controls the weights to be input from the top of the processing unit array and propagated downward along the column direction, while the data are input from the left side of the processing unit array and propagated from left to right along the row direction. The present invention does not restrict the input directions of the various computation elements or the directions of systolic propagation; the terms "left", "right", "top" and "bottom" used here refer only to the respective directions in the example figures and should not be construed as limiting the physical implementation of the present invention.
As noted above, in an embodiment of the present invention the processing units in a computing unit are homogeneous and perform the same operation. Fig. 6 gives a schematic diagram of the structure of a processing unit according to an embodiment of the present invention. As shown in Fig. 6, the input signals of the processing unit comprise data, weight and partial sum; the output signals comprise a data output, a weight output and a partial-sum output. The interior of the processing unit mainly comprises a data register, a weight register, a multiplier and an accumulator. The weight input signal is connected to the weight register and the multiplier, the data input signal is connected to the data register and the multiplier, and the partial-sum input signal is connected to the accumulator. The weight register can send the weight to the multiplier for processing, or pass it directly to the computation unit below; likewise the data register can send the datum to the multiplier for processing, or pass it directly to the next unit on the right. The input datum and weight are multiplied in the multiplier, whose output enters the accumulator, where it is either accumulated with the data held in the accumulator or added to the partial-sum input signal, the result being output as a partial sum. The above operations and transfers can be set flexibly in response to control signals from the array controller. For example, each processing unit may perform the following operations:
1) receive the data from the previous node in the row direction and the previous node in the column direction of the systolic flow;
2) compute the product of the two, and accumulate it with the previously stored result;
3) save the accumulated value, pass the input received from the column on to the next column node, and pass the input received from the row on to the next row node (a sketch of such a processing unit follows this list).
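A minimal sketch of such a processing unit (class and member names are our own; the Fig. 6 unit can either accumulate locally or add the incoming partial sum, and the variant below accumulates locally):

```python
class ProcessingUnit:
    """Sketch of the Fig. 6 processing unit: registers latch the
    row/column inputs, the multiplier forms their product, and the
    accumulator adds it to the locally stored partial sum."""
    def __init__(self):
        self.data_reg = None    # datum from the previous row node
        self.weight_reg = None  # weight from the previous column node
        self.acc = 0            # accumulated partial sum

    def step(self, data_in, weight_in):
        # 1) receive the data from the previous row and column nodes
        self.data_reg, self.weight_reg = data_in, weight_in
        # 2) multiply the two and accumulate with the stored result
        self.acc += self.data_reg * self.weight_reg
        # 3) pass the datum to the next row node and the weight to the
        #    next column node, unchanged
        return self.data_reg, self.weight_reg

pe = ProcessingUnit()
pe.step(3, 3); pe.step(4, 2); pe.step(2, 3)
assert pe.acc == 3*3 + 4*2 + 2*3   # element C11 of the Fig. 7 example below
```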
In addition, for processing units organized in the form of a one-dimensional array, the weights do not need to be propagated downward. Therefore, after the array controller has loaded each element of a pending weight sequence into the weight register of the corresponding processing unit, the weight register does not output the weight but retains it for a period of time; once the array controller determines that the weight has completed its associated computing tasks, the weight register is cleared and the next pending weight is loaded.
With reference to Fig. 7, the calculation process of a computing unit using the two-dimensional array structure according to an embodiment of the present invention is illustrated below with the example of multiplying two 3×3 matrices representing the data and the weights:
Data matrix:
A = | 3 4 2 |
    | 2 5 3 |
    | 3 2 5 |
Weight matrix:
B = | 3 4 2 |
    | 2 5 3 |
    | 3 2 5 |
The array controller controls the data and the weights so that they are input into the processing units from the left and the top of the processing unit array respectively. For example, the row vectors of matrix A enter the corresponding rows of the processing unit array in order of increasing row number, adjacent row vectors entering the processing unit array 1 clock cycle apart; thus the datum in row i, column k of matrix A enters the processing unit array at the same time as the datum in row i-1, column k+1. Likewise, the column vectors of matrix B enter the corresponding columns of the processing unit array in order of increasing column number, adjacent column vectors entering the processing unit array 1 clock cycle apart; thus the datum in row k, column j of matrix B enters the processing unit array at the same time as the datum in row k+1, column j-1. Moreover, data matrix A and weight matrix B advance into the processing unit array in parallel in time, i.e. the corresponding elements Ai,k and Bk,j that are to be multiplied arrive at the same processing unit in the same clock cycle, until all rows of matrix A and all columns of matrix B have passed completely through the processing unit array. The array controller is responsible for the input control that makes every datum arrive at each unit time-aligned. In this way, the array controller feeds data and weights from different directions into the systolic array formed by the processing units, the weights flowing from top to bottom and the data from left to right. As the data flow through, all processing units process the data passing through them simultaneously and in parallel, so a very high processing speed can be reached. At the same time, the predetermined dataflow pattern ensures that all the processing a datum requires is completed between its flowing into the processing unit array and its flowing out, so the data never need to be re-entered, which also reduces memory-access operations.
As shown in Fig. 7, in the first cycle, data 3 and 3 enter processing unit PE11 simultaneously and are multiplied in this unit;
in the second cycle, the data 3 that had flowed into PE11 from the left flows right into PE12 while data 4 enters PE12 from the top; the data 3 that had flowed into PE11 from the top flows down into PE21 while data 2 enters PE21 from the left; at the same time data 4 enters PE11 from the left and data 2 enters PE11 from the top;
in the third cycle, data 3 flows into PE11 from above and data 2 flows into PE11 from the left, data 5 and data 2 flow into PE21, data 4 and data 5 flow into PE12, data 3 and data 2 flow into PE13, data 2 and data 4 flow into PE22, and data 3 and data 3 flow into PE31;
in the fourth cycle, data 2 and data 2 enter PE12, data 4 and data 3 enter PE13, data 3 and data 3 enter PE21, data 5 and data 5 enter PE22, data 2 and data 2 enter PE23, data 2 and data 2 enter PE31, and data 3 and data 4 enter PE32;
in the fifth cycle, data 2 and data 5 flow into PE13, data 3 and data 2 flow into PE22, data 5 and data 3 flow into PE23, data 5 and data 3 flow into PE31, data 5 and data 2 flow into PE32, and data 3 and data 2 flow into PE33;
in the sixth cycle, data 3 and data 5 flow into PE23, data 5 and data 2 flow into PE32, and data 2 and data 3 flow into PE33;
in the seventh cycle, data 5 and data 5 flow into PE33.
The product results are accumulated in the column direction, i.e. the product result of PE11 is transferred into PE21 to be accumulated, and the accumulated result is then transferred into PE31 for further accumulation.
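To check that this skewed dataflow indeed computes the matrix product, the following cycle-accurate sketch (our own simulation, not the patent's hardware description; it models the locally accumulating variant of the Fig. 6 processing unit, in which each PE(i,j) keeps its own partial sum rather than passing it down the column) replays the Fig. 7 example:

```python
def systolic_matmul(A, B):
    """Rows of A, skewed one cycle apart, stream in from the left;
    columns of B, skewed one cycle apart, stream in from the top;
    every PE multiplies the pair it currently holds and accumulates
    locally, so PE(i,j) ends up holding C[i][j]."""
    n = len(A)
    acc  = [[0] * n for _ in range(n)]      # per-PE accumulator
    data = [[None] * n for _ in range(n)]   # datum in each PE, flows right
    wgt  = [[None] * n for _ in range(n)]   # weight in each PE, flows down
    for cycle in range(3 * n - 2):          # enough cycles to drain the array
        for i in range(n):                  # shift data right, inject A[i][k]
            for j in range(n - 1, 0, -1):
                data[i][j] = data[i][j - 1]
            k = cycle - i                   # row i is skewed by i cycles
            data[i][0] = A[i][k] if 0 <= k < n else None
        for j in range(n):                  # shift weights down, inject B[k][j]
            for i in range(n - 1, 0, -1):
                wgt[i][j] = wgt[i - 1][j]
            k = cycle - j                   # column j is skewed by j cycles
            wgt[0][j] = B[k][j] if 0 <= k < n else None
        for i in range(n):                  # all PEs multiply-accumulate in parallel
            for j in range(n):
                if data[i][j] is not None and wgt[i][j] is not None:
                    acc[i][j] += data[i][j] * wgt[i][j]
    return acc

A = [[3, 4, 2], [2, 5, 3], [3, 2, 5]]
B = [[3, 4, 2], [2, 5, 3], [3, 2, 5]]
assert systolic_matmul(A, B) == [
    [sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
    for i in range(3)]
```

In the first cycle only PE11 is active (multiplying 3 by 3, as in the trace above), and after 3n - 2 cycles each accumulator holds one element of C = A·B.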
Fig. 8 shows a schematic flow diagram of the execution of a neural network processor using the above computing unit according to an example of the present invention. In step S1, the control unit addresses the storage unit, then reads and parses the instruction to be executed next; in step S2, the input data are obtained from the storage unit according to the storage addresses obtained by parsing the instruction; in step S3, the data and the weights are loaded from the input storage unit and the weight storage unit respectively into the computing unit according to the embodiments of the present invention described above; in step S4, the computing unit performs the arithmetic operations of the neural network computation; in step S5, the neural network computation results are stored back into the output storage unit.
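Strung together, the five steps form a simple fetch-load-compute-store loop. In the sketch below (our own illustration: the patent defines no instruction format, so the Instr fields and the dict standing in for the subdivided storage units of Fig. 3 are hypothetical; systolic_matmul is the simulation from the previous sketch), one such step is executed:

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """A hypothetical decoded instruction: operand and result addresses."""
    data_addr: str
    weight_addr: str
    out_addr: str

def run_step(memory, instr):
    """Steps S2-S5 of Fig. 8 against a dict standing in for the storage units."""
    data = memory[instr.data_addr]              # S2: fetch the input data
    weights = memory[instr.weight_addr]         # S3: load data and weights
    result = systolic_matmul(data, weights)     # S4: systolic computation
    memory[instr.out_addr] = result             # S5: store the results

memory = {"in": [[3, 4, 2], [2, 5, 3], [3, 2, 5]],
          "w":  [[3, 4, 2], [2, 5, 3], [3, 2, 5]]}
run_step(memory, Instr("in", "w", "out"))       # S1 (fetch/decode) elided
```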
Although the present invention has been described by means of preferred embodiments, it is not limited to the embodiments described here, and also includes various changes and variations made without departing from the scope of the present invention.

Claims (7)

1. A neural network processor, comprising a control unit, a computing unit, a data storage unit and a weight storage unit, the computing unit obtaining, under the control of the control unit, data and weights from the data storage unit and the weight storage unit respectively to perform the neural-network-related operations,
wherein the computing unit comprises an array controller and multiple processing units connected in the manner of a systolic array; the array controller loads the weights and data into the processing unit array from different directions, and each processing unit performs operations on the received data and weight and passes the data and weight along their respective directions to the next processing unit.
2. The neural network processor according to claim 1, wherein the processing unit array is a one-dimensional systolic array.
3. The neural network processor according to claim 1, wherein the processing unit array is a two-dimensional systolic array.
4. The neural network processor according to claim 3, wherein the processing unit comprises a data register, a weight register, a multiplier and an accumulator;
wherein the weight register receives a weight from the previous processing unit in the column direction of the processing unit array, sends it to the multiplier and passes it on to the next processing unit in that direction;
the data register receives a datum from the previous processing unit in the row direction of the processing unit array, sends it to the multiplier and passes it on to the next processing unit in that direction;
the multiplier multiplies the input datum and weight, and its output enters the accumulator, where it is either accumulated with the data in the accumulator or added to the partial-sum input signal, the result being output as a partial sum.
5. The neural network processor according to claim 3 or 4, wherein the array controller loads the data from the row direction of the processing unit array and loads the weights from the column direction of the processing unit array.
6. The neural network processor according to claim 3 or 4, wherein the control unit loads the data sequences participating in the operation from the storage unit as row vectors, and loads the weight sequences corresponding to the data sequences as column vectors.
7. The neural network processor according to claim 6, wherein the array controller loads the data sequences and weight sequences into the corresponding rows and columns of the processing unit array in order of increasing row number and column number respectively, adjacent rows and adjacent columns entering the array 1 clock cycle apart in time, and ensures that each weight and the datum it is to be calculated with enter the processing unit array in the same clock cycle.
CN201710777741.4A 2017-09-01 2017-09-01 Neural network processor based on systolic array Active CN107578098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710777741.4A CN107578098B (en) 2017-09-01 2017-09-01 Neural network processor based on systolic array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710777741.4A CN107578098B (en) 2017-09-01 2017-09-01 Neural network processor based on systolic array

Publications (2)

Publication Number Publication Date
CN107578098A true CN107578098A (en) 2018-01-12
CN107578098B CN107578098B (en) 2020-10-30

Family

ID=61030459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710777741.4A Active CN107578098B (en) 2017-09-01 2017-09-01 Neural network processor based on systolic array

Country Status (1)

Country Link
CN (1) CN107578098B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628799A (en) * 2018-04-17 2018-10-09 上海交通大学 Restructural single-instruction multiple-data systolic array architecture, processor and electric terminal
CN109885512A (en) * 2019-02-01 2019-06-14 京微齐力(北京)科技有限公司 The System on Chip/SoC and design method of integrated FPGA and artificial intelligence module
CN109902836A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 The failure tolerant method and System on Chip/SoC of artificial intelligence module
CN109902064A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 A kind of chip circuit of two dimension systolic arrays
CN109902835A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 Processing unit is provided with the artificial intelligence module and System on Chip/SoC of general-purpose algorithm unit
CN109902063A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 A kind of System on Chip/SoC being integrated with two-dimensional convolution array
CN109902795A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 Processing unit is provided with the artificial intelligence module and System on Chip/SoC of inputoutput multiplexer
CN109919323A (en) * 2019-02-01 2019-06-21 京微齐力(北京)科技有限公司 Edge cells have the artificial intelligence module and System on Chip/SoC of local accumulation function
CN109919321A (en) * 2019-02-01 2019-06-21 京微齐力(北京)科技有限公司 Unit has the artificial intelligence module and System on Chip/SoC of local accumulation function
CN109933371A (en) * 2019-02-01 2019-06-25 京微齐力(北京)科技有限公司 Its unit may have access to the artificial intelligence module and System on Chip/SoC of local storage
CN110211618A (en) * 2019-06-12 2019-09-06 中国科学院计算技术研究所 A kind of processing unit and method for block chain
CN110210615A (en) * 2019-07-08 2019-09-06 深圳芯英科技有限公司 It is a kind of for executing the systolic arrays system of neural computing
CN110348564A (en) * 2019-06-11 2019-10-18 中国人民解放军国防科技大学 SCNN reasoning acceleration device based on systolic array, processor and computer equipment
CN110543934A (en) * 2019-08-14 2019-12-06 北京航空航天大学 Pulse array computing structure and method for convolutional neural network
CN110705703A (en) * 2019-10-16 2020-01-17 北京航空航天大学 Sparse neural network processor based on systolic array
CN110785778A (en) * 2018-08-14 2020-02-11 深圳市大疆创新科技有限公司 Neural network processing device based on pulse array
CN110851779A (en) * 2019-10-16 2020-02-28 北京航空航天大学 Systolic array architecture for sparse matrix operations
CN111368988A (en) * 2020-02-28 2020-07-03 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
CN111684473A (en) * 2018-01-31 2020-09-18 亚马逊技术股份有限公司 Improving performance of neural network arrays
CN112204579A (en) * 2018-06-19 2021-01-08 国际商业机器公司 Runtime reconfigurable neural network processor core
CN112632464A (en) * 2020-12-28 2021-04-09 上海壁仞智能科技有限公司 Processing device for processing data
CN112819134A (en) * 2019-11-18 2021-05-18 爱思开海力士有限公司 Memory device including neural network processing circuit
CN112836813A (en) * 2021-02-09 2021-05-25 南方科技大学 Reconfigurable pulsation array system for mixed precision neural network calculation
CN112862067A (en) * 2021-01-14 2021-05-28 支付宝(杭州)信息技术有限公司 Method and device for processing business by utilizing business model based on privacy protection
CN112906877A (en) * 2019-11-19 2021-06-04 阿里巴巴集团控股有限公司 Data layout conscious processing in memory architectures for executing neural network models
CN113393376A (en) * 2021-05-08 2021-09-14 杭州电子科技大学 Lightweight super-resolution image reconstruction method based on deep learning
CN113869507A (en) * 2021-12-02 2021-12-31 之江实验室 Neural network accelerator convolution calculation device and method based on pulse array
CN113870273A (en) * 2021-12-02 2021-12-31 之江实验室 Neural network accelerator characteristic graph segmentation method based on pulse array
FR3115136A1 (en) 2020-10-12 2022-04-15 Thales METHOD AND DEVICE FOR PROCESSING DATA TO BE PROVIDED AS INPUT OF A FIRST SHIFT REGISTER OF A SYSTOLIC NEURONAL ELECTRONIC CIRCUIT
CN114675806A (en) * 2022-05-30 2022-06-28 中科南京智能技术研究院 Pulsation matrix unit and pulsation matrix calculation device
CN110210615B (en) * 2019-07-08 2024-05-28 中昊芯英(杭州)科技有限公司 Systolic array system for executing neural network calculation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
US20170103318A1 (en) * 2015-05-21 2017-04-13 Google Inc. Rotating data for neural network computations
CN106650924A (en) * 2016-10-27 2017-05-10 中国科学院计算技术研究所 Processor based on time dimension and space dimension data flow compression and design method
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 It is applicable the Automation Design method, device and the optimization method of neural network processor
CN107085562A (en) * 2017-03-23 2017-08-22 中国科学院计算技术研究所 A kind of neural network processor and design method based on efficient multiplexing data flow

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103318A1 (en) * 2015-05-21 2017-04-13 Google Inc. Rotating data for neural network computations
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN106650924A (en) * 2016-10-27 2017-05-10 中国科学院计算技术研究所 Processor based on time dimension and space dimension data flow compression and design method
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 It is applicable the Automation Design method, device and the optimization method of neural network processor
CN107085562A (en) * 2017-03-23 2017-08-22 中国科学院计算技术研究所 A kind of neural network processor and design method based on efficient multiplexing data flow

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUECHAO WEI et al.: "Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs", The 54th Annual Design Automation Conference (DAC) *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111684473A (en) * 2018-01-31 2020-09-18 亚马逊技术股份有限公司 Improving performance of neural network arrays
CN111684473B (en) * 2018-01-31 2021-10-22 亚马逊技术股份有限公司 Improving performance of neural network arrays
CN108628799B (en) * 2018-04-17 2021-09-14 上海交通大学 Reconfigurable single instruction multiple data systolic array structure, processor and electronic terminal
CN108628799A (en) * 2018-04-17 2018-10-09 上海交通大学 Restructural single-instruction multiple-data systolic array architecture, processor and electric terminal
CN112204579A (en) * 2018-06-19 2021-01-08 国际商业机器公司 Runtime reconfigurable neural network processor core
CN110785778A (en) * 2018-08-14 2020-02-11 深圳市大疆创新科技有限公司 Neural network processing device based on pulse array
WO2020034079A1 (en) * 2018-08-14 2020-02-20 深圳市大疆创新科技有限公司 Systolic array-based neural network processing device
CN109902836A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 The failure tolerant method and System on Chip/SoC of artificial intelligence module
CN109902064A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 A kind of chip circuit of two dimension systolic arrays
CN109933371A (en) * 2019-02-01 2019-06-25 京微齐力(北京)科技有限公司 Its unit may have access to the artificial intelligence module and System on Chip/SoC of local storage
CN109919321A (en) * 2019-02-01 2019-06-21 京微齐力(北京)科技有限公司 Unit has the artificial intelligence module and System on Chip/SoC of local accumulation function
CN109885512A (en) * 2019-02-01 2019-06-14 京微齐力(北京)科技有限公司 The System on Chip/SoC and design method of integrated FPGA and artificial intelligence module
CN109902063B (en) * 2019-02-01 2023-08-22 京微齐力(北京)科技有限公司 System chip integrated with two-dimensional convolution array
CN109902835A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 Processing unit is provided with the artificial intelligence module and System on Chip/SoC of general-purpose algorithm unit
CN109902063A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 A kind of System on Chip/SoC being integrated with two-dimensional convolution array
CN109919323A (en) * 2019-02-01 2019-06-21 京微齐力(北京)科技有限公司 Edge cells have the artificial intelligence module and System on Chip/SoC of local accumulation function
CN109902795A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 Processing unit is provided with the artificial intelligence module and System on Chip/SoC of inputoutput multiplexer
CN109885512B (en) * 2019-02-01 2021-01-12 京微齐力(北京)科技有限公司 System chip integrating FPGA and artificial intelligence module and design method
CN110348564A (en) * 2019-06-11 2019-10-18 中国人民解放军国防科技大学 SCNN reasoning acceleration device based on systolic array, processor and computer equipment
CN110211618A (en) * 2019-06-12 2019-09-06 中国科学院计算技术研究所 A kind of processing unit and method for block chain
CN110210615B (en) * 2019-07-08 2024-05-28 中昊芯英(杭州)科技有限公司 Systolic array system for executing neural network calculation
CN110210615A (en) * 2019-07-08 2019-09-06 深圳芯英科技有限公司 It is a kind of for executing the systolic arrays system of neural computing
CN110543934B (en) * 2019-08-14 2022-02-01 北京航空航天大学 Pulse array computing structure and method for convolutional neural network
CN110543934A (en) * 2019-08-14 2019-12-06 北京航空航天大学 Pulse array computing structure and method for convolutional neural network
CN110705703A (en) * 2019-10-16 2020-01-17 北京航空航天大学 Sparse neural network processor based on systolic array
CN110851779B (en) * 2019-10-16 2021-09-14 北京航空航天大学 Systolic array architecture for sparse matrix operations
CN110851779A (en) * 2019-10-16 2020-02-28 北京航空航天大学 Systolic array architecture for sparse matrix operations
CN112819134A (en) * 2019-11-18 2021-05-18 爱思开海力士有限公司 Memory device including neural network processing circuit
CN112819134B (en) * 2019-11-18 2024-04-05 爱思开海力士有限公司 Memory device including neural network processing circuitry
CN112906877A (en) * 2019-11-19 2021-06-04 阿里巴巴集团控股有限公司 Data layout conscious processing in memory architectures for executing neural network models
CN111368988B (en) * 2020-02-28 2022-12-20 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
CN111368988A (en) * 2020-02-28 2020-07-03 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity
WO2022078982A1 (en) 2020-10-12 2022-04-21 Thales Method and device for processing data to be supplied as input for a first shift register of a systolic neural electronic circuit
FR3115136A1 (en) 2020-10-12 2022-04-15 Thales METHOD AND DEVICE FOR PROCESSING DATA TO BE PROVIDED AS INPUT OF A FIRST SHIFT REGISTER OF A SYSTOLIC NEURONAL ELECTRONIC CIRCUIT
CN112632464A (en) * 2020-12-28 2021-04-09 上海壁仞智能科技有限公司 Processing device for processing data
CN112862067A (en) * 2021-01-14 2021-05-28 支付宝(杭州)信息技术有限公司 Method and device for processing business by utilizing business model based on privacy protection
CN112836813B (en) * 2021-02-09 2023-06-16 南方科技大学 Reconfigurable pulse array system for mixed-precision neural network calculation
CN112836813A (en) * 2021-02-09 2021-05-25 南方科技大学 Reconfigurable pulsation array system for mixed precision neural network calculation
CN113393376A (en) * 2021-05-08 2021-09-14 杭州电子科技大学 Lightweight super-resolution image reconstruction method based on deep learning
CN113870273A (en) * 2021-12-02 2021-12-31 之江实验室 Neural network accelerator characteristic graph segmentation method based on pulse array
CN113870273B (en) * 2021-12-02 2022-03-25 之江实验室 Neural network accelerator characteristic graph segmentation method based on pulse array
CN113869507A (en) * 2021-12-02 2021-12-31 之江实验室 Neural network accelerator convolution calculation device and method based on pulse array
CN114675806A (en) * 2022-05-30 2022-06-28 中科南京智能技术研究院 Pulsation matrix unit and pulsation matrix calculation device

Also Published As

Publication number Publication date
CN107578098B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN107578098A (en) Neural network processor based on systolic arrays
CN107578095B (en) Neural computing device and processor comprising the computing device
CN107918794A (en) Neural network processor based on computing array
CN107153873B (en) A kind of two-value convolutional neural networks processor and its application method
CN105184366B (en) A kind of time-multiplexed general neural network processor
CN105512723B (en) A kind of artificial neural networks apparatus and method for partially connected
CN106951395A (en) Towards the parallel convolution operations method and device of compression convolutional neural networks
CN107239824A (en) Apparatus and method for realizing sparse convolution neutral net accelerator
CN109190756A (en) Arithmetic unit based on Winograd convolution and the neural network processor comprising the device
WO2022134391A1 (en) Fusion neuron model, neural network structure and training and inference methods therefor, storage medium, and device
CN108665059A (en) Convolutional neural networks acceleration system based on field programmable gate array
CN106529670A (en) Neural network processor based on weight compression, design method, and chip
Sripad et al. SNAVA—A real-time multi-FPGA multi-model spiking neural network simulation architecture
CN107341544A (en) A kind of reconfigurable accelerator and its implementation based on divisible array
CN106875013A (en) The system and method for optimizing Recognition with Recurrent Neural Network for multinuclear
CN109472356A (en) A kind of accelerator and method of restructural neural network algorithm
CN109325591A (en) Neural network processor towards Winograd convolution
CN106951962A (en) Compound operation unit, method and electronic equipment for neutral net
CN107491811A (en) Method and system and neural network processor for accelerans network processing unit
CN106201651A (en) The simulator of neuromorphic chip
CN107423816A (en) A kind of more computational accuracy Processing with Neural Network method and systems
CN106650924A (en) Processor based on time dimension and space dimension data flow compression and design method
CN108510065A (en) Computing device and computational methods applied to long Memory Neural Networks in short-term
CN108446761A (en) A kind of neural network accelerator and data processing method
CN111401547B (en) HTM design method based on circulation learning unit for passenger flow analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant