CN107578098A - Neural network processor based on systolic arrays - Google Patents
Neural network processor based on systolic arrays
- Publication number
- CN107578098A CN107578098A CN201710777741.4A CN201710777741A CN107578098A CN 107578098 A CN107578098 A CN 107578098A CN 201710777741 A CN201710777741 A CN 201710777741A CN 107578098 A CN107578098 A CN 107578098A
- Authority
- CN
- China
- Prior art keywords
- data
- weight
- array
- processing unit
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Complex Calculations (AREA)
- Advance Control (AREA)
Abstract
The present invention provides a neural network processor comprising a control unit, a computing unit, a data storage unit and a weight storage unit. Under the control of the control unit, the computing unit obtains data and weights from the data storage unit and the weight storage unit, respectively, and performs the neural-network-related computation. The computing unit comprises an array controller and multiple processing units connected in the manner of a systolic array; data and weights flow from different directions into the systolic array formed by the processing units, and every processing unit processes the data flowing through it simultaneously and in parallel. The neural network processor can therefore reach a very high processing speed; at the same time, input data are reused many times, so a higher computing throughput can be achieved while consuming less memory bandwidth.
Description
Technical field
The present invention relates to neural network technology, and more particularly to neural network processor architectures.
Background art
Deep learning has achieved important breakthroughs in recent years; neural network models trained with deep learning algorithms have produced remarkable results in application fields such as image recognition, speech processing and intelligent robotics. A deep neural network models the neural connection structure of the human brain; when processing signals such as images, sound and text, it describes the data features through multiple layered transformation stages. As the complexity of neural networks keeps increasing, neural network technology suffers in practical applications from problems such as high resource consumption, slow computing speed and large energy consumption. Replacing traditional software computation with hardware accelerators has therefore become an effective way to improve the efficiency of neural network computing, for example neural network processors implemented with graphics processing units, application-specific processor chips and field-programmable gate arrays (FPGAs).
However, a neural network processor is both computation-intensive and memory-access-intensive. On the one hand, a neural network model contains a large number of multiply-add operations and other nonlinear operations, so the neural network processor must sustain a high load to meet the model's computing demand; on the other hand, a large number of parameters are iterated during neural network computation, so the computing units must access memory frequently, which significantly increases the bandwidth requirement of the processor design and at the same time increases the memory-access power consumption.
Therefore, existing neural network processors need to be improved, to raise their computing efficiency while reducing hardware overhead.
Summary of the invention
Therefore, an object of the present invention is to overcome the above defects of the prior art and to provide a neural network processor based on systolic arrays.
The object of the present invention is achieved through the following technical solutions:
According to one embodiment of the present invention, a neural network processor is provided, comprising a control unit, a computing unit, a data storage unit and a weight storage unit. Under the control of the control unit, the computing unit obtains data and weights from the data storage unit and the weight storage unit, respectively, and performs the neural-network-related computation.
The computing unit comprises an array controller and multiple processing units connected in the manner of a systolic array. The array controller loads the weights and data into the processing unit array from different directions; each processing unit performs computation on the data and weight it receives and passes the data and weight on to the next processing unit along their respective directions.
In the above technical solution, the processing unit array may be a one-dimensional or a two-dimensional systolic array.
In the above technical solution, each processing unit may include a data register, a weight register, a multiplier and an accumulator;
wherein the weight register receives a weight from the previous processing unit in the column direction of the processing unit array, sends it to the multiplier and passes it to the next processing unit in that direction;
the data register receives a datum from the previous processing unit in the row direction of the processing unit array, sends it to the multiplier and passes it to the next processing unit in that direction;
and the multiplier multiplies the input datum and weight, its output being either accumulated with the data in the accumulator or added to the partial-sum input signal, the result being output as a partial sum.
In the above technical solution, the array controller may load data from the row direction of the processing unit array and load weights from the column direction of the processing unit array.
In the above technical solution, the control unit may load the data sequences participating in the computation from the storage unit in the form of row vectors, and load the weight sequences corresponding to those data sequences in the form of column vectors.
In the above technical solution, the array controller may load the data sequences and weight sequences into the corresponding rows and columns of the processing unit array in order of increasing row number and column number, respectively, with adjacent rows and adjacent columns entering the array one clock cycle apart, while ensuring that a weight and a datum to be computed together enter the processing unit array in the same clock cycle.
Compared with the prior art, the advantage of the invention is that using a systolic array structure in the computing unit of the neural network processor improves the processor's operating efficiency and relieves the bandwidth demand of the processor design.
Brief description of the drawings
Embodiments of the present invention are further illustrated with reference to the drawings, in which:
Fig. 1 shows a typical topology of a neural network;
Fig. 2 shows a schematic block diagram of a neural network convolution operation;
Fig. 3 shows a schematic block diagram of the structure of a neural network processor according to an embodiment of the present invention;
Fig. 4 shows the structure of the computing unit of a neural network processor according to an embodiment of the present invention;
Fig. 5 shows the structure of the computing unit of a neural network processor according to another embodiment of the present invention;
Fig. 6 shows the structure of a processing unit in the systolic array architecture according to an embodiment of the present invention;
Fig. 7 shows the calculation process of a computing unit according to an embodiment of the present invention;
Fig. 8 shows a schematic execution flow of a neural network processor according to an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
A neural network is a mathematical model built by modelling the structure and behaviour of the human brain. It is generally divided into an input layer, hidden layers and an output layer, each layer consisting of multiple neuron nodes; the output values of the neuron nodes of one layer are passed as inputs to the neuron nodes of the next layer, layer after layer. The neural network itself has bionic characteristics, and its process of multi-layer abstraction and iteration resembles the information processing of the human brain and other perceptual organs.
Fig. 1 shows a typical topology of a neural network. The first-layer input of the multi-layer neural network structure is the original image (in the present invention, "original image" refers to the raw data to be processed, and not merely, in the narrow sense, an image obtained by taking a photograph). Typically, for each layer of the neural network, the node values of the next layer can be computed from the neuron node values of this layer (also referred to herein as data) and their corresponding weight values. For example, assume that x1, x2, ..., xn represent neuron nodes of a certain layer of the neural network, that they are connected to a node y of the next layer, and that w1, w2, ..., wn represent the weights of the corresponding connections; then the value of y is defined as y = x1·w1 + x2·w2 + ... + xn·wn. Each layer of a neural network therefore involves a large number of convolution operations based on multiply-add operations. The convolution process in a neural network generally proceeds as shown in Fig. 2: a two-dimensional weight convolution kernel of size K*K scans a feature map; at each position the weights form an inner product with the corresponding feature elements of the feature map, and all the inner-product values are summed to obtain one output-layer feature element. When a convolutional layer has N feature map layers, N convolution kernels of size K*K are convolved with the feature maps of that layer, and the N inner-product values are summed to obtain one output-layer feature element. As the complexity of neural networks keeps increasing, such computation inevitably consumes a large amount of resources; hence, a dedicated neural network processor is usually adopted to realize neural network computation.
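For illustration, the multiply-add pattern just described can be written out as the following hypothetical NumPy sketch (shapes, names and the unit stride are assumptions for illustration, not part of the patent):

```python
import numpy as np

def conv_layer(feature_maps, kernels):
    # feature_maps: (N, H, W) input feature map layers
    # kernels:      (N, K, K) one K*K weight kernel per layer
    n, h, w = feature_maps.shape
    _, k, _ = kernels.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product of each K*K window with its kernel,
            # summed over all N feature map layers -> one output element
            window = feature_maps[:, i:i + k, j:j + k]
            out[i, j] = np.sum(window * kernels)
    return out

# a neuron value follows the same multiply-add pattern in vector form:
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.25, 0.125])
y = np.dot(x, w)   # y = x1*w1 + x2*w2 + ... + xn*wn
```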
A typical neural network processor is based on a storage-control-computation structure. The storage structure stores the data participating in the computation, the neural network weights and the processor's operating instructions; the control structure parses the operating instructions and generates control signals to control the scheduling and storage of data inside the processor and the computation process of the neural network; the computation structure is responsible for the neural network's computing operations. The storage unit may store data transmitted from outside the neural network processor (for example, original feature map data), trained neural network weights, results or intermediate results produced during computation, instruction information participating in the computation, and so on.
Fig. 3 shows the structure of a neural network processor according to an embodiment of the present invention. As shown in Fig. 3, the storage unit is subdivided into an input data storage unit 311, a weight storage unit 312, an instruction storage unit 313 and an output data storage unit 314. The input data storage unit 311 stores the data participating in the computation, including original feature map data and the data participating in intermediate-layer computation; the weight storage unit 312 stores the trained neural network weights; the instruction storage unit 313 stores the instruction information participating in the computation, which the control unit 320 parses into a control flow to schedule the neural network computation; the output data storage unit 314 stores the computed neuron responses. Subdividing the storage unit in this way allows data of essentially the same type to be stored together, so that a suitable storage medium can be chosen and operations such as data addressing can be simplified. It should be understood that the input data storage unit 311 and the output data storage unit 314 may also be one and the same storage unit.
The control unit 320 is responsible for instruction decoding, data scheduling, process control and similar tasks; for example, it fetches the instruction stored in the instruction storage unit and parses it, then schedules data according to the resulting control signals and controls the computing units to perform the relevant neural network operations. In an embodiment of the present invention, the layer data participating in the neural network computation are divided into different regions, each region being treated as a matrix, so that the computation between data and weights is divided into multiple matrix operations (for example as shown in Fig. 2). The control unit then loads the weight sequences and data sequences participating in the computation from the storage unit in the row-vector or column-vector form suitable for matrix operations.
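By way of illustration, one common way to realize such a partitioning is an im2col-style mapping; the sketch below is an assumption for illustration only, as the patent does not fix the exact mapping. Each K*K region becomes a row vector of data and the kernel becomes a column vector of weights, so the convolution reduces to a matrix operation:

```python
import numpy as np

def conv_as_matrix_op(feature_map, kernel):
    # flatten every K*K region of the feature map into a row vector,
    # and the kernel into a weight column vector, so the convolution
    # becomes one matrix-vector product suitable for the systolic array
    h, w = feature_map.shape
    k = kernel.shape[0]
    rows = [feature_map[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
    data = np.stack(rows)                 # data loaded as row vectors
    weights = kernel.ravel()[:, None]     # weights loaded as a column vector
    return (data @ weights).reshape(h - k + 1, w - k + 1)
```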
A neural network processor may contain one or more computing units (for example computing units 330 and 331). Each computing unit can perform the corresponding neural network computation according to the control signals from the control unit 320, obtaining data from the storage units, computing, and writing the computation results back to the storage units. The computing units may use the same structure or different structures, and may perform the same or different computations. The computing unit provided in one embodiment of the present invention includes an array controller and multiple processing units organized in the form of a systolic array, each processing unit having the same internal structure. The array controller is responsible for loading data into the systolic array, and each processing unit is responsible for the data computation. Weights are input from the top of the systolic array and propagate from top to bottom; data are input from the left side of the systolic array and propagate from left to right; each processing unit operates on the data and weights it receives, and the results are output from the right side of the systolic array. The systolic array may have a one-dimensional or a two-dimensional structure. It should be understood that the neural network processor may also contain computing units that compute in other ways, and the control unit may select different computing units to process data according to actual requirements.
Fig. 4 shows the structure of the computing unit in a neural network processor according to an embodiment of the present invention. As shown in Fig. 4, the systolic array has a one-dimensional structure in which the processing units are connected in series. For a weight sequence and a data sequence of a pending computation, the array controller loads each weight of the weight sequence into a different processing unit and keeps it there until the last element of the corresponding data sequence has completed its calculation with that weight, after which the next group of weights is loaded. At the same time, the elements of the data sequence are loaded one by one into the systolic array from the left side, and the processed data are transmitted back to the array controller from the opposite side of the systolic array.
In such a computing unit structure, the first datum first enters the first processing unit and, after being processed, is passed to the next processing unit, while the second datum enters the first processing unit. Continuing in this way, by the time the first datum reaches the last processing unit it has been processed many times. This systolic architecture therefore reuses the input data many times, so a higher computing throughput can be achieved while consuming less memory bandwidth, as the sketch below illustrates.
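A minimal sketch of this reuse follows (an assumed, simplified weight-stationary dataflow; the arithmetic is deliberately trivial, each accumulator ends up holding its weight times the sum of the stream; the point is that a word fetched from memory once takes part in one multiply-accumulate in every processing unit):

```python
def systolic_1d(data, weights):
    # weight i stays resident in PE i; each datum enters from the left,
    # is used by PE 0, and is forwarded one PE per cycle until it leaves
    # on the right; every input word is read from memory exactly once
    n = len(weights)
    acc = [0] * n                      # local accumulator per PE
    pipe = [None] * n                  # datum currently inside each PE
    for x in list(data) + [None] * n:  # extra cycles drain the pipeline
        pipe = [x] + pipe[:-1]         # one-step right shift per cycle
        for i in range(n):
            if pipe[i] is not None:
                acc[i] += weights[i] * pipe[i]  # multiply-accumulate
    return acc                         # acc[i] == weights[i] * sum(data)
```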
Fig. 5 shows the structure of the computing unit in a neural network processor according to another embodiment of the present invention. In this embodiment, the computing unit organizes multiple processing units in a two-dimensional array comprising row and column arrays, and each processing unit is connected only with its adjacent processing units, i.e., a processing unit communicates only with its neighbours. The array controller is responsible for data scheduling; it controls the input of the relevant data into the processing units from the top and the left of the computing unit's systolic array, with different data entering the processing units from different directions. For example, the array controller controls the weights to enter from the top of the processing unit array and propagate downward along the column direction, while the data enter from the left side of the processing unit array and propagate from left to right along the row direction. The present invention does not restrict the input directions or the systolic propagation directions of the various computation elements; terms such as "left", "right", "top" and "bottom" used here refer only to the respective directions in the example figures and should not be interpreted as limiting the physical implementation of the present invention.
As noted above, in an embodiment of the present invention, all processing units in a computing unit are homogeneous and perform the same operations. Fig. 6 gives the structure of a processing unit according to an embodiment of the present invention. As shown in Fig. 6, the input signals of a processing unit include a datum, a weight and a partial sum; the output signals include a data output, a weight output and a partial-sum output. Internally, the processing unit mainly comprises a data register, a weight register, a multiplier and an accumulator. The weight input signal is connected to the weight register and the multiplier, the data input signal is connected to the data register and the multiplier, and the partial-sum input signal is connected to the accumulator. The weight register can send its weight to the multiplier for processing, or pass it directly to the computing unit below; likewise, the data register can send its datum to the multiplier for processing, or pass it directly to the next unit on the right. The input datum and weight are multiplied in the multiplier; the multiplier output is either accumulated with the data in the accumulator, or added to the partial-sum input signal, the result being output as a partial sum. The above computation and forwarding behaviour can be flexibly configured in response to control signals from the array controller. For example, each processing unit can perform the following operations:
1) receive a datum and a weight from the previous nodes in the row and the column of the systolic direction;
2) compute the product of the two values and accumulate it with the previously stored result;
3) save the accumulated value, output the datum received from the row to the next node in the row, and output the weight received from the column to the next node in the column (a register-level sketch of this behaviour is given below).
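A register-level sketch of such a processing unit might look as follows (behaviour assumed from the description above; all names are illustrative):

```python
class PE:
    """One processing unit: data register, weight register,
    multiplier and accumulator, as described above."""
    def __init__(self):
        self.data_reg = 0
        self.weight_reg = 0
        self.acc = 0

    def step(self, data_in, weight_in, psum_in=None):
        # latch the incoming datum and weight
        self.data_reg, self.weight_reg = data_in, weight_in
        prod = self.data_reg * self.weight_reg      # multiplier
        if psum_in is None:
            self.acc += prod                        # accumulate locally
            psum_out = self.acc
        else:
            psum_out = psum_in + prod               # add to partial-sum input
        # datum goes to the right neighbour, weight to the unit below
        return self.data_reg, self.weight_reg, psum_out
```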
In addition, for processing units organized in one-dimensional-array form, the weights need not be propagated downward. Hence, after the array controller has loaded each element of the pending weight sequence into the weight register of its processing unit, the weight register does not output the weight but retains it for a period of time; once the array controller determines that the weight has completed its associated computing tasks, the weight register is cleared and the next pending weight is loaded.
With reference to Fig. 7, the calculation process of a computing unit using the two-dimensional array structure according to an embodiment of the present invention is illustrated below using the example of multiplying two 3*3 matrices that represent data and weights:
Data matrix A:
3 4 2
2 5 3
3 2 5
Weight matrix B:
3 4 2
2 5 3
3 2 5
The array controller controls the data and weights to enter the processing units from the top and the left of the processing unit array, respectively. In general, the row vectors of matrix A enter the corresponding rows of the processing unit array in order of increasing row number, with adjacent row vectors entering the processing unit array one clock cycle apart, i.e., the element in row i, column k of matrix A enters the processing unit array at the same time as the element in row i-1, column k+1. The column vectors of matrix B enter the corresponding columns of the processing unit array in order of increasing column number, with adjacent column vectors entering the processing unit array one clock cycle apart, i.e., the element in row k, column j of matrix B enters the processing unit array at the same time as the element in row k-1, column j+1. Moreover, data matrix A and weight matrix B advance into the processing unit array in parallel in time, i.e., the corresponding elements A(i,k) and B(k,j) to be multiplied together enter the processing unit array in the same clock cycle, until all elements of every row of matrix A and every column of matrix B have passed through the processing unit array.
The array controller is responsible for the input control of each processing unit, so that the arrival of each datum meets the required time alignment. In this way, the array controller feeds data and weights from different directions into the systolic array formed by the processing units: the weights flow from top to bottom and the data flow from left to right. While the data flow through the array, all processing units process the data flowing through them simultaneously and in parallel, so a very high processing speed can be reached. Meanwhile, the predetermined dataflow pattern ensures that the data complete all the processing they require between flowing into and flowing out of the processing unit array, without having to be re-entered, which also reduces memory-access operations.
As shown in Fig. 7, in the first cycle, data 3 and 3 enter processing unit PE11 simultaneously and are multiplied there.
In the second cycle, the datum 3 received by PE11 from the left flows to the right while datum 4 enters processing unit PE12; the datum 3 received by PE11 from the top flows downward while datum 2 enters processing unit PE21; meanwhile datum 4 and datum 2 enter PE11 from the left and the top, respectively.
In the third cycle, datum 3 flows into PE11 from above and datum 2 flows into PE11 from the left; data 5 and 2 flow into processing unit PE21; data 4 and 5 flow into processing unit PE12; data 3 and 2 flow into processing unit PE13; data 2 and 4 flow into processing unit PE22; and data 3 and 3 flow into processing unit PE31.
In the fourth cycle, data 2 and 2 enter processing unit PE12; data 4 and 3 enter processing unit PE13; data 3 and 3 enter processing unit PE21; data 5 and 5 enter processing unit PE22; data 2 and 2 enter processing unit PE23; data 2 and 2 enter processing unit PE31; and data 3 and 4 enter processing unit PE32.
In the fifth cycle, data 2 and 5 flow into processing unit PE13; data 3 and 2 flow into processing unit PE22; data 5 and 3 flow into processing unit PE23; data 5 and 3 flow into processing unit PE31; data 5 and 2 flow into processing unit PE32; and data 3 and 2 flow into processing unit PE33.
In the sixth cycle, data 3 and 5 flow into processing unit PE23; data 5 and 2 flow into processing unit PE32; and data 2 and 3 flow into processing unit PE33.
In the seventh cycle, data 5 and 5 flow into processing unit PE33.
Here, the products are accumulated along the column direction, i.e., the product from PE11 is transferred to PE21 and accumulated, and the accumulated result is then transferred to PE31 for further accumulation.
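The seven-cycle schedule above can be checked with a short simulation (a minimal sketch under assumed timing; for simplicity it accumulates each product in place in its processing unit instead of forwarding partial sums down the column):

```python
import numpy as np

def systolic_matmul(A, B):
    # h[i, j]: datum latched in PE(i, j), advancing one PE right per cycle
    # v[i, j]: weight latched in PE(i, j), advancing one PE down per cycle
    n = A.shape[0]
    acc = np.zeros((n, n), dtype=A.dtype)
    h = np.zeros((n, n), dtype=A.dtype)
    v = np.zeros((n, n), dtype=A.dtype)
    for t in range(3 * n - 2):
        h = np.roll(h, 1, axis=1)   # data step one PE to the right
        v = np.roll(v, 1, axis=0)   # weights step one PE downward
        for i in range(n):          # left edge: row i of A, skewed by i cycles
            k = t - i
            h[i, 0] = A[i, k] if 0 <= k < n else 0
        for j in range(n):          # top edge: column j of B, skewed by j cycles
            k = t - j
            v[0, j] = B[k, j] if 0 <= k < n else 0
        acc += h * v                # every PE multiplies and accumulates
    return acc

A = np.array([[3, 4, 2], [2, 5, 3], [3, 2, 5]])
B = A.copy()                        # the worked example uses identical matrices
assert (systolic_matmul(A, B) == A @ B).all()
```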
Fig. 8 shows the execution flow of a neural network processor using the above computing unit according to an example of the present invention. In step S1, the control unit addresses the storage unit, then reads and parses the instruction to be executed next; in step S2, the input data are obtained from the storage unit according to the storage addresses obtained by parsing the instruction; in step S3, the data and weights are loaded from the input storage unit and the weight storage unit, respectively, into the computing unit according to the embodiments of the present invention described above; in step S4, the computing unit performs the arithmetic operations of the neural network computation; in step S5, the neural network computation results are stored back into the output storage unit.
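In pseudocode, the flow of Fig. 8 might be driven as follows (every object and method name is a hypothetical stand-in, not an interface defined by the patent):

```python
def run_inference_step(cu, ctrl, data_mem, weight_mem, out_mem):
    """Hypothetical driver loop mirroring steps S1-S5."""
    inst = ctrl.fetch_and_decode()                   # S1: address, read, parse
    data = data_mem.read(inst.data_addr)             # S2: fetch input data
    weights = weight_mem.read(inst.weight_addr)      # S3: load data and weights
    result = cu.systolic_compute(data, weights)      # S4: run the systolic array
    out_mem.write(inst.out_addr, result)             # S5: store neuron responses
```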
Although the present invention has been described by means of preferred embodiments, the present invention is not limited to the embodiments described here, and also includes various changes and variations made without departing from the present invention.
Claims (7)
1. A neural network processor, comprising a control unit, a computing unit, a data storage unit and a weight storage unit, wherein the computing unit, under the control of the control unit, obtains data and weights from the data storage unit and the weight storage unit, respectively, and performs the neural-network-related computation;
wherein the computing unit comprises an array controller and multiple processing units connected in the manner of a systolic array; the array controller loads the weights and data into the processing unit array from different directions, and each processing unit performs computation on the data and weight it receives and passes the data and weight on to the next processing unit along their respective directions.
2. The neural network processor according to claim 1, wherein the processing unit array is a one-dimensional systolic array.
3. The neural network processor according to claim 1, wherein the processing unit array is a two-dimensional systolic array.
4. The neural network processor according to claim 3, wherein each processing unit comprises a data register, a weight register, a multiplier and an accumulator;
wherein the weight register receives a weight from the previous processing unit in the column direction of the processing unit array, sends it to the multiplier and passes it to the next processing unit in that direction;
the data register receives a datum from the previous processing unit in the row direction of the processing unit array, sends it to the multiplier and passes it to the next processing unit in that direction;
and the multiplier multiplies the input datum and weight, its output being either accumulated with the data in the accumulator or added to the partial-sum input signal, the result being output as a partial sum.
5. The neural network processor according to claim 3 or 4, wherein the array controller loads data from the row direction of the processing unit array and loads weights from the column direction of the processing unit array.
6. The neural network processor according to claim 3 or 4, wherein the control unit loads the data sequences participating in the computation from the storage unit as row vectors, and loads the weight sequences corresponding to the data sequences in the form of column vectors.
7. The neural network processor according to claim 6, wherein the array controller loads the data sequences and weight sequences into the corresponding rows and columns of the processing unit array in order of increasing row number and column number, respectively, adjacent rows and adjacent columns entering the array one clock cycle apart, and ensures that a weight and a datum to be computed together enter the processing unit array in the same clock cycle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710777741.4A CN107578098B (en) | 2017-09-01 | 2017-09-01 | Neural network processor based on systolic array |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107578098A true CN107578098A (en) | 2018-01-12 |
CN107578098B CN107578098B (en) | 2020-10-30 |
Family
ID=61030459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710777741.4A Active CN107578098B (en) | 2017-09-01 | 2017-09-01 | Neural network processor based on systolic array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578098B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170103318A1 (en) * | 2015-05-21 | 2017-04-13 | Google Inc. | Rotating data for neural network computations |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
CN106650924A (en) * | 2016-10-27 | 2017-05-10 | 中国科学院计算技术研究所 | Processor based on time dimension and space dimension data flow compression and design method |
CN107016175A (en) * | 2017-03-23 | 2017-08-04 | 中国科学院计算技术研究所 | It is applicable the Automation Design method, device and the optimization method of neural network processor |
CN107085562A (en) * | 2017-03-23 | 2017-08-22 | 中国科学院计算技术研究所 | A kind of neural network processor and design method based on efficient multiplexing data flow |
Non-Patent Citations (1)
Title |
---|
XUECHAO WEI et al.: "Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs", The 54th Annual Design Automation Conference (DAC) *
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111684473A (en) * | 2018-01-31 | 2020-09-18 | 亚马逊技术股份有限公司 | Improving performance of neural network arrays |
CN111684473B (en) * | 2018-01-31 | 2021-10-22 | 亚马逊技术股份有限公司 | Improving performance of neural network arrays |
CN108628799B (en) * | 2018-04-17 | 2021-09-14 | 上海交通大学 | Reconfigurable single instruction multiple data systolic array structure, processor and electronic terminal |
CN108628799A (en) * | 2018-04-17 | 2018-10-09 | 上海交通大学 | Restructural single-instruction multiple-data systolic array architecture, processor and electric terminal |
CN112204579A (en) * | 2018-06-19 | 2021-01-08 | 国际商业机器公司 | Runtime reconfigurable neural network processor core |
CN110785778A (en) * | 2018-08-14 | 2020-02-11 | 深圳市大疆创新科技有限公司 | Neural network processing device based on pulse array |
WO2020034079A1 (en) * | 2018-08-14 | 2020-02-20 | 深圳市大疆创新科技有限公司 | Systolic array-based neural network processing device |
CN109902836A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | The failure tolerant method and System on Chip/SoC of artificial intelligence module |
CN109902064A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | A kind of chip circuit of two dimension systolic arrays |
CN109933371A (en) * | 2019-02-01 | 2019-06-25 | 京微齐力(北京)科技有限公司 | Its unit may have access to the artificial intelligence module and System on Chip/SoC of local storage |
CN109919321A (en) * | 2019-02-01 | 2019-06-21 | 京微齐力(北京)科技有限公司 | Unit has the artificial intelligence module and System on Chip/SoC of local accumulation function |
CN109885512A (en) * | 2019-02-01 | 2019-06-14 | 京微齐力(北京)科技有限公司 | The System on Chip/SoC and design method of integrated FPGA and artificial intelligence module |
CN109902063B (en) * | 2019-02-01 | 2023-08-22 | 京微齐力(北京)科技有限公司 | System chip integrated with two-dimensional convolution array |
CN109902835A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | Processing unit is provided with the artificial intelligence module and System on Chip/SoC of general-purpose algorithm unit |
CN109902063A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | A kind of System on Chip/SoC being integrated with two-dimensional convolution array |
CN109919323A (en) * | 2019-02-01 | 2019-06-21 | 京微齐力(北京)科技有限公司 | Edge cells have the artificial intelligence module and System on Chip/SoC of local accumulation function |
CN109902795A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | Processing unit is provided with the artificial intelligence module and System on Chip/SoC of inputoutput multiplexer |
CN109885512B (en) * | 2019-02-01 | 2021-01-12 | 京微齐力(北京)科技有限公司 | System chip integrating FPGA and artificial intelligence module and design method |
CN110348564A (en) * | 2019-06-11 | 2019-10-18 | 中国人民解放军国防科技大学 | SCNN reasoning acceleration device based on systolic array, processor and computer equipment |
CN110211618A (en) * | 2019-06-12 | 2019-09-06 | 中国科学院计算技术研究所 | A kind of processing unit and method for block chain |
CN110210615B (en) * | 2019-07-08 | 2024-05-28 | 中昊芯英(杭州)科技有限公司 | Systolic array system for executing neural network calculation |
CN110210615A (en) * | 2019-07-08 | 2019-09-06 | 深圳芯英科技有限公司 | It is a kind of for executing the systolic arrays system of neural computing |
CN110543934B (en) * | 2019-08-14 | 2022-02-01 | 北京航空航天大学 | Pulse array computing structure and method for convolutional neural network |
CN110543934A (en) * | 2019-08-14 | 2019-12-06 | 北京航空航天大学 | Pulse array computing structure and method for convolutional neural network |
CN110705703A (en) * | 2019-10-16 | 2020-01-17 | 北京航空航天大学 | Sparse neural network processor based on systolic array |
CN110851779B (en) * | 2019-10-16 | 2021-09-14 | 北京航空航天大学 | Systolic array architecture for sparse matrix operations |
CN110851779A (en) * | 2019-10-16 | 2020-02-28 | 北京航空航天大学 | Systolic array architecture for sparse matrix operations |
CN112819134A (en) * | 2019-11-18 | 2021-05-18 | 爱思开海力士有限公司 | Memory device including neural network processing circuit |
CN112819134B (en) * | 2019-11-18 | 2024-04-05 | 爱思开海力士有限公司 | Memory device including neural network processing circuitry |
CN112906877A (en) * | 2019-11-19 | 2021-06-04 | 阿里巴巴集团控股有限公司 | Data layout conscious processing in memory architectures for executing neural network models |
CN111368988B (en) * | 2020-02-28 | 2022-12-20 | 北京航空航天大学 | Deep learning training hardware accelerator utilizing sparsity |
CN111368988A (en) * | 2020-02-28 | 2020-07-03 | 北京航空航天大学 | Deep learning training hardware accelerator utilizing sparsity |
WO2022078982A1 (en) | 2020-10-12 | 2022-04-21 | Thales | Method and device for processing data to be supplied as input for a first shift register of a systolic neural electronic circuit |
FR3115136A1 (en) | 2020-10-12 | 2022-04-15 | Thales | METHOD AND DEVICE FOR PROCESSING DATA TO BE PROVIDED AS INPUT OF A FIRST SHIFT REGISTER OF A SYSTOLIC NEURONAL ELECTRONIC CIRCUIT |
CN112632464A (en) * | 2020-12-28 | 2021-04-09 | 上海壁仞智能科技有限公司 | Processing device for processing data |
CN112862067A (en) * | 2021-01-14 | 2021-05-28 | 支付宝(杭州)信息技术有限公司 | Method and device for processing business by utilizing business model based on privacy protection |
CN112836813B (en) * | 2021-02-09 | 2023-06-16 | 南方科技大学 | Reconfigurable pulse array system for mixed-precision neural network calculation |
CN112836813A (en) * | 2021-02-09 | 2021-05-25 | 南方科技大学 | Reconfigurable pulsation array system for mixed precision neural network calculation |
CN113393376A (en) * | 2021-05-08 | 2021-09-14 | 杭州电子科技大学 | Lightweight super-resolution image reconstruction method based on deep learning |
CN113870273A (en) * | 2021-12-02 | 2021-12-31 | 之江实验室 | Neural network accelerator characteristic graph segmentation method based on pulse array |
CN113870273B (en) * | 2021-12-02 | 2022-03-25 | 之江实验室 | Neural network accelerator characteristic graph segmentation method based on pulse array |
CN113869507A (en) * | 2021-12-02 | 2021-12-31 | 之江实验室 | Neural network accelerator convolution calculation device and method based on pulse array |
CN114675806A (en) * | 2022-05-30 | 2022-06-28 | 中科南京智能技术研究院 | Pulsation matrix unit and pulsation matrix calculation device |
Also Published As
Publication number | Publication date |
---|---|
CN107578098B (en) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN107578095B (en) | Neural computing device and processor comprising the computing device | |
CN107918794A (en) | Neural network processor based on computing array | |
CN107153873B (en) | A kind of two-value convolutional neural networks processor and its application method | |
CN105184366B (en) | A kind of time-multiplexed general neural network processor | |
CN105512723B (en) | A kind of artificial neural networks apparatus and method for partially connected | |
CN106951395A (en) | Towards the parallel convolution operations method and device of compression convolutional neural networks | |
CN107239824A (en) | Apparatus and method for realizing sparse convolution neutral net accelerator | |
CN109190756A (en) | Arithmetic unit based on Winograd convolution and the neural network processor comprising the device | |
WO2022134391A1 (en) | Fusion neuron model, neural network structure and training and inference methods therefor, storage medium, and device | |
CN108665059A (en) | Convolutional neural networks acceleration system based on field programmable gate array | |
CN106529670A (en) | Neural network processor based on weight compression, design method, and chip | |
Sripad et al. | SNAVA—A real-time multi-FPGA multi-model spiking neural network simulation architecture | |
CN107341544A (en) | A kind of reconfigurable accelerator and its implementation based on divisible array | |
CN106875013A (en) | The system and method for optimizing Recognition with Recurrent Neural Network for multinuclear | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN109325591A (en) | Neural network processor towards Winograd convolution | |
CN106951962A (en) | Compound operation unit, method and electronic equipment for neutral net | |
CN107491811A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN106201651A (en) | The simulator of neuromorphic chip | |
CN107423816A (en) | A kind of more computational accuracy Processing with Neural Network method and systems | |
CN106650924A (en) | Processor based on time dimension and space dimension data flow compression and design method | |
CN108510065A (en) | Computing device and computational methods applied to long Memory Neural Networks in short-term | |
CN108446761A (en) | A kind of neural network accelerator and data processing method | |
CN111401547B (en) | HTM design method based on circulation learning unit for passenger flow analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||