CN106529670B - Neural network processor based on weight compression, design method, and chip - Google Patents
Neural network processor based on weight compression, design method, and chip
- Publication number
- CN106529670B CN201610958305.2A
- Authority
- CN
- China
- Prior art keywords
- weight
- storage unit
- data
- neural network
- unit
- Prior art date
- 2016-10-27
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The present invention proposes a neural network processor based on weight compression, a design method, and a chip. The processor includes at least one storage unit, for storing operation instructions and the data participating in computation; at least one storage unit controller, for controlling the storage unit; at least one computing unit, for executing the computing operations of the neural network; a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit; and at least one weight retrieval unit, each connected with the computing unit, for retrieving weights so as to guarantee that the compressed weights operate correctly with their corresponding data. The present invention reduces the storage occupied by weights in the neural network processor, increases the operation speed, and improves energy efficiency.
Description
Technical field
The present invention relates to the field of hardware acceleration of neural network model computation, and in particular to a neural network processor based on weight compression, a design method, and a chip.
Background art
Deep learning is an important branch of machine learning and has achieved major breakthroughs in recent years. Since their introduction, neural network models trained with deep learning algorithms have achieved remarkable results in application fields such as image recognition, speech processing, and intelligent robotics.
Deep neural networks model the neural connection structure of the human brain: when processing signals such as images, sound, and text, data features are described through multiple layered transformation stages. As the complexity of neural networks continues to increase, neural network techniques suffer in practical applications from high resource occupancy, slow operation speed, high energy consumption, and similar problems; the technology therefore faces serious efficiency and operation-speed bottlenecks when applied in fields such as embedded devices and low-overhead data centers. Replacing traditional software computation with hardware acceleration has become an effective means of improving the efficiency of neural network computation. Mainstream hardware-acceleration approaches include graphics processing units, application-specific processor chips, and field-programmable gate arrays (FPGAs).
In existing neural network techniques, a neural network model undergoes multiple rounds of training on a training set, yielding the neural network weight values sample by sample. Neural network weights have a certain sparsity: a large number of weights have the value 0, and after operations such as multiplication and addition these weights and their data have no numerical influence on the operation result. The zero-valued weights in these neural networks are related to the inherent characteristics of deep neural networks; they arise from repeated training and are not easily eliminated at the algorithm level. During storage, loading, and computation, these zero-valued weights occupy a large amount of on-chip resources and consume extra working time, making it difficult to meet the performance requirements of a neural network processor.
Therefore, in both academia and industry, a great deal of research has been devoted to the zero-valued elements in such neural networks. The paper "Albericio J, Judd P, Hetherington T, et al. Cnvlutin: ineffectual-neuron-free deep neural network computing [C] // Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016: 1-13." provides large-scale on-chip storage units to realize large-scale parallel computation and, on that basis, realizes compression of the data elements; however, relying on large-scale on-chip storage to meet the needs of parallel computation makes it unsuitable for embedded devices. The paper "Chen Y H, Emer J, Sze V. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks [J]. 2016." realizes data reuse by sharing data and weights and uses power gating to shut off the computation of zero elements, which effectively improves energy efficiency; however, this method can only reduce operation power consumption and cannot skip zero data to accelerate the computation speed.
The invention "A neural network accelerator and its operation method" is applicable to the field of neural network algorithms. It provides a neural network accelerator and its operation method. The neural network accelerator includes an on-chip storage medium, an on-chip address index module, a core computation module, and a multi-ALU device. The on-chip storage medium stores data transmitted from outside the chip or data generated during computation; the on-chip data index module maps, according to an input index, to the correct storage address when an operation is executed; the core computation module executes the neural network operations; and the multi-ALU device obtains input data from the core computation module or the on-chip storage medium and executes the nonlinear operations that the core computation module cannot perform. That invention introduces a multi-ALU design into the neural network accelerator to increase the operation speed of nonlinear operations, making the neural network accelerator more efficient. The greatest difference between the present invention and that invention is that the present invention introduces a weight-compression storage structure into the neural network accelerator, improving the neural network operation speed and reducing energy loss.
The invention "Operation apparatus and method for an acceleration chip accelerating a deep neural network algorithm" provides an operation apparatus and method for a chip that accelerates deep neural network algorithms. The apparatus includes: a vector addition processor module, which performs vector addition or subtraction and/or the vectorized operations of the pooling-layer algorithm in a deep neural network algorithm; a vector function value arithmetic module, which performs the vectorized operations of the nonlinear evaluations in a deep neural network algorithm; and a vector multiplier-adder module, which performs vector multiply-add operations. The three modules execute programmable instructions and interact with each other to compute the neuron values and network output results of the neural network, as well as the synaptic weight variations representing the strength with which the input layer acts on the output-layer neurons. Intermediate-value storage regions are provided in the three modules, and read and write operations are performed on the main memory. Thereby, the number of reads and writes of intermediate values to the main memory can be reduced, the energy consumption of the accelerator chip is lowered, and the problems of data loss and replacement during data processing are avoided. The greatest difference between the present invention and that invention is that the present invention introduces a weight-compression storage structure into the neural network accelerator, improving the neural network operation speed and reducing energy loss.
Summary of the invention
In view of the above drawbacks of existing neural network processors, the present invention proposes a neural network processor based on weight compression, a design method, and a chip. The system introduces a weight index structure into an existing neural network processor system, thereby improving the operation speed and reducing the energy loss of neural network acceleration.
The present invention proposes a neural network processor based on weight compression, comprising:
At least one storage unit, for storing operation instructions and the data participating in computation;
At least one storage unit controller, for controlling the storage unit;
At least one computing unit, for executing the computing operations of the neural network;
A control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit;
At least one weight retrieval unit, for retrieving weights, wherein each weight retrieval unit is connected with the computing unit to guarantee that the compressed weights operate correctly with their corresponding data.
The storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
The input data storage unit is used to store the data participating in computation, which include original feature-map data and the data participating in intermediate-layer computation; the output data storage unit stores the computed neuron responses; the weight storage unit is used to store the trained neural network weights; the instruction storage unit is used to store the instruction information participating in computation.
The data participating in computation are recoded by an offline, off-chip compression method, and weight compression is realized through a weight compression format.
The weight compression format includes <weight, offset>.
The weight in the weight compression format is the original value of the neural network weight before compression, and the offset is the relative position of the current non-zero weight within its group of weight values.
In the weight compression process, the acquired weight value sequence is recoded so that zero-valued elements are not retained; only non-zero elements are kept.
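As a concrete illustration of this recoding, here is a minimal Python sketch, under the assumption that a weight group is a plain list of values (the patent defines the format, not a software implementation):

```python
def compress(weights):
    """Recode a weight group into <weight, offset> pairs.

    Zero-valued weights are dropped; each retained pair keeps the
    original weight value and its relative position in the group.
    """
    return [(w, offset) for offset, w in enumerate(weights) if w != 0]

# Only the non-zero weights survive, each tagged with its offset.
print(compress([1.5, 2.5, 0, 0]))  # [(1.5, 0), (2.5, 1)]
```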
The computing unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
The present invention also proposes a design method for the above neural network processor based on weight compression, comprising:
Step 1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step 2: the storage address is obtained according to the parsed instruction, and the data and weights participating in computation are obtained from the storage unit;
Step 3: the data and weights participating in computation are loaded from the input storage unit and the weight storage unit, respectively, into the computing unit;
Step 4: the computing unit executes the arithmetic operations of the neural network computation, wherein the weight retrieval unit guarantees that the compressed data and weight data can be computed correctly;
Step 5: the neural network computation result is stored in the output storage unit.
The present invention also proposes a chip including the neural network processor based on weight compression.
As can be seen from the above scheme, the present invention has the following advantages:
Addressing the problems of poor operation speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights into the weight compression format off-chip, by means of offline compression, which reduces the storage occupied by weights in the neural network processor, increases the operation speed, and improves energy efficiency.
Brief description of the drawings
Fig. 1 is a structural block diagram of the neural network processor provided by the present invention;
Fig. 2 shows a weight compression storage format proposed by the present invention;
Fig. 3 is a schematic diagram of the weight compression unit in the single-computing-unit embodiment of the present invention;
Fig. 4 is a schematic diagram of the weight compression unit in the multi-computing-unit embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the computing unit of the present invention;
Fig. 6 is a flow chart of the neural network computation performed by the neural network processor proposed by the present invention.
Specific embodiment
When studying neural network processors, it was found that neural network weights have a certain sparsity: there are a large number of weights with the value 0, and after operations such as multiplication and addition these weights and their data have no numerical influence on the operation result. During storage, loading, and computation, these zero-valued weights occupy a large amount of on-chip resources and consume extra working time, making it difficult to meet the performance requirements of a neural network processor.
Analysis of the computation structure of existing neural network processors shows that the neural network weight values can be compressed to achieve the goals of accelerating the operation speed and reducing energy loss. The prior art provides the basic framework of a neural network accelerator; on that basis, the present invention proposes a weight compression storage format: after recoding, the weight data use the weight compression storage format during storage, transmission, and computation, and a weight index structure is added in the neural network computing unit to ensure that the compressed weights can operate correctly with the data elements.
To achieve the above objective, the present invention proposes a neural network processor based on weight compression, comprising:
At least one storage unit, for storing operation instructions and the data participating in computation;
At least one computing unit, for executing the neural network computation; and a control unit, connected with at least one storage unit controller and the at least one computing unit, for obtaining, via the at least one storage unit controller, the instructions stored in the at least one storage unit, and parsing the instructions to control the at least one computing unit;
At least one weight retrieval unit, wherein each weight retrieval unit is connected with the at least one computing unit, guaranteeing that the compressed weights operate correctly with their corresponding data.
In the neural network processor system according to the present invention, the weights are trained neural network weights.
In the neural network processor according to the present invention, when the neural network processor performs neural network computation, the trained neural network weights can be compressed off-chip into the weight compression format and stored in the storage unit.
The present invention compresses the neural network weights into the weight compression format by means of offline, off-chip compression, and transmits them through an input interface to the on-chip storage unit.
In order to make the objective, technical solution, design method, and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments in conjunction with the attached drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The present invention aims to provide a neural network processor based on weight compression, which introduces a weight retrieval unit into the neural network processing system and stores the neural network weights in the weight compression storage format, thereby reducing on-chip storage overhead, reducing the scale of the computing circuitry, and improving operation efficiency, so that the neural network processing system performs efficiently.
The neural network processing provided by the present invention is based on a storage-control-computation structure:
The storage structure is used to store the data participating in computation and the processor operation instructions;
The control structure includes decoding circuitry, which parses the operation instructions and generates control signals to control the scheduling and storage of on-chip data and the neural network computation process;
The computation structure includes arithmetic logic units, which participate in the neural network computation operations in the processor; the computation operations on compressed data are realized within the computation structure.
The present invention also proposes a chip comprising the neural network processor based on weight compression.
Fig. 1 shows a neural network processor 101 provided by the present invention. The processor consists of six parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computing unit 107.
The input data storage unit 102 is used to store the data participating in computation, which include original feature-map data and the data participating in intermediate-layer computation; the output data storage unit 104 stores the computed neuron responses; the weight storage unit 105 is used to store the trained neural network weights; the instruction storage unit 106 stores the instruction information participating in computation, and the instructions are parsed to realize the neural network computation.
The control unit 103 is connected with the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computing unit 107, respectively. The control unit 103 obtains the instructions stored in the instruction storage unit 106 and parses them, and can control the computing unit to perform the neural network computation according to the control signals obtained from parsing the instructions.
The computing unit 107 executes the corresponding neural network computation according to the control signals generated by the control unit 103. The computing unit 107 is associated with one or more storage units: it can obtain data to be computed from the data storage parts of the input data storage unit 102 associated with it, and can write data to the output data storage unit 104 associated with it. The computing unit 107 performs most of the operations in the neural network algorithm, i.e., vector multiply-add operations and the like. In addition, since the weights loaded into the computing unit 107 for computation are in the weight compression format, the computing unit 107 should also include a weight retrieval subunit, which guarantees that the compressed weights can be computed correctly with the data.
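A software analogue of the weight retrieval subunit may make this concrete. The sketch below assumes that retrieval amounts to offset-based indexing into the data; the patent describes a hardware subunit, and the function name here is hypothetical:

```python
def retrieve_and_accumulate(compressed_weights, data):
    """Multiply-accumulate over <weight, offset> pairs.

    The offset selects the data element corresponding to each retained
    non-zero weight, so zero-valued weights are never loaded,
    multiplied, or accumulated.
    """
    return sum(w * data[offset] for w, offset in compressed_weights)

data = [0.5, 1.0, 2.0, 4.0]
# Compressed form of [1.5, 2.5, 0, 0]; the two zero products are skipped.
print(retrieve_and_accumulate([(1.5, 0), (2.5, 1)], data))  # 0.75 + 2.5 = 3.25
```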
Fig. 2 shows the weight compression format proposed by the present invention. The original data are recoded by an offline, off-chip compression method, thereby realizing weight compression. The weight compression format consists of two parts, <weight, offset>: the weight is the original value of the neural network weight before compression, and the offset is the relative position of the current non-zero weight within its group of weight values. During compression, the acquired weight value sequence is recoded so that zero-valued elements are not retained and only non-zero elements are kept. This method guarantees that only non-zero weight values participate in the neural network computation; through weight compression, the number of weights in the data is effectively reduced, the amount of neural network computation is reduced, and the overall operation speed of the system is improved.
The weight compression process is described in detail in Fig. 3. The weights are grouped, and the number of elements in each group is determined by the scale of the computing unit. The weight compression process is described here taking groups of four weights as an example. In the first weight group, the elements with values 1.5 and 2.5 are the 0th and 1st elements respectively, so after recoding this group retains two non-zero elements, whose offsets indicating their positions are 0 and 1 respectively. The second group of original weight data contains three non-zero elements, which are the 0th, 3rd, and 4th elements of the group, so their offsets are 0, 3, and 4 respectively. The third weight group contains the two non-zero elements 3 and 4, whose offsets are 2 and 3 respectively.
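The grouped recoding can be sketched as follows (illustrative Python; the input reproduces the first and third groups of the example above, with the intervening positions assumed to be zero):

```python
def compress_grouped(weights, group_size):
    """Split a weight sequence into fixed-size groups (the group size
    is set by the computing-unit scale) and compress each group into
    <weight, offset> pairs independently."""
    return [
        [(w, off) for off, w in enumerate(weights[i:i + group_size]) if w != 0]
        for i in range(0, len(weights), group_size)
    ]

# Groups of four: 1.5 and 2.5 at offsets 0 and 1 (first group),
# then 3 and 4 at offsets 2 and 3 (third group of the example).
print(compress_grouped([1.5, 2.5, 0, 0, 0, 0, 3, 4], group_size=4))
# [[(1.5, 0), (2.5, 1)], [(3, 2), (4, 3)]]
```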
When computing-unit resources are sufficient, i.e., when multiple computing units are available at the same time, the weight values of multiple different queues can be loaded into different computing units simultaneously; elements at the same ordinal position in different queues work in parallel without interfering with one another. The grouping scheme is the same as for a single computing unit: the elements at the same relative position in each queue are divided into one group, and during computation the data of the different queues are loaded into the computing units in parallel.
For convenience of description, Fig. 4 illustrates the multi-computing-unit case with two computing units. Fig. 4 contains two weight queues; the weights of each queue are connected into the corresponding computing unit, and each computing unit works independently. According to the capacity of the computing units, the weights are divided into four groups, and within each group the weight values of each queue are compressed separately with the same intra-group element length.
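The per-queue independence can be sketched as follows (a toy Python sketch in which threads stand in for the hardware computing units; all names are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def sparse_dot(compressed_weights, data):
    """Offset-indexed multiply-accumulate, as in the single-unit case."""
    return sum(w * data[off] for w, off in compressed_weights)

def run_queues(compressed_queues, data):
    """Dispatch each compressed weight queue to its own computing unit.

    The queues share the input data, are grouped identically, and are
    processed in parallel without interfering with one another.
    """
    with ThreadPoolExecutor(max_workers=len(compressed_queues)) as units:
        return list(units.map(lambda queue: sparse_dot(queue, data),
                              compressed_queues))

data = [0.5, 1.0, 2.0, 4.0]
queues = [[(1.5, 0), (2.5, 1)], [(3, 2), (4, 3)]]  # two compressed weight queues
print(run_queues(queues, data))  # [3.25, 22.0]
```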
Storing weights in the weight compression format has two advantages. First, only the non-zero elements of the weights are stored, which can significantly reduce memory occupancy. Second, only non-zero elements are loaded into the computing unit, which increases the computation speed and improves the utilization of the computing unit.
Fig. 5 is a structural schematic diagram of the computing unit; it describes the correspondence between weights and data when performing convolutional neural network computation. In the computation process, the data are shared among the computing units while different weight values are connected into each computing unit, and the computing units work in parallel.
Fig. 6 is a flow chart of a neural network computation process of the present invention. The method comprises the following steps (sketched in code after the list):
Step S1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step S2: the input data are obtained from the storage unit according to the storage address obtained by parsing the instruction;
Step S3: the data and weights are loaded from the input storage unit and the weight storage unit, respectively, into the computing unit;
Step S4: the computing unit executes the arithmetic operations of the neural network computation, wherein the data retrieval structure guarantees that the compressed data can be computed correctly with the weight data;
Step S5: the neural network computation result is stored in the output storage unit.
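Purely as an illustration of how steps S1 to S5 fit together, here is a toy software walk-through (the memory layout and instruction fields are hypothetical; the patent describes a hardware flow, not software):

```python
def run_layer(instructions, input_mem, weight_mem, output_mem):
    """Toy software analogue of the S1-S5 computation flow."""
    for instr in instructions:                              # S1: fetch and parse the next instruction
        data = input_mem[instr["data_addr"]]                # S2: read input data at the parsed address
        cweights = weight_mem[instr["weight_addr"]]         # S3: load the compressed weights
        result = sum(w * data[off] for w, off in cweights)  # S4: compute; offsets pair weights with data
        output_mem[instr["out_addr"]] = result              # S5: store the result

input_mem = {0: [0.5, 1.0, 2.0, 4.0]}
weight_mem = {0: [(1.5, 0), (2.5, 1)]}   # compressed <weight, offset> pairs
output_mem = {}
run_layer([{"data_addr": 0, "weight_addr": 0, "out_addr": 0}],
          input_mem, weight_mem, output_mem)
print(output_mem)  # {0: 3.25}
```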
In conclusion, addressing the problems of poor operation speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights into the weight compression format off-chip by means of offline compression, which reduces the storage occupied by weights in the neural network processor, increases the operation speed, and improves energy efficiency.
It should be appreciated that although this specification is described in terms of various embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted merely for clarity. Those skilled in the art should take the specification as a whole; the technical solutions in the various embodiments may also be suitably combined to form other embodiments that those skilled in the art can understand.
The foregoing are merely schematic specific embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. A neural network processor based on weight compression, characterized by comprising:
At least one storage unit, for storing operation instructions and the data participating in computation;
At least one storage unit controller, for controlling the storage unit;
At least one computing unit, for executing the computing operations of the neural network;
A control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit;
At least one weight compression unit, for compressing weights, wherein each weight compression unit is connected with the computing unit to guarantee that the compressed weights operate correctly with their corresponding data;
wherein the data participating in computation are recoded by an offline, off-chip compression method, and weight compression is realized through a weight compression format;
in the weight compression process, the acquired weight value sequence is recoded so that zero-valued elements are not retained and only non-zero elements are kept.
2. The neural network processor based on weight compression according to claim 1, characterized in that the storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
3. The neural network processor based on weight compression according to claim 2, characterized in that the input data storage unit is used to store the data participating in computation, which include original feature-map data and the data participating in intermediate-layer computation; the output data storage unit stores the computed neuron responses; the weight storage unit is used to store the trained neural network weights; and the instruction storage unit is used to store the instruction information participating in computation.
4. The neural network processor based on weight compression according to claim 1, characterized in that the weight compression format includes <weight, offset>.
5. The neural network processor based on weight compression according to claim 2, characterized in that the computing unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
6. A design method for the neural network processor based on weight compression according to any one of claims 1-5, characterized by comprising:
Step 1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step 2: the storage address is obtained according to the parsed instruction, and the data and weights participating in computation are obtained from the storage unit;
Step 3: the data and weights participating in computation are loaded from the input storage unit and the weight storage unit, respectively, into the computing unit;
Step 4: the computing unit executes the arithmetic operations of the neural network computation, wherein the weight compression unit guarantees that the compressed data can be computed correctly with the weight data;
Step 5: the neural network computation result is stored in the output storage unit;
wherein the data participating in computation are recoded by an offline, off-chip compression method, and weight compression is realized through a weight compression format;
in the weight compression process, the acquired weight value sequence is recoded so that zero-valued elements are not retained and only non-zero elements are kept.
7. A chip comprising the neural network processor based on weight compression according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610958305.2A CN106529670B (en) | 2016-10-27 | 2016-10-27 | Neural network processor based on weight compression, design method, and chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610958305.2A CN106529670B (en) | 2016-10-27 | 2016-10-27 | Neural network processor based on weight compression, design method, and chip |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106529670A CN106529670A (en) | 2017-03-22 |
CN106529670B true CN106529670B (en) | 2019-01-25 |
Family
ID=58325737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610958305.2A Active CN106529670B (en) | 2016-10-27 | 2016-10-27 | Neural network processor based on weight compression, design method, and chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529670B (en) |
Families Citing this family (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107103113B (en) * | 2017-03-23 | 2019-01-11 | 中国科学院计算技术研究所 | The Automation Design method, apparatus and optimization method towards neural network processor |
CN107092961B (en) * | 2017-03-23 | 2018-08-28 | 中国科学院计算技术研究所 | A kind of neural network processor and design method based on mode frequency statistical coding |
CN107016175B (en) * | 2017-03-23 | 2018-08-31 | 中国科学院计算技术研究所 | It is applicable in the Automation Design method, apparatus and optimization method of neural network processor |
CN107103358A (en) * | 2017-03-24 | 2017-08-29 | 中国科学院计算技术研究所 | Processing with Neural Network method and system based on spin transfer torque magnetic memory |
CN107086910B (en) * | 2017-03-24 | 2018-08-10 | 中国科学院计算技术研究所 | A kind of weight encryption and decryption method and system for Processing with Neural Network |
CN107423816B (en) * | 2017-03-24 | 2021-10-12 | 中国科学院计算技术研究所 | Multi-calculation-precision neural network processing method and system |
US11544545B2 (en) | 2017-04-04 | 2023-01-03 | Hailo Technologies Ltd. | Structured activation based sparsity in an artificial neural network |
US11551028B2 (en) | 2017-04-04 | 2023-01-10 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network |
US11238334B2 (en) | 2017-04-04 | 2022-02-01 | Hailo Technologies Ltd. | System and method of input alignment for efficient vector operations in an artificial neural network |
US10387298B2 (en) | 2017-04-04 | 2019-08-20 | Hailo Technologies Ltd | Artificial neural network incorporating emphasis and focus techniques |
US11615297B2 (en) | 2017-04-04 | 2023-03-28 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network compiler |
US10795836B2 (en) * | 2017-04-17 | 2020-10-06 | Microsoft Technology Licensing, Llc | Data processing performance enhancement for neural networks using a virtualized data iterator |
CN107679621B (en) * | 2017-04-19 | 2020-12-08 | 赛灵思公司 | Artificial neural network processing device |
CN108734288B (en) * | 2017-04-21 | 2021-01-29 | 上海寒武纪信息科技有限公司 | Operation method and device |
CN109146069B (en) * | 2017-06-16 | 2020-10-13 | 上海寒武纪信息科技有限公司 | Arithmetic device, arithmetic method, and chip |
SG11201911600VA (en) * | 2017-07-07 | 2020-01-30 | Mitsubishi Electric Corp | Data processing device, data processing method, and compressed data |
CN111656360B (en) * | 2017-07-21 | 2024-02-20 | 森田公司 | System and method for sparsity utilization |
CN107622305A (en) * | 2017-08-24 | 2018-01-23 | 中国科学院计算技术研究所 | Processor and processing method for neutral net |
CN107578098B (en) * | 2017-09-01 | 2020-10-30 | 中国科学院计算技术研究所 | Neural network processor based on systolic array |
CN107491811A (en) * | 2017-09-01 | 2017-12-19 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
US20190087713A1 (en) * | 2017-09-21 | 2019-03-21 | Qualcomm Incorporated | Compression of sparse deep convolutional network weights |
JP6957365B2 (en) * | 2017-09-22 | 2021-11-02 | 株式会社東芝 | Arithmetic logic unit |
CN108205704B (en) * | 2017-09-27 | 2021-10-29 | 深圳市商汤科技有限公司 | Neural network chip |
CN107800700B (en) * | 2017-10-27 | 2020-10-27 | 中国科学院计算技术研究所 | Router and network-on-chip transmission system and method |
CN109726805B (en) * | 2017-10-30 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Method for designing neural network processor by using black box simulator |
CN107729995A (en) * | 2017-10-31 | 2018-02-23 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
CN107844829A (en) * | 2017-10-31 | 2018-03-27 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
CN107977704B (en) * | 2017-11-10 | 2020-07-31 | 中国科学院计算技术研究所 | Weight data storage method and neural network processor based on same |
CN107918794A (en) * | 2017-11-15 | 2018-04-17 | 中国科学院计算技术研究所 | Neural network processor based on computing array |
CN107944555B (en) * | 2017-12-07 | 2021-09-17 | 广州方硅信息技术有限公司 | Neural network compression and acceleration method, storage device and terminal |
CN109791628B (en) * | 2017-12-29 | 2022-12-27 | 清华大学 | Neural network model block compression method, training method, computing device and system |
CN111582464B (en) * | 2017-12-29 | 2023-09-29 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system and storage medium |
CN110045960B (en) * | 2018-01-16 | 2022-02-18 | 腾讯科技(深圳)有限公司 | Chip-based instruction set processing method and device and storage medium |
US11436483B2 (en) * | 2018-01-17 | 2022-09-06 | Mediatek Inc. | Neural network engine with tile-based execution |
CN108334945B (en) * | 2018-01-30 | 2020-12-25 | 中国科学院自动化研究所 | Acceleration and compression method and device of deep neural network |
CN108416425B (en) * | 2018-02-02 | 2020-09-29 | 浙江大华技术股份有限公司 | Convolution operation method and device |
CN110197272B (en) * | 2018-02-27 | 2020-08-25 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN108510058B (en) * | 2018-02-28 | 2021-07-20 | 中国科学院计算技术研究所 | Weight storage method in neural network and processor based on method |
CN108171328B (en) * | 2018-03-02 | 2020-12-29 | 中国科学院计算技术研究所 | Neural network processor and convolution operation method executed by same |
CN108647774B (en) * | 2018-04-23 | 2020-11-20 | 瑞芯微电子股份有限公司 | Neural network method and circuit for optimizing sparsity matrix operation |
CN108764454B (en) * | 2018-04-28 | 2022-02-25 | 中国科学院计算技术研究所 | Neural network processing method based on wavelet transform compression and/or decompression |
US11687759B2 (en) * | 2018-05-01 | 2023-06-27 | Semiconductor Components Industries, Llc | Neural network accelerator |
CN109325590B (en) * | 2018-09-14 | 2020-11-03 | 中国科学院计算技术研究所 | Device for realizing neural network processor with variable calculation precision |
CN109543830B (en) * | 2018-09-20 | 2023-02-03 | 中国科学院计算技术研究所 | Splitting accumulator for convolutional neural network accelerator |
CN109492761A (en) * | 2018-10-30 | 2019-03-19 | 深圳灵图慧视科技有限公司 | Realize FPGA accelerator, the method and system of neural network |
US11588499B2 (en) | 2018-11-05 | 2023-02-21 | Samsung Electronics Co., Ltd. | Lossless compression of neural network weights |
CN109886416A (en) * | 2019-02-01 | 2019-06-14 | 京微齐力(北京)科技有限公司 | The System on Chip/SoC and machine learning method of integrated AI's module |
CN110334716B (en) * | 2019-07-04 | 2022-01-11 | 北京迈格威科技有限公司 | Feature map processing method, image processing method and device |
CN112119593A (en) * | 2019-07-25 | 2020-12-22 | 深圳市大疆创新科技有限公司 | Data processing method and system, encoder and decoder |
US11635893B2 (en) * | 2019-08-12 | 2023-04-25 | Micron Technology, Inc. | Communications between processors and storage devices in automotive predictive maintenance implemented via artificial neural networks |
CN111105018B (en) * | 2019-10-21 | 2023-10-13 | 深圳云天励飞技术有限公司 | Data processing method and device |
CN112835510B (en) * | 2019-11-25 | 2022-08-26 | 北京灵汐科技有限公司 | Method and device for controlling storage format of on-chip storage resource |
CN113011577B (en) * | 2019-12-20 | 2024-01-05 | 阿里巴巴集团控股有限公司 | Processing unit, processor core, neural network training machine and method |
US11263077B1 (en) | 2020-09-29 | 2022-03-01 | Hailo Technologies Ltd. | Neural network intermediate results safety mechanism in an artificial neural network processor |
US11811421B2 (en) | 2020-09-29 | 2023-11-07 | Hailo Technologies Ltd. | Weights safety mechanism in an artificial neural network processor |
US11874900B2 (en) | 2020-09-29 | 2024-01-16 | Hailo Technologies Ltd. | Cluster interlayer safety mechanism in an artificial neural network processor |
US11237894B1 (en) | 2020-09-29 | 2022-02-01 | Hailo Technologies Ltd. | Layer control unit instruction addressing safety mechanism in an artificial neural network processor |
US11221929B1 (en) | 2020-09-29 | 2022-01-11 | Hailo Technologies Ltd. | Data stream fault detection mechanism in an artificial neural network processor |
TWI769807B (en) * | 2021-05-04 | 2022-07-01 | 國立清華大學 | Hardware/software co-compressed computing method and system for sram computing-in-memory-based processing unit |
CN113688983A (en) * | 2021-08-09 | 2021-11-23 | 上海新氦类脑智能科技有限公司 | Convolution operation implementation method, circuit and terminal for reducing weight storage in impulse neural network |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009176110A (en) * | 2008-01-25 | 2009-08-06 | Seiko Epson Corp | Parallel processing device and parallel processing method |
CN105260776A (en) * | 2015-09-10 | 2016-01-20 | 华为技术有限公司 | Neural network processor and convolutional neural network processor |
CN105184366A (en) * | 2015-09-15 | 2015-12-23 | 中国科学院计算技术研究所 | Time-division-multiplexing general neural network processor |
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
Also Published As
Publication number | Publication date |
---|---|
CN106529670A (en) | 2017-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106529670B (en) | Neural network processor based on weight compression, design method, and chip | |
CN106447034B (en) | Neural network processor based on data compression, design method, and chip | |
CN106650924B (en) | Processor based on time-dimension and space-dimension data stream compression, and design method | |
CN106355244B (en) | The construction method and system of convolutional neural networks | |
CN105681628B (en) | A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing | |
CN110516801B (en) | High-throughput-rate dynamic reconfigurable convolutional neural network accelerator | |
CN107491811A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN107169563B (en) | Processing system and method applied to two-value weight convolutional network | |
CN106951926A (en) | The deep learning systems approach and device of a kind of mixed architecture | |
CN107609641A (en) | Sparse neural network framework and its implementation | |
CN108280514A (en) | Sparse neural network acceleration system based on FPGA and design method | |
CN109376843A (en) | EEG signals rapid classification method, implementation method and device based on FPGA | |
CN110390383A (en) | A kind of deep neural network hardware accelerator based on power exponent quantization | |
CN109325591A (en) | Neural network processor towards Winograd convolution | |
CN109472350A (en) | A kind of neural network acceleration system based on block circulation sparse matrix | |
CN107729995A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN110390385A (en) | A kind of general convolutional neural networks accelerator of configurable parallel based on BNRP | |
CN101625735A (en) | FPGA implementation method based on LS-SVM classification and recurrence learning recurrence neural network | |
CN109190756A (en) | Arithmetic unit based on Winograd convolution and the neural network processor comprising the device | |
Liu et al. | Hardware acceleration of fully quantized bert for efficient natural language processing | |
CN109447241A (en) | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field | |
CN107085562A (en) | A kind of neural network processor and design method based on efficient multiplexing data flow | |
CN107844829A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN112465110A (en) | Hardware accelerator for convolution neural network calculation optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | |
Effective date of registration: 2023-01-09
Address after: 518063 14th Floor, West Tower, Baidu International Building, No. 8, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong
Patentee after: Zhongke Times (Shenzhen) Computer System Co., Ltd.
Address before: 100080 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District
Patentee before: Institute of Computing Technology, Chinese Academy of Sciences