CN106529670B - Neural network processor based on weight compression, design method, and chip - Google Patents

Neural network processor based on weight compression, design method, and chip

Info

Publication number
CN106529670B
Authority
CN
China
Prior art keywords
weight
storage unit
data
neural network
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610958305.2A
Other languages
Chinese (zh)
Other versions
CN106529670A
Inventor
韩银和
许浩博
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Times Shenzhen Computer System Co ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201610958305.2A
Publication of CN106529670A (application)
Application granted
Publication of CN106529670B (grant)
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The present invention proposes a neural network processor based on weight compression, a design method, and a chip. The processor includes at least one storage unit, for storing operation instructions and the data participating in computation; at least one storage unit controller, for controlling the storage unit; at least one computing unit, for performing the computing operations of the neural network; a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit and parsing those instructions to control the computing unit; and at least one weight retrieval unit, each connected with the computing unit, for indexing the weights so that the compressed weights are correctly computed with their corresponding data. The present invention reduces the resources occupied by weights in a neural network processor, increases computing speed, and improves energy efficiency.

Description

Neural network processor based on weight compression, design method, and chip
Technical field
The present invention relates to the field of hardware acceleration for neural network model computation, and in particular to a neural network processor based on weight compression, a design method, and a chip.
Background art
Deep learning is an important branch of machine learning and has achieved major breakthroughs in recent years. Since their introduction, neural network models trained with deep learning algorithms have achieved remarkable results in application fields such as image recognition, speech processing, and intelligent robotics.
Deep neural networks model the neural connection structure of the human brain: when processing signals such as images, sound, and text, they describe data features through multiple layered transformation stages. As the complexity of neural networks keeps rising, neural network techniques suffer in practical applications from problems such as high resource occupation, slow computing speed, and high energy consumption, so the technique hits severe efficiency and speed bottlenecks when applied to embedded devices or low-overhead data centers. Replacing traditional software computation with hardware acceleration has become an effective way to improve the efficiency of neural network computation. Mainstream hardware acceleration approaches include graphics processing units, application-specific processor chips, and field-programmable gate arrays (FPGAs).
In existing neural network techniques, a neural network model is trained over the training set for multiple rounds, yielding the neural network weight values sample by sample. Neural network weights exhibit a certain sparsity: a large number of weights are zero, and after operations such as multiplication and addition these weights and their data have no numerical influence on the result. These zero-valued weights are tied to the inherent characteristics of deep neural networks; they arise from repeated training and are not easily eliminated at the algorithm level. During storage, loading, and computation, zero-valued weights occupy large amounts of on-chip resources and consume extra working time, making it hard to meet the performance requirements of a neural network processor.
Consequently, both academia and industry have studied the zero-valued elements of such neural networks extensively. The paper "Albericio J, Judd P, Hetherington T, et al. Cnvlutin: ineffectual-neuron-free deep neural network computing [C] // Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016: 1-13." provides large-scale on-chip storage to realize massively parallel computation and, on that basis, compresses the data elements; however, relying on large-scale on-chip storage to satisfy the needs of parallel computation makes it unsuitable for embedded devices. The paper "Chen Y H, Emer J, Sze V. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks [J]. 2016." achieves data reuse by sharing data and weights and uses power gating to shut down the computation of zero elements, which effectively improves energy efficiency; however, this method only reduces computing power consumption and cannot skip the zero data to accelerate the computation itself.
The invention "A neural network accelerator and its operation method" applies to the field of neural network algorithms and provides a neural network accelerator and its operation method. That accelerator includes an on-chip storage medium, an on-chip address index module, a core computing module, and a multi-ALU device. The on-chip storage medium stores data transmitted from outside or data generated during computation; the on-chip data index module maps to the correct storage address according to an input index when executing an operation; the core computing module performs the neural network operations; and the multi-ALU device obtains input data from the core computing module or the on-chip storage medium and performs the nonlinear operations the core computing module cannot. That invention introduces a multi-ALU design into the neural network accelerator to speed up nonlinear operations and make the accelerator more efficient. The main difference between the present invention and that one is that the present invention introduces a weight-compressed storage structure into the neural network accelerator, improving neural network computing speed while reducing energy loss.
The invention "Operation apparatus and method of an acceleration chip for accelerating deep neural network algorithms" provides an operation apparatus and method for a chip that accelerates deep neural network algorithms. The apparatus includes: a vector addition processor module, which performs vector addition or subtraction and/or the vectorized operations of the pooling-layer algorithm of the deep neural network; a vector function value arithmetic module, which performs the vectorized nonlinear evaluations in the deep neural network algorithm; and a vector multiply-adder module, which performs vector multiply-add operations. The three modules execute programmable instructions and interact with each other to compute the neuron values and network output of the neural network, as well as the synaptic weight variations representing the action intensity of input-layer neurons on output-layer neurons. Intermediate-value storage regions are provided in the three modules, and main memory is read and written. This reduces the number of intermediate-value reads and writes to main memory, lowers the energy consumption of the accelerator chip, and avoids data loss and replacement problems during data processing. The main difference between the present invention and that one is, again, that the present invention introduces a weight-compressed storage structure into the neural network accelerator, improving neural network computing speed and reducing energy loss.
Summary of the invention
In view of the above drawbacks of existing neural network processors, the present invention proposes a neural network processor based on weight compression, a design method, and a chip. The system introduces a weight index structure into the existing neural network processor architecture, thereby improving the computing speed and energy consumption of neural network acceleration.
The present invention proposes a neural network processor based on weight compression, comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one storage unit controller, for controlling the storage unit;
at least one computing unit, for performing the computing operations of the neural network;
a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit;
at least one weight retrieval unit, for indexing the weights, wherein each weight retrieval unit is connected with the computing unit, guaranteeing the correct computation of the compressed weights with their corresponding data.
The storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
The input data storage unit stores the data participating in computation, which includes the original feature map data and the data participating in intermediate-layer computations; the output data storage unit stores the computed neuron responses; the weight storage unit stores the trained neural network weights; the instruction storage unit stores the instruction information involved in the computation.
The data participating in computation are re-encoded through an off-chip offline compression method, and weight compression is realized through a weight-compressed format.
The weight-compressed format is <weight, offset>.
The weight in the weight-compressed format is the original value of the neural network weight before compression; the offset is the relative position of the current non-zero weight within its group of weight values.
During weight compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept.
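As a minimal illustrative sketch only (not the patented circuit; the function name and group size are assumptions introduced here), the re-encoding can be pictured as follows in Python:

    def compress_weights(weights, group_size=4):
        """Re-encode a flat weight sequence into per-group <weight, offset> pairs.

        Zero-valued weights are dropped; each surviving weight keeps its
        relative position (offset) inside its group. The group size is an
        illustrative assumption; in the invention it is determined by the
        scale of the computing unit.
        """
        groups = []
        for start in range(0, len(weights), group_size):
            group = weights[start:start + group_size]
            groups.append([(w, off) for off, w in enumerate(group) if w != 0])
        return groups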
The computing unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
The present invention also proposes a design method for the above neural network processor based on weight compression, comprising:
Step 1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step 2: a storage address is obtained according to the parsed instruction, and the data participating in computation and the weights are obtained from the storage unit;
Step 3: the data participating in computation and the weights are loaded into the computing unit from the input storage unit and the weight storage unit, respectively;
Step 4: the computing unit performs the arithmetic operations of the neural network computation, wherein the weight retrieval unit guarantees that the compressed data can be correctly computed with the weight data;
Step 5: the neural network computation result is stored into the output storage unit.
The present invention also proposes a chip including the neural network processor based on weight compression.
As can be seen from the above scheme, the present invention has the following advantages:
Addressing the problems of poor computing speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights off-chip into the weight-compressed format by means of offline compression, which reduces the resources occupied by weights in the neural network processor, increases computing speed, and improves energy efficiency.
Detailed description of the invention
Fig. 1 is a structural block diagram of the neural network processor provided by the present invention;
Fig. 2 shows the weight-compressed storage format proposed by the present invention;
Fig. 3 is a schematic diagram of weight compression in the single-computing-unit embodiment of the present invention;
Fig. 4 is a schematic diagram of weight compression in the multi-computing-unit embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the computing unit of the present invention;
Fig. 6 is a flowchart of the neural network computation performed by the proposed neural network processor.
Specific embodiments
While studying neural network processors, we found that neural network weights exhibit a certain sparsity: a large number of weights are zero, and after operations such as multiplication and addition these weights and their data have no numerical influence on the result. During storage, loading, and computation, these zero-valued weights occupy large amounts of on-chip resources and consume extra working time, making it hard to meet the performance requirements of a neural network processor.
Analysis of the computing structure of existing neural network processors shows that the neural network weight values can be compressed to accelerate computation and reduce energy loss. The prior art provides the basic architecture of a neural network accelerator; on that basis, the present invention proposes a weight-compressed storage format. After re-encoding, the weight data use the weight-compressed storage format during storage, transmission, and computation, and a weight index structure is added to the neural network computing unit to ensure that the compressed weights can be correctly computed with the data elements.
To achieve the above objective, the present invention proposes a neural network processor based on weight compression, comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one computing unit, for performing the neural network computation; and a control unit, connected with at least one storage unit controller and the at least one computing unit, for obtaining, via the at least one storage unit controller, the instructions stored in the at least one storage unit, and parsing the instructions to control the at least one computing unit;
at least one weight retrieval unit, wherein each weight retrieval unit is connected with the at least one computing unit, guaranteeing the correct computation of the compressed weights with their corresponding data.
In the neural network processor system according to the present invention, the weights are trained neural network weights.
In the neural network processor according to the present invention, when the processor performs the neural network computation, the trained neural network weights can be compressed off-chip into the weight-compressed format and stored in the storage unit.
The present invention compresses the neural network weights into the weight-compressed format by off-chip offline compression and transmits them through the input interface to the on-chip storage unit.
To make the objectives, technical solutions, design method, and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only meant to explain the present invention, not to limit it.
The present invention aims to provide a neural network processor based on weight compression, which introduces a weight retrieval unit into the neural network processing system and stores the neural network weights in the weight-compressed storage format, thereby reducing on-chip storage overhead, reducing the scale of the computing circuitry, and improving operational efficiency, so that the neural network processing system performs efficiently.
The neural network processing provided by the present invention is based on a storage-control-computation structure:
the storage structure stores the data participating in computation and the processor's operation instructions;
the control structure includes decoding circuitry, which parses the operation instructions and generates control signals to control the scheduling and storage of on-chip data and the neural network computation process;
the computation structure includes arithmetic logic units, which carry out the neural network computation within the processor; the computation on compressed data is realized inside the computation structure.
The present invention also proposes a chip comprising the neural network processor based on weight compression.
Fig. 1 shows a neural network processor 101 provided by the present invention. The processor consists of six parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computing unit 107.
The input data storage unit 102 stores the data participating in computation, including the original feature map data and the data participating in intermediate-layer computations; the output data storage unit 104 stores the computed neuron responses; the weight storage unit 105 stores the trained neural network weights; the instruction storage unit 106 stores the instruction information involved in the computation, and these instructions are parsed to realize the neural network computation.
The control unit 103 is connected with the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computing unit 107, respectively. The control unit 103 obtains the instructions stored in the instruction storage unit 106 and parses them, and then, according to the control signals obtained from parsing, controls the computing unit to perform the neural network computation.
The computing unit 107 performs the corresponding neural network computation according to the control signals generated by the control unit 103. The computing unit 107 is associated with one or more storage units: it can obtain the data to be computed from the input data storage unit 102 associated with it, and can write data to the output data storage unit 104 associated with it. The computing unit 107 performs most of the operations in the neural network algorithm, i.e. vector multiply-add operations and the like. In addition, because the weights loaded into the computing unit 107 are in the weight-compressed format, the computing unit 107 also includes a weight retrieval subunit, which guarantees that the compressed weights can be correctly computed with the data.
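The behavior that the weight retrieval subunit must guarantee can be sketched, purely for illustration and under the same assumptions as the earlier compress_weights sketch, as a multiply-accumulate that uses each stored offset to select the matching data element:

    def sparse_group_mac(compressed_group, data_group):
        """Multiply-accumulate one compressed weight group with its data group.

        Each (weight, offset) pair selects the data element at the same
        relative position, so zero-valued weights are simply never visited.
        """
        acc = 0.0
        for weight, offset in compressed_group:
            acc += weight * data_group[offset]
        return acc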
Fig. 2 shows the weight-compressed format proposed by the present invention: the original data are re-encoded by an off-chip offline compression method to realize the weight compression. The weight-compressed format consists of two parts, <weight, offset>, where the weight is the original value of the neural network weight before compression and the offset is the relative position of the current non-zero weight within its group of weight values. During compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept. This method guarantees that only non-zero weight values participate in the neural network computation; through weight compression, the number of weights in the data is effectively reduced, the amount of neural network computation decreases, and the overall computing speed of the system improves.
Fig. 3 describes the weight compression process in detail. The weights are grouped, and the number of elements per group is determined by the scale of the computing unit. Taking groups of four elements as an example: in the first weight group, the elements with values 1.5 and 2.5 are the 0th and 1st elements, so after re-encoding this group retains two non-zero elements whose offsets, indicating the element positions, are 0 and 1; the second group of original weight data contains three non-zero elements, namely the 0th, 3rd, and 4th elements of that group, so the offsets are 0, 3, and 4; the third weight group contains the two non-zero elements 3 and 4, with offsets 2 and 3, respectively.
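A hypothetical run of the earlier compress_weights sketch under such a four-element grouping (the input values below are illustrative, not the exact figure data) reproduces this behavior:

    weights = [1.5, 2.5, 0.0, 0.0,   # group 1 -> [(1.5, 0), (2.5, 1)]
               0.0, 0.0, 3.0, 4.0]   # group 2 -> [(3.0, 2), (4.0, 3)]
    print(compress_weights(weights, group_size=4))
    # [[(1.5, 0), (2.5, 1)], [(3.0, 2), (4.0, 3)]]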
When computing resources are sufficient, i.e. when multiple computing units are available at once, the weight values of multiple different queues can be loaded into different computing units simultaneously; elements at the same position in different queues work in parallel without interfering with one another. The grouping is the same as for a single computing unit: the elements at the same relative position within each queue are divided into one group, and during computation the data of the different queues are loaded into the computing units in parallel.
For ease of description, Fig. 4 illustrates the multi-computing-unit case with two computing units. Fig. 4 contains two weight queues, each connected to its corresponding computing unit, and each computing unit works independently. According to the capacity of the computing units, the weights are divided into four groups; within each group, the weight values of each queue are compressed separately, using the same intra-group element length. A sketch of this arrangement follows.
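Continuing the same illustrative sketch (all names hypothetical, reusing the helpers above), each queue is compressed independently with identical grouping and dispatched to its own computing unit:

    # Each queue is compressed separately with the same grouping.
    queue_a = compress_weights([1.5, 0.0, 0.0, 2.0], group_size=4)
    queue_b = compress_weights([0.0, 3.0, 4.0, 0.0], group_size=4)
    data_a = [0.5, 0.5, 0.5, 0.5]
    data_b = [1.0, 1.0, 1.0, 1.0]

    # The two computing units work independently on their own queues;
    # here the parallel units are simulated one after the other.
    result_a = sparse_group_mac(queue_a[0], data_a)  # unit 0: 1.5*0.5 + 2.0*0.5
    result_b = sparse_group_mac(queue_b[0], data_b)  # unit 1: 3.0*1.0 + 4.0*1.0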
Storing the weights in the weight-compressed format has two advantages: first, only the non-zero elements of the weights are stored, which significantly reduces memory occupation; second, only non-zero elements are loaded into the computing units, which improves computing speed and raises computing-unit utilization.
Fig. 5 is a structural schematic diagram of the computing unit, describing the correspondence between weights and data during a convolutional neural network computation. In this process, the data are shared across all computing units while different weight values are routed into each computing unit, and all computing units work concurrently.
Fig. 6 is a flowchart of a neural network computation process of the present invention. The method comprises:
Step S1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step S2: the input data are obtained from the storage unit according to the storage address obtained by parsing the instruction;
Step S3: the data and weights are loaded into the computing unit from the input storage unit and the weight storage unit, respectively;
Step S4: the computing unit performs the arithmetic operations of the neural network computation, wherein the data retrieval structure guarantees that the compressed data can be correctly computed with the weight data;
Step S5: the neural network computation result is stored into the output storage unit.
In conclusion, addressing the problems of poor computing speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights off-chip into the weight-compressed format by offline compression, reducing the resources occupied by weights in the neural network processor, increasing computing speed, and improving energy efficiency.
It should be understood that, although this specification is described in terms of various embodiments, not every embodiment contains only one independent technical solution. This manner of description is merely for clarity; those skilled in the art should take the specification as a whole, and the technical solutions in the various embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.
The above are merely schematic specific embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A neural network processor based on weight compression, characterized by comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one storage unit controller, for controlling the storage unit;
at least one computing unit, for performing the computing operations of the neural network;
a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit;
at least one weight compression unit, for compressing the weights, wherein each weight compression unit is connected with the computing unit, guaranteeing the correct computation of the compressed weights with their corresponding data;
wherein the data participating in computation are re-encoded through an off-chip offline compression method, and weight compression is realized through a weight-compressed format;
and wherein, during weight compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept.
2. The neural network processor based on weight compression of claim 1, characterized in that the storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
3. The neural network processor based on weight compression of claim 2, characterized in that the input data storage unit stores the data participating in computation, which includes the original feature map data and the data participating in intermediate-layer computations; the output data storage unit stores the computed neuron responses; the weight storage unit stores the trained neural network weights; and the instruction storage unit stores the instruction information involved in the computation.
4. The neural network processor based on weight compression of claim 1, characterized in that the weight-compressed format is <weight, offset>.
5. The neural network processor based on weight compression of claim 2, characterized in that the computing unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
6. A design method for the neural network processor based on weight compression of any one of claims 1-5, characterized by comprising:
step 1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
step 2: a storage address is obtained according to the parsed instruction, and the data participating in computation and the weights are obtained from the storage unit;
step 3: the data participating in computation and the weights are loaded into the computing unit from the input storage unit and the weight storage unit, respectively;
step 4: the computing unit performs the arithmetic operations of the neural network computation, wherein the weight compression unit guarantees that the compressed data can be correctly computed with the weight data;
step 5: the neural network computation result is stored into the output storage unit;
wherein the data participating in computation are re-encoded through an off-chip offline compression method, and weight compression is realized through a weight-compressed format;
and wherein, during weight compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept.
7. A chip comprising the neural network processor based on weight compression of any one of claims 1-5.
CN201610958305.2A 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip Active CN106529670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610958305.2A CN106529670B (en) 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610958305.2A CN106529670B (en) 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip

Publications (2)

Publication Number Publication Date
CN106529670A CN106529670A (en) 2017-03-22
CN106529670B true CN106529670B (en) 2019-01-25

Family

ID=58325737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610958305.2A Active CN106529670B (en) 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip

Country Status (1)

Country Link
CN (1) CN106529670B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009176110A (en) * 2008-01-25 2009-08-06 Seiko Epson Corp Parallel processing device and parallel processing method
CN105260776A (en) * 2015-09-10 2016-01-20 华为技术有限公司 Neural network processor and convolutional neural network processor
CN105184366A (en) * 2015-09-15 2015-12-23 中国科学院计算技术研究所 Time-division-multiplexing general neural network processor
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection

Also Published As

Publication number Publication date
CN106529670A (en) 2017-03-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230109

Address after: 518063 14th Floor, West Tower, Baidu International Building, No. 8, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong

Patentee after: Zhongke Times (Shenzhen) Computer System Co.,Ltd.

Address before: 100080 No. 6 South Road of the Academy of Sciences, Zhongguancun, Haidian District, Beijing

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences