CN106529670A - Neural network processor based on weight compression, design method, and chip - Google Patents
- Publication number
- CN106529670A (application CN201610958305.2A)
- Authority
- CN
- China
- Prior art keywords
- weight
- data
- memory element
- neural network
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The invention proposes a neural network processor based on weight compression, a design method, and a chip. The processor comprises: at least one storage unit, for storing operation instructions and the data participating in computation; at least one storage unit controller, for controlling the storage unit; at least one computation unit, for executing the computation operations of a neural network; a control unit, connected with the storage unit controller and the computation unit, for obtaining the instructions stored in the storage unit via the storage unit controller and parsing them to control the computation unit; and at least one weight retrieval unit, for retrieving weights, wherein each weight retrieval unit is connected with a computation unit to ensure that the compressed weights operate correctly on the corresponding data. The invention reduces the on-chip resources occupied by weights in a neural network processor, increases operation speed, and improves energy efficiency.
Description
Technical field
The present invention relates to the field of hardware acceleration of neural network model computation, and more particularly to a neural network processor based on weight compression, a design method, and a chip.
Background technology
Deep learning is an important branch of machine learning and has achieved significant breakthroughs in recent years. Since their introduction, neural network models trained with deep learning algorithms have achieved remarkable results in applications such as image recognition, speech processing, and intelligent robotics.
By building neural connection structures that model the human brain, deep neural networks describe data features through multiple layered transformation stages when processing signals such as images, sound, and text. As the complexity of neural networks keeps increasing, neural network techniques suffer from high resource occupation, slow operation speed, and high energy consumption in practical applications, so the technology faces serious energy-efficiency and computation-speed bottlenecks when applied to embedded devices or low-overhead data centers. Replacing traditional software computation with hardware acceleration has become an effective way to improve the efficiency of neural network computation. Mainstream hardware acceleration approaches include graphics processing units, application-specific processor chips, and field programmable gate arrays (FPGAs).
In existing neural network techniques, a neural network model is trained over a training set in multiple rounds, sample by sample, to obtain the neural network weight values. Neural network weights exhibit a certain sparsity: a large number of weights are 0, and after operations such as multiplication and addition with data, these weights have no numerical effect on the operation result. These zero-valued weights are related to the inherent character of deep neural networks; they arise from repeated training and are difficult to eliminate from the algorithmic side. Storing, loading, and computing with these zero-valued weights occupies a large amount of on-chip resources and consumes unnecessary working time, making it difficult to meet the performance requirements of a neural network processor.
Therefore, both academia and industry have carried out extensive research on the zero-valued elements in the above neural networks. The paper "Albericio J, Judd P, Hetherington T, et al. Cnvlutin: ineffectual-neuron-free deep neural network computing[C]//Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016: 1-13." achieves large-scale parallel computation by providing large-scale on-chip storage units and, on that basis, compresses the data elements; however, it relies on large-scale on-chip storage to meet the demands of parallel computation and is not suitable for embedded devices. The paper "Chen Y H, Emer J, Sze V. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks[J]. 2016." achieves data reuse by sharing data and weights, and gates off the computation of zero elements through power gating; this effectively improves energy efficiency, but the method can only reduce computation power consumption and cannot skip zero data to accelerate computation speed.
Invention " a kind of neutral net accelerator and its operation method ", the invention are applied to neural network algorithm field, carry
A kind of neutral net accelerator and its operation method are supplied, the neutral net accelerator includes address in piece internal memory storage media, piece
Index module, core calculation module and many ALU devices, piece internal memory storage media, for storing the outside data for transmitting or being used for
The data produced in storage computation process;Data directory module in piece, mapped to according to the index of input during computing for performing
Correct storage address;Core calculation module is used to perform neural network computing;Many ALU devices for from core calculation module or
Piece internal memory storage media obtains input data and performs the nonlinear operation that core calculation module cannot be completed.The invention is in neutral net
Many ALU designs are introduced in accelerator, so as to lift the arithmetic speed of nonlinear operation so that neutral net accelerator is more increased
Effect.The present invention is that weight compression storage organization is introduced in neutral net accelerator with the maximum difference of the invention, is improve
Neural network computing speed simultaneously reduces energy loss.
Invention " accelerates the arithmetic unit and method of the speed-up chip of deep neural network algorithm ", and the invention provides a kind of adding
The arithmetic unit and method of the speed-up chip of fast deep neural network algorithm, described device include:Vector addition processor module,
The computing of the vectorization of the pooling layer algorithms entered in the addition or subtraction, and/or deep neural network algorithm of row vector;To
Flow function value arithmetic device module, the vector quantities operation of the non-linear evaluation in deep neural network algorithm;Vectorial adder and multiplier module,
Enter the multiply-add operation of row vector;Three modules perform programmable instructions, interact with each other to calculate the neuron of neutral net
Value and network output result and, represent synapse weight variable quantity of the input layer to output layer neuron operation intensity;
Intermediate value memory area is provided with three modules, and main storage is read out and write operation.Thereby, it is possible to
Reduce the intermediate value to main storage and read and write number of times, reduce the energy expenditure of accelerator chip, it is to avoid data processing
Shortage of data and replacement problem in journey.The present invention is that power is introduced in neutral net accelerator with the maximum difference of the invention
Weight contracting storage organization, improves neural network computing speed and reduces energy loss.
Summary of the invention
In view of the above drawbacks of existing neural network processors, the present invention proposes a neural network processor based on weight compression, a design method, and a chip. The system introduces a weight indexing structure into the existing neural network processor architecture, thereby improving the operation speed and reducing the energy loss of neural network acceleration.
The present invention proposes a neural network processor based on weight compression, comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one storage unit controller, for controlling the storage unit;
at least one computation unit, for executing the computation operations of a neural network;
a control unit, connected with the storage unit controller and the computation unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computation unit;
at least one weight retrieval unit, for retrieving weights, wherein each weight retrieval unit is connected with the computation unit to ensure that the compressed weights operate correctly with the corresponding data.
The storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
The input data storage unit is used to store the data participating in computation, which includes original feature map data and the data participating in intermediate-layer computation; the output data storage unit stores the computed neuron response values; the weight storage unit is used to store the trained neural network weights; the instruction storage unit is used to store the instruction information participating in computation.
The data participating in computation are re-encoded by the method of off-chip offline compression, and weight compression is realized through a weight compression format.
The weight compression format is <weight, offset>.
The weight in the weight compression format is the original value of the neural network weight before compression, and the offset is the relative position of the current nonzero weight within a group of weight values.
During weight compression, the weight value sequence obtained by re-encoding retains only the nonzero elements; zero-valued elements are not retained.
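As an illustration of this format, the re-encoding step can be sketched as follows (a minimal Python sketch; the function name and the list-of-pairs representation are illustrative, not part of the patent):

```python
def compress_weights(weights):
    """Re-encode a weight sequence into <weight, offset> pairs.

    Zero-valued weights are dropped; each retained pair stores the
    original nonzero value and its relative position (offset) within
    the group of weight values.
    """
    return [(w, i) for i, w in enumerate(weights) if w != 0]

# A group with two trailing zeros compresses to two pairs.
pairs = compress_weights([1.5, 2.5, 0.0, 0.0])
print(pairs)  # [(1.5, 0), (2.5, 1)]
```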
The computation unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
The present invention also proposes a design method for the above neural network processor based on weight compression, comprising:
Step 1: the control unit addresses the storage unit, reads, and parses the instruction that needs to be executed next;
Step 2: the storage address is obtained according to the parsed instruction, and the data participating in computation and the weights are obtained from the storage unit;
Step 3: the data participating in computation and the weights are loaded into the computation unit from the input storage unit and the weight storage unit, respectively;
Step 4: the computation unit performs the arithmetic operations of the neural network computation, wherein the weight retrieval unit ensures that the compressed data and the weight data are computed correctly;
Step 5: the neural network computation result is stored into the output storage unit.
The present invention also proposes a chip including the above neural network processor based on weight compression.
From the above scheme, the advantage of the present invention is the following:
addressing the problems of poor operation speed and low energy efficiency in neural network processors, the invention compresses the neural network weights off-chip, by means of offline compression, into the weight compression format. This reduces the on-chip resources occupied by weights in the neural network processor, increases the operation speed, and improves the energy efficiency.
Description of the drawings
Fig. 1 is a structural block diagram of the neural network processor provided by the present invention;
Fig. 2 is a diagram of a weight compression storage format proposed by the present invention;
Fig. 3 is a schematic diagram of the weight compression unit in the single-computation-unit embodiment of the present invention;
Fig. 4 is a schematic diagram of the weight compression unit in the multi-computation-unit embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the computation unit of the present invention;
Fig. 6 is a flowchart of the neural network computation performed by the neural network processor proposed by the present invention.
Specific embodiment
In studying neural network processors, it was found that neural network weights exhibit a certain sparsity: there are a large number of zero-valued weights, and after operations such as multiplication and addition with data, these weights have no numerical effect on the operation result. Storing, loading, and computing with these zero-valued weights occupies a large amount of on-chip resources and consumes unnecessary working time, making it difficult to meet the performance requirements of a neural network processor.
Analysis of the computation structure of existing neural network processors shows that neural network weight values can be compressed, achieving the purpose of accelerating operation speed and reducing energy loss. The prior art provides the basic framework of a neural network accelerator; on that basis, the present invention proposes a weight compression storage format. After re-encoding, the weight data use the weight compression storage format during storage, transmission, and computation, and a weight indexing structure is added to the neural network computation unit to ensure that the compressed weights operate correctly with the data elements.
To achieve the above objects, the present invention proposes a neural network processor based on weight compression, comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one computation unit, for performing neural network computation; and a control unit, connected with the at least one storage unit controller and the at least one computation unit, for obtaining, via the at least one storage unit controller, the instructions stored in the at least one storage unit, and parsing the instructions to control the at least one computation unit;
at least one weight retrieval unit, wherein each weight retrieval unit is connected with the at least one computation unit to ensure that the compressed weights operate correctly with the corresponding data.
In the neural network processor system according to the present invention, the weights are trained neural network weights.
According to the neural network processor of the present invention, when the processor performs neural network computation, the trained neural network weights can be compressed off-chip into the weight compression format and stored in the storage unit.
The present invention compresses the neural network weights into the weight compression format by means of off-chip offline compression, and transmits them to the on-chip storage unit through an input interface.
To make the purposes, technical schemes, design methods, and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The present invention aims to provide a neural network processor based on weight compression, which introduces a weight retrieval unit into the neural network processing system and stores the neural network weights in a weight compression storage format, thereby reducing on-chip storage overhead, reducing the scale of the computation circuitry, and improving operation efficiency, so that the neural network processing system performs more efficiently.
The neural network processing provided by the present invention is based on a storage-control-computation structure:
the storage structure stores the data participating in computation and the processor operation instructions;
the control structure includes a decoding circuit, which parses the operation instructions and generates control signals to control the on-chip scheduling and storage of data and the neural network computation process;
the computation structure includes an ALU, which participates in the neural network computation operations in the processor; the compressed data are computed within the computation structure.
The present invention also proposes a chip comprising the neural network processor based on weight compression.
Fig. 1 shows a neural network processor 101 provided by the present invention. The system architecture consists of six parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computation unit 107.
The input data storage unit 102 stores the data participating in computation, which include original feature map data and the data participating in intermediate-layer computation; the output data storage unit 104 stores the computed neuron response values; the weight storage unit 105 stores the trained neural network weights; the instruction storage unit 106 stores the instruction information participating in computation, and the instructions are parsed to realize the neural network computation.
The control unit 103 is connected with the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computation unit 107, respectively. The control unit 103 obtains the instructions stored in the instruction storage unit 106 and parses them, and controls the computation unit to perform the neural network computation according to the control signals obtained by parsing the instructions.
The computation unit 107 performs the corresponding neural network computation according to the control signals produced by the control unit 103. The computation unit 107 is associated with one or more storage units; it can obtain the data to be computed from the data storage part of the input data storage unit 102 associated with it, and can write data to the output data storage unit 104 associated with it. The computation unit 107 completes most of the computation in the neural network algorithm, i.e. vector multiply-add operations and the like. In addition, because the weights loaded into the computation unit 107 are in the weight compression format, the computation unit 107 should also include a weight retrieval subunit, which ensures that the compressed weights can be computed correctly with the data.
Fig. 2 shows a weight compression format proposed by the present invention: the original data are re-encoded by the method of off-chip offline compression, thereby realizing weight compression. The weight compression format consists of two parts, <weight, offset>: the weight is the original value of the neural network weight before compression, and the offset is the relative position of the current nonzero weight within a group of weight values. During compression, the re-encoded weight value sequence retains only the nonzero elements; zero-valued elements are not retained. This method ensures that only nonzero weight values participate in the neural network computation. Through weight compression, the number of weights in the data is effectively reduced, the amount of neural network computation is reduced, and the overall operation speed of the system is improved.
Fig. 3 describes the weight compression process in detail. The weights are grouped, and the number of elements per group is determined by the scale of the computation unit. The weight compression process is now described taking four elements per group as an example. In the first group of weights, the nonzero elements are 1.5 and 2.5, the 0th and the 1st elements respectively; so after re-encoding, this group of weights retains two nonzero elements, and the offsets indicating the element positions are 0 and 1, respectively. The second group of original weight data contains three nonzero elements, namely the 0th, 3rd, and 4th elements of this group, so the offsets are 0, 3, and 4, respectively. The third group of weight values contains two nonzero elements, 3 and 4, with offsets 2 and 3, respectively.
When computation unit resources are sufficient, i.e., when multiple computation units are available at the same time, the weight values of multiple different queues can be loaded into different computation units simultaneously. The elements at the same relative position in different queues work in parallel without interfering with each other. The grouping method is the same as for a single computation unit: within each queue, the elements at the same relative position are divided into one group, and during computation the data of the different queues are loaded into the computation units in parallel.
For convenience of description, Fig. 4 illustrates the multi-computation-unit case with two computation units. Fig. 4 contains two weight queues; each queue of weights is connected to its corresponding computation unit, and each computation unit works independently. According to the capacity of the computation units, the weights are divided into four groups, and within each group the weight values of each queue are compressed separately according to the same intra-group element length.
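Under the assumption that each queue is simply a separate weight sequence handed to its own computation unit, the parallel case can be sketched as follows (illustrative Python; the queue contents and names are hypothetical):

```python
def compress_group(group):
    """Compress one group into (weight, offset) pairs."""
    return [(w, j) for j, w in enumerate(group) if w != 0]

def compress_queue(queue, group_size):
    """Grouping is identical to the single-unit case: the queue is
    split into groups and each group is compressed independently."""
    return [compress_group(queue[i:i + group_size])
            for i in range(0, len(queue), group_size)]

# Two queues, one per computation unit; each unit receives only the
# nonzero pairs of its own queue and works independently of the other.
queues = [[1.5, 0, 2.0, 0], [0, 3.0, 0, 4.0]]
per_unit = [compress_queue(q, 4) for q in queues]
print(per_unit)  # [[[(1.5, 0), (2.0, 2)]], [[(3.0, 1), (4.0, 3)]]]
```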
Storing weights in the weight compression format has two advantages. First, only the nonzero elements of the weights are stored, which can significantly reduce memory occupation. Second, only nonzero elements are loaded into the computation units, which increases the computation speed and improves computation unit utilization.
Fig. 5 is a structural schematic diagram of the computation unit, describing the correspondence between weights and data when computing a convolutional neural network. In this computation process, the data are shared among the computation units while different weight values are connected into each computation unit, and the computation units work concurrently.
Fig. 6 is a flowchart of a neural network computation process according to the present invention. The method includes:
Step S1: the control unit addresses the storage unit, reads, and parses the instruction that needs to be executed next;
Step S2: the input data are obtained from the storage unit according to the storage address obtained by parsing the instruction;
Step S3: the data and the weights are loaded into the computation unit from the input storage unit and the weight storage unit, respectively;
Step S4: the computation unit performs the arithmetic operations of the neural network computation, wherein the data retrieval structure ensures that the compressed data and the weight data are computed correctly;
Step S5: the neural network computation result is stored into the output storage unit.
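The role of the retrieval structure in step S4 can be illustrated as follows (a minimal Python sketch under the assumption that retrieval amounts to using each pair's offset to index the matching data element; the names are illustrative):

```python
def retrieve_and_accumulate(pairs, data):
    """Multiply-accumulate over compressed weights: the offset of each
    <weight, offset> pair selects the data element it is paired with,
    so zero-valued weights are never loaded or multiplied."""
    return sum(w * data[off] for w, off in pairs)

# The compressed form of the weight group [1.5, 0, 0, 2.0].
pairs = [(1.5, 0), (2.0, 3)]
data = [2.0, 5.0, 7.0, 1.0]
result = retrieve_and_accumulate(pairs, data)
print(result)  # 5.0  (same as the dense dot product 1.5*2.0 + 2.0*1.0)
```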
In summary, addressing the problems of poor operation speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights off-chip, by means of offline compression, into the weight compression format, which reduces the on-chip resources occupied by weights in the neural network processor, increases the operation speed, and improves the energy efficiency.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical scheme. This manner of narration is adopted only for clarity; those skilled in the art should take the specification as a whole, and the technical schemes in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The foregoing is only illustrative specific embodiments of the present invention and does not limit the scope of the present invention. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principle of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A neural network processor based on weight compression, characterized in that it comprises:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one storage unit controller, for controlling the storage unit;
at least one computation unit, for executing the computation operations of a neural network;
a control unit, connected with the storage unit controller and the computation unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computation unit; and
at least one weight retrieval unit, for retrieving weights, wherein each weight retrieval unit is connected with the computation unit to ensure that the compressed weights and the corresponding data are computed correctly.
2. The neural network processor based on weight compression of claim 1, characterized in that the storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
3. The neural network processor based on weight compression of claim 2, characterized in that the input data storage unit is used to store the data participating in computation, which include original feature map data and the data participating in intermediate-layer computation; the output data storage unit stores the computed neuron response values; the weight storage unit is used to store the trained neural network weights; and the instruction storage unit is used to store the instruction information participating in computation.
4. The neural network processor based on weight compression of claim 1, characterized in that the data participating in computation are re-encoded by the method of off-chip offline compression, and weight compression is realized through a weight compression format.
5. The neural network processor based on weight compression of claim 4, characterized in that the weight compression format is <weight, offset>.
6. The neural network processor based on weight compression of claim 5, characterized in that the weight in the weight compression format is the original value of the neural network weight before compression, and the offset is the relative position of the current nonzero weight within a group of weight values.
7. The neural network processor based on weight compression of claim 4, characterized in that during weight compression, the weight value sequence obtained by re-encoding retains only the nonzero elements; zero-valued elements are not retained.
8. The neural network processor based on weight compression of claim 2, characterized in that the computation unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
9. A design method for the neural network processor based on weight compression of any one of claims 1-8, characterized in that it comprises:
Step 1: the control unit addresses the storage unit, reads, and parses the instruction that needs to be executed next;
Step 2: the storage address is obtained according to the parsed instruction, and the data participating in computation and the weights are obtained from the storage unit;
Step 3: the data participating in computation and the weights are loaded into the computation unit from the input storage unit and the weight storage unit, respectively;
Step 4: the computation unit performs the arithmetic operations of the neural network computation, wherein the weight retrieval unit ensures that the compressed data and the weight data are computed correctly;
Step 5: the neural network computation result is stored into the output storage unit.
10. A chip including the neural network processor based on weight compression of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610958305.2A CN106529670B (en) | 2016-10-27 | 2016-10-27 | Neural network processor based on weight compression, design method, and chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610958305.2A CN106529670B (en) | 2016-10-27 | 2016-10-27 | Neural network processor based on weight compression, design method, and chip |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106529670A true CN106529670A (en) | 2017-03-22 |
CN106529670B CN106529670B (en) | 2019-01-25 |
Family
ID=58325737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610958305.2A Active CN106529670B (en) | 2016-10-27 | 2016-10-27 | Neural network processor based on weight compression, design method, and chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529670B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009176110A (en) * | 2008-01-25 | 2009-08-06 | Seiko Epson Corp | Parallel processing device and parallel processing method |
CN105184366A (en) * | 2015-09-15 | 2015-12-23 | 中国科学院计算技术研究所 | Time-division-multiplexing general neural network processor |
CN105260776A (en) * | 2015-09-10 | 2016-01-20 | 华为技术有限公司 | Neural network processor and convolutional neural network processor |
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
- 2016-10-27: CN application CN201610958305.2A filed; granted as CN106529670B, legal status Active
Cited By (107)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107016175B (en) * | 2017-03-23 | 2018-08-31 | 中国科学院计算技术研究所 | It is applicable in the Automation Design method, apparatus and optimization method of neural network processor |
CN107092961A (en) * | 2017-03-23 | 2017-08-25 | 中国科学院计算技术研究所 | A kind of neural network processor and design method based on mode frequency statistical coding |
CN107103113A (en) * | 2017-03-23 | 2017-08-29 | 中国科学院计算技术研究所 | Towards the Automation Design method, device and the optimization method of neural network processor |
CN107016175A (en) * | 2017-03-23 | 2017-08-04 | 中国科学院计算技术研究所 | It is applicable the Automation Design method, device and the optimization method of neural network processor |
CN107103113B (en) * | 2017-03-23 | 2019-01-11 | 中国科学院计算技术研究所 | The Automation Design method, apparatus and optimization method towards neural network processor |
WO2018171717A1 (en) * | 2017-03-23 | 2018-09-27 | 中国科学院计算技术研究所 | Automated design method and system for neural network processor |
WO2018171715A1 (en) * | 2017-03-23 | 2018-09-27 | 中国科学院计算技术研究所 | Automated design method and system applicable for neural network processor |
CN107086910A (en) * | 2017-03-24 | 2017-08-22 | 中国科学院计算技术研究所 | A kind of weight encryption and decryption method and system for Processing with Neural Network |
CN107103358A (en) * | 2017-03-24 | 2017-08-29 | 中国科学院计算技术研究所 | Processing with Neural Network method and system based on spin transfer torque magnetic memory |
CN107423816A (en) * | 2017-03-24 | 2017-12-01 | 中国科学院计算技术研究所 | A kind of more computational accuracy Processing with Neural Network method and systems |
US11521048B2 (en) | 2017-03-24 | 2022-12-06 | Institute Of Computing Technology, Chinese Academy Of Sciences | Weight management method and system for neural network processing, and neural network processor |
WO2018171663A1 (en) * | 2017-03-24 | 2018-09-27 | 中国科学院计算技术研究所 | Weight management method and system for neural network processing, and neural network processor |
US11216717B2 (en) | 2017-04-04 | 2022-01-04 | Hailo Technologies Ltd. | Neural network processor incorporating multi-level hierarchical aggregated computing and memory elements |
US11461614B2 (en) | 2017-04-04 | 2022-10-04 | Hailo Technologies Ltd. | Data driven quantization optimization of weights and input data in an artificial neural network |
US11263512B2 (en) | 2017-04-04 | 2022-03-01 | Hailo Technologies Ltd. | Neural network processor incorporating separate control and data fabric |
US11551028B2 (en) | 2017-04-04 | 2023-01-10 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network |
US10387298B2 (en) | 2017-04-04 | 2019-08-20 | Hailo Technologies Ltd | Artificial neural network incorporating emphasis and focus techniques |
US11354563B2 (en) | 2017-04-04 | 2022-06-07 | Hallo Technologies Ltd. | Configurable and programmable sliding window based memory access in a neural network processor |
US11461615B2 (en) | 2017-04-04 | 2022-10-04 | Hailo Technologies Ltd. | System and method of memory access of multi-dimensional data |
US11544545B2 (en) | 2017-04-04 | 2023-01-03 | Hailo Technologies Ltd. | Structured activation based sparsity in an artificial neural network |
US11514291B2 (en) | 2017-04-04 | 2022-11-29 | Hailo Technologies Ltd. | Neural network processing element incorporating compute and local memory elements |
US11615297B2 (en) | 2017-04-04 | 2023-03-28 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network compiler |
US11238334B2 (en) | 2017-04-04 | 2022-02-01 | Hailo Technologies Ltd. | System and method of input alignment for efficient vector operations in an artificial neural network |
US11238331B2 (en) | 2017-04-04 | 2022-02-01 | Hailo Technologies Ltd. | System and method for augmenting an existing artificial neural network |
US11675693B2 (en) | 2017-04-04 | 2023-06-13 | Hailo Technologies Ltd. | Neural network processor incorporating inter-device connectivity |
CN110546611A (en) * | 2017-04-17 | 2019-12-06 | 微软技术许可有限责任公司 | Reducing power consumption in a neural network processor by skipping processing operations |
CN110546611B (en) * | 2017-04-17 | 2023-05-02 | 微软技术许可有限责任公司 | Reducing power consumption in a neural network processor by skipping processing operations |
CN107679621A (en) * | 2017-04-19 | 2018-02-09 | 北京深鉴科技有限公司 | Artificial neural network processing unit |
US10902315B2 (en) | 2017-04-19 | 2021-01-26 | Xilinx, Inc. | Device for implementing artificial neural network with separate computation units |
CN107679621B (en) * | 2017-04-19 | 2020-12-08 | 赛灵思公司 | Artificial neural network processing device |
CN108734288A (en) * | 2017-04-21 | 2018-11-02 | 上海寒武纪信息科技有限公司 | A kind of operation method and device |
CN109376852A (en) * | 2017-04-21 | 2019-02-22 | 上海寒武纪信息科技有限公司 | Arithmetic unit and operation method |
CN109146069B (en) * | 2017-06-16 | 2020-10-13 | 上海寒武纪信息科技有限公司 | Arithmetic device, arithmetic method, and chip |
CN109146069A (en) * | 2017-06-16 | 2019-01-04 | 上海寒武纪信息科技有限公司 | Arithmetic unit, operation method and chip |
CN110832507A (en) * | 2017-07-07 | 2020-02-21 | 三菱电机株式会社 | Data processing apparatus, data processing method, and compressed data |
CN111656360A (en) * | 2017-07-21 | 2020-09-11 | 森田公司 | System and method for sparsity utilization |
CN111656360B (en) * | 2017-07-21 | 2024-02-20 | 森田公司 | System and method for sparsity utilization |
CN107622305A (en) * | 2017-08-24 | 2018-01-23 | 中国科学院计算技术研究所 | Processor and processing method for neutral net |
CN107578098A (en) * | 2017-09-01 | 2018-01-12 | 中国科学院计算技术研究所 | Neural network processor based on systolic arrays |
CN107491811A (en) * | 2017-09-01 | 2017-12-19 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
CN111095302A (en) * | 2017-09-21 | 2020-05-01 | 高通股份有限公司 | Compression of sparse deep convolutional network weights |
CN111095302B (en) * | 2017-09-21 | 2024-05-28 | 高通股份有限公司 | Compression of Sparse Deep Convolutional Network Weights |
CN109557996A (en) * | 2017-09-22 | 2019-04-02 | 株式会社东芝 | Arithmetic unit |
CN108205704B (en) * | 2017-09-27 | 2021-10-29 | 深圳市商汤科技有限公司 | Neural network chip |
CN108205704A (en) * | 2017-09-27 | 2018-06-26 | 深圳市商汤科技有限公司 | A kind of neural network chip |
CN107800700A (en) * | 2017-10-27 | 2018-03-13 | 中国科学院计算技术研究所 | A kind of router and network-on-chip Transmission system and method |
CN107800700B (en) * | 2017-10-27 | 2020-10-27 | 中国科学院计算技术研究所 | Router and network-on-chip transmission system and method |
CN109726805B (en) * | 2017-10-30 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Method for designing neural network processor by using black box simulator |
CN109726805A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | The method for carrying out neural network processor design using black box simulator |
CN107729995A (en) * | 2017-10-31 | 2018-02-23 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
CN107844829A (en) * | 2017-10-31 | 2018-03-27 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
JP2021503644A (en) * | 2017-11-10 | 2021-02-12 | 中国科学院▲計▼算技▲術▼研究所Institute Of Computing Technology,Chinese Academy Of Sciences | Weight data storage method and neural network processor based on this method |
CN107977704A (en) * | 2017-11-10 | 2018-05-01 | 中国科学院计算技术研究所 | Weighted data storage method and the neural network processor based on this method |
US11531889B2 (en) | 2017-11-10 | 2022-12-20 | Institute Of Computing Technology, Chinese Academy Of Sciences | Weight data storage method and neural network processor based on the method |
CN107977704B (en) * | 2017-11-10 | 2020-07-31 | 中国科学院计算技术研究所 | Weight data storage method and neural network processor based on same |
JP7052034B2 (en) | 2017-11-10 | 2022-04-11 | 中国科学院▲計▼算技▲術▼研究所 | How to store weight data and a neural network processor based on this method |
CN107918794A (en) * | 2017-11-15 | 2018-04-17 | 中国科学院计算技术研究所 | Neural network processor based on computing array |
CN107944555A (en) * | 2017-12-07 | 2018-04-20 | 广州华多网络科技有限公司 | Method, storage device and the terminal that neutral net is compressed and accelerated |
CN107944555B (en) * | 2017-12-07 | 2021-09-17 | 广州方硅信息技术有限公司 | Neural network compression and acceleration method, storage device and terminal |
CN111582464A (en) * | 2017-12-29 | 2020-08-25 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system, and storage medium |
CN111582464B (en) * | 2017-12-29 | 2023-09-29 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system and storage medium |
CN109791628A (en) * | 2017-12-29 | 2019-05-21 | 清华大学 | Neural network model splits' positions method, training method, computing device and system |
CN109791628B (en) * | 2017-12-29 | 2022-12-27 | 清华大学 | Neural network model block compression method, training method, computing device and system |
CN110045960A (en) * | 2018-01-16 | 2019-07-23 | 腾讯科技(深圳)有限公司 | Instruction set processing method, device and storage medium based on chip |
CN110046702B (en) * | 2018-01-17 | 2023-05-26 | 联发科技股份有限公司 | Neural network computing accelerator and executing method thereof |
CN110046702A (en) * | 2018-01-17 | 2019-07-23 | 联发科技股份有限公司 | Neural computing accelerator and its method of execution |
CN108334945B (en) * | 2018-01-30 | 2020-12-25 | 中国科学院自动化研究所 | Acceleration and compression method and device of deep neural network |
CN108334945A (en) * | 2018-01-30 | 2018-07-27 | 中国科学院自动化研究所 | The acceleration of deep neural network and compression method and device |
CN108416425B (en) * | 2018-02-02 | 2020-09-29 | 浙江大华技术股份有限公司 | Convolution operation method and device |
CN108416425A (en) * | 2018-02-02 | 2018-08-17 | 浙江大华技术股份有限公司 | A kind of convolution method and device |
CN111767996B (en) * | 2018-02-27 | 2024-03-05 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN111767996A (en) * | 2018-02-27 | 2020-10-13 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN108510058A (en) * | 2018-02-28 | 2018-09-07 | 中国科学院计算技术研究所 | Weight storage method in neural network and the processor based on this method |
CN108510058B (en) * | 2018-02-28 | 2021-07-20 | 中国科学院计算技术研究所 | Weight storage method in neural network and processor based on method |
CN108171328B (en) * | 2018-03-02 | 2020-12-29 | 中国科学院计算技术研究所 | Neural network processor and convolution operation method executed by same |
CN108171328A (en) * | 2018-03-02 | 2018-06-15 | 中国科学院计算技术研究所 | A kind of convolution algorithm method and the neural network processor based on this method |
CN108647774A (en) * | 2018-04-23 | 2018-10-12 | 福州瑞芯微电子股份有限公司 | A kind of neural network method and circuit of optimization sparsity matrix operation |
CN108647774B (en) * | 2018-04-23 | 2020-11-20 | 瑞芯微电子股份有限公司 | Neural network method and circuit for optimizing sparsity matrix operation |
CN108764454A (en) * | 2018-04-28 | 2018-11-06 | 中国科学院计算技术研究所 | The Processing with Neural Network method compressed and/or decompressed based on wavelet transformation |
US11687759B2 (en) | 2018-05-01 | 2023-06-27 | Semiconductor Components Industries, Llc | Neural network accelerator |
CN110428047A (en) * | 2018-05-01 | 2019-11-08 | 半导体组件工业公司 | Nerve network system and accelerator for implementing neural network |
CN109325590B (en) * | 2018-09-14 | 2020-11-03 | 中国科学院计算技术研究所 | Device for realizing neural network processor with variable calculation precision |
CN109325590A (en) * | 2018-09-14 | 2019-02-12 | 中国科学院计算技术研究所 | For realizing the device for the neural network processor that computational accuracy can be changed |
CN109543140A (en) * | 2018-09-20 | 2019-03-29 | 中国科学院计算技术研究所 | A kind of convolutional neural networks accelerator |
CN109543140B (en) * | 2018-09-20 | 2020-07-10 | 中国科学院计算技术研究所 | Convolutional neural network accelerator |
CN109492761A (en) * | 2018-10-30 | 2019-03-19 | 深圳灵图慧视科技有限公司 | Realize FPGA accelerator, the method and system of neural network |
US11588499B2 (en) | 2018-11-05 | 2023-02-21 | Samsung Electronics Co., Ltd. | Lossless compression of neural network weights |
CN109886416A (en) * | 2019-02-01 | 2019-06-14 | 京微齐力(北京)科技有限公司 | The System on Chip/SoC and machine learning method of integrated AI's module |
CN110334716A (en) * | 2019-07-04 | 2019-10-15 | 北京迈格威科技有限公司 | Characteristic pattern processing method, image processing method and device |
CN110334716B (en) * | 2019-07-04 | 2022-01-11 | 北京迈格威科技有限公司 | Feature map processing method, image processing method and device |
WO2021012278A1 (en) * | 2019-07-25 | 2021-01-28 | 深圳市大疆创新科技有限公司 | Data processing method, system, encoder, and decoder |
CN112119593A (en) * | 2019-07-25 | 2020-12-22 | 深圳市大疆创新科技有限公司 | Data processing method and system, encoder and decoder |
CN112396157A (en) * | 2019-08-12 | 2021-02-23 | 美光科技公司 | System, method and apparatus for communicating with data storage devices in neural network computing |
CN111105018B (en) * | 2019-10-21 | 2023-10-13 | 深圳云天励飞技术有限公司 | Data processing method and device |
CN111105018A (en) * | 2019-10-21 | 2020-05-05 | 深圳云天励飞技术有限公司 | Data processing method and device |
US11455108B2 (en) | 2019-11-25 | 2022-09-27 | Lynxi Technologies Co., Ltd. | Method and device for controlling storage format of on-chip storage resource |
WO2021104179A1 (en) * | 2019-11-25 | 2021-06-03 | 北京灵汐科技有限公司 | Method and apparatus for controlling storage format of on-chip storage resources |
CN113011577A (en) * | 2019-12-20 | 2021-06-22 | 阿里巴巴集团控股有限公司 | Processing unit, processor core, neural network training machine and method |
CN113011577B (en) * | 2019-12-20 | 2024-01-05 | 阿里巴巴集团控股有限公司 | Processing unit, processor core, neural network training machine and method |
CN111858196A (en) * | 2020-06-12 | 2020-10-30 | 海光信息技术有限公司 | Computing unit detection method, parallel processor and electronic equipment |
US11874900B2 (en) | 2020-09-29 | 2024-01-16 | Hailo Technologies Ltd. | Cluster interlayer safety mechanism in an artificial neural network processor |
US11811421B2 (en) | 2020-09-29 | 2023-11-07 | Hailo Technologies Ltd. | Weights safety mechanism in an artificial neural network processor |
US11263077B1 (en) | 2020-09-29 | 2022-03-01 | Hailo Technologies Ltd. | Neural network intermediate results safety mechanism in an artificial neural network processor |
US11221929B1 (en) | 2020-09-29 | 2022-01-11 | Hailo Technologies Ltd. | Data stream fault detection mechanism in an artificial neural network processor |
US11237894B1 (en) | 2020-09-29 | 2022-02-01 | Hailo Technologies Ltd. | Layer control unit instruction addressing safety mechanism in an artificial neural network processor |
TWI769807B (en) * | 2021-05-04 | 2022-07-01 | 國立清華大學 | Hardware/software co-compressed computing method and system for sram computing-in-memory-based processing unit |
CN113688983A (en) * | 2021-08-09 | 2021-11-23 | 上海新氦类脑智能科技有限公司 | Convolution operation implementation method, circuit and terminal for reducing weight storage in impulse neural network |
Also Published As
Publication number | Publication date |
---|---|
CN106529670B (en) | 2019-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106529670A (en) | Neural network processor based on weight compression, design method, and chip | |
CN106447034B (en) | A kind of neural network processor based on data compression, design method, chip | |
CN106650924B (en) | A kind of processor based on time dimension and space dimension data stream compression, design method | |
CN110516801B (en) | High-throughput-rate dynamic reconfigurable convolutional neural network accelerator | |
Liang et al. | FP-BNN: Binarized neural network on FPGA | |
CN107169560B (en) | Self-adaptive reconfigurable deep convolutional neural network computing method and device | |
CN106529668B (en) | Accelerate the arithmetic unit and method of the acceleration chip of deep neural network algorithm | |
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN108280514A (en) | Sparse neural network acceleration system based on FPGA and design method | |
CN106355244A (en) | CNN (convolutional neural network) construction method and system | |
CN108665059A (en) | Convolutional neural networks acceleration system based on field programmable gate array | |
CN108932548A (en) | A kind of degree of rarefication neural network acceleration system based on FPGA | |
CN109325591A (en) | Neural network processor towards Winograd convolution | |
CN109447241B (en) | Dynamic reconfigurable convolutional neural network accelerator architecture for field of Internet of things | |
CN110390383A (en) | A kind of deep neural network hardware accelerator based on power exponent quantization | |
CN107609641A (en) | Sparse neural network framework and its implementation | |
CN107423816B (en) | Multi-calculation-precision neural network processing method and system | |
CN109190756A (en) | Arithmetic unit based on Winograd convolution and the neural network processor comprising the device | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN109376843A (en) | EEG signals rapid classification method, implementation method and device based on FPGA | |
CN107491811A (en) | Method and system and neural network processor for accelerans network processing unit | |
CN101625735A (en) | FPGA implementation method based on LS-SVM classification and recurrence learning recurrence neural network | |
CN110163359A (en) | A kind of computing device and method | |
CN114218875A (en) | Acceleration method and device for flow field prediction | |
CN107085562A (en) | A kind of neural network processor and design method based on efficient multiplexing data flow |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2023-01-09
Address after: 518063 14th Floor, West Tower, Baidu International Building, No. 8, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong
Patentee after: Zhongke Times (Shenzhen) Computer System Co.,Ltd.
Address before: 100080 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District
Patentee before: Institute of Computing Technology, Chinese Academy of Sciences