CN106529670B - Neural network processor based on weight compression, design method, and chip - Google Patents

Neural network processor based on weight compression, design method, and chip

Info

Publication number
CN106529670B
Authority
CN
China
Prior art keywords
weight
storage unit
data
neural network
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610958305.2A
Other languages
Chinese (zh)
Other versions
CN106529670A
Inventor
韩银和
许浩博
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Times Shenzhen Computer System Co ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201610958305.2A
Publication of CN106529670A (application)
Application granted
Publication of CN106529670B (grant)
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The present invention proposes a neural network processor based on weight compression, a design method, and a chip. The processor includes at least one storage unit, for storing operation instructions and the data participating in computation; at least one storage unit controller, for controlling the storage unit; at least one computing unit, for performing the computing operations of the neural network; a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit and parsing those instructions to control the computing unit; and at least one weight retrieval unit, each connected with the computing unit, for indexing the weights so that the compressed weights are correctly computed with their corresponding data. The present invention reduces the resources occupied by weights in a neural network processor, increases computing speed, and improves energy efficiency.

Description

Neural network processor based on weight compression, design method, and chip
Technical field
The present invention relates to the field of hardware acceleration for neural network model computation, and in particular to a neural network processor based on weight compression, a design method, and a chip.
Background art
Deep learning is an important branch of machine learning and has achieved major breakthroughs in recent years. Since their introduction, neural network models trained with deep learning algorithms have achieved remarkable results in application fields such as image recognition, speech processing, and intelligent robotics.
Deep neural networks model the neural connection structure of the human brain: when processing signals such as images, sound, and text, they describe data features through multiple layered transformation stages. As the complexity of neural networks keeps rising, neural network techniques suffer in practical applications from problems such as high resource occupation, slow computing speed, and high energy consumption, so the technique hits severe efficiency and speed bottlenecks when applied to embedded devices or low-overhead data centers. Replacing traditional software computation with hardware acceleration has become an effective way to improve the efficiency of neural network computation. Mainstream hardware acceleration approaches include graphics processing units, application-specific processor chips, and field-programmable gate arrays (FPGAs).
In existing neural network techniques, a neural network model is trained over the training set for multiple rounds, yielding the neural network weight values sample by sample. Neural network weights exhibit a certain sparsity: a large number of weights are zero, and after operations such as multiplication and addition these weights and their data have no numerical influence on the result. These zero-valued weights are tied to the inherent characteristics of deep neural networks; they arise from repeated training and are not easily eliminated at the algorithm level. During storage, loading, and computation, zero-valued weights occupy large amounts of on-chip resources and consume extra working time, making it hard to meet the performance requirements of a neural network processor.
Consequently, both academia and industry have studied the zero-valued elements of such neural networks extensively. The paper "Albericio J, Judd P, Hetherington T, et al. Cnvlutin: ineffectual-neuron-free deep neural network computing [C] // Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016: 1-13." provides large-scale on-chip storage to realize massively parallel computation and, on that basis, compresses the data elements; however, relying on large-scale on-chip storage to satisfy the needs of parallel computation makes it unsuitable for embedded devices. The paper "Chen Y H, Emer J, Sze V. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks [J]. 2016." achieves data reuse by sharing data and weights and uses power gating to shut down the computation of zero elements, which effectively improves energy efficiency; however, this method only reduces computing power consumption and cannot skip the zero data to accelerate the computation itself.
The invention "A neural network accelerator and its operation method" applies to the field of neural network algorithms and provides a neural network accelerator and its operation method. That accelerator includes an on-chip storage medium, an on-chip address index module, a core computing module, and a multi-ALU device. The on-chip storage medium stores data transmitted from outside or data generated during computation; the on-chip data index module maps to the correct storage address according to an input index when executing an operation; the core computing module performs the neural network operations; and the multi-ALU device obtains input data from the core computing module or the on-chip storage medium and performs the nonlinear operations the core computing module cannot. That invention introduces a multi-ALU design into the neural network accelerator to speed up nonlinear operations and make the accelerator more efficient. The main difference between the present invention and that one is that the present invention introduces a weight-compressed storage structure into the neural network accelerator, improving neural network computing speed while reducing energy loss.
The invention "Operation apparatus and method of an acceleration chip for accelerating deep neural network algorithms" provides an operation apparatus and method for a chip that accelerates deep neural network algorithms. The apparatus includes: a vector addition processor module, which performs vector addition or subtraction and/or the vectorized operations of the pooling-layer algorithm of the deep neural network; a vector function value arithmetic module, which performs the vectorized nonlinear evaluations in the deep neural network algorithm; and a vector multiply-adder module, which performs vector multiply-add operations. The three modules execute programmable instructions and interact with each other to compute the neuron values and network output of the neural network, as well as the synaptic weight variations representing the action intensity of input-layer neurons on output-layer neurons. Intermediate-value storage regions are provided in the three modules, and main memory is read and written. This reduces the number of intermediate-value reads and writes to main memory, lowers the energy consumption of the accelerator chip, and avoids data loss and replacement problems during data processing. The main difference between the present invention and that one is, again, that the present invention introduces a weight-compressed storage structure into the neural network accelerator, improving neural network computing speed and reducing energy loss.
Summary of the invention
In view of the above drawbacks of existing neural network processors, the present invention proposes a neural network processor based on weight compression, a design method, and a chip. The system introduces a weight index structure into the existing neural network processor architecture, thereby improving the computing speed and energy consumption of neural network acceleration.
The present invention proposes a neural network processor based on weight compression, comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one storage unit controller, for controlling the storage unit;
at least one computing unit, for performing the computing operations of the neural network;
a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit;
at least one weight retrieval unit, for indexing the weights, wherein each weight retrieval unit is connected with the computing unit, guaranteeing the correct computation of the compressed weights with their corresponding data.
The storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
The input data storage unit stores the data participating in computation, which includes the original feature map data and the data participating in intermediate-layer computations; the output data storage unit stores the computed neuron responses; the weight storage unit stores the trained neural network weights; the instruction storage unit stores the instruction information involved in the computation.
The data participating in computation are re-encoded through an off-chip offline compression method, and weight compression is realized through a weight-compressed format.
The weight-compressed format is <weight, offset>.
The weight in the weight-compressed format is the original value of the neural network weight before compression; the offset is the relative position of the current non-zero weight within its group of weight values.
During weight compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept.
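As a minimal illustrative sketch only (not the patented circuit; the function name and group size are assumptions introduced here), the re-encoding can be pictured as follows in Python:

    def compress_weights(weights, group_size=4):
        """Re-encode a flat weight sequence into per-group <weight, offset> pairs.

        Zero-valued weights are dropped; each surviving weight keeps its
        relative position (offset) inside its group. The group size is an
        illustrative assumption; in the invention it is determined by the
        scale of the computing unit.
        """
        groups = []
        for start in range(0, len(weights), group_size):
            group = weights[start:start + group_size]
            groups.append([(w, off) for off, w in enumerate(group) if w != 0])
        return groups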
The computing unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
The present invention also proposes a design method for the above neural network processor based on weight compression, comprising:
Step 1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step 2: a storage address is obtained according to the parsed instruction, and the data participating in computation and the weights are obtained from the storage unit;
Step 3: the data participating in computation and the weights are loaded into the computing unit from the input storage unit and the weight storage unit, respectively;
Step 4: the computing unit performs the arithmetic operations of the neural network computation, wherein the weight retrieval unit guarantees that the compressed data can be correctly computed with the weight data;
Step 5: the neural network computation result is stored into the output storage unit.
The present invention also proposes a chip including the neural network processor based on weight compression.
As can be seen from the above scheme, the present invention has the following advantages:
Addressing the problems of poor computing speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights off-chip into the weight-compressed format by means of offline compression, which reduces the resources occupied by weights in the neural network processor, increases computing speed, and improves energy efficiency.
Detailed description of the invention
Fig. 1 is a structural block diagram of the neural network processor provided by the present invention;
Fig. 2 shows the weight-compressed storage format proposed by the present invention;
Fig. 3 is a schematic diagram of weight compression in the single-computing-unit embodiment of the present invention;
Fig. 4 is a schematic diagram of weight compression in the multi-computing-unit embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the computing unit of the present invention;
Fig. 6 is a flowchart of the neural network computation performed by the proposed neural network processor.
Specific embodiments
While studying neural network processors, we found that neural network weights exhibit a certain sparsity: a large number of weights are zero, and after operations such as multiplication and addition these weights and their data have no numerical influence on the result. During storage, loading, and computation, these zero-valued weights occupy large amounts of on-chip resources and consume extra working time, making it hard to meet the performance requirements of a neural network processor.
Analysis of the computing structure of existing neural network processors shows that the neural network weight values can be compressed to accelerate computation and reduce energy loss. The prior art provides the basic architecture of a neural network accelerator; on that basis, the present invention proposes a weight-compressed storage format. After re-encoding, the weight data use the weight-compressed storage format during storage, transmission, and computation, and a weight index structure is added to the neural network computing unit to ensure that the compressed weights can be correctly computed with the data elements.
To achieve the above objective, the present invention proposes a neural network processor based on weight compression, comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one computing unit, for performing the neural network computation; and a control unit, connected with at least one storage unit controller and the at least one computing unit, for obtaining, via the at least one storage unit controller, the instructions stored in the at least one storage unit, and parsing the instructions to control the at least one computing unit;
at least one weight retrieval unit, wherein each weight retrieval unit is connected with the at least one computing unit, guaranteeing the correct computation of the compressed weights with their corresponding data.
In the neural network processor system according to the present invention, the weights are trained neural network weights.
In the neural network processor according to the present invention, when the processor performs the neural network computation, the trained neural network weights can be compressed off-chip into the weight-compressed format and stored in the storage unit.
The present invention compresses the neural network weights into the weight-compressed format by off-chip offline compression and transmits them through the input interface to the on-chip storage unit.
To make the objectives, technical solutions, design method, and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only meant to explain the present invention, not to limit it.
The present invention aims to provide a neural network processor based on weight compression, which introduces a weight retrieval unit into the neural network processing system and stores the neural network weights in the weight-compressed storage format, thereby reducing on-chip storage overhead, reducing the scale of the computing circuitry, and improving operational efficiency, so that the neural network processing system performs efficiently.
The neural network processing provided by the present invention is based on a storage-control-computation structure:
the storage structure stores the data participating in computation and the processor's operation instructions;
the control structure includes decoding circuitry, which parses the operation instructions and generates control signals to control the scheduling and storage of on-chip data and the neural network computation process;
the computation structure includes arithmetic logic units, which carry out the neural network computation within the processor; the computation on compressed data is realized inside the computation structure.
The present invention also proposes a chip comprising the neural network processor based on weight compression.
Fig. 1 shows a neural network processor 101 provided by the present invention. The processor consists of six parts: an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a computing unit 107.
The input data storage unit 102 stores the data participating in computation, including the original feature map data and the data participating in intermediate-layer computations; the output data storage unit 104 stores the computed neuron responses; the weight storage unit 105 stores the trained neural network weights; the instruction storage unit 106 stores the instruction information involved in the computation, and these instructions are parsed to realize the neural network computation.
The control unit 103 is connected with the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the computing unit 107, respectively. The control unit 103 obtains the instructions stored in the instruction storage unit 106 and parses them, and then, according to the control signals obtained from parsing, controls the computing unit to perform the neural network computation.
The computing unit 107 performs the corresponding neural network computation according to the control signals generated by the control unit 103. The computing unit 107 is associated with one or more storage units: it can obtain the data to be computed from the input data storage unit 102 associated with it, and can write data to the output data storage unit 104 associated with it. The computing unit 107 performs most of the operations in the neural network algorithm, i.e. vector multiply-add operations and the like. In addition, because the weights loaded into the computing unit 107 are in the weight-compressed format, the computing unit 107 also includes a weight retrieval subunit, which guarantees that the compressed weights can be correctly computed with the data.
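The behavior that the weight retrieval subunit must guarantee can be sketched, purely for illustration and under the same assumptions as the earlier compress_weights sketch, as a multiply-accumulate that uses each stored offset to select the matching data element:

    def sparse_group_mac(compressed_group, data_group):
        """Multiply-accumulate one compressed weight group with its data group.

        Each (weight, offset) pair selects the data element at the same
        relative position, so zero-valued weights are simply never visited.
        """
        acc = 0.0
        for weight, offset in compressed_group:
            acc += weight * data_group[offset]
        return acc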
Fig. 2 shows the weight-compressed format proposed by the present invention: the original data are re-encoded by an off-chip offline compression method to realize the weight compression. The weight-compressed format consists of two parts, <weight, offset>, where the weight is the original value of the neural network weight before compression and the offset is the relative position of the current non-zero weight within its group of weight values. During compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept. This method guarantees that only non-zero weight values participate in the neural network computation; through weight compression, the number of weights in the data is effectively reduced, the amount of neural network computation decreases, and the overall computing speed of the system improves.
Fig. 3 describes the weight compression process in detail. The weights are grouped, and the number of elements per group is determined by the scale of the computing unit. Taking groups of four elements as an example: in the first weight group, the elements with values 1.5 and 2.5 are the 0th and 1st elements, so after re-encoding this group retains two non-zero elements whose offsets, indicating the element positions, are 0 and 1; the second group of original weight data contains three non-zero elements, namely the 0th, 3rd, and 4th elements of that group, so the offsets are 0, 3, and 4; the third weight group contains the two non-zero elements 3 and 4, with offsets 2 and 3, respectively.
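A hypothetical run of the earlier compress_weights sketch under such a four-element grouping (the input values below are illustrative, not the exact figure data) reproduces this behavior:

    weights = [1.5, 2.5, 0.0, 0.0,   # group 1 -> [(1.5, 0), (2.5, 1)]
               0.0, 0.0, 3.0, 4.0]   # group 2 -> [(3.0, 2), (4.0, 3)]
    print(compress_weights(weights, group_size=4))
    # [[(1.5, 0), (2.5, 1)], [(3.0, 2), (4.0, 3)]]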
When computing resources are sufficient, i.e. when multiple computing units are available at once, the weight values of multiple different queues can be loaded into different computing units simultaneously; elements at the same position in different queues work in parallel without interfering with one another. The grouping is the same as for a single computing unit: the elements at the same relative position within each queue are divided into one group, and during computation the data of the different queues are loaded into the computing units in parallel.
For ease of description, Fig. 4 illustrates the multi-computing-unit case with two computing units. Fig. 4 contains two weight queues, each connected to its corresponding computing unit, and each computing unit works independently. According to the capacity of the computing units, the weights are divided into four groups; within each group, the weight values of each queue are compressed separately, using the same intra-group element length. A sketch of this arrangement follows.
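Continuing the same illustrative sketch (all names hypothetical, reusing the helpers above), each queue is compressed independently with identical grouping and dispatched to its own computing unit:

    # Each queue is compressed separately with the same grouping.
    queue_a = compress_weights([1.5, 0.0, 0.0, 2.0], group_size=4)
    queue_b = compress_weights([0.0, 3.0, 4.0, 0.0], group_size=4)
    data_a = [0.5, 0.5, 0.5, 0.5]
    data_b = [1.0, 1.0, 1.0, 1.0]

    # The two computing units work independently on their own queues;
    # here the parallel units are simulated one after the other.
    result_a = sparse_group_mac(queue_a[0], data_a)  # unit 0: 1.5*0.5 + 2.0*0.5
    result_b = sparse_group_mac(queue_b[0], data_b)  # unit 1: 3.0*1.0 + 4.0*1.0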
Storing the weights in the weight-compressed format has two advantages: first, only the non-zero elements of the weights are stored, which significantly reduces memory occupation; second, only non-zero elements are loaded into the computing units, which improves computing speed and raises computing-unit utilization.
Fig. 5 is a structural schematic diagram of the computing unit, describing the correspondence between weights and data during a convolutional neural network computation. In this process, the data are shared across all computing units while different weight values are routed into each computing unit, and all computing units work concurrently.
Fig. 6 is a flowchart of a neural network computation process of the present invention. The method comprises:
Step S1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
Step S2: the input data are obtained from the storage unit according to the storage address obtained by parsing the instruction;
Step S3: the data and weights are loaded into the computing unit from the input storage unit and the weight storage unit, respectively;
Step S4: the computing unit performs the arithmetic operations of the neural network computation, wherein the data retrieval structure guarantees that the compressed data can be correctly computed with the weight data;
Step S5: the neural network computation result is stored into the output storage unit.
In conclusion, addressing the problems of poor computing speed and low energy efficiency in neural network processors, the present invention compresses the neural network weights off-chip into the weight-compressed format by offline compression, reducing the resources occupied by weights in the neural network processor, increasing computing speed, and improving energy efficiency.
It should be understood that, although this specification is described in terms of various embodiments, not every embodiment contains only one independent technical solution. This manner of description is merely for clarity; those skilled in the art should take the specification as a whole, and the technical solutions in the various embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.
The above are merely schematic specific embodiments of the present invention and are not intended to limit its scope. Any equivalent variations, modifications, and combinations made by those skilled in the art without departing from the concept and principles of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A neural network processor based on weight compression, characterized by comprising:
at least one storage unit, for storing operation instructions and the data participating in computation;
at least one storage unit controller, for controlling the storage unit;
at least one computing unit, for performing the computing operations of the neural network;
a control unit, connected with the storage unit controller and the computing unit, for obtaining, via the storage unit controller, the instructions stored in the storage unit, and parsing the instructions to control the computing unit;
at least one weight compression unit, for compressing the weights, wherein each weight compression unit is connected with the computing unit, guaranteeing the correct computation of the compressed weights with their corresponding data;
wherein the data participating in computation are re-encoded through an off-chip offline compression method, and weight compression is realized through a weight-compressed format;
and wherein, during weight compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept.
2. The neural network processor based on weight compression of claim 1, characterized in that the storage unit includes an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit.
3. The neural network processor based on weight compression of claim 2, characterized in that the input data storage unit stores the data participating in computation, which includes the original feature map data and the data participating in intermediate-layer computations; the output data storage unit stores the computed neuron responses; the weight storage unit stores the trained neural network weights; and the instruction storage unit stores the instruction information involved in the computation.
4. The neural network processor based on weight compression of claim 1, characterized in that the weight-compressed format is <weight, offset>.
5. The neural network processor based on weight compression of claim 2, characterized in that the computing unit obtains the data to be computed from the input data storage unit associated with it, and writes data to the output data storage unit associated with it.
6. A design method for the neural network processor based on weight compression of any one of claims 1-5, characterized by comprising:
step 1: the control unit addresses the storage unit, and reads and parses the instruction to be executed next;
step 2: a storage address is obtained according to the parsed instruction, and the data participating in computation and the weights are obtained from the storage unit;
step 3: the data participating in computation and the weights are loaded into the computing unit from the input storage unit and the weight storage unit, respectively;
step 4: the computing unit performs the arithmetic operations of the neural network computation, wherein the weight compression unit guarantees that the compressed data can be correctly computed with the weight data;
step 5: the neural network computation result is stored into the output storage unit;
wherein the data participating in computation are re-encoded through an off-chip offline compression method, and weight compression is realized through a weight-compressed format;
and wherein, during weight compression, the obtained weight value sequence is re-encoded so that zero-valued elements are not retained and only non-zero elements are kept.
7. A chip comprising the neural network processor based on weight compression of any one of claims 1-5.
CN201610958305.2A 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip Active CN106529670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610958305.2A CN106529670B (en) 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610958305.2A CN106529670B (en) 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip

Publications (2)

Publication Number Publication Date
CN106529670A CN106529670A (en) 2017-03-22
CN106529670B true CN106529670B (en) 2019-01-25

Family

ID=58325737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610958305.2A Active CN106529670B (en) 2016-10-27 2016-10-27 Neural network processor based on weight compression, design method, and chip

Country Status (1)

Country Link
CN (1) CN106529670B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009176110A (en) * 2008-01-25 2009-08-06 Seiko Epson Corp Parallel processing device and parallel processing method
CN105260776A (en) * 2015-09-10 2016-01-20 华为技术有限公司 Neural network processor and convolutional neural network processor
CN105184366A (en) * 2015-09-15 2015-12-23 中国科学院计算技术研究所 Time-division-multiplexing general neural network processor
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection

Also Published As

Publication number Publication date
CN106529670A (en) 2017-03-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230109

Address after: 518063 14th Floor, West Tower, Baidu International Building, No. 8, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong

Patentee after: Zhongke Times (Shenzhen) Computer System Co.,Ltd.

Address before: 100080 No. 6 South Road of the Academy of Sciences, Zhongguancun, Haidian District, Beijing

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences