CN107729995A - Method and system for accelerating a neural network processor, and neural network processor - Google Patents
Method and system for accelerating a neural network processor, and neural network processor
- Publication number
- CN107729995A (application CN201711054139.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- weight
- neural network
- packet
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The invention provides a method for accelerating a neural network processor and a corresponding neural network processor, wherein non-zero elements are extracted from the original data packets and weight packets of the neural network model to be processed and a position mark is set for each packet, the position mark of each packet indicating whether the element at the corresponding position in the packet is zero; during computation, based on the position marks, only the data and weights whose elements at the same position are both non-zero are loaded into the computing units of the neural network processor to participate in the operation. In this way, the scale of data handled by the neural network processor can be effectively reduced, thereby reducing on-chip storage overhead, accelerating operation speed and reducing energy consumption, making the neural network processing system more efficient.
Description
Technical field
The present invention relates to neural network processors, and more particularly to methods for accelerating neural network model computation.
Background art
Deep learning has achieved important breakthroughs in recent years, and neural network models trained with deep learning algorithms have achieved remarkable results in application fields such as image recognition, speech processing and intelligent robotics. A deep neural network simulates the neural connection structure of the human brain by building a model, describing data features hierarchically through multiple transformation stages when processing signals such as images, sound and text. With the continuous increase in neural network complexity, neural network techniques suffer in practical applications from problems such as high resource consumption, slow operation speed and high energy consumption. Using hardware accelerators in place of traditional software computation has become an effective way to improve the efficiency of neural network computation, for example neural network processors implemented with graphics processing units, application-specific processor chips and field-programmable gate arrays (FPGAs).
At present, neural network processors generally take the trained weight data as an input signal and carry out arithmetic operations on-chip together with the data signal. A neural network processor is both computation-intensive and memory-access-intensive: a large number of parameter iterations occur during neural network operation, and the computing units need to access memory extensively. As the scale of neural network data keeps growing, intensive memory-access operations not only occupy a large amount of on-chip resources of the neural network processor, but also reduce its operation speed.
Summary of the invention
Therefore, it is an object of the invention to overcome the above defects of the prior art and to provide a method for improving the operation speed of a neural network processor, and a corresponding neural network processor.
The purpose of the present invention is achieved through the following technical solutions:
In one aspect, the invention provides a method for accelerating a neural network processor, the method including:
step 1) for the data packets and weight packets of the neural network model to be loaded, extracting the non-zero elements and setting the position mark of each packet, the position mark of each packet indicating whether the element at the corresponding position in the packet is zero;
step 2) loading the non-zero elements and position marks of each data packet and weight packet into the storage units of the neural network processor;
step 3) matching data and weights based on the position marks, and loading only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation.
The above method may further include extracting non-zero elements and their position marks from the output data of the computing units of the neural network processor, and saving them into the data storage unit.
In the above method, step 3) may include:
comparing, in order, each bit in the binary form of the position mark of a data packet with each bit in the binary form of the position mark of a weight packet;
loading the data and weights at positions where the positions are the same and the marks are both 1 into the computing units of the neural network processor to participate in the operation.
In another aspect, the invention provides a neural network processor including a control unit, computing units, a weight storage unit, a data storage unit and a data matching unit, wherein the control unit is used to control the scheduling, computation and storage of the related data; the weight storage unit stores the non-zero elements and their position marks of the trained neural network weight packets; the data storage unit stores the non-zero elements and their position marks of the neural network original data packets and intermediate result data; and the data matching unit is used to match, based on the position marks, the weights from the weight storage unit with the data from the data storage unit, loading into the computing units only the data and weights whose elements at the same position are both non-zero.
The above neural network processor may further include a data compression unit for extracting non-zero elements from the output data of the computing units, setting their position marks, and saving them into the data storage unit.
In the above neural network processor, the data matching unit may include one or more comparators.
In the above neural network processor, the data compression unit may include an input register, an output register and a comparator; the input register receives the data from the computing units, the comparator judges whether the data is zero, and if the data is non-zero, the data and the corresponding register number are loaded into the output register while the flag bit is set to 1.
In another aspect, the invention provides a system for accelerating a neural network processor, the system including:
a data preprocessing device, for extracting the non-zero elements from the data packets and weight packets of the neural network model to be loaded and setting the position mark of each packet, the position mark of each packet indicating whether the element at the corresponding position in the packet is zero, and for loading the non-zero elements and position marks of each data packet and weight packet into the storage units of the neural network processor;
a data matching device, for matching data and weights based on the position marks, and loading only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation.
The above system may further include a data compression device for extracting non-zero elements and their position marks from the output data of the computing units of the neural network processor and saving them into the data storage unit.
In the above system, the data matching device may be configured to:
compare, in order, each bit in the binary form of the position mark of a data packet with each bit in the binary form of the position mark of a weight packet; and
load the data and weights at positions where the positions are the same and the marks are both 1 into the computing units of the neural network processor to participate in the operation.
Compared with the prior art, the advantage of the invention is that it effectively reduces the scale of data handled by the neural network processor, thereby reducing on-chip storage overhead, accelerating operation speed and reducing energy consumption, so that the neural network processing system performs more efficiently.
Brief description of the drawings
Embodiments of the present invention are further illustrated below with reference to the drawings, wherein:
Fig. 1 is a flow diagram of the method for accelerating a neural network processor according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an example compressed storage format for weights according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an example compressed storage format for data according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an example weight compression process according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an example data compression process according to an embodiment of the present invention;
Fig. 6 is a structural diagram of the neural network processor according to an embodiment of the present invention;
Fig. 7 is a structural diagram of the data matching unit according to an embodiment of the present invention;
Fig. 8 is a structural diagram of the data compression unit according to an embodiment of the present invention;
Fig. 9 is a flow diagram of computation using the neural network processor according to an embodiment of the present invention.
Detailed description
In order to make the purpose, technical scheme and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The inventors found in research that a large number of zero values exist in the data and weights participating in neural network computation; such data and weights have no numerical influence on the operation result after operations such as multiplication and addition in the calculation process. However, storing, loading and computing on these zero-valued data and weights occupies a large amount of on-chip resources and consumes unnecessary working time, making it difficult to meet the performance requirements of a neural network processor.
In one embodiment of the invention, a method for accelerating a neural network processor is provided. As shown in Fig. 1, the method mainly includes: 1) for the original data packets and weight packets of the neural network model to be loaded, extracting the non-zero elements and setting the position mark of each packet, the position mark of a packet indicating whether the element at the corresponding position in the packet is zero; 2) loading the non-zero elements and position marks of the data packets and weight packets into the storage units of the neural network processor; 3) matching data and weights based on the position marks, and loading only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation.
More specifically, in step 1), for the original data packets and weight packets of the neural network model to be loaded, the non-zero elements are extracted and the position mark of each packet is set. In neural network computation, the weights and data to be processed are usually split, stored and loaded in the same pattern as multiple packets or sequences, and the number of elements in each packet can be determined according to the scale of the computing units of the neural network processor actually used. This process of extracting non-zero elements and setting position marks can also be understood as re-encoding or compressing the neural network data and weights to be processed: the weight sequences and data sequences obtained after re-encoding or compression no longer retain the elements whose value is zero. The storage format of the weights after the processing of step 1) is shown in Fig. 2 and includes two parts: <weight> and <mark>; the storage format of the data is shown in Fig. 3 and likewise includes two parts: <data non-zero elements> and <mark>. The mark (also referred to as the position mark) indicates whether the element at the corresponding position in the packet is zero; for example, if the value of the element at a given position in a packet is 0, the mark for that position can be set to 0, and if the element at that position is non-zero, the mark for that position can be set to 1.
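By way of illustration, this re-encoding can be modelled in a few lines of software (a minimal sketch, not part of the patent; the name compress_packet and the most-significant-bit-first mark order are assumptions consistent with the Fig. 4 example below):

```python
def compress_packet(packet):
    """Re-encode one packet (a fixed-size group of elements) into the
    <non-zero elements> + <position mark> format of Fig. 2 / Fig. 3.

    The mark is an integer with one bit per element; the bit for the
    1st position of the packet is the most significant, and a bit is 1
    exactly when the element at that position is non-zero.
    """
    nonzero = [v for v in packet if v != 0]
    mark = 0
    for v in packet:
        mark = (mark << 1) | (1 if v != 0 else 0)
    return nonzero, mark
```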
Fig. 4 illustrates the process of compressing weights, described with packets of four elements each. As shown in Fig. 4, above the line are the original weights, and below the line are the weights obtained after the processing of step 1). In the first group of weights, the non-zero elements are 1.5 and 2.5, located at the 1st and 4th positions of the packet; therefore, after re-encoding or compression, this group of weights shown below the line retains these two non-zero elements, and the position mark corresponding to the group is 1001. The second group of original weights contains three non-zero elements, at the 1st, 3rd and 4th positions of the group; therefore, after re-encoding or compression, this group retains these three non-zero elements, and its position mark is set to 1011. The third group of weight values contains two non-zero elements, 3 and 4, and its corresponding position mark is set to 0011.
The data compression process shown in Fig. 5 is similar to the weight compression process shown in Fig. 4, again with packets of four elements each: above the line is the original data, and below the line is the data obtained after the processing of step 1). In the first group of data, the elements with values 1 and 2 are the 1st and 4th elements respectively; therefore, after re-encoding, this group of data retains the two non-zero elements, and its corresponding position mark is 1001. The second group of original data contains three non-zero elements, the 1st, 2nd and 4th elements of the group, and its position mark is set to 1101. The third group of data retains three non-zero elements after compression, and its position mark is set to 1011.
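Applied to the first groups of Fig. 4 and Fig. 5, the compress_packet sketch above reproduces the stored elements and marks described in the text:

```python
# First group of weights in Fig. 4: 1.5 and 2.5 at the 1st and 4th positions.
w_nz, w_mark = compress_packet([1.5, 0.0, 0.0, 2.5])
assert w_nz == [1.5, 2.5] and format(w_mark, "04b") == "1001"

# First group of data in Fig. 5: the values 1 and 2 at the 1st and 4th positions.
d_nz, d_mark = compress_packet([1.0, 0.0, 0.0, 2.0])
assert d_nz == [1.0, 2.0] and format(d_mark, "04b") == "1001"
```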
With continued reference to Fig. 1, after the above processing, in step 2) the non-zero elements and position marks of the weight packets and data packets are loaded into the storage units of the neural network processor; for example, they can be loaded into the weight storage unit and the data storage unit of the neural network processor respectively. Then, in step 3), during computation, data is read from the data storage unit and weights are read from the weight storage unit, the data and weights are matched based on the position marks, and only the data and weights whose elements at the same position are both non-zero are loaded into the computing units of the neural network processor to participate in the operation. For example, each bit of the position mark of a data packet is compared in order with the corresponding bit of the position mark of a weight packet; if the marks at the same position are both 1, the weight and data at that position are loaded into a computing unit. It can be seen that for a packet containing 4 elements, the position mark corresponding to the packet is in effect an integer (its numerical range being between 0 and 2^4 − 1) whose binary form indicates bit by bit whether each element in the packet is 0. Therefore, the neural network processor stores only the non-zero elements of the data packets and weight packets plus one position mark per packet, which can greatly reduce memory occupation; and only the non-zero data and weights are loaded into the computing units, which both improves computation speed and improves the utilization of the computing units.
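Functionally, this comparator-based matching can be sketched in software as follows (an illustrative model rather than the hardware circuit; match_pairs is an assumed name, and marks are assumed to be stored most-significant-bit first as in the examples above):

```python
def match_pairs(d_nz, d_mark, w_nz, w_mark, width=4):
    """Walk both position marks bit by bit (most significant bit first,
    i.e. from the 1st packet position) and keep only the (data, weight)
    pairs whose mark bits are both 1.

    d_nz and w_nz hold only the non-zero elements, so two running
    indices track how far each compressed list has been consumed.
    """
    pairs = []
    di = wi = 0
    for pos in range(width - 1, -1, -1):
        d_bit = (d_mark >> pos) & 1
        w_bit = (w_mark >> pos) & 1
        if d_bit and w_bit:
            pairs.append((d_nz[di], w_nz[wi]))
        di += d_bit  # a set bit consumes one stored non-zero element
        wi += w_bit
    return pairs

# Third weight group of Fig. 4 (elements 3 and 4, mark 0011) matched against
# the first data group of Fig. 5 (mark 1001): only the 4th position survives.
print(match_pairs([1.0, 2.0], 0b1001, [3.0, 4.0], 0b0011))  # [(2.0, 4.0)]
```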
In yet another embodiment, the method also includes applying the same re-encoding or compression to each group of data output by the computing units of the neural network processor, in the same manner as the above processing of the weights and original data, saving only the non-zero elements and position marks of each group into the storage unit. Since neural network computation produces many intermediate calculation results, preserving only the non-zero elements of these intermediate results can further optimize the utilization of storage and computing resources in the neural network processor.
Fig. 6 is a structural diagram of the neural network processor according to an embodiment of the present invention. The neural network processor uses a storage-control-computation structure, wherein the storage structure stores the data participating in the computation and the processor operation instructions; the control structure includes a decoding circuit, which parses the operation instructions and generates control signals to control the on-chip scheduling and storage of data and the neural network computation process; and the computation structure includes arithmetic logic units, which carry out the neural network computation operations in the processor. As shown in Fig. 6, the control unit can communicate with the data storage unit, the weight storage unit, the instruction storage unit and the computing units; the control unit obtains the instructions stored in the instruction storage unit, parses them, and generates control signals to control the computing units to carry out the neural network computation. The weight storage unit stores the trained neural network weights, and the data storage unit stores the various data related to the neural network computation, which may include the original feature data of the neural network model, the parameters participating in intermediate-layer computation, the data output from the computing units, and so on. The computing units carry out the corresponding neural network computation according to the control signals generated by the control unit. The computing units are associated with one or more storage units; they can obtain the data and weights to be computed from the data storage unit and the weight storage unit, and can write data to the data storage unit.
Differently from existing neural network processors, however, what is stored in the weight storage unit and data storage unit shown in Fig. 6 is the re-encoded or compressed data described above: only the non-zero elements and position marks of each data packet and weight packet are saved. In addition, a data matching unit is added between the output of the storage units and the input of the computing units, and a data compression unit is added between the output of the computing units and the input of the storage units. The data matching unit matches the weights and data stored in the weight storage unit and data storage unit in the re-encoded or compressed format; for example, it reads the position marks of a data packet and a weight packet, compares each bit of the binary forms of the position marks in order, and loads only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation, thereby ensuring that the compressed weights are computed correctly against the corresponding compressed data. Fig. 7 shows a structural diagram of an example data matching unit. The data matching unit contains one or more comparators, whose role is to compare the position mark of the data with the position mark of the weights; only the data and weights whose positions are the same and whose marks are both 1 are loaded into the buffer queues of the computing-unit array to await computation.
What is shown in Fig. 6 is only an example in which the computing units share one data matching unit. In another embodiment, a corresponding data matching unit may instead be provided in each computing unit. In this way, during the computation of the neural network model, the data from the data storage unit is shared among the computing units, different weight values from the weight storage unit are connected to the respective computing units, and each computing unit matches the position marks of the weights against the position marks of the data through its own data matching unit, performing the subsequent computation only on the matched data and weights at corresponding positions; the computing units can work in parallel.
With continued reference to Fig. 6, the data compression unit between the output of the computing units and the input of the storage units compresses the intermediate calculation results output by the computing units on the neural network processor chip, retaining only the non-zero elements and not storing the zero-valued elements. In the same manner as the processing of the weights and original data described above, only the non-zero elements and position marks of each group of data output by the computing units are saved into the storage unit, thereby further optimizing the utilization of storage and computing resources in the neural network processor. Fig. 8 shows a structural diagram of an example data compression unit. The data compression unit is composed of input registers, output registers and a comparator. The data to be compressed is first placed into the input register group of the compression unit; the comparator then judges whether each entered value is zero. If a value is non-zero, the data and the corresponding register number are loaded into an output register, and the comparison result is recorded in the flag bit: the flag bit is 0 for a zero value and 1 for a non-zero value.
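A behavioural sketch of such a unit may make the register-level description concrete (register widths and timing are simplified away; compress_unit is an assumed name):

```python
def compress_unit(input_registers):
    """Behavioural model of the Fig. 8 data compression unit: the value
    in each input register is checked by the comparator; non-zero values
    are forwarded to the output registers together with their register
    number, and the corresponding flag bit is set to 1 (0 for a zero).
    """
    output_registers = []  # (register number, value) pairs
    flags = 0
    for reg_no, value in enumerate(input_registers):
        flags <<= 1
        if value != 0:  # the comparator's zero test
            output_registers.append((reg_no, value))
            flags |= 1
    return output_registers, flags

# An intermediate result [0, 8, 0, 9] keeps two values; the flags read 0101.
print(compress_unit([0.0, 8.0, 0.0, 9.0]))  # ([(1, 8.0), (3, 9.0)], 5)
```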
Fig. 9 shows a flow diagram of neural network computation using the neural network processor according to an embodiment of the present invention, in which each computing unit of the neural network processor includes its own data matching unit. As shown in Fig. 9, the control unit addresses the storage unit, reads and parses the instruction to be executed, obtains the input data from the storage unit according to the storage address obtained by parsing the instruction, and loads the data and weights in packet units from the data storage unit and the weight storage unit respectively into the computing units. During the computation of the neural network model, the data packets from the data storage unit are shared among the computing units according to the control instructions, and the weight packets from the weight storage unit are connected to the respective corresponding computing units. Then, the data matching unit provided in each computing unit matches the data against the weights based on the position marks of the received weight packets and data packets, and the relevant arithmetic operations of the neural network computation are performed only on the data and weights whose positions are the same and whose marks are both 1. The relevant computation results of each computing unit are provided to the data compression unit, which extracts the non-zero elements from them, sets the position marks, and outputs the result to the data storage unit.
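Putting the earlier sketches together, one computing-unit pass of this flow can be modelled as a sparse multiply-accumulate (a sketch only; the actual operation scheduled by the control unit may differ):

```python
def sparse_dot(data_packet, weight_packet):
    """End-to-end model of one computing-unit pass from Fig. 9: compress
    both packets, match them by position mark, and accumulate only the
    products of the matched pairs. The multiply-accumulate stands in for
    whatever neural network operation the control unit has scheduled.
    """
    d_nz, d_mark = compress_packet(data_packet)
    w_nz, w_mark = compress_packet(weight_packet)
    acc = 0.0
    for d, w in match_pairs(d_nz, d_mark, w_nz, w_mark):
        acc += d * w  # only non-zero pairs ever reach the multiplier
    return acc

# Two of the four positions are zero and never reach the computing unit.
print(sparse_dot([1.0, 0.0, 0.0, 2.0], [1.5, 0.0, 0.0, 2.5]))  # 6.5
```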
In yet another embodiment, a system for accelerating a neural network processor is also provided, including an off-chip compression device and the neural network processor described above. The off-chip compression device extracts the non-zero values from the original data packets and weight packets of the neural network model to be processed and sets the position marks, and then loads the processed data and weights into the data storage unit and weight storage unit of the neural network processor respectively.
In yet another embodiment, a system for accelerating a neural network processor is also provided, the system including a data preprocessing device and a data matching device. The data preprocessing device is used to extract, from the original data packets and weight packets of the neural network model to be loaded, the non-zero elements, to set the position marks of the packets, and to load them into the storage units of the neural network processor. The data matching device is used to match data and weights according to the position marks, and to load only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation. In another embodiment, the system may further include a data compression device for extracting the non-zero elements from the output data of the computing units of the neural network processor and setting the position marks, and then saving them into the data storage unit of the neural network processor.
Although the present invention has been described by means of preferred embodiments, the present invention is not limited to the embodiments described here, and also includes various changes and modifications made without departing from the present invention.
Claims (10)
1. A method for accelerating a neural network processor, the method including:
step 1) for the data packets and weight packets of the neural network model to be loaded, extracting the non-zero elements and setting the position mark of each packet, the position mark of each packet indicating whether the element at the corresponding position in the packet is zero;
step 2) loading the non-zero elements and position marks of each data packet and weight packet into the storage units of the neural network processor;
step 3) matching data and weights based on the position marks, and loading only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation.
2. The method according to claim 1, further including extracting non-zero elements and their position marks from the output data of the computing units of the neural network processor, and saving them into the data storage unit.
3. The method according to claim 1, wherein step 3) includes:
comparing, in order, each bit in the binary form of the position mark of a data packet with each bit in the binary form of the position mark of a weight packet;
loading the data and weights at positions where the positions are the same and the marks are both 1 into the computing units of the neural network processor to participate in the operation.
4. A neural network processor, including a control unit, computing units, a weight storage unit, a data storage unit and a data matching unit, wherein the control unit is used to control the scheduling, computation and storage of the related data; the weight storage unit stores the non-zero elements and their position marks of the trained neural network weight packets; the data storage unit stores the non-zero elements and their position marks of the neural network original data packets and intermediate result data; and the data matching unit is used to match, based on the position marks, the weights from the weight storage unit with the data from the data storage unit, loading into the computing units only the data and weights whose elements at the same position are both non-zero.
5. The neural network processor according to claim 4, further including a data compression unit for extracting non-zero elements from the output data of the computing units, setting their position marks, and saving them into the data storage unit.
6. The neural network processor according to claim 4 or 5, wherein the data matching unit includes one or more comparators.
7. The neural network processor according to claim 5, wherein the data compression unit includes an input register, an output register and a comparator; the input register receives the data from the computing units, the comparator judges whether the data is zero, and if the data is non-zero, the data and the corresponding register number are loaded into the output register while the flag bit is set to 1.
8. A system for accelerating a neural network processor, the system including:
a data preprocessing device, for extracting the non-zero elements from the data packets and weight packets of the neural network model to be loaded and setting the position mark of each packet, the position mark of each packet indicating whether the element at the corresponding position in the packet is zero, and for loading the non-zero elements and position marks of each data packet and weight packet into the storage units of the neural network processor;
a data matching device, for matching data and weights based on the position marks, and loading only the data and weights whose elements at the same position are both non-zero into the computing units of the neural network processor to participate in the operation.
9. The system according to claim 8, further including:
a data compression device, for extracting non-zero elements and their position marks from the output data of the computing units of the neural network processor, and saving them into the data storage unit.
10. The system according to claim 8, wherein the data matching device is configured to:
compare, in order, each bit in the binary form of the position mark of a data packet with each bit in the binary form of the position mark of a weight packet; and
load the data and weights at positions where the positions are the same and the marks are both 1 into the computing units of the neural network processor to participate in the operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711054139.4A CN107729995A (en) | 2017-10-31 | 2017-10-31 | Method and system for accelerating a neural network processor, and neural network processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711054139.4A CN107729995A (en) | 2017-10-31 | 2017-10-31 | Method and system for accelerating a neural network processor, and neural network processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107729995A (en) | 2018-02-23 |
Family
ID=61203664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711054139.4A Pending CN107729995A (en) | 2017-10-31 | 2017-10-31 | Method and system for accelerating a neural network processor, and neural network processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107729995A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1719435A (en) * | 2004-07-07 | 2006-01-11 | 联发科技股份有限公司 | Method and apparatus for implementing DCT/IDCT |
CN105260776A (en) * | 2015-09-10 | 2016-01-20 | 华为技术有限公司 | Neural network processor and convolutional neural network processor |
CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neural network processor based on data compression, design method and chip |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
CN106650924A (en) * | 2016-10-27 | 2017-05-10 | 中国科学院计算技术研究所 | Processor based on time dimension and space dimension data flow compression and design method |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019091020A1 (en) * | 2017-11-10 | 2019-05-16 | 中国科学院计算技术研究所 | Weight data storage method, and neural network processor based on method |
US11531889B2 (en) | 2017-11-10 | 2022-12-20 | Institute Of Computing Technology, Chinese Academy Of Sciences | Weight data storage method and neural network processor based on the method |
CN108809522B (en) * | 2018-07-09 | 2021-09-14 | 上海大学 | Method for realizing multi-code deep learning decoder |
CN108809522A (en) * | 2018-07-09 | 2018-11-13 | 上海大学 | Implementation method of a multi-code deep learning decoder |
CN111160516B (en) * | 2018-11-07 | 2023-09-05 | 杭州海康威视数字技术股份有限公司 | Convolutional layer sparsification method and device for deep neural network |
CN111160516A (en) * | 2018-11-07 | 2020-05-15 | 杭州海康威视数字技术股份有限公司 | Convolutional layer sparsification method and device for a deep neural network |
CN111260044A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Data comparator, data processing method, chip and electronic equipment |
WO2020134550A1 (en) * | 2018-12-29 | 2020-07-02 | 深圳云天励飞技术有限公司 | Data compression method and related device |
CN109886394B (en) * | 2019-03-05 | 2021-06-18 | 北京时代拓灵科技有限公司 | Method and device for processing weight of ternary neural network in embedded equipment |
CN109886394A (en) * | 2019-03-05 | 2019-06-14 | 北京时代拓灵科技有限公司 | Ternary neural network weight processing method and device in embedded equipment |
CN111105018A (en) * | 2019-10-21 | 2020-05-05 | 深圳云天励飞技术有限公司 | Data processing method and device |
CN111105018B (en) * | 2019-10-21 | 2023-10-13 | 深圳云天励飞技术有限公司 | Data processing method and device |
CN111882028A (en) * | 2020-06-08 | 2020-11-03 | 北京大学深圳研究生院 | Convolution operation device for convolution neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107729995A (en) | Method and system for accelerating a neural network processor, and neural network processor | |
CN107491811A (en) | Method and system for accelerating a neural network processor, and neural network processor | |
CN106529670B (en) | Neural network processor based on weight compression, design method and chip | |
CN107844829A (en) | Method and system for accelerating a neural network processor, and neural network processor | |
CN106447034B (en) | Neural network processor based on data compression, design method and chip | |
CN107103113B (en) | Automated design method, device and optimization method for neural network processors | |
CN109740747B (en) | Operation method, device and related product | |
CN107341544A (en) | Reconfigurable accelerator based on a divisible array and implementation method thereof | |
CN106951926A (en) | Deep learning system method and device with a mixed architecture | |
CN110110851A (en) | FPGA accelerator for LSTM neural networks and acceleration method thereof | |
CN109376843A (en) | FPGA-based rapid EEG signal classification method, implementation method and device | |
CN110379416A (en) | Neural network language model training method, device, equipment and storage medium | |
CN105512723A (en) | Artificial neural network computing device and method for sparse connection | |
CN106951395A (en) | Parallel convolution operation method and device for compressed convolutional neural networks | |
CN109190756A (en) | Arithmetic device based on Winograd convolution and neural network processor comprising the device | |
CN108231086A (en) | FPGA-based deep learning speech enhancement device and method | |
Zhang et al. | Spiking neural P systems with a generalized use of rules | |
CN107992940A (en) | Implementation method and device of convolutional neural networks on FPGA | |
CN104424507B (en) | Prediction method and prediction device of echo state network | |
CN110321997A (en) | Highly parallel computing platform, system and computing implementation method | |
CN106875005A (en) | Adaptive-threshold neuron information processing method and system | |
CN110109543A (en) | C-VEP recognition method based on subject transfer | |
CN114333074A (en) | Human body posture estimation method based on a dynamic lightweight high-resolution network | |
CN114332545A (en) | Image data classification method and device based on a low-bit spiking neural network | |
CN109657794A (en) | Distributed deep neural network performance modelling method based on instruction queues |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180223 |