CN110175673A - Processing method and accelerator - Google Patents

Processing method and accelerator

Info

Publication number
CN110175673A
Authority
CN
China
Prior art keywords
weight
unit
coarseness
neural network
neuron
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910474387.7A
Other languages
Chinese (zh)
Other versions
CN110175673B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201710370905.1A (CN108960420B)
Priority claimed from CN201710456759.4A (CN109146069B)
Priority claimed from CN201710677987.4A (CN109389218B)
Priority claimed from CN201710678038.8A (CN109389208B)
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority claimed from CN201880002821.5A (CN109478251B)
Publication of CN110175673A
Application granted
Publication of CN110175673B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a processing device that performs coarse-grained sparsification of the weights of a neural network during training and computation, thereby reducing memory accesses and the amount of computation, so as to obtain a speedup and reduce energy consumption.

Description

Processing method and accelerator
Technical field
The present disclosure relates to a processing method and an accelerator in the field of computing, and in particular to a processing method and an accelerator that accelerate neural network operations by pruning neural network weights.
Background
Neural networks have achieved highly successful applications. However, as larger-scale and deeper neural networks are designed, more weights are introduced, and the extremely large number of weights has become a huge challenge for neural network applications. On the one hand, massive weight data places more stringent requirements on storage, and the large number of memory accesses brings enormous memory-access energy consumption; on the other hand, the large number of weights also places higher demands on the arithmetic units, and the computation time and computation energy consumption increase accordingly. Therefore, reducing the weights of a neural network without loss of computational accuracy, so as to reduce the data storage volume and the amount of computation, has become an urgent problem to be solved.
Most current work mainly uses low-rank matrix decomposition, hashing techniques and the like, but these methods can only reduce the weights and the amount of computation to a limited extent and may reduce the accuracy of the neural network. Therefore, a more effective method is needed to reduce the weights of a neural network and reduce the amount of computation.
Summary of the invention
(1) Technical problems to be solved
In view of this, the purpose of the present disclosure is to provide a processing method and an accelerator to solve at least one of the above technical problems.
(2) Technical solutions
In a first aspect, an embodiment of the present invention provides a processing device, comprising:
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of a neural network to obtain pruned weights;
an operation unit, configured to train the neural network according to the pruned weights;
wherein the coarse-grained pruning unit is specifically configured to:
select M weights from the weights of the neural network through a sliding window, M being an integer greater than 1, and, when the M weights meet a preset condition, set all or part of the M weights to zero.
Further, the preset condition is:
the information quantity of the M weights is less than a first preset threshold.
Further, the information quantity of the M weights is the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, or the maximum of the M weights; the first preset threshold is a first threshold, a second threshold or a third threshold; and that the information quantity of the M weights is less than the first preset threshold includes:
the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum of the M weights being less than the third threshold.
Further, the coarse-grained pruning unit is configured to: repeatedly perform coarse-grained pruning on the weights of the neural network and train the neural network according to the pruned weights, until no weight meets the preset condition on the premise that the accuracy loss does not exceed a set accuracy.
Further, the set accuracy is x%, where x is between 0 and 5.
Further, the neural network includes a fully connected layer, a convolutional layer and/or a long short-term memory (LSTM) layer, wherein the weights of the fully connected layer form a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, and the fully connected layer has Nin*Nout weights; the weights of the convolutional layer form a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, and the convolutional layer has Nfin*Nfout*Kx*Ky weights; the weights of the LSTM layer are composed of the weights of m fully connected layers, m being an integer greater than 0, the i-th fully connected layer weight being (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i denotes the number of input neurons of the i-th fully connected layer weight, and Nout_i denotes the number of output neurons of the i-th fully connected layer weight. The coarse-grained pruning unit is specifically configured to:
when performing the coarse-grained pruning operation on the weights of the fully connected layer, use a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
slide the sliding window along the direction of Bin with a stride Sin, or along the direction of Bout with a stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout;
select M weights from the Nin*Nout weights through the sliding window, and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bin*Bout;
when performing the coarse-grained pruning operation on the weights of the convolutional layer, use a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky;
slide the sliding window along the direction of Bfin with a stride Sfin, or along the direction of Bfout with a stride Sfout, or along the direction of Bx with a stride Sx, or along the direction of By with a stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By;
select M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window, and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bfin*Bfout*Bx*By;
when performing the coarse-grained pruning operation on the weights of the LSTM layer, use a sliding window of size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i;
slide the sliding window along the direction of Bin_i with a stride Sin_i, or along the direction of Bout_i with a stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i;
select M weights from the Nin_i*Nout_i weights through the sliding window, and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bin_i*Bout_i.
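For illustration only, the following Python/NumPy sketch shows how such a Bin*Bout sliding-window pruning of a fully connected layer's weight matrix might look; the function name, the choice of the arithmetic mean of absolute values as the information quantity, and the threshold value are assumptions, not the normative implementation.

```python
import numpy as np

def coarse_grained_prune_fc(weights, Bin, Bout, Sin, Sout, threshold):
    """Hypothetical sketch: zero Bin x Bout blocks of an (Nin, Nout) weight matrix
    whose information quantity (here: mean absolute value) is below threshold."""
    pruned = weights.copy()
    Nin, Nout = pruned.shape
    for i in range(0, Nin - Bin + 1, Sin):           # slide along the Bin direction
        for j in range(0, Nout - Bout + 1, Sout):    # slide along the Bout direction
            block = pruned[i:i + Bin, j:j + Bout]    # the M = Bin*Bout selected weights
            if np.mean(np.abs(block)) < threshold:   # preset condition on the M weights
                pruned[i:i + Bin, j:j + Bout] = 0.0  # set all of the M weights to zero
    return pruned

# Usage example (assumed shapes and threshold):
# W = np.random.randn(128, 64)
# W_pruned = coarse_grained_prune_fc(W, Bin=4, Bout=4, Sin=4, Sout=4, threshold=0.05)
```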
Further, the operation unit is specifically configured to: perform retraining according to the pruned weights by using a back-propagation algorithm.
Further, the processing device further includes:
a quantization unit, configured to, after the coarse-grained pruning is performed on the weights of the neural network and before the neural network is retrained according to the pruned weights, quantize the weights of the neural network and/or perform a first operation on the weights of the neural network, so as to reduce the number of bits of the weights of the neural network.
In a feasible embodiment, the coarse-grained selection unit is specifically configured to:
generate a neuron index according to the input neurons, the neuron index being used to indicate whether the corresponding input neuron is useful;
perform an AND operation on the neuron index and the position information of the target weights to obtain a neuron flag, each bit of the neuron flag indicating whether the corresponding neuron is selected;
accumulate the bits of the neuron flag to obtain an accumulated string;
perform an AND operation on the accumulated string and the neuron flag to obtain a target string for selecting input neurons;
select input neurons according to the target string, and input the selected input neurons to the above operation unit.
In a feasible embodiment, the operation unit includes a plurality of processing units, each of the plurality of processing units including a weight buffer, a weight decoder module, a weight selector module and a neuron functional unit, wherein
the weight buffer is configured to buffer weights;
the weight decoder module is configured to extract weights from the compressed values according to the codebook and dictionary used in local quantization, and to transmit the weights to the weight selector module;
the weight selector module is configured to obtain an index string, and to obtain the selected weights according to the index string and the weights from the weight decoder module, the selected weights being the weights with which the neuron functional unit performs computation;
the neuron functional unit is configured to obtain the selected input neurons, and to perform operations on the selected weights and the selected input neurons to obtain output neurons.
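As a minimal illustrative sketch (not the patented implementation), locally quantized weights can be stored as dictionary indices that are expanded through a codebook lookup; the names and the data layout below are assumptions.

```python
import numpy as np

def decode_local_quantized_weights(dictionary, codebook):
    """Hypothetical sketch of a weight decoder: 'dictionary' holds per-weight
    indices into 'codebook', which holds the shared quantized weight values."""
    return codebook[dictionary]  # expand each index into its quantized weight

# Usage example (assumed sizes):
# codebook = np.array([-0.5, -0.1, 0.0, 0.1, 0.5], dtype=np.float32)
# dictionary = np.array([[4, 2, 0], [1, 3, 2]], dtype=np.int32)
# decoded = decode_local_quantized_weights(dictionary, codebook)  # shape (2, 3)
```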
In a second aspect, an embodiment of the present invention provides an accelerator, the accelerator comprising:
a storage unit, configured to store the input neurons, output neurons, pruned neural network weights and instructions of a neural network, wherein the neural network is a trained neural network model obtained by training on the pruned weights;
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of the neural network to obtain pruned weights, and to store the pruned weights into the storage unit;
a coarse-grained selection unit, configured to receive input neurons and target weight position information, and to select the neurons corresponding to the target weights, a target weight being a weight whose absolute value is greater than a second preset threshold;
an operation unit, configured to receive the input target weights and their corresponding neurons, perform operations according to the target weights and their corresponding neurons, and transmit the output neurons back to the storage unit;
wherein the storage unit is further configured to store intermediate results generated by the operation unit during computation.
Further, the accelerator further includes:
an instruction control unit, configured to receive the instruction and decode the instruction to obtain control information for controlling the operation unit.
Further, the storage unit is configured to store the target weights and the position information of the target weights.
Further, the accelerator further includes:
a preprocessing unit, configured to preprocess raw data and input the preprocessed data into the storage unit, the raw data including input neurons, output neurons and weights.
Further, the preprocessing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the data.
Further, the accelerator further includes:
an instruction cache unit, configured to cache the instruction, the instruction cache unit being an on-chip cache.
Further, the accelerator further includes:
a target weight cache unit, configured to cache the target weights, the target weight cache unit being an on-chip cache.
Further, the accelerator further includes:
a target weight position cache unit, configured to cache the position information of the target weights, the target weight position cache unit being an on-chip cache.
Further, the accelerator further includes:
an input neuron cache unit, configured to cache the input neurons, the input neuron cache unit being an on-chip cache.
Further, the accelerator further includes:
an output neuron cache unit, configured to cache the output neurons, the output neuron cache unit being an on-chip cache.
Further, the target weight position cache unit is configured to cache the position information of the target weights; the target weight position cache unit maps each connection weight in the input data one-to-one to the corresponding input neuron.
Further, the accelerator further includes:
a direct memory access (DMA) unit, configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the target weight cache unit, the target weight position cache unit, the input neuron cache unit or the output neuron cache unit.
Further, the operation unit includes at least one of the following: a multiplier, configured to multiply first input data by second input data to obtain multiplied data; an adder tree, configured to add third input data stage by stage through the adder tree, or to add the third input data to fourth input data to obtain the summed data; and an activation function unit, configured to obtain output data by performing an activation function operation on fifth data, the activation function being a sigmoid, tanh, relu or softmax function operation.
Further, the operation unit further includes a pooling unit, configured to perform a pooling operation on sixth input data to obtain output data after the pooling operation, the pooling operation including average pooling, max pooling or median pooling.
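For illustration only, the following sketch approximates the dataflow of such an operation unit (multiplier, adder-tree reduction, activation, and an optional pooling stage) in NumPy; the function names and the choice of ReLU and max pooling as defaults are assumptions, not the patented hardware design.

```python
import numpy as np

def operation_unit(selected_neurons, selected_weights):
    """Hypothetical sketch of one output neuron: a multiplier stage, an adder-tree
    style reduction, then a ReLU activation (one of the listed activation options)."""
    products = np.asarray(selected_neurons) * np.asarray(selected_weights)  # multiplier
    partial_sum = products.sum()                                            # adder tree
    return max(float(partial_sum), 0.0)                                     # relu activation

def pooling_unit(output_neurons, window, mode="max"):
    """Hypothetical sketch of the pooling unit over a 1-D run of output neurons."""
    x = np.asarray(output_neurons, dtype=float)
    blocks = x[: x.size // window * window].reshape(-1, window)
    if mode == "average":
        return blocks.mean(axis=1)
    if mode == "median":
        return np.median(blocks, axis=1)
    return blocks.max(axis=1)  # default: max pooling
```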
In a third aspect, an embodiment of the present invention provides an accelerator, the accelerator comprising:
a storage unit, configured to store the input neurons, output neurons, pruned neural network weights and instructions of a neural network, wherein the neural network is a trained neural network model obtained by training on the pruned weights;
a coarse-grained pruning unit, configured to prune the weights of the neural network to obtain pruned weights, and to store the pruned weights into the storage unit;
an operation unit, configured to train the neural network according to the pruned weights to obtain the trained neural network;
a coarse-grained selection unit, configured to receive input neurons and target weight position information, and to select the input neurons corresponding to the target weights, a target weight being a weight whose absolute value is greater than a second preset threshold, wherein the target weights are the trained weights;
the operation unit being further configured to receive the input target weights and their corresponding input neurons, perform operations according to the target weights and their corresponding input neurons, and transmit the output neurons back to the storage unit;
wherein the storage unit may further be configured to store intermediate results generated by the operation unit during computation.
Further, the accelerator further includes:
an instruction control unit, configured to receive the instruction and decode the instruction to obtain control information for controlling the operation unit.
Further, the storage unit is configured to store the target weights and the position information of the target weights.
Further, the accelerator further includes:
a preprocessing unit, configured to preprocess raw data and input the preprocessed data into the storage unit, the raw data including the input neurons, output neurons and weights of the trained neural network.
Further, the preprocessing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the data.
Further, the accelerator further includes:
an instruction cache unit, configured to cache the instruction, the instruction cache unit being an on-chip cache.
Further, the accelerator further includes:
a target weight cache unit, configured to cache the target weights, the target weight cache unit being an on-chip cache.
Further, the accelerator further includes:
a target weight position cache unit, configured to cache the position information of the target weights, the target weight position cache unit being an on-chip cache.
Further, the accelerator further includes:
an input neuron cache unit, configured to cache the input neurons, the input neuron cache unit being an on-chip cache.
Further, the accelerator further includes:
an output neuron cache unit, configured to cache the output neurons, the output neuron cache unit being an on-chip cache.
Further, the target weight position cache unit is configured to cache the position information of the target weights; the target weight position cache unit maps each connection weight in the input data one-to-one to the corresponding input neuron.
Further, the accelerator further includes:
a direct memory access (DMA) unit, configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the target weight cache unit, the target weight position cache unit, the input neuron cache unit or the output neuron cache unit.
Further, the operation unit includes at least one of the following: a multiplier, configured to multiply first input data by second input data to obtain multiplied data; an adder tree, configured to add third input data stage by stage through the adder tree, or to add the third input data to fourth input data to obtain the summed data; and an activation function unit, configured to obtain output data by performing an activation function operation on fifth data, the activation function being a sigmoid, tanh, relu or softmax function operation.
Further, the operation unit further includes a pooling unit, configured to perform a pooling operation on sixth input data to obtain output data after the pooling operation, the pooling operation including average pooling, max pooling or median pooling.
In a fourth aspect, an embodiment of the present invention provides a processing method, comprising:
performing coarse-grained pruning on the weights of a neural network to obtain pruned weights;
training the neural network according to the pruned weights;
wherein performing coarse-grained pruning on the neural network to obtain the pruned weights comprises:
selecting M weights from the weights of the neural network through a sliding window, M being an integer greater than 1;
when the M weights meet a preset condition, setting all or part of the M weights to zero, so as to obtain the pruned weights.
Further, the preset condition is:
the information quantity of the M weights is less than a first preset threshold.
Further, the information quantity of the M weights is the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, or the maximum of the M weights; the first preset threshold is a first threshold, a second threshold or a third threshold; and that the information quantity of the M weights is less than the first preset threshold includes:
the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum of the M weights being less than the third threshold.
Further, the method further includes:
repeating the coarse-grained pruning of the weights of the neural network and the training according to the pruned weights, until no weight meets the preset condition on the premise that the accuracy loss does not exceed a set accuracy.
Further, the set accuracy is x%, where x is between 0 and 5.
Further, the neural network includes a fully connected layer, a convolutional layer and/or a long short-term memory (LSTM) layer, wherein the weights of the fully connected layer form a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, and the fully connected layer has Nin*Nout weights; the weights of the convolutional layer form a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, and the convolutional layer has Nfin*Nfout*Kx*Ky weights; the weights of the LSTM layer are composed of the weights of m fully connected layers, m being an integer greater than 0, the i-th fully connected layer weight being (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i denotes the number of input neurons of the i-th fully connected layer weight, and Nout_i denotes the number of output neurons of the i-th fully connected layer weight. Performing coarse-grained pruning on the neural network includes:
when performing coarse-grained pruning on the weights of the fully connected layer of the neural network, using a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
sliding the sliding window along the direction of Bin with a stride Sin, or along the direction of Bout with a stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout;
selecting M weights from the Nin*Nout weights through the sliding window, and, when the M weights meet the preset condition, setting all or part of the M weights to zero, where M=Bin*Bout;
when performing coarse-grained pruning on the weights of the convolutional layer of the neural network, using a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky;
sliding the sliding window along the direction of Bfin with a stride Sfin, or along the direction of Bfout with a stride Sfout, or along the direction of Bx with a stride Sx, or along the direction of By with a stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By;
selecting M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window, and, when the M weights meet the preset condition, setting all or part of the M weights to zero, where M=Bfin*Bfout*Bx*By;
when performing coarse-grained pruning on the weights of the LSTM layer of the neural network, using a sliding window of size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i; performing coarse-grained pruning on the weights of the LSTM layer of the neural network specifically includes:
sliding the sliding window along the direction of Bin_i with a stride Sin_i, or along the direction of Bout_i with a stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i;
selecting M weights from the Nin_i*Nout_i weights through the sliding window, and, when the M weights meet the preset condition, setting all or part of the M weights to zero, where M=Bin_i*Bout_i.
Further, training the neural network according to the pruned weights is specifically:
retraining the neural network according to the pruned weights by using a back-propagation algorithm.
Further, between the coarse-grained pruning and the retraining of the neural network, the method further includes:
quantizing the weights of the neural network and/or performing a first operation on the weights of the neural network, so as to reduce the number of bits of the weights.
In a fifth aspect, an embodiment of the present invention provides a neural network operation device, the neural network operation device comprising one or more accelerators as described in the first aspect, the second aspect or the third aspect, configured to obtain data to be operated on and control information from other processing devices, execute the specified neural network operation, and transmit the execution result to the other processing devices through an I/O interface;
when the neural network operation device includes a plurality of computing devices, the plurality of computing devices may be connected through a specific structure and transmit data;
wherein the plurality of computing devices are interconnected through a PCIe (peripheral component interconnect express) bus and transmit data, so as to support larger-scale neural network operations; the plurality of computing devices share the same control system or have their own control systems; the plurality of computing devices share memory or have their own memories; and the interconnection mode of the plurality of computing devices is any interconnection topology.
In a sixth aspect, an embodiment of the present invention provides a neural network chip, the neural network chip including the processing device of the first aspect, the accelerator of the second aspect, the accelerator of the third aspect and/or the neural network operation device of the fifth aspect.
In a seventh aspect, an embodiment of the present invention provides a chip package structure, including the neural network chip of the sixth aspect.
In an eighth aspect, an embodiment of the present invention provides a board, including the neural network chip of the sixth aspect or the chip package structure of the seventh aspect.
In a ninth aspect, an embodiment of the present invention provides an electronic device, including the board of the eighth aspect.
Further, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a cloud server, a webcam, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
Further, the vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.
(3) Beneficial effects
Compared with conventional methods, the processing method of the present disclosure performs coarse-grained pruning on the weights of a neural network, which makes the sparse neural network more regular, facilitates hardware acceleration, and at the same time reduces the storage space of the target weight positions, where a target weight is a weight whose absolute value is greater than or equal to the second preset threshold.
The processing device of the present disclosure can implement the processing method of the present disclosure: the coarse-grained pruning unit performs coarse-grained pruning on the neural network, and the operation unit retrains the pruned neural network.
The accelerator of the present disclosure can accelerate the processing of a neural network after coarse-grained pruning, fully exploit the characteristic of coarse-grained sparsity, reduce memory accesses and reduce the amount of computation, thereby obtaining a speedup and reducing energy consumption.
The storage unit of the accelerator of the present disclosure, by storing the weights in the form of target weights together with target weight position information, can reduce the storage overhead and the memory-access overhead; the coarse-grained selection unit can select, according to the target weight position information, the neurons that need to participate in the computation, thereby reducing the amount of computation. By using dedicated SIMD instructions for coarse-grained sparse multi-layer artificial neural network operations and a customized operation unit, the problems of insufficient operational performance of CPUs and GPUs and large front-end decoding overhead are solved, and the support for multi-layer artificial neural network operation algorithms is effectively improved. By using dedicated on-chip caches for multi-layer artificial neural network operation algorithms, the reusability of input neurons and weight data is fully exploited, repeated reading of these data from memory is avoided, the memory-access bandwidth is reduced, and the problem of memory bandwidth becoming a performance bottleneck for multi-layer artificial neural network operations and their training algorithms is avoided.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a processing device for performing coarse-grained pruning and sparsification on a neural network according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of coarse-grained pruning of a fully connected layer of a neural network according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of coarse-grained pruning of a convolutional layer of a neural network according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an accelerator according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of another accelerator according to an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of the working process of the coarse-grained selection unit;
Fig. 7 is a schematic structural diagram of a processing unit according to an embodiment of the present disclosure;
Fig. 8a is a schematic diagram of coarse-grained selection according to an embodiment of the present disclosure;
Fig. 8b is a schematic diagram of coarse-grained selection according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of another accelerator according to an embodiment of the present disclosure;
Fig. 10 is a schematic structural diagram of another accelerator according to an embodiment of the present disclosure;
Fig. 11 is a schematic diagram of a specific embodiment of a processing method according to an embodiment of the present disclosure;
Fig. 12 is a schematic structural diagram of a combined processing device according to an embodiment of the present disclosure;
Fig. 13 is a schematic structural diagram of another combined processing device according to an embodiment of the present disclosure;
Fig. 14 is a schematic structural diagram of a neural network processor board according to an embodiment of the present disclosure;
Fig. 15 is a schematic diagram of a chip package structure according to an embodiment of the present disclosure;
Fig. 16 is a schematic diagram of another chip package structure according to an embodiment of the present disclosure;
Fig. 17 is a schematic diagram of another chip package structure according to an embodiment of the present disclosure.
Specific embodiments
In order to make the purposes, technical solutions and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
All modules of the embodiments of the present disclosure may be hardware structures; physical implementations of the hardware structures include but are not limited to physical devices, and the physical devices include but are not limited to transistors, memristors and DNA computers.
It should be noted that "first", "second", "third" and the like used in the present disclosure are only used to distinguish different objects and do not imply any particular order among these objects.
It should be noted that coarse-grained pruning (or coarse-grained sparsification) refers to obtaining at least two pieces of data (weights or neurons) and, when the at least two pieces of data meet a preset condition, setting some or all of the at least two pieces of data to zero.
According to the basic concept of the present disclosure, a processing method, a processing device and an accelerator for performing coarse-grained pruning and sparsification on a neural network are provided, so as to reduce weight storage and the amount of computation.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a processing device for performing coarse-grained pruning and sparsification on a neural network according to an embodiment of the present invention. As shown in Fig. 1, the processing device includes:
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of the neural network to obtain pruned weights.
Specifically, the coarse-grained pruning unit is specifically configured to:
select M weights from the weights of the neural network through a sliding window, M being an integer greater than 1, and, when the M weights meet a preset condition, set all or part of the M weights to zero.
The preset condition is:
the information quantity of the M weights meets a preset judgment condition.
As an optional implementation, the preset judgment condition includes a threshold judgment condition, where the threshold judgment condition may include one or more of: less than a given threshold, less than or equal to a given threshold, greater than a given threshold, greater than or equal to a given threshold, within a given value range, or outside a given value range.
Specifically, the information quantity of the M weights is less than a given threshold, where the information quantity of the M weights includes but is not limited to the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, and the maximum of the absolute values of the M weights: the arithmetic mean of the absolute values of the M weights is less than a first threshold, or the geometric mean of the absolute values of the M weights is less than a second threshold, or the maximum of the absolute values of the M weights is less than a third threshold. As for the respective choices of the first threshold, the second threshold and the third threshold, those skilled in the art may preset them according to circumstances, obtain them by changing the input parameters of a preset formula, or obtain them through machine learning. The present disclosure does not specifically limit the manner in which the first threshold, the second threshold and the third threshold are obtained.
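For illustration, a minimal sketch of such a threshold judgment on the M weights of one sliding window might look as follows; the function name and the choice of "less than a given threshold" as the judgment condition are assumptions for the example.

```python
import numpy as np

def meets_preset_condition(block, threshold, measure="arithmetic_mean"):
    """Hypothetical sketch: decide whether the M weights in 'block' should be pruned,
    using one of the listed information-quantity measures of their absolute values."""
    abs_w = np.abs(block).ravel()
    if measure == "arithmetic_mean":
        info = abs_w.mean()
    elif measure == "geometric_mean":
        info = np.exp(np.mean(np.log(abs_w + 1e-12)))  # small epsilon avoids log(0)
    else:  # "max"
        info = abs_w.max()
    return info < threshold
```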
As an optional implementation, the preset judgment condition includes a function-mapping judgment condition, that is, judging whether the M weights meet a given condition after a functional transformation.
Further, the neural network includes fully connected layers, convolutional layers and long short-term memory (LSTM) layers, wherein the weights of a fully connected layer form a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, and the fully connected layer has Nin*Nout weights; the weights of a convolutional layer form a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, and the convolutional layer has Nfin*Nfout*Kx*Ky weights; the weights of an LSTM layer are composed of the weights of m fully connected layers, m being an integer greater than 0, the i-th fully connected layer weight being (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i denotes the number of input neurons of the i-th fully connected layer weight, and Nout_i denotes the number of output neurons of the i-th fully connected layer weight. The coarse-grained pruning unit is specifically configured to:
when performing the coarse-grained pruning operation on the weights of the fully connected layer, use a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
slide the sliding window along the direction of Bin with a stride Sin, or along the direction of Bout with a stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout;
select M weights from the Nin*Nout weights through the sliding window, and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bin*Bout; the detailed process is shown in Fig. 2;
when performing the coarse-grained pruning operation on the weights of the convolutional layer, use a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky;
slide the sliding window along the direction of Bfin with a stride Sfin, or along the direction of Bfout with a stride Sfout, or along the direction of Bx with a stride Sx, or along the direction of By with a stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By;
select M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window, and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bfin*Bfout*Bx*By; the detailed process is shown in Fig. 3;
when performing the coarse-grained pruning operation on the weights of the LSTM layer, use a sliding window of size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i;
slide the sliding window along the direction of Bin_i with a stride Sin_i, or along the direction of Bout_i with a stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i;
select M weights from the Nin_i*Nout_i weights through the sliding window, and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bin_i*Bout_i.
Further, the M weights are the weights contained in the sliding window during its movement. The coarse-grained pruning unit setting all or part of the M weights to zero includes:
the coarse-grained pruning unit sets all the weights in the sliding window (i.e. the M weights) to zero; or sets the weights on the anti-diagonal of the sliding window to zero; or sets a part of the weights in the middle of the sliding window to zero, for example, when the size of the sliding window is 5*5, the coarse-grained pruning unit sets the 3*3 weights in the middle of the 5*5 sliding window to zero; or randomly selects at least one weight from the sliding window and sets it to zero. Operating in this way is conducive to the accuracy of the subsequent training operation.
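A small illustrative sketch of these zeroing strategies on one sliding-window block follows; the 5*5 window and middle 3*3 case follow the text above, and the function name and the assumption of a square window are illustrative only.

```python
import numpy as np

def zero_block(block, strategy="all"):
    """Hypothetical sketch: zero all or part of the M weights inside one sliding window
    (assumes a square window for the anti-diagonal and middle strategies)."""
    out = block.copy()
    n = out.shape[0]
    if strategy == "all":
        out[:] = 0.0                                   # set all M weights to zero
    elif strategy == "anti_diagonal":
        for r in range(n):
            out[r, n - 1 - r] = 0.0                    # zero the anti-diagonal weights
    elif strategy == "middle":                         # e.g. the middle 3*3 of a 5*5 window
        c = n // 2
        out[c - 1:c + 2, c - 1:c + 2] = 0.0
    else:                                              # "random": zero at least one weight
        idx = np.unravel_index(np.random.randint(out.size), out.shape)
        out[idx] = 0.0
    return out
```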
Further, the coarse-grained pruning unit and the operation unit are configured to repeat performing coarse-grained pruning on the neural network and training it according to the pruned weights, until no weight meets the preset condition on the premise that the accuracy loss does not exceed a predetermined accuracy.
The set accuracy is x%, where x is a number greater than 0 and less than 100; x may be chosen differently for different neural networks and different applications.
In a preferred embodiment, the value range of x is 0 to 5.
Further, the processing device further includes:
a quantization unit, configured to, after the coarse-grained pruning unit performs coarse-grained pruning on the weights of the neural network and before the operation unit trains the neural network according to the pruned weights, quantize the weights of the neural network and/or perform a first operation on the weights of the neural network, so as to reduce the number of bits of the weights.
In a feasible embodiment, quantizing the weights of the neural network is specifically: replacing a weight W1 that meets a condition with a weight W0, the condition being |W1 - W0| <= ∇, where ∇ is a preset value.
The first operation may be reducing the value range of the data format corresponding to the weights or reducing the precision range of the data format corresponding to the weights.
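A minimal sketch of this kind of weight quantization (mapping each weight to a nearby shared value when it lies within a preset distance) is shown below; the codebook, the tolerance and the nearest-neighbour rule are illustrative assumptions.

```python
import numpy as np

def quantize_weights(weights, codebook, tolerance):
    """Hypothetical sketch: replace each weight W1 by a codebook value W0
    whenever |W1 - W0| <= tolerance, otherwise leave it unchanged."""
    quantized = weights.copy()
    flat = quantized.ravel()
    for k, w1 in enumerate(flat):
        w0 = codebook[np.argmin(np.abs(codebook - w1))]  # closest codebook value
        if abs(w1 - w0) <= tolerance:                     # condition |W1 - W0| <= preset value
            flat[k] = w0
    return quantized

# Usage example (assumed values):
# codebook = np.array([-0.25, 0.0, 0.25])
# Wq = quantize_weights(np.random.randn(4, 4).astype(np.float32), codebook, tolerance=0.1)
```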
Further, the operation unit is specifically configured to:
retrain the neural network according to the pruned weights by using a back-propagation algorithm.
Specifically, the operation unit may be configured to execute a neural network back-training algorithm: it receives the pruned neural network and trains it using the back-propagation algorithm, keeping the pruned weights at 0 throughout training. The operation unit transmits the trained neural network to the coarse-grained pruning unit for further pruning operations, or outputs it directly.
Specifically, the operation unit performs backward computation on each layer of the neural network in the order opposite to the forward computation, and finally updates the weights with the computed weight gradients; this is one iteration of neural network training, and the whole training process needs to repeat this process many times. The backward computation of each layer requires two parts of computation: one part computes the weight gradients using the output neuron gradients and the input neurons, and the other part computes the input neuron gradients using the output neuron gradients and the weights (these serve as the output neuron gradients of the next layer in the backward computation so that it can perform its own backward computation). After the backward computation of the neural network has been executed, the weight gradients of every layer have been computed, and the operation unit updates the weights according to the weight gradients.
It should be pointed out that, during the training of the neural network by the operation unit, the weights that have been set to 0 always remain 0.
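An illustrative sketch of such masked retraining (a gradient update that keeps pruned weights at zero) is given below; the learning rate, the mask construction from zero-valued weights, and the plain gradient-descent update rule are assumptions made for the example.

```python
import numpy as np

def retrain_step(weights, weight_grad, lr=0.01):
    """Hypothetical sketch of one retraining update: pruned positions (weight == 0)
    are masked out so they stay at zero throughout back-propagation training."""
    mask = (weights != 0.0).astype(weights.dtype)   # 1 where the weight survived pruning
    return weights - lr * weight_grad * mask        # update only the unpruned weights
```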
In the solution of the embodiment of the present invention, the coarse-grained pruning unit of the processing device performs the coarse-grained pruning operation on the weights of the neural network to obtain the pruned weights, and the operation unit retrains the neural network according to the pruned weights. By performing coarse-grained pruning on the weights of the neural network, subsequent storage of and accesses to the values are reduced, the subsequent amount of computation is also reduced, the operation efficiency is improved, and the power consumption is reduced.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an accelerator according to an embodiment of the present invention. As shown in Fig. 4, the accelerator includes:
a storage unit, configured to store the input neurons, output neurons, weights and instructions of a neural network;
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of the neural network to obtain pruned weights, and to store the pruned weights and the target weight position information into the storage unit.
It should be noted that the specific process in which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the related description of the embodiment shown in Fig. 1 and is not repeated here.
The accelerator further includes an operation unit, configured to train the neural network according to the pruned weights;
and a coarse-grained selection unit, configured to receive input neurons and target weight position information, and to select the target weights and their corresponding input neurons;
wherein a target weight is a weight whose absolute value is greater than the second preset threshold.
Further, the coarse-grained selection unit only selects the target weights and their corresponding neurons and transmits them to the operation unit.
The operation unit is further configured to receive the input target weights and their corresponding neurons, complete the neural network operation through a multiply-accumulate operation unit according to the target weights and their corresponding neurons to obtain output neurons, and transmit the output neurons back to the storage unit.
The storage unit is further configured to store intermediate results generated by the operation unit during the neural network operation.
Further, the accelerator further includes:
an instruction control unit, configured to receive the instruction and generate control information after decoding the instruction, so as to control the coarse-grained selection unit to perform the selection operation and the operation unit to perform the computation operation.
Further, when the storage unit stores weights, only the target weights and the position information of the target weights are stored.
It should be pointed out that the storage unit, the coarse-grained pruning unit, the instruction control unit, the coarse-grained selection unit and the operation unit are all physical hardware devices, not functional software units.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of another accelerator according to an embodiment of the present invention. As shown in Fig. 5, the accelerator further includes: a preprocessing unit, a storage unit, a direct memory access (DMA) unit, an instruction cache unit, an instruction control unit, a coarse-grained pruning unit, a first cache unit, a second cache unit, a third cache unit, a coarse-grained selection unit, an operation unit and a fourth cache unit.
The preprocessing unit is configured to preprocess raw data and input the preprocessed data into the storage unit, the raw data including input neurons, output neurons and weights; the preprocessing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the data.
The storage unit is configured to store the neurons, weights and instructions of the neural network; when weights are stored, only the target weights and the position information of the target weights are stored.
The DMA unit is configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the first cache unit, the second cache unit, the third cache unit or the fourth cache unit.
The coarse-grained pruning unit is configured to obtain the weights of the neural network from the storage unit through the DMA unit, and then perform coarse-grained pruning on the weights of the neural network to obtain pruned weights; the coarse-grained pruning unit stores the pruned weights into the first cache unit.
It should be noted that the specific process in which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the related description of the embodiment shown in Fig. 1 and is not repeated here.
The instruction cache unit is configured to cache the instruction.
The first cache unit is configured to cache target weights, a target weight being a weight whose absolute value is greater than the second preset threshold.
The second cache unit is configured to cache target weight position data; the target weight position cache unit maps each connection weight in the input data one-to-one to the corresponding input neuron.
Optionally, one one-to-one correspondence method used by the target weight position cache unit is to use 1 to indicate that there is a weight connection between an output neuron and an input neuron and 0 to indicate that there is no weight connection between them, so that the connection states of each output neuron with all the input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron.
Optionally, another one-to-one correspondence method used by the target weight position cache unit is to use 1 to indicate that there is a weight connection between an input neuron and an output neuron and 0 to indicate that there is no weight connection between them, so that the connection states of each input neuron with all the output neurons form a string of 0s and 1s that represents the connection relationship of that input neuron.
Optionally, another one-to-one correspondence method used by the target weight position cache unit is, for one output neuron, to record the distance from the position of the input neuron of its first connection to the first input neuron, the distance from the input neuron of its second connection to the previous input neuron, the distance from the input neuron of its third connection to the previous input neuron, and so on, until all the inputs of that output are exhausted, so as to represent the connection relationship of that output.
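A minimal sketch of the two families of position encodings described above (a 0/1 connection string per output neuron, and a first-position-plus-gaps distance encoding) is shown below; the list layouts and function names are assumptions.

```python
def encode_bitmask(connected, num_inputs):
    """Hypothetical sketch: 0/1 string marking which input neurons of one output
    neuron carry a (target) weight connection."""
    return [1 if i in connected else 0 for i in range(num_inputs)]

def encode_distances(connected):
    """Hypothetical sketch: distance of the first connected input from input 0,
    then the gap between each connected input and the previous one."""
    connected = sorted(connected)
    return [connected[0]] + [b - a for a, b in zip(connected, connected[1:])]

# Usage example: inputs 2, 3 and 7 of an 8-input output neuron are connected.
# encode_bitmask({2, 3, 7}, 8)  -> [0, 0, 1, 1, 0, 0, 0, 1]
# encode_distances({2, 3, 7})   -> [2, 1, 4]
```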
The third cache unit is configured to cache the input neurons that are input to the coarse-grained selection unit.
The fourth cache unit is configured to cache the output neurons output by the arithmetic unit and the output neuron gradients obtained from the output neurons.
The instruction control unit is configured to receive the instructions from the instruction cache unit, decode them into control information, and use the control information to control the arithmetic unit to perform the computation.
The coarse-grained selection unit is configured to receive the input neurons and the target weight location information and, according to the target weight location information, to select the input neurons that need to take part in the computation. The coarse-grained selection unit selects only the neurons corresponding to target weights and transmits them to the arithmetic unit.
The arithmetic unit is configured to operate on the input neurons and the target weights according to the control information transmitted by the instruction control unit so as to obtain output neurons, which are stored into the fourth cache unit; it further obtains output neuron gradients from the output neurons and stores the output neuron gradients into the fourth cache unit.
Specifically, the coarse-grained selection unit selects, according to the location information of the target weights, the input neurons corresponding to the target weights from the input neurons provided by the input neuron cache unit, and then transmits the target weights and their corresponding input neurons to the arithmetic unit.
In one embodiment, the arithmetic unit may include multiple processing units, so that different output neurons are computed in parallel and the obtained output neurons are stored into the output neuron cache unit. Each of the processing units contains a local weight selector module for further handling dynamically coarse-grained sparse data, while the coarse-grained selection unit handles static sparsity by selecting the required input neurons. The specific working process of the coarse-grained selection unit is described with reference to Fig. 6.
Referring to Fig. 6, the coarse-grained selection unit first generates a neuron index from the input neuron values, each index indicating whether the corresponding neuron is useful (i.e. non-zero). Secondly, it combines the generated neuron index with the weight location information (i.e. the weight index) by an AND operation to obtain neuron flags, each of which indicates whether the corresponding neuron is selected. Thirdly, it accumulates the flags to obtain a cumulative string and then performs an AND operation between the cumulative string and the neuron flags to produce the target string used to select the input neurons. Finally, the coarse-grained selection unit uses the target string to select the input neurons actually used for the subsequent computation in the arithmetic unit. At the same time, it generates an index string from the target string and the accumulated weight index string (i.e. the weight location information) and passes it to the arithmetic unit.
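As a reading aid only, the following Python sketch gives one software interpretation of the Fig. 6 selection flow, assuming dense 0/1 arrays for the neuron index and the weight index; the cumulative string is modelled here with a prefix sum, and all names and values are illustrative rather than taken from the patent.

import numpy as np

def select_neurons(input_neurons, weight_index):
    """Sketch of the coarse-grained selection flow of Fig. 6 (names illustrative).

    input_neurons : 1-D array of neuron values (dynamic sparsity, e.g. zeros from ReLU).
    weight_index  : 1-D 0/1 array marking positions that still carry weights
                    (static sparsity from coarse-grained pruning).
    """
    neuron_index = (input_neurons != 0).astype(np.int64)   # 1 where the neuron is useful
    flags = neuron_index & weight_index                     # AND of the two indices
    cumulative = np.cumsum(flags)                           # running count of selections
    selected_values = input_neurons[flags == 1]             # neurons passed to the ALU
    index_string = cumulative[flags == 1] - 1               # which compacted slot each one uses
    return selected_values, index_string

neurons = np.array([0.5, 0.0, 1.2, 0.0, 0.3, 0.7, 0.0, 0.0])
w_index = np.array([1, 1, 0, 0, 1, 1, 0, 0])
vals, idx = select_neurons(neurons, w_index)
print(vals)  # [0.5 0.3 0.7]  (position 1 carries a weight but the neuron is zero, so it is dropped)
print(idx)   # [0 1 2]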
The arithmetic unit mainly handles dynamic sparsity and efficiently executes all operations of the neural network. The neuron functional part includes multiple processing units. As shown in Fig. 7, each processing unit contains a weight buffer, a weight decoder module, a weight selector module and the neuron functional unit of the processing unit. Each processing unit loads weights from its local weight buffer, because the weights are independent between different output neurons and therefore also independent between different processing units. A weight decoder module with a look-up table is placed next to the weight buffer in order to extract the actual weights from the codebook and the compressed values in the dictionary used by the local quantization.
As shown in Fig. 8a, the weight selector module receives the index string and the weights from the weight decoder module in order to select the weights that are useful for the computation in the neuron functional unit of the processing unit. The neuron functional unit of each processing unit consists of Tm multipliers, an adder tree and a nonlinear function module, as shown in Fig. 8b. The neuron functional unit maps the neural network onto the processing units by a time-sharing method, i.e. each processing unit processes output neurons in parallel, and an output neuron that needs M multiplications takes M/Tm cycles to compute, because Tm multiplications can be performed in one cycle. The neuron functional unit then collects and sums the outputs of all processing units for further computation or for later storage into the output neuron cache unit.
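A rough software analogue of one processing unit may help in reading Figs. 7 and 8b; the codebook, the code values and Tm below are invented assumptions, and each loop iteration stands in for one hardware cycle of Tm multiplications followed by the adder tree.

import numpy as np

def processing_unit(neurons, weight_codes, codebook, Tm=4):
    """Sketch of one processing unit (names and sizes illustrative).

    weight_codes : quantization codes stored compactly in the local weight buffer.
    codebook     : local-quantization dictionary used by the weight decoder module.
    Tm           : number of multipliers; an output needing M products takes
                   about M/Tm cycles because only Tm multiplications fit per cycle.
    """
    weights = codebook[weight_codes]             # weight decoder: code -> actual weight
    M = len(weights)
    acc = 0.0
    for start in range(0, M, Tm):                # one iteration ~ one hardware cycle
        chunk = neurons[start:start + Tm] * weights[start:start + Tm]
        acc += chunk.sum()                       # adder tree reduces the Tm products
    return max(acc, 0.0)                         # nonlinear function module (ReLU here)

codebook = np.array([0.0, 0.25, 0.5, 1.0])
codes = np.array([3, 1, 2, 0, 1, 3, 2, 1])       # 8 products -> 2 cycles with Tm = 4
neurons = np.arange(1.0, 9.0)
print(processing_unit(neurons, codes, codebook, Tm=4))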
The weight selector module selects only the weights that are needed once dynamic sparsity is taken into account, because the weight buffer stores the weights compactly to exploit static sparsity. Based on the index string containing the weight location information, the selector further filters the weights and selects those required for the computation, as shown in Fig. 8a. Each processing unit works on different output neurons and therefore uses different weights; for this reason, the weight selector module and the weight buffer are implemented inside the processing unit to avoid high bandwidth and latency.
It should be pointed out that dynamic sparsity generally refers to the sparsity of the input neurons, because the input neuron values change as the input changes. Dynamic sparsity mainly comes from the ReLU excitation function, since this operation sets input neurons whose value is below the threshold to 0. Static sparsity generally refers to weight sparsity, because once the weights have been pruned the topology no longer changes.
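A short illustration of the two kinds of sparsity, under the assumption of a ReLU activation and an already pruned weight vector (all values invented for the example):

import numpy as np

# Dynamic sparsity: zeros in the input neurons change with every input, e.g. after ReLU.
x = np.array([-0.3, 0.8, -1.2, 0.4])
relu_out = np.maximum(x, 0.0)                     # negative values are zeroed for THIS input only

# Static sparsity: zeros in the weights are fixed once pruning is done;
# the topology no longer changes between inputs.
pruned_weights = np.array([0.7, 0.0, 0.0, 0.5])   # positions 1 and 2 pruned permanently
print(relu_out, pruned_weights)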
The instruction cache unit, the input neuron cache unit, the target weight cache unit, the target weight position cache unit and the output neuron cache unit are all on-chip caches.
Specifically, the arithmetic unit includes, but is not limited to, three parts: the first part is a multiplier, the second part is an adder tree, and the third part is an activation function unit. The first part multiplies the first input data (in1) by the second input data (in2) to obtain the output after multiplication (out1), i.e. out1 = in1 * in2. The second part adds the third input data (in3) step by step through the adder tree to obtain the second output data (out2), where in3 is a vector of length N and N is greater than 1, i.e. out2 = in3[1] + in3[2] + ... + in3[N]; and/or it accumulates the third input data (in3) through the adder tree and then adds the fourth input data (in4), i.e. out2 = in3[1] + in3[2] + ... + in3[N] + in4; or it adds the third input data (in3) to the fourth input data (in4), i.e. out2 = in3 + in4. The third part applies an activation function (active) to the fifth input data (in5) to obtain the activated output data (out3), i.e. out3 = active(in5); the activation function active can be sigmoid, tanh, relu, softmax, etc. Besides the activation operation, the third part can also implement other nonlinear functions, obtaining output data from input data through a function f, i.e. out = f(in).
Further, the arithmetic unit may also include a pooling unit, which obtains the output data (out) after the pooling operation from the input data (in), i.e. out = pool(in), where pool denotes the pooling operation. The pooling operation includes, but is not limited to, average pooling, max pooling and median pooling, and the input data in are the data in the pooling kernel associated with the output out.
The operation performed by the arithmetic unit thus consists of several parts: the first part multiplies the first input data by the second input data to obtain the multiplied data; the second part performs the adder tree operation, adding the third input data step by step through the adder tree, or adding the third input data to the fourth input data, to obtain output data; the third part performs the activation function operation, applying the activation function (active) to the fifth input data to obtain output data. These parts can be freely combined so as to realize operations with various different functions.
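The freely combinable parts described above can be sketched as follows; the concrete activation (tanh) and pooling (max) choices and all input values are only example assumptions.

import numpy as np

def arithmetic_unit(in1, in2, in3, in4, in5, active=np.tanh, pool=np.max):
    """Sketch of the arithmetic unit's freely combinable parts (illustrative only)."""
    out1 = in1 * in2                          # part 1: elementwise multiplication
    out2 = np.sum(in3) + in4                  # part 2: adder tree over in3, then + in4
    out3 = active(in5)                        # part 3: activation (sigmoid/tanh/relu/...)
    pooled = pool(in1)                        # optional pooling unit (max/avg/median)
    return out1, out2, out3, pooled

out1, out2, out3, pooled = arithmetic_unit(
    in1=np.array([1.0, 2.0]), in2=np.array([3.0, 4.0]),
    in3=np.array([1.0, 2.0, 3.0]), in4=0.5, in5=np.array([-1.0, 0.0, 1.0]))
print(out1, out2, out3, pooled)   # [3. 8.]  6.5  [-0.76.. 0. 0.76..]  2.0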
It should be pointed out that the pre-processing unit, the storage unit, the DMA unit, the coarse-grained pruning unit, the instruction cache unit, the instruction control unit, the first cache unit, the second cache unit, the third cache unit, the fourth cache unit, the coarse-grained selection unit and the arithmetic unit are all physical hardware devices, not functional software units.
Referring to Fig. 9, Fig. 9 is a structural schematic diagram of another accelerator provided in an embodiment of the present invention. As shown in Fig. 9, the accelerator further includes: a pre-processing unit, a storage unit, a DMA unit, an instruction cache unit, an instruction control unit, a coarse-grained pruning unit, a target weight cache unit, a target weight position cache unit, an input neuron cache unit, a coarse-grained selection unit, an arithmetic unit, an output neuron cache unit and an output neuron gradient cache unit.
The pre-processing unit is configured to pre-process raw data and to write the pre-processed data into the storage unit; the raw data include input neurons, output neurons and weights. The pre-processing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the data.
The storage unit is configured to store the neurons, weights and instructions of the neural network. When weights are stored, only the target weights and the location information of the target weights are stored.
The DMA unit is configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the target weight position cache unit, the input neuron cache unit or the output neuron cache unit.
The coarse-grained pruning unit is configured to obtain the weights of the neural network from the storage unit through the DMA unit and then to perform coarse-grained pruning on the weights so as to obtain the pruned weights. The coarse-grained pruning unit stores the pruned weights into the target weight cache unit.
It should be noted that the specific process by which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the related description of the embodiment shown in Fig. 1 and is not repeated here.
The instruction cache unit is configured to cache the instructions;
The target weight cache unit is configured to cache the target weights;
The target weight position cache unit is configured to cache the target weight position data; it maps each connection weight in the input data one-to-one to the corresponding input neuron.
Optionally, one one-to-one caching method of the target weight position cache unit is to use 1 to indicate that there is a weight connection between an output neuron and an input neuron and 0 to indicate that there is no weight connection, the connection states of each output neuron with all input neurons forming a string of 0s and 1s that represents the connection relationship of that output neuron.
Optionally, another one-to-one caching method is to use 1 to indicate that there is a weight connection between an input neuron and an output neuron and 0 to indicate that there is no weight connection, the connection states of each input neuron with all output neurons forming a string of 0s and 1s that represents the connection relationship of that input neuron.
Optionally, a further one-to-one caching method is, for one group of outputs, to record the distance from the position of the input neuron of the first connection to the first input neuron, the distance from the input neuron of the second connection to the previous input neuron, the distance from the input neuron of the third connection to the previous input neuron, and so on, until all inputs of that output are exhausted, so as to represent the connection relationship of that output.
The input neuron cache unit is configured to cache the input neurons that are input to the coarse-grained selection unit.
The output neuron cache unit is configured to cache the output neurons output by the arithmetic unit.
The output neuron gradient cache unit is configured to cache the gradients of the output neurons.
The instruction control unit is configured to receive the instructions from the instruction cache unit, decode them into control information, and use the control information to control the arithmetic unit to perform the computation.
The coarse-grained selection unit is configured to receive the input neurons and the target weight location information and, according to the target weight location information, to select the input neurons that need to take part in the computation. The coarse-grained selection unit selects only the neurons corresponding to target weights and transmits them to the arithmetic unit.
The arithmetic unit is configured to operate on the target weights obtained from the target weight cache unit and their corresponding input neurons so as to obtain output neurons, which are cached into the output neuron cache unit.
The arithmetic unit is further configured to perform training according to the output neuron gradients and the pruned weights.
It should be noted that the functions of the units of the above accelerator can be found in the related description of the embodiment shown in Fig. 5 and are not repeated here.
It should be pointed out that the pre-processing unit, the storage unit, the DMA unit, the coarse-grained pruning unit, the instruction cache unit, the instruction control unit, the target weight cache unit, the target weight position cache unit, the input neuron cache unit, the output neuron gradient cache unit, the output neuron cache unit, the coarse-grained selection unit and the arithmetic unit are all physical hardware devices, not functional software units.
Referring to Figure 10, Figure 10 is a structural schematic diagram of another accelerator provided in an embodiment of the present invention. As shown in Figure 10, the accelerator further includes:
a pre-processing unit, a storage unit, a DMA unit, an instruction cache unit, an instruction control unit, a coarse-grained pruning unit, a target weight cache unit, a target weight position cache unit, an input neuron cache unit, a coarse-grained selection unit, an arithmetic unit and an output neuron cache unit.
The pre-processing unit is configured to pre-process raw data and to write the pre-processed data into the storage unit; the raw data include input neurons, output neurons and weights. The pre-processing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the data.
The storage unit is configured to store the neurons, weights and instructions of the neural network. When weights are stored, only the target weights and the location information of the target weights are stored.
The DMA unit is configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the target weight position cache unit, the input neuron cache unit or the output neuron cache unit.
The coarse-grained pruning unit is configured to obtain the weights of the neural network from the storage unit through the DMA unit and then to perform coarse-grained pruning on the weights so as to obtain the pruned weights. The coarse-grained pruning unit stores the pruned weights into the target weight cache unit.
It should be noted that the specific process by which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the related description of the embodiment shown in Fig. 1 and is not repeated here.
The instruction cache unit is configured to cache the instructions;
The target weight cache unit is configured to cache the target weights;
The target weight position cache unit is configured to cache the target weight position data; it maps each connection weight in the input data one-to-one to the corresponding input neuron.
Optionally, one one-to-one caching method of the target weight position cache unit is to use 1 to indicate that there is a weight connection between an output neuron and an input neuron and 0 to indicate that there is no weight connection, the connection states of each output neuron with all input neurons forming a string of 0s and 1s that represents the connection relationship of that output neuron.
Optionally, another one-to-one caching method is to use 1 to indicate that there is a weight connection between an input neuron and an output neuron and 0 to indicate that there is no weight connection, the connection states of each input neuron with all output neurons forming a string of 0s and 1s that represents the connection relationship of that input neuron.
Optionally, a further one-to-one caching method is, for one group of outputs, to record the distance from the position of the input neuron of the first connection to the first input neuron, the distance from the input neuron of the second connection to the previous input neuron, the distance from the input neuron of the third connection to the previous input neuron, and so on, until all inputs of that output are exhausted, so as to represent the connection relationship of that output.
The input neuron cache unit is configured to cache the input neurons that are input to the coarse-grained selection unit.
The output neuron cache unit is configured to cache the output neurons output by the arithmetic unit.
The output neuron gradient cache unit is configured to cache the gradients of the output neurons.
The instruction control unit is configured to receive the instructions from the instruction cache unit, decode them into control information, and use the control information to control the arithmetic unit to perform the computation.
The coarse-grained selection unit is configured to receive the input neurons and the target weight location information and, according to the target weight location information, to select the input neurons that need to take part in the computation. The coarse-grained selection unit selects only the input neurons corresponding to target weights and transmits them to the arithmetic unit.
The arithmetic unit is configured to operate on the target weights obtained from the target weight cache unit and their corresponding input neurons so as to obtain output neurons, which are cached into the output neuron cache unit.
It should be noted that the functions of the units of the above accelerator can be found in the related description of the embodiment shown in Fig. 5 and are not repeated here.
It should be pointed out that the pre-processing unit, the storage unit, the DMA unit, the coarse-grained pruning unit, the instruction cache unit, the instruction control unit, the target weight cache unit, the target weight position cache unit, the input neuron cache unit, the output neuron cache unit, the output neuron gradient cache unit, the coarse-grained selection unit and the arithmetic unit are all physical hardware devices, not functional software units.
Hereinafter, neural network processor embodiments are enumerated to describe the processing method of the disclosure in detail. It should be understood that they are not intended to limit the disclosure; all equivalent structures or equivalent process transformations made on the basis of these specific embodiments, or their direct or indirect use in other related technical fields, are likewise included within the protection scope of the disclosure.
Referring to Figure 11, Figure 11 is a schematic diagram of a specific embodiment of a processing method provided in an embodiment of the present invention. Figure 11 shows the result of a fully connected layer of a neural network after coarse-grained pruning. The fully connected layer has 8 input neurons n1 to n8 and 3 output neurons o1 to o3. The weights between the four input neurons n3, n4, n7, n8 and the three output neurons o1, o2, o3 are set to zero by coarse-grained sparsification; n1 is connected to o1, o2 and o3 by the three weights s11, s21 and s31, n2 by the three weights s12, s22 and s32, n5 by the three weights s13, s23 and s33, and n6 by the three weights s14, s24 and s34. The bit string 11001100 represents the connections between the input neurons and the output neurons, i.e. the first way of representing the target weight location information: 1 indicates that the input neuron is connected to all three output neurons, and 0 indicates that the input neuron is connected to none of the three output neurons. Table 1 describes the information of the neurons and weights in the embodiment, and Formula 1 gives the operation formulas of the three output neurons o1, o2 and o3. It can be seen from Formula 1 that o1, o2 and o3 receive the same selected neurons for their operations.
Fine-grained pruning treats each weight as an independent individual, and a weight is cut off during pruning if it meets the condition; coarse-grained pruning groups the weights in a certain way, each group containing multiple weights, and if a group of weights meets the condition, the whole group is cut off.
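For illustration, a minimal sketch of coarse-grained pruning, under the assumption of one-dimensional groups of fixed size and a mean-absolute-value condition (the patent also allows other group shapes and conditions, e.g. the sliding-window variants described in the claims):

import numpy as np

def coarse_grained_prune(weights, group_size, threshold):
    """Zero a whole group of weights when its information content (here: mean
    absolute value, an assumed choice) is below a preset threshold."""
    pruned = weights.copy()
    for start in range(0, len(pruned), group_size):
        group = pruned[start:start + group_size]       # numpy view into the group
        if np.mean(np.abs(group)) < threshold:
            group[:] = 0.0                              # the whole group is cut off together
    return pruned

w = np.array([0.9, 0.8, 0.02, 0.01, 0.7, 0.6, 0.03, 0.02])
print(coarse_grained_prune(w, group_size=2, threshold=0.1))
# -> [0.9 0.8 0.  0.  0.7 0.6 0.  0. ]   (the groups covering n3, n4 and n7, n8 are pruned)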
Table 1
Formula 1 -- output neuron operation formulas:
o1 = n1*s11 + n2*s12 + n5*s13 + n6*s14
o2 = n1*s21 + n2*s22 + n5*s23 + n6*s24
o3 = n1*s31 + n2*s32 + n5*s33 + n6*s34
When the processing device performs the operation, the 8 input neurons, the 12 target weights and the 8-bit location information, together with the corresponding instructions, are transferred to the storage unit. The coarse-grained selection unit receives the 8 input neurons and the target weight positions and selects n1, n2, n5 and n6, the four neurons that need to take part in the operation. The arithmetic unit receives the four selected neurons and the weights, completes the operation of the output neurons according to Formula 1, and then transmits the output neurons back to the storage part.
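The worked example can be reproduced numerically with the following sketch, in which the neuron and target weight values are placeholders chosen for the demonstration (the patent does not give numeric values):

import numpy as np

position = np.array([1, 1, 0, 0, 1, 1, 0, 0])                     # bit string "11001100"
neurons  = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])     # n1 .. n8
S = np.array([[0.5, -0.2, 0.3, 0.1],     # s11 s12 s13 s14 -> o1  (placeholder values)
              [0.4,  0.1, -0.6, 0.2],    # s21 s22 s23 s24 -> o2
              [-0.3, 0.7, 0.2, 0.5]])    # s31 s32 s33 s34 -> o3

selected = neurons[position == 1]         # coarse-grained selection: n1, n2, n5, n6
outputs = S @ selected                    # o1, o2, o3 computed as in Formula 1
print(selected)                           # [0.1 0.2 0.5 0.6]
print(outputs)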
In some embodiments of the disclosure, an accelerator is disclosed, comprising: a memory storing executable instructions, and a processor for executing the executable instructions in the storage unit, the processor operating according to the processing method described above when executing the instructions.
The processor may be a single processing unit, but may also include two or more processing units. In addition, the processor may include a general-purpose processor (CPU) or a graphics processor (GPU), and may also include a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC) configured for neural network operations. The processor may also include on-chip memory for caching purposes (including memory inside the processing units).
The application also discloses a neural network computing device, which includes one or more of the accelerators or processing devices mentioned in this application, and is configured to obtain operation data and control information from other processing devices, to execute the specified neural network operations and/or training, and to pass the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces and servers. When more than one computing device is included, the computing devices can be linked and can transfer data through a specific structure, for example interconnected and transferring data through a PCIE bus, so as to support the operation and/or training of larger-scale neural networks. In this case, the devices may share the same control system or have their own independent control systems; they may share memory, or each accelerator may have its own memory. In addition, their interconnection may follow any interconnection topology.
The neural network computing device has high compatibility and can be connected to various types of servers through the PCIE interface.
The application also discloses a combined processing device, which includes the above neural network computing device, a general interconnection interface and other processing devices. The neural network computing device interacts with the other processing devices to jointly complete the operations specified by the user. Figure 12 is a schematic diagram of the combined processing device.
The other processing devices include one or more processor types among general-purpose/special-purpose processors such as a central processing unit (CPU), a graphics processor (GPU) and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the neural network computing device and external data and control, including data transfer, and perform basic control of the neural network computing device such as starting and stopping; the other processing devices can also cooperate with the neural network computing device to jointly complete computing tasks.
The general interconnection interface is used for transmitting data and control instructions between the neural network computing device and the other processing devices. The neural network computing device obtains the required input data from the other processing devices and writes them into the on-chip storage device of the neural network computing device; it can obtain control instructions from the other processing devices and write them into the on-chip control cache of the neural network computing device; it can also read the data in the storage module of the neural network computing device and transmit them to the other processing devices.
Optionally, as shown in Figure 13, the structure may further include a storage device, which is connected to the neural network computing device and to the other processing devices respectively. The storage device is used to store data held in the neural network computing device and the other processing devices, and is particularly suitable for data required by the operation that cannot be entirely kept in the internal storage of the neural network computing device or the other processing devices.
The combined processing device can be used as a system-on-chip (SoC) for devices such as mobile phones, robots, drones and video monitoring equipment, effectively reducing the die area of the control portion, increasing the processing speed and reducing the overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card or a WiFi interface.
In some embodiments, a neural network processor is disclosed, which includes the above neural network computing device or combined processing device.
In some embodiments, a chip is disclosed, which includes the above neural network processor.
In some embodiments, a chip packaging structure is disclosed, which includes the above chip.
In some embodiments, a board is disclosed, which includes the above chip packaging structure.
In some embodiments, an electronic device is disclosed, which includes the above board.
Please refer to Figure 14; Figure 14 is a structural schematic diagram of a neural network processor board provided by an embodiment of the present application. As shown in Figure 14, the neural network processor board includes the above chip packaging structure, a first electrical and non-electrical connection device and a first substrate.
The application does not limit the specific structure of the chip packaging structure; optionally, as shown in Figure 15, the chip packaging structure includes: a chip, a second electrical and non-electrical connection device, and a second substrate.
The application does not limit the specific form of the chip involved; the chip includes, but is not limited to, a neural network chip integrating a neural network processor, and the chip can be made of silicon material, germanium material, quantum material or molecular material, etc. The neural network chip can be packaged according to the actual situation (for example, a harsher environment) and different application requirements, so that most of the neural network chip is wrapped while the pins on the neural network chip are connected to the outside of the packaging structure through conductors such as gold wires, for circuit connection with outer layers.
The application does not limit the type of the first substrate and the second substrate, which can be printed circuit boards (PCB) or printed wiring boards (PWB), and may also be other circuit boards. There is also no limitation on the material used to make the PCB.
The second substrate involved in the application is used for carrying the chip, and the chip packaging structure obtained by connecting the chip and the second substrate through the second electrical and non-electrical connection device is used for protecting the chip and facilitating the further packaging of the chip packaging structure with the first substrate.
The specific packaging method of the above second electrical and non-electrical connection device, and the structure corresponding to the packaging method, are not limited; a suitable packaging method can be selected and simply improved according to the actual situation and different application requirements, for example: Flip Chip Ball Grid Array Package (FCBGAP), Low-profile Quad Flat Package (LQFP), Quad Flat Package with Heat sink (HQFP), Quad Flat Non-lead Package (QFN), or Fine-pitch Ball Grid Array Package (FBGA), among other packaging methods.
Flip Chip is suitable when the area requirement after packaging is high or when the inductance of the conductors and the signal transmission time are sensitive. In addition, the Wire Bonding packaging method can be used, reducing cost and increasing the flexibility of the packaging structure.
Ball Grid Array can provide more pins, and the average conductor length of the pins is short, giving the ability to transmit signals at high speed; the packaging can alternatively be replaced by Pin Grid Array (PGA), Zero Insertion Force (ZIF), Single Edge Contact Connection (SECC), Land Grid Array (LGA), etc.
Optionally, the Flip Chip Ball Grid Array packaging method is used to package the neural network chip and the second substrate; a schematic diagram of the specific neural network chip packaging structure can be found in Figure 16. As shown in Figure 16, the chip packaging structure includes: a chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24 and pins 26.
The pads 22 are connected to the chip 21, and the solder balls 23 are formed by soldering between the pads 22 and the connection points 25 on the second substrate 24, connecting the neural network chip 21 and the second substrate 24, i.e. realizing the packaging of the chip 21.
The pins 26 are used to connect to the external circuit of the packaging structure (for example, the first substrate on the board), which enables the transmission of external data and internal data and facilitates the processing of data by the chip 21 or by the neural network processor corresponding to the chip 21. The type and number of the pins are also not limited in the application; different pin forms can be selected according to different packaging technologies and arranged according to certain rules.
Optionally, the neural network chip packaging structure further includes an insulating filler placed in the gaps between the pads 22, the solder balls 23 and the connection points 25, used to prevent interference between solder balls.
The material of the insulating filler can be silicon nitride, silicon oxide or silicon oxynitride; the interference includes electromagnetic interference, inductive interference, etc.
Optionally, the neural network chip packaging structure further includes a heat dissipation device for dissipating the heat generated when the neural network chip 21 runs. The heat dissipation device can be a piece of metal with good thermal conductivity, a heat sink or a radiator, for example a fan.
For example, as shown in Figure 17, the chip packaging structure includes: a chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, an insulating filler 27, thermal grease 28 and a metal housing heat sink 29. The thermal grease 28 and the metal housing heat sink 29 are used to dissipate the heat generated when the chip 21 runs.
Optionally, the chip packaging structure further includes a reinforcing structure, which is connected to the pads 22 and embedded in the solder balls 23, to enhance the connection strength between the solder balls 23 and the pads 22.
The reinforcing structure can be a metal wire structure or a columnar structure, which is not limited here.
The specific form of the first electrical and non-electrical connection device is also not limited in the application; reference can be made to the description of the second electrical and non-electrical connection device, i.e. the chip packaging structure can be packaged by soldering, or the second substrate and the first substrate can be connected by connecting wires or in a pluggable manner, facilitating the subsequent replacement of the first substrate or of the chip packaging structure.
Optionally, the first substrate includes a memory unit interface for expanding the storage capacity, for example: Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR), etc.; expanding the memory improves the processing capability of the neural network processor.
The first substrate may also include a Peripheral Component Interconnect-Express (PCI-E or PCIe) interface, a Small Form-factor Pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, etc., for data transmission between the packaging structure and external circuits, which can improve the operation speed and the convenience of operation.
The neural network processor is packaged as a chip, the chip is packaged as a chip packaging structure, and the chip packaging structure is packaged as a board; data interaction with external circuits (for example, a computer motherboard) is carried out through the interfaces (slots or ferrules) on the board, i.e. the function of the neural network processor is realized directly through the neural network processor board, and the chip is protected. Other modules can also be added to the neural network processor board, improving the application scope and the computing efficiency of the neural network processor.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound instrument and/or an electrocardiograph.
It should be particularly emphasized that all modules can be hardware structures; the physical implementations of the hardware structures include, but are not limited to, physical devices, which include, but are not limited to, transistors, memristors and DNA computers. It should be noted that, throughout the drawings, identical elements are indicated by the same or similar reference numerals. Conventional structures or constructions are omitted where they might cause the understanding of the present invention to become confused. It should be noted that the shapes and sizes of the components in the figures do not reflect actual sizes and proportions, but merely illustrate the content of the embodiments of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment can be combined into one module, unit or component, and can furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
The specific embodiments described above further describe the purpose, technical solutions and beneficial effects of the disclosure in detail. It should be understood that the above are merely specific embodiments of the disclosure and are not intended to limit the disclosure; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the disclosure shall be included within the protection scope of the disclosure.

Claims (10)

1. An accelerator, characterized by comprising: a pre-processing unit, a coarse-grained pruning unit, a storage unit, a direct memory access (DMA) unit, a coarse-grained selection unit, an instruction control unit and an arithmetic unit,
wherein the pre-processing unit is connected to the storage unit, the storage unit is connected to the DMA unit, the DMA unit is connected to the coarse-grained pruning unit, the instruction control unit, the coarse-grained selection unit and the arithmetic unit, and the coarse-grained pruning unit and the instruction control unit are connected to the arithmetic unit;
the pre-processing unit is configured to pre-process raw data and to store the pre-processed data into the storage unit; the raw data include input neurons, output neurons and weights;
the storage unit is configured to store the input neurons, the output neurons, the weights and instructions; wherein when the weights are stored, only the target weights and their location information are stored; a target weight is a weight whose absolute value is greater than a second preset threshold;
the DMA unit is configured to read and write data or instructions between the storage unit and the instruction control unit, the coarse-grained pruning unit, the coarse-grained selection unit and the arithmetic unit;
the coarse-grained pruning unit is specifically configured to obtain the weights from the storage unit through the DMA unit and then to perform coarse-grained pruning on the weights so as to obtain the pruned weights;
the arithmetic unit is configured to train the neural network according to the pruned weights;
the instruction control unit is configured to obtain the instructions from the storage unit through the DMA unit and to decode the instructions to generate control information so as to control the arithmetic unit to perform the computation;
the coarse-grained selection unit is configured to receive the input neurons and the location information of the target weights, to select, according to the location information of the target weights, the input neurons that need to take part in the computation, and to transmit the input neurons to the arithmetic unit;
the arithmetic unit is further configured to operate on the input neurons and the target weights according to the control information transmitted by the instruction control unit so as to obtain output neurons, to obtain output neuron gradients from the output neurons, and to store the output neurons and the output neuron gradients into the storage unit through the DMA unit;
wherein the coarse-grained pruning unit is specifically configured to:
select M weights from the weights of the neural network through a sliding window, M being an integer greater than 1; and
when the M weights meet a preset condition, set all or part of the M weights to zero.
2. The device according to claim 1, characterized in that the device further includes an instruction cache unit, a first cache unit, a second cache unit, a third cache unit and a fourth cache unit,
wherein the instruction cache unit is located between the DMA unit and the instruction control unit and is connected to the DMA unit and the instruction control unit respectively; the first cache unit is located between the coarse-grained pruning unit and the arithmetic unit and is connected to the coarse-grained pruning unit and the arithmetic unit respectively; the second cache unit is located between the DMA unit and the coarse-grained selection unit and is connected to the DMA unit and the coarse-grained selection unit respectively; the third cache unit is located between the DMA unit and the coarse-grained selection unit and is connected to the coarse-grained selection unit and the DMA unit respectively; the fourth cache unit is located between the DMA unit and the arithmetic unit and is connected to the DMA unit and the arithmetic unit respectively;
the instruction cache unit is configured to cache the instructions;
the first cache unit is configured to cache the target weights and the pruned weights;
the second cache unit is configured to cache the location information of the target weights;
the third cache unit is configured to cache the input neurons that are input to the coarse-grained selection unit;
the fourth cache unit is configured to cache the output neurons output by the arithmetic unit and the output neuron gradients obtained from the output neurons.
3. The device according to claim 1 or 2, characterized in that the pre-processing includes at least one of segmentation, Gaussian filtering, binarization, regularization and normalization.
4. The device according to claim 3, characterized in that the preset condition is:
the information content of the M weights is less than a first preset threshold.
5. The device according to claim 4, characterized in that the information content of the M weights is the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, or the maximum value of the M weights, the first preset threshold is a first threshold, a second threshold or a third threshold, and the information content of the M weights being less than the first preset threshold includes:
the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum value of the M weights being less than the third threshold.
6. The device according to any one of claims 1-5, characterized in that the coarse-grained pruning unit and the arithmetic unit are configured to:
repeatedly perform coarse-grained pruning on the weights of the neural network and train the network according to the pruned weights, until no weight meets the preset condition on the premise of guaranteeing that the precision loss does not exceed the set precision.
7. The device according to claim 6, characterized in that the set precision is x%, where x is between 0 and 5.
8. The device according to any one of claims 1-6, characterized in that the neural network includes a fully connected layer, a convolutional layer and/or a long short-term memory (LSTM) layer, wherein the weights of the fully connected layer form a two-dimensional matrix (Nin, Nout), Nin being the number of input neurons and Nout the number of output neurons, so that the fully connected layer has Nin*Nout weights; the weights of the convolutional layer form a four-dimensional matrix (Nfin, Nfout, Kx, Ky), Nfin being the number of input feature maps, Nfout the number of output feature maps and (Kx, Ky) the size of the convolution kernel, so that the convolutional layer has Nfin*Nfout*Kx*Ky weights; the weights of the LSTM layer are composed of the weights of m fully connected layers, m being an integer greater than 0, the weights of the i-th fully connected layer being (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i denotes the number of input neurons of the i-th fully connected layer weights and Nout_i denotes the number of output neurons of the i-th fully connected layer weights; the coarse-grained pruning unit is specifically configured to:
when performing the coarse-grained pruning operation on the weights of the fully connected layer, use a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
slide the sliding window along the direction of Bin with a step Sin, or along the direction of Bout with a step Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout;
select M weights from the Nin*Nout weights through the sliding window and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M = Bin*Bout;
when performing the coarse-grained pruning operation on the weights of the convolutional layer, use a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky;
slide the sliding window along the direction of Bfin with a step Sfin, or along the direction of Bfout with a step Sfout, or along the direction of Bx with a step Sx, or along the direction of By with a step Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By;
select M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M = Bfin*Bfout*Bx*By;
when performing the coarse-grained pruning operation on the weights of the LSTM layer, use a sliding window of size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i;
slide the sliding window along the direction of Bin_i with a step Sin_i, or along the direction of Bout_i with a step Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i;
select M weights from the Bin_i*Bout_i weights through the sliding window and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M = Bin_i*Bout_i.
9. The device according to any one of claims 1-8, characterized in that the arithmetic unit is specifically configured to:
retrain the neural network according to the pruned weights by a back-propagation algorithm.
10. The device according to any one of claims 1-9, characterized in that the processing device further includes:
a quantization unit, configured to, after coarse-grained pruning has been performed on the weights of the neural network and before the neural network is retrained according to the pruned weights, quantize the weights of the neural network and/or perform a first operation on the weights of the neural network so as to reduce the number of weight bits of the neural network;
wherein quantizing the weights of the neural network specifically means replacing a weight W1 that meets a condition with a weight W0, the condition involving a preset value;
performing the first operation on the weights of the neural network specifically means reducing the value range of the data format corresponding to the weights of the neural network, or reducing the accuracy range of the data format corresponding to the weights of the neural network.
GR01 Patent grant