CN110175673A - Processing method and accelerator - Google Patents
- Publication number: CN110175673A (application CN201910474387.7A)
- Authority: CN (China)
- Prior art keywords: weight, unit, coarse-grained, neural network, neuron
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
All entries fall under G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks:
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The present disclosure provides a processing unit that applies coarse-grained sparsification to the weights of a neural network during training and computation. Pruning the weights in coarse-grained groups reduces both memory accesses and the number of operations, thereby obtaining a speed-up and reducing energy consumption.
Description
Technical field
The present disclosure relates to the field of computing, and more particularly to a processing method and an accelerator that speed up neural network operations by pruning neural network weights.
Background technique
Neural networks have been applied with great success. But as we design larger-scale, deeper neural networks, the number of weights grows, and ultra-large weight sets have become a major challenge for neural network applications. On the one hand, large-scale weight data imposes stricter storage requirements, and the many memory accesses it entails consume substantial energy. On the other hand, the large number of weights places higher demands on the arithmetic units, increasing both computation time and computation energy. Reducing the number of weights of a neural network without reducing its accuracy, so as to reduce data storage and computation, has therefore become an urgent problem.
Most current work relies on low-rank matrix decomposition, hashing techniques, and the like, but these methods remove only a limited number of weights and a limited amount of computation, and they can reduce the accuracy of the network. A more effective method is therefore needed to reduce both the weights of a neural network and its computation.
Summary of the invention
(1) technical problems to be solved
In view of this, an object of the present disclosure is to provide a processing method and an accelerator that solve at least one of the technical problems above.
(2) technical solution
In a first aspect, an embodiment of the present invention provides a processing unit, comprising:
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of a neural network to obtain pruned weights; and
an arithmetic unit, configured to train the neural network according to the pruned weights;
wherein the coarse-grained pruning unit is specifically configured to:
select M weights from the weights of the neural network through a sliding window, M being an integer greater than 1, and, when the M weights meet a preset condition, set all or part of the M weights to zero.
Further, the preset condition is that the information measure of the M weights is less than a first preset threshold.
Further, the information measure of the M weights is the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, or the maximum of the M weights, and the first preset threshold is correspondingly a first threshold, a second threshold, or a third threshold. The information measure of the M weights being less than the first preset threshold includes: the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum of the M weights being less than the third threshold.
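As an illustration of the criterion above, the following Python sketch computes the three candidate information measures and the pruning decision. It is not part of the patent: the threshold values, the use of absolute values for the maximum, and the choice to test all three measures with "or" are assumptions for illustration.

```python
import math

def group_information(weights):
    """Return (arithmetic mean, geometric mean, maximum) of the absolute values."""
    abs_w = [abs(w) for w in weights]
    arith_mean = sum(abs_w) / len(abs_w)
    if all(a > 0 for a in abs_w):
        geo_mean = math.exp(sum(math.log(a) for a in abs_w) / len(abs_w))
    else:
        geo_mean = 0.0  # a zero weight drives the geometric mean to zero
    return arith_mean, geo_mean, max(abs_w)

def should_prune(weights, t1, t2, t3):
    """True when the chosen information measure falls below its threshold."""
    arith_mean, geo_mean, max_abs = group_information(weights)
    return arith_mean < t1 or geo_mean < t2 or max_abs < t3
```

A group of small weights such as `[0.01, -0.02, 0.015, 0.0]` would be pruned under thresholds of 0.05, while a group containing large weights would be kept.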
Further, the coarse-grained pruning unit is configured to repeat the coarse-grained pruning of the weights of the neural network and the training of the neural network according to the pruned weights until, under the premise that accuracy does not drop by more than a set precision, no group of weights meets the preset condition.
Further, the set precision is x%, where x is between 0 and 5.
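A minimal Python sketch of this prune-retrain loop follows. The `prune_step` and `evaluate` callbacks stand in for a real pruning pass and a real accuracy measurement; the toy helpers and their names are illustrative assumptions, not from the patent.

```python
def iterative_prune(weights, prune_step, evaluate, baseline_acc, x=2.0, max_rounds=100):
    """Repeat pruning while no more than x% accuracy is lost; stop when no
    group meets the preset condition any more."""
    for _ in range(max_rounds):
        candidate, num_pruned = prune_step(weights)
        if num_pruned == 0:                         # nothing left to prune
            break
        if evaluate(candidate) < baseline_acc - x:  # would lose too much accuracy
            break
        weights = candidate                         # accept; retraining would go here
    return weights

def zero_small_once(weights, threshold=0.5):
    """Toy prune step: zero the first nonzero weight below `threshold`."""
    out = list(weights)
    for i, w in enumerate(out):
        if w != 0.0 and abs(w) < threshold:
            out[i] = 0.0
            return out, 1
    return out, 0

def toy_accuracy(weights):
    """Toy evaluator: each pruned weight costs one point off a 90% baseline."""
    return 90.0 - sum(1 for w in weights if w == 0.0)
```

With a tolerance of x=2, the toy model accepts two pruning rounds and then stops because nothing else meets the condition; with x=1 it stops after one round.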
Further, the neural network includes a fully connected layer, a convolutional layer, and/or a long short-term memory (LSTM) layer. The weights of the fully connected layer form a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, so the fully connected layer has Nin*Nout weights. The weights of the convolutional layer form a four-dimensional tensor (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, so the convolutional layer has Nfin*Nfout*Kx*Ky weights. The weights of the LSTM layer consist of the weights of m fully connected layers, m being an integer greater than 0; the i-th fully connected weight matrix is (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i is the number of input neurons of the i-th fully connected weight matrix, and Nout_i is the number of its output neurons. The coarse-grained pruning unit is specifically configured to:
when performing the coarse-grained pruning operation on the weights of the fully connected layer, use a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
slide the sliding window along the Bin direction with stride Sin, or along the Bout direction with stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout;
select M weights from the Nin*Nout weights through the sliding window and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bin*Bout;
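The fully connected case can be sketched directly in Python. The sketch below uses the arithmetic mean of absolute values as the information measure (one of the three options the method allows) and zeroes the whole window when the condition is met; the threshold value is an assumption.

```python
def prune_fc(weight, Bin, Bout, Sin, Sout, threshold):
    """Coarse-grained pruning of an Nin x Nout matrix with a Bin x Bout window.

    weight is a list of Nin rows of Nout floats; a whole window is zeroed when
    the arithmetic mean of its absolute values falls below `threshold`."""
    Nin, Nout = len(weight), len(weight[0])
    out = [row[:] for row in weight]
    for i in range(0, Nin - Bin + 1, Sin):
        for j in range(0, Nout - Bout + 1, Sout):
            block = [out[i + a][j + b] for a in range(Bin) for b in range(Bout)]
            if sum(abs(w) for w in block) / len(block) < threshold:
                for a in range(Bin):
                    for b in range(Bout):
                        out[i + a][j + b] = 0.0
    return out
```

For a 2x4 matrix with a 2x2 window and strides of 2, the left block of small weights is zeroed while the right block of large weights survives, which is exactly the regular, block-structured sparsity the method aims for.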
when performing the coarse-grained pruning operation on the weights of the convolutional layer, use a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky;
slide the sliding window along the Bfin direction with stride Sfin, or along the Bfout direction with stride Sfout, or along the Bx direction with stride Sx, or along the By direction with stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By;
select M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bfin*Bfout*Bx*By;
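The convolutional case generalizes the window to four dimensions with per-dimension strides. A compact Python sketch over a weight tensor stored as a dict keyed by (fin, fout, kx, ky) follows; the dict representation, the mean-of-absolute-values measure, and the threshold are illustrative assumptions.

```python
from itertools import product

def prune_conv(weight, dims, block, strides, threshold):
    """Coarse-grained pruning of a conv weight tensor stored as a dict keyed by
    (fin, fout, kx, ky). dims = (Nfin, Nfout, Kx, Ky); block and strides are
    the window sizes (Bfin, Bfout, Bx, By) and steps (Sfin, Sfout, Sx, Sy)."""
    out = dict(weight)
    starts = [range(0, d - b + 1, s) for d, b, s in zip(dims, block, strides)]
    for origin in product(*starts):
        keys = [tuple(o + off for o, off in zip(origin, offs))
                for offs in product(*(range(b) for b in block))]
        vals = [out[k] for k in keys]
        if sum(abs(v) for v in vals) / len(vals) < threshold:
            for k in keys:
                out[k] = 0.0
    return out
```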
when performing the coarse-grained pruning operation on the weights of the LSTM layer, use a sliding window of size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i;
slide the sliding window along the Bin_i direction with stride Sin_i, or along the Bout_i direction with stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i;
select M weights from the Nin_i*Nout_i weights through the sliding window and, when the M weights meet the preset condition, set all or part of the M weights to zero, where M=Bin_i*Bout_i.
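Since an LSTM layer's weights consist of m fully connected sub-matrices, its pruning reduces to applying a per-matrix routine to each sub-matrix with its own window size. The helper below is a toy stand-in for any such routine (a real implementation would reuse the fully connected window pruning); names are illustrative.

```python
def prune_lstm(submatrices, prune_one, windows):
    """Apply a per-matrix pruning routine to each of the m fully connected
    weight matrices of an LSTM layer, each with its own (Bin_i, Bout_i)."""
    return [prune_one(w, *win) for w, win in zip(submatrices, windows)]

def zero_if_small(matrix, bin_i, bout_i):
    """Toy per-matrix rule: zero the whole matrix when its mean |w| is tiny."""
    flat = [abs(w) for row in matrix for w in row]
    if sum(flat) / len(flat) < 0.1:
        return [[0.0] * len(row) for row in matrix]
    return [row[:] for row in matrix]
```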
Further, the arithmetic unit is specifically configured to retrain the neural network according to the pruned weights using a back-propagation algorithm.
Further, the processing unit further comprises:
a quantization unit, configured to, after the coarse-grained pruning of the weights of the neural network and before the retraining of the neural network according to the pruned weights, quantize the weights of the neural network and/or perform a first operation on the weights of the neural network, so as to reduce the number of bits of the weights.
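The patent does not fix a concrete quantization scheme; symmetric uniform quantization is one common way to realize the bit-reduction step and serves as an illustrative sketch here.

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integers with a shared scale, reducing bit width."""
    qmax = (1 << (bits - 1)) - 1
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak > 0 else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]
```

Each weight is then stored in `bits` bits plus one shared scale, and the round-trip error is bounded by the scale.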
In a feasible embodiment, the coarse-grained neuron selection unit is specifically configured to:
generate a neuron index according to the input neurons, the neuron index indicating whether each corresponding input neuron is useful;
perform an AND operation on the neuron index and the location information of the target weights to obtain neuron flags, each flag indicating whether the corresponding neuron is selected;
accumulate the neuron flags to obtain an accumulated string;
perform an AND operation on the accumulated string and the neuron flags to obtain a target string for selecting input neurons; and
select input neurons according to the target string and feed the selected input neurons to the above arithmetic unit.
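Functionally, the selection step gathers only the input neurons that are both nonzero and aligned with a retained (target) weight. The Python sketch below abstracts away the hardware's bit-string and accumulation details and keeps only that behavior.

```python
def select_neurons(inputs, target_weight_flags):
    """Gather the input neurons that are nonzero AND aligned with a retained
    (target) weight, returning their values and original positions."""
    neuron_index = [1 if x != 0 else 0 for x in inputs]  # "is this input useful?"
    flags = [a & b for a, b in zip(neuron_index, target_weight_flags)]
    values = [x for x, f in zip(inputs, flags) if f]
    positions = [i for i, f in enumerate(flags) if f]
    return values, positions
```

Only the selected values reach the arithmetic unit, which is how the design skips operations on zero inputs and pruned weights.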
In a feasible embodiment, the arithmetic unit includes a plurality of processing units, each of which includes a weight buffer, a weight decoder module, a weight selector module, and a neuron functional unit, wherein:
the weight buffer is configured to buffer weights;
the weight decoder module is configured to extract weights from the compressed values according to the codebook and dictionary used in local quantization, and to pass the weights to the weight selector module;
the weight selector module is configured to obtain an index string and, according to the index string and the weights from the weight decoder module, obtain the selected weights, the selected weights being the weights with which the neuron functional unit computes; and
the neuron functional unit is configured to obtain the selected input neurons and perform operations on the selected weights and the selected input neurons to obtain output neurons.
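The weight decoder's job under local quantization reduces to a table lookup: each stored weight is a small code indexing a codebook of centroid values. The exact codebook/dictionary layout is not specified in the patent, so this sketch is an assumption about the simplest such scheme.

```python
def decode_weights(codes, codebook):
    """Expand locally quantized weights: each stored code is a small index into
    a codebook of centroid values, so decoding is a table lookup."""
    return [codebook[c] for c in codes]
```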
In a second aspect, an embodiment of the present invention provides an accelerator, comprising:
a storage unit, configured to store the input neurons, output neurons, pruned weights, and instructions of a neural network, wherein the neural network is a trained neural network model obtained by training on the pruned weights;
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of the neural network to obtain pruned weights, and to store the pruned weights in the storage unit;
a coarse-grained neuron selection unit, configured to receive input neurons and target-weight location information and to select the neurons corresponding to the target weights, a target weight being a weight whose absolute value is greater than a second preset threshold; and
an arithmetic unit, configured to receive the input target weights and their corresponding neurons, perform operations according to the target weights and their corresponding neurons, and send the output neurons back to the storage unit.
The storage unit is further configured to store intermediate results generated while the arithmetic unit computes.
Further, the accelerator further comprises:
an instruction control unit, configured to receive and decode the instructions to obtain control information for controlling the arithmetic unit.
Further, the storage unit is configured to store the target weights and the location information of the target weights.
Further, the accelerator further comprises:
a preprocessing unit, configured to preprocess raw data and feed the preprocessed data into the storage unit, the raw data including input neurons, output neurons, and weights.
Further, the preprocessing includes data segmentation, Gaussian filtering, binarization, regularization, and/or normalization.
Further, the accelerator further comprises:
an instruction cache unit, configured to cache the instructions. The instruction cache unit is an on-chip cache.
Further, the accelerator further comprises:
a target-weight cache unit, configured to cache the target weights. The target-weight cache unit is an on-chip cache.
Further, the accelerator further comprises:
a target-weight location cache unit, configured to cache the location information of the target weights. The target-weight location cache unit is an on-chip cache.
Further, the accelerator further comprises:
an input-neuron cache unit, configured to cache the input neurons. The input-neuron cache unit is an on-chip cache.
Further, the accelerator further comprises:
an output-neuron cache unit, configured to cache the output neurons. The output-neuron cache unit is an on-chip cache.
Further, the target-weight location cache unit caches the location information of the target weights and maps each connection weight in the input data one-to-one to the corresponding input neuron.
Further, the accelerator further comprises:
a direct memory access (DMA) unit, configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the target-weight cache unit, the target-weight location cache unit, the input-neuron cache unit, or the output-neuron cache unit.
Further, the arithmetic unit includes at least one of the following: a multiplier, configured to multiply first input data by second input data to obtain multiplied data; an adder tree, configured to add third input data stage by stage through the adder tree, or to add the third input data to fourth input data to obtain summed data; and an activation function unit, configured to apply an activation function to fifth data to obtain output data, the activation function being a sigmoid, tanh, relu, or softmax operation.
Further, the arithmetic unit also includes a pooling unit, configured to apply a pooling operation to sixth input data to obtain pooled output data, the pooling operation including average pooling, max pooling, or median pooling.
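A functional Python sketch of the arithmetic unit's pipeline follows: elementwise multiplication, a pairwise adder-tree reduction, then an activation (ReLU is used here for simplicity; the unit also supports sigmoid, tanh, and softmax).

```python
def adder_tree(values):
    """Sum values level by level in pairs, as a hardware adder tree would."""
    values = list(values)
    while len(values) > 1:
        if len(values) % 2:
            values.append(0.0)  # pad an odd level with a zero input
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0] if values else 0.0

def neuron_output(inputs, weights, bias=0.0):
    """Multiply, reduce through the adder tree, then apply ReLU."""
    products = [x * w for x, w in zip(inputs, weights)]
    return max(0.0, adder_tree(products) + bias)
```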
In a third aspect, an embodiment of the present invention provides an accelerator, comprising:
a storage unit, configured to store the input neurons, output neurons, pruned weights, and instructions of a neural network, wherein the neural network is a trained neural network model obtained by training on the pruned weights;
a coarse-grained pruning unit, configured to prune the weights of the neural network to obtain pruned weights, and to store the pruned weights in the storage unit;
an arithmetic unit, configured to train the neural network according to the pruned weights to obtain a trained neural network;
a coarse-grained neuron selection unit, configured to receive input neurons and target-weight location information and to select the input neurons corresponding to the target weights, a target weight being a weight whose absolute value is greater than a second preset threshold, the target weights being the trained weights; and
the arithmetic unit, further configured to receive the input target weights and their corresponding input neurons, perform operations according to the target weights and their corresponding input neurons, and send the output neurons back to the storage unit.
The storage unit may further store intermediate results generated while the arithmetic unit computes.
Further, the accelerator further comprises:
an instruction control unit, configured to receive and decode the instructions to obtain control information for controlling the arithmetic unit.
Further, the storage unit is configured to store the target weights and the location information of the target weights.
Further, the accelerator further comprises:
a preprocessing unit, configured to preprocess raw data and feed the preprocessed data into the storage unit, the raw data including the input neurons, output neurons, and weights of the trained neural network.
Further, the preprocessing includes data segmentation, Gaussian filtering, binarization, regularization, and/or normalization.
Further, the accelerator further comprises:
an instruction cache unit, configured to cache the instructions. The instruction cache unit is an on-chip cache.
Further, the accelerator further comprises:
a target-weight cache unit, configured to cache the target weights. The target-weight cache unit is an on-chip cache.
Further, the accelerator further comprises:
a target-weight location cache unit, configured to cache the location information of the target weights. The target-weight location cache unit is an on-chip cache.
Further, the accelerator further comprises:
an input-neuron cache unit, configured to cache the input neurons. The input-neuron cache unit is an on-chip cache.
Further, the accelerator further comprises:
an output-neuron cache unit, configured to cache the output neurons. The output-neuron cache unit is an on-chip cache.
Further, the target-weight location cache unit caches the location information of the target weights and maps each connection weight in the input data one-to-one to the corresponding input neuron.
Further, the accelerator further comprises:
a direct memory access (DMA) unit, configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the target-weight cache unit, the target-weight location cache unit, the input-neuron cache unit, or the output-neuron cache unit.
Further, the arithmetic unit includes at least one of the following: a multiplier, configured to multiply first input data by second input data to obtain multiplied data; an adder tree, configured to add third input data stage by stage through the adder tree, or to add the third input data to fourth input data to obtain summed data; and an activation function unit, configured to apply an activation function to fifth data to obtain output data, the activation function being a sigmoid, tanh, relu, or softmax operation.
Further, the arithmetic unit also includes a pooling unit, configured to apply a pooling operation to sixth input data to obtain pooled output data, the pooling operation including average pooling, max pooling, or median pooling.
In a fourth aspect, an embodiment of the present invention provides a processing method, comprising:
performing coarse-grained pruning on the weights of a neural network to obtain pruned weights; and
training the neural network according to the pruned weights;
wherein performing coarse-grained pruning on the neural network to obtain pruned weights comprises:
selecting M weights from the weights of the neural network through a sliding window, M being an integer greater than 1; and
when the M weights meet a preset condition, setting all or part of the M weights to zero to obtain the pruned weights.
Further, the preset condition is that the information measure of the M weights is less than a first preset threshold.
Further, the information measure of the M weights is the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, or the maximum of the M weights, and the first preset threshold is correspondingly a first threshold, a second threshold, or a third threshold. The information measure of the M weights being less than the first preset threshold includes: the arithmetic mean of the absolute values of the M weights being less than the first threshold, or the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum of the M weights being less than the third threshold.
Further, the method further includes: repeating the coarse-grained pruning of the weights of the neural network and the training according to the pruned weights until, under the premise that accuracy does not drop by more than a set precision, no group of weights meets the preset condition.
Further, the set precision is x%, where x is between 0 and 5.
Further, the neural network includes a fully connected layer, a convolutional layer, and/or a long short-term memory (LSTM) layer. The weights of the fully connected layer form a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, so the fully connected layer has Nin*Nout weights. The weights of the convolutional layer form a four-dimensional tensor (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, so the convolutional layer has Nfin*Nfout*Kx*Ky weights. The weights of the LSTM layer consist of the weights of m fully connected layers, m being an integer greater than 0; the i-th fully connected weight matrix is (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i is the number of input neurons of the i-th fully connected weight matrix, and Nout_i is the number of its output neurons. Performing coarse-grained pruning on the neural network includes:
when performing coarse-grained pruning on the weights of the fully connected layer of the neural network, using a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
sliding the sliding window along the Bin direction with stride Sin, or along the Bout direction with stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout; and
selecting M weights from the Nin*Nout weights through the sliding window and, when the M weights meet the preset condition, setting all or part of the M weights to zero, where M=Bin*Bout.
When performing coarse-grained pruning on the weights of the convolutional layer of the neural network, the sliding window is a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky; the pruning includes:
sliding the sliding window along the Bfin direction with stride Sfin, or along the Bfout direction with stride Sfout, or along the Bx direction with stride Sx, or along the By direction with stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By; and
selecting M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window and, when the M weights meet the preset condition, setting all or part of the M weights to zero, where M=Bfin*Bfout*Bx*By.
When performing coarse-grained pruning on the weights of the LSTM layer of the neural network, the sliding window has size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i; performing coarse-grained pruning on the weights of the LSTM layer of the neural network specifically includes:
sliding the sliding window along the Bin_i direction with stride Sin_i, or along the Bout_i direction with stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i; and
selecting M weights from the Nin_i*Nout_i weights through the sliding window and, when the M weights meet the preset condition, setting all or part of the M weights to zero, where M=Bin_i*Bout_i.
Further, training the neural network according to the pruned weights is specifically: retraining the neural network according to the pruned weights using a back-propagation algorithm.
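One common way to realize retraining "according to the pruned weights" is to mask the gradient update so that pruned positions stay at zero; the patent does not spell out the mechanism, so the sketch below is an illustrative assumption.

```python
def sgd_step_masked(weights, grads, lr=0.1):
    """One SGD update that keeps pruned (zero) weights at zero."""
    mask = [0.0 if w == 0.0 else 1.0 for w in weights]
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]
```

Applying this step repeatedly recovers accuracy on the surviving weights while preserving the coarse-grained sparsity pattern produced by pruning.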
Further, between the coarse-grained pruning and the retraining of the neural network, the method further includes: quantizing the weights of the neural network and/or performing a first operation on the weights of the neural network to reduce the number of bits of the weights.
In a fifth aspect, an embodiment of the present invention provides a neural network computing device, which includes one or more accelerators as described in the first, second, or third aspect, and which obtains data to be computed and control information from other processing devices, executes the specified neural network operations, and transfers the execution results to other processing devices through an I/O interface.
When the neural network computing device includes multiple computing devices, the computing devices may be connected through a specific structure and transfer data among themselves. For example, the computing devices are interconnected over a PCIE (peripheral component interconnect express) bus and transfer data to support larger-scale neural network operations; the computing devices share a single control system or have their own control systems; the computing devices share memory or have their own memories; and the computing devices may be interconnected in any interconnection topology.
In a sixth aspect, an embodiment of the present invention provides a neural network chip, which includes the processing unit of the first aspect, the accelerator of the second aspect and/or the third aspect, and/or the neural network computing device of the fifth aspect.
In a seventh aspect, an embodiment of the present invention provides a chip packaging structure including the neural network chip of the sixth aspect.
In an eighth aspect, an embodiment of the present invention provides a board including the neural network chip of the sixth aspect or the chip packaging structure of the seventh aspect.
In a ninth aspect, an embodiment of the present invention provides an electronic device including the board of the eighth aspect.
Further, the electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, dashboard camera, navigator, sensor, webcam, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
Further, the vehicle includes an aircraft, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, or range hood; the medical device includes a nuclear magnetic resonance scanner, B-mode ultrasound scanner, and/or electrocardiograph.
(3) beneficial effect
Compared with conventional methods, the processing method of the disclosure performs coarse-grained pruning on the weights of the neural network, which makes the sparse neural network more regular, facilitates hardware acceleration, and at the same time reduces the storage space required for target weight positions. Here, a target weight is a weight whose absolute value is greater than or equal to the second preset threshold.
The processing unit of the disclosure can implement the processing method of the disclosure: the coarse-grained pruning unit performs coarse-grained pruning on the neural network, and the arithmetic unit retrains the pruned neural network.
The accelerator of the disclosure can accelerate the coarse-grained-pruned neural network, fully exploiting the coarse-grained sparsity to reduce memory accesses and the amount of computation at the same time, thereby obtaining a speed-up and reducing energy consumption.
The storage unit of the accelerator stores weights as target weights together with target weight position information, which reduces storage overhead and memory-access overhead; the coarse-grained selection unit can select, according to the target weight position information, the neurons that need to participate in the computation, thereby reducing the amount of computation. By using dedicated SIMD instructions and a customized arithmetic unit for coarse-grained-sparse multilayer artificial neural network operations, the problems of insufficient CPU and GPU computational performance and high front-end decoding overhead are solved, and support for multilayer artificial neural network algorithms is effectively improved. By using dedicated on-chip caches for multilayer artificial neural network algorithms, the reusability of input neurons and weight data is fully exploited, repeated reads of these data from memory are avoided, memory-access bandwidth is reduced, and memory bandwidth is prevented from becoming the performance bottleneck of multilayer artificial neural network operations and their training algorithms.
Description of the drawings
Fig. 1 is a schematic structural diagram of a processing unit for performing coarse-grained pruning and sparsification on a neural network according to an embodiment of the disclosure;
Fig. 2 is a schematic diagram of coarse-grained pruning of a fully connected layer of a neural network according to an embodiment of the disclosure;
Fig. 3 is a schematic diagram of coarse-grained pruning of a convolutional layer of a neural network according to an embodiment of the disclosure;
Fig. 4 is a schematic structural diagram of an accelerator according to an embodiment of the disclosure;
Fig. 5 is a schematic structural diagram of another accelerator according to an embodiment of the disclosure;
Fig. 6 is a schematic diagram of the working process of the coarse-grained selection unit;
Fig. 7 is a schematic structural diagram of a processing unit according to an embodiment of the disclosure;
Fig. 8a is a schematic diagram of coarse-grained selection according to an embodiment of the disclosure;
Fig. 8b is another schematic diagram of coarse-grained selection according to an embodiment of the disclosure;
Fig. 9 is a schematic structural diagram of another accelerator according to an embodiment of the disclosure;
Fig. 10 is a schematic structural diagram of another accelerator according to an embodiment of the disclosure;
Fig. 11 is a schematic diagram of a specific embodiment of a processing method according to an embodiment of the disclosure;
Fig. 12 is a schematic structural diagram of a combined processing device according to an embodiment of the disclosure;
Fig. 13 is a schematic structural diagram of another combined processing device according to an embodiment of the disclosure;
Fig. 14 is a schematic structural diagram of a neural network processor board card according to an embodiment of the disclosure;
Fig. 15 is a schematic diagram of a chip package structure according to an embodiment of the disclosure;
Fig. 16 is a schematic diagram of another chip package structure according to an embodiment of the disclosure;
Fig. 17 is a schematic diagram of another chip package structure according to an embodiment of the disclosure.
Specific embodiments
To make the objectives, technical solutions and advantages of the disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
All modules of the embodiments of the disclosure may be hardware structures; physical implementations of the hardware structures include but are not limited to physical devices, and the physical devices include but are not limited to transistors, memristors and DNA computers.
It should be noted that "first", "second", "third", etc. used in the disclosure are only used to distinguish different objects and do not imply any particular order among these objects.
It should be noted that coarse-grained pruning (or coarse-grained sparsification) refers to obtaining at least two data items (weights or neurons) and, when the at least two data items meet a preset condition, setting some or all of them to zero.
According to the basic concept of the disclosure, a processing method, a processing unit and an accelerator for performing coarse-grained pruning and sparsification on a neural network are provided, so as to reduce weight storage and the amount of computation.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a processing unit for performing coarse-grained pruning and sparsification on a neural network according to an embodiment of the present invention. As shown in Fig. 1, the processing unit includes:
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of the neural network to obtain pruned weights.
Specifically, the coarse-grained pruning unit is configured to:
select M weights from the weights of the neural network through a sliding window, where M is an integer greater than 1; and when the M weights meet a preset condition, set all or part of the M weights to zero.
The preset condition is:
the information quantity of the M weights meets a preset judgment condition.
As an optional embodiment, the preset judgment condition includes a threshold judgment condition, which may include one or more of: less than a given threshold, less than or equal to a given threshold, greater than a given threshold, greater than or equal to a given threshold, within a given value range, or outside a given value range.
Specifically, the information quantity of the M weights is less than a given threshold, where the information quantity of the M weights includes but is not limited to the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, and the maximum of the absolute values of the M weights; that is, the arithmetic mean of the absolute values of the M weights is less than a first threshold, or the geometric mean of the absolute values of the M weights is less than a second threshold, or the maximum of the absolute values of the M weights is less than a third threshold. The first, second and third thresholds may each be preset by those skilled in the art according to circumstances, obtained by changing the input parameters of a preset formula, or obtained by machine learning; the disclosure does not specifically limit how the first, second and third thresholds are obtained.
As an optional embodiment, the preset judgment condition includes a function-mapping judgment condition, i.e., judging whether the M weights meet a specified condition after a functional transformation.
Further, the neural network includes a fully connected layer, a convolutional layer and a long short-term memory (LSTM) layer. The weight of the fully connected layer is a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, so the fully connected layer has Nin*Nout weights. The weight of the convolutional layer is a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps and (Kx, Ky) is the size of the convolution kernel, so the convolutional layer has Nfin*Nfout*Kx*Ky weights. The weight of the LSTM layer consists of the weights of m fully connected layers, where m is an integer greater than 0; the i-th fully connected layer weight is (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i denotes the number of input neurons of the i-th fully connected layer weight, and Nout_i denotes the number of its output neurons. The coarse-grained pruning unit is specifically configured as follows.
When performing the coarse-grained pruning operation on the weight of the fully connected layer, the size of the sliding window is Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout.
The sliding window is slid along the Bin direction with stride Sin, or along the Bout direction with stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout.
M weights are selected from the Nin*Nout weights through the sliding window, and when the M weights meet the preset condition, all or part of the M weights are set to zero, where M=Bin*Bout. See Fig. 2 for the detailed process.
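Assuming the window's information quantity is taken to be its mean absolute value (one of the conditions listed earlier) and the whole window is zeroed, the fully connected pruning pass above can be sketched as:

```python
import numpy as np

def prune_fc_coarse(W, Bin, Bout, Sin, Sout, threshold):
    """Coarse-grained pruning of a fully connected weight matrix W (Nin, Nout).

    A Bin x Bout window slides with strides (Sin, Sout); when the arithmetic
    mean of the absolute values inside the window is below `threshold`, the
    whole window of M = Bin*Bout weights is zeroed."""
    W = W.copy()
    Nin, Nout = W.shape
    for i in range(0, Nin - Bin + 1, Sin):
        for j in range(0, Nout - Bout + 1, Sout):
            block = W[i:i + Bin, j:j + Bout]
            if np.abs(block).mean() < threshold:
                W[i:i + Bin, j:j + Bout] = 0.0
    return W
```

Zeroing whole Bin*Bout blocks (rather than individual weights) is what makes the resulting sparsity regular and hardware-friendly.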
When performing the coarse-grained pruning operation on the weight of the convolutional layer, the sliding window is a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky.
The sliding window is slid along the Bfin direction with stride Sfin, or along the Bfout direction with stride Sfout, or along the Bx direction with stride Sx, or along the By direction with stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By.
M weights are selected from the Nfin*Nfout*Kx*Ky weights through the sliding window, and when the M weights meet the preset condition, all or part of the M weights are set to zero, where M=Bfin*Bfout*Bx*By. See Fig. 3 for the detailed process.
When performing the coarse-grained pruning operation on the weight of the LSTM layer, the size of the sliding window is Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i.
The sliding window is slid along the Bin_i direction with stride Sin_i, or along the Bout_i direction with stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i.
M weights are selected from the Nin_i*Nout_i weights through the sliding window, and when the M weights meet the preset condition, all or part of the M weights are set to zero, where M=Bin_i*Bout_i.
Further, the M weights are the weights included in the sliding window during its movement. That the coarse-grained pruning unit sets all or part of the M weights to zero includes: the coarse-grained pruning unit sets all the weights in the sliding window (i.e., the M weights) to zero; or sets the weights on the anti-diagonal of the sliding window to zero; or sets a part of the weights in the middle of the sliding window to zero, for example, if the size of the sliding window is 5*5, the coarse-grained pruning unit sets the 3*3 weights in the middle of the 5*5 sliding window to zero; or randomly selects at least one weight from the sliding window and sets it to zero. Partial zeroing in this way helps preserve the precision of the subsequent training operation.
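The partial-zeroing strategies just listed can be sketched as one function; `zero_window` and its mode names are illustrative, and "diagonal" here assumes the anti-diagonal reading of the original text:

```python
import numpy as np

def zero_window(block, mode="all", rng=None):
    """Zero all or part of a pruning window.

    mode: "all" zeroes every weight; "diagonal" zeroes the anti-diagonal;
    "center" zeroes a 3x3-style centered sub-block (e.g. of a 5x5 window);
    "random" zeroes one randomly chosen weight."""
    block = block.copy()
    n, m = block.shape
    if mode == "all":
        block[:] = 0.0
    elif mode == "diagonal":
        for i in range(min(n, m)):
            block[i, m - 1 - i] = 0.0
    elif mode == "center":
        block[n // 2 - 1:n // 2 + 2, m // 2 - 1:m // 2 + 2] = 0.0
    elif mode == "random":
        rng = rng or np.random.default_rng(0)
        block[rng.integers(n), rng.integers(m)] = 0.0
    return block
```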
Further, the coarse-grained pruning unit and the arithmetic unit are configured to repeatedly perform coarse-grained pruning on the neural network and train it with the pruned weights, until no weight meets the preset condition, under the premise that the precision does not drop by more than a set precision.
The set precision is x%, where x is a number greater than 0 and less than 100; x may be chosen differently for different neural networks and different applications.
In a preferred embodiment, the value of x ranges from 0 to 5.
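The alternating prune-and-retrain loop described above can be sketched as follows; the callback signatures (a `prune_step` returning the number of newly zeroed groups, an `accuracy` oracle) are assumptions of this sketch, not the claimed interface:

```python
def prune_and_retrain(model, prune_step, train_step, accuracy, x_percent):
    """Alternate coarse-grained pruning and retraining, stopping when no
    further weight group meets the pruning condition, or when accuracy has
    dropped by more than x_percent from the pre-pruning baseline."""
    baseline = accuracy(model)
    while True:
        newly_pruned = prune_step(model)
        train_step(model)  # retraining keeps pruned weights at zero
        if newly_pruned == 0 or baseline - accuracy(model) > x_percent:
            return model
```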
Further, the processing unit further includes:
a quantization unit, configured to, after the coarse-grained pruning unit performs coarse-grained pruning on the weights of the neural network and before the arithmetic unit retrains the neural network with the pruned weights, quantize the weights of the neural network and/or perform a first operation on the weights of the neural network, so as to reduce the number of bits of the weights.
In a feasible embodiment, quantizing the weights of the neural network is specifically replacing a weight W1 that meets a condition with a weight W0, the condition being |W1 - W0| <= ΔW, where ΔW is a preset value.
The first operation may be reducing the value range of the data format corresponding to the weights, or reducing the precision range of the data format corresponding to the weights.
Further, the arithmetic unit is specifically configured to:
retrain the neural network with the pruned weights by a back-propagation algorithm.
Specifically, the arithmetic unit may be configured to execute a neural network backward-training algorithm: it receives the pruned neural network and trains it using the back-propagation algorithm, keeping the pruned weights at 0 throughout training. The arithmetic unit transfers the trained neural network to the coarse-grained pruning unit for further pruning, or outputs it directly.
Specifically, the arithmetic unit performs backward computation on each layer of the neural network in the order opposite to the forward computation, and finally updates the weights with the computed weight gradients; this constitutes one iteration of neural network training, and the whole training process repeats this procedure many times. The backward computation of each layer requires two parts of computation: one computes the weight gradients from the output neuron gradients and the input neurons, and the other computes the input neuron gradients from the output neuron gradients and the weights (these serve as the output neuron gradients of the next layer of the backward computation, which uses them to perform its own backward computation). After the backward computation of the neural network has been executed, the weight gradient of each layer has been computed, and the arithmetic unit updates the weights according to the weight gradients.
It should be pointed out that during the training of the neural network by the arithmetic unit, the weights set to 0 always remain 0.
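The rule that pruned weights stay at zero through training amounts to masking the gradient update. A minimal sketch of one such update step:

```python
import numpy as np

def masked_weight_update(W, grad, lr):
    """One retraining update: positions zeroed by pruning (W == 0) are
    masked out so they remain exactly zero throughout training."""
    mask = (W != 0).astype(W.dtype)  # 1 where the weight survived pruning
    return W - lr * grad * mask
```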
In the solution of the embodiment of the present invention, the coarse-grained pruning unit of the processing unit performs the coarse-grained pruning operation on the weights of the neural network to obtain pruned weights, and the arithmetic unit retrains the neural network with the pruned weights. Performing coarse-grained pruning on the weights of the neural network reduces subsequent storage and memory accesses, also reduces the subsequent amount of computation, improves computational efficiency and reduces power consumption.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an accelerator provided by an embodiment of the present invention. As shown in Fig. 4, the accelerator includes:
a storage unit, configured to store the input neurons, output neurons, weights and instructions of the neural network;
a coarse-grained pruning unit, configured to perform coarse-grained pruning on the weights of the neural network to obtain pruned weights, and to store the pruned weights and the target weight position information into the storage unit.
It should be noted that the specific process by which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the related description of the embodiment shown in Fig. 1 and is not repeated here;
an arithmetic unit, configured to retrain the neural network with the pruned weights; and
a coarse-grained selection unit, configured to receive the input neurons and the target weight position information, and to select the target weights and their corresponding input neurons.
Here, a target weight is a weight whose absolute value is greater than the second preset threshold.
Further, the coarse-grained selection unit selects only the target weights and their corresponding neurons and transmits them to the arithmetic unit.
The arithmetic unit is further configured to receive the input target weights and their corresponding neurons, complete the neural network computation through its multiply-accumulate units according to the target weights and their corresponding neurons to obtain output neurons, and transmit the output neurons back to the storage unit.
The storage unit is further configured to store the intermediate results generated by the arithmetic unit during the neural network computation.
Further, the accelerator further includes:
an instruction control unit, configured to receive the instruction, decode the instruction to generate control information, and use the control information to control the coarse-grained selection unit to perform the selection operation and the arithmetic unit to perform the computation.
Further, when the storage unit stores weights, only the target weights and the position information of the target weights are stored.
It should be pointed out that the storage unit, the coarse-grained pruning unit, the instruction control unit, the coarse-grained selection unit and the arithmetic unit are all physical hardware devices, not functional software units.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of another accelerator provided by an embodiment of the present invention. As shown in Fig. 5, the accelerator further includes: a preprocessing unit, a storage unit, a direct memory access (DMA) unit, an instruction cache unit, an instruction control unit, a coarse-grained pruning unit, a first cache unit, a second cache unit, a third cache unit, a coarse-grained selection unit, an arithmetic unit and a fourth cache unit.
The preprocessing unit is configured to preprocess the original data and input the preprocessed data into the storage unit, where the original data includes input neurons, output neurons and weights. The preprocessing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the data.
The storage unit is configured to store the neurons, weights and instructions of the neural network; when storing weights, only the target weights and the position information of the target weights are stored.
The DMA unit is configured to read and write data or instructions between the storage unit and the instruction cache unit, the coarse-grained pruning unit, the first cache unit, the second cache unit, the third cache unit or the fourth cache unit.
The coarse-grained pruning unit is configured to obtain the weights of the neural network from the storage unit through the DMA unit and then perform coarse-grained pruning on the weights of the neural network to obtain pruned weights. The coarse-grained pruning unit stores the pruned weights into the first cache unit.
It should be noted that the specific process by which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the related description of the embodiment shown in Fig. 1 and is not repeated here.
The instruction cache unit is configured to cache the instructions.
The first cache unit is configured to cache the target weights, a target weight being a weight whose absolute value is greater than the second preset threshold.
The second cache unit is configured to cache the target weight position data; the target weight position cache unit maps each connection weight in the input data to the corresponding input neuron.
Optionally, a one-to-one caching method of the target weight position cache unit is: use 1 to indicate that there is a weight connection between an output neuron and an input neuron and 0 to indicate that there is no weight connection, with the connection states of each output neuron with all input neurons forming a string of 0s and 1s that indicates the connection relationship of that output neuron.
Optionally, a one-to-one caching method of the target weight position cache unit is: use 1 to indicate that there is a weight connection between an input neuron and an output neuron and 0 to indicate that there is no weight connection, with the connection states of each input neuron with all output neurons forming a string of 0s and 1s that indicates the connection relationship of that input neuron.
Optionally, a one-to-one caching method of the target weight position cache unit is: for one output, record the distance from the position of the input neuron of its first connection to the first input neuron, then the distance from the input neuron of its second connection to the previous input neuron, then the distance from the input neuron of its third connection to the previous input neuron, and so on, until all inputs of that output are exhausted, so as to represent the connection relationship of that output.
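The two position-encoding schemes above (0/1 bitmask strings, and a first position followed by distances to the previous connection) can be sketched in software as follows; the function names are illustrative, not the cache interface itself:

```python
def bitmask_encoding(weights_row):
    """0/1 string: 1 where a weight connection to the corresponding input
    neuron exists, 0 where the connection was pruned away."""
    return "".join("1" if w != 0 else "0" for w in weights_row)

def distance_encoding(weights_row):
    """Distance list: position of the first connected input, then the
    distance from each subsequent connected input to the previous one."""
    positions = [i for i, w in enumerate(weights_row) if w != 0]
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
```

For sparse rows the distance form stores one small integer per surviving connection, while the bitmask stores one bit per possible connection.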
The third cache unit is configured to cache the input neurons that are input to the coarse-grained selection unit.
The fourth cache unit is configured to cache the output neurons output by the arithmetic unit and the output neuron gradients obtained from the output neurons.
The instruction control unit is configured to receive the instructions in the instruction cache unit, decode them to generate control information, and thereby control the arithmetic unit to perform the computation.
The coarse-grained selection unit is configured to receive the input neurons and the target weight position information, and to select, according to the target weight position information, the input neurons that need to participate in the computation. The coarse-grained selection unit selects only the neurons corresponding to the target weights and transmits them to the arithmetic unit.
The arithmetic unit is configured to perform computation on the input neurons and the target weights according to the control information transmitted by the instruction control unit, to obtain output neurons and store them in the fourth cache unit; it then obtains output neuron gradients from the output neurons and stores them into the fourth cache unit.
Specifically, the coarse-grained selection unit is configured to select, according to the position information of the target weights, the input neurons corresponding to the target weights from the input neurons provided by the input neuron cache unit, and then transmit the target weights and their corresponding input neurons to the arithmetic unit.
In one embodiment, the arithmetic unit may include multiple processing units, so that different output neurons are obtained by parallel computation, and the obtained output neurons are stored into the output neuron cache unit. Each of the multiple processing units includes a local weight selector module for further processing dynamic coarse-grained sparse data. The coarse-grained selection unit handles static sparsity by selecting the needed input neurons; for its specific working process, see the related description of Fig. 6.
Referring to Fig. 6: first, the coarse-grained selection unit generates a neuron index from the input neuron values, where each index bit indicates whether the corresponding neuron is useful ("0" for useless). Second, the coarse-grained selection unit combines the generated neuron index with the weight position information (i.e., the weight index) by an AND operation to obtain neuron flags, each bit of which indicates whether the corresponding neuron is selected. Third, the coarse-grained selection unit accumulates the bits of the neuron flags to obtain an accumulated string, and then performs an AND operation between the accumulated string and the neuron flags to generate a target string used to select the input neurons. Finally, the coarse-grained selection unit selects the actual input neurons using the target string, for the subsequent computation in the arithmetic unit. Meanwhile, the coarse-grained selection unit generates an index string from the target string and the accumulated weight index string (i.e., the weight position information) and passes it to the arithmetic unit.
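A simplified software analogue of this selection flow follows; the bit-level pipeline (accumulated string, index string) is abbreviated here to a cumulative-sum offset array, which is an assumption of this sketch rather than the exact hardware behavior:

```python
import numpy as np

def select_neurons(neurons, weight_index):
    """Sketch of the Fig. 6 flow: build a neuron index (1 for nonzero
    neurons), AND it with the weight position index to get neuron flags,
    and gather the neurons that actually enter the computation, together
    with accumulated offsets derived from the flags."""
    neurons = np.asarray(neurons)
    neuron_index = (neurons != 0).astype(int)          # dynamic sparsity
    flags = neuron_index & np.asarray(weight_index)    # AND with static sparsity
    offsets = np.cumsum(flags)                         # accumulated string analogue
    selected = neurons[flags == 1]
    return selected, flags, offsets
```

Only neurons that are both nonzero (dynamic sparsity) and matched by a surviving weight (static sparsity) reach the arithmetic unit.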
The arithmetic unit is mainly used to process dynamic sparsity and to execute all operations of the neural network efficiently. The neuron functional unit includes multiple processing units. As shown in Fig. 7, each processing unit includes a weight buffer, a weight decoder module, a weight selector module, and the neuron functional unit of the processing unit. Each processing unit loads weights from its local weight buffer, because the weights of different output neurons are independent of one another, and the weights of different processing units are likewise independent. A weight decoder module with a lookup table is placed beside the weight buffer, to extract the actual weights from the codebook and the compressed values in the dictionary used by local quantization.
As shown in Fig. 8a, the weight selector module receives the index string and the weights from the weight decoder module, to select the weights useful for the computation of the neuron functional unit of the processing unit. The neuron functional unit of each processing unit consists of Tm multipliers, an adder tree and a nonlinear function module, as shown in Fig. 8b. The neuron functional unit maps the neural network onto the processing units using a time-sharing method, i.e., each processing unit processes output neurons in parallel, and an output neuron requiring M multiplications takes M/Tm cycles, because the unit can perform Tm multiplications in one cycle. The neuron functional unit then collects and assembles the outputs of all processing units, for subsequent computation or for storage into the output neuron cache unit.
The weight selector module selects only the weights needed when dynamic sparsity is considered, because the weight buffer stores weights compactly to exploit static sparsity. Based on the index string containing the weight position information, the selector module further filters the weights and selects those required for the computation, see Fig. 8a. Each processing unit works on different output neurons and therefore with different weights; consequently, the weight selector module and the weight buffer are implemented inside the processing unit to avoid high bandwidth demand and latency.
It should be pointed out that dynamic sparsity generally refers to the sparsity of the input neurons, because the input neuron values change as the input changes. Dynamic sparsity mainly derives from activation functions such as ReLU, because such functions set input neurons whose absolute value is less than a threshold to 0. Static sparsity generally refers to the sparsity of the weights, because the topology no longer changes after the weights are pruned.
The instruction cache unit, input neuron cache unit, target weight cache unit, target weight position cache unit and output neuron cache unit are all on-chip caches.
Specifically, the arithmetic unit includes but is not limited to three parts: the first part is a multiplier, the second part is an adder tree, and the third part is an activation function unit. The first part multiplies first input data (in1) by second input data (in2) to obtain the multiplied output (out1), the process being: out1 = in1 * in2. The second part adds third input data (in3) step by step through the adder tree to obtain second output data (out2), where in3 is a vector of length N and N is greater than 1, the process being: out2 = in3[1] + in3[2] + ... + in3[N]; and/or adds the result of accumulating the third input data (in3) through the adder tree to fourth input data (in4) to obtain the second output data (out2), the process being: out2 = in3[1] + in3[2] + ... + in3[N] + in4; or adds the third input data (in3) to the fourth input data (in4) to obtain the second output data (out2), the process being: out2 = in3 + in4. The third part performs an activation function (active) operation on fifth input data (in5) to obtain activation output data (out3), the process being: out3 = active(in5); the activation function active may be sigmoid, tanh, relu, softmax, etc. Besides the activation operation, the third part may implement other nonlinear functions, obtaining output data (out) from input data (in) through an operation (f), the process being: out = f(in).
Further, the arithmetic unit may also include a pooling unit, which obtains the output data (out) after the pooling operation from the input data (in) through the pooling operation, the process being: out = pool(in), where pool is the pooling operation. The pooling operation includes but is not limited to average pooling, max pooling and median pooling, and the input data in is the data in a pooling kernel related to the output out.
The operations performed by the arithmetic unit comprise several parts. The first part multiplies the first input data with the second input data to obtain the multiplied data. The second part performs the adder-tree operation, adding the third input data stepwise through an adder tree, or adding the third input data to the fourth input data to obtain output data. The third part performs the activation function operation, obtaining output data by applying the activation function (active) to the fifth input data. The operations of the above parts can be freely combined to realize operations of various functions.
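As an illustration of how the three parts compose, the following sketch combines multiplication, adder-tree accumulation with a fourth input, and activation to compute a single neuron. The numeric values and the choice of ReLU are assumptions for the example, not taken from the embodiment:

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Freely combining the three parts implements a neuron:
# out = active(n1*s1 + n2*s2 + ... + nN*sN + in4)
neurons = [1.0, 2.0, 3.0]    # input neurons
weights = [0.5, -1.0, 0.25]  # their weights
products = [n * s for n, s in zip(neurons, weights)]  # Part 1: out1 = in1 * in2
acc = sum(products) + 0.5    # Part 2: adder tree, plus a fourth input in4 (bias)
out = relu(acc)              # Part 3: out3 = active(in5)
print(out)  # 0.0, since 0.5 - 2.0 + 0.75 + 0.5 = -0.25 and relu clips it to 0
```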
It should be pointed out that the above pretreatment unit, storage unit, DMA unit, coarse-grained pruning unit, instruction cache unit, instruction control unit, first cache unit, second cache unit, third cache unit, fourth cache unit, coarse-grained selection unit, and arithmetic unit are all physical hardware devices, not functional software units.
Referring to Fig. 9, Fig. 9 is a structural schematic diagram of another accelerator provided by an embodiment of the present invention. As shown in Fig. 9, the above accelerator further includes: a pretreatment unit, a storage unit, a DMA unit, an instruction cache unit, an instruction control unit, a coarse-grained pruning unit, a target weight cache unit, a target weight position cache unit, an input neuron cache unit, a coarse-grained selection unit, an arithmetic unit, an output neuron cache unit, and an output neuron gradient cache unit.
The pretreatment unit preprocesses raw data and stores the preprocessed data into the storage unit. The raw data includes input neurons, output neurons, and weights. The preprocessing includes segmentation of the data, Gaussian filtering, binarization, regularization, and/or normalization.
The storage unit stores the neurons, weights, and instructions of the neural network. When storing weights, only the target weights and the location information of the target weights are stored.
The DMA unit reads and writes data or instructions between the storage unit and the instruction cache unit, coarse-grained pruning unit, target weight position cache unit, input neuron cache unit, or output neuron cache unit.
The coarse-grained pruning unit obtains the weights of the neural network from the storage unit through the DMA unit, then performs coarse-grained pruning on those weights to obtain the pruned weights. The coarse-grained pruning unit stores the pruned weights into the target weight cache unit.
It should be noted that the specific process by which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the associated description of the embodiment shown in Fig. 1 and is not repeated here.
The instruction cache unit is used for caching the instructions;
The target weight cache unit is used for caching the target weights;
The target weight position cache unit is used for caching target weight position data; the target weight position cache unit maps each connection weight in the input data one-to-one to the corresponding input neuron.
Optionally, one one-to-one caching method of the target weight position cache unit uses 1 to indicate a weight connection between an output neuron and an input neuron, and 0 to indicate no weight connection between them; the connection states of each group of output neurons with all the input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron.
Optionally, another one-to-one caching method of the target weight position cache unit uses 1 to indicate a weight connection between an input neuron and an output neuron, and 0 to indicate no weight connection between them; the connection states of each group of inputs with all the outputs form a string of 0s and 1s that represents the connection relationship of that input neuron.
Optionally, another one-to-one caching method of the target weight position cache unit records, for one group of outputs, the distance from the input neuron of the first connection to the first input neuron, then the distance of the output's second connected input neuron from the previous input neuron, then the distance of the output's third connected input neuron from the previous input neuron, and so on, until all the inputs of the output are exhausted, thereby representing the connection relationship of the output.
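The two position encodings described above (the 0/1 bit-string form and the distance form) can be sketched minimally as follows; the function names and the threshold parameter eps are assumptions for illustration:

```python
def bitmask_encoding(weights, eps=0.0):
    """Bit-string form: 1 = weight connection present, 0 = no connection."""
    return "".join("1" if abs(w) > eps else "0" for w in weights)

def distance_encoding(weights, eps=0.0):
    """Distance form: the position of the first connected input, then the
    distance of each later connected input from the previous one."""
    distances, prev = [], None
    for i, w in enumerate(weights):
        if abs(w) > eps:
            distances.append(i if prev is None else i - prev)
            prev = i
    return distances

# One output neuron's weights over inputs n1..n8 after coarse-grained pruning
row = [0.5, -1.2, 0.0, 0.0, 0.3, 0.7, 0.0, 0.0]
print(bitmask_encoding(row))   # 11001100
print(distance_encoding(row))  # [0, 1, 3, 1]
```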
The input neuron cache unit caches the input neurons that are input to the coarse-grained selection unit.
The output neuron cache unit caches the output neurons output by the arithmetic unit.
The output neuron gradient cache unit caches the gradients of the output neurons.
The instruction control unit receives the instructions in the instruction cache unit, decodes the instructions to generate control information, and controls the arithmetic unit to perform calculation operations.
The coarse-grained selection unit receives the input neurons and the target weight location information, and selects the input neurons that need to participate in the operation according to the target weight location information. The coarse-grained selection unit selects only the neurons corresponding to the target weights and transmits them to the arithmetic unit.
The arithmetic unit performs operations according to the target weights obtained from the target weight cache unit and their corresponding input neurons, obtains the output neurons, and caches the output neurons into the output neuron cache unit.
The arithmetic unit is also used for retraining according to the output neuron gradients and the pruned weights.
It should be noted that the function of each unit of the above accelerator can be found in the associated description of the embodiment shown in Fig. 5 and is not repeated here.
It should be pointed out that the above pretreatment unit, storage unit, DMA unit, coarse-grained pruning unit, instruction cache unit, instruction control unit, target weight cache unit, target weight position cache unit, input neuron cache unit, output neuron gradient cache unit, output neuron cache unit, coarse-grained selection unit, and arithmetic unit are all physical hardware devices, not functional software units.
Referring to Figure 10, Figure 10 is a structural schematic diagram of another accelerator provided by an embodiment of the present invention. As shown in Figure 10, the above accelerator further includes: a pretreatment unit, a storage unit, a DMA unit, an instruction cache unit, an instruction control unit, a coarse-grained pruning unit, a target weight cache unit, a target weight position cache unit, an input neuron cache unit, a coarse-grained selection unit, an arithmetic unit, and an output neuron cache unit.
The pretreatment unit preprocesses raw data and stores the preprocessed data into the storage unit. The raw data includes input neurons, output neurons, and weights. The preprocessing includes segmentation of the data, Gaussian filtering, binarization, regularization, and/or normalization.
The storage unit stores the neurons, weights, and instructions of the neural network. When storing weights, only the target weights and the location information of the target weights are stored.
The DMA unit reads and writes data or instructions between the storage unit and the instruction cache unit, coarse-grained pruning unit, target weight position cache unit, input neuron cache unit, or output neuron cache unit.
The coarse-grained pruning unit obtains the weights of the neural network from the storage unit through the DMA unit, then performs coarse-grained pruning on those weights to obtain the pruned weights. The coarse-grained pruning unit stores the pruned weights into the target weight cache unit.
It should be noted that the specific process by which the coarse-grained pruning unit performs the coarse-grained pruning operation on the weights of the neural network can be found in the associated description of the embodiment shown in Fig. 1 and is not repeated here.
The instruction cache unit is used for caching the instructions;
The target weight cache unit is used for caching the target weights;
The target weight position cache unit is used for caching target weight position data; the target weight position cache unit maps each connection weight in the input data one-to-one to the corresponding input neuron.
Optionally, one one-to-one caching method of the target weight position cache unit uses 1 to indicate a weight connection between an output neuron and an input neuron, and 0 to indicate no weight connection between them; the connection states of each group of output neurons with all the input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron.
Optionally, another one-to-one caching method of the target weight position cache unit uses 1 to indicate a weight connection between an input neuron and an output neuron, and 0 to indicate no weight connection between them; the connection states of each group of inputs with all the outputs form a string of 0s and 1s that represents the connection relationship of that input neuron.
Optionally, another one-to-one caching method of the target weight position cache unit records, for one group of outputs, the distance from the input neuron of the first connection to the first input neuron, then the distance of the output's second connected input neuron from the previous input neuron, then the distance of the output's third connected input neuron from the previous input neuron, and so on, until all the inputs of the output are exhausted, thereby representing the connection relationship of the output.
The input neuron cache unit caches the input neurons that are input to the coarse-grained selection unit.
The output neuron cache unit caches the output neurons output by the arithmetic unit.
The output neuron gradient cache unit caches the gradients of the output neurons.
The instruction control unit receives the instructions in the instruction cache unit, decodes the instructions to generate control information, and controls the arithmetic unit to perform calculation operations.
The coarse-grained selection unit receives the input neurons and the target weight location information, and selects the input neurons that need to participate in the operation according to the target weight location information. The coarse-grained selection unit selects only the input neurons corresponding to the target weights and transmits them to the arithmetic unit.
The arithmetic unit performs operations according to the target weights obtained from the target weight cache unit and their corresponding input neurons, obtains the output neurons, and caches the output neurons into the output neuron cache unit.
It should be noted that the function of each unit of the above accelerator can be found in the associated description of the embodiment shown in Fig. 5 and is not repeated here.
It should be pointed out that the above pretreatment unit, storage unit, DMA unit, coarse-grained pruning unit, instruction cache unit, instruction control unit, target weight cache unit, target weight position cache unit, input neuron cache unit, output neuron cache unit, output neuron gradient cache unit, coarse-grained selection unit, and arithmetic unit are all physical hardware devices, not functional software units.
Hereinafter, the processing method of the disclosure is described in detail by way of neural network processor embodiments. It should be understood that these are not intended to limit the disclosure; all equivalent structures or equivalent process transformations made using these specific embodiments, or direct or indirect applications in other related technical fields, are likewise included within the protection scope of the disclosure.
Referring to Figure 11, Figure 11 is a schematic diagram of a specific embodiment of a processing method provided by an embodiment of the present invention. Figure 11 shows the result of a fully connected layer of a neural network after coarse-grained pruning. The fully connected layer has a total of 8 input neurons, n1 to n8, and 3 output neurons, o1 to o3. The weights between the four input neurons n3, n4, n7, n8 and the three output neurons o1, o2, o3 are set to zero by coarse-grained sparsification; n1 is connected to o1, o2, o3 by the three weights s11, s21, s31; n2 by the three weights s12, s22, s32; n5 by the three weights s13, s23, s33; and n6 by the three weights s14, s24, s34. The bit string 11001100 represents the connections between the input neurons and the output neurons, i.e., the first representation of the target weight location information: 1 indicates that the input neuron is connected to all three output neurons, and 0 indicates that the input neuron is connected to none of the three output neurons. Table 1 describes the information of the neurons and weights in the embodiment, and Formula 1 describes the operation formulas of the three output neurons o1, o2, o3. It can be seen from Formula 1 that o1, o2, o3 receive the same neurons for their operations.
Fine-grained pruning treats each weight as an independent individual; during pruning, a weight is removed if it meets the condition. Coarse-grained pruning groups the weights in a certain way, each group containing multiple weights; if a group of weights meets the condition, the entire group is removed.
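The distinction can be sketched as follows. The grouping scheme (fixed-size contiguous groups) and the group condition (maximum absolute value below a threshold) are assumptions chosen for illustration, since the text leaves the grouping mode and condition open:

```python
def fine_grained_prune(weights, threshold):
    """Fine-grained: each weight is judged and removed independently."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def coarse_grained_prune(weights, group_size, threshold):
    """Coarse-grained: weights are grouped; a whole group is zeroed only
    when the group as a whole meets the condition (here: every weight in
    the group is below the threshold in absolute value)."""
    pruned = list(weights)
    for start in range(0, len(pruned), group_size):
        group = pruned[start:start + group_size]
        if max(abs(w) for w in group) < threshold:
            pruned[start:start + group_size] = [0.0] * len(group)
    return pruned

w = [0.9, 0.05, 0.01, 0.02]
print(fine_grained_prune(w, 0.1))       # [0.9, 0.0, 0.0, 0.0]
print(coarse_grained_prune(w, 2, 0.1))  # [0.9, 0.05, 0.0, 0.0] -- the first
# group survives whole because one of its weights is above the threshold
```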
Table 1
Formula 1 -- output neuron operation formulas:
o1 = n1*s11 + n2*s12 + n5*s13 + n6*s14
o2 = n1*s21 + n2*s22 + n5*s23 + n6*s24
o3 = n1*s31 + n2*s32 + n5*s33 + n6*s34
When the processing device performs the operation, the 8 input neurons, the 12 target weights, the 8-bit location information, and the corresponding instructions are transferred to the storage unit. The coarse-grained selection unit receives the 8 input neurons and the target weight positions and selects n1, n2, n5, n6, the four neurons that need to participate in the operation. The arithmetic unit receives the four selected neurons and the weights, completes the operation of the output neurons through Formula 1, and then transmits the output neurons back to the storage part.
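The selection and operation steps above can be sketched end to end. The numeric neuron and weight values below are invented for illustration; the bit string and the structure of Formula 1 follow the embodiment:

```python
# Hypothetical numeric values; the bit string and formula follow the embodiment.
names  = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
inputs = {"n1": 1, "n2": 2, "n3": 5, "n4": 6, "n5": 3, "n6": 4, "n7": 7, "n8": 8}
position = "11001100"  # target weight location information over n1..n8

# Coarse-grained selection unit: keep only the inputs whose bit is 1.
selected = [inputs[n] for n, bit in zip(names, position) if bit == "1"]
assert selected == [1, 2, 3, 4]  # n1, n2, n5, n6 participate in the operation

# Arithmetic unit: Formula 1, one row of target weights per output neuron.
weights = [[1, 2, 3, 4],     # s11..s14 for o1
           [5, 6, 7, 8],     # s21..s24 for o2
           [9, 10, 11, 12]]  # s31..s34 for o3
outputs = [sum(n * s for n, s in zip(selected, row)) for row in weights]
print(outputs)  # [30, 70, 110]
```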
In some embodiments of the disclosure, an accelerator is disclosed, comprising: a memory storing executable instructions, and a processor for executing the executable instructions in the storage unit; when the instructions are executed, the processor operates according to the above processing method. The processor may be a single processing unit, but may also include two or more processing units. In addition, the processor may include a general-purpose processor (CPU) or a graphics processor (GPU); it may also include a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC) configured for neural network operations. The processor may also include on-chip memory for caching purposes (including memory in the processing unit).
The application also discloses a neural network computing device, which includes one or more of the accelerators or processing devices mentioned in this application, for obtaining the data to be operated on and control information from other processing devices, performing the specified neural network operations and/or training, and passing the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, a camera, display, mouse, keyboard, network card, WiFi interface, and server. When more than one computing device is included, the computing devices can link and transmit data through a specific structure, for example, interconnecting and transmitting data through a PCIe bus, to support larger-scale neural network operations and/or training. In this case, the devices may share the same control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection method may be any interconnection topology.
The neural network computing device has high compatibility and can be connected with various types of servers through a PCIe interface.
The application also discloses a combined processing device, which includes the above neural network computing device, a general interconnection interface, and other processing devices. The neural network computing device interacts with the other processing devices to jointly complete the operations specified by the user. Figure 12 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose/special-purpose processors such as a central processing unit (CPU), a graphics processor (GPU), and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface of the neural network computing device to external data and control, including data transfer, and complete basic control of the neural network computing device such as starting and stopping; the other processing devices may also cooperate with the neural network computing device to jointly complete computing tasks.
The general interconnection interface is used for transmitting data and control instructions between the neural network computing device and the other processing devices. The neural network computing device obtains the required input data from the other processing devices and writes it into the on-chip storage device of the neural network computing device; it may obtain control instructions from the other processing devices and write them into an on-chip control cache of the neural network computing device; it may also read the data in a storage module of the neural network computing device and transmit it to the other processing devices.
Optionally, as shown in Figure 13, the structure may also include a storage device, which is connected with the neural network computing device and the other processing devices respectively. The storage device is used to store the data of the neural network computing device and the other processing devices, and is particularly suitable for data required by the operations that cannot be entirely held in the internal storage of the neural network computing device or the other processing devices.
The combined processing device can be used as the SoC on-chip system of devices such as mobile phones, robots, drones, and video monitoring equipment, effectively reducing the die area of the control part, increasing processing speed, and reducing overall power consumption. In this case, the general interconnection interface of the combined processing device is connected with certain components of the equipment. Such components include, for example, a camera, display, mouse, keyboard, network card, and WiFi interface.
In some embodiments, a neural network processor is disclosed, which includes the above neural network computing device or combined processing device.
In some embodiments, a chip is disclosed, which includes the above neural network processor.
In some embodiments, a chip packaging structure is disclosed, which includes the above chip.
In some embodiments, a board is disclosed, which includes the above chip packaging structure.
In some embodiments, an electronic device is disclosed, which includes the above board.
Please refer to Figure 14, which is a structural schematic diagram of a neural network processor board provided by an embodiment of the present application. As shown in Figure 14, the above neural network processor board includes the above chip packaging structure, a first electrical and non-electrical connecting device, and a first substrate.
The present application does not limit the specific structure of the chip packaging structure; optionally, as shown in Figure 15, the chip packaging structure includes: a chip, a second electrical and non-electrical connecting device, and a second substrate.
The specific form of the chip involved in the present application is not limited. The chip includes, but is not limited to, a neural network chip integrating a neural network processor, and may be made of silicon material, germanium material, quantum material, or molecular material, among others. The neural network chip can be packaged according to the actual situation (for example, a harsher environment) and different application requirements, so that most of the neural network chip is wrapped, and the pins on the neural network chip are connected to the outside of the packaging structure through conductors such as gold wires for circuit connection with the outer layer.
The present application does not limit the types of the first substrate and the second substrate; they may be printed circuit boards (PCB) or printed wiring boards (PWB), or other circuit boards. The material used to produce the PCB is likewise not limited.
The second substrate involved in the present application is used for carrying the chip; the chip packaging structure obtained by connecting the chip and the second substrate through the second electrical and non-electrical connecting device is used for protecting the chip and facilitating the further packaging of the chip packaging structure with the first substrate.
The specific packaging method of the second electrical and non-electrical connecting device, and the structure corresponding to that packaging method, are not limited; a suitable packaging method may be selected and simply improved according to the actual situation and different application requirements, for example: Flip Chip Ball Grid Array Package (FCBGAP), Low-profile Quad Flat Package (LQFP), Quad Flat Package with Heat sink (HQFP), Quad Flat Non-lead Package (QFN), or Fine-pitch Ball Grid Package (FBGA), among other packaging methods.
Flip Chip is suitable for cases where the area requirement after packaging is demanding or where the inductance of the conductors and the signal transmission time are sensitive. In addition, Wire Bonding may be used as the packaging method, reducing cost and improving the flexibility of the packaging structure.
Ball Grid Array can provide more pins, and the average conductor length of the pins is short, giving it the capability of high-speed signal transmission; the package may be replaced by Pin Grid Array (PGA), Zero Insertion Force (ZIF), Single Edge Contact Connection (SECC), Land Grid Array (LGA), and the like.
Optionally, the neural network chip and the second substrate are packaged using the Flip Chip Ball Grid Array packaging method; for a schematic diagram of the specific neural network chip packaging structure, refer to Figure 16. As shown in Figure 16, the chip packaging structure includes: a chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, and pins 26.
The pads 22 are connected with the chip 21; solder balls 23 are formed by welding between the pads 22 and the connection points 25 on the second substrate 24, connecting the neural network chip 21 and the second substrate 24 and thereby realizing the packaging of the chip 21.
The pins 26 are used for connecting with the external circuit of the packaging structure (for example, the first substrate on the board), enabling the transmission of external and internal data and facilitating the processing of data by the chip 21 or the neural network processor corresponding to the chip 21. The type and quantity of the pins are likewise not limited in the present application; different pin forms may be selected according to different packaging technologies and arranged according to certain rules.
Optionally, the neural network chip packaging structure further includes an insulating filler, placed in the gaps between the pads 22, the solder balls 23, and the connection points 25, for preventing interference between solder balls.
The material of the insulating filler may be silicon nitride, silicon oxide, or silicon oxynitride; the interference includes electromagnetic interference, inductive interference, and the like.
Optionally, the neural network chip packaging structure further includes a heat dissipation device for dissipating the heat generated when the neural network chip 21 operates. The heat dissipation device may be a piece of metal with good thermal conductivity, a heat sink, or a radiator, for example, a fan.
For example, as shown in Figure 17, the chip packaging structure includes: a chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, an insulating filler 27, thermal grease 28, and a metal housing heat sink 29. The thermal grease 28 and the metal housing heat sink 29 are used to dissipate the heat generated when the chip 21 operates.
Optionally, the chip packaging structure further includes a reinforcing structure, which is connected with the pads 22 and embedded in the solder balls 23, to enhance the bonding strength between the solder balls 23 and the pads 22.
The reinforcing structure may be a metal wire structure or a columnar structure, without limitation here.
The specific form of the first electrical and non-electrical connecting device is likewise not limited in the present application; refer to the description of the second electrical and non-electrical connecting device: the chip packaging structure may be packaged by welding, or the second substrate and the first substrate may be connected by connecting wires or in a pluggable manner, facilitating subsequent replacement of the first substrate or the chip packaging structure.
Optionally, the first substrate includes a memory unit interface for expanding storage capacity, such as: Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR), and the like; the processing capability of the neural network processor is improved by expanding the memory.
The first substrate 13 may also include a Peripheral Component Interconnect-Express (PCI-E or PCIe) interface, a Small Form-factor Pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, and the like, used for data transmission between the packaging structure and external circuits; these can improve the operation speed and the convenience of operation.
The neural network processor is packaged as a chip, the chip is packaged into a chip packaging structure, and the chip packaging structure is packaged into a board. Data interaction with an external circuit (for example, a computer motherboard) is carried out through an interface (a slot or ferrule) on the board, i.e., the function of the neural network processor is directly realized by using the neural network processor board, while the chip is protected. Other modules may also be added to the neural network processor board, improving the application range and operation efficiency of the neural network processor.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, camera, cloud server, webcam, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, a ship, and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound instrument, and/or an electrocardiograph.
It should be especially emphasized that all modules may be hardware structures; the physical implementations of the hardware structures include but are not limited to physical devices, and the physical devices include but are not limited to transistors, memristors, and DNA computers. It should be noted that, throughout the drawings, the same elements are represented by the same or similar reference numerals. Conventional structures or constructions are omitted where they might cause confusion in the understanding of the present invention. It should be noted that the shapes and sizes of the components in the figures do not reflect actual sizes and proportions, but merely illustrate the content of the embodiments of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in an embodiment can be combined into one module, unit, or component, and can furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where such features and/or at least some of the processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
The specific embodiments described above further describe in detail the purpose, technical solutions, and beneficial effects of the disclosure. It should be understood that the foregoing are merely specific embodiments of the disclosure and are not intended to limit the disclosure; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the disclosure shall be included within the protection scope of the disclosure.
Claims (10)
1. An acceleration device, comprising: a preprocessing unit, a coarse-grained pruning unit, a storage unit, a direct memory access (DMA) unit, a coarse-grained selection unit, an instruction control unit, and an operation unit,
wherein the preprocessing unit is connected to the storage unit; the storage unit is connected to the DMA unit; the DMA unit is connected to the coarse-grained pruning unit, the instruction control unit, the coarse-grained selection unit, and the operation unit; and the coarse-grained pruning unit and the instruction control unit are connected to the operation unit;
the preprocessing unit is configured to preprocess raw data and store the preprocessed data in the storage unit, the raw data comprising input neurons, output neurons, and weights;
the storage unit is configured to store input neurons, output neurons, weights, and instructions, wherein, when storing the weights, only target weights and their position information are stored, a target weight being a weight whose absolute value is greater than a second preset threshold;
the DMA unit is configured to read and write data or instructions between the storage unit and each of the instruction control unit, the coarse-grained pruning unit, the coarse-grained selection unit, and the operation unit;
the coarse-grained pruning unit is specifically configured to obtain the weights from the storage unit through the DMA unit and then perform coarse-grained pruning on the weights to obtain pruned weights;
the operation unit is configured to train the neural network according to the pruned weights;
the instruction control unit is configured to obtain instructions from the storage unit through the DMA unit and decode the instructions to generate control information for controlling the operation unit to perform calculation;
the coarse-grained selection unit is configured to receive the input neurons and the position information of the target weights, select, according to the position information of the target weights, the input neurons that need to be computed, and transmit the selected input neurons to the operation unit;
the operation unit is further configured to perform operations on the input neurons and the target weights according to the control information transmitted by the instruction control unit to obtain output neurons, to obtain output-neuron gradients from the output neurons, and to store the output neurons and the output-neuron gradients in the storage unit through the DMA unit;
wherein the coarse-grained pruning unit is specifically configured to:
select M weights from the weights of the neural network through a sliding window, M being an integer greater than 1; and
set all or part of the M weights to zero when the M weights satisfy a preset condition.
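Claim 1's selection-and-zeroing step can be illustrated with a short sketch. This is a minimal illustration only, assuming a 2-D weight matrix, a non-overlapping window (stride equal to the window size), and the arithmetic-mean-of-absolute-values condition later introduced in claims 4 and 5; the function and parameter names are not from the patent.

```python
import numpy as np

def coarse_grained_prune(weights, bin_, bout, threshold):
    """Slide a Bin x Bout window over an (Nin, Nout) weight matrix and
    zero every group of M = Bin*Bout weights whose information quantity
    (here: arithmetic mean of absolute values) is below the threshold."""
    pruned = weights.copy()
    nin, nout = pruned.shape
    for i in range(0, nin - bin_ + 1, bin_):       # stride Sin = Bin
        for j in range(0, nout - bout + 1, bout):  # stride Sout = Bout
            block = pruned[i:i + bin_, j:j + bout]
            if np.abs(block).mean() < threshold:   # preset condition
                block[:] = 0.0                     # zero all M weights
    return pruned
```

Zeroing whole Bin x Bout groups (rather than individual weights) is what makes the sparsity "coarse-grained": the positions of surviving weights can then be encoded per group, which is what lets the storage unit keep only target weights plus compact position information.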
2. The device according to claim 1, wherein the device further comprises an instruction cache unit, a first cache unit, a second cache unit, a third cache unit, and a fourth cache unit,
wherein the instruction cache unit is located between the DMA unit and the instruction control unit and is connected to both; the first cache unit is located between the coarse-grained pruning unit and the operation unit and is connected to both; the second cache unit is located between the DMA unit and the coarse-grained selection unit and is connected to both; the third cache unit is located between the DMA unit and the coarse-grained selection unit and is connected to both; and the fourth cache unit is located between the DMA unit and the operation unit and is connected to both;
the instruction cache unit is configured to cache the instructions;
the first cache unit is configured to cache the target weights and the pruned weights;
the second cache unit is configured to cache the position information of the target weights;
the third cache unit is configured to cache the input neurons input to the coarse-grained selection unit; and
the fourth cache unit is configured to cache the output neurons output by the operation unit and the output-neuron gradients obtained from the output neurons.
3. The device according to claim 1 or 2, wherein the preprocessing comprises at least one of segmentation, Gaussian filtering, binarization, regularization, and normalization.
4. The device according to claim 3, wherein the preset condition is that the information quantity of the M weights is less than a first preset threshold.
5. The device according to claim 4, wherein the information quantity of the M weights is the arithmetic mean of the absolute values of the M weights, the geometric mean of the absolute values of the M weights, or the maximum value of the M weights; the first preset threshold is correspondingly a first threshold, a second threshold, or a third threshold; and the information quantity of the M weights being less than the first preset threshold comprises:
the arithmetic mean of the absolute values of the M weights being less than the first threshold, the geometric mean of the absolute values of the M weights being less than the second threshold, or the maximum value of the M weights being less than the third threshold.
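The three information-quantity measures of claim 5 are each compared against their own threshold. A minimal sketch, assuming the maximum is also taken over absolute values and using a small epsilon to keep the geometric mean defined when a weight is exactly zero (both details are assumptions, not stated in the claim):

```python
import numpy as np

def info_below_threshold(weights, t1, t2, t3):
    """For a group of M weights, test each of claim 5's three
    information-quantity measures against its matching threshold.
    Returns (arith_ok, geo_ok, max_ok)."""
    a = np.abs(np.asarray(weights, dtype=float))
    arith = a.mean()                        # arithmetic mean of |w|
    geo = np.exp(np.log(a + 1e-12).mean())  # geometric mean of |w|
    mx = a.max()                            # maximum of |w|
    return arith < t1, geo < t2, mx < t3
```

The geometric mean is the most conservative of the three (a single near-zero weight drags it down), while the maximum is the strictest: the group is pruned only if every weight is small.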
6. The device according to any one of claims 1-5, wherein the coarse-grained pruning unit and the operation unit are configured to:
repeatedly perform coarse-grained pruning on the weights of the neural network and train the network according to the pruned weights, until no weight satisfies the preset condition, on the premise that the precision loss does not exceed a set precision.
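The prune-then-retrain loop of claim 6 can be sketched as below. The callables `prune_fn`, `train_fn`, and `eval_fn` are hypothetical stand-ins for the pruning unit, the operation unit's training step, and an accuracy measurement; they are not named in the patent.

```python
def prune_until_stable(weights, prune_fn, train_fn, eval_fn, set_precision):
    """Alternate coarse-grained pruning and retraining until no weight
    group still meets the preset condition, while keeping the accuracy
    drop within set_precision. prune_fn returns (new_weights,
    num_groups_pruned); eval_fn returns an accuracy score."""
    baseline = eval_fn(weights)
    while True:
        candidate, groups_pruned = prune_fn(weights)
        if groups_pruned == 0:                  # nothing left to prune
            return weights
        candidate = train_fn(candidate)         # retrain after pruning
        if baseline - eval_fn(candidate) > set_precision:
            return weights                      # would lose too much accuracy
        weights = candidate                     # accept this round and repeat
```

Retraining between pruning rounds is what lets the surviving weights absorb the function of the zeroed ones, so later rounds can prune further without breaching the precision budget.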
7. The device according to claim 6, wherein the set precision is x%, x being between 0 and 5.
8. The device according to any one of claims 1-6, wherein the neural network comprises a fully connected layer, a convolutional layer, and/or a long short-term memory (LSTM) layer; the weights of the fully connected layer form a two-dimensional matrix (Nin, Nout), where Nin is the number of input neurons and Nout is the number of output neurons, so that the fully connected layer has Nin*Nout weights; the weights of the convolutional layer form a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, so that the convolutional layer has Nfin*Nfout*Kx*Ky weights; the weights of the LSTM layer consist of the weights of m fully connected layers, m being an integer greater than 0, the i-th fully connected layer weight being (Nin_i, Nout_i), where i is an integer greater than 0 and less than or equal to m, Nin_i denotes the number of input neurons of the i-th fully connected layer weight, and Nout_i denotes the number of output neurons of the i-th fully connected layer weight; and the coarse-grained pruning unit is specifically configured to:
when performing the coarse-grained pruning operation on the weights of the fully connected layer, use a sliding window of size Bin*Bout, where Bin is an integer greater than 0 and less than or equal to Nin, and Bout is an integer greater than 0 and less than or equal to Nout;
slide the sliding window along the Bin direction with a stride Sin, or along the Bout direction with a stride Sout, where Sin is a positive integer greater than 0 and less than or equal to Bin, and Sout is a positive integer greater than 0 and less than or equal to Bout;
select M weights from the Nin*Nout weights through the sliding window, and set all or part of the M weights to zero when the M weights satisfy the preset condition, where M = Bin*Bout;
when performing the coarse-grained pruning operation on the weights of the convolutional layer, use a four-dimensional sliding window of size Bfin*Bfout*Bx*By, where Bfin is an integer greater than 0 and less than or equal to Nfin, Bfout is an integer greater than 0 and less than or equal to Nfout, Bx is an integer greater than 0 and less than or equal to Kx, and By is an integer greater than 0 and less than or equal to Ky;
slide the sliding window along the Bfin direction with a stride Sfin, along the Bfout direction with a stride Sfout, along the Bx direction with a stride Sx, or along the By direction with a stride Sy, where Sfin is an integer greater than 0 and less than or equal to Bfin, Sfout is an integer greater than 0 and less than or equal to Bfout, Sx is an integer greater than 0 and less than or equal to Bx, and Sy is an integer greater than 0 and less than or equal to By;
select M weights from the Nfin*Nfout*Kx*Ky weights through the sliding window, and set all or part of the M weights to zero when the M weights satisfy the preset condition, where M = Bfin*Bfout*Bx*By;
when performing the coarse-grained pruning operation on the weights of the LSTM layer, use a sliding window of size Bin_i*Bout_i, where Bin_i is an integer greater than 0 and less than or equal to Nin_i, and Bout_i is an integer greater than 0 and less than or equal to Nout_i;
slide the sliding window along the Bin_i direction with a stride Sin_i, or along the Bout_i direction with a stride Sout_i, where Sin_i is a positive integer greater than 0 and less than or equal to Bin_i, and Sout_i is a positive integer greater than 0 and less than or equal to Bout_i; and
select M weights from the Nin_i*Nout_i weights through the sliding window, and set all or part of the M weights to zero when the M weights satisfy the preset condition, where M = Bin_i*Bout_i.
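Claim 8's three cases differ only in tensor rank and window shape, so they can be sketched as one N-dimensional routine. This is an illustration under assumptions: non-overlapping strides, the arithmetic-mean condition of claims 4-5, and zeroing the whole group; the function name is hypothetical.

```python
import numpy as np
from itertools import product

def prune_nd(weights, window, strides, threshold):
    """Sliding-window coarse-grained pruning for any tensor rank:
    2-D for a fully connected layer (window Bin x Bout), 4-D for a
    convolutional layer (window Bfin x Bfout x Bx x By), and 2-D per
    constituent fully connected layer for an LSTM layer."""
    pruned = weights.copy()
    ranges = [range(0, n - b + 1, s)
              for n, b, s in zip(pruned.shape, window, strides)]
    for start in product(*ranges):          # every window position
        idx = tuple(slice(i, i + b) for i, b in zip(start, window))
        if np.abs(pruned[idx]).mean() < threshold:  # preset condition
            pruned[idx] = 0.0               # zero the M-weight group
    return pruned

# Fully connected layer: (Nin, Nout) with a Bin x Bout window
fc = prune_nd(np.random.randn(8, 8), (2, 2), (2, 2), 0.1)
# Convolutional layer: (Nfin, Nfout, Kx, Ky) with a 4-D window
conv = prune_nd(np.random.randn(4, 4, 3, 3), (2, 2, 3, 3), (2, 2, 3, 3), 0.1)
```

For the LSTM case of claim 8, the same call would be applied separately to each of the m fully connected weight matrices (Nin_i, Nout_i) with a (Bin_i, Bout_i) window.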
9. The device according to any one of claims 1-8, wherein the operation unit is specifically configured to:
retrain the neural network according to the pruned weights by a back-propagation algorithm.
10. The device according to any one of claims 1-9, wherein the device further comprises:
a quantization unit, configured to, after the coarse-grained pruning is performed on the weights of the neural network and before the neural network is retrained according to the pruned weights, quantize the weights of the neural network and/or perform a first operation on the weights of the neural network, so as to reduce the number of weight bits of the neural network;
wherein quantizing the weights of the neural network specifically means replacing a weight W1 that satisfies a condition with a weight W0, the condition being |W1 - W0| ≤ ∇, where ∇ is a preset value; and
performing the first operation on the weights of the neural network specifically means reducing the value range of the data format corresponding to the weights of the neural network, or reducing the precision range of the data format corresponding to the weights of the neural network.
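The two bit-reduction routes of claim 10 can be sketched as follows. The codebook of replacement values and the specific narrower float format are assumptions for illustration; the patent only fixes the replace-within-tolerance rule and the reduced-range/precision requirement.

```python
import numpy as np

def quantize_weights(weights, centers, delta):
    """Claim 10's quantization rule: replace each weight W1 with a
    codebook value W0 when |W1 - W0| <= delta (delta standing in for
    the preset value in the claim); weights outside every tolerance
    band keep their original value."""
    w = np.asarray(weights, dtype=float)
    out = w.copy()
    for w0 in centers:
        out[np.abs(w - w0) <= delta] = w0   # snap to the codebook value
    return out

def reduce_bit_width(weights):
    """A sketch of the 'first operation': shrink the weight data format
    (here float64 -> float16), reducing both value range and precision."""
    return np.asarray(weights).astype(np.float16)
```

After quantization, weights that share a codebook value can be stored as short indices into the codebook, which is how the weight bit count drops below the original format's width.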
Applications Claiming Priority (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710370905.1A CN108960420B (en) | 2017-05-23 | 2017-05-23 | Processing method and acceleration device |
CN2017103709051 | 2017-05-23 | ||
CN2017104567594 | 2017-06-16 | ||
CN201710456759.4A CN109146069B (en) | 2017-06-16 | 2017-06-16 | Arithmetic device, arithmetic method, and chip |
CN2017106779874 | 2017-08-09 | ||
CN2017106780388 | 2017-08-09 | ||
CN201710677987.4A CN109389218B (en) | 2017-08-09 | 2017-08-09 | Data compression method and compression device |
CN201710678038.8A CN109389208B (en) | 2017-08-09 | 2017-08-09 | Data quantization device and quantization method |
CN201880002821.5A CN109478251B (en) | 2017-05-23 | 2018-05-23 | Processing method and acceleration device |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880002821.5A Division CN109478251B (en) | 2017-05-23 | 2018-05-23 | Processing method and acceleration device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110175673A true CN110175673A (en) | 2019-08-27 |
CN110175673B CN110175673B (en) | 2021-02-09 |
Family
ID=69886694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910474387.7A Active CN110175673B (en) | 2017-05-23 | 2018-05-23 | Processing method and acceleration device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110175673B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931937A (en) * | 2020-09-30 | 2020-11-13 | 深圳云天励飞技术股份有限公司 | Gradient updating method, device and system of image processing model |
CN112990157A (en) * | 2021-05-13 | 2021-06-18 | 南京广捷智能科技有限公司 | Image target identification acceleration system based on FPGA |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
CN105787500A (en) * | 2014-12-26 | 2016-07-20 | 日本电气株式会社 | Characteristic selecting method and characteristic selecting device based on artificial neural network |
CN106548234A (en) * | 2016-11-17 | 2017-03-29 | 北京图森互联科技有限责任公司 | A kind of neural networks pruning method and device |
Non-Patent Citations (2)
Title |
---|
SONG HAN, ET AL.: "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", arXiv:1510.00149v5 *
HUANG HONGMEI, HU SHOUSONG: "Fault prediction based on online-learning RBF neural networks", Journal of Nanjing University of Aeronautics & Astronautics *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931937A (en) * | 2020-09-30 | 2020-11-13 | 深圳云天励飞技术股份有限公司 | Gradient updating method, device and system of image processing model |
CN112990157A (en) * | 2021-05-13 | 2021-06-18 | 南京广捷智能科技有限公司 | Image target identification acceleration system based on FPGA |
CN112990157B (en) * | 2021-05-13 | 2021-08-20 | 南京广捷智能科技有限公司 | Image target identification acceleration system based on FPGA |
Also Published As
Publication number | Publication date |
---|---|
CN110175673B (en) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11907844B2 (en) | Processing method and accelerating device | |
CN109902812B (en) | Board card and neural network operation method | |
CN109032669A (en) | Processing with Neural Network device and its method for executing the instruction of vector minimum value | |
TWI793225B (en) | Method for neural network training and related product | |
WO2019129070A1 (en) | Integrated circuit chip device | |
CN109478251A (en) | Processing method and accelerator | |
CN109961136A (en) | Integrated circuit chip device and Related product | |
CN110175673A (en) | Processing method and accelerator | |
CN109978131A (en) | Integrated circuit chip device and Related product | |
CN108960420A (en) | Processing method and accelerator | |
CN109961134A (en) | Integrated circuit chip device and Related product | |
CN109977446A (en) | Integrated circuit chip device and Related product | |
CN109961135A (en) | Integrated circuit chip device and Related product | |
CN109961131A (en) | Neural network forward operation method and Related product | |
CN109978156A (en) | Integrated circuit chip device and Related product | |
WO2019165946A1 (en) | Integrated circuit chip device, board card and related product | |
CN109978148A (en) | Integrated circuit chip device and Related product | |
CN109978151A (en) | Neural network processor board and Related product | |
CN109977071A (en) | Neural network processor board and Related product | |
CN109978157A (en) | Integrated circuit chip device and Related product | |
CN109978150A (en) | Neural network processor board and Related product | |
CN109978152A (en) | Integrated circuit chip device and Related product | |
CN110197267A (en) | Neural network processor board and Related product | |
CN109978147A (en) | Integrated circuit chip device and Related product | |
TWI767097B (en) | Integrated circuit chip apparatus and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||