CN109389210B - Processing method and processing apparatus - Google Patents
Processing method and processing apparatus
- Publication number
- CN109389210B (application CN201710689666.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- weight
- data
- weights
- zero
- Legal status: Active
Classifications
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F1/3296—Power saving characterised by the action undertaken by lowering the supply or operating voltage
- G06F18/23213—Non-hierarchical clustering techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering
- G06F9/3877—Concurrent instruction execution using a slave processor, e.g. coprocessor
- G06N3/04—Neural network architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/50—Extraction of image or video features by performing operations within image blocks
- G06V10/82—Image or video recognition or understanding using neural networks
- G06F2213/0026—PCI express
Abstract
The present disclosure provides a processing apparatus and a processing method. The processing apparatus comprises a coarse-grained selection unit for receiving input neurons and position information of non-zero weights and selecting the neurons that need to be computed; a lookup table unit for receiving a quantized non-zero-weight dictionary and a non-zero-weight codebook, performing a lookup operation, and outputting the non-zero weights of the neural network; and an operation unit for receiving the selected neurons and the non-zero weights, performing the operation, and outputting the result. By providing the selection unit and the other units, the present disclosure can select the units that need to participate in the computation according to the position information of the non-zero values, thereby reducing both the amount of computation and the amount of memory access.
Description
Technical Field
The present disclosure relates to the field of computers, and further relates to the field of artificial intelligence.
Background
Neural networks and machine learning algorithms have been applied with great success. However, as larger and deeper neural networks are designed, more weights are introduced, and these very-large-scale weights become a major challenge for neural network computation. On the one hand, very-large-scale weight data places higher demands on storage; in embedded devices such as mobile phones in particular, storage is quite limited, and it may not be possible to store all of the weight data. On the other hand, accessing a large amount of weight data incurs enormous memory-access energy consumption. How to compress the scale of the neural network therefore becomes an urgent problem to be solved.
Disclosure of Invention
Technical problem to be solved
In view of the above, the present disclosure provides a compression method and a compression apparatus for a neural network.
(II) technical scheme
According to an aspect of the present disclosure, there is provided a data compression method, including:
performing coarse-grained pruning on data, comprising: selecting a group of weights from the neural network by using a sliding window, and setting the selected weights to be zero; carrying out first retraining on the neural network, wherein the weight value which is already set to be zero in the training process is kept to be zero;
quantizing the data, including: grouping the weights of the neural network, clustering each group of weights using a clustering algorithm, calculating a central weight for each class, and replacing all weights in each class with that central weight; encoding the central weights to obtain a codebook and a weight dictionary; and performing second retraining on the neural network, wherein only the codebook is trained during retraining and the content of the weight dictionary remains unchanged.
In a further embodiment, the condition for selecting a group of weights from the neural network using a sliding window is one of the following: the arithmetic mean of the absolute values of all weights in the group, taken as the first representative weight of the group, is smaller than a first threshold; or the geometric mean of the absolute values of all weights in the group, taken as the second representative weight of the group, is smaller than a second threshold; or the maximum of the absolute values of all weights in the group, taken as the third representative weight of the group, is smaller than a third threshold.
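As an illustration of the three selection conditions above, the following is a minimal sketch in Python (assuming NumPy); the function name, criterion labels and threshold value are illustrative assumptions and not part of the disclosure:

```python
import numpy as np

def group_prunable(group, criterion="max", threshold=0.1):
    """Return True if the weight group may be pruned (set to zero).

    criterion selects the representative weight of the group:
      "mean" - arithmetic mean of absolute values (first representative weight)
      "geo"  - geometric mean of absolute values (second representative weight)
      "max"  - maximum of absolute values (third representative weight)
    The group is prunable when the representative weight is below the threshold.
    """
    a = np.abs(np.asarray(group, dtype=np.float64)).ravel()
    if criterion == "mean":
        rep = a.mean()
    elif criterion == "geo":
        # geometric mean computed in log space for numerical stability
        rep = np.exp(np.mean(np.log(a + 1e-12)))
    else:
        rep = a.max()
    return rep < threshold
```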
In further embodiments, the processing method further comprises: repeatedly using the sliding window to select a group of weights from the neural network and setting the selected weights to zero, and performing the first retraining on the neural network, until no weight can be set to zero without losing more than the set accuracy, where the set accuracy is x% and x is between 0 and 5.
In a further embodiment, selecting a set of weights from the neural network using a sliding window includes pruning weights of a fully-connected layer, a convolutional layer, or an LSTM layer of the neural network.
In a further embodiment, pruning the fully-connected layer of the neural network comprises: regarding the weights of the fully-connected layer as a two-dimensional matrix (Nin, Nout), wherein Nin is the number of input neurons and Nout is the number of output neurons, so that there are Nin × Nout weights in total; setting a sliding window of size Bin × Bout, wherein Bin is a positive integer greater than or equal to 1 and less than or equal to Nin, and Bout is a positive integer greater than or equal to 1 and less than or equal to Nout; sliding the sliding window along the Bin direction with a step size of Sin, or along the Bout direction with a step size of Sout, wherein Sin is a positive integer greater than or equal to 1 and less than or equal to Bin, and Sout is a positive integer greater than or equal to 1 and less than or equal to Bout; and, when a group of weights in the sliding window is selected, setting all of the group of weights to zero, i.e. Bin × Bout weights are set to zero at the same time.
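The fully-connected pruning just described might be sketched as follows; this is a simplified illustration assuming NumPy, the max-of-absolute-values selection condition and full windows only, with hypothetical names and threshold:

```python
import numpy as np

def prune_fc_coarse(W, Bin, Bout, Sin, Sout, threshold=0.1):
    """Coarse-grained pruning of a fully-connected weight matrix W of shape (Nin, Nout).

    A Bin x Bout window slides with strides (Sin, Sout); when the maximum
    absolute value inside the window is below the threshold, the whole
    block of Bin * Bout weights is set to zero at once.
    """
    Nin, Nout = W.shape
    W = W.copy()
    for i in range(0, Nin - Bin + 1, Sin):
        for j in range(0, Nout - Bout + 1, Sout):
            block = W[i:i + Bin, j:j + Bout]
            if np.max(np.abs(block)) < threshold:
                block[...] = 0.0
    return W
```

Zeroing whole Bin × Bout blocks rather than individual weights is what makes the resulting sparsity coarse-grained, more regular, and therefore easier for hardware to exploit, and it shrinks the storage needed for the non-zero weight positions.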
In further embodiments, pruning a convolutional layer of the neural network comprises: regarding the weights of the convolutional layer as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), wherein Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernel, so that there are Nfin × Nfout × Kx × Ky weights in total; setting a four-dimensional sliding window of size Bfin × Bfout × Bx × By, wherein Bfin is a positive integer greater than or equal to 1 and less than or equal to Nfin, Bfout is a positive integer greater than or equal to 1 and less than or equal to Nfout, Bx is a positive integer greater than or equal to 1 and less than or equal to Kx, and By is a positive integer greater than or equal to 1 and less than or equal to Ky; sliding the sliding window along the Bfin direction with a step size (stride) of Sfin, or along the Bfout direction with a step size of Sfout, or along the Bx direction with a step size of Sx, or along the By direction with a step size of Sy, wherein Sfin is a positive integer greater than or equal to 1 and less than or equal to Bfin, Sfout is a positive integer greater than or equal to 1 and less than or equal to Bfout, Sx is a positive integer greater than or equal to 1 and less than or equal to Bx, and Sy is a positive integer greater than or equal to 1 and less than or equal to By; and, when a group of weights in a sliding window is selected, setting all of the group of weights to zero, i.e. Bfin × Bfout × Bx × By weights are set to zero at the same time.
In a further embodiment, pruning the LSTM layer of the neural network comprises: regarding the weights of the LSTM layer as composed of m fully-connected layer weights, wherein m is a positive integer greater than 0, and the i-th fully-connected layer weight is a matrix (Nin_i, Nout_i), wherein i is a positive integer greater than 0 and less than or equal to m, Nin_i is the number of input neurons of the i-th fully-connected layer weight, and Nout_i is the number of output neurons of the i-th fully-connected layer weight; setting a sliding window of size Bin_i × Bout_i, wherein Bin_i is a positive integer greater than or equal to 1 and less than or equal to Nin_i, and Bout_i is a positive integer greater than or equal to 1 and less than or equal to Nout_i; sliding the sliding window along the Bin_i direction with a step size of Sin_i, or along the Bout_i direction with a step size of Sout_i, wherein Sin_i is a positive integer greater than or equal to 1 and less than or equal to Bin_i, and Sout_i is a positive integer greater than or equal to 1 and less than or equal to Bout_i; and, when a group of weights in the sliding window is selected, setting all of the group of weights to zero, i.e. Bin_i × Bout_i weights are set to zero at the same time.
In a further embodiment, the first retraining employs a back propagation algorithm, and the weights that have been set to zero during the training process are kept to zero.
In a further embodiment, grouping the weights of the neural network includes: grouping into one group, grouping by layer type, inter-layer grouping, and/or intra-layer grouping.
In a further embodiment, grouping into one group means dividing all weights of the neural network into a single group.
In a further embodiment, grouping by layer type means dividing the weights of all convolutional layers of the neural network into one group, the weights of all fully-connected layers into one group, and the weights of all long short-term memory (LSTM) layers into one group.
In a further embodiment, inter-layer grouping means dividing the weights of one or more convolutional layers, the weights of one or more fully-connected layers, or the weights of one or more LSTM layers of the neural network into a group.
In a further embodiment, intra-layer grouping means segmenting the weights within one layer of the neural network, each segmented part forming one group.
In further embodiments, the clustering algorithm comprises K-means, K-medoids, Clara, and/or Clarans.
In a further embodiment, the central weight of each class is selected so as to minimize the cost function J(w, w_0) = \sum_{i=1}^{n} (w_i - w_0)^2, wherein w denotes all the weights in the class, w_0 is the central weight, n is the number of weights in the class, w_i is the i-th weight in the class, and i is a positive integer greater than or equal to 1 and less than or equal to n.
The second training of the clustered and encoded neural network comprises: retraining the neural network after clustering and encoding by using a back propagation algorithm, keeping the weight which is already set to 0 in the training process to be 0 all the time, and only training a weight codebook without training a weight dictionary.
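One possible reading of this second retraining is that the gradients of all weights sharing a codebook entry are accumulated to update that entry, while the dictionary and the pruned zeros stay fixed; the sketch below illustrates a single update step under that assumption (NumPy, with hypothetical names and learning rate):

```python
import numpy as np

def retrain_codebook_step(weight_grad, dictionary, codebook, zero_mask, lr=0.01):
    """One gradient step of the second retraining: only the codebook is updated.

    weight_grad : gradient w.r.t. each weight (same shape as the layer), e.g. from backprop
    dictionary  : per-weight index into the codebook (kept fixed during retraining)
    codebook    : array of K central weights (the only trainable parameters here)
    zero_mask   : boolean array, True where the weight was pruned and must stay zero
    """
    grad = np.where(zero_mask, 0.0, weight_grad)          # pruned weights stay at zero
    cb_grad = np.zeros_like(codebook)
    np.add.at(cb_grad, dictionary.ravel(), grad.ravel())  # accumulate gradients per cluster
    return codebook - lr * cb_grad

def decode_weights(dictionary, codebook, zero_mask):
    """Rebuild the quantized, pruned weight matrix for the next forward pass."""
    W = codebook[dictionary]
    return np.where(zero_mask, 0.0, W)
```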
According to another aspect of the present disclosure, there is provided an apparatus for compressing neural network data, including:
a memory for storing operating instructions;
a processor for executing an operating instruction in the memory, the operating instruction when executed operating in accordance with the compression method as claimed in any one of the preceding claims.
According to still another aspect of the present disclosure, there is provided a processing apparatus, including:
a coarse-grained selection unit, used for receiving input neurons and position information of the non-zero weights and selecting the neurons that need to be computed;
the lookup table unit is used for receiving the quantized nonzero weight dictionary and the nonzero weight codebook, performing lookup table operation and outputting a nonzero weight of the neural network;
and the operation unit is used for receiving the selected neurons and the nonzero weight, operating the neural network and outputting the neurons.
In a further embodiment, the look-up table unit is further adapted to bypass the unquantized non-zero weights directly to the arithmetic unit.
In a further embodiment, the device further comprises an instruction control unit, which is used for receiving the instruction and generating control information after decoding to control the arithmetic unit.
In a further embodiment, the device further comprises a storage unit for storing the neurons, the weights and the instructions of the neural network.
In a further embodiment, the storage unit is further configured to store the nonzero weight value and position information of the nonzero weight value; and also for storing a quantized non-zero weight codebook and a non-zero weight dictionary.
In further embodiments, the arithmetic unit comprises at least one of:
the multiplier is used for multiplying the first input data and the second input data to obtain multiplied data;
the addition tree is used for adding third input data step by step through the addition tree or adding the third input data and fourth input data to obtain added data;
and the activation function operation unit is used for obtaining output data through activation function operation on the fifth data, and the activation function is sigmoid, tanh, relu or softmax function operation.
In a further embodiment, the operation unit further includes a pooling unit configured to obtain output data after a pooling operation by a pooling operation on the input sixth data, the pooling operation including: mean pooling, maximum pooling, or median pooling.
In a further embodiment, the device further comprises an instruction control unit, which is used for receiving instructions from the storage device and decoding them to generate control information that controls the coarse-grained selection unit to perform the selection operation, the lookup table unit to perform the lookup operation, and the operation unit to perform the calculation operation.
In further embodiments, the instructions are neural network specific instructions, including control instructions, data transfer instructions, arithmetic instructions, and logic instructions.
In a further embodiment, the neural network specific instructions are a Cambricon instruction set; each instruction in the Cambricon instruction set is 64 bits long and consists of an opcode and operands.
In further embodiments, the control instructions are for controlling a neural network execution process, including jump instructions and conditional branch instructions.
In a further embodiment, the data transfer instructions are used to complete data transfer between different storage media, and include load instructions, store instructions, and move instructions.
In a further embodiment, the operation instructions are used to perform the arithmetic operations of the neural network, and include matrix operation instructions, vector operation instructions, scalar operation instructions, convolutional neural network operation instructions, fully-connected neural network operation instructions, pooling neural network operation instructions, RBM neural network operation instructions, LRN neural network operation instructions, LCN neural network operation instructions, LSTM neural network operation instructions, RNN neural network operation instructions, RELU neural network operation instructions, PRELU neural network operation instructions, SIGMOID neural network operation instructions, TANH neural network operation instructions, and MAXOUT neural network operation instructions.
In further embodiments, the logic instructions are for performing logic operations of a neural network, including vector logic operation instructions and scalar logic operation instructions.
In a further embodiment, the vector logic operation instructions include vector compare, vector logical operation, and vector greater-than-merge instructions; preferably, the vector comparison includes greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; preferably, the vector logical operations include logical AND, logical OR, and logical NOT.
In further embodiments, the scalar logic operations include scalar comparison and scalar logical operations; preferably, the scalar comparison includes greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; preferably, the scalar logical operations include logical AND, logical OR, and logical NOT.
In a further embodiment, the system further comprises an instruction cache for caching instructions, wherein the instruction cache is an on-chip cache.
In a further embodiment, the method further comprises a non-zero weight codebook cache for caching the non-zero weight codebook, wherein the non-zero weight codebook cache is an on-chip cache.
In a further embodiment, the method further comprises a non-zero weight dictionary cache for caching the non-zero weight dictionary, wherein the non-zero weight dictionary cache is an on-chip cache.
In a further embodiment, the apparatus further includes a non-zero weight location cache for caching the non-zero weight locations, and further for one-to-one mapping each connection weight in the input data to a corresponding input neuron, where the non-zero weight location cache is an on-chip cache.
In a further embodiment, the non-zero weight location cache, configured to map each connection weight in the input data one-to-one to a corresponding input neuron, uses the following representation: 1 indicates that a weight connects the output to the input neuron and 0 indicates no connection, so that the connection states between each output and all of the inputs form a string of 0s and 1s that represents the connection relation of that output.
In a further embodiment, the non-zero weight location cache, configured to map each connection weight in the input data one-to-one to a corresponding input neuron, uses the following representation: the distance from the input neuron of the first connection of an output to the first input neuron, the distance from the input neuron of the second connection to the input neuron of the previous connection, the distance from the input neuron of the third connection to the input neuron of the previous connection, and so on until all connections of the output are exhausted, together represent the connection relation of that output.
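The two position representations above (the 0/1 string and the distance-based encoding) might look as follows; this is an illustrative sketch with hypothetical helper names and example values:

```python
def encode_bitmap(weights_row, eps=0.0):
    """Direct representation: 1 where a non-zero weight connects the output
    to an input neuron, 0 where there is no connection."""
    return [1 if abs(w) > eps else 0 for w in weights_row]

def encode_distances(weights_row, eps=0.0):
    """Incremental representation: distance from the first connection to the
    first input neuron, then distance of each further connection to the
    previous connection's input neuron."""
    positions = [i for i, w in enumerate(weights_row) if abs(w) > eps]
    return [p - prev for p, prev in zip(positions, [0] + positions[:-1])]

# Example: the weights of one output neuron over 6 input neurons
row = [0.0, 0.7, 0.0, 0.0, -1.2, 0.3]
print(encode_bitmap(row))     # [0, 1, 0, 0, 1, 1]
print(encode_distances(row))  # [1, 3, 1]
```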
In a further embodiment, the apparatus further comprises an input neuron buffer for buffering input neurons input to the coarse-grained selection unit, wherein the input neuron buffer is an on-chip buffer.
In a further embodiment, an output neuron buffer is further included for buffering output neurons, the output neuron buffer being an on-chip buffer.
In a further embodiment, the apparatus further comprises a direct data access unit DMA unit, configured to perform data or instruction reading and writing in the storage unit, the instruction cache, the non-zero weight codebook cache, the non-zero weight dictionary cache, the non-zero weight position cache, the input neuron cache, and the output neuron cache.
In a further embodiment, the apparatus further comprises a preprocessing unit, which is used for preprocessing the raw data and feeding the preprocessed data into the storage unit.
According to yet another aspect of the present disclosure, there is provided a processing method including:
inputting neuron and nonzero weight position information, and selecting the neuron needing to be calculated;
receiving a quantized nonzero weight dictionary and a nonzero weight codebook, performing table look-up operation and outputting a nonzero weight of a neural network;
and receiving the selected neurons and the nonzero weight, calculating the neural network and outputting the neurons.
In further embodiments, the processing method further comprises: receiving unquantized non-zero weights for the neural network operation.
In a further embodiment, further comprising: and receiving the instruction, decoding the instruction, and generating control information to control the operation of the neural network.
In further embodiments, the operation comprises at least one of: a multiplication operation, multiplying first input data and second input data to obtain multiplied data; an addition-tree operation, adding third input data step by step through an addition tree, or adding the third input data and fourth input data to obtain added data; and an activation function operation, applying an activation function to fifth data to obtain output data, wherein the activation function is a sigmoid, tanh, relu or softmax function.
In a further embodiment, the operation further includes a pooling operation for obtaining output data after the pooling operation by a pooling operation on the input sixth data, the pooling operation including: mean pooling, maximum pooling, or median pooling.
In further embodiments, the instructions are neural network specific instructions, including control instructions, data transfer instructions, arithmetic instructions, and logic instructions.
In further embodiments, the control instructions are for controlling a neural network execution process, including jump instructions and conditional branch instructions.
In a further embodiment, the data transfer instructions are used for data transfer between different storage media, and include load instructions, store instructions, and move instructions.
In a further embodiment, the operation instructions are used to perform the arithmetic operations of the neural network, and include matrix operation instructions, vector operation instructions, scalar operation instructions, convolutional neural network operation instructions, fully-connected neural network operation instructions, pooling neural network operation instructions, RBM neural network operation instructions, LRN neural network operation instructions, LCN neural network operation instructions, LSTM neural network operation instructions, RNN neural network operation instructions, RELU neural network operation instructions, PRELU neural network operation instructions, SIGMOID neural network operation instructions, TANH neural network operation instructions, and MAXOUT neural network operation instructions.
In a further embodiment, the neural network specific instructions are a Cambricon instruction set; each instruction in the Cambricon instruction set is 64 bits long and consists of an opcode and operands.
In further embodiments, the logic instructions are for performing logic operations of a neural network, including vector logic operation instructions and scalar logic operation instructions.
In a further embodiment, the vector logic operation instructions include vector compare, vector logical operation, and vector greater-than-merge instructions; preferably, the vector comparison includes greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; preferably, the vector logical operations include logical AND, logical OR, and logical NOT.
In further embodiments, the scalar logic operations include scalar comparison and scalar logical operations; preferably, the scalar comparison includes greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; preferably, the scalar logical operations include logical AND, logical OR, and logical NOT.
In a further embodiment, the method further comprises the step of: preprocessing the input neurons and the non-zero weight position information, the preprocessing comprising segmentation, Gaussian filtering, binarization, regularization and/or normalization.
In a further embodiment, after receiving the selected neurons and the non-zero weights, the method further comprises the steps of: storing input neurons, a weight dictionary, a codebook and instructions, and storing output neurons; and caching the instruction, the input neuron and the output neuron.
According to a further aspect of the present disclosure, there is provided an electronic device comprising the processing apparatus of any of the preceding embodiments, the electronic device comprising a data processing device, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device;
the vehicle comprises an airplane, a ship and/or a car; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; the medical device comprises a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.
(III) Beneficial effects
Compared with conventional methods, the present disclosure performs coarse-grained pruning and local quantization on the weights of the neural network, which makes the sparse neural network more regular, facilitates hardware acceleration, and reduces the storage space required for non-zero weight positions; local quantization fully exploits the weight distribution characteristics of the neural network and reduces the number of bits representing each weight, thereby further reducing storage and memory-access overhead. The neural network processor of the present disclosure can fully exploit the characteristics of coarse-grained sparsity and local quantization, reducing memory access and the amount of computation, thereby obtaining a speedup and reducing energy consumption. The coarse-grained selection unit can select the neurons that need to participate in the computation according to the position information of the non-zero weights, which reduces the amount of computation, and the lookup table can retrieve the non-zero weights according to the non-zero weight dictionary and the non-zero weight codebook, which reduces the amount of memory access.
Drawings
Fig. 1 is a flow chart of a data compression method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of implementing step S101 in a fully-connected layer of a neural network according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of implementing step S101 in the convolutional layer of the neural network according to the embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a process of weight quantization according to an embodiment of the disclosure.
Fig. 5 is a schematic structural diagram of a compression device according to an embodiment of the disclosure.
Fig. 6 is a schematic structural diagram of a processing device according to an embodiment of the disclosure.
Fig. 7 is a schematic structural diagram of another processing device according to an embodiment of the disclosure.
FIG. 8 is a flow chart of a processing method of an embodiment of the disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
All modules of the disclosed embodiments may be hardware structures, physical implementations of which include, but are not limited to, physical devices including, but not limited to, transistors, memristors, DNA computers.
According to the basic concept of the present disclosure, there is provided a method of compressing a neural network, the steps including: coarse grain pruning and first retraining, and local quantization and second retraining. Compared with the traditional method, the sparse neural network can be more regular, hardware acceleration is facilitated, and the storage space of a non-zero weight position is reduced; the local quantization can fully excavate the weight distribution characteristics of the neural network and reduce the bit number representing each weight, thereby further reducing the storage cost and the access cost.
Fig. 1 is a flow chart of a data compression method according to an embodiment of the present disclosure. The data compression method comprises the following steps:
s101: selecting a group of weights from the neural network by using a sliding window, and setting the selected weights to be zero; carrying out first retraining on the neural network, wherein the weight value which is already set to be zero in the training process is kept to be zero;
s102: grouping the weights of the neural network, clustering and coding the weights in the group, and performing second training on the clustered and coded neural network.
Wherein the step S101 can be summarized into coarse-grained pruning and first retraining, and specifically can comprise the steps of
S1011: selecting a group of weights of the trained neural network by using a sliding window (sliding window);
s1012: setting the selected weights to be zero;
s1013: retraining the pruned neural network by using a back propagation algorithm, wherein the weight value which is already set to 0 in the training process is always kept to be 0.
The selection condition may be as follows: the arithmetic mean of the absolute values of all weights in the group (the first representative weight) is less than a first threshold; or the geometric mean of the absolute values of all weights in the group (the second representative weight) is less than a second threshold; or the maximum of the absolute values of all weights in the group (the third representative weight) is less than a third threshold.
The coarse-grained pruning can be applied to the fully-connected layers, convolutional layers and LSTM (long short-term memory) layers of the neural network.
Referring to fig. 2, the fully-connected layer of the neural network can be regarded as a two-dimensional matrix (Nin, Nout), where Nin represents the number of input neurons and Nout represents the number of output neurons, so that there are Nin × Nout weights. During coarse-grained pruning, a sliding window of size Bin × Bout is set; the sliding window can slide along the Bin direction with a step size (stride) of Sin, and along the Bout direction with a step size of Sout. When a group of weights in a certain sliding window satisfies the condition, the whole group of weights is set to 0, i.e. Bin × Bout weights are set to 0 at the same time.
Referring to fig. 3, the convolutional layer of the neural network can be regarded as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin represents the number of input feature maps, Nfout represents the number of output feature maps, and (Kx, Ky) represents the size of the convolution kernel. During coarse-grained pruning, a sliding window of size Bfin × Bfout × Bx × By is set; the sliding window can slide along the Bfin direction with a step size (stride) of Sfin, or along the Bfout direction with a step size of Sfout, or along the Bx direction with a step size of Sx, or along the By direction with a step size of Sy. When a group of weights in a certain sliding window satisfies the condition, the whole group of weights is set to 0, i.e. Bfin × Bfout × Bx × By weights are set to 0 at the same time.
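A simplified sketch of this four-dimensional coarse-grained pruning, assuming NumPy, the max-of-absolute-values selection condition and full windows only; the names and threshold are illustrative:

```python
import numpy as np

def prune_conv_coarse(W, Bfin, Bfout, Bx, By, strides, threshold=0.1):
    """Coarse-grained pruning of a convolutional weight tensor W of shape
    (Nfin, Nfout, Kx, Ky) with a Bfin x Bfout x Bx x By sliding window.

    strides = (Sfin, Sfout, Sx, Sy). A block is zeroed when the maximum
    absolute value inside the window falls below the threshold.
    """
    Nfin, Nfout, Kx, Ky = W.shape
    Sfin, Sfout, Sx, Sy = strides
    W = W.copy()
    for a in range(0, Nfin - Bfin + 1, Sfin):
        for b in range(0, Nfout - Bfout + 1, Sfout):
            for c in range(0, Kx - Bx + 1, Sx):
                for d in range(0, Ky - By + 1, Sy):
                    block = W[a:a + Bfin, b:b + Bfout, c:c + Bx, d:d + By]
                    if np.max(np.abs(block)) < threshold:
                        block[...] = 0.0
    return W
```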
The weights of an LSTM (long short-term memory) layer are composed of multiple fully-connected layer weights; assume the weights of the LSTM layer are composed of i fully-connected layer weights, where i is a positive integer greater than 0. The i-th fully-connected layer weight is (Nin_i, Nout_i), where Nin_i represents the number of input neurons of the i-th fully-connected layer weight and Nout_i represents the number of output neurons of the i-th fully-connected layer weight. During coarse-grained pruning, for the i-th fully-connected layer, a sliding window of size Bin_i × Bout_i is set, where Bin_i is a positive integer greater than or equal to 1 and less than or equal to Nin_i, and Bout_i is a positive integer greater than or equal to 1 and less than or equal to Nout_i. The sliding window can slide along the Bin_i direction with a step size of Sin_i, and along the Bout_i direction with a step size of Sout_i, where Sin_i is a positive integer greater than or equal to 1 and less than or equal to Bin_i, and Sout_i is a positive integer greater than or equal to 1 and less than or equal to Bout_i. When a group of weights in the sliding window is selected, the whole group of weights is set to 0, i.e. Bin_i × Bout_i weights are set to 0 at the same time.
First retraining: retraining the pruned neural network using a back-propagation algorithm, where the weights that have already been set to 0 are always kept at 0 during training. Coarse-grained pruning and retraining are repeated until no more weights can be set to 0 without losing more than x% of accuracy, where x is a number greater than 0 and less than 100 and can be chosen differently for different neural networks and different applications. In one embodiment, x may take a value from 0 to 5.
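The alternation of coarse-grained pruning and first retraining might be organized as in the following sketch; the model method, retraining and evaluation callbacks are hypothetical placeholders, and the accuracy check implements the x% tolerance described above:

```python
import copy

def prune_and_retrain(net, retrain, evaluate, baseline_acc, x=1.0, threshold=0.1):
    """Sketch of the iterative coarse-grained pruning / first-retraining loop.

    retrain(net)  : back-propagation retraining that keeps zeroed weights at zero
                    (e.g. by masking their gradients)
    evaluate(net) : returns accuracy in percent
    The loop stops when no more weight groups can be set to zero, or when
    further pruning would cost more than x percent of accuracy.
    """
    while True:
        candidate = copy.deepcopy(net)
        pruned_any = candidate.coarse_grained_prune(threshold)  # sliding-window pruning (hypothetical API)
        if not pruned_any:
            return net                       # nothing left to prune
        retrain(candidate)                   # first retraining, zeros stay zero
        if evaluate(candidate) < baseline_acc - x:
            return net                       # would lose too much accuracy; keep previous model
        net = candidate                      # accept and try to prune further
```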
For step S102, it can be summarized as quantization and retraining, and may include the steps of:
s1021: grouping the neural network weights;
S1022: performing a clustering operation on each group of weights using a clustering algorithm, dividing each group of weights into m classes, calculating a central weight for each class, and replacing all weights in each class with that central weight, wherein m is a positive integer greater than 0.
S1023: encoding the central weights to obtain a codebook and a weight dictionary.
S1024: retraining the neural network by using a back propagation algorithm, keeping the weight value which is already set to 0 in the training process to be 0 all the time, only training the codebook and not training the dictionary.
For the grouping in step S1021: the weights of the neural network are grouped. Further, the grouping strategy may be: grouping into one group, grouping by layer type, inter-layer grouping, and intra-layer grouping.
Fig. 4 is a schematic diagram of a process of weight quantization according to the embodiment of the present disclosure, and as shown in fig. 4, weights are grouped according to a grouping policy to obtain a weight matrix in an ordered arrangement. And performing intra-group sampling and clustering operation on the grouped weight matrix, thereby dividing the weights with similar values into the same category to obtain 4 central weights of 1.50, -0.13, -1.3 and 0.23, and respectively corresponding to the weights of the four categories. Then, the center weight is encoded, the category with the center weight of-1.3 is encoded as 00, the category with the center weight of-0.13 is encoded as 01, the category with the center weight of 0.23 is encoded as 10, and the category with the center weight of 1.50 is encoded as 11, which is the content of the codebook. In addition, the weight values in the corresponding categories are respectively represented by the coded contents (00, 01, 10 and 11) corresponding to the 4 weight values, so that a weight value dictionary is obtained. The quantization process fully excavates the similarity of weights between layers of the neural network and the local similarity of weights in the layers, obtains the weight distribution characteristic of the neural network so as to carry out low bit quantization, and reduces the bit number representing each weight, thereby reducing the weight storage cost and the access cost.
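An illustrative sketch of this quantization flow (clustering, codebook, dictionary, and the lookup-table decode) using scikit-learn's K-means; the helper names and example values are assumptions, not the exact matrix of Fig. 4:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_group(weights, n_clusters=4):
    """Cluster a group of weights, and return (codebook, dictionary).

    codebook   : the n_clusters central weights (index -> value)
    dictionary : for every weight, the index of its class in the codebook
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()          # central weights
    dictionary = km.labels_.reshape(weights.shape)  # per-weight codebook index
    return codebook, dictionary

def dequantize(codebook, dictionary):
    """Lookup-table step: rebuild the quantized weights from codebook + dictionary."""
    return codebook[dictionary]

# Illustrative run (arbitrary values)
W = np.array([[1.48, -0.12, 0.25, -1.31],
              [1.52, 0.21, -0.14, -1.29]])
cb, dic = quantize_group(W, n_clusters=4)
print(cb)                   # four central weights
print(dic)                  # 2-bit indices into the codebook
print(dequantize(cb, dic))  # each weight replaced by its class centre
```

With four classes, each weight in the dictionary needs only 2 bits, which is how the quantization reduces the number of bits representing each weight.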
The central weight may be selected so as to minimize the cost function J(w, w_0) = \sum_{i=1}^{n} (w_i - w_0)^2, wherein w denotes all the weights in the class, w_0 is the central weight, n is the number of weights in the class, w_i is the i-th weight in the class, and i is a positive integer greater than or equal to 1 and less than or equal to n.
Furthermore, in local quantization, the weights of the neural network may be grouped by layer type: the weights of all convolutional layers form one group, the weights of all fully-connected layers form one group, and the weights of all LSTM (Long Short-Term Memory) layers form one group. If a neural network has i convolutional layers, j fully-connected layers and m LSTM layers, i.e. t different types of layers, where i, j and m are integers greater than or equal to 0 satisfying i + j + m >= 1, and t is a positive integer greater than or equal to 1 satisfying t = (i > 0) + (j > 0) + (m > 0), then the weights of the neural network will be divided into t groups.
Furthermore, in local quantization, the weights of the neural network may be grouped by inter-layer structure: one or more consecutive convolutional layers form one group, one or more consecutive fully-connected layers form one group, and one or more consecutive LSTM layers form one group.
Furthermore, in local quantization, the weights of the neural network may be grouped by intra-layer structure: the interiors of the convolutional layers, fully-connected layers and LSTM layers of the neural network are each partitioned into groups and quantized.
Further, the convolutional layer of the neural network can be regarded as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin, Nfout, Kx, Ky are positive integers greater than 0, Nfin represents the number of input feature maps, Nfout represents the number of output feature maps, and (Kx, Ky) represents the size of the convolution kernel. For intra-layer grouping, the weights of the convolutional layer are divided into (Nfin × Nfout × Kx × Ky)/(Mfin × Mfout × Mx × My) different groups according to a group size of (Mfin, Mfout, Mx, My), where Mfin is a positive integer greater than 0 and less than or equal to Nfin, Mfout is a positive integer greater than 0 and less than or equal to Nfout, Mx is a positive integer greater than 0 and less than or equal to Kx, and My is a positive integer greater than 0 and less than or equal to Ky.
Furthermore, the fully connected layer of the neural network can be regarded as a two-dimensional matrix (Nin, Nout), where Nin, Nout are positive integers greater than 0, Nin represents the number of input neurons, Nout represents the number of output neurons, and Nin × Nout weights are provided. The fully-connected layer weights are divided into (Nin × Nout)/(Min × Mout) different groups according to the group size of (Min, Mout), where Min is a positive integer greater than 0 and equal to or less than Nin, and Mout is a positive integer greater than 0 and equal to or less than Nout.
Furthermore, the LSTM layer weights of the neural network can be regarded as a combination of multiple fully-connected layer weights; assuming the LSTM layer weights are composed of n fully-connected layer weights, where n is a positive integer greater than 0, each fully-connected layer weight can be grouped in the fully-connected-layer manner described above.
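A minimal sketch of intra-layer grouping for a fully-connected weight matrix, assuming NumPy and that the group size divides the matrix dimensions; each resulting group would then be clustered and encoded independently, e.g. with a helper like the quantize_group() sketched above:

```python
import numpy as np

def group_fc_weights(W, Min, Mout):
    """Partition a fully-connected weight matrix of shape (Nin, Nout) into
    (Nin * Nout) / (Min * Mout) groups of size Min x Mout for local quantization.

    Assumes Min divides Nin and Mout divides Nout, for simplicity.
    """
    Nin, Nout = W.shape
    groups = []
    for i in range(0, Nin, Min):
        for j in range(0, Nout, Mout):
            groups.append(W[i:i + Min, j:j + Mout])
    return groups
```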
In another aspect of the embodiments of the present disclosure, there is also provided a neural network data compression apparatus, and fig. 5 is a schematic structural diagram of the compression apparatus in the embodiments of the present disclosure, as shown in fig. 5, the neural network data compression apparatus includes:
a memory 1 for storing operation instructions; the operation instruction is generally in the form of a binary number and is composed of an operation code indicating an operation to be performed by the processor 2 and an address code indicating the processor 2 to read data participating in the operation from an address in the memory 1.
And the processor 2 is used for executing the operation instructions in the memory 1; when the instructions are executed, the processor operates according to the data compression method described above.
In the disclosed compression apparatus, the processor 2 executes the operation instructions in the memory 1 and operates according to the coarse-grained pruning and quantization method described above, so that the neural network can be sparsified in a regular way and its parameters reduced, while the disordered weights are quantized to obtain low-bit, normalized quantized weights. This fully exploits the similarity of weights between layers of the neural network and the local similarity of weights within a layer, and obtains the weight distribution characteristics of the neural network so as to perform low-bit quantization and reduce the number of bits representing each weight, thereby reducing the weight storage overhead and the memory-access overhead.
Fig. 6 is a schematic structural diagram of a processing device according to an embodiment of the disclosure. The present disclosure provides a new neural network processor, which can fully mine the characteristics of coarse granularity and local quantization, reduce the memory access and calculation amount, thereby obtaining an acceleration ratio and reducing energy consumption.
The accelerating device of the embodiment of the disclosure comprises a coarse-grained number selection unit, a lookup table unit and an operation unit.
And the coarse-granularity number selection unit receives the input neurons and the position information of the nonzero weight and selects the neurons needing to be calculated.
And the lookup table unit receives the nonzero weight dictionary and the nonzero weight codebook, and performs lookup operation to obtain the nonzero weight of the neural network.
And the operation unit receives the selected neurons and the nonzero weight, completes the neural network operation and retransmits the output neurons to the storage device.
Furthermore, the coarse-granularity number selection unit receives the input neurons and the position information of the nonzero weight, selects the neurons corresponding to the nonzero weight and transmits the neurons to the operation unit.
Furthermore, the lookup table finds out the non-zero weight value for the quantized non-zero weight value according to the codebook and the dictionary and transmits the non-zero weight value to the operation unit, and directly transmits the non-quantized non-zero weight value to the operation unit through a bypass.
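The selection and lookup steps described above might be sketched as follows (NumPy, hypothetical names; the 0/1 bitmap position encoding from earlier is assumed); the selected neurons and looked-up weights are then handed to the operation unit:

```python
import numpy as np

def coarse_grained_select(input_neurons, nonzero_positions):
    """Coarse-grained selection unit: keep only the input neurons that
    correspond to non-zero weights (positions given as a 0/1 bitmap)."""
    mask = np.asarray(nonzero_positions, dtype=bool)
    return np.asarray(input_neurons)[mask]

def lookup_weights(dictionary, codebook):
    """Lookup table unit: recover the non-zero weights from the quantized
    dictionary (codebook indices) and the codebook (central weights)."""
    return np.asarray(codebook)[np.asarray(dictionary)]

# Example for one output neuron
neurons    = [0.5, 1.0, -2.0, 3.0]
positions  = [1, 0, 1, 1]           # bitmap of non-zero weight positions
dictionary = [2, 0, 1]              # one codebook index per non-zero weight
codebook   = [-1.3, 0.23, 1.5]
selected = coarse_grained_select(neurons, positions)  # [0.5, -2.0, 3.0]
weights  = lookup_weights(dictionary, codebook)       # [1.5, -1.3, 0.23]
```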
Furthermore, the operation performed by the operation unit includes: a first part which multiplies input data 1 and input data 2 to obtain multiplied data; and/or a second part which performs an addition-tree operation, adding input data 1 step by step through an addition tree, or adding input data 1 and input data 2 to obtain output data; and/or a third part which performs an activation function operation, obtaining output data by applying an activation function (active) to the input data; and/or a fourth part which performs a pooling operation, out = pool(in), wherein pool is a pooling operation including, but not limited to: mean pooling, maximum pooling, and median pooling, and the input data in is the data in a pooling kernel associated with the output out. One or more of the above parts can be freely selected and combined in different orders, thereby implementing operations of various functions.
Specifically, the operation unit includes, but is not limited to: a first part, a multiplier; a second part, an addition tree; and a third part, an activation function unit. The first part multiplies input data 1 (in1) and input data 2 (in2) to obtain the output (out): out = in1 * in2. The second part adds the input data in1 stage by stage through an adder tree to obtain the output data (out), where in1 is a vector of length N and N is greater than 1: out = in1[1] + in1[2] + ... + in1[N]; and/or adds the input data (in1) through the addition tree and then adds the input data (in2) to obtain the output data (out): out = in1[1] + in1[2] + ... + in1[N] + in2; or adds the input data (in1) and the input data (in2) to obtain the output data (out): out = in1 + in2. The third part applies an activation function (active) to the input data (in) to obtain the activation output data (out): out = active(in); the active function may be sigmoid, tanh, relu, softmax, and the like. In addition to the activation operation, the third part may implement other non-linear functions, obtaining the output data (out) by applying a function (f) to the input data (in): out = f(in). And/or a pooling unit obtains the output data (out) after a pooling operation by pooling the input data (in): out = pool(in), where pool is the pooling operation, and the pooling operation includes, but is not limited to: mean pooling, maximum pooling, and median pooling; the input data in is the data in a pooling kernel associated with the output out.
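A compact sketch of the four parts of the operation unit and of how they can be combined, using NumPy; the function names are illustrative, not part of the disclosure:

```python
import numpy as np

def multiplier(in1, in2):
    """First part: out = in1 * in2."""
    return in1 * in2

def adder_tree(in1, in2=None):
    """Second part: out = in1[1] + ... + in1[N] (+ in2 if given)."""
    out = np.sum(in1)
    return out if in2 is None else out + in2

def activation(x, kind="relu"):
    """Third part: out = active(in), active in {sigmoid, tanh, relu, softmax}."""
    if kind == "sigmoid":
        return 1.0 / (1.0 + np.exp(-x))
    if kind == "tanh":
        return np.tanh(x)
    if kind == "softmax":
        e = np.exp(x - np.max(x))
        return e / np.sum(e)
    return np.maximum(x, 0.0)

def pool(x, kind="max"):
    """Fourth part: out = pool(in) over the data of one pooling kernel."""
    if kind == "mean":
        return np.mean(x)
    if kind == "median":
        return np.median(x)
    return np.max(x)

# Combining the parts, e.g. multiply -> adder tree -> activation:
neurons = np.array([0.5, -2.0, 3.0])
weights = np.array([1.5, -1.3, 0.23])
out = activation(adder_tree(multiplier(neurons, weights)), kind="relu")
```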
Further, referring to fig. 7, the neural network processor further includes a preprocessing module. The module performs preprocessing on the raw data, including segmentation, gaussian filtering, binarization, regularization, normalization, and the like.
Further, the processor further comprises a storage unit for storing the neurons, the weights and the instructions of the neural network.
Furthermore, when storing the weights, the storage unit stores only the non-zero weights and the position information of the non-zero weights. When storing quantized non-zero weights, the storage unit stores only the non-zero weight codebook and the non-zero weight dictionary.
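A sketch of what such sparse storage might look like for one layer, assuming a flat index list is used as the position information (the actual on-chip layout is not specified here):

```python
import numpy as np

def compress_for_storage(weight_matrix):
    """Keep only the non-zero weights plus their positions.

    Returns (values, positions); zero weights are not stored at all.
    If the weights were quantized, only the codebook and the dictionary of
    indices would be kept instead of the raw values.
    """
    flat = np.asarray(weight_matrix).ravel()
    positions = np.flatnonzero(flat)          # position information of non-zero weights
    return flat[positions], positions

W = np.array([[0.0, 0.5, 0.0],
              [1.2, 0.0, 0.0]])
values, positions = compress_for_storage(W)
print(values)     # [0.5 1.2]
print(positions)  # [1 3]   (row-major indices)
```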
Furthermore, the processor also comprises an instruction control unit, which is used for receiving an instruction from the storage device and decoding it to generate control information, so as to control the coarse-grained number selection unit to perform the number selection operation, the lookup table to perform the table lookup operation, and the operation unit to perform the calculation operation.
Alternatively, the instructions may be neural network specific instructions.
The neural-network-specific instructions include all instructions dedicated to completing artificial neural network operations. Neural-network-specific instructions include, but are not limited to, control instructions, data transfer instructions, operation instructions, and logic instructions. The control instructions control the execution process of the neural network. The data transfer instructions complete data transfer between different storage media; the data formats include, but are not limited to, matrix, vector and scalar. The operation instructions complete the arithmetic operations of the neural network, including but not limited to a matrix operation instruction, a vector operation instruction, a scalar operation instruction, a convolutional neural network operation instruction, a fully-connected neural network operation instruction, a pooling neural network operation instruction, an RBM neural network operation instruction, an LRN neural network operation instruction, an LCN neural network operation instruction, an LSTM neural network operation instruction, an RNN neural network operation instruction, a RELU neural network operation instruction, a PRELU neural network operation instruction, a SIGMOID neural network operation instruction, a TANH neural network operation instruction, and a MAXOUT neural network operation instruction. The logic instructions complete the logic operations of the neural network, including but not limited to vector logic operation instructions and scalar logic operation instructions.
The RBM neural network operation instruction is used for realizing Restricted Boltzmann Machine (RBM) neural network operation.
The LRN neural network operation instruction is used for realizing Local Response Normalization (LRN) neural network operation.
The LSTM neural network operation instruction is used for realizing Long Short-Term Memory (LSTM) neural network operation.
The RNN neural network operation instruction is used for realizing Recurrent Neural Network (RNN) neural network operation.
The RELU neural network operation instruction is used for realizing Rectified Linear Unit (RELU) neural network operation.
The PRELU neural network operation instruction is used for realizing Parametric Rectified Linear Unit (PRELU) neural network operation.
Wherein the SIGMOID neural network operation instruction is used for realizing sigmoid (S-shaped growth curve) neural network operation.
The TANH neural network operation instruction is used for realizing hyperbolic tangent function (TANH) neural network operation.
Wherein the MAXOUT neural network operation instruction is used for realizing Maxout (MAXOUT) neural network operation.
More specifically, the neural network specific instructions comprise a Cambricon instruction set.
The Cambricon instruction set is characterized in that each instruction is composed of an opcode and operands. The instruction set includes four types of instructions, namely control instructions, data transfer instructions, computational instructions, and logical instructions.
Preferably, each instruction in the instruction set has a fixed length. For example, each instruction in the instruction set may be 64 bits long.
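The disclosure does not give a concrete bit layout, so the following encoding — an 8-bit opcode plus three 16-bit operand fields inside a 64-bit word — is purely hypothetical and only illustrates the fixed-length opcode/operand structure:

```python
# Hypothetical 64-bit layout: 8-bit opcode + three 16-bit operands + 8 spare bits.
OPCODES = {"LOAD": 0x01, "STORE": 0x02, "MOVE": 0x03, "MMV": 0x10, "JUMP": 0x20}

def encode(opcode_name, op0=0, op1=0, op2=0):
    """Pack an opcode and up to three operand fields into one 64-bit word."""
    word = OPCODES[opcode_name] << 56
    word |= (op0 & 0xFFFF) << 40
    word |= (op1 & 0xFFFF) << 24
    word |= (op2 & 0xFFFF) << 8
    return word

def decode(word):
    """Recover the opcode and the three operand fields from a 64-bit word."""
    opcode = (word >> 56) & 0xFF
    ops = ((word >> 40) & 0xFFFF, (word >> 24) & 0xFFFF, (word >> 8) & 0xFFFF)
    return opcode, ops

w = encode("MMV", 0x0010, 0x0020, 0x0030)   # matrix-multiply-vector, three operand ids
print(hex(w), decode(w))
```

Any real instruction set would fix these field widths differently; the point is only that a 64-bit word splits cleanly into an opcode field and operand fields.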
Further, the control instructions are used for controlling the execution process. The control instructions include jump (jump) instructions and conditional branch (conditional branch) instructions.
Further, the data transfer instructions are used for completing data transfer between different storage media. The data transfer instructions include a load instruction, a store instruction and a move instruction. The load instruction is used for loading data from the main memory to the cache, the store instruction is used for storing data from the cache to the main memory, and the move instruction is used for moving data between caches, between a cache and a register, or between registers. The data transfer instructions support three different data organization modes, including matrix, vector and scalar.
Further, the arithmetic instruction is used for completing the neural network arithmetic operation. The operation instructions include a matrix operation instruction, a vector operation instruction, and a scalar operation instruction.
Further, the matrix operation instruction completes matrix operations in the neural network, including matrix multiply vector, vector multiply matrix, matrix multiply scalar, outer product, matrix add matrix, and matrix subtract matrix.
Further, the vector operation instruction completes vector operations in the neural network, including vector elementary operations, vector transcendental functions, dot product, random vector generator, and maximum/minimum of a vector. The vector elementary operations include vector addition, subtraction, multiplication, and division; a vector transcendental function is a function that does not satisfy any polynomial equation whose coefficients are polynomials, including but not limited to exponential functions, logarithmic functions, trigonometric functions, and inverse trigonometric functions.
Further, the scalar operation instruction completes scalar operations in the neural network, including scalar elementary operations and scalar transcendental functions. The scalar elementary operations include scalar addition, subtraction, multiplication, and division; a scalar transcendental function is a function that does not satisfy any polynomial equation whose coefficients are polynomials, including but not limited to exponential functions, logarithmic functions, trigonometric functions, and inverse trigonometric functions.
Further, the logic instruction is used for logic operation of the neural network. The logical operations include vector logical operation instructions and scalar logical operation instructions.
Further, the vector logic operation instructions include vector compare, vector logical operations, and vector greater than merge. The vector comparisons include, but are not limited to, greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to. The vector logical operations include and, or, not.
Further, the scalar logic operation instructions include scalar compare and scalar logical operations. The scalar comparisons include, but are not limited to, greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to. The scalar logical operations include and, or, not.
Further, as shown in fig. 7, the neural network processor further includes a direct data access unit DMA (Direct Memory Access).
Further, as shown in fig. 7, the neural network processor further includes an instruction cache, an input neuron cache, a non-zero weight codebook cache, a non-zero weight dictionary cache, a non-zero weight position cache, and an output neuron cache.
In particular, the storage unit is mainly used for storing the neurons, the weights and the instructions of the neural network. When storing the weight, only storing the nonzero weight and the position information of the nonzero weight. When the storage unit stores the quantized nonzero weight value, only a nonzero weight value codebook and a nonzero weight value dictionary are stored.
In particular, the DMA is used for reading and writing data or instructions among the storage unit, the instruction cache, the non-zero weight codebook cache, the non-zero weight dictionary cache, the non-zero weight position cache, the input neuron cache and the output neuron cache.
An instruction cache for storing the dedicated instructions;
a non-zero weight codebook cache for caching a non-zero weight codebook;
the nonzero weight dictionary cache is used for caching the nonzero weight dictionary;
a non-zero weight position cache for caching non-zero weight position data; the non-zero weight position cache maps each connection weight in the input data one-to-one to the corresponding input neuron.
In one case, the one-to-one correspondence method of the non-zero weight position cache is: 1 represents a connection and 0 represents no connection, and the connection states of each output with all inputs form a string of 0s and 1s that represents the connection relation of that output. In another case, the one-to-one correspondence method of the non-zero weight position cache is: 1 represents a connection and 0 represents no connection, and the connection states of each input with all outputs form a string of 0s and 1s that represents the connection relation of that input. In another case, the one-to-one correspondence method of the non-zero weight position cache is: the distance from the input neuron of an output's first connection to the first input neuron, the distance from the output's second connected input neuron to the previous connected input neuron, the distance from the output's third connected input neuron to the previous connected input neuron, and so on, until all inputs of that output are exhausted, together represent the connection relation of that output.
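The three correspondence methods can be pictured with a short sketch; the weight matrix, its row/column orientation and the helper names below are illustrative assumptions:

```python
import numpy as np

W = np.array([[0.0, 0.5, 0.0, 1.2],      # rows: outputs, columns: inputs
              [0.7, 0.0, 0.0, 0.0]])

# Variant 1: per output, a 0/1 string over all inputs (1 = connected).
per_output_bits = (W != 0).astype(int)                 # [[0 1 0 1], [1 0 0 0]]

# Variant 2: per input, a 0/1 string over all outputs.
per_input_bits = (W != 0).astype(int).T                # [[0 1], [1 0], [0 0], [1 0]]

# Variant 3: per output, the distance of the first connected input from input 0,
# then the distance of each further connected input from the previous one.
def distance_encoding(row):
    idx = np.flatnonzero(row)
    return np.concatenate(([idx[0]], np.diff(idx))) if idx.size else idx

print([distance_encoding(r).tolist() for r in W])      # [[1, 2], [0]]
```

Variant 3 is a run-length style encoding; for very sparse rows it is usually shorter than the full 0/1 string.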
The input neuron cache is used for caching the input neurons input to the coarse-granularity number selection unit;
and the output neuron buffer is used for buffering the output neurons output by the operation unit.
The lookup table unit receives the weight dictionary and the weight codebook and recovers the weights through the lookup operation. Unquantized weights are transmitted to the operation unit directly through a bypass.
The present disclosure also provides a neural network compression device, which comprises a storage device, an instruction decoding device and a computing device. The storage device stores an instruction sequence for compressing the neural network; the instruction sequence comprises control instructions, data transfer instructions, calculation instructions and the like, and can control the computing device to complete the conversion of the neural network format and the task of compressing the corresponding format. The instruction decoding device receives an instruction from the storage device, decodes it and generates a control signal to control the computing device. The computing device receives the control signal and completes the coarse-grained pruning and quantization operations on the neural network. The computing device is arranged to execute the executable instructions in the storage device, and when executed, the instructions operate in accordance with the data compression method described above.
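A compact sketch of the kind of flow such a computing device could carry out: coarse-grained pruning with a sliding window over a fully connected weight matrix, followed by clustering the surviving weights into a codebook and dictionary. The window size, strides, the all-below-threshold selection rule and the plain k-means clustering are illustrative assumptions, not the only options the text allows:

```python
import numpy as np

def coarse_grained_prune(W, Bin, Bout, Sin, Sout, threshold):
    """Zero out every Bin x Bout block whose weights are all small."""
    W = W.copy()
    Nin, Nout = W.shape
    for i in range(0, Nin - Bin + 1, Sin):
        for j in range(0, Nout - Bout + 1, Sout):
            block = W[i:i + Bin, j:j + Bout]
            if np.all(np.abs(block) < threshold):      # illustrative group criterion
                W[i:i + Bin, j:j + Bout] = 0.0          # Bin*Bout weights zeroed together
    return W

def quantize(nonzero_weights, k=4, iters=20):
    """Cluster the retained weights into k centroids -> (codebook, dictionary)."""
    codebook = np.linspace(nonzero_weights.min(), nonzero_weights.max(), k)
    for _ in range(iters):                              # plain 1-D k-means
        dictionary = np.argmin(np.abs(nonzero_weights[:, None] - codebook[None, :]), axis=1)
        for c in range(k):
            members = nonzero_weights[dictionary == c]
            if members.size:
                codebook[c] = members.mean()
    return codebook, dictionary

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))
Wp = coarse_grained_prune(W, Bin=2, Bout=2, Sin=2, Sout=2, threshold=0.12)
codebook, dictionary = quantize(Wp[Wp != 0])
print((Wp == 0).mean(), codebook)                       # sparsity ratio and centroids
```

Retraining between the pruning and quantization steps, as the method describes, is omitted here to keep the sketch short.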
The present disclosure also provides a processing method of neural network data, as shown in fig. 8, the processing method includes the steps of:
s801: receiving an input neuron, a weight dictionary, a codebook and an instruction;
s802: decoding to obtain numerical control information, search control information and operation control information;
s803: and performing operation under the numerical control information selection, the control information search and the operation control information to obtain the output neuron.
In some embodiments, the processing method further comprises: receiving the unquantized nonzero weight value to perform neural network operation.
In some embodiments, the processing method further comprises: and receiving the instruction, decoding the instruction, and generating control information to control the operation of the neural network.
In some embodiments, the operations include at least one of: a multiplication operation, multiplying first input data and second input data to obtain multiplied data; an addition tree operation, adding third input data step by step through an addition tree, or adding the third input data and fourth input data, to obtain added data; and an activation function operation, applying an activation function to fifth data to obtain output data, wherein the activation function is a sigmoid, tanh, relu or softmax function.
In some embodiments, the operations further include a pooling operation, which obtains output data by performing a pooling operation on input sixth data, the pooling operation including: mean pooling, maximum pooling, or median pooling.
In some embodiments, the instructions are neural network specific instructions, including control instructions, data transfer instructions, arithmetic instructions, and logic instructions.
In some embodiments, the control instructions are for controlling a neural network execution process, including jump instructions and conditional branch instructions.
In some embodiments, the data transfer instructions are used for completing data transfer between different storage media, and comprise load instructions, store instructions and transport instructions.
In some embodiments, the operation instruction is used to perform an arithmetic operation of the neural network, including a matrix operation instruction, a vector operation instruction, a scalar operation instruction, a convolutional neural network operation instruction, a fully-connected neural network operation instruction, a pooled neural network operation instruction, a RBM neural network operation instruction, an LRN neural network operation instruction, an LCN neural network operation instruction, an LSTM neural network operation instruction, an RNN neural network operation instruction, a RELU neural network operation instruction, a PRELU neural network operation instruction, a SIGMOID neural network operation instruction, a TANH neural network operation instruction, and a MAXOUT neural network operation instruction.
In some embodiments, the neural network specific instructions are a Cambricon instruction set, each instruction being 64 bits in length, the instructions consisting of an opcode and operands.
In some embodiments, the logic instructions are for performing logic operations of the neural network, including vector logic operation instructions and scalar logic operation instructions.
In some embodiments, the vector logic operation instructions include vector compare, vector logical operation, and vector greater than merge instructions; preferably, the vector comparison includes, but is not limited to, greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; preferably, the vector logical operation comprises logical and, logical or, or logical not.
In some embodiments, the scalar logic operations include scalar compare and scalar logical operations; preferably, the scalar comparison includes, but is not limited to, greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to; preferably, the scalar logical operation includes logical and, logical or, and logical not.
In some embodiments, the processing method further comprises the steps of: preprocessing the input neurons and the non-zero weight position information, wherein the preprocessing comprises segmentation, Gaussian filtering, binarization, regularization and/or normalization.
In some embodiments, the processing method further includes, after receiving the selected neurons and the non-zero weight values: storing input neurons, a weight dictionary, a codebook and instructions, and storing output neurons; and caching the instruction, the input neuron and the output neuron.
In one embodiment, the present disclosure discloses a chip including the neural network processor described above.
In one embodiment, the present disclosure discloses a chip packaging structure, which includes the above chip.
In one embodiment, the present disclosure discloses a board card including the above chip package structure.
In one embodiment, the present disclosure discloses an electronic device, which includes the above board card.
The electronic device comprises a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
All modules may be hardware structures; physical implementations of the hardware structures include, but are not limited to, physical devices, which include, but are not limited to, transistors, memristors and DNA computers.
By the data compression method and the data processing method, the neural network can be compressed regularly at a high compression ratio. The accelerator integrates the compression method internally and realizes the compression function of the neural network. The accelerator can fully exploit the characteristics of the compressed neural network, reducing memory access and the amount of calculation, thereby achieving a speedup and reducing energy consumption.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
Claims (27)
1. A processing apparatus, comprising:
the sliding window is used for selecting a group of weights from the neural network and setting the selected weights to be zero; carrying out first retraining on the neural network, wherein the weight value which is already set to be zero in the training process is kept to be zero; grouping the weights of the neural network, then clustering and coding the weights in the group, and carrying out second training on the clustered and coded neural network;
the coarse-grained number selection unit is used for receiving input neurons and position information of the non-zero weights and selecting the neurons that need to be calculated;
the lookup table unit is used for receiving the quantized nonzero weight dictionary and the nonzero weight codebook, performing lookup table operation and outputting a nonzero weight of the neural network;
the operation unit is used for receiving the selected neurons and the nonzero weight, operating the neural network and outputting the neurons;
further comprising: the non-zero weight codebook cache is used for caching a non-zero weight codebook, and is an on-chip cache; the non-zero weight dictionary cache is used for caching a non-zero weight dictionary, and the non-zero weight dictionary cache is an on-chip cache;
selecting a group of weights from the neural network by using a sliding window comprises pruning the weights of the full connection layer of the neural network;
pruning the fully connected layer of the neural network comprises: setting the weights of the fully connected layer as a two-dimensional matrix (Nin, Nout), wherein Nin is the number of input neurons and Nout is the number of output neurons, so that there are Nin×Nout weights in total; setting a sliding window with a size of Bin×Bout, wherein Bin is a positive integer which is greater than or equal to 1 and less than or equal to Nin, and Bout is a positive integer which is greater than or equal to 1 and less than or equal to Nout; enabling the sliding window to slide along the Bin direction according to a step length of Sin, and also to slide along the Bout direction according to a step length of Sout, wherein Sin is a positive integer which is greater than or equal to 1 and less than or equal to Bin, and Sout is a positive integer which is greater than or equal to 1 and less than or equal to Bout; and when a group of weights in the sliding window is selected, setting all of these weights to zero, namely setting Bin×Bout weights to zero at the same time;
wherein the sliding window is further configured to: continuously repeat the coarse-grained pruning and the first retraining until no weight can be set to zero while ensuring that the accuracy loss does not exceed x%, wherein x is a number greater than 0 and less than 100.
2. The processing apparatus as claimed in claim 1, wherein the lookup table unit is further configured to bypass the unquantized non-zero weight directly to the operation unit.
3. The processing apparatus according to claim 1, further comprising an instruction control unit configured to receive an instruction and generate control information after decoding to control the operation unit.
4. The processing apparatus according to claim 3, further comprising a storage unit for storing the neurons, the weights, and the instructions of the neural network.
5. The processing apparatus according to claim 4, wherein the storage unit is further configured to store a nonzero weight value and location information of the nonzero weight value; and also for storing a quantized non-zero weight codebook and a non-zero weight dictionary.
6. The processing apparatus according to claim 5, wherein the arithmetic unit comprises at least one of:
the multiplier is used for multiplying the first input data and the second input data to obtain multiplied data;
the addition tree is used for adding third input data step by step through the addition tree or adding the third input data and fourth input data to obtain added data;
and the activation function operation unit is used for obtaining output data through activation function operation on the fifth data, and the activation function is sigmoid, tanh, relu or softmax function operation.
7. The processing apparatus according to claim 6, wherein the arithmetic unit further includes a pooling unit configured to obtain output data after a pooling operation by a pooling operation on the input sixth data, the pooling operation including: mean pooling, maximum pooling, or median pooling.
8. The processing apparatus as claimed in claim 7, further comprising an instruction control unit, configured to receive an instruction from the storage device and generate control information after decoding, so as to control the coarse-grained number selection unit to perform the selection operation, the lookup table unit to perform the table lookup operation, and the operation unit to perform the calculation operation.
9. The processing apparatus of claim 8, wherein the instructions are neural network specific instructions, including control instructions, data transfer instructions, arithmetic instructions, and logic instructions.
10. The processing apparatus as in claim 9 wherein the neural network specific instructions are Cambricon instruction set.
11. The processing apparatus as claimed in claim 10, further comprising an instruction cache to cache instructions, the instruction cache being an on-chip cache.
12. The processing apparatus according to claim 11, further comprising a non-zero weight location buffer for buffering non-zero weight locations, and for one-to-one mapping each connection weight in the input data to a corresponding input neuron, wherein the non-zero weight location buffer is an on-chip buffer.
13. The processing apparatus as claimed in claim 12, wherein the non-zero weight position cache mapping each connection weight in the input data one-to-one to the corresponding input neuron comprises:
in order to adopt 1 to represent that the weight is connected with the input neuron and 0 to represent no connection, the connection state of each group of output and all the inputs forms a character string of 0 and 1 to represent the connection relation of the output.
14. The processing apparatus as claimed in claim 12, wherein the non-zero weight position cache mapping each connection weight in the input data one-to-one to the corresponding input neuron comprises:
recording the distance from the input neuron of an output's first connection to the first input neuron, the distance from the output's second connected input neuron to the previous connected input neuron, the distance from the output's third connected input neuron to the previous connected input neuron, and so on, until all inputs of that output are exhausted, to represent the connection relation of that output.
15. The processing apparatus as claimed in claim 14, further comprising an input neuron buffer for buffering input neurons input to the coarse-grained selection unit, the input neuron buffer being an on-chip buffer.
16. The processing apparatus as claimed in claim 15, further comprising an output neuron cache to cache output neurons, the output neuron cache being an on-chip cache.
17. The processing apparatus according to claim 16, further comprising a direct data access unit DMA unit configured to read and write data or instructions in the storage unit, the instruction cache, the non-zero weight codebook cache, the non-zero weight dictionary cache, the non-zero weight location cache, the input neuron cache, and the output neuron cache.
18. The processing apparatus according to claim 4, further comprising a preprocessing unit for preprocessing the original data and inputting the preprocessed data into the storage unit.
19. A method of processing, comprising:
selecting a group of weights from the neural network by using a sliding window, and setting the selected weights to be zero; carrying out first retraining on the neural network, wherein the weight value which is already set to be zero in the training process is kept to be zero; grouping the weights of the neural network, then clustering and coding the weights in the group, and carrying out second training on the clustered and coded neural network;
inputting input neurons and non-zero weight position information, and selecting the neurons needing to be calculated;
receiving a quantized nonzero weight dictionary and a nonzero weight codebook, performing table look-up operation and outputting a nonzero weight of a neural network;
receiving the selected neurons and the nonzero weight, calculating the neural network and outputting the neurons;
on-chip cache is adopted to cache a non-zero weight dictionary and a non-zero weight codebook;
selecting a group of weights from the neural network by using a sliding window comprises pruning the weights of the full connection layer of the neural network;
pruning the fully connected layer of the neural network comprises: setting the weights of the fully connected layer as a two-dimensional matrix (Nin, Nout), wherein Nin is the number of input neurons and Nout is the number of output neurons, so that there are Nin×Nout weights in total; setting a sliding window with a size of Bin×Bout, wherein Bin is a positive integer which is greater than or equal to 1 and less than or equal to Nin, and Bout is a positive integer which is greater than or equal to 1 and less than or equal to Nout; enabling the sliding window to slide along the Bin direction according to a step length of Sin, and also to slide along the Bout direction according to a step length of Sout, wherein Sin is a positive integer which is greater than or equal to 1 and less than or equal to Bin, and Sout is a positive integer which is greater than or equal to 1 and less than or equal to Bout; and when a group of weights in the sliding window is selected, setting all of these weights to zero, namely setting Bin×Bout weights to zero at the same time;
and continuously repeating the coarse-grained pruning and the first retraining until no weight can be set to zero while ensuring that the accuracy loss does not exceed x%, wherein x is a number greater than 0 and less than 100.
20. The process of claim 19, further comprising: receiving the unquantized nonzero weight value to perform neural network operation.
21. The process of claim 19, further comprising: and receiving the instruction, decoding the instruction, and generating control information to control the operation of the neural network.
22. The processing method of claim 19, wherein the operation comprises at least one of:
multiplication operation, namely multiplying the first input data and the second input data to obtain multiplied data;
adding third input data step by step through an addition tree, or adding the third input data and fourth input data to obtain added data;
and performing activation function operation, namely performing activation function operation on the fifth data to obtain output data, wherein the activation function is sigmoid, tanh, relu or softmax function operation.
23. The processing method according to claim 22, wherein the operation further comprises a pooling operation for obtaining output data after the pooling operation by a pooling operation on the input sixth data, the pooling operation comprising: mean pooling, maximum pooling, or median pooling.
24. The processing method of claim 21, wherein the instructions are neural network specific instructions, including control instructions, data transfer instructions, arithmetic instructions, and logic instructions.
25. The processing method as in claim 24, wherein the neural network specific instructions are Cambricon instruction set, each of the Cambricon instruction set is 64 bits long, and the instructions are composed of an opcode and an operand.
26. The process of claim 19, further comprising the steps of: preprocessing the input neurons and the non-zero weight position information, wherein the preprocessing comprises segmentation, Gaussian filtering, binarization, regularization and/or normalization.
27. The processing method of claim 19, further comprising, after receiving the selected neurons and the non-zero weights, the steps of: storing input neurons, a weight dictionary, a codebook and instructions, and storing output neurons; and caching the instruction, the input neuron and the output neuron.
Priority Applications (13)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710689666.6A CN109389210B (en) | 2017-08-09 | 2017-08-09 | Processing method and processing apparatus |
EP18806558.5A EP3637325A4 (en) | 2017-05-23 | 2018-05-23 | Processing method and accelerating device |
EP19214015.0A EP3657399A1 (en) | 2017-05-23 | 2018-05-23 | Weight pruning and quantization method for a neural network and accelerating device therefor |
EP19214007.7A EP3657340B1 (en) | 2017-05-23 | 2018-05-23 | Processing method and accelerating device |
EP19214010.1A EP3657398A1 (en) | 2017-05-23 | 2018-05-23 | Weight quantization method for a neural network and accelerating device therefor |
PCT/CN2018/088033 WO2018214913A1 (en) | 2017-05-23 | 2018-05-23 | Processing method and accelerating device |
US16/699,049 US20200134460A1 (en) | 2017-05-23 | 2019-11-28 | Processing method and accelerating device |
US16/699,051 US20220335299A9 (en) | 2017-05-23 | 2019-11-28 | Processing method and accelerating device |
US16/699,046 US11727276B2 (en) | 2017-05-23 | 2019-11-28 | Processing method and accelerating device |
US16/699,027 US20200097826A1 (en) | 2017-05-23 | 2019-11-28 | Processing method and accelerating device |
US16/699,055 US20200097828A1 (en) | 2017-05-23 | 2019-11-28 | Processing method and accelerating device |
US16/699,032 US11907844B2 (en) | 2017-05-23 | 2019-11-28 | Processing method and accelerating device |
US16/699,029 US11710041B2 (en) | 2017-05-23 | 2019-11-28 | Feature map and weight selection method and accelerating device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710689666.6A CN109389210B (en) | 2017-08-09 | 2017-08-09 | Processing method and processing apparatus |
CN201710677987.4A CN109389218B (en) | 2017-08-09 | 2017-08-09 | Data compression method and compression device |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710677987.4A Division CN109389218B (en) | 2017-05-23 | 2017-08-09 | Data compression method and compression device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109389210A CN109389210A (en) | 2019-02-26 |
CN109389210B true CN109389210B (en) | 2021-06-18 |
Family
ID=65415148
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710689666.6A Active CN109389210B (en) | 2017-05-23 | 2017-08-09 | Processing method and processing apparatus |
CN201710677987.4A Active CN109389218B (en) | 2017-05-23 | 2017-08-09 | Data compression method and compression device |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710677987.4A Active CN109389218B (en) | 2017-05-23 | 2017-08-09 | Data compression method and compression device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109389210B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110163370B (en) * | 2019-05-24 | 2021-09-17 | 上海肇观电子科技有限公司 | Deep neural network compression method, chip, electronic device and medium |
CN110298446B (en) * | 2019-06-28 | 2022-04-05 | 济南大学 | Deep neural network compression and acceleration method and system for embedded system |
CN112488285A (en) * | 2019-09-12 | 2021-03-12 | 上海大学 | Quantification method based on neural network weight data distribution characteristics |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
CN106557332A (en) * | 2016-11-30 | 2017-04-05 | 上海寒武纪信息科技有限公司 | A kind of multiplexing method and device of instruction generating process |
CN106919942A (en) * | 2017-01-18 | 2017-07-04 | 华南理工大学 | For the acceleration compression method of the depth convolutional neural networks of handwritten Kanji recognition |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11423311B2 (en) * | 2015-06-04 | 2022-08-23 | Samsung Electronics Co., Ltd. | Automatic tuning of artificial neural networks |
-
2017
- 2017-08-09 CN CN201710689666.6A patent/CN109389210B/en active Active
- 2017-08-09 CN CN201710677987.4A patent/CN109389218B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
CN106557332A (en) * | 2016-11-30 | 2017-04-05 | 上海寒武纪信息科技有限公司 | A kind of multiplexing method and device of instruction generating process |
CN106919942A (en) * | 2017-01-18 | 2017-07-04 | 华南理工大学 | For the acceleration compression method of the depth convolutional neural networks of handwritten Kanji recognition |
Non-Patent Citations (4)
Title |
---|
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; Song Han et al.;《arXiv》;20160215; Abstract, Sections 2-3 *
Learning both Weights and Connections for Efficient Neural Networks;Song Han et al.;《arXiv》;20151030;第1-9页 * |
Structured Pruning of Deep Convolutional Neural Networks; SAJID ANWAR et al.;《ACM Journal on Emerging Technologies in Computing Systems》;20170228; Vol. 13, No. 3; pp. 32-32:18 *
Also Published As
Publication number | Publication date |
---|---|
CN109389210A (en) | 2019-02-26 |
CN109389218A (en) | 2019-02-26 |
CN109389218B (en) | 2021-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200097828A1 (en) | Processing method and accelerating device | |
CN111221578B (en) | Computing device and computing method | |
CN109389208B (en) | Data quantization device and quantization method | |
CN110163334B (en) | Integrated circuit chip device and related product | |
CN110163357B (en) | Computing device and method | |
US10657439B2 (en) | Processing method and device, operation method and device | |
US11544542B2 (en) | Computing device and method | |
CN109389210B (en) | Processing method and processing apparatus | |
CN109478251B (en) | Processing method and acceleration device | |
CN111626413A (en) | Computing device and method | |
CN109389209B (en) | Processing apparatus and processing method | |
CN109697507B (en) | Processing method and device | |
CN110196735A (en) | A kind of computing device and Related product | |
CN108960420B (en) | Processing method and acceleration device | |
CN110196734A (en) | A kind of computing device and Related product | |
CN114492778A (en) | Operation method of neural network model, readable medium and electronic device | |
CN111198714B (en) | Retraining method and related product | |
CN109102074B (en) | Training device | |
CN116384445A (en) | Neural network model processing method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||