WO2019127362A1 - Neural network model block compression method, training method, computing device and system - Google Patents

Neural network model block compression method, training method, computing device and system

Info

Publication number
WO2019127362A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
matrix
block
blocks
network model
Prior art date
Application number
PCT/CN2017/119819
Other languages
English (en)
Chinese (zh)
Inventor
张悠慧
季宇
张优扬
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学
Priority to PCT/CN2017/119819 (WO2019127362A1)
Priority to CN201780042629.4A (CN109791628B)
Publication of WO2019127362A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • the present invention generally relates to the field of neural network technologies, and more particularly to a network model block compression method, a training method, a computing device, and a hardware system for a neural network.
  • Figure 1 shows a chain-like neural network, in which each circle represents a neuron.
  • Each arrow represents a connection between neurons, each connection has a weight, and the structure of the actual neural network is not limited to a chain-like network structure.
  • the core computation of the neural network is a matrix vector multiplication operation.
  • the output produced by a layer L_n containing n neurons can be represented by a vector V_n of length n; when L_n is fully connected to a layer L_m containing m neurons, the connection weights can be expressed as a matrix M_{n×m} with n rows and m columns, where each matrix element represents the weight of one connection.
  • the weighted vector input to L_m is then M_{n×m}·V_n, and such matrix-vector multiplication is the core calculation of the neural network.
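  • as a concrete illustration (not part of the patent text), a minimal numpy sketch of this fully connected computation follows; the sizes and values are arbitrary:

    import numpy as np

    # Layer L_n with n neurons produces a vector v of length n; the fully
    # connected weights form an n-row, m-column matrix M, one element per
    # connection (the patent's M_{n x m}).
    n, m = 4, 3
    rng = np.random.default_rng(0)
    M = rng.standard_normal((n, m))   # M[i, j]: weight of connection i -> j
    v = rng.standard_normal(n)        # output of layer L_n

    weighted = v @ M                  # length-m weighted input delivered to L_m
    print(weighted.shape)             # (3,)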
  • since the matrix-vector multiplications are very large, it takes a great deal of time to perform them on a conventional general-purpose processor; accelerating matrix multiplication is therefore the main design goal of neural network acceleration chips.
  • a memristor array is a hardware device capable of implementing the above matrix multiplication operation.
  • the resistance of each memristor can be changed by applying a specific input current, and the resistance can be used to store data.
  • compared with DRAM (Dynamic Random Access Memory) and SRAM (Static Random Access Memory), memristors have high memory density and do not lose stored data when the power supply is lost.
  • Figure 2 shows a schematic diagram of a memristor based crossbar structure.
  • the conductance value G (the reciprocal of the resistance) of each memristor is set to the corresponding element value of the weight matrix.
  • each input voltage V is multiplied by the memristor conductance G, the resulting currents are summed on the output line, and the summed output current is multiplied by the grounding resistance Rs to obtain the output voltage V', thereby completing the matrix-vector multiplication at the output end.
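  • a toy numerical model of this crossbar computation (illustrative only; ideal devices, arbitrary values for G, V, and Rs):

    import numpy as np

    # Ideal crossbar of Figure 2: conductances G hold the weight matrix,
    # input voltages V drive the rows, the currents of each column sum on
    # its output line, and a grounding resistance Rs converts the summed
    # current back into a voltage.
    G = np.array([[1.0e-3, 2.0e-3],
                  [0.5e-3, 1.5e-3]])   # conductances (S), G[i, j] = weight
    V = np.array([0.2, 0.4])           # input voltages on the rows
    Rs = 1.0e3                         # grounding resistance (ohm)

    I_out = G.T @ V                    # column currents: sum_i V[i] * G[i, j]
    V_out = I_out * Rs                 # output voltages V' = I * Rs
    print(V_out)                       # the matrix-vector product, as voltages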
  • however, memristor-based chip calculation also has the disadvantages of low precision, large disturbance, the high cost of digital-to-analog/analog-to-digital conversion, and limited matrix size.
  • TrueNorth is also a chip capable of matrix vector multiplication.
  • TrueNorth is IBM's neuromorphic chip, which integrates 4096 synaptic cores on each chip; each synaptic core can handle a 256×256 array of synaptic calculations.
  • the neural network model needs to be compressed to reduce resource overhead and improve the computational efficiency of the neural network.
  • the existing Deep Compression is a common compression method for CNN networks.
  • the implementation of Deep Compression is mainly divided into three steps: weight pruning, weight sharing, and Huffman coding.
  • Weight pruning: first, train the model normally to obtain the network weights; second, set all weights below a certain threshold to 0; third, retrain the remaining non-zero weights in the network. These three steps are then repeated.
  • Weight sharing: the k-means algorithm is used to cluster the weights. Within each class, all weights share the cluster centroid of that class, so the final stored result is a codebook and an index table.
  • Huffman coding: mainly used to address the redundancy caused by variable coding length. Deep Compression uses 8-bit encoding for the convolutional layers and 5-bit encoding for the fully connected layers, so this entropy coding balances the coding bits and reduces redundancy.
  • this method can compress the model at a compression ratio of 90% while maintaining accuracy.
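  • for illustration, a compact sketch of the first two Deep Compression steps (pruning and k-means weight sharing) follows; the threshold and cluster count are arbitrary, Huffman coding of the index table is omitted, and scipy's kmeans2 stands in for the clustering described above:

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def prune(weights, threshold):
        # Weight pruning: zero every weight whose magnitude is below threshold.
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

    def share_weights(weights, mask, n_clusters=16):
        # Weight sharing: cluster the surviving weights with k-means; every
        # weight is replaced by its cluster centroid, so only a codebook and
        # an index table need to be stored.
        codebook, labels = kmeans2(weights[mask], n_clusters, minit="points")
        shared = weights.copy()
        shared[mask] = codebook[labels]
        return shared, codebook

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 8))
    W_pruned, mask = prune(W, threshold=0.5)
    W_shared, codebook = share_weights(W_pruned, mask)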
  • the present invention has been made in view of the above circumstances.
  • a network model block compression method for a neural network is provided, comprising: a weight matrix obtaining step, obtaining the weight matrix of the network model of the trained neural network; a weight matrix blocking step, dividing the weight matrix into an array consisting of a number of initial sub-blocks according to a predetermined array size; and a to-be-cropped weight element concentrating step, concentrating the matrix elements with smaller weights into the sub-blocks to be cropped by row-column exchange, according to the sums of the absolute values of the weights of the matrix elements in the sub-blocks, such that the sums of the absolute weights of the matrix elements in the sub-blocks to be cropped are smaller than the corresponding sums in the other sub-blocks that are not to be cropped.
  • the sub-block clipping step clips the weights of the matrix elements in the sub-blocks to be cropped to obtain a final weight matrix, implementing the compression of the network model of the neural network.
  • the number of the sub-blocks to be cropped may be set according to a compression ratio or according to a threshold.
  • the to-be-cropped weight element concentrating step may include: a pre-trimmed sub-block determining step, determining pre-trimmed sub-blocks as cropping candidates, the number of pre-trimmed sub-blocks being set according to the compression ratio; a row and column marking step, selecting and marking all rows and all columns in which the pre-trimmed sub-blocks are located as transposition rows and transposition columns; and a row swapping step and a column swapping step, summing the absolute values of the weights of the matrix elements in each row and sequentially exchanging the rows with the smallest sums into the marked transposition rows, then summing the absolute values of the weights of the matrix elements in each column and sequentially exchanging the columns with the smallest sums into the marked transposition columns; the above steps are repeated until the exchange can no longer change the sum of the absolute values of the weights of the matrix elements in all the pre-trimmed sub-blocks, and the pre-trimmed sub-blocks at that point are used as the sub-blocks to be cropped.
  • the pre-trimmed sub-block determining step may include: calculating the sum of the absolute values of the weights of the matrix elements in each of the initial sub-blocks, and using the sub-blocks having the smallest sums as the pre-trimmed sub-blocks.
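  • the following is a runnable sketch of the whole block compression method (not the patent's verbatim procedure), assuming a square weight matrix whose side is divisible by the sub-block size; the patent's sequential swap bookkeeping is approximated by directly placing the lightest rows and columns at the marked transposition positions:

    import numpy as np

    def block_sums(A, bs):
        # Sum of |weight| in each bs x bs sub-block of A.
        n, m = A.shape
        return np.abs(A).reshape(n // bs, bs, m // bs, bs).sum(axis=(1, 3))

    def place(n, light, targets):
        # Permutation that moves the indices in `light` to the positions in
        # `targets`, keeping the relative order of all other indices.
        light, targets = list(light), list(targets)
        rest = [i for i in range(n) if i not in set(light)]
        free = [i for i in range(n) if i not in set(targets)]
        perm = np.empty(n, dtype=int)
        perm[targets] = light
        perm[free] = rest
        return perm

    def block_compress(W, bs=2, n_crop=4, max_iter=100):
        W = W.copy()
        rows = np.arange(W.shape[0])   # row exchange order, needed at apply time
        cols = np.arange(W.shape[1])   # column exchange order
        stored = None                  # the "stored sub-block sum" Sum1
        for _ in range(max_iter):
            s = block_sums(W, bs)
            cand = np.argsort(s, axis=None)[:n_crop]        # pre-trimmed blocks
            bi, bj = np.unravel_index(cand, s.shape)
            t_rows = sorted({r for i in bi for r in range(i * bs, (i + 1) * bs)})
            t_cols = sorted({c for j in bj for c in range(j * bs, (j + 1) * bs)})
            # swap the lightest rows/columns into the transposition rows/columns
            pr = place(W.shape[0], np.argsort(np.abs(W).sum(1))[:len(t_rows)], t_rows)
            W, rows = W[pr], rows[pr]
            pc = place(W.shape[1], np.argsort(np.abs(W).sum(0))[:len(t_cols)], t_cols)
            W, cols = W[:, pc], cols[pc]
            cand_sum = np.sort(block_sums(W, bs), axis=None)[:n_crop].sum()  # Sum2
            if stored is not None and np.isclose(cand_sum, stored):
                break                  # the exchange no longer changes the sum
            stored = cand_sum
        # clip: zero the n_crop sub-blocks with the smallest absolute sums
        s = block_sums(W, bs)
        bi, bj = np.unravel_index(np.argsort(s, axis=None)[:n_crop], s.shape)
        for i, j in zip(bi, bj):
            W[i * bs:(i + 1) * bs, j * bs:(j + 1) * bs] = 0.0
        return W, rows, cols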
  • a neural network training method is provided, comprising the steps of: training the neural network to obtain the weight matrix of the network model; compressing the weight matrix according to the network model block compression method; and iterating the above steps until a predetermined iteration stop condition is reached.
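  • schematically, and reusing block_compress from the sketch above, the training loop can be written as follows; the gradient step is a stand-in for the user's real training procedure:

    import numpy as np

    def retrain(W, mask, lr=0.01, steps=100):
        # Stand-in for real training: a dummy update that keeps the clipped
        # positions (mask == 0) fixed at zero.
        rng = np.random.default_rng(0)
        for _ in range(steps):
            update = rng.standard_normal(W.shape) * lr   # placeholder gradient
            W = (W - update) * mask
        return W

    # Iterate compression and retraining until the iteration budget
    # (the stop condition) is reached.
    W = np.random.default_rng(1).standard_normal((6, 6))
    for _ in range(3):
        W_c, rows, cols = block_compress(W, bs=2, n_crop=4)
        mask = (W_c != 0).astype(W_c.dtype)
        W = retrain(W_c, mask)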
  • a computing device for neural network computation is provided, comprising a memory and a processor, the memory storing computer executable instructions that include network model compression instructions; when the processor executes the network model compression instructions, the following method is performed: a weight matrix obtaining step, obtaining the weight matrix of the network model of the trained neural network; a weight matrix blocking step, dividing the weight matrix into an array of initial sub-blocks according to a predetermined array size; a to-be-cropped weight element concentrating step, concentrating the matrix elements with smaller weights into the sub-blocks to be cropped by row-column exchange, according to the sums of the absolute values of the weights of the matrix elements in the sub-blocks, such that the sums of the absolute weights in the sub-blocks to be cropped are smaller than those in the other sub-blocks that are not to be cropped; and a sub-block clipping step, clipping the weights of the matrix elements in the sub-blocks to be cropped to obtain a final weight matrix, thereby compressing the network model of the neural network.
  • the number of the sub-blocks to be cropped can be set according to a compression ratio or according to a threshold.
  • the to-be-cropped weight element concentrating step may include: a pre-trimmed sub-block determining step, determining pre-trimmed sub-blocks as cropping candidates, the number of pre-trimmed sub-blocks being set according to the compression ratio; a row and column marking step, selecting and marking all rows and all columns in which the pre-trimmed sub-blocks are located as transposition rows and transposition columns; and a row swapping step and a column swapping step, summing the absolute values of the weights of the matrix elements in each row and sequentially exchanging the rows with the smallest sums into the marked transposition rows, then summing the absolute values of the weights of the matrix elements in each column and sequentially exchanging the columns with the smallest sums into the marked transposition columns; the above steps are repeated until the exchange can no longer change the sum of the absolute values of the weights of the matrix elements in all the pre-trimmed sub-blocks, and the pre-trimmed sub-blocks at that point are used as the sub-blocks to be cropped.
  • the pre-trimmed sub-block determining step may further comprise: calculating the sum of the absolute values of the weights of the matrix elements in each of the initial sub-blocks, and using the sub-blocks having the smallest sums as the pre-trimmed sub-blocks.
  • the computer executable instructions may include network model application instructions; when the processor executes the network model application instructions, the following method is performed: an input data processing step, exchanging the input data according to the row and column exchange order; a matrix multiplication step, performing matrix multiplication of the exchanged input data with the final weight matrix obtained by executing the network model compression instructions; and an output data processing step, reverse-exchanging the result of the matrix multiplication according to the row and column exchange order and outputting it as the output data.
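  • a sketch of these application instructions, assuming the rows/cols exchange orders returned by the block_compress sketch above:

    import numpy as np

    def apply_compressed(W_c, rows, cols, x):
        # W_c[i, j] == W_original[rows[i], cols[j]] (with clipped blocks zeroed),
        # so: exchange the input by the column order, multiply, then reverse
        # the row exchange to restore the original output order.
        y_perm = W_c @ x[cols]      # exchanged input times final weight matrix
        y = np.empty_like(y_perm)
        y[rows] = y_perm            # reverse exchange of the output
        return y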
  • the computer executable instructions may further include network model training instructions; when the processor executes the network model training instructions, the following method is performed: training the neural network to obtain the initial weight matrix of the network model; executing the network model compression instructions to obtain the compressed final weight matrix; executing the network model application instructions for training; and iterating the above compression and training steps until a predetermined iteration stop condition is reached.
  • a hardware system is provided that uses the above network model block compression method, the above neural network training method, and the network model compression, application, and training of the above computing device, comprising: a neural network hardware chip having basic modules that perform matrix-vector multiplication in hardware form through circuit devices, wherein no circuit device is provided at the positions corresponding to the matrix elements in the sub-blocks to be cropped.
  • the circuit device can be a memristor or a synapse of a TrueNorth chip.
  • a network model block compression method for a neural network is thus provided, saving resource overhead and making it possible to deploy a large-scale neural network under conditions of limited resources.
  • Figure 1 shows a schematic of a chained neural network.
  • Figure 2 shows a schematic diagram of a memristor based crossbar switch structure.
  • FIG. 3 is a diagram showing an application scenario of a network model block compression technique of a neural network in accordance with the present invention.
  • FIG. 4 shows a general flow diagram of a network model block compression method in accordance with the present invention.
  • Figure 5 shows an exploded flowchart of the to-be-cropped weight element concentrating step of the above method.
  • Figures 6a-6c show the accuracy achieved at different compression ratios when the compression method according to the invention is applied to a variety of data sets and different network sizes.
  • FIG. 3 shows a schematic diagram of an application context 1000 of a network model block compression technique for a neural network in accordance with the present invention.
  • the general inventive concept of the present disclosure is to perform preliminary neural network training for the neural network application 1100 and then, through the network model block compression method 1300, learn a network model 1200 compressed at a predetermined compression ratio.
  • FIG. 4 and FIG. 5 are flowcharts of the network model block compression method 1300 according to an embodiment of the present invention: FIG. 4 shows the general flowchart of the method, and FIG. 5 shows an exploded flowchart of its to-be-cropped weight element concentrating step.
  • the network model block compression method includes the following steps:
  • Weight matrix obtaining step S210: obtain the weight matrix of the network model of the trained neural network.
  • in this example, the initial weight matrix has size 6×6 and is illustrated by the matrix of Table 1 below.
  • Weight matrix blocking step S220: the weight matrix is divided into an array of initial sub-blocks according to a predetermined array size.
  • the size of the divided sub-blocks can be set according to the size of the weight matrix and the compression ratio; for example, sub-block sizes such as 4×4, 8×8, ..., 256×256 can also be used. In this example, the 6×6 matrix is divided into a 3×3 array of 2×2 sub-blocks.
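  • a one-liner view of this blocking, for the 6×6 case with 2×2 sub-blocks (the values are a stand-in for Table 1; indices here are 0-based, while the example's B21-style names are 1-based):

    import numpy as np

    W = np.arange(36, dtype=float).reshape(6, 6)   # stand-in for Table 1
    bs = 2
    blocks = W.reshape(6 // bs, bs, 6 // bs, bs).transpose(0, 2, 1, 3)
    print(blocks.shape)    # (3, 3, 2, 2): a 3x3 array of 2x2 sub-blocks
    print(blocks[1, 0])    # the sub-block in block-row 1, block-column 0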
  • To-be-cropped weight element concentrating step S230: according to the sums of the absolute values of the weights of the matrix elements in the sub-blocks, the matrix elements with smaller weights are concentrated into the sub-blocks to be cropped by row-column exchange, such that the sums of the sub-blocks to be cropped are smaller than the sums of the other sub-blocks that are not to be cropped, where the number of sub-blocks to be cropped is set according to the compression ratio.
  • FIG. 5 shows an exploded flowchart of the to-be-cropped weight element concentrating step S230, which includes the following steps:
  • Determine pre-trimmed sub-blocks step S2301: determine the pre-trimmed sub-blocks as cropping candidates.
  • the sum of each initial sub-block is calculated, and the sub-blocks having the smallest sums are used as the pre-trimmed sub-blocks.
  • taking the absolute value of each element of the weight matrix of Table 1 gives the matrix of Table 2.
  • Table 2: absolute values of the weight matrix
  • the sub-blocks with the smallest sums are selected as the pre-trimmed sub-blocks and marked as True, and the other sub-blocks are marked as False, giving Table 4, where the sub-block numbers begin with B.
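  • this marking can be reproduced with the block_sums helper from the earlier sketch (W6 stands in for the matrix of Table 1; ties between sums are ignored):

    import numpy as np

    W6 = np.random.default_rng(2).standard_normal((6, 6))  # stand-in matrix
    s = block_sums(W6, 2)               # 3x3 array of sub-block absolute sums
    thresh = np.sort(s, axis=None)[3]   # fourth-smallest sub-block sum
    marks = s <= thresh                 # True = pre-trimmed candidate (Table 4)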
  • Mark rows and columns step S2302: select all the rows and all the columns in which the pre-trimmed sub-blocks are located as the transposition rows and transposition columns, and mark them.
  • the four sub-blocks B21, B22, B23, and B33 having the smallest sums are marked as "True" as the pre-trimmed sub-blocks.
  • the transposition rows are R2, R3, R4, R5 and the transposition columns are C0-C5; they are therefore marked as ER2, ER3, ER4, ER5 and EC0-EC5, where transposition rows begin with ER and transposition columns begin with EC, to distinguish them from ordinary rows and columns beginning with R and C.
  • Swap rows step S2303: sum the absolute values of the weights of the matrix elements in each row, and sequentially exchange the rows with the smallest sums into the marked transposition rows.
  • R3 is transposed to ER2 → R[0 1 3 2 4 5] (at this point, since R3 and R2 have been swapped, R2 is no longer transposed);
  • R1 is transposed to ER4 → R[0 4 3 2 1 5];
  • R0 is transposed to ER5 → R[5 4 3 2 1 0].
  • Swap columns step S2304: sum the absolute values of the weights of the matrix elements in each column, and sequentially exchange the columns with the smallest sums into the marked transposition columns.
  • C5 is transposed to EC1 → C[3 5 2 0 4 1];
  • C4 is transposed to EC2 → C[3 5 4 0 2 1];
  • C1 is transposed to EC3 → C[3 5 4 1 2 0];
  • C0 is transposed to EC4 → C[3 5 4 1 0 2];
  • C2 is transposed to EC5 → C[3 5 4 1 0 2].
  • the first column obtained after the exchange is the C3 column of the original matrix,
  • the second column is the C5 column of the original matrix, and so on;
  • the row order is R[5,4,3,2,1,0] and the column order is C[3,5,4,1,0,2].
  • Comparison step S2305: the pre-trimmed sub-block sum Sum2 is compared with the stored sub-block sum Sum1.
  • the stored sub-block sum Sum1 and the pre-trimmed sub-block sum Sum2 are not equal, so the stored sub-block sum Sum1 is set to the pre-trimmed sub-block sum Sum2; that is, since 6.0731 ≠ 7.1541, the stored sub-block sum Sum1 is set to 7.154.
  • since Sum1 is smaller than Sum2, the current switching operation can continue, so steps S2301 to S2305 are repeated: the to-be-cropped weight elements are again concentrated into the pre-trimmed sub-block positions by row-column exchange, and it is again determined whether the pre-trimmed sub-block sum equals the stored sub-block sum used as the comparison value, as detailed below.
  • here, the pre-trimmed sub-block sum is obtained after the exchange processing, based on the positions of the pre-trimmed sub-blocks determined before the exchange processing, by calculating the sub-block sums; the stored sub-block sum is set according to the judgment result after the previous exchange of the loop. Specifically, at each judgment, whenever the stored sub-block sum and the pre-trimmed sub-block sum differ, the pre-trimmed sub-block sum is stored as the new stored sub-block sum for use in the next comparison. The initial value of the stored sub-block sum is set to the pre-trimmed sub-block sum determined at the start of the loop.
  • Determine pre-trimmed sub-blocks step S2301: determine the pre-trimmed sub-blocks as cropping candidates.
  • the sub-blocks with the smallest sums are again used as the pre-trimmed sub-blocks, so according to Table 9 the pre-trimmed sub-blocks are re-selected as shown in Table 10 below.
  • Table 10: marking the pre-trimmed sub-blocks
  • Mark rows and columns step S2302: select and mark all the rows and all the columns in which the pre-trimmed sub-blocks are located as the transposition rows and transposition columns.
  • the four sub-blocks B21, B22, B23, and B32 with the smallest sums are marked as "True" as the pre-trimmed sub-blocks, and the rows and columns in which they are located are used as the transposition rows and columns.
  • the transposition rows and columns comprise R2, R3, R4, R5 and C0-C5, so the transposition rows are marked ER2, ER3, ER4, ER5 and the transposition columns EC0-EC5.
  • Swap rows step S2303: sum the absolute values of the weights of the matrix elements in each row, and sequentially exchange the rows with the smallest sums into the marked transposition rows.
  • the row sums, from small to large, are R2 < R3 < R4 < R5 < R0 < R1, so the rows are assigned to ER2, ER3, ER4, and ER5 in this order. Since the order of the rows with small weights already corresponds one-to-one with the order of the transposition rows, no row transposition is performed, and the matrix remains as in Table 8.
  • Swap columns step S2304: sum the absolute values of the weights of the matrix elements in each column, and sequentially exchange the columns with the smallest sums into the marked transposition columns.
  • the column sums, from small to large, are C0 < C1 < C2 < C3 < C4 < C5, so the columns are assigned to EC0, EC1, EC2, EC3, EC4, and EC5 in this order. Since the order of the columns with small weights already corresponds one-to-one with the order of the transposition columns, no column transposition is performed, and the matrix remains as in Table 8.
  • Comparison step S2305: the stored sub-block sum Sum1 was set to 7.154 during the first row-column exchange, i.e., to the pre-trimmed sub-block sum before the second row-column exchange; because the pre-trimmed sub-block sum after the second exchange differs from it, Sum1 is updated and the loop continues.
  • the row order is R[5,4,3,2,1,0] and the column order is C[3,5,4,1,0,2].
  • Determine pre-trimmed sub-blocks step S2301: determine the pre-trimmed sub-blocks as cropping candidates.
  • the sub-blocks with the smallest sums are again used as the pre-trimmed sub-blocks.
  • the sum of the four pre-trimmed sub-blocks B21, B22, B23, and B32 is still the minimum sub-block sum, so the pre-trimmed sub-blocks are unchanged.
  • the row order is R[5,4,3,2,1,0] and the column order is C[3,5,4,1,0,2].
  • the stored sub-block sum Sum1 is equal to 5.666, the same as the current pre-trimmed sub-block sum, so the exchange can no longer change the sum; the loop terminates, and the pre-trimmed sub-blocks at this point are used as the sub-blocks to be cropped.
  • Sub-block clipping step S240: clip the weights of the matrix elements in the sub-blocks to be cropped to implement the compression of the network model of the neural network.
  • the clipping here is not limited to setting the values of the matrix elements themselves to 0.
  • the hardware device at the position corresponding to a clipped matrix element can be omitted directly; more specifically, when corresponding hardware devices are arranged to implement the weight matrix, the devices that would perform the block calculation at the clipped positions are removed.
  • in this way, the weight elements to be cropped are concentrated into matrix sub-blocks, the matrix sub-blocks are cut off directly, and neural network training is then performed with the result as the initial value; the use of arrays is reduced while the network effect is maintained, which greatly reduces resource overhead.
  • the method proposed by the present invention is fully applicable to neural networks based on memristors or TrueNorth chips.
  • traditional network compression methods are not suitable for such networks, because even if the network model is compressed to a very small size, the use of the arrays cannot be reduced and resource consumption cannot be lowered.
  • the steps of the row and column exchange given above are only examples; this manner of row and column exchange is not the only alternative.
  • for example, if the row sums from small to large are R3 < R2 < R1 < R0 < R5 < R4, the rows are assigned to ER2, ER3, ER4, and ER5 sequentially in this order.
  • that is to say, in the present invention the rows and columns in which the sub-blocks with the smallest sums are located are selected as the exchange rows and columns, which greatly speeds up the row-column exchange and quickly concentrates the smaller to-be-cropped weight elements.
  • although the number of sub-blocks to be cropped is determined by setting the compression ratio in the present invention,
  • the number of sub-blocks to be cropped may also be determined by setting a threshold, as long as the purpose of compression is satisfied.
  • the core of the inventive concept is to obtain croppable sub-blocks by row-column exchange so as to be applicable to block-operation applications, without limiting the specific exchange manner used.
  • Table 14 is the weight matrix after row and column exchange according to the compression method of the present invention (corresponding to Table 8), in which the underlined Null entries identify the pre-trimmed matrix elements.
  • Table 15 is the initial matrix obtained by restoring Table 14 to the original row and column order (i.e., the order below), in which the underlined Null entries identify the pre-trimmed matrix elements.
  • the row order is R[5,4,3,2,1,0] and the column order is C[3,5,4,1,0,2].
  • the essential difference between Table 15 and Table 14 is that the pre-trimmed elements in Table 15 are dispersed, while the pre-trimmed elements in Table 14 are gathered together in the form of 2×2 sub-blocks. Therefore, in an actual deployment, the hardware arrangement is implemented according to Table 14 (i.e., the matrix after row and column exchange) to meet the needs of block calculation; this is the overall concept of the invention, namely the key to applying the compression method to the corresponding block computing applications.
  • the row-column-exchanged input vector is multiplied by the row-column-exchanged weight matrix of Table 14; that is, the products of the vector elements and the corresponding matrix elements are summed, and point multiplication result 2 is output:
  • Table 20: point multiplication result 2 after the exchange
  • when the final weight matrix obtained by the compression method according to the present invention is applied to data, the input data must first be exchanged according to the row and column exchange order, the exchanged input data is then matrix-multiplied by the final weight matrix, and finally the result of the matrix multiplication is reverse-exchanged according to the row and column exchange order and output as the output data; a short end-to-end sketch follows.
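  • as a check (reusing block_compress and apply_compressed from the sketches above), the exchanged pipeline reproduces the product of the input with the clipped matrix restored to the original order:

    import numpy as np

    rng = np.random.default_rng(3)
    W0 = rng.standard_normal((6, 6))
    x = rng.standard_normal(6)

    W_c, rows, cols = block_compress(W0, bs=2, n_crop=4)
    y = apply_compressed(W_c, rows, cols, x)

    # Restore the clipped matrix to the original row/column order and compare.
    inv_r = np.empty_like(rows); inv_r[rows] = np.arange(6)
    inv_c = np.empty_like(cols); inv_c[cols] = np.arange(6)
    W_back = W_c[inv_r][:, inv_c]      # clipped weights in the original layout
    assert np.allclose(y, W_back @ x)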
  • Figure 6a shows the accuracy on the CIFAR10 data set after compression.
  • the CIFAR10 data set has 60,000 32×32-pixel color pictures, each of which belongs to one of 10 categories.
  • Figure 6b shows the accuracy after compression on the MNIST data set under the LENET network, where the MNIST data set has 60,000 28×28-pixel black-and-white handwritten digit pictures.
  • Figure 6c shows the accuracy after compression on the MNIST data set under the MLP network.
  • in each figure, the abscissa is the compression ratio, the ordinate is the accuracy, and the lines of different colors represent arrays of different sizes.
  • for the CIFAR10 data set, the accuracy is between 84% and 85%.
  • for the MNIST data set, the accuracy is basically 98%-99%, or even higher, which demonstrates from multiple angles that the accuracy achieved by the compression method of the present invention is quite good. In other words, under various data sets and different network scales, the compression method of the present invention can greatly compress the network scale and save resource overhead without affecting the accuracy.
  • the accuracy of some of the results is not high, which is related to the large scale of the array.
  • the largest group of results uses a 256×256 array size; such a large array causes too much valid data to be cropped, thus affecting the accuracy.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Disclosed is a network model block compression method for use with a neural network, comprising: a weight matrix obtaining step, comprising obtaining the weight matrix of the network model of a trained neural network; a weight matrix blocking step, comprising dividing the weight matrix, according to a predetermined array size, into an array composed of a plurality of initial sub-blocks; a to-be-cropped weight element concentrating step, comprising, according to the sums of the absolute values of the weights of the matrix elements in the sub-blocks, concentrating the matrix elements with smaller weights into sub-blocks to be cropped by means of row-column exchange, such that the sums of the absolute weights of the matrix elements in the sub-blocks to be cropped are smaller than those of the matrix elements in the other sub-blocks that are not to be cropped; and a sub-block cropping step, comprising cropping the weights of the matrix elements in the sub-blocks to be cropped to obtain a final weight matrix, so as to implement the compression of the network model of the neural network. Resource overhead can thus be saved, and a large-scale neural network can be deployed with limited resources.
PCT/CN2017/119819 2017-12-29 2017-12-29 Neural network model block compression method, training method, computing device and system WO2019127362A1

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/119819 WO2019127362A1 2017-12-29 2017-12-29 Neural network model block compression method, training method, computing device and system
CN201780042629.4A CN109791628B 2017-12-29 2017-12-29 Neural network model block compression method, training method, computing device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/119819 WO2019127362A1 2017-12-29 2017-12-29 Neural network model block compression method, training method, computing device and system

Publications (1)

Publication Number Publication Date
WO2019127362A1

Family

ID=66495633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119819 WO2019127362A1 2017-12-29 2017-12-29 Neural network model block compression method, training method, computing device and system

Country Status (2)

Country Link
CN (1) CN109791628B (fr)
WO (1) WO2019127362A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115724A * 2020-07-23 2020-12-22 云知声智能科技股份有限公司 Optimization method and system for fine-tuning a multi-domain neural network in a vertical domain
CN113642710A * 2021-08-16 2021-11-12 北京百度网讯科技有限公司 Network model quantization method, apparatus, device and storage medium
CN114781650A * 2022-04-28 2022-07-22 北京百度网讯科技有限公司 Data processing method, apparatus, device and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659731B * 2018-06-30 2022-05-17 华为技术有限公司 Neural network training method and apparatus
CN113052292B * 2019-12-27 2024-06-04 北京硅升科技有限公司 Convolutional neural network method, apparatus and computer-readable storage medium
CN111259396A * 2020-02-01 2020-06-09 贵州师范学院 Computer virus detection method based on a deep-learning convolutional neural network, and compression method for the deep-learning neural network
CN112861549B * 2021-03-12 2023-10-20 云知声智能科技股份有限公司 Method and device for training a translation model
CN113052307B * 2021-03-16 2022-09-06 上海交通大学 Neural network model compression method and system for memristor accelerators
CN113252984B * 2021-07-06 2021-11-09 国网湖北省电力有限公司检修公司 Measurement data processing method and system based on a Bluetooth insulator measuring instrument

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650928A * 2016-10-11 2017-05-10 广州视源电子科技股份有限公司 Neural network optimization method and apparatus
CN106779068A * 2016-12-05 2017-05-31 北京深鉴智能科技有限公司 Method and apparatus for adjusting an artificial neural network
CN106919942A * 2017-01-18 2017-07-04 华南理工大学 Accelerated compression method for deep convolutional neural networks used for handwritten Chinese character recognition
CN107239825A * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Deep neural network compression method taking load balancing into account
CN107368885A * 2017-07-13 2017-11-21 北京智芯原动科技有限公司 Network model compression method and apparatus based on multi-granularity pruning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9400955B2 (en) * 2013-12-13 2016-07-26 Amazon Technologies, Inc. Reducing dynamic range of low-rank decomposition matrices
CN106297778A * 2015-05-21 2017-01-04 中国科学院声学研究所 Data-driven neural network acoustic model pruning method based on singular value decomposition
CN106529670B * 2016-10-27 2019-01-25 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN106779051A * 2016-11-24 2017-05-31 厦门中控生物识别信息技术有限公司 Convolutional neural network model parameter processing method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239825A * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Deep neural network compression method taking load balancing into account
CN106650928A * 2016-10-11 2017-05-10 广州视源电子科技股份有限公司 Neural network optimization method and apparatus
CN106779068A * 2016-12-05 2017-05-31 北京深鉴智能科技有限公司 Method and apparatus for adjusting an artificial neural network
CN106919942A * 2017-01-18 2017-07-04 华南理工大学 Accelerated compression method for deep convolutional neural networks used for handwritten Chinese character recognition
CN107368885A * 2017-07-13 2017-11-21 北京智芯原动科技有限公司 Network model compression method and apparatus based on multi-granularity pruning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115724A * 2020-07-23 2020-12-22 云知声智能科技股份有限公司 Optimization method and system for fine-tuning a multi-domain neural network in a vertical domain
CN112115724B * 2020-07-23 2023-10-20 云知声智能科技股份有限公司 Optimization method and system for fine-tuning a multi-domain neural network in a vertical domain
CN113642710A * 2021-08-16 2021-11-12 北京百度网讯科技有限公司 Network model quantization method, apparatus, device and storage medium
CN113642710B * 2021-08-16 2023-10-31 北京百度网讯科技有限公司 Network model quantization method, apparatus, device and storage medium
CN114781650A * 2022-04-28 2022-07-22 北京百度网讯科技有限公司 Data processing method, apparatus, device and storage medium
CN114781650B * 2022-04-28 2024-02-27 北京百度网讯科技有限公司 Data processing method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN109791628B (zh) 2022-12-27
CN109791628A (zh) 2019-05-21

Similar Documents

Publication Publication Date Title
WO2019127362A1 Neural network model block compression method, training method, computing device and system
WO2021004366A1 Artificial neural network accelerator based on structured pruning and low-bit quantization, and related method
Chang et al. Hardware accelerators for recurrent neural networks on FPGA
WO2019127363A1 Weight coding method for neural network, computing apparatus and hardware system
US20240185050A1 (en) Analog neuromorphic circuit implemented using resistive memories
WO2019091020A1 Weight data storage method, and neural network processor based on the method
Ankit et al. TraNNsformer: Neural network transformation for memristive crossbar based neuromorphic system design
CN112257844B Convolutional neural network accelerator based on mixed-precision configuration and implementation method thereof
CN110084364B Deep neural network compression method and apparatus
WO2022134465A1 Sparse data processing method for accelerating the operation of a reconfigurable processor, and device
Cai et al. Training low bitwidth convolutional neural network on RRAM
WO2019001323A1 Signal processing system and method
CN109993275A Signal processing method and apparatus
KR20220101418A Low-power high-performance artificial neural network training accelerator and acceleration method
Shen et al. PRAP-PIM: A weight pattern reusing aware pruning method for ReRAM-based PIM DNN accelerators
Yuan et al. A dnn compression framework for sot-mram-based processing-in-memory engine
CN113435581B Data processing method, quantum computer, apparatus and storage medium
Ascia et al. Improving inference latency and energy of network-on-chip based convolutional neural networks through weights compression
CN112132272B Computing device, processor and electronic device for neural networks
CN113705784A Neural network weight coding method based on matrix sharing, and hardware system
Guo et al. A multi-conductance states memristor-based cnn circuit using quantization method for digital recognition
Peng et al. Network Pruning Towards Highly Efficient RRAM Accelerator
Chang et al. UCP: Uniform channel pruning for deep convolutional neural networks compression and acceleration
Huang et al. BWA-NIMC: Budget-based Workload Allocation for Hybrid Near/In-Memory-Computing
US11113623B2 (en) Multi-sample system for emulating a quantum computer and methods for use therewith

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17936159

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17936159

Country of ref document: EP

Kind code of ref document: A1