CN109212960B - Weight sensitivity-based binary neural network hardware compression method - Google Patents
Weight sensitivity-based binary neural network hardware compression method
- Publication number: CN109212960B
- Application number: CN201811000016.7A
- Authority: CN (China)
- Prior art keywords: sensitivity, value, neural network, particle, weight matrix
- Prior art date: 2018-08-30
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0205—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric not using a model or a simulator of the controlled system
- G05B13/024—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric not using a model or a simulator of the controlled system in which a parameter or coefficient is automatically adjusted to optimise the performance
Abstract
The invention discloses a weight sensitivity-based binary neural network hardware compression method, which comprises the following steps: training a binary neural network to obtain the weight matrices and the original accuracy; evaluating the sensitivity of each weight matrix; presetting a sensitivity threshold and dividing the weight matrices into a sensitive set and a non-sensitive set; evaluating the sensitivity of the non-sensitive set of weight matrices; adjusting the sensitivity threshold to obtain the optimal non-sensitive set of weight matrices, the sensitivity of the optimal non-sensitive set being equal to a preset maximum accuracy loss value; and storing the optimal non-sensitive set in a novel memory or a traditional memory adopting the near-threshold/sub-threshold voltage technique. Through this scheme, the method has the advantages of low power consumption, high recognition rate, good universality, low cost and the like, and has broad market prospects in the technical field of hardware compression.
Description
Technical Field
The invention relates to the technical field of hardware compression, in particular to a weight sensitivity-based binary neural network hardware compression method.
Background
At present, in order to reduce the resource overhead and power consumption required to implement neural networks in hardware, the mainstream approaches include hardware architecture optimization, neural network compression, and binary neural networks.

Hardware architecture optimization designs the neural network more efficiently at the hardware level: it reduces the memory occupied by data and removes redundancy in memory reads/writes and in the computation scheme, thereby reducing resource overhead and power consumption.

Neural network compression shrinks the network model by reducing the number of weights and the number of quantization bits while ensuring that the recognition accuracy of the compressed network is not affected. In general, a large number of weights in a neural network have absolute values close to 0, so weights with very small absolute values can be removed (i.e., set to 0) so that no connection exists at those positions, reducing the total number of weights in the network. The weights are high-precision fractional numbers that must be quantized to fixed-point values of a fixed bit width for hardware storage; to preserve precision, 32 bits are usually used to store one weight, which makes the storage cost high. Each weight can instead be quantized more coarsely, i.e., a high-precision fraction is represented with fewer bits (such as 3 bits), and the weights of different layers in the network can adopt different bit widths to maintain the recognition accuracy of the neural network.

Traditional neural network compression proceeds as follows: first, the neural network is trained normally, the weights with small absolute values are set to 0 (i.e., the connection between the two neurons at those positions is removed) to make the network sparse and reduce the number of weights, and the sparse network is retrained; then the remaining weights are divided into several classes by K-means clustering and each class is encoded, so that every weight is represented by its class code (if the weights are divided into 4 classes, only a 2-bit code is needed per weight), the weights of each class share the same value, and the weight codes are retrained; finally, Huffman coding is adopted to further optimize the code of each weight, achieving effective compression.
A binary neural network directly quantizes the weights in the neural network (and, in some variants, the input values of each layer) to 1 or -1, so that only 1 bit is needed in hardware to represent a weight, greatly reducing the number of bits per weight. Binary neural networks mainly include four types of networks: BinaryConnect, BNN (Binarized Neural Network), BWN (Binary Weight Network), and XNOR-Net; they differ only in which values are quantized, but all convert values into 1 or -1 representations. BinaryConnect and BWN quantize only the weights to binary 1 or -1, while BNN and XNOR-Net quantize both the weights and the input values of each layer to binary 1 or -1. In a neural network, the computation of each layer is dominated by multiplications between the input vector and the weight matrix. If only the weights are quantized to 1 or -1, the multiplications between the input vector and the weight matrix become additions and subtractions, eliminating multiply operations; if both the weights and the layer inputs are quantized to 1 or -1, the multiplications become 1-bit XNOR operations, which save power even compared with additions and subtractions. BWN and XNOR-Net introduce additional scale factors compared with BinaryConnect and BNN, and therefore better preserve recognition accuracy on complex tasks.
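For illustration (this sketch is not part of the patent text), the 1-bit XNOR arithmetic described above reduces to an XNOR followed by a population count, assuming each {-1, +1} vector is packed LSB-first into an integer with bit value 1 standing for +1 and bit value 0 for -1:

```python
def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors of length n packed as integers.

    Matching bits contribute +1 and differing bits contribute -1, so the
    dot product equals popcount(XNOR) - (n - popcount(XNOR)).
    """
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # 1 wherever the bits match
    matches = bin(xnor).count("1")              # population count
    return 2 * matches - n

# x = [+1, -1, +1, +1] -> 0b1101, w = [+1, +1, -1, +1] -> 0b1011
assert binary_dot(0b1101, 0b1011, 4) == 0       # (+1) + (-1) + (-1) + (+1)
```

This is why XNOR-based layers consume less power than add/subtract-based ones: a wide multiply-accumulate collapses into bitwise logic and a counter.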
The conventional compression method has the following disadvantages:
First, hardware architecture optimization and neural network compression are limited in how much hardware resource and power they can save. Compared with them, a binary neural network achieves a compression of at least 32 times (an original network weight is represented with 32 bits, while the binary neural network uses only 1 bit), and during computation the multiplications become additions and subtractions or 1-bit XNOR operations, greatly reducing hardware storage overhead and computation power consumption. Hardware architecture optimization and neural network compression, although they save storage and power to some extent, are neither as simple nor as direct as the binary neural network.
Secondly, the recognition accuracy of the binary neural network is low. Among the many binary neural networks, for classification tasks, BinaryConnect and BNN perform well only on small data sets such as the handwritten digit set MNIST, the common object recognition data set CIFAR, and the real-world street-view house number data set SVHN; on huge data sets such as ImageNet, the recognition accuracy of BinaryConnect and BNN drops severely. For this reason, BWN and XNOR-Net require additional scale factors to guarantee network recognition accuracy.
Thirdly, the traditional compression method uses storage devices such as 6T SRAM, which incur large hardware resource overhead and power consumption and limit the scale of the neural network that a chip can realize; and although the binary neural network performs well on traditional hardware, its fault tolerance is not fully exploited. The mainstream trend is to adopt novel memory devices, such as RRAM (resistive random-access memory), which can greatly save hardware resources and allow larger-scale neural networks to be deployed in hardware. However, such devices suffer from unreliability: if the weights of the whole neural network are stored in a novel memory device, the recognition accuracy of the neural network is severely affected, so the use of novel memory devices still faces challenges.
Fourth, traditional memory is powered at normal voltage, and near-threshold/sub-threshold voltage techniques can be used to reduce circuit power consumption. The near-threshold voltage technique adjusts the supply voltage of a circuit to near (slightly above or below) the turn-on voltage of the transistor, bringing large improvements in operating frequency, energy efficiency, and the like; the sub-threshold voltage technique adjusts the supply voltage below the transistor turn-on voltage and provides the lowest power consumption. Therefore, the near-threshold/sub-threshold voltage technique can be applied to the traditional memory devices that store neural network weights to reduce power consumption. However, near-threshold/sub-threshold voltage techniques still face uncertainty and variability: at low supply voltage the circuit is susceptible to interference, causing errors in the weights stored in traditional memory devices. If the whole traditional memory adopts the near-threshold/sub-threshold voltage technique, the recognition accuracy of the neural network is severely affected.
Disclosure of Invention
Aiming at the problems, the invention aims to provide a weight sensitivity-based binary neural network hardware compression method, and the technical scheme adopted by the invention is as follows:
a weight sensitivity-based binary neural network hardware compression method comprises the following steps:
and step S1, training by adopting a binary neural network to obtain a weight matrix and original accuracy.
In step S2, the sensitivity of any weight matrix is evaluated.
And step S3, presetting a sensitivity threshold, and dividing a sensitive set and a non-sensitive set of the weight matrix.
And step S4, evaluating the sensitivity of the non-sensitive set of the weight matrix.
Step S5, adjusting a sensitivity threshold value to obtain an optimal non-sensitive set of the weight matrix; the sensitivity of the optimal non-sensitive set is equal to a preset maximum accuracy loss value.
Step S6, storing the optimal non-sensitive set in a novel memory or a traditional memory adopting the near-threshold/sub-threshold voltage technique.
Further, in step S2, the evaluating the sensitivity of any weight matrix includes the following steps:
step S21, presetting an error probability P to evaluate the unreliability of the novel memory device and the near-threshold/sub-threshold voltage; the P is a number greater than 0 and less than 1.
And step S22, any binary neural network weight of the weight matrix generates errors in sequence according to the error probability P, and the first accuracy of the binary neural network is obtained.
Step S23, repeating step S22 for N times, and obtaining a frequency histogram with first accuracy; and N is a natural number greater than 100.
And step S24, obtaining the average value of the frequency histogram of the first accuracy, and taking the average value as the second accuracy of the binary neural network when the weight matrix has errors.
Step S25, finding the sensitivity of the weight matrix, wherein the sensitivity is the difference between the original accuracy and the second accuracy in step S24.
Further, in step S3, the step of presetting a sensitivity threshold and dividing the sensitive set and the non-sensitive set of the weight matrix includes the following steps:
and step S31, sequentially sorting the sensitivities of the weight matrixes from large to small, and presetting a sensitivity threshold.
Step S32, a weight matrix whose sensitivity is greater than the sensitivity threshold is divided into the sensitive set, and a weight matrix whose sensitivity is less than or equal to the sensitivity threshold is divided into the non-sensitive set.
Preferably, in the step S4, the evaluating the sensitivity of the non-sensitive set of weight matrices includes the following steps:
step S41, making the non-sensitive set generate errors according to the error probability P, and obtaining a third accuracy of the non-sensitive set; the P is a number greater than 0 and less than 1.
And step S42, repeating the step S41 for N times, obtaining the average value of the frequency histogram of the third accuracy, and obtaining the difference value between the original accuracy and the average value as the sensitivity of the non-sensitive set.
Further, in the step S5, adjusting the sensitivity threshold to obtain the optimal non-sensitive set of the weight matrix includes the following steps:
step S51, presetting a maximum accuracy loss value;
and step S52, adjusting a sensitivity threshold, if the sensitivity of the non-sensitive set is equal to the maximum accuracy loss value, taking the non-sensitive set as the optimal non-sensitive set, and entering step S6, otherwise, continuously adjusting the sensitivity threshold, and returning to step S3 to divide the sensitive set and the non-sensitive set of the weight matrix.
A weight sensitivity-based binary neural network hardware compression method comprises the following steps:
and step K1, training by adopting a binary neural network to obtain a weight matrix and original accuracy.
Step K2, initializing the particle swarm to obtain the initialization of the D-dimensional vector of the particle; and D is the number of weight matrixes in the neural network.
Step K3, add constraints.
In step K4, an adaptive value of any one particle in the particle group is obtained, and the sensitivity of the non-sensitive set of particles is obtained.
And step K5, updating the historical optimal value and the global optimal value of any particle.
K6, obtaining the update speed and the change probability of any particle dimension value, and generating a random number T; t is a number of 0 or more and 1 or less.
Step K7, comparing the change probability of any particle dimension with the random number T: if the random number T is less than or equal to the change probability, the particle dimension value is flipped to the other binary value; if the random number T is greater than the change probability, the particle dimension value is kept unchanged, thereby updating the particle's D-dimensional vector.
Step K8, repeating the steps K4 to K7, performing iterative operation on the particles, judging whether the iterative operation times are equal to the preset maximum iterative times, if so, outputting a global optimum value, and otherwise, returning to the step K4; and solving an optimal non-sensitive set by using the global optimal value.
Step K9, storing the optimal non-sensitive set in a novel memory or a traditional memory adopting the near-threshold/sub-threshold voltage technique.
Preferably, in the step K2, any particle in the particle group is selected, initialization of the D-dimensional vector of the particle is obtained by sensitivity analysis, and the D-dimensional vectors of the remaining particles are initialized randomly; the particle group consists of M particles, wherein M is a natural number more than 1.
Further, the step K4 of obtaining the adaptive value of any particle in the particle group and the sensitivity of the insensitive set includes the steps of:
step K41, marking the weight matrix with the position of 1 in the D-dimensional vector as non-sensitive, and marking the weight matrix with the position of 0 in the D-dimensional vector as sensitive; and the number of weight matrixes with the position of 1 in the D-dimensional vector is the adaptive value of the particle.
Step K42, collecting the weight matrices marked non-sensitive in the D-dimensional vector to obtain a non-sensitive set, corrupting the non-sensitive set according to the error probability P to obtain a fourth accuracy of the binary neural network, repeating the operation N times to obtain a frequency histogram of the fourth accuracy, and taking the average value of the frequency histogram as the fifth accuracy reached by the binary neural network when the non-sensitive set is in error; P is a number greater than 0 and less than 1, and N is a natural number greater than 100; and the difference between the original accuracy and the fifth accuracy is the sensitivity of the non-sensitive set.
Further, in the step K5, if the sensitivity of the non-sensitive set obtained in the step K42 is less than or equal to the preset maximum accuracy loss value, it is determined whether the adaptive value of any particle is greater than the historical optimum value corresponding to the particle, if the adaptive value is greater than the historical optimum value, the adaptive value is taken as the historical optimum value, otherwise, the historical optimum value is kept unchanged; and comparing the historical optimal values of the M particles, and taking the historical optimal value with the largest numerical value as a global optimal value.
Further, in the step K6, updating the update speed and the change probability of the particle includes the following steps:
step K61, calculating the update velocity v of each dimension of any particleidThe expression is as follows:
vid=w·vid+c1·rand()·(pid-xid)+c2·rand()·(pgd-xid) ①
wherein w represents an inertia factor, c1Represents an acceleration constant, c2Representing the acceleration constant, and rand () representing the generated random number, pidRepresenting the historical optimum, x, of the particleidRepresenting the current value of the particle, pgdThe optimum values of the M particles are shown.
Step K62, mapping the update velocity v_id to the change probability of the dimension value through mapping expression ②; wherein v_id represents the update velocity of each dimension.
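(Mapping expression ② itself is not reproduced in this text. In the standard binary particle swarm algorithm the velocity-to-probability mapping is the sigmoid function s(v_id) = 1/(1 + e^(-v_id)), and the change probability here is presumably of that form; this is an inference from context, not a formula recited in the record.)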
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention can obtain the weight matrix insensitive to accuracy in the neural network, which is as follows: dividing the weight matrix by setting and adjusting a sensitivity threshold in the method 1; in the method 2, the result of the sensitivity analysis method is used as the D-dimensional vector of one particle to be initialized, other particles are initialized randomly, and the division scheme of the weight matrix is obtained by continuously iterating and updating the D-dimensional vectors of M particles. The advantage of such a design is that an optimal non-sensitive set of weight matrices can be obtained in the neural network, which means that the accuracy of the network is not seriously compromised even if the optimal non-sensitive set is in error.
(2) In the method 1, the invention continuously adjusts the sensitivity threshold value by comparing with the maximum accuracy loss set by the user, so as to ensure that the optimal non-sensitive set of the weight matrix is obtained under the maximum accuracy loss condition set by the user. In addition, in the method 2, the recognition accuracy of the binary neural network is ensured by adding the constraint condition of the maximum accuracy loss set by the user.
(3) The invention has universality aiming at novel memory devices or near-threshold/sub-threshold voltage technologies under different processes and technologies. Because the reliability degree of the novel memory device or the near-threshold/sub-threshold voltage technology under different processes and technologies is not consistent, the unreliability of the novel memory device or the near-threshold/sub-threshold voltage technology is evaluated by using the error probability P, and the error probability P can be determined according to the actual processes and technologies.
(4) The invention skillfully exploits the fault tolerance of the neural network weights to reduce hardware resource overhead: the weights of the optimal non-sensitive set are stored in a novel memory device, which has a simpler structure and lower resource overhead than a traditional memory device, thereby reducing the use of traditional memory devices and saving hardware resources. Alternatively, the invention stores the weights of the optimal non-sensitive set in a traditional memory device adopting the near-threshold/sub-threshold voltage technique, whose supply voltage is lower than the normal transistor turn-on voltage, thereby reducing circuit power consumption.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered to limit the scope of protection; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of the present invention for evaluating the sensitivity of a weight matrix.
FIG. 3 is a flow chart of the inventive partitioning weight matrix.
FIG. 4 is a flow chart of the present invention for evaluating the sensitivity of a non-sensitive set.
FIG. 5 is a flowchart of the present invention for finding the optimal non-sensitive set.
FIG. 6 is a flow chart of the present invention (II).
FIG. 7 is a flow chart of the present invention (III).
Detailed Description
To further clarify the objects, technical solutions and advantages of the present application, the present invention will be further described with reference to the accompanying drawings and examples, and embodiments of the present invention include, but are not limited to, the following examples. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example 1
As shown in fig. 1 to fig. 5, this embodiment provides a method for compressing binary neural network hardware based on weight sensitivity, and it should be noted that, in this embodiment, the ordinal terms such as "first", "second", "third", and the like are only used to distinguish similar components, and the method includes the following steps:
firstly, training by adopting a binary neural network to obtain a weight matrix and original accuracy.
Secondly, evaluating the sensitivity of any weight matrix, specifically as follows:
(21) presetting an error probability P to evaluate the unreliability of the novel memory device and the near-threshold/sub-threshold voltage, wherein the P is a number greater than 0 and less than 1; that is, the probability of each weight in the weight matrix being wrong (1 → -1, -1 → 1) is P.
(22) Each weight matrix of the binary neural network is corrupted in turn according to the error probability P, and the first accuracy of the binary neural network is obtained. While any one weight matrix is in error, the other weight matrices in the binary neural network are kept unchanged, and the recognition accuracy of the network is tested on the data set.
(23) Repeating step (22) at least 100 times to obtain a frequency histogram of a first accuracy. Since whether each weight in the weight matrix has errors is a random event, the error condition of each experiment is different, the experiment is repeated at least 100 times, and a frequency histogram is obtained as probability distribution according to the experiment result.
(24) And obtaining the average value of the frequency histogram of the first accuracy, and taking the average value as the second accuracy of the binary neural network when the weight matrix is wrong.
(25) Finding the sensitivity of the weight matrix, which is the difference between the original accuracy and the second accuracy obtained in step (24). Each weight matrix in the neural network obtains its own sensitivity through this process.
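The following is a minimal sketch of this Monte Carlo evaluation for one weight matrix (illustrative only; `model.weights`, `test_accuracy`, and all other names are assumptions, not interfaces defined by the patent):

```python
import copy
import numpy as np

def matrix_sensitivity(model, layer_idx, test_accuracy, P=0.01, N=200):
    """Sensitivity of one binary weight matrix under random bit flips.

    Each weight of matrix `layer_idx` flips sign (1 -> -1, -1 -> 1)
    independently with error probability P while all other matrices
    stay unchanged. Repeating N (> 100) times yields a frequency
    histogram of accuracies; its mean is the "second accuracy", and
    the sensitivity is the drop from the original accuracy.
    """
    original_acc = test_accuracy(model)
    accs = []
    for _ in range(N):
        corrupted = copy.deepcopy(model)
        W = corrupted.weights[layer_idx]        # entries are +1 / -1
        flips = np.random.rand(*W.shape) < P    # error probability P
        W[flips] *= -1                          # 1 -> -1, -1 -> 1
        accs.append(test_accuracy(corrupted))
    second_acc = float(np.mean(accs))           # mean of the histogram
    return original_acc - second_acc
```

The same routine, applied to the whole non-sensitive set instead of a single matrix, gives the set sensitivity used in the fourth step.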
Thirdly, presetting a sensitivity threshold, and dividing a sensitive set and a non-sensitive set of the weight matrix, specifically:
(31) Sorting the sensitivities of the weight matrices from large to small, and presetting a sensitivity threshold.
(32) And dividing the weight matrix into a sensitive set when the sensitivity of the weight matrix is greater than a sensitivity threshold, and dividing the weight matrix into a non-sensitive set when the sensitivity of the weight matrix is less than or equal to the sensitivity threshold.
Fourthly, evaluating the sensitivity of the non-sensitive set of the weight matrix, comprising the following steps:
(41) and the non-sensitive set is subjected to error by an error probability P, and a third accuracy of the non-sensitive set is obtained.
(42) Repeating the step (41) at least 100 times, obtaining the average value of the frequency histogram of the third accuracy, and obtaining the difference value of the original accuracy and the average value as the sensitivity of the non-sensitive set.
And fifthly, adjusting the sensitivity threshold to obtain the optimal non-sensitive set of the weight matrix. Wherein the sensitivity of the optimal non-sensitive set is equal to a preset maximum accuracy loss value. Specifically, the method comprises the following steps:
(51) presetting a maximum accuracy loss value;
(52) and adjusting a sensitivity threshold, if the sensitivity of the non-sensitive set is equal to the maximum accuracy loss value, taking the non-sensitive set as the optimal non-sensitive set, and entering the sixth step, otherwise, continuously adjusting the sensitivity threshold, and returning to the third step to divide the sensitive set and the non-sensitive set of the weight matrix. Here, the sensitivity threshold is specifically adjusted as follows: and if the sensitivity of the non-sensitive set is less than the maximum accuracy loss value, increasing the sensitivity threshold, and if the sensitivity of the non-sensitive set is greater than the maximum accuracy loss value, decreasing the sensitivity threshold until the sensitivity of the non-sensitive set is equal to the maximum accuracy loss value.
Sixthly, storing the optimal non-sensitive set into a novel memory or a traditional memory adopting a near threshold/sub-threshold voltage technology.
Example 2
As shown in fig. 6, this embodiment provides a weight sensitivity-based binary neural network hardware compression method, which combines sensitivity analysis with binary particle swarm optimization to search the binary neural network for a combination of weight matrices with low sensitivity to recognition accuracy. In the binary particle swarm algorithm, a community of M particles searches a D-dimensional target space for the optimal value: particle positions are updated according to a velocity update formula, the quality of each solution is evaluated by a fitness function, and the optimum is found through iterative updating. It should be noted that, in this embodiment, the terms "fourth", "fifth", and the like are only used to distinguish similar components.
a weight sensitivity-based binary neural network hardware compression method is characterized by comprising the following steps:
firstly, training by adopting a binary neural network to obtain a weight matrix and original accuracy.
Second, initializing the particle swarm to obtain the initial D-dimensional vector of each particle, wherein D is the number of weight matrices in the neural network; that is, each dimension corresponds to one weight matrix, a "1" means the corresponding weight matrix is non-sensitive, and a "0" means it is sensitive. The specific operation is as follows: assume the swarm contains M particles, each represented by a D-dimensional vector whose every dimension is a binary value (1 or 0). One particle is selected and its D-dimensional vector is initialized from the sensitivity analysis; the D-dimensional vectors of the other (M-1) particles are initialized randomly.
And thirdly, adding a constraint condition: the preset maximum accuracy loss value serves as the constraint of the algorithm, i.e., the number of non-sensitive weight matrices is maximized subject to the search result satisfying this accuracy loss value.
And a fourth step of obtaining an adaptive value of any particle in the particle group, and obtaining the sensitivity of the non-sensitive set of the particles. Specifically, the method comprises the following steps:
(41) and marking the weight matrix with the position of 1 in the D-dimensional vector as non-sensitive, and marking the weight matrix with the position of 0 in the D-dimensional vector as sensitive. And the number of weight matrixes with the position of 1 in the D-dimensional vector is the adaptive value of the particle.
(42) Collecting the weight matrices marked non-sensitive in the D-dimensional vector to obtain a non-sensitive set, corrupting the non-sensitive set according to the error probability P to obtain a fourth accuracy of the binary neural network, repeating at least 100 times to obtain a frequency histogram of the fourth accuracy, and taking the average value of the frequency histogram as the fifth accuracy reached by the binary neural network when the non-sensitive set is in error. The difference between the original accuracy and the fifth accuracy is the sensitivity of the non-sensitive set.
And fifthly, updating the historical optimal value and the global optimal value of any particle. And (4) if the sensitivity of the non-sensitive set obtained in the step (42) is less than or equal to a preset maximum accuracy loss value, judging whether the adaptive value of any particle is greater than the historical optimal value corresponding to the particle, if so, taking the adaptive value as the historical optimal value, otherwise, keeping the historical optimal value unchanged. And comparing the historical optimal values of the M particles, and taking the historical optimal value with the largest numerical value as a global optimal value.
Sixthly, solving the updating speed and the change probability of any particle dimension value and generating a random number T; t is a number of 0 or more and 1 or less. Specifically, the method comprises the following steps:
(61) Calculating the update velocity v_id of each dimension of any particle; the expression is as follows:

v_id = w·v_id + c1·rand()·(p_id - x_id) + c2·rand()·(p_gd - x_id)  ①

wherein w represents an inertia factor, c1 and c2 represent acceleration constants, rand() represents a generated random number, p_id represents the historical optimum of the particle, x_id represents the current value of the particle, and p_gd represents the global optimum of the M particles.
(62) Mapping the update velocity v_id to the change probability of the dimension value through mapping expression ②; wherein v_id represents the update velocity of each dimension.
Seventhly, comparing the change probability of any particle dimension with the random number T: if the random number T is less than or equal to the change probability, the particle dimension value is flipped, i.e., a particle dimension value of 1 becomes 0 and a value of 0 becomes 1; if the random number T is greater than the change probability, the particle dimension value is kept unchanged, thereby updating the particle's D-dimensional vector.
Eighthly, repeating the fourth step to the seventh step, performing iterative operation on the particles, judging whether the iterative operation times are equal to a preset maximum iterative time, if so, outputting a global optimum value, and otherwise, returning to the fourth step; and solving an optimal non-sensitive set by using the global optimal value.
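The second through eighth steps can be summarized in the following sketch (illustrative only, not the patent's implementation; the sigmoid form of mapping expression ② is an assumption, as noted in the Disclosure section, and `set_sensitivity` is the hypothetical error-injection helper from Example 1):

```python
import numpy as np

def bpso_search(D, M, set_sensitivity, max_loss, seed_vector=None,
                iters=100, w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Binary PSO over 0/1 vectors of length D (1 = matrix non-sensitive).

    The adaptive value of a particle is the number of 1s in its vector,
    counted only when the joint sensitivity of the induced non-sensitive
    set stays within the allowed maximum accuracy loss.
    """
    x = np.random.randint(0, 2, size=(M, D))       # particle positions
    if seed_vector is not None:
        x[0] = seed_vector                         # seeded from sensitivity analysis
    v = np.zeros((M, D))                           # per-dimension velocities
    p_best = x.copy()                              # historical best positions
    p_fit = np.full(M, -1)                         # historical best fitness
    g_best, g_fit = x[0].copy(), -1                # global best

    for _ in range(iters):
        for i in range(M):
            nonsensitive = np.flatnonzero(x[i] == 1)
            if set_sensitivity(nonsensitive) <= max_loss:  # constraint
                fit = int(x[i].sum())              # adaptive value
                if fit > p_fit[i]:
                    p_fit[i], p_best[i] = fit, x[i].copy()
                if fit > g_fit:
                    g_fit, g_best = fit, x[i].copy()
        for i in range(M):
            r1, r2 = np.random.rand(D), np.random.rand(D)
            v[i] = (w * v[i] + c1 * r1 * (p_best[i] - x[i])
                    + c2 * r2 * (g_best - x[i]))   # expression (1)
            v[i] = np.clip(v[i], -v_max, v_max)
            prob = 1.0 / (1.0 + np.exp(-v[i]))     # assumed sigmoid mapping (2)
            T = np.random.rand(D)                  # random number T per dimension
            flip = T <= prob                       # seventh step: flip test
            x[i][flip] = 1 - x[i][flip]            # 1 <-> 0
    return g_best, g_fit   # g_best decodes the optimal non-sensitive set
```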
Ninth, storing the optimal non-sensitive set into a novel memory or a traditional memory adopting the near-threshold/sub-threshold voltage technology.
Example 3
As shown in fig. 7, this embodiment provides a weight sensitivity-based binary neural network hardware compression method, where the ordinal numbers such as "first", "second", "third", and the like in this embodiment are used only to distinguish similar components, and specifically, the method includes the following steps:
firstly, training by adopting a binary neural network to obtain a weight matrix and original accuracy.
Secondly, evaluating the sensitivity of any weight matrix, specifically as follows:
(21) presetting an error probability P to evaluate the unreliability of the novel memory device and the near-threshold/sub-threshold voltage, wherein the P is a number greater than 0 and less than 1; that is, the probability of each weight in the weight matrix being wrong (1 → -1, -1 → 1) is P.
(22) Each weight matrix of the binary neural network is corrupted in turn according to the error probability P, and the first accuracy of the binary neural network is obtained. While any one weight matrix is in error, the other weight matrices in the binary neural network are kept unchanged, and the recognition accuracy of the network is tested on the data set.
(23) Repeating step (22) at least 100 times to obtain a frequency histogram of a first accuracy. Since whether each weight in the weight matrix has errors is a random event, the error condition of each experiment is different, the experiment is repeated at least 100 times, and a frequency histogram is obtained as probability distribution according to the experiment result.
(24) And obtaining the average value of the frequency histogram of the first accuracy, and taking the average value as the second accuracy of the binary neural network when the weight matrix is wrong.
(25) Finding the sensitivity of the weight matrix, which is the difference between the original accuracy and the second accuracy obtained in step (24). Each weight matrix in the neural network obtains its own sensitivity through this process.
Thirdly, presetting a sensitivity threshold, and dividing a sensitive set and a non-sensitive set of the weight matrix, specifically:
(31) Sorting the sensitivities of the weight matrices from large to small, and presetting a sensitivity threshold.
(32) And dividing the weight matrix into a sensitive set when the sensitivity of the weight matrix is greater than a sensitivity threshold, and dividing the weight matrix into a non-sensitive set when the sensitivity of the weight matrix is less than or equal to the sensitivity threshold.
And fourthly, evaluating the sensitivity of the non-sensitive set of the weight matrix, and comprising the following steps of:
(41) and the non-sensitive set is subjected to error by an error probability P, and a third accuracy of the non-sensitive set is obtained.
(42) Repeating the step (41) at least 100 times, obtaining the average value of the frequency histogram of the third accuracy, and obtaining the difference value of the original accuracy and the average value as the sensitivity of the non-sensitive set.
And fifthly, adjusting the sensitivity threshold to obtain a suboptimal non-sensitive set of the weight matrix. The sensitivity of the suboptimal non-sensitive set is equal to a preset maximum accuracy loss value. The method comprises the following specific steps:
(51) a maximum accuracy loss value is preset.
(52) And adjusting a sensitivity threshold, if the sensitivity of the non-sensitive set is equal to the maximum accuracy loss value, taking the non-sensitive set as a suboptimal non-sensitive set, and entering the sixth step, otherwise, continuously adjusting the sensitivity threshold, and returning to the third step to divide the sensitive set and the non-sensitive set of the weight matrix. Continuing to adjust the sensitivity threshold is as follows: and if the sensitivity of the non-sensitive set is less than the maximum accuracy loss value, increasing the sensitivity threshold, and if the sensitivity of the non-sensitive set is greater than the maximum accuracy loss value, decreasing the sensitivity threshold until the sensitivity of the non-sensitive set is equal to the maximum accuracy loss value.
And sixthly, initializing the particle swarm to obtain the initialization of the D-dimensional vector of the particle, wherein D is the number of the weight matrixes in the neural network, namely, each dimension corresponds to one weight matrix, 1 represents that the corresponding weight matrix is insensitive, and 0 represents that the corresponding weight matrix is sensitive. The specific operation is as follows: assuming that M particles exist in the particle swarm, selecting any one particle, initializing the D-dimensional vector of the particle by adopting a suboptimal non-sensitive set, and randomly initializing the D-dimensional vectors of other (M-1) particles.
And step seven, adding a constraint condition: the preset maximum accuracy loss value serves as the constraint of the algorithm, i.e., the number of non-sensitive weight matrices is maximized subject to the search result satisfying this accuracy loss value.
And step eight, obtaining an adaptive value of any particle in the particle swarm and obtaining the sensitivity of a non-sensitive set of the particle, wherein the specific steps are as follows:
(81) and marking the weight matrix with the position of 1 in the D-dimensional vector as non-sensitive, and marking the weight matrix with the position of 0 in the D-dimensional vector as sensitive. And the number of weight matrixes with the position of 1 in the D-dimensional vector is the adaptive value of the particle.
(82) Collecting the weight matrices marked non-sensitive in the D-dimensional vector to obtain a non-sensitive set, corrupting the non-sensitive set according to the error probability P to obtain a fourth accuracy of the binary neural network, repeating at least 100 times to obtain a frequency histogram of the fourth accuracy, and taking the average value of the frequency histogram as the fifth accuracy reached by the binary neural network when the non-sensitive set is in error. The difference between the original accuracy and the fifth accuracy is the sensitivity of the non-sensitive set.
And step nine, updating the historical optimal value and the global optimal value of any particle. If the sensitivity of the non-sensitive set obtained in the step (82) is less than or equal to a preset maximum accuracy loss value, judging whether the adaptive value of any particle is greater than the historical optimal value corresponding to the particle, if so, taking the adaptive value as the historical optimal value, otherwise, keeping the historical optimal value unchanged; and comparing the historical optimal values of the M particles, and taking the historical optimal value with the largest numerical value as a global optimal value.
And step ten, solving the updating speed and the change probability of any particle dimension value, and generating a random number T, wherein the T is a number which is more than or equal to 0 and less than or equal to 1. Specifically, the method comprises the following steps:
(101) Calculating the update velocity v_id of each dimension of any particle; the expression is as follows:

v_id = w·v_id + c1·rand()·(p_id - x_id) + c2·rand()·(p_gd - x_id)  ①

wherein w represents an inertia factor, c1 and c2 represent acceleration constants, rand() represents a generated random number, p_id represents the historical optimum of the particle, x_id represents the current value of the particle, and p_gd represents the global optimum of the M particles.
(102) Mapping the update velocity v_id to the change probability of the dimension value through mapping expression ②; wherein v_id represents the update velocity of each dimension.
The eleventh step, comparing the change probability with the random number T: if the random number T is less than or equal to the change probability, the particle dimension value is flipped, i.e., a particle dimension value of 1 becomes 0 and a value of 0 becomes 1; if the random number T is greater than the change probability, the particle dimension value is kept unchanged, thereby updating the particle's D-dimensional vector.
A twelfth step of repeating the eighth step to the eleventh step, performing iterative operation on the particles, and judging whether the iterative operation times are equal to a preset maximum iterative time, if so, outputting a global optimum value, otherwise, returning to the eighth step to continue the iterative operation on the particles; and obtaining the optimal non-sensitive set by using the global optimal value.
Thirteenth, storing the optimal non-sensitive set into a novel memory or a traditional memory adopting the near-threshold/sub-threshold voltage technology.
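In terms of the earlier sketches, this embodiment first runs the hypothetical `find_optimal_nonsensitive_set` search of Example 1 and then passes the resulting 0/1 membership vector as `seed_vector` to the hypothetical `bpso_search` of Example 2, so that one particle starts from the suboptimal non-sensitive set while the remaining (M-1) particles start at random.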
The above-mentioned embodiments are only preferred embodiments of the present invention and do not limit its scope of protection; any modification made according to the principles of the present invention, and any non-inventive effort based on the above embodiments, shall fall within the protection scope of the present invention.
Claims (8)
1. A weight sensitivity-based binary neural network hardware compression method is characterized by comprising the following steps:
step S1, training by using a binary neural network to obtain a weight matrix and original accuracy;
step S2, evaluating the sensitivity of any weight matrix;
step S3, presetting a sensitivity threshold, and dividing a sensitive set and a non-sensitive set of a weight matrix;
step S4, evaluating the sensitivity of the non-sensitive set of the weight matrix;
step S5, adjusting a sensitivity threshold value to obtain an optimal non-sensitive set of the weight matrix; the sensitivity of the optimal non-sensitive set is equal to a preset maximum accuracy loss value;
step S6, storing the optimal non-sensitive set in a novel memory or a traditional memory adopting a near threshold/sub-threshold voltage technology;
in step S2, the method for evaluating the sensitivity of any weight matrix includes the following steps:
step S21, presetting an error probability P to evaluate the unreliability of the novel memory device and the near-threshold/sub-threshold voltage; p is a number greater than 0 and less than 1;
step S22, any binary neural network weight of the weight matrix generates errors in sequence according to the error probability P, and first accuracy of the binary neural network is obtained;
step S23, repeating step S22 for N times, and obtaining a frequency histogram with first accuracy; n is a natural number more than 100;
step S24, obtaining the average value of the frequency histogram of the first accuracy, and taking the average value as the second accuracy reached by the binary neural network when the weight matrix is wrong;
step S25, finding the sensitivity of the weight matrix, wherein the sensitivity is the difference between the original accuracy and the second accuracy in step S24.
2. The weight sensitivity-based binary neural network hardware compression method according to claim 1, wherein in the step S3, a sensitivity threshold is preset, and a sensitive set and a non-sensitive set of the weight matrix are divided, including the following steps:
step S31, the sensitivities of the weight matrices are sorted from large to small, and a sensitivity threshold is preset;
step S32, a weight matrix whose sensitivity is greater than the sensitivity threshold is divided into the sensitive set, and a weight matrix whose sensitivity is less than or equal to the sensitivity threshold is divided into the non-sensitive set.
3. The weight sensitivity-based binary neural network hardware compression method of claim 1, wherein in the step S4, evaluating the sensitivity of the non-sensitive set of the weight matrix comprises the following steps:
step S41, making the non-sensitive set generate errors according to the error probability P, and obtaining a third accuracy of the non-sensitive set; p is a number greater than 0 and less than 1;
and step S42, repeating the step S41 for N times, obtaining the average value of the frequency histogram of the third accuracy, and obtaining the difference value between the original accuracy and the average value as the sensitivity of the non-sensitive set.
4. The weight sensitivity-based binary neural network hardware compression method of claim 1, wherein in the step S5, adjusting the sensitivity threshold to obtain the optimal non-sensitive set of weight matrices comprises the following steps:
step S51, presetting a maximum accuracy loss value;
and step S52, adjusting a sensitivity threshold, if the sensitivity of the non-sensitive set is equal to the maximum accuracy loss value, taking the non-sensitive set as the optimal non-sensitive set, and entering step S6, otherwise, continuously adjusting the sensitivity threshold, and returning to step S3 to divide the sensitive set and the non-sensitive set of the weight matrix.
5. A weight sensitivity-based binary neural network hardware compression method is characterized by comprising the following steps:
k1, training by adopting a binary neural network to obtain a weight matrix and original accuracy;
step K2, initializing the particle swarm to obtain the initialization of the D-dimensional vector of the particle; d is the number of weight matrixes in the neural network;
step K3, adding constraint conditions;
step K4, obtaining the adaptive value of any particle in the particle group, and obtaining the sensitivity of the non-sensitive set of the particles;
step K5, updating the historical optimal value and the global optimal value of any particle;
k6, obtaining the update speed and the change probability of any particle dimension value, and generating a random number T; t is a number of 0 or more and 1 or less;
step K7, comparing the change probability of any particle dimension with the random number T: if the random number T is less than or equal to the change probability, the particle dimension value is flipped; if the random number T is greater than the change probability, the particle dimension value is kept unchanged, so as to update the particle's D-dimensional vector;
step K8, repeating the steps K4 to K7, performing iterative operation on the particles, judging whether the iterative operation times are equal to the preset maximum iterative times, if so, outputting a global optimum value, and otherwise, returning to the step K4; obtaining an optimal non-sensitive set by using the global optimal value;
step K9, storing the optimal non-sensitive set in a novel memory or a traditional memory adopting a near threshold/sub-threshold voltage technology;
in the step K4, the method for determining the adaptive value of any particle in the particle group and the sensitivity of the insensitive set includes the steps of:
step K41, marking the weight matrix with the position of 1 in the D-dimensional vector as non-sensitive, and marking the weight matrix with the position of 0 in the D-dimensional vector as sensitive; the number of weight matrix with the position of 1 in the D-dimensional vector is the adaptive value of the particle;
step K42, collecting the weight matrices marked non-sensitive in the D-dimensional vector to obtain a non-sensitive set, corrupting the non-sensitive set according to the error probability P to obtain a fourth accuracy of the binary neural network, repeating the operation N times to obtain a frequency histogram of the fourth accuracy, and taking the average value of the frequency histogram as the fifth accuracy reached by the binary neural network when the non-sensitive set is in error; P is a number greater than 0 and less than 1, and N is a natural number greater than 100; and the difference between the original accuracy and the fifth accuracy is the sensitivity of the non-sensitive set.
6. The weight sensitivity-based binary neural network hardware compression method of claim 5, wherein in step K2, any particle in the particle swarm is selected, the D-dimensional vector of the particle is initialized by sensitivity analysis, and the D-dimensional vectors of the remaining particles are initialized randomly; the particle group consists of M particles, wherein M is a natural number more than 1.
7. The weight sensitivity-based binary neural network hardware compression method of claim 5, wherein in step K5, if the sensitivity of the non-sensitive set obtained in step K42 is less than or equal to a preset maximum accuracy loss value, it is determined whether the adaptation value of any particle is greater than the historical optimum value corresponding to the particle, if the adaptation value is greater than the historical optimum value, the adaptation value is taken as the historical optimum value, otherwise, the historical optimum value is kept unchanged; and comparing the historical optimal values of the M particles, and taking the historical optimal value with the largest numerical value as a global optimal value.
8. The weight sensitivity-based binary neural network hardware compression method according to claim 5, wherein in the step K6, updating the update speed and the change probability of the particles comprises the following steps:
step K61, calculating the update velocity v_id of each dimension of any particle; the expression is as follows:

v_id = w·v_id + c1·rand()·(p_id - x_id) + c2·rand()·(p_gd - x_id)  ①

wherein w represents an inertia factor, c1 and c2 represent acceleration constants, rand() represents a generated random number, p_id represents the historical optimum of the particle, x_id represents the current value of the particle, and p_gd represents the global optimum of the M particles;
step K62, mapping the update velocity v_id to the change probability of the dimension value through mapping expression ②; wherein v_id represents the update velocity of each dimension.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811000016.7A CN109212960B (en) | 2018-08-30 | 2018-08-30 | Weight sensitivity-based binary neural network hardware compression method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811000016.7A CN109212960B (en) | 2018-08-30 | 2018-08-30 | Weight sensitivity-based binary neural network hardware compression method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109212960A CN109212960A (en) | 2019-01-15 |
CN109212960B (en) | 2020-08-14 |
Family
ID=64986164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811000016.7A Expired - Fee Related CN109212960B (en) | 2018-08-30 | 2018-08-30 | Weight sensitivity-based binary neural network hardware compression method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109212960B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978160B (en) * | 2019-03-25 | 2021-03-02 | Cambricon Technologies Corporation Limited | Configuration device and method of artificial intelligence processor and related products |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7372713B2 (en) * | 2006-04-17 | 2008-05-13 | Texas Instruments Incorporated | Match sensing circuit for a content addressable memory device |
CN107729999A (en) * | 2016-08-12 | 2018-02-23 | Beijing Deephi Technology Co., Ltd. | Deep neural network compression method considering matrix correlation |
CN107967515A (en) * | 2016-10-19 | 2018-04-27 | Samsung Electronics Co., Ltd. | Method and apparatus for neural network quantization |
CN108322221A (en) * | 2017-01-18 | 2018-07-24 | South China University of Technology | Method for deep convolutional neural network model compression |
CN108334945A (en) * | 2018-01-30 | 2018-07-27 | Institute of Automation, Chinese Academy of Sciences | Acceleration and compression method and device for deep neural networks |
Non-Patent Citations (3)
Title |
---|
Build a compact binary neural network through bit-level sensitivity and data pruning; Yixing Li et al.; Neurocomputing; 2020-02-11; full text *
Survey on deep network model compression; Lei Jie et al.; Journal of Software; 2017-12-04; full text *
Survey on neural network model compression methods; Cao Wenlong et al.; Application Research of Computers; 2018-04-17; full text *
Also Published As
Publication number | Publication date |
---|---|
CN109212960A (en) | 2019-01-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200814 |