CN110751274A - Neural network compression method and system based on random projection hash

Neural network compression method and system based on random projection hash

Info

Publication number
CN110751274A
Authority
CN
China
Prior art keywords
weight matrix
matrix
neural network
feature map
input
Prior art date
Legal status
Pending
Application number
CN201910892214.7A
Other languages
Chinese (zh)
Inventor
沈明珠
徐毅
刘祥龙
Current Assignee
Beihang University (Beijing University of Aeronautics and Astronautics)
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date: 2019-09-20
Filing date: 2019-09-20
Publication date: 2020-02-04
Application filed by Beijing University of Aeronautics and Astronautics
Priority to CN201910892214.7A
Publication of CN110751274A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons


Abstract

The invention discloses a neural network compression method and system based on random projection hash. The method comprises the following steps: in forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through a projection matrix, and the output feature map is calculated; in backward propagation, the loss function of the neural network is calculated according to the output feature map, and the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; the weight matrix is then updated according to the gradient value of the weight matrix of each layer. The method can customize the compression factor according to the requirements on accuracy and compression ratio, and therefore offers high flexibility.

Description

Neural network compression method and system based on random projection hash
Technical Field
The invention relates to a neural network compression method based on random projection hash, and also relates to a neural network compression system for realizing the method.
Background
In recent years, deep neural networks have shown great potential in many fields, including computer vision and speech recognition. Benefiting from the rapid development of big data technology and GPU parallel computing, which provide strong hardware support for learning complex networks, neural network models and deep learning methods are increasingly applied in computer vision. Their accuracy and performance in object recognition, image classification, image retrieval, face verification, video understanding and the like have improved remarkably and now exceed those of other methods. In computer vision, the convolutional neural network, a deep neural network that excels at simulating the abstraction and iteration of the human brain, can accurately extract information from big data and therefore reaches the state of the art in many applications. At the same time, interesting advances are emerging in virtual reality, augmented reality and intelligent wearable devices in the field of computer vision. Consequently, deploying high-performance recognition systems on intelligent portable devices has become a pressing need.
However, current recognition systems based on convolutional neural networks require a large amount of memory and high computing power, and are typically run on expensive GPU clusters. Although a neural network can be trained on a GPU cluster, the testing process must run on the mobile device itself if it is to be done in real time. As data sets and the number of features grow, the scale of the model, the storage of its parameters and the amount of computation increase accordingly, demanding high computing power; as a result, deep convolutional neural networks can hardly be used on mobile or embedded devices, which hinders their adoption.
It is well known that mobile devices, embedded devices and ordinary portable computers are limited in memory, computing power and power consumption. Most mobile devices have only about 1 GB of random access memory (RAM), whereas the parameters of the convolutional neural network that won the ImageNet competition in 2014 already reach 576 MB, occupying a large amount of RAM. Loading more than 500 MB of parameters to test a single image is unacceptable in terms of power consumption, not to mention computation time; these drawbacks of high memory and high computing-power requirements are exposed on mobile devices. Such models therefore greatly exceed the memory, power and computing capacity that mobile phones, embedded devices and intelligent wearable devices can afford.
However, more and more deep learning applications target mobile and embedded devices; for example, smart phones and robots perform image classification, and self-driving cars need to perform object recognition in real time. How to compress a neural network so as to reduce its computation and storage is therefore an urgent need.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a neural network compression method based on random projection hash.
Another technical problem to be solved by the present invention is to provide a neural network compression system based on random projection hash.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided a neural network compression method based on random projection hash, including the following steps:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
Preferably, in the forward propagation, the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix, and the output feature map is calculated by the following step:
multiplying the compressed input feature map and the compressed weight matrix to obtain the output feature map.
Preferably, when the input feature map and the weight matrix of each neural network layer are compressed by the projection matrix, the input feature map and the weight matrix are compressed by the same projection matrix.
Preferably, the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix through the following steps:
projecting the input feature map $S_k$ and the weight matrix $W_k$ with a real-valued projection matrix $P$, converting them into a low-dimensional input feature map $\tilde{S}_k$ and a low-dimensional weight matrix $\tilde{W}_k$;
hash-encoding the low-dimensional input feature map $\tilde{S}_k$ and the low-dimensional weight matrix $\tilde{W}_k$ into binary codes $\hat{S}_k$ and $\hat{W}_k$, completing the compression of the input feature map and the weight matrix;
wherein k is the layer index of the neural network and b is the compression factor.
Preferably, in the forward propagation, the input feature map and the weight matrix of the convolutional layer are compressed through a projection matrix and the output feature map is calculated by the following steps:
reorganizing the input feature map and the weight matrix into a larger matrix respectively;
compressing the reorganized input feature map and weight matrix through the projection matrix;
multiplying the compressed input feature map and weight matrix to obtain an output feature map;
and carrying out a col2im operation on the output feature map to obtain the final output feature map.
Preferably, in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function, wherein the non-differentiable function $y = \mathrm{sgn}(x)$ used in the forward process is approximated by $y = x$; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $x$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$.
Preferably, in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function, wherein the function $y = \mathrm{sgn}(x)$ used in the forward process is approximated by $y = \mathrm{Htanh}(x)$; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $\mathrm{Htanh}(x)$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are again obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$, with the gradient of the Hard tanh function expressed by the indicator $1_{|x| \le 1}$.
Preferably, when the weight matrix is updated according to the gradient value of the weight matrix of each layer, the matrix $W_k - \eta \cdot \partial L / \partial W_k$ is truncated by a clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and the new matrix is assigned to the weight matrix $W_k$ of the k-th layer;
wherein $\eta$ is the learning rate, $\partial L / \partial W_k$ is the gradient value of the weight matrix, and $W_k$ is the weight matrix.
Preferably, when a bias matrix is present when data are input into the neural network, the method further comprises the following step:
calculating the gradient value of the bias matrix of each layer according to the loss function, and updating the bias matrix according to the gradient value of the bias matrix of each layer.
According to a second aspect of the embodiments of the present invention, there is provided a neural network compression system based on random projection hash, including a processor and a memory; the memory having stored thereon a computer program executable on the processor, the computer program when executed by the processor implementing the steps of:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
In the neural network compression method based on random projection hash provided by the invention, the parameters are quantized and approximated through a random projection hash algorithm. The method can be faster than methods of the same type and uses less memory in the neural network; the compression factor can be customized according to the requirements on accuracy and compression ratio, giving high flexibility and greatly reducing the storage required for the parameters.
Drawings
FIG. 1 is a flow chart of a neural network compression method based on random projection hashing according to the present invention;
FIG. 2 is a schematic diagram of im2col operation in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Hard tanh function in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a neural network compression system based on random projection hash provided in the present invention.
Detailed Description
The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.
At present, many papers have studied the compression and acceleration of neural networks, in particular implementations based on binarization. In existing work, however, the weight matrix and the input matrix are directly converted into binary values during compression, so the compression ratio is fixed: a 4-byte real number becomes a 1-bit number, giving a compression ratio of 32, and the acceleration is obvious because the binary values can be processed with bit operations. Since the compression ratio is fixed at 32, this is very restrictive. For example, on a small network a 32-fold compression may leave too few parameters and lose too much accuracy, while on a large network a 32-fold compression may be too small and the model remains large after compression.
Therefore, the neural network compression method based on random projection hash provided by the invention improves on this by combining a projection-based compression method: a hyper-parameter compression factor is introduced, the input matrix is compressed along one dimension and the weight matrix is compressed correspondingly along the same dimension, and finally a binarized input matrix and a binarized weight matrix are obtained. The achievable compression ratio can therefore exceed the fixed ratio used by existing methods, and the compression factor can be customized according to the specific size of the network, giving greater flexibility. Moreover, because the compression factor can be larger, the computation and storage required when testing on a mobile device are smaller and the energy consumption is lower, making the method better suited to running on low-performance devices.
As shown in fig. 1, the neural network compression method based on random projection hash provided by the present invention includes the following steps: first, in forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through a projection matrix, and the output feature map is calculated; then, in backward propagation, the loss function of the neural network is calculated according to the output feature map, and the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; finally, the weight matrix is updated according to the gradient value of the weight matrix of each layer. This process is described in detail below.
S1, in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating the output feature map.
Before describing the neural network compression method based on random projection hash provided by the invention, the hash algorithm is briefly introduced. The main idea of a hash algorithm is to map values of arbitrary length to values of fixed length, called hash values, by means of a designed hash function; it compresses messages or data of different sizes into a fixed data format, so that the amount of data is greatly reduced. Random projection hashing is used for retrieval and storage in the image field and is widely used for data encryption in cryptography.
The three major elements of a hash algorithm are the input space, the hash function and the output space, i.e., the space containing the hash values, so the most important element is the hash function. Generally, a hash algorithm uses several hash functions working together to turn the original data into hash codes while remaining consistent with the input data: if two inputs are similar in the input space, their hash codes are similar in the output space, and the same holds in reverse. The locality sensitive hashing (LSH) algorithm is a widely used hashing algorithm, roughly defined as follows:
given a set of hash functions H ═ H: d → U }, for each function H e H in H, for any two vectors p, q e D, if the following condition is satisfied:
if d (p, q) is less than or equal to d1Then Pr [ h (q) ═ h (p)]≥P1
If d (p, q) ≧ d2Then Pr [ h (q) ═ h (p)]≤P2
Wherein d (p, q) is the distance between p and q, Pr [ h (q) ═ h (p)]Denotes the probability that h (q) and h (p) are equal, d1,d2,P1,P2Is a threshold value, typically d1<d2,P1>P2Then we call hash function cluster H location sensitive, i.e. (d)1,d2,P1,P2)-sensitive。
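As an illustration of this locality-sensitive property (a minimal sketch, not the construction used later in the patent), a random-projection hash takes the sign of a few random projections, so that nearby vectors tend to receive similar binary codes; the helper name random_projection_hash and the Gaussian projection are assumptions made for the example.

```python
import numpy as np

def random_projection_hash(x, P):
    """Hash a real vector x to a binary code: the sign of each random projection."""
    return (P @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(42)
P = rng.standard_normal((16, 128))             # 16 random hyperplanes in R^128

x = rng.standard_normal(128)
x_near = x + 0.05 * rng.standard_normal(128)   # small perturbation of x
x_far = rng.standard_normal(128)               # unrelated vector

h = random_projection_hash(x, P)
# Hamming distance of the codes: small for the nearby vector, larger for the far one
print(np.sum(h != random_projection_hash(x_near, P)))
print(np.sum(h != random_projection_hash(x_far, P)))
```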
In the image domain, hash algorithms are widely used in image retrieval, image storage, video retrieval, video storage and the like. In traditional image retrieval, the features extracted directly from images are high-dimensional, high-precision data; using them directly as the retrieval basis requires extremely complex computation, and an image database usually contains millions of images, so retrieval is time-consuming and can hardly meet real-time requirements. Retrieval based on hashing instead computes similarity directly on the hash codes transformed from the image features, which significantly reduces computation time and storage space.
In the embodiment provided by the invention, the random projection hash algorithm is applied to the compression of the neural network. In forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through a projection matrix, and the output feature map is calculated; this specifically comprises the following steps.
S11, compressing the input feature map and the weight matrix of each neural network layer through the projection matrix.
When the input feature map and the weight matrix of each neural network layer are compressed through the projection matrix, they are compressed through the same projection matrix; this specifically comprises the following steps.
S111, projecting the input feature map $S_k$ and the weight matrix $W_k$ with a real-valued projection matrix $P$, converting them into a low-dimensional input feature map $\tilde{S}_k$ and a low-dimensional weight matrix $\tilde{W}_k$.
S112, hash-encoding the low-dimensional input feature map $\tilde{S}_k$ and the low-dimensional weight matrix $\tilde{W}_k$ into binary codes $\hat{S}_k$ and $\hat{W}_k$, completing the compression of the input feature map and the weight matrix. Here k is the layer index of the neural network and b is the compression factor.
The hash encoding acts on the projected values; the hash (projection) matrix $P$ may be data-independent, for example a random projection.
S12, multiplying the compressed input feature map $\hat{S}_k$ and the compressed weight matrix $\hat{W}_k$ to obtain the output feature map.
In the embodiment provided by the invention, during forward propagation the input feature map and the weight matrix of each neural network layer are compressed through the same projection matrix, and the matrix obtained by multiplying the compressed matrices is used as an approximation of the original output feature map. That is, whereas the original output feature map is obtained by multiplying the real-valued input feature map $S_k$ and the real-valued weight matrix $W_k$, it is now approximated by multiplying the binarized input feature map $\hat{S}_k$ and the binarized weight matrix $\hat{W}_k$.
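To make this forward step concrete, the following is a minimal NumPy sketch of one compressed fully-connected layer. The layout (inputs stored as a $C_s \times m$ matrix, weights as $C_s \times C_t$, projection $P$ of size $b \times C_s$), the sign-based hash encoding and the function names compress_with_projection and forward_compressed are assumptions made for illustration, not the patent's exact formulation.

```python
import numpy as np

def compress_with_projection(S, W, P):
    """Project S (C_s x m) and W (C_s x C_t) with the same real-valued
    projection matrix P (b x C_s), then hash-encode by taking the sign."""
    S_low = P @ S            # low-dimensional input feature map, b x m
    W_low = P @ W            # low-dimensional weight matrix,     b x C_t
    S_hat = np.sign(S_low)   # binary code of the input feature map
    W_hat = np.sign(W_low)   # binary code of the weight matrix
    return S_hat, W_hat

def forward_compressed(S, W, P):
    """Approximate the original product S^T W with the product of the
    binary codes, as in the forward pass described above."""
    S_hat, W_hat = compress_with_projection(S, W, P)
    T = S_hat.T @ W_hat      # m x C_t output feature map (approximation)
    return T, S_hat, W_hat

# Example usage with a random Gaussian projection (compression factor n = 4)
rng = np.random.default_rng(0)
C_s, C_t, m, n = 256, 128, 32, 4
b = C_s // n
P = rng.standard_normal((b, C_s))
S = rng.standard_normal((C_s, m))
W = rng.standard_normal((C_s, C_t))
T, S_hat, W_hat = forward_compressed(S, W, P)
print(T.shape)  # (32, 128)
```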
specifically, in each layer, assuming that L-layer networks are shared, the network calculation for each layer, i.e., k ═ 1 to L, is as follows:
for the fully-connected layer, the fully-connected layer performs linear operation, and the calculation formula is as follows:
Figure BDA0002209108670000085
Figure BDA0002209108670000086
wherein, R is the size of the compressed matrix, and m × n is the number of rows and columns of the matrix corresponding to the compressed input feature map, respectively. m CrThe number of rows and columns, T, of the matrix corresponding to the compressed feature matrixkIs an output characteristic diagram; crAnd n is the number of rows and columns of the matrix corresponding to the compressed output characteristic diagram respectively.
For convolutional layers the specific operation is a convolution, which differs from the linear operation of fully-connected layers. In practice, however, in deep learning frameworks such as Torch or Caffe, convolution is implemented by unfolding the high-dimensional input feature map into a matrix of the corresponding form according to a certain rule (the weight matrix is treated in the same way) and then performing a linear operation. This rule is the im2col operation.
In the embodiment provided by the invention, in the forward propagation the input feature map and the weight matrix of the convolutional layer are compressed through a projection matrix and the output feature map is calculated; this specifically comprises the following steps.
S01, reorganizing the input feature map and the weight matrix into larger matrices.
S02, compressing the reorganized input feature map and weight matrix through the projection matrix.
S03, multiplying the compressed input feature map and weight matrix to obtain an output feature map.
S04, carrying out a col2im operation on the output feature map to obtain the final output feature map.
Specifically, as shown in fig. 2, the input feature map and the weight matrix are first reorganized into larger matrices (the reorganization follows the conventional im2col procedure in the art and is not restricted here). The reorganized input feature map and weight matrix are then compressed through the projection matrix, and the compressed matrices are multiplied to obtain the output feature map. Finally, a col2im operation (a conventional operation in the field, not described again here) is performed on the output feature map, so that the convolution is realized more efficiently. In essence, therefore, the implementation of the compression for the convolutional layer differs only slightly from that of the fully-connected layer.
In the convolutional case, the matrix corresponding to the binary hash code of the reorganized input has size $b \times (d_t \cdot d_t)$; the sizes $m \times C_r$ and $(d_t \cdot d_t) \times C_r$, like $b \times (d_t \cdot d_t)$, are defined analogously to the fully-connected case and are not described in detail here.
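The sketch below illustrates the im2col-based compressed convolution under the same assumed layout as the fully-connected sketch above. The im2col helper is a simple reference implementation (stride 1, no padding), and all function names are illustrative rather than taken from the patent.

```python
import numpy as np

def im2col(x, k):
    """Unfold a single-image input x (C x H x W) into columns of k x k patches.
    Returns a matrix of shape (C*k*k, out_h*out_w); stride 1, no padding."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols, out_h, out_w

def conv_forward_compressed(x, weights, P):
    """Compressed convolution: im2col, project and binarize both operands with
    the same P, multiply, then reshape the result back to an output feature map."""
    C_out, C_in, k, _ = weights.shape
    cols, out_h, out_w = im2col(x, k)              # (C_in*k*k, out_h*out_w)
    W2d = weights.reshape(C_out, C_in * k * k).T   # (C_in*k*k, C_out)
    S_hat = np.sign(P @ cols)                      # binary input code
    W_hat = np.sign(P @ W2d)                       # binary weight code
    out = (S_hat.T @ W_hat).T                      # (C_out, out_h*out_w)
    return out.reshape(C_out, out_h, out_w)

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 8, 8))
weights = rng.standard_normal((16, 3, 5, 5))
b = (3 * 5 * 5) // 5                               # compression factor n = 5
P = rng.standard_normal((b, 3 * 5 * 5))
print(conv_forward_compressed(x, weights, P).shape)  # (16, 4, 4)
```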
S2, in backward propagation, calculating a loss function of the neural network according to the output characteristic diagram, and calculating gradient values of the input characteristic diagram and the weight matrix of each layer through the loss function; specifically, a loss function of the neural network is calculated according to the output characteristic diagram, and the initial gradient value of the final output layer can be obtained through the loss function
(denoted $\partial L / \partial T_L$).
Since the output of the (k-1)-th layer is the input of the k-th layer, i.e., $T_{k-1} = S_k$, the gradient values of the input feature map and the weight matrix of each layer can be obtained by recursing layer by layer from k = L down to 2. When the initial input data of the neural network is finally reached, the weight matrix corresponding to each input feature map has its gradient value, and the corresponding operation is applied to the weight matrices to update them.
In the embodiment provided by the present invention, the loss function of the neural network is calculated according to the output feature map in the conventional cross-entropy manner, which is not described again here.
Because the input feature map and the output feature map are transformed during forward propagation, the gradients of the parameters change correspondingly during backward propagation; in particular, because of the projection matrix, the gradients of the input matrix and of the weight matrix are affected by the projection operation, and their gradients must be derived further through the chain rule.
To calculate the gradients of the parameters, note that the gradient of the final output layer of the convolutional neural network can be obtained directly, so the gradient of the loss with respect to the output feature map of each layer, $\partial L / \partial T_k$ for k = 1 to L, is easily obtained. From it, the gradients $\partial L / \partial S_k$ and $\partial L / \partial W_k$ can be derived according to the chain rule by substituting the forward-propagation formula.
In the resulting expressions, the derivative of the sgn function appears, and it cannot be given directly because the sgn function is not continuously differentiable. Different treatments are therefore generally applied when taking the derivative, most of which approximate sgn by a continuous function; the derivations below correspond to the different approximation functions.
If $y = \mathrm{sgn}(x)$ is approximated by $y = x$, i.e., the sgn is simply removed for the purpose of differentiation, the derivative of the hash-encoding step becomes the identity, and the final gradients $\partial L / \partial S_k$ and $\partial L / \partial W_k$ are obtained by propagating the output gradient $\partial L / \partial T_k$ back through the binary codes $\hat{W}_k$ and $\hat{S}_k$ and the real-valued projection matrix $P$.
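As a concrete illustration of this chain rule (an illustrative sketch under the layout assumed in the forward sketch above, not the patent's exact formulas), the following back-propagates through the compressed layer while differentiating the sgn of the hash encoding as the identity:

```python
import numpy as np

def backward_compressed(dT, S_hat, W_hat, P):
    """Gradients of the loss w.r.t. the real-valued S and W when the hash
    encoding sgn(P S), sgn(P W) is differentiated as the identity.
    dT is the gradient w.r.t. the output feature map T = S_hat^T W_hat."""
    dS_hat = W_hat @ dT.T   # gradient w.r.t. the binary input code  (b x m)
    dW_hat = S_hat @ dT     # gradient w.r.t. the binary weight code (b x C_t)
    dS = P.T @ dS_hat       # chain rule through the projection      (C_s x m)
    dW = P.T @ dW_hat       # chain rule through the projection      (C_s x C_t)
    return dS, dW
```

With the tensors from the forward sketch, backward_compressed(np.ones_like(T), S_hat, W_hat, P) returns gradients with the same shapes as S and W.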
if y equals sgn (x) is approximated by y equals htanh (x) max (-1, min (1, x)), then the method uses the Hard tanh function, which is shown in fig. 3.
When x > -1 and x < ═ 1,
Figure BDA0002209108670000115
otherwise
Figure BDA0002209108670000116
By 1|x|≤1The gradient of the Hard tanh function is expressed, and then the gradient is obtained
Figure BDA0002209108670000117
The expression of the final gradient is therefore:
Figure BDA0002209108670000118
Figure BDA0002209108670000119
Figure BDA00022091086700001110
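Under the same assumptions as the previous sketch, the Htanh approximation only adds an elementwise mask at the projected values $PS$ and $PW$; again this is an illustrative sketch, not the patent's exact formula:

```python
import numpy as np

def backward_compressed_htanh(dT, S, W, S_hat, W_hat, P):
    """Same as backward_compressed, but sgn is differentiated as Hard tanh,
    i.e. the gradient is masked by 1_{|x|<=1} at the projected values."""
    mask_S = (np.abs(P @ S) <= 1).astype(float)   # 1_{|PS| <= 1}
    mask_W = (np.abs(P @ W) <= 1).astype(float)   # 1_{|PW| <= 1}
    dS = P.T @ ((W_hat @ dT.T) * mask_S)
    dW = P.T @ ((S_hat @ dT) * mask_W)
    return dS, dW
```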
s3, the weight matrix is updated according to the gradient value of the weight matrix of each layer.
The weight matrix of each layer is updated according to its gradient value using the step $W_k - \eta \cdot \partial L / \partial W_k$, where $\eta$ is the learning rate and can be set as required. When updating, this matrix is truncated by a clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and the new matrix is assigned to the weight matrix $W_k$.
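A minimal sketch of this clipped update, assuming a plain gradient step $W_k - \eta \cdot \partial L / \partial W_k$ followed by the $[-1, 1]$ truncation described above:

```python
import numpy as np

def update_weights(W, dW, lr=0.01):
    """Gradient step followed by the clip truncation to [-1, 1]."""
    return np.clip(W - lr * dW, -1.0, 1.0)
```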
In another embodiment provided by the present invention, a bias matrix $b_k$ is present when data are input to the neural network. In forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through the projection matrix and the output feature map is calculated; the output feature map is then the product of the compressed input feature map and the compressed weight matrix with the bias added. Then, in backward propagation, the loss function of the neural network is calculated according to the output feature map, and the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function. Finally, in addition to updating the weight matrix according to the gradient value of the weight matrix of each layer, the gradient value of the bias matrix of each layer is calculated according to the loss function and the bias matrix is updated accordingly: the matrix $b_k - \eta \cdot \partial L / \partial b_k$ is truncated by the clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and this new matrix is assigned to the bias matrix $b_k$ of the k-th layer.
The following analyzes the concrete performance of the neural network compression method based on random projection hash provided by the invention in terms of compression and acceleration; the general algorithm complexity is summarized in Table 1.
Table 1: Algorithm complexity analysis
In the fully-connected layer, we take a compression factor n (n > 1) in actual compression, so that b is 1/n of the input dimension $C_s$, i.e., we set $b = C_s / n$. The algorithm complexity of this simplified setting is given in Table 2.
Table 2: Algorithm complexity analysis of the fully-connected layer
From the ratios, three cases can be discussed:
When $C_s \approx C_t$, the computation acceleration factor is about n/2 and the storage compression factor is about n.
When $C_s \gg C_t$, the computation acceleration factor and the storage compression factor take more complicated forms; when $C_s$ is large enough, the acceleration and compression ratios fall below 1 and the effect is poor.
When $C_s \ll C_t$, the computation acceleration factor is n and the storage compression factor is about 32n.
In actual use the first case is the most common, so the compression is effective in practice.
In convolutional layers, we likewise take a compression factor n (n > 1) in actual compression, so that b is 1/n of $C_s d_k^2$, i.e., we set $b = C_s d_k^2 / n$. The algorithm complexity of this simplified setting is given in Table 3.
Table 3: Algorithm complexity analysis of the convolutional layer
Here $d_k$ is the spatial size of the weight w and $d_t$ is the spatial size of the output feature map.
From the ratios, three cases can be discussed:
When $d_k^2 C_s \approx C_t$, the computation acceleration factor is about n/2 and the storage compression factor is about n.
When $d_k^2 C_s \gg C_t$, the computation acceleration factor and the storage compression factor take more complicated forms; when $C_s$ is large enough, the acceleration and compression ratios fall below 1 and the effect is poor.
When $d_k^2 C_s \ll C_t$, the computation acceleration factor is n and the storage compression factor is about 32n.
Likewise, in actual use the first case is the most common, so the compression is effective in practice.
The invention also provides a neural network compression system based on the random projection hash. As shown in fig. 4, the system includes a processor 42 and a memory 41 storing instructions executable by the processor 42;
processor 42 may be a general-purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, among others.
The memory 41 is used for storing the program codes and transmitting the program codes to the CPU. The memory 41 may include volatile memory, such as Random Access Memory (RAM); the memory 41 may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory 41 may also comprise a combination of memories of the kind described above.
Specifically, the neural network compression system based on random projection hash provided by the embodiment of the present invention includes a processor 42 and a memory 41; the memory 41 has stored thereon a computer program operable on the processor 42, which when executed by the processor 42 performs the steps of:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
The embodiment of the invention also provides a computer-readable storage medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
The neural network compression method and system based on the random projection hash provided by the invention are explained in detail above. Any obvious modifications to the invention, which would occur to those skilled in the art, without departing from the true spirit of the invention, would constitute a violation of the patent rights of the invention and would carry a corresponding legal responsibility.

Claims (10)

1. A neural network compression method based on random projection hash is characterized by comprising the following steps:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
2. The neural network compression method of claim 1, wherein in forward propagation, the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix, and the output feature map is calculated by the following step:
multiplying the compressed input feature map and the compressed weight matrix to obtain the output feature map.
3. The neural network compression method of claim 1, wherein:
when the input feature map and the weight matrix of each neural network layer are compressed through the projection matrix, the input feature map and the weight matrix are compressed through the same projection matrix.
4. The neural network compression method of claim 1, wherein the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix through the following steps:
projecting the input feature map $S_k$ and the weight matrix $W_k$ with a real-valued projection matrix $P$, converting them into a low-dimensional input feature map $\tilde{S}_k$ and a low-dimensional weight matrix $\tilde{W}_k$;
hash-encoding the low-dimensional input feature map $\tilde{S}_k$ and the low-dimensional weight matrix $\tilde{W}_k$ into binary codes $\hat{S}_k$ and $\hat{W}_k$, completing the compression of the input feature map and the weight matrix;
wherein k is the layer index of the neural network and b is the compression factor.
5. The neural network compression method of claim 1, wherein in forward propagation, the input feature map and the weight matrix of the convolutional layer are compressed by a projection matrix to calculate an output feature map, comprising the following steps:
reorganizing the input feature map and the weight matrix into a larger matrix respectively;
compressing the reorganized input feature map and weight matrix through the projection matrix;
multiplying the compressed input feature map and weight matrix to obtain an output feature map;
and carrying out a col2im operation on the output feature map to obtain the final output feature map.
6. The neural network compression method of claim 5, wherein:
in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $x$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$.
7. The neural network compression method of claim 1, wherein in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $\mathrm{Htanh}(x)$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$, with the gradient of the Hard tanh function expressed by the indicator $1_{|x| \le 1}$.
8. The neural network compression method of claim 1, wherein:
when the weight matrix is updated according to the gradient value of the weight matrix of each layer, the matrix $W_k - \eta \cdot \partial L / \partial W_k$ is truncated by a clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and the new matrix is assigned to the weight matrix $W_k$ of the k-th layer;
wherein $\eta$ is the learning rate, $\partial L / \partial W_k$ is the gradient value of the weight matrix, and $W_k$ is the weight matrix.
9. The neural network compression method of claim 1, wherein when a bias matrix is present when data are input to the neural network, the method further comprises the following step:
and calculating the gradient value of the bias matrix of each layer according to the loss function, and updating the bias matrix according to the gradient value of the bias matrix of each layer.
10. A neural network compression system based on random projection hash is characterized by comprising a processor and a memory; the memory having stored thereon a computer program executable on the processor, the computer program when executed by the processor implementing the steps of:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
CN201910892214.7A 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash Pending CN110751274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910892214.7A CN110751274A (en) 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910892214.7A CN110751274A (en) 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash

Publications (1)

Publication Number Publication Date
CN110751274A true CN110751274A (en) 2020-02-04

Family

ID=69276792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910892214.7A Pending CN110751274A (en) 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash

Country Status (1)

Country Link
CN (1) CN110751274A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931937A (en) * 2020-09-30 2020-11-13 深圳云天励飞技术股份有限公司 Gradient updating method, device and system of image processing model
CN111931937B (en) * 2020-09-30 2021-01-01 深圳云天励飞技术股份有限公司 Gradient updating method, device and system of image processing model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination