CN110751274A - Neural network compression method and system based on random projection hash

Neural network compression method and system based on random projection hash

Info

Publication number
CN110751274A
Authority
CN
China
Prior art keywords
weight matrix
matrix
neural network
feature map
input
Prior art date
Legal status
Pending
Application number
CN201910892214.7A
Other languages
Chinese (zh)
Inventor
沈明珠
徐毅
刘祥龙
Current Assignee
Beihang University (Beijing University of Aeronautics and Astronautics)
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date: 2019-09-20
Filing date: 2019-09-20
Publication date: 2020-02-04
Application filed by Beijing University of Aeronautics and Astronautics
Priority to CN201910892214.7A
Publication of CN110751274A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons


Abstract

The invention discloses a neural network compression method and system based on random projection hash. The method comprises the following steps: in forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through a projection matrix, and the output feature map is calculated; in backward propagation, the loss function of the neural network is calculated according to the output feature map, and the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; the weight matrix is then updated according to the gradient value of the weight matrix of each layer. The method can customize the compression factor according to the requirements on accuracy and compression ratio, and therefore offers high flexibility.

Description

Neural network compression method and system based on random projection hash
Technical Field
The invention relates to a neural network compression method based on random projection hash, and also relates to a neural network compression system for realizing the method.
Background
In recent years, deep neural networks have shown great potential in many fields, including computer vision and speech recognition. Benefiting from the rapid development of big data technology and GPU parallel computing, which provide strong hardware support for learning complex networks, neural network models and deep learning methods are increasingly applied in computer vision. Their accuracy and performance in object recognition, image classification, image retrieval, face verification, video understanding and the like have improved remarkably and now exceed those of other methods. In computer vision, the convolutional neural network, a deep neural network that excels at simulating the abstraction and iteration of the human brain, can accurately extract information from big data and therefore reaches the state of the art in many applications. At the same time, interesting advances are emerging in virtual reality, augmented reality and intelligent wearable devices in the field of computer vision. Consequently, deploying high-performance recognition systems on intelligent portable devices has become a pressing need.
However, current recognition systems based on convolutional neural networks require a large amount of memory and high computing power, and are typically run on expensive GPU clusters. Although a neural network can be trained on a GPU cluster, the testing process must run on the mobile device itself if it is to be done in real time. As data sets and the number of features grow, the scale of the model, the storage of its parameters and the amount of computation increase accordingly, demanding high computing power; as a result, deep convolutional neural networks can hardly be used on mobile or embedded devices, which hinders their adoption.
It is well known that mobile devices, embedded devices and ordinary portable computers are limited in memory, computing power and power consumption. Most mobile devices have only about 1 GB of random access memory (RAM), whereas the parameters of the convolutional neural network that won the ImageNet competition in 2014 already reach 576 MB, occupying a large amount of RAM. Loading more than 500 MB of parameters to test a single image is unacceptable in terms of power consumption, not to mention computation time; these drawbacks of high memory and high computing-power requirements are exposed on mobile devices. Such models therefore greatly exceed the memory, power and computing capacity that mobile phones, embedded devices and intelligent wearable devices can afford.
However, more and more deep learning applications target mobile and embedded devices; for example, smart phones and robots perform image classification, and self-driving cars need to perform object recognition in real time. How to compress a neural network so as to reduce its computation and storage is therefore an urgent need.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a neural network compression method based on random projection hash.
Another technical problem to be solved by the present invention is to provide a neural network compression system based on random projection hash.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided a neural network compression method based on random projection hash, including the following steps:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
Preferably, in the forward propagation, the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix, and the output feature map is calculated by the following step:
multiplying the compressed input feature map and the compressed weight matrix to obtain the output feature map.
Preferably, when the input feature map and the weight matrix of each neural network layer are compressed by the projection matrix, the input feature map and the weight matrix are compressed by the same projection matrix.
Preferably, the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix through the following steps:
projecting the input feature map $S_k$ and the weight matrix $W_k$ with a real-valued projection matrix $P$, converting them into a low-dimensional input feature map $\tilde{S}_k$ and a low-dimensional weight matrix $\tilde{W}_k$;
hash-encoding the low-dimensional input feature map $\tilde{S}_k$ and the low-dimensional weight matrix $\tilde{W}_k$ into binary codes $\hat{S}_k$ and $\hat{W}_k$, completing the compression of the input feature map and the weight matrix;
wherein k is the layer index of the neural network and b is the compression factor.
Preferably, in the forward propagation, the input feature map and the weight matrix of the convolutional layer are compressed through a projection matrix and the output feature map is calculated by the following steps:
reorganizing the input feature map and the weight matrix into a larger matrix respectively;
compressing the reorganized input feature map and weight matrix through the projection matrix;
multiplying the compressed input feature map and weight matrix to obtain an output feature map;
and carrying out a col2im operation on the output feature map to obtain the final output feature map.
Preferably, in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function, wherein the non-differentiable function $y = \mathrm{sgn}(x)$ used in the forward process is approximated by $y = x$; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $x$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$.
Preferably, in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function, wherein the function $y = \mathrm{sgn}(x)$ used in the forward process is approximated by $y = \mathrm{Htanh}(x)$; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $\mathrm{Htanh}(x)$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are again obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$, with the gradient of the Hard tanh function expressed by the indicator $1_{|x| \le 1}$.
Preferably, when the weight matrix is updated according to the gradient value of the weight matrix of each layer, the matrix $W_k - \eta \cdot \partial L / \partial W_k$ is truncated by a clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and the new matrix is assigned to the weight matrix $W_k$ of the k-th layer;
wherein $\eta$ is the learning rate, $\partial L / \partial W_k$ is the gradient value of the weight matrix, and $W_k$ is the weight matrix.
Preferably, when a bias matrix is present when data are input into the neural network, the method further comprises the following step:
calculating the gradient value of the bias matrix of each layer according to the loss function, and updating the bias matrix according to the gradient value of the bias matrix of each layer.
According to a second aspect of the embodiments of the present invention, there is provided a neural network compression system based on random projection hash, including a processor and a memory; the memory having stored thereon a computer program executable on the processor, the computer program when executed by the processor implementing the steps of:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
In the neural network compression method based on random projection hash provided by the invention, the parameters are quantized and approximated through a random projection hash algorithm. The method can be faster than methods of the same type and uses less memory in the neural network; the compression factor can be customized according to the requirements on accuracy and compression ratio, giving high flexibility and greatly reducing the storage required for the parameters.
Drawings
FIG. 1 is a flow chart of a neural network compression method based on random projection hashing according to the present invention;
FIG. 2 is a schematic diagram of im2col operation in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Hard tanh function in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a neural network compression system based on random projection hash provided in the present invention.
Detailed Description
The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.
At present, many papers have studied the compression and acceleration of neural networks, in particular implementations based on binarization. In existing work, however, the weight matrix and the input matrix are directly converted into binary values during compression, so the compression ratio is fixed: a 4-byte real number becomes a 1-bit number, giving a compression ratio of 32, and the acceleration is obvious because the binary values can be processed with bit operations. Since the compression ratio is fixed at 32, this is very restrictive. For example, on a small network a 32-fold compression may leave too few parameters and lose too much accuracy, while on a large network a 32-fold compression may be too small and the model remains large after compression.
Therefore, the neural network compression method based on random projection hash provided by the invention improves on this by combining a projection-based compression method: a hyper-parameter compression factor is introduced, the input matrix is compressed along one dimension and the weight matrix is compressed correspondingly along the same dimension, and finally a binarized input matrix and a binarized weight matrix are obtained. The achievable compression ratio can therefore exceed the fixed ratio used by existing methods, and the compression factor can be customized according to the specific size of the network, giving greater flexibility. Moreover, because the compression factor can be larger, the computation and storage required when testing on a mobile device are smaller and the energy consumption is lower, making the method better suited to running on low-performance devices.
As shown in fig. 1, the neural network compression method based on random projection hash provided by the present invention includes the following steps: first, in forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through a projection matrix, and the output feature map is calculated; then, in backward propagation, the loss function of the neural network is calculated according to the output feature map, and the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; finally, the weight matrix is updated according to the gradient value of the weight matrix of each layer. This process is described in detail below.
S1, in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating the output feature map.
Before describing the neural network compression method based on random projection hash provided by the invention, the hash algorithm is briefly introduced. The main idea of a hash algorithm is to map values of arbitrary length to values of fixed length, called hash values, by means of a designed hash function; it compresses messages or data of different sizes into a fixed data format, so that the amount of data is greatly reduced. Random projection hashing is used for retrieval and storage in the image field and is widely used for data encryption in cryptography.
The three major elements of a hash algorithm are the input space, the hash function and the output space, i.e., the space containing the hash values, so the most important element is the hash function. Generally, a hash algorithm uses several hash functions working together to turn the original data into hash codes while remaining consistent with the input data: if two inputs are similar in the input space, their hash codes are similar in the output space, and the same holds in reverse. The locality sensitive hashing (LSH) algorithm is a widely used hashing algorithm, roughly defined as follows:
given a set of hash functions H ═ H: d → U }, for each function H e H in H, for any two vectors p, q e D, if the following condition is satisfied:
if d (p, q) is less than or equal to d1Then Pr [ h (q) ═ h (p)]≥P1
If d (p, q) ≧ d2Then Pr [ h (q) ═ h (p)]≤P2
Wherein d (p, q) is the distance between p and q, Pr [ h (q) ═ h (p)]Denotes the probability that h (q) and h (p) are equal, d1,d2,P1,P2Is a threshold value, typically d1<d2,P1>P2Then we call hash function cluster H location sensitive, i.e. (d)1,d2,P1,P2)-sensitive。
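As an illustration of this locality-sensitive property (a minimal sketch, not the construction used later in the patent), a random-projection hash takes the sign of a few random projections, so that nearby vectors tend to receive similar binary codes; the helper name random_projection_hash and the Gaussian projection are assumptions made for the example.

```python
import numpy as np

def random_projection_hash(x, P):
    """Hash a real vector x to a binary code: the sign of each random projection."""
    return (P @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(42)
P = rng.standard_normal((16, 128))             # 16 random hyperplanes in R^128

x = rng.standard_normal(128)
x_near = x + 0.05 * rng.standard_normal(128)   # small perturbation of x
x_far = rng.standard_normal(128)               # unrelated vector

h = random_projection_hash(x, P)
# Hamming distance of the codes: small for the nearby vector, larger for the far one
print(np.sum(h != random_projection_hash(x_near, P)))
print(np.sum(h != random_projection_hash(x_far, P)))
```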
In the image domain, hash algorithms are widely used in image retrieval, image storage, video retrieval, video storage and the like. In traditional image retrieval, the features extracted directly from images are high-dimensional, high-precision data; using them directly as the retrieval basis requires extremely complex computation, and an image database usually contains millions of images, so retrieval is time-consuming and can hardly meet real-time requirements. Retrieval based on hashing instead computes similarity directly on the hash codes transformed from the image features, which significantly reduces computation time and storage space.
In the embodiment provided by the invention, the random projection hash algorithm is applied to the compression of the neural network. In forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through a projection matrix, and the output feature map is calculated; this specifically comprises the following steps.
S11, compressing the input feature map and the weight matrix of each neural network layer through the projection matrix.
When the input feature map and the weight matrix of each neural network layer are compressed through the projection matrix, they are compressed through the same projection matrix; this specifically comprises the following steps.
S111, projecting the input feature map $S_k$ and the weight matrix $W_k$ with a real-valued projection matrix $P$, converting them into a low-dimensional input feature map $\tilde{S}_k$ and a low-dimensional weight matrix $\tilde{W}_k$.
S112, hash-encoding the low-dimensional input feature map $\tilde{S}_k$ and the low-dimensional weight matrix $\tilde{W}_k$ into binary codes $\hat{S}_k$ and $\hat{W}_k$, completing the compression of the input feature map and the weight matrix. Here k is the layer index of the neural network and b is the compression factor.
The hash encoding acts on the projected values; the hash (projection) matrix $P$ may be data-independent, for example a random projection.
S12, multiplying the compressed input feature map $\hat{S}_k$ and the compressed weight matrix $\hat{W}_k$ to obtain the output feature map.
In the embodiment provided by the invention, during forward propagation the input feature map and the weight matrix of each neural network layer are compressed through the same projection matrix, and the matrix obtained by multiplying the compressed matrices is used as an approximation of the original output feature map. That is, whereas the original output feature map is obtained by multiplying the real-valued input feature map $S_k$ and the real-valued weight matrix $W_k$, it is now approximated by multiplying the binarized input feature map $\hat{S}_k$ and the binarized weight matrix $\hat{W}_k$.
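To make this forward step concrete, the following is a minimal NumPy sketch of one compressed fully-connected layer. The layout (inputs stored as a $C_s \times m$ matrix, weights as $C_s \times C_t$, projection $P$ of size $b \times C_s$), the sign-based hash encoding and the function names compress_with_projection and forward_compressed are assumptions made for illustration, not the patent's exact formulation.

```python
import numpy as np

def compress_with_projection(S, W, P):
    """Project S (C_s x m) and W (C_s x C_t) with the same real-valued
    projection matrix P (b x C_s), then hash-encode by taking the sign."""
    S_low = P @ S            # low-dimensional input feature map, b x m
    W_low = P @ W            # low-dimensional weight matrix,     b x C_t
    S_hat = np.sign(S_low)   # binary code of the input feature map
    W_hat = np.sign(W_low)   # binary code of the weight matrix
    return S_hat, W_hat

def forward_compressed(S, W, P):
    """Approximate the original product S^T W with the product of the
    binary codes, as in the forward pass described above."""
    S_hat, W_hat = compress_with_projection(S, W, P)
    T = S_hat.T @ W_hat      # m x C_t output feature map (approximation)
    return T, S_hat, W_hat

# Example usage with a random Gaussian projection (compression factor n = 4)
rng = np.random.default_rng(0)
C_s, C_t, m, n = 256, 128, 32, 4
b = C_s // n
P = rng.standard_normal((b, C_s))
S = rng.standard_normal((C_s, m))
W = rng.standard_normal((C_s, C_t))
T, S_hat, W_hat = forward_compressed(S, W, P)
print(T.shape)  # (32, 128)
```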
specifically, in each layer, assuming that L-layer networks are shared, the network calculation for each layer, i.e., k ═ 1 to L, is as follows:
for the fully-connected layer, the fully-connected layer performs linear operation, and the calculation formula is as follows:
Figure BDA0002209108670000085
Figure BDA0002209108670000086
wherein, R is the size of the compressed matrix, and m × n is the number of rows and columns of the matrix corresponding to the compressed input feature map, respectively. m CrThe number of rows and columns, T, of the matrix corresponding to the compressed feature matrixkIs an output characteristic diagram; crAnd n is the number of rows and columns of the matrix corresponding to the compressed output characteristic diagram respectively.
For convolutional layers the specific operation is a convolution, which differs from the linear operation of fully-connected layers. In practice, however, in deep learning frameworks such as Torch or Caffe, convolution is implemented by unfolding the high-dimensional input feature map into a matrix of the corresponding form according to a certain rule (the weight matrix is treated in the same way) and then performing a linear operation. This rule is the im2col operation.
In the embodiment provided by the invention, in the forward propagation the input feature map and the weight matrix of the convolutional layer are compressed through a projection matrix and the output feature map is calculated; this specifically comprises the following steps.
S01, reorganizing the input feature map and the weight matrix into larger matrices.
S02, compressing the reorganized input feature map and weight matrix through the projection matrix.
S03, multiplying the compressed input feature map and weight matrix to obtain an output feature map.
S04, carrying out a col2im operation on the output feature map to obtain the final output feature map.
Specifically, as shown in fig. 2, the input feature map and the weight matrix are first reorganized into larger matrices (the reorganization follows the conventional im2col procedure in the art and is not restricted here). The reorganized input feature map and weight matrix are then compressed through the projection matrix, and the compressed matrices are multiplied to obtain the output feature map. Finally, a col2im operation (a conventional operation in the field, not described again here) is performed on the output feature map, so that the convolution is realized more efficiently. In essence, therefore, the implementation of the compression for the convolutional layer differs only slightly from that of the fully-connected layer.
In the convolutional case, the matrix corresponding to the binary hash code of the reorganized input has size $b \times (d_t \cdot d_t)$; the sizes $m \times C_r$ and $(d_t \cdot d_t) \times C_r$, like $b \times (d_t \cdot d_t)$, are defined analogously to the fully-connected case and are not described in detail here.
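The sketch below illustrates the im2col-based compressed convolution under the same assumed layout as the fully-connected sketch above. The im2col helper is a simple reference implementation (stride 1, no padding), and all function names are illustrative rather than taken from the patent.

```python
import numpy as np

def im2col(x, k):
    """Unfold a single-image input x (C x H x W) into columns of k x k patches.
    Returns a matrix of shape (C*k*k, out_h*out_w); stride 1, no padding."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols, out_h, out_w

def conv_forward_compressed(x, weights, P):
    """Compressed convolution: im2col, project and binarize both operands with
    the same P, multiply, then reshape the result back to an output feature map."""
    C_out, C_in, k, _ = weights.shape
    cols, out_h, out_w = im2col(x, k)              # (C_in*k*k, out_h*out_w)
    W2d = weights.reshape(C_out, C_in * k * k).T   # (C_in*k*k, C_out)
    S_hat = np.sign(P @ cols)                      # binary input code
    W_hat = np.sign(P @ W2d)                       # binary weight code
    out = (S_hat.T @ W_hat).T                      # (C_out, out_h*out_w)
    return out.reshape(C_out, out_h, out_w)

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 8, 8))
weights = rng.standard_normal((16, 3, 5, 5))
b = (3 * 5 * 5) // 5                               # compression factor n = 5
P = rng.standard_normal((b, 3 * 5 * 5))
print(conv_forward_compressed(x, weights, P).shape)  # (16, 4, 4)
```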
S2, in backward propagation, calculating a loss function of the neural network according to the output characteristic diagram, and calculating gradient values of the input characteristic diagram and the weight matrix of each layer through the loss function; specifically, a loss function of the neural network is calculated according to the output characteristic diagram, and the initial gradient value of the final output layer can be obtained through the loss function
(denoted $\partial L / \partial T_L$).
Since the output of the (k-1)-th layer is the input of the k-th layer, i.e., $T_{k-1} = S_k$, the gradient values of the input feature map and the weight matrix of each layer can be obtained by recursing layer by layer from k = L down to 2. When the initial input data of the neural network is finally reached, the weight matrix corresponding to each input feature map has its gradient value, and the corresponding operation is applied to the weight matrices to update them.
In the embodiment provided by the present invention, the loss function of the neural network is calculated according to the output feature map in the conventional cross-entropy manner, which is not described again here.
Because the input feature map and the output feature map are transformed during forward propagation, the gradients of the parameters change correspondingly during backward propagation; in particular, because of the projection matrix, the gradients of the input matrix and of the weight matrix are affected by the projection operation, and their gradients must be derived further through the chain rule.
To calculate the gradients of the parameters, note that the gradient of the final output layer of the convolutional neural network can be obtained directly, so the gradient of the loss with respect to the output feature map of each layer, $\partial L / \partial T_k$ for k = 1 to L, is easily obtained. From it, the gradients $\partial L / \partial S_k$ and $\partial L / \partial W_k$ can be derived according to the chain rule by substituting the forward-propagation formula.
In the resulting expressions, the derivative of the sgn function appears, and it cannot be given directly because the sgn function is not continuously differentiable. Different treatments are therefore generally applied when taking the derivative, most of which approximate sgn by a continuous function; the derivations below correspond to the different approximation functions.
If $y = \mathrm{sgn}(x)$ is approximated by $y = x$, i.e., the sgn is simply removed for the purpose of differentiation, the derivative of the hash-encoding step becomes the identity, and the final gradients $\partial L / \partial S_k$ and $\partial L / \partial W_k$ are obtained by propagating the output gradient $\partial L / \partial T_k$ back through the binary codes $\hat{W}_k$ and $\hat{S}_k$ and the real-valued projection matrix $P$.
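As a concrete illustration of this chain rule (an illustrative sketch under the layout assumed in the forward sketch above, not the patent's exact formulas), the following back-propagates through the compressed layer while differentiating the sgn of the hash encoding as the identity:

```python
import numpy as np

def backward_compressed(dT, S_hat, W_hat, P):
    """Gradients of the loss w.r.t. the real-valued S and W when the hash
    encoding sgn(P S), sgn(P W) is differentiated as the identity.
    dT is the gradient w.r.t. the output feature map T = S_hat^T W_hat."""
    dS_hat = W_hat @ dT.T   # gradient w.r.t. the binary input code  (b x m)
    dW_hat = S_hat @ dT     # gradient w.r.t. the binary weight code (b x C_t)
    dS = P.T @ dS_hat       # chain rule through the projection      (C_s x m)
    dW = P.T @ dW_hat       # chain rule through the projection      (C_s x C_t)
    return dS, dW
```

With the tensors from the forward sketch, backward_compressed(np.ones_like(T), S_hat, W_hat, P) returns gradients with the same shapes as S and W.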
if y equals sgn (x) is approximated by y equals htanh (x) max (-1, min (1, x)), then the method uses the Hard tanh function, which is shown in fig. 3.
When x > -1 and x < ═ 1,
Figure BDA0002209108670000115
otherwise
Figure BDA0002209108670000116
By 1|x|≤1The gradient of the Hard tanh function is expressed, and then the gradient is obtained
Figure BDA0002209108670000117
The expression of the final gradient is therefore:
Figure BDA0002209108670000118
Figure BDA0002209108670000119
Figure BDA00022091086700001110
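Under the same assumptions as the previous sketch, the Htanh approximation only adds an elementwise mask at the projected values $PS$ and $PW$; again this is an illustrative sketch, not the patent's exact formula:

```python
import numpy as np

def backward_compressed_htanh(dT, S, W, S_hat, W_hat, P):
    """Same as backward_compressed, but sgn is differentiated as Hard tanh,
    i.e. the gradient is masked by 1_{|x|<=1} at the projected values."""
    mask_S = (np.abs(P @ S) <= 1).astype(float)   # 1_{|PS| <= 1}
    mask_W = (np.abs(P @ W) <= 1).astype(float)   # 1_{|PW| <= 1}
    dS = P.T @ ((W_hat @ dT.T) * mask_S)
    dW = P.T @ ((S_hat @ dT) * mask_W)
    return dS, dW
```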
s3, the weight matrix is updated according to the gradient value of the weight matrix of each layer.
The weight matrix of each layer is updated according to its gradient value using the step $W_k - \eta \cdot \partial L / \partial W_k$, where $\eta$ is the learning rate and can be set as required. When updating, this matrix is truncated by a clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and the new matrix is assigned to the weight matrix $W_k$.
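A minimal sketch of this clipped update, assuming a plain gradient step $W_k - \eta \cdot \partial L / \partial W_k$ followed by the $[-1, 1]$ truncation described above:

```python
import numpy as np

def update_weights(W, dW, lr=0.01):
    """Gradient step followed by the clip truncation to [-1, 1]."""
    return np.clip(W - lr * dW, -1.0, 1.0)
```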
In another embodiment provided by the present invention, a bias matrix $b_k$ is present when data are input to the neural network. In forward propagation, the input feature map and the weight matrix of each neural network layer are compressed through the projection matrix and the output feature map is calculated; the output feature map is then the product of the compressed input feature map and the compressed weight matrix with the bias added. Then, in backward propagation, the loss function of the neural network is calculated according to the output feature map, and the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function. Finally, in addition to updating the weight matrix according to the gradient value of the weight matrix of each layer, the gradient value of the bias matrix of each layer is calculated according to the loss function and the bias matrix is updated accordingly: the matrix $b_k - \eta \cdot \partial L / \partial b_k$ is truncated by the clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and this new matrix is assigned to the bias matrix $b_k$ of the k-th layer.
The following analyzes the concrete performance of the neural network compression method based on random projection hash provided by the invention in terms of compression and acceleration; the general algorithm complexity is summarized in Table 1.
Table 1: Algorithm complexity analysis
In the fully-connected layer, we take a compression factor n (n > 1) in actual compression, so that b is 1/n of the input dimension $C_s$, i.e., we set $b = C_s / n$. The algorithm complexity of this simplified setting is given in Table 2.
Table 2: Algorithm complexity analysis of the fully-connected layer
From the ratios, three cases can be discussed:
When $C_s \approx C_t$, the computation acceleration factor is about n/2 and the storage compression factor is about n.
When $C_s \gg C_t$, the computation acceleration factor and the storage compression factor take more complicated forms; when $C_s$ is large enough, the acceleration and compression ratios fall below 1 and the effect is poor.
When $C_s \ll C_t$, the computation acceleration factor is n and the storage compression factor is about 32n.
In actual use the first case is the most common, so the compression is effective in practice.
In convolutional layers, we likewise take a compression factor n (n > 1) in actual compression, so that b is 1/n of $C_s d_k^2$, i.e., we set $b = C_s d_k^2 / n$. The algorithm complexity of this simplified setting is given in Table 3.
Table 3: Algorithm complexity analysis of the convolutional layer
Here $d_k$ is the spatial size of the weight w and $d_t$ is the spatial size of the output feature map.
From the ratios, three cases can be discussed:
When $d_k^2 C_s \approx C_t$, the computation acceleration factor is about n/2 and the storage compression factor is about n.
When $d_k^2 C_s \gg C_t$, the computation acceleration factor and the storage compression factor take more complicated forms; when $C_s$ is large enough, the acceleration and compression ratios fall below 1 and the effect is poor.
When $d_k^2 C_s \ll C_t$, the computation acceleration factor is n and the storage compression factor is about 32n.
Likewise, in actual use the first case is the most common, so the compression is effective in practice.
The invention also provides a neural network compression system based on the random projection hash. As shown in fig. 4, the system includes a processor 42 and a memory 41 storing instructions executable by the processor 42;
processor 42 may be a general-purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, among others.
The memory 41 is used for storing the program codes and transmitting the program codes to the CPU. The memory 41 may include volatile memory, such as Random Access Memory (RAM); the memory 41 may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory 41 may also comprise a combination of memories of the kind described above.
Specifically, the neural network compression system based on random projection hash provided by the embodiment of the present invention includes a processor 42 and a memory 41; the memory 41 has stored thereon a computer program operable on the processor 42, which when executed by the processor 42 performs the steps of:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
The embodiment of the invention also provides a computer-readable storage medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
The neural network compression method and system based on the random projection hash provided by the invention are explained in detail above. Any obvious modifications to the invention, which would occur to those skilled in the art, without departing from the true spirit of the invention, would constitute a violation of the patent rights of the invention and would carry a corresponding legal responsibility.

Claims (10)

1. A neural network compression method based on random projection hash is characterized by comprising the following steps:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
2. The neural network compression method of claim 1, wherein in forward propagation, the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix, and the output feature map is calculated by the following step:
multiplying the compressed input feature map and the compressed weight matrix to obtain the output feature map.
3. The neural network compression method of claim 1, wherein:
when the input feature map and the weight matrix of each neural network layer are compressed through the projection matrix, the input feature map and the weight matrix are compressed through the same projection matrix.
4. The neural network compression method of claim 1, wherein the input feature map and the weight matrix of each neural network layer are compressed by a projection matrix through the following steps:
projecting the input feature map $S_k$ and the weight matrix $W_k$ with a real-valued projection matrix $P$, converting them into a low-dimensional input feature map $\tilde{S}_k$ and a low-dimensional weight matrix $\tilde{W}_k$;
hash-encoding the low-dimensional input feature map $\tilde{S}_k$ and the low-dimensional weight matrix $\tilde{W}_k$ into binary codes $\hat{S}_k$ and $\hat{W}_k$, completing the compression of the input feature map and the weight matrix;
wherein k is the layer index of the neural network and b is the compression factor.
5. The neural network compression method of claim 1, wherein in forward propagation, the input feature map and the weight matrix of the convolutional layer are compressed by a projection matrix to calculate an output feature map, comprising the following steps:
reorganizing the input feature map and the weight matrix into a larger matrix respectively;
compressing the reorganized input feature map and weight matrix through the projection matrix;
multiplying the compressed input feature map and weight matrix to obtain an output feature map;
and carrying out a col2im operation on the output feature map to obtain the final output feature map.
6. The neural network compression method of claim 5, wherein:
in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $x$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$.
7. The neural network compression method of claim 1, wherein in backward propagation, the gradient values of the input feature map and the weight matrix of each layer are calculated through the loss function; when the $\mathrm{sgn}(x)$ contained in the loss function is replaced by $\mathrm{Htanh}(x)$, the gradient value of the input feature map $\partial L / \partial S_k$ and the gradient value of the weight matrix $\partial L / \partial W_k$ are obtained by the chain rule from the gradient value of the output feature map $\partial L / \partial T_k$, the hash-encoded weight matrix $\hat{W}_k$, the hash-encoded input feature map $\hat{S}_k$, and the real-valued projection matrix $P$, with the gradient of the Hard tanh function expressed by the indicator $1_{|x| \le 1}$.
8. The neural network compression method of claim 1, wherein:
when the weight matrix is updated according to the gradient value of the weight matrix of each layer, the matrix $W_k - \eta \cdot \partial L / \partial W_k$ is truncated by a clip function into a new matrix whose elements all lie in the range $[-1, 1]$, and the new matrix is assigned to the weight matrix $W_k$ of the k-th layer;
wherein $\eta$ is the learning rate, $\partial L / \partial W_k$ is the gradient value of the weight matrix, and $W_k$ is the weight matrix.
9. The neural network compression method of claim 1, wherein when a bias matrix is present when data are input to the neural network, the method further comprises the following step:
and calculating the gradient value of the bias matrix of each layer according to the loss function, and updating the bias matrix according to the gradient value of the bias matrix of each layer.
10. A neural network compression system based on random projection hash is characterized by comprising a processor and a memory; the memory having stored thereon a computer program executable on the processor, the computer program when executed by the processor implementing the steps of:
in forward propagation, compressing the input feature map and the weight matrix of each neural network layer through a projection matrix, and calculating an output feature map;
in backward propagation, calculating a loss function of the neural network according to the output feature map, and calculating the gradient values of the input feature map and the weight matrix of each layer through the loss function;
and updating the weight matrix according to the gradient value of the weight matrix of each layer.
CN201910892214.7A 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash Pending CN110751274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910892214.7A CN110751274A (en) 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910892214.7A CN110751274A (en) 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash

Publications (1)

Publication Number Publication Date
CN110751274A true CN110751274A (en) 2020-02-04

Family

ID=69276792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910892214.7A Pending CN110751274A (en) 2019-09-20 2019-09-20 Neural network compression method and system based on random projection hash

Country Status (1)

Country Link
CN (1) CN110751274A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931937A (en) * 2020-09-30 2020-11-13 深圳云天励飞技术股份有限公司 Gradient updating method, device and system of image processing model
CN111931937B (en) * 2020-09-30 2021-01-01 深圳云天励飞技术股份有限公司 Gradient updating method, device and system of image processing model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination