CN115238881A - Compression method, device and system for neural network model and storage medium - Google Patents
Compression method, device and system for neural network model and storage medium
- Publication number
- CN115238881A (application CN202110414843.6A)
- Authority
- CN
- China
- Prior art keywords
- feature
- neural network
- network model
- tensors
- tensor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention relates to a compression method, apparatus, and system for a neural network model, and to a computer-readable storage medium. The compression method comprises the following steps: grouping feature tensors to be stored, the grouping being based on a first characteristic of the feature tensors; calculating the similarity of the feature tensors in each feature tensor group; comparing the calculated similarity with a preset similarity criterion; and compressing the feature tensor group if its similarity satisfies the similarity criterion, and otherwise not compressing it. This technical solution reduces the storage space required during training/inference of the neural network model and lowers power consumption during operation.
Description
Technical Field
The invention relates to the field of deep learning, in particular to a compression method, a device and a system for a neural network model and a computer-readable storage medium.
Background
As network depth increases in deep learning, the accuracy of deep neural networks in tasks such as object classification, object detection, and natural language processing also improves. However, deeper networks bring greater storage requirements.
In general, deploying a deep neural network involves two processes: training and inference. Training commonly uses stochastic gradient descent, which needs the intermediate activation values from the forward pass to compute the weight gradients and therefore requires a large number of storage units. Batch processing (batch operation) further enlarges the storage requirement for intermediate activation values, aggravating the energy-consumption problem of hardware deployment, especially on Internet-of-Things devices.
Disclosure of Invention
In order to solve or at least alleviate one or more of the existing problems, such as those described above, the present invention provides the following technical solutions.
According to an aspect of the present invention, a compression method for a neural network model is provided. The method comprises the following steps: grouping feature tensors to be stored, the grouping based on a first characteristic of the feature tensors; calculating the similarity of the feature tensors in the feature tensor group; comparing the calculated similarity with a preset similarity standard; and compressing the set of feature tensors if the similarity of the set of feature tensors satisfies the similarity criterion, otherwise not compressing the set of feature tensors.
Alternatively or additionally to the above, according to an embodiment of the invention, the similarity includes a difference value, the preset similarity criterion includes a first threshold, and the feature tensor group is compressed if the difference value of the feature tensor group is lower than the first threshold, and is not compressed otherwise.
Alternatively or additionally to the above, according to an embodiment of the invention, in the compression method for a neural network model, the compression includes a full compression in which all the feature tensors in the set of feature tensors are compressed into a similar feature tensor of the set of feature tensors.
Alternatively or additionally to the above, according to an embodiment of the invention, the fully-compressed feature tensor set is stored as the similar feature tensor and a first index value of the feature tensor set, and the feature tensor set that is not compressed is stored as each feature tensor and a second index value of the feature tensor set.
Alternatively or additionally to the above, according to an embodiment of the invention, the compression method for a neural network model, wherein the compression further includes a partial compression in which a proper subset of the set of feature tensors is compressed into similar feature tensors of the proper subset, and the rest of the feature tensors are not compressed.
Alternatively or additionally to the above, according to an embodiment of the invention, the set of feature tensors subjected to the partial compression is stored as the similar feature tensors of the proper subset, each feature tensor in a complement of the proper subset, a position index value of each feature tensor in the complement in the set of feature tensors, and a third index value.
Alternatively or additionally to the above, according to an embodiment of the invention, the compression method for a neural network model, wherein the proper subset of the set of feature tensors includes all feature tensors except a maximum feature tensor and/or a minimum feature tensor in the set of feature tensors.
Alternatively or additionally to the above, according to an embodiment of the invention, the similarity includes a difference value, the preset similarity criterion includes a second threshold and a third threshold, and the second threshold is smaller than the third threshold, in a case where the difference value of the feature tensor group is lower than the second threshold, the feature tensor group is completely compressed, in a case where the difference value of the feature tensor group is not lower than the second threshold but lower than the third threshold, the feature tensor group is partially compressed, and otherwise, the feature tensor group is not compressed.
Alternatively or additionally to the above, according to an embodiment of the invention, the similar feature tensor is a maximum feature tensor, a minimum feature tensor, and/or an average feature tensor.
Alternatively or additionally to the above, according to an embodiment of the invention, the difference value includes a difference value between a maximum feature tensor and a minimum feature tensor in the feature tensor group, a difference value between the maximum feature tensor and an average feature tensor, and/or a difference value between the average feature tensor and the minimum feature tensor.
Alternatively or additionally to the above, the compression method for a neural network model according to an embodiment of the present invention further includes inputting an initial threshold as the first threshold; obtaining an accuracy of the neural network model based on the initial threshold; and adjusting the first threshold value so that the precision is a preset precision: if the accuracy of the neural network model is lower than the preset accuracy, decreasing the first threshold until the accuracy of the neural network model increases to the preset accuracy, and if the accuracy of the neural network model is equal to/higher than the preset accuracy, increasing the first threshold until the accuracy of the neural network model starts to be less than the preset accuracy.
Alternatively or additionally to the above, a compression method for a neural network model according to an embodiment of the present invention, wherein the feature tensor to be stored is obtained from: the inputs/outputs of the convolutional layer, the activation layer and/or the pooling layer in the inference process of the neural network model, the inputs/outputs of the convolutional layer, the activation layer and/or the pooling layer in forward propagation during the training process of the neural network model, and/or the outputs of the back-propagated feature error layer during the training process of the neural network model.
Alternatively or additionally to the above, according to an embodiment of the invention, the feature tensor includes an image feature tensor/an audio feature tensor.
Alternatively or additionally to the above, a compression method for a neural network model according to an embodiment of the present invention, wherein the first characteristic includes temporal continuity and/or spatial continuity.
According to another aspect of the present invention, there is provided a compression apparatus for a neural network model. The device comprises: a memory configured to store instructions; and a processor configured to implement, when the instructions are executed, a compression method for a neural network model according to any embodiment of the invention.
According to yet another aspect of the present invention, there is provided a compression system for a neural network model, the system comprising: grouping means for grouping feature tensors to be stored, the grouping being based on a first characteristic of the feature tensors; calculating means for calculating a similarity of the feature tensors in the set of feature tensors; a comparison means for comparing the calculated similarity with a preset similarity standard; and compressing means for compressing the set of feature tensors if the similarity of the set of feature tensors satisfies the similarity criterion, and not compressing the set of feature tensors otherwise.
According to yet another aspect of the present invention, there is provided a computer readable storage medium for storing instructions, characterized in that the instructions, when executed, implement a compression method for a neural network model according to any one of the embodiments of the present invention.
The compression technique for neural network models according to the present invention can reduce the memory space required during the training/inference process of the neural network model, and may also reduce power consumption during operation.
Drawings
The above and other objects and advantages of the present invention will become more fully apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 illustrates a compression method 1000 for a neural network model, according to one embodiment of the invention.
FIG. 2 shows an image 2000 to be processed by a neural network model.
FIG. 3 illustrates a compression result obtained by applying a compression method for a neural network model according to an embodiment of the present invention.
FIG. 4 shows a computer apparatus 4000 for compression of a neural network model according to one embodiment of the present invention.
FIG. 5 shows a compression system 5000 for a neural network model according to an embodiment of the invention.
FIG. 6 illustrates the calculation of feature tensors after compression via a compression method for a neural network model according to an embodiment of the present invention.
Detailed Description
It is to be understood that the terms first, second and the like in the description and in the claims of the present invention are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, unless specifically stated otherwise, the terms "comprising," "including," "having," and the like are intended to mean a non-exclusive inclusion.
First, according to the design concept of the present invention, a compression method for a neural network model is provided. The method comprises the following steps: grouping feature tensors to be stored, the grouping being based on a first characteristic of the feature tensors; calculating a similarity of the feature tensors in the set of feature tensors; comparing the calculated similarity with a preset similarity criterion; and compressing the set of feature tensors if the similarity of the set of feature tensors satisfies the similarity criterion, otherwise not compressing the set of feature tensors. In practical applications, the technical solution of the invention can be applied to deep-learning-based object classification, object detection, natural language processing, and the like. It is particularly suited to fields such as computer vision and audio processing, because the data sets in these fields often consist of pictures, audio, and video, whose data tensors exhibit great similarity. For example, the pixel values of successive frames of a video contain many similar values; when such data are used as the input of a neural network, the intermediate feature values they generate are likewise similar, which creates the opportunity for compression.
Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 shows a compression method 1000 for a neural network model according to one embodiment of the invention.
In step S101, feature tensors to be stored are grouped. Here, the grouping is performed based on the characteristics of the feature tensor.
The feature tensor can be an image feature tensor, an audio feature tensor, or the like, obtained from the input and output tensors of a convolutional layer, an activation layer, a pooling layer, or the like of a neural network (e.g., the AlexNet neural network) during training or inference. The characteristic of the feature tensor on which the grouping is based may be spatial continuity. In general, data that exist densely in a spatial region have spatial continuity. For example, FIG. 2 shows an image 2000 to be processed by a neural network model, where each cell is one pixel (e.g., pixel 201), and each pixel corresponds to one feature tensor to be stored. That is, the image 2000 contains 9 × 9 pixels in total, corresponding to 9 × 9 feature tensors to be stored. The corresponding 9 × 9 feature tensors in FIG. 2 are grouped according to spatial continuity. Specifically, every 3 × 3 adjacent feature tensors in the spatial region are divided into one group, as shown by the bold frames in the figure, yielding nine groups in total, each containing nine adjacent feature tensors. As shown in the right part of FIG. 2, the feature tensor group 210 in the dashed box specifically includes nine adjacent feature tensors a, b, c, d, e, f, g, h, and i.
It should be noted that, in the present invention, grouping according to spatial continuity is not limited to grouping 3 × 3 adjacent feature tensors in a square pattern; any suitable number of adjacent feature tensors may be grouped in any suitable shape, such as 5 × 5 adjacent feature tensors in a square pattern or 2 × 3 adjacent feature tensors in a rectangular pattern. It should also be noted that the grouping in the present invention is not limited to a complete grouping; it may be a partial grouping. For example, not all the feature tensors in the image 2000 need to be grouped; only some of them may be grouped. It is further noted that a tensor in the present invention can be a one-dimensional array, a two-dimensional matrix, a three-dimensional spatial matrix, or an array of higher dimensions.
Alternatively, for a segment of audio, the characteristic of the feature tensor can also be temporal continuity, i.e., data that appear consecutively over a period of time have temporal continuity.
In step S102, the similarity of the feature tensors in the group is calculated, and the similarity may be a value representing the degree of difference of the respective feature tensors in the feature tensor group 210. For example, the similarity may be a difference between a maximum value and a minimum value of all the feature tensors in the feature tensor group 210, i.e., max (a, b, c, d, e, f, g, h, i) -min (a, b, c, d, e, f, g, h, i). It should be noted that the expression of the similarity value is not limited to the difference between the maximum value and the minimum value in the group, but may be any suitable expression, such as the difference between the maximum value and the average value in the group, the difference between the average value and the minimum value in the group, and the like.
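The grouping and similarity calculation of steps S101 and S102 can be illustrated with a brief sketch. The following Python code is a minimal, non-limiting example that assumes the feature map is a NumPy array and that the similarity is expressed as the intra-group difference value (maximum minus minimum); the function names group_tensors and group_similarity are illustrative and do not appear in the embodiments.

```python
import numpy as np

def group_tensors(feature_map: np.ndarray, block: int = 3):
    """Split a 2-D feature map into non-overlapping block x block groups
    based on spatial continuity (step S101)."""
    h, w = feature_map.shape
    groups = []
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            groups.append(feature_map[i:i + block, j:j + block])
    return groups

def group_similarity(group: np.ndarray) -> float:
    """Similarity expressed as the intra-group difference value:
    max - min over all feature tensors in the group (step S102)."""
    return float(group.max() - group.min())

# Example: a 9 x 9 map grouped into nine 3 x 3 groups, as in FIG. 2.
feature_map = np.random.rand(9, 9).astype(np.float32)
groups = group_tensors(feature_map)
differences = [group_similarity(g) for g in groups]
```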
In step S103, the calculated similarity is compared with a preset similarity criterion. The similarity criterion here may be, for example, a preset threshold value. Here, the threshold may be an empirical value or may be dynamically determined. For example, an initial threshold is input as a preset threshold, the neural network model is operated based on the initial threshold and corresponding accuracy is obtained, and then the accuracy of the model is made to meet the preset accuracy by adjusting the preset threshold. If the running accuracy of the model is lower than the preset accuracy, the preset threshold value is reduced until the accuracy of the model is increased to the preset accuracy, and if the running accuracy of the model is not lower than the preset accuracy, the preset threshold value is increased until the running accuracy is less than the preset accuracy. Each adjustment may be performed according to a step factor, for example, each time the preset threshold is increased, the current threshold is multiplied by (1 + step factor), and each time the preset threshold is decreased, the current threshold is multiplied by (1-step factor). The step size coefficient may be preset to a static fixed value, or may be dynamically changed in repeated iterations. The preset accuracy may be a percentage, e.g., 95%, 90%, etc., of the initial accuracy of the neural network model before the compression method 1000 for the neural network model is not used.
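The dynamic threshold determination described above can be sketched as follows. This is an illustration only: the helper evaluate_accuracy (which runs the neural network model with a given threshold and returns its accuracy) and the default step factor of 0.1 are assumptions introduced for the example, not part of the embodiment.

```python
def tune_threshold(initial_threshold, evaluate_accuracy, target_accuracy,
                   step_factor=0.1, max_iters=50):
    """Adjust the compression threshold so that model accuracy meets the
    preset accuracy, following the step-factor rule described above."""
    threshold = initial_threshold
    if evaluate_accuracy(threshold) < target_accuracy:
        # Accuracy too low: shrink the threshold until accuracy recovers.
        for _ in range(max_iters):
            threshold *= (1 - step_factor)
            if evaluate_accuracy(threshold) >= target_accuracy:
                break
    else:
        # Accuracy acceptable: grow the threshold until accuracy would
        # fall below the target, then keep the last acceptable value.
        for _ in range(max_iters):
            candidate = threshold * (1 + step_factor)
            if evaluate_accuracy(candidate) < target_accuracy:
                break
            threshold = candidate
    return threshold
```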
In step S104, the feature tensor group is compressed when the similarity of the feature tensor group satisfies the similarity criterion, and otherwise, the feature tensor group is not compressed. For example, the set of feature tensors 210 is compressed when max (a, b, c, d, e, f, g, h, i) -min (a, b, c, d, e, f, g, h, i) is less than a threshold, and the set of feature tensors 210 is not compressed when the value is greater than or equal to the threshold.
It is noted that the compression in the present invention may be a full compression, i.e., compressing all the feature tensors within the set of feature tensors (e.g., 210) into a similar feature tensor. The similar feature tensor can be the average feature tensor of the group; however, it is not limited to the average and may be any suitable feature tensor, such as the largest or smallest feature tensor in the set of feature tensors (e.g., 210). Then, in step S1051 (not shown in FIG. 1), the fully-compressed feature tensor group is stored as the similar feature tensor plus a fully-compressed index value, and an uncompressed feature tensor group is stored as each feature tensor in the group plus an uncompressed index value. Because the feature tensors of a neural network model often contain a large number of similar values, this compression mode significantly reduces the storage required by the neural network model during training/inference.
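As a concrete illustration of step S1051, the following sketch stores one feature tensor group either fully compressed or uncompressed. The tuple layout and the index values 0/1 are assumptions for illustration; the embodiment only requires that a similar feature tensor plus an index value be stored for compressed groups, and all feature tensors plus an index value for uncompressed groups.

```python
import numpy as np

COMPRESSED, UNCOMPRESSED = 0, 1  # illustrative index values

def store_group_full(group: np.ndarray, threshold: float):
    """Store one feature tensor group: fully compressed if its difference
    value is below the threshold, otherwise uncompressed (step S1051)."""
    values = group.ravel()
    if float(values.max() - values.min()) < threshold:
        similar = float(values.mean())       # the similar feature tensor
        return (COMPRESSED, similar)         # 2 entries instead of 9 values
    return (UNCOMPRESSED, values.tolist())   # keep every feature tensor
```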
The compression in the present invention may also be partial compression, i.e., compressing only a proper subset of the set of feature tensors (e.g., 210) into similar feature tensors, rather than compressing all of the feature tensors therein. The similar feature tensor can be the largest feature tensor, the smallest feature tensor, the average feature tensor, etc., in the set of feature tensors (e.g., 210). Then, in step S1052 (not shown in fig. 1), the partially compressed set of feature tensors is stored as the similar feature tensors of the proper subset, each feature tensor in the complement of the proper subset, a position index value of each feature tensor in the complement in the set of feature tensors, and a partially compressed index value. Here, the similar feature tensors of the proper subset may be a maximum feature tensor, a minimum feature tensor, an average feature tensor, or the like of all feature tensors in the proper subset.
FIG. 3 shows the result of applying a compression method for a neural network model according to an embodiment of the present invention. The image to be processed is represented by a feature tensor table 3000, which includes 9 × 9 feature tensors to be stored. These 9 × 9 feature tensors are grouped into 3 × 3 feature tensor groups, as indicated by the bold line boxes. The similarity of the feature tensors in each of the 3 × 3 feature tensor groups is calculated and compared with a preset threshold. The feature tensor groups whose difference value is lower than the preset threshold (namely, the upper-left, middle-left, upper-right, and lower-right groups) are each compressed into a similar feature tensor x, and the remaining feature tensor groups are not compressed, finally yielding the compression result shown in the feature tensor table 3100.
In an embodiment according to the invention, the similarity criterion comprises a high threshold and a low threshold. When the difference value of the set of feature tensors (e.g., max(a, b, c, d, e, f, g, h, i) - min(a, b, c, d, e, f, g, h, i) in the embodiment of FIG. 1) is below the low threshold, the set of feature tensors is fully compressed, and the fully-compressed set is stored as the similar feature tensor and a fully-compressed index value. When the difference value of the feature tensor group is not below the low threshold but is below the high threshold, the feature tensor group is partially compressed (only a proper subset of the group is compressed), and the partially-compressed group is stored as the similar feature tensor of the proper subset, each feature tensor in the complement of the proper subset, the position index value of each such feature tensor in the group, and a partially-compressed index value. When the difference value of the feature tensor group is not below the high threshold, the feature tensor group is not compressed, and the uncompressed group is stored as each feature tensor in the group and an uncompressed index value. Through this tiered treatment of the degree of compression, the technical solution of the invention provides a more flexible compression scheme that reduces the required storage according to the degree of similarity between tensors.
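A non-limiting sketch of this tiered (two-threshold) scheme is given below. Here the proper subset is assumed to exclude the maximum and minimum feature tensors, the similar feature tensor is taken as the mean, and the index values 0/1/2 denote full, partial, and no compression respectively; all of these choices are illustrative assumptions.

```python
import numpy as np

FULL, PARTIAL, NONE = 0, 1, 2  # illustrative index values

def store_group_tiered(group: np.ndarray, low_thr: float, high_thr: float):
    """Tiered compression: full compression below the low threshold,
    partial compression between the two thresholds (the proper subset
    here excludes the maximum and minimum), no compression otherwise."""
    values = group.ravel()
    diff = float(values.max() - values.min())
    if diff < low_thr:
        # Full compression: similar feature tensor + index value.
        return (FULL, float(values.mean()))
    if diff < high_thr:
        # Partial compression: keep max/min separately with positions.
        excluded = [int(values.argmax()), int(values.argmin())]
        proper_subset = np.delete(values, excluded)
        return (PARTIAL,
                float(proper_subset.mean()),           # similar tensor of subset
                [float(values[p]) for p in excluded],  # complement values
                excluded)                              # positions in the group
    # No compression: all feature tensors + index value.
    return (NONE, values.tolist())
```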
FIG. 4 shows a computer device 4000 for the compression method for a neural network model described above, according to an embodiment of the present invention. The computer device 4000 comprises a memory 401 and a processor 402. The memory 401 has instructions stored thereon (not shown in FIG. 4). The processor 402 executes the stored instructions on the memory 401 to implement the compression method for the neural network model as described above. The processor 402 may include one or more processing devices, and the memory 401 may include one or more tangible, non-transitory machine-readable media. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of machine-executable instructions or data structures and that can be accessed by the processor 402 or another processor-based device.
A compression system 5000 for a neural network model according to an embodiment of the present invention is shown in fig. 5. The compression system 5000 includes a grouping means 510, a calculating means 520, a comparing means 530, and a compressing means 540. The grouping means 510 groups the feature tensors to be stored. Here, the grouping is based on characteristics of the feature tensor, e.g., spatial continuity, temporal continuity. The calculating means 520 calculates the similarity of the feature tensors in the set of feature tensors. The comparison means 530 compares the calculated similarity with a preset similarity criterion. The compressing means 540 compresses the set of feature tensors if the similarity of the set of feature tensors satisfies the similarity criterion, and does not compress the set of feature tensors otherwise.
The compressing means 540 may perform full compression, i.e., compress all the feature tensors in the set of feature tensors into a similar feature tensor (e.g., the average feature tensor of all the feature tensors). For a fully-compressed set of feature tensors, the compressing means 540 outputs the similar feature tensor and a fully-compressed index value (e.g., 0) for storage; for a set of feature tensors that is not compressed, the compressing means 540 outputs each feature tensor in the set and an uncompressed index value (e.g., 1) for storage. Here, the storage may take place in a storage device (not shown in FIG. 5) within the compression system 5000, or in a storage device outside the compression system 5000.
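For completeness, a counterpart sketch of how a stored record could be read back into a 3 × 3 feature tensor group is shown below; like the storage sketches above, the record layout and index values are illustrative assumptions rather than a format defined by the invention.

```python
import numpy as np

COMPRESSED, UNCOMPRESSED = 0, 1  # same illustrative index values as above

def load_group(record, block: int = 3) -> np.ndarray:
    """Rebuild a block x block feature tensor group from a stored record
    produced by the full-compression scheme sketched earlier."""
    tag = record[0]
    if tag == COMPRESSED:
        # Every position takes the single similar feature tensor.
        return np.full((block, block), record[1], dtype=np.float32)
    # Uncompressed record: restore the original values.
    return np.asarray(record[1], dtype=np.float32).reshape(block, block)
```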
FIG. 6 shows a convolution operation on the feature tensor table 610 after compression via the compression method for the neural network model according to an embodiment of the present invention. As shown in the upper left portion of FIG. 6, the feature tensors in the middle box are compressed into a similar feature tensor m, while the remaining feature tensors are uncompressed. The feature tensor table is convolved with the convolution kernel 620 to obtain the result 630, as shown in the following formulas.
w=1*a+2*b+3*c+4*e+5*m+6*m+7*g+8*m+9*m
=1*a+2*b+3*c+4*e+7*g+(5+6+8+9)*m
x=1*b+2*c+3*d+4*m+5*m+6*f+7*m+8*m+9*h
=1*b+2*c+3*d+6*f+9*h+(4+5+7+8)*m
y=1*e+2*m+3*m+4*g+5*m+6*m+7*i+8*j+9*k
=1*e+4*g+7*i+8*j+9*k+(2+3+5+6)*m
z=1*m+2*m+3*f+4*m+5*m+6*h+7*j+8*k+9*l
=3*f+6*h+7*j+8*k+9*l+(1+2+4+5)*m
Due to the compression, the computation of the result 630 contains a large number of multiplications that share the same multiplicand; for example, each of w, x, y, and z includes four multiplication terms with the multiplier m. Therefore, as shown in the formulas above, the terms involving m can be added first and multiplied by m afterwards (by the distributive law), so that part of the multiplication operations are eliminated and the power consumption required by the whole convolution operation is greatly reduced.
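The arithmetic saving can be checked numerically with the small sketch below, which evaluates w both directly and in the factored form; the numeric values chosen for a, b, c, e, g, and m are arbitrary examples.

```python
# w = 1*a + 2*b + 3*c + 4*e + 5*m + 6*m + 7*g + 8*m + 9*m  (from FIG. 6)
a, b, c, e, g, m = 0.2, 0.3, 0.1, 0.4, 0.5, 0.25   # arbitrary example values

# Direct computation: 9 multiplications.
w_direct = 1*a + 2*b + 3*c + 4*e + 5*m + 6*m + 7*g + 8*m + 9*m

# Factored computation: the weights of the positions sharing m are summed
# first (distributive law), so the four m-multiplications collapse into one,
# leaving 6 multiplications in total.
w_factored = 1*a + 2*b + 3*c + 4*e + 7*g + (5 + 6 + 8 + 9) * m

assert abs(w_direct - w_factored) < 1e-9
```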
It is noted that some of the block diagrams shown in the figures are only intended to schematically represent functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
It should also be understood that, in some alternative embodiments, the functions/steps included in the methods may occur out of the order shown in the flowcharts. For example, two functions/steps shown in succession may be executed substantially concurrently or even in the reverse order, depending on the functions/steps involved.
Although only a few embodiments of the present invention have been described in detail above, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (17)
1. A compression method for a neural network model, comprising:
grouping feature tensors to be stored, the grouping based on a first characteristic of the feature tensors;
calculating the similarity of the feature tensors in the feature tensor group;
comparing the calculated similarity with a preset similarity standard; and
compressing the set of feature tensors if the similarity of the set of feature tensors satisfies the similarity criterion, otherwise not compressing the set of feature tensors.
2. The compression method for a neural network model of claim 1,
the similarity includes a difference value,
the preset similarity criterion includes a first threshold value,
compressing the set of feature tensors if the difference values of the set of feature tensors are below the first threshold, otherwise not compressing the set of feature tensors.
3. The compression method for a neural network model of claim 1,
the compressing includes a full compression in which all of the feature tensors of the set of feature tensors are compressed into similar feature tensors of the set of feature tensors.
4. The compression method for a neural network model of claim 3,
storing the fully-compressed set of feature tensors as the similar feature tensor and a first index value of the set of feature tensors, and
Storing the set of feature tensors without compression as each feature tensor in the set of feature tensors and a second index value.
5. The compression method for a neural network model of claim 4,
the compressing further includes a partial compression in which a proper subset of the set of feature tensors is compressed into similar feature tensors of the proper subset, while the remaining feature tensors are not compressed.
6. The compression method for a neural network model of claim 5,
storing the set of feature tensors that have undergone the partial compression as the similar feature tensors of the proper subset, each feature tensor in a complement of the proper subset, a position index value of each feature tensor in the complement in the set of feature tensors, and a third index value.
7. The compression method for a neural network model of claim 5,
the proper subset of the set of eigentensors includes all but the largest and/or smallest eigentensors of the set of eigentensors.
8. The compression method for a neural network model of claim 6,
the similarity includes a difference value,
the preset similarity criterion includes a second threshold and a third threshold, and the second threshold is smaller than the third threshold,
the set of feature tensors is fully compressed if the difference values of the set of feature tensors are below the second threshold, the set of feature tensors is partially compressed if the difference values of the set of feature tensors are not below the second threshold but below the third threshold, and the set of feature tensors is not compressed otherwise.
9. The compression method for a neural network model of claim 3,
the similar feature tensor is a maximum feature tensor, a minimum feature tensor, and/or an average feature tensor.
10. The compression method for a neural network model of claim 2,
the difference values include a difference of a largest feature tensor and a smallest feature tensor of the set of feature tensors, a difference of the largest feature tensor and an average feature tensor, and/or a difference of the average feature tensor and the smallest feature tensor.
11. The compression method for a neural network model of claim 1, further comprising,
inputting an initial threshold as a first threshold;
obtaining an accuracy of the neural network model based on the initial threshold; and
adjusting the first threshold value so that the precision is a preset precision:
if the accuracy of the neural network model is lower than the preset accuracy, decreasing the first threshold until the accuracy of the neural network model increases to the preset accuracy, and
if the accuracy of the neural network model is equal to/higher than the preset accuracy, increasing the first threshold until the accuracy of the neural network model starts to be less than the preset accuracy.
12. A compression method for a neural network model as claimed in claim 1, wherein the feature tensor to be stored is obtained from:
the inputs/outputs of convolutional, activation, and/or pooling layers in the inference process of the neural network model,
the inputs/outputs of convolutional, activation, and/or pooling layers in forward propagation during training of the neural network model, and/or
the outputs of a back-propagated feature error layer during training of the neural network model.
13. The compression method for a neural network model of claim 1,
the feature tensor comprises an image feature tensor/an audio feature tensor.
14. The compression method for a neural network model of claim 1,
the first characteristic includes temporal continuity and/or spatial continuity.
15. A computing device, comprising:
a memory arranged to store instructions; and
a processor arranged, when the instructions are executed, to implement a compression method for a neural network model as claimed in any one of claims 1-14.
16. A compression system for a neural network model, the system comprising:
grouping means for grouping feature tensors to be stored, the grouping being based on a first characteristic of the feature tensors;
calculating means for calculating a similarity of the feature tensors in the set of feature tensors;
a comparison means for comparing the calculated similarity with a preset similarity standard; and
compressing means for compressing the set of feature tensors if the similarity of the set of feature tensors satisfies the similarity criterion, and not compressing the set of feature tensors otherwise.
17. A computer-readable storage medium storing instructions that, when executed, implement a compression method for a neural network model as claimed in any one of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110414843.6A CN115238881A (en) | 2021-04-17 | 2021-04-17 | Compression method, device and system for neural network model and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110414843.6A CN115238881A (en) | 2021-04-17 | 2021-04-17 | Compression method, device and system for neural network model and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115238881A true CN115238881A (en) | 2022-10-25 |
Family
ID=83666102
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110414843.6A Pending CN115238881A (en) | 2021-04-17 | 2021-04-17 | Compression method, device and system for neural network model and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115238881A (en) |
- 2021-04-17: CN application CN202110414843.6A filed; published as CN115238881A (status: Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107622302B (en) | Superpixel method for convolutional neural network | |
CN109271933B (en) | Method for estimating three-dimensional human body posture based on video stream | |
CN107340993B (en) | Arithmetic device and method | |
Cao et al. | Recovering low-rank and sparse matrix based on the truncated nuclear norm | |
US20200210843A1 (en) | Training and application method of a multi-layer neural network model, apparatus and storage medium | |
JPWO2019168084A1 (en) | Inference device, convolution operation execution method and program | |
CN107967516A (en) | A kind of acceleration of neutral net based on trace norm constraint and compression method | |
CN111445418A (en) | Image defogging method and device and computer equipment | |
CN109389667B (en) | High-efficiency global illumination drawing method based on deep learning | |
US20220036167A1 (en) | Sorting method, operation method and operation apparatus for convolutional neural network | |
US20220036189A1 (en) | Methods, systems, and media for random semi-structured row-wise pruning in neural networks | |
US12080086B2 (en) | Sparse-aware convolution and techniques for acceleration thereof | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization | |
CN114978189A (en) | Data coding method and related equipment | |
CN112906874A (en) | Convolutional neural network characteristic graph data compression method and device | |
CN118172290B (en) | Multi-stage adaptive CNN and hybrid transducer-based Thangka image restoration method, system and storage medium | |
Kekre et al. | Vector quantized codebook optimization using k-means | |
Kumar et al. | Image Deconvolution using Deep Learning-based Adam Optimizer | |
CN114549300A (en) | Image dictionary generation method, image reconstruction method and related device | |
Park et al. | Squantizer: Simultaneous learning for both sparse and low-precision neural networks | |
EP3637323A1 (en) | Information processing device and information processing method | |
US20200372363A1 (en) | Method of Training Artificial Neural Network Using Sparse Connectivity Learning | |
CN115238881A (en) | Compression method, device and system for neural network model and storage medium | |
US12094084B2 (en) | Multi-channel feature map fusion | |
CN115564043A (en) | Image classification model pruning method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||