CN108510083B - Neural network model compression method and device - Google Patents


Info

Publication number
CN108510083B
CN108510083B (application CN201810274146.3A)
Authority
CN
China
Prior art keywords
neural network
network model
compressed
feature vector
trained
Prior art date
Legal status
Active
Application number
CN201810274146.3A
Other languages
Chinese (zh)
Other versions
CN108510083A (en)
Inventor
孙源良
王亚松
刘萌
樊雨茂
Current Assignee
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd filed Critical Guoxin Youe Data Co Ltd
Priority to CN201810274146.3A
Publication of CN108510083A
Application granted
Publication of CN108510083B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention provides a neural network model compression method and device. The method comprises the following steps: inputting training data into a neural network model to be compressed and into a target neural network model; and training the target neural network model based on the feature vectors and classification results that the neural network model to be compressed extracts from the training data, so as to obtain a compressed neural network model, wherein the target neural network model has fewer parameters than the neural network model to be compressed. According to the embodiments of the invention, the training of the target neural network model is guided by the feature vectors and classification results produced on the training data by the neural network model to be compressed, so that the compressed neural network model finally obtained yields the same classification results on the same training data as the neural network model to be compressed. Precision loss during model compression is thereby avoided: the size of the model can be compressed while its precision is preserved, meeting the dual requirements on precision and model size.

Description

Neural network model compression method and device
Technical Field
The invention relates to the technical field of machine learning, in particular to a neural network model compression method and device.
Background
With the rapid development of neural networks in fields such as images, speech and text, a series of intelligent products have been brought to market. To let a neural network better learn the characteristics of training data and thereby improve model performance, the number of parameters used to represent the neural network model has grown rapidly and the number of network layers keeps increasing, so deep neural network models suffer from very large parameter counts and heavy computation during both training and application. As a result, most neural-network-based products depend on server-side computing power and on a good operating environment and network environment, which limits the application range of neural network models; embedded deployment, for example, cannot be realized. To enable embedded application of a neural network model, the volume of the model needs to be compressed below a certain range.
Current model compression methods generally include the following. First, pruning: after a large model is trained, parameters with small weights are removed from the network model, and the model is then trained further. Second, weight sharing, which reduces the number of distinct parameters. Third, quantization: parameters of a neural network model are usually represented as 32-bit floating-point numbers, but such high precision does not really need to be retained, so quantization reduces the space occupied by each weight, for example by representing values originally expressed with 32-bit precision on the range 0 to 255, at the cost of some precision. Fourth, binarization of the neural network, in which the parameters of the network model are represented with binary numbers so as to reduce the model size.
However, the above methods perform compression directly on the model to be compressed and do so at the expense of model precision, which often fails to meet the requirements on precision.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a method and an apparatus for compressing a neural network model, which can compress the size of the model while ensuring the accuracy of the neural network model.
In a first aspect, an embodiment of the present invention provides a neural network model compression method, where the method includes:
inputting training data into a neural network model to be compressed and a target neural network model;
training a target neural network model based on the feature vectors and classification results extracted from the training data by the neural network model to be compressed to obtain a compressed neural network model;
wherein the number of the target neural network model parameters is less than the number of the neural network model parameters to be compressed.
In a second aspect, an embodiment of the present invention further provides a neural network model compression apparatus, where the apparatus includes:
the input module is used for inputting the training data into the neural network model to be compressed and the target neural network model;
the training module is used for training a target neural network model based on the feature vectors and the classification results of the training data extracted by the neural network model to be compressed to obtain a compressed neural network model;
wherein the number of the target neural network model parameters is less than the number of the neural network model parameters to be compressed.
With the neural network model compression method and device provided by the embodiments of the present application, when a neural network model to be compressed is to be compressed, a target neural network model with fewer parameters than the neural network model to be compressed is constructed in advance; the training data are then input into both the neural network model to be compressed and the target neural network model, and the training of the target neural network model is guided by the feature vectors and classification results that the neural network model to be compressed extracts from the training data, so as to obtain the compressed neural network model. The compressed neural network model finally obtained produces the same classification results on the same training data as the neural network model to be compressed, so no precision is lost during compression; the size of the model can therefore be compressed while precision is guaranteed, meeting the dual requirements on precision and model size.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating a neural network model compression method according to an embodiment of the present application;
fig. 2 is a flowchart illustrating a specific method for training a target neural network model based on a classification result of a neural network model to be compressed on training data according to a second embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a model compression process provided in the second embodiment of the present application;
fig. 4 is a flowchart illustrating a first comparison operation provided in the third embodiment of the present application;
fig. 5 is a flowchart of a specific method for performing similarity matching on the first feature vector and the second feature vector and performing the current round of training on the target neural network according to the result of the similarity matching, which is further provided in the fourth embodiment of the present application;
fig. 6 is a flowchart illustrating a similarity determination operation according to the fourth embodiment of the present application;
fig. 7 is a flowchart illustrating another specific method for performing similarity matching on the first feature vector and the second feature vector and performing the current round of training on the target neural network according to the result of the similarity matching according to the fifth embodiment of the present application;
fig. 8 is a flowchart illustrating a similarity determination operation according to a fifth embodiment of the present application;
FIG. 9 is a flowchart illustrating a neural network model compression method according to a sixth embodiment of the present disclosure;
fig. 10 is a schematic structural diagram illustrating a neural network model compression apparatus provided in a seventh embodiment of the present application;
fig. 11 shows a schematic structural diagram of a computer device according to an eighth embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
For the understanding of the present embodiment, a detailed description will be given to a neural network model compression method disclosed in the embodiment of the present invention, which can be used for compressing the sizes of various neural network models.
Referring to fig. 1, a neural network model compression method provided in an embodiment of the present application includes:
s101: and inputting the training data into the neural network model to be compressed and the target neural network model.
When the method is specifically implemented, the neural network model to be compressed is a neural network model with a large volume, is trained by training data, and is composed of a single neural network or a combination of a plurality of neural networks. It has a larger number of parameters than the target neural network model. The parameters herein may include the number of feature extraction layers of the neural network and/or parameters involved in each feature extraction layer.
Therefore, in order to compress the neural network model to be compressed, the training data first need to be input into it so that it learns the characteristics of the training data; the trained model obtained in this way is then taken as the neural network model to be compressed.
The target neural network model is a pre-configured neural network model with fewer parameters than the neural network model to be compressed; for example, it has fewer feature extraction layers and a simpler network structure.
Here, it should be noted that, if the neural network model to be compressed is obtained by using an unsupervised training method, the training data is unlabeled; if the neural network model to be compressed is obtained by using a supervised training method, the training data are labeled; if the neural network model to be compressed is obtained by using a transfer learning training method, the training data can be labeled or unlabeled.
S102: training a target neural network model based on the feature vectors and classification results extracted from the training data by the neural network model to be compressed to obtain a compressed neural network model;
In a specific implementation, the training data are input into the neural network model to be compressed and into the target neural network model, and the classification results of the model to be compressed on the training data are used to guide the training of the target neural network model; during this training, the target model's classification results on the training data are brought as close as possible to those of the neural network model to be compressed.
In the neural network model compression method provided by the embodiment of the application, when the neural network model to be compressed is compressed, a target neural network with fewer parameters than the neural network model to be compressed is constructed in advance; the training data are then input into the neural network model to be compressed and into the target neural network model, and the target neural network model is trained under the guidance of the feature vectors and classification results extracted from the training data by the neural network model to be compressed, yielding the compressed neural network model. This process does not operate on the neural network model to be compressed itself, and the compressed neural network model finally obtained produces the same classification results on the same training data as the neural network model to be compressed, so no precision is lost during model compression; the size of the model is compressed and the dual requirements on precision and model size are met.
Specifically, the network model to be compressed generally includes a neural network to be compressed and a classifier to be compressed. The target neural network model generally includes a target neural network and a target classifier. The compressed neural network model obtained by training includes a compressed neural network and a compressed classifier.
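For illustration, the sketch below sets up the two models in this decomposed form in PyTorch. The architectures, layer widths and class count are placeholders chosen for the example and are not specified by this application; the only property the sketch relies on is that the model to be compressed has more feature extraction layers and parameters than the target model.

```python
import torch.nn as nn

class FeatureClassifierModel(nn.Module):
    """A model consisting of a feature-extraction network followed by a classifier."""
    def __init__(self, in_dim, hidden_dims, feat_dim, num_classes):
        super().__init__()
        layers, prev = [], in_dim
        for width in hidden_dims:                       # feature extraction layers
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, feat_dim))
        self.feature_extractor = nn.Sequential(*layers)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        features = self.feature_extractor(x)            # feature vector
        logits = self.classifier(features)              # classification result (logits)
        return features, logits

# Model to be compressed: more feature extraction layers and parameters (sizes are illustrative only).
model_to_compress = FeatureClassifierModel(784, [1024, 1024, 512, 512], 256, 10)
# Target model: pre-built with fewer layers and fewer parameters.
target_model = FeatureClassifierModel(784, [256], 256, 10)
```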
Referring to fig. 2, a second embodiment of the present application further provides a specific method for training a target neural network model based on a classification result of a to-be-compressed neural network model on training data, including:
s201: and extracting a first feature vector for input training data by using the neural network to be compressed, and extracting a second feature vector for the input training data by using the target neural network.
S202: performing similarity matching on the first feature vector and the second feature vector, and performing the training of the target neural network according to the result of the similarity matching;
s203: inputting the first feature vector to a classifier to be compressed to obtain a first classification result;
inputting the second feature vector into a target classifier to obtain a second classification result;
s204: performing the training of the target neural network and the target classifier in the current round according to the comparison result of the first classification result and the second classification result;
s205: and performing multi-round training on the target neural network and the target classifier to obtain a compressed neural network model.
In specific implementation, referring to a schematic diagram of a model compression process shown in fig. 3, for convenience of describing the embodiment of the present application, two functional modules, a similarity matching module and a comparison module are introduced in the embodiment. The similarity matching module is used for performing similarity matching on the first feature vector and the second feature vector; the comparison module is used for comparing the first classification result with the second classification result.
The training data is input to the neural network model to be compressed and the target neural network model. After the training data are input into the neural network model to be compressed, two processes are executed, firstly, the neural network to be compressed performs feature extraction on the training data to obtain a first feature vector of the training data; and then transmitting the first feature vector to a classifier to be compressed, and classifying the training data represented by the first feature vector by the classifier to be compressed based on the first feature vector to obtain a first classification result.
Similarly, after the training data are input into the target neural network model, two processes are also executed, firstly, the target neural network performs feature extraction on the training data to obtain a second feature vector of the training data; and then transmitting the second feature vector to a target classifier, and classifying the training data represented by the second feature vector by the target classifier based on the second feature vector to obtain a second classification result.
The process of compressing the neural network model to be compressed is in fact a process of guiding the training of the target neural network model by means of the neural network model to be compressed, so that the compressed neural network model obtained by training gives classification results consistent with those of the neural network model to be compressed on the same training data. That is, when the neural network to be compressed and the compressed neural network extract features from the same training data, the resulting feature vectors should be as similar as possible; meanwhile, when the classifier to be compressed and the compressed classifier classify the training data based on these closely matched feature vectors, the classification results should be consistent. Thus, when the target neural network model is trained, both the target neural network and the target classifier are trained.
In the training process, the parameters of the target neural network are influenced by the similarity matching result of the first feature vector and the second feature vector, and the parameters of the target neural network are adjusted according to the similarity matching result. The first feature vector and the second feature vector are difficult to be consistent due to different parameters in the target neural network and the neural network to be compressed. Therefore, the target neural network needs to approach the second feature vector extracted from the training data to the first feature vector as much as possible; meanwhile, the parameters of the target neural network are also influenced by a second classification result of the training data classified by the target classifier on the second feature vector, and when the second classification result is inconsistent with the first classification result, the parameters of the target neural network are adjusted, so that the second classification result obtained by the target classifier is consistent with the first classification result.
When the first classification result is inconsistent with the second classification result, the parameters of the target classifier are also adjusted so that the second classification result becomes consistent with the first classification result.
Then, after the training data are input into the neural network model to be compressed and the target neural network model, the neural network to be compressed first extracts a first feature vector from the training data and the target neural network extracts a second feature vector from the same input training data; the first feature vector and the second feature vector of the same training data are then passed to the similarity matching module, which performs similarity matching on them, and the target neural network undergoes the current round of training according to the similarity matching result. Meanwhile, the first feature vector is input to the classifier to be compressed to obtain a first classification result, the second feature vector is input to the target classifier to obtain a second classification result, the two classification results are passed to the comparison module, which compares them, and the target neural network and the target classifier undergo the current round of training according to the comparison result.
And performing multi-round training on the target neural network and the target classifier to obtain a compressed neural network model.
It should be noted that the current round of training refers to training the target neural network model by using the same training data until the second feature vector obtained by feature extraction of the training data by the target neural network and the second classification result obtained by classification both meet preset conditions; the multi-round training refers to training a target neural network by using a plurality of training data, and each training data performs one round of training on the target neural network.
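As an illustration of one round followed by multi-round training, the sketch below combines the two guiding signals into a single loss: a similarity-matching term between the feature vectors and a consistency term between the classification results. The use of mean-squared error and cross-entropy, the thresholds and the step counts are assumptions made for this example; the application itself only requires that the second feature vector and second classification result approach those of the model to be compressed, and it describes the two signals through separate modules rather than one summed loss.

```python
import torch
import torch.nn.functional as F

def train_one_round(x, model_to_compress, target_model, optimizer,
                    sim_threshold=1e-3, max_steps=100):
    """One round: repeatedly adjust the target model on the same training data until its
    feature vector and classification result are close enough to those of the model
    to be compressed."""
    model_to_compress.eval()
    with torch.no_grad():
        first_feat, first_logits = model_to_compress(x)        # first feature vector / classification result
        first_labels = first_logits.argmax(dim=1)

    for _ in range(max_steps):
        second_feat, second_logits = target_model(x)             # second feature vector / classification result
        sim_loss = F.mse_loss(second_feat, first_feat)           # similarity-matching term (assumed form)
        cls_loss = F.cross_entropy(second_logits, first_labels)  # classification-consistency term (assumed form)
        loss = sim_loss + cls_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        consistent = bool((second_logits.argmax(dim=1) == first_labels).all())
        if consistent and sim_loss.item() < sim_threshold:
            break                                                # preset conditions met: round finished
    return loss.item()

def train_multi_round(training_batches, model_to_compress, target_model):
    """Multi-round training: each batch of training data drives one round."""
    optimizer = torch.optim.SGD(target_model.parameters(), lr=0.01)
    for x in training_batches:
        train_one_round(x, model_to_compress, target_model, optimizer)
    return target_model   # the compressed neural network model
```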
Specifically, a third embodiment of the present application further provides a specific method for performing the current round of training of the target neural network and the target classifier according to the comparison result between the first classification result and the second classification result, namely: executing the following first comparison operation until the classification loss of the target neural network model falls within the preset loss range, at which point the current round of training of the target neural network and the target classifier is complete.
Referring to fig. 4, the first comparison operation includes:
s401: comparing whether the first classification result is consistent with the second classification result; if yes, jumping to S402; if not, it jumps to S403.
S402: completing the current round of training of the target neural network and the target classifier; the process is ended.
S403: generating first feedback information, and adjusting parameters of the target neural network and the target classifier based on the first feedback information;
s404: based on the adjusted parameters, a new second classification result is determined for the training data using the target neural network and the target classifier, and S401 is performed again.
In a specific implementation, the accuracy of the compressed neural network model obtained after multiple rounds of training of the target neural network model must be ensured, that is, the classification results of the compressed neural network model and of the neural network model to be compressed on the same training data must be consistent. Therefore, the comparison module compares the first classification result with the second classification result. When they are inconsistent, first feedback information is generated, and the parameters of the target neural network and the target classifier are adjusted based on this first feedback information; a new second classification result is then determined for the training data using the target neural network and target classifier with the adjusted parameters, the first comparison operation is performed again with the first classification result and the new second classification result, and the process is repeated until the first classification result and the second classification result are consistent.
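A minimal sketch of the first comparison operation alone, under the same assumptions as above: the "first feedback information" is approximated by the gradient of a cross-entropy loss between the target model's output and the first classification result, which is only one possible realization that the application leaves open.

```python
import torch
import torch.nn.functional as F

def first_comparison_operation(x, first_labels, target_model, optimizer, max_iters=200):
    """Compare the second classification result with the first; while they disagree,
    adjust the target neural network and target classifier and classify again."""
    for _ in range(max_iters):
        _, second_logits = target_model(x)
        if torch.equal(second_logits.argmax(dim=1), first_labels):
            return True                                      # results consistent: round complete
        loss = F.cross_entropy(second_logits, first_labels)  # stand-in for the first feedback information
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return False
```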
In addition, referring to fig. 5, a fourth embodiment of the present application further provides a specific method for performing similarity matching on the first feature vector and the second feature vector and performing the current round of training on the target neural network according to the result of the similarity matching, including:
S501: clustering the first feature vector and the second feature vector respectively;
S502: generating a first adjacency matrix according to the result of clustering the first feature vector;
S503: generating a second adjacency matrix according to the result of clustering the second feature vector;
S504: performing the current round of training on the parameters of the target neural network according to the similarity between the first adjacency matrix and the second adjacency matrix.
In a specific implementation, the first feature vectors can be regarded as points mapped into a high-dimensional space. The points are clustered according to the distances between them, points whose mutual distance is within a preset threshold being assigned to the same class, and a first adjacency matrix describing the point-to-point relations is then formed according to the clustering result.
In the first adjacency matrix, if two points belong to the same class during clustering, the distance between the two points is 1; if two points do not belong to the same class at the time of clustering, the distance between the two points is 0.
For example, suppose there are 5 training data, and the obtained first feature vectors are numbered 1, 2, 3, 4 and 5. If the result of clustering the first feature vectors is {1,3}, {2} and {4,5}, the adjacency matrix formed is:
1 0 1 0 0
0 1 0 0 0
1 0 1 0 0
0 0 0 1 1
0 0 0 1 1
The second adjacency matrix is formed according to the result of clustering the second feature vectors in the same way, and is therefore not described again.
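The construction of such an adjacency matrix from a clustering result can be sketched as follows; the cluster labels reproduce the {1,3}, {2}, {4,5} example above, and the convention that a point is in the same class as itself (unit diagonal) is an assumption of this illustration.

```python
import numpy as np

def adjacency_from_clusters(cluster_ids):
    """Entry (i, j) is 1 if samples i and j were clustered into the same class, else 0."""
    ids = np.asarray(cluster_ids)
    return (ids[:, None] == ids[None, :]).astype(int)

# Five first feature vectors 1..5 clustered as {1, 3}, {2}, {4, 5}:
cluster_ids = [0, 1, 0, 2, 2]          # samples 1 and 3 -> cluster 0; 2 -> 1; 4 and 5 -> 2
print(adjacency_from_clusters(cluster_ids))
# [[1 0 1 0 0]
#  [0 1 0 0 0]
#  [1 0 1 0 0]
#  [0 0 0 1 1]
#  [0 0 0 1 1]]
```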
The fourth embodiment of the present application further provides a method for performing the current round of training on the parameters of the target neural network according to the similarity between the first adjacency matrix and the second adjacency matrix, where the method includes: performing the following similarity determination operation until the similarity value between the first adjacency matrix and the second adjacency matrix is smaller than a preset first similarity threshold, at which point the current round of training of the target neural network is complete;
referring to fig. 6, the similarity determination operation includes:
S601: comparing whether the similarity value between the first adjacency matrix and the second adjacency matrix is smaller than the preset first similarity threshold; if so, go to S602; if not, go to S603.
Here, in a specific implementation, the similarity between the currently obtained first adjacency matrix and second adjacency matrix is calculated from the traces of the two matrices: the closer the trace of the first adjacency matrix is to the trace of the second adjacency matrix, the higher the similarity between them. The difference between the trace of the first adjacency matrix and the trace of the second adjacency matrix may be taken as the similarity value, so that the greater the absolute value of this difference, the lower the similarity between the first adjacency matrix and the second adjacency matrix.
S602: and finishing the current round of training of the target neural network. The process is ended.
S603: generating first feedback information, and adjusting parameters of the target neural network based on the first feedback information;
s604: extracting a new second feature vector for the training data by using the target neural network based on the adjusted parameters; clustering the new second feature vector to generate a new second adjacency matrix, and performing S601 again.
In a specific implementation, the higher the similarity between the first adjacency matrix and the second adjacency matrix, the more similar the clustering structure of the first feature vectors characterized by the first adjacency matrix is to that of the second feature vectors characterized by the second adjacency matrix. The parameters of the target neural network are therefore adjusted according to the similarity between the two adjacency matrices, so that the second feature vector obtained when the target neural network extracts features from the training data comes closer to the first feature vector obtained when the neural network to be compressed extracts features from the same training data.
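The sketch below illustrates one reading of this step. Agglomerative clustering with a distance threshold stands in for the distance-based clustering of S501, and the similarity measure follows the literal wording above (the absolute difference between the traces of the two adjacency matrices). Note that if both matrices carry a unit diagonal their traces coincide, so a practical implementation might instead compare the matrices entrywise; the alternative adjacency_mismatch is such a stand-in and is not the formula named in the text.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def threshold_cluster(features, distance_threshold=1.0):
    """Cluster feature vectors so that points within the preset distance fall into one class (S501)."""
    model = AgglomerativeClustering(n_clusters=None, distance_threshold=distance_threshold)
    return model.fit_predict(features)

def trace_similarity(a1, a2):
    """Literal reading of the text: |trace(A1) - trace(A2)|; a larger value means lower similarity."""
    return abs(np.trace(a1) - np.trace(a2))

def adjacency_mismatch(a1, a2):
    """Alternative measure (not the wording above): number of entries on which the matrices disagree."""
    return int((a1 != a2).sum())
```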
In addition, referring to fig. 7, a fifth embodiment of the present application further provides another specific method for performing similarity matching on the first feature vector and the second feature vector and performing the current round of training on the target neural network according to the result of the similarity matching, including:
S701: performing a dimension reduction operation on the first feature vector and the second feature vector respectively, to obtain a first dimension-reduced feature vector from the first feature vector and a second dimension-reduced feature vector from the second feature vector.
In a specific implementation, the dimension reduction operation re-encodes the first feature vector and the second feature vector to obtain the first dimension-reduced feature vector and the second dimension-reduced feature vector, for example by using a fully connected layer to capture features from the first feature vector and the second feature vector again.
S702: and calculating the similarity of the first dimension-reduced feature vector and the second dimension-reduced feature vector.
Here, when calculating the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector, the difference between the two vectors may be calculated and taken as the similarity result; element-by-element subtraction may be performed directly between the first dimension-reduced feature vector and the second dimension-reduced feature vector, with the result of the subtraction taken as the similarity result; or the first dimension-reduced feature vector and the second dimension-reduced feature vector may be regarded as points projected into a corresponding space, and the difference between the point distributions calculated. For example, projecting the first dimension-reduced feature vector and the second dimension-reduced feature vector into the corresponding space gives the points S(X1, Y1, Z1) and M(X2, Y2, Z2), and the distance between the two points, L = (X1 − X2)² + (Y1 − Y2)² + (Z1 − Z2)², is taken as their similarity; the smaller the distance, the greater the similarity.
S703: and performing the training of the parameters of the target neural network in the current round according to the similarity of the first dimension-reducing feature vector and the second dimension-reducing feature vector.
Here, training the parameters of the target neural network according to the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector actually means ensuring that this similarity falls within the preset second similarity threshold. Specifically, the following similarity determination operation can be performed until the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector is smaller than the preset second similarity threshold, at which point the current round of training of the target neural network is complete.
Referring to fig. 8, the similarity determination operation includes:
S801: comparing whether the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector is smaller than the preset second similarity threshold; if yes, executing S802; if not, executing S803.
Here, the method for calculating the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector may be referred to the description of S702, and is not repeated herein.
S802: and finishing the current round of training of the target neural network. The process is ended.
S803: and generating second feedback information, and adjusting parameters of the target neural network based on the second feedback information.
S804: based on the adjusted parameters, a new second feature vector is extracted for the training data using the target neural network. And performing dimensionality reduction operation on the new second feature vector to generate a new second dimensionality reduction feature vector, and performing S801 again.
Specifically, to ensure that the first feature vector and the second feature vector are as close as possible, the similarity between them must be kept below a certain threshold, that is, the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector must be smaller than the preset second similarity threshold. When the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector is not smaller than the preset second similarity threshold, second feedback information is generated accordingly, and the parameters of the target neural network are adjusted based on this second feedback information, so that when the target neural network extracts the second feature vector from the training data again, it changes in the direction of increasing the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector. A new second feature vector is then extracted from the training data using the target neural network with the adjusted parameters, the dimension reduction operation is performed again on the new second feature vector to generate a new second dimension-reduced feature vector, and the similarity determination operation is performed again, until the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector is smaller than the preset second similarity threshold.
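A sketch of this variant under the same assumptions as the earlier snippets: a single fully connected layer re-encodes both feature vectors (S701), the squared distance above serves as the similarity value (S702), and the "second feedback information" of S803 is approximated by a gradient step on that distance, which the application does not spell out.

```python
import torch
import torch.nn as nn

class DimensionReducer(nn.Module):
    """Re-encode a feature vector with a fully connected layer (one possible S701)."""
    def __init__(self, feat_dim=256, reduced_dim=64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, reduced_dim)

    def forward(self, features):
        return self.fc(features)

def squared_distance(u, v):
    """L = (X1 - X2)^2 + (Y1 - Y2)^2 + ...; a smaller distance means greater similarity."""
    return ((u - v) ** 2).sum(dim=-1)

def reduced_similarity_round(x, first_feat, target_model, reducer, optimizer,
                             second_sim_threshold=0.1, max_steps=100):
    """S801-S804: adjust the target network until the reduced feature vectors are close enough.
    The optimizer is assumed to hold only the target model's parameters."""
    first_reduced = reducer(first_feat).detach()
    for _ in range(max_steps):
        second_feat, _ = target_model(x)
        dist = squared_distance(first_reduced, reducer(second_feat)).mean()
        if dist.item() < second_sim_threshold:
            break                        # similarity criterion met (S802): round complete
        optimizer.zero_grad()            # "second feedback information" approximated by the gradient (S803)
        dist.backward()
        optimizer.step()
```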
With the compressed neural network model obtained in the first embodiment of the application, the accuracy of the compressed neural network model can be ensured to be consistent with that of the neural network model to be compressed. However, for a model to be compressed that was obtained by unsupervised learning or by a transfer learning training method, if the neural network model to be compressed misclassifies certain training data, the compressed neural network model will, to a certain extent, also misclassify those training data. The sixth embodiment of the application therefore provides another neural network model compression method, which can further improve the precision of the compressed neural network model.
Referring to fig. 9, before performing similarity matching on the first feature vector and the second feature vector, the neural network model compression method according to the sixth embodiment of the present application further includes:
s901: a noise addition operation is performed on the first feature vector.
In a specific implementation, noise is added to the first feature vector in order to increase the generalization capability of the compressed neural network model obtained by training; generalization capability refers to the ability of a machine learning algorithm to adapt to fresh samples. When the noise addition operation is performed on the first feature vector, noise of different degrees or of different types may be added to the first feature vector multiple times. Each addition of noise generates a noise-added first feature vector, and each noise-added first feature vector deviates to some degree from the original first feature vector, so that one training datum yields a plurality of deviated first feature vectors. This enriches the data volume of first feature vectors, so that fewer input training data are needed for the same amount of first-feature-vector data and the data can be fitted better. In addition, the neural network model to be compressed is not necessarily accurate in classifying some training data, so the variation introduced into the first feature vector may make the noise-added first feature vectors closer to reality, giving better guidance for the training of the target neural network model.
When noise is added to the first feature vector, a noise vector with the same dimension as the first feature vector is generally constructed, and the noise is added by summing the first feature vector and the noise vector position by position.
When constructing a noise vector with the same dimension as the first feature vector, the noise vector may be constructed directly or indirectly. Direct construction means directly generating a noise vector with the same dimension as the first feature vector; for example, when the dimension of the first feature vector is 1 × 1000, the constructed noise vector is also 1 × 1000. Indirect construction means generating an intermediate noise vector with a dimension lower than that of the first feature vector and then producing a noise vector with the same dimension as the first feature vector by filling the intermediate noise vector with zeros; for example, when the dimension of the first feature vector is 1 × 1000, the constructed intermediate noise vector is 1 × 500, and zeros are filled at arbitrary positions of the intermediate noise vector to finally form a noise vector with a dimension of 1 × 1000.
In addition, because noise can be added to the first feature vector multiple times and to different degrees, noise of different degrees can be obtained by changing the parameters of the noise generation algorithm, or, when the noise vector is constructed indirectly, by filling zeros at different positions; different kinds of noise can be obtained by changing the noise generation algorithm itself.
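The two construction routes can be sketched as follows; the Gaussian noise distribution, the scale, and the 1 × 1000 / 1 × 500 dimensions are taken from the example above or assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def direct_noise(feat_dim=1000, scale=0.1):
    """Directly construct a noise vector with the same dimension as the first feature vector."""
    return scale * rng.standard_normal(feat_dim)

def indirect_noise(feat_dim=1000, intermediate_dim=500, scale=0.1):
    """Construct an intermediate noise vector of lower dimension, then fill zeros at arbitrary
    positions so the result has the same dimension as the first feature vector."""
    noise = np.zeros(feat_dim)
    positions = rng.choice(feat_dim, size=intermediate_dim, replace=False)
    noise[positions] = scale * rng.standard_normal(intermediate_dim)
    return noise

# Noise is added by element-wise (position-by-position) addition with the first feature vector.
first_feature_vector = np.ones(1000)
noisy_first_feature_vector = first_feature_vector + indirect_noise()
```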
S902: and performing similarity matching on the first feature vector and the second feature vector added with the noise.
The method for performing similarity matching between the noise-added first feature vector and the second feature vector is similar to the method for performing similarity matching between the first feature vector without added noise and the second feature vector; reference may be made to the description above, and details are not repeated here.
In addition, in this embodiment, since noise is added to the first feature vector, when the classifier to be compressed is used to classify the first feature vector to which the noise is added, the classification result may be different from the original classification result of the first feature vector, and if the classification result is not corrected, the accuracy of the finally obtained compressed neural network model may be affected.
Therefore, in the embodiment of the present application, while similarity matching is performed between the noise-added first feature vector and the second feature vector and the training of the target neural network is completed based on the similarity matching result, a second comparison operation is performed until the first classification result is consistent with the label of the training data, at which point the current round of training of the neural network to be compressed and the classifier to be compressed is complete;
the second comparison operation includes:
comparing the first classification result with the label of the training data;
when the comparison results are inconsistent, generating corresponding feedback information, and adjusting the parameters of the neural network to be compressed and of the classifier to be compressed based on this feedback information;
and based on the adjusted parameters, extracting a new first classification result for the training data by using the neural network to be compressed and the classifier to be compressed, and executing a second comparison operation again.
The fine adjustment of the neural network model to be compressed can be realized through the steps, so that the neural network model to be compressed and the compressed neural network model obtained through training can have better generalization capability and higher precision.
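A sketch of the second comparison operation under the same assumptions: the feedback that fine-tunes the neural network to be compressed and its classifier is approximated by a cross-entropy gradient step against the labels of the training data, and the optimizer is assumed to hold the parameters of the model to be compressed.

```python
import torch
import torch.nn.functional as F

def second_comparison_operation(x, labels, model_to_compress, optimizer, max_iters=200):
    """Compare the first classification result with the training-data labels; while they
    disagree, adjust the neural network to be compressed and the classifier to be compressed."""
    for _ in range(max_iters):
        _, first_logits = model_to_compress(x)
        if torch.equal(first_logits.argmax(dim=1), labels):
            return True                                  # consistent with the labels: fine-tuning round done
        loss = F.cross_entropy(first_logits, labels)     # stand-in for the feedback information
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return False
```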
Based on the same inventive concept, the embodiment of the present invention further provides a neural network model compression apparatus corresponding to the neural network model compression method, and as the principle of the apparatus in the embodiment of the present invention for solving the problem is similar to the neural network model compression method described above in the embodiment of the present invention, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 10, a neural network model compression apparatus provided by the seventh embodiment of the present invention specifically includes:
the input module 11 is used for inputting the training data into the neural network model to be compressed and the target neural network model;
the first training module 12 is configured to train the target neural network model based on feature vectors and classification results extracted from training data by the neural network model to be compressed, so as to obtain a compressed neural network model;
and the number of the target neural network model parameters is less than that of the neural network model parameters to be compressed.
When compressing the neural network model to be compressed, the neural network model compression device provided by the embodiment of the application pre-constructs a target neural network with fewer parameters than the neural network model to be compressed, then inputs the training data into the neural network model to be compressed and into the target neural network model, and guides the training of the target neural network model based on the feature vectors and classification results extracted from the training data by the neural network model to be compressed, so as to obtain the compressed neural network model. This process does not operate on the neural network model to be compressed itself, and the compressed neural network model finally obtained produces the same classification results on the same training data as the neural network model to be compressed, so no precision is lost during model compression; on the premise of ensuring precision, the size of the model is therefore compressed, meeting the dual requirements on precision and model size.
Optionally, the apparatus further comprises: a second training module 13, configured to input the training data into the neural network model to be compressed before the training data are input into the neural network model to be compressed and the target neural network model, and to train the neural network model to be compressed to obtain the trained neural network model to be compressed.
Optionally, the neural network model to be compressed includes: a neural network to be compressed and a classifier to be compressed; the target neural network model includes: a target neural network and a target classifier;
the first training module 12 is specifically configured to: extract a first feature vector from the input training data using the neural network to be compressed, and extract a second feature vector from the input training data using the target neural network;
performing similarity matching on the first feature vector and the second feature vector, and performing the training of the target neural network according to the result of the similarity matching; and
inputting the first feature vector to a classifier to be compressed to obtain a first classification result;
inputting the second feature vector into a target classifier to obtain a second classification result;
performing the training of the target neural network and the target classifier in the current round according to the comparison result of the first classification result and the second classification result;
and performing multi-round training on the target neural network and the target classifier to obtain a compressed neural network model.
Optionally, the first training module 12 is specifically configured to perform the following first comparison operation until the classification loss of the target neural network model meets a preset loss range, so as to complete the current round of training on the target neural network and the target classifier;
the first comparison operation includes:
comparing the first classification result with the second classification result;
generating first feedback information aiming at the condition that the comparison result is inconsistent, and adjusting parameters of the target neural network and the target classifier based on the first feedback information;
based on the adjusted parameters, a new second classification result is determined for the training data using the target neural network and the target classifier, and the first comparison operation is performed again.
Optionally, the first training module 12 is further configured to: perform a noise addition operation on the first feature vector before similarity matching is carried out on the first feature vector and the second feature vector; and perform similarity matching between the noise-added first feature vector and the second feature vector.
Optionally, the first training module 12 is specifically configured to perform similarity matching on the first feature vector and the second feature vector through the following steps, and to perform the current round of training on the target neural network according to the result of the similarity matching: clustering the first feature vector and the second feature vector respectively;
generating a first adjacency matrix according to the result of clustering the first feature vector;
generating a second adjacency matrix according to the result of clustering the second feature vector;
and performing the current round of training on the parameters of the target neural network according to the similarity between the first adjacency matrix and the second adjacency matrix.
Optionally, the first training module 12 is specifically configured to perform the following similarity determination operation until the similarity between the first adjacency matrix and the second adjacency matrix is smaller than a preset first similarity threshold, so as to complete the current round of training on the target neural network;
the similarity determination operation includes:
calculating the similarity between the first adjacency matrix and the second adjacency matrix which are obtained currently;
generating first feedback information aiming at the condition that the similarity is not less than a preset first similarity threshold, and adjusting parameters of the target neural network based on the first feedback information;
extracting a new second feature vector for the training data by using the target neural network based on the adjusted parameters;
and clustering the new second eigenvector to generate a new second adjacency matrix, and performing similarity determination operation again.
Optionally, the first training module 12 is specifically configured to perform similarity matching on the first feature vector and the second feature vector through the following steps, and perform a current training on the target neural network according to a result of the similarity matching:
respectively carrying out a dimension reduction operation on the first feature vector and the second feature vector to obtain a first dimension-reduced feature vector from the first feature vector and a second dimension-reduced feature vector from the second feature vector;
calculating the similarity of the first dimension-reducing feature vector and the second dimension-reducing feature vector;
and performing the training of the parameters of the target neural network in the current round according to the similarity of the first dimension-reducing feature vector and the second dimension-reducing feature vector.
Optionally, the first training module 12 is specifically configured to perform the following similarity determination operation until the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector is smaller than a preset second similarity threshold, and complete the current round of training on the target neural network;
the similarity determination operation includes:
calculating the similarity between the first dimension-reducing feature vector and the second dimension-reducing feature vector which are obtained currently;
generating second feedback information aiming at the condition that the similarity is not less than a preset second similarity threshold, and adjusting parameters of the target neural network based on the second feedback information;
extracting a new second feature vector for the training data by using the target neural network based on the adjusted parameters;
and carrying out dimensionality reduction operation on the new second feature vector to generate a new second dimensionality reduction feature vector, and carrying out similarity determination operation again.
Corresponding to the neural network model compression method in fig. 1, an eighth embodiment of the present invention further provides a computer device, as shown in fig. 11, the computer device includes a memory 1000, a processor 2000 and a computer program stored in the memory 1000 and executable on the processor 2000, where the processor 2000 implements the steps of the neural network model compression method when executing the computer program.
Specifically, the memory 1000 and the processor 2000 can be general memories and general processors, which are not specifically limited herein, and when the processor 2000 runs a computer program stored in the memory 1000, the neural network model compression method can be executed, so as to solve the problem that the existing model compression method cannot meet the requirement for precision use because the model compression is performed on the premise of sacrificing the precision of the model, and further achieve the effect of compressing the size of the model under the condition of ensuring the precision of the neural network model.
Corresponding to the neural network model compression method in fig. 1, a ninth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the neural network model compression method.
Specifically, the storage medium can be a general storage medium, such as a mobile disk, a hard disk, and the like, and when a computer program on the storage medium is run, the neural network model compression method can be executed, so that the problem that the existing model compression method cannot meet the requirement for precision use because model compression is performed on the premise of sacrificing the precision of the model is solved, and the effect of compressing the size of the model under the condition of ensuring the precision of the neural network model is achieved.
The neural network model compression method and the computer program product of the apparatus provided in the embodiments of the present invention include a computer readable storage medium storing a program code, and instructions included in the program code may be used to execute the method in the foregoing method embodiments, and specific implementation may refer to the method embodiments, and will not be described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A neural network model compression method, the method comprising:
inputting image data to be trained, voice data to be trained and text data to be trained into a neural network model to be compressed and a target neural network model;
training a target neural network model based on the feature vectors and classification results extracted by the neural network model to be compressed from the image data to be trained, the voice data to be trained and the text data to be trained, to obtain a compressed neural network model; the compressed neural network model is used for executing an image processing task on image data, a voice processing task on voice data, or a text processing task on text data; the number of parameters of the target neural network model is less than the number of parameters of the neural network model to be compressed;
the neural network model to be compressed comprises: a neural network to be compressed and a classifier to be compressed; the target neural network model comprises: a target neural network and a target classifier;
the training a target neural network model based on the feature vectors and classification results extracted by the neural network model to be compressed from the image data to be trained, the voice data to be trained and the text data to be trained, to obtain a compressed neural network model specifically comprises:
extracting a first feature vector from the input image data to be trained, voice data to be trained and text data to be trained by using the neural network to be compressed, and extracting a second feature vector from the input image data to be trained, voice data to be trained and text data to be trained by using the target neural network;
performing similarity matching on the first feature vector and the second feature vector, and performing the current round of training on the target neural network according to the result of the similarity matching; and
inputting the first feature vector to the classifier to be compressed to obtain a first classification result;
inputting the second feature vector to the target classifier to obtain a second classification result;
performing a current round of training on the target neural network and the target classifier according to a comparison result of the first classification result and the second classification result;
performing multi-round training on the target neural network and the target classifier to obtain a compressed neural network model;
performing the following first comparison operation until the classification loss of the target neural network model falls within a preset loss range, thereby completing the current round of training of the target neural network and the target classifier;
the first comparison operation includes:
comparing the first classification result with the second classification result;
generating first feedback information when the comparison results are inconsistent, and adjusting the parameters of the target neural network and the target classifier based on the first feedback information;
and determining a new second classification result for the image data to be trained, the voice data to be trained and the text data to be trained by using the target neural network and the target classifier with the adjusted parameters, and executing the first comparison operation again.
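For illustration only (this sketch is not part of the claims), the training loop described in claim 1 can be pictured as follows. It assumes PyTorch models teacher_net and teacher_clf for the neural network and classifier to be compressed, student_net and student_clf for the target neural network and classifier, that both networks emit feature vectors of the same dimensionality, and that the similarity matching uses a mean-squared-error loss while the classification-result comparison uses a cross-entropy loss against the teacher's predicted labels; all of these names and loss choices are assumptions rather than limitations of the claims.

    import torch
    import torch.nn.functional as F

    def train_one_round(teacher_net, teacher_clf, student_net, student_clf,
                        batch, optimizer, loss_range=0.05, max_steps=1000):
        # The neural network model to be compressed (teacher) is kept frozen;
        # only the target (student) network and classifier are adjusted.
        teacher_net.eval()
        teacher_clf.eval()
        with torch.no_grad():
            f1 = teacher_net(batch)                  # first feature vector
            y1 = teacher_clf(f1).argmax(dim=1)       # first classification result (as labels)
        for _ in range(max_steps):
            f2 = student_net(batch)                  # second feature vector
            logits2 = student_clf(f2)                # second classification result
            sim_loss = F.mse_loss(f2, f1)            # similarity matching of the two feature vectors
            cls_loss = F.cross_entropy(logits2, y1)  # comparison of the two classification results
            optimizer.zero_grad()
            (sim_loss + cls_loss).backward()         # "first feedback information" as gradients
            optimizer.step()                         # parameter adjustment of the target model
            if cls_loss.item() <= loss_range:        # classification loss within the preset range
                break

In such a sketch the optimizer would be built over the parameters of student_net and student_clf only, and the round would be repeated over many batches, corresponding to the multi-round training that yields the compressed neural network model.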
2. The method of claim 1, further comprising, prior to inputting the image data to be trained, the voice data to be trained and the text data to be trained into the neural network model to be compressed and the target neural network model:
inputting the image data to be trained, the voice data to be trained and the text data to be trained into the neural network model to be compressed, and training the neural network model to be compressed to obtain the trained neural network model to be compressed.
3. The method of claim 1, further comprising, before the similarity matching of the first feature vector and the second feature vector:
performing a noise addition operation on the first feature vector;
the performing similarity matching on the first feature vector and the second feature vector specifically comprises:
and performing similarity matching on the noise-added first feature vector and the second feature vector.
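As a sketch of the noise addition in claim 3 (the claim does not fix the type of noise; an additive Gaussian perturbation and the scale value below are assumed purely for illustration):

    import torch

    def add_noise(first_feature, sigma=0.01):
        # Perturb the first feature vector before similarity matching;
        # sigma is an illustrative noise scale, not a value taken from the patent.
        return first_feature + sigma * torch.randn_like(first_feature)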
4. The method according to any one of claims 1 to 3, wherein the performing similarity matching on the first feature vector and the second feature vector and performing the current round of training on the target neural network according to the result of the similarity matching specifically comprises:
clustering the first feature vector and the second feature vector respectively;
generating a first adjacency matrix according to the result of clustering the first feature vector;
generating a second adjacency matrix according to the result of clustering the second feature vector;
and performing the current round of training on the parameters of the target neural network according to the similarity between the first adjacency matrix and the second adjacency matrix.
5. The method according to claim 4, wherein the following similarity determination operation is performed until the similarity between the first adjacency matrix and the second adjacency matrix is smaller than a preset first similarity threshold, at which point the current round of training of the target neural network is completed;
the similarity determination operation includes:
calculating the similarity between the currently obtained first adjacency matrix and second adjacency matrix;
generating first feedback information when the similarity is not less than the preset first similarity threshold, and adjusting the parameters of the target neural network based on the first feedback information;
extracting new second feature vectors from the image data to be trained, the voice data to be trained and the text data to be trained by using the target neural network with the adjusted parameters;
and clustering the new second feature vectors to generate a new second adjacency matrix, and executing the similarity determination operation again.
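A minimal sketch of the clustering-based matching in claims 4 and 5 might look as follows. It assumes k-means clustering, an adjacency matrix whose (i, j) entry is 1 when samples i and j fall into the same cluster, and a score measured as the fraction of disagreeing entries, treated as a discrepancy that training drives below the first similarity threshold; the claims leave the clustering method and the similarity measure open, so all of these choices are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def adjacency_from_features(features, n_clusters=10):
        # Cluster the feature vectors (one row per sample) and build a
        # same-cluster adjacency matrix.
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
        return (labels[:, None] == labels[None, :]).astype(np.float32)

    def adjacency_discrepancy(a1, a2):
        # Fraction of entries on which the two adjacency matrices disagree;
        # the current round of training ends once this falls below the preset threshold.
        return float(np.mean(np.abs(a1 - a2)))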
6. The method according to any one of claims 1 to 3, wherein the performing similarity matching on the first feature vector and the second feature vector and performing the current round of training on the target neural network according to the result of the similarity matching specifically comprises:
respectively performing a dimensionality reduction operation on the first feature vector and the second feature vector to obtain a first dimension-reduced feature vector of the first feature vector and a second dimension-reduced feature vector of the second feature vector;
calculating the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector;
and performing the current round of training on the parameters of the target neural network according to the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector.
7. The method of claim 6, wherein the following similarity determination operation is performed until the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector is smaller than a preset second similarity threshold, at which point the current round of training of the target neural network is completed;
the similarity determination operation includes:
calculating the similarity between the currently obtained first dimension-reduced feature vector and second dimension-reduced feature vector;
generating second feedback information when the similarity is not less than the preset second similarity threshold, and adjusting the parameters of the target neural network based on the second feedback information;
extracting new second feature vectors from the image data to be trained, the voice data to be trained and the text data to be trained by using the target neural network with the adjusted parameters;
and performing a dimensionality reduction operation on the new second feature vectors to generate a new second dimension-reduced feature vector, and executing the similarity determination operation again.
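Similarly, the dimensionality-reduction branch of claims 6 and 7 could be sketched as below, assuming PCA as the reduction, the mean Euclidean distance between paired reduced vectors as the score that is driven below the second similarity threshold, and first and second feature vectors of the same dimensionality; none of these choices is prescribed by the claims.

    import numpy as np
    from sklearn.decomposition import PCA

    def reduced_discrepancy(first_features, second_features, n_components=32):
        # Project both feature sets into a low-dimensional space.
        pca = PCA(n_components=n_components)
        r1 = pca.fit_transform(first_features)   # first dimension-reduced feature vectors
        r2 = pca.transform(second_features)      # second dimension-reduced feature vectors
        # Mean Euclidean distance between paired reduced vectors (illustrative measure);
        # training of the target neural network continues while this stays above the threshold.
        return float(np.mean(np.linalg.norm(r1 - r2, axis=1)))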
8. A neural network model compression apparatus, applied to an embedded device, the apparatus comprising:
an input module, configured to input image data to be trained, voice data to be trained and text data to be trained into a neural network model to be compressed and a target neural network model;
a first training module, configured to train a target neural network model based on the feature vectors and classification results extracted by the neural network model to be compressed from the image data to be trained, the voice data to be trained and the text data to be trained, to obtain a compressed neural network model; the compressed neural network model is used for executing an image processing task on image data, a voice processing task on voice data, or a text processing task on text data; the number of parameters of the target neural network model is less than the number of parameters of the neural network model to be compressed;
the neural network model to be compressed comprises: a neural network to be compressed and a classifier to be compressed; the target neural network model comprises: a target neural network and a target classifier;
the first training module is specifically configured to: extract a first feature vector from the input image data to be trained, voice data to be trained and text data to be trained by using the neural network to be compressed, and extract a second feature vector from the input image data to be trained, voice data to be trained and text data to be trained by using the target neural network;
perform similarity matching on the first feature vector and the second feature vector, and perform the current round of training on the target neural network according to the result of the similarity matching; and
input the first feature vector to the classifier to be compressed to obtain a first classification result;
input the second feature vector to the target classifier to obtain a second classification result;
perform the current round of training on the target neural network and the target classifier according to the comparison result of the first classification result and the second classification result;
perform multi-round training on the target neural network and the target classifier to obtain the compressed neural network model;
the first training module is specifically configured to perform the following first comparison operation until the classification loss of the target neural network model falls within a preset loss range, thereby completing the current round of training of the target neural network and the target classifier;
the first comparison operation includes:
comparing the first classification result with the second classification result;
generating first feedback information when the comparison results are inconsistent, and adjusting the parameters of the target neural network and the target classifier based on the first feedback information;
and determining a new second classification result for the image data to be trained, the voice data to be trained and the text data to be trained by using the target neural network and the target classifier with the adjusted parameters, and executing the first comparison operation again.
CN201810274146.3A 2018-03-29 2018-03-29 Neural network model compression method and device Active CN108510083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810274146.3A CN108510083B (en) 2018-03-29 2018-03-29 Neural network model compression method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810274146.3A CN108510083B (en) 2018-03-29 2018-03-29 Neural network model compression method and device

Publications (2)

Publication Number Publication Date
CN108510083A CN108510083A (en) 2018-09-07
CN108510083B true CN108510083B (en) 2021-05-14

Family

ID=63379557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810274146.3A Active CN108510083B (en) 2018-03-29 2018-03-29 Neural network model compression method and device

Country Status (1)

Country Link
CN (1) CN108510083B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200019840A1 (en) * 2018-07-13 2020-01-16 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for sequential event prediction with noise-contrastive estimation for marked temporal point process
CN110929839B (en) * 2018-09-20 2024-04-16 深圳市商汤科技有限公司 Method and device for training neural network, electronic equipment and computer storage medium
CN110163236B (en) * 2018-10-15 2023-08-29 腾讯科技(深圳)有限公司 Model training method and device, storage medium and electronic device
CN111242273B (en) * 2018-11-29 2024-04-12 华为终端有限公司 Neural network model training method and electronic equipment
WO2020108368A1 (en) * 2018-11-29 2020-06-04 华为技术有限公司 Neural network model training method and electronic device
CN110008880B (en) * 2019-03-27 2023-09-29 深圳前海微众银行股份有限公司 Model compression method and device
CN112020724A (en) * 2019-04-01 2020-12-01 谷歌有限责任公司 Learning compressible features
WO2020231049A1 (en) 2019-05-16 2020-11-19 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
CN110211121B (en) * 2019-06-10 2021-07-16 北京百度网讯科技有限公司 Method and device for pushing model
CN111967594A (en) * 2020-08-06 2020-11-20 苏州浪潮智能科技有限公司 Neural network compression method, device, equipment and storage medium
CN112287968A (en) * 2020-09-23 2021-01-29 深圳云天励飞技术股份有限公司 Image model training method, image processing method, chip, device and medium
CN112288032B (en) * 2020-11-18 2022-01-14 上海依图网络科技有限公司 Method and device for quantitative model training based on generation of confrontation network
CN113505774B (en) * 2021-07-14 2023-11-10 众淼创新科技(青岛)股份有限公司 Policy identification model size compression method
CN115526266B (en) * 2022-10-18 2023-08-29 支付宝(杭州)信息技术有限公司 Model Training Method and Device, Service Prediction Method and Device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337711A1 (en) * 2011-03-29 2017-11-23 Lyrical Labs Video Compression Technology, LLC Video processing and encoding
GB2495265A (en) * 2011-07-07 2013-04-10 Toyota Motor Europe Nv Sa Artificial memory system for predicting behaviours in order to assist in the control of a system, e.g. stability control in a vehicle
US10204286B2 (en) * 2016-02-29 2019-02-12 Emersys, Inc. Self-organizing discrete recurrent network digital image codec

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104661037A (en) * 2013-11-19 2015-05-27 中国科学院深圳先进技术研究院 Tampering detection method and system for compressed image quantization table
WO2015089148A2 (en) * 2013-12-13 2015-06-18 Amazon Technologies, Inc. Reducing dynamic range of low-rank decomposition matrices
CN104331738A (en) * 2014-10-21 2015-02-04 西安电子科技大学 Network reconfiguration algorithm based on game theory and genetic algorithm
EP3168781A1 (en) * 2015-11-16 2017-05-17 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognition model
CN106096670A (en) * 2016-06-17 2016-11-09 北京市商汤科技开发有限公司 Concatenated convolutional neural network training and image detection method, apparatus and system
CN106251347A (en) * 2016-07-27 2016-12-21 广东工业大学 subway foreign matter detecting method, device, equipment and subway shield door system
CN106503799A (en) * 2016-10-11 2017-03-15 天津大学 Deep learning model based on multi-scale network and its application in brain state monitoring
CN106778684A (en) * 2017-01-12 2017-05-31 易视腾科技股份有限公司 deep neural network training method and face identification method
CN106845381A (en) * 2017-01-16 2017-06-13 西北工业大学 Spatial-spectral joint hyperspectral image classification method based on dual-channel convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression; Jian-Hao Luo et al.; Computer Vision Foundation; 20171231; pp. 5058-5065 *
Research on Deep Neural Network Compression and Optimization; Wang Zhengtao; China Masters' Theses Full-text Database (Information Science and Technology); 20180215 (No. 02); pp. I140-289 *

Also Published As

Publication number Publication date
CN108510083A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN108510083B (en) Neural network model compression method and device
US20200097818A1 (en) Method and system for training binary quantized weight and activation function for deep neural networks
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
Sarkhel et al. A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition
US11663483B2 (en) Latent space and text-based generative adversarial networks (LATEXT-GANs) for text generation
CN111126488B (en) Dual-attention-based image recognition method
Wang et al. Towards evolutionary compression
EP3982275A1 (en) Image processing method and apparatus, and computer device
US20230085401A1 (en) Method of training an image classification model
Feng et al. Evolutionary fuzzy particle swarm optimization vector quantization learning scheme in image compression
EP3295381B1 (en) Augmenting neural networks with sparsely-accessed external memory
CN111241287A (en) Training method and device for generating generation model of confrontation text
CN107480143A (en) Dialogue topic dividing method and system based on context dependence
CN112464004A (en) Multi-view depth generation image clustering method
CN111352965A (en) Training method of sequence mining model, and processing method and equipment of sequence data
CN112446888A (en) Processing method and processing device for image segmentation model
CN113632106A (en) Hybrid precision training of artificial neural networks
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
US11373043B2 (en) Technique for generating and utilizing virtual fingerprint representing text data
CN114282059A (en) Video retrieval method, device, equipment and storage medium
Liu et al. Efficient neural networks for edge devices
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
Sun et al. Deep Evolutionary 3D Diffusion Heat Maps for Large-pose Face Alignment.
Tang et al. Bringing giant neural networks down to earth with unlabeled data
US11914670B2 (en) Methods and systems for product quantization-based compression of a matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100070, No. 101-8, building 1, 31, zone 188, South Fourth Ring Road, Beijing, Fengtai District

Applicant after: Guoxin Youyi Data Co., Ltd

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

GR01 Patent grant