CN115456169A - Model compression method, system, terminal and storage medium - Google Patents

Model compression method, system, terminal and storage medium Download PDF

Info

Publication number
CN115456169A
CN115456169A (application CN202211085770.1A)
Authority
CN
China
Prior art keywords
model
compressed
matrix
weight
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211085770.1A
Other languages
Chinese (zh)
Inventor
王光勇
关海欣
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202211085770.1A
Publication of CN115456169A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks


Abstract

The invention provides a model compression method, system, terminal and storage medium. The method comprises: performing model training on a model to be compressed; adding a regularization term to the model and performing singular value decomposition to obtain a singular value matrix; returning, according to the singular value matrix, to the model-training step and the subsequent steps until the model to be compressed meets a performance degradation condition, then outputting the model; performing parameter clustering on the weight tensors of the output model to obtain a weight parameter matrix, and performing weight quantization on that matrix to obtain a clustering quantization matrix; and setting the parameters of the model to be compressed according to the clustering quantization matrix to obtain the compressed model. The method combines sparse regularization, iterative pruning and cluster quantization, compressing the model from a global perspective and achieving maximal compression while preserving accuracy.

Description

Model compression method, system, terminal and storage medium
Technical Field
The present invention relates to the field of model processing technologies, and in particular, to a method, a system, a terminal, and a storage medium for compressing a model.
Background
Machine learning is now widely used across industries. Complex models perform better, but their high storage and computing-resource consumption is a major obstacle to efficient application, so model compression methods are receiving more and more attention.
In existing model compression processes, unimportant channels, convolution kernels or entire layers are generally deleted directly from the model to reduce its parameter count and computation. However, deleting channels, convolution kernels or network layers outright lowers the performance of the compressed model and degrades the user experience.
Disclosure of Invention
Embodiments of the invention aim to provide a model compression method, system, terminal and storage medium, so as to solve the problem that the models produced by existing model compression processes have low performance.
An embodiment of the invention provides a model compression method comprising the following steps:
obtaining a model to be compressed, and performing model training on the model to be compressed;
adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
returning to execute the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model.
Furthermore, before returning, according to the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps, the method further includes:
inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligibility index and a voice quality evaluation index;
and if the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index meet preset index conditions, determining that the model to be compressed meets the performance degradation condition.
Further, the performing performance detection on the model to be compressed according to the standard data and the output data includes:
acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
performing short-time objective intelligibility calculation according to the output audio and the standard audio to obtain an audio intelligibility index;
acquiring output voice in the output data, and acquiring standard voice in the standard data;
and performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index.
Further, the adding a regular term to the model to be compressed after the model training includes:
inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
the first regularization term includes:
Figure BDA0003835314130000031
the second regularization term includes:
Figure BDA0003835314130000032
wherein L is 1 As the first regularization term, L 2 The second regularization term, μ β And delta β The mean value and the variance of the activation value in the model to be compressed are respectively, gamma and beta are affine transformation parameters, and lambda is a penalty coefficient factor.
Further, the formula for performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added includes:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
Further, the performing weight quantization on the weight parameter matrix to obtain a cluster quantization matrix includes:
and respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain a clustering quantization matrix, wherein the cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values.
Further, the formula for performing parameter clustering according to the output weight tensor of the model to be compressed includes:
[equation image in original: cluster-center initialization over the interval [w_min, w_max]]
wherein w_min and w_max denote the minimum and maximum values of the weight tensor, respectively.
It is another object of an embodiment of the present invention to provide a model compression system, including:
the model training module is used for obtaining a model to be compressed and carrying out model training on the model to be compressed;
the singular value decomposition module is used for adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
the model output module is used for returning and executing the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
the parameter clustering module is used for carrying out parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and carrying out weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and the parameter setting module is used for carrying out parameter setting on the model to be compressed according to the clustering quantization matrix to obtain a compression model.
It is another object of the embodiments of the present invention to provide a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method are implemented.
It is a further object of an embodiment of the present invention to provide a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the above method.
According to embodiments of the invention, a regularization term is added to the model to be compressed after model training, increasing the sparsity of its weight tensors, so that a higher pruning rate is obtained without sacrificing model performance and the performance of the compressed model output after compression is guaranteed. Returning, via the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps allows the weight parameters to be sparsified continuously, achieving an iterative pruning effect and improving the pruning result. Parameter clustering over the weight tensors of the model to be compressed then allows its parameters to be clustered and quantized effectively, further improving the model compression effect.
Drawings
FIG. 1 is a flow chart of a model compression method provided by a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a model to be compressed according to a first embodiment of the present invention;
FIG. 3 is a diagram illustrating a parameter clustering process of weight tensors according to a first embodiment of the present invention;
FIG. 4 is a flow chart of a model compression method provided by a second embodiment of the present invention;
FIG. 5 is a schematic diagram of a model compression system according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Referring to fig. 1, a flowchart of a model compression method according to a first embodiment of the present invention is shown, where the model compression method can be applied to any terminal device or system, and the model compression method includes the steps of:
s10, obtaining a model to be compressed, and performing model training on the model to be compressed;
the model to be compressed may be set as required, please refer to fig. 2, where the model to be compressed in this step is a Diffusion Convolutional Recurrent Neural Network (DCRNN) model, the DCRNN model includes seven coding (Encoder) layers, two Long and Short Term Memory network layers, and a full connection layer, the Encoder layer includes a Convolutional Neural Network (CNN), batchNorm2d, and a prunelu, the Long and Short Term Memory network layer includes a Long Short-Term Memory network (LSTM) and a LaynerNorm, and the batchm 2d and the LaynerNorm are unified.
In this step, before model training the importance of the weights in the model to be compressed is undifferentiated; after model training, the important and unimportant weights in the model can be distinguished so that the optimal parameter set of the model is retained, which facilitates the subsequent sparsification of the model to be compressed.
Step S20, adding a regular term in the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix;
the regular term is used for parameter sparseness of weight parameters in the model to be compressed, and by adding the regular term to the model to be compressed after model training, unimportant weights in the model to be compressed can be sparse without influencing the precision of the model to be compressed, so that a sparse regularization effect is achieved, and a model pruning effect of a subsequent model to be compressed is guaranteed.
Optionally, in this step, adding a regular term to the model to be compressed after the model training includes:
inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
For the coding layers, the first regular term is introduced into the BatchNorm layer; the BatchNorm layer acts on the channels, so the weights of unimportant channels can be sparsified to 0, which facilitates pruning. For pruning the long short-term memory network layers, the second regular term is introduced into the LayerNorm layer; the LayerNorm layer acts on the hidden-to-hidden connections, so unimportant hidden-to-hidden weights can be sparsified to 0, which likewise facilitates pruning.
Further, the first regularization term includes:
[equation image in original: first regularization term L1]
The second regularization term includes:
[equation image in original: second regularization term L2]
wherein L1 is the first regularization term, L2 is the second regularization term, μ_β and δ_β are respectively the mean and variance of the activation values in the model to be compressed, γ and β are affine transformation parameters, and λ is a penalty coefficient factor.
In this step, adding the first regular term to the normalization layer of each coding layer and the second regular term to the normalization layer of each long short-term memory network layer effectively solves the problem that the prior art cannot prune the model to be compressed from a global perspective. During sparse training of the model to be compressed, λ is 10⁻⁴ while no more than 50% of the epochs have elapsed, and 10⁻⁶ once more than 50% have elapsed, so that the accuracy lost to sparsification can be recovered quickly. One epoch denotes the process of feeding all the data through the network once for forward computation and backpropagation.
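The closed forms of the two regularization terms appear only as images in the source text. A reading consistent with the variable definitions above is an L1-type sparsity penalty on the affine scale parameter γ of each normalization layer, as in network-slimming-style pruning; treating that form as an assumption, the penalty and the λ schedule described above could be sketched as:

```python
import torch

def sparsity_penalty(model, lam):
    """Assumed form of the regular terms: an L1 penalty on the affine scale
    parameters (gamma) of every BatchNorm2d (first term, acts per channel)
    and every LayerNorm (second term, acts on hidden-to-hidden features)."""
    reg = torch.zeros((), device=next(model.parameters()).device)
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm)):
            reg = reg + m.weight.abs().sum()
    return lam * reg

def penalty_coefficient(epoch, total_epochs):
    """Schedule from the description: lambda = 1e-4 for the first 50% of the
    epochs, then 1e-6 so that accuracy lost to sparsification recovers."""
    return 1e-4 if epoch < 0.5 * total_epochs else 1e-6

# during each training step:
# loss = task_loss + sparsity_penalty(model, penalty_coefficient(epoch, total_epochs))
```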
Further, in this step, the formula for performing singular value decomposition on the weight tensor of the model to be compressed to which the regular term is added includes:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
Furthermore, to determine the pruning ratio of each layer of the network in the model to be compressed, singular value decomposition is performed on all weight tensors to obtain their singular values, which achieves a layer-by-layer analysis of pruning sensitivity and thereby determines the pruning rate. In the singular value matrix the singular values are arranged from largest to smallest and decrease rapidly; in general, the sum of the first 10% or even 1% of the singular values accounts for more than 99% of the sum of all singular values. In this embodiment, the singular value at which the cumulative proportion reaches 97% is set as the threshold, from which the pruning rate of each layer is determined.
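A sketch of this layer-by-layer sensitivity analysis, assuming the 97% threshold is applied to the cumulative sum of each layer's singular values:

```python
import torch

def layer_pruning_rate(weight, energy_ratio=0.97):
    """Flatten a layer's weight tensor to a matrix, take its singular values
    (returned in descending order), find the smallest rank whose cumulative
    sum reaches `energy_ratio`, and treat the remaining share as prunable."""
    a = weight.detach().reshape(weight.shape[0], -1)
    s = torch.linalg.svdvals(a)
    cum = torch.cumsum(s, dim=0) / s.sum()
    k = int((cum < energy_ratio).sum().item()) + 1   # rank reaching 97%
    return 1.0 - k / s.numel()

# applied layer by layer, e.g.: rate = layer_pruning_rate(conv.weight)
```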
Step S30, the step of performing model training on the model to be compressed and the subsequent steps are returned according to the singular value matrix until the model to be compressed meets the performance degradation condition, and the model to be compressed is output;
the method comprises the following steps of performing model training on a model to be compressed through singular value matrix return, continuously performing sparse of weight parameters in the model to be compressed, achieving iterative pruning effect of the model to be compressed, and improving the pruning effect of the model to be compressed;
in the step, when the performance degradation of the model to be compressed after the model training is detected, it is determined that the model to be compressed meets the performance degradation condition.
S40, performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
in this step, the formula for performing parameter clustering according to the output weight tensor of the model to be compressed includes:
[equation image in original: cluster-center initialization over the interval [w_min, w_max]]
wherein w_min and w_max denote the minimum and maximum values of the weight tensor in the model to be compressed, respectively. The cluster centers are initialized with weights spaced uniformly over the interval [w_min, w_max]; when the clustering algorithm converges, all weight values belonging to the same cluster are reset to the value of the corresponding centroid, yielding the weight parameter matrix. Referring to fig. 3, the original weights are approximately represented by the parameters in the weight parameter matrix. Parameter clustering based on the weight tensors of the model to be compressed effectively reduces the number of distinct weight values that need to be stored, and each weight can be represented as a cluster index into the weight parameter matrix.
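A minimal NumPy sketch of this step, assuming plain one-dimensional K-means with linearly initialized centers (the value of K is an assumption; with 8-bit cluster indices, K = 256):

```python
import numpy as np

def cluster_quantize(weights, K=256, iters=20):
    """1-D K-means over one weight tensor: centers start on a uniform grid
    over [w_min, w_max]; at convergence every weight in a cluster is reset
    to its centroid. Returns the quantized tensor, the cluster-index matrix
    and the codebook of centroids."""
    w = weights.ravel().astype(np.float64)
    centers = np.linspace(w.min(), w.max(), K)      # linear initialization
    for _ in range(iters):
        # assign each weight to its nearest center
        idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(K):
            members = w[idx == k]
            if members.size:                         # update non-empty clusters
                centers[k] = members.mean()
    quantized = centers[idx].reshape(weights.shape)  # weights replaced by centroids
    return quantized, idx.reshape(weights.shape), centers
```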
Optionally, in this step, the performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix includes:
respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain the clustering quantization matrix;
To compress the model to be compressed further, cluster quantization is applied: the weights of each weight tensor are divided into K clusters S1, S2, …, SK by K-means clustering. The cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values; in the table, each weight parameter is bound to a cluster index, which determines the corresponding cluster centroid value in the table.
Specifically, each weight parameter is quantized to log2(K) bits, i.e. each weight needs log2(K) bits to store its cluster index. For example, if the initial weight parameters are 32-bit floating-point numbers, an extra 32K bits are needed to store the codebook, so the quantization compression ratio is calculated as:
r = (N × 32) / (N × log2(K) + K × 32)
wherein N denotes the number of weights in the weight tensor and K denotes the number of clusters. For cluster-based quantization, selecting a suitable K value is the key to realizing the quantization; in this embodiment the cluster index is taken as 8 bits, so that without loss of model performance the parameter quantity of the model to be compressed is reduced by nearly 5 times and the calculation is roughly halved, improving the model compression effect.
S50, setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model;
the method has the advantages that the clustering quantization matrix is used for setting the parameters of the weight parameters in the model to be compressed, so that the compression effect of the weight parameters in the model to be compressed is effectively achieved, and the parameters and the calculated amount in the model to be compressed are reduced.
This embodiment provides a combined model compression method based on sparse regularization, iterative pruning and cluster quantization. Sparse regularization increases the sparsity of the weight tensors in the model to be compressed, so that a higher pruning rate is obtained without sacrificing model performance and the performance of the compressed model output after compression is guaranteed. The model to be compressed is trained by alternating iterative pruning with fine-tuning, and finally the remaining weights in the model are quantized by k-means clustering. Pruning and quantization are guided by singular-value sensitivity analysis, which facilitates the selection of pruning rates when the weight distributions differ across tensors. By compressing the model to be compressed from this global perspective, maximal model compression is achieved on the premise of preserving accuracy, and parameter clustering over the weight tensors allows the parameters of the model to be clustered and quantized effectively, further improving the model compression effect.
Example two
Referring to fig. 4, it is a flowchart of a model compression method according to a second embodiment of the present invention, which is used to further refine step S30 in the first embodiment, and includes the steps of:
step S31, inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
the test data includes a plurality of different types of data, for example, the test data includes image data, voice data and audio data, and the image data, the voice data and the audio data are input into the trained model to be compressed for data processing to obtain output data.
Data identifiers are acquired for each item of the test data and matched against a pre-stored data query table to obtain the standard data. The data query table stores the correspondence between different data identifiers and their standard data; the standard data is used to check the accuracy of the output data and thereby evaluate the model performance of the model to be compressed;
step S32, according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligible index and a voice quality evaluation index;
optionally, in this step, the performing performance detection on the model to be compressed according to the standard data and the output data includes:
acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index; that is, the signal-to-noise ratio of the standard image and the output image is calculated to obtain the subjective evaluation index used to evaluate the image processing performance of the model to be compressed;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
calculating short-time objective intelligibility according to the output audio and the standard audio to obtain the audio intelligibility index; that is, a short-time objective intelligibility calculation on the output audio and the standard audio yields the audio intelligibility index used to evaluate the audio processing performance of the model to be compressed;
acquiring output voice in the output data, and acquiring standard voice in the standard data;
performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index; that is, a perceptual voice quality evaluation calculation on the output voice and the standard voice yields the voice quality evaluation index used to evaluate the voice processing performance of the model to be compressed;
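A sketch of these three computations; the third-party packages `pystoi` (STOI) and `pesq` (PESQ) are assumptions for illustration, since this embodiment does not name an implementation, and the SNR shown is the usual ratio of reference energy to error energy:

```python
import numpy as np
from pystoi import stoi   # short-time objective intelligibility (assumed package)
from pesq import pesq     # perceptual evaluation of speech quality (assumed package)

def snr_db(reference, output):
    """Signal-to-noise ratio (dB) of an output image against its standard
    reference, used here as the subjective evaluation index."""
    ref = reference.astype(np.float64)
    err = ref - output.astype(np.float64)
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))

def evaluate_model(ref_image, out_image, ref_audio, out_audio,
                   ref_speech, out_speech, fs=16000):
    subjective_index = snr_db(ref_image, out_image)
    intelligibility_index = stoi(ref_audio, out_audio, fs)   # STOI
    quality_index = pesq(fs, ref_speech, out_speech, 'wb')   # wideband PESQ
    return subjective_index, intelligibility_index, quality_index
```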
step S33, if the subjective evaluation index, the audio intelligible index and the voice quality evaluation index meet preset index conditions, judging that the model to be compressed meets performance degradation conditions;
for example, in this step, whether the subjective evaluation index, the audio understandable index, and the voice quality evaluation index are smaller than an index threshold corresponding to a preset value is detected, and the index threshold may be set according to a requirement.
In this embodiment, test data is input into the trained model to be compressed for data processing to obtain output data, which provides a data basis for performance detection of the model. By acquiring the standard data for the test data, performance detection can be carried out effectively against that standard data, yielding the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index. These three indices respectively characterize the image, audio and voice processing performance of the model to be compressed; when degradation is detected in any one of them, the model to be compressed is determined to meet the performance degradation condition.
EXAMPLE III
Referring to fig. 5, a schematic structural diagram of a model compression system 100 according to a third embodiment of the present invention is shown, including: the model training module 10, the singular value decomposition module 11, the model output module 12, the parameter clustering module 13 and the parameter setting module 14, wherein:
and the model training module 10 is used for obtaining a model to be compressed and carrying out model training on the model to be compressed.
The singular value decomposition module 11 is configured to add a regular term to the model to be compressed after model training, and perform singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, where the regular term is used to perform parameter sparsity on weight parameters in the model to be compressed.
Wherein, the singular value decomposition module 11 is further configured to: inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
the first regularization term includes:
Figure BDA0003835314130000121
the second regularization term includes:
Figure BDA0003835314130000122
wherein L is 1 Is the first regularization term, L 2 The second regularization term μ β And δ β is the mean and variance of the activation values in the model to be compressed, γ and β are affine transformation parameters, and λ is a penalty coefficient factor.
Optionally, the formula for performing singular value decomposition on the weight tensor of the model to be compressed to which the regular term is added includes:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
And the model output module 12 is configured to return to execute the step of performing model training on the model to be compressed and subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and output the model to be compressed.
Wherein, the model output module 12 is further configured to: inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligibility index and a voice quality evaluation index;
and if the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index meet preset index conditions, determining that the model to be compressed meets the performance degradation condition.
Optionally, the model output module 12 is further configured to: acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
calculating short-time objective intelligibility according to the output audio and the standard audio to obtain an audio intelligibility index;
acquiring output voice in the output data, and acquiring standard voice in the standard data;
and performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index.
And the parameter clustering module 13 is configured to perform parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and perform weight quantization on the weight parameter matrix to obtain a clustering quantization matrix.
Wherein, the parameter clustering module 13 is further configured to: respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain a clustering quantization matrix, wherein the cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values.
Optionally, the formula for performing parameter clustering according to the output weight tensor of the model to be compressed includes:
Figure BDA0003835314130000131
wherein, w min And w max Representing the minimum and maximum values of the weight tensor, respectively.
And the parameter setting module 14 is configured to perform parameter setting on the to-be-compressed model according to the clustering quantization matrix to obtain a compressed model.
According to this embodiment, adding regularization terms to the model to be compressed after model training increases the sparsity of its weight tensors, so that a higher pruning rate is obtained without sacrificing model performance and the performance of the compressed model output after compression is guaranteed. Returning, via the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps allows the weight parameters to be sparsified continuously, achieving an iterative pruning effect and improving the pruning result. Parameter clustering over the weight tensors of the model to be compressed then allows its parameters to be clustered and quantized effectively, further improving the model compression effect.
Example four
Fig. 6 is a block diagram of a terminal device 2 according to a fourth embodiment of the present application. As shown in fig. 6, the terminal device 2 of this embodiment includes: a processor 20, a memory 21 and a computer program 22, such as a program for a model compression method, stored in said memory 21 and executable on said processor 20. The steps in the various embodiments of the model compression methods described above are implemented when the computer program 22 is executed by the processor 20.
Illustratively, the computer program 22 may be partitioned into one or more modules that are stored in the memory 21 and executed by the processor 20 to accomplish the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 22 in the terminal device 2. The terminal device may include, but is not limited to, a processor 20, a memory 21.
The Processor 20 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 21 may be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 21 may also be an external storage device of the terminal device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the terminal device 2. Further, the memory 21 may also include both an internal storage unit and an external storage device of the terminal device 2. The memory 21 is used for storing the computer program and other programs and data required by the terminal device. The memory 21 may also be used to temporarily store data that has been output or is to be output.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated module is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium, which may be non-volatile or volatile. Based on this understanding, all or part of the flow of the methods in the embodiments above may be realized by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, etc. The computer-readable storage medium may include any entity or device capable of carrying computer program code: a recording medium, USB flash disk, removable hard disk, magnetic disk, optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signal, telecommunications signal, software distribution medium, etc. Note that the content of the computer-readable storage medium may be increased or decreased as appropriate under the legislation and patent practice of the jurisdiction; for example, in some jurisdictions the computer-readable storage medium does not include electrical carrier signals or telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of model compression, the method comprising:
obtaining a model to be compressed, and performing model training on the model to be compressed;
adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
returning to execute the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model.
2. The model compression method according to claim 1, wherein before returning, according to the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps, the method further comprises:
inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligibility index and a voice quality evaluation index;
and if the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index meet preset index conditions, determining that the model to be compressed meets the performance degradation condition.
3. The model compression method of claim 2, wherein the performing performance detection on the model to be compressed according to the standard data and the output data comprises:
acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
calculating short-time objective intelligibility according to the output audio and the standard audio to obtain an audio intelligibility index;
acquiring output voice in the output data and acquiring standard voice in the standard data;
and performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index.
4. The model compression method of claim 1, wherein adding a regularization term to the model to be compressed after model training comprises:
inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
the first regularization term includes:
[equation image in original: first regularization term L1]
the second regularization term includes:
[equation image in original: second regularization term L2]
wherein L1 is the first regularization term, L2 is the second regularization term, μ_β and δ_β are respectively the mean and variance of the activation values in the model to be compressed, γ and β are affine transformation parameters, and λ is a penalty coefficient factor.
5. The model compression method as claimed in claim 1, wherein the formula for performing singular value decomposition on the weight tensor of the model to be compressed after adding the regularization term comprises:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
6. The model compression method of claim 1, wherein the performing weight quantization on the weight parameter matrix to obtain a cluster quantization matrix comprises:
respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain the clustering quantization matrix, wherein the cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values.
7. The model compression method as claimed in any one of claims 1 to 6, wherein the formula for performing parameter clustering according to the outputted weight tensor of the model to be compressed comprises:
[equation image in original: cluster-center initialization over the interval [w_min, w_max]]
wherein w_min and w_max denote the minimum and maximum values of the weight tensor, respectively.
8. A model compression system, the system comprising:
the model training module is used for acquiring a model to be compressed and performing model training on the model to be compressed;
the singular value decomposition module is used for adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
the model output module is used for returning and executing the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
the parameter clustering module is used for performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and the parameter setting module is used for setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a method according to any one of claims 1 to 7.
CN202211085770.1A 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium Pending CN115456169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085770.1A CN115456169A (en) 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211085770.1A CN115456169A (en) 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN115456169A true CN115456169A (en) 2022-12-09

Family

ID=84302438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085770.1A Pending CN115456169A (en) 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN115456169A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189667A (en) * 2023-04-27 2023-05-30 摩尔线程智能科技(北京)有限责任公司 Quantization compression method, device, equipment and storage medium of voice processing model
CN116992946A (en) * 2023-09-27 2023-11-03 荣耀终端有限公司 Model compression method, apparatus, storage medium, and program product
CN116992946B (en) * 2023-09-27 2024-05-17 荣耀终端有限公司 Model compression method, apparatus, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN115456169A (en) Model compression method, system, terminal and storage medium
Polino et al. Model compression via distillation and quantization
CN110619385B (en) Structured network model compression acceleration method based on multi-stage pruning
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2021135715A1 (en) Image compression method and apparatus
CN110175641B (en) Image recognition method, device, equipment and storage medium
CN110874625B (en) Data processing method and device
Hong et al. Daq: Channel-wise distribution-aware quantization for deep image super-resolution networks
CN112016674A (en) Knowledge distillation-based convolutional neural network quantification method
US20210271973A1 (en) Operation method and apparatus for network layer in deep neural network
US20220383479A1 (en) Method for detecting defects in images, computer device, and storage medium
US20230385645A1 (en) Method for automatic hybrid quantization of deep artificial neural networks
Hepburn et al. On the relation between statistical learning and perceptual distances
CN114676825A (en) Neural network model quantification method, system, device and medium
CN113554097B (en) Model quantization method and device, electronic equipment and storage medium
CN111614358B (en) Feature extraction method, system, equipment and storage medium based on multichannel quantization
CN112200275B (en) Artificial neural network quantification method and device
CN111832596B (en) Data processing method, electronic device and computer readable medium
CN112906883A (en) Hybrid precision quantization strategy determination method and system for deep neural network
CN115705486A (en) Method and device for training quantitative model, electronic equipment and readable storage medium
Hong et al. DAQ: distribution-aware quantization for deep image super-resolution networks
CN114692892B (en) Method for processing numerical characteristics, model training method and device
CN114664316B (en) Audio restoration method, device, equipment and medium based on automatic pickup
WO2022205890A1 (en) Method, apparatus, and system for transmitting image features
CN114764756B (en) Quantitative pruning method and system for defogging model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination