CN115456169A - Model compression method, system, terminal and storage medium - Google Patents

Model compression method, system, terminal and storage medium Download PDF

Info

Publication number
CN115456169A
CN115456169A (application CN202211085770.1A)
Authority
CN
China
Prior art keywords
model
compressed
matrix
weight
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211085770.1A
Other languages
Chinese (zh)
Inventor
王光勇
关海欣
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202211085770.1A
Publication of CN115456169A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks


Abstract

The invention provides a model compression method, system, terminal and storage medium. The method comprises: performing model training on a model to be compressed; adding a regularization term to the model and performing singular value decomposition to obtain a singular value matrix; returning, according to the singular value matrix, to the model-training step and the subsequent steps until the model to be compressed meets a performance degradation condition, then outputting the model; performing parameter clustering on the weight tensors of the output model to obtain a weight parameter matrix, and performing weight quantization on that matrix to obtain a clustering quantization matrix; and setting the parameters of the model to be compressed according to the clustering quantization matrix to obtain the compressed model. The method combines sparse regularization, iterative pruning and cluster quantization, compressing the model from a global perspective and achieving maximal compression while preserving accuracy.

Description

Model compression method, system, terminal and storage medium
Technical Field
The present invention relates to the field of model processing technologies, and in particular, to a method, a system, a terminal, and a storage medium for compressing a model.
Background
Machine learning is now widely used across industries. Complex models perform better, but their high storage and computing-resource consumption is a major obstacle to efficient application, so model compression methods are receiving more and more attention.
In existing model compression processes, unimportant channels, convolution kernels or entire layers are generally deleted directly from the model to reduce its parameter count and computation. However, deleting channels, convolution kernels or network layers outright lowers the performance of the compressed model and degrades the user experience.
Disclosure of Invention
Embodiments of the invention aim to provide a model compression method, system, terminal and storage medium, so as to solve the problem that the models produced by existing model compression processes have low performance.
An embodiment of the invention provides a model compression method comprising the following steps:
obtaining a model to be compressed, and performing model training on the model to be compressed;
adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
returning to execute the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model.
Furthermore, before returning, according to the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps, the method further includes:
inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligibility index and a voice quality evaluation index;
and if the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index meet preset index conditions, determining that the model to be compressed meets the performance degradation condition.
Further, the performing performance detection on the model to be compressed according to the standard data and the output data includes:
acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
performing short-time objective intelligibility calculation according to the output audio and the standard audio to obtain an audio intelligibility index;
acquiring output voice in the output data, and acquiring standard voice in the standard data;
and performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index.
Further, the adding a regular term to the model to be compressed after the model training includes:
inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
the first regularization term includes:
Figure BDA0003835314130000031
the second regularization term includes:
Figure BDA0003835314130000032
wherein L is 1 As the first regularization term, L 2 The second regularization term, μ β And delta β The mean value and the variance of the activation value in the model to be compressed are respectively, gamma and beta are affine transformation parameters, and lambda is a penalty coefficient factor.
Further, the formula for performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added includes:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
Further, the performing weight quantization on the weight parameter matrix to obtain a cluster quantization matrix includes:
and respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain a clustering quantization matrix, wherein the cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values.
Further, the formula for performing parameter clustering according to the output weight tensor of the model to be compressed includes:
[equation image in original: cluster-center initialization over the interval [w_min, w_max]]
wherein w_min and w_max denote the minimum and maximum values of the weight tensor, respectively.
It is another object of an embodiment of the present invention to provide a model compression system, including:
the model training module is used for obtaining a model to be compressed and carrying out model training on the model to be compressed;
the singular value decomposition module is used for adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
the model output module is used for returning and executing the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
the parameter clustering module is used for carrying out parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and carrying out weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and the parameter setting module is used for carrying out parameter setting on the model to be compressed according to the clustering quantization matrix to obtain a compression model.
It is another object of the embodiments of the present invention to provide a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method are implemented.
It is a further object of an embodiment of the present invention to provide a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the above method.
According to embodiments of the invention, a regularization term is added to the model to be compressed after model training, increasing the sparsity of its weight tensors, so that a higher pruning rate is obtained without sacrificing model performance and the performance of the compressed model output after compression is guaranteed. Returning, via the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps allows the weight parameters to be sparsified continuously, achieving an iterative pruning effect and improving the pruning result. Parameter clustering over the weight tensors of the model to be compressed then allows its parameters to be clustered and quantized effectively, further improving the model compression effect.
Drawings
FIG. 1 is a flow chart of a model compression method provided by a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a model to be compressed according to a first embodiment of the present invention;
FIG. 3 is a diagram illustrating a parameter clustering process of weight tensors according to a first embodiment of the present invention;
FIG. 4 is a flow chart of a model compression method provided by a second embodiment of the present invention;
FIG. 5 is a schematic diagram of a model compression system according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Referring to fig. 1, a flowchart of a model compression method according to a first embodiment of the present invention is shown, where the model compression method can be applied to any terminal device or system, and the model compression method includes the steps of:
s10, obtaining a model to be compressed, and performing model training on the model to be compressed;
the model to be compressed may be set as required, please refer to fig. 2, where the model to be compressed in this step is a Diffusion Convolutional Recurrent Neural Network (DCRNN) model, the DCRNN model includes seven coding (Encoder) layers, two Long and Short Term Memory network layers, and a full connection layer, the Encoder layer includes a Convolutional Neural Network (CNN), batchNorm2d, and a prunelu, the Long and Short Term Memory network layer includes a Long Short-Term Memory network (LSTM) and a LaynerNorm, and the batchm 2d and the LaynerNorm are unified.
In this step, before model training the importance of the weights in the model to be compressed is undifferentiated; after model training, the important and unimportant weights in the model can be distinguished so that the optimal parameter set of the model is retained, which facilitates the subsequent sparsification of the model to be compressed.
Step S20, adding a regular term in the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix;
the regular term is used for parameter sparseness of weight parameters in the model to be compressed, and by adding the regular term to the model to be compressed after model training, unimportant weights in the model to be compressed can be sparse without influencing the precision of the model to be compressed, so that a sparse regularization effect is achieved, and a model pruning effect of a subsequent model to be compressed is guaranteed.
Optionally, in this step, adding a regular term to the model to be compressed after the model training includes:
inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
For the coding layers, the first regular term is introduced into the BatchNorm layer; the BatchNorm layer acts on the channels, so the weights of unimportant channels can be sparsified to 0, which facilitates pruning. For pruning the long short-term memory network layers, the second regular term is introduced into the LayerNorm layer; the LayerNorm layer acts on the hidden-to-hidden connections, so unimportant hidden-to-hidden weights can be sparsified to 0, which likewise facilitates pruning.
Further, the first regularization term includes:
[equation image in original: first regularization term L1]
The second regularization term includes:
[equation image in original: second regularization term L2]
wherein L1 is the first regularization term, L2 is the second regularization term, μ_β and δ_β are respectively the mean and variance of the activation values in the model to be compressed, γ and β are affine transformation parameters, and λ is a penalty coefficient factor.
In this step, adding the first regular term to the normalization layer of each coding layer and the second regular term to the normalization layer of each long short-term memory network layer effectively solves the problem that the prior art cannot prune the model to be compressed from a global perspective. During sparse training of the model to be compressed, λ is 10⁻⁴ while no more than 50% of the epochs have elapsed, and 10⁻⁶ once more than 50% have elapsed, so that the accuracy lost to sparsification can be recovered quickly. One epoch denotes the process of feeding all the data through the network once for forward computation and backpropagation.
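The closed forms of the two regularization terms appear only as images in the source text. A reading consistent with the variable definitions above is an L1-type sparsity penalty on the affine scale parameter γ of each normalization layer, as in network-slimming-style pruning; treating that form as an assumption, the penalty and the λ schedule described above could be sketched as:

```python
import torch

def sparsity_penalty(model, lam):
    """Assumed form of the regular terms: an L1 penalty on the affine scale
    parameters (gamma) of every BatchNorm2d (first term, acts per channel)
    and every LayerNorm (second term, acts on hidden-to-hidden features)."""
    reg = torch.zeros((), device=next(model.parameters()).device)
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm)):
            reg = reg + m.weight.abs().sum()
    return lam * reg

def penalty_coefficient(epoch, total_epochs):
    """Schedule from the description: lambda = 1e-4 for the first 50% of the
    epochs, then 1e-6 so that accuracy lost to sparsification recovers."""
    return 1e-4 if epoch < 0.5 * total_epochs else 1e-6

# during each training step:
# loss = task_loss + sparsity_penalty(model, penalty_coefficient(epoch, total_epochs))
```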
Further, in this step, the formula for performing singular value decomposition on the weight tensor of the model to be compressed to which the regular term is added includes:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
Furthermore, to determine the pruning ratio of each layer of the network in the model to be compressed, singular value decomposition is performed on all weight tensors to obtain their singular values, which achieves a layer-by-layer analysis of pruning sensitivity and thereby determines the pruning rate. In the singular value matrix the singular values are arranged from largest to smallest and decrease rapidly; in general, the sum of the first 10% or even 1% of the singular values accounts for more than 99% of the sum of all singular values. In this embodiment, the singular value at which the cumulative proportion reaches 97% is set as the threshold, from which the pruning rate of each layer is determined.
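A sketch of this layer-by-layer sensitivity analysis, assuming the 97% threshold is applied to the cumulative sum of each layer's singular values:

```python
import torch

def layer_pruning_rate(weight, energy_ratio=0.97):
    """Flatten a layer's weight tensor to a matrix, take its singular values
    (returned in descending order), find the smallest rank whose cumulative
    sum reaches `energy_ratio`, and treat the remaining share as prunable."""
    a = weight.detach().reshape(weight.shape[0], -1)
    s = torch.linalg.svdvals(a)
    cum = torch.cumsum(s, dim=0) / s.sum()
    k = int((cum < energy_ratio).sum().item()) + 1   # rank reaching 97%
    return 1.0 - k / s.numel()

# applied layer by layer, e.g.: rate = layer_pruning_rate(conv.weight)
```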
Step S30, the step of performing model training on the model to be compressed and the subsequent steps are returned according to the singular value matrix until the model to be compressed meets the performance degradation condition, and the model to be compressed is output;
the method comprises the following steps of performing model training on a model to be compressed through singular value matrix return, continuously performing sparse of weight parameters in the model to be compressed, achieving iterative pruning effect of the model to be compressed, and improving the pruning effect of the model to be compressed;
in the step, when the performance degradation of the model to be compressed after the model training is detected, it is determined that the model to be compressed meets the performance degradation condition.
S40, performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
in this step, the formula for performing parameter clustering according to the output weight tensor of the model to be compressed includes:
[equation image in original: cluster-center initialization over the interval [w_min, w_max]]
wherein w_min and w_max denote the minimum and maximum values of the weight tensor in the model to be compressed, respectively. The cluster centers are initialized with weights spaced uniformly over the interval [w_min, w_max]; when the clustering algorithm converges, all weight values belonging to the same cluster are reset to the value of the corresponding centroid, yielding the weight parameter matrix. Referring to fig. 3, the original weights are approximately represented by the parameters in the weight parameter matrix. Parameter clustering based on the weight tensors of the model to be compressed effectively reduces the number of distinct weight values that need to be stored, and each weight can be represented as a cluster index into the weight parameter matrix.
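A minimal NumPy sketch of this step, assuming plain one-dimensional K-means with linearly initialized centers (the value of K is an assumption; with 8-bit cluster indices, K = 256):

```python
import numpy as np

def cluster_quantize(weights, K=256, iters=20):
    """1-D K-means over one weight tensor: centers start on a uniform grid
    over [w_min, w_max]; at convergence every weight in a cluster is reset
    to its centroid. Returns the quantized tensor, the cluster-index matrix
    and the codebook of centroids."""
    w = weights.ravel().astype(np.float64)
    centers = np.linspace(w.min(), w.max(), K)      # linear initialization
    for _ in range(iters):
        # assign each weight to its nearest center
        idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(K):
            members = w[idx == k]
            if members.size:                         # update non-empty clusters
                centers[k] = members.mean()
    quantized = centers[idx].reshape(weights.shape)  # weights replaced by centroids
    return quantized, idx.reshape(weights.shape), centers
```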
Optionally, in this step, the performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix includes:
respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain the clustering quantization matrix;
To compress the model to be compressed further, cluster quantization is applied: the weights of each weight tensor are divided into K clusters S1, S2, …, SK by K-means clustering. The cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values; in the table, each weight parameter is bound to a cluster index, which determines the corresponding cluster centroid value in the table.
Specifically, each weight parameter is quantized to log2(K) bits, i.e. each weight needs log2(K) bits to store its cluster index. For example, if the initial weight parameters are 32-bit floating-point numbers, an extra 32K bits are needed to store the codebook, so the quantization compression ratio is calculated as:
r = (N × 32) / (N × log2(K) + K × 32)
wherein N denotes the number of weights in the weight tensor and K denotes the number of clusters. For cluster-based quantization, selecting a suitable K value is the key to realizing the quantization; in this embodiment the cluster index is taken as 8 bits, so that without loss of model performance the parameter quantity of the model to be compressed is reduced by nearly 5 times and the calculation is roughly halved, improving the model compression effect.
S50, setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model;
the method has the advantages that the clustering quantization matrix is used for setting the parameters of the weight parameters in the model to be compressed, so that the compression effect of the weight parameters in the model to be compressed is effectively achieved, and the parameters and the calculated amount in the model to be compressed are reduced.
This embodiment provides a combined model compression method based on sparse regularization, iterative pruning and cluster quantization. Sparse regularization increases the sparsity of the weight tensors in the model to be compressed, so that a higher pruning rate is obtained without sacrificing model performance and the performance of the compressed model output after compression is guaranteed. The model to be compressed is trained by alternating iterative pruning with fine-tuning, and finally the remaining weights in the model are quantized by k-means clustering. Pruning and quantization are guided by singular-value sensitivity analysis, which facilitates the selection of pruning rates when the weight distributions differ across tensors. By compressing the model to be compressed from this global perspective, maximal model compression is achieved on the premise of preserving accuracy, and parameter clustering over the weight tensors allows the parameters of the model to be clustered and quantized effectively, further improving the model compression effect.
Example two
Referring to fig. 4, it is a flowchart of a model compression method according to a second embodiment of the present invention, which is used to further refine step S30 in the first embodiment, and includes the steps of:
step S31, inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
the test data includes a plurality of different types of data, for example, the test data includes image data, voice data and audio data, and the image data, the voice data and the audio data are input into the trained model to be compressed for data processing to obtain output data.
Data identifiers are acquired for each item of the test data and matched against a pre-stored data query table to obtain the standard data. The data query table stores the correspondence between different data identifiers and their standard data; the standard data is used to check the accuracy of the output data and thereby evaluate the model performance of the model to be compressed;
step S32, according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligible index and a voice quality evaluation index;
optionally, in this step, the performing performance detection on the model to be compressed according to the standard data and the output data includes:
acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index; that is, the signal-to-noise ratio of the standard image and the output image is calculated to obtain the subjective evaluation index used to evaluate the image processing performance of the model to be compressed;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
calculating short-time objective intelligibility according to the output audio and the standard audio to obtain the audio intelligibility index; that is, a short-time objective intelligibility calculation on the output audio and the standard audio yields the audio intelligibility index used to evaluate the audio processing performance of the model to be compressed;
acquiring output voice in the output data, and acquiring standard voice in the standard data;
performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index; that is, a perceptual voice quality evaluation calculation on the output voice and the standard voice yields the voice quality evaluation index used to evaluate the voice processing performance of the model to be compressed;
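A sketch of these three computations; the third-party packages `pystoi` (STOI) and `pesq` (PESQ) are assumptions for illustration, since this embodiment does not name an implementation, and the SNR shown is the usual ratio of reference energy to error energy:

```python
import numpy as np
from pystoi import stoi   # short-time objective intelligibility (assumed package)
from pesq import pesq     # perceptual evaluation of speech quality (assumed package)

def snr_db(reference, output):
    """Signal-to-noise ratio (dB) of an output image against its standard
    reference, used here as the subjective evaluation index."""
    ref = reference.astype(np.float64)
    err = ref - output.astype(np.float64)
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))

def evaluate_model(ref_image, out_image, ref_audio, out_audio,
                   ref_speech, out_speech, fs=16000):
    subjective_index = snr_db(ref_image, out_image)
    intelligibility_index = stoi(ref_audio, out_audio, fs)   # STOI
    quality_index = pesq(fs, ref_speech, out_speech, 'wb')   # wideband PESQ
    return subjective_index, intelligibility_index, quality_index
```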
step S33, if the subjective evaluation index, the audio intelligible index and the voice quality evaluation index meet preset index conditions, judging that the model to be compressed meets performance degradation conditions;
for example, in this step, whether the subjective evaluation index, the audio understandable index, and the voice quality evaluation index are smaller than an index threshold corresponding to a preset value is detected, and the index threshold may be set according to a requirement.
In this embodiment, test data is input into the trained model to be compressed for data processing to obtain output data, which provides a data basis for performance detection of the model. By acquiring the standard data for the test data, performance detection can be carried out effectively against that standard data, yielding the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index. These three indices respectively characterize the image, audio and voice processing performance of the model to be compressed; when degradation is detected in any one of them, the model to be compressed is determined to meet the performance degradation condition.
EXAMPLE III
Referring to fig. 5, a schematic structural diagram of a model compression system 100 according to a third embodiment of the present invention is shown, including: the model training module 10, the singular value decomposition module 11, the model output module 12, the parameter clustering module 13 and the parameter setting module 14, wherein:
and the model training module 10 is used for obtaining a model to be compressed and carrying out model training on the model to be compressed.
The singular value decomposition module 11 is configured to add a regular term to the model to be compressed after model training, and perform singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, where the regular term is used to perform parameter sparsity on weight parameters in the model to be compressed.
Wherein, the singular value decomposition module 11 is further configured to: inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
the first regularization term includes:
Figure BDA0003835314130000121
the second regularization term includes:
Figure BDA0003835314130000122
wherein L is 1 Is the first regularization term, L 2 The second regularization term μ β And δ β is the mean and variance of the activation values in the model to be compressed, γ and β are affine transformation parameters, and λ is a penalty coefficient factor.
Optionally, the formula for performing singular value decomposition on the weight tensor of the model to be compressed to which the regular term is added includes:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
And the model output module 12 is configured to return to execute the step of performing model training on the model to be compressed and subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and output the model to be compressed.
Wherein, the model output module 12 is further configured to: inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligibility index and a voice quality evaluation index;
and if the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index meet preset index conditions, determining that the model to be compressed meets the performance degradation condition.
Optionally, the model output module 12 is further configured to: acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
calculating short-time objective intelligibility according to the output audio and the standard audio to obtain an audio intelligibility index;
acquiring output voice in the output data, and acquiring standard voice in the standard data;
and performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index.
And the parameter clustering module 13 is configured to perform parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and perform weight quantization on the weight parameter matrix to obtain a clustering quantization matrix.
Wherein, the parameter clustering module 13 is further configured to: respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain a clustering quantization matrix, wherein the cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values.
Optionally, the formula for performing parameter clustering according to the output weight tensor of the model to be compressed includes:
Figure BDA0003835314130000131
wherein, w min And w max Representing the minimum and maximum values of the weight tensor, respectively.
And the parameter setting module 14 is configured to perform parameter setting on the to-be-compressed model according to the clustering quantization matrix to obtain a compressed model.
According to this embodiment, adding regularization terms to the model to be compressed after model training increases the sparsity of its weight tensors, so that a higher pruning rate is obtained without sacrificing model performance and the performance of the compressed model output after compression is guaranteed. Returning, via the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps allows the weight parameters to be sparsified continuously, achieving an iterative pruning effect and improving the pruning result. Parameter clustering over the weight tensors of the model to be compressed then allows its parameters to be clustered and quantized effectively, further improving the model compression effect.
Example four
Fig. 6 is a block diagram of a terminal device 2 according to a fourth embodiment of the present application. As shown in fig. 6, the terminal device 2 of this embodiment includes: a processor 20, a memory 21 and a computer program 22, such as a program for a model compression method, stored in said memory 21 and executable on said processor 20. The steps in the various embodiments of the model compression methods described above are implemented when the computer program 22 is executed by the processor 20.
Illustratively, the computer program 22 may be partitioned into one or more modules that are stored in the memory 21 and executed by the processor 20 to accomplish the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 22 in the terminal device 2. The terminal device may include, but is not limited to, a processor 20, a memory 21.
The Processor 20 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 21 may be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 21 may also be an external storage device of the terminal device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the terminal device 2. Further, the memory 21 may also include both an internal storage unit and an external storage device of the terminal device 2. The memory 21 is used for storing the computer program and other programs and data required by the terminal device. The memory 21 may also be used to temporarily store data that has been output or is to be output.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated module is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium, which may be non-volatile or volatile. Based on this understanding, all or part of the flow of the methods in the embodiments above may be realized by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, etc. The computer-readable storage medium may include any entity or device capable of carrying computer program code: a recording medium, USB flash disk, removable hard disk, magnetic disk, optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signal, telecommunications signal, software distribution medium, etc. Note that the content of the computer-readable storage medium may be increased or decreased as appropriate under the legislation and patent practice of the jurisdiction; for example, in some jurisdictions the computer-readable storage medium does not include electrical carrier signals or telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of model compression, the method comprising:
obtaining a model to be compressed, and performing model training on the model to be compressed;
adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
returning to execute the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model.
2. The model compression method according to claim 1, wherein before returning, according to the singular value matrix, to the step of performing model training on the model to be compressed and the subsequent steps, the method further comprises:
inputting test data into the trained model to be compressed for data processing to obtain output data, and acquiring standard data of the test data;
according to the standard data and the output data, performing performance detection on the model to be compressed to obtain a subjective evaluation index, an audio intelligibility index and a voice quality evaluation index;
and if the subjective evaluation index, the audio intelligibility index and the voice quality evaluation index meet preset index conditions, determining that the model to be compressed meets the performance degradation condition.
3. The model compression method of claim 2, wherein the performing performance detection on the model to be compressed according to the standard data and the output data comprises:
acquiring an output image in the output data, and acquiring a standard image in the standard data;
performing signal-to-noise ratio calculation according to the standard image and the output image to obtain the subjective evaluation index;
acquiring output audio from the output data, and acquiring standard audio from the standard data;
calculating short-time objective intelligibility according to the output audio and the standard audio to obtain an audio intelligibility index;
acquiring output voice in the output data and acquiring standard voice in the standard data;
and performing perceptual voice quality evaluation calculation according to the output voice and the standard voice to obtain the voice quality evaluation index.
4. The model compression method of claim 1, wherein adding a regularization term to the model to be compressed after model training comprises:
inquiring coding layers in the model to be compressed, and adding a first regular term in a normalization layer of each coding layer;
inquiring the long short-term memory network layers in the model to be compressed, and adding a second regular term in the layer normalization (LayerNorm) layer of each long short-term memory network layer;
the first regularization term includes:
[equation image in original: first regularization term L1]
the second regularization term includes:
[equation image in original: second regularization term L2]
wherein L1 is the first regularization term, L2 is the second regularization term, μ_β and δ_β are respectively the mean and variance of the activation values in the model to be compressed, γ and β are affine transformation parameters, and λ is a penalty coefficient factor.
5. The model compression method as claimed in claim 1, wherein the formula for performing singular value decomposition on the weight tensor of the model to be compressed after adding the regularization term comprises:
A = UΣVᵀ
wherein U is an m×m matrix; Σ is an m×n matrix whose elements off the main diagonal are all zero and whose main-diagonal elements are the singular values; V is an n×n matrix; U and V are unitary matrices satisfying UUᵀ = I and VVᵀ = I; and A is the weight tensor.
6. The model compression method of claim 1, wherein the performing weight quantization on the weight parameter matrix to obtain a cluster quantization matrix comprises:
respectively matching the weight parameters in the weight parameter matrix against a pre-stored cluster comparison table to obtain cluster centroid values, and performing matrix construction according to the cluster centroid values to obtain the clustering quantization matrix, wherein the cluster comparison table stores the correspondence between different weight parameters and their cluster centroid values.
7. The model compression method as claimed in any one of claims 1 to 6, wherein the formula for performing parameter clustering according to the outputted weight tensor of the model to be compressed comprises:
[equation image in original: cluster-center initialization over the interval [w_min, w_max]]
wherein w_min and w_max denote the minimum and maximum values of the weight tensor, respectively.
8. A model compression system, the system comprising:
the model training module is used for acquiring a model to be compressed and performing model training on the model to be compressed;
the singular value decomposition module is used for adding a regular term into the model to be compressed after model training, and performing singular value decomposition on the weight tensor of the model to be compressed after the regular term is added to obtain a singular value matrix, wherein the regular term is used for performing parameter sparseness on weight parameters in the model to be compressed;
the model output module is used for returning and executing the step of performing model training on the model to be compressed and the subsequent steps according to the singular value matrix until the model to be compressed meets a performance degradation condition, and outputting the model to be compressed;
the parameter clustering module is used for performing parameter clustering according to the output weight tensor of the model to be compressed to obtain a weight parameter matrix, and performing weight quantization on the weight parameter matrix to obtain a clustering quantization matrix;
and the parameter setting module is used for setting parameters of the model to be compressed according to the clustering quantization matrix to obtain a compression model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a method according to any one of claims 1 to 7.
CN202211085770.1A 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium Pending CN115456169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085770.1A CN115456169A (en) 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211085770.1A CN115456169A (en) 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN115456169A true CN115456169A (en) 2022-12-09

Family

ID=84302438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085770.1A Pending CN115456169A (en) 2022-09-06 2022-09-06 Model compression method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN115456169A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189667A (en) * 2023-04-27 2023-05-30 摩尔线程智能科技(北京)有限责任公司 Quantization compression method, device, equipment and storage medium of voice processing model
CN116992946A (en) * 2023-09-27 2023-11-03 荣耀终端有限公司 Model compression method, apparatus, storage medium, and program product
CN116992946B (en) * 2023-09-27 2024-05-17 荣耀终端有限公司 Model compression method, apparatus, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN115456169A (en) Model compression method, system, terminal and storage medium
Polino et al. Model compression via distillation and quantization
CN110619385B (en) Structured network model compression acceleration method based on multi-stage pruning
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2021135715A1 (en) Image compression method and apparatus
CN110175641B (en) Image recognition method, device, equipment and storage medium
CN110874625B (en) Data processing method and device
Hong et al. Daq: Channel-wise distribution-aware quantization for deep image super-resolution networks
CN112016674A (en) Knowledge distillation-based convolutional neural network quantification method
US20210271973A1 (en) Operation method and apparatus for network layer in deep neural network
US20220383479A1 (en) Method for detecting defects in images, computer device, and storage medium
US20230385645A1 (en) Method for automatic hybrid quantization of deep artificial neural networks
Hepburn et al. On the relation between statistical learning and perceptual distances
CN114676825A (en) Neural network model quantification method, system, device and medium
CN113554097B (en) Model quantization method and device, electronic equipment and storage medium
CN111614358B (en) Feature extraction method, system, equipment and storage medium based on multichannel quantization
CN112200275B (en) Artificial neural network quantification method and device
CN111832596B (en) Data processing method, electronic device and computer readable medium
CN112906883A (en) Hybrid precision quantization strategy determination method and system for deep neural network
CN115705486A (en) Method and device for training quantitative model, electronic equipment and readable storage medium
Hong et al. DAQ: distribution-aware quantization for deep image super-resolution networks
CN114692892B (en) Method for processing numerical characteristics, model training method and device
CN114664316B (en) Audio restoration method, device, equipment and medium based on automatic pickup
WO2022205890A1 (en) Method, apparatus, and system for transmitting image features
CN114764756B (en) Quantitative pruning method and system for defogging model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination