CN110826692A - Automated model compression method, apparatus, device and storage medium

Automated model compression method, apparatus, device and storage medium

Info

Publication number
CN110826692A
CN110826692A
Authority
CN
China
Prior art keywords
model
preset
target
trained
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911016094.0A
Other languages
Chinese (zh)
Other versions
CN110826692B (en)
Inventor
王家兴
柏昊立
吴家祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911016094.0A priority Critical patent/CN110826692B/en
Publication of CN110826692A publication Critical patent/CN110826692A/en
Application granted granted Critical
Publication of CN110826692B publication Critical patent/CN110826692B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/18 - Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an automated model compression method, apparatus, device and storage medium. The method comprises the following steps: determining a first number of preset distribution components, and determining a statistical value of each preset distribution component based on a preset model parameter range and the first number; constructing a model to be trained based on a preset model and the statistical values of the preset distribution components; obtaining a training sample and training the model to be trained based on the training sample to obtain a training result; determining target distribution components based on the training result; quantizing each model parameter in the trained model into the target statistical value of the target distribution component to which it belongs, thereby obtaining a quantization parameter corresponding to each model parameter; and generating a target compression model corresponding to the trained model based on the quantization parameters. By adopting the method and apparatus, the time consumed by the model compression process is shortened and the amount of computation in the model compression process is reduced, thereby improving the efficiency of model compression.

Description

Automated model compression method, apparatus, device and storage medium
Technical Field
The present application relates to the field of neural network technologies, and in particular, to an automated model compression method, apparatus, device, and storage medium.
Background
Neural network models have pushed the performance of many artificial intelligence tasks to unprecedented levels. However, while complex models inherently perform better, their high storage and computing resource consumption is a major reason they are difficult to deploy effectively on various hardware platforms. To solve these problems, the prior art has proposed technical schemes for compressing models; model compression can effectively reduce parameter redundancy, thereby reducing storage occupation, communication bandwidth, and computational complexity, and facilitating the application and deployment of deep learning.
However, some prior-art model compression schemes rely on manual experience to select a layer-by-layer compression strategy for the neural network model, which is time-consuming and labor-intensive; other automated model compression methods based on reinforcement learning require long training times and large amounts of computation. It is therefore desirable to provide a model compression method that is less time-consuming and more efficient.
Disclosure of Invention
The technical problem to be solved by the application is to provide an automated model compression method, apparatus, device, and storage medium that can train a constructed model based on training samples and automatically determine the corresponding layer-by-layer compression strategy after training, thereby shortening the time consumed by the model compression process, reducing the amount of computation, and improving the efficiency of model compression.
In order to solve the above technical problem, in one aspect, the present application provides an automated model compression method, including:
determining a first number of preset distribution components, and determining a statistical value of each preset distribution component based on a preset model parameter range and the first number;
constructing a model to be trained based on a preset model and the statistical values of the preset distribution components, wherein each model parameter in the model to be trained obeys a mixture distribution composed of the first number of preset distribution components;
acquiring a training sample, training each model parameter in the model to be trained based on the training sample to obtain a trained model, and training the statistical value of each preset distribution component based on the training sample to obtain a target statistical value of each preset distribution component;
determining target distribution components from the first number of preset distribution components based on the model parameters in the trained model and the target statistical values of the preset distribution components;
quantizing each model parameter in the trained model into a target statistic value of the target distribution component respectively to obtain quantization parameters corresponding to each model parameter respectively;
and generating a target compression model corresponding to the trained model based on the quantization parameters.
In another aspect, the present application provides an automated model compression apparatus, the apparatus comprising:
the statistical value determining module is used for determining a first number of preset distribution components and determining the statistical value of each preset distribution component based on a preset model parameter range and the first number;
a model-to-be-trained building module, configured to build a model to be trained based on a preset model and the statistical value of each preset distribution component, where each model parameter in the model to be trained obeys a mixture distribution composed of the first number of preset distribution components;
the model training module is used for acquiring a training sample, training each model parameter in the model to be trained based on the training sample to obtain a trained model, and training the statistic value of each preset distribution component based on the training sample to obtain a target statistic value of each preset distribution component;
a target distribution component determination module, configured to determine a target distribution component from the first number of preset distribution components based on each model parameter in the trained model and a target statistic of each preset distribution component;
a parameter quantization module, configured to quantize each model parameter in the trained model into a target statistical value of the target distribution component, respectively, to obtain quantization parameters corresponding to each model parameter;
and the compression model generation module is used for generating a target compression model corresponding to the trained model based on each quantization parameter.
In another aspect, the present application provides an apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the automated model compression method as described above.
In another aspect, the present application provides a computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions that is loaded and executed by a processor to implement the automated model compression method described above.
The embodiment of the application has the following beneficial effects:
the method comprises the steps that a model to be trained is constructed through a first number of preset distribution components, and each model parameter in the model to be trained obeys mixed component distribution formed by the first number of preset distribution components; training the model parameters and the statistical values of the preset distribution components in the model to be trained based on training samples; determining target distribution components of each layer from each preset distribution component, and quantizing each model parameter of each layer in the trained model into a target statistic value of the target distribution component to which the model parameter belongs to obtain a quantized parameter; based on the quantization parameters, a target compression model is generated. The method and the device can train the constructed model based on the training sample, and can automatically determine the corresponding layer-by-layer compression strategy after training, so that the time consumption of the model compression process is shortened, the calculated amount in the model compression process is reduced, and the efficiency of model compression is improved.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flow chart of an automated model compression method provided by an embodiment of the present application;
FIG. 3 is a flowchart of an initialization method for preset distribution components provided by an embodiment of the present application;
FIG. 4 is a flowchart of a method for constructing a model to be trained provided by an embodiment of the present application;
FIG. 5 is a flowchart of a target distribution component determination method provided by an embodiment of the present application;
FIG. 6 is a flowchart of a method for determining the Gaussian distribution component to which a model parameter belongs, provided by an embodiment of the present application;
FIG. 7 is a flowchart of a method for quantizing model parameters provided by an embodiment of the present application;
FIG. 8 is a probability model diagram of a Gaussian mixture distribution provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a Gaussian mixture distribution provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a Dirichlet process configuration provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 12 is a graph illustrating a first comparison result provided by an embodiment of the present application;
FIG. 13 is a graph illustrating a second comparison result provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of an automated model compression apparatus provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of a statistic determination module provided by an embodiment of the present application;
FIG. 16 is a schematic diagram of a model building module to be trained according to an embodiment of the present application;
FIG. 17 is a schematic diagram of a target distribution component determination module provided by an embodiment of the present application;
FIG. 18 is a schematic diagram of a comparison module provided in an embodiment of the present application;
FIG. 19 is a schematic diagram of a parameter quantization module provided in an embodiment of the present application;
FIG. 20 is a schematic structural diagram of a device provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the application described herein can be operated in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or device.
The terms referred to in the examples of the present application are first explained below:
neural network compression: model parameters of the neural network are compressed, the storage space of the model is reduced, and the response time of the model is shortened.
Dirichlet process: a non-parametric model is commonly used to determine the number of mixture components in a mixture model.
And (3) variational reasoning: in machine learning, a method for solving hidden variables is used for reasoning hidden variables and training a model by optimizing a variation lower bound of data likelihood.
The application can be applied in the field of Artificial Intelligence (AI). Artificial intelligence is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the present application is shown, which at least includes: at least one terminal 110 and a server 120, the terminal 110 and the server 120 being in data communication via a network. Specifically, the terminal 110 may provide the preset model and the corresponding training sample to the server 120, so that the server 120 determines a layer-by-layer quantization compression strategy of the model based on the preset model and the training sample, performs model compression according to the quantization compression strategy, and returns the compression model to the terminal 110.
The terminal 110 may communicate with the server 120 in a Browser/Server (B/S) mode or a Client/Server (C/S) mode. The terminal 110 may include physical devices, and may also include software running on the physical devices, such as application programs. The operating system running on the terminal 110 in the embodiment of the present application may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
The server 120 and the terminal 110 may establish a communication connection through a wired or wireless connection, and the server 120 may include a server operating independently, or a distributed server, or a server cluster composed of a plurality of servers, where the server may be a cloud server.
In order to solve the problems of long time consumption and large computation in prior-art model compression schemes, an embodiment of the present application provides an automated model compression method. The execution subject of the method may be the server in fig. 1 or the terminal in fig. 1. Referring specifically to fig. 2, which illustrates the model compression method, the method includes:
s210, determining a first number of preset distribution components, and determining a statistical value of each preset distribution component based on a preset model parameter range and the first number.
The preset distribution components in the embodiment of the present application are mainly used to construct the model to be trained. Each distribution component corresponds to one distribution, and the preset distribution components are distributions of the same type but with different parameters; for example, each preset distribution component may be a Gaussian distribution with a different mean and variance, or another distribution that can be used to construct the model to be trained, such as a multinomial distribution. The embodiment of the present application is not particularly limited in this respect. In the following examples, a Gaussian distribution is used as an example.
The first number of preset distribution components is preset based on the actual situation. The preset model parameter range can be set according to empirical values; for example, based on the analysis of models with large data volumes, the value range of model parameters is generally -1 to 1, so the preset model parameter range in the embodiment of the present application is set to -1 to 1.
In order to make each model parameter in the model to be constructed obey the mixed component distribution composed of the first number of preset distribution components, the statistical value of each preset distribution component is initialized with the preset model parameter, and specifically refer to fig. 3, which shows a method for initializing the preset distribution components, the method includes:
and S310, generating a parameter interval based on the preset model parameter range.
From the value range -1 to 1, the parameter interval [-1, 1] is generated.
S320, equally dividing the parameter interval to obtain equal division points of the first number, wherein the equal division points of the first number comprise two interval end points of the parameter interval.
S330, determining the numerical value at each equally dividing point, and sequentially determining the numerical values at the equally dividing points of the first quantity as the statistical value of the preset distribution components of the first quantity.
In this embodiment, the first number may be preset to 11, so the interval [-1, 1] is equally divided to obtain a value at each division point, where the 11 division points include the two endpoints -1 and 1 of the interval. The resulting values at the 11 division points are: -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.
Specifically, the statistical value of a preset distribution component here may include the mean of a Gaussian distribution, so the values at the 11 division points may be respectively determined as the initial means of 11 Gaussian distributions; the variance of each Gaussian distribution is then randomly initialized, finally yielding 11 Gaussian distribution components, which completes the initialization of the preset distribution components.
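As a concrete illustration of steps S310 to S330, the initialization can be sketched in a few lines of Python; NumPy, the function name, and the variance range are assumptions made for illustration, not details fixed by the patent.

```python
import numpy as np

def init_preset_components(param_range=(-1.0, 1.0), first_number=11, seed=0):
    """Equally divide the preset parameter interval and take the division
    points (including both endpoints) as the initial means of the Gaussian
    components; variances are randomly initialized (S310-S330)."""
    rng = np.random.default_rng(seed)
    means = np.linspace(param_range[0], param_range[1], first_number)
    variances = rng.uniform(0.01, 0.1, size=first_number)
    return means, variances

means, variances = init_preset_components()
print(means)  # [-1.  -0.8 -0.6 ... 0.6  0.8  1. ]
```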
S220, constructing a model to be trained based on a preset model and the statistic value of each preset distribution component, wherein each model parameter in the model to be trained obeys the mixed component distribution formed by the first number of preset distribution components.
In the embodiment of the present application, the model to be trained is constructed based on each preset distribution component, and specifically refer to fig. 4, which shows a method for constructing a model to be trained, where the method includes:
and S410, determining the initial occurrence probability distribution of each preset distribution component based on a preset sampling method.
The preset sampling method in the embodiment of the present application may be a sampling method based on the Dirichlet process: the initial occurrence probability distribution of each preset component is obtained by sampling from the Dirichlet process. The Dirichlet process can be used to determine the number of preset distribution components and the probability that each distribution component is used; it can describe an infinite number of mixture components and automatically determine a reasonable number of components.
It should be noted that the preset distribution components may be determined in two ways. A first number of preset components and an initial occurrence probability distribution may be determined for the entire model, with every layer in the model sharing the preset components and the corresponding occurrence probability distribution. Alternatively, because the conditions of each layer in the model differ, the number of preset distribution components and the occurrence probability distribution of the corresponding preset distribution components may be determined separately for each layer; since the number of preset distribution components per layer may differ, the statistical values used when initializing the preset components for each layer also differ.
And S420, traversing each layer in the preset model.
S430, determining the number of model parameters in each layer.
S440, sampling each preset distribution component based on the initial occurrence probability distribution of each preset distribution component to obtain the preset distribution component corresponding to one statistical value.
When all layers share the preset components, the shared preset distribution components are sampled; when each layer has its own preset components, the preset components of that layer are sampled.
S450, sampling is carried out on the preset distribution component with the statistical value, and a sampling parameter is obtained.
Because the statistical values of the preset distribution components differ, sampling yields the preset distribution component corresponding to one statistical value; the sampled preset component is then sampled once to obtain one sampling parameter.
S460, judging whether the number of the obtained sampling parameters reaches the number of the model parameters in the current layer, and if not, executing the step S440; when the judgment result is yes, step S470 is performed.
And S470, stopping parameter sampling for the current layer when the judgment result is yes.
And S480, generating the model to be trained based on the sampling parameters in each layer.
Each layer of the preset model is traversed layer by layer; after one traversal-sampling pass, every layer of the whole model has obtained its corresponding initial model parameters, and the model to be trained is established based on these initial model parameters. Since the initial model parameters are obtained by sampling the respective preset components, the model parameters of the model obey a mixture distribution composed of the first number of preset distribution components.
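A minimal sketch of the layer-by-layer sampling of steps S420 to S480, assuming NumPy, shared preset components, and an occurrence probability vector pi that is already normalized; all function and variable names are hypothetical.

```python
import numpy as np

def sample_layer_params(num_params, pi, means, variances, rng):
    """One layer: pick a component per parameter according to the initial
    occurrence probabilities (S440), then draw one sampling parameter from
    the chosen Gaussian (S450), until the layer's count is reached (S460)."""
    comp = rng.choice(len(pi), size=num_params, p=pi)
    return rng.normal(means[comp], np.sqrt(variances[comp]))

def build_model_to_train(layer_sizes, pi, means, variances, seed=0):
    """Traverse the layers (S420) and collect the initial parameters (S480)."""
    rng = np.random.default_rng(seed)
    return [sample_layer_params(n, pi, means, variances, rng)
            for n in layer_sizes]
```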
S230, obtaining a training sample, training each model parameter in the model to be trained based on the training sample to obtain a trained model, and training the statistic value of each preset distribution component based on the training sample to obtain a target statistic value of each preset distribution component.
In the embodiment of the present application, training the model means solving, from the known information, for the target statistic of each preset component and the weight of each preset component in the model. Taking Gaussian distribution components as an example, the solving process can adopt a variational inference method, where the generative process of the model involves the hyper-parameters

Θ = {α_0, m_0, β_0, W_0, υ_0}

where α_0 is the preset parameter of the Dirichlet process, m_0 and β_0 are initial parameters related to the means of the respective Gaussian distributions, and W_0, υ_0 are initial parameters related to the variances of the respective Gaussian distributions. The trainable (variational posterior) parameters at training time are

Φ = {r, π, m, β, W, υ}

where r is the posterior assignment of parameters to components, m and β are parameters related to the means of the Gaussian distributions, and W, υ are parameters related to the variances of the Gaussian distributions. During the training of the model, each parameter in Φ is continuously updated so that the model parameter distribution becomes multimodal, which allows subsequent quantization while the model maintains high performance.
The optimization goal of model training is:

min_Φ L_E + L_C      (1)

When the value calculated by equation (1) converges, model training can be determined to be complete. Here L_E is the calculated likelihood value, and L_C is the penalty term introduced by the variational inference method; this penalty term keeps the trained model as consistent as possible with the preset model. D is the training data set, ω is a random variable representing the model parameters, z is the indicator variable of the Gaussian components, i.e., which Gaussian component is selected, and π is the parameter of the multinomial distribution, i.e., the probability with which a given Gaussian component is selected. μ is the mean of a Gaussian distribution and λ is the reciprocal of its variance, the two being in one-to-one correspondence.
After the training of the model is completed, the target model parameters, the target statistics (including mean and variance) of each Gaussian component, and the target parameters related to the Dirichlet process are obtained.
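A minimal sketch of one training update under equation (1), assuming PyTorch; `model` and `kl_penalty` (the concrete form of L_C) are hypothetical stand-ins, since the patent does not disclose a specific network or a closed form for the penalty term.

```python
import torch
import torch.nn.functional as F

def train_step(model, kl_penalty, batch, optimizer):
    """One update of the variational posterior parameters Phi:
    Phi(t+1) <- Phi(t) - eta * grad_Phi(L_E + L_C)."""
    x, y = batch
    optimizer.zero_grad()
    l_e = F.cross_entropy(model(x), y)  # likelihood term L_E
    l_c = kl_penalty(model)             # variational penalty term L_C
    loss = l_e + l_c
    loss.backward()
    optimizer.step()
    return float(loss)
```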
S240, determining target distribution components from the first number of preset distribution components based on the model parameters in the trained model and the target statistic values of the preset distribution components.
According to the training result, target distribution components may be determined from the first number of Gaussian distribution components; in particular, they may be determined layer by layer. Referring to fig. 5, which shows a target distribution component determination method, the method includes:
and S510, for each layer in the trained model, comparing each model parameter in the current layer with the target statistic value of each preset distribution component, and determining the preset distribution component to which each model parameter in the current layer belongs based on the comparison result.
Preset distribution components may be set either as components shared by all layers or separately for each layer; the shared case is taken as the example here.
A target mean of each Gaussian distribution component is determined; each model parameter in the current layer is compared with the target mean of each Gaussian component, and the Gaussian distribution component to which each model parameter belongs is determined based on the comparison result.
Referring to fig. 6, a specific method for determining the gaussian distribution component to which the model parameter belongs may include:
s610, calculating the difference value of each model parameter in the current layer and the target statistic value of each preset distribution component.
And S620, determining the preset distribution component corresponding to the minimum difference value as the attribution component of the model parameter.
The distance between each model parameter in the current layer and the target mean of each Gaussian distribution is calculated; the Gaussian distribution whose target mean is closest to a model parameter is determined as the Gaussian distribution component to which that parameter belongs.
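The comparison of steps S610 and S620 amounts to a nearest-center assignment; a NumPy sketch with a hypothetical function name:

```python
import numpy as np

def assign_components(layer_params, target_means):
    """For each model parameter in the layer, return the index of the
    Gaussian component whose target mean is closest, i.e. the component
    with the minimum absolute difference (S610-S620)."""
    diffs = np.abs(np.asarray(layer_params)[:, None]
                   - np.asarray(target_means)[None, :])
    return diffs.argmin(axis=1)
```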
S520, counting the number of the attribution model parameters at each preset distribution component.
Based on the above steps, the Gaussian distribution component to which each model parameter belongs has been determined; at this point, the number of model parameters belonging to each Gaussian distribution component is counted.
S530, determining the preset distribution components with the number of the attribution model parameters larger than the preset number as the target distribution components of the current layer.
Specifically, the components are sorted by the number of parameters assigned to them, and the parameters falling into the first N Gaussian components are counted; if the ratio of the number of parameters contained in the first N components to the total number of parameters (which may be the number of parameters in the layer) is greater than a certain threshold, for example 0.98, the mixture is truncated at the N-th component, i.e., those N components are selected.
In another implementation, for each Gaussian distribution component, when the number of model parameters belonging to it is greater than a preset number, that Gaussian distribution component is determined to be a target distribution component. For example, if the number of model parameters belonging to a certain Gaussian distribution component exceeds 10% of the total number of model parameters in the entire model, that Gaussian distribution component is determined to be a target distribution component. There are typically multiple target distribution components, and their number is generally smaller than the number of preset distribution components.
Because the Dirichlet process model can model infinitely many mixture components, the mixture components whose assigned parameter counts exceed a certain threshold are retained by truncation; at this point, a reasonable quantization bit number is obtained for each layer of the model.
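The truncation described above, with the 0.98 example threshold, can be sketched as follows (NumPy assumed; names hypothetical):

```python
import numpy as np

def truncate_components(assignments, num_components, threshold=0.98):
    """Sort components by how many parameters fall into them, and keep the
    smallest leading set whose cumulative share of parameters exceeds the
    threshold; the returned indices are the layer's target components."""
    counts = np.bincount(assignments, minlength=num_components)
    order = np.argsort(counts)[::-1]                # most-used components first
    share = np.cumsum(counts[order]) / counts.sum()
    n = int(np.searchsorted(share, threshold)) + 1  # truncate at the N-th
    return order[:n]
```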
And S250, quantizing each model parameter in the trained model into a target statistic value of the target distribution component respectively to obtain quantization parameters corresponding to each model parameter respectively.
After the target distribution components are obtained, each model parameter in the model is quantized to the target mean of its corresponding Gaussian distribution. The specific model parameter quantization method can be seen in fig. 7; the method includes:
and S710, for each layer in the trained model, determining a target statistic value of a target distribution component to which each model parameter belongs in the current layer as a target quantization value.
Here, the method of determining the target distribution component to which each model parameter belongs is the same as the method described in fig. 6; after the Gaussian distribution component to which a model parameter belongs is determined, the target mean of that Gaussian distribution component may be determined as the target quantization value of the model parameter.
And S720, quantizing the model parameters into the target quantization values.
Based on the determination of the target quantization values, target quantization values corresponding to the model parameters are obtained.
And S260, generating a target compression model corresponding to the trained model based on the quantization parameters.
Each model parameter in the trained model is replaced with its corresponding quantization parameter to obtain the corresponding target compression model.
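Putting the assignment and quantization steps together, a layer is quantized by snapping every parameter to the target mean of its component; this sketch reuses the hypothetical assign_components helper from the earlier sketch.

```python
import numpy as np

def quantize_layer(layer_params, target_means):
    """Replace each model parameter with the target mean of the Gaussian
    component it belongs to (S710-S720)."""
    target_means = np.asarray(target_means)
    idx = assign_components(layer_params, target_means)
    return target_means[idx]
```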
Based on the above steps, the model to be trained is constructed and then compressed. Because the values in Φ are updated once the model to be trained has been trained, the model parameters can be quantized after a single search-and-training process to construct the target compression model, without considering additional optimization targets. This differs from conventional reinforcement-learning-based automated model compression methods, which must compress, train, and test multiple alternative models during the search; such repeated training consumes a large amount of computation, equivalent to training many models.
After the target compression model is obtained, it can be trained and fine-tuned again to improve model performance; the fine-tuning process can adopt straight-through estimation to keep the positions of the quantization points unchanged and the parameters in a quantized state, thereby obtaining the corresponding compression model.
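Straight-through estimation can be sketched in PyTorch as a custom autograd function: the forward pass snaps weights to the fixed quantization points, and the backward pass lets gradients through unchanged, so the quantization point positions stay fixed during fine-tuning. The class and tensor names are illustrative.

```python
import torch

class QuantizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, centers):
        # Snap each weight to the nearest fixed quantization point.
        idx = (w.unsqueeze(-1) - centers).abs().argmin(dim=-1)
        return centers[idx]

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: the gradient w.r.t. w passes unchanged;
        # the quantization points themselves receive no gradient.
        return grad_output, None

# During fine-tuning the forward pass uses w_q = QuantizeSTE.apply(w, centers),
# so the loss sees quantized weights while updates still flow to w.
```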
The sampling method based on the Dirichlet process used in this embodiment is described below:
1. First, the Gaussian mixture model is introduced. Referring to fig. 8, a probabilistic graphical model is often used to represent the generation of data/parameters. Suppose a set of parameters {x} follows a mixture of K Gaussian distributions with means μ_k and variances Σ_k. The generation process of the set of parameters {x} is then as shown in fig. 8 and can be described as follows: first, sample z from a distribution with parameter π, i.e., select which Gaussian component to use; then sample the data x from the corresponding Gaussian distribution. Taking a 2-component Gaussian mixture distribution as an example, a schematic diagram can be seen in fig. 9; as fig. 9 shows, the sampled data gather around two center points.
2. The Dirichlet process is used to determine the number of mixture components of the above mixture model, e.g., how many Gaussian distributions are mixed and the probability that each component is used, i.e., the parameter π. The probability that the k-th component occurs can be expressed as:

π_k = v_k · (1 − v_1)(1 − v_2) ⋯ (1 − v_{k−1})

That is, a variable sequence v_1, v_2, …, v_i, … is first defined, where each v_i is a sample drawn from the Beta distribution Beta(1, α); by the nature of the Beta distribution, 0 < v_i < 1 (i = 1, 2, …). The Dirichlet process is then constructed by "breaking a stick" as follows:
There is a stick of length 1.
1) Break off a piece of length v_1 and let π_1 = v_1 be the length of this piece; the remaining length of the stick is L_1 = 1 − π_1.
2) From the remaining stick of length L_1, break off a piece of length L_1·v_2 and let π_2 = L_1·v_2 be the length of this piece; the remaining length of the stick is L_2 = L_1(1 − v_2).
3) Break off a piece of length L_2·v_3 and let π_3 = L_2·v_3 be the length of this piece; the remaining length of the stick is L_3 = L_2(1 − v_3).
……
A schematic diagram of the Dirichlet process construction can be seen in fig. 10.
Breaking the stick indefinitely yields an infinite set {π_i}; the Dirichlet process can therefore theoretically describe an infinite number of mixture components. The joint distribution of the set {π_i} is called the GEM distribution.
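A minimal NumPy sketch of this stick-breaking construction, truncated at k pieces since the full process is infinite; the function name is illustrative.

```python
import numpy as np

def stick_breaking(alpha, k, seed=0):
    """Draw the first k stick-breaking weights of a Dirichlet process:
    v_i ~ Beta(1, alpha), pi_i = v_i * prod_{j<i} (1 - v_j)."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=k)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining  # pi_1, ..., pi_k; the tail mass stays unassigned
```

A smaller α makes each v_i tend toward 1 and concentrates the mass on the first few pieces, which is what lets the process favor a small number of mixture components.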
With the Dirichlet process, infinitely many mixture components can be described, and a reasonable number of mixture components can be determined automatically from the data. The application therefore provides an automated model compression method based on the Dirichlet process, which can automatically determine the optimal quantization bit number of a neural network layer by layer and complete the search and quantization within a single training run. The method provided by the application assumes that the layer-wise parameter distribution of the model is a Gaussian mixture distribution; the parameter quantization process then solves for the centers of the mixture components and quantizes the parameters to their respective centers. The Dirichlet process, as a prior on the number of mixture components, can automatically determine the number of mixture components used per layer, i.e., the quantization bit number.
The algorithmic description of the automated compression method based on the Dirichlet process can be summarized as follows:
1. Set the regularization term weight τ and the learning rate η;
2. set the prior parameters Θ of the generative process;
3. initialize the variational posterior parameters (training parameters) Φ;
4. while not converged, update the posterior parameters: Φ(t+1) ← Φ(t) − η∇_Φ(L_E + L_C);
5. determine the mixture component to which each parameter belongs according to r (corresponding to the assignment variable z) in the posterior parameters;
6. truncate to an appropriate number of mixture components, i.e., the quantization bit number, according to the assignments;
7. quantize the model parameters according to each parameter's assignment;
8. train and fine-tune the quantized network.
In a specific application process, the model compression method provided by the application automatically generates a layer-by-layer compression strategy suited to the data, according to the user's data and the provided trained model, so as to achieve better model performance and a more "automatic" and "efficient" user experience.
Specifically, a layer-by-layer compression strategy suited to the current task is first searched according to the existing preset model and the user data; the model is then compressed (pruned and quantized) according to the compression strategy; finally, a small amount of training adjustment is applied to the compressed model to obtain the final lightweight network. A user only needs to prepare a preset model and training data; a reasonable compression strategy can then be provided automatically and efficiently according to the data and the existing model, and the model compressed accordingly. A schematic diagram of an embodiment can be seen in fig. 11.
Specific application scenarios of the present application include, but are not limited to:
1) For a vehicle-mounted system, the computing and storage capabilities of the platform are limited, yet various intelligent technologies, such as speech recognition and control and geographic positioning, need to be integrated, so a reasonably compressed lightweight neural network can be applied in such a system.
2) For mobile intelligent systems: in daily life, a large number of smartphone terminals provide intelligent services such as face recognition and fingerprint recognition. Because the computing and storage capabilities of a mobile phone are limited, a lightweight model is often needed, and the present application can quickly provide an optimized lightweight network model for the mobile terminal.
To verify the superiority of the automated model compression method based on the Dirichlet process, two common network architectures, LeNet (a convolutional neural network) and ResNet (Residual Network), are used for verification on the MNIST and CIFAR-10 data sets. The comparison algorithms include:
1) SWS, which does not select quantization bit numbers layer by layer and uses the same quantization bit number for every layer of the model;
2) VD, which performs unstructured pruning without model quantization;
3) BC, an automated model compression method based on the uncertainty of Bayesian posterior parameters, in which parameters with larger posterior variance are described with lower-precision floating-point numbers;
4) DC, a classical model compression method that clusters and quantizes parameters using K-means clustering;
5) DDPG, which uses reinforcement learning to search for the model's optimal quantization bit number layer by layer;
6) UQ, uniform quantization using the same number of bits for every layer of the model with the standard STE method; UQ-2 uses 2-bit precision and UQ-4 denotes 4-bit precision.
BAMC is the compression method provided by the application, where BAMC-G denotes a globally searched quantization bit number with every layer using the same bit number, and BAMC-L denotes searching an appropriate quantization bit number layer by layer. C indicates that the searched number of mixture components can take any positive integer 1, 2, 3, …; B indicates that the searched number of components takes only values 2^N, i.e., is expressible in bits. The comparison result of LeNet compression on MNIST is shown in fig. 12, and the comparison result of ResNet20 compression on CIFAR-10 is shown in fig. 13. They compare, for each method, the ratio of parameters remaining after quantization and pruning to the original model parameters ((|ω ≠ 0|/|ω|)%), the bit number (Bit), the number of components (Component), the compression ratio (SR), and the accuracy (Acc. %). From these it can be seen that the compression method provided by the present application can effectively learn a reasonable layer-by-layer quantization bit number and has better learning ability.
In particular implementations, another compression method is also considered: pruning, i.e., removing parameters whose value is 0 from the model. In the present method, pruning and quantization can easily be performed simultaneously: if the mean of a Gaussian component is fixed to 0, then all parameters falling into that component are pruned. (|ω ≠ 0|/|ω|)% is the ratio of the parameters remaining after quantization and pruning to the original model parameters.
For the above-described compression ratio SR, it can be calculated by the following formula:

SR = (Σ_{l=1}^{L} N_l · b_full) / (Σ_{l=1}^{L} N_l · bit_l)

where L is the number of model layers, N_l is the number of model parameters in the l-th layer, bit_l is the searched bit number of the l-th layer, and b_full is the bit width of the uncompressed model parameters (e.g., 32 for single-precision floating point).
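As an illustration, the formula above reduces to a few lines of Python; the function name and the 32-bit full-precision default are assumptions for illustration, not values fixed by the patent.

```python
def compression_ratio(layer_sizes, layer_bits, full_bits=32):
    """Storage cost of the original model divided by the storage cost of
    the compressed model, summed over layers (SR formula above)."""
    original = sum(n * full_bits for n in layer_sizes)
    compressed = sum(n * bits for n, bits in zip(layer_sizes, layer_bits))
    return original / compressed

# Example: a 3-layer model quantized to 4, 2, and 2 bits per layer.
print(compression_ratio([60_000, 120_000, 10_000], [4, 2, 2]))
```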
The method comprises the steps that a model to be trained is constructed through a first number of preset distribution components, and each model parameter in the model to be trained obeys mixed component distribution formed by the first number of preset distribution components; training the model parameters and the statistical values of the preset distribution components in the model to be trained based on training samples; determining target distribution components from the preset distribution components, and quantizing the model parameters of each layer in the trained model into target statistical values of the target distribution components to which the model parameters belong to obtain quantized parameters; based on the quantization parameters, a target compression model is generated. The method and the device can train the constructed model based on the training sample, and can automatically determine the corresponding layer-by-layer compression strategy after training, so that the time consumption of the model compression process is shortened, the calculated amount in the model compression process is reduced, and the efficiency of model compression is improved.
The invention models the quantization process of the model as a mixture-component inference problem: a Dirichlet process model determines the number of mixture components from the data, and the parameters are quantized to the component centers, i.e., the quantization bit number of the model is determined layer by layer (corresponding to the number of mixture components). With this method, the search for and quantization of the compression strategy can be completed in a single training run, avoiding the repeated training required by reinforcement-learning-based methods.
The present embodiment further provides an automatic model compression apparatus, please refer to fig. 14, the apparatus includes:
a statistic determination module 1410, configured to determine a first number of preset distribution components, and determine a statistic of each preset distribution component based on a preset model parameter range and the first number;
a model to be trained constructing module 1420, configured to construct a model to be trained based on a preset model and a statistical value of each preset distribution component, where each model parameter in the model to be trained obeys a mixed component distribution composed of the first number of preset distribution components;
a model training module 1430, configured to obtain a training sample, train each model parameter in the model to be trained based on the training sample to obtain a trained model, and train the statistical value of each preset distribution component based on the training sample to obtain a target statistical value of each preset distribution component;
a target distribution component determining module 1440, configured to determine a target distribution component from the first number of preset distribution components based on each model parameter in the trained model and a target statistic of each preset distribution component;
a parameter quantization module 1450, configured to quantize each model parameter in the trained model into a target statistical value of the target distribution component, to obtain quantization parameters corresponding to each model parameter;
a compression model generation module 1460 configured to generate a target compression model corresponding to the trained model based on the quantization parameters.
Referring to fig. 15, the statistic determination module 1410 includes:
a parameter interval generating module 1510, configured to generate a parameter interval based on the preset model parameter range;
an equally dividing module 1520, configured to equally divide the parameter interval to obtain equal division points of the first number, where the equal division points of the first number include two interval end points of the parameter interval;
the statistics determining module 1530 is configured to determine a numerical value at each of the equal dividing points, and sequentially determine the numerical values at the equal dividing points of the first number as statistics of the preset distribution components of the first number.
Referring to fig. 16, the model building module to be trained 1420 includes:
a first determining module 1610, configured to determine an initial occurrence probability distribution of each preset distribution component based on a preset sampling method;
a traversing module 1620, configured to traverse each layer in the preset model;
a second determining module 1630, configured to determine the number of model parameters in the current layer;
the first sampling module 1640 is configured to sample each preset distribution component based on an initial occurrence probability distribution of each preset distribution component to obtain a preset distribution component corresponding to one statistical value;
the second sampling module 1650 is configured to sample a preset distribution component having the statistical value to obtain a sampling parameter;
a repeat execution module 1660, configured to repeat the sampling of each preset distribution component based on the initial occurrence probability distribution of each preset distribution component to obtain a preset distribution component corresponding to a statistical value; sampling the preset distribution components with the statistical values to obtain a sampling parameter until the number of the obtained sampling parameters reaches the number of the model parameters in the current layer;
and a model to be trained generating module 1670, configured to generate the model to be trained based on the sampling parameters in each layer.
Referring to fig. 17, the target distribution component determining module 1440 includes:
a comparing module 1710, configured to compare, for each layer in the trained model, each model parameter in the current layer with a target statistic of each preset distribution component, and determine, based on a comparison result, a preset distribution component to which each model parameter in the current layer belongs;
a first statistical module 1720, configured to count the number of the attribution model parameters at each preset distribution component;
a third determining module 1730, configured to determine that a preset distribution component with the number of the attribution model parameters being greater than a preset number is the target distribution component of the current layer.
Referring to fig. 18, the comparing module 1710 includes:
a first calculating module 1810, configured to calculate a difference between each model parameter in the current layer and a target statistic of each preset distribution component;
a fourth determining module 1820, configured to determine the preset distribution component corresponding to the minimum difference as the attribution component of the model parameter.
Referring to fig. 19, the parameter quantization module 1450 includes:
a target quantization value determining module 1910, configured to determine, for each layer in the trained model, a target statistical value of a target distribution component to which each model parameter belongs in a current layer as a target quantization value;
a quantization module 1920 configured to quantize the model parameter to the target quantization value.
The device provided in the above embodiments can execute the method provided in any embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method. Technical details not described in detail in the above embodiments may be referred to a method provided in any of the embodiments of the present application.
The present embodiments also provide a computer-readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded by a processor and performs any of the methods described above in the present embodiments.
Referring to fig. 20, the device 2000 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 2022 (e.g., one or more processors), a memory 2032, and one or more storage media 2030 (e.g., one or more mass storage devices) storing applications 2042 or data 2044. The memory 2032 and the storage medium 2030 may provide transient or persistent storage. The program stored in the storage medium 2030 may include one or more modules (not shown in the drawings), each of which may include a series of instruction operations for the device. Further, the central processing unit 2022 may be arranged to communicate with the storage medium 2030 and execute, on the device 2000, the series of instruction operations in the storage medium 2030. The device 2000 may also include one or more power supplies 2026, one or more wired or wireless network interfaces 2050, one or more input/output interfaces 2058, and/or one or more operating systems 2041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like. Any of the methods described above in this embodiment can be implemented based on the device shown in fig. 20.
The present specification provides the method steps as described in the embodiments or flowcharts, but more or fewer steps may be included based on routine or non-inventive labor. The steps and sequences recited in the embodiments are only one of many possible execution orders and do not represent the only order of performance. In actual system or product execution, the steps may be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures.
The configurations shown in this embodiment are only a partial illustration of the configurations related to the present application and do not limit the devices to which the present application is applied; a specific device may include more or fewer components than shown, combine certain components, or arrange the components differently. It should be understood that the methods, apparatuses, and the like disclosed in the embodiments may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through certain interfaces, devices, or unit modules.
Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Those skilled in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An automated model compression method, comprising:
determining a first number of preset distribution components, and determining a statistical value of each preset distribution component based on a preset model parameter range and the first number;
constructing a model to be trained based on a preset model and the statistical values of the preset distribution components, wherein each model parameter in the model to be trained obeys a mixed component distribution consisting of the first number of preset distribution components;
acquiring a training sample, training each model parameter in the model to be trained based on the training sample to obtain a trained model, and training the statistical value of each preset distribution component based on the training sample to obtain a target statistical value of each preset distribution component;
determining target distribution components from the first number of preset distribution components based on the model parameters in the trained model and the target statistical values of the preset distribution components;
quantizing each model parameter in the trained model into a target statistic value of the target distribution component respectively to obtain quantization parameters corresponding to each model parameter respectively;
and generating a target compression model corresponding to the trained model based on the quantization parameters.
2. The method of claim 1, wherein the determining statistics of each predetermined distribution component based on the predetermined model parameter range and the first number comprises:
generating a parameter interval based on the preset model parameter range;
equally dividing the parameter interval to obtain the first number of equally divided points, wherein the first number of equally divided points comprise two interval end points of the parameter interval;
determining the numerical value at each equally divided point, and sequentially taking the numerical values at the first number of equally divided points as the statistical values of the first number of preset distribution components.
3. The method of claim 1, wherein the constructing the model to be trained based on the preset model and the statistical values of the preset distribution components comprises:
determining initial occurrence probability distribution of each preset distribution component based on a preset sampling method;
traversing each layer in the preset model;
determining the number of model parameters in a current layer;
sampling each preset distribution component based on the initial occurrence probability distribution of each preset distribution component to obtain a preset distribution component corresponding to a statistical value;
sampling a preset distribution component with the statistical value to obtain a sampling parameter;
repeating the sampling of each preset distribution component based on the initial occurrence probability distribution of each preset distribution component to obtain a preset distribution component corresponding to a statistical value; sampling the preset distribution components with the statistical values to obtain a sampling parameter until the number of the obtained sampling parameters reaches the number of the model parameters in the current layer;
and generating the model to be trained based on the sampling parameters in each layer.
4. The method of claim 1, wherein the determining a target distribution component from the first number of preset distribution components based on the model parameters in the trained model and the target statistics of the preset distribution components comprises:
for each layer in the trained model, comparing each model parameter in the current layer with the target statistic value of each preset distribution component, and determining the preset distribution component to which each model parameter in the current layer belongs based on the comparison result;
counting the number of model parameters attributed to each preset distribution component;
and determining each preset distribution component whose number of attributed model parameters is greater than the preset number as a target distribution component of the current layer.
5. The method of claim 4, wherein the comparing each model parameter in the current layer with the target statistic of each preset distribution component, and determining the preset distribution component to which each model parameter in the current layer belongs based on the comparison result comprises:
calculating the difference value of each model parameter in the current layer and the target statistic value of each preset distribution component;
and determining the preset distribution component corresponding to the minimum difference value as the attribution component of the model parameter.
6. The method of claim 5, wherein the quantizing the model parameters in the trained model into the statistical values of the target distribution components respectively to obtain quantized parameters corresponding to the model parameters respectively comprises:
for each layer in the trained model, determining a target statistic value of a target distribution component to which each model parameter belongs in a current layer as a target quantization value;
quantizing the model parameter to the target quantization value.
7. The method of claim 1, wherein, after the generating of the target compression model corresponding to the trained model based on the quantization parameters, the method further comprises:
and training the target compression model based on the training samples.
8. An automated model compression apparatus, comprising:
the statistical value determining module is used for determining a first number of preset distribution components and determining the statistical value of each preset distribution component based on a preset model parameter range and the first number;
a model to be trained building module, configured to build a model to be trained based on a preset model and a statistical value of each preset distribution component, where each model parameter in the model to be trained obeys a mixed component distribution composed of the first number of preset distribution components;
the model training module is used for acquiring a training sample, training each model parameter in the model to be trained based on the training sample to obtain a trained model, and training the statistic value of each preset distribution component based on the training sample to obtain a target statistic value of each preset distribution component;
a target distribution component determination module, configured to determine a target distribution component from the first number of preset distribution components based on each model parameter in the trained model and a target statistic of each preset distribution component;
a parameter quantization module, configured to quantize each model parameter in the trained model into a target statistical value of the target distribution component, respectively, to obtain quantization parameters corresponding to each model parameter;
and the compression model generation module is used for generating a target compression model corresponding to the trained model based on each quantization parameter.
9. An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of automated model compression as claimed in any one of claims 1 to 7.
10. A computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions that is loaded by a processor and that performs an automated model compression method according to any one of claims 1 to 7.
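To make the claimed pipeline concrete, a hedged end-to-end sketch in Python/NumPy of one way claims 1, 2 and 4 to 6 could fit together is given below. It is an illustrative reading, not code from the patent: `param_range` stands in for the preset model parameter range, `k` for the first number, the training of the statistics and the sampling-based construction of claim 3 are elided, and re-attributing parameters onto the selected target components is one possible interpretation of the quantization step.

```python
import numpy as np

def initial_statistics(param_range, k):
    """Claim 2: equally divide the parameter interval into k points,
    including both interval end points, and take the values at those
    points as the statistics of the k preset distribution components."""
    low, high = param_range
    return np.linspace(low, high, num=k)

def compress_model(layers, target_stats, min_count):
    """Claims 4 to 6, per layer: attribute each parameter to a component,
    select the target components, and quantize onto their statistics."""
    compressed = []
    for weights in layers:  # weights: 1-D array of one layer's parameters
        diffs = np.abs(weights[:, None] - target_stats[None, :])
        counts = np.bincount(diffs.argmin(axis=1),
                             minlength=len(target_stats))
        # Keep components whose attributed count exceeds the preset number.
        kept_stats = target_stats[counts > min_count]
        if kept_stats.size == 0:  # fallback if no component qualifies
            kept_stats = target_stats
        # Re-attribute each parameter to its nearest kept component and
        # replace it with that component's target statistic.
        nearest = np.abs(weights[:, None] - kept_stats[None, :]).argmin(axis=1)
        compressed.append(kept_stats[nearest])
    return compressed
```

In the full method the statistics returned by `initial_statistics` would themselves be trained into target statistics before `compress_model` is applied, and claim 7's optional step would then fine-tune the quantized model on the same training samples.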
CN201911016094.0A 2019-10-24 2019-10-24 Automatic model compression method, device, equipment and storage medium Active CN110826692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911016094.0A CN110826692B (en) 2019-10-24 2019-10-24 Automatic model compression method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110826692A true CN110826692A (en) 2020-02-21
CN110826692B CN110826692B (en) 2023-11-17

Family

ID=69550397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911016094.0A Active CN110826692B (en) 2019-10-24 2019-10-24 Automatic model compression method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110826692B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106054189A (en) * 2016-07-17 2016-10-26 Xidian University Radar target recognition method based on dpKMMDP model
US20190124045A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Density estimation network for unsupervised anomaly detection
CN108491928A (en) * 2018-03-29 2018-09-04 Tencent Technology (Shenzhen) Co., Ltd. Model parameter training method, device, server and storage medium
CN109359120A (en) * 2018-11-09 2019-02-19 Alibaba Group Holding Ltd. Data-updating method, device and equipment in a kind of model training

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BIN DAI et al.: "Compressing Neural Networks using the Variational Information Bottleneck", arXiv, pages 1-27 *
YANG Jun et al.: "3D Model Recognition Based on Deep Convolutional Neural Networks", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), vol. 31, no. 2, pages 253-260 *
CHEN Yun et al.: "Deep Neural Network Model Compression Algorithm with Weight Quantization", Journal of Xidian University, vol. 46, no. 2, pages 132-138 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128670A (en) * 2021-04-09 2021-07-16 Nanjing University Neural network model optimization method and device
CN113128670B (en) * 2021-04-09 2024-03-19 Nanjing University Neural network model optimization method and device
TWI819627B (en) * 2022-05-26 2023-10-21 Wistron Corporation Optimizing method and computing apparatus for deep learning network and computer readable storage medium
CN115543945A (en) * 2022-11-29 2022-12-30 Alipay (Hangzhou) Information Technology Co., Ltd. Model compression method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110826692B (en) 2023-11-17

Similar Documents

Publication Title
CN110826692A (en) Automatic model compression method, device, equipment and storage medium
CN111079899A (en) Neural network model compression method, system, device and medium
CN108510058B (en) Weight storage method in neural network and processor based on method
CN110992432B (en) Depth neural network-based minimum variance gradient quantization compression and image processing method
CN115829024B (en) Model training method, device, equipment and storage medium
CN111368133A (en) Method and device for establishing index table of video library, server and storage medium
CN113657421A (en) Convolutional neural network compression method and device and image classification method and device
CN114511042A (en) Model training method and device, storage medium and electronic device
CN112348079A (en) Data dimension reduction processing method and device, computer equipment and storage medium
CN112395273A (en) Data processing method and device and storage medium
CN112200208B (en) Cloud workflow task execution time prediction method based on multi-dimensional feature fusion
CN114647790A (en) Big data mining method and cloud AI (Artificial Intelligence) service system applied to behavior intention analysis
Huang et al. Distributed pruning towards tiny neural networks in federated learning
Han et al. An efficient genetic algorithm for optimization problems with time-consuming fitness evaluation
CN111949530B (en) Test result prediction method and device, computer equipment and storage medium
CN110874635A (en) Deep neural network model compression method and device
CN113392867A (en) Image identification method and device, computer equipment and storage medium
Peter et al. Resource-efficient dnns for keyword spotting using neural architecture search and quantization
Tang et al. Automated model compression by jointly applied pruning and quantization
CN115730646A (en) Hybrid expert network optimization method based on partial quantization
CN114678114A (en) Big data mining evaluation method and big data mining system applied to intelligent medical treatment
CN112200275B (en) Artificial neural network quantification method and device
CN115438784A (en) Sufficient training method for hybrid bit width hyper-network
CN115221955A (en) Multi-depth neural network parameter fusion system and method based on sample difference analysis
CN114239826A (en) Neural network pruning method, medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40021709)
GR01 Patent grant