CN112465141A - Model compression method, model compression device, electronic device and medium - Google Patents

Model compression method, model compression device, electronic device and medium

Info

Publication number
CN112465141A
CN112465141A
Authority
CN
China
Prior art keywords
data
model
compressed
preset
simulation data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011501677.5A
Other languages
Chinese (zh)
Inventor
成冠举
李葛
曾婵
高鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011501677.5A priority Critical patent/CN112465141A/en
Publication of CN112465141A publication Critical patent/CN112465141A/en
Priority to PCT/CN2021/083080 priority patent/WO2022126902A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Abstract

The invention relates to data processing technology, and discloses a model compression method comprising the following steps: performing data fitting on random noise data by using a pre-constructed fitter to obtain simulation data; calculating an activation loss value between the simulation data and the noise data, adjusting parameters of the fitter while the activation loss value is greater than a preset activation threshold, and, once the activation loss value is less than or equal to the preset activation threshold, inputting the simulation data into a model to be compressed to obtain output data; calculating a sparse loss value between the output data and the simulation data, adjusting internal parameters of the fitter while the sparse loss value is greater than a preset sparse threshold, and, once the sparse loss value is less than or equal to the preset sparse threshold, outputting the simulation data and compressing the model to be compressed to obtain a compressed model. The invention also discloses a model compression device, an electronic device and a storage medium. The invention can compress a model without acquiring its training data, network structure, parameters and the like.

Description

Model compression method, model compression device, electronic device and medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a model compression method, apparatus, electronic device, and computer-readable storage medium.
Background
In the era of big data, deep learning models are applied ever more widely. To run a deep learning model on small devices such as mobile devices and sensors, the model often needs to be compressed and pruned before deployment.
At present, mainstream deep learning compression methods compress a model on the basis of its original training data set, network structure, parameters and the like; examples are knowledge distillation and metadata-based methods, where the former needs a large amount of original training data and the latter needs the network structure and parameters of the model. However, the training data, network structure and parameters are usually difficult to obtain for legal, privacy and other reasons.
Disclosure of Invention
The invention provides a model compression method, a model compression device, an electronic device and a computer-readable storage medium, and mainly aims to provide a scheme for compressing a model without acquiring its training data, network structure or parameters.
In order to achieve the above object, the present invention provides a model compression method, including:
performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data;
calculating an activation loss value between the simulation data and the noise data by using a preset first loss function; when the activation loss value is greater than a preset activation threshold, adjusting parameters of the fitter and returning to the step of performing the data fitting operation on random noise data by using the pre-constructed fitter to obtain simulation data; and when the activation loss value is less than or equal to the preset activation threshold, inputting the simulation data into a model to be compressed to obtain output data;
calculating a sparse loss value between the output data and the simulation data by using a preset second loss function; when the sparse loss value is greater than a preset sparse threshold, adjusting internal parameters of the fitter and returning to the step of performing the data fitting operation on random noise data by using the pre-constructed fitter to obtain simulation data; and when the sparse loss value is less than or equal to the preset sparse threshold, outputting the simulation data;
and compressing the model to be compressed according to the simulation data to obtain a compressed model.
Optionally, the performing a data fitting operation on the random noise data by using the pre-constructed fitter to obtain simulation data includes:
predicting the noise data by using a long short-term memory (LSTM) network in the fitter to obtain a fitting data set;
compressing the fitting data set by using an activation function to obtain a compressed data set;
and vectorizing the compressed data set to obtain simulation data.
Optionally, the vectorizing the compressed data set to obtain simulation data includes:
mapping the compressed data in the compressed data set into a feature vector by using a Word2Vec algorithm;
and concatenating the feature vectors in their sequence order to obtain the simulation data.
Optionally, the calculating an activation loss value between the simulation data and the noise data by using a preset first loss function includes:
calculating an activation loss value between the simulated data and the noise data using a first loss function:
$L_A = -\frac{1}{n}\sum_{m=1}^{n} \lVert y_m \rVert_1$
wherein $L_A$ is the activation loss value, $n$ is the number of samples of the noise data, $y_m$ is the m-th data in the simulation data, and $\lVert \cdot \rVert_1$ is the L1 norm.
Optionally, the calculating a sparse loss value between the output data and the simulation data by using a preset second loss function includes:
calculating sparse loss values between the output data and the simulation data using a second loss function:
$L_S = \frac{1}{x}\sum_{m=1}^{x} \lambda\, \ell_{\mathrm{softmax}}(o_m)$
wherein $L_S$ is the sparse loss value, $x$ is the number of samples of the simulation data, $o_m$ is the m-th data in the output data, $\lambda$ is a preset parameter, and $\ell_{\mathrm{softmax}}$ is a softmax loss function.
Optionally, the compressing the model to be compressed according to the simulation data to obtain a compressed model includes:
inputting the simulation data into a preset standard compression model for vector operation to obtain a first feature output by the standard compression model, and inputting the simulation data into the model to be compressed for vector operation to obtain a second feature output by the model to be compressed;
determining a loss function of the model to be compressed according to the first feature and the second feature;
and performing back propagation on the model to be compressed according to the loss function to obtain a compressed model.
Optionally, the determining a loss function of the model to be compressed according to the first feature and the second feature includes:
calculating the difference between the first feature and the second feature to obtain a difference function;
and taking the norm of the difference function and squaring it to obtain the loss function.
In order to solve the above problems, the present invention also provides a model compression apparatus, comprising:
the data fitting module is used for performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data;
the activation loss module is used for calculating an activation loss value between the simulation data and the noise data by using a preset first loss function, adjusting parameters of the fitter when the activation loss value is larger than a preset activation threshold value, and inputting the simulation data into a model to be compressed to obtain output data when the activation loss value is smaller than or equal to the preset activation threshold value;
the sparse loss module is used for calculating a sparse loss value between the output data and the simulation data by using a preset second loss function, adjusting internal parameters of the fitter when the sparse loss value is greater than a preset sparse threshold value, and outputting the simulation data until the sparse loss value is less than or equal to the preset sparse threshold value;
and the model compression module is used for compressing the model to be compressed according to the simulation data to obtain a compressed model.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model compression method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described model compression method.
In the embodiment of the invention, a pre-constructed fitter performs a data fitting operation on random noise data to obtain simulation data; a preset first loss function calculates an activation loss value between the simulation data and the noise data, the parameters of the fitter are adjusted while the activation loss value is greater than a preset activation threshold, and once the activation loss value is less than or equal to the preset activation threshold the simulation data are input into a model to be compressed to obtain output data; a preset second loss function calculates a sparse loss value between the output data and the simulation data, the internal parameters of the fitter are adjusted while the sparse loss value is greater than a preset sparse threshold, and once the sparse loss value is less than or equal to the preset sparse threshold the simulation data are output. The two loss functions thus verify the data simulated by the fitter so as to obtain the simulation data closest to the noise data, and the model to be compressed is compressed according to the simulation data to obtain a compressed model. Therefore, the model compression method, model compression device, electronic device and computer-readable storage medium provided by the invention can solve the problem that model compression otherwise requires training data, network structures and parameters that are difficult to obtain.
Drawings
FIG. 1 is a schematic flow chart of a model compression method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a model compression apparatus according to an embodiment of the present invention;
fig. 3 is a schematic internal structural diagram of an electronic device implementing a model compression method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a model compression method. The execution subject of the model compression method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the model compression method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a model compression method according to an embodiment of the present invention. In this embodiment, the model compression method includes:
and S1, performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data.
In the embodiment of the invention, the random noise data is random Gaussian noise sampled from Gaussian distribution. The fitter continuously performs linear fitting processing on the noise data to generate simulation data close to real data.
Specifically, the performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data includes:
predicting the noise data by using a long short-term memory (LSTM) network in the fitter to obtain a fitting data set;
compressing the fitting data set by using an activation function to obtain a compressed data set;
and vectorizing the compressed data set to obtain simulation data.
The long short-term memory network learns the mapping of the random noise from the Gaussian distribution to the fitted distribution, and, to prevent overfitting, a dropout mechanism is added to each layer of the LSTM. The activation function may be a tanh function, which compresses the data in the fitting data set to between -1 and 1 for the subsequent vectorization operation.
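As a minimal illustration of this step, such a fitter might be sketched in PyTorch as follows; the class name, dimensions and dropout rate are assumptions of this sketch, since the embodiment discloses no concrete code:

    import torch
    import torch.nn as nn

    class Fitter(nn.Module):
        # Hypothetical fitter: an LSTM (with dropout in each layer to
        # prevent overfitting) predicts a fitting data set from Gaussian
        # noise; tanh compresses the result to the range (-1, 1)
        def __init__(self, noise_dim=64, hidden_dim=128, num_layers=2):
            super().__init__()
            self.lstm = nn.LSTM(noise_dim, hidden_dim, num_layers,
                                batch_first=True, dropout=0.5)
            self.out = nn.Linear(hidden_dim, noise_dim)

        def forward(self, noise):                # (batch, seq, noise_dim)
            fitted, _ = self.lstm(noise)         # prediction -> fitting data set
            return torch.tanh(self.out(fitted))  # compressed data set in (-1, 1)

    noise = torch.randn(32, 10, 64)              # random Gaussian noise
    compressed = Fitter()(noise)                 # ready for vectorization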
Further, the vectorizing the compressed data set to obtain simulation data includes:
mapping the compressed data in the compressed data set into a feature vector by using a Word2Vec algorithm;
and concatenating the feature vectors in their sequence order to obtain the simulation data.
The Word2Vec algorithm maps data into vectors of uniform dimension; it is well suited to cases where strong associations exist between local parts of a sequence and the sequence as a whole, and it makes the data more amenable to general analysis.
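For illustration, the vectorization could be sketched as below; the bucketing of the compressed values into string tokens is purely an assumption made so that Word2Vec, which operates on token sequences, can be applied:

    import numpy as np
    from gensim.models import Word2Vec

    def vectorize(compressed: np.ndarray) -> np.ndarray:
        # hypothetical discretization: bucket each value in (-1, 1) into a
        # token so that each sample becomes a "sentence" of tokens
        tokens = [[f"t{int((v + 1) * 50)}" for v in row] for row in compressed]
        w2v = Word2Vec(sentences=tokens, vector_size=8, window=3, min_count=1)
        # map each token to its feature vector and concatenate the feature
        # vectors in sequence order to form one simulation-data sample
        return np.stack([np.concatenate([w2v.wv[t] for t in row])
                         for row in tokens])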
In detail, performing the data fitting operation on the random noise data by using the pre-constructed fitter yields simulation data approximating real data, which can be used in the subsequent model compression.
And S2, calculating an activation loss value between the simulation data and the noise data by using a preset first loss function.
In an embodiment of the present invention, the first loss function is:
$L_A = -\frac{1}{n}\sum_{m=1}^{n} \lVert y_m \rVert_1$
wherein $L_A$ is the activation loss value, $n$ is the number of samples of the noise data, $y_m$ is the m-th data in the simulation data, and $\lVert \cdot \rVert_1$ is the L1 norm. The L1 norm mainly induces sparsity, and the minus sign is added so that minimizing the loss keeps $y_m$ as activated as possible.
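Under this reconstruction, the activation loss can be computed directly; a sketch follows, in which the shape (n, d) of the simulation data and the function name are assumptions:

    import torch

    def activation_loss(sim: torch.Tensor) -> torch.Tensor:
        # L_A = -(1/n) * sum_m ||y_m||_1; the minus sign means that
        # minimizing L_A drives each y_m to be as activated as possible
        return -sim.abs().sum(dim=1).mean()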
When the activation loss value is greater than the preset activation threshold value, the embodiment of the present invention adjusts the parameters of the fitter and returns to the above step S1, and performs data fitting operation on the random noise data by using the pre-constructed fitter again to obtain simulation data.
Preferably, the parameters of the fitter may be the fitter's weights, gradients, etc.
And when the activation loss value is smaller than or equal to a preset activation threshold value, executing S3, and inputting the simulation data into a model to be compressed to obtain output data.
The first loss function thus measures the activation loss between the simulation data and the noise data; this value is compared with the preset activation threshold, and the parameters of the fitter are adjusted until the activation loss converges. At that point the adjusted fitter meets the standard and its parameters need no further adjustment.
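Putting S1 and S2 together, the adjust-and-return loop might look like the following sketch; the optimizer, learning rate and threshold value are assumptions, and Fitter and activation_loss are the hypothetical helpers sketched above:

    import torch

    fitter = Fitter()
    optimizer = torch.optim.Adam(fitter.parameters(), lr=1e-3)
    activation_threshold = -10.0   # preset activation threshold (assumed;
                                   # note L_A is negative by construction)

    while True:
        sim = fitter(torch.randn(32, 10, 64)).flatten(1)  # S1 (vectorization elided)
        loss = activation_loss(sim)                       # S2: activation loss value
        if loss.item() <= activation_threshold:
            break                       # fitter meets the standard; go to S3
        optimizer.zero_grad()
        loss.backward()                 # adjust the fitter's parameters
        optimizer.step()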
And S4, calculating a sparse loss value between the output data and the simulation data by using a preset second loss function.
In an embodiment of the present invention, the second loss function may be:
$L_S = \frac{1}{x}\sum_{m=1}^{x} \lambda\, \ell_{\mathrm{softmax}}(o_m)$
wherein $L_S$ is the sparse loss value, $x$ is the number of samples of the simulation data, $o_m$ is the m-th data in the output data, $\lambda$ is a preset parameter, and $\ell_{\mathrm{softmax}}$ is a softmax loss function.
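A hedged sketch of this second loss follows; since the embodiment does not spell out the softmax loss, the sketch takes it to be the cross entropy of each output against its own argmax, and that choice, the names, and the value of the preset parameter lam are all assumptions:

    import torch
    import torch.nn.functional as F

    def sparse_loss(outputs: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
        # L_S = (1/x) * sum_m lam * softmax_loss(o_m); cross entropy against
        # pseudo-labels stands in for the softmax loss (assumption)
        pseudo_labels = outputs.argmax(dim=1)
        return lam * F.cross_entropy(outputs, pseudo_labels)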
When the sparse loss value is greater than the preset sparse threshold value, the embodiment of the present invention adjusts the internal parameters of the fitter and returns to the above S1, and performs data fitting operation on the random noise data again by using the pre-constructed fitter to obtain the simulation data.
And when the sparse loss value is less than or equal to a preset sparse threshold value, executing S5, outputting the simulation data, and compressing the model to be compressed according to the simulation data to obtain a compressed model.
In the embodiment of the present invention, the compressing the model to be compressed according to the simulation data to obtain a compressed model includes:
inputting the simulation data into a preset standard compression model for vector operation to obtain a first feature output by the standard compression model, and inputting the simulation data into the model to be compressed for vector operation to obtain a second feature output by the model to be compressed;
determining a loss function of the model to be compressed according to the first feature and the second feature;
and performing back propagation on the model to be compressed according to the loss function to obtain the compressed model.
Specifically, the determining a loss function of the model to be compressed according to the first feature and the second feature includes:
calculating the difference between the first feature and the second feature to obtain a difference function;
and taking the norm of the difference function and squaring it to obtain the loss function.
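The compression step S5 could then be sketched as follows; the interfaces of standard_model (the preset standard compression model) and model_to_compress, the optimizer, and the step count are assumptions of this sketch:

    import torch

    def compress(model_to_compress: torch.nn.Module,
                 standard_model: torch.nn.Module,
                 sim: torch.Tensor, lr: float = 1e-3,
                 steps: int = 100) -> torch.nn.Module:
        # both models map the simulation data to features; the loss is the
        # squared norm of the difference between the first feature and the
        # second feature, back-propagated through the model to be compressed
        optimizer = torch.optim.Adam(model_to_compress.parameters(), lr=lr)
        for _ in range(steps):
            first = standard_model(sim).detach()    # first feature (fixed)
            second = model_to_compress(sim)         # second feature
            loss = (first - second).norm(p=2) ** 2  # norm, then square
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return model_to_compress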
Fig. 2 is a block diagram of the model compression apparatus according to the present invention.
The model compression apparatus 100 of the present invention may be installed in an electronic device. Depending on the implemented functions, the model compression apparatus 100 may include a data fitting module 101, an activation loss module 102, a sparse loss module 103 and a model compression module 104. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data fitting module 101 is configured to perform data fitting operation on random noise data by using a pre-established fitter to obtain simulation data;
the activation loss module 102 is configured to calculate an activation loss value between the simulation data and the noise data by using a preset first loss function, adjust a parameter of the fitter when the activation loss value is greater than a preset activation threshold, and, when the activation loss value is less than or equal to the preset activation threshold, input the simulation data into a model to be compressed so as to obtain output data;
the sparse loss module 103 is configured to calculate a sparse loss value between the output data and the simulation data by using a preset second loss function, adjust internal parameters of the fitter when the sparse loss value is greater than a preset sparse threshold, and output the simulation data until the sparse loss value is less than or equal to the preset sparse threshold;
the model compression module 104 is configured to perform compression processing on the model to be compressed according to the simulation data to obtain a compressed model.
In detail, when the modules in the model compression apparatus 100 are executed by a processor of an electronic device, a model compression method may be implemented, where the model compression method includes the following specific steps:
step one, the data fitting module 101 performs data fitting operation on random noise data by using a pre-constructed fitter to obtain simulation data.
In the embodiment of the invention, the random noise data is random Gaussian noise sampled from Gaussian distribution. The fitter continuously performs linear fitting processing on the noise data to generate simulation data close to real data.
Specifically, the data fitting module 101 performs data fitting operation on random noise data by using a pre-constructed fitter to obtain simulation data, including:
predicting the noise data by using a long short-term memory (LSTM) network in the fitter to obtain a fitting data set;
compressing the fitting data set by using an activation function to obtain a compressed data set;
and vectorizing the compressed data set to obtain simulation data.
The long short-term memory network learns the mapping of the random noise from the Gaussian distribution to the fitted distribution, and, to prevent overfitting, a dropout mechanism is added to each layer of the LSTM. The activation function may be a tanh function, which compresses the data in the fitting data set to between -1 and 1 for the subsequent vectorization operation.
Further, the vectorizing the compressed data set to obtain simulation data includes:
mapping the compressed data in the compressed data set into a feature vector by using a Word2Vec algorithm;
and concatenating the feature vectors in their sequence order to obtain the simulation data.
Step two, the activation loss module 102 calculates an activation loss value between the simulation data and the noise data by using a preset first loss function.
In an embodiment of the present invention, the first loss function is:
$L_A = -\frac{1}{n}\sum_{m=1}^{n} \lVert y_m \rVert_1$
wherein $L_A$ is the activation loss value, $n$ is the number of samples of the noise data, $y_m$ is the m-th data in the simulation data, and $\lVert \cdot \rVert_1$ is the L1 norm. The L1 norm mainly induces sparsity, and the minus sign is added so that minimizing the loss keeps $y_m$ as activated as possible.
When the activation loss value is greater than the preset activation threshold, the parameters of the fitter are adjusted and the process returns to step one, in which the pre-constructed fitter performs the data fitting operation on random noise data again to obtain simulation data.
Preferably, the parameters of the fitter may be the fitter's weights, gradients, etc.
And when the activation loss value is smaller than or equal to a preset activation threshold value, executing a third step of inputting the simulation data into a model to be compressed to obtain output data.
Fourthly, the sparse loss module 103 calculates a sparse loss value between the output data and the simulation data by using a preset second loss function.
In an embodiment of the present invention, the second loss function may be:
$L_S = \frac{1}{x}\sum_{m=1}^{x} \lambda\, \ell_{\mathrm{softmax}}(o_m)$
wherein $L_S$ is the sparse loss value, $x$ is the number of samples of the simulation data, $o_m$ is the m-th data in the output data, $\lambda$ is a preset parameter, and $\ell_{\mathrm{softmax}}$ is a softmax loss function.
When the sparse loss value is greater than the preset sparse threshold, the internal parameters of the fitter are adjusted and the process returns to step one, in which the pre-constructed fitter performs the data fitting operation on random noise data again to obtain simulation data.
And when the sparse loss value is less than or equal to a preset sparse threshold value, executing a fifth step, outputting the simulation data, and compressing the model to be compressed according to the simulation data to obtain a compressed model.
In the embodiment of the present invention, the compressing the model to be compressed according to the simulation data to obtain a compressed model includes:
inputting the simulation data into a preset standard compression model for vector operation to obtain a first feature output by the standard compression model, and inputting the simulation data into the model to be compressed for vector operation to obtain a second feature output by the model to be compressed;
determining a loss function of the model to be compressed according to the first feature and the second feature;
and performing back propagation on the model to be compressed according to the loss function to obtain the compressed model.
Specifically, the determining a loss function of the model to be compressed according to the first feature and the second feature includes:
calculating the difference between the first feature and the second feature to obtain a difference function;
and taking the norm of the difference function and squaring it to obtain the loss function.
Fig. 3 is a schematic structural diagram of an electronic device implementing the model compression method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a model compression program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card or a Flash memory Card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the model compression program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the various components of the electronic device by using various interfaces and lines, and executes the various functions of the electronic device 1 and processes its data by running or executing programs or modules stored in the memory 11 (e.g., executing the model compression program) and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The model compression program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data;
calculating an activation loss value between the simulation data and the noise data by using a preset first loss function; when the activation loss value is greater than a preset activation threshold, adjusting parameters of the fitter and returning to the step of performing the data fitting operation on random noise data by using the pre-constructed fitter to obtain simulation data; and when the activation loss value is less than or equal to the preset activation threshold, inputting the simulation data into a model to be compressed to obtain output data;
calculating a sparse loss value between the output data and the simulation data by using a preset second loss function; when the sparse loss value is greater than a preset sparse threshold, adjusting internal parameters of the fitter and returning to the step of performing the data fitting operation on random noise data by using the pre-constructed fitter to obtain simulation data; and when the sparse loss value is less than or equal to the preset sparse threshold, outputting the simulation data;
and compressing the model to be compressed according to the simulation data to obtain a compressed model.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile, and may include, for example: any entity or device capable of carrying said computer program code, a recording medium, a U-disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer-readable storage medium, which stores a computer program that, when executed by a processor of an electronic device, can implement:
performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data;
calculating an activation loss value between the simulation data and the noise data by using a preset first loss function; when the activation loss value is greater than a preset activation threshold, adjusting parameters of the fitter and returning to the step of performing the data fitting operation on random noise data by using the pre-constructed fitter to obtain simulation data; and when the activation loss value is less than or equal to the preset activation threshold, inputting the simulation data into a model to be compressed to obtain output data;
calculating a sparse loss value between the output data and the simulation data by using a preset second loss function; when the sparse loss value is greater than a preset sparse threshold, adjusting internal parameters of the fitter and returning to the step of performing the data fitting operation on random noise data by using the pre-constructed fitter to obtain simulation data; and when the sparse loss value is less than or equal to the preset sparse threshold, outputting the simulation data;
and compressing the model to be compressed according to the simulation data to obtain a compressed model.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims should not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of model compression, the method comprising:
step A: performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data;
step B: calculating an activation loss value between the simulation data and the noise data by using a preset first loss function; when the activation loss value is greater than a preset activation threshold value, adjusting parameters of the fitter and returning to step A; and when the activation loss value is less than or equal to the preset activation threshold value, inputting the simulation data into a model to be compressed to obtain output data;
step C: calculating a sparse loss value between the output data and the simulation data by using a preset second loss function; when the sparse loss value is greater than a preset sparse threshold value, adjusting internal parameters of the fitter and returning to step A; and when the sparse loss value is less than or equal to the preset sparse threshold value, outputting the simulation data;
step D: and compressing the model to be compressed according to the simulation data to obtain a compressed model.
2. The model compression method of claim 1, wherein performing a data fitting operation on the random noise data using a pre-constructed fitter to obtain simulation data comprises:
predicting the noise data by using a long short-term memory (LSTM) network in the fitter to obtain a fitting data set;
compressing the fitting data set by using an activation function to obtain a compressed data set;
and vectorizing the compressed data set to obtain simulation data.
3. The model compression method of claim 2, wherein the vectorizing the compressed data set to obtain simulation data comprises:
mapping the compressed data in the compressed data set into a feature vector by using a Word2Vec algorithm;
and concatenating the feature vectors in their sequence order to obtain the simulation data.
4. The model compression method of claim 1, wherein the calculating the activation loss value between the simulation data and the noise data using a preset first loss function comprises:
calculating an activation loss value between the simulated data and the noise data using a first loss function:
$L_A = -\frac{1}{n}\sum_{m=1}^{n} \lVert y_m \rVert_1$
wherein $L_A$ is the activation loss value, $n$ is the number of samples of the noise data, $y_m$ is the m-th data in the simulation data, and $\lVert \cdot \rVert_1$ is the L1 norm.
5. The model compression method of claim 1, wherein the calculating of the sparse loss value between the output data and the simulation data using a preset second loss function comprises:
calculating sparse loss values between the output data and the simulation data using a second loss function:
$L_S = \frac{1}{x}\sum_{m=1}^{x} \lambda\, \ell_{\mathrm{softmax}}(o_m)$
wherein $L_S$ is the sparse loss value, $x$ is the number of samples of the simulation data, $o_m$ is the m-th data in the output data, $\lambda$ is a preset parameter, and $\ell_{\mathrm{softmax}}$ is a softmax loss function.
6. The model compression method according to any one of claims 1 to 5, wherein the compressing the model to be compressed according to the simulation data to obtain a compressed model comprises:
inputting the simulation data into a preset standard compression model for vector operation to obtain a first feature output by the standard compression model, and inputting the simulation data into the model to be compressed for vector operation to obtain a second feature output by the model to be compressed;
determining a loss function of the model to be compressed according to the first feature and the second feature;
and performing back propagation on the model to be compressed according to the loss function to obtain a compressed model.
7. The model compression method of claim 6, wherein the determining a loss function of the model to be compressed according to the first feature and the second feature comprises:
calculating the difference between the first feature and the second feature to obtain a difference function;
and taking the norm of the difference function and squaring it to obtain the loss function.
8. A model compression apparatus, the apparatus comprising:
the data fitting module is used for performing data fitting operation on the random noise data by using a pre-constructed fitter to obtain simulation data;
the activation loss module is used for calculating an activation loss value between the simulation data and the noise data by using a preset first loss function, adjusting parameters of the fitter when the activation loss value is larger than a preset activation threshold value, and inputting the simulation data into a model to be compressed to obtain output data when the activation loss value is smaller than or equal to the preset activation threshold value;
the sparse loss module is used for calculating a sparse loss value between the output data and the simulation data by using a preset second loss function, adjusting internal parameters of the fitter when the sparse loss value is greater than a preset sparse threshold value, and outputting the simulation data until the sparse loss value is less than or equal to the preset sparse threshold value;
and the model compression module is used for compressing the model to be compressed according to the simulation data to obtain a compressed model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model compression method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a method of model compression as claimed in any one of claims 1 to 7.
CN202011501677.5A 2020-12-18 2020-12-18 Model compression method, model compression device, electronic device and medium Pending CN112465141A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011501677.5A CN112465141A (en) 2020-12-18 2020-12-18 Model compression method, model compression device, electronic device and medium
PCT/CN2021/083080 WO2022126902A1 (en) 2020-12-18 2021-03-25 Model compression method and apparatus, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011501677.5A CN112465141A (en) 2020-12-18 2020-12-18 Model compression method, model compression device, electronic device and medium

Publications (1)

Publication Number Publication Date
CN112465141A true CN112465141A (en) 2021-03-09

Family

ID=74803596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011501677.5A Pending CN112465141A (en) 2020-12-18 2020-12-18 Model compression method, model compression device, electronic device and medium

Country Status (2)

Country Link
CN (1) CN112465141A (en)
WO (1) WO2022126902A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508194A (en) * 2021-02-02 2021-03-16 支付宝(杭州)信息技术有限公司 Model compression method, system and computing equipment
WO2022126902A1 (en) * 2020-12-18 2022-06-23 平安科技(深圳)有限公司 Model compression method and apparatus, electronic device, and medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269940B (en) * 2022-09-30 2022-12-13 佳卓智能科技(南通)有限责任公司 Data compression method of ERP management system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10832139B2 (en) * 2018-06-22 2020-11-10 Moffett Technologies Co. Limited Neural network acceleration and embedding compression systems and methods with activation sparsification
CN110874550A (en) * 2018-08-31 2020-03-10 华为技术有限公司 Data processing method, device, equipment and system
CN111612143B (en) * 2020-05-22 2023-12-19 中国科学院自动化研究所 Compression method and system of deep convolutional neural network
CN112465141A (en) * 2020-12-18 2021-03-09 平安科技(深圳)有限公司 Model compression method, model compression device, electronic device and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022126902A1 (en) * 2020-12-18 2022-06-23 平安科技(深圳)有限公司 Model compression method and apparatus, electronic device, and medium
CN112508194A (en) * 2021-02-02 2021-03-16 支付宝(杭州)信息技术有限公司 Model compression method, system and computing equipment

Also Published As

Publication number Publication date
WO2022126902A1 (en) 2022-06-23

Similar Documents

Publication Publication Date Title
CN108171260B (en) Picture identification method and system
CN112465141A (en) Model compression method, model compression device, electronic device and medium
CN113159147A (en) Image identification method and device based on neural network and electronic equipment
CN111783982A (en) Attack sample acquisition method, device, equipment and medium
CN113705462A (en) Face recognition method and device, electronic equipment and computer readable storage medium
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN115081025A (en) Sensitive data management method and device based on digital middlebox and electronic equipment
CN111768096A (en) Rating method and device based on algorithm model, electronic equipment and storage medium
CN115600644A (en) Multitasking method and device, electronic equipment and storage medium
CN115730555A (en) Chip layout method, device, equipment and storage medium
CN113742069A (en) Capacity prediction method and device based on artificial intelligence and storage medium
CN108520532B (en) Method and device for identifying motion direction of object in video
CN112269875A (en) Text classification method and device, electronic equipment and storage medium
CN112700006A (en) Network architecture searching method, device, electronic equipment and medium
CN113268665A (en) Information recommendation method, device and equipment based on random forest and storage medium
CN113228056B (en) Runtime hardware simulation method, device, equipment and storage medium
CN111652282A (en) Big data based user preference analysis method and device and electronic equipment
CN113705686B (en) Image classification method, device, electronic equipment and readable storage medium
CN116168403A (en) Medical data classification model training method, classification method, device and related medium
CN114769072A (en) High-speed injection valve control method and device, electronic equipment and storage medium
CN115909009A (en) Image recognition method, image recognition device, storage medium and electronic equipment
CN111859985B (en) AI customer service model test method and device, electronic equipment and storage medium
CN113515591A (en) Text bad information identification method and device, electronic equipment and storage medium
CN113272813B (en) Custom data stream hardware simulation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40041514

Country of ref document: HK

SE01 Entry into force of request for substantive examination