CN115543945B - Model compression method and device, storage medium and electronic equipment


Info

Publication number
CN115543945B
Authority
CN
China
Prior art keywords
model
compressed
compression scheme
compression
data processing
Prior art date
Legal status
Active
Application number
CN202211509870.2A
Other languages
Chinese (zh)
Other versions
CN115543945A (en)
Inventor
王维强
张长浩
申书恒
傅欣艺
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211509870.2A priority Critical patent/CN115543945B/en
Publication of CN115543945A publication Critical patent/CN115543945A/en
Application granted granted Critical
Publication of CN115543945B publication Critical patent/CN115543945B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/174: Redundancy elimination performed by the file system
    • G06F 16/1744: Redundancy elimination performed by the file system using compression, e.g. sparse files

Abstract

The specification discloses a model compression method and apparatus, a storage medium, and an electronic device. A model to be compressed is obtained and its model structure parameters are determined; several compression schemes for the model to be compressed are determined from these parameters. A model to be evaluated is then determined for each compression scheme, and, based on preset sample data, the data processing duration that each model to be evaluated consumes to process the sample data and obtain an output result is determined as the data processing duration corresponding to its compression scheme. A target compression scheme is determined from the data processing durations of the schemes, and the model to be compressed is compressed accordingly, improving the efficiency of executing services through a neural network model and protecting users' private data.

Description

Model compression method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of neural networks, and in particular, to a method and apparatus for model compression, a storage medium, and an electronic device.
Background
With the continuous development of computer technology, neural network models are applied in many fields, such as information recommendation, autonomous driving, private data protection, and business risk control, making modern life more intelligent.
The application scenarios of neural network models are diverse: both the cloud and the client can deploy a neural network model to execute the required service. At present, however, services executed through neural network models run inefficiently, and how to improve this efficiency is a problem to be solved.
Disclosure of Invention
The present disclosure provides a method and apparatus for model compression to improve the efficiency of executing a service through a neural network model.
The technical scheme adopted in the specification is as follows:
the present specification provides a method of model compression, comprising:
obtaining a model to be compressed;
determining model structure parameters of the model to be compressed, and determining a plurality of compression schemes aiming at the model to be compressed according to the model structure parameters;
determining a model to be evaluated corresponding to each compression scheme according to the various compression schemes;
for each compression scheme, determining, based on preset sample data, the data processing duration consumed by the model to be evaluated corresponding to the compression scheme to process the sample data and obtain an output result, as the data processing duration corresponding to the compression scheme;
and determining a target compression scheme according to the data processing duration corresponding to each compression scheme, and compressing the model to be compressed according to the target compression scheme.
Optionally, the model structure parameters include a size of a convolution kernel in the model to be compressed;
according to the model structure parameters, determining a plurality of compression schemes for the model to be compressed, including:
determining the compression size of the convolution kernel of the model to be compressed according to the size of the convolution kernel in the model to be compressed;
according to the compression size, adjusting the convolution kernel of the model to be compressed to obtain at least one adjusted convolution kernel;
and determining a plurality of compression schemes for the model to be compressed according to the at least one adjusted convolution kernel.
Optionally, the model structure parameter includes a network layer number of the model to be compressed;
according to the model structure parameters, determining a plurality of compression schemes for the model to be compressed, including:
determining a combination mode for combining different network layers in the model to be compressed according to the network layer number of the model to be compressed;
and determining, according to the combination mode, a plurality of compression schemes for the model to be compressed.
Optionally, the model structure parameter includes the number of convolution kernels of each network layer of the model to be compressed;
according to the model structure parameters, determining a plurality of compression schemes for the model to be compressed, including:
for each network layer in the model to be compressed, determining the candidate number of convolution kernels adopted by the network layer according to the number of convolution kernels actually contained by the network layer, wherein the candidate number does not exceed the number of convolution kernels actually contained in the network layer;
and determining a plurality of compression schemes for the model to be compressed according to the candidate quantity of the convolution kernels adopted by each network layer.
Optionally, for each compression scheme, determining, based on preset sample data, the data processing duration consumed by the model to be evaluated corresponding to the compression scheme to process the sample data and obtain an output result, as the data processing duration corresponding to the compression scheme, comprises:
selecting some of the compression schemes, and inputting the sample data into the models to be evaluated corresponding to the selected compression schemes to obtain the data processing durations corresponding to the selected compression schemes;
constructing training samples according to the data processing durations corresponding to the selected compression schemes and the model structure parameters contained in the selected compression schemes;
inputting the model structure parameters contained in the training samples into a duration prediction model to obtain a predicted duration, and training the duration prediction model with the deviation between the predicted duration and the data processing durations corresponding to the selected compression schemes as the optimization target;
and predicting the data processing durations corresponding to the remaining compression schemes through the trained duration prediction model.
Optionally, determining the target compression scheme according to the data processing duration corresponding to each compression scheme comprises:
dividing the models to be evaluated corresponding to the compression schemes into tiers by parameter count to obtain a tiering result;
and, for each parameter-count tier in the tiering result, determining the target compression scheme corresponding to the tier according to the compression schemes in the tier whose data processing durations do not exceed the set duration corresponding to the tier.
Optionally, determining the target compression scheme corresponding to the parameter-count tier according to the compression schemes in the tier whose data processing durations do not exceed the set duration corresponding to the tier comprises:
taking the compression schemes in the tier whose data processing durations do not exceed the set duration corresponding to the tier as candidate compression schemes;
clustering the model structure parameters contained in the candidate compression schemes to obtain a cluster center corresponding to the candidate compression schemes;
and determining the target compression scheme corresponding to the tier according to the model structure parameters corresponding to the cluster center.
The present specification provides an apparatus for model compression, comprising:
the acquisition module is used for acquiring the model to be compressed;
the scheme determining module is used for determining model structure parameters of the model to be compressed and determining a plurality of compression schemes aiming at the model to be compressed according to the model structure parameters;
the model determining module is used for determining a model to be evaluated corresponding to each compression scheme according to the various compression schemes;
the duration determining module is used for determining, for each compression scheme and based on preset sample data, the data processing duration consumed by the model to be evaluated corresponding to the compression scheme to process the sample data and obtain an output result, as the data processing duration corresponding to the compression scheme;
and the compression module is used for determining a target compression scheme according to the data processing duration corresponding to each compression scheme and compressing the model to be compressed according to the target compression scheme.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of model compression described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of model compression as described above when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the model compression method above, a model to be compressed can be obtained and its model structure parameters determined; several compression schemes for the model to be compressed are determined from these parameters; the model to be evaluated corresponding to each compression scheme is determined; and, based on preset sample data, the data processing duration that each model to be evaluated consumes to process the sample data and obtain an output result is determined as the data processing duration of its compression scheme. A target compression scheme is then determined from the data processing durations of the schemes, and the model to be compressed is compressed accordingly.
It can be seen that this method determines several compression schemes from the model structure parameters of the model to be compressed; for each scheme, the model to be evaluated is what compressing the model according to that scheme would produce, so the data processing duration of every scheme can be determined and the most suitable compression scheme selected, thereby improving the efficiency of executing services through the neural network model.
Drawings
The accompanying drawings, which are included to provide a further understanding of this specification, illustrate exemplary embodiments of this specification and, together with their description, serve to explain this specification without unduly limiting it. In the drawings:
FIG. 1 is a flow chart of a method of model compression in the present specification;
FIG. 2 is a schematic diagram of determining a compression scheme by adjusting the convolution kernel size, provided in this specification;
FIG. 3 is a schematic diagram of a manner of combining network layers, provided in this specification;
FIG. 4 is a schematic diagram of determining a compression scheme at the width level, provided in this specification;
FIG. 5 is a schematic flow chart of predicting the data processing durations of the remaining compression schemes with a trained duration prediction model, provided in this specification;
FIG. 6 is a schematic diagram of an apparatus for model compression provided in this specification;
FIG. 7 is a schematic diagram of an electronic device corresponding to FIG. 1, provided in this specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for compressing a model in the present specification, specifically including the following steps:
s100: and obtaining the trained model to be compressed.
S102: and determining model structure parameters of the model to be compressed, and determining a plurality of compression schemes aiming at the model to be compressed according to the model structure parameters.
In practical applications, neural network models are becoming ever larger. Compressing a neural network model should reduce its size and the time it consumes for prediction (inference) while still guaranteeing its prediction quality; this improves the efficiency of executing services through the neural network model while preserving its effectiveness in applications.
On this basis, the service platform may obtain a model to be compressed, determine its model structure parameters, and determine the compression schemes for the model to be compressed from those parameters. In the subsequent process, the service platform selects a suitable scheme from these compression schemes as the basis for compressing the model to be compressed. In effect, the method is a neural architecture search (Neural Architecture Search, NAS) method: it searches the neural structure of the model to be compressed for a sub-network (or for an unchanged structure with changed internal parameters) with good performance, effect, and operating efficiency. The model to be compressed can thus be a large network model trained in advance by the service platform.
The model structure parameters describe the model structure of the model to be compressed, as well as the parameter structure it contains, and usually span several dimensions. Specifically, they may include the size of the convolution kernels in the model to be compressed, the number of network layers of the model to be compressed, and the number of convolution kernels in each network layer of the model to be compressed. Adjusting the model to be compressed along any one of these dimensions yields a model with a changed model structure and/or changed internal parameters (a model to be evaluated), so several compression schemes for the model to be compressed can be determined from the model structure parameters.
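Purely as an illustration (the patent prescribes no data format), the three dimensions of the model structure parameters could be recorded as follows; the class and field names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ModelStructureParams:
    kernel_sizes: List[int]       # convolution kernel size per network layer, e.g. 7 for 7x7
    num_layers: int               # number of network layers in the model to be compressed
    kernels_per_layer: List[int]  # number of convolution kernels (channels) per network layer

# Example: a small three-layer model.
params = ModelStructureParams(kernel_sizes=[7, 5, 3], num_layers=3,
                              kernels_per_layer=[64, 128, 128])
```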
Specifically, the compression size of the convolution kernel of the model to be compressed is determined according to the size of the convolution kernel in the model to be compressed; the convolution kernel is adjusted according to the compression size to obtain at least one adjusted convolution kernel; and several compression schemes for the model to be compressed are determined from the at least one adjusted convolution kernel.
The adjustment can be made in various ways. For example, once the compression size is determined, a sub-convolution kernel matching the compression size may be extracted from the original convolution kernel and used as the adjusted convolution kernel, as shown in fig. 2.
Fig. 2 is a schematic diagram of a manner of determining a compression scheme by adjusting the convolution kernel size provided in the present specification.
As can be seen from fig. 2, the original convolution kernel is 7×7, and the compression sizes may be 5×5 and 3×3. When the original convolution kernel is compressed, the centered sub-matrix whose size matches the compression size is extracted and used as the adjusted convolution kernel.
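A minimal sketch of the centered sub-kernel extraction shown in fig. 2, assuming square kernels stored as NumPy arrays; the function name is illustrative:

```python
import numpy as np

def center_crop_kernel(kernel: np.ndarray, target: int) -> np.ndarray:
    """Extract the centered target x target sub-matrix of a square kernel,
    mirroring the 7x7 -> 5x5 / 3x3 adjustment of fig. 2."""
    k = kernel.shape[0]
    # Assumes the size difference is even so the crop is exactly centered.
    assert target <= k and (k - target) % 2 == 0
    off = (k - target) // 2
    return kernel[off:off + target, off:off + target]

kernel_7x7 = np.arange(49, dtype=float).reshape(7, 7)
kernel_5x5 = center_crop_kernel(kernel_7x7, 5)  # one adjusted kernel
kernel_3x3 = center_crop_kernel(kernel_7x7, 3)  # another adjusted kernel
```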
The combination mode for combining different network layers in the model to be compressed can be determined from the number of network layers of the model to be compressed, and several compression schemes for the model to be compressed derived from the combination mode. It should be noted that the network layers mentioned here are not the feature extraction, convolution, or fully connected layers as usually described, but the sub-layers inside such layers, as shown in fig. 3.
Fig. 3 is a schematic diagram of a manner of combining network layers provided in the present specification.
As can be seen from fig. 3, a compression scheme may skip certain network layers and keep the remaining ones, thereby reducing the depth of the model to be compressed.
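A sketch of enumerating such depth-level schemes, under the assumption that any ordered subset of sub-layers (of at least a minimum depth) is a legal combination; the patent does not fix the enumeration rule:

```python
from itertools import combinations

def depth_schemes(num_layers: int, min_keep: int):
    """Enumerate depth-level schemes: each scheme keeps an ordered subset of
    the sub-layers and skips the rest, as in fig. 3."""
    for keep in range(min_keep, num_layers + 1):
        for kept_layers in combinations(range(num_layers), keep):
            yield kept_layers

# For a block of 4 sub-layers, keeping at least 2 of them:
print(list(depth_schemes(4, 2)))  # 11 candidate combinations
```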
For each network layer in the model to be compressed, the candidate number of convolution kernels to be used by that layer is determined according to the number of convolution kernels the layer actually contains, where the candidate number does not exceed the actual number; several compression schemes for the model to be compressed are then determined from the candidate numbers of convolution kernels of all the layers. For example, if a network layer contains 4 convolution kernels, the candidate number in a compression scheme may be 3 or 2.
In other words, whereas combining network layers reduces the depth of the model, this approach reduces its width: selecting a candidate number of convolution kernels in a network layer means selecting channels in that layer, shrinking the model at its internal parallel level. Because the convolution kernels in a network layer carry different weights, the kernels with higher weights can be chosen when selecting the candidate number, yielding the compression scheme, as shown in fig. 4.
Fig. 4 is a schematic diagram of a determination of compression scheme at the width level provided in this specification.
As can be seen from fig. 4, in the compression scheme, the computation channels of some convolution kernels may be skipped, thereby reducing the width of the model to be compressed.
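A sketch of the width-level selection, assuming kernel importance is measured by the L1 norm of its weights (the text only says that kernels with higher weights can be preferred):

```python
import numpy as np

def select_kernels(layer_weights: np.ndarray, candidate_num: int) -> np.ndarray:
    """Keep the candidate_num convolution kernels with the largest weight mass
    (L1 norm), skipping the other channels as in fig. 4.
    layer_weights has shape (num_kernels, in_channels, k, k)."""
    importance = np.abs(layer_weights).sum(axis=(1, 2, 3))
    keep = np.argsort(importance)[-candidate_num:]  # indices of the heaviest kernels
    return np.sort(keep)

weights = np.random.default_rng(0).normal(size=(4, 8, 3, 3))
print(select_kernels(weights, 2))  # e.g. keep 2 of the layer's 4 kernels
```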
It should be noted that the above ways of determining a compression scheme may be combined with one another; for example, a compression scheme may adjust only the convolution kernel size, only the combination of network layers, or both at once.
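To make the combinatorics concrete, the options for a single layer can be enumerated as a Cartesian product; every option value below is hypothetical:

```python
from itertools import product

kernel_size_options = [7, 5, 3]     # adjusted convolution kernel sizes
layer_keep_options = [True, False]  # keep or skip this sub-layer
kernel_count_options = [4, 3, 2]    # candidate numbers of convolution kernels

# Each combination of the three dimensions is one compression scheme for the
# layer; the full scheme space is the product over all layers, which is why
# the number of schemes grows into the thousands for a large model.
layer_schemes = list(product(kernel_size_options, layer_keep_options, kernel_count_options))
print(len(layer_schemes))  # 18 schemes for a single layer already
```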
S104: and determining a model to be evaluated corresponding to each compression scheme according to various compression schemes.
S106: for each compression scheme, according to preset sample data, determining the data processing duration consumed by the sample data processed by the model to be evaluated corresponding to the compression scheme to obtain an output result, wherein the data processing duration is used as the data processing duration corresponding to the compression scheme.
As described above, determining the compression schemes from the model structure parameters resembles enumerating the possible configurations of the model to be compressed. The model to be evaluated corresponding to each compression scheme is then determined, and, based on preset sample data, the data processing duration each such model consumes to process the sample data and obtain an output result is determined as the data processing duration of its compression scheme.
The model to be evaluated corresponding to a compression scheme is the model obtained by compressing the model to be compressed according to that scheme. Determining it does not necessarily require building every such model at this point; it suffices to determine the model structure and internal parameters implied by the scheme. The data processing duration of each model to be evaluated then serves to assess, later on, how suitable each compression scheme is.
However, the model to be compressed is often large and may contain many convolution kernels, so thousands of compression schemes may be determined for it, each with its own model to be evaluated. Building every one of those models and running it to measure its actual behavior, merely to pick a suitable compression scheme, is impractical.
Therefore, the service platform can select some of the compression schemes, input the sample data into the models to be evaluated corresponding to the selected schemes to obtain their data processing durations, determine training samples from those durations and the model structure parameters contained in the selected schemes, train a preset duration prediction model on the training samples, and predict the data processing durations of the remaining compression schemes with the trained duration prediction model. In this way the data processing durations of all compression schemes can be obtained, as shown in fig. 5.
Fig. 5 is a schematic flow chart, provided in this specification, of predicting the data processing durations of the remaining compression schemes with a trained duration prediction model.
That is, if there are too many compression schemes, a smaller subset of them can be selected; the real models to be evaluated corresponding to that subset are built and actually run, predicting on the sample data, to measure their data processing durations (the data processing duration of a model to be evaluated being the time it takes to produce predictions on the sample data). The duration prediction model is then trained on these measured durations, after which it can predict the data processing durations of the models to be evaluated corresponding to the remaining compression schemes.
Concretely, the duration prediction model can be trained with supervision: the model structure parameters of the models to be evaluated corresponding to the selected compression schemes are input into the duration prediction model to obtain a predicted duration (the predicted data processing duration the model to be evaluated would consume to process the sample data and obtain an output result), and the duration prediction model is trained with minimizing the deviation between the predicted duration and the measured data processing duration as the optimization target. To predict the data processing durations of the remaining compression schemes, their model structure parameters are input directly into the trained duration prediction model.
Thus, although there may be thousands of models to be evaluated, only a small number of them need to be run to obtain data processing durations; the durations of the many remaining models are predicted by the duration prediction model, so the remaining models never need to be actually built, which improves the efficiency of model compression.
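A minimal sketch of this supervised duration prediction, assuming each scheme's structure parameters are flattened into a fixed-length feature vector; the patent does not name a regressor, so scikit-learn's MLPRegressor stands in, and the measured durations below are synthetic stand-ins for real measurements:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_measured = rng.random((50, 6))            # structure params of the evaluated subset
y_measured = X_measured.sum(axis=1) * 10.0  # stand-in for measured durations (ms)

# Train with minimizing the deviation between predicted and measured durations.
predictor = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
predictor.fit(X_measured, y_measured)

# The remaining schemes are never built or run; their durations are predicted.
X_remaining = rng.random((5000, 6))
predicted_durations = predictor.predict(X_remaining)
print(predicted_durations[:3])
```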
S108: and determining a target compression scheme according to the data processing time length corresponding to each compression scheme, and compressing the model to be compressed according to the target compression scheme.
After the data processing durations of the various compression schemes are determined, the target compression scheme can be determined from them.
That is, a suitable compression scheme can be selected from the compression schemes according to the data processing durations, and the model to be compressed is compressed according to that scheme, achieving the model compression effect.
Specifically, the models to be evaluated corresponding to the various compression schemes can be divided into tiers by parameter count, giving a tiering result; for each parameter-count tier in the result, the target compression scheme of the tier is determined from the compression schemes in the tier whose data processing durations do not exceed the set duration for the tier. That is, the compression schemes may be divided into several tiers according to the number of parameters contained in their models to be evaluated, for example into three tiers: the models to be evaluated in the first tier have the largest parameter counts, those in the second tier the next largest, and those in the third tier the smallest. The parameter-count interval of each tier can be preset arbitrarily.
After tiering, the compression schemes with shorter data processing durations in each parameter-count tier can be identified, and the target compression scheme of the tier determined from them.
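As an illustration of this tiering and filtering, the sketch below bins schemes by parameter count and keeps those within each tier's set duration; the tier boundaries, set durations, and data layout are all hypothetical:

```python
def tier_of(param_count: int, boundaries=(1_000_000, 10_000_000)) -> int:
    """Assign a model to a parameter-count tier; boundaries are illustrative."""
    for i, bound in enumerate(boundaries):
        if param_count < bound:
            return i
    return len(boundaries)

def candidates_per_tier(schemes, set_duration_ms):
    """schemes: (param_count, duration_ms, structure_params) triples.
    set_duration_ms: per-tier duration threshold, indexed by tier."""
    tiers = {}
    for param_count, duration_ms, structure in schemes:
        t = tier_of(param_count)
        if duration_ms <= set_duration_ms[t]:  # keep schemes within the tier's set duration
            tiers.setdefault(t, []).append(structure)
    return tiers

schemes = [(5_000_000, 12.0, "scheme-A"), (500_000, 3.0, "scheme-B"),
           (20_000_000, 40.0, "scheme-C"), (4_000_000, 30.0, "scheme-D")]
print(candidates_per_tier(schemes, set_duration_ms={0: 5.0, 1: 15.0, 2: 50.0}))
# -> {1: ['scheme-A'], 0: ['scheme-B'], 2: ['scheme-C']} (scheme-D exceeds tier 1's limit)
```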
After the compression schemes are tiered, the schemes in a tier whose data processing durations do not exceed the set duration for the tier can be taken as candidate compression schemes; the model structure parameters contained in the candidate schemes are clustered to obtain a cluster center for the candidates, and the target compression scheme of the tier is then determined from the model structure parameters corresponding to the cluster center.
That is, the data processing durations of the candidate compression schemes in one parameter-count tier all stay within the set duration for that tier and are therefore close to one another. Clustering the candidates by their model structure parameters and selecting the scheme (or model structure parameters) corresponding to the cluster center then yields the target compression scheme of the tier.
In particular, the candidates of one tier are clustered so that the candidate corresponding to the cluster center, the most robust one, can be selected. Besides clustering the model structure parameters directly as described above, the model structure of each candidate's model to be evaluated (the convolution kernels the model contains, the number of network layers, and the number of kernels per layer) can be inferred from its structure parameters to obtain a simulated model structure; the simulated structures of the models to be evaluated are clustered, and the candidate corresponding to the cluster center is taken as the target compression scheme of the tier.
The target compression scheme of the tier is chosen this way because the candidates' data processing durations are close, and among such candidates a model whose structure varies little while its duration also varies little is robust. During clustering, the distance between model structure parameters (or model structures) can be computed as a non-Euclidean distance, such as the edit distance, and the candidates clustered accordingly.
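As a sketch of this clustering step, the snippet below treats each candidate's structure parameters as one numeric vector and uses Euclidean k-means as a stand-in; the text above equally allows non-Euclidean distances such as the edit distance over simulated model structures, and all names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def most_robust_scheme(structure_vectors: np.ndarray) -> int:
    """Cluster one tier's candidate schemes and return the index of the scheme
    closest to the cluster center: the most robust candidate."""
    km = KMeans(n_clusters=1, n_init=10, random_state=0).fit(structure_vectors)
    dists = np.linalg.norm(structure_vectors - km.cluster_centers_[0], axis=1)
    return int(np.argmin(dists))

# Stand-in structure-parameter vectors for 30 candidate schemes in one tier.
vecs = np.random.default_rng(1).random((30, 6))
print(most_robust_scheme(vecs))
```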
Since each parameter-count tier corresponds to one target compression scheme, when the required compression scheme is finally selected, the tier whose (neural network) models suit the device that is to deploy the compressed model can be chosen according to the device's performance parameters, capacity, and the like, and the target compression scheme of that tier applied to compress the model to be compressed.
With this method, several compression schemes for the model to be compressed can be determined from its model structure parameters; for each scheme, the model to be evaluated is what compressing the model according to that scheme would produce, so the data processing duration of every scheme can be determined and the most suitable compression scheme identified.
And since a huge model to be compressed admits very many compression schemes, the data processing duration of each scheme can be determined by training the duration prediction model. Once the durations are known, tiering the schemes by parameter count allows the schemes with shorter durations within a tier to be selected, and clustering the candidate schemes picks the more robust model, yielding a model to be evaluated with both higher computational efficiency and better effect.
When the model structure parameters are clustered within a parameter-count tier, all the structure parameters contained in one candidate compression scheme can be treated as a single point and the candidates clustered as such; alternatively, the model structure parameters can be clustered per dimension (the three dimensions exemplified being the convolution kernel size, the number of network layers, and the number of convolution kernels per network layer), i.e., for each candidate compression scheme the parameters of each dimension are clustered once and a cluster center selected per dimension.
It should further be noted that, to guarantee risk identification quality, the risk-control (wind control) models used for risk identification often need to embed a large number of risk identification algorithms, which can make the model structure too complex and the parameter count too large, so that the model is hard to deploy effectively in different scenarios or on different devices.
The model compression method provided in this specification is therefore especially suitable for compressing risk-control models in risk identification scenarios: the compressed risk-control model can be deployed in various business scenarios or on various devices, enabling active risk prevention and control across them and safeguarding data and users' property.
Accordingly, in the model compression method provided in this specification, the sample data may be a user's service data, and the model to be compressed may be a risk-control model for identifying risks to the user; in practical applications, the service data is input into the risk-control model to perform service risk control for the user. The compression of the risk-control model can be carried out in the manner provided by this specification.
Based on the same idea as the model compression method above, one or more embodiments of this specification further provide an apparatus for model compression, as shown in fig. 6.
Fig. 6 is a schematic diagram of an apparatus for model compression provided in the present specification, specifically including:
an obtaining module 601, configured to obtain a model to be compressed;
the scheme determining module 602 is configured to determine model structure parameters of the model to be compressed, and determine a plurality of compression schemes for the model to be compressed according to the model structure parameters;
the model determining module 603 is configured to determine a model to be evaluated corresponding to each compression scheme according to various compression schemes;
a duration determining module 604, configured to determine, for each compression scheme and based on preset sample data, the data processing duration consumed by the model to be evaluated corresponding to the compression scheme to process the sample data and obtain an output result, as the data processing duration corresponding to the compression scheme;
and a compression module 605, configured to determine a target compression scheme according to the data processing duration corresponding to each compression scheme, and to compress the model to be compressed according to the target compression scheme.
Optionally, the model structure parameters include a size of a convolution kernel in the model to be compressed;
the scheme determining module 602 is specifically configured to determine the compression size of the convolution kernel of the model to be compressed according to the size of the convolution kernel in the model to be compressed; adjust the convolution kernel of the model to be compressed according to the compression size to obtain at least one adjusted convolution kernel; and determine a plurality of compression schemes for the model to be compressed according to the at least one adjusted convolution kernel.
Optionally, the model structure parameter includes a network layer number of the model to be compressed;
the scheme determining module 602 is specifically configured to determine, according to the number of network layers of the model to be compressed, a combination mode for combining different network layers in the model to be compressed, and to determine a plurality of compression schemes for the model to be compressed according to the combination mode.
Optionally, the model structure parameter includes the number of convolution kernels of each network layer of the model to be compressed;
The scheme determining module 602 is specifically configured to determine, for each network layer in the model to be compressed, a candidate number of convolution kernels adopted by the network layer according to a number of convolution kernels actually included in the network layer, where the candidate number does not exceed the number of convolution kernels actually included in the network layer; and determining a plurality of compression schemes for the model to be compressed according to the candidate quantity of the convolution kernels adopted by each network layer.
Optionally, the duration determining module 604 is specifically configured to select some of the compression schemes and input the sample data into the models to be evaluated corresponding to the selected compression schemes to obtain the data processing durations corresponding to the selected schemes; determine training samples from the data processing durations corresponding to the selected schemes, so as to train a preset duration prediction model on the training samples; and predict the data processing durations corresponding to the remaining compression schemes through the trained duration prediction model.
Optionally, the compression module 605 is specifically configured to divide the models to be evaluated corresponding to the compression schemes into tiers by parameter count to obtain a tiering result; and, for each parameter-count tier in the tiering result, determine the target compression scheme corresponding to the tier according to the compression schemes in the tier whose data processing durations do not exceed the set duration corresponding to the tier.
Optionally, the compression module 605 is specifically configured to take the compression schemes in a parameter-count tier whose data processing durations do not exceed the set duration corresponding to the tier as candidate compression schemes; cluster the model structure parameters contained in the candidate compression schemes to obtain a cluster center corresponding to the candidate compression schemes; and determine the target compression scheme corresponding to the tier according to the model structure parameters corresponding to the cluster center.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the method of model compression described above.
This specification also provides a schematic structural diagram of the electronic device shown in fig. 7. At the hardware level, as shown in fig. 7, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the model compression method described above. Of course, besides software implementations, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or in software (an improvement to a method flow). With the development of technology, however, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled is written in a specific programming language called a hardware description language (Hardware Description Language, HDL), of which there is not just one but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can readily be obtained merely by slightly logic-programming the method flow into an integrated circuit using the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller in purely computer-readable program code, the method steps can be logic-programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Or the means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (9)

1. A method of model compression, comprising:
the electronic equipment acquires a model to be compressed;
determining model structure parameters of the model to be compressed, and determining a plurality of compression schemes for the model to be compressed according to the model structure parameters, wherein the model structure parameters comprise: the size of convolution kernels in the model to be compressed, the number of model layers of the model to be compressed and the number of convolution kernels in each network layer of the model to be compressed;
and executing a compression operation on the model to be compressed according to the information contained in the various compression schemes to generate the model to be evaluated corresponding to each compression scheme, wherein the information contained in each compression scheme comprises at least one of: the adjusted size of the convolution kernels, the number of network layers, and the number of convolution kernels of each network layer;
for each compression scheme, determining, based on preset sample data, the data processing duration consumed by the model to be evaluated corresponding to the compression scheme to process the sample data and obtain an output result, as the data processing duration corresponding to the compression scheme, wherein the electronic device selects some of the compression schemes and inputs the sample data into the models to be evaluated corresponding to the selected compression schemes to obtain the data processing durations corresponding to the selected compression schemes; constructs training samples according to the data processing durations corresponding to the selected compression schemes and the model structure parameters contained in the selected compression schemes; inputs the model structure parameters contained in the training samples into a duration prediction model to obtain a predicted duration, and trains the duration prediction model with the deviation between the predicted duration and the data processing durations corresponding to the selected compression schemes as the optimization target; and predicts the data processing durations corresponding to the remaining compression schemes through the trained duration prediction model;
searching for a target compression scheme among the compression schemes according to the data processing duration corresponding to each compression scheme, compressing the model to be compressed according to the target compression scheme, and deploying the compressed model.
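The following sketch is illustrative only and is not part of the claims. It shows, under stated assumptions, the flow claim 1 describes: benchmark a subset of compression schemes on sample data, fit a duration prediction model to the measured durations, and use it to predict the durations of the remaining schemes. All function names are hypothetical, and a plain least-squares regressor stands in for the duration prediction model, whose form the claim does not fix.

```python
import time
import numpy as np

def measure_latency(model_fn, sample, repeats=10):
    """Average wall-clock time the candidate model takes on the sample data."""
    start = time.perf_counter()
    for _ in range(repeats):
        model_fn(sample)
    return (time.perf_counter() - start) / repeats

def fit_duration_predictor(params, durations):
    """Fit w minimizing ||X @ w - y||^2, i.e. the deviation between
    predicted and measured durations (the claim's optimization target)."""
    X = np.hstack([params, np.ones((len(params), 1))])  # add bias column
    w, *_ = np.linalg.lstsq(X, durations, rcond=None)
    return w

def predict_duration(w, params):
    X = np.hstack([params, np.ones((len(params), 1))])
    return X @ w

# Toy stand-in: structure params = (kernel size, layers, kernels per layer).
schemes = np.array([[3, 4, 16], [3, 8, 32], [5, 4, 64],
                    [5, 8, 16], [7, 4, 32], [7, 8, 64]], dtype=float)
sample = np.random.rand(1, 64)

# Benchmark only a subset of schemes on real hardware ...
measured_idx = [0, 2, 4]
durations = np.array([
    measure_latency(lambda x: x @ np.random.rand(64, int(s[2])), sample)
    for s in schemes[measured_idx]
])

# ... and let the trained predictor fill in the rest.
w = fit_duration_predictor(schemes[measured_idx], durations)
remaining_idx = [1, 3, 5]
print("predicted durations:", predict_duration(w, schemes[remaining_idx]))
```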
2. The method of claim 1, wherein the model structure parameters comprise the size of convolution kernels in the model to be compressed, and determining a plurality of compression schemes for the model to be compressed according to the model structure parameters comprises:
determining a compression size for the convolution kernels of the model to be compressed according to the size of the convolution kernels in the model to be compressed;
according to the compression size, adjusting the convolution kernel of the model to be compressed to obtain at least one adjusted convolution kernel;
and determining a plurality of compression schemes for the model to be compressed according to the at least one adjusted convolution kernel.
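Illustrative only, not part of the claims: one plausible reading of claim 2, in which each original kernel size yields a set of smaller candidate sizes, and a full compression scheme picks one adjusted size per layer. The odd-sizes-down-to-1 rule is an assumption; the claim only requires adjusted kernels derived from the original size.

```python
from itertools import product

def kernel_size_candidates(original_sizes):
    """For each conv layer's kernel size, list smaller odd candidate sizes."""
    return [[s for s in range(1, size + 1, 2)] for size in original_sizes]

def kernel_size_schemes(original_sizes):
    # one compression scheme = one choice of adjusted kernel size per layer
    return list(product(*kernel_size_candidates(original_sizes)))

print(kernel_size_candidates([7, 5, 3]))   # [[1, 3, 5, 7], [1, 3, 5], [1, 3]]
print(len(kernel_size_schemes([7, 5, 3]))) # 4 * 3 * 2 = 24 schemes
```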
3. The method of claim 1, wherein the model structure parameters comprise the number of network layers of the model to be compressed, and determining a plurality of compression schemes for the model to be compressed according to the model structure parameters comprises:
determining combination modes for combining different network layers in the model to be compressed according to the number of network layers of the model to be compressed;
and determining a plurality of compression schemes for the model to be compressed according to the combination modes.
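Illustrative only, not part of the claims: a sketch of claim 3 under the assumption that "combining different network layers" means merging runs of adjacent layers, so that each partition of the layers into contiguous blocks is one candidate compression scheme. The claim does not fix the combination rule.

```python
from itertools import combinations

def layer_combination_schemes(num_layers):
    """Enumerate partitions of layers 0..n-1 into contiguous blocks;
    each block of size > 1 is merged into a single layer."""
    cut_points = range(1, num_layers)  # possible boundaries between layers
    schemes = []
    for k in range(num_layers):
        for cuts in combinations(cut_points, k):
            bounds = [0, *cuts, num_layers]
            blocks = [list(range(bounds[i], bounds[i + 1]))
                      for i in range(len(bounds) - 1)]
            schemes.append(blocks)
    return schemes

for blocks in layer_combination_schemes(3):
    print(blocks)
# [[0, 1, 2]]        -> all three layers merged into one
# [[0], [1, 2]] ...  -> partial merges, down to no merging at all
```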
4. The method of claim 1, wherein the model structure parameters comprise the number of convolution kernels in each network layer of the model to be compressed, and determining a plurality of compression schemes for the model to be compressed according to the model structure parameters comprises:
for each network layer in the model to be compressed, determining candidate numbers of convolution kernels for the network layer according to the number of convolution kernels the network layer actually contains, wherein each candidate number does not exceed the number of convolution kernels actually contained in the network layer;
and determining a plurality of compression schemes for the model to be compressed according to the candidate numbers of convolution kernels for each network layer.
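Illustrative only, not part of the claims: a sketch of claim 4 in which each layer's candidate kernel counts are capped at what the layer actually contains, and one scheme is one choice of count per layer. The halving rule for generating candidates is an assumption.

```python
from itertools import product

def kernel_count_schemes(actual_counts):
    per_layer = []
    for n in actual_counts:
        # candidates never exceed the layer's actual kernel count n
        per_layer.append(sorted({max(1, n // 4), max(1, n // 2), n}))
    # one compression scheme = one choice of kernel count per layer
    return list(product(*per_layer))

print(kernel_count_schemes([64, 32]))
# [(16, 8), (16, 16), ..., (64, 32)] -- 9 schemes in total
```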
5. The method of claim 1, wherein searching for a target compression scheme among the compression schemes according to the data processing duration corresponding to each compression scheme comprises:
binning the parameter counts of the models to be evaluated corresponding to the compression schemes into parameter-count tiers to obtain a binning result;
and for each parameter-count tier in the binning result, searching for a target compression scheme corresponding to the tier among the compression schemes in the tier whose data processing durations do not exceed the duration threshold set for the tier.
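Illustrative only, not part of the claims: a minimal sketch of claim 5 under assumed tier boundaries and duration budgets. Schemes are binned by the parameter count of their evaluated model, and within each tier only schemes meeting that tier's duration budget remain candidates.

```python
def bin_and_filter(schemes, tier_edges, tier_budgets):
    """schemes: list of (param_count, duration, scheme_id) triples.
    tier_edges: ascending upper bounds on parameter count, one per tier.
    tier_budgets: maximum allowed duration per tier."""
    tiers = {i: [] for i in range(len(tier_edges))}
    for params, duration, sid in schemes:
        for i, edge in enumerate(tier_edges):
            if params <= edge:  # first tier whose bound covers this scheme
                if duration <= tier_budgets[i]:
                    tiers[i].append(sid)
                break
    return tiers

schemes = [(0.8e6, 3.1, "A"), (0.9e6, 5.0, "B"), (4.2e6, 9.5, "C")]
print(bin_and_filter(schemes, tier_edges=[1e6, 5e6], tier_budgets=[4.0, 10.0]))
# {0: ['A'], 1: ['C']} -- 'B' misses tier 0's duration budget
```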
6. The method of claim 5, wherein searching for a target compression scheme corresponding to the parameter-count tier among the compression schemes in the tier whose data processing durations do not exceed the duration threshold set for the tier comprises:
taking the compression schemes in the parameter-count tier whose data processing durations do not exceed the tier's duration threshold as candidate compression schemes;
clustering the model structure parameters contained in the candidate compression schemes to obtain cluster centers corresponding to the candidate compression schemes;
and searching out a target compression scheme corresponding to the parameter-count tier according to the model structure parameters corresponding to the cluster centers.
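Illustrative only, not part of the claims: a sketch of claim 6 assuming k-means as the clustering step, which the claim does not name. The structure parameters of the surviving candidates are clustered, and the candidate nearest each cluster center is returned as a target scheme for the tier.

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_by_cluster_center(candidate_params, n_clusters=2):
    X = np.asarray(candidate_params, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    picks = []
    for center in km.cluster_centers_:
        # nearest real candidate stands in for the (possibly fractional) center
        picks.append(int(np.argmin(np.linalg.norm(X - center, axis=1))))
    return picks

params = [[3, 4, 16], [3, 4, 32], [7, 8, 64], [7, 8, 32]]
print(pick_by_cluster_center(params))  # indices of the chosen schemes
```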
7. An apparatus for model compression, comprising:
an acquisition module configured to acquire a model to be compressed;
a scheme determining module configured to determine model structure parameters of the model to be compressed and determine a plurality of compression schemes for the model to be compressed according to the model structure parameters, wherein the model structure parameters comprise: the size of convolution kernels in the model to be compressed, the number of network layers of the model to be compressed, and the number of convolution kernels in each network layer of the model to be compressed;
a model determining module configured to perform a compression operation on the model to be compressed according to each compression scheme to generate a model to be evaluated corresponding to that compression scheme, wherein the information contained in a compression scheme comprises at least one of: the size of the adjusted convolution kernels, the number of network layers, and the number of convolution kernels in each network layer;
a duration determining module configured to determine, for each compression scheme and according to preset sample data, the data processing duration consumed by the model to be evaluated corresponding to the compression scheme to process the sample data and obtain an output result, as the data processing duration corresponding to the compression scheme, wherein the electronic equipment selects a subset of the plurality of compression schemes, and inputs the sample data into the models to be evaluated corresponding to the subset to obtain the data processing durations corresponding to the subset; constructs training samples according to the data processing durations corresponding to the subset and the model structure parameters contained in the subset; inputs the model structure parameters contained in the training samples into a duration prediction model to obtain predicted durations, and trains the duration prediction model with the optimization objective of reducing the deviation between the predicted durations and the measured data processing durations; and predicts, through the trained duration prediction model, the data processing durations corresponding to the remaining compression schemes;
and a compression module configured to search for a target compression scheme among the compression schemes according to the data processing duration corresponding to each compression scheme, compress the model to be compressed according to the target compression scheme, and deploy the compressed model.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-6 when executing the program.
CN202211509870.2A 2022-11-29 2022-11-29 Model compression method and device, storage medium and electronic equipment Active CN115543945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211509870.2A CN115543945B (en) 2022-11-29 2022-11-29 Model compression method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211509870.2A CN115543945B (en) 2022-11-29 2022-11-29 Model compression method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115543945A CN115543945A (en) 2022-12-30
CN115543945B (en) 2023-06-20

Family

ID=84722667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211509870.2A Active CN115543945B (en) 2022-11-29 2022-11-29 Model compression method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115543945B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185307B (en) * 2023-04-24 2023-07-04 之江实验室 Storage method and device of model data, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978142A (en) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 The compression method and device of neural network model
CN113887719A (en) * 2021-09-13 2022-01-04 北京三快在线科技有限公司 Model compression method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490323A (en) * 2019-08-20 2019-11-22 腾讯科技(深圳)有限公司 Network model compression method, device, storage medium and computer equipment
CN110826692B (en) * 2019-10-24 2023-11-17 腾讯科技(深圳)有限公司 Automatic model compression method, device, equipment and storage medium
CN111488986B (en) * 2020-04-13 2023-06-27 商汤集团有限公司 Model compression method, image processing method and device
CN112132279B (en) * 2020-09-23 2023-09-15 平安科技(深圳)有限公司 Convolutional neural network model compression method, device, equipment and storage medium
CN114398178A (en) * 2022-01-07 2022-04-26 杭州数理大数据技术有限公司 Task execution method and device and electronic equipment
CN114861910B (en) * 2022-05-19 2023-07-04 北京百度网讯科技有限公司 Compression method, device, equipment and medium of neural network model
CN114943307A (en) * 2022-06-28 2022-08-26 支付宝(杭州)信息技术有限公司 Model training method and device, storage medium and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978142A (en) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 The compression method and device of neural network model
CN113887719A (en) * 2021-09-13 2022-01-04 北京三快在线科技有限公司 Model compression method and device

Also Published As

Publication number Publication date
CN115543945A (en) 2022-12-30

Similar Documents

Publication Publication Date Title
CN109933834B (en) Model creation method and device for time sequence data prediction
CN110163417B (en) Traffic prediction method, device and equipment
CN116663618B (en) Operator optimization method and device, storage medium and electronic equipment
CN116304720B (en) Cost model training method and device, storage medium and electronic equipment
CN116306856B (en) Deep learning model deployment method and device based on search
CN116521380A (en) Resource self-adaptive collaborative model training acceleration method, device and equipment
CN115543945B (en) Model compression method and device, storage medium and electronic equipment
CN116167461B (en) Model training method and device, storage medium and electronic equipment
CN115545572B (en) Method, device, equipment and storage medium for business wind control
CN115618748B (en) Model optimization method, device, equipment and storage medium
CN113887719B (en) Model compression method and device
CN117009729B (en) Data processing method and device based on softmax
CN117058525B (en) Model training method and device, storage medium and electronic equipment
CN117348999B (en) Service execution system and service execution method
CN116996397B (en) Network packet loss optimization method and device, storage medium and electronic equipment
CN117370536B (en) Task execution method and device, storage medium and electronic equipment
CN116109008B (en) Method and device for executing service, storage medium and electronic equipment
CN117350351B (en) Training method of user response prediction system, user response prediction method and device
CN117591703A (en) Graph data optimization method and device, storage medium and electronic equipment
CN113114395B (en) Channel determination method and device
CN117591130A (en) Model deployment method and device, storage medium and electronic equipment
CN111079903A (en) End-to-end sequence prediction method, device and equipment
CN116151620A (en) Sub-graph matching method and device, storage medium and electronic equipment
CN117391166A (en) Hypergraph neural network updating method, device and equipment based on redundancy elimination
CN117592102A (en) Service execution method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant