CN112329923A - Model compression method and device, electronic equipment and readable storage medium - Google Patents

Model compression method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN112329923A
Authority
CN
China
Prior art keywords
optimization unit
channel
optimization
distance
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011334592.2A
Other languages
Chinese (zh)
Other versions
CN112329923B (en)
Inventor
林晨
李哲暘
谭文明
任烨
彭博
毛芳党
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011334592.2A priority Critical patent/CN112329923B/en
Publication of CN112329923A publication Critical patent/CN112329923A/en
Priority to PCT/CN2021/132610 priority patent/WO2022111490A1/en
Application granted granted Critical
Publication of CN112329923B publication Critical patent/CN112329923B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application provides a model compression method and apparatus, an electronic device, and a readable storage medium. The model compression method comprises the following steps: dividing a model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of consecutive convolution layers in the model to be compressed; for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit; and respectively optimizing the parameters of each convolution layer in the quantized optimization unit so that a first distance is smaller than a second distance. The method can reduce the time as well as the computing and storage resources consumed by model compression while ensuring model performance and the model compression effect.

Description

Model compression method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model compression method and apparatus, an electronic device, and a readable storage medium.
Background
As one of the representative techniques of artificial intelligence, Convolutional Neural Networks (CNNs) have been widely used in many fields such as computer vision, natural language processing, and autonomous driving.
In order to adapt to neural network hardware with new architectures and to run intelligent algorithms on mobile terminals and embedded devices with lower power consumption and higher performance, model compression needs to be performed on the convolutional neural network to reduce the computation and parameter storage of the network model.
The traditional model compression method needs to train the compressed model with a complete data set, a process that consumes a great deal of time as well as computing and storage resources.
Disclosure of Invention
In view of the above, the present application provides a model compression method, apparatus, electronic device and readable storage medium.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a model compression method, including:
dividing a model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit;
respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
According to a second aspect of embodiments of the present application, there is provided a model compression apparatus, including:
the device comprises a dividing unit, a compressing unit and a control unit, wherein the dividing unit is used for dividing a model to be compressed into a plurality of optimization units, and one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
the compression unit is used for quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit for any optimization unit;
the optimization unit is used for respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
According to a third aspect of embodiments of the present application, there is provided an electronic device, comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the model compression method described above.
According to a fourth aspect of embodiments of the present application, there is provided a machine-readable storage medium having stored therein machine-executable instructions that, when executed by a processor, implement the above-described model compression method.
The technical scheme provided by the application can at least bring the following beneficial effects:
the model to be compressed is divided into a plurality of optimization units, parameters of each convolution layer in the optimization units are quantized for any optimization unit to obtain quantized optimization units, and then the parameters of each convolution layer in the quantized optimization units are optimized respectively to reduce the distance between the output characteristics of the quantized optimization units and the output characteristics of the original optimization units, so that time consumed by model compression and calculation and storage resources are reduced under the condition that model performance and model compression effects are guaranteed.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method of model compression in accordance with an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram illustrating partitioning of an optimization unit for a model to be compressed according to an exemplary embodiment of the present application;
FIG. 3 is a diagram illustrating channel clipping of an optimization unit according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a model compression apparatus according to an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic flow chart of a model compression method according to an embodiment of the present disclosure is shown in fig. 1, where the model compression method may include the following steps:
step S100, dividing the model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed.
In the embodiment of the present application, it is considered that when the entire model is compressed, for example by parameter quantization, the number of parameters involved is very large, and the compressed model usually needs to have its parameters fine-tuned on an entire data set after compression to ensure model performance, which consumes much time as well as computing and storage resources.
In addition, it is considered that if model compression, such as parameter quantization, is performed on a single convolutional layer, the compression space is small and the correlation between consecutive convolutional layers cannot be taken into account, so under-fitting is likely to occur and the final model performance is poor.
Accordingly, in the embodiment of the present application, a plurality of convolutional layers may be used as one optimization unit, a model to be compressed is divided into a plurality of optimization units, and the optimization units are used as objects to perform model compression.
It should be noted that the convolutional layer mentioned in the embodiment of the present application may include a convolution layer (Conv layer) or a fully-connected layer (FC layer, which can be regarded as a special convolutional layer whose every node is connected to all nodes in the previous layer).
Optionally, two adjacent optimization units comprise at least one identical convolutional layer.
Preferably, 2-3 continuous convolution layers can be used as an optimization unit.
For example, taking 2 consecutive convolutional layers as an optimization unit, assume that the model includes 2N convolutional layers, i.e., M = [L1, L2, …, Li, …, L2N]. The whole model can then be traversed, and every two front-and-back connected convolution layers are combined pairwise to form an optimization unit, yielding N optimization units [(L1, L2), (L3, L4), …, (L2N-1, L2N)].
As another example, taking 2 convolutional layers as an optimization unit, assume that the model includes N convolutional layers, i.e., M = [L1, L2, …, Li, …, LN]. The whole model can then be traversed, and every two front-and-back connected convolution layers are combined pairwise, with adjacent optimization units sharing one convolution layer, yielding N-1 optimization units [(L1, L2), (L2, L3), …, (Li, Li+1), …, (LN-2, LN-1), (LN-1, LN)].
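As a concrete illustration of this division, the following Python sketch builds overlapping two-layer optimization units from an ordered list of layers, as in the second example above; the helper name and the use of a plain list are assumptions for illustration only.

```python
def partition_into_units(layers, unit_size=2):
    """Return overlapping optimization units [(L1, L2), (L2, L3), ...]."""
    if len(layers) <= unit_size:
        return [tuple(layers)]
    return [tuple(layers[i:i + unit_size]) for i in range(len(layers) - unit_size + 1)]

# Example: a model M = [L1, L2, L3, L4] yields 3 optimization units.
units = partition_into_units(["L1", "L2", "L3", "L4"])
# -> [("L1", "L2"), ("L2", "L3"), ("L3", "L4")]
```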
Step S110, for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit.
In this embodiment of the application, when the partition of the optimization unit of the model to be compressed is completed in the manner of step S100, the optimization unit may be used as an object to compress the model to be compressed, and the parameters of each convolution layer in each optimization unit are sequentially quantized to obtain a quantized optimization unit.
Step S120, respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
In the embodiment of the present application, considering that the quantization operation may introduce noise into the model and thus affect its final performance, in order to optimize the performance of the compressed model, the parameters of each convolution layer in the quantized optimization unit may be optimized respectively, so as to reduce the distance between the output features of the quantized optimization unit and the output features of the original optimization unit, i.e., to reduce the difference between the two.
That is, after parameter optimization, the distance between the output features of the quantized optimization unit and the output features of the original optimization unit (referred to herein as the first distance) is smaller than the corresponding distance before parameter optimization (referred to herein as the second distance).
In the embodiment of the present application, the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit refers to the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit when the input characteristics are the same.
Illustratively, the distance may include, but is not limited to, a cosine distance, a euclidean distance, and the like.
For example, for any optimization unit, a certain number of features X may be input into the original optimization unit to obtain the output features Y; after the optimization unit is quantized, the same features X may be input into the quantized optimization unit to obtain the output features Y'. The distance between the output features of the quantized optimization unit and those of the original optimization unit is then determined by calculating the cosine distance or Euclidean distance between Y and Y', and when the parameters of each convolution layer in the quantized optimization unit are optimized, the optimization may be performed with the goal of minimizing this distance.
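The two distance measures mentioned above can be illustrated with the following sketch; it assumes PyTorch tensors for the output features, and the function names are illustrative rather than taken from the patent.

```python
import torch

def euclidean_distance(y: torch.Tensor, y_q: torch.Tensor) -> torch.Tensor:
    # Euclidean (L2) distance between the original and quantized output features
    return (y - y_q).norm()

def cosine_distance(y: torch.Tensor, y_q: torch.Tensor) -> torch.Tensor:
    # Cosine distance = 1 - cosine similarity of the flattened feature tensors
    y_flat, yq_flat = y.flatten(), y_q.flatten()
    cos = torch.dot(y_flat, yq_flat) / (y_flat.norm() * yq_flat.norm() + 1e-12)
    return 1.0 - cos
```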
In some embodiments, in step S120, optimizing the parameters of each convolution layer in the quantized optimization unit respectively may include:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
Illustratively, the parameters in the optimization unit can be finely optimized in an iterative manner, with the parameters optimized one by one, so that the quantization noise introduced by quantizing one parameter is compensated when the other parameters are subsequently optimized. This effectively reduces the quantization noise and thus improves model performance.
For any parameter, the parameter may be adjusted according to the quantization interval, for example by adding or subtracting the quantization interval; the distance between the output features of the adjusted optimization unit and the output features of the original optimization unit is then determined and compared with the distance before the adjustment. If the distance decreases, the adjustment is kept; otherwise it is cancelled. In this way the value of the parameter that minimizes the distance between the output features of the parameter-adjusted optimization unit and the output features of the original optimization unit (referred to herein as the target parameter value) is determined.
In addition, for any parameter, if adjusting the parameter according to the quantization interval only increases the distance between the output features of the optimization unit and the output features of the original optimization unit, the target parameter value may be the original quantized value of the parameter (i.e., the value not adjusted according to the quantization interval).
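A minimal sketch of this adjust-and-keep procedure is given below. It assumes the quantized parameters are stored as floating-point values that are integer multiples of the quantization interval, and `unit_forward` and `distance` are assumed callables standing in for the optimization unit's forward pass on the fixed calibration features X and for the feature distance (e.g. Euclidean or cosine).

```python
def optimize_parameter(params, idx, alpha, X, Y_ref, unit_forward, distance):
    """Adjust one quantized parameter by +/- the quantization interval alpha and keep
    the adjustment only if it reduces the distance to the original unit's output
    features Y_ref; otherwise the original quantized value is restored."""
    best_val = params[idx]
    best_dist = distance(unit_forward(params, X), Y_ref)
    for candidate in (best_val + alpha, best_val - alpha):
        params[idx] = candidate
        d = distance(unit_forward(params, X), Y_ref)
        if d < best_dist:
            best_dist, best_val = d, candidate
    params[idx] = best_val   # the target parameter value for this parameter
    return best_dist
```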
In addition, it should be appreciated that, in the embodiment of the present application, the parameters of each convolutional layer in the quantized optimization unit are not limited to being optimized by an iterative optimization method; for example, a gradient-based optimization method (such as batch gradient descent or stochastic gradient descent) may also be used, and its specific implementation is not described herein again.
In some embodiments, before quantizing the parameters of each convolution layer in the optimization unit in step S110, the method may further include:
according to the importance of each channel in the optimization unit, channel cutting is carried out on the optimization unit to obtain an optimization unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
in step S110, quantizing the parameters of each convolution layer in the optimization unit may include:
and quantizing the parameters of each convolution layer in the optimization unit after channel cutting to obtain a quantized optimization unit.
For example, for any optimization unit, before quantizing the parameters of each convolution layer in the optimization unit, channel clipping may be performed on the optimization unit to further compress the model.
For any optimization unit, channel cutting can be carried out on the optimization unit according to the importance of each channel in the optimization unit, so that a certain number of channels with lower importance are cut off, and model compression is realized.
Illustratively, one channel of the optimization unit includes one output channel of a previous convolutional layer of two adjacent convolutional layers in the optimization unit, and the corresponding input channel of a next convolutional layer.
For example, for an optimization unit comprising 2 consecutive convolutional layers (taking L1 and L2 as an example), since L1 and L2 are connected front and back, one output channel of L1 corresponds to one input channel of L2; therefore, when channel clipping is performed and an output channel of L1 is cut off, the corresponding input channel of L2 also needs to be cut off.
Similarly, when an optimization unit includes 3 consecutive convolutional layers (taking L1, L2 and L3 as an example), one channel clipping operation may cut off one output channel of L1 together with the corresponding input channel of L2, or cut off one output channel of L2 together with the corresponding input channel of L3.
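The coupling between an output channel of L1 and the corresponding input channel of L2 can be illustrated with the following PyTorch sketch. It assumes plain nn.Conv2d layers without grouped convolutions and is a simplified illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

def clip_channel(conv1: nn.Conv2d, conv2: nn.Conv2d, k: int):
    """Remove output channel k of conv1 together with input channel k of conv2."""
    keep = [i for i in range(conv1.out_channels) if i != k]
    new1 = nn.Conv2d(conv1.in_channels, len(keep), conv1.kernel_size,
                     stride=conv1.stride, padding=conv1.padding,
                     bias=conv1.bias is not None)
    new2 = nn.Conv2d(len(keep), conv2.out_channels, conv2.kernel_size,
                     stride=conv2.stride, padding=conv2.padding,
                     bias=conv2.bias is not None)
    with torch.no_grad():
        new1.weight.copy_(conv1.weight[keep])      # drop conv1's output channel k
        if conv1.bias is not None:
            new1.bias.copy_(conv1.bias[keep])
        new2.weight.copy_(conv2.weight[:, keep])   # drop conv2's matching input channel
        if conv2.bias is not None:
            new2.bias.copy_(conv2.bias)
    return new1, new2
```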
When the optimization unit is cut, parameters of each convolution layer in the optimization unit after channel cutting may be quantized, and a specific implementation manner of the method may refer to related descriptions in the method flow shown in fig. 1.
In other embodiments, after the quantifying the parameters of each convolution layer in the optimization unit in step S110, the method may further include:
according to the importance of each channel in the quantized optimization unit, channel cutting is carried out on the quantized optimization unit to obtain the channel-cut optimization unit; wherein, a channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output features before and after the channel is cut.
In step S120, the optimizing the parameters of each convolution layer in the quantized optimization unit may include:
and respectively optimizing the parameters of each convolution layer in the optimization unit after channel cutting.
For example, for any optimization unit, after quantizing the parameters of each convolution layer in the optimization unit, channel clipping may be performed on the optimization unit to further compress the model.
For any optimization unit, channel cutting can be carried out on the optimization unit according to the importance of each channel in the optimization unit, so that a certain number of channels with lower importance are cut off, and model compression is realized.
When the cutting of the optimization unit is completed, parameters of each convolution layer in the optimization unit after the channel cutting can be optimized respectively, and the performance of the compressed model is optimized.
In one example, channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
sorting the channels in descending order of the distance between the output features of the optimization unit before and after each channel is cut;
and cutting off the first N channels in the order when it is determined that the distance between the output features of the optimization unit and the output features of the original optimization unit is smaller than a first threshold when the first N channels in the order are cut off, and that this distance is greater than or equal to the first threshold when the first N+1 channels in the order are cut off.
Illustratively, channels are clipped in order from low importance to high importance, on the condition that the distance between the output features of the channel-clipped optimization unit and the output features of the original optimization unit remains smaller than a preset threshold (referred to herein as a first threshold).
For any optimization unit (which may include an optimization unit before parameter quantization or an optimization unit after parameter quantization) that needs to perform channel clipping, the distance between the output features before and after each channel clipping may be determined respectively.
The distance between the output features before and after a channel is clipped refers to the distance between the output features of the optimization unit before and after that channel is clipped, for the same input features.
For example, taking channel clipping of the optimization unit before parameter quantization as an example, for any optimization unit, the output features Y of the optimization unit for a certain number of input features X may be determined first; then, for any channel in the optimization unit, the output Y″ of the channel-clipped optimization unit for the same input features X may be determined after the channel is clipped, and the distance between Y and Y″, such as the Euclidean distance or the cosine distance, is calculated.
When the distance between the output features before and after clipping has been determined for each channel in this way, the channels can be sorted in descending order of this distance, and the least important of the channels retained in the optimization unit is cut off in turn, on the premise that the distance between the output features of the clipped optimization unit and the output features of the original optimization unit remains smaller than the preset threshold (i.e., the first threshold).
For example, the channel ranked first when the channels are sorted from low to high importance may be cut first, and the distance between the output features of the clipped optimization unit and the output features of the original optimization unit (for the same input features) is then determined. If this distance is smaller than the first threshold, the channel ranked second is cut as well, the distance is determined again, and it is judged whether it is still smaller than the first threshold; if so, clipping continues; otherwise, the last cut is cancelled and the channel clipping process for the optimization unit ends.
It should be noted that, when channel clipping is performed, a plurality of channels ranked in the front, such as 5 or 10 channels, may be clipped first, and a distance between an output feature of the optimization unit after channel clipping and an output feature of the original optimization unit is determined, and if the distance is smaller than the first threshold, further clipping is performed; if the distance is greater than or equal to the first threshold, the clipping is cancelled, the number of the clipped channels is reduced, and the channel clipping is performed again, which is not described herein in detail.
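A hedged sketch of this greedy scheme is given below; the helper callables (the unit's forward pass `unit_output`, the clipping routine `clip_channels`, and the feature `distance`) are assumptions supplied by the caller, not names from the patent.

```python
def greedy_channel_clip(unit, channels, X, unit_output, clip_channels, distance, first_threshold):
    """Clip channels from least to most important while the distance between the
    clipped unit's output and the original unit's output stays below the threshold."""
    Y_ref = unit_output(unit, X)
    # Importance of a channel ~ distance caused by clipping that channel alone.
    importance = {c: distance(Y_ref, unit_output(clip_channels(unit, [c]), X))
                  for c in channels}
    clipped = []
    for c in sorted(channels, key=lambda ch: importance[ch]):   # least important first
        candidate = clipped + [c]
        if distance(Y_ref, unit_output(clip_channels(unit, candidate), X)) < first_threshold:
            clipped = candidate        # keep this clip and try the next channel
        else:
            break                      # cancel the last clip and stop
    return clipped
```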
In another example, channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
determining a channel of which the distance between the output features of the optimization units before and after cutting is smaller than a second threshold value as a channel to be cut according to the distance between the output features before and after cutting of each channel;
and cutting the channel to be cut.
For example, the channels with a distance smaller than a preset threshold (referred to as a second threshold herein) may be clipped according to the distance between the output features of the optimization unit before and after channel clipping.
For any optimization unit (which may include an optimization unit before parameter quantization or an optimization unit after parameter quantization) that needs to perform channel clipping, the distance between the output features before and after each channel clipping may be determined respectively.
When the distance between the output features before and after the channel is cut is determined according to the mode, the channel of which the distance is smaller than the second threshold value can be determined as the channel to be cut according to the distance between the output features before and after the channel is cut, and the channel to be cut is cut.
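A very small sketch of this variant follows, with `importance` assumed to already map each channel to the distance between the unit's output features before and after clipping that channel alone.

```python
def channels_below_threshold(importance, second_threshold):
    # Channels whose individual clipping changes the output features by less than
    # the second threshold are marked as channels to be cut.
    return [c for c, dist in importance.items() if dist < second_threshold]
```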
It should be noted that, in the embodiment of the present application, for a scenario where the compression rate requirement of the model is high but the performance requirement of the model is relatively low, a specified number of channels may also be deleted according to the order from low to high of the channel importance, and the specific implementation thereof is not described herein again.
In addition, it should be appreciated that, in the embodiment of the present application, the channels to be clipped are not limited to being determined according to the distance between the output features of the optimization unit before and after channel clipping; for example, a group lasso (least absolute shrinkage and selection operator) approach may also be adopted to select the channels to be retained and clip the others, and its specific implementation is not described herein.
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, two consecutive convolutional layers in the model are used as an optimization unit, the distance between the output features of the optimization unit before and after compression is used as the optimization target, and channel clipping and parameter quantization are performed on the two convolutional layers in the optimization unit. The implementation process is as follows:
1. dividing the model into a plurality of optimization units by taking two continuous convolution layers in the model as optimization units, wherein a schematic diagram can be shown in FIG. 2;
2. and for any optimization unit, determining the channel to be cut according to the importance of each channel in the optimization unit, and cutting the channel to be cut to obtain the optimization unit after channel cutting.
For example, for an optimization unit, when performing channel clipping, a channel includes an output channel of a previous convolutional layer and a corresponding input channel of a subsequent convolutional layer, and a schematic diagram thereof can be shown in fig. 3.
The importance of a channel is positively correlated with the distance between the output features of the optimization unit before and after the channel is clipped.
3. Quantizing the parameters of the two convolution layers in the channel-clipped optimization unit to obtain the quantized optimization unit.
4. Optimizing each parameter in the quantized optimization unit in an iterative manner.
5. Repeating steps 2-4 until channel clipping and parameter quantization have been completed for all optimization units of the whole model.
The respective steps will be described in detail below.
Step 1 may include:
the hypothetical model includes N convolutional layers, i.e., M ═ L1,L2,…,Li,…,LN]Then, the whole model can be traversed, two convolution layers connected in front and back are pairwise combined to form an optimization unit, and N-1 optimization units [ (L)1,L2),(L2,L3),…,(Li,Li+1),…,(LN-2,LN-1),(LN-1,LN)]。
Step 2 may include:
2.1. A certain number of features X are input into the original optimization unit S, e.g. S = (L1, L2), to obtain the output Y of the original optimization unit;
2.2. One channel in S is cut off to obtain the channel-clipped optimization unit S'.
Illustratively, since L1 and L2 are connected front and back, one output channel of L1 corresponds to one input channel of L2 in the parameter tensors of the two layers; therefore, when one output channel of L1 is cut off, the corresponding input channel of L2 must also be cut off.
2.3. The distance between the output features of the optimization unit S' and the output features of the optimization unit S is determined.
Illustratively, the distance (e.g. cosine distance or Euclidean distance) between the output features of the optimization unit S' and those of the optimization unit S can be determined by inputting the features X into the optimization unit S' to obtain the output Y', and the importance of the clipped channel is then evaluated according to the distance between Y and Y': the greater the distance, the greater the importance of the clipped channel (a code sketch of this ranking is given below).
2.4. Steps 2.2-2.3 are repeated to determine, for each channel, the distance between the output features of the optimization unit before and after that channel is cut off (each clipping being performed on the basis of the original optimization unit);
2.5. The channels are sorted in order of importance from low to high. According to the clipping target, either as many channels as possible are cut off while the distance between the output features of the channel-clipped optimization unit and the output features of the original optimization unit remains smaller than the first threshold, or the channels for which the distance between the output features of the optimization unit before and after clipping is smaller than the second threshold are cut off. Channel clipping is thereby completed, yielding the channel-clipped optimization unit Sp.
Illustratively, by performing channel clipping on an optimization unit comprising a plurality of consecutive convolutional layers, the problem is avoided that, in an implementation in which a single convolutional layer is used as the optimization unit, clipping a channel causes the subsequent features to be misaligned with the channels.
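The channel-importance ranking of steps 2.1-2.3 might look like the following PyTorch sketch for a unit S = (L1, L2); `clip_channel` is assumed to behave like the earlier two-layer clipping sketch, `distance` is one of the feature distances above, and any non-linearity between the two layers is omitted for brevity.

```python
import torch

def rank_channels(conv1, conv2, X, clip_channel, distance):
    """Score each channel of the unit S = (conv1, conv2) by the output-feature
    distance caused by clipping that channel alone; a larger distance means a
    more important channel."""
    with torch.no_grad():
        Y = conv2(conv1(X))                            # step 2.1: output of the original unit
        scores = []
        for k in range(conv1.out_channels):
            c1, c2 = clip_channel(conv1, conv2, k)     # step 2.2: clip channel k
            Y_k = c2(c1(X))
            scores.append((float(distance(Y, Y_k)), k))  # step 2.3: distance -> importance
    return sorted(scores)                              # least important channels first
```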
Step 3 may include:
for optimization unit SpQuantizing the parameters of each convolution layer to obtain a quantized optimization unit Sp&q
Step 4 may include:
for the optimization unit S obtained in step 3p&qRespectively optimizing each parameter of the two convolution layers, reducing Sp&qOutput Y ofp&qAnd the output characteristics Y (the input characteristics are all X) of the original optimization unit S. The method specifically comprises the following steps:
4.1, in pairs Sp&qThe parameter p in (1) is optimized as an example, and the value of p (quantization parameter) is assumed to be xqQuantization interval of α according to which x is pairedqMaking an adjustment so that xq=argmin(f(Sp&qX, Y)), wherein XqIs an integer multiple of alpha, and f () is the input of the input feature X to Sp&qIs compared to the distance between the output feature in (1) and the output feature inputting the input feature X into the original optimization unit S, i.e. X is paired according to the quantization interval αqMaking adjustments, e.g. to xqAdding or subtracting alpha (or adding or subtracting other integer multiples of alpha), finding the input features X to Sp&qX having the smallest distance between the output features in (1) and the output features inputting the input features into the original optimization unit Sq
4.2, according to an iterative optimization mode, for Sp&qAnd the parameters are optimized one by one.
Illustratively, information loss caused by channel cutting and parameter quantization is compensated through parameter optimization.
In addition, through the one-by-one fine optimization of the parameters in the optimization unit, the quantization noise caused by quantizing one parameter can be compensated when other parameters are optimized subsequently, so that the quantization noise is effectively reduced, and the model performance is improved.
The methods provided herein are described above. The following describes the apparatus provided in the present application:
referring to fig. 4, a schematic structural diagram of a model compression apparatus according to an embodiment of the present application is shown in fig. 4, where the model compression apparatus may include:
a dividing unit 410, configured to divide a model to be compressed into multiple optimization units, where one optimization unit includes multiple convolutional layers that are consecutive in the model to be compressed;
a compressing unit 420, configured to quantize, for any optimization unit, parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit;
an optimizing unit 430, configured to optimize parameters of each convolution layer in the quantized optimizing unit, respectively, so that the first distance is smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
In some embodiments, the optimizing unit 430 optimizes the parameters of each convolutional layer in the quantized optimizing unit respectively, including:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter, which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
In some embodiments, before the compressing unit 420 quantizes the parameters of each convolution layer in the optimization unit, the method further includes:
according to the importance of each channel in the optimization unit, channel cutting is carried out on the optimization unit to obtain an optimization unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the optimization unit 430 quantizes parameters of each convolution layer in the optimization unit, and includes:
and quantizing the parameters of each convolution layer in the optimization unit after channel cutting to obtain a quantized optimization unit.
In some embodiments, after the compressing unit 420 quantizes the parameters of each convolution layer in the optimization unit, the method further includes:
according to the importance of each channel in the quantized optimization unit, channel cutting is carried out on the quantized optimization unit to obtain an optimized unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the optimizing unit 430 optimizes parameters of each convolution layer in the quantized optimizing unit, respectively, including:
and respectively optimizing the parameters of each convolution layer in the optimization unit after the channel is cut.
In some embodiments, the compression unit 420 channel-clipping the optimization unit includes:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
sorting the channels according to the distance between the output characteristics of the optimization units before and after cutting of each channel and the sequence of the distances from large to small;
and cutting off the first N channels in the order when it is determined that the distance between the output features of the optimization unit and the output features of the original optimization unit is smaller than a first threshold when the first N channels in the order are cut off, and that this distance is greater than or equal to the first threshold when the first N+1 channels in the order are cut off.
In some embodiments, the compression unit 420 channel-clipping the optimization unit includes:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
determining a channel of which the distance between the output features of the optimization units before and after cutting is smaller than a second threshold value as a channel to be cut according to the distance between the output features before and after cutting of each channel;
and cutting the channel to be cut.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure. The electronic device may include a processor 501 and a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. Moreover, the processor 501 may perform the model compression methods described above by reading and executing the machine-executable instructions in the memory 502 that correspond to the model compression control logic.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
In some embodiments, there is also provided a machine-readable storage medium, such as the memory 502 in fig. 5, having stored therein machine-executable instructions that, when executed by a processor, implement the model compression method described above. For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so forth.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method of model compression, comprising:
dividing a model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit;
respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
2. The method of claim 1, wherein the separately optimizing the parameters of each convolutional layer in the quantized optimization unit comprises:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter, which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
3. The method of claim 1, wherein prior to quantizing the parameters of each convolutional layer in the optimization unit, further comprising:
according to the importance of each channel in the optimization unit, channel cutting is carried out on the optimization unit to obtain an optimization unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the quantifying the parameters of each convolution layer in the optimization unit includes:
and quantizing the parameters of each convolution layer in the optimization unit after channel cutting to obtain a quantized optimization unit.
4. The method of claim 1, wherein after quantizing the parameters of each convolutional layer in the optimization unit, further comprising:
according to the importance of each channel in the quantized optimization unit, channel cutting is carried out on the quantized optimization unit to obtain an optimized unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the optimizing the parameters of each convolution layer in the quantized optimization unit respectively comprises:
and respectively optimizing the parameters of each convolution layer in the optimization unit after the channel is cut.
5. The method according to claim 3 or 4, characterized in that channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
sorting the channels according to the distance between the output characteristics of the optimization units before and after cutting of each channel and the sequence of the distances from large to small;
and cutting off the first N channels in the order when it is determined that the distance between the output features of the optimization unit and the output features of the original optimization unit is smaller than a first threshold when the first N channels in the order are cut off, and that this distance is greater than or equal to the first threshold when the first N+1 channels in the order are cut off.
6. The method according to claim 3 or 4, characterized in that channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
determining a channel of which the distance between the output features of the optimization units before and after cutting is smaller than a second threshold value as a channel to be cut according to the distance between the output features before and after cutting of each channel;
and cutting the channel to be cut.
7. A model compression apparatus, comprising:
the device comprises a dividing unit, a compressing unit and a control unit, wherein the dividing unit is used for dividing a model to be compressed into a plurality of optimization units, and one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
the compression unit is used for quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit for any optimization unit;
the optimization unit is used for respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
8. The apparatus of claim 7, wherein the optimizing unit optimizes parameters of each convolutional layer in the quantized optimizing unit respectively, comprising:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter, which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor being configured to execute the machine executable instructions to implement the method of any one of claims 1 to 6.
10. A machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN202011334592.2A 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium Active CN112329923B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011334592.2A CN112329923B (en) 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium
PCT/CN2021/132610 WO2022111490A1 (en) 2020-11-24 2021-11-24 Model compression method and apparatus, electronic device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011334592.2A CN112329923B (en) 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112329923A true CN112329923A (en) 2021-02-05
CN112329923B CN112329923B (en) 2024-05-28

Family

ID=74309407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011334592.2A Active CN112329923B (en) 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN112329923B (en)
WO (1) WO2022111490A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111490A1 (en) * 2020-11-24 2022-06-02 杭州海康威视数字技术股份有限公司 Model compression method and apparatus, electronic device, and readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433627A (en) * 2023-04-11 2023-07-14 中国长江三峡集团有限公司 Photovoltaic panel defect identification model construction and defect identification method and system
CN117710786B (en) * 2023-08-04 2024-10-29 荣耀终端有限公司 Image processing method, optimization method of image processing model and related equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020190772A1 (en) * 2019-03-15 2020-09-24 Futurewei Technologies, Inc. Neural network model compression and optimization
CN112329923B (en) * 2020-11-24 2024-05-28 杭州海康威视数字技术股份有限公司 Model compression method and device, electronic equipment and readable storage medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308789A1 (en) * 2014-09-12 2017-10-26 Microsoft Technology Licensing, Llc Computing system for training neural networks
CN110245741A (en) * 2018-03-09 2019-09-17 佳能株式会社 Optimization and methods for using them, device and the storage medium of multilayer neural network model
JP2019160319A (en) * 2018-03-09 2019-09-19 キヤノン株式会社 Method and device for optimizing and applying multi-layer neural network model, and storage medium
CN108615049A (en) * 2018-04-09 2018-10-02 华中科技大学 A kind of vehicle part detection model compression method and system
CN108805257A (en) * 2018-04-26 2018-11-13 北京大学 A kind of neural network quantization method based on parameter norm
US20200311552A1 (en) * 2019-03-25 2020-10-01 Samsung Electronics Co., Ltd. Device and method for compressing machine learning model
CN111738401A (en) * 2019-03-25 2020-10-02 北京三星通信技术研究有限公司 Model optimization method, grouping compression method, corresponding device and equipment
US20200311540A1 (en) * 2019-03-28 2020-10-01 International Business Machines Corporation Layer-Wise Distillation for Protecting Pre-Trained Neural Network Models
CN110096647A (en) * 2019-05-10 2019-08-06 腾讯科技(深圳)有限公司 Optimize method, apparatus, electronic equipment and the computer storage medium of quantitative model
CN110619391A (en) * 2019-09-19 2019-12-27 华南理工大学 Detection model compression method and device and computer readable storage medium
CN110942143A (en) * 2019-12-04 2020-03-31 卓迎 Toy detection acceleration method and device based on convolutional neural network
CN111091184A (en) * 2019-12-19 2020-05-01 浪潮(北京)电子信息产业有限公司 Deep neural network quantification method and device, electronic equipment and medium
CN111860779A (en) * 2020-07-09 2020-10-30 北京航空航天大学 Rapid automatic compression method for deep convolutional neural network
CN111860841A (en) * 2020-07-28 2020-10-30 Oppo广东移动通信有限公司 Quantization model optimization method, device, terminal and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANTONIO POLINO et al.: "Model Compression via Distillation and Quantization", arXiv:1802.05668v1, 15 February 2018 (2018-02-15), pages 1-21 *
SHUPENG GUI et al.: "Model Compression with Adversarial Robustness: A Unified Optimization Framework", NeurIPS 2019, 7 September 2019 (2019-09-07), pages 1-12 *
吴立帅: "Research and Application of Structured Model Compression Algorithms in Deep Neural Networks", China Master's Theses Full-text Database, Information Science and Technology, no. 7, 15 July 2020 (2020-07-15) *
孙彦丽 et al.: "Convolutional Neural Network Compression Method Based on Pruning and Quantization", Computer Science, vol. 47, no. 8, 26 August 2020 (2020-08-26), pages 261-266 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111490A1 (en) * 2020-11-24 2022-06-02 杭州海康威视数字技术股份有限公司 Model compression method and apparatus, electronic device, and readable storage medium

Also Published As

Publication number Publication date
WO2022111490A1 (en) 2022-06-02
CN112329923B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN112329923A (en) Model compression method and device, electronic equipment and readable storage medium
CN109978142B (en) Neural network model compression method and device
CN111866518B (en) Self-adaptive three-dimensional point cloud compression method based on feature extraction
CN110674924B (en) Deep learning inference automatic quantification method and device
CN111126595A (en) Method and equipment for model compression of neural network
CN112488304A (en) Heuristic filter pruning method and system in convolutional neural network
CN112884149A (en) Deep neural network pruning method and system based on random sensitivity ST-SM
CN116402117A (en) Image classification convolutional neural network pruning method and core particle device data distribution method
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN116523949A (en) Video portrait matting method based on structured pruning
CN110263917B (en) Neural network compression method and device
Rui et al. Smart network maintenance in an edge cloud computing environment: An adaptive model compression algorithm based on model pruning and model clustering
CN117610632A (en) Neural network lightweight method based on parameter cutoff judgment dotting
CN113467949A (en) Gradient compression method for distributed DNN training in edge computing environment
CN110321799B (en) Scene number selection method based on SBR and average inter-class distance
CN116303386A (en) Intelligent interpolation method and system for missing data based on relational graph
CN116229154A (en) Class increment image classification method based on dynamic hybrid model
CN114611718B (en) Federal learning method and system for heterogeneous data
CN115913248A (en) Live broadcast software development data intelligent management system
CN116033159A (en) Feature processing method, image coding method and device
CN112734010B (en) Convolutional neural network model compression method suitable for image recognition
CN115982634A (en) Application program classification method and device, electronic equipment and computer program product
CN112437460A (en) IP address black and gray list analysis method, server, terminal and storage medium
CN113177627A (en) Optimization system, retraining system, and method thereof, and processor and readable medium
CN113033653A (en) Edge-cloud collaborative deep neural network model training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant