CN112329923A - Model compression method and device, electronic equipment and readable storage medium - Google Patents

Model compression method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN112329923A
Authority
CN
China
Prior art keywords
optimization unit
channel
optimization
distance
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011334592.2A
Other languages
Chinese (zh)
Other versions
CN112329923B (en)
Inventor
林晨
李哲暘
谭文明
任烨
彭博
毛芳党
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011334592.2A priority Critical patent/CN112329923B/en
Publication of CN112329923A publication Critical patent/CN112329923A/en
Priority to PCT/CN2021/132610 priority patent/WO2022111490A1/en
Application granted granted Critical
Publication of CN112329923B publication Critical patent/CN112329923B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application provides a model compression method and apparatus, an electronic device, and a readable storage medium. The model compression method comprises the following steps: dividing a model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of consecutive convolution layers in the model to be compressed; for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit; and respectively optimizing the parameters of each convolution layer in the quantized optimization unit so that a first distance is smaller than a second distance. The method can reduce the time as well as the computing and storage resources consumed by model compression while ensuring model performance and the model compression effect.

Description

Model compression method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model compression method and apparatus, an electronic device, and a readable storage medium.
Background
As one of the representative techniques of artificial intelligence, Convolutional Neural Networks (CNNs) have been widely used in many fields such as computer vision, natural language processing, and autonomous driving.
In order to adapt to neural network hardware with new architectures and to run intelligent algorithms on mobile terminals and embedded devices with lower power consumption and higher performance, model compression needs to be performed on the convolutional neural network to reduce the computation and parameter storage of the network model.
The traditional model compression method needs to train the compressed model with a complete data set, a process that consumes a great deal of time as well as computing and storage resources.
Disclosure of Invention
In view of the above, the present application provides a model compression method, apparatus, electronic device and readable storage medium.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a model compression method, including:
dividing a model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit;
respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
According to a second aspect of embodiments of the present application, there is provided a model compression apparatus, including:
the device comprises a dividing unit, a compressing unit and a control unit, wherein the dividing unit is used for dividing a model to be compressed into a plurality of optimization units, and one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
the compression unit is used for quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit for any optimization unit;
the optimization unit is used for respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
According to a third aspect of embodiments of the present application, there is provided an electronic device, comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the model compression method described above.
According to a fourth aspect of embodiments of the present application, there is provided a machine-readable storage medium having stored therein machine-executable instructions that, when executed by a processor, implement the above-described model compression method.
The technical scheme provided by the application can at least bring the following beneficial effects:
the model to be compressed is divided into a plurality of optimization units, parameters of each convolution layer in the optimization units are quantized for any optimization unit to obtain quantized optimization units, and then the parameters of each convolution layer in the quantized optimization units are optimized respectively to reduce the distance between the output characteristics of the quantized optimization units and the output characteristics of the original optimization units, so that time consumed by model compression and calculation and storage resources are reduced under the condition that model performance and model compression effects are guaranteed.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method of model compression in accordance with an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram illustrating partitioning of an optimization unit for a model to be compressed according to an exemplary embodiment of the present application;
FIG. 3 is a diagram illustrating channel clipping of an optimization unit according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a model compression apparatus according to an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic flow chart of a model compression method according to an embodiment of the present disclosure is shown in fig. 1, where the model compression method may include the following steps:
step S100, dividing the model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed.
In the embodiment of the present application, it is considered that when the entire model is compressed, for example by parameter quantization, the number of parameters involved is very large, and the compressed model usually needs to have its parameters fine-tuned on an entire data set after compression to ensure model performance, which consumes much time as well as computing and storage resources.
In addition, it is considered that if model compression, such as parameter quantization, is performed on a single convolutional layer, the compression space is small and the correlation between consecutive convolutional layers cannot be taken into account, so under-fitting is likely to occur and the final model performance is poor.
Accordingly, in the embodiment of the present application, a plurality of convolutional layers may be used as one optimization unit, a model to be compressed is divided into a plurality of optimization units, and the optimization units are used as objects to perform model compression.
It should be noted that the convolutional layer mentioned in the embodiment of the present application may include a convolution layer (Conv layer) or a fully-connected layer (FC layer, which can be regarded as a special convolutional layer whose every node is connected to all nodes in the previous layer).
Optionally, two adjacent optimization units comprise at least one identical convolutional layer.
Preferably, 2-3 continuous convolution layers can be used as an optimization unit.
For example, taking 2 consecutive convolutional layers as an optimization unit, assume that the model includes 2N convolutional layers, i.e., M = [L1, L2, …, Li, …, L2N]. The whole model can then be traversed, and every two front-and-back connected convolution layers are combined pairwise to form an optimization unit, yielding N optimization units [(L1, L2), (L3, L4), …, (L2N-1, L2N)].
As another example, taking 2 convolutional layers as an optimization unit, assume that the model includes N convolutional layers, i.e., M = [L1, L2, …, Li, …, LN]. The whole model can then be traversed, and every two front-and-back connected convolution layers are combined pairwise, with adjacent optimization units sharing one convolution layer, yielding N-1 optimization units [(L1, L2), (L2, L3), …, (Li, Li+1), …, (LN-2, LN-1), (LN-1, LN)].
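As a concrete illustration of this division, the following Python sketch builds overlapping two-layer optimization units from an ordered list of layers, as in the second example above; the helper name and the use of a plain list are assumptions for illustration only.

```python
def partition_into_units(layers, unit_size=2):
    """Return overlapping optimization units [(L1, L2), (L2, L3), ...]."""
    if len(layers) <= unit_size:
        return [tuple(layers)]
    return [tuple(layers[i:i + unit_size]) for i in range(len(layers) - unit_size + 1)]

# Example: a model M = [L1, L2, L3, L4] yields 3 optimization units.
units = partition_into_units(["L1", "L2", "L3", "L4"])
# -> [("L1", "L2"), ("L2", "L3"), ("L3", "L4")]
```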
Step S110, for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit.
In this embodiment of the application, when the partition of the optimization unit of the model to be compressed is completed in the manner of step S100, the optimization unit may be used as an object to compress the model to be compressed, and the parameters of each convolution layer in each optimization unit are sequentially quantized to obtain a quantized optimization unit.
Step S120, respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
In the embodiment of the present application, considering that the quantization operation may introduce noise into the model and thus affect its final performance, in order to optimize the performance of the compressed model, the parameters of each convolution layer in the quantized optimization unit may be optimized respectively, so as to reduce the distance between the output features of the quantized optimization unit and the output features of the original optimization unit, i.e., to reduce the difference between the two.
That is, after parameter optimization, the distance between the output features of the quantized optimization unit and the output features of the original optimization unit (referred to herein as the first distance) is smaller than the corresponding distance before parameter optimization (referred to herein as the second distance).
In the embodiment of the present application, the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit refers to the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit when the input characteristics are the same.
Illustratively, the distance may include, but is not limited to, a cosine distance, a euclidean distance, and the like.
For example, for any optimization unit, a certain number of features X may be input into the original optimization unit to obtain the output features Y; after the optimization unit is quantized, the same features X may be input into the quantized optimization unit to obtain the output features Y'. The distance between the output features of the quantized optimization unit and those of the original optimization unit is then determined by calculating the cosine distance or Euclidean distance between Y and Y', and when the parameters of each convolution layer in the quantized optimization unit are optimized, the optimization may be performed with the goal of minimizing this distance.
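The two distance measures mentioned above can be illustrated with the following sketch; it assumes PyTorch tensors for the output features, and the function names are illustrative rather than taken from the patent.

```python
import torch

def euclidean_distance(y: torch.Tensor, y_q: torch.Tensor) -> torch.Tensor:
    # Euclidean (L2) distance between the original and quantized output features
    return (y - y_q).norm()

def cosine_distance(y: torch.Tensor, y_q: torch.Tensor) -> torch.Tensor:
    # Cosine distance = 1 - cosine similarity of the flattened feature tensors
    y_flat, yq_flat = y.flatten(), y_q.flatten()
    cos = torch.dot(y_flat, yq_flat) / (y_flat.norm() * yq_flat.norm() + 1e-12)
    return 1.0 - cos
```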
In some embodiments, in step S120, optimizing the parameters of each convolution layer in the quantized optimization unit respectively may include:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
Illustratively, the parameters in the optimization unit can be finely optimized in an iterative manner, with the parameters optimized one by one, so that the quantization noise introduced by quantizing one parameter is compensated when the other parameters are subsequently optimized. This effectively reduces the quantization noise and thus improves model performance.
For any parameter, the parameter may be adjusted according to the quantization interval, for example by adding or subtracting the quantization interval; the distance between the output features of the adjusted optimization unit and the output features of the original optimization unit is then determined and compared with the distance before the adjustment. If the distance decreases, the adjustment is kept; otherwise it is cancelled. In this way the value of the parameter that minimizes the distance between the output features of the parameter-adjusted optimization unit and the output features of the original optimization unit (referred to herein as the target parameter value) is determined.
In addition, for any parameter, if adjusting the parameter according to the quantization interval only increases the distance between the output features of the optimization unit and the output features of the original optimization unit, the target parameter value may be the original quantized value of the parameter (i.e., the value not adjusted according to the quantization interval).
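A minimal sketch of this adjust-and-keep procedure is given below. It assumes the quantized parameters are stored as floating-point values that are integer multiples of the quantization interval, and `unit_forward` and `distance` are assumed callables standing in for the optimization unit's forward pass on the fixed calibration features X and for the feature distance (e.g. Euclidean or cosine).

```python
def optimize_parameter(params, idx, alpha, X, Y_ref, unit_forward, distance):
    """Adjust one quantized parameter by +/- the quantization interval alpha and keep
    the adjustment only if it reduces the distance to the original unit's output
    features Y_ref; otherwise the original quantized value is restored."""
    best_val = params[idx]
    best_dist = distance(unit_forward(params, X), Y_ref)
    for candidate in (best_val + alpha, best_val - alpha):
        params[idx] = candidate
        d = distance(unit_forward(params, X), Y_ref)
        if d < best_dist:
            best_dist, best_val = d, candidate
    params[idx] = best_val   # the target parameter value for this parameter
    return best_dist
```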
In addition, it should be appreciated that, in the embodiment of the present application, the parameters of each convolutional layer in the quantized optimization unit are not limited to being optimized by an iterative optimization method; for example, a gradient-based optimization method (such as batch gradient descent or stochastic gradient descent) may also be used, and its specific implementation is not described herein again.
In some embodiments, before quantizing the parameters of each convolution layer in the optimization unit in step S110, the method may further include:
according to the importance of each channel in the optimization unit, channel cutting is carried out on the optimization unit to obtain an optimization unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
in step S110, quantizing the parameters of each convolution layer in the optimization unit may include:
and quantizing the parameters of each convolution layer in the optimization unit after channel cutting to obtain a quantized optimization unit.
For example, for any optimization unit, before quantizing the parameters of each convolution layer in the optimization unit, channel clipping may be performed on the optimization unit to further compress the model.
For any optimization unit, channel cutting can be carried out on the optimization unit according to the importance of each channel in the optimization unit, so that a certain number of channels with lower importance are cut off, and model compression is realized.
Illustratively, one channel of the optimization unit includes one output channel of a previous convolutional layer of two adjacent convolutional layers in the optimization unit, and the corresponding input channel of a next convolutional layer.
For example, for an optimization unit comprising 2 consecutive convolutional layers (taking L1 and L2 as an example), since L1 and L2 are connected front and back, one output channel of L1 corresponds to one input channel of L2; therefore, when channel clipping is performed and an output channel of L1 is cut off, the corresponding input channel of L2 also needs to be cut off.
Similarly, when an optimization unit includes 3 consecutive convolutional layers (taking L1, L2 and L3 as an example), one channel clipping operation may cut off one output channel of L1 together with the corresponding input channel of L2, or cut off one output channel of L2 together with the corresponding input channel of L3.
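The coupling between an output channel of L1 and the corresponding input channel of L2 can be illustrated with the following PyTorch sketch. It assumes plain nn.Conv2d layers without grouped convolutions and is a simplified illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

def clip_channel(conv1: nn.Conv2d, conv2: nn.Conv2d, k: int):
    """Remove output channel k of conv1 together with input channel k of conv2."""
    keep = [i for i in range(conv1.out_channels) if i != k]
    new1 = nn.Conv2d(conv1.in_channels, len(keep), conv1.kernel_size,
                     stride=conv1.stride, padding=conv1.padding,
                     bias=conv1.bias is not None)
    new2 = nn.Conv2d(len(keep), conv2.out_channels, conv2.kernel_size,
                     stride=conv2.stride, padding=conv2.padding,
                     bias=conv2.bias is not None)
    with torch.no_grad():
        new1.weight.copy_(conv1.weight[keep])      # drop conv1's output channel k
        if conv1.bias is not None:
            new1.bias.copy_(conv1.bias[keep])
        new2.weight.copy_(conv2.weight[:, keep])   # drop conv2's matching input channel
        if conv2.bias is not None:
            new2.bias.copy_(conv2.bias)
    return new1, new2
```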
When the optimization unit is cut, parameters of each convolution layer in the optimization unit after channel cutting may be quantized, and a specific implementation manner of the method may refer to related descriptions in the method flow shown in fig. 1.
In other embodiments, after the quantifying the parameters of each convolution layer in the optimization unit in step S110, the method may further include:
according to the importance of each channel in the quantized optimization unit, channel cutting is carried out on the quantized optimization unit to obtain the channel-cut optimization unit; wherein, a channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output features before and after the channel is cut.
In step S120, the optimizing the parameters of each convolution layer in the quantized optimization unit may include:
and respectively optimizing the parameters of each convolution layer in the optimization unit after channel cutting.
For example, for any optimization unit, after quantizing the parameters of each convolution layer in the optimization unit, channel clipping may be performed on the optimization unit to further compress the model.
For any optimization unit, channel cutting can be carried out on the optimization unit according to the importance of each channel in the optimization unit, so that a certain number of channels with lower importance are cut off, and model compression is realized.
When the cutting of the optimization unit is completed, parameters of each convolution layer in the optimization unit after the channel cutting can be optimized respectively, and the performance of the compressed model is optimized.
In one example, channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
sorting the channels in descending order of the distance between the output features of the optimization unit before and after each channel is cut;
and cutting off the first N channels in the order when it is determined that the distance between the output features of the optimization unit and the output features of the original optimization unit is smaller than a first threshold when the first N channels in the order are cut off, and that this distance is greater than or equal to the first threshold when the first N+1 channels in the order are cut off.
Illustratively, channels are clipped in order from low importance to high importance, on the condition that the distance between the output features of the channel-clipped optimization unit and the output features of the original optimization unit remains smaller than a preset threshold (referred to herein as a first threshold).
For any optimization unit (which may include an optimization unit before parameter quantization or an optimization unit after parameter quantization) that needs to perform channel clipping, the distance between the output features before and after each channel clipping may be determined respectively.
The distance between the output features before and after a channel is clipped refers to the distance between the output features of the optimization unit before and after that channel is clipped, for the same input features.
For example, taking channel clipping of the optimization unit before parameter quantization as an example, for any optimization unit, the output features Y of the optimization unit for a certain number of input features X may be determined first; then, for any channel in the optimization unit, the output Y″ of the channel-clipped optimization unit for the same input features X may be determined after the channel is clipped, and the distance between Y and Y″, such as the Euclidean distance or the cosine distance, is calculated.
When the distance between the output features before and after clipping has been determined for each channel in this way, the channels can be sorted in descending order of this distance, and the least important of the channels retained in the optimization unit is cut off in turn, on the premise that the distance between the output features of the clipped optimization unit and the output features of the original optimization unit remains smaller than the preset threshold (i.e., the first threshold).
For example, the channel ranked first when the channels are sorted from low to high importance may be cut first, and the distance between the output features of the clipped optimization unit and the output features of the original optimization unit (for the same input features) is then determined. If this distance is smaller than the first threshold, the channel ranked second is cut as well, the distance is determined again, and it is judged whether it is still smaller than the first threshold; if so, clipping continues; otherwise, the last cut is cancelled and the channel clipping process for the optimization unit ends.
It should be noted that, when channel clipping is performed, a plurality of channels ranked in the front, such as 5 or 10 channels, may be clipped first, and a distance between an output feature of the optimization unit after channel clipping and an output feature of the original optimization unit is determined, and if the distance is smaller than the first threshold, further clipping is performed; if the distance is greater than or equal to the first threshold, the clipping is cancelled, the number of the clipped channels is reduced, and the channel clipping is performed again, which is not described herein in detail.
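A hedged sketch of this greedy scheme is given below; the helper callables (the unit's forward pass `unit_output`, the clipping routine `clip_channels`, and the feature `distance`) are assumptions supplied by the caller, not names from the patent.

```python
def greedy_channel_clip(unit, channels, X, unit_output, clip_channels, distance, first_threshold):
    """Clip channels from least to most important while the distance between the
    clipped unit's output and the original unit's output stays below the threshold."""
    Y_ref = unit_output(unit, X)
    # Importance of a channel ~ distance caused by clipping that channel alone.
    importance = {c: distance(Y_ref, unit_output(clip_channels(unit, [c]), X))
                  for c in channels}
    clipped = []
    for c in sorted(channels, key=lambda ch: importance[ch]):   # least important first
        candidate = clipped + [c]
        if distance(Y_ref, unit_output(clip_channels(unit, candidate), X)) < first_threshold:
            clipped = candidate        # keep this clip and try the next channel
        else:
            break                      # cancel the last clip and stop
    return clipped
```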
In another example, channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
determining a channel of which the distance between the output features of the optimization units before and after cutting is smaller than a second threshold value as a channel to be cut according to the distance between the output features before and after cutting of each channel;
and cutting the channel to be cut.
For example, the channels with a distance smaller than a preset threshold (referred to as a second threshold herein) may be clipped according to the distance between the output features of the optimization unit before and after channel clipping.
For any optimization unit (which may include an optimization unit before parameter quantization or an optimization unit after parameter quantization) that needs to perform channel clipping, the distance between the output features before and after each channel clipping may be determined respectively.
When the distance between the output features before and after the channel is cut is determined according to the mode, the channel of which the distance is smaller than the second threshold value can be determined as the channel to be cut according to the distance between the output features before and after the channel is cut, and the channel to be cut is cut.
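A very small sketch of this variant follows, with `importance` assumed to already map each channel to the distance between the unit's output features before and after clipping that channel alone.

```python
def channels_below_threshold(importance, second_threshold):
    # Channels whose individual clipping changes the output features by less than
    # the second threshold are marked as channels to be cut.
    return [c for c, dist in importance.items() if dist < second_threshold]
```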
It should be noted that, in the embodiment of the present application, for a scenario where the compression rate requirement of the model is high but the performance requirement of the model is relatively low, a specified number of channels may also be deleted according to the order from low to high of the channel importance, and the specific implementation thereof is not described herein again.
In addition, it should be appreciated that, in the embodiment of the present application, the channels to be clipped are not limited to being determined according to the distance between the output features of the optimization unit before and after channel clipping; for example, a group lasso (least absolute shrinkage and selection operator) approach may also be adopted to select the channels to be retained and clip the others, and its specific implementation is not described herein.
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, two consecutive convolutional layers in the model are used as an optimization unit, the distance between the output features of the optimization unit before and after compression is used as the optimization target, and channel clipping and parameter quantization are performed on the two convolutional layers in the optimization unit. The implementation process is as follows:
1. dividing the model into a plurality of optimization units by taking two continuous convolution layers in the model as optimization units, wherein a schematic diagram can be shown in FIG. 2;
2. and for any optimization unit, determining the channel to be cut according to the importance of each channel in the optimization unit, and cutting the channel to be cut to obtain the optimization unit after channel cutting.
For example, for an optimization unit, when performing channel clipping, a channel includes an output channel of a previous convolutional layer and a corresponding input channel of a subsequent convolutional layer, and a schematic diagram thereof can be shown in fig. 3.
The importance of a channel is positively correlated with the distance between the output features of the optimization unit before and after the channel is clipped.
3. Quantizing the parameters of the two convolution layers in the channel-clipped optimization unit to obtain the quantized optimization unit.
4. Optimizing each parameter in the quantized optimization unit in an iterative manner.
5. Repeating steps 2-4 until channel clipping and parameter quantization have been completed for all optimization units of the whole model.
The respective steps will be described in detail below.
Step 1 may include:
the hypothetical model includes N convolutional layers, i.e., M ═ L1,L2,…,Li,…,LN]Then, the whole model can be traversed, two convolution layers connected in front and back are pairwise combined to form an optimization unit, and N-1 optimization units [ (L)1,L2),(L2,L3),…,(Li,Li+1),…,(LN-2,LN-1),(LN-1,LN)]。
Step 2 may include:
2.1. A certain number of features X are input into the original optimization unit S, e.g. S = (L1, L2), to obtain the output Y of the original optimization unit;
2.2. One channel in S is cut off to obtain the channel-clipped optimization unit S'.
Illustratively, since L1 and L2 are connected front and back, one output channel of L1 corresponds to one input channel of L2 in the parameter tensors of the two layers; therefore, when one output channel of L1 is cut off, the corresponding input channel of L2 must also be cut off.
2.3. The distance between the output features of the optimization unit S' and the output features of the optimization unit S is determined.
Illustratively, the distance (e.g. cosine distance or Euclidean distance) between the output features of the optimization unit S' and those of the optimization unit S can be determined by inputting the features X into the optimization unit S' to obtain the output Y', and the importance of the clipped channel is then evaluated according to the distance between Y and Y': the greater the distance, the greater the importance of the clipped channel (a code sketch of this ranking is given below).
2.4. Steps 2.2-2.3 are repeated to determine, for each channel, the distance between the output features of the optimization unit before and after that channel is cut off (each clipping being performed on the basis of the original optimization unit);
2.5. The channels are sorted in order of importance from low to high. According to the clipping target, either as many channels as possible are cut off while the distance between the output features of the channel-clipped optimization unit and the output features of the original optimization unit remains smaller than the first threshold, or the channels for which the distance between the output features of the optimization unit before and after clipping is smaller than the second threshold are cut off. Channel clipping is thereby completed, yielding the channel-clipped optimization unit Sp.
Illustratively, by performing channel clipping on an optimization unit comprising a plurality of consecutive convolutional layers, the problem is avoided that, in an implementation in which a single convolutional layer is used as the optimization unit, clipping a channel causes the subsequent features to be misaligned with the channels.
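The channel-importance ranking of steps 2.1-2.3 might look like the following PyTorch sketch for a unit S = (L1, L2); `clip_channel` is assumed to behave like the earlier two-layer clipping sketch, `distance` is one of the feature distances above, and any non-linearity between the two layers is omitted for brevity.

```python
import torch

def rank_channels(conv1, conv2, X, clip_channel, distance):
    """Score each channel of the unit S = (conv1, conv2) by the output-feature
    distance caused by clipping that channel alone; a larger distance means a
    more important channel."""
    with torch.no_grad():
        Y = conv2(conv1(X))                            # step 2.1: output of the original unit
        scores = []
        for k in range(conv1.out_channels):
            c1, c2 = clip_channel(conv1, conv2, k)     # step 2.2: clip channel k
            Y_k = c2(c1(X))
            scores.append((float(distance(Y, Y_k)), k))  # step 2.3: distance -> importance
    return sorted(scores)                              # least important channels first
```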
Step 3 may include:
for optimization unit SpQuantizing the parameters of each convolution layer to obtain a quantized optimization unit Sp&q
Step 4 may include:
for the optimization unit S obtained in step 3p&qRespectively optimizing each parameter of the two convolution layers, reducing Sp&qOutput Y ofp&qAnd the output characteristics Y (the input characteristics are all X) of the original optimization unit S. The method specifically comprises the following steps:
4.1, in pairs Sp&qThe parameter p in (1) is optimized as an example, and the value of p (quantization parameter) is assumed to be xqQuantization interval of α according to which x is pairedqMaking an adjustment so that xq=argmin(f(Sp&qX, Y)), wherein XqIs an integer multiple of alpha, and f () is the input of the input feature X to Sp&qIs compared to the distance between the output feature in (1) and the output feature inputting the input feature X into the original optimization unit S, i.e. X is paired according to the quantization interval αqMaking adjustments, e.g. to xqAdding or subtracting alpha (or adding or subtracting other integer multiples of alpha), finding the input features X to Sp&qX having the smallest distance between the output features in (1) and the output features inputting the input features into the original optimization unit Sq
4.2, according to an iterative optimization mode, for Sp&qAnd the parameters are optimized one by one.
Illustratively, information loss caused by channel cutting and parameter quantization is compensated through parameter optimization.
In addition, through the one-by-one fine optimization of the parameters in the optimization unit, the quantization noise caused by quantizing one parameter can be compensated when other parameters are optimized subsequently, so that the quantization noise is effectively reduced, and the model performance is improved.
The methods provided herein are described above. The following describes the apparatus provided in the present application:
referring to fig. 4, a schematic structural diagram of a model compression apparatus according to an embodiment of the present application is shown in fig. 4, where the model compression apparatus may include:
a dividing unit 410, configured to divide a model to be compressed into multiple optimization units, where one optimization unit includes multiple convolutional layers that are consecutive in the model to be compressed;
a compressing unit 420, configured to quantize, for any optimization unit, parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit;
an optimizing unit 430, configured to optimize parameters of each convolution layer in the quantized optimizing unit, respectively, so that the first distance is smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
In some embodiments, the optimizing unit 430 optimizes the parameters of each convolutional layer in the quantized optimizing unit respectively, including:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter, which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
In some embodiments, before the compressing unit 420 quantizes the parameters of each convolution layer in the optimization unit, the method further includes:
according to the importance of each channel in the optimization unit, channel cutting is carried out on the optimization unit to obtain an optimization unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the optimization unit 430 quantizes parameters of each convolution layer in the optimization unit, and includes:
and quantizing the parameters of each convolution layer in the optimization unit after channel cutting to obtain a quantized optimization unit.
In some embodiments, after the compressing unit 420 quantizes the parameters of each convolution layer in the optimization unit, the method further includes:
according to the importance of each channel in the quantized optimization unit, channel cutting is carried out on the quantized optimization unit to obtain an optimized unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the optimizing unit 430 optimizes parameters of each convolution layer in the quantized optimizing unit, respectively, including:
and respectively optimizing the parameters of each convolution layer in the optimization unit after the channel is cut.
In some embodiments, the compression unit 420 channel-clipping the optimization unit includes:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
sorting the channels according to the distance between the output characteristics of the optimization units before and after cutting of each channel and the sequence of the distances from large to small;
and cutting off the first N channels in the order when it is determined that the distance between the output features of the optimization unit and the output features of the original optimization unit is smaller than a first threshold when the first N channels in the order are cut off, and that this distance is greater than or equal to the first threshold when the first N+1 channels in the order are cut off.
In some embodiments, the compression unit 420 channel-clipping the optimization unit includes:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
determining a channel of which the distance between the output features of the optimization units before and after cutting is smaller than a second threshold value as a channel to be cut according to the distance between the output features before and after cutting of each channel;
and cutting the channel to be cut.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure. The electronic device may include a processor 501 and a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. Moreover, the processor 501 may perform the model compression methods described above by reading and executing the machine-executable instructions in the memory 502 that correspond to the model compression control logic.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
In some embodiments, there is also provided a machine-readable storage medium, such as the memory 502 in fig. 5, having stored therein machine-executable instructions that, when executed by a processor, implement the model compression method described above. For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so forth.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method of model compression, comprising:
dividing a model to be compressed into a plurality of optimization units, wherein one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
for any optimization unit, quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit;
respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
2. The method of claim 1, wherein the separately optimizing the parameters of each convolutional layer in the quantized optimization unit comprises:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter, which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
3. The method of claim 1, wherein prior to quantizing the parameters of each convolutional layer in the optimization unit, further comprising:
according to the importance of each channel in the optimization unit, channel cutting is carried out on the optimization unit to obtain an optimization unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the quantifying the parameters of each convolution layer in the optimization unit includes:
and quantizing the parameters of each convolution layer in the optimization unit after channel cutting to obtain a quantized optimization unit.
4. The method of claim 1, wherein after quantizing the parameters of each convolutional layer in the optimization unit, further comprising:
according to the importance of each channel in the quantized optimization unit, channel cutting is carried out on the quantized optimization unit to obtain an optimized unit after channel cutting; wherein, one channel of the optimization unit comprises an output channel of a previous convolution layer in two adjacent convolution layers in the optimization unit and a corresponding input channel of a next convolution layer, and the importance of the channel is positively correlated with the distance between output characteristics before and after the channel is cut;
the optimizing the parameters of each convolution layer in the quantized optimization unit respectively comprises:
and respectively optimizing the parameters of each convolution layer in the optimization unit after the channel is cut.
5. The method according to claim 3 or 4, characterized in that channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
sorting the channels according to the distance between the output characteristics of the optimization units before and after cutting of each channel and the sequence of the distances from large to small;
and cutting off the first N channels in the order when it is determined that the distance between the output features of the optimization unit and the output features of the original optimization unit is smaller than a first threshold when the first N channels in the order are cut off, and that this distance is greater than or equal to the first threshold when the first N+1 channels in the order are cut off.
6. The method according to claim 3 or 4, characterized in that channel clipping the optimization unit is achieved by:
for any channel, determining the distance between the output features of the optimization units before and after the channel is cut;
determining a channel of which the distance between the output features of the optimization units before and after cutting is smaller than a second threshold value as a channel to be cut according to the distance between the output features before and after cutting of each channel;
and cutting the channel to be cut.
7. A model compression apparatus, comprising:
the device comprises a dividing unit, a compressing unit and a control unit, wherein the dividing unit is used for dividing a model to be compressed into a plurality of optimization units, and one optimization unit comprises a plurality of continuous convolution layers in the model to be compressed;
the compression unit is used for quantizing the parameters of each convolution layer in the optimization unit to obtain a quantized optimization unit for any optimization unit;
the optimization unit is used for respectively optimizing the parameters of each convolution layer in the quantized optimization unit so as to enable the first distance to be smaller than the second distance; the first distance is the distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit after parameter optimization; the second distance is a distance between the output characteristic of the quantized optimization unit and the output characteristic of the original optimization unit before parameter optimization.
8. The apparatus of claim 7, wherein the optimizing unit optimizes parameters of each convolutional layer in the quantized optimizing unit respectively, comprising:
respectively optimizing each parameter of each convolution layer in the quantized optimization unit according to an iterative optimization mode;
and for any parameter, adjusting the parameter according to the quantization interval to determine a target parameter value, wherein the target parameter value is the value of the parameter, which minimizes the distance between the output characteristic of the optimization unit after the parameter adjustment and the output characteristic of the original optimization unit in the process of adjusting the parameter.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor being configured to execute the machine executable instructions to implement the method of any one of claims 1 to 6.
10. A machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN202011334592.2A 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium Active CN112329923B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011334592.2A CN112329923B (en) 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium
PCT/CN2021/132610 WO2022111490A1 (en) 2020-11-24 2021-11-24 Model compression method and apparatus, electronic device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011334592.2A CN112329923B (en) 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112329923A true CN112329923A (en) 2021-02-05
CN112329923B CN112329923B (en) 2024-05-28

Family

ID=74309407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011334592.2A Active CN112329923B (en) 2020-11-24 2020-11-24 Model compression method and device, electronic equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN112329923B (en)
WO (1) WO2022111490A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111490A1 (en) * 2020-11-24 2022-06-02 杭州海康威视数字技术股份有限公司 Model compression method and apparatus, electronic device, and readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433627A (en) * 2023-04-11 2023-07-14 中国长江三峡集团有限公司 Photovoltaic panel defect identification model construction and defect identification method and system
CN117710786B (en) * 2023-08-04 2024-10-29 荣耀终端有限公司 Image processing method, optimization method of image processing model and related equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020190772A1 (en) * 2019-03-15 2020-09-24 Futurewei Technologies, Inc. Neural network model compression and optimization
CN112329923B (en) * 2020-11-24 2024-05-28 杭州海康威视数字技术股份有限公司 Model compression method and device, electronic equipment and readable storage medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308789A1 (en) * 2014-09-12 2017-10-26 Microsoft Technology Licensing, Llc Computing system for training neural networks
CN110245741A (en) * 2018-03-09 2019-09-17 佳能株式会社 Optimization and methods for using them, device and the storage medium of multilayer neural network model
JP2019160319A (en) * 2018-03-09 2019-09-19 キヤノン株式会社 Method and device for optimizing and applying multi-layer neural network model, and storage medium
CN108615049A (en) * 2018-04-09 2018-10-02 华中科技大学 A kind of vehicle part detection model compression method and system
CN108805257A (en) * 2018-04-26 2018-11-13 北京大学 A kind of neural network quantization method based on parameter norm
US20200311552A1 (en) * 2019-03-25 2020-10-01 Samsung Electronics Co., Ltd. Device and method for compressing machine learning model
CN111738401A (en) * 2019-03-25 2020-10-02 北京三星通信技术研究有限公司 Model optimization method, grouping compression method, corresponding device and equipment
US20200311540A1 (en) * 2019-03-28 2020-10-01 International Business Machines Corporation Layer-Wise Distillation for Protecting Pre-Trained Neural Network Models
CN110096647A (en) * 2019-05-10 2019-08-06 腾讯科技(深圳)有限公司 Optimize method, apparatus, electronic equipment and the computer storage medium of quantitative model
CN110619391A (en) * 2019-09-19 2019-12-27 华南理工大学 Detection model compression method and device and computer readable storage medium
CN110942143A (en) * 2019-12-04 2020-03-31 卓迎 Toy detection acceleration method and device based on convolutional neural network
CN111091184A (en) * 2019-12-19 2020-05-01 浪潮(北京)电子信息产业有限公司 Deep neural network quantification method and device, electronic equipment and medium
CN111860779A (en) * 2020-07-09 2020-10-30 北京航空航天大学 Rapid automatic compression method for deep convolutional neural network
CN111860841A (en) * 2020-07-28 2020-10-30 Oppo广东移动通信有限公司 Quantization model optimization method, device, terminal and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANTONIO POLINO et al.: "Model Compression via Distillation and Quantization", arXiv:1802.05668v1, 15 February 2018 (2018-02-15), pages 1-21 *
SHUPENG GUI et al.: "Model Compression with Adversarial Robustness: A Unified Optimization Framework", NeurIPS 2019, 7 September 2019 (2019-09-07), pages 1-12 *
吴立帅: "Research and Application of Structured Model Compression Algorithms in Deep Neural Networks", China Master's Theses Full-text Database, Information Science and Technology, no. 7, 15 July 2020 (2020-07-15) *
孙彦丽 et al.: "Convolutional Neural Network Compression Method Based on Pruning and Quantization", Computer Science, vol. 47, no. 8, 26 August 2020 (2020-08-26), pages 261-266 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111490A1 (en) * 2020-11-24 2022-06-02 杭州海康威视数字技术股份有限公司 Model compression method and apparatus, electronic device, and readable storage medium

Also Published As

Publication number Publication date
WO2022111490A1 (en) 2022-06-02
CN112329923B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN112329923A (en) Model compression method and device, electronic equipment and readable storage medium
CN109978142B (en) Neural network model compression method and device
CN111866518B (en) Self-adaptive three-dimensional point cloud compression method based on feature extraction
CN110674924B (en) Deep learning inference automatic quantification method and device
CN111126595A (en) Method and equipment for model compression of neural network
CN112488304A (en) Heuristic filter pruning method and system in convolutional neural network
CN112884149A (en) Deep neural network pruning method and system based on random sensitivity ST-SM
CN116402117A (en) Image classification convolutional neural network pruning method and core particle device data distribution method
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN116523949A (en) Video portrait matting method based on structured pruning
CN110263917B (en) Neural network compression method and device
Rui et al. Smart network maintenance in an edge cloud computing environment: An adaptive model compression algorithm based on model pruning and model clustering
CN117610632A (en) Neural network lightweight method based on parameter cutoff judgment dotting
CN113467949A (en) Gradient compression method for distributed DNN training in edge computing environment
CN110321799B (en) Scene number selection method based on SBR and average inter-class distance
CN116303386A (en) Intelligent interpolation method and system for missing data based on relational graph
CN116229154A (en) Class increment image classification method based on dynamic hybrid model
CN114611718B (en) Federal learning method and system for heterogeneous data
CN115913248A (en) Live broadcast software development data intelligent management system
CN116033159A (en) Feature processing method, image coding method and device
CN112734010B (en) Convolutional neural network model compression method suitable for image recognition
CN115982634A (en) Application program classification method and device, electronic equipment and computer program product
CN112437460A (en) IP address black and gray list analysis method, server, terminal and storage medium
CN113177627A (en) Optimization system, retraining system, and method thereof, and processor and readable medium
CN113033653A (en) Edge-cloud collaborative deep neural network model training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant