CN114970828A - Compression method and device for deep neural network, electronic equipment and medium


Info

Publication number
CN114970828A
Authority
CN
China
Prior art keywords
neural network
network model
deep neural
quantization
pruned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210638580.1A
Other languages
Chinese (zh)
Inventor
唐长成
陆天翼
梁爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chaoxing Future Technology Co ltd
Original Assignee
Beijing Chaoxing Future Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chaoxing Future Technology Co ltd filed Critical Beijing Chaoxing Future Technology Co ltd
Priority to CN202210638580.1A priority Critical patent/CN114970828A/en
Publication of CN114970828A publication Critical patent/CN114970828A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the application provides a compression method and apparatus for a deep neural network, an electronic device and a medium, wherein the method comprises the following steps: correcting the deep neural network model with quantization nodes to obtain a corrected deep neural network model; performing sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolutional layer; determining the pruning ratio of each convolutional layer according to its sensitivity; pruning the corresponding number of channels in the convolutional layers to obtain a pruned deep neural network model; performing quantization-aware training on the pruned deep neural network model to obtain a pruned and quantized deep neural network model; and, if the pruned and quantized deep neural network model meets a preset quantization condition, exporting the pruned and quantized deep neural network model. In this way, the performance loss of the deep neural network model during pruning and quantization is reduced, and the compression effect on the neural network is improved.

Description

Compression method and device for deep neural network, electronic equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for compressing a deep neural network, an electronic device, and a medium.
Background
With the rapid development of deep learning, neural networks are widely applied to computer vision tasks such as image recognition, image segmentation and target tracking. In pursuit of higher performance, existing neural networks are designed ever wider and deeper, because networks with more parameters tend to perform better. Larger networks, however, demand more computational resources and are unfriendly to low-storage, low-power hardware platforms, which limits their deployment. In the prior art, a neural network is compressed by means of neural network pruning and neural network quantization. In the conventional compression process, pruning and quantization are performed in sequence: the neural network is pruned first, and once it has been compressed small enough, it is quantized and then deployed. This approach can cause a large performance loss and yields a poor compression effect.
Disclosure of Invention
In order to solve the technical problem, embodiments of the present application provide a compression method and apparatus for a deep neural network, an electronic device, and a medium.
In a first aspect, an embodiment of the present application provides a method for compressing a deep neural network, where the method includes:
step S1, correcting the deep neural network model with quantization nodes: determining the quantization step size of the weights, and adjusting the deep neural network model with quantization nodes according to the quantization step size, to obtain a corrected deep neural network model;
step S2, performing sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolutional layer of the corrected deep neural network model;
step S3, determining the pruning ratio of each convolutional layer according to its sensitivity;
step S4, analyzing the topological connection relationship of the corrected deep neural network model, grouping convolutional layers of the corrected deep neural network model that satisfy a preset connection relationship into convolutional-layer groups, and pruning the corresponding number of channels of the convolutional layers in each group according to the pruning ratio associated with that group, to obtain a pruned deep neural network model;
step S5, performing quantization-aware training on the pruned deep neural network model until the performance of the quantized deep neural network model recovers to the preset required performance, obtaining a pruned and quantized deep neural network model;
step S6, judging whether the pruned and quantized deep neural network model meets a preset quantization condition;
and step S7, if the pruned and quantized deep neural network model meets the preset quantization condition, exporting the pruned and quantized deep neural network model.
In an embodiment, the method further comprises:
and step S8, if the pruned and quantized deep neural network model does not meet the preset quantization condition, returning to step S1.
In an embodiment, the step S6 of judging whether the pruned and quantized deep neural network model meets the preset quantization condition includes:
judging whether the number of weight parameters of the pruned and quantized deep neural network model has been compressed to a preset number; or,
judging whether the pruned and quantized deep neural network model cannot be restored to a preset required precision.
In one embodiment, the step of obtaining the deep neural network model with the quantization nodes includes:
and step S0, inserting virtual quantization nodes into the initial deep neural network model to obtain the deep neural network model with the quantization nodes.
In an embodiment, the sensitivity analysis of step S2, which obtains the sensitivity of each convolutional layer of the corrected deep neural network model, includes:
step S21, loading the trained model weights into the corrected deep neural network model, and testing to obtain the performance of the corrected deep neural network model;
step S22, pruning, in turn, several preset proportions of channels from the i-th convolutional layer of the corrected deep neural network model to obtain a plurality of pruned deep neural network models, and testing to obtain the performance of each pruned neural network model, where 1 ≤ i ≤ N, i is an integer, and N is the number of convolutional layers of the corrected deep neural network model;
step S23, determining the performance loss of each convolutional layer of each pruned neural network model according to the performance of the corrected deep neural network model and the performance of each pruned neural network model;
and step S24, determining the sensitivity of each convolutional layer according to the performance loss of each convolutional layer of the pruned neural network models.
In one embodiment, the performance loss of each convolutional layer is positively correlated with the sensitivity of that layer, and the sensitivity of each convolutional layer is negatively correlated with its pruning ratio.
In one embodiment, for two connected adjacent convolutional layers, the number of output channels of the preceding convolutional layer is the same as the number of input channels of the following convolutional layer, and the method further comprises:
if the number of output channels of the preceding convolutional layer is reduced in the pruning process, synchronously reducing the number of input channels of the following convolutional layer.
In a second aspect, an embodiment of the present application provides an apparatus for compressing a deep neural network, the apparatus including:
the correction module is used for correcting the deep neural network model with the quantization nodes, determining the quantization step of the weight, and adjusting the deep neural network model with the quantization nodes according to the quantization step to obtain the corrected deep neural network model;
the analysis module is used for carrying out sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolution layer of the corrected deep neural network model;
the determining module is used for determining the pruning ratio of each convolutional layer according to its sensitivity;
the pruning module is used for analyzing the topological connection relationship of the corrected deep neural network model, grouping convolutional layers of the corrected deep neural network model that satisfy a preset connection relationship into convolutional-layer groups, and pruning the corresponding number of channels of the convolutional layers in each group according to the pruning ratio associated with that group, to obtain the pruned deep neural network model;
the quantization module is used for performing quantization-aware training on the pruned deep neural network model until the performance of the quantized deep neural network model recovers to the preset required performance, so as to obtain the pruned and quantized deep neural network model;
the judging module is used for judging whether the pruned and quantized deep neural network model meets a preset quantization condition;
and the exporting module is used for exporting the pruned and quantized deep neural network model if the pruned and quantized deep neural network model meets the preset quantization condition.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory is used to store a computer program, and the computer program, when executed by the processor, performs the compression method for a deep neural network provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program runs on a processor, it performs the compression method for the deep neural network provided in the first aspect.
According to the compression method and apparatus, the electronic device and the medium for the deep neural network provided by the embodiments of the application, pruning is combined with quantization-aware training: quantization nodes are inserted into the deep neural network model, pruning and quantization-aware training are then performed on the quantized deep neural network model, and quantization and pruning are iterated, so that the influence of the quantization strategy is taken into account during pruning, the pruned and quantized deep neural network model can be kept within the required performance range, the performance loss of the deep neural network model during pruning and quantization is reduced, and the compression effect on the neural network is improved.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic flow chart illustrating a compression method of a deep neural network provided by an embodiment of the present application;
fig. 2 is another schematic flow chart of a compression method of a deep neural network provided in an embodiment of the present application;
fig. 3 shows a schematic structural diagram of a compression apparatus of a deep neural network provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present application, are intended to indicate only the presence of the specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
In the prior art, a neural network is compressed by means of neural network pruning and neural network quantization. In the conventional compression process, pruning and quantization are performed in sequence: the neural network is pruned first, and once it has been compressed small enough, it is quantized and then deployed.
Neural network pruning is a scheme for compressing the parameters of a neural network and reducing its computation, so that the memory footprint and resource consumption of an otherwise computationally expensive network can be reduced severalfold or even by tens of times and the running speed of the neural network is improved, while the performance loss is kept within an acceptable range.
Quantization of a neural network is also a compression scheme. Unlike pruning, which compresses the model by removing redundant parameters, quantization converts the parameters of the neural network from a floating-point type to an integer type that occupies fewer resources, thereby compressing the model size. Besides compressing the model, quantization is sometimes mandatory because some computing devices only support integer operations. Because the calculation precision is lower after quantization, the performance of the neural network also suffers a certain loss. This loss is related to the redundancy of the network: a highly redundant network usually loses little after quantization, while a network with low redundancy loses more, because it is more sensitive to changes in its parameters.
Because the pruning process has already reduced the redundancy of the network, quantization after pruning may cause a greater performance loss than quantization before pruning. Even with quantization-aware training, the low-redundancy network may not be trainable back to an acceptable performance range. In addition, sensitivity analysis is needed in the pruning process to determine the pruning ratio of each convolutional layer. However, the sensitivity of the same layer may differ between the floating-point model and the quantized model, so determining sensitivity with floating-point weights does not necessarily give the most accurate result for the quantized model. The conventional neural network compression process can therefore cause a large performance loss and gives a poor compression effect.
Example 1
The embodiment of the disclosure provides a compression method of a deep neural network.
Specifically, referring to fig. 1, the compression method of the deep neural network includes:
and S1, correcting the deep neural network model with the quantization nodes, determining the quantization step size of the weight, and adjusting the deep neural network model with the quantization nodes according to the quantization step size to obtain the corrected deep neural network model.
It should be noted that, besides the quantization step, other quantization parameters may also be determined, specifically, different quantization strategies may be used to determine various quantization parameters, and the adopted quantization strategy is related to hardware.
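For illustration only, the following Python sketch shows one common way of deriving a weight quantization step and simulating the resulting integer grid in floating point; the symmetric 8-bit scheme and the helper name quantize_dequantize are assumptions, since the patent leaves the concrete quantization strategy to the target hardware.

    import torch

    def quantize_dequantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
        # Symmetric per-tensor scheme (an assumption): the quantization step is the
        # scale that maps the largest absolute weight onto the integer grid.
        qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for 8 bits
        step = w.detach().abs().max() / qmax              # weight quantization step size
        w_int = torch.clamp(torch.round(w / step), -qmax - 1, qmax)
        return w_int * step                               # value the corrected model actually uses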
In one embodiment, the step of obtaining the deep neural network model with the quantization nodes includes:
and step S0, inserting a virtual quantization node into the initial deep neural network model to obtain the deep neural network model with the quantization node.
It should be noted that step S0 may be understood as a quantization operation process using floating-point number analog shaping.
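As a minimal sketch of such a virtual (fake) quantization node in PyTorch, the wrapper below keeps the convolution weights in floating point but routes them through the quantize_dequantize helper sketched above on every forward pass; the straight-through trick and the class itself are illustrative assumptions, not the patent's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FakeQuantConv2d(nn.Module):
        """Conv2d whose weights pass through a simulated integer quantization node."""

        def __init__(self, conv: nn.Conv2d, num_bits: int = 8):
            super().__init__()
            self.conv = conv
            self.num_bits = num_bits

        def forward(self, x):
            w = self.conv.weight
            w_q = quantize_dequantize(w, self.num_bits)
            # Straight-through estimator: the forward pass uses the quantized values,
            # but gradients still reach the underlying float weights.
            w_q = w + (w_q - w).detach()
            return F.conv2d(x, w_q, self.conv.bias, self.conv.stride,
                            self.conv.padding, self.conv.dilation, self.conv.groups)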
And step S2, carrying out sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolution layer of the corrected deep neural network model.
In an embodiment, step S2 may include the steps of:
step S21, loading the trained model weights into the corrected deep neural network model, and testing to obtain the performance of the corrected deep neural network model;
step S22, pruning, in turn, several preset proportions of channels from the i-th convolutional layer of the corrected deep neural network model to obtain a plurality of pruned deep neural network models, and testing to obtain the performance of each pruned neural network model, where 1 ≤ i ≤ N, i is an integer, and N is the number of convolutional layers of the corrected deep neural network model;
step S23, determining the performance loss of each convolutional layer of each pruned neural network model according to the performance of the corrected deep neural network model and the performance of each pruned neural network model;
and step S24, determining the sensitivity of each convolutional layer according to the performance loss of each convolutional layer of the pruned neural network models.
Specifically, the higher the performance loss of a convolutional layer, the higher the sensitivity of that layer: a larger performance loss indicates that pruning the layer has a larger influence on the model, and therefore the sensitivity is larger. Conversely, the higher the sensitivity of a convolutional layer, the lower its pruning ratio, so that layers with high sensitivity are given a smaller pruning ratio and the influence of pruning on the model is reduced.
The calculation of the sensitivity of each convolutional layer is illustrated below. Step S101, load the trained model weights into the corrected deep neural network model and test its performance, denoted P. Step S102, prune 10%, 20%, ..., 90% of the channels, in turn, from the first convolutional layer of the corrected deep neural network model to obtain a series of pruned deep neural network models, and directly test the performance of each pruned model to obtain P_1_1, P_1_2, ..., P_1_9, where P_i_j denotes the performance after pruning ratio j from the i-th convolutional layer during sensitivity analysis. Step S103, repeat step S102 on the basis of the corrected deep neural network model for the i-th convolutional layer until all convolutional layers have been analyzed, where 1 ≤ i ≤ N, i is an integer, and N is the number of convolutional layers of the corrected deep neural network model. Step S104, generate an N × 9 table, where N is the number of convolutional layers and each entry is the difference P - P_i_j, i.e. the performance lost after pruning; the larger the lost performance, the more sensitive the layer. Step S105, determine an appropriate pruning ratio for each convolutional layer from the result of the sensitivity analysis. It should be noted that steps S101 to S105 are not shown in the drawings.
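A rough coding of steps S101 to S104 is sketched below: each convolutional layer is pruned at ratios 10% to 90% in turn, the model is re-evaluated without retraining, and the drop P - P_i_j is collected into the N × 9 table; the evaluate callback and the L1-norm channel ranking are assumptions not fixed by the patent.

    import copy
    import torch
    import torch.nn as nn

    def sensitivity_analysis(model, evaluate, ratios=tuple(0.1 * k for k in range(1, 10))):
        # evaluate(model) -> float is an assumed test-set callback returning the performance P.
        baseline = evaluate(model)                                    # P of the corrected model
        table = {}                                                    # the N x 9 sensitivity table
        conv_names = [n for n, m in model.named_modules() if isinstance(m, nn.Conv2d)]
        for name in conv_names:                                       # i-th convolutional layer
            losses = []
            for ratio in ratios:                                      # pruning ratio j
                pruned = copy.deepcopy(model)
                conv = dict(pruned.named_modules())[name]
                importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))
                idx = importance.argsort()[: int(ratio * conv.out_channels)]
                with torch.no_grad():                                 # "virtual cut": zero the channels
                    conv.weight[idx] = 0.0
                    if conv.bias is not None:
                        conv.bias[idx] = 0.0
                losses.append(baseline - evaluate(pruned))            # P - P_i_j
            table[name] = losses
        return table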
Step S3, determining the pruning ratio of each convolutional layer according to its sensitivity.
In one embodiment, the performance loss of each convolutional layer is positively correlated with the sensitivity of that layer, and the sensitivity of each convolutional layer is negatively correlated with its pruning ratio.
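One simple rule that respects this inverse relation is sketched below: for every layer, pick the largest tested ratio whose measured loss stays under a tolerance; the 0.01 tolerance is an assumed example value, not a figure from the patent.

    def choose_pruning_ratio(losses_per_ratio, max_loss=0.01):
        # losses_per_ratio is one row of the sensitivity table (losses at ratios 10%..90%).
        ratios = [0.1 * (k + 1) for k in range(len(losses_per_ratio))]
        chosen = 0.0
        for ratio, loss in zip(ratios, losses_per_ratio):
            if loss <= max_loss:        # sensitive layers only pass at small ratios, so they keep a small ratio
                chosen = ratio
        return chosen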
Step S4, analyzing the topological connection relationship of the corrected deep neural network model, grouping convolutional layers of the corrected deep neural network model that satisfy a preset connection relationship into convolutional-layer groups, and pruning the corresponding number of channels of the convolutional layers in each group according to the pruning ratio associated with that group, to obtain the pruned deep neural network model.
Exemplarily, the topological connections of the deep neural network model are analyzed and convolutional layers with special connections are divided into one group: for example, convolutional layers joined by a tensor addition operation (element-wise add), a tensor multiplication operation (element-wise mul) or a concatenation operation (concat) are grouped together, and the number of channels corresponding to each layer is virtually cut according to the pruning ratio, where a virtual cut can be understood as setting the weights of the corresponding convolution channels to 0, and the convolutional layers divided into one group must cut the same number of channels.
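As a sketch of the virtual cut on one such group, the function below zeroes the same output channels in every Conv2d of the group, so that the operands of the element-wise add/mul stay shape-compatible; how the groups are discovered from the graph, and the L1-norm ranking, are assumptions for illustration.

    import torch

    def virtually_prune_group(convs, ratio):
        # convs: list of nn.Conv2d grouped by the topology analysis (all share the same out_channels).
        n_prune = int(ratio * convs[0].out_channels)
        importance = sum(c.weight.detach().abs().sum(dim=(1, 2, 3)) for c in convs)
        idx = importance.argsort()[:n_prune]                # cut the same channel indices in the whole group
        with torch.no_grad():
            for conv in convs:
                conv.weight[idx] = 0.0                      # "virtual cut": set the channel weights to 0
                if conv.bias is not None:
                    conv.bias[idx] = 0.0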
In one embodiment, for two connected adjacent convolutional layers, the number of output channels of the preceding convolutional layer is the same as the number of input channels of the following convolutional layer, and the method further comprises:
if the number of output channels of the preceding convolutional layer is reduced in the pruning process, synchronously reducing the number of input channels of the following convolutional layer.
In this embodiment, the number of channels in the convolutional layers is cut, thereby compressing both the model parameters and the amount of computation.
It should be noted that the parameter count of a convolutional layer is determined by three values: the number of input channels Ci, the number of output channels Co, and the convolution kernel size K, giving a parameter count of Ci × Co × K; pruning reduces the number of output channels Co. For two connected convolutional layers, say convolutional layer 1 and convolutional layer 2, the number of output channels Co_1 of layer 1 and the number of input channels Ci_2 of layer 2 must be consistent, so if Co_1 is reduced during pruning, Ci_2 of layer 2 is reduced synchronously as well.
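The sketch below makes this shape bookkeeping concrete by physically rebuilding a connected pair of Conv2d layers: the kept output channels of layer 1 and the matching input channels of layer 2 are copied into smaller layers so that Co_1 and Ci_2 stay consistent; the function name and the keep index set are illustrative assumptions.

    import torch
    import torch.nn as nn

    def prune_conv_pair(conv1: nn.Conv2d, conv2: nn.Conv2d, keep: torch.Tensor):
        # keep: indices of the output channels of conv1 that survive pruning, e.g. torch.tensor([0, 2, 5]).
        # Dilation and groups are omitted for brevity.
        new1 = nn.Conv2d(conv1.in_channels, len(keep), conv1.kernel_size,
                         conv1.stride, conv1.padding, bias=conv1.bias is not None)
        new2 = nn.Conv2d(len(keep), conv2.out_channels, conv2.kernel_size,
                         conv2.stride, conv2.padding, bias=conv2.bias is not None)
        with torch.no_grad():
            new1.weight.copy_(conv1.weight[keep])           # shrink Co_1: parameters Ci_1 x Co_1 x K shrink too
            new2.weight.copy_(conv2.weight[:, keep])        # shrink Ci_2 synchronously
            if conv1.bias is not None:
                new1.bias.copy_(conv1.bias[keep])
            if conv2.bias is not None:
                new2.bias.copy_(conv2.bias)
        return new1, new2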
Step S5, performing quantization-aware training on the pruned deep neural network model until the performance of the quantized deep neural network model recovers to the preset required performance, obtaining the pruned and quantized deep neural network model.
In this embodiment, quantization-aware training can be understood as saving space and speeding up computation by converting the floating-point representation of the weights in the original model into an integer representation. For example, a floating-point number occupies 4 bytes while the integer used here occupies 1 byte, so the computational efficiency differs greatly, and different quantization strategies yield different quantization effects.
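A generic quantization-aware fine-tuning loop might look like the sketch below: the pruned, fake-quantized model is trained until evaluate(model) recovers to the preset required performance; the data loader, loss function, optimizer settings and epoch budget are all assumed placeholders rather than values from the patent.

    import torch

    def quantization_aware_training(model, train_loader, evaluate, target_performance,
                                    max_epochs=50, lr=1e-4):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        criterion = torch.nn.CrossEntropyLoss()            # assumes a classification task
        for _ in range(max_epochs):
            model.train()
            for inputs, targets in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(inputs), targets)
                loss.backward()
                optimizer.step()
            model.eval()
            if evaluate(model) >= target_performance:      # performance recovered to the required level
                break
        return model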
Step S6, judging whether the pruned and quantized deep neural network model meets a preset quantization condition.
In an embodiment, the step S6 of judging whether the pruned and quantized deep neural network model meets the preset quantization condition includes:
judging whether the number of weight parameters of the pruned and quantized deep neural network model has been compressed to a preset number; or,
judging whether the pruned and quantized deep neural network model cannot be restored to a preset required precision.
In an embodiment, if the number of weight parameters of the pruned and quantized deep neural network model has been compressed to the preset number, which indicates that the model has been compressed small enough, it is determined that the pruned and quantized deep neural network model meets the preset quantization condition. If the number of weight parameters has not been compressed to the preset number, it is determined that the model does not meet the preset quantization condition. The preset number can be set according to the actual application scenario of the deep neural network model. It should be noted that the degree to which the number of weight parameters must be compressed may follow a user-defined criterion, that is, the user defines the preset number to which the pruned and quantized model must be compressed; for example, the number of weight parameters may be required to be compressed to 20% of the parameter count of the original model.
In another embodiment, if the pruned and quantized deep neural network model cannot be restored to the preset required precision, it is determined that the pruned and quantized deep neural network model meets the preset quantization condition. If the pruned and quantized model can still be restored to the preset required precision, it is determined that it does not meet the preset quantization condition. The preset required precision can be set according to the actual application scenario of the deep neural network model. It should be noted that the preset required precision is defined relative to the original deep neural network model and may be set by the user; for example, it may be required to be no lower than the accuracy of the original deep neural network model minus 0.02, so that if the accuracy of the original model is 0.90, the accuracy after compression should be at least 0.88.
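Coded as a single predicate, and using the 20% parameter budget and the 0.02 accuracy margin mentioned above purely as example values, the two alternative tests might read as follows; the helper name and argument layout are assumptions.

    def reached_quantization_condition(model, original_param_count, achieved_accuracy,
                                       original_accuracy, param_budget=0.20, acc_margin=0.02):
        current_params = sum(p.numel() for p in model.parameters())
        small_enough = current_params <= param_budget * original_param_count   # compressed far enough
        cannot_recover = achieved_accuracy < original_accuracy - acc_margin    # QAT could not recover
        return small_enough or cannot_recover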
Step S7, if the pruned and quantized deep neural network model meets the preset quantization condition, exporting the pruned and quantized deep neural network model.
In one embodiment, the compression method for the deep neural network further includes:
step S8, if the pruned and quantized deep neural network model does not meet the preset quantization condition, returning to step S1.
When the process returns to step S1, the quantization step size needs to be updated, and step S1 is executed based on the updated quantization step size.
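Putting steps S1 to S8 together, the iteration could be driven by a loop like the Python sketch below; calibrate, prune_by_ratio and export_model are caller-supplied placeholders for steps this sketch does not spell out, and the remaining functions are the sketches given earlier.

    def compress(model, train_loader, evaluate, target_performance,
                 original_accuracy, original_param_count,
                 calibrate, prune_by_ratio, export_model, max_rounds=10):
        # calibrate, prune_by_ratio and export_model stand in for steps S1, S4 and S7.
        for _ in range(max_rounds):
            calibrate(model)                                                 # step S1: correct the fake-quantized model
            table = sensitivity_analysis(model, evaluate)                    # step S2
            ratios = {name: choose_pruning_ratio(losses)                     # step S3
                      for name, losses in table.items()}
            prune_by_ratio(model, ratios)                                    # step S4: grouped channel pruning
            quantization_aware_training(model, train_loader,                 # step S5
                                        evaluate, target_performance)
            accuracy = evaluate(model)
            if reached_quantization_condition(model, original_param_count,   # step S6
                                              accuracy, original_accuracy):
                return export_model(model)                                   # step S7: export the real quantized model
        return model                                                         # otherwise step S8 loops back to step S1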
Referring to fig. 2, an implementation flow of the compression method of the deep neural network according to the present embodiment is illustrated in the following with reference to fig. 2.
Step S201, inserting a virtual quantization node in the initial deep neural network model.
Specifically, virtual quantization nodes are inserted into the initial deep neural network model to obtain the deep neural network model with the quantization nodes.
And step S202, correcting the deep neural network model with the quantization nodes.
Specifically, the deep neural network model with the quantization nodes is corrected, the quantization step size of the weight is determined, and the deep neural network model with the quantization nodes is adjusted according to the quantization step size to obtain the corrected deep neural network model.
Step S203, sensitivity analysis is carried out on the corrected deep neural network model to obtain the sensitivity of each layer of convolution layer.
Specifically, after sensitivity analysis, the sensitivity of each convolutional layer of the corrected deep neural network model is obtained.
Step S204, determining the pruning ratio of each convolutional layer according to the sensitivity analysis.
Specifically, the performance loss of each convolutional layer is positively correlated with the sensitivity of that layer, and the sensitivity of each convolutional layer is negatively correlated with its pruning ratio.
Step S205, pruning the number of channels corresponding to each convolutional layer according to its pruning ratio.
Specifically, the channels corresponding to each convolutional layer are pruned according to the pruning ratio to obtain the pruned deep neural network model.
Step S206, performing quantization-aware training on the pruned model.
Specifically, during quantization-aware training, training continues until the performance of the quantized deep neural network model recovers to the preset required performance, so that the pruned and quantized deep neural network model is obtained.
Step S207, judging whether the pruned and quantized deep neural network model has been compressed small enough or cannot be restored to the required precision.
In one embodiment, step S208 is performed if the model is small enough or cannot be restored to the required precision; otherwise, that is, if the model is not yet small enough and can still be restored to the required precision, step S202 is performed.
Step S208, exporting the real pruned and quantized model.
Specifically, the pruned and quantized deep neural network model that passed the check in step S207 can be exported.
According to the compression method for the deep neural network provided by this embodiment, pruning is combined with quantization-aware training: quantization nodes are inserted into the deep neural network model, pruning and quantization-aware training are then performed on the quantized deep neural network model, and quantization and pruning are iterated, so that the influence of the quantization strategy is taken into account during pruning, the pruned and quantized deep neural network model can be kept within the required performance range, the performance loss of the deep neural network model during pruning and quantization is reduced, and the compression effect on the neural network is improved.
Example 2
In addition, the embodiment of the disclosure provides a compression device of the deep neural network.
Specifically, as shown in fig. 3, the compression apparatus 300 of the deep neural network includes:
the correcting module 301 is configured to correct the deep neural network model with the quantization nodes, determine a quantization step size of the weight, and adjust the deep neural network model with the quantization nodes according to the quantization step size to obtain a corrected deep neural network model;
an analysis module 302, configured to perform sensitivity analysis on the corrected deep neural network model to obtain sensitivities of convolutional layers of the corrected deep neural network model;
a determining module 303, configured to determine the pruning ratio of each convolutional layer according to its sensitivity;
a pruning module 304, configured to analyze the topological connection relationship of the corrected deep neural network model, group convolutional layers of the corrected deep neural network model that satisfy a preset connection relationship into convolutional-layer groups, and prune the corresponding number of channels of the convolutional layers in each group according to the pruning ratio associated with that group, to obtain a pruned deep neural network model;
a quantization module 305, configured to perform quantization-aware training on the pruned deep neural network model until the performance of the quantized deep neural network model recovers to the preset required performance, so as to obtain the pruned and quantized deep neural network model;
a judging module 306, configured to judge whether the pruned and quantized deep neural network model meets a preset quantization condition;
an exporting module 307, configured to export the pruned and quantized deep neural network model if the pruned and quantized deep neural network model meets the preset quantization condition.
In an embodiment, the correcting module 301 is configured to, if the pruned and quantized deep neural network model does not meet the preset quantization condition, correct the deep neural network model with quantization nodes again: determine the quantization step size of the weights and adjust the deep neural network model with quantization nodes according to the quantization step size to obtain the corrected deep neural network model.
In an embodiment, the preset quantization condition includes that the weight parameters have been compressed to a preset number, or that the quantized deep neural network model cannot be restored to a preset required precision, and the judging module 306 is further configured to judge whether the number of weight parameters of the pruned and quantized deep neural network model has been compressed to the preset number; or,
judge whether the pruned and quantized deep neural network model cannot be restored to the preset required precision.
In one embodiment, the compression apparatus 300 of the deep neural network further includes:
and the acquisition module is used for inserting the virtual quantization node into the initial deep neural network model to obtain the deep neural network model with the quantization node.
In an embodiment, the analysis module 302 is further configured to load the trained model weights into the corrected deep neural network model and test to obtain the performance of the corrected deep neural network model;
prune, in turn, several preset proportions of channels from the i-th convolutional layer of the corrected deep neural network model to obtain a plurality of pruned deep neural network models, and test to obtain the performance of each pruned neural network model, where 1 ≤ i ≤ N, i is an integer, and N is the number of convolutional layers of the corrected deep neural network model;
determine the performance loss of each convolutional layer of each pruned neural network model according to the performance of the corrected deep neural network model and the performance of each pruned neural network model;
and determine the sensitivity of each convolutional layer according to the performance loss of each convolutional layer of the pruned neural network models.
In one embodiment, the performance loss of each convolutional layer is positively correlated with the sensitivity of that layer, and the sensitivity of each convolutional layer is negatively correlated with its pruning ratio.
In an embodiment, for two connected adjacent convolutional layers, the number of output channels of the preceding convolutional layer is the same as the number of input channels of the following convolutional layer, and the pruning module 304 is further configured to synchronously reduce the number of input channels of the following convolutional layer if the number of output channels of the preceding convolutional layer is reduced during pruning.
The compression apparatus 300 of the deep neural network provided in this embodiment may perform the compression method of the deep neural network provided in embodiment 1, and for avoiding repetition, details are not described herein again.
According to the compression apparatus for the deep neural network provided by this embodiment, pruning is combined with quantization-aware training: quantization nodes are inserted into the deep neural network model, pruning and quantization-aware training are then performed on the quantized deep neural network model, and quantization and pruning are iterated, so that the influence of the quantization strategy is taken into account during pruning, the pruned and quantized deep neural network model can be kept within the required performance range, the performance loss of the deep neural network model during pruning and quantization is reduced, and the compression effect on the neural network is improved.
Example 3
Furthermore, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the computer program executes the compression method of the deep neural network provided in embodiment 1 when running on the processor.
The electronic device provided in this embodiment may implement the compression method for the deep neural network provided in embodiment 1, and details are not described here to avoid repetition.
Example 4
The present application also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the compression method of the deep neural network provided in embodiment 1.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In this embodiment, the computer readable storage medium may implement the method for compressing the deep neural network provided in embodiment 1, and is not described herein again to avoid repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of compression of a deep neural network, the method comprising:
step S1, correcting the deep neural network model with the quantization nodes, determining the quantization step size of the weight, and adjusting the deep neural network model with the quantization nodes according to the quantization step size to obtain a corrected deep neural network model;
step S2, carrying out sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolution layer of the corrected deep neural network model;
step S3, determining the pruning ratio of each convolutional layer according to its sensitivity;
step S4, analyzing the topological connection relationship of the corrected deep neural network model, grouping convolutional layers of the corrected deep neural network model that satisfy a preset connection relationship into convolutional-layer groups, and pruning the corresponding number of channels of the convolutional layers in each group according to the pruning ratio associated with that group, to obtain a pruned deep neural network model;
step S5, performing quantization-aware training on the pruned deep neural network model until the performance of the quantized deep neural network model recovers to the preset required performance, obtaining a pruned and quantized deep neural network model;
step S6, judging whether the pruned and quantized deep neural network model meets a preset quantization condition;
and step S7, if the pruned and quantized deep neural network model meets the preset quantization condition, exporting the pruned and quantized deep neural network model.
2. The method of claim 1, further comprising:
and step S8, if the pruned and quantized deep neural network model does not meet the preset quantization condition, returning to step S1.
3. The method according to claim 1, wherein the preset quantization condition includes that the weight parameters have been compressed to a preset number, or that the quantized deep neural network model cannot be restored to a preset required precision, and the step S6 of judging whether the pruned and quantized deep neural network model meets the preset quantization condition includes:
judging whether the number of weight parameters of the pruned and quantized deep neural network model has been compressed to the preset number; or,
judging whether the pruned and quantized deep neural network model cannot be restored to the preset required precision.
4. The method of claim 1, wherein the step of obtaining the deep neural network model with quantization nodes comprises:
and step S0, inserting virtual quantization nodes into the initial deep neural network model to obtain the deep neural network model with the quantization nodes.
5. The method according to claim 1, wherein the step S2 of performing sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolutional layer of the corrected deep neural network model comprises:
step S21, loading the trained model weights into the corrected deep neural network model, and testing to obtain the performance of the corrected deep neural network model;
step S22, pruning, in turn, several preset proportions of channels from the i-th convolutional layer of the corrected deep neural network model to obtain a plurality of pruned deep neural network models, and testing to obtain the performance of each pruned neural network model, where 1 ≤ i ≤ N, i is an integer, and N is the number of convolutional layers of the corrected deep neural network model;
step S23, determining the performance loss of each convolutional layer of each pruned neural network model according to the performance of the corrected deep neural network model and the performance of each pruned neural network model;
and step S24, determining the sensitivity of each convolutional layer according to the performance loss of each convolutional layer of the pruned neural network models.
6. The method of claim 5, wherein the performance loss of each convolutional layer is positively correlated with the sensitivity of that layer, and the sensitivity of each convolutional layer is negatively correlated with its pruning ratio.
7. The method of claim 1, wherein for two connected adjacent convolutional layers, the number of output channels of the preceding convolutional layer is the same as the number of input channels of the following convolutional layer, and the method further comprises:
if the number of output channels of the preceding convolutional layer is reduced in the pruning process, synchronously reducing the number of input channels of the following convolutional layer.
8. An apparatus for compressing a deep neural network, the apparatus comprising:
the correcting module is used for correcting the deep neural network model with the quantization nodes, determining the quantization step size of the weight, and adjusting the deep neural network model with the quantization nodes according to the quantization step size to obtain a corrected deep neural network model;
the analysis module is used for carrying out sensitivity analysis on the corrected deep neural network model to obtain the sensitivity of each convolution layer of the corrected deep neural network model;
the determining module is used for determining the pruning ratio of each convolutional layer according to its sensitivity;
the pruning module is used for analyzing the topological connection relationship of the corrected deep neural network model, grouping convolutional layers of the corrected deep neural network model that satisfy a preset connection relationship into convolutional-layer groups, and pruning the corresponding number of channels of the convolutional layers in each group according to the pruning ratio associated with that group, to obtain the pruned deep neural network model;
the quantization module is used for performing quantization-aware training on the pruned deep neural network model until the performance of the quantized deep neural network model recovers to the preset required performance, so as to obtain the pruned and quantized deep neural network model;
the judging module is used for judging whether the pruned and quantized deep neural network model meets a preset quantization condition;
and the exporting module is used for exporting the pruned and quantized deep neural network model if the pruned and quantized deep neural network model meets the preset quantization condition.
9. An electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the method of compression of a deep neural network of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the method of compression of a deep neural network of any one of claims 1 to 7.
CN202210638580.1A 2022-06-07 2022-06-07 Compression method and device for deep neural network, electronic equipment and medium Pending CN114970828A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210638580.1A CN114970828A (en) 2022-06-07 2022-06-07 Compression method and device for deep neural network, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210638580.1A CN114970828A (en) 2022-06-07 2022-06-07 Compression method and device for deep neural network, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114970828A (en) 2022-08-30

Family

ID=82959089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210638580.1A Pending CN114970828A (en) 2022-06-07 2022-06-07 Compression method and device for deep neural network, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114970828A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710786A (en) * 2023-08-04 2024-03-15 荣耀终端有限公司 Image processing method, optimization method of image processing model and related equipment


Similar Documents

Publication Publication Date Title
CN112949842B (en) Neural network structure searching method, apparatus, computer device and storage medium
CN114707650B (en) Simulation implementation method for improving simulation efficiency
CN111144457A (en) Image processing method, device, equipment and storage medium
CN114580280A (en) Model quantization method, device, apparatus, computer program and storage medium
CN114970828A (en) Compression method and device for deep neural network, electronic equipment and medium
CN112598129A (en) Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator
CN112001491A (en) Search method and device for determining neural network architecture for processor
CN111898751A (en) Data processing method, system, equipment and readable storage medium
CN114511083A (en) Model training method and device, storage medium and electronic device
CN117175664B (en) Energy storage charging equipment output power self-adaptive adjusting system based on use scene
CN111932690B (en) Pruning method and device based on 3D point cloud neural network model
US11507782B2 (en) Method, device, and program product for determining model compression rate
CN117371496A (en) Parameter optimization method, device, equipment and storage medium
CN116306879A (en) Data processing method, device, electronic equipment and storage medium
CN117114074A (en) Training and data processing method and device for neural network model and medium
CN112488291B (en) 8-Bit quantization compression method for neural network
CN113554097B (en) Model quantization method and device, electronic equipment and storage medium
CN110276448B (en) Model compression method and device
CN114972950A (en) Multi-target detection method, device, equipment, medium and product
CN114998649A (en) Training method of image classification model, and image classification method and device
CN115130672A (en) Method and device for calculating convolution neural network by software and hardware collaborative optimization
CN114595627A (en) Model quantization method, device, equipment and storage medium
CN114065920A (en) Image identification method and system based on channel-level pruning neural network
CN111667028A (en) Reliable negative sample determination method and related device
CN111260052A (en) Image processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination