CN117131920A - Model pruning method based on network structure search

Info

Publication number
CN117131920A
CN117131920A (application CN202311396457.4A)
Authority
CN
China
Prior art keywords: model, pruning, layer, network structure, training
Prior art date
Legal status: Granted
Application number
CN202311396457.4A
Other languages
Chinese (zh)
Other versions
CN117131920B (en)
Inventor
丁晓嵘
张新
李晓梅
聂明杰
李楠
王柏春
张兴业
田晓宇
高涵
项乾
Current Assignee: Beijing Smart Water Development Research Institute
Original Assignee
Beijing Smart Water Development Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Smart Water Development Research Institute
Priority to CN202311396457.4A
Publication of CN117131920A
Application granted
Publication of CN117131920B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a model pruning method based on network structure search. The method comprises the following steps: S1, collecting network structure data, and loading and training an improved model; S2, performing a network structure search to obtain the sensitivity of each convolution layer to pruning; S3, determining the sensitivity of each convolution layer to pruning and adapting the pruning ratio of each layer accordingly; S4, pruning the model, namely removing the channels of relatively low importance and the corresponding filters in each layer according to each layer's adaptive pruning ratio; S5, fine-tuning the pruned model while adapting the clipping rate of each layer, and terminating the iteration once the performance on the test set reaches the expected target; S6, testing the model, and storing and outputting the models that meet the device and performance requirements. The application removes redundant channels and the corresponding convolution kernels while preserving model performance, so that the model is small, has low computational complexity and low battery power consumption, and can be ported to embedded devices.

Description

Model pruning method based on network structure search
Technical Field
The application relates to the field of deep neural network model compression, and in particular to a model pruning method based on network structure search, which is particularly suitable for solving the problem of excessive computation and parameter counts caused by porting deep neural network models to edge and mobile devices.
Background
Fields of artificial intelligence such as computer vision and natural language processing have benefited from the development of deep learning, which has pushed their performance to unprecedented heights. However, as the depth and width of neural networks keep increasing, the gains in performance come with large storage requirements and huge amounts of computation, which greatly hinders the commercialization of deep learning methods.
With the increasing maturity of artificial intelligence technology and the improved computing power of edge devices, deploying AI on mobile terminals has become feasible. Compared with a deep learning model on the server side, an edge model can relieve server-side pressure and offers fast response, high stability and better security. However, deploying deep learning models on edge devices is challenging: embedded devices such as mobile terminals are not designed for compute-intensive tasks, so their computing capacity and storage resources are limited, and constraints such as battery capacity must also be considered when productizing deep learning methods at the edge. Therefore, a mobile-side model must be small, have low computational complexity and low battery power consumption, and support flexible deployment and updates. The diversity of deployment platforms is another challenge: the hardware platforms of different mobile terminals differ greatly, as do their computing and storage capacities, and each edge device has a maximum computation and parameter budget it can bear, so the same task may require models of different sizes on different hardware.
Under such circumstances, there is a need for a method that compresses a deep neural network model while preserving its performance, and that constrains the network structure adaptively according to the parameter and computation budgets of the mobile platform.
Disclosure of Invention
The application provides a model pruning method based on network structure search, which aims to solve the defects in the prior art.
In order to achieve the above purpose, the present application adopts the following technical scheme.
A model pruning method based on network structure search, the method comprising:
S1, acquiring network structure data: loading and training the improved model;
S2, performing a network structure search to obtain the sensitivity of each convolution layer to pruning;
S3, determining the sensitivity of each convolution layer to pruning and adapting the pruning ratio of each layer accordingly;
S4, pruning the model, namely removing the channels of relatively low importance and the corresponding filters in each layer according to each layer's adaptive pruning ratio;
S5, fine-tuning the pruned model while adapting the clipping rate of each layer, and terminating the iteration once the performance on the test set reaches the expected target;
S6, testing the model, and storing and outputting the models that meet the device and performance requirements.
Further, the step S1 specifically includes:
S1.1, taking a large number of photos with a camera, and performing data augmentation on the photos by random cropping, random flipping and left-right mirroring to expand the data set;
S1.2, annotating the data with annotation software, and storing the original images and the annotations in the same folder;
S1.3, improving the model: adding the sandwich principle and in-situ distillation to improve the performance of the network, and accelerating the acquisition of network structure information by sharing convolution kernels.
Further, the S1.3 model improvement specifically includes:
(1) First, defining the network structure according to the task, setting the network structure search range, and randomly and independently sampling the number of channels of each layer during model training;
(2) During the random sampling of channels, sampling the upper and lower limits of the set threshold with emphasis, so that the accuracy of the model stays within a preset range;
(3) During the network structure search, adopting in-situ distillation and using the set upper pruning limit ratio to guide the training of the other ratios;
(4) Sharing convolution kernels to accelerate the collection of network structure information; the closer a convolution kernel is to the front (the lower its channel index), the more times it is reused.
Further, the number of channels of each layer is randomly sampled during model training, and during the random sampling of channels the upper and lower limits of the set threshold are sampled with emphasis. The output of a single neuron can be represented by equation (1), and equation (2) gives the output of the first k channels:

y = \sum_{i=1}^{n} w_i x_i    (1)

y_k = \sum_{i=1}^{k} w_i x_i,  k_0 \le k \le n    (2)

where w and x are the upper-layer convolution kernel weights and the input feature map, n is the number of input channels, y_k satisfying equation (2) is the output of the first k channels, and k and k_0 define the structure search range.
Further, obtaining the sensitivity of each convolution layer to pruning in S2 specifically includes:
S2.1, performing a small number of training iterations on the improved model using the training and validation data until convergence;
S2.2, loading the model, randomly deleting convolution kernels of each layer within the set threshold, recording the final relevant data of the network, and sorting and dividing the data to obtain the sensitivity of the different convolution layers to the model output.
Further, the S4 model pruning specifically includes: applying L1 regularization to the scaling factors of the batch normalization (BN) layers so that the scaling factors tend to zero, sorting the channels in ascending order of importance, adapting the pruning rate of each layer, and pruning the channels of relatively low importance and the corresponding filters in each layer. L1 regularization, also known as Lasso regularization, penalizes the complexity of the model by adding the sum of the absolute values of the target parameters to the model's loss function.
Further, as shown in formula (3), the scaling factors of the BN layers are introduced into the objective function as a sparsity penalty term, and the network weights and the scaling factors are trained jointly so that many scaling factors in the resulting model tend to 0:

L = \sum_{(x,y)} l(f(x, w), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|    (3)

where the first term of the objective function is the training loss function, L1 regularization |\gamma| is adopted as the sparsification penalty on the scaling factors, (x, y) are the training input data and corresponding labels, w denotes the trainable parameters, and \lambda is a balance factor.
Further, S5 specifically includes: fine-tuning the model after clipping: the clipping rate of each layer is adapted according to the change of the absolute difference of the optimization target relative to the start of the iteration, the sensitivity of the corresponding convolution layer to pruning, and the requirements of the edge device on the amount of computation and the number of parameters; the iteration is terminated once the performance on the test set reaches the expected target.
Further, the model test in S6 specifically includes: feeding the test data into the model for testing, and comparing the recognition results of the same targets before and after model pruning.
Further, the model output in S6 specifically includes: exporting the trained model with the optimal parameters to an h5 file so that the model can easily be ported to embedded devices.
Compared with the prior art, the scheme of the application has the following beneficial effects:
the application aims to solve the problem of deep learning method productization, and simultaneously starts with pruning on the requirements of the convolutional layer on pruning sensitivity, the importance of channels and the parameter quantity and the calculation quantity of edge equipment.
And redundant channels and corresponding convolution kernels are removed under the condition of ensuring the performance of the model, so that the model has the conditions of small size, low computational complexity, low power consumption of a battery, flexible issuing, updating and deployment and the like, and the portability of the network on embedded equipment is ensured.
Drawings
The application is further described below with reference to the drawings and the detailed description.
FIG. 1 is a flow chart of a method of pruning a model based on network structure search of the present application;
FIG. 2 is a flow chart of a conventional structure search;
FIG. 3 is a flow chart of a structure search based on a shared convolution kernel;
FIG. 4 is a flow chart of a conventional pruning;
FIG. 5 is a flow chart of pruning based on convolution layer sensitivity according to the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
The application is mainly applied to deep-learning-based classification and target detection tasks, and provides a model pruning method based on network structure search that compresses the network model as much as possible by limiting the model's computation and parameters while preserving accuracy. Taking the recognition task of a traditional water meter as an example, the task adopts an SSD model; the model parameters can be reduced to 0.5% of the original model while the recognition accuracy reaches 99% within 3 seconds, and the method can be deployed on embedded devices.
FIG. 1 is a block flow diagram of the present application, and a specific implementation of the present application will be described.
Step 1, data preparation: data acquisition, data labeling and data division;
and 1.1, taking a large number of photos by using a camera, and carrying out data augmentation on the photos by using random cutting, random inversion, left and right mirror images and other processes to expand a data set.
Step 1.2, annotate the data with annotation software, save the annotations as txt files, and store the original images and annotations in the same folder.
Step 1.3, divide the images and the corresponding annotations into a training set, a validation set and a test set, accounting for 60%, 20% and 20% respectively.
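A minimal sketch of the 60%/20%/20% split, assuming the images are addressed by file path:

```python
# 60/20/20 train/validation/test split sketch (file layout is assumed for illustration).
import random

def split_dataset(image_paths, seed=0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]

# train_set, val_set, test_set = split_dataset(glob.glob("dataset/*.jpg"))
```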
Step 2, model improvement: the sandwich principle and in-situ distillation are added to improve the performance of the network, and the collection of network structure information is accelerated by sharing convolution kernels. This comprises the following substeps:
step 2.1, firstly defining a network structure according to tasks, setting a structure search range, and randomly and independently sampling the channel number of each layer in the training of a model. The method comprises the following specific steps: the SSD target detection network is used as a network structure for accurately identifying the traditional water meter, and the network structure is adjusted. In the traditional SSD detection network, VGG16 is used as a feature extraction network of a backbone, multi-layer convolution is added behind the backbone network to realize multi-scale detection of the target, and for accurate identification of the water meter, single-layer detection is adopted because similar feelings of the detected target on different convolution layers are not greatly different. Setting a structure searching range, namely, the sampling rate of the convolution kernel, wherein the pruning rate adopted in the accurate identification of the traditional water meter is [0.25,1], and independently sampling each layer, so that the structure searching range is enlarged to the greatest extent.
Step 2.2, during the random sampling of channels, sample the upper and lower limits of the set threshold with emphasis so that the accuracy of the model stays within a preset range. The output of a single neuron can be represented by equation (1), where w and x are the upper-layer convolution kernel weights and the input feature map, respectively, and n is the number of input channels:

y = \sum_{i=1}^{n} w_i x_i    (1)

The quantity y_k satisfying equation (2) is the output of the first k channels, where k and k_0 delimit the structure search range defined in step 2.1:

y_k = \sum_{i=1}^{k} w_i x_i,  k_0 \le k \le n    (2)

In general, the more neurons are kept, the closer the output is to the full-width value. Given k_0, the output of the network layer stays within a defined range, and the trade-off between the performance of the network layer and its resource consumption can be adjusted directly by adding or deleting channels.
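A minimal sketch of equations (1) and (2) with a shared convolution kernel: the sub-network that keeps the first k output channels simply slices the first k filters of the full weight tensor, so the front filters are reused by every sub-network. The tensor shapes below are illustrative assumptions:

```python
# Shared-kernel slicing sketch: the first k filters of one weight tensor serve all widths.
import torch
import torch.nn.functional as F

full_weight = torch.randn(64, 32, 3, 3)   # full layer: 64 output channels, 32 input channels
full_bias = torch.randn(64)

def forward_first_k(x, k):
    """Output of the first k channels, i.e. equation (2) with k_0 <= k <= 64."""
    return F.conv2d(x, full_weight[:k], full_bias[:k], padding=1)

x = torch.randn(1, 32, 38, 38)            # assumed feature-map size
y_full = forward_first_k(x, 64)           # equation (1): all n channels
y_sub = forward_first_k(x, 16)            # a narrower sub-network reusing the same front filters
```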
Step 2.3, during the network structure search, adopt in-situ distillation and use the set upper pruning limit ratio to guide the training of the other ratios, thereby improving the performance of the network. The sandwich principle means that several networks are trained simultaneously during the structure search: the maximum-width network is trained with the real labels, while the other widths use the predictions of the maximum-width network as labels, so the predicted labels can be reused in place during training without extra computational cost.
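A hedged sketch of one training step under the sandwich principle with in-situ distillation, written here with a classification-style loss for brevity (an SSD detection loss would replace it): the maximum width is trained on the real labels, and its detached predictions supervise the smallest and a few randomly sampled widths. The helper set_width is hypothetical; it stands for restricting every layer to the first fraction of its shared filters.

```python
# Sandwich-rule / in-situ distillation sketch. `set_width` is a hypothetical helper that
# keeps only the first (ratio * channels) filters of the shared kernels in every layer.
import random
import torch.nn.functional as F

def train_step(model, set_width, images, labels, optimizer, num_random=2):
    optimizer.zero_grad()

    # 1) Largest width: trained with the real labels (hard targets).
    set_width(model, 1.0)
    logits_max = model(images)
    loss = F.cross_entropy(logits_max, labels)
    soft_targets = logits_max.detach().softmax(dim=1)   # in-situ distillation targets

    # 2) Smallest width plus a few random widths: trained against the soft targets.
    for ratio in [0.25] + [random.uniform(0.25, 1.0) for _ in range(num_random)]:
        set_width(model, ratio)
        logits = model(images)
        loss = loss + F.kl_div(logits.log_softmax(dim=1), soft_targets, reduction="batchmean")

    loss.backward()
    optimizer.step()
    return loss.item()
```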
Step 2.4, accelerate the acquisition of network structure information by sharing convolution kernels: the closer a convolution kernel is to the front (the lower its channel index), the more times it is reused across sub-networks.
Fig. 2 illustrates a conventional network structure search: according to a certain search strategy, candidate structures are drawn from the search space and alternately evaluated for performance.
The network structure search of the present application is shown in fig. 3: all the sub-networks are searched first and then evaluated uniformly. Because each layer is sampled independently, the number of sub-networks is very large, and the amount of computation would not be reduced if the sub-networks were stored one by one.
Step 3, collecting network structure data: load and train the improved model, record the relevant data of the model under different network structures, and finally summarize the sensitivity of the different convolution layers to the model results. This comprises the following substeps:
Step 3.1, perform a small number of training iterations on the improved model of step 2 using the training and validation data until convergence;
Step 3.2, load the model of step 3.1, randomly delete convolution kernels of each layer within the set threshold, and finally record the relevant data of the network (including the number of convolution kernels of each layer and the final accuracy).
Fig. 4 is a schematic diagram of conventional model pruning: a large model is trained and pruned according to a set pruning strategy to obtain several sub-network models, the sub-networks are fine-tuned to recover accuracy, and finally the optimal sub-network obtained by evaluation is taken as the result of pruning. If every sub-network were fine-tuned, a significant amount of computing resources and training time would be required.
Fig. 5 is a schematic diagram of pruning flow based on convolutional layer sensitivity according to the present application.
In step 2, the application stores the weights of the sub-networks in the corresponding shared convolution kernels, so the network weights do not need to be fine-tuned. The channels are batch-normalized: during training, the BN layer computes the mean and variance and updates these two statistics with a moving average, and the statistics are used when the model is loaded after training is finished. Although the application does not use fine-tuning to recover model accuracy, a small amount of forward propagation through the network is required to update the mean and variance of each sub-network. Each sub-network performs the BN (Batch Normalization) mean and variance update, and then the relevant data is recorded.
Step 3.3, sort and divide the data to obtain the sensitivity of the different convolution layers to the model output. Only one or more convolution kernels in a single convolution layer are removed at a time, and the sensitivity of each convolution layer to the pruning ratio is recorded.
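A minimal sketch of the per-layer sensitivity scan of step 3: one layer at a time, a fraction of its convolution kernels is dropped, the BN running statistics are refreshed with a few forward passes (no weight fine-tuning), and the resulting accuracy is recorded. apply_layer_ratio and evaluate are hypothetical placeholders for the application's own routines.

```python
# Per-layer pruning-sensitivity scan sketch; helper functions are hypothetical placeholders.
import torch

def recalibrate_bn(model, calib_loader, num_batches=20):
    """Refresh BN running mean/variance of a sub-network with a few forward passes."""
    model.train()                      # BN updates its running statistics in train mode
    with torch.no_grad():
        for i, (images, _) in enumerate(calib_loader):
            if i >= num_batches:
                break
            model(images)

def sensitivity_scan(model, layers, ratios, apply_layer_ratio, evaluate, calib_loader):
    results = {}                       # {layer_name: {ratio: accuracy}}
    for name in layers:
        results[name] = {}
        for r in ratios:               # e.g. [0.25, 0.5, 0.75]
            sub = apply_layer_ratio(model, name, r)   # keep only the first r*channels in this layer
            recalibrate_bn(sub, calib_loader)
            results[name][r] = evaluate(sub)          # accuracy on the validation set
    return results
```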
Step 4, model pruning: apply L1 regularization to the scaling factors of the BN layers so that the scaling factors tend to zero, sort the channels in ascending order of importance, and, referring to the influence of the different convolution layers on the result obtained in step 3, adapt the pruning ratio of each layer and prune the channels of relatively low importance and the corresponding filters in each layer. L1 regularization, also known as Lasso regularization, penalizes the complexity of the model by adding the sum of the absolute values of the target parameters to the model's loss function.
Step 4.1, as shown in formula (3), introduce the scaling factors of the BN layers into the objective function as a sparsity penalty term, and train the network weights and the scaling factors jointly so that many scaling factors in the resulting model tend to 0:

L = \sum_{(x,y)} l(f(x, w), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|    (3)

The first term is the training loss function, where f(x, w) is the prediction obtained with x as input and w as parameters, and l is the loss computed from the prediction and the ground truth y. L1 regularization |\gamma| is used as the sparsification penalty on the scaling factors, (x, y) are the training input data and corresponding labels, w denotes the trainable parameters, and \lambda is a balance factor.
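As a sketch of formula (3) in training code, the task loss is augmented with lambda times the L1 norm of every BN scaling factor; the lambda value below is an assumed example, not a value given by the application.

```python
# Sparsity-regularized objective sketch for formula (3); the lambda value is an assumed example.
import torch.nn as nn

def objective_with_bn_sparsity(model, task_loss, lam=1e-4):
    l1_gamma = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            l1_gamma = l1_gamma + m.weight.abs().sum()   # gamma is stored in BatchNorm2d.weight
    return task_loss + lam * l1_gamma

# Inside the training loop:
#   loss = objective_with_bn_sparsity(model, criterion(model(images), targets))
#   loss.backward(); optimizer.step()
```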
Step 4.2, sort the BN scaling factors of each convolution layer from large to small, i.e., rank the channels by importance, and adapt the pruning ratio of each layer according to the sensitivity of each convolution layer to pruning determined in step 3. Without changing the original number of network layers, the unimportant channels and the corresponding convolution kernels are removed, yielding a more compact network that is easy to port.
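A sketch of the per-layer channel selection: rank the |γ| values of each BN layer and keep the largest fraction allowed by that layer's adaptive pruning ratio (obtained from the step-3 sensitivity). Propagating the kept indices to the convolution weights and to the next layer's input channels is omitted here.

```python
# Channel-selection sketch: rank channels by |gamma| and keep the top fraction per layer.
import torch
import torch.nn as nn

def select_channels(model, keep_ratios):
    """keep_ratios: {bn_layer_name: fraction of channels to keep}, from the step-3 sensitivity."""
    kept = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d) and name in keep_ratios:
            gamma = m.weight.detach().abs()
            k = max(1, int(round(keep_ratios[name] * gamma.numel())))
            kept[name] = torch.argsort(gamma, descending=True)[:k]   # most important channel indices
    return kept

# The returned index sets are then used to slice the convolution filters feeding each BN layer
# and the corresponding input channels of the following layer.
```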
Step 5, model fine-tuning after clipping: adapt the clipping rate of each layer according to the change of the absolute difference of the optimization target relative to the start of the iteration, the sensitivity of the corresponding convolution layer to pruning, and the requirements of the edge device on the amount of computation and the number of parameters; terminate the iteration once the performance on the test set reaches the expected target.
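The application does not spell out the exact update rule for the per-layer clipping rate; the sketch below is one plausible reading, in which insensitive layers are pruned harder while the model is still above the device's parameter budget and the optimization target has barely changed since the start of the iteration.

```python
# Hypothetical adaptive clipping-rate update; the concrete rule is not specified by the application.
def adapt_clip_rates(clip_rates, sensitivities, current_params, budget_params,
                     objective_delta, step=0.05, min_rate=0.25, max_rate=1.0):
    """clip_rates / sensitivities: {layer_name: value}. Insensitive layers are pruned harder
    while the model exceeds the device's parameter budget and the objective is stable."""
    over_budget = current_params > budget_params
    for layer, rate in clip_rates.items():
        if over_budget and objective_delta < 0.01:       # objective barely changed this iteration
            rate -= step * (1.0 - sensitivities[layer])  # prune insensitive layers more aggressively
        clip_rates[layer] = min(max_rate, max(min_rate, rate))
    return clip_rates
```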
Step 6, model test: feed the test data into the model for testing, and compare the recognition results of the same targets before and after pruning.
Step 7, model output: export the trained model with the optimal parameters to an h5 file so that it can be ported to embedded devices.
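A minimal sketch of exporting the final weights to an h5 file with h5py; the one-dataset-per-parameter layout is an assumption, since the application does not fix a file schema.

```python
# HDF5 export sketch (assuming a PyTorch model); one dataset per parameter tensor is an assumed layout.
import h5py

def export_to_h5(model, path="pruned_model.h5"):
    with h5py.File(path, "w") as f:
        for name, param in model.state_dict().items():
            f.create_dataset(name, data=param.cpu().numpy())

# export_to_h5(pruned_model)   # the resulting .h5 file is then copied to the embedded device
```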
It should be noted that the foregoing detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly indicates otherwise. Furthermore, it will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or groups thereof.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the above detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like numerals typically identify like components unless context indicates otherwise. The illustrated embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A model pruning method based on network structure search, the method comprising:
S1, acquiring network structure data: loading and training the improved model;
S2, performing a network structure search to obtain the sensitivity of each convolution layer to pruning;
S3, determining the sensitivity of each convolution layer to pruning and adapting the pruning ratio of each layer accordingly;
S4, pruning the model, namely removing the channels of relatively low importance and the corresponding filters in each layer according to each layer's adaptive pruning ratio;
S5, fine-tuning the pruned model while adapting the clipping rate of each layer, and terminating the iteration once the performance on the test set reaches the expected target;
S6, testing the model, and storing and outputting the models that meet the device and performance requirements.
2. The method for pruning a model for network structure search according to claim 1, wherein S1 specifically comprises:
S1.1, taking a large number of photos with a camera, and performing data augmentation on the photos by random cropping, random flipping and left-right mirroring to expand the data set;
S1.2, annotating the data with annotation software, and storing the original images and the annotations in the same folder;
S1.3, improving the model: adding the sandwich principle and in-situ distillation to improve the performance of the network, and accelerating the acquisition of network structure information by sharing convolution kernels.
3. The method for pruning a model for network structure search according to claim 2, wherein the S1.3 model improvement specifically comprises:
(1) First, defining the network structure according to the task, setting the network structure search range, and randomly and independently sampling the number of channels of each layer during model training;
(2) During the random sampling of channels, sampling the upper and lower limits of the set threshold with emphasis, so that the accuracy of the model stays within a preset range;
(3) During the network structure search, adopting in-situ distillation and using the set upper pruning limit ratio to guide the training of the other ratios;
(4) Sharing convolution kernels to accelerate the collection of network structure information; the closer a convolution kernel is to the front (the lower its channel index), the more times it is reused.
4. The method for model pruning for network structure search according to claim 3,
the number of channels of each layer is randomly sampled during model training, and during the random sampling of channels the upper and lower limits of the set threshold are sampled with emphasis; the output of a single neuron can be represented by equation (1), and equation (2) gives the output of the first k channels:

y = \sum_{i=1}^{n} w_i x_i    (1)

y_k = \sum_{i=1}^{k} w_i x_i,  k_0 \le k \le n    (2)

wherein w and x are the upper-layer convolution kernel weights and the input feature map, n is the number of input channels, y_k satisfying equation (2) is the output of the first k channels, and k and k_0 define the structure search range.
5. The method for pruning a model for a network structure search according to claim 1, wherein,
obtaining the sensitivity of each convolution layer to pruning in S2 specifically comprises:
S2.1, performing a small number of training iterations on the improved model using the training and validation data until convergence;
S2.2, loading the model, randomly deleting convolution kernels of each layer within the set threshold, recording the final relevant data of the network, and sorting and dividing the data to obtain the sensitivity of the different convolution layers to the model output.
6. The method for pruning a model for a network structure search according to claim 1, wherein,
the S4 model pruning specifically comprises: applying L1 regularization to the scaling factors of the BN layers so that the scaling factors tend to zero, sorting the channels in ascending order of importance, adapting the pruning rate of each layer, and pruning the channels of relatively low importance and the corresponding filters in each layer.
7. The method for model pruning for network structure search according to claim 6, wherein,
as shown in formula (3), the scaling factors of the BN layers are introduced into the objective function as a sparsity penalty term, and the network weights and the scaling factors are trained jointly so that many scaling factors in the resulting model tend to 0:

L = \sum_{(x,y)} l(f(x, w), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|    (3)

wherein the first term of the objective function is the training loss function, L1 regularization |\gamma| is adopted as the sparsification penalty on the scaling factors, (x, y) are the training input data and corresponding labels, w denotes the trainable parameters, and \lambda is a balance factor.
8. The method for pruning a model for a network structure search according to claim 1, wherein,
S5 specifically comprises: fine-tuning the model after clipping: the clipping rate of each layer is adapted according to the change of the absolute difference of the optimization target relative to the start of the iteration, the sensitivity of the corresponding convolution layer to pruning, and the requirements of the edge device on the amount of computation and the number of parameters; the iteration is terminated once the performance on the test set reaches the expected target.
9. The method for pruning a model for a network structure search according to claim 1, wherein,
the model test in S6 specifically comprises: feeding the test data into the model for testing, and comparing the recognition results of the same targets before and after model pruning.
10. The method for pruning a model for a network structure search according to claim 1, wherein,
the model output in S6 specifically comprises: exporting the trained model with the optimal parameters to an h5 file so that the model can be ported to embedded devices.
CN202311396457.4A (priority date 2023-10-26, filing date 2023-10-26) Model pruning method based on network structure search, granted as CN117131920B, status: Active

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311396457.4A CN117131920B (en) 2023-10-26 2023-10-26 Model pruning method based on network structure search


Publications (2)

Publication Number Publication Date
CN117131920A (en) 2023-11-28
CN117131920B CN117131920B (en) 2024-01-30

Family

ID=88851194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311396457.4A Active CN117131920B (en) 2023-10-26 2023-10-26 Model pruning method based on network structure search

Country Status (1)

Country Link
CN (1) CN117131920B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164752A1 (en) * 2020-02-21 2021-08-26 华为技术有限公司 Neural network channel parameter searching method, and related apparatus
CN114330644A (en) * 2021-12-06 2022-04-12 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network model compression method based on structure search and channel pruning
WO2022141754A1 (en) * 2020-12-31 2022-07-07 之江实验室 Automatic pruning method and platform for general compression architecture of convolutional neural network
CN114881136A (en) * 2022-04-27 2022-08-09 际络科技(上海)有限公司 Classification method based on pruning convolutional neural network and related equipment
CN115018039A (en) * 2021-03-05 2022-09-06 华为技术有限公司 Neural network distillation method, target detection method and device
US20220351043A1 (en) * 2021-04-30 2022-11-03 Chongqing University Adaptive high-precision compression method and system based on convolutional neural network model
CN115688908A (en) * 2022-09-28 2023-02-03 东南大学 Efficient neural network searching and training method based on pruning technology
US20230084203A1 (en) * 2021-09-06 2023-03-16 Baidu Usa Llc Automatic channel pruning via graph neural network based hypernetwork
US20230153623A1 (en) * 2021-11-18 2023-05-18 GM Global Technology Operations LLC Adaptively pruning neural network systems


Also Published As

Publication number Publication date
CN117131920B (en) 2024-01-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant