CN111027693A - Neural network compression method and system based on weight-removing pruning - Google Patents

Neural network compression method and system based on weight-removing pruning

Info

Publication number
CN111027693A
Authority
CN
China
Prior art keywords
neural network
pruned
layer
pruning
cut
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911174083.5A
Other languages
Chinese (zh)
Inventor
王睿
宋昆
王帅杰
崔增皓
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201911174083.5A
Publication of CN111027693A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a neural network compression method and system based on weight-removing pruning, wherein the method comprises the following steps: determining the parameters to be pruned in a neural network to be pruned; setting the parameters to be pruned in the neural network to 0 to obtain the pruned neural network; and modifying the bottom layer calculation function of the pruned neural network, so that, during operation of the pruned neural network, a calculation is skipped whenever the parameters involved in it are zero. By reducing the complexity of the neural network through weight-removing pruning, the method lowers the computing-power requirement on the computing device, so that the neural network can be applied on edge devices.

Description

Neural network compression method and system based on weight-removing pruning
Technical Field
The invention relates to the technical field of edge calculation, in particular to a neural network compression method and system based on weight-removing pruning.
Background
Deep neural networks are applied in many everyday and engineering scenarios, and their application on edge devices currently has to rely on the computing power of cloud servers, while the number of edge devices is increasing sharply with the development of the Internet of Things. Therefore, at some point in the future, cloud computing will have to support the large-scale application of deep neural networks in daily life.
Against this background, edge computing has emerged to relieve the computing pressure on cloud servers. However, the computing resources of edge devices are limited, and in the face of the huge computation load of deep neural networks, the computing power of edge devices falls short, resulting in serious computation latency. Therefore, how to efficiently deploy deep neural networks on edge devices with limited computing resources has become a concern for those skilled in the art.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a neural network compression method based on weight-removing pruning, so as to address the problem that existing deep neural networks require enormous computing power that existing edge devices do not possess, and therefore cannot be deployed directly on edge devices. By reducing the complexity of the neural network, the computing-power requirement on the computing device is lowered, so that the neural network can be applied on edge devices.
In order to solve the above technical problem, the present invention provides a neural network compression method based on weight-removing pruning, the method comprising:
determining the parameters to be pruned in a neural network to be pruned;
setting the parameters to be pruned in the neural network to 0 to obtain the pruned neural network;
and modifying the bottom layer calculation function of the pruned neural network, so that, during operation of the pruned neural network, a calculation is skipped if the parameters involved in it are zero.
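The patent does not provide source code for the modified bottom layer calculation function. The following is a minimal NumPy sketch of the idea for a fully-connected layer stored as a weight matrix: output rows whose pruned weights are all zero skip their multiply-accumulate work entirely. The function name and layer layout are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def pruned_dense_forward(x, W, b):
    """Forward pass of a dense layer whose pruned weights were set to 0.

    A minimal sketch (not the patent's actual implementation): rows of W
    that are entirely zero correspond to pruned outputs, so their
    multiply-accumulate work is skipped and only the bias is emitted.
    """
    y = np.empty(W.shape[0], dtype=x.dtype)
    for i, w_row in enumerate(W):
        if not w_row.any():          # all weights of this output are pruned
            y[i] = b[i]              # skip the computation: Yi = bi
        else:
            y[i] = w_row @ x + b[i]  # normal computation for retained weights
    return y

# Toy usage: output 1 is fully pruned, so its dot product is never computed.
W = np.array([[0.2, -0.1, 0.4],
              [0.0,  0.0, 0.0],
              [0.5,  0.3, -0.2]])
b = np.array([0.1, 0.2, 0.3])
x = np.array([1.0, 2.0, 3.0])
print(pruned_dense_forward(x, W, b))
```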
Further, the determining of the parameters to be pruned in the neural network to be pruned is specifically:
sequentially determining the parameters that should be pruned in each layer of the neural network to be pruned.
Further, the setting of the parameters to be pruned in the neural network to be pruned to 0 is specifically:
setting the parameters to be pruned in each layer of the neural network to 0 in sequence, thereby completing the pruning operation of each layer one by one, wherein the pruning operations of the individual layers are independent of one another;
and after every layer of the neural network to be pruned has finished its pruning operation, retraining the neural network until it converges to obtain the pruned neural network.
Further, the determination of the parameters to be pruned in each layer of the neural network to be pruned is specifically:
introducing a parameter β for indicating whether the parameters in the layer to be pruned should be pruned, and obtaining the following formula for the output feature map of the layer after pruning:
Yi = βi * Wi * Xi + bi
wherein β is an array, β = (β1, β2, ..., βN); βi, the ith element of the β array, takes the value 0 or 1. When the value of βi is 1, the parameters corresponding to the ith element Wi of the weight matrix W of the layer to be pruned should be preserved; when the value of βi is 0, the parameters corresponding to Wi should be clipped.
Further, the determining process of the parameter β specifically includes:
initializing β so that all of its elements are 1, and iterating using the following formula:
[iteration formula — reproduced as an image in the original publication]
The iteration is stopped when the number of elements of β equal to 0 reaches a preset number, wherein βi denotes the ith element of the β array, Wi denotes the ith element of the weight matrix W of the layer to be pruned, bi is an offset value, λ is a proportionality coefficient, N is the number of channels, F denotes the Frobenius norm, Xi is the ith element of the input matrix, and Yi is the ith element of the output matrix.
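The formula itself is only available as an image in the published text. Based on the symbols listed above (Frobenius norm, proportionality coefficient λ, number of channels N, per-channel terms Wi and Xi), it appears to follow the LASSO-style channel-selection objective commonly used for this kind of pruning; the block below is a hedged reconstruction under that assumption, not a verbatim copy of the patent's formula.

```latex
% Presumed per-layer selection problem (reconstruction, not verbatim):
% find the 0/1 mask \beta that best preserves the layer output Y while the
% penalty term \lambda \|\beta\|_1 drives more entries of \beta to zero.
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2N}\,\Bigl\lVert Y - \sum_{i=1}^{N} \beta_i\, W_i\, X_i \Bigr\rVert_F^{2}
  \;+\; \lambda\,\lVert \beta \rVert_1,
\qquad \beta_i \in \{0,1\}
```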
Accordingly, in order to solve the above technical problem, the present invention further provides a neural network compression system based on weight-removing pruning, the system comprising:
the cutting parameter determining module is used for determining parameters to be cut in the neural network to be cut;
the parameter cutting module is used for setting the parameters to be cut in the neural network to be cut to 0 to obtain the neural network after cutting;
and the bottom layer calculation function modification module is used for modifying the bottom layer calculation function of the pruned neural network, so that the pruned neural network skips the current calculation if the parameter related to the current calculation is zero in the operation process.
Further, the clipping parameter determining module is specifically configured to:
and sequentially determining parameters which should be cut in each layer in the neural network to be cut.
Further, the parameter clipping module is specifically configured to:
setting the parameters to be cut in each layer in the neural network to be cut to 0 in sequence, thereby completing the cutting operation of each layer one by one; wherein, the pruning operation among the layers is mutually independent;
and after each layer in the neural network to be pruned finishes pruning operation, retraining the neural network to be convergent to obtain the pruned neural network.
Further, the process by which the clipping parameter determining module determines the parameters to be clipped for each layer in the neural network to be pruned specifically includes:
introducing a parameter β for indicating whether the parameters in the layer to be pruned should be pruned, and obtaining the following formula for the output feature map of the layer after pruning:
Yi = βi * Wi * Xi + bi
wherein β is an array, β = (β1, β2, ..., βN); βi, the ith element of the β array, takes the value 0 or 1. When the value of βi is 1, the parameters corresponding to the ith element Wi of the weight matrix W of the layer to be pruned should be preserved; when the value of βi is 0, the parameters corresponding to Wi should be clipped.
Further, the determining process of the parameter β specifically includes:
initializing β so that all of its elements are 1, and iterating using the following formula:
[iteration formula — reproduced as an image in the original publication]
The iteration is stopped when the number of elements of β equal to 0 reaches a preset number, wherein βi denotes the ith element of the β array, Wi denotes the ith element of the weight matrix W of the layer to be pruned, bi is an offset value, λ is a proportionality coefficient, N is the number of channels, F denotes the Frobenius norm, Xi is the ith element of the input matrix, and Yi is the ith element of the output matrix.
The technical scheme of the invention has the following beneficial effects:
1. the invention greatly improves the operation speed of the neural network, reduces the operation time of the neural network and meets the calculation requirements of edge nodes;
2. the invention optimizes the neural network, carries out weight removal pruning on the neural network, improves the running speed of CNN and other networks, and meets the calculation requirement of edge equipment;
3. the invention accelerates the network operation speed without generating great influence on the accuracy of the network;
4. the invention does not change the network structure;
5. the invention is simple to operate and easy to get started with.
Drawings
FIG. 1 is an overall flow chart of the neural network compression method based on weight-removing pruning of the present invention;
FIG. 2 is a flow chart of pruning each layer in the neural network compression method based on weight-removing pruning of the present invention;
FIG. 3 is a graph of the running speed of the original Lenet network model;
FIG. 4 is a graph of the running speed of the Lenet network model after pruning;
FIG. 5 is a graph of the running speed of the original cifar10 network model;
FIG. 6 is a graph of the running speed of the cifar10 network model after pruning.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
First embodiment
Referring to FIG. 1 to FIG. 6, the present embodiment provides a neural network compression method based on weight-removing pruning, which includes:
s101, determining parameters to be cut in a neural network to be cut;
s102, setting parameters to be cut in the neural network to be pruned to be 0 to obtain the pruned neural network;
s103, modifying the bottom layer calculation function of the pruned neural network, so that if the parameters related to the current calculation are zero in the operation process of the pruned neural network, the current calculation is skipped.
When the method of this embodiment is run, the weight information of the network model to be pruned is first input, and the first layer of the network model is pruned. After pruning of the current layer is finished, it is judged whether any layer has not yet been pruned; if such a layer exists, pruning continues with the next layer. After all layers have been pruned, the network model is retrained until it fits, and finally the pruned network model is output, as shown in FIG. 1.
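The layer-by-layer flow of FIG. 1 can be summarized in a short, self-contained Python sketch. The β-selection step described further below is replaced here by a simple magnitude-based stand-in, and all names (prune_layer, compress_network) are hypothetical, not the patent's API.

```python
import numpy as np

def prune_layer(W, keep_ratio):
    """Zero out the lowest-magnitude output channels of one layer.

    A simplified stand-in for the beta-selection step described below;
    channels are ranked by their L2 norm purely for illustration.
    """
    norms = np.linalg.norm(W.reshape(W.shape[0], -1), axis=1)
    n_keep = max(1, int(round(keep_ratio * W.shape[0])))
    beta = np.zeros(W.shape[0])
    beta[np.argsort(norms)[-n_keep:]] = 1.0        # keep the strongest channels
    return W * beta[:, None], beta

def compress_network(layers, keep_ratios):
    """Layer-by-layer pruning flow of FIG. 1 (retraining step omitted)."""
    pruned = []
    for (W, b), ratio in zip(layers, keep_ratios): # prune layers one by one,
        W_pruned, _ = prune_layer(W, ratio)        # independently of each other
        pruned.append((W_pruned, b))
    # ... after all layers are pruned, the network would be retrained to fit
    return pruned

layers = [(np.random.randn(6, 25), np.zeros(6)), (np.random.randn(16, 150), np.zeros(16))]
print([w.shape for w, _ in compress_network(layers, [0.6, 0.68])])
```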
Specifically, in this embodiment, the determination of the parameters to be pruned in the neural network to be pruned in the above S101 is a key part of the scheme, and the process specifically includes:
introducing a parameter β for indicating whether the parameters in the layer to be pruned should be pruned, and obtaining the following formula for the output feature map of the layer after pruning:
Yi = βi * Wi * Xi + bi
wherein β is an array, β = (β1, β2, ..., βN); βi, the ith element of the β array, takes the value 0 or 1. When the value of βi is 1, the parameters corresponding to the ith element Wi of the weight matrix W of the layer to be pruned should be preserved; when the value of βi is 0, the parameters corresponding to Wi should be clipped, i.e. the corresponding channel is masked to zero, in which case Yi equals bi.
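As a concrete toy illustration of the masked output Yi = βi * Wi * Xi + bi (an example constructed for this text, not taken from the patent), the sketch below applies a 0/1 channel mask to a small layer:

```python
import numpy as np

# Toy layer with N = 3 output channels: W has shape (channels, inputs).
W = np.array([[0.5, -0.2],
              [0.1,  0.4],
              [-0.3, 0.7]])
b = np.array([0.05, 0.10, 0.15])
X = np.array([2.0, 1.0])

beta = np.array([1.0, 0.0, 1.0])    # channel 1 is selected for pruning

# Yi = beta_i * Wi * Xi + bi, computed for every channel i at once.
Y = beta * (W @ X) + b
print(Y)                            # the masked channel outputs only its bias b1
```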
The parameters to be pruned in each layer of the neural network are determined in turn according to the parameter β; the parameters to be pruned in each layer are then set to 0 in sequence, completing the pruning operation layer by layer, where the pruning operations of the individual layers are independent of one another.
further, in this embodiment, the determination process of the parameter β is shown in fig. 2, and specifically includes:
β and λ are initialized, with all elements of β being 1, and the iteration proceeds using the following formula:
[iteration formula — reproduced as an image in the original publication]
The iteration is stopped when the number of elements of β equal to 0 reaches a preset number, wherein βi denotes the ith element of the β array, Wi denotes the ith element of the weight matrix W of the layer to be pruned, bi is an offset value, λ is a proportionality coefficient, N is the number of channels, F denotes the Frobenius norm, Xi is the ith element of the input matrix, and Yi is the ith element of the output matrix.
When pruning a given layer, λ is first set to a very small value and is gradually increased as the iteration proceeds. During this process the minimum point of the formula moves in the direction of a smaller β, so the number of zero values in β gradually increases while the formula is kept at its minimum. When β satisfies the stopping condition (the number of 0 elements in β reaches the preset number), the iteration stops and the operation moves on to the next layer.
For each layer, it is first judged whether the number of 0s in β satisfies the condition; if not, the above formula is used to iterate. Once the number of 0s in β satisfies the pruning condition of the current layer, pruning of that layer ends and the procedure returns to prune the remaining layers. After every layer of the neural network to be pruned has finished its pruning operation, the neural network is retrained until it converges, yielding the pruned neural network.
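A minimal sketch of this per-layer loop is given below, assuming channels are scored by how much they contribute to reconstructing the layer output and that λ is grown geometrically until the preset number of zeros is reached. The scoring rule and helper names are illustrative approximations, not the patent's exact iteration.

```python
import numpy as np

def select_beta(contribs, Y, target_zeros, lam=1e-4, lam_growth=1.5, max_steps=200):
    """Iteratively grow lambda until enough beta entries are driven to zero.

    contribs[i] is the contribution Wi*Xi of channel i to the layer output Y
    (both flattened to vectors).  Each step keeps the channels whose
    contribution to reconstructing Y outweighs the current penalty lambda;
    this is an illustrative approximation of the patent's iteration.
    """
    beta = np.ones(len(contribs))
    for _ in range(max_steps):
        scores = np.array([np.dot(c, Y) ** 2 / (np.dot(c, c) + 1e-12) for c in contribs])
        beta = (scores > lam).astype(float)        # channels below the penalty go to 0
        if int((beta == 0).sum()) >= target_zeros: # preset number of zeros reached
            break
        lam *= lam_growth                          # gradually increase lambda
    return beta

# Toy usage: 4 channels, prune until at least 2 of them are zeroed out.
rng = np.random.default_rng(0)
contribs = [rng.normal(size=8) * s for s in (2.0, 0.1, 1.5, 0.05)]
Y = contribs[0] + contribs[2]
print(select_beta(contribs, Y, target_zeros=2))
```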
The effect of the solution of the invention is further illustrated below by practical application examples:
the effect of the invention was first tested on a shallow network. First, selecting a Lenet network as an original network for an mnist data set, and pruning results for each layer are shown in table 1:
TABLE 1 Lenet network pruning results
Layer name                          Conv1    Conv2    Ip1
Percentage of parameters retained   0.6      0.68     0.054
It can be seen that less than 70% of the parameters are retained in the Lenet convolution layers, and that a very large share of the parameters of the last layer is pruned away; the original model may simply be too redundant there, with few of those parameters contributing to the final result. The results of FIG. 3 and FIG. 4 were obtained after running the model for 100 forward propagations. The mean and variance of the running time and the model accuracy were computed; the results are shown in Table 2:
TABLE 2 results of 100 forward propagation before and after pruning for the Lenet network model
[Table 2 — reproduced as an image in the original publication]
The final run results show that the FPS increases, i.e., the running speed is improved, while the accuracy does not decrease and even increases slightly; this is possibly because the original model was overfitted and the pruned model removes the overfitted parameters. Therefore, the method works well on networks with few layers and on overfitted networks.
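The timing protocol used here (100 forward propagations, then the mean and variance of the run time and the resulting FPS) can be reproduced with a few lines. The sketch below assumes a generic forward(x) callable and is not tied to the framework used in the experiments.

```python
import time
import numpy as np

def benchmark(forward, x, runs=100):
    """Time `runs` forward propagations and report mean/variance and FPS."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        forward(x)
        times.append(time.perf_counter() - start)
    times = np.array(times)
    return {"mean_s": times.mean(), "var_s2": times.var(), "fps": 1.0 / times.mean()}

# Toy usage with a stand-in "network": a single matrix multiplication.
W = np.random.randn(512, 512)
print(benchmark(lambda x: W @ x, np.random.randn(512)))
```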
Next, the effect of the present invention was tested on a deep network: pruning was performed on the Resnet network with cifar10 selected as the data set, and the pruning results are shown in Table 3 below:
TABLE 3 Resnet network pruning results
[Table 3 — reproduced as images in the original publication]
The results of the Resnet network model after 100 forward propagations before and after pruning are shown in Table 4 below:
TABLE 4 results of the Resnet network model after 100 forward propagations before and after pruning
[Table 4 — reproduced as an image in the original publication]
It can be seen that the running time of the model is, as before, improved.
In conclusion, by performing weight-removing pruning on the neural network, the invention can greatly increase the running speed of the neural network and reduce its running time, so as to meet the computing requirements of edge devices; at the same time, the invention accelerates network operation without significantly affecting the accuracy of the network, does not change the network structure, and has the advantages of being simple to operate and easy to use.
Second embodiment
The present embodiment provides a neural network compression system based on weight-removing pruning, the system comprising:
the cutting parameter determining module is used for determining parameters to be cut in the neural network to be cut;
the parameter cutting module is used for setting the parameters to be cut in the neural network to be cut to 0 to obtain the neural network after cutting;
and the bottom layer calculation function modification module is used for modifying the bottom layer calculation function of the pruned neural network, so that the pruned neural network skips the current calculation if the parameter related to the current calculation is zero in the operation process.
The neural network compression system based on weight-removing pruning of this embodiment corresponds to the neural network compression method based on weight-removing pruning of the first embodiment; the functions realized by the functional modules of the system correspond one-to-one to the steps of the method, and the detailed description is therefore omitted here.
Furthermore, it should be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Claims (10)

1. A neural network compression method based on weight-removing pruning is characterized by comprising the following steps:
determining parameters to be cut in a neural network to be cut;
setting the parameters to be cut in the neural network to be pruned to be 0 to obtain the pruned neural network;
and modifying the bottom layer calculation function of the pruned neural network, so that if the parameters related to the current calculation are zero in the operation process of the pruned neural network, the current calculation is skipped.
2. The neural network compression method based on weight-removing pruning as claimed in claim 1, wherein the determining of the parameters to be pruned in the neural network to be pruned is specifically:
and sequentially determining parameters which should be cut in each layer in the neural network to be cut.
3. The neural network compression method based on weight-removing pruning as claimed in claim 2, wherein the setting of the parameters to be pruned in the neural network to be pruned to 0 is specifically:
setting the parameters to be cut in each layer in the neural network to be cut to 0 in sequence, thereby completing the cutting operation of each layer one by one; wherein, the pruning operation among the layers is mutually independent;
and after each layer in the neural network to be pruned finishes pruning operation, retraining the neural network to be convergent to obtain the pruned neural network.
4. The neural network compression method based on weight-removing pruning as claimed in claim 3, wherein the determination of the parameters to be pruned in each layer of the neural network to be pruned is specifically as follows:
introducing a parameter β for indicating whether the parameters in the layer to be pruned should be pruned, and obtaining the following formula for the output feature map of the layer after pruning:
Yi = βi * Wi * Xi + bi
wherein β is an array, β = (β1, β2, ..., βN); βi, the ith element of the β array, takes the value 0 or 1; when the value of βi is 1, the parameters corresponding to the ith element Wi of the weight matrix W of the layer to be pruned should be preserved; when the value of βi is 0, the parameters corresponding to Wi should be clipped; bi is an offset value.
5. The neural network compression method based on weight-removing pruning as claimed in claim 4, wherein the determination process of the parameter β specifically comprises:
initializing β so that all of its elements are 1, and iterating using the following formula:
[iteration formula — reproduced as an image in the original publication]
stopping the iteration when the number of elements of β equal to 0 reaches a preset number, wherein βi denotes the ith element of the β array, Wi denotes the ith element of the weight matrix W of the layer to be pruned, bi is an offset value, λ is a proportionality coefficient, N is the number of channels, F denotes the Frobenius norm, Xi is the ith element of the input matrix, and Yi is the ith element of the output matrix.
6. A neural network compression system based on weight-removing pruning, comprising:
the cutting parameter determining module is used for determining parameters to be cut in the neural network to be cut;
the parameter cutting module is used for setting the parameters to be cut in the neural network to be cut to 0 to obtain the neural network after cutting;
and the bottom layer calculation function modification module is used for modifying the bottom layer calculation function of the pruned neural network, so that the pruned neural network skips the current calculation if the parameter related to the current calculation is zero in the operation process.
7. The neural network compression system based on weight-removing pruning of claim 6, wherein the clipping-parameters determination module is specifically configured to:
and sequentially determining parameters which should be cut in each layer in the neural network to be cut.
8. The neural network compression system based on weight-removing pruning of claim 7, wherein the parameter clipping module is specifically configured to:
setting the parameters to be cut in each layer in the neural network to be cut to 0 in sequence, thereby completing the cutting operation of each layer one by one; wherein, the pruning operation among the layers is mutually independent;
and after each layer in the neural network to be pruned finishes pruning operation, retraining the neural network to be convergent to obtain the pruned neural network.
9. The neural network compression system based on weight-removing pruning as claimed in claim 8, wherein the process by which the parameter determining module determines the parameters to be pruned for each layer in the neural network to be pruned specifically comprises:
introducing a parameter β for indicating whether the parameters in the layer to be pruned should be pruned, and obtaining the following formula for the output feature map of the layer after pruning:
Yi = βi * Wi * Xi + bi
wherein β is an array, β = (β1, β2, ..., βN); βi, the ith element of the β array, takes the value 0 or 1; when the value of βi is 1, the parameters corresponding to the ith element Wi of the weight matrix W of the layer to be pruned should be preserved; when the value of βi is 0, the parameters corresponding to Wi should be clipped.
10. The neural network compression system based on weight-removing pruning of claim 9, wherein the determination of the parameter β is specifically:
initializing β so that all of its elements are 1, and iterating using the following formula:
[iteration formula — reproduced as an image in the original publication]
stopping the iteration when the number of elements of β equal to 0 reaches a preset number, wherein βi denotes the ith element of the β array, Wi denotes the ith element of the weight matrix W of the layer to be pruned, bi is an offset value, λ is a proportionality coefficient, N is the number of channels, F denotes the Frobenius norm, Xi is the ith element of the input matrix, and Yi is the ith element of the output matrix.
CN201911174083.5A 2019-11-26 2019-11-26 Neural network compression method and system based on weight-removing pruning Pending CN111027693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911174083.5A CN111027693A (en) 2019-11-26 2019-11-26 Neural network compression method and system based on weight-removing pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911174083.5A CN111027693A (en) 2019-11-26 2019-11-26 Neural network compression method and system based on weight-removing pruning

Publications (1)

Publication Number Publication Date
CN111027693A true CN111027693A (en) 2020-04-17

Family

ID=70206814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911174083.5A Pending CN111027693A (en) 2019-11-26 2019-11-26 Neural network compression method and system based on weight-removing pruning

Country Status (1)

Country Link
CN (1) CN111027693A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582446A (en) * 2020-04-28 2020-08-25 北京达佳互联信息技术有限公司 System for neural network pruning and neural network pruning processing method
CN111582446B (en) * 2020-04-28 2022-12-06 北京达佳互联信息技术有限公司 System for neural network pruning and neural network pruning processing method
CN113033804A (en) * 2021-03-29 2021-06-25 北京理工大学重庆创新中心 Convolution neural network compression method for remote sensing image
CN113033804B (en) * 2021-03-29 2022-07-01 北京理工大学重庆创新中心 Convolution neural network compression method for remote sensing image

Similar Documents

Publication Publication Date Title
CN110379416B (en) Neural network language model training method, device, equipment and storage medium
CN108536650B (en) Method and device for generating gradient lifting tree model
US20200342322A1 (en) Method and device for training data, storage medium, and electronic device
JP2019533257A (en) Neural architecture search
US20220405641A1 (en) Method for recommending information, recommendation server, and storage medium
WO2015089148A2 (en) Reducing dynamic range of low-rank decomposition matrices
US11068655B2 (en) Text recognition based on training of models at a plurality of training nodes
CN110663049A (en) Neural network optimizer search
WO2021103597A1 (en) Method and device for model compression of neural network
CN109616093A (en) End-to-end phoneme synthesizing method, device, equipment and storage medium
CN112101547B (en) Pruning method and device for network model, electronic equipment and storage medium
CN111027693A (en) Neural network compression method and system based on weight-removing pruning
CN111984414B (en) Data processing method, system, equipment and readable storage medium
CN110874634A (en) Neural network optimization method and device, equipment and storage medium
Pietron et al. Retrain or not retrain?-efficient pruning methods of deep cnn networks
CN112200313A (en) Deep learning model reasoning acceleration method, system, equipment and medium
CN114675975B (en) Job scheduling method, device and equipment based on reinforcement learning
CN112861996A (en) Deep neural network model compression method and device, electronic equipment and storage medium
CN116644804A (en) Distributed training system, neural network model training method, device and medium
KR20220097329A (en) Method and algorithm of deep learning network quantization for variable precision
US20160189026A1 (en) Running Time Prediction Algorithm for WAND Queries
CN113163004A (en) Industrial Internet edge task unloading decision method, device and storage medium
CN113033804B (en) Convolution neural network compression method for remote sensing image
CN113887721A (en) Post-training quantization compression method and system in voice recognition task
CN111062477A (en) Data processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200417