CN112488306A - Neural network compression method and device, electronic equipment and storage medium

Neural network compression method and device, electronic equipment and storage medium

Info

Publication number
CN112488306A
CN112488306A (application number CN202011533198.1A)
Authority
CN
China
Prior art keywords
neural network
cutting
preset
structural parameters
importance
Prior art date: 2020-12-22
Legal status: Pending
Application number
CN202011533198.1A
Other languages
Chinese (zh)
Inventor
王昭
王子玮
张峰
Current Assignee: CETC Information Science Research Institute
Original Assignee: CETC Information Science Research Institute
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-12
Application filed by CETC Information Science Research Institute
Priority to CN202011533198.1A
Publication of CN112488306A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The present disclosure provides a neural network compression method, apparatus, electronic device and storage medium. The method comprises: a: acquiring a neural network to be compressed; b: obtaining the importance of each of a plurality of structured parameters in the neural network; c: pruning low-importance structured parameters from the neural network according to a first pruning proportion or a first pruning quantity; d: training the pruned neural network to update its structured parameters; e: repeating steps b to d until the number of structured parameters pruned from the neural network to be compressed reaches the target pruning quantity, or until the number of structured parameters remaining after pruning reaches the target model parameter count. The disclosed method, apparatus, device and medium effectively reduce model size, inference memory footprint and latency while preserving the performance of the pruned model, and offer good controllability of the pruning process.

Description

Neural network compression method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of information processing, and in particular to a neural network compression method and apparatus, an electronic device, and a storage medium.
Background
Neural networks have become a core technology of artificial intelligence. However, as deep learning has developed, network structures have grown more complex and model parameters more numerous, greatly increasing the storage and computing resources a model requires at run time. Taking the classic ResNet-152 model as an example, it has about 60.2 million parameters, requires more than 230 MB of storage, and needs billions of floating-point multiplications to process a single image. For resource-constrained intelligent hardware such as mobile phones and FPGAs, compressing neural network models and accelerating their inference is therefore essential.
Neural network pruning is a common model compression method, but existing pruning methods struggle to balance high compression ratios against model performance. Weight pruning reduces the number of stored weights and thus the required storage space, but does little to accelerate inference or reduce the memory footprint of the network model. Hard filter pruning cuts many convolution filters in a single pass, which makes model performance difficult to preserve. Soft filter pruning allows pruned filters to be updated during retraining after pruning, but it does not fully exploit the pruning information accumulated before retraining.
Disclosure of Invention
The present disclosure is directed to solving at least one of the technical problems of the prior art, and provides a neural network compression method, apparatus, electronic device, and storage medium.
One aspect of the present disclosure provides a neural network compression method, the method comprising:
step a: acquiring a neural network to be compressed;
step b: obtaining the importance of each of a plurality of structured parameters in the neural network;
step c: pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity;
step d: training the pruned neural network to update the structured parameters of the pruned neural network;
step e: repeating steps b to d until the number of structured parameters pruned from the neural network to be compressed reaches a preset target pruning quantity, or until the number of structured parameters remaining after pruning reaches a preset model parameter count.
Optionally, the preset first pruning proportion is less than or equal to 50%, or the preset first pruning quantity is less than or equal to 50% of the total number of structured parameters in the neural network to be compressed.
Optionally, after obtaining the importance of the plurality of structured parameters in the neural network, the method further includes:
ranking the importance of the plurality of structured parameters;
and the pruning of low-importance structured parameters according to a preset first pruning proportion or a preset first pruning quantity includes:
if the importance values are ranked from high to low, pruning the structured parameters ranked last according to the preset first pruning proportion or the preset first pruning quantity;
if the importance values are ranked from low to high, pruning the structured parameters ranked first according to the preset first pruning proportion or the preset first pruning quantity.
Optionally, obtaining the importance of the plurality of structured parameters in the neural network includes:
obtaining the importance of each structured parameter from the L2 norm and/or the geometric median of that parameter within the neural network.
Optionally, pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity includes:
deriving a second pruning quantity from the preset first pruning proportion and the total number of structured parameters in the neural network to be compressed, and hard-pruning that many structured parameters from the neural network; or, alternatively,
hard-pruning the first pruning quantity of structured parameters from the neural network.
Optionally, before repeating steps b to d until the number of structured parameters pruned from the neural network to be compressed reaches the preset target pruning quantity, or the number of structured parameters remaining after pruning reaches the preset model parameter count, the method further includes:
recording the number of pruning rounds performed;
and comparing the number of pruning rounds performed with a preset target round count; if the number of rounds performed exceeds the target round count, replacing the first pruning proportion in step b with a preset second pruning proportion, or replacing the first pruning quantity in step b with a preset third pruning quantity, where the second pruning proportion is greater than the first pruning proportion and the third pruning quantity is greater than the first pruning quantity.
Optionally, the structured parameters include filters and channels.
In another aspect of the present disclosure, a neural network compression apparatus is provided, the apparatus including:
a neural network acquisition module for acquiring a neural network to be compressed;
an importance acquisition module for obtaining the importance of each of a plurality of structured parameters in the neural network;
a pruning module for pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity;
and an updating module for training the pruned neural network to update the structured parameters of the pruned neural network.
In another aspect of the present disclosure, an electronic device is provided, including:
one or more processors;
a storage unit for storing one or more programs which, when executed by the one or more processors, enable the one or more processors to implement the compression method described above.
In another aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the compression method described above.
With the neural network compression method, apparatus, electronic device, and storage medium of the embodiments of the present disclosure, each pruning round removes only a small number of structured parameters, as set by the preset first pruning proportion or first pruning quantity, and pruning of the neural network is completed over multiple rounds. Because the model is retrained after each round, each subsequent round can make full use of the information from the previous rounds, overcoming the shortcoming of soft filter pruning, which cannot fully exploit pruning information gathered before retraining, and thereby improving the performance of the pruned model. Finally, since the first pruning proportion or first pruning quantity can be adapted to the actual use case, the trade-off between pruning speed and performance stability can be controlled, improving the controllability of the pruning process.
Drawings
FIG. 1 is a schematic block diagram of an example electronic device for implementing a neural network compression method and apparatus in accordance with an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a neural network compression method according to another embodiment of the present disclosure;
FIG. 3 is a block diagram of a neural network compression apparatus according to another embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical solutions of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings. It is to be understood that the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person skilled in the art from the described embodiments without inventive effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, technical and scientific terms used in the present disclosure have the ordinary meaning understood by those skilled in the art to which the disclosure belongs. The use of "including" or "comprising" and the like in this disclosure indicates the presence of the stated shapes, numbers, steps, actions, operations, members, elements and/or groups thereof, but does not preclude the presence or addition of one or more other shapes, numbers, steps, actions, operations, members, elements and/or groups thereof. Furthermore, the terms "first" and "second" are used for description only and should not be construed as indicating or implying relative importance or the number or order of the indicated technical features; a feature qualified by "first" or "second" may thus explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
In the description of the present disclosure, unless otherwise expressly specified or limited, the terms "mounted", "connected", and "fixed" are to be understood broadly: a connection may be mechanical or electrical, direct or through an intermediate medium, and may be internal to two elements or an interaction between two elements.
The relative arrangement of parts and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise. For ease of description, the dimensions of the elements shown in the figures are not drawn to scale. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, are to be regarded as part of the specification. Any specific value given in the examples shown and discussed here is illustrative; other examples may use different values. It should be noted that like symbols and letters denote like items in the figures, so once an item is defined in one figure it need not be discussed again in subsequent figures.
Before the detailed discussion, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes operations (steps) as sequential, many of the operations can be performed in parallel or concurrently, and the order of operations may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not shown in the figure. A process may correspond to a method, function, procedure, subroutine, and so on.
First, an example electronic device for implementing the neural network compression method and apparatus according to an embodiment of the present disclosure is described with reference to FIG. 1.
As shown in FIG. 1, the electronic device 200 includes one or more processors 210, one or more storage devices 220, an input device 230, an output device 240, and the like, interconnected via a bus system and/or another form of connection mechanism 250. It should be noted that the components and structure of the electronic device shown in FIG. 1 are exemplary only, not limiting; the electronic device may have other components and structures as desired.
The processor 210 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The storage device 220 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor may execute them to implement the client functionality (implemented by the processor) of the embodiments of the disclosure described below and/or other desired functionality. Various applications and data, such as data used and/or generated by the applications, may also be stored on the computer-readable storage medium.
The input device 230 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 240 may output various information (e.g., images or sounds) to an outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
For example, an electronic device for implementing the neural network compression method and apparatus according to an embodiment of the present disclosure may be a smartphone, a tablet computer, or the like.
Next, a neural network compression method according to an embodiment of the present disclosure is described with reference to FIG. 2. The method includes:
step a: acquiring the neural network to be compressed.
Specifically, the neural network here is a neural network model, and the model to be compressed may be constructed and pre-trained according to actual use requirements; for example, it may be a convolutional neural network. A person skilled in the art may select different neural network models and different pre-training methods according to actual requirements, which this embodiment does not limit.
Step b: and respectively obtaining the importance of a plurality of structural parameters in the neural network.
Specifically, the importance of each structured parameter in the neural network is obtained by a preset importance acquisition method, such as the L2 norm, the geometric median (GM), or a composite importance measure combining several methods; a person skilled in the art may choose the acquisition method according to the actual use case.
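As a concrete sketch (not part of the patent text), the two criteria named above can be computed per convolution filter roughly as follows; PyTorch and the function names are assumptions chosen for illustration, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

def filter_l2_importance(conv: nn.Conv2d) -> torch.Tensor:
    # conv.weight has shape (out_channels, in_channels, kH, kW); flatten
    # each output filter and take its L2 norm as its importance score.
    return conv.weight.detach().flatten(start_dim=1).norm(p=2, dim=1)

def filter_gm_importance(conv: nn.Conv2d) -> torch.Tensor:
    # Geometric-median style criterion (as in FPGM-type pruning): score
    # each filter by the sum of its distances to all other filters in
    # the layer; filters near the layer's geometric median (small score)
    # are the most redundant.
    flat = conv.weight.detach().flatten(start_dim=1)
    return torch.cdist(flat, flat, p=2).sum(dim=1)
```

Either score vector can then be ranked and thresholded as described in the following steps.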
Step c: and cutting off the low-importance structural parameters in the neural network according to a preset first cutting proportion or a preset first cutting quantity.
Specifically, in this step, compression of the neural network is achieved by pruning the structured parameters that matter least to the network. A person skilled in the art can set the first pruning proportion or the first pruning quantity according to actual requirements, which this embodiment does not limit. "Low importance" means importance that meets the pruning condition: for example, with a first pruning quantity of 2, the two lowest-ranked parameters are of low importance; or, with a first pruning proportion of 3% and 100 structured parameters in total in the neural network to be compressed, three parameters are pruned per round, so the three lowest-ranked parameters are of low importance.
Step d: and training the cut neural network to update the structural parameters of the cut neural network.
Specifically, in this step the pruned neural network is retrained to recover its accuracy. The training method may be the same as the pre-training method of step a, or a different one; a person skilled in the art may choose according to the actual use case, which this embodiment does not limit.
Step e: and c, repeating the steps b to d in sequence until the number of the structural parameters cut off in the neural network to be compressed accords with the preset target cutting number or the number of the structural parameters left after cutting in the neural network to be compressed accords with the preset model parameter number.
Specifically, in this step, steps b to d are performed repeatedly to prune the neural network over multiple rounds until the pruned structured parameters meet the preset condition. For example, the preset target pruning quantity may be 2/3 of the total number of structured parameters in the neural network to be compressed, and the preset model parameter count may be 1/3 of that total; a person skilled in the art may set the specific values according to the actual use case, which this embodiment does not limit.
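For illustration only, the loop of steps b to e might be sketched as below; `importance_fn`, `prune_fn`, and `retrain_fn` are hypothetical stand-ins for the operations described above, not functions defined by the patent.

```python
def iterative_prune(model, importance_fn, prune_fn, retrain_fn,
                    first_clip_num=2, target_clip_num=100):
    # Steps b-e: prune a few low-importance structured parameters,
    # retrain to recover accuracy, and repeat until the target is met.
    pruned = 0
    while pruned < target_clip_num:
        scores = importance_fn(model)                     # step b
        k = min(first_clip_num, target_clip_num - pruned)
        prune_fn(model, scores, k)                        # step c: remove the k lowest-scored parameters
        retrain_fn(model)                                 # step d: repair accuracy
        pruned += k                                       # step e: loop until the target pruning quantity
    return model
```

Stopping on the remaining-parameter count instead (the alternative condition in step e) would only change the loop guard.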
With the neural network compression method of this embodiment, each pruning round removes only a small number of structured parameters, and pruning is completed over multiple rounds. This progressive process limits the impact of any single round on model performance: the model is trimmed gradually, which preserves its performance to a large extent and avoids the difficulty faced by hard filter pruning, where many convolution filters are cut at once. In addition, because the model is retrained after each round, subsequent rounds can make full use of the information from previous rounds, overcoming the shortcoming of soft filter pruning, which cannot fully exploit pruning information gathered before retraining, and further improving the performance of the pruned model. Finally, the first pruning proportion or first pruning quantity can be preset adaptively for the actual use case, giving control over the trade-off between pruning speed and performance stability and improving the controllability of the pruning process.
The steps of the neural network compression method are further explained below.
For example, the preset first pruning proportion is less than or equal to 50%, or the preset first pruning quantity is less than or equal to 50% of the total number of structured parameters in the neural network to be compressed; this guarantees that pruning is spread over multiple rounds. In practice, both values are kept small to keep the progressive pruning smooth: for example, a first pruning proportion of at most 5% (say 2%), or a first pruning quantity of at most 5 (say 2). Pruning then takes more rounds to complete, which smooths the progressive process and better preserves the performance of the pruned network.
For example, after the importance of the plurality of structured parameters in the neural network has been obtained, the method further includes:
b1: ranking the importance of the plurality of structured parameters.
Specifically, a person skilled in the art may use any sorting method or ordering to rank the importance of the structured parameters, making it easier to locate the low-importance parameters during subsequent pruning.
On this basis, step c, pruning low-importance structured parameters according to the preset first pruning proportion or the preset first pruning quantity, is specifically:
c1: if the importance values are ranked from high to low, pruning the structured parameters ranked last according to the preset first pruning proportion or the preset first pruning quantity.
c2: if the importance values are ranked from low to high, pruning the structured parameters ranked first according to the preset first pruning proportion or the preset first pruning quantity.
That is, the importance values are sorted in step b, and step c reads off the sorted ranking, so the low-importance structured parameters are found more quickly and pruning efficiency improves.
For example, step b, obtaining the importance of the structured parameters, is specifically: obtaining the importance of each structured parameter from its L2 norm and/or its geometric median. In step b the L2 norm of each structured parameter may serve as its importance, or the geometric median criterion may serve as its importance, or the two may be weighted by preset weights, with the weighted result taken as the importance. A person skilled in the art may choose the weights of the L2 norm and the geometric median, or choose a different importance acquisition method altogether, according to the actual use case; this embodiment does not limit it.
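Continuing the earlier sketch, a weighted combination of the two criteria might look like this; the normalization and the 0.5/0.5 weights are illustrative assumptions, since the patent leaves the weights to the practitioner.

```python
def combined_importance(conv, w_l2=0.5, w_gm=0.5):
    # Normalize each score vector so the preset weights are comparable,
    # then take the weighted sum as the final importance (step b).
    l2 = filter_l2_importance(conv)
    gm = filter_gm_importance(conv)
    l2 = l2 / (l2.max() + 1e-12)
    gm = gm / (gm.max() + 1e-12)
    return w_l2 * l2 + w_gm * gm
```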
For example, step c, pruning low-importance structured parameters according to the preset first pruning proportion or the preset first pruning quantity, is specifically: if the first pruning proportion determines how many structured parameters each round removes, a second pruning quantity is derived from the preset first pruning proportion and the total number of structured parameters in the neural network to be compressed, and that many structured parameters are removed by hard pruning; if the first pruning quantity is used directly, that many structured parameters are removed by hard pruning.
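Hard pruning here means physically removing the selected filters rather than merely zeroing them as soft pruning does. One hedged way to express that in the same PyTorch sketch, rebuilding a convolution with fewer output channels, is shown below; adjusting downstream layers that consume the removed channels is omitted for brevity.

```python
def hard_prune_filters(conv: nn.Conv2d, scores: torch.Tensor,
                       clip_num: int) -> nn.Conv2d:
    # Keep every filter except the clip_num lowest-scored ones (clip_num
    # being the first pruning quantity, or the second pruning quantity
    # when derived from the first pruning proportion).
    keep = scores.argsort(descending=True)[:conv.out_channels - clip_num]
    keep, _ = keep.sort()  # preserve the original channel order
    new_conv = nn.Conv2d(conv.in_channels, len(keep),
                         kernel_size=conv.kernel_size, stride=conv.stride,
                         padding=conv.padding, bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[keep].clone()
    return new_conv
```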
For example, before step e, that is, before repeating steps b to d until the number of structured parameters pruned from the neural network to be compressed reaches the preset target pruning quantity or the number remaining reaches the preset model parameter count, the method further includes:
f1: recording the number of pruning rounds performed.
Specifically, the round counter is initialized to 0 before compression begins, and is incremented by 1 each time a pruning round is performed.
f2: comparing the number of pruning rounds performed with the preset target round count; if the number of rounds performed exceeds the target round count, replacing the first pruning proportion in step b with a preset second pruning proportion, or replacing the first pruning quantity in step b with a preset third pruning quantity, where the second pruning proportion is greater than the first pruning proportion and the third pruning quantity is greater than the first pruning quantity.
Specifically, when compression is found to be too slow, that is, many rounds have been performed but compression is not yet complete, the compression speed is increased by pruning more structured parameters per round: a larger second pruning proportion replaces the original smaller first pruning proportion, or a larger third pruning quantity replaces the original smaller first pruning quantity. A person skilled in the art can set the second pruning proportion or the third pruning quantity according to the actual use case, for example by doubling: a first pruning proportion of 3% becomes a second pruning proportion of 6%, or a first pruning quantity of 2 becomes a third pruning quantity of 4. The target round count can likewise be set according to actual requirements, for example 30 or 50 rounds; this embodiment imposes no particular limit.
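A minimal sketch of the f1/f2 bookkeeping, using the doubled example values from the text (a quota of 2 rising to 4); the function and parameter names are illustrative assumptions.

```python
def per_round_quota(rounds_done: int, target_rounds: int = 30,
                    first_num: int = 2, third_num: int = 4) -> int:
    # f2: once more pruning rounds than the preset target round count
    # have been performed, switch to the larger third pruning quantity
    # to speed up compression.
    return third_num if rounds_done > target_rounds else first_num
```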
It should be noted that steps f1 and f2 may be placed anywhere after step a and before step e, for example after step b, after step c, or after step d. If they are placed after step a and before step b, then step e must repeat steps f1, f2, b, c, and d in sequence so that the round count is recorded in every pruning round.
In the neural network compression method of this embodiment: first, presetting a small first pruning proportion or first pruning quantity keeps the progressive pruning smooth; second, ranking the importance of the structured parameters before pruning makes it more efficient to locate and remove the low-importance parameters; third, deriving importance from the L2 norm and/or the geometric median of the structured parameters improves the accuracy and adaptability of the importance estimates; and finally, when too many rounds have been performed and compression is too slow, increasing the number of structured parameters pruned per round speeds up compression. The per-round quota can thus be adjusted adaptively, improving the controllability of the pruning process and accelerating compression while preserving the performance of the pruned model.
Next, a neural network compression apparatus 100 according to another embodiment of the present disclosure is described with reference to FIG. 3. The apparatus includes:
a neural network acquisition module 110 for acquiring a neural network to be compressed;
an importance acquisition module 120 for obtaining the importance of each of a plurality of structured parameters in the neural network, for example by the L2 norm criterion or the geometric median (GM) criterion;
a pruning module 130 for pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity;
and an updating module 140 for training the pruned neural network to update the structured parameters of the pruned neural network.
For example, the compression apparatus 100 further includes:
a judging module 150 for judging whether the number of structured parameters pruned from the neural network to be compressed reaches the preset target pruning quantity, or whether the number of structured parameters remaining after pruning reaches the preset model parameter count.
For example, after the updating module 140 finishes updating the structured parameters of the pruned neural network, the judging module 150 checks whether the number of structured parameters pruned from the neural network to be compressed reaches the preset target pruning quantity, or whether the number remaining reaches the preset model parameter count. If either condition holds, pruning of the neural network is complete. Otherwise, the importance acquisition module 120 obtains the importance of the structured parameters of the updated network again, the pruning module 130 again prunes the low-importance structured parameters of the updated network according to the preset first pruning proportion or the preset first pruning quantity, the updating module 140 trains the re-pruned network and updates its structured parameters, and the judging module 150 judges again. If the stopping condition is met, pruning stops; otherwise the process repeats until the judging module determines that the condition for stopping is satisfied.
With the neural network compression apparatus of this embodiment, each pruning round removes only a small number of structured parameters and pruning is completed over multiple rounds, so the progressive process limits the impact of any single round on model performance, preserving it to a large extent and avoiding the difficulty of hard filter pruning, where many convolution filters are cut at once. Retraining after each round lets subsequent rounds make full use of the information from previous rounds, overcoming the shortcoming of soft filter pruning, which cannot fully exploit pruning information gathered before retraining, and further improving the performance of the pruned model. And because the first pruning proportion or first pruning quantity can be preset adaptively for the actual use case, the trade-off between pruning speed and performance stability remains controllable, improving the controllability of the pruning process.
For example, the importance acquisition module 120 further includes:
a ranking submodule 121 for ranking the importance of the plurality of structured parameters.
For example, the pruning module 130 further includes:
a back-end pruning submodule for pruning the structured parameters ranked last, according to the preset first pruning proportion or the preset first pruning quantity, when the importance values are ranked from high to low;
and a front-end pruning submodule for pruning the structured parameters ranked first, according to the preset first pruning proportion or the preset first pruning quantity, when the importance values are ranked from low to high.
With the neural network compression apparatus of this embodiment, the importance of the structured parameters is ranked before pruning, which makes it more efficient to locate and prune the low-importance parameters.
The computer-readable medium may be included in the above apparatus, device, or system, or may exist separately.
The computer-readable storage medium may be any tangible medium that can contain or store a program; it may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, an optical fiber, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
The computer-readable storage medium may also include a propagated data signal with computer-readable program code embodied in it, for example in baseband or as part of a carrier wave, where the signal may take any suitable form capable of carrying the program code.
In this specification, reference to "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. Such terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, a person skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not contradict each other.
It is to be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present disclosure, and the disclosure is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such changes and modifications are also considered within the scope of protection of the disclosure.

Claims (10)

1. A neural network compression method, the method comprising:
step a: acquiring a neural network to be compressed;
step b: obtaining the importance of each of a plurality of structured parameters in the neural network;
step c: pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity;
step d: training the pruned neural network to update the structured parameters of the pruned neural network;
step e: repeating steps b to d until the number of structured parameters pruned from the neural network to be compressed reaches a preset target pruning quantity, or until the number of structured parameters remaining after pruning reaches a preset model parameter count.
2. The compression method according to claim 1, wherein the preset first pruning proportion is less than or equal to 50%, or the preset first pruning quantity is less than or equal to 50% of the total number of structured parameters in the neural network to be compressed.
3. The compression method according to claim 1, further comprising, after obtaining the importance of the plurality of structured parameters in the neural network:
ranking the importance of the plurality of structured parameters;
wherein pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity comprises:
if the importance values are ranked from high to low, pruning the structured parameters ranked last according to the preset first pruning proportion or the preset first pruning quantity;
if the importance values are ranked from low to high, pruning the structured parameters ranked first according to the preset first pruning proportion or the preset first pruning quantity.
4. The compression method according to claim 1, wherein obtaining the importance of the plurality of structured parameters in the neural network comprises:
obtaining the importance of each structured parameter from the L2 norm and/or the geometric median of that parameter within the neural network.
5. The compression method according to claim 1, wherein pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity comprises:
deriving a second pruning quantity from the preset first pruning proportion and the total number of structured parameters in the neural network to be compressed, and hard-pruning that many structured parameters from the neural network; or, alternatively,
hard-pruning the first pruning quantity of structured parameters from the neural network.
6. The compression method according to any one of claims 1 to 5, further comprising, before repeating steps b to d until the number of structured parameters pruned from the neural network to be compressed reaches the preset target pruning quantity or the number of structured parameters remaining after pruning reaches the preset model parameter count:
recording the number of pruning rounds performed;
and comparing the number of pruning rounds performed with a preset target round count, and if the number of rounds performed exceeds the target round count, replacing the first pruning proportion in step b with a preset second pruning proportion, or replacing the first pruning quantity in step b with a preset third pruning quantity, wherein the second pruning proportion is greater than the first pruning proportion and the third pruning quantity is greater than the first pruning quantity.
7. The compression method according to any one of claims 1 to 5, wherein the structured parameters comprise filters and channels.
8. A neural network compression apparatus, the apparatus comprising:
a neural network acquisition module for acquiring a neural network to be compressed;
an importance acquisition module for obtaining the importance of each of a plurality of structured parameters in the neural network;
a pruning module for pruning low-importance structured parameters from the neural network according to a preset first pruning proportion or a preset first pruning quantity;
and an updating module for training the pruned neural network to update the structured parameters of the pruned neural network.
9. An electronic device, comprising:
one or more processors;
a storage unit for storing one or more programs which, when executed by the one or more processors, enable the one or more processors to implement the compression method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the compression method according to any one of claims 1 to 7.
CN202011533198.1A 2020-12-22 2020-12-22 Neural network compression method and device, electronic equipment and storage medium Pending CN112488306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011533198.1A CN112488306A (en) 2020-12-22 2020-12-22 Neural network compression method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011533198.1A CN112488306A (en) 2020-12-22 2020-12-22 Neural network compression method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112488306A (en) 2021-03-12

Family

ID=74915358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011533198.1A Pending CN112488306A (en) 2020-12-22 2020-12-22 Neural network compression method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112488306A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052257A (en) * 2021-04-13 2021-06-29 中国电子科技集团公司信息科学研究院 Deep reinforcement learning method and device based on visual converter
CN113052257B (en) * 2021-04-13 2024-04-16 中国电子科技集团公司信息科学研究院 Deep reinforcement learning method and device based on visual transducer
CN113659992A (en) * 2021-07-16 2021-11-16 深圳智慧林网络科技有限公司 Data compression method and device and storage medium
CN113659992B (en) * 2021-07-16 2023-08-11 深圳智慧林网络科技有限公司 Data compression method and device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210312)