CN108416423B - Automatic threshold for neural network pruning and retraining - Google Patents


Info

Publication number
CN108416423B
CN108416423B (application CN201810100412.0A)
Authority
CN
China
Prior art keywords
pruning
neural network
layer
weights
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810100412.0A
Other languages
Chinese (zh)
Other versions
CN108416423A (en)
Inventor
冀正平
约翰·韦克菲尔德·布拉泽斯
伊利亚·奥夫相尼科夫
沈恩寿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN108416423A publication Critical patent/CN108416423A/en
Application granted granted Critical
Publication of CN108416423B publication Critical patent/CN108416423B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

An embodiment includes a method comprising: pruning layers of a neural network having a plurality of layers using a threshold; and repeating pruning the layers of the neural network using different thresholds until a pruning error of the pruned layers reaches a pruning error tolerance.

Description

Automatic threshold for neural network pruning and retraining
Cross Reference to Related Applications
The present application claims priority from U.S. provisional patent application No. 62/457,806, filed on February 10, 2017, the contents of which are incorporated herein by reference in their entirety for all purposes.
Technical Field
The present disclosure relates to pruning and retraining of neural networks, and in particular, pruning and retraining of neural networks using automatic thresholds.
Background
Deep learning architectures, particularly convolutional deep neural networks, have been used in the fields of Artificial Intelligence (AI) and computer vision. These architectures have been shown to produce strong results on tasks including visual object recognition, detection, and segmentation. However, these architectures may have a large number of parameters, resulting in higher computational loads and increased power consumption.
Disclosure of Invention
An embodiment includes a method comprising: pruning layers of a neural network having a plurality of layers using a threshold; and repeating pruning the layers of the neural network using different thresholds until a pruning error of the pruned layers reaches a pruning error tolerance.
An embodiment includes a method comprising: the following steps are repeated: pruning multiple layers of the neural network using the automatically determined threshold; and retraining the neural network using only weights remaining after pruning.
Embodiments include a system comprising: a memory; and a processor coupled to the memory and configured to: pruning layers of a neural network having a plurality of layers using a threshold; and repeating pruning the layers of the neural network using different thresholds until a pruning error of the pruned layers reaches a pruning error tolerance.
Drawings
FIGS. 1A-1B are flowcharts of techniques for automatically determining thresholds, according to some embodiments.
FIG. 2 is a flow diagram of a retraining operation according to some embodiments.
FIGS. 3A-3B are flowcharts of retraining operations according to some embodiments.
FIG. 4 is a flow diagram of a technique for automatically determining thresholds, pruning, and retraining according to some embodiments.
FIG. 5 is a set of curves illustrating a retraining operation according to some embodiments.
FIG. 6 is a graph including results of various neural networks after pruning and retraining according to some embodiments.
FIGS. 7A-7C are graphs including results of various techniques for pruning a neural network, according to some embodiments.
FIG. 8 illustrates a system according to some embodiments.
Detailed Description
Embodiments relate to pruning and retraining of neural networks, and in particular, pruning and retraining of neural networks using automatic thresholds. The following description is presented to enable one of ordinary skill in the art to make and use the embodiments and is provided in the context of a patent application and its requirements. Various modifications to the embodiments and the generic principles and features described herein will be readily apparent. The embodiments are described primarily in terms of specific methods, apparatuses, and systems provided in the detailed description.
However, the methods, apparatuses, and systems will operate effectively in other embodiments. Phrases such as "an embodiment," "one embodiment," and "another embodiment" may refer to the same or different embodiments as well as to multiple embodiments. Embodiments will be described with respect to systems and/or devices having particular components. However, the systems and/or devices may include more or fewer components than those shown, and variations in the arrangement and type of components may be made without departing from the scope of the disclosure. Embodiments will also be described in the context of specific methods having certain operations. However, the methods and systems may operate according to other methods with different and/or additional operations, and with operations in different orders and/or in parallel, that are not inconsistent with the embodiments. Thus, the embodiments are not intended to be limited to the specific embodiments shown, but are to be accorded the widest scope consistent with the principles and features described herein.
Embodiments are described in the context of a particular system or device having certain components. One of ordinary skill in the art will readily recognize that embodiments are consistent with the use of systems or devices having other and/or additional components and/or other features. Methods, apparatus, and systems may also be described in the context of a single element. However, one of ordinary skill in the art will readily recognize that: these methods and systems are consistent with using an architecture having multiple elements.
Those skilled in the art will appreciate that: terms used herein and particularly in the appended claims (e.g., the text of the appended claims) are generally intended to be "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "including" should be interpreted as "including but not limited to," etc.). Those skilled in the art will further understand that: if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" and/or "an" should be interpreted to mean "at least one" or "one or more"), the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to examples containing only one such recitation; the same holds true for the use of definite articles used to introduce claim recitations. Further, in those instances where a convention analogous to "at least one of A, B or C, etc." is used, such a construction in general is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B and C together, etc.). 
Those skilled in the art will further understand that: whether in the specification, claims, or drawings, virtually any disjunctive word and/or phrase presenting two or more alternative terms should be understood to contemplate the possibilities including one of the terms, either of the terms, or both terms. For example, the phrase "a or B" will be understood to include the possibilities of "a" or "B" or "a and B".
In some embodiments, a neural network (such as a deep learning neural network) may be created with a reduced parameter size. As a result, the performance level of an image recognition task can be maintained while reducing the load on the neural network hardware. In some embodiments, to reduce the parameter size of the neural network, the neural network may be pruned to zero out many parameters. However, the problem is how to set a good threshold for each layer of the neural network so as to prune the network as much as possible while maintaining its original performance. For a neural network consisting of tens of layers, a brute-force search for thresholds may not be practical, especially considering that the threshold of one layer may depend on the other layers. In addition, pruning may require retraining the network to restore the original performance, and such a pruning process may take a significant amount of time to verify as effective. As described herein, in various embodiments, automatic selection of thresholds, along with a method of retraining the network, may be used to prune a neural network to reduce its parameters.
FIG. 1A is a flow diagram of a technique for automatically determining a threshold according to some embodiments. At 100, a threshold for pruning a layer of a neural network is initialized. In some embodiments, the threshold may be initialized to an extreme of the range of values, e.g., 0 or 1. In other embodiments, the threshold may be initialized to a particular value such as 0.2, 0.5, etc. In other embodiments, the threshold may be set to a random value. In other embodiments, the threshold may be determined using empirical rules. In some embodiments, the threshold may be set to a value that was automatically determined for another layer, such as a similarly situated layer or a layer of a similar or the same type. In some embodiments, the initial threshold may be the same for all layers of the neural network; however, in other embodiments, the threshold may differ for some layers or for each layer.
In 102, a layer of the neural network is pruned using the threshold. For example, the threshold is used to set some of the weights for that layer to zero. At 104, a pruning error is calculated for the pruned layer. The pruning error is a function of the weights before and after pruning.
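As a minimal sketch of steps 102 and 104, pruning a layer and measuring its pruning error might look as follows. The element-wise rule and the error measure here are assumptions, since Equations 1 and 2 are not reproduced in this text; the threshold is scaled by the standard deviation of the layer's weights, as described later in this document.

```python
import math

def prune_layer(weights, threshold):
    """Zero out every weight whose magnitude falls below the threshold
    scaled by the standard deviation of the layer's weights."""
    n = len(weights)
    mean = sum(weights) / n
    sigma = math.sqrt(sum((w - mean) ** 2 for w in weights) / n)
    cutoff = threshold * sigma
    return [0.0 if abs(w) < cutoff else w for w in weights]

def pruning_error(original, pruned):
    """One plausible error measure: total magnitude removed, averaged
    over the surviving weights. The text says only that the error is a
    function of the weights before and after pruning."""
    removed = sum(abs(o - p) for o, p in zip(original, pruned))
    survivors = sum(1 for p in pruned if p != 0.0)
    return removed / max(survivors, 1)
```

For example, pruning `[0.5, -0.01, 0.3, 0.02, -0.4]` with a threshold of 0.1 zeroes only the two smallest weights, and the error reflects how much magnitude was discarded relative to the three survivors.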
At 106, the pruning error (PE) is compared to a pruning error tolerance (PEA). In some embodiments, the pruning error may have reached the pruning error tolerance if it is equal to the pruning error tolerance. However, in other embodiments, the pruning error may have reached the pruning error tolerance if it is within a range that includes the pruning error tolerance. Alternatively, the pruning error tolerance may itself be a range of acceptable pruning errors. For example, a relatively small number may be used to define a range above and below a particular pruning error tolerance. If the pruning error and the pruning error tolerance are separated by less than that relatively small number, the pruning error is considered to have reached the pruning error tolerance.
If the pruning error has not reached the pruning error tolerance, the threshold is changed at 108. Changing the threshold may be performed in various ways. For example, the threshold may be changed by a fixed amount. In other embodiments, the threshold may be changed by an amount based on the difference between the pruning error and the pruning error tolerance. In other embodiments, the threshold may be changed by an amount based on the current threshold. In other embodiments, another threshold may be selected using a search technique such as a binary search or another type of search. The technique for changing the threshold may (but need not) be the same for all layers.
Regardless of how the change is made, after the threshold is changed at 108, the process is repeated by pruning the layer at 102 and calculating the pruning error at 104. The pruning error is again compared to the pruning error tolerance to determine whether it has been reached. Thus, pruning of the layer of the neural network is repeated using different thresholds until the pruning error reaches the pruning error tolerance.
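The loop of 102 through 108 can be sketched as follows. This is a hedged sketch: `prune_fn` and `error_fn` are hypothetical stand-ins for the pruning and error steps, and the halving step is only one of the threshold-update strategies mentioned above.

```python
def find_threshold(weights, prune_fn, error_fn, tolerance,
                   init_threshold=0.5, init_step=0.25,
                   theta=0.05, max_iters=50):
    """Search for a per-layer threshold: prune, measure the pruning
    error, and move the threshold until the error reaches the
    tolerance (within theta). The step is halved each iteration, a
    binary-search-like variant of the fixed-step update."""
    t, step = init_threshold, init_step
    pruned = prune_fn(weights, t)
    for _ in range(max_iters):
        err = error_fn(weights, pruned)
        if abs(err - tolerance) <= theta:   # error reached tolerance
            break
        # prune more aggressively when under tolerance, less when over
        t = t + step if err < tolerance else t - step
        step /= 2.0
        pruned = prune_fn(weights, t)
    return t, pruned
```

With a toy error function equal to the fraction of zeroed weights, the loop converges to a threshold that zeroes the requested fraction of the layer.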
FIG. 1B is a flow diagram of a technique for automatically determining a threshold according to some embodiments. In this embodiment, the pruning technique is similar to that of FIG. 1A; a description of similar operations is omitted. In some embodiments, a pruning error tolerance is initialized at 101. In other embodiments, a percentage or percentage range of non-zero weights may be initialized. In other embodiments, a combination of the threshold, the pruning error tolerance, and the percentage or percentage range of non-zero weights may be initialized. The pruning error tolerance and/or the percentage or percentage range of non-zero weights may be initialized using techniques similar to those described above for initializing the threshold.
At 110, after the pruning error reaches the pruning error tolerance, the percentage of non-zero weights is calculated and compared to an acceptable percentage or percentage range of non-zero weights. In this embodiment, the number of pruned weights is expressed as a percentage; however, in other embodiments, it may be represented in a different manner. Although the percentage of non-zero weights has been used as an example, in other embodiments the percentage of pruned weights may be used and compared to a corresponding range or value.
If the percentage is not within the range for that layer, the pruning error tolerance is changed at 112. The pruning error tolerance may be changed using techniques such as those described above for changing the threshold in 108; the technique may be the same as or different from the one used to change the threshold. The technique for changing the pruning error tolerance may (but need not) be the same for all layers.
After the pruning error tolerance has changed, the layer may be pruned again at 102. Subsequent operations may be performed until the percentage of pruned weights is within the acceptable range. At 114, the next layer may be processed similarly. Thus, this process may be repeated for each layer of the neural network.
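The outer loop of FIG. 1B can be sketched as follows, assuming a hypothetical `search_fn` stand-in for the inner threshold search of FIG. 1A; the fixed adjustment amount `tau` is an illustrative choice.

```python
def tune_tolerance(weights, search_fn, p_range,
                   tolerance=8, tau=1, max_iters=100):
    """Outer loop over the pruning-error tolerance: after the inner
    threshold search converges, check the fraction p of non-zero
    weights and move the tolerance down or up by tau until p falls
    inside the acceptable range."""
    lo, hi = p_range
    for _ in range(max_iters):
        threshold, pruned = search_fn(weights, tolerance)
        p = sum(1 for w in pruned if w != 0.0) / len(pruned)
        if lo <= p <= hi:
            break
        # too few survivors -> decrease the tolerance; too many -> increase it
        tolerance = tolerance - tau if p < lo else tolerance + tau
    return tolerance, threshold, pruned
```

A toy `search_fn` that simply zeroes weights below the tolerance shows the loop walking the tolerance down until the non-zero fraction lands in the target range.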
Techniques according to some embodiments allow the process to start from a single threshold and/or a single pruning error tolerance for all layers of the neural network. However, each layer will eventually have a threshold automatically determined for that particular layer. If a single fixed threshold were used for two or more layers, up to all layers, the threshold may not be optimal for one or more of those layers. Furthermore, since pruning techniques according to some embodiments focus on a single layer at a time, the threshold may be determined specifically for that layer.
In some embodiments, the percentage of non-zero weights may be a single control, or a single type of control, for pruning the neural network. As described above, the pruning error tolerance is changed until the percentage of non-zero weights is within the desired range. Similarly, the threshold is changed until the pruning error reaches the pruning error tolerance. Thus, by setting the percentage of non-zero weights, the pruning error tolerance and the threshold will be changed to achieve the desired percentage.
In some embodiments, the pruning error tolerance and/or the threshold may also be initialized. For example, the pruning error tolerance may be initialized to bias the result of the operation toward a particular side of the percentage range of non-zero weights, or toward a particular location within the range.
In some embodiments, the threshold may be determined as follows. The pruning error tolerance ε* is initialized. For each layer l, the threshold T_l is initialized using the techniques described above. Each weight w_i of layer l is then pruned using the threshold T_l. Equation 1 is an example of how the weights may be pruned.
In some embodiments, the threshold T_l may be scaled by a scaling factor. Here, the threshold T_l is scaled by σ(w), the standard deviation of all weights within the layer. However, in other embodiments, the threshold T_l may be scaled by a different scaling factor.
Once the layer is pruned, a pruning error ε is calculated. Equation 2 is an example of how the pruning error may be calculated.
Here, w_pruned is the vector of pruned weights and w is the vector of original weights before pruning. D(w) is the total length of w. Thus, the resulting pruning error ε is based on both the amount of error and the number of pruned weights.
The pruning error ε may be compared with the pruning error tolerance ε*. Equation 3 is an example of such a comparison.
|ε − ε*| > θ (3)
Here, θ is a number defining a range centered on the pruning error tolerance ε*. In some embodiments, θ is 0; however, in other embodiments, θ is a relatively small number. In still other embodiments, θ is a number defining the size of the range.
If the difference between the pruning error ε and the pruning error tolerance ε* is smaller than θ, the pruning error ε has reached the pruning error tolerance ε*. If not, the threshold T_l may be varied as described in Equation 4.
Here, ζ is a constant by which the threshold T_l may be changed. As described above, in other embodiments the threshold T_l may be varied in different ways. For example, ζ may be a value that is progressively decreased by a factor of 2 at each iteration. Regardless, once the threshold T_l is changed, pruning and the subsequent steps may be performed as described above using the updated threshold T_l.
If the pruning error ε has reached the pruning error tolerance ε*, the percentage of non-zero weights may be checked. Equation 5 is an example of calculating the percentage p.
The percentage p is then compared to a range of acceptable percentages. In some embodiments, the range of acceptable percentages may be the same for all layers; however, in other embodiments, the range may be different. In particular, the range may depend on the type of layer. For example, for convolutional layers, the percentage p may range between 0.2 and 0.9, while for other layers (e.g., fully connected layers) the range may be between 0.04 and 0.2.
If the percentage p is less than the lower end of the range for that layer, the pruning error tolerance ε* is decreased as in Equation 6. Similarly, if the percentage is greater than the upper end of the range for that layer, the pruning error tolerance ε* is increased as in Equation 7.
ε* = ε* − τ (6)
ε* = ε* + τ (7)
After the pruning error tolerance ε* has changed, the pruning may be repeated until the pruning error ε reaches the new pruning error tolerance ε*. In some embodiments, the threshold T_l from the previous iteration may be maintained; however, in other embodiments, the threshold T_l may be different, for example reinitialized to the original initial value or initialized according to an initialization algorithm. For example, the threshold T_l for the next iteration may be initialized based on the past threshold T_l but adjusted in a direction expected to reduce the number of pruning iterations needed to reach the new pruning error tolerance ε*.
The above technique may be repeated until the percentage p is within the acceptable range for the layer. The operation may then be repeated for other layers of the neural network. In some embodiments, the initial pruning error tolerance ε* and the initial threshold T_l for a layer may be selected with or without depending on previously pruned layers. For example, for two similarly situated layers, the later-pruned layer may use the resulting pruning error tolerance ε* and threshold T_l from the earlier-pruned layer.
Thus, by pruning according to the techniques described herein, in some embodiments the pruning threshold for each layer may be automatically determined. That is, the threshold may be determined so that the pruned layer retains a particular range of non-zero weights and/or satisfies a particular pruning error tolerance. The threshold may be different for one or more layers (including all layers), depending on the particular layer.
FIG. 2 is a flow diagram of a retraining operation according to some embodiments. In 200, various parameters may be initialized. For example, a base learning rate, a counter for the number of iterations, etc. may be initialized.
In 202, layers of the neural network are pruned using automatically determined thresholds. Specifically, the threshold for a layer may be automatically generated as described above. In some embodiments, all layers may be pruned; however, as will be described in further detail below, in some embodiments fewer than all of the layers may be pruned.
As a result of the pruning, only the non-zero weights of the neural network remain. At 204, the neural network is retrained using those non-zero weights. The pruning and retraining operations are repeated until the desired number of iterations is completed. For example, at 206, the number of iterations may be compared to the required number. If the number of iterations has not reached the desired number, the pruning and retraining may be repeated.
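The alternation of FIG. 2 can be sketched as follows. Here `retrain_fn` is a toy stand-in for a full retraining pass (an assumption, not the patent's training procedure); the point illustrated is that re-applying the prune step can zero weights that retraining pushed below the threshold, while previously zeroed weights stay zero.

```python
def prune_and_retrain(weights, threshold, retrain_fn, n_iters):
    """Alternate pruning and retraining of the surviving weights."""
    for _ in range(n_iters):
        # prune with the (automatically determined) threshold
        weights = [0.0 if abs(w) < threshold else w for w in weights]
        # retrain only the remaining non-zero weights
        weights = [retrain_fn(w) if w != 0.0 else 0.0 for w in weights]
    return weights
```

For example, with a toy update that halves each surviving weight, a weight that starts above the threshold can fall below it and be removed on the next pruning pass.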
FIGS. 3A-3B are flowcharts of retraining operations according to some embodiments. Referring to FIG. 3A, various parameters may be initialized in 300, similar to those described above in 200. In 302, the convolutional (CONV) layers are pruned using the thresholds automatically determined for those layers. Although convolutional layers have been used as an example, in other embodiments other subsets of layers, whether of different types or containing fewer than all of the layers, may be pruned.
At 304, the neural network is retrained using non-zero weights. In some embodiments, retraining continues for a particular number of iterations. In other embodiments, retraining continues until the retraining has covered all of the training sample set.
In 306, the number of iterations is compared to a threshold. If the number of iterations is less than the threshold, the pruning and retraining in 302 and 304 are repeated. Specifically, after the retraining in 304, when the pruning in 302 is performed again, some non-zero weights that previously survived the earlier pruning operation may have been reduced below the pruning threshold of the associated layer. Those weights may therefore be set to zero, and the remaining non-zero weights retrained in 304.
If the number of iterations has reached the threshold in 306, a set of layers in the neural network having a different type than the layers pruned in 302 is fixed in 308. That is, during the subsequent retraining in 304, the fixed layers are not retrained. In some embodiments, the fully connected (FC) and input (IP) layers are fixed. The pruning at 302 and retraining at 304 may be repeated until the desired number of iterations is completed at 310.
Referring to FIG. 3B, at 312, the layers pruned at 302 of FIG. 3A are fixed. In this example, the convolutional layers are the layers pruned at 302. Thus, the convolutional layers are fixed at 312.
At 314, the layers fixed at 308 are pruned using the automatically determined thresholds associated with those layers. In this example, these are the FC/IP layers that were fixed in step 308.
At 316, the retraining rate may be adjusted based on the pruning rate. In particular, since pruning reduces the number of weights, the dropout rate may be changed accordingly to suit the lower number of non-zero weights.
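The text does not give the adjustment formula. One commonly used heuristic from the pruning literature, shown here purely as an assumption, scales the dropout rate by the square root of the surviving-connection ratio:

```python
import math

def adjusted_dropout(base_rate, n_original, n_remaining):
    """Hypothetical adjustment: scale the dropout rate by the square
    root of the fraction of connections that survived pruning. The
    patent text says only that the rate changes to suit fewer
    non-zero weights; this formula is not taken from the patent."""
    return base_rate * math.sqrt(n_remaining / n_original)
```

For instance, if pruning keeps 25 of 100 connections, a base dropout rate of 0.5 would be reduced to 0.25 under this heuristic.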
At 318, the neural network is retrained. However, as noted at 312, the convolutional layers are fixed and therefore are not retrained. At 320, if the number of iterations has not been completed, the pruning and retraining at 314 and 318 are repeated.
In some embodiments, other than the initialization in 300, the remaining operations in FIG. 3A may not be performed. That is, operations may begin at 312, where the convolutional layers are fixed. In some embodiments, the convolutional layers may be pruned using their respective automatically determined thresholds before being fixed.
Although particular types of layers have been used as examples of layers that are pruned, retrained, and fixed, in other embodiments the types may be different. Further, in some embodiments, the first set of layers pruned at 302 and fixed at 312 and the second set of layers fixed at 308 and pruned at 314 may together form the entire set of layers. However, in other embodiments, the pruning, retraining, and fixing of other sets of layers need not follow the techniques used for the first or second sets. For example, a third set of layers may be pruned at 302 but not fixed at 312.
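The two-stage schedule of FIGS. 3A-3B can be sketched as a simple partition of the layers into frozen and trainable sets. The layer names and type strings below are illustrative only; in stage 1 the FC/IP layers are fixed while the CONV layers are pruned and retrained, and in stage 2 the roles swap.

```python
def frozen_layers(layers, stage):
    """Return (frozen, trainable) layer-name sets for the given stage.
    'layers' maps a layer name to its type string; names and types
    here are hypothetical, not taken from the patent."""
    if stage == 1:
        frozen = {n for n, kind in layers.items() if kind in ("FC", "IP")}
    else:
        frozen = {n for n, kind in layers.items() if kind == "CONV"}
    trainable = set(layers) - frozen
    return frozen, trainable
```

In a real framework, the frozen set would correspond to parameters excluded from gradient updates during the retraining passes.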
FIG. 4 is a flow diagram of a technique for automatically determining thresholds, pruning, and retraining according to some embodiments. In some embodiments, at 400, a pruning threshold is automatically determined. The threshold may be automatically determined as described above. In 402, those automatically determined thresholds are used to trim and retrain the neural network. In some embodiments, the threshold may be automatically determined in 400 by multiple iterations. After these iterations are completed, the resulting threshold is used to iteratively prune and retrain the neural network at 402.
FIG. 5 is a set of curves illustrating a retraining operation according to some embodiments. These graphs illustrate the pruning and retraining of the GoogLeNet neural network. Specifically, the graphs show the loss variation, top-1 accuracy, and top-5 accuracy. Two pruning operations are shown here: one before the first training iteration and a second after a certain number of training iterations have been performed. Although two pruning operations are shown, any number of pruning operations may be performed.
FIG. 6 is a graph including results of various neural networks after pruning and retraining according to some embodiments. In particular, the size of the weight parameters, top-1 accuracy, and top-5 accuracy are shown for various neural networks and for pruned versions of those networks. AlexNet, VGG16, SqueezeNet, and GoogLeNet are listed, as well as pruned versions of AlexNet and VGG16. The pruned GoogLeNet entry illustrates the training and inference networks of GoogLeNet pruned as described herein. As shown, GoogLeNet pruned as described herein has the smallest weight-parameter size while providing higher accuracy. Specifically, the pruned training and inference neural network is able to achieve top-5 accuracy in excess of 89% with the fewest weight parameters.
FIGS. 7A-7C are graphs including results of various techniques for pruning a neural network, according to some embodiments. These charts list the layers and sublayers of the GoogLeNet neural network and the results of various pruning techniques. Two examples of pre-fixed thresholds are shown, including the resulting total weights after pruning and the top-1 and top-5 performance. Another example illustrates the result of using thresholds generated by empirical rules. Finally, the last example shows the pruning results according to the embodiments described herein. These results show that pruning as described herein can achieve accuracy comparable to the unpruned network, with fewer weights than the pre-fixed thresholds. Compared with thresholds generated by empirical rules, the pruning described herein achieves similar or higher accuracy with a similar total number of weights. However, pruning as described herein may be performed without selection techniques or rules for pre-selecting the pruning threshold. That is, multiple iterations with pre-fixed thresholds need not be performed, and similar and/or better results may be achieved without the empirical information needed to generate such rules.
Fig. 8 illustrates a system according to some embodiments. The system 800 includes a processor 802 and a memory 804. The processor 802 may be a general purpose processor, a Digital Signal Processor (DSP), an application specific integrated circuit, a microcontroller, a programmable logic device, a discrete circuit, a combination of these, or the like. The processor 802 may include internal parts such as registers, cache memory, processing cores, etc., and may also include external interfaces such as address and data bus interfaces, interrupt interfaces, etc. Although only one processor 802 is shown in system 800, multiple processors 802 may be present. In addition, other interface devices, such as a logic chipset, hub, memory controller, communication interface, etc., may be part of the system 800 to connect the processor 802 to internal and external components.
The memory 804 may be any device capable of storing data. Here, one memory 804 is shown for system 800; however, any number of memories 804 may be included in system 800, including different types of memory. Examples of memory 804 include Dynamic Random Access Memory (DRAM) modules, Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) according to various standards such as DDR, DDR2, DDR3, and DDR4, Static Random Access Memory (SRAM), nonvolatile memory such as flash memory, Spin-Transfer Torque Magnetoresistive Random Access Memory (STT-MRAM) or Phase Change RAM, magnetic or optical media, and the like.
The memory 804 may be configured to store code that, when executed on the processor 802, causes the system 800 to implement any or all of the techniques described herein. In some embodiments, the system 800 may be configured to receive an input 806, such as a neural network, an initial threshold, an initial pruning error tolerance, an acceptable pruning percentage range, and the like. The output 808 may include automatically determined thresholds, the pruned and retrained neural network, or other result information described above.
Although the method and system have been described in terms of particular embodiments, those of ordinary skill in the art will readily recognize that there could be variations to the embodiments disclosed and, therefore, any variations would be considered to be within the spirit and scope of the methods and systems disclosed herein. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.

Claims (19)

1. A method of pruning a neural network for image recognition using a threshold, comprising:
pruning layers of a neural network having a plurality of layers using a threshold; and
repeating the pruning of the layers of the neural network using different thresholds until a pruning error of the pruned layers reaches a pruning error tolerance,
wherein the pruning error is calculated by dividing the magnitude of the difference between the length of the vector of weights of the layer from a previous iteration and the length of the vector of pruned weights of the layer from a current iteration by the magnitude of the difference between the total length of the vector of initial weights of the layer and the length of the vector of pruned weights of the layer from the current iteration.
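As a non-authoritative sketch of the iteration in claim 1 (and the threshold adjustment of claim 8), the search below reads the claim's vector "length" as an L2 norm and uses a bisection to pick each new threshold; both choices are assumptions, since the claim only requires repeating the pruning with different thresholds, and all names are hypothetical:

```python
import math

def l2(v):
    """L2 norm; the claim's vector 'length' is read as an L2 norm (assumption)."""
    return math.sqrt(sum(x * x for x in v))

def pruning_error(init_w, prev_w, pruned_w):
    """Pruning error per the 'wherein' clause of claim 1: |(||prev|| - ||pruned||)|
    divided by |(||init|| - ||pruned||)|."""
    num = abs(l2(prev_w) - l2(pruned_w))
    den = abs(l2(init_w) - l2(pruned_w))
    return num / den if den else 0.0

def prune(weights, threshold):
    """Magnitude pruning: zero every weight whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def find_threshold(weights, tolerance, iters=40):
    """Repeat pruning with different thresholds until the pruning error reaches
    the tolerance: raise the threshold while the error is below the tolerance
    and lower it once the error exceeds it (claim 8)."""
    prev_w = weights
    lo, hi = 0.0, max(abs(w) for w in weights)
    threshold = 0.5 * (lo + hi)
    for _ in range(iters):
        pruned_w = prune(weights, threshold)
        if pruning_error(weights, prev_w, pruned_w) < tolerance:
            lo = threshold      # error below tolerance: raise the threshold
        else:
            hi = threshold      # error above tolerance: lower the threshold
        prev_w = pruned_w
        threshold = 0.5 * (lo + hi)
    return threshold
```

The bisection bounds (zero to the largest weight magnitude) are illustrative; any schedule of "different thresholds" that respects the increase/decrease rule of claim 8 would fit the claim language equally well.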
2. The method of claim 1, further comprising, for each layer of the neural network:
initializing the pruning error tolerance;
initializing the threshold; and
repeating the following steps until the percentage of pruned weights for the layer is within the range for the layer:
repeating the following steps until the pruning error reaches the pruning error tolerance:
pruning the layer of the neural network using the threshold;
calculating a pruning error of the pruned layer;
comparing the pruning error with the pruning error tolerance; and
changing the threshold in response to the comparison;
calculating the percentage of pruned weights of the pruned layer; and
changing the pruning error tolerance in response to the percentage of pruned weights.
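The nested iteration of claim 2 could be sketched roughly as follows: an inner loop grows the threshold until the pruning error reaches the current tolerance, and an outer loop then loosens or tightens the tolerance until the percentage of pruned weights lands in the layer's target range. The concrete update factors, the simplified error measure, and re-initializing the threshold on each outer pass are all illustrative assumptions, not the patent's exact procedure:

```python
import math

def tune_layer(weights, err_tol=0.05, pct_range=(0.2, 0.9),
               max_outer=30, max_inner=60):
    """Per-layer automatic threshold selection in the spirit of claim 2."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    w_norm = norm(weights) or 1.0
    pruned = list(weights)
    threshold = 1e-4
    for _ in range(max_outer):
        threshold = 1e-4              # re-initialized each pass (simplification)
        for _ in range(max_inner):
            pruned = [0.0 if abs(w) < threshold else w for w in weights]
            err = abs(w_norm - norm(pruned)) / w_norm  # simplified error (assumption)
            if err >= err_tol:        # pruning error reached the tolerance
                break
            threshold *= 1.5          # otherwise raise the threshold and retry
        pct = sum(1 for w in pruned if w == 0.0) / len(pruned)
        if pct < pct_range[0]:
            err_tol *= 2.0            # too few weights pruned: loosen tolerance
        elif pct > pct_range[1]:
            err_tol /= 2.0            # too many weights pruned: tighten it
        else:
            break                     # percentage within the layer's range
    return threshold, pruned
```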
3. The method of claim 1, further comprising:
repeating the pruning of the layer using different pruning error tolerances until the percentage of pruned weights for the layer is within a range for the layer.
4. The method of claim 3, wherein different types of layers of the neural network have different ranges for the percentage of pruned weights.
5. The method of claim 1, wherein pruning the layers of the neural network comprises: setting a weight to zero if the magnitude of the weight is less than the threshold.
6. The method of claim 1, wherein pruning the layers of the neural network comprises: setting a weight to zero if the magnitude of the weight is less than the threshold scaled by a scaling factor.
7. The method of claim 6, wherein the scaling factor is a standard deviation of the weights of the layer.
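Claims 5-7 describe the pruning step itself: a weight is zeroed when its magnitude falls below the threshold, optionally scaled by the standard deviation of the layer's weights. A minimal sketch (function and parameter names are hypothetical):

```python
import math

def prune_layer(weights, threshold, scale_by_std=False):
    """Zero each weight whose magnitude is below the threshold (claim 5);
    when scale_by_std is set, the threshold is scaled by the standard
    deviation of the layer's weights (claims 6 and 7)."""
    scale = 1.0
    if scale_by_std:
        mean = sum(weights) / len(weights)
        scale = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    cutoff = threshold * scale
    return [0.0 if abs(w) < cutoff else w for w in weights]
```

For example, `prune_layer([0.1, -2.0, 0.05, 1.5], 0.5)` zeroes the first and third weights and leaves the other two untouched.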
8. The method of claim 1, further comprising performing the following operations to produce the different thresholds:
increasing the threshold if the pruning error is less than the pruning error tolerance; and
decreasing the threshold if the pruning error is greater than the pruning error tolerance.
9. The method of claim 1, further comprising: performing the repeated pruning of the layers of the neural network for each layer of the neural network.
10. The method of claim 1, further comprising: after the pruning error of the pruned layer reaches the pruning error tolerance, iteratively pruning and retraining the neural network using the threshold.
11. A method of pruning a neural network for image recognition using a threshold, comprising:
the following steps are repeated:
pruning layers of a neural network having a plurality of layers using an automatically determined threshold until a pruning error of the pruned layers reaches a pruning error tolerance; and
retraining the neural network using only the weights remaining after pruning,
wherein the pruning error is calculated by dividing the magnitude of the difference between the length of the vector of weights of the layer from a previous iteration and the length of the vector of pruned weights of the layer from a current iteration by the magnitude of the difference between the total length of the vector of initial weights of the layer and the length of the vector of pruned weights of the layer from the current iteration.
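Claim 11's alternation of pruning and retraining, in which only surviving weights are updated, is commonly implemented by masking the weight updates. The sketch below uses plain gradient descent on a toy squared-error loss against fixed targets; the "training" here is purely illustrative and is not the patent's training procedure:

```python
def prune_and_retrain(weights, targets, threshold, rounds=3, steps=20, lr=0.1):
    """Alternate pruning with retraining in which only the weights that
    survive pruning are updated; zeroed weights stay zero (claim 11)."""
    for _ in range(rounds):
        # Prune: keep only weights at or above the threshold.
        mask = [1.0 if abs(w) >= threshold else 0.0 for w in weights]
        weights = [w * m for w, m in zip(weights, mask)]
        for _ in range(steps):
            # Gradient step on 0.5 * sum((w - t)^2), masked so that
            # pruned weights are never updated during retraining.
            weights = [w - lr * (w - t) * m
                       for w, t, m in zip(weights, targets, mask)]
    return weights
```

In a real framework the same effect is usually achieved by multiplying the gradient (or the weight tensor) by a fixed binary mask after every optimizer step.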
12. The method of claim 11, wherein pruning the layers of the neural network comprises: pruning layers of the neural network having a first type using the automatically determined threshold.
13. The method of claim 12, further comprising:
fixing weights of layers of the neural network having a second type different from the first type;
repeating the following steps:
pruning the layers having the first type; and
retraining the neural network using only weights remaining after pruning;
fixing weights of the layers of the neural network having the first type; and
repeating the following steps:
pruning the layers of the neural network having the second type; and
retraining the neural network using only weights remaining after pruning.
14. The method of claim 12, further comprising: fixing weights of layers of the neural network having a second type different from the first type.
15. The method of claim 14, further comprising:
fixing weights of the layers of the neural network having the first type; and
repeating the following steps:
pruning layers of the neural network having the second type; and
retraining the neural network using only weights remaining after pruning.
16. The method of claim 11, further comprising: generating the automatically determined threshold prior to retraining the neural network.
17. The method of claim 11, further comprising: adjusting a dropout rate for the retraining in response to a pruning rate of the pruning.
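Claim 17 couples the retraining dropout rate to how heavily the layer was pruned. One well-known heuristic from earlier magnitude-pruning literature, which is not necessarily the rule claimed here, scales dropout by the square root of the fraction of connections that survive:

```python
import math

def adjusted_dropout(original_dropout, kept_fraction):
    """Reduce dropout as pruning thins the layer: with fewer surviving
    connections, less regularization is needed. The sqrt scaling is an
    assumption borrowed from prior pruning work, not the patent's rule."""
    return original_dropout * math.sqrt(kept_fraction)
```

For instance, pruning away 75% of a layer's connections (a kept fraction of 0.25) would halve a 0.5 dropout rate to 0.25 for retraining.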
18. A system for pruning a neural network for image recognition using a threshold, comprising:
a memory; and
a processor coupled to the memory and configured to:
pruning layers of a neural network having a plurality of layers using a threshold; and
repeating the pruning of the layers of the neural network using different thresholds until a pruning error of a pruned layer reaches a pruning error tolerance,
wherein the pruning error is calculated by dividing the magnitude of the difference between the length of the vector of weights of the layer from a previous iteration and the length of the vector of pruned weights of the layer from a current iteration by the magnitude of the difference between the total length of the vector of initial weights of the layer and the length of the vector of pruned weights of the layer from the current iteration.
19. The system of claim 18, wherein the processor is further configured to, for each layer of the neural network:
initializing the pruning error tolerance;
initializing the threshold; and
repeating the following steps until the percentage of pruned weights for the layer is within the range for the layer:
repeating the following steps until the pruning error reaches the pruning error tolerance:
pruning the layer of the neural network using the threshold;
calculating a pruning error of the pruned layer;
comparing the pruning error with the pruning error tolerance; and
changing the threshold in response to the comparison;
calculating the percentage of pruned weights of the pruned layer; and
changing the pruning error tolerance in response to the percentage of pruned weights.
CN201810100412.0A 2017-02-10 2018-01-31 Automatic threshold for neural network pruning and retraining Active CN108416423B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762457806P 2017-02-10 2017-02-10
US62/457,806 2017-02-10
US15/488,430 2017-04-14
US15/488,430 US10832135B2 (en) 2017-02-10 2017-04-14 Automatic thresholds for neural network pruning and retraining

Publications (2)

Publication Number Publication Date
CN108416423A CN108416423A (en) 2018-08-17
CN108416423B true CN108416423B (en) 2024-01-12

Family

ID=63104667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810100412.0A Active CN108416423B (en) 2017-02-10 2018-01-31 Automatic threshold for neural network pruning and retraining

Country Status (3)

Country Link
US (2) US10832135B2 (en)
KR (1) KR102566480B1 (en)
CN (1) CN108416423B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10832135B2 (en) * 2017-02-10 2020-11-10 Samsung Electronics Co., Ltd. Automatic thresholds for neural network pruning and retraining
KR102413028B1 (en) * 2017-08-16 2022-06-23 에스케이하이닉스 주식회사 Method and device for pruning convolutional neural network
KR20190051697A (en) 2017-11-07 2019-05-15 삼성전자주식회사 Method and apparatus for performing devonvolution operation in neural network
US10776662B2 (en) * 2017-11-09 2020-09-15 Disney Enterprises, Inc. Weakly-supervised spatial context networks to recognize features within an image
JP6831347B2 (en) * 2018-04-05 2021-02-17 日本電信電話株式会社 Learning equipment, learning methods and learning programs
US20190378013A1 (en) * 2018-06-06 2019-12-12 Kneron Inc. Self-tuning model compression methodology for reconfiguring deep neural network and electronic device
US11010132B2 (en) * 2018-09-28 2021-05-18 Tenstorrent Inc. Processing core with data associative adaptive rounding
US11281974B2 (en) * 2018-10-25 2022-03-22 GM Global Technology Operations LLC Activation zero-bypass and weight pruning in neural networks for vehicle perception systems
US20210397962A1 (en) * 2018-10-31 2021-12-23 Nota, Inc. Effective network compression using simulation-guided iterative pruning
US11663001B2 (en) * 2018-11-19 2023-05-30 Advanced Micro Devices, Inc. Family of lossy sparse load SIMD instructions
KR20200066953A (en) 2018-12-03 2020-06-11 삼성전자주식회사 Semiconductor memory device employing processing in memory (PIM) and operating method for the same
KR102163498B1 (en) * 2018-12-24 2020-10-08 아주대학교산학협력단 Apparatus and method for pruning-retraining of neural network
CN109800859B (en) * 2018-12-25 2021-01-12 深圳云天励飞技术有限公司 Neural network batch normalization optimization method and device
CN109948795B (en) * 2019-03-11 2021-12-14 驭势科技(北京)有限公司 Method and device for determining network structure precision and delay optimization point
JP7150651B2 (en) * 2019-03-22 2022-10-11 株式会社日立ソリューションズ・テクノロジー Neural network model reducer
CN110276452A (en) * 2019-06-28 2019-09-24 北京中星微电子有限公司 Pruning method, device, equipment and the artificial intelligence chip of neural network model
CN110599458A (en) * 2019-08-14 2019-12-20 深圳市勘察研究院有限公司 Underground pipe network detection and evaluation cloud system based on convolutional neural network
CN111553480B (en) * 2020-07-10 2021-01-01 腾讯科技(深圳)有限公司 Image data processing method and device, computer readable medium and electronic equipment
KR20220045424A (en) * 2020-10-05 2022-04-12 삼성전자주식회사 Method and apparatus of compressing artificial neural network
KR102614909B1 (en) * 2021-03-04 2023-12-19 삼성전자주식회사 Neural network operation method and appratus using sparsification
CN113963175A (en) * 2021-05-13 2022-01-21 北京市商汤科技开发有限公司 Image processing method and device, computer equipment and storage medium
JP2023063944A (en) * 2021-10-25 2023-05-10 富士通株式会社 Machine learning program, method for machine learning, and information processing apparatus
US20230153625A1 (en) * 2021-11-17 2023-05-18 Samsung Electronics Co., Ltd. System and method for torque-based structured pruning for deep neural networks

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787408A (en) * 1996-08-23 1998-07-28 The United States Of America As Represented By The Secretary Of The Navy System and method for determining node functionality in artificial neural networks
CN105352907A (en) * 2015-11-27 2016-02-24 南京信息工程大学 Infrared gas sensor based on radial basis network temperature compensation and detection method
CN105447498A (en) * 2014-09-22 2016-03-30 三星电子株式会社 A client device configured with a neural network, a system and a server system
CN105512725A (en) * 2015-12-14 2016-04-20 杭州朗和科技有限公司 Neural network training method and equipment
CN106127217A (en) * 2015-05-07 2016-11-16 西门子保健有限责任公司 The method and system that neutral net detects is goed deep into for anatomical object for approximation
CN106295800A (en) * 2016-07-28 2017-01-04 北京工业大学 A kind of water outlet total nitrogen TN intelligent detecting method based on recurrence Self organizing RBF Neural Network

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5288645A (en) * 1992-09-04 1994-02-22 Mtm Engineering, Inc. Hydrogen evolution analyzer
US6324532B1 (en) 1997-02-07 2001-11-27 Sarnoff Corporation Method and apparatus for training a neural network to detect objects in an image
NZ503882A (en) * 2000-04-10 2002-11-26 Univ Otago Artificial intelligence system comprising a neural network with an adaptive component arranged to aggregate rule nodes
US7031948B2 (en) * 2001-10-05 2006-04-18 Lee Shih-Jong J Regulation of hierarchic decisions in intelligent systems
WO2003079286A1 (en) * 2002-03-15 2003-09-25 Pacific Edge Biotechnology Limited Medical applications of adaptive learning systems using gene expression data
DE60217663T2 (en) * 2002-03-26 2007-11-22 Council Of Scientific And Industrial Research IMPROVED ARTIFICIAL NEURONAL NETWORK MODELS IN THE PRESENCE OF INSTRUMENT NOISE AND MEASUREMENT ERRORS
JP2005523533A (en) * 2002-04-19 2005-08-04 コンピュータ アソシエイツ シンク,インコーポレイテッド Processing mixed numeric and / or non-numeric data
WO2003094051A1 (en) * 2002-04-29 2003-11-13 Laboratory For Computational Analytics And Semiotics, Llc Sequence miner
US20080172214A1 (en) * 2004-08-26 2008-07-17 Strategic Health Decisions, Inc. System For Optimizing Treatment Strategies Using a Patient-Specific Rating System
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks
CN103136247B (en) * 2011-11-29 2015-12-02 阿里巴巴集团控股有限公司 Attribute data interval division method and device
US20140006471A1 (en) * 2012-06-27 2014-01-02 Horia Margarit Dynamic asynchronous modular feed-forward architecture, system, and method
US10055434B2 (en) 2013-10-16 2018-08-21 University Of Tennessee Research Foundation Method and apparatus for providing random selection and long-term potentiation and depression in an artificial network
US9730643B2 (en) * 2013-10-17 2017-08-15 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
WO2016010601A2 (en) * 2014-04-23 2016-01-21 The Florida State University Research Foundation, Inc. Adaptive nonlinear model predictive control using a neural network and input sampling
US9672474B2 (en) * 2014-06-30 2017-06-06 Amazon Technologies, Inc. Concurrent binning of machine learning data
US10650805B2 (en) 2014-09-11 2020-05-12 Nuance Communications, Inc. Method for scoring in an automatic speech recognition system
US11423311B2 (en) * 2015-06-04 2022-08-23 Samsung Electronics Co., Ltd. Automatic tuning of artificial neural networks
US10460230B2 (en) * 2015-06-04 2019-10-29 Samsung Electronics Co., Ltd. Reducing computations in a neural network
US20180082181A1 (en) * 2016-05-13 2018-03-22 Samsung Electronics, Co. Ltd. Neural Network Reordering, Weight Compression, and Processing
US10832135B2 (en) * 2017-02-10 2020-11-10 Samsung Electronics Co., Ltd. Automatic thresholds for neural network pruning and retraining

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787408A (en) * 1996-08-23 1998-07-28 The United States Of America As Represented By The Secretary Of The Navy System and method for determining node functionality in artificial neural networks
CN105447498A (en) * 2014-09-22 2016-03-30 三星电子株式会社 A client device configured with a neural network, a system and a server system
CN106127217A (en) * 2015-05-07 2016-11-16 西门子保健有限责任公司 The method and system that neutral net detects is goed deep into for anatomical object for approximation
CN105352907A (en) * 2015-11-27 2016-02-24 南京信息工程大学 Infrared gas sensor based on radial basis network temperature compensation and detection method
CN105512725A (en) * 2015-12-14 2016-04-20 杭州朗和科技有限公司 Neural network training method and equipment
CN106295800A (en) * 2016-07-28 2017-01-04 北京工业大学 A kind of water outlet total nitrogen TN intelligent detecting method based on recurrence Self organizing RBF Neural Network

Also Published As

Publication number Publication date
KR102566480B1 (en) 2023-08-11
CN108416423A (en) 2018-08-17
US10832135B2 (en) 2020-11-10
US20180232640A1 (en) 2018-08-16
KR20180092810A (en) 2018-08-20
US20200410357A1 (en) 2020-12-31

Similar Documents

Publication Publication Date Title
CN108416423B (en) Automatic threshold for neural network pruning and retraining
US11948070B2 (en) Hardware implementation of a convolutional neural network
CN112085188B (en) Method for determining quantization parameter of neural network and related product
CN108170667B (en) Word vector processing method, device and equipment
US20190114742A1 (en) Image upscaling with controllable noise reduction using a neural network
CN110942483B (en) Function rapid convergence model construction method, device and terminal
CN112200132A (en) Data processing method, device and equipment based on privacy protection
CN111062897B (en) Image equalization method, terminal and storage medium
CN111723550A (en) Statement rewriting method, device, electronic device, and computer storage medium
US11886832B2 (en) Operation device and operation method
US10997497B2 (en) Calculation device for and calculation method of performing convolution
US9842647B1 (en) Programming of resistive random access memory for analog computation
WO2020125740A1 (en) Image reconstruction method and device, apparatus, and computer-readable storage medium
JP7360595B2 (en) information processing equipment
WO2020087254A1 (en) Optimization method for convolutional neural network, and related product
KR102203337B1 (en) Apparatus and method for m-estimation with trimmed l1 penalty
WO2022027242A1 (en) Neural network-based data processing method and apparatus, mobile platform, and computer readable storage medium
US20230131543A1 (en) Apparatus and method with multi-task processing
Zotov Algorithm for synthesizing optimal controllers of given complexity
CN115062765A (en) Task processing method and device, electronic equipment and storage medium
CN116306820A (en) Quantization training method, apparatus, device, and computer-readable storage medium
CN112202886A (en) Task unloading method, system, device and storage medium
Akyürek et al. Automatic Knot Adjustment Using Dolphin Echolocation Algorithm for B-Spline Curve Approximation
CN111130555A (en) Compressed sensing signal reconstruction method and system
Vega et al. Iterative Optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant