US20170337472A1 - Passive pruning of filters in a convolutional neural network

Info

Publication number
US20170337472A1
US20170337472A1 (application US15/595,049; granted as US10740676B2)
Authority
US
United States
Prior art keywords
weights
neural network
layer
block
training
Prior art date
Legal status
Granted
Application number
US15/595,049
Other versions
US10740676B2
Inventor
Igor Durdanovic
Hans Peter Graf
Current Assignee
NEC Corp
Original Assignee
NEC Laboratories America Inc
Priority date
Filing date
Publication date
Application filed by NEC Laboratories America Inc
Priority to US15/595,049 (granted as US10740676B2)
Assigned to NEC LABORATORIES AMERICA, INC. Assignors: DURDANOVIC, IGOR; GRAF, HANS PETER
Publication of US20170337472A1
Assigned to NEC CORPORATION. Assignor: NEC LABORATORIES AMERICA, INC.
Application granted
Publication of US10740676B2
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06N 3/0635


Abstract

Methods and systems of training a neural network include training a neural network based on training data. Weights of a layer of the neural network are multiplied by an attrition factor. A block of weights is pruned from the layer if the block of weights in the layer has a contribution to an output of the layer that is below a threshold.

Description

    RELATED APPLICATION INFORMATION
  • This application claims priority to U.S. Patent Application Nos. 62/338,573, filed on May 19, 2016, and 62/338,797, filed on May 19, 2016, both incorporated herein by reference in their entirety.
  • BACKGROUND Technical Field
  • The present invention relates to neural networks and, more particularly, to filter pruning in convolutional neural networks.
  • Description of the Related Art
  • As convolutional neural networks (CNNs) grow deeper (i.e., involve progressively more layers), the cost of computing inferences increases with the number of parameters and convolution operations involved. These computational costs are particularly relevant when dealing with embedded sensors and mobile devices, where computational and power resources are limited. High inference costs pose a similar barrier in contexts where high responsiveness and low latency are needed.
  • Existing approaches to reducing the storage and computation costs involve model compression by pruning weights with small magnitudes and then retraining the model. However, pruning parameters does not necessarily reduce computation time, because the majority of the parameters that are removed are from fully connected layers where the computation cost is low. In addition, the resulting sparse models lack optimizations that make computations practical.
  • SUMMARY
  • A method of training a neural network includes training a neural network based on training data. Weights of a layer of the neural network are multiplied by an attrition factor. A block of weights is pruned from the layer if the block of weights in the layer has a contribution to an output of the layer that is below a threshold.
  • A method of training a neural network includes training a convolutional neural network based on training data. Weights of a layer of the neural network are multiplied by a number less than one. A block of weights in the layer is pruned, a filter corresponding to the block of weights in a subsequent layer in the neural network is pruned, and a block of weights that corresponds to the pruned filter in a subsequent layer in the neural network is pruned if the block of weights in the layer has a contribution to an output of the layer that is below a threshold. The contribution of a block of weights to the output of the layer is calculated as a percentage of a sum of absolute weights of the weights in the layer made up by a sum of absolute weights of the weights in the block of weights.
  • A system for training a neural network includes a neural network. A training module is configured to train the neural network based on training data. A pruning module is configured to multiply weights of a layer of the neural network by an attrition factor and to prune a block of weights from the layer if the block of weights in the layer has a contribution to an output of the layer that is below a threshold.
  • These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
  • FIG. 1 is a diagram illustrating the pruning of a block of weights and corresponding filter from a convolutional neural network in accordance with an embodiment of the present invention;
  • FIG. 2 is a block/flow diagram of a method for pruning weights and filters from a convolutional neural network in accordance with an embodiment of the present invention;
  • FIG. 3 is a block diagram of a convolutional neural network system that prunes the convolutional neural network in accordance with an embodiment of the present invention; and
  • FIG. 4 is a block diagram of a security system based on pruned convolutional neural network classifiers in accordance with an embodiment of the present invention.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • In accordance with the present principles, systems and methods are provided for active pruning of filters in convolutional neural networks (CNNs). During training, the present embodiments reduce the size of all weights between each iteration, driving the weight values toward zero. Once a set of weights falls below a threshold, the weights are removed from the CNN along with associated kernels, thereby reducing the computational cost of using the pruned CNN without increasing the sparsity of the CNN. Because sparsity does not increase, the present embodiments do not necessitate the use of sparse libraries or specialized hardware. The number of filters that are pruned correlates directly with computational acceleration by reducing the number of matrix multiplications.
  • CNNs are extensively used in image and video recognition, natural language processing, and other machine learning processes. CNNs use multi-dimensional layers of weights to create filters that have small spatial coverage but that extend through the full depth of an input volume. To use the example of an image input, the individual pixels represent the width and height of the input, while the number of colors (e.g., red, green, and blue) represent the depth. Thus, a filter in a CNN being used to process image data would apply to a limited number of pixels but would apply to all of the color information for those pixels. The filter is convolved across the width and height of the input volume, with dot products being calculated between entries of the filter and the input at each position.
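As an aside, the sliding dot product just described can be sketched in a few lines of NumPy; the function name and shapes here are illustrative, not part of the patent.

```python
import numpy as np

def convolve_full_depth(volume, filt):
    """Slide a filter across the width and height of an input volume,
    taking a dot product over the filter's full depth at each position."""
    H, W, D = volume.shape   # e.g. image height, width, color depth
    h, w, _ = filt.shape     # small spatial coverage, full depth D
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = volume[i:i + h, j:j + w, :]
            out[i, j] = np.sum(patch * filt)  # dot product over h*w*D entries
    return out
```

For a 4x4 image with 3 color channels and a 2x2x3 filter, the output is a 3x3 map, each entry summarizing a patch through the full depth.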
  • The present embodiments actively drive weights to zero during training and prune convolutional filters associated with low-magnitude weights. Thus, convolutional filters that are not maintained during training are driven down to zero. This results in an efficient network that involves fewer convolutional operations.
  • Referring now in detail to the figures, in which like numerals represent the same or similar elements, and initially to FIG. 1, a diagram of active pruning of a CNN is shown. An input layer 102 is provided with neurons that perform a processing function on an input volume that may represent, for example, an image, a frame of video, a document, or any other appropriate set of multi-dimensional input data. The input layer 102 in this example includes three dimensions (e.g., x, y, and n). The neurons can be grouped into n filters, each filter having dimensions x and y. The outputs of the input layer 102 are provided to a first array of weights 104.
  • During training, the values of the weights 104 are multiplied by an attrition factor a that is less than 1 (e.g., a=0.9999). Thus, during each iteration of training, those weights which are not enhanced by the training process will eventually decrease in magnitude until they fall below a threshold. In this example, a column 106 has fallen below the threshold, representing weights which do not contribute to the accuracy of the output. This column 106 is pruned from the first array of weights 104.
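A quick numerical illustration (the starting value and iteration count are invented for this example) shows how an attrition factor just below one starves weights that training does not reinforce:

```python
import numpy as np

a = 0.9999                  # attrition factor, slightly less than 1
w = np.full(5, 0.5)         # weights that training never reinforces
for _ in range(100_000):    # one attrition step per training iteration
    w = w * a               # 0.5 * 0.9999**100000 is on the order of 2e-5
# the magnitudes are now far below any practical contribution threshold
```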
  • The first array of weights 104 provides its output to a layer of hidden neurons 108. The pruned column 106 corresponds to one filter 110 that is pruned from the layer of hidden neurons 108. The layer of hidden neurons 108 performs a computational function and provides an output to a second array of weights 112.
  • The pruned filter 110 in turn corresponds to a row 114 of the second array of weights 112, which is also pruned. The second array of weights 112 provides its output to a layer of output neurons 116 (or, alternatively, additional hidden layers), which performs a computational function and provides the output of the CNN. The active pruning of weights significantly reduces the number of computations needed to produce the output.
  • Referring now to FIG. 2, a method for training a CNN is shown. The present embodiments perform training by performing a forward pass 202 through a CNN to provide a calculation. This forward pass can be expressed as, e.g., K=I×W, where W represents an array of weights, I represents data input to the array of weights, and K represents the output of the array of weights. Block 204 then performs a backward pass, B=G×W, where G is a gradient input to the array of weights W, providing the back-propagated gradient B. Block 206 performs a learning pass ΔW=I×G, which adjusts the weights in the weight array by ΔW.
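The three passes can be sketched with NumPy matrix products; the batch size, layer widths, and learning rate below are illustrative assumptions, and a transpose is needed to make the backward pass shape-consistent for a dense layer:

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.normal(size=(8, 16))   # input batch: 8 samples, 16 features
W = rng.normal(size=(16, 4))   # array of weights
G = rng.normal(size=(8, 4))    # gradient arriving from the next layer

K = I @ W          # forward pass:  K = I x W
B = G @ W.T        # backward pass: back-propagated gradient B = G x W
dW = I.T @ G       # learning pass: weight adjustment dW = I x G
W = W - 0.01 * dW  # apply the adjustment with an assumed learning rate
```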
  • After performing the learning pass, the weights are driven toward zero in block 208 by multiplying the weights by an attrition factor a that is less than one (e.g., a=0.9999). Block 210 determines whether any block of weight values has dropped below a threshold contribution to the output (e.g., 10%). The contribution to the output of a block of weights (e.g., a column or row of weights) may be determined as a sum of absolute weight values. The sum of absolute weight values for each block of weights can then be compared to the total sum of absolute weight values to determine the contribution of that block of weights.
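This contribution test might look like the following sketch, treating each column of a weight array as one block; the function name and the 10% default are illustrative, matching the example threshold above:

```python
import numpy as np

def low_contribution_columns(W, threshold=0.10):
    """Return indices of columns whose share of the layer's total
    absolute weight falls below `threshold`."""
    block_sums = np.abs(W).sum(axis=0)            # sum of absolute weights per block
    contributions = block_sums / np.abs(W).sum()  # fraction of the layer total
    return np.where(contributions < threshold)[0]
```

A column whose weights have decayed toward zero makes up a vanishing fraction of the layer's total absolute weight and is flagged for pruning.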
  • If a block of weight values has dropped below the threshold level of contribution toward the output, block 212 prunes those weights, along with any associated filters and weights on other layers. Block 214 then determines whether training is complete (e.g., whether the output of the trained CNN matches an expected output). If so, training completes. If not, processing returns to block 202.
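A sketch of this pruning step for two adjacent weight arrays (the names and the column-as-block convention are assumptions): deleting a column of the first array removes the filter's weights, and deleting the matching row of the next array removes the weights that consumed that filter's output, so the pruned network stays dense.

```python
import numpy as np

def prune_block(W1, W2, col):
    """Prune column `col` from W1 and the corresponding row from W2."""
    W1_pruned = np.delete(W1, col, axis=1)  # drop the low-contribution block
    W2_pruned = np.delete(W2, col, axis=0)  # drop the row fed by that filter
    return W1_pruned, W2_pruned
```

Because whole rows and columns are removed rather than zeroed, the resulting matrices are smaller but not sparse, so ordinary dense matrix-multiply routines still apply.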
  • For simpler CNNs, any of the filters in any convolutional layer can be easily pruned. However, for complex network architectures, pruning may not be straightforward. Complex architectures may impose restrictions, such that filters need to be pruned carefully. In one example, correspondences between filters may necessitate the pruning of filters to permit pruning of a given convolutional layer.
  • The above-described method is applied on a per-layer basis until the entire CNN is trained. At each layer, pruning may be repeated until the validation error rises beyond some threshold. When training neural networks, the validation set is a set of data that has not been used in training, but instead is processed by the network in forward passes to determine how accurately the network classifies data it has not seen during training. During training, the error rate on the validation set generally decreases. Once the error rate stops decreasing, the neural network is generally considered to be trained. Thus, pruning as part of the training process stops at this point as well.
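The stopping rule can be sketched as a loop that keeps training and pruning while the validation error keeps falling; `train_epoch`, `validation_error`, and the tolerance are hypothetical stand-ins, not terms from the patent.

```python
def train_and_prune(train_epoch, validation_error, tolerance=0.01, max_iters=100):
    """Repeat train/attrition/prune steps until the validation error
    rises beyond `tolerance` above the best error seen so far."""
    best_err = float("inf")
    for _ in range(max_iters):
        train_epoch()             # forward/backward/learning pass plus attrition and pruning
        err = validation_error()  # error on data unseen during training
        if err > best_err + tolerance:
            break                 # pruning degraded accuracy; stop here
        best_err = min(best_err, err)
    return best_err
```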
  • Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
  • Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
  • Referring now to FIG. 3, a CNN system 300 is shown. The system 300 includes a hardware processor 302 and memory 304. A CNN 306 is implemented either in hardware or in software. The CNN 306 takes input data and generates an output based on the filters and weights that make up the CNN's configuration. The system 300 furthermore includes one or more functional modules that may, in some embodiments, be implemented as software that is stored in the memory 304 and executed by hardware processor 302. In alternative embodiments, the functional modules may be implemented as one or more discrete hardware components in the form of, e.g., application specific integrated chips or field programmable gate arrays.
  • In particular, a training module 308 trains the CNN 306 based on training data. The training data includes one set of data used to train the CNN 306 and another set of data used to test the CNN 306, with differences between the outcome of the CNN 306 and the expected outcome from the testing data being used to adjust the CNN 306. A pruning module 310 actively moves the weights of the CNN 306 toward zero in each round of training and prunes filters from the CNN 306 to reduce the computational complexity. The training module 308 and the pruning module 310 work together as described above to ensure that the output of the CNN 306 is not significantly degraded by pruning.
  • Referring now to FIG. 4, a security system 400 is shown as one possible implementation of the present embodiments. The security system 400 includes a hardware processor 402 and a memory 404. One or more sensors 406 provide data about a monitored area to the security system 400. The sensors 406 may include, for example, a camera, a night vision camera (e.g., operating in infrared), door and window sensors, acoustic sensors, temperature sensors, and any other sensors that collect raw data regarding the monitored area.
  • The CNN system 300 is included in the security system 400. The CNN system 300 accepts information that is gathered by the sensors 406 and stored in memory 404, outputting security status information. The CNN system 300 may include its own separate processor 302 and memory 304 or may, alternatively, omit those features in favor of using the processor 402 and memory 404 of the security system 400.
  • An alert module 408 accepts the output of the CNN system 300. The alert module 408 determines if the state of the area being monitored has changed and, if so, whether an alert should be issued. For example, the CNN system 300 may detect movement or the presence of a person or object in a place where it does not belong. Alternatively, the CNN system 300 may detect an intrusion event. In such a situation, the alert module 408 provides an appropriate alert to one or more of the user and a response organization (e.g., medical, police, or fire). The alert module 408 provides the alert by any appropriate communications mechanism, including by wired or wireless network connections or by a user interface.
  • A control module 410 works with the alert module 408 to perform appropriate security management actions. For example, if an unauthorized person is detected by the CNN system 300, the control module 410 may automatically increase a security level and perform such actions as locking doors, increasing sensor sensitivity, and changing the sensitivity of the alert module 408.
  • Because the CNN system 300 has been pruned, the CNN system 300 can provide accurate results with relatively low computational complexity, making it possible to implement the security system 400 on lower-power hardware. In particular, the processor 402 need not be a high-powered device and may in particular be implemented in an embedded environment.
  • The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

Claims (19)

What is claimed is:
1. A method of training a neural network, comprising:
training a neural network based on training data;
multiplying weights of a layer of the neural network by an attrition factor; and
pruning a block of weights from the layer if the block of weights in the layer has a contribution to an output of the layer that is below a threshold.
2. The method of claim 1, wherein the attrition factor is a number less than one.
3. The method of claim 1, wherein the contribution of a block of weights to the output of the layer is calculated as a percentage of a sum of absolute weights of the weights in the layer made up by a sum of absolute weights of the weights in the block of weights.
4. The method of claim 1, further comprising pruning a filter in a subsequent layer in the neural network that corresponds to the pruned block of weights.
5. The method of claim 4, further comprising pruning a block of weights in a subsequent layer in the neural network that corresponds to the pruned filter.
6. The method of claim 1, wherein the neural network is a convolutional neural network.
7. The method of claim 1, wherein training, multiplying, and pruning are repeated until output of the neural network is within a threshold difference from an expected output for a validation data set.
8. The method of claim 1, wherein training the neural network comprises a forward pass using the training data, a backward pass, and a learning pass that updates weights of the neural network.
9. The method of claim 1, wherein pruning a block of weights comprises removing a column or row of an array of weights.
10. A method of training a neural network, comprising:
training a convolutional neural network based on training data;
multiplying weights of a layer of the neural network by a number less than one; and
pruning a block of weights from the layer, pruning a filter corresponding to the block of weights in a subsequent layer in the neural network, and pruning a block of weights that corresponds to the pruned filter in a subsequent layer in the neural network, if the block of weights in the layer has a contribution to an output of the layer that is below a threshold, wherein the contribution of a block of weights to the output of the layer is calculated as a percentage of a sum of absolute weights of the weights in the layer made up by a sum of absolute weights of the weights in the block of weights.
11. A system for training a neural network, comprising:
a neural network;
a training module configured to train the neural network based on training data; and
a pruning module configured to multiply weights of a layer of the neural network by an attrition factor and to prune a block of weights from the layer if the block of weights in the layer has a contribution to an output of the layer that is below a threshold.
12. The system of claim 11, wherein the attrition factor is a number less than one.
13. The system of claim 11, wherein the pruning module is further configured to calculate the contribution of a block of weights to the output of the layer as a percentage of a sum of absolute weights of the weights in the layer made up by a sum of absolute weights of the weights in the block of weights.
14. The system of claim 11, wherein the pruning module is further configured to prune a filter in a subsequent layer in the neural network that corresponds to the pruned block of weights.
15. The system of claim 14, wherein the pruning module is further configured to prune a block of weights in a subsequent layer in the neural network that corresponds to the pruned filter.
16. The system of claim 11, wherein the neural network is a convolutional neural network.
17. The system of claim 11, wherein the training module and the pruning module are further configured to repeat training, multiplying, and pruning until output of the neural network is within a threshold difference from an expected output for a validation data set.
18. The system of claim 11, wherein the training module is further configured to train the neural network using a forward pass using the training data, a backward pass, and a learning pass that updates weights of the neural network.
19. The system of claim 11, wherein the pruning module is further configured to remove a column or row of an array of weights.
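The block-pruning criterion recited in claims 1, 3, and 9 (remove a row or column whose share of the layer's total absolute weight falls below a threshold) can be sketched as follows. This is a hypothetical illustration; the function name, the choice of rows as the pruned blocks, and the threshold value are assumptions for the example, not details from the patent:

```python
import numpy as np

def prune_rows(layer_weights, threshold=0.01):
    """Prune rows of a weight matrix whose contribution is below a threshold.

    A row's contribution is its share of the layer's total absolute
    weight: sum(|row|) / sum(|layer|), as in claim 3. Returns the
    reduced matrix and a boolean mask of the rows that were kept.
    """
    total = np.abs(layer_weights).sum()
    contribution = np.abs(layer_weights).sum(axis=1) / total
    keep = contribution >= threshold
    return layer_weights[keep], keep
```

Per claims 4 and 5, removing a block here would also allow the corresponding filter in the subsequent layer to be pruned, since that filter's input is now always zero, and that pruned filter's own block of weights can be removed in turn.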
US15/595,049 2016-05-19 2017-05-15 Passive pruning of filters in a convolutional neural network Active 2039-05-26 US10740676B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/595,049 US10740676B2 (en) 2016-05-19 2017-05-15 Passive pruning of filters in a convolutional neural network

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662338573P 2016-05-19 2016-05-19
US201662338797P 2016-05-19 2016-05-19
US15/595,049 US10740676B2 (en) 2016-05-19 2017-05-15 Passive pruning of filters in a convolutional neural network

Publications (2)

Publication Number Publication Date
US20170337472A1 true US20170337472A1 (en) 2017-11-23
US10740676B2 US10740676B2 (en) 2020-08-11

Family

ID=60330282

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/595,049 Active 2039-05-26 US10740676B2 (en) 2016-05-19 2017-05-15 Passive pruning of filters in a convolutional neural network

Country Status (1)

Country Link
US (1) US10740676B2 (en)


Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200334537A1 (en) * 2016-06-30 2020-10-22 Intel Corporation Importance-aware model pruning and re-training for efficient convolutional neural networks
US11907843B2 (en) * 2016-06-30 2024-02-20 Intel Corporation Importance-aware model pruning and re-training for efficient convolutional neural networks
US11315018B2 (en) * 2016-10-21 2022-04-26 Nvidia Corporation Systems and methods for pruning neural networks for resource efficient inference
US11157815B2 (en) * 2016-11-15 2021-10-26 Google Llc Efficient convolutional neural networks and techniques to reduce associated computational costs
US11157814B2 (en) * 2016-11-15 2021-10-26 Google Llc Efficient convolutional neural networks and techniques to reduce associated computational costs
US11586909B2 (en) * 2017-01-13 2023-02-21 Kddi Corporation Information processing method, information processing apparatus, and computer readable storage medium
US10997502B1 (en) * 2017-04-13 2021-05-04 Cadence Design Systems, Inc. Complexity optimization of trainable networks
US20220092618A1 (en) * 2017-08-31 2022-03-24 Paypal, Inc. Unified artificial intelligence model for multiple customer value variable prediction
US20220207375A1 (en) * 2017-09-18 2022-06-30 Intel Corporation Convolutional neural network tuning systems and methods
US20230053289A1 (en) * 2017-12-30 2023-02-16 Intel Corporation Machine learning accelerator mechanism
US11151428B2 (en) 2018-01-25 2021-10-19 Samsung Electronics Co., Ltd. Accelerating long short-term memory networks via selective pruning
US10657426B2 (en) 2018-01-25 2020-05-19 Samsung Electronics Co., Ltd. Accelerating long short-term memory networks via selective pruning
CN108388537A (en) * 2018-03-06 2018-08-10 上海熠知电子科技有限公司 A kind of convolutional neural networks accelerator and method
US10936913B2 (en) * 2018-03-20 2021-03-02 The Regents Of The University Of Michigan Automatic filter pruning technique for convolutional neural networks
CN110533156A (en) * 2018-05-23 2019-12-03 富士通株式会社 The method and apparatus for improving the processing speed of convolutional neural networks
US11449728B2 (en) 2018-07-01 2022-09-20 Al Falcon Ltd. Method of optimization of operating a convolutional neural network and system thereof
CN108846476A (en) * 2018-07-13 2018-11-20 电子科技大学 A kind of intelligent terminal security level classification method based on convolutional neural networks
WO2020091139A1 (en) * 2018-10-31 2020-05-07 주식회사 노타 Effective network compression using simulation-guided iterative pruning
CN109598340A (en) * 2018-11-15 2019-04-09 北京知道创宇信息技术有限公司 Method of cutting out, device and the storage medium of convolutional neural networks
US11615309B2 (en) * 2019-02-27 2023-03-28 Oracle International Corporation Forming an artificial neural network by generating and forming of tunnels
US20200272904A1 (en) * 2019-02-27 2020-08-27 Oracle International Corporation Forming an artificial neural network by generating and forming of tunnels
CN110097177A (en) * 2019-05-15 2019-08-06 电科瑞达(成都)科技有限公司 A kind of network pruning method based on pseudo- twin network
CN110458289A (en) * 2019-06-10 2019-11-15 北京达佳互联信息技术有限公司 The construction method of multimedia class model, multimedia class method and device
CN110458289B (en) * 2019-06-10 2022-06-10 北京达佳互联信息技术有限公司 Multimedia classification model construction method, multimedia classification method and device
CN110287857A (en) * 2019-06-20 2019-09-27 厦门美图之家科技有限公司 A kind of training method of characteristic point detection model
CN110276452A (en) * 2019-06-28 2019-09-24 北京中星微电子有限公司 Pruning method, device, equipment and the artificial intelligence chip of neural network model
CN110909667A (en) * 2019-11-20 2020-03-24 北京化工大学 Lightweight design method for multi-angle SAR target recognition network
US11681922B2 (en) * 2019-11-26 2023-06-20 Numenta, Inc. Performing inference and training using sparse neural network
US20230274150A1 (en) * 2019-11-26 2023-08-31 Numenta, Inc. Performing Inference And Training Using Sparse Neural Network
US20210158168A1 (en) * 2019-11-26 2021-05-27 Numenta, Inc. Performing Inference and Training Using Sparse Neural Network
JP2021124949A (en) * 2020-02-05 2021-08-30 株式会社東芝 Machine learning model compression system, pruning method, and program
JP7242590B2 (en) 2020-02-05 2023-03-20 株式会社東芝 Machine learning model compression system, pruning method and program
CN113033804A (en) * 2021-03-29 2021-06-25 北京理工大学重庆创新中心 Convolution neural network compression method for remote sensing image

Also Published As

Publication number Publication date
US10740676B2 (en) 2020-08-11

Similar Documents

Publication Publication Date Title
US10740676B2 (en) Passive pruning of filters in a convolutional neural network
US10885437B2 (en) Security system using a convolutional neural network with pruned filters
US10402653B2 (en) Large margin high-order deep learning with auxiliary tasks for video-based anomaly detection
CN106982359B (en) A kind of binocular video monitoring method, system and computer readable storage medium
KR102563752B1 (en) Training method for neural network, recognition method using neural network, and devices thereof
KR102492318B1 (en) Model training method and apparatus, and data recognizing method
CN110647918B (en) Mimicry defense method for resisting attack by deep learning model
US9047568B1 (en) Apparatus and methods for encoding of sensory data using artificial spiking neurons
CN106796580B (en) Method, apparatus, and medium for processing multiple asynchronous event driven samples
WO2020046806A1 (en) Unsupervised anomaly detection, diagnosis, and correction in multivariate time series data
CN113449864B (en) Feedback type impulse neural network model training method for image data classification
US11815893B1 (en) Apparatus and method for monitoring and controlling of a neural network using another neural network implemented on one or more solid-state chips
US11606393B2 (en) Node classification in dynamic networks using graph factorization
CN110458109A (en) A kind of tealeaves disease recognition system and working method based on image recognition technology
US20210326661A1 (en) Determining an explanation of a classification
JP7217761B2 (en) Abnormal device detection from communication data
KR20160132032A (en) Blink and averted gaze avoidance in photographic images
US11468276B2 (en) System and method of a monotone operator neural network
WO2016053748A1 (en) Vibration signatures for prognostics and health monitoring of machinery
CN107038450A (en) Unmanned plane policing system based on deep learning
Poon et al. Driver distracted behavior detection technology with YOLO-based deep learning networks
JP2023118101A (en) Device and method for determining adversarial patch for machine learning system
US11289175B1 (en) Method of modeling functions of orientation and adaptation on visual cortex
KR102488281B1 (en) Method for adaptively controling precision of activation and weight of artificial neural network and apparatus thereof
Popov et al. Recognition of Dynamic Targets using a Deep Convolutional Neural Network

Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DURDANOVIC, IGOR;GRAF, HANS PETER;REEL/FRAME:042378/0906

Effective date: 20170511

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

AS Assignment

Owner name: NEC CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NEC LABORATORIES AMERICA, INC.;REEL/FRAME:052996/0574

Effective date: 20200616

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4