WO2022177931A1 - System and method for the automated learning of lean cnn network architectures - Google Patents

System and method for the automated learning of lean cnn network architectures

Info

Publication number
WO2022177931A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
samples
steps
training data
pruning
Prior art date
Application number
PCT/US2022/016517
Other languages
French (fr)
Inventor
Marios Savvides
Uzair Ahmed
Thanh Hai PHAN
Original Assignee
Carnegie Mellon University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Carnegie Mellon University filed Critical Carnegie Mellon University
Priority to US18/275,320 (published as US20240095524A1)
Publication of WO2022177931A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

Disclosed herein is a system and method for evolving a deep neural network model by searching for hidden sub-networks within the model. The model is evolved by adding convolutional layers to the model and then pruning the model to remove redundant filters. The model is exposed to training samples of increasing complexity each time it is evolved, until a desired level of performance is achieved, at which time the model is exposed to all available training data.

Description

SYSTEM AND METHOD FOR THE AUTOMATED LEARNING OF LEAN CNN NETWORK ARCHITECTURES
Related Applications
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/150,133, filed February 17, 2021, the contents of which are incorporated herein in their entirety.
Background
[0002] Manually designing deep neural networks in a trial-and-error, ad hoc fashion is a tedious process requiring both architectural engineering skills and domain expertise. Experts in the design of such networks rely on past experience and technical knowledge to create and design a neural network. Designing novel neural network architectures involves searching over a huge space of hyperparameters concerning the number of layers in the network, the number of filters in each layer, different initializations, normalization techniques, etc. Manually creating different configurations of the network architecture spanning different settings under each of these parameters makes creating novel architectures difficult and inefficient.
[0003] Neural architecture search (NAS) is a technique for automating the design of neural networks to reduce the effort required of the network architect and to optimize the network topology to achieve the best performance for a particular task. In some cases, NAS may automate the entire parameter search process by automatically cycling through different parameter settings and evaluating the performance of the network after training.
Summary
[0004] Deep neural networks support the lottery ticket hypothesis, which postulates that there are sub-networks within the network that are more efficient for particular classification tasks than the whole network. Disclosed herein is a system and method for searching for these hidden sub-networks within a deep neural network. The network is evolved by iteratively adding layers and pruning until the desired performance is achieved. To find the sub-networks, in some embodiments, a structured L1 pruning strategy is used, in which filters with low L1 norms are dropped from the network, reducing the parameter count and resulting in the evolution of a lean and efficient CNN architecture.
[0005] The disclosed method imposes further constraints on the model weights to prune away inefficient weights while the model is growing in its constructive elements. The method provides the user with efficient networks that leverage the lottery ticket hypothesis, leading to the construction of a lean model architecture that can be used on low-power devices.
Brief Description of the Drawings
[0006] By way of example, a specific exemplary embodiment of the disclosed system and method will now be described, with reference to the accompanying drawings, in which:
[0007] FIG. 1 is a flow diagram depicting the steps in the method.
Detailed Description
[0008] FIG. 1 is a flowchart depicting steps comprising a method implementing one embodiment of the invention. When the method starts, at step 104, samples are selected from the training data. In a preferred embodiment, the samples initially selected will be “easy” samples for the model to classify; for example, samples having nicely clustered features may initially be selected. At step 106 of the process, the model is fine-tuned using the selected samples. The performance of the model is then tested at step 108. If the model is exhibiting the desired performance, the process ends.
[0009] The performance of the model can be measured using a metric indicating, for example, the percentage of objects in the training dataset that the model is able to correctly classify. If the metric measuring the performance of the model exceeds a certain percentage, the model may be deemed to have acceptable performance. In alternate embodiments of the method, the metric may measure the FLOPs used by the model for each classification task and, if the FLOPs exceed a predetermined threshold, the iteration of the method is terminated.
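By way of illustration only, the performance test of step 108 might be realized as in the following Python (PyTorch) sketch; the accuracy threshold, the optional FLOPs budget, and the function name are assumptions made for this example and are not mandated by the disclosure.

import torch

def desired_performance_reached(model, loader, acc_threshold=0.95,
                                flop_count=None, flop_budget=None):
    """Step 108 (illustrative): stop evolving once accuracy on the current
    training samples reaches acc_threshold, or, in the alternate embodiment,
    once the measured FLOPs per classification exceed a predetermined budget."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    accuracy = correct / max(total, 1)
    if flop_budget is not None and flop_count is not None:
        return flop_count > flop_budget   # alternate embodiment: FLOPs threshold
    return accuracy >= acc_threshold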
[0010] If, by measurement of the metric, the performance of the model is below the acceptable threshold, the model is enhanced by adding one or more layers at step 110. In a preferred embodiment, layers may be added one at a time for each iteration of the steps of the method until the desired performance is achieved. In other embodiments, more than one layer at a time may be added. In one embodiment, the additional layers are initialized with random weights. In alternate embodiments, the additional layers may be initialized by some other method, for example, using pre-trained weights from other models.
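One possible, purely illustrative realization of step 110 in PyTorch is sketched below. It assumes the evolving model is maintained as a simple sequential stack of convolutional blocks; the block composition (3x3 convolution, batch normalization, ReLU) and the channel counts are illustrative choices, not requirements of the method.

import torch.nn as nn

def grow_model(model: nn.Sequential, in_channels: int, out_channels: int) -> nn.Sequential:
    """Step 110 (illustrative): append one convolutional block. PyTorch's default
    initialization plays the role of the 'random weights' mentioned in the text;
    pre-trained weights could be copied in instead. In a real model, the classifier
    head would be re-attached after the new block."""
    new_block = nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )
    return nn.Sequential(*list(model.children()), new_block)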
[0011] At step 112 of the method, the weights in all layers of the model are pruned to remove inefficient weights. In one embodiment of the invention, L1 pruning is used. In this embodiment, a filter is pruned if the L1 norm of its response (i.e., activation) is in the bottom segment, as defined by a hyperparameter. The hyperparameter is referred to as a pruning factor and can be set between 1 (no pruning) and 0 (complete pruning). For example, a pruning factor of 0.8 means that the top 80% of the filters are kept (i.e., the top segment), while the bottom 20% of filters are removed (i.e., the bottom segment). This effectively keeps only filters that, on average, provide high enough activation responses. The pruning factor can be understood as the fraction of parameters to keep while pruning, or permanently deactivating, the rest. In other embodiments, other methods of pruning may be used.
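The following sketch illustrates one way the structured L1 pruning of step 112 could be implemented in PyTorch: each convolutional layer's filters are ranked by the mean L1 norm of their activation responses on a calibration batch, and filters outside the top segment defined by the pruning factor are zeroed out. A lean deployment would physically remove the pruned filters (and the matching input channels of the following layer); zeroing is used here only to keep the sketch short. All names and default values are assumptions.

import torch
import torch.nn as nn

def l1_prune(model: nn.Module, calib_batch: torch.Tensor, pruning_factor: float = 0.8):
    """Step 112 (illustrative): keep the top pruning_factor fraction of filters in
    every Conv2d layer, ranked by the mean L1 norm of their activation responses on
    a calibration batch, and zero out (permanently deactivate) the rest."""
    responses = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # Mean absolute activation per output filter: (N, C, H, W) -> (C,)
            responses[name] = output.abs().mean(dim=(0, 2, 3)).detach()
        return hook

    handles = [module.register_forward_hook(make_hook(name))
               for name, module in model.named_modules()
               if isinstance(module, nn.Conv2d)]

    model.eval()
    with torch.no_grad():
        model(calib_batch)            # one forward pass to record filter responses
    for handle in handles:
        handle.remove()

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d) and name in responses:
            scores = responses[name]
            n_keep = max(1, int(round(pruning_factor * scores.numel())))
            keep = torch.argsort(scores, descending=True)[:n_keep]
            mask = torch.zeros_like(scores)
            mask[keep] = 1.0
            # Zero the weights (and biases) of pruned filters; a lean deployment
            # would remove them outright to realize the parameter savings.
            module.weight.data.mul_(mask.view(-1, 1, 1, 1))
            if module.bias is not None:
                module.bias.data.mul_(mask)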
[0012] The method then returns to step 104, where additional samples are selected from the training dataset. During the model evolution, while the model complexity is low, the model is trained based on a strategy derived from curriculum learning. Data is sampled to select data points whose features have higher norm values, as samples with high norm values are, on average, easier for the model to classify. As the complexity of the model increases with each iteration, through the addition of layers to the model's architecture, the difficulty of the classification task is increased by adding samples with slightly lower norms. This ensures that the complexity of the training data is always kept in check and on par with the complexity of the model as it evolves.
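The norm-based sampling of step 104 might be sketched as follows. Feature norms are computed with the current model (here by global-average-pooling a feature map produced by a hypothetical model.features extractor, since the disclosure does not specify which features are used), and the highest-norm, i.e. easiest, samples are selected; the cutoff is lowered on later iterations to admit harder samples.

import torch

def select_samples_by_norm(model, dataset, keep_fraction):
    """Step 104 (illustrative): rank training samples by the L2 norm of their
    features and return the indices of the top keep_fraction (highest-norm,
    i.e. easiest) samples. model.features is a hypothetical feature extractor;
    dataset[i] is assumed to return an (image, label) pair."""
    model.eval()
    norms = []
    with torch.no_grad():
        for idx in range(len(dataset)):
            image, _ = dataset[idx]
            feature_map = model.features(image.unsqueeze(0))   # (1, C, H, W)
            pooled = feature_map.mean(dim=(2, 3)).flatten()    # global average pool -> (C,)
            norms.append(pooled.norm(p=2).item())
    order = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    n_keep = max(1, int(keep_fraction * len(order)))
    return order[:n_keep]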
[0013] Fine-tuning again occurs at step 106 with the newly added samples from the training dataset, and the model is again evaluated at step 108 to determine if the desired performance has been achieved. The loop depicted in FIG. 1 iterates until the model is considered fully evolved, based on a complexity threshold, at which point sampling of the training dataset is stopped. After the evolution process ends, the model is then exposed to all data samples in the training dataset for training at step 114.
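Tying the pieces together, one possible rendering of the full loop of FIG. 1 (steps 104 through 114), composed from the hypothetical helpers sketched above plus assumed fine_tune and make_loader routines that are not shown, is:

def evolve_model(model, dataset, full_loader, calib_batch,
                 keep_fraction=0.5, keep_step=0.1, pruning_factor=0.8):
    """High-level loop of FIG. 1 (illustrative sketch, not the claimed method itself).
    fine_tune and make_loader are assumed, conventional training and data-loading
    routines; the channel counts passed to grow_model are illustrative only."""
    while True:
        indices = select_samples_by_norm(model, dataset, keep_fraction)   # step 104
        loader = make_loader(dataset, indices)                            # hypothetical helper
        fine_tune(model, loader)                                          # step 106
        if desired_performance_reached(model, loader):                    # step 108
            break
        model = grow_model(model, in_channels=64, out_channels=64)        # step 110
        l1_prune(model, calib_batch, pruning_factor)                      # step 112
        keep_fraction = min(1.0, keep_fraction + keep_step)               # admit harder samples
    fine_tune(model, full_loader)                                         # step 114: all training data
    return model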
[0014] As would be realized by one of skill in the art, the disclosed method described herein can be implemented by a system comprising a processor and memory, storing software that, when executed by the processor, performs the functions comprising the method.
[0015] As would further be realized by one of skill in the art, many variations on the implementations discussed herein, which fall within the scope of the invention, are possible. Moreover, it is to be understood that the features of the various embodiments described herein are not mutually exclusive and can exist in various combinations and permutations, even if such combinations or permutations are not made express herein, without departing from the spirit and scope of the invention. Accordingly, the method and apparatus disclosed herein are not to be taken as limitations on the invention but as an illustration thereof. The scope of the invention is defined by the claims which follow.

Claims

1. A method for evolving a deep neural network model for a task, the method comprising iterating the steps of: selecting samples of training data; fine-tuning the model using the selected samples; adding one or more additional convolutional layers to the model; and pruning the model; wherein the steps of the method are terminated when, after the fine-tuning step, the model is fully evolved and exhibits a desired level of performance.
2. The method of claim 1 wherein the initially selected samples of training data are samples which the model can easily classify.
3. The method of claim 2 wherein the initially selected samples have clustered features.
4. The method of claim 1 wherein the task is increased in difficulty with each iteration of the steps of the method.
5. The method of claim 4 wherein the complexity of the data samples selected from the training data increases with each iteration of the steps of the method.
6. The method of claim 5 wherein the difficulty of the task is increased at each iteration of the steps of the method by selecting additional samples from the training data having lower norms.
7. The method of claim 1 wherein the pruning step comprises pruning a filter when an L1 norm of its response falls below a predefined threshold.
8. The method of claim 1 further comprising: exposing the model to all training data after the model is fully evolved.
9. The method of claim 1 wherein the desired level of performance of the model is measured by the percentage of objects in the database the model is able to correctly classify.
10. The method of claim 1 wherein the desired level of performance of the model is measured by FLOPS used by the model for each classification task.
11. A system comprising: a processor; and memory, storing software that, when executed by the processor, implements the steps of the method of claim 1.
PCT/US2022/016517 2021-02-17 2022-02-16 System and method for the automated learning of lean cnn network architectures WO2022177931A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/275,320 US20240095524A1 (en) 2021-02-17 2022-02-16 System and method for the automated learning of lean cnn network architectures

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163150133P 2021-02-17 2021-02-17
US63/150,133 2021-02-17

Publications (1)

Publication Number Publication Date
WO2022177931A1 (en)

Family

ID=82931617

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/016517 WO2022177931A1 (en) 2021-02-17 2022-02-16 System and method for the automated learning of lean cnn network architectures

Country Status (2)

Country Link
US (1) US20240095524A1 (en)
WO (1) WO2022177931A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200334537A1 (en) * 2016-06-30 2020-10-22 Intel Corporation Importance-aware model pruning and re-training for efficient convolutional neural networks
US20180107926A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
US20200364573A1 (en) * 2019-05-15 2020-11-19 Advanced Micro Devices, Inc. Accelerating neural networks with one shot skip layer pruning

Also Published As

Publication number Publication date
US20240095524A1 (en) 2024-03-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22756797
    Country of ref document: EP
    Kind code of ref document: A1

WWE Wipo information: entry into national phase
    Ref document number: 18275320
    Country of ref document: US

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: pct application non-entry in european phase
    Ref document number: 22756797
    Country of ref document: EP
    Kind code of ref document: A1