WO2021170215A1 - Neural architecture search - Google Patents

Neural architecture search (Recherche d'architecture neuronale)

Info

Publication number
WO2021170215A1
Authority
WO
WIPO (PCT)
Prior art keywords
editing
sequence
architecture
editing operations
neural network
Prior art date
Application number
PCT/EP2020/054808
Other languages
English (en)
Inventor
Tinghuai Wang
Kuan Eeik TAN
Adrian Flanagan
Guangming Wang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2020/054808
Publication of WO2021170215A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks

Definitions

  • The aspects of the present disclosure relate generally to neural networks and more particularly to automatic neural network architecture search (NAS).
  • Neural networks are powerful models which have been successfully applied in various artificial intelligence (AI) tasks such as image recognition, speech recognition and machine translation. Despite their success, neural networks are hard to design, requiring expert knowledge and significant time. Neural architectures are typically developed manually by human experts, which is a time-consuming and error-prone process. Neural architecture search (NAS) is a technique for the automated design of artificial neural networks (ANN).
  • Previous neural architecture search approaches driven by evolutionary algorithms such as genetic algorithms (GA) or genetic programming (GP) address the neural architecture search problem by representing the neural architecture as an encoding, such as a binary string, and searching for new architectures through evolution.
  • A difficulty with these approaches is that the limited encoding scheme determines the search space, that is, the set of possible architectures.
  • Another difficulty is that neural architecture searches that start from scratch are highly inefficient and require substantial training to match the performance of pre-trained models.
  • The apparatus includes a processor that is coupled to a memory.
  • The processor is configured to apply a sequence of editing operations to a pre-trained neural network to formulate a search space for a neural architecture search.
  • Genetic programming is applied to the formulated search space to determine an optimal neural network architecture.
  • The search space for the optimal neural architecture is a pool of editing trees, where each editing tree represents a sequence of editing operations rather than a network architecture. This is more flexible than existing methods, which search or evolve network topologies.
  • The sequence of editing operations is arranged in a tree structure.
  • The search space for the optimal neural architecture is a pool of editing trees where each tree represents a sequence of editing operations rather than a network architecture. This is more flexible than existing methods, which search or evolve network topologies.
  • The sequence of editing operations is randomly selected from a pool of editing operations. This allows the entire range of possible solutions to be covered.
  • The sequence of editing operations comprises one or more of a split operation, a retain operation, a widen operation or a deepen operation.
  • The sequence of editing operations is applied to the existing neural network to generate a new neural network.
  • The use of editing operations is more flexible than searching an existing network topology.
  • The processor is configured to reuse weights of the pre-trained neural network in the new neural network architecture.
  • The evaluation of new neural architectures is more efficient due to the reuse of the existing network weights. Any pre-trained neural network can be reused.
  • The optimal neural architecture is represented as a sequence of network editing operations. Searching a sequence of editing operations for a new or optimal neural architecture is more flexible than existing methods that search network topologies, whose search space is much larger than that of the sequences of editing operations.
  • The sequence of network editing operations comprises operators that manipulate or combine neural network fragments to form the optimal neural architecture. Searching a sequence of editing operations for a new optimal neural architecture is more flexible than existing methods that search network topologies.
  • The optimal neural architecture is determined by computing a fitness measure for each editing tree structure in the pool of editing trees, validating the accuracy of each editing tree on training data, and selecting the fittest editing tree structure.
  • The fittest tree in the final generation provides the searched neural architecture.
  • The method includes applying a sequence of editing operations to a pre-trained neural network, formulating a search space for a neural architecture search based on the applied sequence of editing operations, applying genetic programming to the formulated search space and determining an optimal neural network architecture based on the genetic programming applied to the formulated search space.
  • The search space for the new or optimal neural architecture is a pool of editing trees, where each editing tree represents a sequence of editing operations rather than a network architecture. This is more flexible than existing methods, which search or evolve network topologies.
  • The sequence of editing operations is arranged in a tree structure.
  • The search space is a pool of editing trees where each tree represents a sequence of editing operations rather than a network architecture. This is more flexible than existing methods, which search or evolve network topologies.
  • The method further comprises randomly selecting the sequence of editing operations from a pool of editing operations. This allows the entire range of possible solutions to be covered.
  • The sequence of editing operations comprises one or more of a split operation, a retain operation, a widen operation or a deepen operation.
  • The sequence of editing operations is applied to the existing neural network to generate a new neural network.
  • The use of editing operations is more flexible than searching an existing network topology.
  • The method further comprises reusing weights of the pre-trained neural network in the optimal neural network architecture. The evaluation of new neural architectures is more efficient due to the reuse of the existing network weights.
  • Any pre-trained neural network can be reused.
  • The optimal neural architecture is represented as a sequence of network editing operations. Searching a sequence of editing operations for an optimal neural architecture is more flexible than existing methods that search network topologies, whose search space is much larger than that of the sequences of editing operations.
  • The sequence of network editing operations comprises operators that manipulate or combine neural network fragments to form the optimal neural architecture. Searching a sequence of editing operations for a new neural architecture is more flexible than existing methods that search network topologies.
  • The method further comprises determining the optimal neural architecture by computing a fitness measure for each editing tree structure in the pool of editing trees, validating the accuracy of each editing tree on training data, and selecting the fittest editing tree structure as the optimal neural architecture.
  • The above and further objects and advantages are obtained by a computer program product.
  • The computer program product includes a non-transitory computer-readable medium having stored thereon program instructions that, when executed by a processor, cause the processor to perform the method according to any one of the possible implementation forms of the method.
  • Figure 1 illustrates a schematic block diagram of an exemplary apparatus incorporating aspects of the disclosed embodiments.
  • Figure 2 illustrates an exemplary population of network editing trees to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 3 illustrates an exemplary method incorporating aspects of the disclosed embodiments.
  • Figure 4 illustrates an exemplary editing operation to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 5 illustrates an exemplary editing operation to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 6 illustrates an exemplary editing operation to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 7 illustrates an exemplary editing operation to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 8 illustrates an exemplary cross-over operation to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 9 illustrates an exemplary mutation operation to be implemented in the neural network architecture search of the disclosed embodiments.
  • Figure 10 is a flow chart illustrating aspects of an exemplary method incorporating aspects of the disclosed embodiments.
  • A sequence of network editing operations is applied to an existing, pre-trained neural network.
  • A search space is formulated from the application of the sequence of network editing operations to the existing neural network.
  • Genetic programming is applied to the formulated search space to determine an optimal neural network architecture, also referred to as a new neural architecture.
  • The search space for the optimal neural architecture is a pool of network editing trees, where each editing tree represents a sequence of operations rather than a network architecture. Searching the sequence of editing operations is computationally more efficient in time, resources and cost than searching a conventional neural network architecture topology.
  • The apparatus 100 includes a processor 102 and a memory 104 for storing one or more programs which are executable by the processor 102 for performing the methods described herein.
  • The apparatus 100 also includes an input device 106 for receiving input data and an output device 108 for outputting data.
  • The processor 102 is configured to generate a tree structure 208 to formulate a search space for a neural architecture search.
  • The tree structure 208 is formed by randomly selecting trees 206 from a pool or population of trees 204.
  • Each tree 206, also referred to herein as a parse tree, represents a sequence of network editing operations.
  • Applying the tree structure 208 to the pre-trained neural network 202 generates a new neural architecture.
  • The search for a new neural architecture is equivalent to finding an optimal network editing tree using Genetic Programming (GP).
  • The tree structure or parse tree 208 includes nodes 210 that act as operators that either manipulate or combine network fragments 212 to form the new, optimized neural network architecture.
  • The nodes 210 represent any one of a number of network editing operations, also referred to herein as atomic operations. These operations are the minimum steps needed to expand the given architecture, each resulting in a "new" but similar architecture. A sequence of operations can generate a new and quite different architecture. Finding the new architecture therefore corresponds to finding an optimal combination of atomic operations, as sketched below.
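  • A minimal sketch of how such an editing (parse) tree could be represented is given below, assuming a small fixed operation set; the `EditNode` class and its fields are illustrative assumptions, not the patent's data structures.

```python
# Illustrative editing-tree node; names and fields are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EditNode:
    op: str                                        # 'widen', 'deepen', 'split' or 'retain'
    params: dict = field(default_factory=dict)     # e.g. split ratio, chosen layer index
    children: List["EditNode"] = field(default_factory=list)

# Example tree: split the input block, widen one group, retain the other.
example_tree = EditNode("split", {"ratio": 0.5}, [
    EditNode("widen", {"extra_neurons": 16, "layer_index": 0.3}),
    EditNode("retain"),
])
print(example_tree)
```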
  • Figure 3 illustrates one example of a process or method 300 incorporating aspects of the disclosed embodiments.
  • The method includes applying 302 a sequence of editing operations to a pre-trained neural network.
  • A search space is formulated 304 for a neural architecture search based on the applied sequence of editing operations.
  • Genetic programming is applied 306 to the formulated search space.
  • An optimal neural network architecture is determined 308 based on the genetic programming applied to the formulated search space.
  • The sequence of editing operations is arranged in a tree structure.
  • The sequence of editing operations can be randomly selected from a pool of editing operations.
  • The sequence of editing operations can include any one of a number of editing operations.
  • The sequence of editing operations can include one or more of a split operation, a retain operation, a widen operation or a deepen operation.
  • The weights of the pre-trained neural network can be reused in the optimal neural network architecture.
  • Any pre-trained neural network can be used.
  • The evaluation of new neural architectures is more efficient due to the reuse of the existing network weights.
  • The method further includes determining the optimal neural architecture by computing a fitness measure for each editing tree structure in the pool of editing trees. In one embodiment, this includes validating the accuracy of each editing tree on training data and selecting the fittest editing tree structure as the optimal neural architecture.
  • One or more sets of editing operations can be applied to generate new, optimized neural network architectures. For example, in one embodiment, several atomic or editing operations can be defined. These operations can include, but are not limited to, Widen operations, Deepen operations, Split operations or Retain operations. It should be understood, however, that the aspects of the disclosed embodiments are not limited to the editing operations disclosed herein.
  • The editing operations can include any suitable atomic operation beyond the widen, deepen, split and retain operations.
  • Other operations could include adding a skip connection or a recurrent neuron.
  • The editing tree and search space formulations remain the same in that case. A minimal sketch of sampling random editing trees from such an operation pool is given below.
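  • The following sketch randomly samples editing trees from a small operation pool, reusing the `EditNode` class from the sketch above; the depth limit, leaf probability and parameter ranges are illustrative assumptions.

```python
# Illustrative random sampling of editing trees from an operation pool.
import random

LEAF_OPS = ["widen", "deepen", "retain"]   # leaf-node operations
NONLEAF_OPS = ["split"]                    # non-leaf operations

def random_editing_tree(max_depth=3):
    # With some probability (or at the depth limit) emit a leaf operation.
    if max_depth <= 1 or random.random() < 0.6:
        op = random.choice(LEAF_OPS)
        params = {} if op == "retain" else {"layer_index": round(random.random(), 2)}
        return EditNode(op, params)
    # A non-leaf node divides the block and edits each group independently.
    op = random.choice(NONLEAF_OPS)
    return EditNode(op, {"ratio": round(random.uniform(0.2, 0.8), 2)},
                    [random_editing_tree(max_depth - 1),
                     random_editing_tree(max_depth - 1)])

population = [random_editing_tree() for _ in range(10)]
```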
  • Figure 4 illustrates one example of a Widen Operation.
  • The Widen operation is configured to insert neurons into a given layer of the neural network 202.
  • The function of the Widen operation is to extend an input convolution (conv) layer or expand a fully connected (FC) layer with additional neurons.
  • The attribute of the Widen operation is a leaf node.
  • Parameters of the Widen operation include a variable indicating the number of neurons to be added to the input layer, the kernel size, the stride, and a variable indicating the (normalized) index of the chosen layer.
  • The weights for the new neurons are randomly replicated weights from existing neurons of the chosen layer.
  • The weights of the next layer that correspond to the replicated channels are divided by the number of replications to preserve equivalent functionality.
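  • Below is a minimal sketch of such a function-preserving Widen step on a fully connected layer, using plain NumPy weight matrices; the bias-free, framework-free representation is an assumption made for brevity.

```python
# Illustrative Widen step: replicate random neurons of a layer and rescale the
# outgoing weights of the replicated channels so the network function is preserved.
import numpy as np

def widen_fc(w_in, w_out, n_new, rng=np.random.default_rng(0)):
    """w_in: (units, in_features) weights of the chosen layer,
    w_out: (next_units, units) weights of the next layer,
    n_new: number of neurons to add to the chosen layer."""
    units = w_in.shape[0]
    copied = rng.integers(0, units, size=n_new)        # randomly pick neurons to replicate
    w_in_new = np.vstack([w_in, w_in[copied]])         # new neurons reuse the copied weights
    counts = np.bincount(copied, minlength=units) + 1  # replication count per original neuron
    w_out_scaled = w_out / counts[np.newaxis, :]       # divide outgoing weights by that count
    w_out_new = np.hstack([w_out_scaled, w_out_scaled[:, copied]])
    return w_in_new, w_out_new

w1, w2 = np.ones((4, 3)), np.ones((2, 4))
w1b, w2b = widen_fc(w1, w2, n_new=2)
x = np.ones(3)
assert np.allclose(w2 @ (w1 @ x), w2b @ (w1b @ x))     # same function, wider layer
```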
  • Figure 5 illustrates an example of a Deepen operation 500.
  • The Deepen operation 500 inserts a new layer 504 after the input convolution/fully connected (FC) layer or block 502.
  • The number of neurons in this new layer 504 can initially be the same as in the previous layer 502.
  • The attribute of the Deepen operation 500 is a leaf node.
  • The weights for the new layer 504 are initialized as an identity mapping between the layers 502 and 504.
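  • Below is a minimal sketch of such a Deepen step on bias-free linear layers, assuming the new layer is initialized as an identity mapping so that the overall function is preserved; that initialization and the plain-NumPy representation are assumptions consistent with, but not mandated by, the description above.

```python
# Illustrative Deepen step: insert an identity-initialized layer after `index`.
import numpy as np

def deepen_fc(layers, index):
    """layers: list of weight matrices applied in sequence (no nonlinearity shown)."""
    units = layers[index].shape[0]        # width of the layer being deepened
    identity = np.eye(units)              # new layer initially maps its input unchanged
    return layers[:index + 1] + [identity] + layers[index + 1:]

def forward(weights, x):
    for w in weights:
        x = w @ x
    return x

layers = [np.random.randn(4, 3), np.random.randn(2, 4)]
deeper = deepen_fc(layers, index=0)
x = np.random.randn(3)
assert np.allclose(forward(layers, x), forward(deeper, x))   # function preserved
```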
  • Figure 6 illustrates an example of a Split operation 600. In this example, the Split operation 600 divides the input block 602 into groups, such as groups 604, 606, given a parameter.
  • The parameter is a ratio r in (0, 1) that determines the splitting point.
  • The attribute of the Split operation is a non-leaf node.
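  • A minimal sketch of such a Split step on a list of neuron indices is given below; the list-of-indices representation and the rounding rule are assumptions.

```python
# Illustrative Split step: divide a block's neurons into two groups at ratio r.
def split_block(neuron_indices, ratio):
    point = max(1, int(round(len(neuron_indices) * ratio)))   # splitting point from r in (0, 1)
    return neuron_indices[:point], neuron_indices[point:]

group_a, group_b = split_block(list(range(8)), ratio=0.25)
print(group_a, group_b)   # [0, 1] [2, 3, 4, 5, 6, 7]
# A Retain node, by contrast, would simply return the input block unchanged.
```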
  • Figure 7 illustrates an example of a Retain operation 700.
  • The Retain operation 700 is configured to retain the input block/layer 702.
  • The attribute of a Retain operation is a leaf node.
  • Each tree 208 is evaluated as follows. The tree 208 is applied to the pre-trained model 202 to obtain a new network architecture. The new architecture is fine-tuned on training data for a small number of epochs. The accuracy of the new architecture is then evaluated on validation data. Trees associated with the top accuracies (e.g., the top 1%) are selected as elites, which pass directly to the next generation.
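  • A minimal sketch of this evaluation and elite selection is given below; `apply_tree`, `fine_tune` and `validation_accuracy` stand in for framework-specific code and are assumptions rather than the patent's API.

```python
# Illustrative fitness evaluation of one editing tree, and elite selection.
def evaluate_fitness(tree, pretrained_model, train_data, val_data,
                     apply_tree, fine_tune, validation_accuracy, epochs=2):
    candidate = apply_tree(tree, pretrained_model)    # new architecture, reusing pre-trained weights
    fine_tune(candidate, train_data, epochs=epochs)   # only a small number of epochs
    return validation_accuracy(candidate, val_data)   # fitness = accuracy on validation data

def select_elites(scored_population, fraction=0.01):
    """scored_population: list of (fitness, tree); keep roughly the top `fraction`."""
    ranked = sorted(scored_population, key=lambda pair: pair[0], reverse=True)
    n_elites = max(1, int(len(ranked) * fraction))
    return [tree for _, tree in ranked[:n_elites]]
```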
  • The evolution usually starts from a population of randomly generated individuals, or trees in these examples, and is an iterative process, with the population in each iteration called a generation.
  • The fitness of every individual in the population is evaluated.
  • The fitness is usually the value of the objective function in the optimization problem being solved.
  • The fitter individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation.
  • The new generation of candidate solutions is then used in the next iteration of the algorithm.
  • The algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population.
  • A portion of the editing trees that are passed through to the next generation, for example 5%, is reinitialized at random.
  • The remainder of the next generation is bred from the selected parents using the processes of mutation and cross-over, examples of which are illustrated in Figures 8 and 9, respectively.
  • Mutation is a genetic operator used to maintain genetic diversity from one generation of a population of genetic algorithm chromosomes to the next. It is analogous to biological mutation. Mutation alters one or more gene values in a chromosome from its initial state. In accordance with the aspects of the disclosed embodiments, some nodes or branches can be totally altered. In the example of Figure 8, mutation introduces diversity into the population by randomly replacing a branch or node with a randomly generated branch or node. As shown in Figure 8, the widen node 804 in the editing tree 802 is replaced with the randomly generated branch 806.
  • Cross-over, an example of which is shown in Figure 9, constructs two new network editing trees using portions of the parent network editing trees. Cross-over, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during sexual reproduction in biology.
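  • A minimal sketch of sub-tree mutation and cross-over on editing trees is given below, reusing the `EditNode` and `random_editing_tree` sketches above; the traversal and deep-copy strategy are assumptions.

```python
# Illustrative mutation and cross-over operators on editing trees.
import copy
import random

def _all_nodes(tree):
    nodes = [tree]
    for child in tree.children:
        nodes.extend(_all_nodes(child))
    return nodes

def mutate(tree):
    """Replace a randomly chosen node (and its branch) with a random branch."""
    tree = copy.deepcopy(tree)
    target = random.choice(_all_nodes(tree))
    new_branch = random_editing_tree(max_depth=2)
    target.op, target.params, target.children = (
        new_branch.op, new_branch.params, new_branch.children)
    return tree

def crossover(parent_a, parent_b):
    """Swap randomly chosen sub-trees between two parents, producing two children."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    node_a = random.choice(_all_nodes(child_a))
    node_b = random.choice(_all_nodes(child_b))
    node_a.op, node_b.op = node_b.op, node_a.op
    node_a.params, node_b.params = node_b.params, node_a.params
    node_a.children, node_b.children = node_b.children, node_a.children
    return child_a, child_b
```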
  • Figure 2 shows an exemplary population generated via evolution, e.g. cross-over and mutation.
  • The fittest tree in the final generation provides the searched neural architecture.
  • An overview of the optimization is shown as a flowchart in Figure 10.
  • A population 204 of network editing trees is randomly generated 1002.
  • Each tree 206 in the population 204, also referred to as a program, comprises randomly generated editing operations. Examples of these randomly generated editing operations include, but are not limited to, Widen operations, Deepen operations, Split operations and Retain operations, as previously described.
  • The fitness of the editing trees 206 in the population 204 is evaluated 1004. The fitness measure of each editing tree 206 in the population 204 takes into account the accuracy on the target task.
  • The new network architecture is fine-tuned on training data for a small number of epochs.
  • The accuracy of the new network architecture is then evaluated on validation data.
  • In step 1006, it is determined whether there is a significant improvement in terms of accuracy for the target task. If it is determined in step 1006 that the evaluated editing tree does not provide a significant improvement in accuracy for the target task, the best network architecture of the initial population of network editing trees 204 is returned 1008.
  • If it is determined in step 1006 that there is a significant improvement in accuracy for the target task, the evaluated editing tree is passed 1010 to the next generation.
  • The best-performing trees will be kept intact and form part of the population in the next generation.
  • The top one percent (1%) of the best-performing trees are selected to pass to the next generation.
  • Any suitable percentage of the best or top-performing trees can be selected. For example, the range could be from one percent to ten percent.
  • Steps 1010 to 1020 illustrate the process of generating the population of the new generation: (1) the top individuals, or trees in this context, also referred to as elites, are passed 1010 from the previous generation to the next generation; (2) new trees are randomly generated 1016 if it is determined 1012 that a random number is lower than a small probability epsilon; (3) parents are picked 1014 and cross-over and/or mutation is performed 1018 on the picked parents to generate new trees if it is determined 1012 that the random number is higher than the small probability epsilon.
  • The selected new tree is added 1020 to the next generation, and the process then repeats with the new generation; a minimal sketch of this overall loop is given below.
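  • The following sketch ties the pieces together into the overall search loop of Figure 10, reusing `random_editing_tree`, `select_elites`, `mutate` and `crossover` from the sketches above; the population size, number of generations, epsilon and elite fraction are illustrative assumptions rather than values prescribed by the patent.

```python
# Illustrative overall genetic-programming search loop (Figure 10).
import random

def search(fitness_fn, population_size=20, generations=10,
           epsilon=0.05, elite_fraction=0.01):
    population = [random_editing_tree() for _ in range(population_size)]     # step 1002
    best_tree, best_fitness = None, float("-inf")

    for _ in range(generations):
        scored = [(fitness_fn(tree), tree) for tree in population]           # step 1004
        top_fitness, top_tree = max(scored, key=lambda pair: pair[0])
        if top_fitness <= best_fitness:                                      # step 1006: no improvement
            break                                                            # step 1008: return best so far
        best_fitness, best_tree = top_fitness, top_tree

        next_generation = select_elites(scored, elite_fraction)              # step 1010: elites pass through
        while len(next_generation) < population_size:
            if random.random() < epsilon:                                    # steps 1012/1016
                next_generation.append(random_editing_tree())
            else:                                                            # steps 1014/1018
                parent_a, parent_b = random.sample([t for _, t in scored], 2)
                child, _ = crossover(mutate(parent_a), mutate(parent_b))
                next_generation.append(child)                                # step 1020
        population = next_generation

    return best_tree   # applying the fittest editing tree to the pre-trained
                       # model yields the searched neural architecture
```

  • In practice, `fitness_fn` would wrap the `evaluate_fitness` sketch above with the pre-trained model and the training and validation data bound in.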
  • The aspects of the disclosed embodiments are directed to automatic neural network architecture search.
  • The "search" action is embodied as editing programs, such as widen and deepen operations, where new operations can be defined based on need.
  • The generation of the new neural architecture includes defining a sequence of operations, formulated as a tree structure, on a given pre-trained model. Finding an optimal architecture is formulated as searching the space of programs to determine a sequence of operations using Genetic Programming algorithms, subject to a performance measure such as accuracy or latency.
  • Advantages of this solution include that the search can be based on any pre-trained model and that the weights of the pre-trained model can be reused.
  • The user only needs to define the atomic operations.
  • The search space is a pool of trees, and each tree represents a sequence of operations rather than a network architecture.
  • The evaluation of the new neural architecture is more efficient due to the reuse of the existing network weights.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A sequence of editing operations is applied to a pre-trained neural network to formulate a search space for a neural architecture search. Genetic programming is applied to the formulated search space to determine an optimal neural network architecture. The search space for the optimal neural architecture is a pool of editing trees, each editing tree representing a sequence of editing operations rather than a network architecture. This is more flexible than existing methods that search or evolve network topologies, since searching a sequence of editing operations is more computationally efficient in terms of time, resources and cost.
PCT/EP2020/054808 2020-02-25 2020-02-25 Recherche d'architecture neuronale WO2021170215A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/054808 WO2021170215A1 (fr) 2020-02-25 2020-02-25 Recherche d'architecture neuronale

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/054808 WO2021170215A1 (fr) 2020-02-25 2020-02-25 Recherche d'architecture neuronale

Publications (1)

Publication Number Publication Date
WO2021170215A1 true WO2021170215A1 (fr) 2021-09-02

Family

ID=69701194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/054808 WO2021170215A1 (fr) 2020-02-25 2020-02-25 Recherche d'architecture neuronale

Country Status (1)

Country Link
WO (1) WO2021170215A1 (fr)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI DONGNI ET AL: "Automatic Design of Intercell Scheduling Heuristics", IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 16, no. 4, 1 October 2019 (2019-10-01), pages 1907 - 1921, XP011749202, ISSN: 1545-5955, [retrieved on 20191003], DOI: 10.1109/TASE.2019.2895369 *
YANG JIANG ET AL: "Neural Architecture Refinement: A Practical Way for Avoiding Overfitting in NAS", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 May 2019 (2019-05-07), XP081273075 *
YIHENG ZHU ET AL: "GP-CNAS: Convolutional Neural Network Architecture Search with Genetic Programming", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 26 November 2018 (2018-11-26), XP081430419 *

Similar Documents

Publication Publication Date Title
Loos et al. Deep network guided proof search
US20200285948A1 (en) Robust auto-associative memory with recurrent neural network
González Muñoz et al. Multi-stage genetic fuzzy systems based on the iterative rule learning approach
CN111340227A (zh) 通过强化学习模型对业务预测模型进行压缩的方法和装置
CN114328048A (zh) 一种磁盘故障预测方法及装置
Tian et al. Automatic convolutional neural network selection for image classification using genetic algorithms
EP4000036A1 (fr) Arbre de décision spécifique d'un groupe
Wong et al. Inducing logic programs with genetic algorithms: The genetic logic programming system
WO2018167885A1 (fr) Dispositif, procédé de traitement et programme de traitement d'informations
CN112749530B (zh) 文本编码方法、装置、设备及计算机可读存储介质
US20230141655A1 (en) System and Method For Loss Function Metalearning For Faster, More Accurate Training, and Smaller Datasets
WO2021170215A1 (fr) Recherche d'architecture neuronale
da Silva et al. A multi-objective grammatical evolution framework to generate convolutional neural network architectures
Babatunde et al. Comparative analysis of genetic algorithm and particle swam optimization: An application in precision agriculture
CN115661546A (zh) 一种特征选择与分类器联合设计的多目标优化分类方法
CN115795035A (zh) 基于进化神经网络的科技服务资源分类方法、系统及其计算机可读存储介质
Georgieva Parameters of GFSSAM: coding the parameters of a hybrid genetic fuzzy system
Maini et al. Optimal feature selection using elitist genetic algorithm
CN116090538A (zh) 一种模型权重获取方法以及相关系统
CN111950615A (zh) 一种基于树种优化算法的网络故障特征选择方法
Ledezma et al. Heuristic search-based stacking of classifiers
Van Truong et al. An Ensemble Co-Evolutionary based Algorithm for Classification Problems
CN116821691B (zh) 基于任务融合的训练情感识别模型的方法和装置
Antoniou et al. A gene expression programming environment for fatigue modeling of composite materials
Chagas et al. Assessing Multi-Objective Search Engines for GE: A Case Study in CNN Generation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20707239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20707239

Country of ref document: EP

Kind code of ref document: A1