CN112784949A - Neural network architecture searching method and system based on evolutionary computation - Google Patents

Neural network architecture searching method and system based on evolutionary computation Download PDF

Info

Publication number
CN112784949A
CN112784949A CN202110120132.8A CN202110120132A CN112784949A CN 112784949 A CN112784949 A CN 112784949A CN 202110120132 A CN202110120132 A CN 202110120132A CN 112784949 A CN112784949 A CN 112784949A
Authority
CN
China
Prior art keywords
network
sub
module
pheromone
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110120132.8A
Other languages
Chinese (zh)
Other versions
CN112784949B (en
Inventor
高明柯
杜欣军
赵�卓
逄涛
冒睿瑞
张浩博
郭威
王熠
刘晓娟
于楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 32 Research Institute
Original Assignee
CETC 32 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 32 Research Institute filed Critical CETC 32 Research Institute
Priority to CN202110120132.8A priority Critical patent/CN112784949B/en
Publication of CN112784949A publication Critical patent/CN112784949A/en
Application granted granted Critical
Publication of CN112784949B publication Critical patent/CN112784949B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a neural network architecture searching method and a system based on evolutionary computation, which comprises the following steps: setting a target requirement through a target function according to the target requirement and the platform requirement; according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization; under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set; acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure; and evaluating whether the current network architecture meets the target requirements. The invention has certain application flexibility and expandability, and obtains the neural network model which realizes good balance between precision and speed under the condition of resource limitation.

Description

Neural network architecture searching method and system based on evolutionary computation
Technical Field
The invention relates to the technical field of architecture design and optimization of a deep neural network, in particular to a neural network architecture searching method and system based on evolutionary computation.
Background
Deep learning has a powerful automatic feature extraction function on unstructured data and has powerful automatic representation capability, so that the deep learning makes major breakthrough and progress in many fields such as machine translation, image recognition, voice recognition, target detection, natural language processing and the like. Based on the importance of the design of neural network architectures to the characterization of data and the ultimate performance, researchers have focused on designing various complex neural network architectures to obtain good data characterization. However, the design of neural network architectures relies heavily on a priori knowledge and experience of researchers, requiring a great deal of time and effort. The prior knowledge and the fixed thinking paradigm of human beings are difficult to find a better network architecture to a certain extent, and beginners are also difficult to reasonably modify the network architecture according to the actual needs of the beginners. Therefore, Neural Architecture Search (NAS) has come into play. The NAS aims at automatically designing a neural network architecture with optimal performance by utilizing an algorithm under the condition of limited computing resources, and manual intervention is reduced as far as possible. The research that the network architecture obtained by using the reinforcement learning method achieves the SOTA classification precision on the image classification task is considered as the pioneering work of the NAS, and the idea of designing the automatic network architecture is also shown to be feasible. Subsequently, the feasibility of this idea was again verified by the research work of large-scale evolutionary computation using evolutionary learning to obtain similar results. NAS has been rapidly applied in target detection, semantic segmentation, counterlearning, building scale, and multi-objective optimization.
As the NAS method requires a strong computational support and consumes a huge amount of computation, research has been conducted to reconstruct a search space to reduce a search range and reduce search complexity, and to accelerate a search of a network architecture through strategies such as parameter sharing, model reuse, gradient optimization, and the like to reduce the amount of computation. Early NAS trained each candidate network architecture from scratch in the architecture search phase, resulting in a proliferation of computational effort; although a parameter sharing strategy is adopted to speed up the process of architecture search, the inaccurate ranking of the candidate architectures is likely to cause the NAS to have difficulty in selecting the optimal network architecture from a large number of candidate architectures, thereby further reducing the performance of the finally searched network architecture. The search space is relaxed from discrete to continuous by the differentiable neural network architecture search method based on One-Shot, so that the architecture and the learning weight can be searched simultaneously by using gradient descent, the search time is shortened, but when the number of search rounds is too large, a searched architecture contains a lot of jump connections, and the network becomes shallower. Shallow networks can learn fewer parameters and have weaker expression ability, resulting in a sharp decline in network performance. Although the improved differentiable neural network architecture searching method adopts an early-stopping mechanism to directly control the number of the jump connections, the control of the early-stopping mechanism is an important problem, and the early stopping can cause the incomplete architecture searching. Therefore, how to balance performance and efficiency under the condition of limited resources is a problem to be solved urgently.
Domestic patent CN111353313A discloses an emotion analysis model construction method based on evolutionary neural network architecture search, which comprises the following steps: group initialization; packaging a plurality of convolutional layer units, a plurality of pooling units and a plurality of full-connection units by taking the embedded layer as a first layer, and ending by the full-connection units to randomly generate M chromosomes; the accuracy is adopted as a fitness function to carry out fitness evaluation; selecting a plurality of chromosome individuals by adopting a roulette method to form a first chromosome population; carrying out pairwise crossing on chromosome individuals of the first chromosome population by adopting an unequal length chromosome crossing method to obtain a plurality of chromosome individuals to form a second chromosome population; adding or replacing or deleting a certain convolution layer unit or pooling unit or full-link unit of chromosome individuals of the second chromosome population; and calculating the fitness of the chromosome individuals of the second chromosome population until reaching a preset iteration number, and selecting the chromosome individuals with the optimal neural network structure by adopting the fitness.
The domestic patent CN111144555A discloses a cyclic neural network architecture searching method, a system and a medium based on an improved evolutionary algorithm, and the method comprises the steps of training a plurality of cyclic neural network submodels to update shared weight; initializing a generation population and a historical record table for recording the performance of all the recurrent neural network models; randomly sampling from the population to generate samples, selecting an optimal sample model for mutation operation, removing the oldest or worst model in the population according to specified probability, and adding the mutated child nodes into the population and a historical record table; and judging whether a preset finishing condition is met, if not, continuing to perform sample variation, and otherwise, outputting the optimal model in the history table. The invention can accelerate the process of searching the recurrent neural network architecture, and can greatly improve the efficiency of searching the recurrent neural network architecture by simultaneously considering the performance and the searching time when updating the seed group in each step.
Domestic patent CN110728355A discloses a neural network architecture search method, device, computer equipment and storage medium, relating to the technical field of deep learning, wherein the method can include: dividing a neural network architecture into M substructures, wherein M is a positive integer greater than 1; searching the topological structures in the substructures respectively; the neural network architecture is obtained by connecting the topological structures in the substructures, so that the searching speed can be improved.
The domestic patent CN110232434A discloses a neural network architecture evaluation method based on attribute graph optimization, which models a neural network architecture into an attribute graph and constructs a Bayesian graph neural network agent model; randomly generating, training and testing a group of neural network architectures, and taking the group of neural network architectures and performance indexes corresponding to the tests as an initial training set, wherein the training set is used for training a Bayesian graph neural network agent model; generating a new neural network candidate set through an evolutionary algorithm according to the current training set and training a Bayesian graph neural network agent model; selecting a potential individual from the neural network candidate set through a maximized acquisition function, then training and testing the individual, and adding the potential individual and a performance index corresponding to the test into the current training set; and repeating the steps under the constraint of fixed cost until the best neural network architecture and the corresponding weight of the architecture are obtained in the current training set.
The foreign patent JP2020522035A discloses a method, system and apparatus for determining the structure of a neural network. The method includes generating a batch of output sequences from current values of controller parameters using a controller neural network having the controller parameters. Generating an instance of a sub-Convolutional Neural Network (CNN) in a batch process from an output sequence, the CNN comprising a plurality of instances having a first convolution unit of an architecture defined by the output sequence; training an instance of the sub-CNN to perform an image processing task and evaluating performance of the instance of the sub-CNS training for determining a performance metric of the sub-CNN training instance, and including adjusting current values of controller parameters of the controller neural network using the performance metric of the trained CNN training instance.
The foreign patent WO2018081563a1 discloses a method, system and apparatus for determining the architecture of a neural network. The method comprises the following steps: generating a batch of output sequences using a controller neural network, each output sequence in the batch being an architecture of a sub-neural network for performing a particular neural network task; for each output sequence in the batch: training an architecture defined by the output sequence through respective sub-neural network instances; evaluating the performance of the sub-neural network training instance on the particular neural network task to determine a performance metric of the sub-neural network training instance on the particular neural network task; and adjusting current values of controller neural network controller parameters using the performance metrics of the sub-neural network training instances.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a neural network architecture searching method and system based on evolutionary computation.
The invention provides a neural network architecture searching method based on evolutionary computation, which comprises the following steps:
step S1: setting target requirements through a target function according to the target requirements and the platform requirements, wherein the target requirements comprise: expected accuracy, reasoning time delay, search space size and evolution times;
step S2: according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization;
step S3: under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set;
step S4: acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure;
step S5: evaluating whether the current network architecture meets the target requirement, when the current network architecture does not meet the preset target requirement, and when the current network architecture meets the speed and precision requirements, pausing the optimization result, carrying out real-time evolution mutation on the internal structures of all the sub-network modules based on the current optimal network architecture, and continuing iteration until the preset target requirement is met; and when the preset target requirement is met, outputting the current optimal network result, otherwise, quitting the searching process and outputting the abnormal searching.
Preferably, the step S1 includes:
the objective function formula is as follows:
Figure BDA0002921713920000041
s.t.LAT(Net)≤T&ACC(Net)≥A
wherein, the target function is defined as a multi-target search; net represents the network obtained by the evolutionary algorithm; acc (net) indicates the accuracy of the network; lat (net) represents the inference delay of the network; t represents the expected inference time delay; a represents the desired accuracy; the expected accuracy is set according to preset target requirements; the expected inference time delay is set according to mobile, embedded or general platform types.
Preferably, the step S2 includes:
the sub-network modules are nodes in an undirected cyclic graph; the sub-network modules comprise a plurality of types of sub-networks with M layers, and the types of the sub-networks can be expanded and selected according to target requirements and platform requirements;
the structure of the sub-network comprises: multi-convolution layers, ResNet blocks, depth separable convolutions, reversed residual structure with linear bottlenecks, and lightweight attention structure based on compression-excitation structures.
Preferably, the network search space comprises: the network search space comprises N search subspaces;
Figure BDA0002921713920000042
Figure BDA0002921713920000043
wherein ,
Figure BDA0002921713920000044
representing the i-th generation of a collection of sub-network modules,
Figure BDA0002921713920000045
represents the jth sub-network module in the ith generation,
Figure BDA0002921713920000046
representing an edge set of the ith generation of search space, and connecting the sub-network modules through edges;
Figure BDA0002921713920000047
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure BDA0002921713920000048
representing the ith generation nth search space; i denotes an iteration number.
Preferably, the step S3 includes:
step S3.1: selecting any point in each search subspace as a starting point, selecting a node farthest from the starting point as an end point, and initializing the ant number, the pheromone intensity constant and the cycle number;
step S3.2: calculating heuristic information;
Figure BDA0002921713920000051
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the joint J, wherein omega represents an excitation factor, and omega is more than or equal to 0 and less than or equal to 1; the reward mechanism is defined as exciting all nodes in the current optimal network architecture; the smaller the omega value is, the larger heuristic information eta is, and the initial value of omega is set to be 1 because evolution is not generated in the initial network search space;
step S3.3: selecting a probability path;
Figure BDA0002921713920000052
wherein ,
Figure BDA0002921713920000053
representing the probability that the ant m moves from the point I to the point J at the t-th moment; allowedmRepresenting nodes which can be selected by ants in the next step; alpha represents pheromone elicitation factors, represents the effect of residual pheromones on the paths in the optimizing process, and the larger the value is, the stronger the cooperation capability among ants is, and the paths passed by other ants are prone to be selected; beta represents an expected value heuristic factor, which shows the accuracy and the degree of importance of reasoning time delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule; tau isI,J(t) pheromones on the path from point I to point J at time t; tau isI,S(t) indicates points I to allowedmPheromone on any point path;
Figure BDA0002921713920000054
representing I to allowedmHeuristic information on the path of any point in the tree;
step S3.4: the pheromone is volatilized dynamically;
Figure BDA0002921713920000055
wherein ,ρI,J(t) represents the volatility coefficient on the path from I to J at the moment t; etaI,J(t) heuristic information on the path from time I to J at t;
Figure BDA0002921713920000056
ηirepresenting all initiation information, wherein L represents the total number of nodes in the current network;
step S3.5: performing pheromone increment calculation;
Figure BDA0002921713920000057
wherein Q is a pheromone strength constant, which is the total amount of pheromones released by ants on a path traveled in one cycle; etamTo representThe total amount of heuristic information suffered by the mth ant in the cycle;
step S3.6: updating pheromone;
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure BDA0002921713920000061
wherein rho is the pheromone dynamic volatility coefficient;
Figure BDA0002921713920000062
represents the pheromone increment left by the mth ant on the path (I, J) in the current cycle, delta tauI,J(t, t +1) represents pheromone increment left by all ants passing through the path (I, J) in the current cycle; k represents the total number of ants passing through the paths (I, J) in the current cycle;
step S3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, and the optimization results of all the search subspaces are output as a current candidate set; otherwise, step S3.2 to step S3.7 are repeated until the maximum number of cycles is reached.
Preferably, the evolving mutations in step S5 include: setting the excitation factors omega in all the sub-network modules in the current optimal network architecture as constants, wherein omega is more than or equal to 0 and less than or equal to 1; and simultaneously, randomly selecting mutation operation in the mutation set, promoting the internal structure of the sub-network module, generating a next generation sub-network module, and repeatedly executing the steps S2 to S5 until the preset target requirement is met.
The invention provides a neural network architecture search system based on evolutionary computing, which comprises:
module M1: setting target requirements through a target function according to the target requirements and the platform requirements, wherein the target requirements comprise: expected accuracy, reasoning time delay, search space size and evolution times;
module M2: according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization;
module M3: under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set;
module M4: acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure;
module M5: evaluating whether the current network architecture meets the target requirement, when the current network architecture does not meet the preset target requirement, and when the current network architecture meets the speed and precision requirements, pausing the optimization result, carrying out real-time evolution mutation on the internal structures of all the sub-network modules based on the current optimal network architecture, and continuing iteration until the preset target requirement is met; and when the preset target requirement is met, outputting the current optimal network result, otherwise, quitting the searching process and outputting the abnormal searching.
Preferably, said module M1 comprises:
the objective function formula is as follows:
Figure BDA0002921713920000071
s.t.LAT(Net)≤T&ACC(Net)≥A
wherein, the target function is defined as a multi-target search; net represents the network obtained by the evolutionary algorithm; acc (net) indicates the accuracy of the network; lat (net) represents the inference delay of the network; t represents the expected inference time delay; a represents the desired accuracy; the expected accuracy is set according to preset target requirements; the expected inference time delay is set according to mobile, embedded or general platform types.
Preferably, said module M2 comprises:
the sub-network modules are nodes in an undirected cyclic graph; the sub-network modules comprise a plurality of types of sub-networks with M layers, and the types of the sub-networks can be expanded and selected according to target requirements and platform requirements;
the structure of the sub-network comprises: the system comprises a multi-convolution layer, a ResNet block, a depth separable convolution, an inverted residual error structure with a linear bottleneck and a lightweight attention structure based on a compression-excitation structure;
the network search space includes: the network search space comprises N search subspaces;
Figure BDA0002921713920000072
Figure BDA0002921713920000073
wherein ,
Figure BDA0002921713920000074
representing the i-th generation of a collection of sub-network modules,
Figure BDA0002921713920000075
represents the jth sub-network module in the ith generation,
Figure BDA0002921713920000076
representing an edge set of the ith generation of search space, and connecting the sub-network modules through edges;
Figure BDA0002921713920000077
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure BDA0002921713920000078
representing the ith generation nth search space; i denotes an iteration number.
Preferably, said module M3 comprises:
module M3.1: selecting any point in each search subspace as a starting point, selecting a node farthest from the starting point as an end point, and initializing the ant number, the pheromone intensity constant and the cycle number;
module M3.2: calculating heuristic information;
Figure BDA0002921713920000079
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the joint J, wherein omega represents an excitation factor, and omega is more than or equal to 0 and less than or equal to 1; the reward mechanism is defined as exciting all nodes in the current optimal network architecture; the smaller the omega value is, the larger heuristic information eta is, and the initial value of omega is set to be 1 because evolution is not generated in the initial network search space;
module M3.3: selecting a probability path;
Figure BDA0002921713920000081
wherein ,
Figure BDA0002921713920000082
representing the probability that the ant m moves from the point I to the point J at the t-th moment; allowedmRepresenting nodes which can be selected by ants in the next step; alpha represents pheromone elicitation factors, represents the effect of residual pheromones on the paths in the optimizing process, and the larger the value is, the stronger the cooperation capability among ants is, and the paths passed by other ants are prone to be selected; beta represents an expected value heuristic factor, which shows the accuracy and the degree of importance of reasoning time delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule; tau isI,J(t) pheromones on the path from point I to point J at time t; tau isI,S(t) indicates points I to allowedmPheromone on any point path;
Figure BDA0002921713920000083
representing I to allowedmHeuristic information on the path of any point in the tree;
module M3.4: the pheromone is volatilized dynamically;
Figure BDA0002921713920000084
wherein ,ρI,J(t) represents the volatility coefficient on the path from I to J at the moment t; etaI,J(t) heuristic information on the path from time I to J at t;
Figure BDA0002921713920000085
ηirepresenting all initiation information, wherein L represents the total number of nodes in the current network;
module M3.5: performing pheromone increment calculation;
Figure BDA0002921713920000086
wherein Q is a pheromone strength constant, which is the total amount of pheromones released by ants on a path traveled in one cycle; etamRepresenting the total amount of heuristic information suffered by the mth ant in the cycle;
module M3.6: updating pheromone;
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure BDA0002921713920000087
wherein rho is the pheromone dynamic volatility coefficient;
Figure BDA0002921713920000088
represents the pheromone increment left by the mth ant on the path (I, J) in the current cycle, delta tauI,J(t, t +1) represents pheromone increment left by all ants passing through the path (I, J) in the current cycle; k represents the total number of ants passing through the paths (I, J) in the current cycle;
module M3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, and the optimization results of all the search subspaces are output as a current candidate set; otherwise, repeatedly triggering the execution of the modules M3.2 to M3.7 until the maximum cycle number is reached;
the mutations evolved in the module M5 include: setting the excitation factors omega in all the sub-network modules in the current optimal network architecture as constants, wherein omega is more than or equal to 0 and less than or equal to 1; and meanwhile, randomly selecting mutation operation in the mutation set, promoting the internal structure of the sub-network module, generating a next generation sub-network module, and repeatedly triggering the execution of the modules M2 to M5 until the preset target requirement is met.
Compared with the prior art, the invention has the following beneficial effects:
1. according to the invention, through the idea of modular graph theory construction, a system level searching subspace taking the sub-network modules as basic components is constructed, so that the complexity of the searching space can be effectively reduced, the framework searching process is accelerated in a module level searching mode, and the searching performance is improved; by utilizing the heuristic optimization capability of the ant colony algorithm, the network performance reduction caused by the fact that the framework search is trapped into local optimization can be prevented from the global angle; the reward and mutation evolution mechanism are fused, and the evolution is as comprehensive as possible through a structure level mode, so that the incomplete framework search can be avoided.
2. The invention can set the expected target through the target function according to the actual application requirement and the platform requirement, is not limited by an application platform, simultaneously encourages the diversity of the structure in the module of the whole network, has certain application flexibility and expandability, and can obtain a neural network model realizing good balance between precision and speed under the condition of resource limitation.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a flowchart of a neural network architecture search method based on evolutionary computation.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that it would be obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit of the invention. All falling within the scope of the present invention.
Example 1
The invention provides a neural network architecture searching method based on evolutionary computation, which comprises the following steps:
step S1: setting target requirements through a target function according to the target requirements and the platform requirements, wherein the target requirements comprise: expected accuracy, reasoning time delay, search space size and evolution times;
step S2: according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization;
step S3: under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set;
step S4: acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure;
step S5: evaluating whether the current network architecture meets the target requirement, when the current network architecture does not meet the preset target requirement, and when the current network architecture meets the speed and precision requirements, pausing the optimization result, carrying out real-time evolution mutation on the internal structures of all the sub-network modules based on the current optimal network architecture, and continuing iteration until the preset target requirement is met; and when the preset target requirement is met, outputting the current optimal network result, otherwise, quitting the searching process and outputting the abnormal searching.
Specifically, the step S1 includes:
the objective function formula is as follows:
Figure BDA0002921713920000101
s.t.LAT(Net)≤T&ACC(Net)≥A
wherein, the target function is defined as a multi-target search; net represents the network obtained by the evolutionary algorithm; acc (net) indicates the accuracy of the network; lat (net) represents the inference delay of the network; t represents the expected inference time delay; a represents the desired accuracy; the expected accuracy is set according to preset target requirements; the expected inference time delay is set according to mobile, embedded or general platform types.
Specifically, the step S2 includes:
the sub-network modules are nodes in an undirected cyclic graph; the sub-network modules comprise a plurality of types of sub-networks with M layers, and the types of the sub-networks can be expanded and selected according to target requirements and platform requirements;
the structure of the sub-network comprises: multi-convolution layers, ResNet blocks, depth separable convolutions, reversed residual structure with linear bottlenecks, and lightweight attention structure based on compression-excitation structures.
Specifically, the network search space includes: the network search space comprises N search subspaces;
Figure BDA0002921713920000102
Figure BDA0002921713920000103
wherein ,
Figure BDA0002921713920000104
representing the i-th generation of a collection of sub-network modules,
Figure BDA0002921713920000105
represents the jth sub-network module in the ith generation,
Figure BDA0002921713920000111
representing an edge set of the ith generation of search space, and connecting the sub-network modules through edges;
Figure BDA0002921713920000112
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure BDA0002921713920000113
representing the ith generation nth search space; i denotes an iteration number.
Specifically, the step S3 includes:
step S3.1: selecting any point in each search subspace as a starting point, selecting a node farthest from the starting point as an end point, and initializing the ant number, the pheromone intensity constant and the cycle number;
step S3.2: calculating heuristic information;
Figure BDA0002921713920000114
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the joint J, wherein omega represents an excitation factor, and omega is more than or equal to 0 and less than or equal to 1; the reward mechanism is defined as exciting all nodes in the current optimal network architecture; the smaller the omega value is, the larger heuristic information eta is, and the initial value of omega is set to be 1 because evolution is not generated in the initial network search space;
step S3.3: selecting a probability path;
Figure BDA0002921713920000115
wherein ,
Figure BDA0002921713920000116
indicates the t-th timeThe probability that ant m moves from point I to point J; allowedmRepresenting nodes which can be selected by ants in the next step; alpha represents pheromone elicitation factors, represents the effect of residual pheromones on the paths in the optimizing process, and the larger the value is, the stronger the cooperation capability among ants is, and the paths passed by other ants are prone to be selected; beta represents an expected value heuristic factor, which shows the accuracy and the degree of importance of reasoning time delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule; tau isI,J(t) pheromones on the path from point I to point J at time t; tau isI,S(t) indicates points I to allowedmPheromone on any point path;
Figure BDA0002921713920000117
representing I to allowedmHeuristic information on the path of any point in the tree;
step S3.4: the pheromone is volatilized dynamically;
Figure BDA0002921713920000118
wherein ,ρI,J(t) represents the volatility coefficient on the path from I to J at the moment t; etaI,J(t) heuristic information on the path from time I to J at t;
Figure BDA0002921713920000119
ηirepresenting all initiation information, wherein L represents the total number of nodes in the current network;
step S3.5: performing pheromone increment calculation;
Figure BDA0002921713920000121
wherein Q is a pheromone strength constant, which is the total amount of pheromones released by ants on a path traveled in one cycle; etamRepresenting the total amount of heuristic information suffered by the mth ant in the cycle;
step S3.6: updating pheromone;
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure BDA0002921713920000122
wherein rho is the pheromone dynamic volatility coefficient;
Figure BDA0002921713920000123
represents the pheromone increment left by the mth ant on the path (I, J) in the current cycle, delta tauI,J(t, t +1) represents pheromone increment left by all ants passing through the path (I, J) in the current cycle; k represents the total number of ants passing through the paths (I, J) in the current cycle;
step S3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, and the optimization results of all the search subspaces are output as a current candidate set; otherwise, step S3.2 to step S3.7 are repeated until the maximum number of cycles is reached.
Specifically, the evolving mutations in step S5 include: setting the excitation factors omega in all the sub-network modules in the current optimal network architecture as constants, wherein omega is more than or equal to 0 and less than or equal to 1; and simultaneously, randomly selecting mutation operation in the mutation set, promoting the internal structure of the sub-network module, generating a next generation sub-network module, and repeatedly executing the steps S2 to S5 until the preset target requirement is met.
The invention provides a neural network architecture search system based on evolutionary computing, which comprises:
module M1: setting target requirements through a target function according to the target requirements and the platform requirements, wherein the target requirements comprise: expected accuracy, reasoning time delay, search space size and evolution times;
module M2: according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization;
module M3: under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set;
module M4: acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure;
module M5: evaluating whether the current network architecture meets the target requirement, when the current network architecture does not meet the preset target requirement, and when the current network architecture meets the speed and precision requirements, pausing the optimization result, carrying out real-time evolution mutation on the internal structures of all the sub-network modules based on the current optimal network architecture, and continuing iteration until the preset target requirement is met; and when the preset target requirement is met, outputting the current optimal network result, otherwise, quitting the searching process and outputting the abnormal searching.
Specifically, the module M1 includes:
the objective function formula is as follows:
Figure BDA0002921713920000131
s.t.LAT(Net)≤T&ACC(Net)≥A
wherein, the target function is defined as a multi-target search; net represents the network obtained by the evolutionary algorithm; acc (net) indicates the accuracy of the network; lat (net) represents the inference delay of the network; t represents the expected inference time delay; a represents the desired accuracy; the expected accuracy is set according to preset target requirements; the expected inference time delay is set according to mobile, embedded or general platform types.
Specifically, the module M2 includes:
the sub-network modules are nodes in an undirected cyclic graph; the sub-network modules comprise a plurality of types of sub-networks with M layers, and the types of the sub-networks can be expanded and selected according to target requirements and platform requirements;
the structure of the sub-network comprises: the system comprises a multi-convolution layer, a ResNet block, a depth separable convolution, an inverted residual error structure with a linear bottleneck and a lightweight attention structure based on a compression-excitation structure;
the network search space includes: the network search space comprises N search subspaces;
Figure BDA0002921713920000132
Figure BDA0002921713920000133
wherein ,
Figure BDA0002921713920000134
representing the i-th generation of a collection of sub-network modules,
Figure BDA0002921713920000135
represents the jth sub-network module in the ith generation,
Figure BDA0002921713920000136
representing an edge set of the ith generation of search space, and connecting the sub-network modules through edges;
Figure BDA0002921713920000137
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure BDA0002921713920000138
representing the ith generation nth search space; i denotes an iteration number.
Specifically, the module M3 includes:
module M3.1: selecting any point in each search subspace as a starting point, selecting a node farthest from the starting point as an end point, and initializing the ant number, the pheromone intensity constant and the cycle number;
module M3.2: calculating heuristic information;
Figure BDA0002921713920000139
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the joint J, wherein omega represents an excitation factor, and omega is more than or equal to 0 and less than or equal to 1; the reward mechanism is defined as exciting all nodes in the current optimal network architecture; the smaller the omega value is, the larger heuristic information eta is, and the initial value of omega is set to be 1 because evolution is not generated in the initial network search space;
module M3.3: selecting a probability path;
Figure BDA0002921713920000141
wherein ,
Figure BDA0002921713920000142
representing the probability that the ant m moves from the point I to the point J at the t-th moment; allowedmRepresenting nodes which can be selected by ants in the next step; alpha represents pheromone elicitation factors, represents the effect of residual pheromones on the paths in the optimizing process, and the larger the value is, the stronger the cooperation capability among ants is, and the paths passed by other ants are prone to be selected; beta represents an expected value heuristic factor, which shows the accuracy and the degree of importance of reasoning time delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule; tau isI,J(t) pheromones on the path from point I to point J at time t; tau isI,S(t) indicates points I to allowedmPheromone on any point path;
Figure BDA0002921713920000143
representing I to allowedmHeuristic information on the path of any point in the tree;
module M3.4: the pheromone is volatilized dynamically;
Figure BDA0002921713920000144
wherein ,ρI,J(t) represents the volatility coefficient on the path from I to J at the moment t; etaI,J(t) heuristic information on the path from time I to J at t;
Figure BDA0002921713920000145
ηirepresenting all initiation information, wherein L represents the total number of nodes in the current network;
module M3.5: performing pheromone increment calculation;
Figure BDA0002921713920000146
wherein Q is a pheromone strength constant, which is the total amount of pheromones released by ants on a path traveled in one cycle; etamRepresenting the total amount of heuristic information suffered by the mth ant in the cycle;
module M3.6: updating pheromone;
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure BDA0002921713920000147
wherein rho is the pheromone dynamic volatility coefficient;
Figure BDA0002921713920000148
represents the pheromone increment left by the mth ant on the path (I, J) in the current cycle, delta tauI,J(t, t +1) represents pheromone increment left by all ants passing through the path (I, J) in the current cycle; k represents the total number of ants passing through the paths (I, J) in the current cycle;
module M3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, and the optimization results of all the search subspaces are output as a current candidate set; otherwise, repeatedly triggering the execution of the modules M3.2 to M3.7 until the maximum cycle number is reached;
the mutations evolved in the module M5 include: setting the excitation factors omega in all the sub-network modules in the current optimal network architecture as constants, wherein omega is more than or equal to 0 and less than or equal to 1; and meanwhile, randomly selecting mutation operation in the mutation set, promoting the internal structure of the sub-network module, generating a next generation sub-network module, and repeatedly triggering the execution of the modules M2 to M5 until the preset target requirement is met.
Example 2
Example 2 is a modification of example 1
The invention provides a neural network architecture searching method and system based on evolutionary computation, which are used for exploring an optimal neural network architecture in a mode of combining module-level search and structure-level evolution by utilizing the optimization capability and a reward evolution mechanism of an ant colony algorithm on the basis of a neural network initialization module according to the idea constructed by a modular graph theory, and can take account of speed and precision to solve the technical problems under the condition of resource limitation.
Aiming at the problem that the performance and efficiency are difficult to balance under the condition of limited resources in the existing neural network architecture search, the invention aims to provide a neural network architecture search method and system based on evolutionary computation. The invention takes a sub-network module as a basic component, randomly generates a plurality of network searching subspaces (directed acyclic graphs), takes the depth, the breadth, the connectivity and the number of filters of the sub-network module (node) as the complexity of the sub-network module and combines an incentive factor as heuristic information, utilizes an improved ant colony algorithm to search for optimization in the subspaces through an pheromone dynamic volatilization and random probability path selection mechanism, and simultaneously integrates a reward and an evolution mutation mechanism to continuously evolve so as to search for an optimal neural network architecture which gives consideration to both performance and efficiency.
The Evolutionary computing is in the field of computer science, and Evolutionary computing (evolution computing) is a sub-domain related to a combinatorial optimization problem in intelligent computing (Computational Intelligence). The evolution algorithm is influenced by a natural selection mechanism of 'winning or losing' in the biological evolution process and a transmission rule of genetic information, the process is simulated by program iteration, the problem to be solved is regarded as the environment, and in a population consisting of some possible solutions, the optimal solution is sought through natural evolution.
The Evolutionary algorithm or "Evolutionary algorithm" is an algorithm cluster, which generates the inspiration from the biological evolution of nature, despite its many variations, different genetic expression patterns, different crossover and mutation operators, the introduction of special operators, and different regeneration and selection methods. Compared with the traditional optimization algorithms such as a calculus-based method and an exhaustion method, the evolutionary algorithm is a mature global optimization method with high robustness and wide applicability, has the characteristics of self-organization, self-adaptation and self-learning, can not be limited by problem properties, and effectively solves the complex problem which is difficult to solve by the traditional optimization algorithm.
The Neural Network Architecture Search (NAS) is one of the hot spots for deep learning research. NAS aims to design a neural network architecture with optimal performance in an automated manner, with minimal human intervention, by using limited computational resources.
The method is one of group intelligent algorithms in the bionics, which is proposed in 1991 by the italian scholars m.Dorigo et al after inspired by ant foraging in the real world. During the foraging process of ants, each ant releases a chemical substance called pheromone on the path which can exchange information with other ants. As time goes on, the pheromone volatilizes, but the shorter the path that the ants climb over, the slower the corresponding pheromone volatilizes, and the concentration of the pheromone left on the path is relatively higher. Ants can find the pheromone and perceive the concentration of the pheromone, and can select a path with the highest pheromone concentration with higher probability. Thus, more ants can select a path containing high-concentration pheromone, and more ants are attracted to the path to form positive feedback. Based on the principle, ants can quickly find a shortest path away from the food source.
The invention relates to a neural network architecture searching method and system based on evolutionary computation, which comprises the steps of searching target setting, searching space initialization, ant colony optimization, target evaluation, evolution mutation and the like. Firstly, setting a search target parameter from the aspects of accuracy, reasoning time delay, evolution times and the like according to target requirements and platform requirements; then initializing a search space, and randomly generating N search subspaces; on the basis, ant colony optimization is started, and a current candidate set is constructed; then, target evaluation is completed on the data set, and the current optimal network architecture is selected from the candidate set; and finally, carrying out evolution mutation on the internal structures of all the sub-network modules of the current optimal network architecture, and continuing iteration until the target requirement is met.
The searching target setting is to set the expected accuracy and the reasoning time delay through a defined target function according to the target requirement and the platform requirement, and set parameters such as the size of a searching space and the evolution times.
The search space is initialized, namely N directed cyclic graphs are randomly generated for the sub-network module set according to the set size of the search space and are used as the network search space for evolutionary optimization.
Specifically, the network search space is a hierarchically partitioned unidirectional cyclic graph, each sub-network module representing a node in the unidirectional cyclic graph, allowing module-level searching and module internal structure-level mutation and searching.
Specifically, the sub-network modules are nodes in an undirected cyclic graph, are defined as a plurality of types of sub-networks with M layers, can expand the types of the sub-networks and are selected according to target requirements and application platforms, and the sub-network structures of the sub-networks comprise but are not limited to multi-convolution layers, ResNet blocks, deep separable convolutions, inverse residual structure with linear bottleneck, lightweight attention structure based on compression-excitation structure and the like.
The ant colony optimization is that under the guidance of heuristic information, by combining with a pheromone dynamic volatilization and probability path selection mechanism, N directed acyclic graphs are searched for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm, so that a candidate set of the generation is formed.
And the target evaluation is to obtain the accuracy and reasoning time delay of the N optimizing paths in the candidate set through training and testing, select the optimal result as the optimal network architecture of the generation, and evaluate whether the target requirement is met.
And (3) the evolution mutation, namely awarding rewards for internal structures of all sub-network modules in the optimal network architecture of the generation, and randomly selecting mutation operation in a mutation set to generate the sub-network module of the next generation, so as to simulate the process that excellent individuals in the nature are easier to leave offspring.
Specifically, the mutation set comprises module structure keeping unchanged, convolution type randomly selected, convolution kernel size, filter size, convolution layer insertion, convolution layer deletion, connection addition and connection deletion operations.
A neural network architecture searching method and system based on evolutionary computation comprises the following steps: as shown in figure 1 of the drawings, in which,
step 1, search target setting:
according to the target requirement and the platform requirement, the expected accuracy and the reasoning time delay are set through the defined target function, and parameters such as the size of a search space and the evolution times are set at the same time. In order to seek the balance between accuracy and reasoning time delay, an objective function is defined as a multi-objective searching problem, and the aim is to find a neural network architecture with high accuracy and low thrust time delay. The accuracy can be set according to the specific requirements of users, and TOP-1 or TOP-5 is adopted; the inference delay may be set according to mobile, embedded, or general platform types. The overall objective function is formalized specifically as follows:
Figure BDA0002921713920000171
s.t. LAT(Net)≤T&ACC(Net)≥A
where Net represents the network obtained by the evolutionary algorithm, acc (Net) represents the accuracy of the network, lat (Net) represents the inference delay of the network, and T represents the desired target delay. A represents the desired accuracy. Lat (Net) in s.t.
Step 2, initializing a search space:
and according to the set size of the search space, randomly generating N directed cyclic graphs from the sub-network module set to serve as the network search space for evolutionary optimization.
Each sub-network module represents a node in the undirected cyclic graph, is defined as a plurality of types of sub-networks with M layers, can extend the sub-network module types and select according to target requirements and application platform requirements, and has a sub-network structure including, but not limited to, multi-convolution layer (2D conv +1x1 conv with 1x1 conv + 3 x 3 filtering), ResNet block, deep separable convolution, reverse residual structure with linear bottleneck, lightweight attention structure based on compression-excitation structure, etc. Edges represent connections between sub-network modules. A search space is composed of N search subspaces, i.e.
Figure BDA0002921713920000172
Figure BDA0002921713920000173
The formalization is as follows:
Figure BDA0002921713920000181
wherein ,
Figure BDA0002921713920000182
representing the i-th generation of a collection of sub-network modules,
Figure BDA0002921713920000183
represents the jth sub-network module in the ith generation,
Figure BDA0002921713920000184
a set of edges representing the ith generation of search space,
Figure BDA0002921713920000185
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure BDA0002921713920000186
representing the ith generation nth search space; i denotes an iteration number.
Step 3, ant colony optimization:
under the guidance of heuristic information, combining with a pheromone dynamic volatilization and random probability path selection mechanism, searching N directed acyclic optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm, and forming a candidate set of the generation.
Step 3.1: and initializing parameters. And selecting any point in each search subspace as a starting point, automatically selecting the node farthest from the starting point as an end point, initializing the ant number, the pheromone strength constant, the cycle number and the like.
Step 3.2: and (5) heuristic information calculation. The heuristic function has an important influence on the convergence and stability of the optimization process. In the ant colony algorithm, heuristic information eta is usually the inverse ratio of the distance between two points, but the standard of neural network architecture search balances accuracy and reasoning time delay, and the complexity of the internal structure of a sub-network module influencing the calculated quantity, such as depth, width, connectivity and the number of filters, and the excitation given by a reward mechanism need to be considered in the optimization process, so the heuristic information is defined as
Figure BDA0002921713920000187
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the node J, wherein omega represents an excitation factor, omega is more than or equal to 0 and less than or equal to 1, and a reward mechanism is defined to excite all nodes in the optimal network architecture of the generation. The smaller the value of ω is, the larger heuristic information η is, and since no evolution is generated in the initial network search space, the initial value of ω is set to 1.
Step 3.3: and (3) probability path selection: in each step of the path selection of the subspace search process, a pheromone heuristic factor is usedInfluence of expected value heuristic factor, according to probability
Figure BDA0002921713920000188
And deciding which way to move next.
Figure BDA0002921713920000189
Figure BDA00029217139200001810
Represents the probability that the ant m moves from the point I to the point J at the t-th moment, allowedmRepresenting nodes which can be selected by ants in the next step; the pheromone heuristic factor alpha represents the effect of the residual pheromone on the path in the optimizing process, and the larger the value is, the stronger the cooperation capability among the ants is, and the paths passed by other ants are prone to be selected; and the expected value heuristic factor beta shows the accuracy and the degree of importance of the reasoning delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule. Tau isI,JAnd (t) represents pheromone on the path from point I to point J at time t. Tau isI,S(t) indicates points I to allowedmThe pheromone on the path of any point in the tree.
Step 3.4: dynamic volatilization of pheromones: the volatilization speed of pheromones in nature is a dynamic changing process and can be changed by environmental factors such as temperature, humidity and the like along with the lapse of time. In the optimization process of the neural network architecture, the more complex the structure of the nodes, the smaller the heuristic information is; the larger the reasoning time delay, the faster the volatilization speed of the node is, and the less the pheromone remains, so that after all ants reach the end point, the pheromone dynamic volatilization coefficient rho is calculated according to the following relation,
Figure BDA0002921713920000191
ρI,J(t) represents the volatility coefficient on the path from I to J at time t, etaI,J(t) indicates heuristic information on the path from time I to J at t,
Figure BDA0002921713920000192
ηirepresenting all the initiation information and L representing the total number of nodes under the network.
Step 3.5: pheromone increment calculation: the pheromone updating model is an important link of random search and rapid convergence of a basic ant colony algorithm. According to the global optimization requirement, the ant surrounding model is modified and formalized as follows:
Figure BDA0002921713920000193
q is a pheromone strength constant, which is the total amount of pheromones released by ants on the path traveled in a cycle, and affects the convergence rate of the algorithm to some extent. EtamRepresenting the total amount of heuristic information the mth ant receives in the cycle.
Step 3.6: and (3) updating pheromone: the pheromone amount on each path is equal at the initial moment, and after an ant completes one cycle, the pheromone gradually volatilizes along with the time, so that the pheromone concentration needs to be updated before the ant enters the next cycle, and the pheromone concentration is formalized as follows:
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure BDA0002921713920000194
p is the dynamic volatility coefficient of the pheromone,
Figure BDA0002921713920000195
represents the amount of pheromone, delta tau, left on the path (I, J) by the mth ant in the current cycleI,J(t, t +1) represents the pheromone increment left by all ants passing through path (I, J) in this cycle. K represents the total number of ants passing through the path (I, J) in the current cycle.
Step 3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, the optimization results of all the search subspaces are output as a current candidate set, and the step 4 is entered; otherwise, the heuristic information calculation is carried out for 3.2, and the circulation is continued.
And 4, step 4: and (4) target evaluation. Acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, selecting an optimal result as an optimal network architecture of the generation, if the optimal result does not reach the evolution times and does not meet the final target requirement but meets the speed and precision requirements, temporarily storing the optimizing result, entering the step 5, and otherwise discarding the search subspace; and if the maximum evolution times are reached and the target requirements are met, sequencing the optimal network architecture results of each generation in all the evolution processes, entering the step 6, otherwise, exiting the searching process and outputting search abnormity.
And 5: and (5) evolving the mutation. All the sub-network modules in the optimal network architecture are rewarded, namely, an incentive factor omega is set to be a constant, omega is more than or equal to 0 and less than or equal to 1, and the smaller the value of omega is, the larger the incentive effect is. And meanwhile, randomly selecting mutation operation in the mutation set, developing the internal structure of the sub-network module, generating a next generation sub-network module, and returning to the step 2.
Step 6: and outputting a final result. And outputting the optimal neural network architecture.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A neural network architecture searching method based on evolutionary computing is characterized by comprising the following steps:
step S1: setting target requirements through a target function according to the target requirements and the platform requirements, wherein the target requirements comprise: expected accuracy, reasoning time delay, search space size and evolution times;
step S2: according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization;
step S3: under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set;
step S4: acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure;
step S5: evaluating whether the current network architecture meets the target requirement, when the current network architecture does not meet the preset target requirement, and when the current network architecture meets the speed and precision requirements, pausing the optimization result, carrying out real-time evolution mutation on the internal structures of all the sub-network modules based on the current optimal network architecture, and continuing iteration until the preset target requirement is met; and when the preset target requirement is met, outputting the current optimal network result, otherwise, quitting the searching process and outputting the abnormal searching.
2. The evolutionary computing-based neural network architecture search method of claim 1, wherein the step S1 comprises:
the objective function formula is as follows:
Figure FDA0002921713910000011
s.t.LAT(Net)≤T&ACC(Net)≥A
wherein, the target function is defined as a multi-target search; net represents the network obtained by the evolutionary algorithm; acc (net) indicates the accuracy of the network; lat (net) represents the inference delay of the network; t represents the expected inference time delay; a represents the desired accuracy; the expected accuracy is set according to preset target requirements; the expected inference time delay is set according to mobile, embedded or general platform types.
3. The evolutionary computing-based neural network architecture search method of claim 1, wherein the step S2 comprises:
the sub-network modules are nodes in an undirected cyclic graph; the sub-network modules comprise a plurality of types of sub-networks with M layers, and the types of the sub-networks can be expanded and selected according to target requirements and platform requirements;
the structure of the sub-network comprises: multi-convolution layers, ResNet blocks, depth separable convolutions, reversed residual structure with linear bottlenecks, and lightweight attention structure based on compression-excitation structures.
4. The evolutionary computing-based neural network architecture search method of claim 3, wherein the network search space comprises: the network search space comprises N search subspaces;
Figure FDA0002921713910000021
Figure FDA0002921713910000022
wherein ,
Figure FDA0002921713910000023
representing the i-th generation of a collection of sub-network modules,
Figure FDA0002921713910000024
represents the jth sub-network module in the ith generation,
Figure FDA0002921713910000025
representing an edge set of the ith generation of search space, and connecting the sub-network modules through edges;
Figure FDA0002921713910000026
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure FDA0002921713910000027
representing the ith generation nth search space; i denotes an iteration number.
5. The evolutionary computing-based neural network architecture search method of claim 1, wherein the step S3 comprises:
step S3.1: selecting any point in each search subspace as a starting point, selecting a node farthest from the starting point as an end point, and initializing the ant number, the pheromone intensity constant and the cycle number;
step S3.2: calculating heuristic information;
Figure FDA0002921713910000028
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the joint J, wherein omega represents an excitation factor, and omega is more than or equal to 0 and less than or equal to 1; prize-giving deviceThe excitation mechanism is defined to excite all nodes in the current optimal network architecture; the smaller the omega value is, the larger heuristic information eta is, and the initial value of omega is set to be 1 because evolution is not generated in the initial network search space;
step S3.3: selecting a probability path;
Figure FDA0002921713910000029
wherein ,
Figure FDA00029217139100000210
representing the probability that the ant m moves from the point I to the point J at the t-th moment; allowedmRepresenting nodes which can be selected by ants in the next step; alpha represents pheromone elicitation factors, represents the effect of residual pheromones on the paths in the optimizing process, and the larger the value is, the stronger the cooperation capability among ants is, and the paths passed by other ants are prone to be selected; beta represents an expected value heuristic factor, which shows the accuracy and the degree of importance of reasoning time delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule; tau isI,J(t) pheromones on the path from point I to point J at time t; tau isI,S(t) indicates points I to allowedmPheromone on any point path;
Figure FDA00029217139100000211
representing I to allowedmHeuristic information on the path of any point in the tree;
step S3.4: the pheromone is volatilized dynamically;
Figure FDA0002921713910000031
wherein ,ρI,J(t) represents the volatility coefficient on the path from I to J at the moment t; etaI,J(t) heuristic information on the path from time I to J at t;
Figure FDA0002921713910000032
representing all initiation information, wherein L represents the total number of nodes in the current network;
step S3.5: performing pheromone increment calculation;
Figure FDA0002921713910000033
wherein Q is a pheromone strength constant, which is the total amount of pheromones released by ants on a path traveled in one cycle; etamRepresenting the total amount of heuristic information suffered by the mth ant in the cycle;
step S3.6: updating pheromone;
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure FDA0002921713910000034
wherein rho is the pheromone dynamic volatility coefficient;
Figure FDA0002921713910000035
represents the pheromone increment left by the mth ant on the path (I, J) in the current cycle, delta tauI,J(t, t +1) represents pheromone increment left by all ants passing through the path (I, J) in the current cycle; k represents the total number of ants passing through the paths (I, J) in the current cycle;
step S3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, and the optimization results of all the search subspaces are output as a current candidate set; otherwise, step S3.2 to step S3.7 are repeated until the maximum number of cycles is reached.
6. The evolutionary computing-based neural network architecture searching method of claim 1, wherein the evolving mutations in step S5 comprise: setting the excitation factors omega in all the sub-network modules in the current optimal network architecture as constants, wherein omega is more than or equal to 0 and less than or equal to 1; and simultaneously, randomly selecting mutation operation in the mutation set, promoting the internal structure of the sub-network module, generating a next generation sub-network module, and repeatedly executing the steps S2 to S5 until the preset target requirement is met.
7. An evolutionary computing-based neural network architecture search system, comprising:
module M1: setting target requirements through a target function according to the target requirements and the platform requirements, wherein the target requirements comprise: expected accuracy, reasoning time delay, search space size and evolution times;
module M2: according to the set size of the search space, randomly generating N directed cyclic graphs based on the sub-network module set to serve as a network search space for evolutionary optimization;
module M3: under the guidance of heuristic information, combining with a pheromone dynamic volatilization and probability path selection mechanism, searching N directed acyclic graphs for optimizing paths in N randomly generated acyclic graphs through an ant colony algorithm to form a candidate set;
module M4: acquiring the accuracy and reasoning time delay of N optimizing paths in the candidate set through training and testing, and selecting an optimal result as a current optimal network structure;
module M5: evaluating whether the current network architecture meets the target requirement, when the current network architecture does not meet the preset target requirement, and when the current network architecture meets the speed and precision requirements, pausing the optimization result, carrying out real-time evolution mutation on the internal structures of all the sub-network modules based on the current optimal network architecture, and continuing iteration until the preset target requirement is met; and when the preset target requirement is met, outputting the current optimal network result, otherwise, quitting the searching process and outputting the abnormal searching.
8. The evolutionary computing-based neural network architecture search system of claim 7, wherein the module M1 comprises:
the objective function formula is as follows:
Figure FDA0002921713910000041
s.t.LAT(Net)≤T&ACC(Net)≥A
wherein, the target function is defined as a multi-target search; net represents the network obtained by the evolutionary algorithm; acc (net) indicates the accuracy of the network; lat (net) represents the inference delay of the network; t represents the expected inference time delay; a represents the desired accuracy; the expected accuracy is set according to preset target requirements; the expected inference time delay is set according to mobile, embedded or general platform types.
9. The evolutionary computing-based neural network architecture search system of claim 7, wherein the module M2 comprises:
the sub-network modules are nodes in an undirected cyclic graph; the sub-network modules comprise a plurality of types of sub-networks with M layers, and the types of the sub-networks can be expanded and selected according to target requirements and platform requirements;
the structure of the sub-network comprises: the system comprises a multi-convolution layer, a ResNet block, a depth separable convolution, an inverted residual error structure with a linear bottleneck and a lightweight attention structure based on a compression-excitation structure;
the network search space includes: the network search space comprises N search subspaces;
Figure FDA0002921713910000042
Figure FDA0002921713910000043
wherein ,
Figure FDA0002921713910000044
representing an ith generation subnetworkA set of modules to be used in the method,
Figure FDA0002921713910000045
represents the jth sub-network module in the ith generation,
Figure FDA0002921713910000046
representing an edge set of the ith generation of search space, and connecting the sub-network modules through edges;
Figure FDA0002921713910000047
representing an edge between a jth sub-network module and a kth sub-network module in an ith generation;
Figure FDA0002921713910000048
representing the ith generation nth search space; i denotes an iteration number.
10. The evolutionary computing-based neural network architecture search system of claim 7, wherein the module M3 comprises:
module M3.1: selecting any point in each search subspace as a starting point, selecting a node farthest from the starting point as an end point, and initializing the ant number, the pheromone intensity constant and the cycle number;
module M3.2: calculating heuristic information;
Figure FDA0002921713910000051
wherein ,ηI,J(t) heuristic information from node I to node J at time t; depI,J、WigI,J、ConI,J、FilI,JRespectively representing the depth, the width, the connectivity and the number of filters of the joint J, wherein omega represents an excitation factor, and omega is more than or equal to 0 and less than or equal to 1; the reward mechanism is defined as exciting all nodes in the current optimal network architecture; the smaller the omega value is, the larger heuristic information eta is, and the initial value of omega is set to be 1 because evolution is not generated in the initial network search space;
module M3.3: selecting a probability path;
Figure FDA0002921713910000052
wherein ,
Figure FDA0002921713910000053
representing the probability that the ant m moves from the point I to the point J at the t-th moment; allowedmRepresenting nodes which can be selected by ants in the next step; alpha represents pheromone elicitation factors, represents the effect of residual pheromones on the paths in the optimizing process, and the larger the value is, the stronger the cooperation capability among ants is, and the paths passed by other ants are prone to be selected; beta represents an expected value heuristic factor, which shows the accuracy and the degree of importance of reasoning time delay when ants select paths, and the larger the value is, the closer the state transition rule is to the greedy rule; tau isI,J(t) pheromones on the path from point I to point J at time t; tau isI,S(t) indicates points I to allowedmPheromone on any point path;
Figure FDA0002921713910000054
representing I to allowedmHeuristic information on the path of any point in the tree;
module M3.4: the pheromone is volatilized dynamically;
Figure FDA0002921713910000055
wherein ,ρI,J(t) represents the volatility coefficient on the path from I to J at the moment t; etaI,J(t) heuristic information on the path from time I to J at t;
Figure FDA0002921713910000056
representing all initiation information, wherein L represents the total number of nodes in the current network;
module M3.5: performing pheromone increment calculation;
Figure FDA0002921713910000061
wherein Q is a pheromone strength constant, which is the total amount of pheromones released by ants on a path traveled in one cycle; etamRepresenting the total amount of heuristic information suffered by the mth ant in the cycle;
module M3.6: updating pheromone;
τI,J(t+1)=(1-ρ)τI,J(t)+ΔτI,J(t,t+1)
Figure FDA0002921713910000062
wherein rho is the pheromone dynamic volatility coefficient;
Figure FDA0002921713910000063
represents the pheromone increment left by the mth ant on the path (I, J) in the current cycle, delta tauI,J(t, t +1) represents pheromone increment left by all ants passing through the path (I, J) in the current cycle; k represents the total number of ants passing through the paths (I, J) in the current cycle;
module M3.7: and (4) optimizing and judging: when the optimization of all the search subspaces reaches the maximum cycle times, the circulation is exited, and the optimization results of all the search subspaces are output as a current candidate set; otherwise, repeatedly triggering the execution of the modules M3.2 to M3.7 until the maximum cycle number is reached;
the mutations evolved in the module M5 include: setting the excitation factors omega in all the sub-network modules in the current optimal network architecture as constants, wherein omega is more than or equal to 0 and less than or equal to 1; and meanwhile, randomly selecting mutation operation in the mutation set, promoting the internal structure of the sub-network module, generating a next generation sub-network module, and repeatedly triggering the execution of the modules M2 to M5 until the preset target requirement is met.
CN202110120132.8A 2021-01-28 2021-01-28 Neural network architecture searching method and system based on evolutionary computation Active CN112784949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110120132.8A CN112784949B (en) 2021-01-28 2021-01-28 Neural network architecture searching method and system based on evolutionary computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110120132.8A CN112784949B (en) 2021-01-28 2021-01-28 Neural network architecture searching method and system based on evolutionary computation

Publications (2)

Publication Number Publication Date
CN112784949A true CN112784949A (en) 2021-05-11
CN112784949B CN112784949B (en) 2023-08-11

Family

ID=75759475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110120132.8A Active CN112784949B (en) 2021-01-28 2021-01-28 Neural network architecture searching method and system based on evolutionary computation

Country Status (1)

Country Link
CN (1) CN112784949B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469078A (en) * 2021-07-07 2021-10-01 西安电子科技大学 Hyperspectral image classification method based on automatic design long-time and short-time memory network
CN114117206A (en) * 2021-11-09 2022-03-01 北京达佳互联信息技术有限公司 Recommendation model processing method and device, electronic equipment and storage medium
CN116522999A (en) * 2023-06-26 2023-08-01 深圳思谋信息科技有限公司 Model searching and time delay predictor training method, device, equipment and storage medium
CN117611974A (en) * 2024-01-24 2024-02-27 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures
CN117668701A (en) * 2024-01-30 2024-03-08 云南迅盛科技有限公司 AI artificial intelligence machine learning system and method
CN118014010A (en) * 2024-04-09 2024-05-10 南京信息工程大学 Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214498A (en) * 2018-07-10 2019-01-15 昆明理工大学 Ant group algorithm optimization method based on search concentration degree and dynamic pheromone updating
CN111144555A (en) * 2019-12-31 2020-05-12 中国人民解放军国防科技大学 Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm
CN111325356A (en) * 2019-12-10 2020-06-23 四川大学 Neural network search distributed training system and training method based on evolutionary computation
CN112101525A (en) * 2020-09-08 2020-12-18 南方科技大学 Method, device and system for designing neural network through NAS

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214498A (en) * 2018-07-10 2019-01-15 昆明理工大学 Ant group algorithm optimization method based on search concentration degree and dynamic pheromone updating
CN111325356A (en) * 2019-12-10 2020-06-23 四川大学 Neural network search distributed training system and training method based on evolutionary computation
CN111144555A (en) * 2019-12-31 2020-05-12 中国人民解放军国防科技大学 Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm
CN112101525A (en) * 2020-09-08 2020-12-18 南方科技大学 Method, device and system for designing neural network through NAS

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KHALID M. SALAMA 等: "Learning neural network structures with ant colony algorithms", 《SWARM INTELL》 *
YUKANG CHEN 等: "RENAS: Reinforced Evolutionary Neural Architecture Search", 《ARXIV》 *
耿飞 等: "神经网络架构搜索综述", 《智能计算机与应用》, vol. 10, no. 6 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469078A (en) * 2021-07-07 2021-10-01 西安电子科技大学 Hyperspectral image classification method based on automatic design long-time and short-time memory network
CN114117206A (en) * 2021-11-09 2022-03-01 北京达佳互联信息技术有限公司 Recommendation model processing method and device, electronic equipment and storage medium
CN116522999A (en) * 2023-06-26 2023-08-01 深圳思谋信息科技有限公司 Model searching and time delay predictor training method, device, equipment and storage medium
CN116522999B (en) * 2023-06-26 2023-12-15 深圳思谋信息科技有限公司 Model searching and time delay predictor training method, device, equipment and storage medium
CN117611974A (en) * 2024-01-24 2024-02-27 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures
CN117611974B (en) * 2024-01-24 2024-04-16 湘潭大学 Image recognition method and system based on searching of multiple group alternative evolutionary neural structures
CN117668701A (en) * 2024-01-30 2024-03-08 云南迅盛科技有限公司 AI artificial intelligence machine learning system and method
CN117668701B (en) * 2024-01-30 2024-04-12 云南迅盛科技有限公司 AI artificial intelligence machine learning system and method
CN118014010A (en) * 2024-04-09 2024-05-10 南京信息工程大学 Multi-objective evolutionary nerve architecture searching method based on multiple group mechanisms and agent models

Also Published As

Publication number Publication date
CN112784949B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN112784949A (en) Neural network architecture searching method and system based on evolutionary computation
Liu et al. Progressive neural architecture search
CN114373101A (en) Image classification method for neural network architecture search based on evolution strategy
CN111144555A (en) Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm
Anand et al. Black magic in deep learning: How human skill impacts network training
CN116108384A (en) Neural network architecture searching method and device, electronic equipment and storage medium
CN116090549A (en) Knowledge-driven multi-agent reinforcement learning decision-making method, system and storage medium
Ben-Iwhiwhu et al. Evolving inborn knowledge for fast adaptation in dynamic pomdp problems
CN113128689A (en) Entity relationship path reasoning method and system for regulating knowledge graph
CN113139644B (en) Information source navigation method and device based on deep Monte Carlo tree search
CN116861957A (en) Operator automatic tuning method and related device based on reinforcement learning
CN112348175B (en) Method for performing feature engineering based on reinforcement learning
Hu et al. Neural fidelity warping for efficient robot morphology design
CN113298233A (en) Agent model-based progressive depth integration architecture searching method
Ba et al. Monte Carlo Tree Search with variable simulation periods for continuously running tasks
Drugan Multi-objective optimization perspectives on reinforcement learning algorithms using reward vectors.
Guo et al. Learning to navigate in unknown environments based on GMRP-N
Guo A Review of Research on Algorithms for Solving SAT Problems
Montana et al. Evolution of internal dynamics for neural network nodes
CN112926611B (en) Feature extraction method, device and computer readable storage medium
Pieter Parameters for the best convergence of an optimization algorithm On-The-Fly
Garcia Improving Reinforcement Learning Techniques by Leveraging Prior Experience
Ali et al. Recent Trends in Neural Architecture Search Systems
US20240013061A1 (en) Architecture search method and apparatus for large-scale graph, and device and storage medium
Wiker Reducing the Search Space of Neuroevolution using Monte Carlo Tree Search

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant