CN113344174A - Efficient neural network structure searching method based on probability distribution - Google Patents

Efficient neural network structure searching method based on probability distribution

Info

Publication number
CN113344174A
CN113344174A CN202110421335.0A
Authority
CN
China
Prior art keywords
probability
training
neural network
network
operations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110421335.0A
Other languages
Chinese (zh)
Inventor
王涛
周达
刘星宇
徐航
王易
李明光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202110421335.0A priority Critical patent/CN113344174A/en
Publication of CN113344174A publication Critical patent/CN113344174A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an efficient neural network structure searching method based on probability distribution. Neural network structures obtained by neural architecture search are currently highly competitive in a wide range of computer vision and language tasks. Improving the efficiency of the search strategy and reducing the cost of evaluating candidate architectures remain the key to finding better network structures in less time. The invention provides a probability-distribution algorithm that greatly reduces the number of sub-networks that must be trained, thereby accelerating the neural architecture search, and uses a parameter-sharing mode that trains while searching, which lowers the cost of evaluating sub-networks and ensures that better operations receive more training, further speeding up the search. On CIFAR-10, the method searches out an optimal neural network structure in only 2 GPU hours on a GTX 1080 Ti, achieving a 2.69% test error with only 2.8M network parameters. On the ImageNet dataset, the network achieves 76% top-1 accuracy.

Description

Efficient neural network structure searching method based on probability distribution
Technical Field
The invention relates to a method for designing deep neural network structures in the field of artificial intelligence, and in particular to an efficient neural network structure searching method.
Background
Automatic search for neural networks within a given architecture space has attracted considerable attention over the past few years. To this end, many excellent search algorithms and evaluation strategies have been proposed to find the best neural architecture; this task is known as Neural Architecture Search (NAS). In general, a NAS framework is divided into three parts: the search space, the search strategy, and the evaluation strategy, as shown in Fig. 1.
The search space defines the variables of the optimization problem. The variables describing the neural network structure differ from those describing the hyper-parameters, and different variable scales pose different levels of difficulty for the algorithm. Once a set of architecture parameters and corresponding hyper-parameters is found, the performance of the deep learning model is in fact controlled and determined by that set of parameters, so only the architecture parameters and hyper-parameters of the model need to be optimized. In the early days of NAS, the commonly used network architecture was a chain structure, as shown in Fig. 2.
Such a structure is equivalent to a sequence of N layers; each layer has several optional operators, such as convolution or pooling, and each operator has its own hyper-parameters, such as kernel size and stride.
Some recent work, inspired by manually designed network architectures, has studied networks with multiple branches, as in Fig. 3.
Many deep networks have similar structures: although they are deep, they contain many repeated cells. Once the cell is abstracted, the complex structure becomes simple, which reduces the number of variables to optimize; on the other hand, the same cells can be transferred between different tasks, as shown in Fig. 4.
Because the neural architecture search problem is high-dimensional and mixes continuous and discrete variables, reducing the dimensionality of the search space can greatly improve the results; in this way, Zoph's 2018 work was roughly 7 times faster than the 2017 work.
The search strategy defines which algorithm is used to find the optimal network structure and parameter configuration quickly and accurately. Common search strategies include reinforcement learning, evolutionary algorithms, random search, Bayesian optimization, and gradient-based algorithms. Work such as NAS uses reinforcement learning as a meta-controller: based on the performance of the sampled networks, a recurrent neural network (RNN) controller is trained iteratively to sample, token by token, strings that encode specific neural architectures, producing new sub-networks. The general framework of evolutionary algorithms is broadly similar: a population (N sets of solutions) is randomly generated, and the algorithm then cycles through selection, crossover, and mutation until a termination condition is met. Evolutionary algorithms are gradient-free optimization methods; their advantage is that a globally optimal solution can be reached, and their drawback is relatively low efficiency. DARTS relaxes the discrete search space into a continuous one so that it can be optimized efficiently with gradient methods, turning the problem of searching for a network architecture into one of optimizing continuous variables. After the search is complete, the most likely operation is selected and the other operations are discarded. DARTS therefore amounts to solving a bi-level optimization problem, in which the mixed operations must be optimized jointly with the network weights. However, this causes high GPU memory consumption during the search.
The evaluation strategy plays a role similar to surrogate models in engineering optimization. Because the performance of a deep learning model depends heavily on the scale of the training data, training each candidate on large-scale data is very time-consuming, so evaluating the search results requires approximate means: for example, training the model on a low-fidelity training set, or treating all architectures as subgraphs of a hypergraph so that the subgraphs share weights directly through the edges of the hypergraph. MDENAS makes the interesting proposal of an accuracy-ranking hypothesis, which assumes that the accuracy ranking of sub-networks is consistent across training epochs, so that the performance of a sub-network can be estimated after training it for only a few epochs, accelerating the convergence of the network structure search. However, this hypothesis has been verified to hold with only about 70% accuracy, and a neural network structure that is excellent at the beginning of training does not necessarily perform best when training converges. The evaluation process is accelerated, but the final network search result is also affected.
Disclosure of Invention
The invention provides a novel probability-distribution algorithm that greatly reduces the number of sub-networks that must be trained, thereby accelerating the neural architecture search, and uses a parameter-sharing mode that trains while searching, which lowers the cost of evaluating sub-networks and ensures that better operations receive more training, further accelerating the neural architecture search.
In a first aspect, the present invention provides a probability-distribution algorithm for use in the neural network structure search strategy. The algorithm comprises initialization and sampling. As shown in Fig. 5, structural diversity is obtained by selecting one of the M candidate operations on the edge between every two nodes (in this work, M = 8). The operations on the edges in Fig. 5 are initially unknown: a probability is first initialized for each operation, the probabilities are updated iteratively, and the operation that performs best is finally selected. Therefore, at the start of the search, the probability parameter of every operation in the search space is initialized to 1/M, i.e. the probabilities of the M operations between any two nodes sum to 1. In the sampling phase, the operation used between every two nodes in the current round is selected according to the operation probabilities; the higher the probability value, the more likely the operation is to be selected. The result of this selection is the network sampled in the current round. Compared with previous NAS sampling methods, only one operation has to be selected between each pair of nodes during the search, which effectively reduces GPU memory consumption.

P_m = 1/M, (1 ≤ m ≤ M) (1)

Σ_{m=1}^{M} P_m = 1 (2)

cell = {o^(i,j) | 0 ≤ i ≤ N, i < j ≤ N} (3)
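For illustration, the initialization and sampling step can be sketched as follows (a minimal sketch assuming M = 8 candidate operations, N = 4 intermediate nodes and 2 input nodes per cell; the names init_probabilities and sample_cell are illustrative and not taken from the original disclosure):

    import random

    M = 8           # candidate operations per edge
    N = 4           # intermediate nodes per cell
    NUM_INPUTS = 2  # input nodes per cell

    def cell_edges():
        # every intermediate node j receives one edge from each earlier node i
        return [(i, j) for j in range(NUM_INPUTS, NUM_INPUTS + N)
                       for i in range(j)]

    def init_probabilities(edges):
        # formulas (1)-(2): each of the M operations on an edge starts at 1/M
        return {e: [1.0 / M] * M for e in edges}

    def sample_cell(probs):
        # pick exactly one operation per edge according to the current probabilities
        return {e: random.choices(range(M), weights=p, k=1)[0]
                for e, p in probs.items()}

    edges = cell_edges()
    probs = init_probabilities(edges)
    arch = sample_cell(probs)   # the network (cell) sampled in the current round
    print(len(edges), arch)     # 14 edges for N = 4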
In a second aspect, the present invention provides the probability updating method of the probability-distribution algorithm used in the neural network structure search strategy. Previous NAS methods are time-consuming and memory-consuming, mainly because a large number of networks are sampled during the search, network performance evaluation is slow, and every sampled network has to be trained to convergence. Therefore, for the network performance evaluation strategy, a forced parameter-sharing mode is also adopted: after a network is sampled and its cells are stacked into a complete CNN, the shared parameters are assigned directly, and performance is then evaluated on a data set. Once the accuracy of the network on the data set is obtained, the accuracy is fed back and the probability of each operation is updated. As shown in Fig. 6, operations are selected according to the probabilities; after the cells are determined, they are stacked into a complete network, the shared parameters are assigned, performance is evaluated, the result is fed back to the controller, and the information and probabilities are updated, which completes one round of iteration. We define the probability of each operation as P_m, the number of training iterations of each operation as E_m, and the average accuracy of each operation as A_m, where m (1 ≤ m ≤ M) indexes the candidate operations. The following rule is applied: among the M operations between a pair of nodes x_i and x_j, if an operation o_m has fewer training iterations and higher average accuracy than the other operations, that operation is superior to the others. The updating formula for the average accuracy of each operation is:
A_m = (A_m × E_m + a) / (E_m + 1), (a is the performance of the current round of network evaluation) (4)
The comparison between operations is:
Z = Σ_{k=1}^{M} [F(A_m > A_k) × F(E_m < E_k) - F(A_m < A_k) × F(E_m > E_k)], (the function F returns 1 if its argument is true and 0 if it is false) (5)
The probability update formula for the operation is:
P_m = P_m + α × Z, (1 ≤ m ≤ M) (6)
where α is a hyper-parameter representing the magnitude of the operation probability update, which also affects the convergence speed and convergence effect of the search process.
As the formulas show, operations with fewer training iterations but higher average accuracy are favoured, and their probabilities are enhanced. Conversely, operations with more training iterations but lower average accuracy are regarded as poorly performing, and their probabilities are reduced. After a certain number of iterations, the probabilities of the operations in the search space converge and stabilize effectively.
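A compact sketch of this update rule is given below (it follows formulas (4)-(6) as reconstructed above; the renormalization at the end is an added safeguard and not part of the patent's formulas, and the function name update_edge is illustrative):

    def update_edge(P, E, A, m_sel, a, alpha=0.005):
        # P: probabilities, E: training counts, A: average accuracies of the M
        # operations on one edge; m_sel is the operation sampled this round,
        # a is the accuracy of the evaluated network
        A[m_sel] = (A[m_sel] * E[m_sel] + a) / (E[m_sel] + 1)   # formula (4)
        E[m_sel] += 1
        M = len(P)
        for m in range(M):
            # formula (5): +1 for each rival beaten with fewer iterations and
            # higher accuracy, -1 for each rival that beats this operation
            z = sum(int(A[m] > A[k] and E[m] < E[k]) -
                    int(A[m] < A[k] and E[m] > E[k])
                    for k in range(M) if k != m)
            P[m] += alpha * z                                   # formula (6)
        # added safeguard (not in the patent): keep a valid probability distribution
        total = sum(max(p, 1e-8) for p in P)
        for m in range(M):
            P[m] = max(P[m], 1e-8) / total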
To generate the final neural network, after the probabilities converge we select the operation with the highest probability on every edge. For nodes with multiple inputs, we keep the operations with the top K probabilities. After the normal cell and the reduction cell are determined, they are stacked in a set number to form the complete neural network.
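The derivation of the final cell after convergence can be sketched as follows (K = 2 is an assumed value for the number of retained input edges per node; the text above only says the top K probabilities are kept, and derive_cell is an illustrative name):

    def derive_cell(probs, K=2):
        # pick the most probable operation on every edge ...
        best = {e: p.index(max(p)) for e, p in probs.items()}
        # ... then, for each node, keep only the K incoming edges whose chosen
        # operation has the highest probability
        kept = {}
        for j in sorted({jj for (_, jj) in probs}):
            incoming = [(e, probs[e][best[e]]) for e in probs if e[1] == j]
            incoming.sort(key=lambda t: t[1], reverse=True)
            for e, _ in incoming[:K]:
                kept[e] = best[e]
        return kept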
In a third aspect, the present invention provides a parameter-sharing strategy that trains while searching, for use in the neural network structure evaluation strategy.
As shown in Fig. 6, the search space has already been determined, and after each iteration of the cell search a complete neural network formed by stacking 8 cells is evaluated. We therefore share the operating parameters on each edge across the 8 cells, i.e. only the training parameters of 8 x 14 x 8 operations need to be saved. Whenever the neural network is trained or evaluated on the training set or the evaluation set, the corresponding operating parameters are read from the stored shared parameters instead of being randomly initialized, and after training the latest operating parameters are written back to the corresponding positions. When a neural network is evaluated, the shared parameters can therefore be read directly and the performance evaluated on the evaluation data set, avoiding the need to train every searched network to convergence and greatly accelerating the network evaluation process, as shown in Fig. 7.
How, then, are the shared parameters trained?
One option would be to pre-train: before the search starts, a network could be randomly sampled according to the initial probability of each operation and trained for one batch, and after a number of such epochs every operation's parameters would be trained to some extent. However, this increases the search cost, because the operation parameters must be trained sufficiently and too many epochs cannot be afforded. Therefore, a shared-parameter training mode that trains while searching is proposed. Referring to Fig. 7, after one generation of shared-parameter training, a round of network search is performed: networks are sampled, their performance is evaluated, and the operation probabilities are updated; a new round of shared-parameter training is then performed with the updated probabilities, and so on until the operation probabilities converge and stabilize. The advantage of this method is that poorly performing operations gradually lose their opportunities for parameter training while well-performing operations obtain more of them, so the best-performing operations are found, the convergence of the operation probabilities is accelerated, and the overall speed of the neural architecture search is improved.
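The alternation between shared-parameter training and probability-driven search can be summarized in a short sketch (the four callables and the default counts are placeholders standing in for the routines described above, not the patent's actual implementation):

    def search(probs, train_shared_one_epoch, sample_net, evaluate, update_probs,
               search_epochs=50, nets_per_round=100):
        # alternate one epoch of shared-parameter training with one round of
        # architecture sampling, evaluation and probability update
        for _ in range(search_epochs):
            train_shared_one_epoch(probs)      # ops are sampled by their current probability
            for _ in range(nets_per_round):
                arch = sample_net(probs)       # sample a cell and stack it into a network
                acc = evaluate(arch)           # shared weights, evaluated on the validation set
                update_probs(probs, arch, acc) # feed back accuracy, update the probabilities
        return probs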
The invention has the following beneficial effects: the proposed search framework greatly improves the search efficiency of NAS and can find an excellent neural network architecture with only 2 hours of search on a single GTX 1080 Ti, with improvements in accuracy, parameter count, and GPU latency. The search efficiency of the invention is the highest among existing NAS algorithms such as MetaQNN, Progressive NAS, DARTS, ENAS, and AmoebaNet-A + CutOut.
Drawings
FIG. 1 is a NAS framework diagram
FIG. 2 is an exemplary diagram of a chain network architecture
FIG. 3 is a diagram of an example multi-branch network architecture
FIG. 4 is a diagram of an exemplary network structure based on cell stacking
FIG. 5 is a diagram of exemplary network structure diversity
FIG. 6 is an exemplary graph of probabilistic iterative update
FIG. 7 is a diagram of an example of alternating iterations of network search and parameter training
FIG. 8 is a diagram illustrating a simple operation between nodes
FIG. 9 is an exemplary diagram of a cell
FIG. 10 is a network example diagram
FIG. 11 is a diagram of an example of a Normal Cell neural network architecture
FIG. 12 is a diagram of the optimal Reduction Cell neural network architecture
Detailed Description
Some terms used in the embodiments of the present application will be explained below.
The embodiments of the present application relate to applications of neural networks. In order to better understand the solution of the embodiments, the construction of the search space and related concepts that may be involved are described below.
For the search space, we search for cells as building blocks of the final architecture. The searched cells can be stacked to form a convolutional network, or recursively connected to form a recurrent network. The neural network is defined at different scales: network, cell, and node.
Node:
Nodes are the basic elements that make up a cell. Each node x_i is a specific tensor (e.g., a feature map in a convolutional neural network), and each directed edge (i, j) represents an operation o^(i,j), sampled from the operation search space, that transforms node x_i into another node x_j, as shown in Fig. 8. There are three types of nodes in a cell: input nodes, intermediate nodes, and output nodes. Each cell takes the output tensors of the previous cells as its input nodes and generates an intermediate node x_j by applying the sampled operations o^(i,j) to the previous nodes x_i (i < j). The concatenation of all intermediate nodes is taken as the final output node.
Following differentiable architecture search, the set of candidate operations (denoted O) consists of the following 8 operations: (1) 3 x 3 max pooling; (2) no connection (zero); (3) 3 x 3 average pooling; (4) skip connection (identity); (5) 3 x 3 dilated convolution with rate 2; (6) 5 x 5 dilated convolution with rate 2; (7) 3 x 3 depthwise separable convolution; (8) 5 x 5 depthwise separable convolution.
We apply element-wise addition only at the input of nodes that have multiple operations (edges). For example, in Fig. 9, B2 has three incoming operations; their results are added element-wise and the sum is taken as B2.
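A toy sketch of how a cell is evaluated under these rules follows (edges feeding the same node are summed element-wise and the intermediate nodes are concatenated; the operations used here are dummy element-wise functions, not the 8 candidate operations above):

    import numpy as np

    def cell_forward(inputs, arch, ops):
        # inputs: the cell's input tensors; arch: {(i, j): op_index};
        # ops: list of callables, one per candidate operation
        nodes = list(inputs)
        n_total = max(j for (_, j) in arch) + 1
        for j in range(len(inputs), n_total):
            # element-wise sum of all edges entering node j
            nodes.append(sum(ops[m](nodes[i])
                             for (i, jj), m in arch.items() if jj == j))
        # the output node is the concatenation of the intermediate nodes
        return np.concatenate(nodes[len(inputs):], axis=-1)

    ops = [lambda x: x, lambda x: -x] + [lambda x: 0.5 * x] * 6   # dummy "operations"
    x = np.ones((4, 4))
    out = cell_forward([x, x], {(0, 2): 0, (1, 2): 1, (0, 3): 2, (2, 3): 0}, ops)
    print(out.shape)   # (4, 8): two intermediate nodes concatenated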
Cell:
A Cell is defined as a small convolutional network that maps an H x W x F tensor to another H' x W' x F' tensor. There are two types of cells: normal cells and reduction cells. A normal cell uses operations with stride 1, so H' = H and W' = W; a reduction cell uses operations with stride 2, so H' = H/2 and W' = W/2. For the number of filters F, the convolutional neural network architectures designed by most people [10, 12, 13, 23, 32, 33] commonly double F whenever the spatial feature map is halved. Thus F' = F is used for stride 1 and F' = 2F for stride 2.
As shown in Fig. 9, a cell is represented by a DAG with 7 nodes: two input nodes I_1 and I_2, four intermediate nodes B1, B2, B3 and B4 that apply the sampled operations to the input nodes and earlier intermediate nodes, and an output node that concatenates the intermediate nodes. The edge between two nodes represents an operation o^(i,j) sampled from the operation search space. During training, when an intermediate node has several incoming edges (operations), its input is obtained by element-wise addition. At test time, we select the top K probabilities to generate the final cell. The size of the whole search space is therefore 8^|E_N|, where E_N is the set of possible edges for a cell with N intermediate nodes. In our case with N = 4 intermediate nodes, the total number of cell structures is 2 x 8^(2+3+4+5) = 2 x 8^14 (counting both cell types), which is a very large search space and therefore requires an efficient optimization method.
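The edge count and search-space size quoted above can be checked with a couple of lines:

    M, N = 8, 4                      # operations per edge, intermediate nodes
    edges = sum(range(2, 2 + N))     # 2 + 3 + 4 + 5 = 14 edges per cell
    size = 2 * M ** edges            # two cell types, 8 operations per edge
    print(edges, size)               # 14, 2 * 8**14 (about 8.8e12 structures)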
Network:
As shown in Fig. 10, the network is composed of a predetermined number of stacked cells, which may be normal cells or reduction cells. At the top of the network, global average pooling is followed by a softmax layer for the final output. Based on the shared-parameter network performance evaluation strategy, a small stacked model (for example, 8 layers) is trained on the relevant data set to search for the normal cell and the reduction cell, and the cells are then stacked into a deeper network (for example, 20 layers) for performance evaluation. The overall construction process and search space of the CNN are the same as in differentiable architecture search; note, however, that our search algorithm is different.
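A minimal sketch of stacking cells into the full network is shown below (the constructor callables and the reduction positions are placeholders; in the 8-cell search model of the embodiment the reduction cells are the 2nd and 5th layers):

    def build_network(num_cells, reduction_at, make_normal, make_reduction):
        # stack cells in order, inserting reduction cells at the given positions;
        # global average pooling and a softmax classifier follow the last cell
        return [make_reduction() if i in reduction_at else make_normal()
                for i in range(num_cells)]

    # e.g. the 8-cell search model with reduction cells as the 2nd and 5th layers:
    # cells = build_network(8, {1, 4}, make_normal, make_reduction)   # 0-indexed positions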
The present invention will be described in further detail with reference to specific examples.
In this example we first perform experiments on the CIFAR-10 dataset to demonstrate the feasibility of our algorithm. The cells searched on CIFAR-10 are then applied to wider image classification datasets (such as CIFAR-100 and ImageNet), and the method is compared with other recent NAS methods in terms of search efficiency, accuracy, and network parameter size.
The method comprises the following steps. Step one: data set setup. We follow the experimental data sets and evaluation metrics of most NAS algorithms and therefore perform extensive experiments on the CIFAR-10 dataset. CIFAR-10 comprises 50,000 training images and 10,000 test images. During the neural architecture search, 5,000 training images are randomly selected as a validation set to evaluate the sampled network architectures. CIFAR-10 images are 32 x 32 colour images in 10 categories. All colour intensities are normalized to [-1, +1].
Step two: search space setup. Throughout the search, according to the theory above, the number of stacked cells does not affect the probability updates driven by the evaluation results. Therefore, to speed up the search, the number of stacked cell layers is set to 8 during the search, with the 2nd and 5th layers being reduction cells and the remaining layers normal cells; each cell has 4 nodes. The search process trains the shared parameters for a total of 50 epochs and samples sub-networks for 100 epochs, with the batch size set to 128 and the initial number of channels set to 16. The initial learning rate is set to 0.025 (annealed to 0 with a cosine schedule), the momentum is set to 0.9, and the weight decay is set to 3 x 10^-4. The hyper-parameter α, which controls the magnitude of the operation-probability update, is set to 0.005 (an optimizer configuration illustrating these settings is sketched after the steps below).
Step three: initialize the probability of each operation in the cell to 1/8;
Step four: sample the operations according to the probabilities;
Step five: stack the cells to form a network;
Step six: assign the shared parameters;
Step seven: verify the network's performance on the validation set;
Step eight: feed back the accuracy;
Step nine: update the information and the operation probabilities;
Step ten: generate hundreds of sub-networks;
Step eleven: train each sub-network once;
Step twelve: update the shared parameters;
Step thirteen: repeat steps four to twelve in each epoch;
Step fourteen: after the search, the cells are stacked to form a complete neural network so that its performance can be evaluated on data sets such as CIFAR-10, CIFAR-100 and ImageNet.
Step fifteen: when evaluating on the CIFAR-10 and CIFAR-100 datasets, we essentially keep the hyper-parameter settings used when searching on CIFAR-10, but expand the 8 cells to 20 cells, train for 600 epochs, set the batch size to 128, and apply regularization such as cutout and path dropout with probability 0.3.
Step sixteen: when evaluating on the ImageNet dataset, we also essentially keep the previous hyper-parameter settings, but use 14 cells, train for 250 epochs, set the batch size to 64, set the weight decay to 3 x 10^-5, and use an initial SGD learning rate of 0.1 (decayed by a factor of 0.97 every epoch).
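As an illustration of the optimizer settings quoted in steps two and sixteen, a PyTorch-style configuration might look as follows (a sketch under the assumption that PyTorch is used; the Linear modules are placeholders for the stacked networks):

    import torch

    # search on CIFAR-10 (step two): SGD, cosine-annealed learning rate
    search_model = torch.nn.Linear(10, 10)   # placeholder for the 8-cell super-network
    opt = torch.optim.SGD(search_model.parameters(), lr=0.025,
                          momentum=0.9, weight_decay=3e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50, eta_min=0)
    alpha = 0.005                            # step size of the probability update

    # evaluation on ImageNet (step sixteen): SGD, learning rate decayed by 0.97 per epoch
    eval_model = torch.nn.Linear(10, 10)     # placeholder for the 14-cell network
    opt_im = torch.optim.SGD(eval_model.parameters(), lr=0.1, weight_decay=3e-5)
    sched_im = torch.optim.lr_scheduler.ExponentialLR(opt_im, gamma=0.97)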
To eliminate random factors, we performed several experiments and found that the finally searched neural network architectures perform very similarly, which demonstrates the stability of our algorithm. The optimal Normal Cell neural network architecture is shown in FIG. 11, and the optimal Reduction Cell neural network architecture is shown in FIG. 12.
TABLE 1 (training results of the optimal neural network architecture on the CIFAR-10 and CIFAR-100 data sets; the table is provided as an image in the original document)
Table 1 lists the training results of the optimal neural network architecture on the CIFAR-10 and CIFAR-100 data sets. It is worth noting that, compared with other NAS methods, the method proposed by the present invention has a very significant advantage in computing resource consumption: only 2 GPU hours are required to complete the whole search process. In terms of the number of network parameters, the network has only 2.8M parameters, far fewer than the networks found by other NAS methods. In terms of error rate, the neural network achieves 2.69% on CIFAR-10 and 17% on CIFAR-100, slightly better than other neural network architectures. It is therefore clear that our method shows advantages in computational resource consumption and network test accuracy over other NAS methods as well as over manually designed neural networks.
We also train our optimal neural network structure on the ImageNet dataset. The searched cell structure is stacked into a complete neural network architecture and transferred directly to ImageNet for training. As described above, we set the number of cells to 14, with layers 4 and 9 being reduction cells and the rest normal cells; the input image size is 224 x 224.
TABLE 2 (results of the optimal neural network architecture on the ImageNet data set; the table is provided as an image in the original document)
As shown in Table 2, on the ImageNet data set the error rate of our neural network is 24%, better than the neural networks found by other NAS methods; compared with them, our method achieves higher performance with less computational cost.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. An efficient neural network structure searching method based on probability distribution, characterized in that: a novel probability-distribution algorithm greatly reduces the number of sub-networks that must be trained, thereby accelerating the neural architecture search, and a parameter-sharing mode that trains while searching is used, which lowers the cost of evaluating sub-networks and ensures that better operations receive more training, further accelerating the neural architecture search.
2. The method of claim 1, wherein the probability-distribution algorithm comprises: initialization and sampling; the network structure is diversified by selecting one of the M candidate operations between every two nodes; a probability is first initialized for each operation, the probabilities are iteratively updated, and the operation that performs best is finally selected; at the start of the search, the probability parameter of every operation in the search space is initialized to 1/M, i.e. the probabilities of the M operations between two nodes sum to 1; then, in the sampling phase, the operation between every two nodes in the current round is selected according to the operation probabilities, a higher probability value giving a higher chance of being selected; the final selection result is the network sampled in the current round.
3. The method according to claim 1 or 2, wherein the probability-distribution algorithm comprises a probability updating method, wherein, for the network performance evaluation strategy, a forced parameter-sharing mode is adopted: after a network is sampled and its cells are stacked into a complete CNN, the shared parameters are assigned directly and performance is then evaluated on a data set; after the accuracy of the network on the data set is obtained, the accuracy is fed back and the probability of each operation is updated; operations are selected according to the probabilities, the determined cells are stacked into a complete network, the shared parameters are assigned, performance is evaluated and fed back to the controller, and the information and probabilities are updated, completing one round of iteration; the probability of each operation is defined as P_m, the number of training iterations of each operation as E_m, and the average accuracy of each operation as A_m, where m (1 ≤ m ≤ M) indexes the candidate operations; the following rule is applied: among the M operations between a pair of nodes x_i and x_j, if an operation o_m has fewer training iterations and higher average accuracy than the other operations, that operation is superior to the others; the updating formula of the average accuracy of each operation is:

A_m = (A_m × E_m + a) / (E_m + 1) (4)

the comparison between operations is:

Z = Σ_{k=1}^{M} [F(A_m > A_k) × F(E_m < E_k) - F(A_m < A_k) × F(E_m > E_k)] (5)

the probability update formula of the operation is:

P_m = P_m + α × Z, (1 ≤ m ≤ M) (6)

wherein α is a hyper-parameter representing the magnitude of the operation probability update; an operation with fewer training iterations but higher average accuracy in the search space is selected and its probability is enhanced; meanwhile, operations with more training iterations but lower average accuracy are regarded as poorly performing operations, and their probabilities are weakened; after a certain number of iterations, the probabilities of the operations in the search space converge and stabilize effectively; in order to generate the final neural network, after the probabilities converge, the operation with the highest probability on every edge is selected; for nodes with multiple inputs, the operations with the top K probabilities are kept; after the normal cell and the reduction cell are determined, they are stacked in a set number to form the complete neural network.
4. The method of claim 1, wherein the parameter-sharing mode of training while searching comprises: the search space having been determined, after each iteration of the cell search a complete neural network formed by stacking a set number of cells is evaluated; the operating parameters on each edge are therefore shared across all stacked cells, i.e. only the training parameters of L × |E_N| × M operations need to be saved, where L is the number of stacked cells, E_N is the set of possible edges of a cell with N intermediate nodes, and M is the number of candidate operations per edge; then, each time the neural network is trained or evaluated on the training data set or the evaluation data set, the corresponding operating parameters are read from the stored shared parameters instead of being randomly initialized, and after training the latest operating parameters are stored back to the corresponding positions.
5. The method according to claim 1 or 4, wherein the parameter-sharing mode of training while searching comprises: after one generation of shared-parameter training, a round of network search is performed, networks are sampled and their performance evaluated, the operation probabilities are updated, and a new round of shared-parameter training is then performed with the updated operation probabilities; this is repeated until the operation probabilities converge and stabilize.
CN202110421335.0A 2021-04-20 2021-04-20 Efficient neural network structure searching method based on probability distribution Pending CN113344174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110421335.0A CN113344174A (en) 2021-04-20 2021-04-20 Efficient neural network structure searching method based on probability distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110421335.0A CN113344174A (en) 2021-04-20 2021-04-20 Efficient neural network structure searching method based on probability distribution

Publications (1)

Publication Number Publication Date
CN113344174A true CN113344174A (en) 2021-09-03

Family

ID=77468197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110421335.0A Pending CN113344174A (en) 2021-04-20 2021-04-20 Efficient neural network structure searching method based on probability distribution

Country Status (1)

Country Link
CN (1) CN113344174A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023087953A1 (en) * 2021-11-22 2023-05-25 华为技术有限公司 Method and apparatus for searching for neural network ensemble model, and electronic device
CN114429197A (en) * 2022-01-25 2022-05-03 西安交通大学 Neural network architecture searching method, system, equipment and readable storage medium
CN114429197B (en) * 2022-01-25 2024-05-28 西安交通大学 Neural network architecture searching method, system, equipment and readable storage medium
CN115115873A (en) * 2022-06-08 2022-09-27 中国船舶集团有限公司系统工程研究院 Image classification method and device based on differentiable network structure search
CN115760777A (en) * 2022-11-21 2023-03-07 脉得智能科技(无锡)有限公司 Hashimoto's thyroiditis diagnostic system based on neural network structure search
CN115760777B (en) * 2022-11-21 2024-04-30 脉得智能科技(无锡)有限公司 Hashimoto thyroiditis diagnosis system based on neural network structure search

Similar Documents

Publication Publication Date Title
Liashchynskyi et al. Grid search, random search, genetic algorithm: a big comparison for NAS
CN113344174A (en) Efficient neural network structure searching method based on probability distribution
CN109948029A (en) Based on the adaptive depth hashing image searching method of neural network
CN109934332A (en) The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends
CN112232413B (en) High-dimensional data feature selection method based on graph neural network and spectral clustering
Bakhshi et al. Fast evolution of CNN architecture for image classification
Suganuma et al. Designing convolutional neural network architectures using cartesian genetic programming
CN110222824B (en) Intelligent algorithm model autonomous generation and evolution method, system and device
CN112288046B (en) Mixed granularity-based joint sparse method for neural network
Luo et al. HSCoNAS: Hardware-software co-design of efficient DNNs via neural architecture search
Dutta et al. Effective building block design for deep convolutional neural networks using search
CN116911459A (en) Multi-input multi-output ultra-short-term power load prediction method suitable for virtual power plant
Yamada et al. Weight Features for Predicting Future Model Performance of Deep Neural Networks.
US20230051955A1 (en) System and Method For Regularized Evolutionary Population-Based Training
Vanneschi et al. Heterogeneous cooperative coevolution: strategies of integration between gp and ga
Hen Solving spin glasses with optimized trees of clustered spins
Wan et al. RSSM-Net: Remote sensing image scene classification based on multi-objective neural architecture search
CN113554144A (en) Self-adaptive population initialization method and storage device for multi-target evolutionary feature selection algorithm
Zhang et al. Bandit neural architecture search based on performance evaluation for operation selection
Joldos et al. A parallel evolutionary approach to community detection in complex networks
JPH0561848A (en) Device and method for selecting and executing optimum algorithm
Frachon et al. An immune-inspired approach to macro-level neural ensemble search
Sun et al. Matrix-Based Ant Colony Optimization for Large-Scale Balanced Multiple Traveling Salesmen Problem
Zou et al. G-EvoNAS: Evolutionary Neural Architecture Search Based on Network Growth
US20230110362A1 (en) Data processing apparatus and data processing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210903

WD01 Invention patent application deemed withdrawn after publication