CN111656365A - Method and apparatus for network structure search, computer storage medium, and computer program product - Google Patents
Method and apparatus for network structure search, computer storage medium, and computer program product Download PDFInfo
- Publication number
- CN111656365A CN111656365A CN201980009246.6A CN201980009246A CN111656365A CN 111656365 A CN111656365 A CN 111656365A CN 201980009246 A CN201980009246 A CN 201980009246A CN 111656365 A CN111656365 A CN 111656365A
- Authority
- CN
- China
- Prior art keywords
- network structure
- network
- operations
- feedback
- search space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 64
- 238000004590 computer program Methods 0.000 title claims abstract description 17
- 238000005070 sampling Methods 0.000 claims abstract description 29
- 238000003062 neural network model Methods 0.000 claims abstract description 20
- 238000012549 training Methods 0.000 claims description 69
- 239000013598 vector Substances 0.000 claims description 22
- 238000012360 testing method Methods 0.000 claims description 21
- 238000013528 artificial neural network Methods 0.000 claims description 20
- 238000010586 diagram Methods 0.000 claims description 12
- 238000013527 convolutional neural network Methods 0.000 claims description 8
- 239000004744 fabric Substances 0.000 claims description 2
- 238000013507 mapping Methods 0.000 claims description 2
- 125000004122 cyclic group Chemical group 0.000 claims 2
- 238000004891 communication Methods 0.000 description 6
- 238000013139 quantization Methods 0.000 description 6
- 238000012795 verification Methods 0.000 description 6
- 238000007667 floating Methods 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 230000004913 activation Effects 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 4
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000013136 deep learning model Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 238000010276 construction Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 230000002787 reinforcement Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000013526 transfer learning Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
A method for network structure search includes: (step S12) determining a search space of the neural network model for which a network structure search is to be performed, the search space defining a plurality of operations on an operation layer between every two nodes in the neural network model; (step S14) sampling one operation at each operation layer of the search space according to a first network structure to obtain a target network structure; (step S16) performing low-bit specialization on the target network structure to obtain a second network structure; (step S18) determining a feedback quantity of the second network structure; (step S19) updating the first network structure according to the feedback quantity. An apparatus, computer storage medium, and computer program product for network structure search are also disclosed.
Description
Technical Field
The present application relates to the field of machine learning, and in particular, to a method and an apparatus for network structure search, a computer storage medium, and a computer program product.
Background
In the related art, when a deep learning model is applied to a mobile terminal, the model needs to be trained on a computer or a server and then deployed on a chip of the mobile terminal. However, existing high-performance deep learning models often have a huge number of parameters, and those parameters are 32-bit floating-point values; training on devices with abundant computing power such as computers or servers is therefore not a problem, but directly deploying such a model on a mobile terminal with limited computing resources is very difficult. Low-bit networks require little storage space, run fast, and place low demands on computing resources, so they have become one of the research hot spots in recent years. However, the network structure has a large influence on a low-bit network, and how to design a network structure suitable for a low-bit network is an urgent problem to be solved.
Disclosure of Invention
Embodiments of the present application provide a method and apparatus for network structure search, a computer storage medium, and a computer program product.
The method for searching the network structure comprises the following steps:
determining a search space of a neural network model to be subjected to network structure search, wherein the search space defines various operations on an operation layer between every two nodes in the neural network model;
sampling one of the operations at each of the operation levels of the search space according to a first network structure to obtain a target network structure;
performing low-bit specialization on the target network structure to obtain a second network structure;
determining a feedback quantity of the second network structure;
and updating the first network structure according to the feedback quantity.
The apparatus for network structure search of the embodiment of the application comprises a processor and a memory, wherein the memory stores one or more programs, and when the programs are executed by the processor, the processor is configured to perform: determining a search space of a neural network model to be subjected to network structure search, wherein the search space defines various operations on an operation layer between every two nodes in the neural network model; sampling one of the operations at each of the operation layers of the search space according to a first network structure to obtain a target network structure; performing low-bit specialization on the target network structure to obtain a second network structure; determining a feedback quantity of the second network structure; and updating the first network structure according to the feedback quantity.
The computer storage medium of the present embodiment stores thereon a computer program that, when executed by a computer, causes the computer to execute the above-described method.
The computer program product of the present application embodiment includes instructions, which when executed by a computer, cause the computer to perform the above-described method.
According to the method and device for searching the network structure, the computer storage medium and the computer program product, the sampled target network structure is subjected to low-bit specialization to obtain the second network structure, and the feedback quantity of the second network structure is determined to update the first network structure, so that the network structure more suitable for the low-bit network can be obtained, the high-performance low-bit network is realized, and the low-bit network can be better applied to a mobile terminal scene.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart illustrating a method for network structure search according to an embodiment of the present application;
fig. 2 is a block diagram of an apparatus for network structure search according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a method for network structure search according to an embodiment of the present application;
FIG. 4 is another schematic diagram of a method for network structure search according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating a method for network structure search according to yet another embodiment of the present application;
FIG. 6 is a flow chart illustrating a method for network structure searching according to yet another embodiment of the present application;
FIG. 7 is a flow chart illustrating a method for network structure search according to another embodiment of the present application;
FIG. 8 is a flow chart illustrating a method for network structure searching according to yet another embodiment of the present application;
FIG. 9 is a flow chart illustrating a method for network structure searching according to yet another embodiment of the present application;
FIG. 10 is a flow chart illustrating a method for network structure searching according to another embodiment of the present application;
FIG. 11 is a general diagram illustrating a method for network structure search according to an embodiment of the present application;
fig. 12 is a flowchart illustrating a method for network structure search according to another embodiment of the present application.
Description of the main element symbols:
a device 10 for network structure searching, a memory 102, a processor 104, and a communication interface 106.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and are only for the purpose of explaining the present application and are not to be construed as limiting the present application.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first", "second", may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; may be mechanically connected, may be electrically connected or may be in communication with each other; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The following disclosure provides many different embodiments or examples for implementing different features of the application. In order to simplify the disclosure of the present application, specific example components and arrangements are described below. Of course, they are merely examples and are not intended to limit the present application. Moreover, the present application may repeat reference numerals and/or letters in the various examples, such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, examples of various specific processes and materials are provided herein, but one of ordinary skill in the art may recognize applications of other processes and/or use of other materials.
Referring to fig. 1 and 2, a method and an apparatus 10 for network structure search are provided in the present application.
The method for searching the network structure comprises the following steps:
step S12: determining a search space of a neural network model to be subjected to network structure search, wherein the search space defines various operations on an operation layer between every two nodes in the neural network model;
step S14: sampling an operation in each operation layer of a search space according to a first network structure to obtain a target network structure;
step S16: performing low-bit specialization on the target network structure to obtain a second network structure;
step S18: determining a feedback quantity (val-acc) of the second network structure;
step S19: the first network structure is updated according to the feedback quantity.
The apparatus 10 for network structure search according to the embodiment of the present application includes a processor 104 and a memory 102, where the memory 102 stores one or more programs, and when the programs are executed by the processor 104, the processor 104 is configured to perform: determining a search space of a neural network model to be subjected to network structure search, wherein the search space defines various operations on an operation layer between every two nodes in the neural network model; sampling an operation in each operation layer of the search space according to a first network structure to obtain a target network structure; performing low-bit specialization on the target network structure to obtain a second network structure; determining a feedback quantity of the second network structure; and updating the first network structure according to the feedback quantity.
That is, the method for network structure search according to the embodiment of the present application can be implemented by the apparatus 10 for network structure search according to the embodiment of the present application.
The method and apparatus 10 for searching a network structure according to the embodiment of the present application perform low-bit specialization on a sampled target network structure to obtain a second network structure, and then determine a feedback amount of the second network structure to update the first network structure, so that a network structure more suitable for a low-bit network can be obtained, and thus a high-performance low-bit network is implemented, and the low-bit network can be better applied to a mobile terminal scenario.
Optionally, the apparatus 10 for network structure search may further include a communication interface 106 for outputting data processed by the apparatus 10 for network structure search and/or inputting data to be processed by the apparatus 10 for network structure search from an external device. For example, the processor 104 is used to control the communication interface 106 to input and/or output data.
Note that the number of processors 104 may be one. The number of processors 104 may also be multiple, such as 2, 3, 5, or other numbers. In the case where the number of the processors 104 is plural, the steps S12, S14, S16, S18, and S19 may be performed by different processors 104.
In addition, step S14, step S16, step S18, and step S19 are performed in a plurality of cycles. Thus, the second network structure with better effect can be finally obtained.
In the related art, low-bit networks mainly rely on model quantization, which comprises two parts: quantization of weights (Weight) and quantization of activation values (Activation). The related art has demonstrated the feasibility of 8-bit quantization, achieving accuracy with no loss compared with 32 bits. To further pursue computational efficiency, 4-bit, 2-bit, and 1-bit networks have also been proposed, but their accuracy drops greatly because too much information is lost.
Taking a 1-bit network as an example, the low-bit network in the related art directly quantizes the weights and activation-function outputs of a 32-bit model, trained on a server, to 1 or -1. Specifically, during training, the forward pass binarizes the weights and activation values, and the backward pass then updates the weights using the Straight-Through Estimator (STE) until convergence. Alternatively, a floating-point model is first trained to convergence and then binarized and fine-tuned (transfer learning). However, none of these techniques solves the problem well.
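As a rough illustration of the 1-bit case just described, the sketch below binarizes weights with a straight-through estimator in PyTorch; the class and function names are assumptions for illustration, not the implementation used in the related art.

```python
# Hypothetical sketch of 1-bit quantization with a straight-through estimator (STE).
import torch


class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Forward pass: binarize weights/activations to +1 / -1.
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: pass the gradient straight through, clipped outside [-1, 1].
        return grad_output * (w.abs() <= 1).float()


def binarize(w: torch.Tensor) -> torch.Tensor:
    return BinarizeSTE.apply(w)
```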
That is, the low-bit techniques in the related art all follow the same pattern: a model that performs well when designed on a computer or server is subjected to low-bit specialization, for example, a Residual Neural Network (ResNet) is directly processed into a low-bit network. In fact, a neural network is a very complex non-convex model, and it is uncertain whether the structure of a high-performance model at floating-point precision is the same as that of a high-performance model at low-bit precision.
In fact, the high-performance model structure for floating-point parameters should differ from the high-performance model structure for low-bit parameters. That is, directly applying low-bit quantization to models designed at floating-point precision does not solve the problem, which is why low-bit networks still lack a good solution at present.
Therefore, it is recognized that the network structure has a large influence on a low-bit network. In view of this, the method and apparatus 10 for network structure search according to the embodiment of the present application can obtain a network structure more suitable for a low-bit network, thereby implementing a high-performance low-bit network and enabling the low-bit network to be better applied to mobile-terminal scenarios. It can be understood that, because designing a network structure manually is a very complex process, the method and apparatus 10 for network structure search according to the embodiment of the present application use an automatic Network Architecture Search (NAS) algorithm to solve the problem of the low-bit network.
In particular, Network Architecture Search (NAS) is a technique for automatically designing neural network models using algorithms. The network structure search searches for the structure of the neural network model. In the embodiment of the present application, the neural network model to be subjected to the network structure search is a Convolutional Neural Network (CNN).
The problem to be solved by network structure search is to determine the operations between nodes in the neural network model. Different combinations of operations between nodes correspond to different network architectures. Further, operations between nodes in the neural network model may be understood as a feature layer in the neural network model. An operation between two nodes refers to an operation required for the transformation of the characteristic data on one of the nodes into the characteristic data on the other node. The operations referred to herein may be convolution operations, pooling operations, or other neural network operations such as fully-connected operations. Operations between two nodes can be considered to constitute an operational layer between the two nodes. Typically, there are multiple operations available for searching, i.e., there are multiple candidate operations, at the operational level between two nodes. The purpose of the network structure search is to determine an operation at each operational level.
For example, conv3 × 3, conv5 × 5, depthwise3 × 3, depthwise5 × 5, maxpool3 × 3, averagepool3 × 3, and the like are defined as the search space. That is, the operation of each layer of the target network structure is sampled from these six choices.
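A minimal sketch of such a six-operation search space is given below; it assumes PyTorch building blocks, and the function name and channel handling are illustrative assumptions rather than the patent's definition.

```python
import torch.nn as nn

# Hypothetical search space: the six candidate operations available at each operation layer.
def candidate_ops(channels: int) -> dict:
    return {
        "conv3x3": nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        "conv5x5": nn.Conv2d(channels, channels, kernel_size=5, padding=2),
        "depthwise3x3": nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
        "depthwise5x5": nn.Conv2d(channels, channels, kernel_size=5, padding=2, groups=channels),
        "maxpool3x3": nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        "averagepool3x3": nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
    }
```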
The idea of NAS is to obtain a network structure in a search space through a first network structure, then obtain an accuracy R according to the network structure, use the accuracy R as feedback to update the first network structure, and continue to optimize the first network structure to obtain another network structure, and so on until an optimal result is obtained.
The first network structure acts as a controller. In the example of fig. 3, the first network structure is constructed from a Recurrent Neural Network (RNN); of course, the first network structure may also be constructed from a Convolutional Neural Network (CNN) or a Long Short-Term Memory artificial neural network (LSTM). The specific manner in which the first network structure is constructed is not limited here. The following description takes a first network structure constructed from an LSTM as an example.
Referring to fig. 4 and 5, each operation layer of the search space corresponds to a time step of the long-short term memory artificial neural network, and for each time step, the cell of the long-short term memory artificial neural network outputs a hidden state. Step S14 includes:
step S142: mapping the hidden state into a feature vector, wherein the dimensionality of the feature vector is the same as the number of operations on each operation layer;
step S144: and sampling an operation in each operation layer according to the characteristic vectors to obtain the target network structure.
Correspondingly, the processor 104 is configured to map the hidden state into a feature vector, where the dimension of the feature vector is the same as the number of operations on each operation layer; and sampling one operation at each operation layer according to the characteristic vectors to obtain the target network structure.
In this way, sampling one operation at each operation layer of the search space to obtain the target network structure is achieved. For example, if a 20-layer network is searched in total, 20 time steps are required, without considering skip connections.
In the example of fig. 4, the solid arrows represent time steps, time 1 represents the first cell of the LSTM, time 2 represents the second cell of the LSTM, and so on. The box conv3 × 3 represents the operation of that layer in the model, and the circles represent the connection relationships between operation layers.
It can be understood that, since the network structure is computed in a sequential order, the logical relationship of this computation order is mapped onto the LSTM, namely the state of the LSTM cell at each time step, shown from left to right as the small squares in fig. 5.
Specifically, at time 1, the hidden state output by the cell is used to compute the convolution conv3 × 3; conv3 × 3 then serves as the input layer of the cell at time 2, and the hidden state output by the cell at time 1 is also used as an input of the cell at time 2, from which circle 1 is computed.
Similarly, circle 1 is used as the input of the cell at time 3, the hidden state output by the cell at time 2 is also used as an input at time 3, the convolution sep5 × 5 is obtained by computation, and so on.
Further, referring to fig. 6, step S144 includes:
step S1442: normalizing (softmax) the feature vector to obtain a probability for each operation at each operation layer;
step S1444: one operation is sampled at each operation layer according to the probability to obtain a target network structure.
Correspondingly, the processor 104 is configured to normalize the feature vector to obtain a probability for each operation at each operation layer, and to sample one operation at each operation layer according to the probabilities to obtain the target network structure.
In this way, sampling one operation at each operation layer according to the feature vector to obtain the network structure is achieved. Specifically, in the example shown in fig. 4, an encoding operation is performed on the hidden state output by the LSTM cell: the hidden state is mapped to a vector of dimension 6, the vector is converted into a probability distribution through a normalized exponential function (softmax), and sampling is performed according to the probability distribution to obtain the operation of the current layer. And so on, until a network structure is finally obtained. It can be understood that, in this example, there is only one input, and a total of six operations are available (3 × 3 convolution, 5 × 5 convolution, 3 × 3 depthwise separable convolution, 5 × 5 depthwise separable convolution, 3 × 3 max pooling, 3 × 3 average pooling); the vector dimension corresponds to the search space, and 6 means that 6 operations are selectable in the search space.
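A minimal sketch of this per-layer sampling is given below, assuming a small PyTorch LSTM controller; the class name, hidden size, and start-token handling are illustrative assumptions, not the exact controller of the embodiment.

```python
import torch
import torch.nn as nn

# Hypothetical LSTM controller: one time step per operation layer;
# hidden state -> 6-dim feature vector -> softmax -> sample one operation.
class Controller(nn.Module):
    def __init__(self, num_ops=6, num_layers=20, hidden_size=64):
        super().__init__()
        self.num_layers = num_layers
        self.embed = nn.Embedding(num_ops + 1, hidden_size)  # last index serves as a start token
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        self.encode = nn.Linear(hidden_size, num_ops)         # maps hidden state to the feature vector

    def sample(self):
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        start = torch.tensor([self.embed.num_embeddings - 1])
        inp = self.embed(start)                                # start token
        actions, log_probs = [], []
        for _ in range(self.num_layers):
            h, c = self.cell(inp, (h, c))
            probs = torch.softmax(self.encode(h), dim=-1)      # probability over the six operations
            dist = torch.distributions.Categorical(probs)
            a = dist.sample()
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))
            inp = self.embed(a)                                # sampled operation feeds the next time step
        return actions, torch.stack(log_probs).sum()
```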
In step S16, the method for performing low-bit specialization on the target network structure to obtain the second network structure may be the model quantization technique described above; the specific method of performing low-bit specialization on the target network structure is not limited herein.
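Since the quantization method is left open, the following sketch shows one possible low-bit specialization, assuming simple symmetric uniform quantization of the weights; the bit width and helper names are illustrative assumptions.

```python
import torch

# Hypothetical k-bit symmetric uniform quantization of a weight tensor.
def quantize_weights(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 levels on each side for 4 bits
    scale = w.abs().max() / qmax
    if scale == 0:
        return w
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


# Hypothetical low-bit specialization: quantize every parameter of the sampled network.
def lowbit_specialize(model: torch.nn.Module, bits: int = 4) -> torch.nn.Module:
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(quantize_weights(p, bits))
    return model
```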
In addition, step S19 is implemented by the following policy-gradient formula:
∇_θc J(θc) = (1/m) · Σ_{k=1..m} Σ_{t=1..T} ∇_θc log P(a_t | a_(t-1):1; θc) · R_k
wherein R_k is the k-th feedback quantity, θc denotes the parameters of the long-short term memory artificial neural network, a_t is the operation sampled at the t-th operation layer, and P(a_t | a_(t-1):1; θc) is the probability of sampling that operation; m is the total number of feedback quantities, and T is the number of hyperparameters predicted by the first network structure.
In this way, updating the first network structure according to the feedback quantity is achieved.
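A hedged sketch of this update in code, reusing the controller sketched above; the REINFORCE-style loss below is a minimal illustration under the assumption that each sampled structure contributes its summed log-probability and its feedback quantity.

```python
import torch

# Hypothetical REINFORCE-style update of the first network structure (the controller).
# log_prob_sums: list of m tensors, each being sum_t log P(a_t | a_(t-1):1; theta_c) for one sample.
# feedbacks: list of m feedback quantities R_k (e.g. validation accuracy).
def update_controller(optimizer, log_prob_sums, feedbacks):
    losses = [-lp * r for lp, r in zip(log_prob_sums, feedbacks)]
    loss = torch.stack(losses).mean()   # its gradient equals -(1/m) * sum_k sum_t grad log P * R_k
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```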
The network structure search method of the embodiment of the present application may be a NAS-based network structure search method, or an ENAS-like network structure search method based on Efficient Network Architecture Search (ENAS).
The ENAS may be an efficient network structure search method built on a Reinforcement Learning (RL)-based NAS framework, or an efficient network structure search method based on an evolutionary algorithm. It can be understood that, because the efficiency of NAS is low, in the present embodiment ENAS improves the efficiency of the network structure search through weight sharing and the like.
In the embodiments shown in fig. 10 to 14 (hereinafter referred to as the first embodiment), the method and apparatus 10 for network structure search are based on NAS.
In the embodiments shown in fig. 15 to 19 (hereinafter referred to as the second embodiment), the method and apparatus 10 for network structure search are based on ENAS-like search.
The method and apparatus 10 for network structure search according to the first and second embodiments will be described below.
The first implementation mode comprises the following steps:
in the present embodiment, the method and apparatus 10 for network structure search are based on NAS.
Referring to fig. 7, in the present embodiment, step S18 includes:
step S181: the second network structure is trained to converge to determine the amount of feedback.
Correspondingly, the processor 104 is configured to train the second network structure to converge to determine the amount of feedback.
In this way, determining the feedback quantity of the second network structure is achieved.
In addition, the training samples (train set) may be divided in advance into a training set (train) and a test set (valid). It can be understood that in a conventional CNN, the sample data is generally divided into training samples and verification samples, where the training samples are used for training the network structure and the verification samples are used for verifying the network structure.
In the present embodiment, when searching for the second network structure, the training set is used to train the parameters of the searched second network structure, i.e., the parameters involved in computations such as conv3 × 3 and sep5 × 5, for example the weights and biases. After the second network structure is searched out, it can be evaluated on the test set to obtain a feedback quantity, which updates the first network structure (LSTM) according to the aforementioned formula. Note that the LSTM is not trained directly with the test set.
That is, the training set is used to train the parameters of the searched second network structure, the test set is used to update the parameters of the LSTM, and the verification samples are used to verify whether the searched second network structure performs well.
In one example, the number of training samples is 10; they are divided into 8 training-set samples and 2 test-set samples, where the 8 training-set samples are used to train the searched structure and the 2 test-set samples are used to update the LSTM.
Further, referring to fig. 8, step S181 includes:
step S182: training the second network structure to converge by using the training set;
step S184: the converged second network structure is predicted (predicted) using the test set to determine the amount of feedback.
Correspondingly, the processor 104 is configured to train the second network structure to converge using the training set; and means for predicting the converged second network structure using the test set to determine the amount of feedback.
In this manner, training the second network structure to converge to determine the amount of feedback is achieved.
In one example, an operation is sampled at each operation layer of the search space to obtain a target network structure; the target network structure is then converted into a low-bit second network structure using a model quantization technique; the second network structure is then trained directly on the training set until convergence, and the converged second network structure is evaluated on the test set to determine the feedback quantity. Finally, the feedback quantity is substituted into the policy-gradient formula given above to update the first network structure.
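The first embodiment can be summarized in the assumed outer loop below; build_network, train_to_convergence, and evaluate are illustrative placeholders, while lowbit_specialize and update_controller refer to the sketches above.

```python
# Hypothetical NAS outer loop for the first embodiment.
def nas_search(controller, ctrl_optimizer, train_set, test_set, rounds=100):
    for _ in range(rounds):
        actions, log_prob = controller.sample()   # sample a target network structure
        net = build_network(actions)              # placeholder: instantiate the sampled structure
        net = lowbit_specialize(net)              # low-bit specialization -> second network structure
        train_to_convergence(net, train_set)      # placeholder: train to convergence on the training set
        reward = evaluate(net, test_set)          # placeholder: feedback quantity (e.g. val-acc)
        update_controller(ctrl_optimizer, [log_prob], [reward])
    return controller
```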
the second embodiment:
in this embodiment, the method and apparatus 10 for network structure search is based on ENAS-like.
Referring to fig. 9, step S18 includes:
step S183: the feedback quantity is determined from a second network structure that is not trained to converge.
Correspondingly, the processor 104 is configured to determine the feedback amount according to a second network structure, which is not trained to converge.
In this way, determining the feedback quantity of the second network structure is achieved. It will be appreciated that in the first embodiment, each time a second network structure is obtained, it is trained to convergence to determine the feedback quantity, which is time-consuming and inefficient. In this embodiment, when the feedback quantity is determined from the second network structure, the second network structure is not trained to convergence, which reduces the time spent on training the second network structure to convergence and thereby improves efficiency.
In addition, the training samples (train set) may be divided in advance into a training set (train) and a test set (valid). It can be understood that in a conventional CNN, the sample data is generally divided into training samples and verification samples, where the training samples are used for training the network structure and the verification samples are used for verifying the network structure.
In the present embodiment, when searching for the second network structure, the training set is used to train the parameters of the searched second network structure, i.e., the parameters involved in computations such as conv3 × 3 and sep5 × 5, for example the weights and biases. After the second network structure is searched out, it can be evaluated on the test set to obtain a feedback quantity, which updates the first network structure (LSTM) according to the aforementioned formula. Note that the LSTM is not trained directly with the test set.
That is, the training set is used to train the parameters of the searched second network structure, the test set is used to update the parameters of the LSTM, and the verification samples are used to verify whether the searched second network structure performs well.
In one example, the number of training samples is 10; they are divided into 8 training-set samples and 2 test-set samples, where the 8 training-set samples are used to train the searched structure and the 2 test-set samples are used to update the LSTM.
Further, referring to fig. 10 and 11, step S14 includes:
step S146: training a general graph (a whole graph) of a search space by using a training set, wherein the general graph is formed by connecting operations;
step S148: sampling the trained general graph according to the first network structure to obtain a target network structure;
step S18 includes:
step S185: the second network structure is predicted (predicted) using the test set to determine an amount of feedback.
Correspondingly, the processor 104 is configured to train a general graph of the search space using the training set, the general graph being formed by connecting the operations; to sample the trained general graph according to the first network structure to obtain the target network structure; and to predict the second network structure using the test set to determine the feedback quantity.
The general graph shown in fig. 11 is formed by the operation connections between the nodes. It can be understood that the connection pattern of the optimal structure marked by the bold edges in fig. 11 is a subgraph of the general graph.
Note that steps S146 and S148, S16, S185, and S19 may be iterated until a preset total number of iterations is completed. This allows a better second network structure to be obtained.
In the present embodiment, the total number of iterations is 310. It is understood that in other embodiments, the total number of iterations may have a value of 100, 200, or other values.
In each iteration, step S146 may be repeated, each time using a batch of data (batch) of the training set until the data of the training set is used up, i.e. one iteration (epoch) is completed. The LSTM is then updated.
In updating the LSTM, step S148, step S16, step S185, and step S19 may be cyclically performed until the preset number of times is completed.
In the present embodiment, the preset number of times is 50. It is understood that in other examples the preset number of times may be 10, 20, 30, or another value; the specific value of the preset number of times is not limited here. It can be understood that setting the preset number of times to 50 can reduce the randomness that sampling introduces into the optimization.
At each time of the loop of steps S148, S16, S185, and S19, a preset number of feedback amounts may be determined, thereby updating the LSTM with the preset number of feedback amounts. Further, the LSTM may be updated in a policy gradient optimized manner. The manner in which the LSTM is updated is not limited herein.
In the present embodiment, the preset number is 20. It is understood that in other examples, the predetermined number may be 10, 15, 25 or other values. The specific number of the preset number is not limited herein.
Referring to fig. 12, step S146 includes:
step S1462: sampling an operation at each operation layer of a search space to obtain a subgraph of a general graph;
step S1464: the subgraph is trained using a batch of data (batch) of a training set.
Correspondingly, the processor 104 is configured to sample an operation at each operation level of the search space to obtain a subgraph of the overall graph; and training the subgraph using a batch of data of the training set.
In this way, training of the general graph is realized. In this embodiment, ENAS adopts a weight-sharing strategy: after a network structure is sampled each time, it is not trained to convergence; instead, the subgraph is trained with one batch of data of the training set. Note that convergence of the general graph does not correspond to convergence of an individual network structure.
It can be understood that, in this embodiment, ENAS based on the weight-sharing strategy shares the shareable parameters each time a network structure is searched, which saves time and thereby improves the efficiency of the network structure search. For example, in the example of fig. 16, suppose nodes 1, 3, and 6 have been searched and the resulting network structure has been trained; if nodes 1, 2, 3, and 6 are searched this time, the relevant parameters of the network structure trained when nodes 1, 3, and 6 were searched can be applied to the training of the network structure searched this time. In this way, efficiency is improved through weight sharing.
In one example, the search space has 5 layers with 4 optional operations per layer, corresponding to a 4 × 5 graph. The network structure search requires one operation per layer, which is equivalent to path optimization on the graph. Initially, an operation is randomly sampled at each layer, the sampled operations are connected to obtain a subgraph, and the subgraph is trained on one batch of data of the training set; then an operation is randomly sampled at each layer again to obtain another subgraph, which is trained on another batch of data of the training set; random sampling then continues to yet another subgraph, which is trained on another batch of data of the training set, and so on, until the data of the training set is used up, i.e., one iteration (epoch) is completed. The first network structure is then trained.
The overall graph is then trained in the same manner to complete the second epoch, and then the first network structure is trained.
Next, the overall graph is trained in the same manner to complete the third epoch, and then the first network structure is trained again; this continues until the total of 310 iterations is completed, so that the overall graph and the first network structure are optimized alternately. That is, the training of the overall graph and the updating of the first network structure are performed over multiple iterations. In this way, a second network structure with a better effect can finally be obtained. It can be appreciated that after 310 iterations, the overall graph converges and the first network structure also converges.
Specifically, in each iteration, the first network structure may be updated 50 times, that is, steps S148, S16, S185, and S19 are looped 50 times. In other words, in each iteration, the policy-gradient formula given above is applied 50 times.
further, when the first network structure is updated circularly each time, a preset number of 20 target network structures can be sampled, and 20 second network structures are obtained after low-ratio specialization, so that 20 feedback quantities are determined. Taking 20 feedback quantities as RkSubstituting into the above formula. That is, in the above formula, m has a value of 20.
The present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer executes the method of any one of the above embodiments.
The present application also provides a computer program product containing instructions, and when the instructions are executed by a computer, the instructions cause the computer to execute the method of any one of the above embodiments.
According to the computer storage medium and the computer program product of the embodiment of the application, the sampled target network structure is subjected to low-bit specialization to obtain the second network structure, and then the feedback quantity of the second network structure is determined to update the first network structure, so that the network structure more suitable for the low-bit network can be obtained, the high-performance low-bit network is realized, and the low-bit network can be better applied to a mobile terminal scene.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware or any other combination. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (22)
1. A method of network structure search, comprising:
determining a search space of a neural network model to be subjected to network structure search, wherein the search space defines various operations on an operation layer between every two nodes in the neural network model;
sampling one of the operations at each of the operation levels of the search space according to a first network structure to obtain a target network structure;
performing low-bit specialization on the target network structure to obtain a second network structure;
determining a feedback quantity of the second network structure;
and updating the first network structure according to the feedback quantity.
2. The method of claim 1, wherein determining the feedback amount for the second network structure comprises:
training the second network structure to converge to determine the amount of feedback.
3. The method of network structure search of claim 2, wherein training the second network structure to converge to determine the feedback quantity comprises:
training the second network structure to converge using a training set;
predicting the converged second network structure using a test set to determine the feedback quantity.
4. The method of claim 1, wherein determining the feedback amount for the second network structure comprises:
determining the feedback amount from the second network structure, the second network structure not being trained to converge.
5. The method of network structure search according to claim 4, wherein sampling one of the operations at each of the operation levels of the search space according to a first network structure to obtain a target network structure comprises:
training a general diagram of the search space by using a training set, wherein the general diagram is formed by connecting the operations;
sampling the trained general graph according to a first network structure to obtain the target network structure;
determining the feedback quantity according to the second network structure, comprising:
predicting the second network structure using a test set to determine the amount of feedback.
6. The method of network structure search of claim 5, wherein training the general graph of the search space with a training set comprises:
sampling one of the operations at each of the operation levels of the search space to obtain a subgraph of the overall graph;
and training the subgraph by using a batch of data of the training set.
7. The method of claim 1, wherein the first network structure is constructed according to a long-short term memory artificial neural network, and the first network structure is updated according to the feedback quantity, and is implemented by the following formula:
∇_θc J(θc) = (1/m) · Σ_{k=1..m} Σ_{t=1..T} ∇_θc log P(a_t | a_(t-1):1; θc) · R_k
wherein R_k is the k-th feedback quantity, θc is a parameter of the long-short term memory artificial neural network, a_t is the operation sampled at the t-th operation layer, and P(a_t | a_(t-1):1; θc) is the probability of sampling the operation; m is the total number of the feedback quantities, and T is the number of hyperparameters predicted by the first network structure.
8. The method of claim 1, wherein the first network structure is constructed according to a long-short term memory artificial neural network, each of the operation layers of the search space corresponds to a time step of the long-short term memory artificial neural network, and for each of the time steps, the cells of the long-short term memory artificial neural network output a hidden state, and each of the operation layers of the search space samples one of the operations according to the first network structure to obtain the target network structure, and the method comprises:
mapping the hidden state into a feature vector, wherein the dimension of the feature vector is the same as the operation quantity on each operation layer;
sampling one of the operations at each of the operation layers according to the feature vectors to obtain the target network structure.
9. The method of claim 8, wherein sampling one of the operations at each of the operation layers according to the feature vector to obtain the target network structure comprises:
normalizing the feature vectors to obtain a probability of each of the operations of each of the operation layers;
sampling one of the operations at each of the operational layers based on the probabilities to obtain the target network structure.
10. The method of network structure search of claim 1, wherein the first network structure is constructed from a convolutional neural network or a recurrent neural network.
11. An apparatus for network structure search, comprising a processor and a memory, the memory storing one or more programs that, when executed by the processor, cause the processor to perform: determining a search space of a neural network model to be subjected to network structure search, wherein the search space defines various operations on an operation layer between every two nodes in the neural network model; sampling one of the operations at each of the operation layers of the search space according to a first network structure to obtain a target network structure; performing low-bit specialization on the target network structure to obtain a second network structure; determining a feedback quantity of the second network structure; and updating the first network structure according to the feedback quantity.
12. The apparatus of claim 11, wherein the processor is configured to train the second network structure to converge to determine the feedback amount.
13. The apparatus of claim 12, wherein the processor is configured to train the second network structure to converge using a training set; and means for predicting the converged second network structure using a test set to determine the amount of feedback.
14. The apparatus of claim 11, wherein the processor is configured to determine the feedback amount according to the second network structure, and wherein the second network structure is not trained to converge.
15. The apparatus of claim 14, wherein the processor is configured to train a general graph of the search space using a training set, the general graph being formed by connecting the operations; to sample the trained general graph according to the first network structure to obtain the target network structure; and to predict the second network structure using a test set to determine the feedback quantity.
16. The apparatus of claim 15, wherein the processor is configured to sample one of the operations at each of the operation levels of the search space to obtain a subgraph of the overall graph; and training the subgraph using a batch of data of the training set.
17. The apparatus of claim 11, wherein the first network structure is constructed according to a long-short term memory artificial neural network, and the processor is configured to update the first network structure according to the feedback quantity, and is implemented by the following formula:
∇_θc J(θc) = (1/m) · Σ_{k=1..m} Σ_{t=1..T} ∇_θc log P(a_t | a_(t-1):1; θc) · R_k
wherein R_k is the k-th feedback quantity, θc is a parameter of the long-short term memory artificial neural network, a_t is the operation sampled at the t-th operation layer, and P(a_t | a_(t-1):1; θc) is the probability of sampling the operation; m is the total number of the feedback quantities, and T is the number of hyperparameters predicted by the first network structure.
18. The apparatus according to claim 11, wherein the first network structure is constructed according to a long-short term memory artificial neural network, each of the operation layers of the search space corresponds to a time step of the long-short term memory artificial neural network, and for each of the time steps, the cells of the long-short term memory artificial neural network output a hidden state, and the processor is configured to map the hidden states into feature vectors, the dimensions of the feature vectors being the same as the number of operations on each of the operation layers; and sampling one of the operations at each of the operation levels based on the feature vectors to obtain the target network structure.
19. The apparatus of claim 18, wherein the processor is configured to normalize the feature vectors to obtain a probability of each of the operations for each of the operation layers; and for sampling one of said operations at each of said operational levels in accordance with said probabilities to obtain said target network structure.
20. The apparatus of claim 11, wherein the first network structure is constructed from a convolutional neural network or a recurrent neural network.
21. A computer storage medium, having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method of any one of claims 1 to 10.
22. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 10.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/089698 WO2020237689A1 (en) | 2019-05-31 | 2019-05-31 | Network structure search method and apparatus, computer storage medium, and computer program product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111656365A true CN111656365A (en) | 2020-09-11 |
Family
ID=72351852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201980009246.6A Pending CN111656365A (en) | 2019-05-31 | 2019-05-31 | Method and apparatus for network structure search, computer storage medium, and computer program product |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111656365A (en) |
WO (1) | WO2020237689A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990436A (en) * | 2021-03-23 | 2021-06-18 | 联想(北京)有限公司 | Neural network architecture selection method and device and electronic equipment |
WO2023272972A1 (en) * | 2021-06-30 | 2023-01-05 | 北京市商汤科技开发有限公司 | Neural network search method and apparatus, and device, storage medium and program product |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560985B (en) * | 2020-12-25 | 2024-01-12 | 北京百度网讯科技有限公司 | Neural network searching method and device and electronic equipment |
CN112949832B (en) * | 2021-03-25 | 2024-04-16 | 鼎富智能科技有限公司 | Network structure searching method and device, electronic equipment and storage medium |
CN114462484A (en) * | 2021-12-27 | 2022-05-10 | 东软睿驰汽车技术(沈阳)有限公司 | Network architecture searching method, device, equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242098A (en) * | 2018-07-25 | 2019-01-18 | 深圳先进技术研究院 | Limit neural network structure searching method and Related product under cost |
CN109190754A (en) * | 2018-08-30 | 2019-01-11 | 北京地平线机器人技术研发有限公司 | Quantitative model generation method, device and electronic equipment |
-
2019
- 2019-05-31 WO PCT/CN2019/089698 patent/WO2020237689A1/en active Application Filing
- 2019-05-31 CN CN201980009246.6A patent/CN111656365A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990436A (en) * | 2021-03-23 | 2021-06-18 | 联想(北京)有限公司 | Neural network architecture selection method and device and electronic equipment |
WO2023272972A1 (en) * | 2021-06-30 | 2023-01-05 | 北京市商汤科技开发有限公司 | Neural network search method and apparatus, and device, storage medium and program product |
Also Published As
Publication number | Publication date |
---|---|
WO2020237689A1 (en) | 2020-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111656365A (en) | Method and apparatus for network structure search, computer storage medium, and computer program product | |
CN108154237B (en) | Data processing system and method | |
CN112491818B (en) | Power grid transmission line defense method based on multi-agent deep reinforcement learning | |
US20220147877A1 (en) | System and method for automatic building of learning machines using learning machines | |
CN112236782A (en) | End-to-end learning in a communication system | |
CN109361404A (en) | A kind of LDPC decoding system and interpretation method based on semi-supervised deep learning network | |
CN115062779A (en) | Event prediction method and device based on dynamic knowledge graph | |
CN116938323B (en) | Satellite transponder resource allocation method based on reinforcement learning | |
CN116402138A (en) | Time sequence knowledge graph reasoning method and system for multi-granularity historical aggregation | |
Javaheripi et al. | Swann: Small-world architecture for fast convergence of neural networks | |
CN111582456B (en) | Method, apparatus, device and medium for generating network model information | |
CN112513837A (en) | Network structure searching method and device | |
CN112106077A (en) | Method, apparatus, storage medium, and computer program product for network structure search | |
CN111684471A (en) | Method and apparatus for network structure search, computer storage medium, and computer program product | |
Liu et al. | FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity | |
Fang et al. | Distributed online adaptive subgradient optimization with dynamic bound of learning rate over time‐varying networks | |
CN116824281B (en) | Privacy-protected image classification method and device | |
CN114389990B (en) | Shortest blocking method and device based on deep reinforcement learning | |
CN111684472A (en) | Method and apparatus for network structure search, computer storage medium, and computer program product | |
CN117336187B (en) | Unmanned aerial vehicle communication network inference method based on inter-edge association | |
CN113868395B (en) | Multi-round dialogue generation type model establishment method, system, electronic equipment and medium | |
CN116484201B (en) | New energy power grid load prediction method and device and electronic equipment | |
CN113052323B (en) | Model training method and device based on federal learning and electronic equipment | |
US20230403204A1 (en) | Method, electronic device, and computer program product for information-centric networking | |
US20240046068A1 (en) | Information processing device for improving quality of generator of generative adversarial network (gan) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20200911 |
|
WD01 | Invention patent application deemed withdrawn after publication |