WO2021146977A1 - Network structure search method and device - Google Patents

Network structure search method and device

Info

Publication number
WO2021146977A1
Authority
WO
WIPO (PCT)
Prior art keywords
network structure
feedback
parameters
training
general map
Prior art date
Application number
PCT/CN2020/073674
Other languages
English (en)
French (fr)
Inventor
蒋阳
李健兴
胡湛
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/073674 priority Critical patent/WO2021146977A1/zh
Priority to CN202080004032.2A priority patent/CN112513837A/zh
Publication of WO2021146977A1 publication Critical patent/WO2021146977A1/zh


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/901 - Indexing; Data structures therefor; Storage structures
    • G06F16/9024 - Graphs; Linked lists
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/903 - Querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • This application relates to the field of machine learning, and in particular to a method and device for searching a network structure.
  • This application provides a network structure search method and device, which can improve the efficiency of network structure search.
  • A network structure search method is provided, which includes: sampling a first overall graph through a first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs at the same time, so as to train the first overall graph, wherein the first overall graph is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first overall graph includes at least one of the multiple operations and the connections between operations; training one subgraph among the multiple subgraphs includes: training the subgraph by using a batch of training data in a training set; updating the parameters of the first overall graph according to the parameters obtained by training the multiple subgraphs, so as to generate a second overall graph; determining the feedback amount of the first network structure according to the second overall graph; and updating the first network structure according to the feedback amount of the first network structure.
  • a network structure search device which is used to execute the foregoing first aspect or any possible implementation of the first aspect.
  • the device includes a unit for executing the above-mentioned first aspect or any possible implementation of the first aspect.
  • a network structure search device including: a storage unit and a processor, where the storage unit is used to store instructions and the processor is used to execute the instructions stored in the storage unit; when the processor executes the instructions stored in the storage unit, the execution causes the processor to execute the method in the first aspect or any possible implementation of the first aspect.
  • a computer-readable medium for storing a computer program, and the computer program includes instructions for executing the first aspect or any possible implementation of the first aspect.
  • A computer program product including instructions is provided. When a computer runs the instructions of the computer program product, the computer executes the network structure search method in the first aspect or any possible implementation of the first aspect. Specifically, the computer program product can run on the network structure search device of the second aspect described above.
  • Fig. 1 is a schematic flowchart of a method for searching a network structure.
  • Fig. 2 is a schematic diagram of the principle of a method for searching a network structure in the related art.
  • Fig. 3 is a schematic diagram of a general diagram of a method for searching a network structure according to an embodiment of the present application.
  • Fig. 4 is a schematic flowchart of a network structure search method according to an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
  • Fig. 6 is a schematic diagram of single-node training of the overall graph in a network structure search method.
  • Fig. 7 is a schematic diagram of parallel training of the overall graph in the network structure search method of an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a network structure search method according to still another embodiment of the present application.
  • FIG. 9 is a schematic diagram of the principle of a method for searching a network structure according to an embodiment of the present application.
  • Fig. 10 is a schematic diagram of a single node determining the amount of feedback in the method for searching the network structure.
  • FIG. 11 is a schematic diagram of determining the feedback amount in parallel in the network structure search method of the embodiment of the present application.
  • Fig. 12 is a schematic block diagram of a network structure search device according to an embodiment of the present application.
  • This application relates to a network structure search parallelization algorithm in Auto Machine Learning (AutoML) technology. It can be applied to model optimization including but not limited to PC, mobile and other scenarios.
  • network structure search is a technology that uses algorithms to automatically design neural network models.
  • the network structure search is to search out the structure of the neural network model.
  • the neural network model to be searched for the network structure may be a convolutional neural network (Convolutional Neural Networks, CNN).
  • Figure 1 shows a schematic flowchart of a method 1000 for network structure search.
  • As shown in Fig. 1, the method 1000 includes: S100, determining a search space of the neural network model whose network structure is to be searched, where the search space defines multiple possible operations on the operation layer between every two nodes in the neural network model; S200, training the whole graph (overall graph) of the search space according to a first network structure, the overall graph being composed of operations; S300, determining a feedback amount (ACC) of the first network structure, and updating the first network structure according to the feedback amount of the first network structure.
  • the problem to be solved by the network structure search is to determine the operations between nodes in the neural network model.
  • Different combinations of operations between nodes correspond to different network structures.
  • the nodes in the neural network model can be understood as the characteristic layer in the neural network model.
  • the operation between two nodes refers to the operation required to transform the feature data on one node into the feature data on the other node.
  • The operations mentioned in this application may be convolution operations, pooling operations, fully connected operations, or other neural network operations. The operations between two nodes can be considered to constitute the operation layer between these two nodes. Usually, the operation layer between two nodes has multiple operations available for searching, i.e., multiple candidate operations.
  • the purpose of the network structure search is to determine an operation on each operation layer.
  • For example, conv3*3, conv5*5, depthwise3*3, depthwise5*5, maxpool3*3 and average pool3*3 may be defined as the search space; that is, each layer operation of the network structure is sampled from these six choices included in the search space.
  • After the NAS has established the search space, it usually uses the first network structure to sample a second network structure (i.e., network structure A) in the search space, then trains the second network structure to convergence to determine the feedback amount, and finally uses the feedback amount to update the first network structure.
  • Specifically, the idea of NAS is to obtain a network structure in the search space through a first network structure, obtain an accuracy rate R according to that network structure, and use the accuracy rate R as feedback to update the first network structure; the first network structure is then further optimized to obtain another network structure, and this is repeated until the best result is obtained.
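  • As a rough illustration (not part of the patent), the following Python sketch shows this sample-evaluate-update loop with a toy controller; the Controller class, its scoring scheme and the train_and_evaluate stub are assumptions made purely for demonstration.

```python
import random

# Hypothetical search space: candidate operations per layer (names are illustrative).
SEARCH_SPACE = ["conv3x3", "conv5x5", "sep3x3", "sep5x5", "maxpool3x3", "avgpool3x3"]
NUM_LAYERS = 5


class Controller:
    """Toy stand-in for the first network structure (controller); a real one would be an RNN/LSTM."""

    def __init__(self):
        # One preference score per (layer, operation).
        self.scores = [[0.0] * len(SEARCH_SPACE) for _ in range(NUM_LAYERS)]

    def sample(self):
        # Pick one operation index per layer (greedy with random tie-breaking, for brevity).
        return [max(range(len(SEARCH_SPACE)),
                    key=lambda i: self.scores[layer][i] + random.random())
                for layer in range(NUM_LAYERS)]

    def update(self, arch, reward):
        # Nudge the scores of the sampled operations toward the reward (placeholder update rule).
        for layer, op in enumerate(arch):
            self.scores[layer][op] += 0.1 * reward


def train_and_evaluate(arch):
    # Placeholder for "train the sampled structure and measure its accuracy R".
    return random.random()


controller = Controller()
for step in range(10):                    # repeat: sample -> evaluate -> feed accuracy R back
    arch = controller.sample()
    reward = train_and_evaluate(arch)     # accuracy R used as the feedback amount
    controller.update(arch, reward)
```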
  • the first network structure in the embodiment of the present application may be used as a controller.
  • For example, the first network structure in the embodiment of the present application may be the network structure of the controller in any training stage; for example, it may be the network structure of a controller that has never been updated, or the network structure of a controller that has been updated several times.
  • In the example of Fig. 2, the first network structure is constructed by a Recurrent Neural Network (RNN); for example, the first network structure may be specifically constructed by a Long-Short Term Memory network (LSTM). Alternatively, the first network structure may also be constructed by a Convolutional Neural Network (CNN). The specific way of constructing the first network structure is not limited here.
  • Specifically, in the efficient network structure search based on weight sharing, each operation layer in the network structure can be sampled in the search space, and according to the possible sampling results of each layer, a corresponding overall graph including the multiple sampling results can be connected, where the final optimal structure searched out is one of the subgraphs of the overall graph.
  • In the example in Fig. 3, the overall graph is connected by operations between nodes, and these operations belong to the search space. The connection pattern drawn with bold edges in Fig. 3, i.e., the optimal structure, is one subgraph of the overall graph.
  • ENAS adopts a weight-sharing strategy. After a network structure is sampled each time (for example, by the first network structure), it is no longer trained directly to convergence; instead, the overall graph is trained first, i.e., S200 in the above method 1000 is executed, training one batch at a time, and after multiple iterations the overall graph can finally converge. Please note that convergence of the overall graph is not equivalent to convergence of a network structure.
  • the parameters of the overall graph can be fixed, and then the first network structure is trained, that is, S300 in method 1000 is executed.
  • the overall graph may be sampled to obtain the second network structure, and the second network structure may be trained to obtain the amount of feedback, thereby updating the first network structure.
  • The efficient network structure search based on weight sharing can save time and improve the efficiency of network structure search, because the parameters that can be shared are shared each time a network structure is searched. For example, in the example in Fig. 3, if node 1, node 2, node 3 and node 6 are searched this time, after node 1, node 3 and node 6 were searched previously and the network structure searched at that time was trained, then the relevant parameters of the network structure trained when node 1, node 3 and node 6 were searched can be applied to the training of the network structure searched this time. In this way, efficiency can be improved through weight sharing, and ENAS can increase the efficiency of NAS by more than 1000 times.
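  • A minimal sketch of why weight sharing saves time is given below: one shared parameter store is kept for the whole overall graph, and every sampled subgraph reads and updates only the entries for the edges and operations it uses. The dictionary-based store and the edge naming are illustrative assumptions, not the patent's implementation.

```python
# Shared parameter store for the overall graph: one entry per (edge, operation) pair.
shared_params = {}


def get_params(edge, op):
    # Reuse parameters already trained by earlier subgraphs that used the same edge/operation.
    key = (edge, op)
    if key not in shared_params:
        shared_params[key] = {"weight": 0.0, "updates": 0}   # toy initialisation
    return shared_params[key]


def train_subgraph(path):
    """path: list of (edge, operation) pairs describing one sampled subgraph."""
    for edge, op in path:
        p = get_params(edge, op)
        p["weight"] += 0.01          # placeholder for one gradient step on a batch of data
        p["updates"] += 1


# Previous search: nodes 1 -> 3 -> 6; current search: nodes 1 -> 2 -> 3 -> 6.
train_subgraph([((1, 3), "conv3x3"), ((3, 6), "sep5x5")])
train_subgraph([((1, 2), "conv3x3"), ((2, 3), "maxpool3x3"), ((3, 6), "sep5x5")])
# The (3, 6) entry has been trained twice: the second architecture reused the first one's parameters.
print(shared_params[((3, 6), "sep5x5")]["updates"])   # -> 2
```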
  • FIG. 4 shows another schematic flowchart of a method 1000 used in network structure search according to an embodiment of the present application.
  • As shown in Fig. 4, S200 in the above method 1000 may further include: S210, sampling the first overall graph through the first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs at the same time, so as to train the first overall graph, wherein the first overall graph is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first overall graph includes at least one of the multiple operations and the connections between operations; training one subgraph among the multiple subgraphs includes: training the subgraph by using a batch of training data in the training set; S220, updating the parameters of the first overall graph according to the parameters obtained by training the multiple subgraphs, so as to generate a second overall graph.
  • Correspondingly, S300 in the method 1000 may further include: S310, determining the feedback amount of the first network structure according to the second overall graph, and updating the first network structure according to the feedback amount of the first network structure.
  • the neural network structure obtained from the network structure search can generally be trained and verified through sample data.
  • the sample data includes verification samples and training samples.
  • the verification samples can be used to verify whether the network structure is good or not, and the training samples can be used to train the network structure searched by the network structure search method.
  • For example, the training samples can be used in S200 and S300 of the method 1000 in the embodiments of this application.
  • the detailed introduction is mainly focused on training samples, while the verification samples remain unchanged, which can be used to verify whether the network structure is good.
  • the training sample can also be divided into a training set (train) and a test set (valid).
  • The training set and the test set can be divided by the user, or can also be divided by the network structure search device; the embodiment of the present application is not limited thereto.
  • the first network structure is constructed by LSTM.
  • When searching the network structure, the training set is used to train the parameters of the searched structure, such as the parameters of the structure computed through conv3*3 and sep5*5; and, in order to observe the generalization ability of the searched network structure, the parameters of the LSTM are trained on the test set. That is to say, the training set is used to train the parameters of the searched structure, and the test set is used to train the parameters of the LSTM.
  • the verification sample is used to verify whether the searched network structure is good or not after training.
  • For example, the number of training samples is 10, and the training samples are divided into a training set of 8 samples and a test set of 2 samples; the training set of 8 samples is used to train the searched network structure, and the test set of 2 samples is used to train the LSTM.
  • In S210, the training of multiple subgraphs is performed at the same time, and then the training of the first network structure in S310 is performed.
  • For ease of explanation, the training of any one of the multiple subgraphs included in S210, including the sampling process of that subgraph, is taken as an example for detailed description.
  • For any one subgraph, S210 may further include: S211, a sampling step, i.e., sampling the first overall graph according to the first network structure to generate one subgraph of the first overall graph; S212, a training step, i.e., training the subgraph by using a batch of training data of the training set (train).
  • ENAS adopts a weight sharing strategy, and each time a sub-graph is sampled, a batch of data in the training set is used to train the sub-graph.
  • the operation layer of the first network structure is 5 layers, and the search space corresponding to each layer has 4 optional operations, which is equivalent to a 4*5 graph.
  • Network structure search needs to select an operation at each layer, which is equivalent to path optimization on the graph.
  • In the prior art, initially, the first network structure samples one operation at each layer and then connects the sampled operations to obtain a subgraph, and this subgraph is trained on a batch of data in the training set.
  • The training result includes updated parameters; the updated parameters are used to update the parameters of the current first overall graph to generate a second overall graph, and the parameters of the second overall graph are then fixed. Next, one operation is sampled at each layer to obtain a subgraph of the second overall graph, this subgraph is trained on another batch of data in the training set, and the updated parameters included in the training result are used to update the parameters of the current second overall graph to generate a third overall graph, and so on; after repeating the cycle many times, for example after training until the overall graph converges, S300 continues to be executed to train the first network structure.
  • For example, as shown in Fig. 6, for any one subgraph, a subgraph of the current overall graph is obtained according to the first network structure, a batch of training data in the training set is used for training, and the parameters of the current overall graph are updated according to the parameters obtained from the training, so as to generate an overall graph with new parameters. That is to say, the parameter G_t (for example, the gradient) of the overall graph at time t is calculated based on the overall graph G_{t-1} that has already been updated at time t-1. For example, this can be expressed by formula (1):
G_{t-1} = G_{t-2} - α·ΔG_{t-2}
G_t = G_{t-1} - α·ΔG_{t-1}    (1)
...
where α is a coefficient.
  • Considering that this single-node calculation method performs the training of only one subgraph at a time and is therefore inefficient, the embodiment of the present application proposes a parallel calculation method, i.e., in S210, the overall graph constructed from the search space is trained by performing the training of multiple subgraphs at the same time.
  • Specifically, multiple subgraphs of the overall graph are determined by sampling, and the multiple subgraphs are trained at the same time.
  • After the parallel training of the multiple subgraphs is completed, the parameters of the current first overall graph can be updated according to the parameters obtained from the training to generate a second overall graph; the parameters of the second overall graph are then fixed, the training of multiple subgraphs is performed in parallel again, and the parameters of the second overall graph are updated with the parameters obtained by training to generate a third overall graph, and so on; after repeating the cycle many times, for example until the overall graph of the first network structure converges, S300 continues to be executed, for example S310 is executed, to train the first network structure.
  • For example, suppose n subgraphs are trained in parallel each time; that is, when the first overall graph is trained, n batches of training data are used in parallel to train n subgraphs, the corresponding training results include n sets of parameters, and the parameters of the first overall graph are updated according to the training results to generate the second overall graph. Compared with the single-node calculation method in the prior art, the calculation speed is increased by a factor of n.
  • Specifically, as shown in Fig. 7, the n subgraphs are trained at the same time, and each training correspondingly produces a set of parameters, so n sets of parameters can be obtained, i.e., the set n in Fig. 7 can include n sets of parameters; according to the n sets of parameters, the parameters of the current first overall graph are updated (the bold dashed line in Fig. 7 indicates the update), thereby generating a new overall graph, i.e., the second overall graph.
  • That is to say, compared with single-node calculation, the parameters G that originally occupied multiple time steps can be calculated in parallel at the same time. For example, assuming that n nodes calculate at the same time, i.e., the calculation from time t-n to time t is performed simultaneously, formula (1) becomes the following formula (2):
G_t = G_{t-n} - α·(ΔG_{t-n}^{(1)} + ΔG_{t-n}^{(2)} + ... + ΔG_{t-n}^{(n)})    (2)
where α is a coefficient. Obviously, time t no longer depends on time t-1.
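  • The toy sketch below contrasts the sequential update of formula (1) with the parallel update of formula (2), using NumPy parameters and synthetic gradients; the subgraph_gradient stub is a placeholder and the numbers carry no meaning beyond illustration.

```python
import numpy as np

alpha = 0.1                               # the coefficient in formulas (1) and (2)
G = np.array([1.0, 2.0, 3.0])             # toy parameters of the first overall graph
n = 4                                     # number of subgraphs trained in parallel


def subgraph_gradient(params, seed):
    # Placeholder for the gradient obtained by training one subgraph on one batch.
    rng = np.random.default_rng(seed)
    return rng.normal(size=params.shape)


# Sequential, formula (1): each gradient is computed from the previously updated graph.
G_seq = G.copy()
for k in range(n):
    G_seq = G_seq - alpha * subgraph_gradient(G_seq, seed=k)

# Parallel, formula (2): all n gradients are computed from the same snapshot G_{t-n},
# then applied together, so step t no longer depends on step t-1.
grads = [subgraph_gradient(G, seed=k) for k in range(n)]   # independent of each other
G_par = G - alpha * np.sum(grads, axis=0)

print(G_seq, G_par)
```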
  • In the process of training the overall graph, the sampled network structures are highly random. If single-node calculation is used, i.e., each training is performed sequentially, it can be seen from the above formula (1) that the network structure sampled first will affect the subsequent update calculations, so a search process based on sequential calculation becomes non-robust. After parallelization, by contrast, within the n parallel updates the order in which the network structures are sampled has no effect on the result, so the bias introduced by sequential calculation is avoided and the search process becomes more robust.
  • the parameter n in the embodiment of the present application can be set according to actual applications, and can be set to any positive integer greater than 1.
  • the value of n may be set reasonably, for example, setting n to 10, 20 or other values, and the embodiment of the present application is not limited to this.
  • The training of any one subgraph in the embodiment of the present application may produce a result including one set of parameters, the parallel training of n subgraphs can obtain n sets of training results, and the n sets of training results can be used to update the first overall graph.
  • The parameters in the embodiments of the present application may include weight parameters, i.e., the training result obtained by training each subgraph includes the weight parameters of one or more operations in the overall graph; alternatively, the parameters in the embodiments of the present application may also include other types of parameters, and the embodiment of the present application is not limited thereto.
  • Specifically, during the parallel training of the n subgraphs in S210, for any operation included in the first overall graph, if among the obtained n sets of parameters there are at least two parameters corresponding to that same operation, the at least two parameters can be processed to determine an updated value for the parameter of that operation.
  • For example, a target parameter is determined according to the at least two parameters, and the target parameter is used to update the parameter of the same operation.
  • Optionally, determining the target parameter according to the at least two parameters may include: determining the average of the at least two parameters as the target parameter. That is to say, among the n sets of training results obtained by training the n subgraphs in parallel, there are at least two sets of parameters that each include a parameter corresponding to the same operation (referred to here as the target operation); in other words, the target operation corresponds to at least two parameters. Correspondingly, the at least two parameters can be averaged, the average value is the target parameter, and the parameter of the target operation is updated with the average value.
  • Alternatively, determining the target parameter according to the at least two parameters may further include: determining the target parameter according to the weights of the at least two parameters. That is to say, among the n sets of training results obtained by training the n subgraphs in parallel, there are at least two sets of parameters that each include a parameter corresponding to the target operation, i.e., the target operation corresponds to at least two parameters; correspondingly, the target parameter can be determined according to the weights of the at least two parameters, and the parameter of the target operation is updated with the target parameter.
  • The weight of each parameter can be set according to the actual application, for example, by the user, but the embodiment of the present application is not limited thereto.
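  • The sketch below shows one possible way to merge the n sets of trained parameters when several subgraphs touched the same operation, supporting both the plain average and the weighted average described above; the dictionary format and the example weights are assumptions for illustration only.

```python
def merge_parameter_sets(param_sets, weights=None):
    """param_sets: list of dicts {operation: value}, one dict per parallel subgraph result.
    Returns one target parameter per operation: a weighted average if weights are given,
    otherwise the plain mean of all values reported for that operation."""
    merged = {}
    for op in {op for s in param_sets for op in s}:
        pairs = [(s[op], weights[i] if weights else 1.0)
                 for i, s in enumerate(param_sets) if op in s]
        total_weight = sum(w for _, w in pairs)
        merged[op] = sum(v * w for v, w in pairs) / total_weight
    return merged


# Two of the n training results both contain a parameter for "conv3x3" (the target operation).
results = [{"conv3x3": 0.8, "sep5x5": 0.2},
           {"conv3x3": 0.6},
           {"maxpool3x3": 0.1}]
print(merge_parameter_sets(results))                             # conv3x3 -> mean of 0.8 and 0.6
print(merge_parameter_sets(results, weights=[0.75, 0.25, 1.0]))  # weighted variant
```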
  • Optionally, before S310, the method may further include: determining a target overall graph according to the second overall graph, for example, generating the target overall graph after S210 and S220 are executed cyclically. Specifically, the second overall graph is obtained after S210 and S220; then, the second overall graph is trained in the manner of steps S210 and S220, and the parameters of the second overall graph are updated to generate a third overall graph, and so on.
  • The training of the overall graph is performed multiple times, i.e., after steps S210 and S220 are executed repeatedly, the final overall graph is obtained, which is referred to herein as the target overall graph.
  • For example, after steps S210 and S220 are executed cyclically multiple times, the finally obtained overall graph converges, and that finally obtained overall graph is the target overall graph.
  • Correspondingly, S310 in the embodiment of the present application may include: determining the feedback amount of the first network structure according to the parameters of the first network structure and the target overall graph.
  • Optionally, determining the feedback amount of the first network structure in S310 of step S300 may further include: determining multiple feedback amounts of the first network structure, and determining a target feedback amount of the first network structure according to the multiple feedback amounts of the first network structure; in addition, updating the first network structure according to the feedback amount of the first network structure in S310 may include: updating the first network structure according to the target feedback amount.
  • The process of determining any one of the multiple feedback amounts may include: S311, according to the first network structure, sampling one operation in the search space at each operation layer to obtain a second network structure; S312, using a batch of test data in the test set to make predictions with the second network structure, so as to determine one feedback amount of the first network structure.
  • In this way, the searched second network structure can be used for prediction on the test set to obtain feedback for updating the first network structure, instead of directly training the LSTM with the test set.
  • Specifically, each operation layer corresponds to one time step of the long short-term memory artificial neural network (LSTM).
  • For each time step, the cell of the LSTM outputs a hidden state; correspondingly, S311 may further include: mapping the hidden state into a feature vector, the dimension of the feature vector being the same as the number of operations on each operation layer; and sampling one operation at each operation layer according to the feature vector to obtain the network structure.
  • In Fig. 9, the solid arrows represent time steps: time 1 represents the first cell of the LSTM, time 2 represents the second cell of the LSTM, and so on. The square conv3*3 represents the operation of that layer in the model, and the circles represent the connection relationships between operation layers.
  • Specifically, at time 1, the hidden state output by the cell is computed to obtain conv3×3; conv3×3 is used as the input layer of the cell at time 2, the hidden state output by the cell at time 1 is also used as the input of the cell at time 2, and circle 1 is obtained by computation.
  • Similarly, circle 1 is used as the input of the cell at time 3, the hidden state output by the cell at time 2 is also used as the input at time 3, the convolution sep5×5 is obtained by computation, and so on.
  • Further, sampling an operation at each operation layer according to the feature vector to obtain the network structure includes: normalizing the feature vector (softmax) to obtain the probability of each operation on each operation layer; and sampling one operation at each operation layer according to the probabilities to obtain the network structure.
  • Specifically, in the example shown in Fig. 9, an encoding operation is performed on the hidden state output by the LSTM cell to map it into a vector with a dimension of 6; this vector passes through a normalized exponential function (softmax) and becomes a probability distribution, sampling is performed according to this probability distribution, and the operation of the current layer is obtained; by analogy, a network structure is finally obtained. In this example there is only one input and a total of six candidate operations (3×3 convolution, 5×5 convolution, 3×3 depthwise-separable convolution, 5×5 depthwise-separable convolution, max pooling, 3×3 average pooling); the dimension of the vector corresponds to the search space, and 6 means that the search space has 6 selectable operations.
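  • A compact sketch of this per-layer sampling is given below, assuming a PyTorch LSTMCell as the controller cell and a linear layer that maps the hidden state to the 6 candidate operations; the module names and sizes are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

NUM_OPS, HIDDEN = 6, 32          # 6 candidate operations; hidden size is an arbitrary choice
NUM_LAYERS = 5                   # number of operation layers to predict

cell = nn.LSTMCell(HIDDEN, HIDDEN)            # controller cell, unrolled over the time steps
op_embedding = nn.Embedding(NUM_OPS, HIDDEN)  # feeds the previously sampled op back as input
to_logits = nn.Linear(HIDDEN, NUM_OPS)        # "encoding": hidden state -> 6-dimensional vector

x = torch.zeros(1, HIDDEN)                    # input of the cell at time 1
h, c = torch.zeros(1, HIDDEN), torch.zeros(1, HIDDEN)
architecture = []
for t in range(NUM_LAYERS):                   # one time step per operation layer
    h, c = cell(x, (h, c))
    probs = torch.softmax(to_logits(h), dim=-1)   # softmax turns the vector into a distribution
    op = torch.multinomial(probs, 1).item()       # sample the operation of the current layer
    architecture.append(op)
    x = op_embedding(torch.tensor([op]))          # sampled op becomes the next cell's input
print(architecture)                               # e.g. [2, 0, 5, 1, 3]
```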
  • S311 and S312 in S310 for training the first network structure can be cycled m times, i.e., S311 and S312 are executed repeatedly m times to obtain m feedback amounts of the first network structure, and the target feedback amount is determined according to the m feedback amounts so as to update the first network structure.
  • In each of the m cycles, one second network structure can be sampled; after this network structure is tested on a batch of data in the test set, one feedback amount regarding the first network structure can be obtained, and after a sufficient number of cycles, the corresponding number of feedback amounts of the first network structure can be obtained.
  • the value of m may be set according to actual applications.
  • m may be set to 20, that is, the feedback amount of the first network structure is 20, or m may also be set to 10, 15 or other values.
  • the specific value of m is not limited here.
  • The first network structure may be constructed according to a long short-term memory network model, and step S310 may include: determining the target feedback amount of the first network structure according to the m feedback amounts of the first network structure, the parameters of the long short-term memory artificial neural network, and the sampled operations. Specifically, this can be achieved by the following formula (4):
∇_{θc} J(θc) ≈ (1/m)·Σ_{k=1}^{m} Σ_{t=1}^{T} ∇_{θc} log P(a_t | a_{(t-1):1}; θc)·R_k    (4)
where R_k is the k-th feedback amount, θc is the parameter of the long short-term memory artificial neural network, a_t is the operation sampled at the t-th operation layer, m is the total number of feedback amounts, i.e., the number of cycles of steps S311 and S312 described above, and T is the number of hyperparameters predicted by the first network structure. In this way, the first network structure can be updated according to the average value of the multiple feedback amounts.
  • T includes the operation layers and skip connections (jumpers), and may also include other hyperparameters to be optimized.
  • the specific content of T is not limited here.
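  • As a hedged illustration of how an update following formula (4) might be written, the sketch below accumulates the log-probabilities of the T sampled decisions for each of m architectures and weights them by the corresponding feedback amount; the sample_architecture and evaluate_on_test_batch helpers are hypothetical placeholders, not functions defined by the patent.

```python
import torch


def controller_update(controller, optimizer, m, sample_architecture, evaluate_on_test_batch):
    """One policy-gradient step on the controller parameters theta_c, in the spirit of formula (4).

    sample_architecture(controller) -> (architecture, log_probs)  # log_probs: list of T tensors
    evaluate_on_test_batch(architecture) -> float                 # feedback amount R_k
    Both helpers are assumed to exist elsewhere; they are not defined by the patent text."""
    optimizer.zero_grad()
    loss = 0.0
    for k in range(m):                              # m sampled second network structures
        arch, log_probs = sample_architecture(controller)
        reward = evaluate_on_test_batch(arch)       # R_k from one batch of test data
        loss = loss - sum(log_probs) * reward       # REINFORCE: sum over the T decisions
    (loss / m).backward()                           # average over the m feedback amounts
    optimizer.step()
```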
  • After the first network structure executes S200, the training of the overall graph is completed; then S311 and S312 in S300 continue to be executed and are repeated m times, correspondingly determining m feedback amounts, i.e., the set m in Fig. 10 includes m feedback amounts, and the first network structure is then updated according to formula (4).
  • In this way, the parameter C_t of the updated first network structure obtained at time t is calculated based on the parameter C_{t-1} of the first network structure at time t-1 before the update, which can be expressed as the following formula (5):
C_t = C_{t-1} - α·ΔC_{t-1}    (5)
where α is a coefficient.
  • m feedback quantities need to be determined, and then all or part of the m feedback quantities can be determined in parallel.
  • S311 and S312 can be executed multiple times at the same time.
  • For example, S311 and S312 are executed m' times at the same time, i.e., m' second network structures are determined at the same time according to the first network structure, m' batches of test data in the test set are used to predict the m' second network structures respectively, and m' feedback amounts are then determined.
  • In this way, the step of updating the first network structure can be sped up by a factor of m'; moreover, updating the first network structure in this way still fits the above formula (5) and has no effect on the update result.
  • Optionally, m' can be set to any positive integer greater than 1 according to the actual application. For example, considering that m feedback amounts need to be determined according to formula (4), m' can be set equal to m; or, considering other factors, m' can also be set such that m is an integer multiple of m'. The embodiment of the present application is not limited thereto.
  • For example, the m feedback amounts can be determined at the same time, i.e., the set m in Fig. 11 can obtain m feedback amounts, and according to formula (4) the average value of the m feedback amounts is taken to update the first network structure. Compared with Fig. 10, updating in this way can increase the speed by a factor of m.
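  • A minimal sketch of determining the m feedback amounts in parallel rather than sequentially is shown below, using a thread pool and a placeholder evaluation function; the pool size and the evaluate stub are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def evaluate(arch_id):
    # Placeholder: predict one sampled second network structure on a batch of test data
    # and return its accuracy as the feedback amount.
    return random.random()


m = 20                                    # number of feedback amounts to determine
with ThreadPoolExecutor(max_workers=m) as pool:
    feedbacks = list(pool.map(evaluate, range(m)))   # the m evaluations run concurrently

target_feedback = sum(feedbacks) / m      # average used to update the first network structure
print(target_feedback)
```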
  • Updating the first network structure according to the feedback amount of the first network structure in step S310 can also be performed multiple times in a loop; that is, after S200, when step S310 of training the first network structure is performed, the first network structure can be updated repeatedly multiple times, which can reduce the randomness introduced by sampling, thereby optimizing the first network structure.
  • the number of cycles can be set according to actual applications, for example, it can be set to 50 times, or it can also be set to 10, 20, 30 or other values. The specific value of the number of cycles is not limited here.
  • S200 and S300 in the method 1000 can also be repeated multiple times, i.e., the training of the overall graph and the updating of the first network structure are performed over multiple iterations; the number of iterations can be set according to the actual application, for example, to 300 or 200 times, or to another value, so as to finally obtain a better second network structure.
  • The network structure search method of the embodiment of the present application splits the sequentially calculated ENAS into parallel computing algorithms; for example, both the training process of the overall graph and the training process of the network structure can be parallelized, thereby greatly improving the efficiency of network structure search and alleviating the bias caused by sequential calculation. More efficient network structure design can thus be carried out for model design in any scenario, such as mobile devices and servers.
  • The size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the network structure search method according to the embodiment of the present application is described in detail above with reference to FIGS. 1 to 11, and the network structure search device according to the embodiment of the present application will be described below with reference to FIG. 12.
  • the network structure search device 400 includes: multiple processors 410 and a memory 420, where the memory 420 stores one or more programs, and when the programs are executed by the processors,
  • the multiple processors 410 are configured to perform the following steps: sampling the first overall graph through the first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs at the same time so as to train the first overall graph, wherein the first overall graph is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first overall graph includes at least one of the multiple operations and the connections between operations; training one subgraph among the multiple subgraphs includes: training the subgraph by using a batch of training data in the training set; updating the parameters of the first overall graph according to the parameters obtained by training the multiple subgraphs so as to generate a second overall graph; determining the feedback amount of the first network structure according to the second overall graph; and updating the first network structure according to the feedback amount of the first network structure.
  • Each of the multiple processors 410 in the present application may be used to generate one subgraph among the multiple subgraphs, and may also be used to train that subgraph; that is, the multiple processors 410 of the embodiment of the present application can realize the simultaneous training of multiple corresponding subgraphs, or in other words, train multiple subgraphs in parallel.
  • At least one of the multiple processors 410 may be configured to: if, among the parameters obtained by training the multiple subgraphs, there are at least two parameters corresponding to the same operation in the first overall graph, determine a target parameter according to the at least two parameters, and use the target parameter to update the parameter of the same operation.
  • the at least one processor may be specifically configured to determine an average of the at least two parameters as the target parameter.
  • the at least one processor may be specifically configured to determine the target parameter according to the weight of the at least two parameters.
  • The multiple processors 410 are configured to: generate a target overall graph after the training of the first overall graph and the updating of the parameters of the first overall graph to generate a second overall graph have been executed cyclically multiple times; and determine the feedback amount of the first network structure according to the first network structure and the parameters of the target overall graph.
  • The multiple processors 410 are configured to: determine m feedback amounts of the first network structure, where m is a positive integer greater than 1, and determining any one of the m feedback amounts includes: according to the first network structure, sampling one operation in the search space at each operation layer to obtain the second network structure, and using a batch of test data in the test set to make predictions with the second network structure so as to determine one feedback amount of the first network structure; and determine the target feedback amount of the first network structure according to the m feedback amounts of the first network structure. Updating the first network structure according to the feedback amount of the first network structure includes: updating the first network structure according to the target feedback amount.
  • m feedback amounts may be determined by different or the same processors in the multiple processors 410, and the embodiment of the present application is not limited thereto.
  • The multiple processors 410 are configured to simultaneously determine m' feedback amounts of the first network structure, where 1 < m' ≤ m and m' is a positive integer. That is, each of the multiple processors 410 can correspondingly determine one feedback amount, and m' processors among the multiple processors 410 can determine m' feedback amounts at the same time.
  • Optionally, the first network structure is constructed according to a long short-term memory network model, and the multiple processors 410 are configured to: determine the target feedback amount of the first network structure according to the m feedback amounts of the first network structure, the parameters of the long short-term memory artificial neural network, and the sampled operations.
  • the target feedback amount of the first network structure can be determined according to the above formula (4).
  • the multiple processors 410 determine the feedback amount of the first network structure, and the update of the first network structure is executed multiple times in a loop.
  • the number of the multiple processors 410 in the embodiment of the present application may be two or any number greater than two.
  • the steps of the network structure search method in the embodiment of the present application may be executed by different processors 410, for example, different processors 410 may execute step S200 and step S300, respectively.
  • the network structure search apparatus 400 of the embodiment of the present application may further include a communication interface 430 for outputting data processed by the apparatus 400, and/or inputting data to be processed from an external device to the apparatus 400.
  • at least one of the plurality of processors 410 may be used to control the communication interface 430 to input and/output data.
  • The terms "first" and "second" are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "multiple" means two or more than two, unless otherwise specifically defined.
  • Each embodiment of the present application may be implemented based on a memory and a processor, where the memory is used to store instructions for executing the method of each embodiment of the present application, and the processor executes the foregoing instructions so that the device executes the methods of the embodiments of the present application.
  • The processor mentioned in the embodiments of this application may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA), or another programmable logic device.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
  • Many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
  • It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • the embodiments of the present application also provide a computer-readable storage medium on which instructions are stored, and when the instructions are run on a computer, the computer executes the methods of the foregoing method embodiments.
  • An embodiment of the present application also provides a computing device, which includes the above-mentioned computer-readable storage medium.
  • the embodiments of the present application can be applied to aircraft, especially in the field of unmanned aerial vehicles.
  • The division of circuits, sub-circuits, and sub-units in each embodiment of the present application is only illustrative. A person of ordinary skill in the art may be aware that the circuits, sub-circuits, and sub-units of the examples described in the embodiments disclosed herein can be further divided or combined.
  • In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, it can be realized in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions.
  • When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (Digital Video Disc, DVD)), semiconductor media (e.g., solid state disks (Solid State Disk, SSD)), or the like.
  • each embodiment of the present application is described by taking a total bit width of 16 bits as an example, and each embodiment of the present application may be applicable to other bit widths.
  • The "one embodiment" or "an embodiment" mentioned throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, the appearances of "in one embodiment" or "in an embodiment" in various places throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments.
  • The size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B based on A does not mean that B is determined only based on A, and B can also be determined based on A and/or other information.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A network structure search method and device, which can improve the efficiency of network structure search. The method includes: sampling a first overall graph through a first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs at the same time so as to train the first overall graph, where the first overall graph is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first overall graph includes at least one of the multiple operations and the connections between operations; training one subgraph among the multiple subgraphs includes: training the subgraph by using a batch of training data in a training set; updating the parameters of the first overall graph according to the parameters obtained by training the multiple subgraphs, so as to generate a second overall graph; determining the feedback amount of the first network structure according to the second overall graph; and updating the first network structure according to the feedback amount of the first network structure.

Description

Network structure search method and device
Copyright Notice
The content disclosed in this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner does not object to the reproduction by anyone of this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical Field
This application relates to the field of machine learning, and in particular to a network structure search method and device.
Background
Machine learning algorithms, especially deep learning algorithms, have developed rapidly and been widely applied in recent years. As application scenarios and model structures become more and more complex, it becomes increasingly difficult to obtain the optimal model for an application scenario. Efficient Neural Architecture Search via Parameter Sharing (ENAS), which is based on weight sharing, can be used to improve the efficiency of Neural Architecture Search (NAS). Although ENAS has already increased the speed considerably, its reinforcement-learning-based search is a sequential computing algorithm, so the computational efficiency of this single-node approach still has an obvious bottleneck, and sequential computation introduces a considerable bias that affects the robustness of the search results.
Summary of the Invention
This application provides a network structure search method and device, which can improve the efficiency of network structure search.
In a first aspect, a network structure search method is provided, including: sampling a first overall graph through a first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs at the same time so as to train the first overall graph, where the first overall graph is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first overall graph includes at least one of the multiple operations and the connections between operations; training one subgraph among the multiple subgraphs includes: training the subgraph by using a batch of training data in a training set; updating the parameters of the first overall graph according to the parameters obtained by training the multiple subgraphs, so as to generate a second overall graph; determining the feedback amount of the first network structure according to the second overall graph; and updating the first network structure according to the feedback amount of the first network structure.
In a second aspect, a network structure search device is provided, which is used to execute the method in the above first aspect or any possible implementation of the first aspect. Specifically, the device includes units for executing the method in the above first aspect or any possible implementation of the first aspect.
In a third aspect, a network structure search device is provided, including a storage unit and a processor, where the storage unit is used to store instructions and the processor is used to execute the instructions stored in the storage unit; when the processor executes the instructions stored in the storage unit, the execution causes the processor to execute the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer-readable medium is provided for storing a computer program, and the computer program includes instructions for executing the method in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, a computer program product including instructions is provided; when a computer runs the instructions of the computer program product, the computer executes the network structure search method in the above first aspect or any possible implementation of the first aspect. Specifically, the computer program product can run on the network structure search device of the above second aspect.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a network structure search method.
Fig. 2 is a schematic diagram of the principle of a network structure search method in the related art.
Fig. 3 is a schematic diagram of the overall graph of a network structure search method according to an embodiment of the present application.
Fig. 4 is a schematic flowchart of a network structure search method according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
Fig. 6 is a schematic diagram of single-node training of the overall graph in a network structure search method.
Fig. 7 is a schematic diagram of parallel training of the overall graph in a network structure search method according to an embodiment of the present application.
Fig. 8 is a schematic flowchart of a network structure search method according to still another embodiment of the present application.
Fig. 9 is a schematic diagram of the principle of a network structure search method according to an embodiment of the present application.
Fig. 10 is a schematic diagram of a single node determining the feedback amount in a network structure search method.
Fig. 11 is a schematic diagram of determining the feedback amount in parallel in a network structure search method according to an embodiment of the present application.
Fig. 12 is a schematic block diagram of a network structure search device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as those commonly understood by those skilled in the technical field of this application. The terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.
First, the related technologies and concepts involved in the embodiments of this application are introduced.
This application relates to a network structure search parallelization algorithm in Automated Machine Learning (AutoML) technology. It can be applied to model optimization in various scenarios, including but not limited to PCs and mobile devices.
In recent years, machine learning algorithms, especially deep learning algorithms, have developed rapidly and been widely applied. As model performance keeps improving, model structures become more and more complex. In non-automated machine learning, these structures need to be designed and tuned manually by machine learning experts, which is a very tedious process; for example, the operation of each network layer and the skip connections between layers (also called shortcuts) need to be designed manually. Moreover, as application scenarios and model structures become increasingly complex, it becomes more and more difficult to obtain the optimal model for an application scenario. Against this background, AutoML has received extensive attention from academia and industry, especially Neural Architecture Search (NAS).
Specifically, network structure search is a technology that uses algorithms to automatically design neural network models. Network structure search is to search out the structure of a neural network model. For example, in the embodiments of this application, the neural network model whose network structure is to be searched may be a Convolutional Neural Network (CNN).
The network structure search method will be described in detail below with reference to the accompanying drawings. Fig. 1 shows a schematic flowchart of a network structure search method 1000. As shown in Fig. 1, the method 1000 includes: S100, determining a search space of the neural network model whose network structure is to be searched, where the search space defines multiple possible operations on the operation layer between every two nodes in the neural network model; S200, training a whole graph (overall graph) of the search space according to a first network structure, the overall graph being composed of operations; S300, determining a feedback amount (ACC) of the first network structure, and updating the first network structure according to the feedback amount of the first network structure.
It should be understood that the problem to be solved by network structure search is to determine the operations between nodes in the neural network model. Different combinations of operations between nodes correspond to different network structures. Further, a node in the neural network model can be understood as a feature layer in the neural network model. An operation between two nodes refers to the operation required to transform the feature data on one node into the feature data on the other node. The operations mentioned in this application may be convolution operations, pooling operations, fully connected operations, or other neural network operations. The operations between two nodes can be regarded as constituting the operation layer between these two nodes. Usually, the operation layer between two nodes has multiple operations available for searching, i.e., multiple candidate operations. The purpose of network structure search is to determine one operation on each operation layer.
For example, conv3*3, conv5*5, depthwise3*3, depthwise5*5, maxpool3*3, average pool3*3, etc. are defined as the search space. That is to say, each layer operation of the network structure is sampled from these six choices included in the search space.
As shown in Fig. 1 and Fig. 2, after the NAS establishes the search space, it usually uses the first network structure to sample a second network structure (i.e., network structure A) in the search space, then trains the second network structure to convergence to determine the feedback amount, and finally updates the first network structure with the feedback amount.
Specifically, the idea of NAS is to obtain a network structure in the search space through a first network structure, obtain an accuracy rate R according to that network structure, and use the accuracy rate R as feedback to update the first network structure; the first network structure is then further optimized to obtain another network structure, and this is repeated until the best result is obtained.
The first network structure in the embodiments of this application may serve as a controller; for example, the first network structure in the embodiments of this application may be the network structure of the controller in any training stage, e.g., the network structure of a controller that has never been updated, or the network structure of a controller that has been updated several times.
In the example of Fig. 2, the first network structure is constructed by a Recurrent Neural Network (RNN); for example, the first network structure may be specifically constructed by a Long-Short Term Memory network (LSTM). Alternatively, the first network structure may also be constructed by a Convolutional Neural Network (CNN). The specific way of constructing the first network structure is not limited here.
However, training the second network structure to convergence is rather time-consuming. Therefore, various methods for making NAS more efficient have appeared in the related art, such as Efficient Architecture Search by Network Transformation and Efficient Neural Architecture Search via Parameter Sharing (ENAS). Among them, the efficient network structure search based on weight sharing is widely used.
Specifically, as shown in Fig. 3, when the efficient network structure search based on weight sharing is used, each operation layer of the network structure can be sampled in the search space, and according to the possible sampling results of each layer, an overall graph including multiple sampling results can be correspondingly connected, where the final optimal structure searched out is one of the subgraphs in the overall graph. In the example in Fig. 3, the overall graph is connected by operations between nodes, and these operations belong to the search space. The connection pattern with bold edges in Fig. 3, i.e., the optimal structure, is one subgraph of the overall graph.
ENAS adopts a weight-sharing strategy. After a network structure is sampled each time (for example, by the first network structure), it is no longer trained directly to convergence; instead, the overall graph is trained first, i.e., S200 in the above method 1000 is executed, training one batch at a time, and after multiple iterations the overall graph can finally converge. Note that convergence of the overall graph is not equivalent to convergence of a network structure.
After the overall graph is trained, the parameters of the overall graph can be fixed, and then the first network structure is trained, i.e., S300 in the method 1000 is executed. Specifically, the overall graph can be sampled to obtain a second network structure, and the second network structure is trained to obtain the feedback amount, thereby updating the first network structure.
It can be understood that the efficient network structure search based on weight sharing can save time and thereby improve the efficiency of network structure search, because the parameters that can be shared are shared each time a network structure is searched. For example, in the example in Fig. 3, if node 1, node 2, node 3 and node 6 are searched this time, after node 1, node 3 and node 6 were searched previously and the network structure searched at that time was trained, then the relevant parameters of the network structure trained when node 1, node 3 and node 6 were searched can be applied to the training of the network structure searched this time. In this way, efficiency is improved through weight sharing, and ENAS can increase the efficiency of NAS by more than 1000 times.
Although ENAS has already increased the speed considerably, the reinforcement-learning-based search is a sequential computing algorithm; that is, the training process of the overall graph in S200 and the training process of the network structure in S300 are still single-node computations, for example performed on one Graphics Processing Unit (GPU). This gives the efficiency of NAS an obvious bottleneck and greatly restricts the search efficiency. In addition, sequential computation inevitably introduces bias and affects the robustness of the search results. Therefore, the embodiments of this application propose a parallel computing approach. Specifically, the parallel computation of S200 and S300 in the method 1000 of the embodiments of this application will be described in detail below with reference to the accompanying drawings.
Fig. 4 shows another schematic flowchart of the method 1000 used in network structure search according to an embodiment of this application. As shown in Fig. 4, S200 in the above method 1000 may further include: S210, sampling a first overall graph through a first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs at the same time so as to train the first overall graph, where the first overall graph is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first overall graph includes at least one of the multiple operations and the connections between operations; training one subgraph among the multiple subgraphs includes: training the subgraph by using a batch of training data in the training set; S220, updating the parameters of the first overall graph according to the parameters obtained by training the multiple subgraphs, so as to generate a second overall graph. Correspondingly, S300 in the method 1000 may further include: S310, determining the feedback amount of the first network structure according to the second overall graph, and updating the first network structure according to the feedback amount of the first network structure.
It should be understood that the neural network structure obtained by network structure search can generally be trained and verified through sample data. The sample data includes verification samples and training samples; the verification samples can be used to verify whether the network structure is good, and the training samples can be used to train the network structure searched out by the network structure search method, e.g., the training samples can be used in S200 and S300 of the method 1000 in the embodiments of this application.
In the embodiments of this application, the detailed introduction mainly concerns the training samples, while the verification samples remain unchanged and can be used to verify whether the network structure is good. Specifically, the training samples can further be divided into a training set (train) and a test set (valid); the training set and the test set can be divided by the user or by the network structure search device, and the embodiments of this application are not limited thereto.
In the embodiments of this application, it is assumed that the first network structure is constructed by an LSTM. When searching the network structure, the training set is used to train the parameters of the searched structure, such as the parameters of the structure computed through conv3*3 and sep5*5; and, in order to observe the generalization ability of the searched network structure, the parameters of the LSTM are trained on the test set. That is to say, the training set is used to train the parameters of the searched structure, and the test set is used to train the parameters of the LSTM. The verification samples are used to verify whether the searched network structure is good after training.
For example, the number of training samples is 10, and the training samples are divided into a training set of 8 samples and a test set of 2 samples; the training set of 8 samples is used to train the searched network structure, and the test set of 2 samples is used to train the LSTM.
It should be understood that in S210 the training of multiple subgraphs is performed at the same time, and then the training of the first network structure in S310 is performed. For ease of explanation, the training of any one of the multiple subgraphs included in S210, including the sampling process of that subgraph, is taken as an example for detailed description here. Specifically, as shown in Fig. 5, for any one subgraph, S210 may further include: S211, a sampling step, i.e., sampling the first overall graph according to the first network structure to generate one subgraph of the first overall graph; S212, a training step, i.e., training the subgraph by using a batch of training data of the training set (train).
In the embodiments of this application, ENAS adopts a weight-sharing strategy, and each time a subgraph is sampled, a batch of data in the training set is used to train that subgraph.
For example, suppose the first network structure has 5 operation layers and the search space corresponding to each layer has 4 optional operations, which is equivalent to a 4*5 graph. Network structure search needs to select one operation at each layer, which is equivalent to path optimization on the graph. In the prior art, initially, the first network structure samples one operation at each layer, then connects the sampled operations to obtain a subgraph, and trains this subgraph on a batch of data in the training set; the obtained training result includes updated parameters, the updated parameters are used to update the parameters of the current first overall graph to generate a second overall graph, and the parameters of the second overall graph are then fixed. Next, one operation is sampled at each layer to obtain a subgraph of the second overall graph, this subgraph is trained on another batch of data in the training set, and the updated parameters included in the training result are used to update the parameters of the current second overall graph to generate a third overall graph, and so on; after repeating the cycle many times, for example after training until the overall graph converges, S300 continues to be executed to train the first network structure.
For example, as shown in Fig. 6, for any one subgraph, one subgraph of the current overall graph is obtained according to the first network structure, a batch of training data in the training set is used for training, and the parameters of the current overall graph are updated according to the parameters obtained from the training, so as to generate an overall graph with new parameters. That is to say, the parameter G_t (for example, the gradient) of the overall graph at time t is calculated based on the overall graph G_{t-1} that has already been updated at time t-1; for example, this can be expressed by formula (1):
G_{t-1} = G_{t-2} - α·ΔG_{t-2}
G_t = G_{t-1} - α·ΔG_{t-1}    (1)
...
where α is a coefficient.
Considering that this single-node calculation method performs the training of only one subgraph at a time and is therefore inefficient, the embodiments of this application propose a parallel calculation method, i.e., in S210, the overall graph constructed from the search space is trained by performing the training of multiple subgraphs at the same time.
Specifically, according to the embodiments of this application, multiple subgraphs of the overall graph are determined by sampling, and the multiple subgraphs are trained at the same time. After the parallel training of the multiple subgraphs is completed, the parameters of the current first overall graph can be updated according to the parameters obtained from the training to generate a second overall graph; the parameters of the second overall graph are then fixed, the training of multiple subgraphs is performed in parallel again, and the parameters of the second overall graph are updated with the parameters obtained by training to generate a third overall graph, and so on; after repeating the cycle many times, for example, until the overall graph of the first network structure converges, S300 continues to be executed, for example S310 is executed, to train the first network structure. For example, suppose n subgraphs are trained in parallel each time, i.e., when the first overall graph is trained, n batches of training data are used in parallel to train n subgraphs, the corresponding training results include n sets of parameters, and the parameters of the first overall graph are updated according to the training results to generate the second overall graph. Compared with the single-node calculation method in the prior art, the calculation speed is increased by a factor of n.
具体地,如图7所示,仍然假设可以同时进行n个子图的训练,并且当前的为第一总图,那么同时进行该n个子图的训练,每次训练均对应产生一组参数(或者说训练结果),则可以获取n组参数,即图7中的集合n中可以包括n组参数;根据该n组参数,更新当前的第一总图的参数,即图7中的加粗虚线表示更新,从而生成新的总图,即生成第二总图。也就是说,相比于上述的单节点计算,原本占用多个时刻的参数G可以同时并行计算,例如,假设有n个节点同时计算,即从t-n到n时刻同时计算,那么公式(1)则变为下面的公式(2):
    G_{t-n+1} ≈ G_{t-n} - α·ΔG_{t-n}
    G_{t-n+2} ≈ G_{t-n} - α·ΔG_{t-n}
    ...
    G_t ≈ G_{t-n} - α·ΔG_{t-n}    (2)
where α is a coefficient and each of the n updates is computed from the same snapshot G_{t-n}. Clearly, time t no longer depends on time t-1.
According to formula (2), the parallel update formulas of these K time instants can be combined and written as:
    G_t ≈ G_{t-K} - α·(ΔG_{t-K}^{(1)} + ΔG_{t-K}^{(2)} + ... + ΔG_{t-K}^{(K)})    (3)
where ΔG_{t-K}^{(k)} denotes the update computed by the k-th of the K parallel trainings from the snapshot G_{t-K} (K = n in the example above). Since parallelization in this way affects the result of the sequential computation, only the "approximately equal" sign can be used. However, parallelization greatly improves the search efficiency and can accelerate the training process by a factor of n. Whether the effect on convergence is good or bad is analyzed below.
In the process of training the general map, the sampling of network structures is highly random. If single-node computation is used, that is, each training is performed sequentially, then according to formula (1) above, the network structure sampled first affects the subsequent update computations, so the search process based on sequential computation becomes non-robust.
However, after switching to parallel computation, according to formula (3), within the n trainings it makes no difference to the result which network structure is sampled first. Therefore, after parallelization, the bias brought by sequential computation is avoided and the search process becomes more robust.
Optionally, the parameter n in the embodiments of this application can be set according to the actual application and can be set to any positive integer greater than 1. For example, the value of n can be set reasonably in consideration of efficiency and cost, for example to 10, 20 or another value; the embodiments of this application are not limited thereto.
It should be understood that the result produced by training any one subgraph in the embodiments of this application may include one group of parameters, and training n subgraphs in parallel may yield n groups of training results, which can be used to update the parameters of the first general map to generate the second general map. The parameters in the embodiments of this application may include weight parameters, that is, the training result obtained by training each subgraph includes the weight parameters of one or more operations in the general map; alternatively, the parameters in the embodiments of this application may also include other types of parameters, and the embodiments of this application are not limited thereto.
Specifically, in the process of training the n subgraphs in parallel in S210, for any operation included in the first general map, if among the obtained n groups of parameters there are at least two groups that include a parameter corresponding to that operation, that is, at least two parameters among the n groups correspond to the same operation, then the at least two parameters can be processed to determine the updated value of the parameter of that operation. For example, a target parameter is determined according to the at least two parameters, and the parameter of that same operation is updated with the target parameter.
Optionally, determining the target parameter according to the at least two parameters may include: determining the average of the at least two parameters as the target parameter. That is to say, among the n groups of training results obtained by training the n subgraphs in parallel there are at least two groups of parameters, each of which includes a parameter corresponding to the same operation, referred to here as the target operation; in other words, the target operation corresponds to at least two parameters. Correspondingly, the average of the at least two parameters can be taken as the target parameter, and the parameter of the target operation is updated with this average.
Alternatively, determining the target parameter according to the at least two parameters may also include: determining the target parameter according to the weights of the at least two parameters. That is to say, among the n groups of training results obtained by training the n subgraphs in parallel there are at least two groups of parameters, each of which includes a parameter corresponding to the target operation; in other words, the target operation corresponds to at least two parameters. Correspondingly, the target parameter can be determined according to the weights of the at least two parameters, and the parameter of the target operation is updated with this target parameter, where the weight of each parameter can be set according to the actual application, for example by the user, but the embodiments of this application are not limited thereto.
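The following Python sketch illustrates the parallel variant of S210/S220 under the assumptions above: n subgraphs are sampled from the same snapshot of the first general map and trained concurrently, and updates that touch the same operation are merged, here by simple averaging (a weighted combination would replace the final division). The thread pool, the toy update and all names are hypothetical stand-ins, not the actual implementation.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of training n subgraphs in parallel from one snapshot of the
# general map and merging updates that correspond to the same operation.
NUM_LAYERS, NUM_OPS, ALPHA, N_PARALLEL = 5, 4, 0.1, 8
G = np.zeros((NUM_LAYERS, NUM_OPS))                        # snapshot of the first general map

def train_one_subgraph(seed):
    """Sample one subgraph and return the update it produces (a stand-in for a gradient)."""
    rng = np.random.default_rng(seed)
    subgraph = rng.integers(0, NUM_OPS, size=NUM_LAYERS)    # one operation per layer
    batch = rng.standard_normal(32)                         # one batch of training data
    delta = np.zeros_like(G)
    for layer, op in enumerate(subgraph):
        delta[layer, op] = float(batch.mean())
    return delta

with ThreadPoolExecutor(max_workers=N_PARALLEL) as pool:    # the n trainings run concurrently
    deltas = np.stack(list(pool.map(train_one_subgraph, range(N_PARALLEL))))

touched = np.maximum((deltas != 0).sum(axis=0), 1)          # how many subgraphs touched each operation
merged = deltas.sum(axis=0) / touched                       # average the updates for the same operation
G = G - ALPHA * merged                                      # parameters of the second general map
```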
As shown in FIG. 4, after S210 and S220, S310 of S300 continues to be performed. Optionally, before S310, the foregoing S200, or rather S300, may further include: determining a target general map according to the second general map, for example, generating the target general map after executing S210 and S220 in a loop. Specifically, the second general map is obtained after S210 and S220; the second general map is then trained in the manner of steps S210 and S220, and its parameters are updated to generate a third general map, and so on. After the training of the general map has been performed multiple times, that is, after steps S210 and S220 have been executed in a loop multiple times, the final general map is obtained, which is referred to here as the target general map; for example, if the general map finally obtained after executing steps S210 and S220 in a loop multiple times has converged, that final general map is the target general map.
Correspondingly, S310 in the embodiments of this application may include: determining the feedback amount of the first network structure according to the first network structure and the parameters of the target general map.
Optionally, determining the feedback amount of the first network structure in S310 of step S300 may further include: determining multiple feedback amounts of the first network structure, and determining a target feedback amount of the first network structure according to the multiple feedback amounts of the first network structure; in addition, updating the first network structure according to the feedback amount of the first network structure in S310 may include: updating the first network structure according to the target feedback amount.
As shown in FIG. 8, the process of determining any one of the multiple feedback amounts may include: S311: according to the first network structure, sampling one operation in the search space for each operation layer to obtain a second network structure; and S312: predicting with the second network structure on one batch of test data from the test set to determine one feedback amount of the first network structure.
In this way, a prediction can be made on the test set with the searched second network structure, thereby obtaining one feedback amount of the first network structure; when multiple second network structures are searched, multiple feedback amounts can be obtained correspondingly, that is, after S311 and S312 have been executed repeatedly, multiple feedback amounts of the first network structure can be obtained. For ease of description, it is assumed here that m feedback amounts of the first network structure are determined, that is, m feedback amounts are obtained after executing S311 and S312 m times, where m is a positive integer greater than 1.
In the embodiments of this application, according to S312, after the second network structure has been found, the searched second network structure can be used for prediction on the test set to obtain a feedback amount for updating the first network structure, rather than directly training the LSTM with the test set.
Specifically, as shown in FIG. 9, each operation layer corresponds to one time step (timestep) of the long short-term memory artificial neural network (LSTM). For each time step, the cell of the LSTM outputs a hidden state. Correspondingly, S311 may further include: mapping the hidden state to a feature vector whose dimension is the same as the number of operations on each operation layer; and sampling one operation in each operation layer according to the feature vector to obtain the network structure.
In this way, for each operation layer, one operation is sampled in the search space to obtain the second network structure. For example, to search for a 20-layer network in total, 20 time steps are needed if skip connections are not considered.
As shown in FIG. 9, the solid arrows indicate time steps (timestep): time 1 indicates the first cell of the LSTM, time 2 indicates the second cell of the LSTM, and so on. The square conv3*3 indicates the operation of that layer in the model, and the circles indicate the connection relationships between operation layers.
It can be understood that, since the computation of the network structure proceeds in a certain order, mapping the logical relationship of this computation order onto the LSTM corresponds, in FIG. 9, to one small square after another from left to right, that is, the state of the LSTM cell at each time step.
Specifically, at time 1, the hidden state output by the cell is computed to obtain the convolution conv3×3; conv3×3 serves as the input layer of the cell at time 2, the hidden state output by the cell at time 1 also serves as an input of the cell at time 2, and circle 1 is obtained by computation.
Similarly, circle 1 serves as an input of the cell at time 3, the hidden state output by the cell at time 2 also serves as an input at time 3, and the convolution sep5×5 is obtained by computation, and so on.
Further, sampling one operation in each operation layer according to the feature vector to obtain the network structure includes:
normalizing (softmax) the feature vector to obtain the probability of each operation in each operation layer;
sampling one operation in each operation layer according to the probabilities to obtain the network structure.
In this way, one operation is sampled in each operation layer according to the feature vector to obtain the network structure. Specifically, in the example shown in FIG. 9, an encoding operation is performed on the hidden state output by the LSTM cell to map it to a vector of dimension 6; this vector is passed through the normalized exponential function (softmax) and becomes a probability distribution, sampling is performed according to this probability distribution, and the operation of the current layer is obtained. Proceeding in this way, a network structure is finally obtained. It can be understood that in this example there is only one input and a total of six operations (3×3 convolution, 5×5 convolution, 3×3 depthwise-separable convolution, 5×5 depthwise-separable convolution, max pooling, 3×3 average pooling); the dimension of the vector corresponds to the search space, and 6 means that there are 6 operations to choose from in the search space.
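As a concrete illustration of the sampling just described, the following Python sketch maps a hidden state to a 6-dimensional vector at each time step, applies softmax to obtain a probability distribution over the six operations, and samples one operation per layer. The simplified cell (fake_lstm_cell) and the weight matrix W_enc are hypothetical placeholders; a real controller would use an actual LSTM cell with its own trainable parameters.

```python
import numpy as np

# Hypothetical sketch of per-layer operation sampling: hidden state -> 6-dim vector
# -> softmax -> probability distribution -> sampled operation (one time step per layer).
OPS = ["conv3x3", "conv5x5", "sep3x3", "sep5x5", "max_pool", "avg_pool3x3"]
NUM_LAYERS, HIDDEN = 20, 32
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((HIDDEN, len(OPS))) * 0.1    # encoding of the hidden state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fake_lstm_cell(prev_hidden, prev_input):
    """Greatly simplified stand-in for an LSTM cell (a real cell also keeps a cell state)."""
    return np.tanh(0.5 * prev_hidden + 0.5 * prev_input)

hidden = np.zeros(HIDDEN)
inp = np.zeros(HIDDEN)
architecture = []
for t in range(NUM_LAYERS):                      # one time step per operation layer
    hidden = fake_lstm_cell(hidden, inp)
    probs = softmax(hidden @ W_enc)              # probability of each of the 6 operations
    op_idx = rng.choice(len(OPS), p=probs)       # sample the operation for layer t
    architecture.append(OPS[op_idx])
    inp = np.full(HIDDEN, op_idx / len(OPS))     # the sampled operation feeds the next step
```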
Optionally, in the embodiments of this application, S311 and S312 in S310 for training the first network structure can be looped m times, that is, S311 and S312 are executed repeatedly m times, so as to obtain m feedback amounts of the first network structure, and the target feedback amount is determined according to the m feedback amounts to update the first network structure.
It can be understood that in each of the m loops one second network structure can be sampled; after this network structure has been tested on one batch of test data from the test set, one feedback amount for the first network structure can be obtained, and after a sufficient number of loops the corresponding number of feedback amounts of the first network structure can be obtained.
Optionally, the value of m can be set according to the actual application; for example, m can be set to 20, that is, there are 20 feedback amounts of the first network structure, or m can also be set to 10, 15 or another value. The specific value of m is not limited here.
Optionally, the first network structure can be constructed according to a long short-term memory network model, in which case step S310 may include: determining the target feedback amount of the first network structure in S310 according to the m feedback amounts of the first network structure, the parameters of the long short-term memory artificial neural network, and the sampled operations. Specifically, this can be achieved through the following formula (4):
    ∇_{θ_c} J(θ_c) ≈ (1/m)·Σ_{k=1}^{m} Σ_{t=1}^{T} ∇_{θ_c} log P(a_t | a_{(t-1):1}; θ_c)·R_k    (4)
where R_k is the k-th feedback amount; θ_c denotes the parameters of the long short-term memory artificial neural network; a_t is the operation sampled at the t-th operation layer;
    P(a_t | a_{(t-1):1}; θ_c)
is the probability of sampling that operation; m is the total number of feedback amounts, that is, m is the number of loops of steps S311 and S312 described above; and T is the number of hyperparameters predicted by the first network structure. In this way, the first network structure can be updated according to the average of the multiple feedback amounts.
Optionally, T includes the operation layers and the skip connections, or it may also include other hyperparameters to be optimized. The specific content of T is not limited here.
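Under the assumption that formula (4) is the usual policy-gradient (REINFORCE) estimate, the following Python sketch shows how the m feedback amounts and the log-probabilities of the sampled operations could be combined into an averaged update of the controller parameters θ_c. The per-layer softmax policy, the random stand-in feedback and all names are hypothetical; they only illustrate the structure of the update, not the embodiments' implementation.

```python
import numpy as np

# Hypothetical sketch of the update behind formula (4): average, over m sampled
# architectures, the sum over T layers of grad log P(a_t) weighted by the feedback R_k.
rng = np.random.default_rng(0)
NUM_OPS, T, M, LR = 6, 20, 20, 0.01
theta_c = rng.standard_normal((T, NUM_OPS)) * 0.01       # toy stand-in for the controller parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_architecture():
    """Sample one operation per layer and accumulate grad_theta log P(a_t) along the way."""
    logp_grad = np.zeros_like(theta_c)
    for t in range(T):
        p = softmax(theta_c[t])
        a_t = rng.choice(NUM_OPS, p=p)
        grad_t = -p
        grad_t[a_t] += 1.0                                # d log softmax / d logits at the sampled op
        logp_grad[t] = grad_t
    return logp_grad

grad_sum = np.zeros_like(theta_c)
for _ in range(M):                                        # m feedback amounts (S311 + S312)
    logp_grad = sample_architecture()                     # sample a second network structure
    R_k = float(rng.uniform(0.5, 1.0))                    # stand-in for the test-set feedback amount
    grad_sum += R_k * logp_grad

theta_c += LR * grad_sum / M                              # ascend the averaged estimate of formula (4)
```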
It should be understood that in the prior art a single-node computation manner is usually adopted, that is, the above steps S311 and S312 are executed once to obtain one feedback amount, then executed again to obtain another feedback amount, and so on, until S311 and S312 have been executed m times in total and m feedback amounts have been obtained, so that the first network structure can be updated according to formula (4).
For example, as shown in FIG. 10, after performing S200, the first network structure has completed the training of the general map, and S311 and S312 of S300 are then performed and repeated in a loop m times, correspondingly determining m feedback amounts, that is, set m in FIG. 10 includes m feedback amounts; the first network structure is then updated according to formula (4). Here, the parameter C_t of the updated first network structure obtained at time t is computed based on the parameter C_{t-1} of the first network structure at time t-1 before the update, which is expressed as the following formula (5):
    C_t = C_{t-1} - β·ΔC_{t-1}    (5)
where β is a coefficient.
Since this single-node computation manner can only obtain one feedback amount at a time and is inefficient, a parallel computation manner can be considered to improve efficiency, that is, S311 and S312 are executed multiple times simultaneously.
It should be understood that, according to formula (4), m feedback amounts need to be determined, and all or part of these m feedback amounts can be determined in parallel. Specifically, S311 and S312 can be executed multiple times simultaneously; assuming here that they are executed m' times, m' second network structures are determined simultaneously according to the first network structure, and m' batches of test data from the test set are used to predict the m' second network structures respectively, thereby determining m' feedback amounts. In this way, the step of updating the first network structure is accelerated by a factor of m'; moreover, updating the first network structure in this manner is equally applicable to formula (5) above and has no effect on the update result.
Optionally, m' can be set to any positive integer greater than 1 according to the actual application. For example, considering that m feedback amounts need to be determined according to formula (4), m' can be set equal to m; or, considering other factors, m' can also be set such that m is an integer multiple of m'; the embodiments of this application are not limited thereto.
Specifically, as shown in FIG. 11, assuming that m' is set equal to m, m feedback amounts can be determined simultaneously with the data in the test set, that is, m feedback amounts can be obtained in set m in FIG. 11, and the average of these m feedback amounts is taken according to formula (4) to update the first network structure. Compared with FIG. 10, updating in this manner can increase the speed by a factor of m.
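To illustrate the parallel determination of the m' feedback amounts, the following Python sketch evaluates m' independently sampled second network structures on separate batches of test data at the same time and then averages the results as in formula (4). The thread pool and the stand-in reward function are hypothetical; in practice each evaluation would run a real prediction, for example on a GPU.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of determining m' feedback amounts in parallel: each worker samples
# a second network structure and evaluates it on one batch of test data.
NUM_OPS, T, M_PRIME = 6, 20, 20

def one_feedback(seed):
    rng = np.random.default_rng(seed)
    architecture = rng.integers(0, NUM_OPS, size=T)       # S311: sample a second network structure
    test_batch = rng.standard_normal(64)                  # one batch of test data
    return float(np.cos(architecture.sum()) + test_batch.mean())  # stand-in for its feedback amount

with ThreadPoolExecutor(max_workers=M_PRIME) as pool:     # the m' evaluations run concurrently
    feedbacks = list(pool.map(one_feedback, range(M_PRIME)))

target_feedback = float(np.mean(feedbacks))               # averaged, consistent with formula (4)
```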
In the embodiments of this application, updating the first network structure according to the feedback amount of the first network structure in step S310 can also be executed in a loop multiple times; that is to say, after S200, when the step S310 of training the first network structure is performed, the first network structure can be updated repeatedly multiple times, which reduces the randomness introduced by sampling and thereby optimizes the first network structure. The number of loops can be set according to the actual application, for example to 50, or to 10, 20, 30 or another value; the specific number of loops is not limited here.
In the embodiments of this application, S200 and S300 of the method 1000 can also be executed repeatedly multiple times, that is, the training of the general map and the updating of the first network structure are performed over multiple iterations. The number of iterations can be set according to the actual application, for example to 300 or 200 times, or to another value, so that a second network structure with good performance is finally obtained.
Therefore, the network structure search method of the embodiments of this application splits the sequentially computed ENAS into a parallel computation algorithm; for example, both the training process of the general map and the training process of the network structure can adopt parallel computation, which greatly improves the efficiency of network structure search while alleviating the bias brought by sequential computation. More efficient network structure design can thus be carried out for model design in any scenario, such as the mobile side or the server side.
It should be understood that in the various embodiments of this application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
The network structure search method according to the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 11; the network structure search apparatus according to an embodiment of this application is described below with reference to FIG. 12.
As shown in FIG. 12, a network structure search apparatus 400 according to an embodiment of this application includes: multiple processors 410 and a memory 420, where the memory 420 stores one or more programs which, when executed by the processors, cause the multiple processors 410 to perform the following steps: sampling a first general map through a first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs simultaneously so as to train the first general map, where the first general map is constructed according to the search space corresponding to the network structure search model, the search space includes multiple operations, and the first general map includes at least one of the multiple operations and the connections between operations; training one of the multiple subgraphs includes: training the subgraph with one batch of training data from a training set; updating the parameters of the first general map according to the parameters obtained by training the multiple subgraphs, so as to generate a second general map; determining the feedback amount of the first network structure according to the second general map; and updating the first network structure according to the feedback amount of the first network structure.
It should be understood that each of the multiple processors 410 in the embodiments of this application can be used to generate one of the multiple subgraphs and can also be used to train that subgraph; that is, the multiple processors 410 of the embodiments of this application can implement the simultaneous training of the corresponding multiple subgraphs, in other words, train multiple subgraphs in parallel.
Optionally, as an embodiment, at least one of the multiple processors 410 can be used to: if at least two of the parameters obtained by training the multiple subgraphs correspond to the same operation in the first general map, determine a target parameter according to the at least two parameters, and update the parameter of the same operation with the target parameter.
Optionally, as an embodiment, the at least one processor can specifically be used to: determine the average of the at least two parameters as the target parameter.
Optionally, as an embodiment, the at least one processor can specifically be used to: determine the target parameter according to the weights of the at least two parameters.
Optionally, as an embodiment, the multiple processors 410 are used to: generate a target general map after the training of the first general map and the updating of the parameters of the first general map to generate the second general map have been executed in a loop multiple times; and determine the feedback amount of the first network structure according to the first network structure and the parameters of the target general map.
Optionally, as an embodiment, the multiple processors 410 are used to: determine m feedback amounts of the first network structure, where m is a positive integer greater than 1, and determining any one of the m feedback amounts includes: according to the first network structure, sampling one operation in the search space for each operation layer to obtain a second network structure, and predicting the second network structure with one batch of test data from a test set to determine one feedback amount of the first network structure; and determine a target feedback amount of the first network structure according to the m feedback amounts of the first network structure; where updating the first network structure according to the feedback amount of the first network structure includes: updating the first network structure according to the target feedback amount.
It should be understood that the m feedback amounts can be determined by different ones of the multiple processors 410 or by the same processor; the embodiments of this application are not limited thereto.
Optionally, as an embodiment, the multiple processors 410 are used to: simultaneously determine m' feedback amounts of the first network structure, where 1 < m' ≤ m and m' is a positive integer. That is, each of the multiple processors 410 can correspondingly determine one feedback amount, and m' of the multiple processors 410 can determine m' feedback amounts simultaneously.
Optionally, as an embodiment, the first network structure is constructed according to a long short-term memory network model, and the multiple processors 410 are used to: determine the target feedback amount of the first network structure according to the m feedback amounts of the first network structure, the parameters of the long short-term memory artificial neural network, and the sampled operations. For example, the target feedback amount of the first network structure can be determined according to formula (4) above.
Optionally, as an embodiment, the multiple processors 410 execute the determination of the feedback amount of the first network structure and the updating of the first network structure in a loop multiple times.
It should be understood that the number of the multiple processors 410 in the embodiments of this application can be two or any number greater than two. Among the multiple processors 410, the steps of the network structure search method of the embodiments of this application can be executed by different processors 410 respectively; for example, step S200 and step S300 can be executed by different processors 410.
Optionally, the network structure search apparatus 400 of the embodiments of this application may further include a communication interface 430, configured to output data processed by the apparatus 400 and/or to input data to be processed into the apparatus 400 from an external device. For example, at least one of the multiple processors 410 can be used to control the communication interface 430 to input and/or output data.
In the description of this application, it should be understood that the terms "first" and "second" are used only for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of such features. In the description of this application, "multiple" means two or more, unless otherwise specifically defined.
It should be understood that the apparatus of each embodiment of this application can be implemented based on a memory and a processor, where each memory is used to store instructions for executing the method of each embodiment of this application, and the processor executes the above instructions so that the apparatus performs the method of each embodiment of this application.
It should be understood that the processor mentioned in the embodiments of this application may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may also be any conventional processor, or the like.
It should also be understood that the memory mentioned in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM) and direct rambus random access memory (Direct Rambus RAM, DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated in the processor.
It should be noted that the memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
An embodiment of this application further provides a computer-readable storage medium on which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the methods of the above method embodiments.
An embodiment of this application further provides a computing device, which includes the above computer-readable storage medium.
The embodiments of this application can be applied to aircraft, especially in the field of unmanned aerial vehicles.
It should be understood that the division of circuits, sub-circuits and sub-units in the embodiments of this application is merely illustrative. A person of ordinary skill in the art will appreciate that the circuits, sub-circuits and sub-units of the examples described in the embodiments disclosed herein can be further split or combined.
In the above embodiments, the implementation may be wholly or partly realized by software, hardware, firmware or any combination thereof. When implemented by software, it may be wholly or partly realized in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (for example, coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or wirelessly (for example, infrared, radio, microwave, etc.). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a high-density digital video disc (Digital Video Disc, DVD)), or a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)), and the like.
It should be understood that the embodiments of this application are described with a total bit width of 16 bits as an example, and the embodiments of this application are applicable to other bit widths.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of this application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics can be combined in one or more embodiments in any suitable manner.
It should be understood that in the various embodiments of this application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
It should be understood that in the embodiments of this application, "B corresponding to A" means that B is associated with A, and B can be determined according to A. However, it should also be understood that determining B according to A does not mean determining B only according to A; B can also be determined according to A and/or other information.
It should be understood that the term "and/or" herein merely describes an association relationship of associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods for each particular application to implement the described functions, but such implementation should not be considered to be beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in this application, and they should all be covered within the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims (18)

  1. A network structure search method, characterized by comprising:
    sampling a first general map through a first network structure in a network structure search model to generate multiple subgraphs, and training the multiple subgraphs simultaneously so as to train the first general map, wherein the first general map is constructed according to the search space corresponding to the network structure search model, the search space comprises multiple operations, and the first general map comprises at least one of the multiple operations and the connections between operations; training one of the multiple subgraphs comprises: training the subgraph with one batch of training data from a training set;
    updating parameters of the first general map according to the parameters obtained by training the multiple subgraphs, so as to generate a second general map;
    determining a feedback amount of the first network structure according to the second general map;
    updating the first network structure according to the feedback amount of the first network structure.
  2. The method according to claim 1, characterized in that the updating according to the parameters obtained by training the multiple subgraphs comprises:
    if at least two of the parameters obtained by training the multiple subgraphs correspond to the same operation in the first general map, determining a target parameter according to the at least two parameters, and updating the parameter of the same operation with the target parameter.
  3. The method according to claim 2, characterized in that determining the target parameter according to the at least two parameters comprises:
    determining the average of the at least two parameters as the target parameter.
  4. The method according to claim 2, characterized in that determining the target parameter according to the at least two parameters comprises:
    determining the target parameter according to weights of the at least two parameters.
  5. The method according to any one of claims 1 to 4, characterized in that determining the feedback amount of the first network structure according to the second general map comprises:
    generating a target general map after the training of the first general map and the updating of the parameters of the first general map to generate the second general map have been executed in a loop multiple times;
    determining the feedback amount of the first network structure according to the first network structure and parameters of the target general map.
  6. The method according to any one of claims 1 to 5, characterized in that determining the feedback amount of the first network structure comprises:
    determining m feedback amounts of the first network structure, wherein m is a positive integer greater than 1, and determining any one of the m feedback amounts comprises: according to the first network structure, sampling one operation in the search space for each operation layer to obtain a second network structure, and predicting the second network structure with one batch of test data from a test set to determine one feedback amount of the first network structure;
    determining a target feedback amount of the first network structure according to the m feedback amounts of the first network structure;
    wherein updating the first network structure according to the feedback amount of the first network structure comprises:
    updating the first network structure according to the target feedback amount.
  7. The method according to claim 6, characterized in that determining the m feedback amounts of the first network structure comprises:
    simultaneously determining m' feedback amounts of the first network structure, wherein 1 < m' ≤ m, and m' is a positive integer.
  8. The method according to claim 6 or 7, characterized in that the first network structure is constructed according to a long short-term memory network model, and
    determining the target feedback amount of the first network structure according to the m feedback amounts of the first network structure comprises:
    determining the target feedback amount of the first network structure according to the m feedback amounts of the first network structure, parameters of the long short-term memory artificial neural network, and the sampled operations.
  9. The method according to any one of claims 1 to 8, characterized in that the determining of the feedback amount of the first network structure and the updating of the first network structure are executed in a loop multiple times.
  10. A network structure search apparatus, characterized by comprising multiple processors and a memory, the memory storing one or more programs which, when executed by the multiple processors, cause the multiple processors respectively to:
    sample a first general map through a first network structure in a network structure search model to generate multiple subgraphs, and train the multiple subgraphs simultaneously so as to train the first general map, wherein the first general map is constructed according to the search space corresponding to the network structure search model, the search space comprises multiple operations, and the first general map comprises at least one of the multiple operations and the connections between operations; training one of the multiple subgraphs comprises: training the subgraph with one batch of training data from a training set;
    update parameters of the first general map according to the parameters obtained by training the multiple subgraphs, so as to generate a second general map;
    determine a feedback amount of the first network structure according to the second general map;
    update the first network structure according to the feedback amount of the first network structure.
  11. The apparatus according to claim 10, characterized in that at least one of the multiple processors is configured to:
    if at least two of the parameters obtained by training the multiple subgraphs correspond to the same operation in the first general map, determine a target parameter according to the at least two parameters, and update the parameter of the same operation with the target parameter.
  12. The apparatus according to claim 11, characterized in that the at least one processor is configured to:
    determine the average of the at least two parameters as the target parameter.
  13. The apparatus according to claim 11, characterized in that the at least one processor is configured to:
    determine the target parameter according to weights of the at least two parameters.
  14. The apparatus according to any one of claims 10 to 13, characterized in that the multiple processors are configured to:
    generate a target general map after the training of the first general map and the updating of the parameters of the first general map to generate the second general map have been executed in a loop multiple times;
    determine the feedback amount of the first network structure according to the first network structure and parameters of the target general map.
  15. The apparatus according to any one of claims 10 to 14, characterized in that the multiple processors are configured to:
    determine m feedback amounts of the first network structure, wherein m is a positive integer greater than 1, and determining any one of the m feedback amounts comprises: according to the first network structure, sampling one operation in the search space for each operation layer to obtain a second network structure, and predicting the second network structure with one batch of test data from a test set to determine one feedback amount of the first network structure;
    determine a target feedback amount of the first network structure according to the m feedback amounts of the first network structure;
    wherein updating the first network structure according to the feedback amount of the first network structure comprises:
    updating the first network structure according to the target feedback amount.
  16. The apparatus according to claim 15, characterized in that the multiple processors are configured to:
    simultaneously determine m' feedback amounts of the first network structure, wherein 1 < m' ≤ m, and m' is a positive integer.
  17. The apparatus according to claim 15 or 16, characterized in that the first network structure is constructed according to a long short-term memory network model, and
    the multiple processors are configured to:
    determine the target feedback amount of the first network structure according to the m feedback amounts of the first network structure, parameters of the long short-term memory artificial neural network, and the sampled operations.
  18. The apparatus according to any one of claims 10 to 17, characterized in that the multiple processors execute the determination of the feedback amount of the first network structure and the updating of the first network structure in a loop multiple times.
PCT/CN2020/073674 2020-01-22 2020-01-22 网络结构搜索方法和装置 WO2021146977A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/073674 WO2021146977A1 (zh) 2020-01-22 2020-01-22 网络结构搜索方法和装置
CN202080004032.2A CN112513837A (zh) 2020-01-22 2020-01-22 网络结构搜索方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/073674 WO2021146977A1 (zh) 2020-01-22 2020-01-22 网络结构搜索方法和装置

Publications (1)

Publication Number Publication Date
WO2021146977A1 true WO2021146977A1 (zh) 2021-07-29

Family

ID=74953034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/073674 WO2021146977A1 (zh) 2020-01-22 2020-01-22 网络结构搜索方法和装置

Country Status (2)

Country Link
CN (1) CN112513837A (zh)
WO (1) WO2021146977A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826921A (zh) * 2022-05-05 2022-07-29 苏州大学应用技术学院 基于抽样子图的网络资源动态分配方法、系统及介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175671A (zh) * 2019-04-28 2019-08-27 华为技术有限公司 神经网络的构建方法、图像处理方法及装置
US20190286984A1 (en) * 2018-03-13 2019-09-19 Google Llc Neural architecture search by proxy
CN110428046A (zh) * 2019-08-28 2019-11-08 腾讯科技(深圳)有限公司 神经网络结构的获取方法及装置、存储介质

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190286984A1 (en) * 2018-03-13 2019-09-19 Google Llc Neural architecture search by proxy
CN110175671A (zh) * 2019-04-28 2019-08-27 华为技术有限公司 神经网络的构建方法、图像处理方法及装置
CN110428046A (zh) * 2019-08-28 2019-11-08 腾讯科技(深圳)有限公司 神经网络结构的获取方法及装置、存储介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Overview of NAS (Neural Structure Search)", SIGAI-ARTIFICIAL INTELLIGENCE TECHNICAL ARTICLE, 21 October 2019 (2019-10-21), pages 1 - 21, XP055832251, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/60414004> [retrieved on 20210816] *
BARRET ZOPH, QUOC V. LE: "Neural Architecture Search with Reinforcement Learning", 15 February 2017 (2017-02-15), XP055444384, Retrieved from the Internet <URL:https://arxiv.org/pdf/1611.01578.pdf> [retrieved on 20180125] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826921A (zh) * 2022-05-05 2022-07-29 苏州大学应用技术学院 基于抽样子图的网络资源动态分配方法、系统及介质
CN114826921B (zh) * 2022-05-05 2024-05-17 苏州大学应用技术学院 基于抽样子图的网络资源动态分配方法、系统及介质

Also Published As

Publication number Publication date
CN112513837A (zh) 2021-03-16

Similar Documents

Publication Publication Date Title
CN111126574B (zh) 基于内镜图像对机器学习模型进行训练的方法、装置和存储介质
WO2022141869A1 (zh) 模型训练方法、调用方法、装置、计算机设备和存储介质
CN110929047A (zh) 关注邻居实体的知识图谱推理方法和装置
CN113168559A (zh) 机器学习模型的自动化生成
CN111406264A (zh) 神经架构搜索
US20190138929A1 (en) System and method for automatic building of learning machines using learning machines
CN113987119A (zh) 一种数据检索方法、跨模态数据匹配模型处理方法和装置
US20230153513A1 (en) Method and system for using deep learning to improve design verification by optimizing code coverage, functional coverage, and bug detection
WO2021146977A1 (zh) 网络结构搜索方法和装置
US10990525B2 (en) Caching data in artificial neural network computations
WO2020237689A1 (zh) 网络结构搜索的方法及装置、计算机存储介质和计算机程序产品
CN114399025A (zh) 一种图神经网络解释方法、系统、终端以及存储介质
KR20220134627A (ko) 하드웨어-최적화된 신경 아키텍처 검색
KR102620875B1 (ko) 심층 신경망 기반 영상 스티칭 방법 및 장치
KR102561799B1 (ko) 디바이스에서 딥러닝 모델의 레이턴시를 예측하는 방법 및 시스템
CN114638823B (zh) 基于注意力机制序列模型的全切片图像分类方法及装置
KR20200090061A (ko) 인공신경망 모델의 검증 방법 및 장치
WO2021081809A1 (zh) 网络结构搜索的方法、装置、存储介质和计算机程序产品
CN114692808A (zh) 图神经网络传播模型确定方法和系统
WO2020237687A1 (zh) 网络结构搜索的方法及装置、计算机存储介质和计算机程序产品
WO2020237688A1 (zh) 网络结构搜索的方法及装置、计算机存储介质和计算机程序产品
EP3895024A1 (en) Caching data in artificial neural network computations
CN115858821B (zh) 知识图谱处理方法、装置及知识图谱处理模型的训练方法
WO2023231796A1 (zh) 一种视觉任务处理方法及其相关设备
CN116522999B (zh) 模型搜索与时延预测器训练方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915252

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20915252

Country of ref document: EP

Kind code of ref document: A1