CN112106077A - Method, apparatus, storage medium, and computer program product for network structure search - Google Patents

Method, apparatus, storage medium, and computer program product for network structure search

Info

Publication number
CN112106077A
CN112106077A (Application number CN201980031708.4A)
Authority
CN
China
Prior art keywords
training
network structure
jumper
graph
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980031708.4A
Other languages
Chinese (zh)
Inventor
蒋阳
庞磊
胡湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN112106077A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a network structure searching method comprising the following steps. A general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph. A network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term. In the training of a neural network, different training stages have different requirements on jumper density. Because the jumper constraint term in this method is determined based on the feedback result of the current training stage, the jumper density is tied to the current training stage, so the controller's current jumper density better matches that stage, and training efficiency can be improved while a good training result is still obtained.

Description

Method, apparatus, storage medium, and computer program product for network structure search
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office official file or records.
Technical Field
The present application relates to the field of Artificial Intelligence (AI), and more particularly, to a method, apparatus, storage medium, and computer program product for network structure search.
Background
The neural network is the basis of AI, and as the performance of neural networks continues to improve, their network structures become more and more complex. A neural network can only be used after it has been trained; training mainly adjusts the operations in each layer of the neural network and the connection relations among the layers so that the neural network outputs correct results. The connection relations may also be referred to as skips or shortcuts (i.e., jumpers).
One way to train a neural network is Efficient Neural Architecture Search (ENAS). When training with ENAS, a controller repeatedly samples the network structure of the neural network, tries the influence of different network structures on the output result, and uses the output result of the previously sampled structure to determine the next structure to sample, until the neural network converges.
Compared with manually debugging a neural network, training with ENAS improves training efficiency; nevertheless, the efficiency of training a neural network with ENAS still needs to be improved.
Disclosure of Invention
Embodiments of the present application provide a method and apparatus for network structure search, a computer storage medium, and a computer program product.
In a first aspect, a network structure searching method is provided, including a general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph; and a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
The method can be applied to a chip, a mobile terminal, or a server. During the training of a neural network, different training stages have different requirements on jumper density. For example, in the initial training stage of some neural networks, a controller with a high jumper density is required so that the search space is explored as fully as possible and a network structure deviation (bias) is avoided; in the later training stage, the randomness of the neural network is greatly reduced, so a controller with a low jumper density can be used, without exploring the entire search space, in order to reduce the consumption of resources (including computing resources and time). Because the jumper constraint term in this method is determined based on the feedback result of the current training stage, it is tied to that stage, so the controller's current jumper density better matches the current training stage. A good training result can therefore be obtained while resource consumption is reduced and training efficiency is improved, which makes the method particularly suitable for mobile devices.
In a second aspect, a network structure searching apparatus is provided, which is configured to perform the method in the first aspect.
In a third aspect, there is provided a network structure search apparatus, the apparatus comprising a memory for storing instructions and a processor for executing the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method of the first aspect.
In a fourth aspect, a chip is provided, where the chip includes a processing module and a communication interface, the processing module is configured to control the communication interface to communicate with the outside, and the processing module is further configured to implement the method of the first aspect.
In a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to carry out the method of the first aspect.
In a sixth aspect, there is provided a computer program product containing instructions which, when executed by a computer, cause the computer to carry out the method of the first aspect.
Drawings
Fig. 1 is a schematic flow chart of a network structure searching method provided in the present application;
FIG. 2 is a schematic diagram of a general graph and a subgraph provided in the present application;
FIG. 3 is a schematic flow chart diagram of another network structure searching method provided by the present application;
FIG. 4 is a schematic flow chart illustrating a further method for searching a network structure provided by the present application;
fig. 5 is a schematic diagram of a network structure searching apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In the description of the present application, it is to be understood that the terms "first" and "second" are used only to distinguish different objects and are not to be construed as implying any other limitation. For example, "first general graph" and "second general graph" merely denote two different general graphs; no other limitation is implied. Further, in the description of the present application, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the term "connected" is to be interpreted broadly: it may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection, an electrical connection, or a communication connection; and it may be a direct connection or an indirect connection through an intermediate medium, or an internal connection between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The following disclosure provides many different embodiments or examples for implementing the solution of the present application. To simplify the disclosure of the present application, specific components or steps are described in the examples below. Of course, the following examples are not intended to limit the present application. Moreover, the present application may repeat reference numerals and/or letters in the various examples, which repeat reference numerals and/or letters in the various examples for purposes of brevity and clarity and do not represent a relationship between the various embodiments and/or arrangements discussed.
Embodiments of the present application will be described in detail below, and the embodiments described below are exemplary and are only intended to explain the present application, and should not be construed as limiting the present application.
In recent years, machine learning algorithms, especially deep learning algorithms, have developed rapidly and are widely used. As model performance continues to improve, model structures are also becoming more complex. In non-automated machine learning, these structures must be manually designed and debugged by machine learning experts, which is a very cumbersome process. Moreover, as application scenarios and model structures become more complex, obtaining an optimal model for a given application scenario becomes more difficult. Against this background, automated machine learning (AutoML) algorithms, and Neural Architecture Search (NAS) in particular, have attracted great interest in both academia and industry.
In particular, network structure search is a technique for automatically designing neural network models using algorithms. The network structure search is to search the structure of the neural network model. In the embodiment of the present application, the Neural network model to be subjected to the network structure search is a Convolutional Neural Network (CNN).
The problem to be solved by network structure search is to determine the operations between nodes in the neural network model. Different combinations of operations between nodes correspond to different network architectures. Further, the nodes in the neural network model may be understood as feature layers in the neural network model. An operation between two nodes refers to an operation required for the transformation of the characteristic data on one of the nodes into the characteristic data on the other node. The operations referred to herein may be convolution operations, pooling operations, or other neural network operations such as fully-connected operations. Operations between two nodes can be considered to constitute an operational layer between the two nodes. Typically, there are multiple operations available for searching, i.e., there are multiple candidate operations, at the operational level between two nodes. The purpose of the network structure search is to determine an operation at each operational level.
For example, conv3 × 3, conv5 × 5, depthwise3 × 3, depthwise5 × 5, maxpool3 × 3, and averagepool3 × 3 are defined as the search space. That is, the operation of each layer of the network structure is sampled from these six choices.
As shown in fig. 1, after establishing a search space, NAS typically utilizes a controller (itself a neural network) to sample a network structure a from the search space, and then trains a child network with architecture a to determine a feedback quantity such as its accuracy R, where the accuracy may also be referred to as a predicted value; subsequently, the gradient of the sampling probability p is calculated and scaled by R to update the controller, i.e., the controller is updated with R as the feedback (reward).
Then the updated controller samples a new network structure from the search space, and the above steps are repeated until a converged sub-network is obtained.
In the example of FIG. 1, the controller may be a recurrent neural network (RNN), a CNN, or a long short-term memory (LSTM) neural network. The present application does not limit the specific form of the controller.
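By way of illustration only, the following Python sketch mimics the sampling-and-feedback loop described above with a toy controller that keeps one categorical distribution per layer rather than an RNN/LSTM; the six-operation search space and the train_and_evaluate placeholder are assumptions made for this sketch, not part of the described scheme.

```python
import numpy as np

# Hypothetical six-operation search space (mirrors the example above).
SEARCH_SPACE = ["conv3x3", "conv5x5", "depthwise3x3",
                "depthwise5x5", "maxpool3x3", "averagepool3x3"]
NUM_LAYERS = 4
rng = np.random.default_rng(0)

# Toy "controller": one softmax distribution per layer (a real controller is an RNN/LSTM).
logits = np.zeros((NUM_LAYERS, len(SEARCH_SPACE)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_and_evaluate(architecture):
    # Placeholder for "train the sub-network with architecture a and measure accuracy R".
    return rng.uniform(0.5, 0.9)

learning_rate = 0.1
for step in range(100):
    # 1) Sample a network structure a from the search space.
    probs = [softmax(row) for row in logits]
    arch = [rng.choice(len(SEARCH_SPACE), p=p) for p in probs]
    # 2) Obtain the feedback (accuracy) R for the sampled structure.
    R = train_and_evaluate([SEARCH_SPACE[i] for i in arch])
    # 3) REINFORCE-style update: scale the log-probability gradient by R.
    for layer, op in enumerate(arch):
        grad = -probs[layer]
        grad[op] += 1.0                      # d log P(op) / d logits
        logits[layer] += learning_rate * R * grad
```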
However, training every sampled subnetwork structure to convergence is time consuming. Therefore, various methods for improving NAS efficiency have appeared in the related art, such as efficient architecture search by network transformation, and weight-sharing-based ENAS (efficient neural architecture search via parameter sharing). Among these, weight-sharing-based ENAS is widely used.
As shown in fig. 2, the general graph is composed of the operations represented by the nodes, which may be all operations of the search space, and the jumpers between those operations. When using weight-sharing-based ENAS, the controller may search for a network structure in the search space, for example by determining the operation of each node and the connection relationships between nodes, and then determine a sub-graph from the general graph based on the searched network structure. The operations connected by the bold arrows in fig. 2 are an example of a final subgraph, where node 1 is the input node of the final subgraph and nodes 3 and 6 are its output nodes.
When using weight-sharing-based ENAS, after a network structure is sampled, a sub-graph is determined from the general graph based on that structure; instead of training the sub-graph to convergence, a small batch of data is used to train it once, for example by updating the parameters of the sub-graph with the back propagation (BP) algorithm, which completes one iteration. Since the sub-graph belongs to the general graph, updating the parameters of the sub-graph is equivalent to updating the parameters of the general graph. After multiple iterations, the general graph may eventually converge. Note that convergence of the general graph is not equivalent to convergence of a sub-graph.
After the general graph is trained, its parameters may be fixed and the controller then trained. For example, the controller may search the search space for a network structure, obtain a sub-graph from the general graph based on that structure, input test data (validation data) into the sub-graph to obtain a predicted value, and update the controller with the predicted value.
Weight-sharing-based ENAS shares the parameters that can be shared each time a network structure is searched, thereby improving search efficiency. For example, in the example shown in fig. 2, if the previous search selected node 1, node 3, and node 6 and the resulting network structure was trained, and the current search selects node 1, node 2, node 3, and node 6, then the parameters related to node 1, node 3, and node 6 obtained last time can be reused when training the currently searched network structure. Efficiency is thus improved through weight sharing.
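The parameter-sharing idea can be illustrated with the following sketch, which assumes the general graph stores one weight tensor per (layer, operation) pair; the tensor shapes, learning rate, and the placeholder gradient are illustrative assumptions.

```python
import numpy as np

NUM_LAYERS, NUM_OPS = 4, 6
rng = np.random.default_rng(0)

# Shared parameter store of the general graph: one weight tensor per (layer, operation).
shared_weights = {(l, o): rng.normal(size=(8, 8))
                  for l in range(NUM_LAYERS) for o in range(NUM_OPS)}

def train_subgraph(architecture, batch):
    """Update only the parameters touched by the sampled subgraph.

    Because these tensors belong to the general graph, updating the
    subgraph is equivalent to updating the general graph."""
    for layer, op in enumerate(architecture):
        w = shared_weights[(layer, op)]
        grad = rng.normal(size=w.shape)   # placeholder for the BP gradient on `batch`
        w -= 0.01 * grad                  # in-place update of the shared tensor

arch_a = [0, 2, 3, 5]   # operations selected by the previous search
arch_b = [0, 1, 3, 5]   # this search: ops 0, 3, 5 reuse the weights trained above
train_subgraph(arch_a, batch=None)
train_subgraph(arch_b, batch=None)
```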
ENAS can improve the efficiency of NAS by a factor of more than 1000, but the following problems arise in actual use. The predicted value of the sub-graph keeps changing: as training progresses, the sub-graphs determined by the controller are predicted more and more accurately, i.e., the predicted value of the sub-graph gradually increases. Meanwhile, the coefficient of the jumper constraint term in the controller's parameter update formula is fixed, so the constraint strength produced by the jumper constraint term keeps decreasing as sub-graph training progresses. The jumper constraint term reflects the jumper density of the controller; a constraint strength that keeps decreasing means that the jumper density of the controller keeps increasing, and excessive jumpers increase the number of floating-point operations (FLOPs) of the controller, which reduces the update efficiency of the controller and affects the efficiency of determining the final sub-graph. In addition, if the initial value of the jumper constraint term is set to a large value, the predicted value of the sub-graph is small in the initial stage of general graph training, so the constraint strength of the jumper constraint term is too large, the search space cannot be fully explored when the controller is updated, and a large deviation (bias) appears in the network structure produced by the controller.
Based on this, the present application provides a network structure searching method, as shown in fig. 3, the method including:
a general graph (whole graph) training step S12: training a first general graph according to a first network structure and training data to generate a second general graph;
a network structure training step S14: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
The first network structure may be the network structure of the controller in any one training phase, for example, the first network structure may be the network structure of the controller that has never been updated, or the first network structure may be the network structure of the controller that has been updated several times.
In this application, "number" refers to at least one, e.g., number of times refers to at least one, and number of test subgraphs refers to at least one test subgraph.
The first general graph may be a neural network with a preset number of layers. For example, if the preset number of layers is 4 and each layer corresponds to a search space containing 6 operations, the first general graph is a network structure formed by 24 operations and the connection relations between them, with 6 operations in each layer.
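For illustration, the first general graph of this example (4 layers, 6 candidate operations per layer, with a jumper possible between any earlier layer and any later layer) could be represented as follows; the data structure is an assumption made only for this sketch.

```python
from itertools import combinations

OPS = ["conv3x3", "conv5x5", "depthwise3x3",
       "depthwise5x5", "maxpool3x3", "averagepool3x3"]
NUM_LAYERS = 4

# The first general graph: every layer contains all candidate operations,
# and every earlier layer may be connected to every later layer by a jumper.
general_graph = {
    "operations": {layer: list(OPS) for layer in range(NUM_LAYERS)},
    "possible_jumpers": [(i, j) for i, j in combinations(range(NUM_LAYERS), 2)],
}
print(len(general_graph["operations"]) * len(OPS))    # 24 operations in total
print(len(general_graph["possible_jumpers"]))         # number of connectable jumpers
```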
The second general graph need not be a converged graph; however, after training on the training data, its randomness is typically lower than that of the first general graph.
When training the first general graph, a first training subgraph may be determined within the first general graph through the first network structure; a batch of the training data is input into the first training subgraph to generate a first training result; and the first general graph is trained according to the first training result to generate the second general graph. For example, the first general graph may be trained using the first training result and the back propagation (BP) algorithm.
The first training subgraph may be determined within the first general graph, and the first network structure updated, using the method shown in fig. 2. After the general graph training step and the network structure training step are executed cyclically a number of times, a final general graph and a final network structure (i.e., a final controller) are generated; a final sub-graph is then determined in the final general graph through the final network structure, and this final sub-graph is the network structure that fits the preset scenario.
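The alternation between the two steps can be sketched as below; the Controller and GeneralGraph classes, the batch objects, and the stand-in constraint value are illustrative placeholders rather than the actual implementation.

```python
import random

class Controller:
    """Toy stand-in for the first network structure (the controller)."""
    def sample(self):
        return [random.randrange(6) for _ in range(4)]   # one op index per layer
    def update(self, feedbacks, constraint):
        pass                                             # placeholder for formula (1)

class GeneralGraph:
    """Toy stand-in for the general graph with shared parameters."""
    def train_on(self, subgraph, batch):
        pass                                             # placeholder for the BP update
    def evaluate(self, subgraph, batch):
        return random.random()                           # placeholder feedback result

controller, graph = Controller(), GeneralGraph()
train_batches, test_batches = [object()] * 8, [object()] * 2

for epoch in range(3):
    # General graph training step: produces the "second general graph".
    for batch in train_batches:
        graph.train_on(controller.sample(), batch)
    # Network structure training step: feedback results -> jumper constraint -> update.
    feedbacks = [graph.evaluate(controller.sample(), batch) for batch in test_batches]
    constraint = sum(feedbacks) / len(feedbacks)         # stands in for the constraint term
    controller.update(feedbacks, constraint)

final_subgraph = controller.sample()                     # final network structure
```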
Next, a process of updating the first network configuration will be described in detail.
The first network structure may be an LSTM neural network, the search space including, for example, conv3 x 3, conv5 x 5, depthwise3 x 3, depthwise5 x 5, maxpool3 x 3, and averagepool3 x 3.
If the preset number of layers of the network structure to be searched is 20, a 20-layer first general graph needs to be constructed, each layer of which contains all operations of the search space. Each layer of the network structure to be searched corresponds to one time step of the LSTM neural network; without considering jumpers, the LSTM neural network needs to execute 20 time steps. Each time a time step is executed, a cell of the LSTM neural network outputs a hidden state, which can be encoded and mapped into a vector of dimension 6, the vector corresponding to the search space and its 6 dimensions corresponding to the 6 operations of the search space. Subsequently, the vector is processed by the softmax function into a probability distribution, and the LSTM neural network samples according to this probability distribution to obtain the operation of the current layer of the network structure to be searched. The above process is repeated to obtain a network structure (i.e., a subgraph).
Fig. 4 shows an example of determining a network structure.
The blank rectangle represents a cell (cell) of the LSTM neural network, the square containing "conv 3 × 3" and the like represents the operation of the layer in the network structure to be searched, and the circle represents the connection relationship between layers.
The LSTM neural network may perform encoding (encoding) operation on the hidden state, map the hidden state to a vector (vector) with a dimension of 6, convert the vector into probability distribution through a normalized exponential function (softmax), and perform sampling according to the probability distribution to obtain the operation of the current layer.
For example, when performing the first time step, the input quantity (e.g., a random value) to the cell of the LSTM neural network is processed, and the resulting vector is normalized by the softmax function and sampled into an operation (conv3 × 3). When executing the second time step, conv3 × 3 is used as the input quantity of the cell, and the hidden state generated in the first time step is also used as an input; the two inputs are processed to obtain circle 1, which indicates that the output of the current operation layer (the layer corresponding to node 2) and the output of the first operation layer (the layer corresponding to node 1) are concatenated.
Similarly, when executing the third time step, circle 1 is used as the input quantity of the cell, and the hidden state generated in the second time step is also used as an input; the two inputs are processed to obtain sep5 × 5. And so on, until a complete network structure is obtained.
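A minimal PyTorch sketch of this sampling procedure is shown below; it assumes an nn.LSTMCell controller with a 6-way softmax head per time step and simplifies the connection decision to a single Bernoulli jumper choice per layer, which differs from the mechanism a full ENAS controller uses for connection relations.

```python
import torch
import torch.nn as nn

NUM_OPS, HIDDEN, NUM_LAYERS = 6, 32, 4
torch.manual_seed(0)

cell = nn.LSTMCell(HIDDEN, HIDDEN)            # one cell, reused at every time step
op_embedding = nn.Embedding(NUM_OPS, HIDDEN)  # feeds the previous decision back in
op_head = nn.Linear(HIDDEN, NUM_OPS)          # hidden state -> 6-dimensional vector
skip_head = nn.Linear(HIDDEN, 1)              # hidden state -> jumper decision (simplified)

h = torch.zeros(1, HIDDEN)
c = torch.zeros(1, HIDDEN)
inp = torch.zeros(1, HIDDEN)                  # input quantity for the first time step

operations, jumpers, log_probs = [], [], []
for t in range(NUM_LAYERS):
    h, c = cell(inp, (h, c))                  # one time step per layer to be searched
    probs = torch.softmax(op_head(h), dim=-1) # 6-dim vector -> probability distribution
    dist = torch.distributions.Categorical(probs)
    op = dist.sample()                        # sample the operation of the current layer
    operations.append(op.item())
    log_probs.append(dist.log_prob(op))
    if t > 0:                                 # simplified single jumper decision per layer
        p_skip = torch.sigmoid(skip_head(h)).squeeze()
        jumpers.append(bool(torch.bernoulli(p_skip).item()))
    inp = op_embedding(op)                    # sampled operation feeds the next time step

print(operations, jumpers)
```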
Then, a sub-graph is determined from the first general graph based on this network structure, and a batch of the training data is input into the sub-graph to generate a training result, so that the first general graph can be trained based on that training result. Accordingly, a subgraph determined by the first network structure from the first general graph may be referred to as a training subgraph.
For example, the controller updates the parameters of the training subgraph according to the training result and the BP algorithm to complete one iteration; because the training subgraph belongs to the first general graph, updating the parameters of the training subgraph is equivalent to updating the parameters of the first general graph. Subsequently, the controller may determine a training subgraph again from the first general graph after this iteration, input another batch of training data into it, generate another training result, and update the training subgraph again using that training result and the BP algorithm, completing another iteration. The second general graph is obtained after all the training data have been used.
After the second general graph is obtained, its parameters may be fixed and the controller then trained. The first network structure may determine a subgraph, which may be referred to as a test subgraph, from the second general graph according to the method shown in fig. 4; a batch of the test data is input into the test subgraph to obtain a feedback result (e.g., a predicted value). The first network structure can be updated directly using this feedback result, or using the average of a plurality of feedback results, where the plurality of feedback results are obtained by inputting a plurality of batches of the test data into test subgraphs.
During the update of the first network structure, a jumper constraint term can be determined according to the feedback result, and the first network structure is then updated according to the jumper constraint term and the feedback result.
Because the jumper constraint term is related to the feedback result of the current training stage, the current jumper density determined based on the jumper constraint term is better suited to that stage, so training efficiency can be improved while a more reliable network structure is searched.
Optionally, the size of the jumper constraint term is positively correlated with the size of the feedback result.
In the present application, "positively correlated" means: the value of one parameter increases with increasing value of the other parameter or the value of one parameter decreases with decreasing value of the other parameter.
In the initial training phase, the controller needs to explore the search space fully, to avoid a large deviation that would prevent the general graph from converging in subsequent iterations; therefore the jumper density of the controller should not be too small, that is, the value of the jumper constraint term should not be too large. After several iterations, the randomness of the general graph is reduced, i.e., the probability of some operations being sampled decreases; in this case, continuing to sample with a high-jumper-density controller reduces training efficiency and wastes computing power, so a larger jumper constraint term is needed to update the controller.
Because the feedback result (the prediction accuracy of the test subgraph) generally keeps increasing as training progresses, making the jumper constraint term positively correlated with the feedback result lets the value of the constraint term keep increasing as training progresses, thereby balancing performance (i.e., the performance of the subgraph) against training efficiency and computing power during the network structure search.
Optionally, the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario. The value of n may be any real number greater than 0; for example, n may be 10, 20, 30, 40, 50, 60, 70, or 100, and optionally n may also be greater than or equal to 100.
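Assuming the reconstruction cos(1 − R_k)^n means (cos(1 − R_k)) raised to the power n, the following sketch shows how such a coefficient grows with the feedback result and how n controls how sharply it turns on; the sample values are arbitrary.

```python
import numpy as np

def alpha(R_k, n):
    """Adaptive coefficient of the jumper constraint term: cos(1 - R_k)^n."""
    return np.cos(1.0 - R_k) ** n

for R_k in (0.1, 0.3, 0.5, 0.7, 0.9):   # the feedback result grows as training progresses
    print(f"R_k={R_k:.1f}  alpha(n=10)={alpha(R_k, 10):.4f}  alpha(n=50)={alpha(R_k, 50):.6f}")
```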
Optionally, the jumper constraint term includes a KL divergence (Kullback-Leibler divergence) between the current jumper density and a preset desired jumper density; for example, the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes the parameters of the first network structure (on which the current jumper density depends), q is the preset desired jumper density, and p is the current jumper density. The current jumper density may be obtained based on the feedback results of the test subgraphs. The feedback result includes the prediction accuracy of the test subgraphs on the test data.
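Treating the current and desired jumper densities as Bernoulli parameters, the constraint term could be computed as in the following sketch; the sampled jumper decisions, q = 0.4, and λ = 0.0001 are illustrative values, and the product form α · λ · KL(p ‖ q) follows the reconstruction given above.

```python
import numpy as np

def jumper_density(jumper_decisions):
    """Current jumper density p = (number of current jumpers) / (number of connectable jumpers)."""
    return sum(jumper_decisions) / len(jumper_decisions)

def kl_bernoulli(p, q, eps=1e-8):
    """KL divergence between two Bernoulli densities (current p vs. desired q)."""
    p = min(max(p, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

sampled_jumpers = [1, 0, 1, 1, 0, 1]   # jumper decisions of the sampled test subgraphs
p = jumper_density(sampled_jumpers)    # current jumper density
q = 0.4                                # preset desired jumper density
R_k, n, lam = 0.8, 10, 1e-4            # feedback result, hyper-parameter n, hyper-parameter lambda
alpha = np.cos(1.0 - R_k) ** n
constraint = alpha * lam * kl_bernoulli(p, q)
print(p, constraint)
```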
Alternatively, the processor may update the first network structure according to equation (1).
J(θ_c) = (1/m) · Σ_{k=1..m} Σ_{t=1..T} log P(a_t | a_(t-1):1; θ_c) · R_k − α · λ · D_KL(p ‖ q)    (1)
where a_t is the operation sampled at the t-th time step, P(a_t | a_(t-1):1; θ_c) is the probability of sampling this operation, m is the number of feedback results used in one update of the first network structure, T is the number of layers of the second general graph, and λ is a hyperparameter, generally set to 0.0001 in classification tasks; different values may be set for specific tasks.
The meaning of formula (1) is: maximize R_k while minimizing the KL divergence; that is, maximize R_k while keeping the current jumper density consistent with the desired jumper density.
In the prior art, R_k gradually increases as the general graph converges, and because the penalty strength generated by λ is constant all the time, q can only be set between 0.4 and 0.6 when constraining the jumpers during actual iteration. Here, q is calculated as (the number of all current jumpers)/(the number of all connectable jumpers) and takes a value between 0 and 1; in the initial state, jumpers are connected randomly, and the jumper density is 0.5.
Compared with the prior art, formula (1) has an additional coefficient α, and α is positively correlated with R_k. During the initial training phase, the predicted values generated by the controller from the test subgraphs determined in the general graph are not accurate enough; therefore R_k is small, α is small, and the penalty of the jumper constraint term is small, so the updated controller has a large jumper density, can fully explore the search space, and avoids a large deviation in the initial training stage. As training progresses, the accuracy of the predicted values generated by the controller from the test subgraphs determined in the general graph increases, R_k gradually increases, α also gradually increases, and the penalty of the jumper constraint term gradually increases; the jumper density of the updated controller becomes small, and the controller no longer fully explores the search space (the randomness of the general graph is reduced, so the controller does not need to explore it fully), thereby improving training efficiency. Furthermore, since the updated controller no longer fully explores the search space, excessive FLOPs are not required, which reduces computing power consumption. As can be seen from the above, α is an adaptive coefficient that can balance the performance of the network structure (i.e., the performance of the subgraph) against computing power consumption during the search, and it is particularly suitable for mobile devices with weak processing capability.
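The following PyTorch sketch shows a controller update consistent with formula (1) as reconstructed above; the use of the mean feedback inside α, the dummy log-probabilities, and the fixed current density are simplifications made for this sketch, not the patented formulation.

```python
import torch

def controller_loss(log_probs, rewards, p_current, q_desired, lam=1e-4, n=10):
    """Surrogate loss for one controller update (a sketch of formula (1)).

    log_probs: list of m tensors, each the sum over T time steps of
               log P(a_t | a_(t-1):1; theta_c) for one sampled test subgraph.
    rewards:   list of m feedback results R_k.
    """
    R = torch.tensor(rewards)
    alpha = torch.cos(1.0 - R.mean()) ** n                 # adaptive coefficient
    kl = (p_current * torch.log(p_current / q_desired)
          + (1 - p_current) * torch.log((1 - p_current) / (1 - q_desired)))
    # Maximize R_k-weighted log-probabilities while minimizing the jumper
    # constraint term, so the loss to *minimize* is the negative objective.
    policy_term = torch.stack([lp * r for lp, r in zip(log_probs, R)]).mean()
    return -policy_term + alpha * lam * kl

# Illustrative call with dummy values (p_current should stay differentiable if it
# depends on the controller's jumper probabilities).
log_probs = [torch.tensor(-3.2, requires_grad=True), torch.tensor(-2.7, requires_grad=True)]
loss = controller_loss(log_probs, rewards=[0.62, 0.70],
                       p_current=torch.tensor(0.55), q_desired=0.4)
loss.backward()
```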
It should be noted that the jumper constraint term containing α is merely an example, and any jumper constraint term that can be adaptively adjusted according to the training phase falls within the scope of the present application.
Examples of the network structure searching method provided by the present application are described above in detail. It is understood that, in order to realize the above functions, the network structure searching apparatus includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The present application may divide the network structure searching apparatus into functional units according to the above method examples; for example, each function may be assigned to a separate functional unit, or two or more functions may be integrated into one functional unit. The functional units may be implemented in hardware or in software. It should be noted that the division of units in the present application is schematic and is only a division of logical functions; other division manners are possible in actual implementation.
Fig. 5 shows a schematic structural diagram of an apparatus for network structure search provided in the present application. The dashed lines in fig. 5 indicate that the unit is an optional unit. The apparatus 500 may be used to implement the methods described in the method embodiments above. The apparatus 500 may be a software module, a chip, a terminal device, or other electronic device.
The apparatus 500 comprises one or more processing units 501, and the one or more processing units 501 may support the apparatus 500 to implement the method in the method embodiment corresponding to fig. 3. The processing unit 501 may be a software processing unit, a general purpose processor, or a special purpose processor. The processing unit 501 may be configured to control the apparatus 500, execute a software program (e.g. for implementing the method according to the first aspect), and process data (e.g. predicted values). The apparatus 500 may further include a communication unit 505 to enable input (reception) and output (transmission) of signals.
For example, the apparatus 500 may be a software module, and the communication unit 505 may be an interface function of the software module. The software modules may run on a processor or control circuitry.
Also for example, the apparatus 500 may be a chip, and the communication unit 505 may be an input and/or output circuit of the chip, or the communication unit 505 may be a communication interface of the chip, and the chip may be a component of a terminal device or other electronic devices.
In the apparatus 500, the processing unit 501 may perform:
a general graph training step: training a first general graph according to the first network structure and training data to generate a second general graph; and
a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
Optionally, the size of the jumper constraint term is positively correlated with the size of the feedback result.
Optionally, the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario.
Alternatively, 0< n ≦ 100.
Optionally, the jumper constraint term includes a KL divergence between the current jumper density and a preset desired jumper density.
Optionally, the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes the parameters of the first network structure (on which the current jumper density depends), q is the preset desired jumper density, and p is the current jumper density.
Optionally, the current jumper density is derived based on the number of test subgraphs.
Optionally, the feedback result includes the prediction accuracy of the plurality of test subgraphs on the test data.
Optionally, the processing unit 501 is specifically configured to: determine a first training subgraph within the first general graph through the first network structure; input a batch of the training data into the first training subgraph to generate a first training result; and train the first general graph according to the first training result to generate the second general graph.
Optionally, the processing unit 501 is further configured to: after the general graph training step and the network structure training step are executed cyclically a plurality of times, generate a final general graph and a final network structure; and determine, through the final network structure, a final sub-graph in the final general graph, wherein the final sub-graph is a network structure conforming to a preset scenario.
It will be clear to those skilled in the art that for the convenience and simplicity of description, the specific operation and effects of the above-mentioned devices and units can be referred to the relevant description of the corresponding method embodiments in fig. 1 to 4. For brevity, no further description is provided herein.
As an alternative, the above steps may be performed by logic circuits in the form of hardware or by instructions in the form of software. For example, the processing unit 501 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The apparatus 500 may comprise one or more storage units 502, in which a program 504 (e.g. a software program including the method according to the second aspect) is stored, the program 504 may be executed by the processing unit 501, and generates an instruction 503, so that the processing unit 501 executes the method described in the above method embodiments according to the instruction 503. Optionally, the storage unit 502 may also have data (e.g., prediction value and jumper density) stored therein. Alternatively, the processing unit 501 may also read data stored in the storage unit 502, the data may be stored at the same storage address as the program 504, and the data may be stored at a different storage address from the program 504.
The processing unit 501 and the storage unit 502 may be separately disposed, or may be integrated together, for example, on a single board or a System On Chip (SOC).
The present application also provides a computer program product which, when executed by the processing unit 501, implements the method according to any of the embodiments of the present application.
The computer program product may be stored in the storage unit 502, for example, the program 504, and the program 504 is finally converted into an executable object file capable of being executed by the processing unit 501 through preprocessing, compiling, assembling, linking, and the like.
The computer program product may be transmitted from one computer readable storage medium to another computer readable storage medium, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.).
The present application also provides a computer-readable storage medium (e.g., storage unit 502) having stored thereon a computer program which, when executed by a computer, implements the method of any of the embodiments of the present application. The computer program may be a high-level language program or an executable object program.
The computer readable storage medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others. For example, the computer-readable storage medium may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
It should be understood that, in the embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The term "and/or" herein is merely an association relationship describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The system, apparatus and method disclosed in the embodiments provided in the present application can be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not performed. The above-described embodiments of the apparatus are merely exemplary, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, and a plurality of units or components may be combined or integrated into another system. In addition, the coupling between the units or the coupling between the components may be direct coupling or indirect coupling, and the coupling includes electrical, mechanical or other connections.
In short, the above description is only a part of the embodiments of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (23)

1. A network structure search method, comprising:
a general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph; and
a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
2. The method of claim 1, wherein the size of the jumper constraint term is positively correlated to the size of the feedback result.
3. The method of claim 2, wherein the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario.
4. The method of claim 3, wherein 0< n ≦ 100.
5. The method according to any one of claims 1 to 4, wherein the jumper constraint term includes a KL divergence between a current jumper density and a preset desired jumper density.
6. The method of claim 5, wherein the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes parameters of the first network structure, q is the preset desired jumper density, and p is the current jumper density.
7. The method of claim 5 or 6, wherein the current jumper density is derived based on the number of test subgraphs.
8. The method of any one of claims 1 to 7, wherein the feedback result comprises the prediction accuracy of the plurality of test subgraphs on the test data.
9. The method of any one of claims 1 to 8, wherein training the first general graph according to the first network structure and the training data to generate the second general graph comprises:
determining a first training subgraph within the first general graph through the first network structure;
inputting a batch of the training data into the first training subgraph to generate a first training result;
and training the first general graph according to the first training result to generate the second general graph.
10. The method of any one of claims 1 to 9, further comprising:
after the general graph training step and the network structure training step are executed cyclically a plurality of times, generating a final general graph and a final network structure;
and determining, through the final network structure, a final sub-graph in the final general graph, wherein the final sub-graph is a network structure conforming to a preset scenario.
11. A network structure search apparatus, comprising a processing unit configured to perform:
a general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph; and
a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
12. The apparatus of claim 11, wherein the size of the jumper constraint term is positively correlated to the size of the feedback result.
13. The apparatus of claim 12, wherein the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario.
14. The apparatus of claim 13, wherein 0< n ≦ 100.
15. The apparatus of any of claims 11-14, wherein the jumper constraint term comprises a KL divergence between a current jumper density and a preset desired jumper density.
16. The apparatus of claim 15, wherein the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes parameters of the first network structure, q is the preset desired jumper density, and p is the current jumper density.
17. The apparatus of claim 15 or 16, wherein the current jumper density is derived based on the number of test subgraphs.
18. The apparatus of any one of claims 11 to 17, wherein the feedback result comprises the prediction accuracy of the plurality of test subgraphs on the test data.
19. The apparatus according to any one of claims 11 to 18, wherein the processing unit is specifically configured to:
determining a first training subgraph within the first general graph through the first network structure;
inputting a batch of data in the training data into the first training subgraph to generate a first training result;
and training the first general graph according to the first training result to generate the second general graph.
20. The apparatus according to any one of claims 11 to 19, wherein the processing unit is further configured to:
after the general graph training step and the network structure training step are executed cyclically a plurality of times, generating a final general graph and a final network structure;
and determining, through the final network structure, a final sub-graph in the final general graph, wherein the final sub-graph is a network structure conforming to a preset scenario.
21. A network structure search device characterized by comprising: a memory for storing instructions and a processor for executing the instructions stored by the memory, and execution of the instructions stored in the memory causes the processor to perform the method of any of claims 1 to 10.
22. A computer storage medium, having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method of any one of claims 1 to 10.
23. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 10.
CN201980031708.4A 2019-10-30 2019-10-30 Method, apparatus, storage medium, and computer program product for network structure search Pending CN112106077A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114361 WO2021081809A1 (en) 2019-10-30 2019-10-30 Network architecture search method and apparatus, and storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN112106077A (en) 2020-12-18

Family

ID=73750057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980031708.4A Pending CN112106077A (en) 2019-10-30 2019-10-30 Method, apparatus, storage medium, and computer program product for network structure search

Country Status (2)

Country Link
CN (1) CN112106077A (en)
WO (1) WO2021081809A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6963627B2 (en) * 2017-07-21 2021-11-10 グーグル エルエルシーGoogle LLC Neural architecture search for convolutional neural networks
EP3688673A1 (en) * 2017-10-27 2020-08-05 Google LLC Neural architecture search
CN109934336B (en) * 2019-03-08 2023-05-16 江南大学 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform
CN110009048B (en) * 2019-04-10 2021-08-24 苏州浪潮智能科技有限公司 Method and equipment for constructing neural network model

Also Published As

Publication number Publication date
WO2021081809A1 (en) 2021-05-06

Similar Documents

Publication Publication Date Title
US11610131B2 (en) Ensembling of neural network models
US11106978B2 (en) Execution of a genetic algorithm with variable evolutionary weights of topological parameters for neural network generation and training
US11853893B2 (en) Execution of a genetic algorithm having variable epoch size with selective execution of a training algorithm
CN109376867B (en) Processing method and device of two-quantum-bit logic gate
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
CN111105029B (en) Neural network generation method, generation device and electronic equipment
CN111353601A (en) Method and apparatus for predicting delay of model structure
JP6325762B1 (en) Information processing apparatus, information processing method, and information processing program
CN111656365A (en) Method and apparatus for network structure search, computer storage medium, and computer program product
Zhao et al. Dynamic regret of online markov decision processes
CN110009048B (en) Method and equipment for constructing neural network model
CN116938323B (en) Satellite transponder resource allocation method based on reinforcement learning
WO2024051655A1 (en) Method and apparatus for processing histopathological whole-slide image, and medium and electronic device
CN115718869A (en) Model training method, system, cluster and medium
CN111063000B (en) Magnetic resonance rapid imaging method and device based on neural network structure search
CN116151384B (en) Quantum circuit processing method and device and electronic equipment
CN114372539B (en) Machine learning framework-based classification method and related equipment
CN112106077A (en) Method, apparatus, storage medium, and computer program product for network structure search
CN114595641A (en) Method and system for solving combined optimization problem
WO2021146977A1 (en) Neural architecture search method and apparatus
CN115688042A (en) Model fusion method, device, equipment and storage medium
CN116579435B (en) Quantum circuit classification method, quantum circuit classification device, electronic equipment, medium and product
WO2020237688A1 (en) Method and device for searching network structure, computer storage medium and computer program product
CN111684471A (en) Method and apparatus for network structure search, computer storage medium, and computer program product
US20240104415A1 (en) System and method for improving the efficiency of inputs to quantum computational devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination