WO2020107264A1 - Method and apparatus for neural network architecture search - Google Patents

Method and apparatus for neural network architecture search

Info

Publication number
WO2020107264A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
structural parameters
calculation amount
operation layer
Prior art date
Application number
PCT/CN2018/117957
Other languages
English (en)
French (fr)
Inventor
蒋阳
赵丛
张李亮
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN201880068164.4A priority Critical patent/CN111406263A/zh
Priority to PCT/CN2018/117957 priority patent/WO2020107264A1/zh
Publication of WO2020107264A1 publication Critical patent/WO2020107264A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology

Definitions

  • This application relates to the field of machine learning, and more specifically, to a method and device for searching a neural network architecture.
  • Machine learning algorithms play a central role in tasks such as detection and tracking.
  • Usually, the mobile terminal is one of the important application scenarios for such tasks, for example, mobile phones, drones, or autonomous vehicles. Because of scene constraints, the computing resources of mobile terminals are limited, while machine learning algorithms, especially deep learning algorithms, often require large amounts of computing resources to guarantee algorithm performance.
  • The present application provides a method and device for searching a neural network architecture, which can effectively improve the performance of a neural network model in scenarios where computing resources are limited; that is, it can better balance the calculation amount and the performance of the neural network model.
  • In a first aspect, a method for searching a neural network architecture is provided. The method includes: acquiring a neural network model whose network architecture is to be searched; determining a search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model; configuring structural parameters for the multiple operations on each operation layer defined in the search space; and performing a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • In a second aspect, a method for searching a neural network architecture is provided. The method includes: acquiring a neural network whose architecture is to be searched; and performing a differentiable architecture search on the neural network to obtain structural parameters of the neural network, where the optimization objective function used in the differentiable architecture search includes a first regular term, and the first regular term represents the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • In a third aspect, an apparatus for network architecture search is provided. The apparatus includes the following units.
  • An obtaining unit, used to obtain a neural network model whose network architecture is to be searched.
  • A determining unit, used to determine the search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model.
  • A configuration unit, used to configure structural parameters for the multiple operations on each operation layer defined in the search space.
  • An optimization unit, used to perform a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • In a fourth aspect, an apparatus for network architecture search is provided. The apparatus includes the following units.
  • An obtaining unit, used to obtain a neural network whose architecture is to be searched.
  • An optimization unit, used to perform a differentiable architecture search on the neural network to obtain the structural parameters of the neural network, where the optimization objective function used in the differentiable architecture search includes a first regular term, and the first regular term represents the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • In a fifth aspect, a neural network processing device is provided. The neural network processing device includes a memory and a processor; the memory is used to store instructions, the processor is used to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method provided in the first aspect or the second aspect.
  • In a sixth aspect, a chip is provided. The chip includes a processing module and a communication interface; the processing module is used to control the communication interface to communicate with the outside, and the processing module is further used to implement the method provided in the first aspect or the second aspect.
  • In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a computer, the computer program causes the computer to implement the method provided in the first aspect or the second aspect.
  • In an eighth aspect, a computer program product containing instructions is provided; when executed by a computer, the instructions cause the computer to implement the method provided in the first aspect or the second aspect.
  • In summary, this application improves the performance of the neural network model by using gradient information to optimize the network architecture search process.
  • By including in the target optimization function the difference between the calculation amount of the neural network model and the computing resources of the computing device, the calculation amount of the neural network model can be effectively constrained, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
  • FIG. 1 is a schematic diagram of a neural network architecture search scenario.
  • FIG. 2 is a schematic flowchart of a method for searching a neural network architecture provided by an embodiment of the present application.
  • FIG. 3, FIG. 4 and FIG. 5 are schematic diagrams of neural network architecture search.
  • FIG. 6 is a schematic block diagram of a device for searching a neural network architecture provided by an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a device for searching a neural network architecture provided by another embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a neural network processing device provided by an embodiment of the present application.
  • Network architecture search (NAS) is a technology that uses algorithms to automatically design neural network models. As the name suggests, network architecture search means searching out the architecture of a neural network model.
  • As an example, a neural network model whose architecture is to be searched is shown in FIG. 1. It is known that the neural network model includes 4 nodes (nodes 0, 1, 2, and 3 shown in FIG. 1), while the operation between every two nodes is unknown (as indicated by the question marks "?" in FIG. 1).
  • In the example of FIG. 1, the problem to be solved by the network architecture search is to determine the operations between nodes 0, 1, 2, and 3; different combinations of the operations between nodes 0, 1, 2, and 3 correspond to different network architectures.
  • The nodes mentioned herein, i.e., the nodes in the neural network model, can be understood as the feature layers of the neural network model.
  • For example, in FIG. 1, the neural network model includes an input feature layer, two intermediate feature layers, and an output feature layer: node 0 represents the input feature layer, node 1 and node 2 represent the intermediate feature layers, and node 3 represents the output feature layer.
  • In this example, node 0 holds the feature data of the input feature layer (a feature vector or feature matrix, and similarly below), node 1 holds the feature data of the first intermediate feature layer, node 2 holds the feature data of the second intermediate feature layer, and node 3 holds the feature data of the output feature layer.
  • The operation between two nodes refers to the operation required to transform the feature data on one of the nodes into the feature data on the other node.
  • In the example of FIG. 1, the operation between node 0 and node 1 refers to an operation whose input feature data is the feature data on node 0 and whose output feature data is the feature data on node 1; the operation between node 1 and node 3 refers to an operation whose input feature data is the feature data on node 1 and whose output feature data is the feature data on node 3.
  • The operations mentioned herein may be convolution operations, pooling operations, fully connected operations, or other neural network operations.
  • The problem to be solved by the network architecture search is to determine the operations between the various nodes shown in FIG. 1.
  • The operations between two nodes can be considered to constitute the operation layer between those two nodes.
  • For example, in FIG. 1, there is an operation layer between node 0 and node 1, between node 0 and node 2, between node 0 and node 3, between node 1 and node 2, between node 1 and node 3, and between node 2 and node 3; that is, the neural network model shown in FIG. 1 has 6 operation layers.
  • Usually, the operation layer between two nodes has multiple operations available for searching, i.e., multiple candidate operations; for example, the operation layer between two nodes has a convolution operation, a pooling operation, and a fully connected operation.
  • The purpose of the network architecture search is to determine one operation at each operation layer.
  • At present, the more commonly used network architecture search methods include random search, evolutionary algorithms, reinforcement learning, Bayesian optimization, and differentiable architecture search (DARTS).
  • Network architecture search methods can obtain network architectures that are more novel and perform better than manually designed networks.
  • As mentioned above, the mobile terminal has become an important application scenario for neural network models. Because of scene constraints, the computing resources on mobile terminals are limited; therefore, to apply a neural network model on a mobile terminal, the calculation amount of the neural network model must be limited. None of the existing network architecture search methods can well balance the calculation amount and the performance of the neural network model.
  • In view of this, this application proposes a neural network architecture search method and apparatus that can better balance the calculation amount and the performance of the neural network model; in other words, under the condition of limited computing resources, they can make full use of the computing resources and maximize the performance of the neural network model.
  • FIG. 2 is a schematic flowchart of a method 200 for searching a neural network architecture provided by an embodiment of this application. The method 200 includes the following steps.
  • 210. Acquire a neural network model whose network architecture is to be searched.
  • In this neural network model, the nodes (i.e., feature layers) are known, while the operations between the nodes are unknown. For example, the neural network model is as shown in FIG. 1.
  • 220. Determine the search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model.
  • Taking the neural network model shown in FIG. 1 as the model whose network architecture is to be searched, an illustration of step 220 is shown in FIG. 3.
  • In FIG. 3, three operations are defined for each operation layer; the three different dashed line styles shown in FIG. 3 represent operation 1, operation 2, and operation 3. For example, the three operations are a convolution operation, a pooling operation, and a fully connected operation.
  • In FIG. 3, for each operation layer, the purpose of the network architecture search is to select one of the three operations as the operation of that operation layer.
  • It should be understood that the search space defines the scope of the network architecture search.
  • 230. Configure structural parameters for the multiple operations on each operation layer defined in the search space.
  • For each operation layer, one structural parameter is configured for each operation on it. Optionally, the structural parameters configured for the operations on an operation layer take the same value; alternatively, they may not be exactly the same.
  • The action performed in step 230 can be regarded as quantizing the network architecture into a set of structural parameters. For example, taking the neural network model in step 210 as shown in FIG. 1 and the action of step 220 as shown in FIG. 3 as an example, step 230 can be regarded as quantizing the network architecture of the neural network model into six 3-dimensional structural parameter vectors, or equivalently, into a structural parameter matrix with 6 rows and 3 columns.
  • 240. Perform a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • The target optimization function includes at least two parts: one part is the loss function of the neural network model, and the other is the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model. The calculation amount of the neural network model using the structural parameters of each iteration refers to the calculation amount of the neural network model corresponding to the structural parameters of each iteration during the optimization of the structural parameters.
  • It should be understood that because the target optimization function includes the loss function of the neural network model, one of the optimization objectives is to improve the performance of the neural network model.
  • Because the target optimization function includes the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model, another optimization objective is to reduce this difference, that is, to limit the calculation amount of the neural network model to within the computing resources of the computing device.
  • Optimization algorithms based on gradient information are prior art and are not described in detail herein. In the solution provided in this application, any feasible optimization method based on gradient information may be adopted as the optimization method for the network architecture search process. It should be understood that using gradient information to optimize the network architecture search process is efficient and not prone to converging to a local optimum.
  • In addition, because the target optimization function includes the difference between the calculation amount of the neural network model and the computing resources of the computing device, the calculation amount of the optimized neural network model can be effectively guaranteed to be within the computing resource range of the computing device.
  • Therefore, in the solution provided in this application, optimizing the network architecture search process with gradient information improves the performance of the neural network model, and including the difference between the calculation amount of the neural network model and the computing resources of the computing device in the target optimization function effectively constrains the calculation amount of the neural network model, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
  • For example, the optimization result of step 240 is shown in FIG. 4: each operation layer has one structural parameter with the largest value.
  • In FIG. 4, on each operation layer, the structural parameter indicated by the thickest dashed line is the structural parameter with the largest value.
  • After the optimization result is obtained in step 240, the operation corresponding to the structural parameter with the largest value on each operation layer may be taken as the selected operation, and the remaining operations are discarded.
  • For example, on the basis of FIG. 4, on each operation layer, the operation corresponding to the structural parameter with the largest value is retained and the remaining operations are deleted; the result is shown in FIG. 5. Thus, through the network architecture search, the final neural network model is obtained, for example, the neural network model shown in FIG. 5.
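  • A sketch of this discretization step under the same hypothetical setup: on every operation layer, only the operation whose structural parameter is largest is kept.

```python
import torch

# Hypothetical optimized structural parameters for 6 layers x 3 candidate ops.
alpha = torch.tensor([[0.1, 0.2, 0.7],
                      [0.8, 0.1, 0.1],
                      [0.2, 0.6, 0.2],
                      [0.1, 0.1, 0.8],
                      [0.5, 0.3, 0.2],
                      [0.3, 0.3, 0.4]])

op_names = ["conv", "pool", "fc"]  # illustrative candidate operations

# Keep the operation with the largest structural parameter on each layer;
# the remaining candidate operations are discarded (the FIG. 4 -> FIG. 5 step).
selected = alpha.argmax(dim=-1)
print([op_names[i] for i in selected])  # ['fc', 'conv', 'pool', 'fc', 'conv', 'fc']
```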
  • Optionally, during the optimization of the structural parameters, the structural parameters on each operation layer are normalized at each iteration.
  • The difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model can be regarded as a regular term of the target optimization function.
  • It should be understood that during the network architecture search, the determined calculation amount of the neural network model is related to the structural parameters; the structural parameters obtained in different iterations correspond to different calculation amounts. The following describes how this calculation amount is determined during the search.
  • The operations on the operation layers between the nodes in the neural network model whose architecture is to be searched can be designed in advance; therefore, the calculation amount of each operation can easily be computed.
  • The calculation amount of the neural network model using the structural parameters of each iteration in the optimization process can be obtained from the calculation amounts of the operation layers in the neural network model.
  • Optionally, the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of all operation layers in the neural network model, where the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on it. For example, the calculation amount FLOPS of the neural network model is computed according to the following formula:

$$\mathrm{FLOPS}=\sum_{0\le i<j\le M}\mathrm{FLOPS}^{(i,j)},\qquad \mathrm{FLOPS}^{(i,j)}=\sum_{o\in\mathcal{O}}\alpha_{o}^{(i,j)}\,\mathrm{FLOPS}_{o}^{(i,j)} \tag{1}$$

  • where i and j in (i, j) respectively denote the input node and the output node of one operation layer in the neural network model; the total number of nodes in the neural network model is (M+1); FLOPS^(i,j) denotes the calculation amount of the operation layer between input node i and output node j; O denotes the set of candidate operations on the operation layer between input node i and output node j; α_o^(i,j) denotes the structural parameter of candidate operation o in the candidate operation set O on that operation layer; and FLOPS_o^(i,j) denotes the calculation amount of candidate operation o in the candidate operation set O on that operation layer.
  • It should be noted that formula (1) is only an example and not a limitation. In practical applications, the way of obtaining the calculation amount of the neural network model can be defined according to actual needs; for example, the calculation amount of the neural network model can be accumulated over only some of the operation layers in the neural network model.
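  • The following sketch illustrates formula (1) under the same hypothetical setup: the model-level calculation amount is the structural-parameter-weighted sum of per-operation FLOPS, accumulated over the operation layers; the per-operation FLOPS values are made up.

```python
import torch

def model_flops(alpha: torch.Tensor, op_flops: torch.Tensor) -> torch.Tensor:
    """Formula (1): weight each candidate operation's FLOPS by its structural
    parameter, sum per layer, then accumulate over all operation layers.
    Both inputs have shape (layers, ops); the result stays differentiable
    with respect to alpha, which is what gradient-based search needs."""
    per_layer = (alpha * op_flops).sum(dim=-1)  # FLOPS^(i,j) for each layer
    return per_layer.sum()

# Example with 2 operation layers and 3 candidate operations each.
alpha = torch.tensor([[0.1, 0.2, 0.7],
                      [0.6, 0.3, 0.1]])
op_flops = torch.tensor([[120.0, 40.0, 300.0],
                         [120.0, 40.0, 300.0]])
print(model_flops(alpha, op_flops))  # tensor(344.) = 230.0 + 114.0
```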
  • There are multiple methods for obtaining the computing resources of the computing device that uses the neural network model.
  • In one case, the computing device to which the neural network model will be applied is known; its amount of computing resources can then be estimated by the same evaluation means, and this amount is used as the "computing resources of the computing device using a neural network model" in the embodiments of the present application.
  • In another case, the computing device to which the neural network model will be applied is not yet known; in this case, a threshold for constraining the calculation amount of the neural network model can be set according to experience or specific needs, and this threshold is used as the "computing resources of the computing device using a neural network model" in the embodiments of the present application.
  • The expression "computing resources of the computing device using a neural network model" can also be replaced with "a threshold on the calculation amount of the neural network model".
  • As described above, the target optimization function in the embodiments of the present application includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • For example, as a first implementation, the objective optimization function is shown in the following formula:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\left|M-\mathrm{FLOPS}\right| \tag{2}$$

  • where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set and is determined by α and w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; λ₁ is a constant; and λ₁|M − FLOPS| represents the difference between the calculation amount of the neural network model and the computing resources. FLOPS can be computed by formula (1) or obtained in other feasible ways.
  • λ₁|M − FLOPS| can be regarded as the regular term in the objective optimization function.
  • By including the difference between the calculation amount of the neural network model and the computing resources of the computing device in the target optimization function, the calculation amount of the neural network model can be effectively constrained.
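  • A sketch of the target optimization function of formula (2); val_loss stands for a validation-set loss already computed elsewhere, and M and lambda1 are hypothetical placeholders for the resource budget and the constant.

```python
import torch

def search_objective(val_loss: torch.Tensor,
                     alpha: torch.Tensor,
                     op_flops: torch.Tensor,
                     M: float,
                     lambda1: float) -> torch.Tensor:
    """Formula (2): validation loss plus the calculation-amount regular term."""
    flops = (alpha * op_flops).sum()  # formula (1), differentiable in alpha
    # The regular term penalizes any gap between the model's estimated FLOPS
    # and the computing resources M, so gradient steps on alpha are pulled
    # toward architectures that fit the device while the loss term keeps
    # pulling toward better accuracy.
    return val_loss + lambda1 * torch.abs(M - flops)
```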
  • Optionally, in some embodiments, in addition to the loss function of the neural network model and the difference between the calculation amount of the neural network model and the computing resources of the computing device using the neural network model, the objective optimization function further includes the 1-norm of the structural parameters.
  • It should be noted that the 1-norm of the structural parameters makes the values of the structural parameters on each operation layer as sparse as possible.
  • For example, as a second implementation, the objective optimization function is shown in the following formula:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\,(M-\mathrm{FLOPS})+\|\alpha\|_{1} \tag{3}$$

  • where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set and is determined by α and w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; λ₁ is a constant; and λ₁(M − FLOPS) represents the difference between the calculation amount of the neural network model and the computing resources. FLOPS can be computed by formula (1) or obtained in other feasible ways.
  • ‖α‖₁ denotes the 1-norm of α. ‖α‖₁ makes the values of the structural parameters on each operation layer as sparse as possible; in other words, the term ‖α‖₁ in formula (3) pushes each structural parameter of each operation layer toward being either 0 or 1.
  • As can be seen from the description of FIG. 1 to FIG. 5 above, after the network architecture search is completed, only one operation is retained on each operation layer of the neural network model actually put into use, so the calculation amount of that model equals the sum of the calculation amounts of the selected operations on all operation layers; it suffices to ensure that this sum is smaller than the computing resources of the computing device. However, there is some discrepancy between the calculation amount of the neural network model determined during the network architecture search and the calculation amount of the neural network model actually put into use.
  • For example, suppose that when the network architecture search is completed, the values of the three structural parameters on a certain operation layer (denoted operation layer X) are 0.1, 0.2, and 0.7. According to formula (1), the calculation amount of operation layer X equals 0.1×flops1 + 0.2×flops2 + 0.7×flops3, where flops1, flops2, and flops3 respectively denote the calculation amounts of the three candidate operations on operation layer X.
  • But after the search is completed, only the operation corresponding to the structural parameter 0.7 is selected on operation layer X; that is, in the neural network model put into actual use, the calculation amount of operation layer X should equal 0.7×flops3 rather than 0.1×flops1 + 0.2×flops2 + 0.7×flops3.
  • In this embodiment of the present application, the 1-norm of the structural parameters is added to the objective optimization function, and the 1-norm makes the values of the structural parameters on each operation layer as sparse as possible, which reduces to a certain extent the gap between the two calculation amounts, as illustrated by the sketch below.
  • Therefore, by making the values of the multiple structural parameters of each operation layer as sparse as possible, the calculation amount of the neural network model used in the optimization process becomes relatively close to the calculation amount of the neural network model in practical applications; this not only improves the performance of the applied network model under a given calculation amount constraint, but also further improves the utilization of computing resources.
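  • The 0.1/0.2/0.7 example can be made concrete with a small sketch (made-up FLOPS values): the search-time estimate of formula (1) mixes all three candidates, while the deployed layer only incurs the retained operation's share.

```python
# Hypothetical per-candidate calculation amounts on operation layer X.
flops1, flops2, flops3 = 120.0, 40.0, 300.0

# Structural parameters on layer X at the end of the search.
a1, a2, a3 = 0.1, 0.2, 0.7

search_time = a1 * flops1 + a2 * flops2 + a3 * flops3  # formula (1) view: 230.0
deployed = a3 * flops3                                 # only op 3 is kept:  210.0
print(search_time - deployed)  # 20.0; a sparser alpha such as ~[0, 0, 1] shrinks it
```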
  • Optionally, in some embodiments, in addition to the loss function of the neural network model and the difference between the calculation amount of the neural network model and the computing resources of the computing device using the neural network model, the objective optimization function further includes the 0-norm of the structural parameters.
  • It should be noted that the 0-norm of the structural parameters makes only one of the structural parameters on each operation layer take the value 1, with the rest being 0.
  • For example, as a third implementation, the objective optimization function is shown in the following formula:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\,(M-\mathrm{FLOPS})+\|b\|_{0}+\lambda_{2}\sum(b-\alpha)^{2} \tag{4}$$

  • where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set and is determined by α and w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; λ₁ is a constant; and λ₁(M − FLOPS) represents the difference between the calculation amount of the neural network model and the computing resources. FLOPS can be computed by formula (1) or obtained in other feasible ways.
  • b is an auxiliary variable: when α < λ₂, b equals 0, and when α ≥ λ₂, b equals α. ‖b‖₀ denotes the 0-norm of b, and λ₂ is a constant.
  • ‖b‖₀ makes only one of the structural parameters on each operation layer take the value 1, with the remaining values being 0. It should also be understood that in this embodiment of the present application, the 0-norm of the structural parameters is added to the target optimization function; because the 0-norm makes only one structural parameter on each operation layer take the value 1 with the rest being 0, the gap between the calculation amount of the neural network model determined during the network architecture search and the calculation amount of the neural network model actually put into use can be further reduced.
  • Therefore, the calculation amount of the neural network model used in the optimization process is as close as possible to the calculation amount of the neural network model in practical applications, which not only improves the performance of the applied network model under a given calculation amount constraint, but also effectively improves the utilization of computing resources.
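  • A sketch of the auxiliary variable b described for formula (4): a hard threshold at λ₂ zeroes out small structural parameters, so ‖b‖₀ only counts the surviving ones; the numbers are hypothetical.

```python
import torch

lambda2 = 0.5  # hypothetical threshold constant

alpha = torch.tensor([[0.1, 0.2, 0.7],
                      [0.8, 0.1, 0.1]])

# b equals 0 where alpha < lambda2 and equals alpha where alpha >= lambda2.
b = torch.where(alpha >= lambda2, alpha, torch.zeros_like(alpha))

l0 = (b != 0).sum()                 # ||b||_0: number of non-zero entries, here 2
penalty = ((b - alpha) ** 2).sum()  # the sum((b - alpha)^2) coupling term, 0.07
print(b, l0.item(), penalty.item())
```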
  • As one implementation, the solution provided in this application can be implemented on the basis of the existing differentiable architecture search (DARTS) method.
  • Optionally, an embodiment of the present application further provides a method for searching a neural network architecture. The method includes the following steps.
  • In the first step, the neural network whose architecture is to be searched is obtained.
  • In the second step, a differentiable architecture search is performed on the neural network to obtain structural parameters of the neural network, where the optimization objective function used in the differentiable architecture search includes a first regular term, and the first regular term represents the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • For example, the optimization objective function is as follows:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\left|M-\mathrm{FLOPS}\right|\qquad \mathrm{s.t.}\ \ w^{*}(\alpha)=\arg\min_{w}L_{train}(w,\alpha) \tag{5}$$

  • where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set; L_train(w, α) denotes the loss value on the training set; both L_train(w, α) and L_val(w*(α), α) are determined by the structural parameters α and the network weights w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; and λ₁ is a constant.
  • "s.t." is an abbreviation of "subject to" and means that the condition after s.t. must be satisfied; that is, the first expression in formula (5) is evaluated under the condition that the second expression of formula (5) is satisfied. The optimization goal defined by formula (5) is to find the structural parameters that minimize L_val(w*(α), α) + λ₁|M − FLOPS| under the premise that w*(α) = argmin_w L_train(w, α) is satisfied.
  • In the second step, the gradient-based optimization algorithm used in DARTS may be adopted to perform optimization based on the target optimization function, so as to obtain the optimized structural parameters.
  • For example, the optimization procedure based on formula (5) is as follows: Step 1) fix w and optimize α; Step 2) fix α and optimize w; repeat steps 1) and 2) until convergence.
  • Therefore, by proposing a constrained differentiable architecture search method, the embodiment of the present application makes it possible to automatically design a high-performance neural network model in scenarios where computing resources are limited.
  • Optionally, in this embodiment, the objective optimization function further includes the 1-norm of the structural parameters.
  • For example, the optimization objective function is as follows:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\,(M-\mathrm{FLOPS})+\|\alpha\|_{1}\qquad \mathrm{s.t.}\ \ w^{*}(\alpha)=\arg\min_{w}L_{train}(w,\alpha) \tag{6}$$

  • where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set; L_train(w, α) denotes the loss value on the training set; both L_train(w, α) and L_val(w*(α), α) are determined by the structural parameters α and the network weights w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; and λ₁ is a constant.
  • ‖α‖₁ denotes the 1-norm of α; ‖α‖₁ makes the values of the structural parameters on each operation layer of the neural network model relatively sparse.
  • For example, the optimization procedure based on formula (6) is as follows: Step 1) fix w and optimize α; Step 2) fix α and optimize w; repeat steps 1) and 2) until convergence.
  • In the embodiment of the present application, by adding the 1-norm of the structural parameters to the target optimization function, the values of the multiple structural parameters of each operation layer can be made as sparse as possible, so that the calculation amount of the neural network model used in the optimization process is relatively close to the calculation amount of the neural network model in practical applications; this not only improves the performance of the applied network model under a given calculation amount constraint, but also further improves the utilization of computing resources.
  • Optionally, in this embodiment, the objective optimization function further includes the 0-norm of the structural parameters.
  • For example, the optimization objective function is as follows:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\,(M-\mathrm{FLOPS})+\|b\|_{0}+\lambda_{2}\sum(b-\alpha)^{2}\qquad \mathrm{s.t.}\ \ w^{*}(\alpha)=\arg\min_{w}L_{train}(w,\alpha) \tag{7}$$

  • where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set; L_train(w, α) denotes the loss value on the training set; both L_train(w, α) and L_val(w*(α), α) are determined by the structural parameters α and the network weights w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; and λ₁ is a constant.
  • b is an auxiliary variable, acting like a template: when α < λ₂, b equals 0, and when α ≥ λ₂, b equals α. ‖b‖₀ denotes the 0-norm of b, and λ₂ is a constant. It should be understood that ‖b‖₀ can also be regarded as the 0-norm of α.
  • ‖b‖₀ makes only one of the structural parameters on each operation layer take the value 1, with the remaining values being 0; for example, a suitable λ₂ can ensure that, among the structural parameters α on each operation layer, only one takes the value 1 and the rest take the value 0. The optimization goal defined by formula (7) is to find the structural parameters that minimize L_val(w*(α), α) + λ₁(M − FLOPS) + ‖b‖₀ + λ₂Σ(b − α)² under the premise that w*(α) = argmin_w L_train(w, α) is satisfied.
  • For example, the optimization procedure based on formula (7) is as follows: Step 1) given α, obtain b from the thresholding rule above (b = 0 when α < λ₂, and b = α when α ≥ λ₂); Step 2) after obtaining the value of b, fix w and optimize α; Step 3) fix α and optimize w; repeat steps 1) to 3) until convergence.
  • In this way, the calculation amount of the neural network model used during optimization is as close as possible to the calculation amount of the neural network model in practical applications, which not only improves the performance of the applied network model under a given calculation amount constraint, but also effectively increases the utilization of computing resources.
  • An embodiment of the present application proposes a constrained differentiable architecture search method. On the one hand, because the differentiable architecture search method uses an optimization algorithm based on gradient information, the performance of the neural network model can be improved; on the other hand, by incorporating a calculation amount constraint into the target optimization function used by the differentiable architecture search method, the calculation amount of the neural network model can be effectively constrained, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
  • In summary, this application improves the performance of the neural network model by using gradient information to optimize the network architecture search process; by including in the target optimization function the difference between the calculation amount of the neural network model and the computing resources of the computing device, the calculation amount of the neural network model is effectively constrained, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
  • It should be understood that FIG. 1, FIG. 3, FIG. 4 and FIG. 5 are examples only and not limiting. It should also be understood that formulas (1) to (7) mentioned herein represent only possible implementations; corresponding variations can be made in practical applications, for example, the coefficient settings in formulas (2) to (7) admit many variations. Variations logically inferred by a person skilled in the art on the basis of the formulas provided herein also fall within the protection scope of the present application.
  • As shown in FIG. 6, an embodiment of the present application provides an apparatus 600 for searching a neural network architecture.
  • The apparatus 600 includes the following units.
  • The obtaining unit 610 is used to obtain a neural network model whose network architecture is to be searched.
  • The determining unit 620 is used to determine the search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model.
  • The configuration unit 630 is used to configure structural parameters for the multiple operations on each operation layer defined in the search space.
  • The optimization unit 640 is used to perform a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • Therefore, in the solution provided in this application, optimizing the network architecture search process with gradient information improves the performance of the neural network model.
  • Including the difference between the calculation amount of the neural network model and the computing resources of the computing device in the target optimization function effectively constrains the calculation amount of the neural network model, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
  • the target optimization function may be as shown in Formula (2) or Formula (5) described above.
  • the objective optimization function further includes a 1-norm of structural parameters.
  • the target optimization function may be as shown in Formula (3) or Formula (6) described above.
  • the objective optimization function further includes the 0 norm of the structural parameters.
  • the target optimization function may be as shown in Formula (4) or Formula (7) described above.
  • Optionally, in some or all of the above embodiments, the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process is obtained from the calculation amounts of the operation layers in the neural network model.
  • For example, this calculation amount is obtained by accumulating the calculation amounts of all operation layers in the neural network model, where the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on it.
  • For example, the calculation amount FLOPS of the neural network model using the structural parameters of each iteration in the optimization process is obtained according to formula (1) described above.
  • Optionally, in some or all of the above embodiments, the calculation amount may instead be accumulated over only some of the operation layers in the neural network model, with the calculation amount of each operation layer obtained in the same way.
  • the acquisition unit 610, the determination unit 620, the configuration unit 630, and the optimization unit 640 in this embodiment may all be implemented by a processor or a processor-related circuit.
  • an embodiment of the present application further provides an apparatus 700 for network architecture search.
  • the apparatus 700 includes the following units.
  • the obtaining unit 710 is used to obtain a neural network to be searched for an architecture.
  • The optimization unit 720 is used to perform a differentiable architecture search on the neural network to obtain the structural parameters of the neural network, where the optimization objective function used in the differentiable architecture search includes a first regular term, and the first regular term represents the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
  • By proposing a constrained differentiable architecture search method, this application makes it possible to automatically design high-performance neural network models in scenarios where computing resources are limited.
  • the objective optimization function may be as shown in formula (5) described above.
  • the objective optimization function further includes a 1-norm of structural parameters.
  • the objective optimization function may be as shown in formula (6) described above.
  • the objective optimization function further includes a 0-norm of structural parameters.
  • the objective optimization function may be as shown in formula (7) described above.
  • Optionally, in some or all of the above embodiments, the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process is obtained from the calculation amounts of the operation layers in the neural network model.
  • For example, this calculation amount is obtained by accumulating the calculation amounts of all operation layers in the neural network model, where the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on it.
  • the calculation amount FLOPS of the neural network model using the structural parameters of each iteration in the optimization process is obtained according to the formula (1) described above.
  • both the obtaining unit 710 and the optimization unit 720 in this embodiment may be implemented by a processor or a processor-related circuit.
  • As shown in FIG. 8, an embodiment of the present application further provides a neural network processing device 800.
  • The neural network processing device 800 includes a processor 810 and a memory 820; the memory 820 is used to store instructions, and the processor 810 is used to execute the instructions stored in the memory 820. Execution of the instructions stored in the memory 820 causes the processor 810 to perform the method embodiments described above.
  • Therefore, in the solution provided in this application, optimizing the network architecture search process with gradient information improves the performance of the neural network model.
  • Including the difference between the calculation amount of the neural network model and the computing resources of the computing device in the target optimization function effectively constrains the calculation amount of the neural network model, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
  • the neural network processing device 800 may also correspond to the device 600 or the device 700 provided in the above embodiments.
  • Optionally, the neural network processing apparatus 800 may further include a communication interface 830, used to output data processed by the neural network processing apparatus 800 and/or to input, from an external device, data to be processed by the neural network processing apparatus 800.
  • For example, the processor 810 is used to control the communication interface 830 to input and/or output data.
  • Embodiments of the present application also provide a computer storage medium on which a computer program is stored; when executed by a computer, the computer program causes the computer to perform the foregoing method embodiments.
  • An embodiment of the present application further provides a computer program product containing instructions; when the instructions are executed by a computer, the computer performs the above method embodiments.
  • In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments are produced in whole or in part.
  • The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
  • The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital video disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)).
  • It should be understood that the disclosed system, apparatus, and method may be implemented in other ways.
  • The apparatus embodiments described above are only illustrative.
  • For example, the division of the units is only a division of logical functions; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and apparatus for neural network architecture search are provided. The method includes: acquiring a neural network model whose network architecture is to be searched; determining a search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model; configuring structural parameters for the multiple operations on each operation layer in the search space; and performing a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model. The performance of the neural network model can be effectively improved in scenarios where computing resources are limited.

Description

Method and apparatus for neural network architecture search

Copyright Notice

The disclosure of this patent document contains material that is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner does not object to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.

Technical Field

This application relates to the field of machine learning, and more specifically, to a method and apparatus for neural network architecture search.

Background

As technology develops, neural network models are designed to be more and more complex, more and more factors need to be considered manually, and manually designing neural network models becomes more and more difficult; only expert-level algorithm engineers, after a great deal of tuning, can possibly obtain a model with good performance. Using algorithms to automate the design of neural network models has therefore become an important research topic.

Machine learning algorithms play a central role in tasks such as detection and tracking. Usually, mobile platforms, such as mobile phones, drones, and autonomous vehicles, are among the important application scenarios of such tasks. Because of scene constraints, the computing resources of mobile platforms are limited, while machine learning algorithms, especially deep learning algorithms, often require large amounts of computing resources to guarantee algorithm performance.

How to maximize the performance of a neural network model under the condition of limited computing resources is a problem that current network architecture search technology needs to overcome. None of the existing network architecture search methods can well balance the calculation amount and the performance of the neural network model.
Summary of the Invention

This application provides a method and apparatus for neural network architecture search, which can effectively improve the performance of a neural network model in scenarios where computing resources are limited, that is, which can better balance the calculation amount and the performance of the neural network model.

In a first aspect, a method for neural network architecture search is provided. The method includes: acquiring a neural network model whose network architecture is to be searched; determining a search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model; configuring structural parameters for the multiple operations on each operation layer defined in the search space; and performing a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.

In a second aspect, a method for neural network architecture search is provided. The method includes: acquiring a neural network whose architecture is to be searched; and performing a differentiable architecture search on the neural network to obtain structural parameters of the neural network, where the optimization objective function used in the differentiable architecture search includes a first regular term, and the first regular term represents the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.

In a third aspect, an apparatus for network architecture search is provided. The apparatus includes the following units.

An obtaining unit, used to obtain a neural network model whose network architecture is to be searched.

A determining unit, used to determine the search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model.

A configuration unit, used to configure structural parameters for the multiple operations on each operation layer defined in the search space.

An optimization unit, used to perform a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.

In a fourth aspect, an apparatus for network architecture search is provided. The apparatus includes the following units.

An obtaining unit, used to obtain a neural network whose architecture is to be searched.

An optimization unit, used to perform a differentiable architecture search on the neural network to obtain structural parameters of the neural network, where the optimization objective function used in the differentiable architecture search includes a first regular term, and the first regular term represents the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.

In a fifth aspect, a neural network processing device is provided. The neural network processing device includes a memory and a processor; the memory is used to store instructions, the processor is used to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method provided in the first aspect or the second aspect.

In a sixth aspect, a chip is provided. The chip includes a processing module and a communication interface; the processing module is used to control the communication interface to communicate with the outside, and the processing module is further used to implement the method provided in the first aspect or the second aspect.

In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a computer, the computer program causes the computer to implement the method provided in the first aspect or the second aspect.

In an eighth aspect, a computer program product containing instructions is provided; when executed by a computer, the instructions cause the computer to implement the method provided in the first aspect or the second aspect.

In summary, this application improves the performance of the neural network model by using gradient information to optimize the network architecture search process; by including in the target optimization function the difference between the calculation amount of the neural network model and the computing resources of the computing device, the calculation amount of the neural network model can be effectively constrained, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a neural network architecture search scenario.

FIG. 2 is a schematic flowchart of a method for neural network architecture search provided by an embodiment of the present application.

FIG. 3, FIG. 4 and FIG. 5 are schematic diagrams of neural network architecture search.

FIG. 6 is a schematic block diagram of an apparatus for neural network architecture search provided by an embodiment of the present application.

FIG. 7 is a schematic block diagram of an apparatus for neural network architecture search provided by another embodiment of the present application.

FIG. 8 is a schematic block diagram of a neural network processing device provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.

To facilitate understanding of the solutions provided in this application, the concept of neural network architecture search is first introduced below.

Network architecture search (NAS) is a technology that uses algorithms to automatically design neural network models. As the name suggests, network architecture search means searching out the architecture of a neural network model.

As an example, a neural network model whose architecture is to be searched is shown in FIG. 1. It is known that the neural network model includes 4 nodes (nodes 0, 1, 2, and 3 shown in FIG. 1), while the operation between every two nodes is unknown (as indicated by the question marks "?" in FIG. 1). In the example of FIG. 1, the problem to be solved by the network architecture search is to determine the operations between nodes 0, 1, 2, and 3. Different combinations of the operations between nodes 0, 1, 2, and 3 correspond to different network architectures.

The nodes mentioned herein, i.e., the nodes in the neural network model, can be understood as the feature layers of the neural network model. For example, in FIG. 1, the neural network model includes an input feature layer, two intermediate feature layers, and an output feature layer, where node 0 represents the input feature layer, node 1 and node 2 represent the intermediate feature layers, and node 3 represents the output feature layer. It should be understood that, in this example, node 0 holds the feature data of the input feature layer (a feature vector or feature matrix, and similarly below), node 1 holds the feature data of the first intermediate feature layer, node 2 holds the feature data of the second intermediate feature layer, and node 3 holds the feature data of the output feature layer.

The operation between two nodes refers to the operation required to transform the feature data on one of the nodes into the feature data on the other node. In the example of FIG. 1, the operation between node 0 and node 1 refers to an operation whose input feature data is the feature data on node 0 and whose output feature data is the feature data on node 1; the operation between node 1 and node 3 refers to an operation whose input feature data is the feature data on node 1 and whose output feature data is the feature data on node 3. The operations mentioned herein may be convolution operations, pooling operations, fully connected operations, or other neural network operations.

The problem to be solved by the network architecture search is to determine the operations between the various nodes shown in FIG. 1.

The operations between two nodes can be considered to constitute the operation layer between those two nodes. For example, in FIG. 1, there is an operation layer between node 0 and node 1, between node 0 and node 2, between node 0 and node 3, between node 1 and node 2, between node 1 and node 3, and between node 2 and node 3; that is, the neural network model shown in FIG. 1 has 6 operation layers.

Usually, the operation layer between two nodes has multiple operations available for searching, i.e., multiple candidate operations. For example, the operation layer between two nodes has a convolution operation, a pooling operation, and a fully connected operation. The purpose of the network architecture search is to determine one operation on each operation layer.

At present, the more commonly used network architecture search methods include random search, evolutionary algorithms, reinforcement learning, Bayesian optimization, and differentiable architecture search (DARTS). Network architecture search methods can obtain network architectures that are more novel and perform better than manually designed networks.

As mentioned above, in current technology, mobile platforms have become an important application scenario for neural network models. Because of scene constraints, the computing resources on mobile platforms are limited; therefore, to apply a neural network model on a mobile platform, the calculation amount of the neural network model needs to be limited.

None of the several existing network architecture search methods can well balance the calculation amount and the performance of the neural network model.

In view of this, this application proposes a method and apparatus for neural network architecture search that can better balance the calculation amount and the performance of the neural network model; in other words, under the condition of limited computing resources, they can make full use of the computing resources and maximize the performance of the neural network model.
FIG. 2 is a schematic flowchart of a method 200 for neural network architecture search provided by an embodiment of the present application. The method 200 includes the following steps.

210. Acquire a neural network model whose network architecture is to be searched.

In this neural network model, the nodes (i.e., feature layers) are known, while the operations between the nodes are unknown. For example, the neural network model is as shown in FIG. 1.

220. Determine a search space of the neural network model, where the search space defines multiple operations on the operation layer between every two nodes in the neural network model.

Taking the neural network model of FIG. 1 as the model whose network architecture is to be searched, an illustration of step 220 is shown in FIG. 3. In FIG. 3, three operations are defined for each operation layer; the three different dashed line styles shown in FIG. 3 represent operation 1, operation 2, and operation 3. For example, the three operations are a convolution operation, a pooling operation, and a fully connected operation. In FIG. 3, for one operation layer, the purpose of the network architecture search is to select one of the three operations as the operation of that operation layer.

It should be understood that the search space defines the scope of the network architecture search.

230. Configure structural parameters for the multiple operations on each operation layer defined in the search space.

For each operation layer, one structural parameter is configured for each operation on it.

Optionally, for each operation layer, the structural parameters configured for the operations on it take the same value.

Optionally, for each operation layer, the structural parameters configured for the operations on it may not be exactly the same.

The action performed in step 230 can be regarded as quantizing the network architecture into a set of structural parameters. For example, taking the neural network model in step 210 as shown in FIG. 1 and the action of step 220 as shown in FIG. 3 as an example, step 230 can be regarded as quantizing the network architecture of the neural network model into six 3-dimensional structural parameter vectors, or equivalently, into a structural parameter matrix with 6 rows and 3 columns.

240. Perform a network architecture search on the neural network model using an optimization algorithm based on gradient information, to obtain optimized structural parameters, where the target optimization function used in the network architecture search includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.

The target optimization function includes at least two parts: one part is the loss function of the neural network model, and the other part is the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model. The calculation amount of the neural network model using the structural parameters of each iteration in the optimization process refers to the calculation amount of the neural network model corresponding to the structural parameters of each iteration during the optimization of the structural parameters.

It should be understood that because the target optimization function includes the loss function of the neural network model, one of the optimization objectives is to improve the performance of the neural network model. Because the target optimization function includes the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model, another optimization objective is to reduce this difference, that is, to limit the calculation amount of the neural network model to within the computing resources of the computing device.

Herein, an optimization algorithm based on gradient information is used to optimize the target optimization function. It should be understood that using gradient information to optimize the network architecture search process is efficient and not prone to converging to a local optimum.

Optimization algorithms based on gradient information are prior art and are not described in detail herein. In the solution provided in this application, any feasible optimization method based on gradient information may be adopted as the optimization method for the network architecture search process.

In addition, because the target optimization function includes the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model, the calculation amount of the optimized neural network model can be effectively guaranteed to be within the computing resource range of the computing device.

Therefore, in the solution provided in this application, optimizing the network architecture search process with gradient information improves the performance of the neural network model, and including the difference between the calculation amount of the neural network model and the computing resources of the computing device in the target optimization function effectively constrains the calculation amount of the neural network model, so that the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.

For example, the optimization result of step 240 is shown in FIG. 4: each operation layer has one structural parameter with the largest value. In FIG. 4, on each operation layer, the structural parameter indicated by the thickest dashed line is the structural parameter with the largest value.

After the optimization result is obtained in step 240, the operation corresponding to the structural parameter with the largest value on each operation layer may be taken as the selected operation, and the remaining operations are discarded.

For example, on the basis of FIG. 4, on each operation layer, the operation corresponding to the structural parameter with the largest value is retained and the remaining operations are deleted; the result is shown in FIG. 5. Thus, through the network architecture search, the final neural network model is obtained, for example, the neural network model shown in FIG. 5.

Optionally, during the optimization of the structural parameters, the structural parameters on each operation layer are normalized at each iteration.

The difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model can be regarded as a regular term of the target optimization function.

For brevity, in some places below, "the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process" is written simply as "the calculation amount of the neural network model".

It should be understood that during the network architecture search, the determined calculation amount of the neural network model is related to the structural parameters. During the optimization process, the structural parameters obtained in different iterations correspond to different calculation amounts.

The following describes how the calculation amount of the neural network model is determined during the network architecture search.

The operations on the operation layers between the nodes in the neural network model whose network architecture is to be searched can be designed in advance; therefore, the calculation amount of each operation can easily be computed.

The calculation amount of the neural network model using the structural parameters of each iteration in the optimization process can be obtained from the calculation amounts of the operation layers in the neural network model.

Optionally, the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of all operation layers in the neural network model, where the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on it.

For example, the calculation amount FLOPS of the neural network model is computed according to the following formula:

$$\mathrm{FLOPS}=\sum_{0\le i<j\le M}\mathrm{FLOPS}^{(i,j)},\qquad \mathrm{FLOPS}^{(i,j)}=\sum_{o\in\mathcal{O}}\alpha_{o}^{(i,j)}\,\mathrm{FLOPS}_{o}^{(i,j)} \tag{1}$$

where i and j in (i, j) respectively denote the input node and the output node of one operation layer in the neural network model; the total number of nodes in the neural network model is (M+1); FLOPS^(i,j) denotes the calculation amount of the operation layer between input node i and output node j; O denotes the set of candidate operations on the operation layer between input node i and output node j; α_o^(i,j) denotes the structural parameter of candidate operation o in the candidate operation set O on the operation layer between input node i and output node j; and FLOPS_o^(i,j) denotes the calculation amount of candidate operation o in the candidate operation set O on the operation layer between input node i and output node j.

It should be noted that formula (1) is only an example and not a limitation. In practical applications, the way of obtaining the calculation amount of the neural network model can be defined according to actual needs. For example, the calculation amount of the neural network model can be obtained by accumulating the calculation amounts of some of the operation layers in the neural network model.

There are multiple methods for obtaining the computing resources of the computing device that uses the neural network model.

In one case, the computing device to which the neural network model will be applied is known, and the amount of computing resources of that computing device can be estimated by the same evaluation means. This amount of computing resources is taken as the "computing resources of the computing device using the neural network model" in the embodiments of the present application.

In another case, the computing device to which the neural network model will be applied is not yet known. In this case, a threshold for constraining the calculation amount of the neural network model can be set according to experience or specific needs, and this threshold is taken as the "computing resources of the computing device using the neural network model" in the embodiments of the present application.

The expression "computing resources of the computing device using the neural network model" can also be replaced with "a threshold on the calculation amount of the neural network model".
As described above, the target optimization function in the embodiments of the present application includes the loss function of the neural network model and the difference between the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.

For example, as a first implementation, the target optimization function is given by the following formula:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\left|M-\mathrm{FLOPS}\right| \tag{2}$$

where α denotes the structural parameters; w denotes the network weights; L_val(w*(α), α) denotes the loss value on the validation set and is determined by α and w; M denotes the computing resources of the computing device; FLOPS denotes the calculation amount of the neural network model using the structural parameters of each iteration in the optimization process; λ₁ is a constant; and λ₁|M − FLOPS| represents the difference between the calculation amount of the neural network model and the computing resources. FLOPS can be computed by formula (1) or obtained in other feasible ways.

λ₁|M − FLOPS| can be regarded as the regular term in the target optimization function.

By including the difference between the calculation amount of the neural network model and the computing resources of the computing device in the target optimization function, the calculation amount of the neural network model can be effectively constrained.

Optionally, in some embodiments, in addition to the loss function of the neural network model and the difference between the calculation amount of the neural network model and the computing resources of the computing device using the neural network model, the target optimization function further includes the 1-norm of the structural parameters.

It should be noted that the 1-norm of the structural parameters can make the values of the structural parameters on each operation layer as sparse as possible.

For example, as a second implementation, the target optimization function is given by the following formula:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\,(M-\mathrm{FLOPS})+\|\alpha\|_{1} \tag{3}$$

where α, w, L_val(w*(α), α), M, FLOPS, and λ₁ are as defined in formula (2); FLOPS can be computed by formula (1) or obtained in other feasible ways; and ‖α‖₁ denotes the 1-norm of α. ‖α‖₁ can make the values of the structural parameters on each operation layer as sparse as possible; in other words, the term ‖α‖₁ in formula (3) pushes each structural parameter of each operation layer toward being either 0 or 1.

As can be seen from the description of FIG. 1 to FIG. 5 above, after the network architecture search is completed, for each operation layer, the operation corresponding to the structural parameter with the largest value on that layer is taken as the operation of the layer, and the remaining operations on the layer are discarded. In other words, only one operation is retained on each operation layer of the neural network model actually put into use; the calculation amount of that model therefore equals the sum of the calculation amounts of the operations on all operation layers, and it suffices to ensure that this sum is smaller than the computing resources of the computing device.

However, as can be seen from the description of formula (1) above, there is some discrepancy between the calculation amount of the neural network model determined during the network architecture search and the calculation amount of the neural network model actually put into use. For example, suppose that when the network architecture search is completed, the values of the three structural parameters on a certain operation layer (denoted operation layer X) are 0.1, 0.2, and 0.7. According to formula (1), the calculation amount of operation layer X equals 0.1×flops1 + 0.2×flops2 + 0.7×flops3, where flops1, flops2, and flops3 respectively denote the calculation amounts of the three candidate operations on operation layer X. But after the network architecture search is completed, only the operation corresponding to the structural parameter with value 0.7 will be selected on operation layer X; that is, in the neural network model put into actual use, the calculation amount of operation layer X should equal 0.7×flops3 rather than 0.1×flops1 + 0.2×flops2 + 0.7×flops3.

In the embodiments of the present application, the 1-norm of the structural parameters is added to the target optimization function, and the 1-norm can make the values of the structural parameters on each operation layer as sparse as possible, which reduces, to a certain extent, the gap between the calculation amount of the neural network model determined during the network architecture search and the calculation amount of the neural network model actually put into use.

Therefore, in the embodiments of the present application, by adding the 1-norm of the structural parameters to the target optimization function, the values of the multiple structural parameters of each operation layer can be made as sparse as possible, so that the calculation amount of the neural network model used during the optimization process is relatively close to the calculation amount of the neural network model in practical applications; this not only improves the performance of the applied network model under a given calculation amount constraint, but also further improves the utilization of computing resources.

Optionally, in some embodiments, in addition to the loss function of the neural network model and the difference between the calculation amount of the neural network model and the computing resources of the computing device using the neural network model, the target optimization function further includes the 0-norm of the structural parameters.

It should be noted that the 0-norm of the structural parameters can make only one of the structural parameters on each operation layer take the value 1, with the rest being 0.

For example, as a third implementation, the target optimization function is given by the following formula:

$$\min_{\alpha}\ L_{val}(w^{*}(\alpha),\alpha)+\lambda_{1}\,(M-\mathrm{FLOPS})+\|b\|_{0}+\lambda_{2}\sum(b-\alpha)^{2} \tag{4}$$

where α, w, L_val(w*(α), α), M, FLOPS, and λ₁ are as defined in formula (2); FLOPS can be computed by formula (1) or obtained in other feasible ways; b is an auxiliary variable: when α < λ₂, b equals 0, and when α ≥ λ₂, b equals α; ‖b‖₀ denotes the 0-norm of b; and λ₂ is a constant.

‖b‖₀ can make only one of the structural parameters on each operation layer take the value 1, with the remaining values being 0. It should also be understood that in the embodiments of the present application, the 0-norm of the structural parameters is added to the target optimization function; because the 0-norm can make only one structural parameter on each operation layer take the value 1 with the rest being 0, the gap between the calculation amount of the neural network model determined during the network architecture search and the calculation amount of the neural network model actually put into use can be further reduced.

Therefore, in the embodiments of the present application, by adding the 0-norm of the structural parameters to the target optimization function, only one of the structural parameters on each operation layer takes the value 1 and the rest take the value 0, so that the calculation amount of the neural network model used during the optimization process is as close as possible to the calculation amount of the neural network model in practical applications; this not only improves the performance of the applied network model under a given calculation amount constraint, but also effectively improves the utilization of computing resources.
作为一种实现方式,本申请提供的方案可以通过在现有的可差分网络架构搜索(DARTS)方法的基础上实现。
可选地,本申请实施例还提供一种神经网络架构搜索的方法,该方法包括如下步骤。
第一步,获取待进行架构搜索的神经网络。
第二步,对所述神经网络进行可差分网络架构搜索,获得所述神经网络的结构参数,其中,所述可差分网络架构搜索使用的优化目标函数中包括第一正则项,所述第一正则项表示采用优化过程中每次迭代的结构参数的神经网络模型的计算量与使用神经网络模型的计算设备的计算资源之间的差异。
例如,优化目标函数如下所示:
Figure PCTCN2018117957-appb-000007
其中,α表示结构参数。w表示网络权重。L val(w *(α),α)表示验证集上的损失值。L train(w,α)表示训练集上的损失值。L train(w,α)与L val(w *(α),α)均由结构参数α与网络权重w决定。M表示计算设备的计算资源,FLOPS表示采用优化过程中每次迭代的结构参数的神经网络模型的计算量,λ 1为常数。s.t.是subject to的缩写,表示需要满足s.t.后面的条件,即公式(5)中的第一个 公式需要在满足公式(5)的第二个公式的情况下计算。
基于公式(5)定义的优化问题的优化目标是找到在满足w *(α)=argmin wL train(w,α)的前提下,找到使得L val(w *(α),α)+λ 1|M-FLOPS|最小的结构参数。
应理解,在第二步中,可以采用DARTS中使用的基于梯度信息的优化算法,基于目标优化函数进行优化,从而获得优化后的结构参数。
例如,基于公式(5)的优化流程如下:
步骤1)固定w,优化α;
步骤2)固定α,优化w;
重复步骤1)和步骤2),直至收敛。
因此,本申请实施例通过提出一种带有约束的可差分网络架构搜索方法,从而可以实现在计算资源有限的场景下自动化设计高性能的神经网络模型。
可选地,在本实施例中,目标优化函数中还包括结构参数的1范数。
例如,优化目标函数如下所示:
Figure PCTCN2018117957-appb-000008
其中,α表示结构参数。w表示网络权重。L val(w *(α),α)表示验证集上的损失值。L train(w,α)表示训练集上的损失值。L train(w,α)与L val(w *(α),α)均由结构参数α与网络权重w决定。M表示计算设备的计算资源,FLOPS表示采用优化过程中每次迭代的结构参数的神经网络模型的计算量,λ 1为常数。‖α‖ 1表示α的1范数,‖α‖ 1使得神经网络模型的每一个操作层上的结构参数的取值较为稀疏。
基于公式(6)定义的优化问题的优化目标是找到在满足w *(α)=argmin wL train(w,α)的前提下,找到使得L val(w *(α),α)+λ 1(M-FLOPS)+‖α‖ 1最小的结构参数。
例如,基于公式(6)的优化流程如下:
步骤1)固定w,优化α;
步骤2)固定α,优化w;
重复步骤1)和步骤2),直至收敛。
本申请实施例,通过在目标优化函数中加入结构参数的1范数,可以使得每一个操作层的多个结构参数的取值尽量稀疏,从而使得在优化过程中使用的神经网络模型的计算量较为接近实际应用中神经网络模型的计算量,不仅可以在一定计算量限制条件下,提高申请网络模型的性能,还可以进一步提高对计算资源的利用率。
可选地,在本实施例中,目标优化函数中还包括结构参数的0范数。
例如,优化目标函数如下所示:
Figure PCTCN2018117957-appb-000009
Here, α denotes the structural parameters and w denotes the network weights. L_val(w*(α), α) denotes the loss value on the validation set, and L_train(w, α) denotes the loss value on the training set; both are determined by the structural parameters α and the network weights w. M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, and λ₁ is a constant. b is an auxiliary variable that acts as a template: when α < λ₂, b equals 0; when α ≥ λ₂, b equals α. ‖b‖₀ denotes the 0-norm of b, and λ₂ is a constant. It should be understood that ‖b‖₀ can also be regarded as the 0-norm of α.
‖b‖₀ can make exactly one of the structural parameters on each operation layer take the value 1 while the rest take the value 0. For example, a suitable λ₂ can ensure that, among the structural parameters α on each operation layer, exactly one takes the value 1 and the rest take the value 0.
The optimization objective of the problem defined by Equation (7) is to find the structural parameters that minimize L_val(w*(α), α) + λ₁(M − FLOPS) + ‖b‖₀ + λ₂Σ(b − α)² subject to w*(α) = argmin_w L_train(w, α).
For example, the optimization procedure based on Equation (7) is as follows:
Step 1) given α, obtain b by the following formula:
b = \begin{cases} 0, & \alpha < \lambda_2 \\ \alpha, & \alpha \ge \lambda_2 \end{cases}
Step 2) after obtaining the value of b, fix w and optimize α;
Step 3) fix α and optimize w;
Repeat steps 1) through 3) until convergence (a sketch of the step-1 threshold is given after this list).
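Step 1) of this procedure is an elementwise hard threshold. The following minimal sketch (with hypothetical values) shows how a suitable λ₂ zeroes all but the dominant structural parameter on a layer.

```python
import numpy as np

def update_b(alpha, lam2):
    # Step 1): elementwise hard threshold; b is 0 where alpha < lambda_2, else alpha.
    return np.where(alpha < lam2, 0.0, alpha)

alpha = np.array([0.1, 0.2, 0.7])   # hypothetical structural parameters on one layer
print(update_b(alpha, lam2=0.5))    # -> [0.  0.  0.7]
```

Under the ‖b‖₀ and λ₂Σ(b − α)² terms of Equation (7), subsequent α-updates are pulled toward this sparse template, which is how the one-value-1, rest-0 configuration described above is approached.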
In the embodiments of the present application, adding the 0-norm of the structural parameters to the target optimization function makes exactly one structural parameter on each operation layer take the value 1 while the rest take the value 0, so that the calculation amount of the neural network model used in the optimization process matches, to the greatest extent, the calculation amount of the model in actual application. This not only improves the performance of the neural network model under a given calculation amount constraint, but also effectively improves the utilization of computing resources.
The embodiments of the present application propose a constrained differentiable network architecture search method. On the one hand, since the differentiable network architecture search method uses a gradient-based optimization algorithm, the performance of the neural network model can be improved; on the other hand, by incorporating a calculation amount constraint into the target optimization function used by the differentiable network architecture search method, the calculation amount of the neural network model can be effectively constrained. As a result, the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
In summary, by using gradient information to optimize the network architecture search process, the present application can improve the performance of the neural network model, and by including in the target optimization function the difference between the calculation amount of the neural network model and the computing resources of the computing device, it can effectively constrain the calculation amount of the neural network model. As a result, the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
It should be understood that FIG. 1, FIG. 3, FIG. 4 and FIG. 5 are merely examples and not limiting. It should also be understood that Equations (1) to (7) mentioned herein represent only one possible implementation, and corresponding variations can be made in practical applications; for example, the coefficients in Equations (2) to (7) can be set in many different ways. Variations logically inferred by those skilled in the art on the basis of the formulas provided herein also fall within the protection scope of the present application.
It should be understood that the network architecture search solution proposed in the present application can be applied to network architecture optimization problems with calculation amount constraints.
The method embodiments of the present application have been described in detail above with reference to FIG. 1 to FIG. 5; the apparatus embodiments of the present application are described in detail below with reference to FIG. 6, FIG. 7 and FIG. 8. It should be understood that the descriptions of the apparatus embodiments correspond to those of the method embodiments; for parts not described in detail, reference may be made to the foregoing method embodiments.
As shown in FIG. 6, an embodiment of the present application provides an apparatus 600 for searching a neural network architecture. The apparatus 600 includes the following units.
An acquisition unit 610, configured to acquire a neural network model to be searched for a network architecture.
A determination unit 620, configured to determine a search space of the neural network model, the search space defining multiple operations on the operation layer between every two nodes in the neural network model.
A configuration unit 630, configured to configure structural parameters for the multiple operations on each operation layer defined in the search space.
An optimization unit 640, configured to perform a network architecture search on the neural network model by using a gradient-based optimization algorithm to obtain the optimized structural parameters, wherein the target optimization function used in the network architecture search includes the loss function of the neural network model, and the difference between the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
Therefore, in the solution provided by the present application, using gradient information to optimize the network architecture search process can improve the performance of the neural network model, and including in the target optimization function the difference between the calculation amount of the neural network model and the computing resources of the computing device can effectively constrain the calculation amount of the neural network model. As a result, the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
For example, the target optimization function may be as shown in Equation (2) or Equation (5) described above.
Optionally, as an embodiment, the target optimization function further includes the 1-norm of the structural parameters.
For example, the target optimization function may be as shown in Equation (3) or Equation (6) described above.
Optionally, as an embodiment, the target optimization function further includes the 0-norm of the structural parameters.
For example, the target optimization function may be as shown in Equation (4) or Equation (7) described above.
Optionally, in some or all of the above embodiments, the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained from the calculation amount of each operation layer in the neural network model.
For example, the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of all operation layers in the neural network model, wherein the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on that operation layer.
For example, the calculation amount FLOPS of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained according to Equation (1) described above.
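For illustration, a minimal Python sketch of this accumulation follows: each operation layer contributes the α-weighted sum of its candidate operations' calculation amounts, and these per-layer amounts are summed over all layers, as in Equation (1). All names and values are hypothetical.

```python
# Minimal sketch of Equation (1); layer_alphas and layer_op_flops are hypothetical.
def model_flops(layer_alphas, layer_op_flops):
    """Sum over operation layers of the alpha-weighted candidate-operation costs."""
    total = 0.0
    for alphas, op_flops in zip(layer_alphas, layer_op_flops):
        total += sum(a * f for a, f in zip(alphas, op_flops))  # per-layer FLOPS_(i,j)
    return total

layer_alphas   = [[0.1, 0.2, 0.7], [0.6, 0.4]]   # structural parameters per layer
layer_op_flops = [[3e6, 5e6, 8e6], [2e6, 4e6]]   # per-candidate costs per layer
print(model_flops(layer_alphas, layer_op_flops))
```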
Optionally, in some or all of the above embodiments, the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of some of the operation layers in the neural network model, wherein the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on that operation layer.
It should be understood that the acquisition unit 610, the determination unit 620, the configuration unit 630 and the optimization unit 640 in this embodiment may all be implemented by a processor or processor-related circuits.
As shown in FIG. 7, an embodiment of the present application further provides an apparatus 700 for network architecture search. The apparatus 700 includes the following units.
An acquisition unit 710, configured to acquire a neural network to be searched for an architecture.
An optimization unit 720, configured to perform a differentiable network architecture search on the neural network to obtain the structural parameters of the neural network, wherein the target optimization function used in the differentiable network architecture search includes a first regularization term, and the first regularization term represents the difference between the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process and the computing resources of the computing device using the neural network model.
By proposing a constrained differentiable network architecture search method, the present application makes it possible to automatically design high-performance neural network models in scenarios where computing resources are limited.
For example, the target optimization function may be as shown in Equation (5) described above.
Optionally, as an embodiment, the target optimization function further includes the 1-norm of the structural parameters.
For example, the target optimization function may be as shown in Equation (6) described above.
Optionally, as an embodiment, the target optimization function further includes the 0-norm of the structural parameters.
For example, the target optimization function may be as shown in Equation (7) described above.
Optionally, in some or all of the above embodiments, the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained from the calculation amount of each operation layer in the neural network model.
For example, the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of all operation layers in the neural network model, wherein the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on that operation layer.
For example, the calculation amount FLOPS of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained according to Equation (1) described above.
It should be understood that the acquisition unit 710 and the optimization unit 720 in this embodiment may both be implemented by a processor or processor-related circuits.
As shown in FIG. 8, an embodiment of the present application further provides a neural network processing apparatus 800. The neural network processing apparatus 800 includes a processor 810 and a memory 820; the memory 820 is configured to store instructions, the processor 810 is configured to execute the instructions stored in the memory 820, and execution of the instructions stored in the memory 820 causes the processor 810 to perform the method embodiments described above.
Therefore, in the solution provided by the present application, using gradient information to optimize the network architecture search process can improve the performance of the neural network model, and including in the target optimization function the difference between the calculation amount of the neural network model and the computing resources of the computing device can effectively constrain the calculation amount of the neural network model. As a result, the performance of the neural network model can be effectively improved in scenarios where computing resources are limited.
The neural network processing apparatus 800 may also correspond to the apparatus 600 or the apparatus 700 provided in the above embodiments.
Optionally, the neural network processing apparatus 800 may further include a communication interface 830, configured to output data processed by the neural network processing apparatus 800 and/or to input, from an external device, data to be processed by the neural network processing apparatus 800.
For example, the processor 810 is configured to control the communication interface 830 to input and/or output data.
An embodiment of the present application further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, it causes the computer to perform the method embodiments described above.
An embodiment of the present application further provides a computer program product containing instructions; when the instructions are executed by a computer, they cause the computer to perform the method embodiments described above.
In the above embodiments, implementation may be entirely or partially by software, hardware, firmware or any combination thereof. When software is used, implementation may be entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wirelessly (such as infrared, radio or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (such as floppy disks, hard disks or magnetic tapes), optical media (such as digital video discs (DVD)) or semiconductor media (such as solid state disks (SSD)).
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is merely a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present application, and all such changes or substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

  1. A method for searching a neural network architecture, comprising:
    acquiring a neural network model to be searched for a network architecture;
    determining a search space of the neural network model, the search space defining multiple operations on the operation layer between every two nodes in the neural network model;
    configuring structural parameters for the multiple operations on each operation layer defined in the search space; and
    performing a network architecture search on the neural network model by using a gradient-based optimization algorithm to obtain the optimized structural parameters, wherein a target optimization function used in the network architecture search includes a loss function of the neural network model, and a difference between a calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process and computing resources of a computing device using the neural network model.
  2. The method according to claim 1, wherein the target optimization function further includes the 1-norm of the structural parameters.
  3. The method according to claim 1, wherein the target optimization function further includes the 0-norm of the structural parameters.
  4. The method according to claim 1, wherein the target optimization function is as follows:
    \min_\alpha \; L_{val}(w^*(\alpha),\alpha) + \lambda_1 \lvert M - \mathrm{FLOPS} \rvert
    wherein α denotes the structural parameters, w denotes the network weights, L_val(w*(α), α) denotes the loss value on the validation set, M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, and λ₁ is a constant.
  5. The method according to claim 2, wherein the target optimization function is as follows:
    \min_\alpha \; L_{val}(w^*(\alpha),\alpha) + \lambda_1 (M - \mathrm{FLOPS}) + \|\alpha\|_1
    wherein α denotes the structural parameters, w denotes the network weights, L_val(w*(α), α) denotes the loss value on the validation set, M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, λ₁ is a constant, and ‖α‖₁ denotes the 1-norm of α.
  6. The method according to claim 3, wherein the target optimization function is as follows:
    \min_\alpha \; L_{val}(w^*(\alpha),\alpha) + \lambda_1 (M - \mathrm{FLOPS}) + \|b\|_0 + \lambda_2 \sum (b-\alpha)^2
    b = \begin{cases} 0, & \alpha < \lambda_2 \\ \alpha, & \alpha \ge \lambda_2 \end{cases}
    wherein α denotes the structural parameters, w denotes the network weights, L_val(w*(α), α) denotes the loss value on the validation set, M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, ‖b‖₀ denotes the 0-norm of b, and λ₁ and λ₂ are constants.
  7. The method according to any one of claims 1 to 6, wherein the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained from the calculation amount of each operation layer in the neural network model.
  8. The method according to claim 7, wherein the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of all operation layers in the neural network model, wherein the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on that operation layer.
  9. The method according to claim 8, wherein the calculation amount FLOPS of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained according to the following formulas:
    \mathrm{FLOPS} = \sum_{(i,j)} \mathrm{FLOPS}^{(i,j)}
    \mathrm{FLOPS}^{(i,j)} = \sum_{o \in \mathcal{O}} \alpha_o^{(i,j)} \, \mathrm{flops}_o^{(i,j)}
    wherein i and j in (i, j) respectively denote the input node and the output node of one and the same operation layer in the neural network model, the total number of nodes in the neural network model is (M+1), FLOPS^(i,j) denotes the calculation amount on the operation layer between input node i and output node j, Ο denotes the set of candidate operations on the operation layer between input node i and output node j, α_o^(i,j) denotes the structural parameter of candidate operation o in the candidate operation set Ο on that operation layer, and flops_o^(i,j) denotes the calculation amount of candidate operation o in the candidate operation set Ο on that operation layer.
  10. An apparatus for network architecture search, comprising:
    an acquisition unit, configured to acquire a neural network model to be searched for a network architecture;
    a determination unit, configured to determine a search space of the neural network model, the search space defining multiple operations on the operation layer between every two nodes in the neural network model;
    a configuration unit, configured to configure structural parameters for the multiple operations on each operation layer defined in the search space; and
    an optimization unit, configured to perform a network architecture search on the neural network model by using a gradient-based optimization algorithm to obtain the optimized structural parameters, wherein a target optimization function used in the network architecture search includes a loss function of the neural network model, and a difference between a calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process and computing resources of a computing device using the neural network model.
  11. The apparatus according to claim 10, wherein the target optimization function further includes the 1-norm of the structural parameters.
  12. The apparatus according to claim 10, wherein the target optimization function further includes the 0-norm of the structural parameters.
  13. The apparatus according to claim 10, wherein the target optimization function is as follows:
    \min_\alpha \; L_{val}(w^*(\alpha),\alpha) + \lambda_1 \lvert M - \mathrm{FLOPS} \rvert
    wherein α denotes the structural parameters, w denotes the network weights, L_val(w*(α), α) denotes the loss value on the validation set, M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, and λ₁ is a constant.
  14. The apparatus according to claim 11, wherein the target optimization function is as follows:
    \min_\alpha \; L_{val}(w^*(\alpha),\alpha) + \lambda_1 (M - \mathrm{FLOPS}) + \|\alpha\|_1
    wherein α denotes the structural parameters, w denotes the network weights, L_val(w*(α), α) denotes the loss value on the validation set, M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, λ₁ is a constant, and ‖α‖₁ denotes the 1-norm of α.
  15. The apparatus according to claim 12, wherein the target optimization function is as follows:
    \min_\alpha \; L_{val}(w^*(\alpha),\alpha) + \lambda_1 (M - \mathrm{FLOPS}) + \|b\|_0 + \lambda_2 \sum (b-\alpha)^2
    b = \begin{cases} 0, & \alpha < \lambda_2 \\ \alpha, & \alpha \ge \lambda_2 \end{cases}
    wherein α denotes the structural parameters, w denotes the network weights, L_val(w*(α), α) denotes the loss value on the validation set, M denotes the computing resources of the computing device, FLOPS denotes the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process, ‖b‖₀ denotes the 0-norm of b, and λ₁ and λ₂ are constants.
  16. The apparatus according to any one of claims 10 to 15, wherein the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained from the calculation amount of each operation layer in the neural network model.
  17. The apparatus according to claim 16, wherein the calculation amount of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained by accumulating the calculation amounts of all operation layers in the neural network model, wherein the calculation amount of each operation layer is obtained from the structural parameters on that operation layer and the calculation amounts of the individual operations on that operation layer.
  18. The apparatus according to claim 17, wherein the calculation amount FLOPS of the neural network model that uses the structural parameters of each iteration in the optimization process is obtained according to the following formulas:
    \mathrm{FLOPS} = \sum_{(i,j)} \mathrm{FLOPS}^{(i,j)}
    \mathrm{FLOPS}^{(i,j)} = \sum_{o \in \mathcal{O}} \alpha_o^{(i,j)} \, \mathrm{flops}_o^{(i,j)}
    wherein i and j in (i, j) respectively denote the input node and the output node of one and the same operation layer in the neural network model, the total number of nodes in the neural network model is (M+1), FLOPS^(i,j) denotes the calculation amount on the operation layer between input node i and output node j, Ο denotes the set of candidate operations on the operation layer between input node i and output node j, α_o^(i,j) denotes the structural parameter of candidate operation o in the candidate operation set Ο on that operation layer, and flops_o^(i,j) denotes the calculation amount of candidate operation o in the candidate operation set Ο on that operation layer.
  19. A neural network processing apparatus, comprising a memory and a processor, wherein the memory is configured to store instructions, the processor is configured to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method according to any one of claims 1 to 9.
  20. A computer storage medium having stored thereon a computer program, wherein the computer program, when executed by a computer, causes the computer to perform the method according to any one of claims 1 to 9.
  21. A computer program product containing instructions, wherein the instructions, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 9.
PCT/CN2018/117957 2018-11-28 2018-11-28 神经网络架构搜索的方法与装置 WO2020107264A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880068164.4A CN111406263A (zh) 2018-11-28 2018-11-28 神经网络架构搜索的方法与装置
PCT/CN2018/117957 WO2020107264A1 (zh) 2018-11-28 2018-11-28 神经网络架构搜索的方法与装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/117957 WO2020107264A1 (zh) 2018-11-28 2018-11-28 神经网络架构搜索的方法与装置

Publications (1)

Publication Number Publication Date
WO2020107264A1 true WO2020107264A1 (zh) 2020-06-04

Family

ID=70854171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117957 WO2020107264A1 (zh) 2018-11-28 2018-11-28 神经网络架构搜索的方法与装置

Country Status (2)

Country Link
CN (1) CN111406263A (zh)
WO (1) WO2020107264A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022116519A1 (zh) * 2020-12-03 2022-06-09 北京搜狗科技发展有限公司 一种搜索方法、装置和电子设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200304B (zh) * 2020-09-30 2023-03-24 北京市商汤科技开发有限公司 神经网络搜索方法、装置、电子设备和存储介质
CN112819138A (zh) * 2021-01-26 2021-05-18 上海依图网络科技有限公司 一种图像神经网络结构的优化方法及装置
CN113312175B (zh) * 2021-04-27 2024-09-06 北京迈格威科技有限公司 一种算子确定、运行方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180410A (zh) * 2017-04-11 2017-09-19 中国农业大学 一种图像的风格化重建方法及装置
US20170330068A1 (en) * 2016-05-16 2017-11-16 Canon Kabushiki Kaisha Devices, systems, and methods for feature encoding
CN107463953A (zh) * 2017-07-21 2017-12-12 上海交通大学 在标签含噪情况下基于质量嵌入的图像分类方法及系统
CN107945204A (zh) * 2017-10-27 2018-04-20 西安电子科技大学 一种基于生成对抗网络的像素级人像抠图方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275719B2 (en) * 2015-01-29 2019-04-30 Qualcomm Incorporated Hyper-parameter selection for deep convolutional networks
US9659248B1 (en) * 2016-01-19 2017-05-23 International Business Machines Corporation Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations
US11521069B2 (en) * 2016-10-31 2022-12-06 Oracle International Corporation When output units must obey hard constraints
US11003989B2 (en) * 2017-04-27 2021-05-11 Futurewei Technologies, Inc. Non-convex optimization by gradient-accelerated simulated annealing
CN108022257A (zh) * 2017-12-28 2018-05-11 中国科学院半导体研究所 适用于硬件的高速卷积神经网络目标跟踪方法和装置
CN108805257A (zh) * 2018-04-26 2018-11-13 北京大学 一种基于参数范数的神经网络量化方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330068A1 (en) * 2016-05-16 2017-11-16 Canon Kabushiki Kaisha Devices, systems, and methods for feature encoding
CN107180410A (zh) * 2017-04-11 2017-09-19 中国农业大学 一种图像的风格化重建方法及装置
CN107463953A (zh) * 2017-07-21 2017-12-12 上海交通大学 在标签含噪情况下基于质量嵌入的图像分类方法及系统
CN107945204A (zh) * 2017-10-27 2018-04-20 西安电子科技大学 一种基于生成对抗网络的像素级人像抠图方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022116519A1 (zh) * 2020-12-03 2022-06-09 北京搜狗科技发展有限公司 一种搜索方法、装置和电子设备

Also Published As

Publication number Publication date
CN111406263A (zh) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2020107264A1 (zh) 神经网络架构搜索的方法与装置
US20180268306A1 (en) Using Different Data Sources for a Predictive Model
WO2021089013A1 (zh) 空间图卷积网络的训练方法、电子设备及存储介质
WO2017124713A1 (zh) 一种数据模型的确定方法及装置
TWI729606B (zh) 用於一邊緣運算網路的負載平衡裝置及方法
CN111783810B (zh) 用于确定用户的属性信息的方法和装置
WO2022083093A1 (zh) 图谱中的概率计算方法、装置、计算机设备及存储介质
US20190138929A1 (en) System and method for automatic building of learning machines using learning machines
WO2022028147A1 (zh) 图像分类模型训练方法、装置、计算机设备及存储介质
US20220114479A1 (en) Systems and methods for automatic mixed-precision quantization search
WO2020237689A1 (zh) 网络结构搜索的方法及装置、计算机存储介质和计算机程序产品
JP2024509036A (ja) ニューラルネットワークモデルの圧縮方法及び同方法を実施する電子機器
CN113626612A (zh) 一种基于知识图谱推理的预测方法和系统
CN114925651A (zh) 一种电路走线确定方法及相关设备
WO2023274167A1 (zh) 图像分类、模型训练方法、设备、存储介质及计算机程序
CN113468344B (zh) 实体关系抽取方法、装置、电子设备和计算机可读介质
CN115034379A (zh) 一种因果关系确定方法及相关设备
WO2024139703A1 (zh) 对象识别模型的更新方法、装置、电子设备、存储介质及计算机程序产品
CN115983362A (zh) 一种量化方法、推荐方法以及装置
CN112381184B (zh) 图像检测方法、装置、电子设备和计算机可读介质
CN115618065A (zh) 一种数据处理方法及相关设备
CN110782017B (zh) 用于自适应调整学习率的方法和装置
CN114610922A (zh) 图像处理方法及装置、存储介质及电子设备
WO2020237687A1 (zh) 网络结构搜索的方法及装置、计算机存储介质和计算机程序产品
CN112036418A (zh) 用于提取用户特征的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18941309

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18941309

Country of ref document: EP

Kind code of ref document: A1