WO2020237689A1 - Method and device for network structure search, computer storage medium, and computer program product - Google Patents
Method and device for network structure search, computer storage medium, and computer program product
- Publication number
- WO2020237689A1 (Application PCT/CN2019/089698)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network structure
- network
- feedback
- search space
- operations
- Prior art date: 2019-05-31
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Definitions
- This application relates to the field of machine learning, and in particular to a method and device for network structure search, computer storage media, and computer program products.
- In the related art, when a deep learning model is applied on a mobile terminal, the model first needs to be trained on a computer or server and then deployed on the mobile terminal's chip.
- High-performance deep learning models, however, often have a huge number of parameters stored as 32-bit floating-point values. This is not a problem when training on computers, servers, or other compute-rich devices, but it makes direct deployment on mobile terminals with limited computing resources very difficult.
- Low-bit networks, in contrast, require little storage, compute quickly, and demand fewer computing resources, so they have become a research hot spot in recent years.
- However, the network structure has a large impact on low-bit networks, and how to design a network structure suited to low-bit networks is an urgent problem to be solved.
- The embodiments of the present application provide a method and device for network structure search, a computer storage medium, and a computer program product.
- The network structure search method in the implementations of this application includes: determining the search space of the neural network model on which the network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model; sampling one of the operations at each operation layer of the search space according to a first network structure to obtain a target network structure; lowering the bit width of the target network structure to obtain a second network structure; determining the feedback amount of the second network structure; and updating the first network structure according to the feedback amount.
- The network structure search device of the embodiments of the present application includes a processor and a memory. The memory stores one or more programs, and when the programs are executed by the processor, the processor is configured to: determine the search space of the neural network model on which the network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model; sample one of the operations at each operation layer of the search space according to a first network structure to obtain a target network structure; lower the bit width of the target network structure to obtain a second network structure; determine the feedback amount of the second network structure; and update the first network structure according to the feedback amount.
- The computer storage medium of the embodiments of the present application stores a computer program; when the computer program is executed by a computer, the computer performs the above method.
- The computer program product of the embodiments of the present application contains instructions that, when executed by a computer, cause the computer to perform the above method.
- The network structure search method and device, computer storage medium, and computer program product of the embodiments of the present application lower the bit width of the sampled target network structure to obtain the second network structure, and then determine the feedback amount of the second network structure to update the first network structure. This yields a network structure better suited to low-bit networks, thereby achieving high-performance low-bit networks and enabling low-bit networks to be better applied in mobile terminal scenarios.
- FIG. 1 is a schematic flowchart of a method for searching a network structure according to an embodiment of the present application
- FIG. 2 is a schematic diagram of modules of a network structure search device according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of the principle of a network structure search method according to an embodiment of the present application.
- FIG. 4 is another schematic diagram of the principle of the network structure search method according to the embodiment of the present application.
- FIG. 5 is a schematic flowchart of a method for searching a network structure according to another embodiment of the present application.
- FIG. 6 is a schematic flowchart of a method for searching a network structure according to another embodiment of the present application.
- FIG. 7 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
- FIG. 8 is a schematic flowchart of a method for searching a network structure according to another embodiment of the present application.
- FIG. 9 is a schematic flowchart of a network structure search method according to still another embodiment of the present application.
- FIG. 10 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
- FIG. 11 is a schematic diagram of a general diagram of a network structure search method according to an embodiment of the present application.
- FIG. 12 is a schematic flowchart of a network structure search method according to another embodiment of the present application.
- Description of main reference numerals: network structure search device 10, memory 102, processor 104, communication interface 106.
- In the description of this application, the terms "first" and "second" are used for description purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of those features. In the description of this application, "multiple" means two or more, unless otherwise specifically defined.
- In the description of this application, it should be noted that, unless otherwise clearly specified and limited, the terms "mounted", "connected", and "connection" should be interpreted broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection, an electrical connection, or a communication link; it may be a direct connection or an indirect connection through an intermediate medium; and it may be internal communication between two components or an interaction between two components.
- Referring to FIG. 1 and FIG. 2, an embodiment of the present application provides a method and a device 10 for network structure search.
- The network structure search method in the implementations of this application includes:
- Step S12: determine the search space of the neural network model on which the network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model;
- Step S14: sample one operation at each operation layer of the search space according to the first network structure to obtain the target network structure;
- Step S16: lower the bit width of the target network structure to obtain a second network structure;
- Step S18: determine the feedback amount (val-acc) of the second network structure;
- Step S19: update the first network structure according to the feedback amount (the S12 to S19 loop is sketched in the example below).
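- For illustration only, the following is a minimal Python sketch of the loop formed by steps S12 to S19. The helper functions and the search-space contents here are hypothetical stand-ins (uniform random sampling, a random reward), not the procedure defined by this application.

```python
# Illustrative sketch of the S12-S19 loop; all helpers are placeholder implementations.
import random

# Step S12: the search space defines the candidate operations of every operation layer.
SEARCH_SPACE = {
    "ops": ["conv3x3", "conv5x5", "dw3x3", "dw5x5", "maxpool3x3", "avgpool3x3"],
    "num_layers": 5,
}

def sample_target_structure(controller_state):
    """Step S14: sample one operation per operation layer (stub: uniform sampling)."""
    return [random.choice(SEARCH_SPACE["ops"]) for _ in range(SEARCH_SPACE["num_layers"])]

def lower_bit_width(structure, bits=1):
    """Step S16: quantize the sampled structure into a low-bit second structure (stub)."""
    return {"ops": structure, "bits": bits}

def feedback_amount(second_structure):
    """Step S18: evaluate the low-bit structure, e.g. validation accuracy (stub)."""
    return random.random()

def update_controller(controller_state, structure, reward):
    """Step S19: policy-gradient style update of the first network structure (stub)."""
    controller_state.append((structure, reward))
    return controller_state

controller_state = []            # state of the "first network structure" (controller)
for _ in range(10):              # steps S14-S19 repeated over multiple cycles
    arch = sample_target_structure(controller_state)
    low_bit_arch = lower_bit_width(arch, bits=1)
    reward = feedback_amount(low_bit_arch)
    controller_state = update_controller(controller_state, arch, reward)
```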
- The network structure search device 10 of the embodiments of the present application includes a processor 104 and a memory 102. The memory 102 stores one or more programs, and when the programs are executed by the processor 104, the processor 104 is configured to: determine the search space of the neural network model on which the network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model; sample one operation at each operation layer of the search space according to the first network structure to obtain the target network structure; lower the bit width of the target network structure to obtain the second network structure; determine the feedback amount of the second network structure; and update the first network structure according to the feedback amount.
- That is to say, the network structure search method of the embodiments of the present application can be implemented by the network structure search device 10 of the embodiments of the present application.
- With the network structure search method and device 10 of the embodiments of the present application, the sampled target network structure is lowered in bit width to obtain the second network structure, and the feedback amount of the second network structure is then determined to update the first network structure. This yields a network structure better suited to low-bit networks, thereby realizing high-performance low-bit networks and allowing low-bit networks to be better applied in mobile terminal scenarios.
- Optionally, the network structure search device 10 may further include a communication interface 106 for outputting data processed by the device 10 and/or inputting, from an external device, data to be processed by the device 10.
- For example, the processor 104 is configured to control the communication interface 106 to input and/or output data.
- Please note that the number of processors 104 may be one, or it may be multiple, for example 2, 3, 5, or another number.
- When there are multiple processors 104, different processors 104 may execute step S12, step S14, step S16, step S18, and step S19.
- In addition, step S14, step S16, step S18, and step S19 are repeated over multiple cycles; in this way, a better second network structure can finally be obtained.
- In the related art, low-bit networks mainly use model quantization techniques.
- Model quantization mainly includes two parts: quantization of weights (Weight) and quantization of activation values (Activation).
- The related art has demonstrated the feasibility of 8-bit quantization, which can achieve accuracy comparable to 32-bit.
- To further pursue computational efficiency, 4-bit, 2-bit, and 1-bit networks have also been proposed, but these networks suffer a significant drop in accuracy due to excessive information loss.
- Taking a 1-bit network as an example, the low-bit networks of the related art directly quantize to 1 or -1 the 32-bit weights and activation-function outputs of a model already trained on a server. Specifically, during training, the forward pass binarizes the weights and activation values, and the backward pass then uses the straight-through estimator (STE) method to update the weights until convergence; alternatively, the floating-point model is trained to convergence and then binarized and fine-tuned via transfer learning.
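- As a point of reference only, the following is a minimal sketch of the related-art 1-bit approach described above (binarized forward pass, STE backward pass), assuming PyTorch. The layer and sizes are illustrative, not part of this application.

```python
# Sketch of 1-bit weight binarization trained with a straight-through estimator (STE).
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                      # forward pass: values -> {+1, -1}

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass the gradient through, clipped to |x| <= 1.
        return grad_output * (x.abs() <= 1).float()

class BinaryLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x):
        return nn.functional.linear(x, BinarizeSTE.apply(self.weight))

layer = BinaryLinear(8, 4)
out = layer(torch.randn(2, 8))
out.sum().backward()                              # full-precision weights receive STE gradients
```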
- However, these techniques have not solved the problem well.
- That is to say, the low-bit techniques of the related art all operate under the same rule: take a well-performing model designed on a computer or server and lower its bit width, for example by directly applying low-bit processing to a residual network structure (Residual Neural Network, ResNet).
- In reality, a neural network is a very complex non-convex model, and it is uncertain whether the high-performance model structure under floating-point precision should be the same as the high-performance model structure under low bit width.
- In fact, the high-performance model structure for floating-point parameters is likely to differ from the high-performance model structure for low-bit parameters. That is to say, directly lowering the bit width of a model designed under floating-point precision does not solve the problem, which is why there is still no good solution for low-bit networks at present.
- Therefore, it can be considered that the network structure has a large impact on low-bit networks. In this regard, the network structure search method and device 10 of the embodiments of the present application can obtain a network structure better suited to low-bit networks, thereby realizing high-performance low-bit networks and enabling low-bit networks to be better applied in mobile terminal scenarios. It can be understood that, because the process of network structure design is very complicated, the network structure search method and device 10 of the embodiments of the present application use an automated neural architecture search (NAS) algorithm to solve the problem of low-bit networks.
- Specifically, neural architecture search (NAS) is a technique that uses an algorithm to automatically design a neural network model; network structure search aims to search out the structure of the neural network model. In the implementations of this application, the neural network model on which the network structure search is to be performed is a convolutional neural network (CNN).
- The problem to be solved by network structure search is to determine the operations between nodes in the neural network model; different combinations of operations between nodes correspond to different network structures. Further, the operations between nodes in the neural network model can be understood as the feature layers in the neural network model. The operation between two nodes refers to the operation required to transform the feature data at one node into the feature data at the other node. The operations mentioned in this application may be convolution operations, pooling operations, fully connected operations, or other neural network operations. The operations between two nodes can be considered to constitute the operation layer between those two nodes. Generally, the operation layer between two nodes has multiple searchable operations, that is, multiple candidate operations, and the purpose of network structure search is to determine one operation at each operation layer. For example, conv3*3, conv5*5, depthwise3*3, depthwise5*5, maxpool3*3, and average pool3*3 may be defined as the search space; that is, the operation of each layer of the target network structure is sampled from these six choices.
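- A minimal illustration of such a search space follows; the 20-layer depth is taken from the 20-layer example later in this description, and the variable names are illustrative only.

```python
# Hypothetical representation of the search space: each operation layer between two
# nodes offers the same six candidate operations named above.
candidate_ops = ["conv3x3", "conv5x5", "depthwise3x3", "depthwise5x5",
                 "maxpool3x3", "avgpool3x3"]
num_operation_layers = 20                      # assumed depth, for illustration only

# A concrete network structure is one choice of operation per operation layer:
example_structure = ["conv3x3"] * num_operation_layers

# The number of distinct structures grows exponentially with depth,
# which is why an automated search is needed.
num_structures = len(candidate_ops) ** num_operation_layers   # 6 ** 20
print(num_structures)
```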
- The idea of NAS is to obtain a network structure from the search space through a first network structure, obtain an accuracy rate R from that network structure, and use the accuracy rate R as feedback to update the first network structure; the first network structure then continues to be optimized to obtain another network structure, and so on repeatedly until the best result is obtained.
- The first network structure can serve as a controller.
- In the example of FIG. 3, the first network structure is constructed from a recurrent neural network (RNN); of course, the first network structure can also be constructed from a convolutional neural network (CNN) or a long short-term memory (LSTM) artificial neural network. The specific way in which the first network structure is constructed is not limited here. The following description takes a first network structure built from an LSTM as an example.
- Referring to FIG. 4 and FIG. 5, each operation layer of the search space corresponds to one time step of the long short-term memory artificial neural network. For each time step, a cell of the LSTM outputs one hidden state, and step S14 includes:
- Step S142: map the hidden state to a feature vector, the dimension of the feature vector being the same as the number of operations on each operation layer;
- Step S144: sample one operation at each operation layer according to the feature vector to obtain the target network structure.
- Correspondingly, the processor 104 is configured to map the hidden state to a feature vector whose dimension equals the number of operations on each operation layer, and to sample one operation at each operation layer according to the feature vector to obtain the target network structure. In this way, one operation is sampled at each operation layer of the search space to obtain the target network structure. For example, to search a 20-layer network (ignoring skip connections), 20 time steps are needed.
- In the example of FIG. 4, a solid arrow represents a time step: time 1 represents the first cell of the LSTM, time 2 represents the second cell of the LSTM, and so on. The square conv3*3 represents the operation of that layer in the model, and a circle represents the connection relationship between operation layers. It can be understood that, because the computations of the network structure have a sequential order, mapping this ordering onto the LSTM corresponds to moving from left to right across the small squares in FIG. 5, each corresponding to the state of the LSTM cell at that time.
- Specifically, at time 1, the hidden state output by the cell is processed to obtain the convolution conv3×3; conv3×3 is then used as the input of the cell at time 2, the hidden state output by the cell at time 1 is also used as the input of the cell at time 2, and circle 1 is computed.
- Likewise, circle 1 is used as the input of the cell at time 3, the hidden state output by the cell at time 2 is also used as the input at time 3, and the convolution sep5×5 is computed, and so on.
- Further, referring to FIG. 6, step S144 includes:
- Step S1442: normalize the feature vector (softmax) to obtain the probability of each operation at each operation layer;
- Step S1444: sample one operation at each operation layer according to the probabilities to obtain the target network structure.
- Correspondingly, the processor 104 is configured to normalize the feature vector to obtain the probability of each operation at each operation layer, and to sample one operation at each operation layer according to the probabilities to obtain the target network structure.
- In this way, one operation is sampled at each operation layer according to the feature vector to obtain the network structure. Specifically, in the example shown in FIG. 4, the hidden state output by the LSTM cell is encoded and mapped to a vector of dimension 6; this vector passes through a normalized exponential function (softmax) and becomes a probability distribution, sampling is performed according to this probability distribution, and the operation of the current layer is obtained. Proceeding in this way ultimately yields a network structure. It can be understood that in this example there is only one input and six candidate operations in total (3×3 convolution, 5×5 convolution, 3×3 depthwise-separable convolution, 5×5 depthwise-separable convolution, max pooling, and 3×3 average pooling); the dimension of the vector corresponds to the search space, and 6 means the search space offers six selectable operations.
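- A minimal sketch of this per-time-step sampling, assuming PyTorch, is shown below. The hidden size, the use of a one-hot encoding of the previous choice as the next input, and the variable names are illustrative assumptions rather than details fixed by this application.

```python
# Sketch of an LSTM-cell controller: hidden state -> dim-6 vector -> softmax -> sample.
import torch
import torch.nn as nn

num_ops, hidden_size, num_layers = 6, 64, 20

cell = nn.LSTMCell(input_size=num_ops, hidden_size=hidden_size)
encoder = nn.Linear(hidden_size, num_ops)         # maps the hidden state to a length-6 vector

x = torch.zeros(1, num_ops)                       # dummy input for the first time step
h = torch.zeros(1, hidden_size)
c = torch.zeros(1, hidden_size)

sampled_ops, log_probs = [], []
for t in range(num_layers):                       # one time step per operation layer
    h, c = cell(x, (h, c))
    logits = encoder(h)                           # "encoding" of the hidden state
    probs = torch.softmax(logits, dim=-1)         # normalized exponential -> probabilities
    op = torch.multinomial(probs, num_samples=1)  # sample the operation of the current layer
    sampled_ops.append(op.item())
    log_probs.append(torch.log(probs[0, op]))
    x = nn.functional.one_hot(op.view(1), num_ops).float()  # feed the choice to the next step

print(sampled_ops)                                # one operation index per layer
```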
- In step S16, the method of lowering the bit width of the target network structure to obtain the second network structure may be the aforementioned model quantization technique; the specific method of lowering the bit width of the target network structure is not limited here.
- In addition, step S19 is implemented by the following formula:

  $$\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T}\nabla_{\theta_c}\log P\!\left(a_t \mid a_{(t-1):1};\theta_c\right) R_k$$

- where R_k is the k-th feedback amount, θ_c is the parameter of the long short-term memory artificial neural network, a_t is the operation sampled at the t-th operation layer, P(a_t | a_(t-1):1; θ_c) is the probability of sampling that operation, m is the total number of feedback amounts, and T is the number of hyperparameters predicted by the first network structure.
- In this way, the first network structure is updated according to the feedback amount.
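- The following is a small, self-contained sketch of this policy-gradient update, assuming PyTorch. The controller here is a stand-in table of logits rather than the LSTM, and the feedback amounts are dummy values; only the shape of the update mirrors the formula above.

```python
# Policy-gradient update: maximize (1/m) * sum_k sum_t log P(a_t | ...) * R_k.
import torch

m, T = 20, 20                                          # feedback count and predicted hyperparameters
theta_c = torch.randn(T, 6, requires_grad=True)        # stand-in controller parameters

log_prob_sums = []
rewards = torch.rand(m)                                # dummy feedback amounts (e.g. val-acc)
for k in range(m):
    probs = torch.softmax(theta_c, dim=-1)             # per-layer operation probabilities
    actions = torch.multinomial(probs, num_samples=1)  # a_t for t = 1..T
    log_prob_sums.append(torch.log(probs.gather(1, actions)).sum())

loss = -(torch.stack(log_prob_sums) * rewards).mean()  # minimize the negative objective
loss.backward()                                        # fills theta_c.grad with the estimator
with torch.no_grad():
    theta_c -= 0.1 * theta_c.grad                      # one gradient step on the controller
    theta_c.grad.zero_()
```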
- The network structure search method in the embodiments of the present application may be a NAS-based network structure search method, or any variant network structure search method (ENAS-like) based on efficient neural architecture search (ENAS).
- ENAS may be an efficient network structure search method built on the reinforcement learning (RL) NAS framework, or an efficient network structure search method based on evolutionary algorithms. It can be understood that, because NAS itself has low efficiency, in this implementation ENAS improves the efficiency of network structure search through weight sharing and similar techniques.
- In one implementation (hereinafter Embodiment 1), the network structure search method and device 10 are based on NAS.
- In another implementation (hereinafter Embodiment 2), the network structure search method and device 10 are based on ENAS-like search.
- The network structure search methods and devices 10 of Embodiment 1 and Embodiment 2 are described below in turn.
- Embodiment 1: in this embodiment, the network structure search method and device 10 are based on NAS.
- Referring to FIG. 7, in this embodiment, step S18 includes:
- Step S181: train the second network structure to convergence to determine the feedback amount.
- Correspondingly, the processor 104 is configured to train the second network structure to convergence to determine the feedback amount.
- In this way, the feedback amount of the second network structure is determined.
- In addition, the training samples (train set) can be divided in advance into a training set (train) and a test set (valid).
- It can be understood that, in a conventional CNN, sample data is generally divided into training samples and verification samples: the training samples are used to train the network structure, and the verification samples are used to verify whether the network structure is good.
- In this implementation, when searching for the second network structure, the training set is used to train the parameters of the searched second network structure, such as the parameters computed through conv3*3 and sep5*5, for example weights and biases.
- After the second network structure is found, it can be used for prediction on the test set to obtain a feedback amount, which updates the first network structure (LSTM) according to the aforementioned formula. Please note that the LSTM is not directly trained on the test set.
- That is to say, the training set is used to train the parameters of the searched second network structure, the test set is used to update the parameters of the LSTM, and the verification samples are used to verify whether the searched second network structure is good.
- In one example, there are 10 training samples, divided into a training set of 8 samples and a test set of 2 samples; the 8 training-set samples are used to train the searched structure, and the 2 test-set samples are used to update the LSTM.
- Further, referring to FIG. 8, step S181 includes:
- Step S182: use the training set to train the second network structure to convergence;
- Step S184: use the test set to run prediction with the converged second network structure to determine the feedback amount.
- Correspondingly, the processor 104 is configured to use the training set to train the second network structure to convergence, and to use the test set to run prediction with the converged second network structure to determine the feedback amount.
- In this way, the second network structure is trained to convergence to determine the feedback amount.
- In one example, one operation is sampled at each operation layer of the search space to obtain the target network structure, the target network structure is then lowered in bit width to the second network structure using the model quantization technique, the second network structure is trained directly to convergence on the training set, and the converged second network structure is used for prediction on the test set to determine the feedback amount. Finally, the feedback amount is substituted into the policy-gradient formula given above to update the first network structure according to the feedback amount.
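- For illustration only, the following sketch shows how Embodiment 1 might compute one feedback amount: train the second network structure to convergence on the training set and measure its accuracy on the test set. The tiny model, the synthetic data, and the plateau-based stopping rule are placeholder assumptions; the model shown is not actually quantized.

```python
# Sketch of Embodiment 1's feedback computation: train to convergence, then val-acc.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(100, 16)
y = (x.sum(dim=1) > 0).long()
x_train, y_train = x[:80], y[:80]               # e.g. 8/10 of the samples as the training set
x_test, y_test = x[80:], y[80:]                 # 2/10 as the test set

second_structure = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(second_structure.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

prev = float("inf")
for epoch in range(200):                        # "train to convergence": stop when loss plateaus
    opt.zero_grad()
    loss = loss_fn(second_structure(x_train), y_train)
    loss.backward()
    opt.step()
    if abs(prev - loss.item()) < 1e-5:
        break
    prev = loss.item()

with torch.no_grad():
    pred = second_structure(x_test).argmax(dim=1)
    feedback = (pred == y_test).float().mean().item()   # val-acc used as the feedback amount
print(feedback)
```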
- Embodiment 2: in this embodiment, the network structure search method and device 10 are based on ENAS-like search.
- Referring to FIG. 9, step S18 includes:
- Step S183: determine the feedback amount from the second network structure, where the second network structure has not been trained to convergence.
- Correspondingly, the processor 104 is configured to determine the feedback amount from the second network structure, which has not been trained to convergence.
- In this way, the feedback amount of the second network structure is determined. It can be understood that in Embodiment 1, each time a second network structure is obtained it is trained to convergence to determine the feedback amount, which is time-consuming and inefficient. In this embodiment, when the feedback amount is determined from the second network structure, the second network structure is not trained to convergence, which saves the time needed to train it to convergence and thereby improves efficiency.
- In addition, the training samples (train set) can be divided in advance into a training set (train) and a test set (valid).
- It can be understood that, in a conventional CNN, sample data is generally divided into training samples and verification samples: the training samples are used to train the network structure, and the verification samples are used to verify whether the network structure is good.
- In this implementation, when searching for the second network structure, the training set is used to train the parameters of the searched second network structure, such as the parameters computed through conv3*3 and sep5*5, for example weights and biases.
- After the second network structure is found, it can be used for prediction on the test set to obtain a feedback amount, which updates the first network structure (LSTM) according to the aforementioned formula. Please note that the LSTM is not directly trained on the test set.
- That is to say, the training set is used to train the parameters of the searched second network structure, the test set is used to update the parameters of the LSTM, and the verification samples are used to verify whether the searched second network structure is good.
- In one example, there are 10 training samples, divided into a training set of 8 samples and a test set of 2 samples; the 8 training-set samples are used to train the searched structure, and the 2 test-set samples are used to update the LSTM.
- Further, referring to FIG. 10 and FIG. 11, in this embodiment step S14 includes:
- Step S146: use the training set to train the whole graph of the search space, the whole graph being formed by connecting the operations;
- Step S148: sample the trained whole graph according to the first network structure to obtain the target network structure.
- And step S18 includes:
- Step S185: use the test set to run prediction with the second network structure to determine the feedback amount.
- Correspondingly, the processor 104 is configured to use the training set to train the whole graph of the search space (formed by connecting the operations), to sample the trained whole graph according to the first network structure to obtain the target network structure, and to use the test set to run prediction with the second network structure to determine the feedback amount.
- The whole graph shown in FIG. 11 is formed by connecting the operations between nodes; the connection pattern of the optimal structure, drawn with bold edges in FIG. 11, is a subgraph of the whole graph.
- Please note that step S146, step S148, step S16, step S185, and step S19 can be performed iteratively until a preset total number of iterations is completed; in this way a better second network structure can be obtained.
- In this embodiment, the total number of iterations is 310. It can be understood that in other embodiments the total number of iterations may be 100, 200, or another value.
- Within each iteration, step S146 may be repeated, consuming one batch of data from the training set each time, until the training-set data is used up, that is, one epoch is completed. The LSTM is then updated.
- When updating the LSTM, step S148, step S16, step S185, and step S19 can be repeated until a preset number of repetitions is completed.
- In this embodiment, the preset number of repetitions is 50. It can be understood that in other examples the preset number may be 10, 20, 30, or another value; the specific value is not limited here. Setting the preset number to 50 helps reduce the randomness introduced by sampling.
- Each time step S148, step S16, step S185, and step S19 are looped, a preset quantity of feedback amounts can be determined, and the LSTM is updated with this preset quantity of feedback amounts. Further, the LSTM can be updated by policy-gradient optimization; the way of updating the LSTM is not limited here.
- In this embodiment, the preset quantity is 20. It can be understood that in other examples the preset quantity may be 10, 15, 25, or another value; the specific value is not limited here.
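- For reference only, the hyperparameters named in this embodiment can be collected as in the sketch below; the field names are illustrative, not taken from the application.

```python
# Hyperparameters of the ENAS-like embodiment, gathered for convenience.
from dataclasses import dataclass

@dataclass
class EnasSearchConfig:
    total_iterations: int = 310           # whole-graph / controller alternations
    controller_updates_per_iter: int = 50 # times steps S148, S16, S185, S19 are looped
    structures_per_update: int = 20       # feedback amounts m used per controller update

config = EnasSearchConfig()
print(config)
```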
- Referring to FIG. 12, step S146 includes:
- Step S1462: sample one operation at each operation layer of the search space to obtain a subgraph of the whole graph;
- Step S1464: use one batch of data from the training set to train the subgraph.
- Correspondingly, the processor 104 is configured to sample one operation at each operation layer of the search space to obtain a subgraph of the whole graph, and to train the subgraph using one batch of data from the training set.
- In this way, the whole graph is trained. In this embodiment, ENAS adopts a weight-sharing strategy: after a network structure is sampled, it is no longer trained directly to convergence; instead, one batch of data from the training set is used to train the subgraph. Please note that convergence of the graph is not equivalent to convergence of a network structure.
- It can be understood that, in this embodiment, ENAS based on the weight-sharing strategy shares whatever parameters can be shared each time a network structure is searched, which saves time and improves the efficiency of the network structure search.
- For example, if node 1, node 3, and node 6 were searched previously and the corresponding network structure trained, and the current search selects node 1, node 2, node 3, and node 6, then the parameters trained when node 1, node 3, and node 6 were searched can be reused when training the network structure found in the current search. In this way, efficiency is improved through weight sharing.
- In one example, the search space has 5 layers and each layer has 4 selectable operations, which is equivalent to a 4×5 graph.
- A network structure search needs to select one operation at each layer, which is equivalent to path optimization on the graph. Initially, each layer randomly samples one operation; the sampled operations are connected to obtain a subgraph, and this subgraph is trained on one batch of data from the training set. Then each layer again randomly samples one operation to obtain another subgraph, which is trained on another batch of data from the training set; yet another subgraph is then sampled and trained on yet another batch, and so on until the data in the training set is used up, that is, one epoch is completed. The first network structure is then trained.
- Next, the whole graph is trained in the same way to complete the second epoch and the first network structure is trained again, then the third epoch, and so on, until the total of 310 iterations is completed, so that the whole graph and the first network structure are optimized alternately; after these iterations, the whole graph converges and the first network structure also converges. Specifically, in each iteration the first network structure can be updated the preset 50 times, that is, step S148, step S16, step S185, and step S19 are looped 50 times.
- That is to say, in each iteration the policy-gradient formula given above is executed 50 times. Further, each time the first network structure is updated, the preset quantity of 20 target network structures can be sampled and lowered in bit width to obtain 20 second network structures, thereby determining 20 feedback amounts; these 20 feedback amounts are substituted into the formula as R_k, that is, m = 20 in the formula above.
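- A minimal sketch of this ENAS-like alternation, assuming PyTorch, is given below. The "whole graph" keeps one set of shared parameters per candidate operation per layer, and each batch trains only the operations on the sampled subgraph. The sizes, the synthetic data, and the uniform random controller are placeholders (the actual controller in this application is the LSTM, and its update is omitted here).

```python
# Sketch of weight-sharing (ENAS-like) whole-graph training with per-batch subgraphs.
import random
import torch
import torch.nn as nn

num_layers, num_ops, width = 5, 4, 16           # a 4 x 5 graph, as in the example above

def make_op(width):
    return nn.Sequential(nn.Linear(width, width), nn.ReLU())

# Whole graph: candidate operations of every layer, with shared (persistent) weights.
whole_graph = nn.ModuleList(
    nn.ModuleList(make_op(width) for _ in range(num_ops)) for _ in range(num_layers)
)
head = nn.Linear(width, 2)
opt = torch.optim.SGD(list(whole_graph.parameters()) + list(head.parameters()), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

def run_subgraph(x, path):
    for layer, op_idx in enumerate(path):
        x = whole_graph[layer][op_idx](x)       # only the sampled op of each layer is used
    return head(x)

train_batches = [(torch.randn(8, width), torch.randint(0, 2, (8,))) for _ in range(10)]

for iteration in range(3):                      # the application uses 310 iterations
    # Step S146: one epoch of whole-graph training, one random subgraph per batch.
    for xb, yb in train_batches:
        path = [random.randrange(num_ops) for _ in range(num_layers)]
        opt.zero_grad()
        loss_fn(run_subgraph(xb, path), yb).backward()
        opt.step()
    # Steps S148/S16/S185/S19 would follow here: update the controller 50 times,
    # using 20 sampled low-bit structures per update (omitted: placeholder controller).
```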
- The embodiments of the present application also provide a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer performs the method of any of the above embodiments.
- The embodiments of the present application also provide a computer program product containing instructions which, when executed by a computer, cause the computer to perform the method of any of the above embodiments.
- The computer storage medium and computer program product of the embodiments of the present application lower the bit width of the sampled target network structure to obtain the second network structure, and then determine the feedback amount of the second network structure to update the first network structure. This yields a network structure better suited to low-bit networks, thereby realizing high-performance low-bit networks and enabling low-bit networks to be better applied in mobile terminal scenarios.
- the computer program product includes one or more computer instructions.
- the computer can be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
- Computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- For example, computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
- a computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
- the disclosed system, device, and method may be implemented in other ways.
- The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
Abstract
A network structure search method includes: (step S12) determining the search space of the neural network model on which the network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model; (step S14) sampling one operation at each operation layer of the search space according to a first network structure to obtain a target network structure; (step S16) lowering the bit width of the target network structure to obtain a second network structure; (step S18) determining the feedback amount of the second network structure; (step S19) updating the first network structure according to the feedback amount. The present application also discloses a network structure search device, a computer storage medium, and a computer program product.
Claims (22)
- A network structure search method, comprising: determining the search space of a neural network model on which a network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model; sampling one of the operations at each operation layer of the search space according to a first network structure to obtain a target network structure; lowering the bit width of the target network structure to obtain a second network structure; determining a feedback amount of the second network structure; and updating the first network structure according to the feedback amount.
- The network structure search method according to claim 1, wherein determining the feedback amount of the second network structure comprises: training the second network structure to convergence to determine the feedback amount.
- The network structure search method according to claim 2, wherein training the second network structure to convergence to determine the feedback amount comprises: using a training set to train the second network structure to convergence; and using a test set to run prediction with the converged second network structure to determine the feedback amount.
- The network structure search method according to claim 1, wherein determining the feedback amount of the second network structure comprises: determining the feedback amount from the second network structure, the second network structure not having been trained to convergence.
- The network structure search method according to claim 4, wherein sampling one of the operations at each operation layer of the search space according to the first network structure to obtain the target network structure comprises: using a training set to train a whole graph of the search space, the whole graph being formed by connecting the operations; and sampling the trained whole graph according to the first network structure to obtain the target network structure; and wherein determining the feedback amount from the second network structure comprises: using a test set to run prediction with the second network structure to determine the feedback amount.
- The network structure search method according to claim 5, wherein using the training set to train the whole graph of the search space comprises: sampling one of the operations at each operation layer of the search space to obtain a subgraph of the whole graph; and training the subgraph using one batch of data from the training set.
- The network structure search method according to claim 1, wherein the first network structure is constructed from a long short-term memory artificial neural network, each operation layer of the search space corresponds to one time step of the long short-term memory artificial neural network, and for each time step a cell of the long short-term memory artificial neural network outputs a hidden state; and wherein sampling one of the operations at each operation layer of the search space according to the first network structure to obtain the target network structure comprises: mapping the hidden state to a feature vector, the dimension of the feature vector being the same as the number of operations on each operation layer; and sampling one of the operations at each operation layer according to the feature vector to obtain the target network structure.
- The network structure search method according to claim 8, wherein sampling one of the operations at each operation layer according to the feature vector to obtain the target network structure comprises: normalizing the feature vector to obtain the probability of each operation at each operation layer; and sampling one of the operations at each operation layer according to the probabilities to obtain the target network structure.
- The network structure search method according to claim 1, wherein the first network structure is constructed from a convolutional neural network or a recurrent neural network.
- A network structure search device, comprising a processor and a memory, the memory storing one or more programs which, when executed by the processor, cause the processor to: determine the search space of a neural network model on which a network structure search is to be performed, the search space defining multiple operations on the operation layer between every two nodes in the neural network model; sample one of the operations at each operation layer of the search space according to a first network structure to obtain a target network structure; lower the bit width of the target network structure to obtain a second network structure; determine a feedback amount of the second network structure; and update the first network structure according to the feedback amount.
- The network structure search device according to claim 11, wherein the processor is configured to train the second network structure to convergence to determine the feedback amount.
- The network structure search device according to claim 12, wherein the processor is configured to use a training set to train the second network structure to convergence, and to use a test set to run prediction with the converged second network structure to determine the feedback amount.
- The network structure search device according to claim 11, wherein the processor is configured to determine the feedback amount from the second network structure, the second network structure not having been trained to convergence.
- The network structure search device according to claim 14, wherein the processor is configured to use a training set to train a whole graph of the search space, the whole graph being formed by connecting the operations; to sample the trained whole graph according to the first network structure to obtain the target network structure; and to use a test set to run prediction with the second network structure to determine the feedback amount.
- The network structure search device according to claim 15, wherein the processor is configured to sample one of the operations at each operation layer of the search space to obtain a subgraph of the whole graph, and to train the subgraph using one batch of data from the training set.
- The network structure search device according to claim 11, wherein the first network structure is constructed from a long short-term memory artificial neural network, each operation layer of the search space corresponds to one time step of the long short-term memory artificial neural network, and for each time step a cell of the long short-term memory artificial neural network outputs a hidden state; and wherein the processor is configured to map the hidden state to a feature vector, the dimension of the feature vector being the same as the number of operations on each operation layer, and to sample one of the operations at each operation layer according to the feature vector to obtain the target network structure.
- The network structure search device according to claim 18, wherein the processor is configured to normalize the feature vector to obtain the probability of each operation at each operation layer, and to sample one of the operations at each operation layer according to the probabilities to obtain the target network structure.
- The network structure search device according to claim 11, wherein the first network structure is constructed from a convolutional neural network or a recurrent neural network.
- A computer storage medium having a computer program stored thereon which, when executed by a computer, causes the computer to perform the method of any one of claims 1 to 10.
- A computer program product containing instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 10.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/089698 WO2020237689A1 (zh) | 2019-05-31 | 2019-05-31 | Method and device for network structure search, computer storage medium, and computer program product |
CN201980009246.6A CN111656365A (zh) | 2019-05-31 | 2019-05-31 | Method and device for network structure search, computer storage medium, and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/089698 WO2020237689A1 (zh) | 2019-05-31 | 2019-05-31 | Method and device for network structure search, computer storage medium, and computer program product |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020237689A1 true WO2020237689A1 (zh) | 2020-12-03 |
Family
ID=72351852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/089698 WO2020237689A1 (zh) | 2019-05-31 | 2019-05-31 | 网络结构搜索的方法及装置、计算机存储介质和计算机程序产品 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111656365A (zh) |
WO (1) | WO2020237689A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990436B (zh) * | 2021-03-23 | 2024-08-27 | 联想(北京)有限公司 | Neural network architecture selection method and apparatus, and electronic device |
CN113434750B (zh) * | 2021-06-30 | 2022-09-06 | 北京市商汤科技开发有限公司 | Neural network search method and apparatus, device, and storage medium |
- 2019
- 2019-05-31 WO PCT/CN2019/089698 patent/WO2020237689A1/zh active Application Filing
- 2019-05-31 CN CN201980009246.6A patent/CN111656365A/zh active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242098A (zh) * | 2018-07-25 | 2019-01-18 | 深圳先进技术研究院 | Neural network structure search method under limited cost and related products |
CN109190754A (zh) * | 2018-08-30 | 2019-01-11 | 北京地平线机器人技术研发有限公司 | Quantized model generation method and apparatus, and electronic device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560985A (zh) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Neural network search method and apparatus, and electronic device |
CN112560985B (zh) * | 2020-12-25 | 2024-01-12 | 北京百度网讯科技有限公司 | Neural network search method and apparatus, and electronic device |
CN112949832A (zh) * | 2021-03-25 | 2021-06-11 | 鼎富智能科技有限公司 | Network structure search method and apparatus, electronic device, and storage medium |
CN112949832B (zh) * | 2021-03-25 | 2024-04-16 | 鼎富智能科技有限公司 | Network structure search method and apparatus, electronic device, and storage medium |
CN114462484A (zh) * | 2021-12-27 | 2022-05-10 | 东软睿驰汽车技术(沈阳)有限公司 | Network architecture search method and apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111656365A (zh) | 2020-09-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19931317; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19931317; Country of ref document: EP; Kind code of ref document: A1 |