WO2021175058A1 - A neural network architecture search method, apparatus, device, and medium - Google Patents


Info

Publication number: WO2021175058A1
Application number: PCT/CN2021/074533
Authority: WO (WIPO (PCT))
Prior art keywords: neural network, models, sub, hardware, network architecture
Other languages: English (en), French (fr)
Inventors: 周卫民, 郭益君, 李亿, 麦宇庭, 邓彬彬
Original Assignee: 华为技术有限公司
Application filed by 华为技术有限公司
Priority to EP21765071.2A: EP4105835A4
Priority to US 17/902,206: US20220414426A1
Publication of WO2021175058A1: WO2021175058A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06N3/098 Distributed learning, e.g. federated learning

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a neural network architecture search method, device, equipment, and computer-readable storage medium.
  • With the development of artificial intelligence (AI) technology, neural networks have made great progress in many fields such as image processing and audio/video recognition.
  • The quality of AI-based image processing or audio/video recognition often depends on the performance of the neural network used.
  • Neural networks with better performance usually require experienced technical experts to spend considerable effort building a neural network architecture, and then training on that architecture to obtain a neural network usable for a specific application.
  • Neural architecture search (NAS) has therefore emerged to automate the design of neural network architectures.
  • This application provides a neural network architecture search method that, by decoupling the training of different initial sub-models from one another and decoupling the architecture search process from the initial sub-model training process, addresses the problem in related technologies that search efficiency is low and cannot meet business needs.
  • This application also provides devices, equipment, computer-readable storage media, and computer program products corresponding to the method.
  • In a first aspect, this application provides a neural network architecture search method. The method decouples the training of different initial sub-models so that they can be trained in parallel, and decouples the architecture search process from the initial sub-model training process so that the two can also proceed in parallel, thereby reducing search time and improving search efficiency.
  • The method is applied to a search system that includes a generator and a searcher.
  • The generator generates multiple neural network architectures according to the search space; initializing the weights of these architectures yields multiple initial sub-models.
  • The model training platform can train the multiple initial sub-models in parallel to obtain multiple sub-models, thereby decoupling the training of the initial sub-models from one another.
  • The model inference platform can perform inference with each trained sub-model on the first hardware to obtain its evaluation index value on the first hardware.
  • The searcher can obtain the evaluation index values of the multiple sub-models on the first hardware and, using these values together with the corresponding neural network architectures, determine the first target neural network architecture that meets the preset conditions. Because the architecture search does not need to rely on the actual evaluation index value of a previous sub-model, the search process and the initial sub-model training process can also proceed in parallel, decoupling the two.
  • This application also provides a searcher that includes an evaluator and a controller. The searcher uses the evaluator to predict the evaluation index value corresponding to a neural network architecture and uses the predicted value as the reward, so there is no need to wait for the model inference platform to run inference and obtain the actual value. This greatly reduces reward latency and improves search efficiency.
  • In some possible implementations, the searcher trains the evaluator according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware, trains the controller using the trained evaluator, and determines the first target neural network architecture that satisfies the preset condition according to the trained controller.
  • The evaluation index values of the multiple sub-models on the first hardware are real values obtained by performing inference with the sub-models on the first hardware. An evaluator trained with real evaluation index values therefore has high credibility and can be used to predict the evaluation index value corresponding to a neural network architecture.
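Training the evaluator on real (architecture, metric) pairs is ordinary supervised regression; a minimal sketch follows. The encoding, the synthetic metric, and the linear least-squares model are illustrative stand-ins for the application's neural evaluator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoding: each architecture is a fixed-length vector of
# attribute choices (e.g. operator type, width, depth), as architectures
# are encoded before being handed to the searcher.
def encode(arch):
    return np.asarray(arch, dtype=float)

# Toy training set: 64 sampled architectures with a stand-in "real"
# metric measured on the first hardware.
archs = rng.integers(0, 4, size=(64, 6))
latency = archs.sum(axis=1) + rng.normal(0, 0.1, 64)

X = np.stack([encode(a) for a in archs])
y = latency

# Minimal linear evaluator fitted by least squares; the supervised setup
# (real labelled pairs) is the same as for a neural evaluator.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def predict_metric(arch):
    """Predict the evaluation index value of an unseen architecture."""
    e = np.append(encode(arch), 1.0)
    return float(e @ w)
```

Once fitted, `predict_metric` can score a candidate architecture without training or running the corresponding sub-model.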
  • In some scenarios, the sub-models need to be extended or migrated to other hardware, such as a second hardware. Because the design of the second hardware differs from that of the first hardware, the first target neural network architecture suited to the first hardware may not suit the second hardware; in that case a second target neural network architecture suited to the second hardware can also be searched for. The second hardware may be known hardware or new hardware.
  • Evaluation index values fall into two types: the first type changes as the hardware changes, and the second type does not. When the evaluation index values used to search for the first target neural network architecture include second-type values, the second-type values of a sub-model on the first hardware can be reused as its second-type values on the second hardware. This saves the time of performing inference on the second hardware, further shortens the search, and improves search efficiency.
  • When the evaluation index values include hardware-related performance values, the searcher can obtain the performance values of the multiple sub-models on the second hardware, obtained by performing inference with the sub-models on the second hardware, and then determine a second target neural network architecture that meets the preset conditions according to the neural network architectures corresponding to the sub-models and their performance values on the second hardware.
  • The hardware-related performance values include any one or more of: model inference time, activation amount, throughput, power consumption, and video memory occupancy.
  • In some possible implementations, the search space is characterized by the attribute value space of each attribute of a neuron. The generator may randomly select an attribute value for each attribute from that attribute's value space to obtain the multiple neural network architectures; this helps keep the set of sampled architectures balanced for training the evaluator.
  • In other implementations, the generator can intervene in the random sampling. Specifically, the generator randomly selects an attribute value for each attribute to obtain one neural network architecture, and then, for the next architecture, samples each attribute from its attribute value space with the already-selected values removed. For example, the generator can randomly select a value for a first attribute of the neuron from that attribute's value space, and randomly select a value for a second attribute from the second attribute's value space excluding the previously selected values, to generate a new neural network architecture. When the sampled attribute values of each attribute cover the corresponding attribute value space, the generator has generated one group of neural network architectures.
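The sampling scheme above can be sketched as follows. The attribute names and value spaces are hypothetical; a real search space would describe neuron attributes such as operator type and channel width:

```python
import random

# Hypothetical search space: each neuron attribute maps to its value space.
SEARCH_SPACE = {
    "op":    ["conv3x3", "conv5x5", "maxpool", "identity"],
    "width": [16, 32, 64, 128],
    "skip":  [True, False],
}

def sample_group(space, rng=random):
    """Sample architectures until every attribute value has been used once,
    removing already-chosen values so each new architecture is novel."""
    remaining = {k: list(v) for k, v in space.items()}
    group = []
    while any(remaining.values()):
        arch = {}
        for attr, values in space.items():
            pool = remaining[attr] or values   # refill exhausted attributes
            choice = rng.choice(pool)
            if choice in remaining[attr]:
                remaining[attr].remove(choice)
            arch[attr] = choice
        group.append(arch)
    return group
```

Because one value is struck from each attribute's pool per architecture, the group size is bounded by the largest attribute value space, and the group jointly covers every attribute value.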
  • In some possible implementations, in order to protect data security, such as the security of the search space and the neural network architectures, the generator may provide an application programming interface (API) to the user and generate the multiple neural network architectures for the user through that API.
  • The search system can also implement neural network architecture search without exchanging data, which breaks data islands and improves the accuracy of the searched neural network architecture.
  • In some possible implementations, the search system further includes a model training platform that can use M data sets to train N initial sub-models obtained according to the multiple neural network architectures, where the neural network architectures correspond one-to-one with the initial sub-models, N is greater than 1, and M is greater than 1.
  • When the model training platform trains the N initial sub-models, any of the following ways can be used:
  • In the first way, the model training platform uses the M data sets to perform federated learning for each of the N initial sub-models, obtaining N sub-models.
  • In the second way, the model training platform uses each of the M data sets to train each of the N initial sub-models, obtaining N*M sub-models.
  • In the third way, the model training platform divides the N initial sub-models into M groups that correspond one-to-one with the M data sets, and trains each group with its corresponding data set to obtain M groups of sub-models.
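The third way can be sketched as follows. The round-robin grouping policy and `train_fn` are hypothetical stand-ins; any partition of the N models into M groups works:

```python
def split_into_groups(models, m):
    """Distribute N initial sub-models round-robin into M groups,
    one group per data set."""
    groups = [[] for _ in range(m)]
    for i, model in enumerate(models):
        groups[i % m].append(model)
    return groups

def train_all(models, datasets, train_fn):
    """Train each group only on its own data set; data sets are never
    exchanged between groups."""
    groups = split_into_groups(models, len(datasets))
    # Each (group, dataset) pair is independent, so on the model training
    # platform these loops could run in parallel.
    return [
        [train_fn(model, ds) for model in group]
        for group, ds in zip(groups, datasets)
    ]
```

The independence of the (group, data set) pairs is what allows the parallel training emphasized in the first aspect.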
  • In a second aspect, this application provides a neural network architecture search device. The device is applied to a search system that includes a generator and a searcher, and the device includes:
  • a generation module, used to generate multiple neural network architectures according to the search space;
  • a communication module, configured to obtain the evaluation index values, on the first hardware, of multiple sub-models obtained according to the multiple neural network architectures; and
  • a search module, configured to determine the first target neural network architecture that meets the preset conditions according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware.
  • In some possible implementations, the searcher includes an evaluator and a controller, and the search module is specifically configured to: train the evaluator according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware; train the controller using the trained evaluator; and determine the first target neural network architecture that satisfies the preset condition according to the trained controller.
  • In some possible implementations, the evaluation index values of the multiple sub-models on the first hardware represent values obtained by performing inference with the sub-models on the first hardware.
  • In some possible implementations, the evaluation index values include hardware-related performance values. The communication module is further configured to obtain the performance values of the multiple sub-models on the second hardware, and the search module is further configured to determine, according to the neural network architectures corresponding to the multiple sub-models and their performance values on the second hardware, a second target neural network architecture that meets the preset condition.
  • In some possible implementations, the hardware-related performance values include any one or more of: model inference time, activation amount, throughput, power consumption, and video memory occupancy.
  • In some possible implementations, the search space is characterized by the attribute value space of each attribute of a neuron, and the generation module is specifically configured to randomly select an attribute value for each attribute from that attribute's value space to obtain the multiple neural network architectures.
  • In some possible implementations, the generation module is specifically configured to provide an application programming interface to the user and generate the multiple neural network architectures for the user through that interface.
  • In some possible implementations, the search system further includes a model training platform, and the device further includes a training module configured to use M data sets to train the N initial sub-models obtained according to the multiple neural network architectures, where the initial sub-models correspond one-to-one with the neural network architectures, N is greater than 1, and M is greater than 1.
  • In a third aspect, the present application provides a computer cluster. The computer cluster includes at least one computer, and each computer includes a processor and a memory that communicate with each other. The processor of the at least one computer is configured to execute the instructions stored in the memory of the at least one computer, so that the computer cluster executes the neural network architecture search method in the first aspect or any one of its implementations.
  • In a fourth aspect, the present application provides a computer-readable storage medium storing instructions that, when run on a computer cluster, cause the computer cluster to execute the neural network architecture search method in the first aspect or any one of its implementations.
  • In a fifth aspect, the present application provides a computer program product containing instructions that, when run on a computer cluster, cause the computer cluster to execute the neural network architecture search method described in the first aspect or any one of its implementations.
  • FIG. 1 is a schematic flowchart of a neural network architecture search provided by an embodiment of this application
  • FIG. 2 is a schematic structural diagram 100 of a search system provided by an embodiment of this application.
  • FIG. 3 is a schematic structural diagram 200 of a search system provided by an embodiment of this application.
  • FIG. 4 is an interaction flowchart of a neural network architecture search method provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a neural network architecture provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a neural network architecture provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of a process for determining a first target neural network architecture provided by an embodiment of this application.
  • FIG. 8 is a schematic structural diagram of an evaluator provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of a process for determining a second target neural network architecture provided by an embodiment of this application.
  • FIG. 10 is a schematic flowchart of a neural network architecture search provided by an embodiment of this application.
  • FIG. 11 is a schematic flowchart of a neural network architecture search provided by an embodiment of this application.
  • FIG. 12 is a schematic flowchart of a neural network architecture search provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of a neural network architecture search device provided by an embodiment of this application.
  • FIG. 14 is a schematic structural diagram of a computer cluster provided by an embodiment of this application.
  • A neural network is a mathematical model that simulates the human brain in order to achieve artificial-intelligence-like behavior; neural networks are also called neural network models. A neural network usually uses multiple interconnected neurons (also called nodes) to simulate the neural network of the human brain and implement tasks such as image classification and speech recognition.
  • the structure obtained by connecting the neurons in each neural network is called the neural network architecture of the neural network.
  • Typical neural network architectures include recurrent neural network (RNN), convolutional neural network (convolutional neural network, CNN), and so on.
  • The neural network architecture can be characterized by a directed graph. Each edge in the directed graph has a weight, which characterizes the importance of the edge's input node relative to its output node.
  • the parameters of the neural network include the above-mentioned weights. It should be noted that the weights can usually be obtained by training the neural network using samples.
  • Obtaining the neural network model according to the neural network architecture includes two stages.
  • One stage is to perform weight initialization on the neural network architecture to obtain the initial neural network model, also called the initial sub-model.
  • Weight initialization refers to initializing the weight (and, in some cases, the bias) of each edge in the neural network architecture. For example, the initial weight values can be drawn from a Gaussian distribution.
  • The other stage is to update the weights of the initial sub-model with sample data to obtain a sub-model.
  • Specifically, the sample data is input into the initial sub-model, a loss value is determined from the predicted value and the true value carried by the sample data, and the weights of the initial sub-model are updated based on the loss value. When training is complete, a sub-model is obtained: a trained neural network model that can be used for specific applications.
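The two stages can be illustrated with a minimal numerical sketch: a single linear model trained by gradient descent on synthetic data. The shapes, learning rate, and loss are illustrative, not from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: weight initialization. Draw initial weights from a Gaussian
# distribution, turning a bare architecture into an initial sub-model.
w = rng.normal(0.0, 0.1, size=2)   # weights of a 2-input linear neuron
b = 0.0                            # bias, also initialized

# Stage 2: update the weights with sample data (inputs X, true values y).
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0]) + 0.3   # synthetic "true values"

lr = 0.1
for _ in range(200):
    pred = X @ w + b
    loss_grad = 2 * (pred - y) / len(y)   # gradient of mean squared error
    w -= lr * (X.T @ loss_grad)           # update weights from the loss
    b -= lr * loss_grad.sum()

# After training, (w, b) define the sub-model, ready for inference.
```

On this noise-free data, gradient descent recovers the generating weights (1.5, -2.0) and bias 0.3.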
  • The pros and cons of a sub-model can be measured by its evaluation index values: metric values obtained by evaluating the sub-model along at least one dimension.
  • The evaluation index values of a sub-model fall into two types: one type changes as the hardware changes, and the other remains unchanged across hardware. The former is called the first type of evaluation index value, and the latter the second type.
  • The first type comprises hardware-related evaluation index values, including hardware-related performance values: any one or more of model inference time, activation amount, throughput, power consumption, and video memory occupancy.
  • The second type comprises hardware-independent evaluation index values, including hardware-independent accuracy values: any one or more of accuracy, precision, and recall. Hardware-independent evaluation index values also include the parameter count and the computing power required, the latter specifically measured in floating-point operations (FLOPs).
  • For current neural-network-based tasks, the main process of neural network architecture design is still manual exploration of new architectures by researchers. The performance of a neural network often depends on the researcher's understanding of the task and imagination in architecture design. The entire design process requires a thorough understanding of the related fields, which indirectly raises the barrier to entry for practitioners, and continuously improving an architecture by hand is also very time-consuming.
  • Neural architecture search (NAS) was proposed to automate this process: it automatically searches a search space for a high-performance neural network architecture.
  • The search space defines the scope of the search and provides the set of searchable neural network architectures. Depending on the type of neural network to be constructed, the search space can be a chain architecture space, a multi-branch architecture space, a block-based search space, and so on. These different types of search spaces can all be characterized by the attribute value space of each attribute of the neurons (i.e., nodes) in the architecture. The block-based search space is used as an example below.
  • Figure 1 shows the principle of searching a neural network architecture from the search space. An RNN-based control neural network (also called a controller) samples an architecture A with probability p from the search space, the weights of architecture A are initialized to obtain an initial sub-model, the initial sub-model is trained to obtain a sub-model, the accuracy R of the sub-model is measured on the validation set, and the accuracy R is then used to update the parameters of the controller. This loop is executed until the controller converges, yielding a high-performance neural network architecture.
  • In this scheme, generating a new neural network architecture relies on the accuracy of the previous sub-model to train the controller, and generating a new sub-model relies on training the initial sub-model obtained by initializing the weights of the new architecture; that is, the training of multiple initial sub-models cannot be well parallelized.
  • Furthermore, the architecture search process depends on the initial sub-model training process; the two are tightly coupled, so they cannot run in parallel either. Together, these two aspects lead to low search efficiency and long search time.
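The coupled loop of Figure 1 can be sketched as follows. The controller, training routine, and reward here are toy stand-ins; a real controller is an RNN policy trained with reinforcement learning:

```python
import random

random.seed(0)

SEARCH_SPACE = [("conv", 1), ("conv", 3), ("conv", 5), ("pool", 2)]

def sample_architecture(theta):
    """Toy controller: sample a 3-op architecture, preferring ops whose
    accumulated score in theta is higher."""
    weights = [theta.get(op, 1.0) for op in SEARCH_SPACE]
    return random.choices(SEARCH_SPACE, weights=weights, k=3)

def train_and_evaluate(arch):
    """Stand-in for weight initialization, training, and measuring the
    accuracy R of the resulting sub-model on a validation set."""
    return sum(k for _, k in arch) / 15.0

theta = {}
for step in range(20):
    arch = sample_architecture(theta)   # controller samples architecture A
    reward = train_and_evaluate(arch)   # blocks until training finishes
    for op in arch:                     # update controller parameters with R
        theta[op] = theta.get(op, 1.0) + reward
```

Each iteration blocks on a full train-and-evaluate cycle before the controller can propose the next architecture; this serial dependency is exactly the coupling the method in this application removes.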
  • In view of this, an embodiment of the present application provides a neural network architecture search method. The method decouples the architecture search process from the initial sub-model training process, and decouples the training of different initial sub-models from one another, so that search and training can proceed in parallel and different initial sub-models can also be trained in parallel, thereby reducing search time and improving search efficiency.
  • The method is applied to a search system that includes a generator and a searcher.
  • Specifically, the generator generates multiple neural network architectures based on the search space, and initializing their weights yields multiple initial sub-models; the model training platform can then train the initial sub-models in parallel to obtain multiple sub-models, decoupling their training from one another.
  • The model inference platform can perform inference with each sub-model on the first hardware to obtain the trained sub-model's evaluation index value on the first hardware.
  • The searcher can obtain the evaluation index values of the multiple sub-models on the first hardware and, using these values together with the corresponding neural network architectures, determine the first target neural network architecture that meets the preset conditions. Because the architecture search does not need to rely on the actual evaluation index value of a previous sub-model, the search process and the initial sub-model training process can proceed in parallel, decoupling the two. As a result, the search time is greatly reduced and search efficiency is improved.
  • The model training platform may be located on the service side, for example provided by a neural network architecture search cloud service provider. It can also be located on the user side, i.e., provided by the user who needs to perform the architecture search; training sub-models on a user-provided platform avoids leaking the training data and ensures data security.
  • The model inference platform can likewise be located on the service side or the user side. When it is located on the user side, the user-provided platform performs inference on the sub-models trained by the model training platform to obtain the evaluation index values; the sub-models need not be uploaded to the service side, which avoids sub-model leakage and protects model privacy.
  • Correspondingly, an embodiment of the present application provides a searcher that includes a controller and an evaluator; the controller can also be called a filter.
  • The searcher uses the neural network architectures generated by the generator, together with the evaluation index values of their corresponding sub-models, as training data for supervised learning, which greatly shortens training time and improves training efficiency. Moreover, because the architectures and their evaluation index values can be reused across multiple training epochs of the evaluator, the amount of data required for training is reduced and the utilization of training data is improved.
  • The evaluator provides feedback to the controller: the trained evaluator predicts the evaluation index value of an architecture proposed by the controller, and the prediction serves as the controller's reward, so the actual evaluation index value of a sub-model is not needed. This improves training efficiency.
  • After the controller is trained, it can output candidate neural network architectures, from which the first target neural network architecture that meets the preset conditions is selected. Because the controller outputs candidates quickly, the first target architecture can be selected quickly, further shortening the search time and improving search efficiency.
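The decoupled reward path can be sketched as follows. The evaluator's fixed predictions and the preference-weighted controller are hypothetical stand-ins; the point is that no sub-model is trained or run inside the controller-training loop:

```python
import random

random.seed(1)

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

# Hypothetical trained evaluator: maps an architecture (a list of ops) to
# a predicted evaluation index value, replacing real hardware inference.
PREDICTED_SCORE = {"conv3x3": 0.9, "conv5x5": 0.7, "maxpool": 0.4, "identity": 0.2}

def evaluator_predict(arch):
    return sum(PREDICTED_SCORE[op] for op in arch) / len(arch)

def train_controller(steps=500):
    """Toy preference-weighted controller trained purely on predicted
    rewards; the loop never blocks on sub-model training or inference."""
    prefs = {op: 1.0 for op in OPS}
    for _ in range(steps):
        arch = random.choices(OPS, weights=[prefs[o] for o in OPS], k=3)
        reward = evaluator_predict(arch)    # predicted, not measured
        for op in arch:
            prefs[op] += reward             # reinforce sampled ops
    return prefs

prefs = train_controller()
best_op = max(prefs, key=prefs.get)
```

Because each controller update costs only one evaluator forward pass, the controller can be trained while sub-models are still being trained and measured elsewhere.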
  • The neural network architecture search method provided by the embodiments of the present application may be applied to, but is not limited to, the application scenario shown in FIG. 2.
  • The search system includes a generator 202, a searcher 204, a model training platform 206, and a model inference platform 208.
  • The generator 202 is used to generate multiple neural network architectures according to the search space; the model training platform 206 is used to train the multiple initial sub-models obtained from those architectures to obtain multiple sub-models; the model inference platform 208 is used to perform inference with the multiple sub-models on the first hardware to obtain their evaluation index values on the first hardware; and the searcher 204 is used to determine the first target neural network architecture that meets the preset conditions according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware.
  • The generator 202, the searcher 204, the model training platform 206, and the model inference platform 208 are deployed in the same cloud computing cluster (which includes at least one cloud computing device, such as a cloud server), specifically a cloud computing cluster on the service side.
  • the generator 202 samples the search space to generate multiple neural network architectures.
  • the neural network architecture generated by the generator 202 can be specifically represented by coding.
  • the model training platform 206 can initialize the weights of multiple neural network architectures to obtain multiple initial sub-models, and then train multiple initial sub-models to obtain multiple sub-models. Wherein, when the neural network architecture is represented by coding, the model training platform 206 first parses the coding to obtain the neural network architecture, and then performs the operations of weight initialization and training of the initial sub-model.
  • the model inference platform 208 can perform inference on the trained multiple sub-models on the first hardware, and obtain the evaluation index value of each sub-model on the first hardware.
  • the searcher 204 may obtain the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware, and train the evaluator 2042 according to them. Specifically, the searcher 204 uses the evaluation index values as labels and trains the evaluator 2042 in a supervised learning manner.
  • the searcher 204 may also use a neural network architecture code instead of the neural network architecture.
  • the searcher 204 may use the trained evaluator 2042 to train the controller 2044, and then determine the first target neural network architecture that satisfies the preset condition according to the trained controller 2044.
  • the controller 2044 can provide a neural network architecture
  • the evaluator 2042 can predict the evaluation index value of the neural network architecture
  • the searcher 204 can use the predicted evaluation index value as the reward for the neural network architecture to update the parameters of the controller 2044, without having to initialize the weights of the neural network architecture to obtain an initial sub-model, train the initial sub-model to obtain a sub-model, and perform inference on the sub-model through the model inference platform to obtain the actual evaluation index value. In this way, the neural network architecture search process is decoupled from the initial sub-model training process, the search time is shortened, and the search efficiency is improved.
  • the controller 2044 may output at least one candidate neural network architecture.
  • the searcher 204 may determine the first target neural network architecture that satisfies the preset condition according to the candidate neural network architectures. Specifically, an initial sub-model can be generated according to each candidate neural network architecture output by the controller 2044, and a sub-model can be obtained after the initial sub-model is trained. When the sub-model meets the preset condition, for example, when the evaluation index value of the sub-model reaches a preset value, the candidate neural network architecture corresponding to the sub-model can be determined as the first target neural network architecture for the specific application.
  • FIG. 2 illustrates that the search system includes a model training platform 206 and a model inference platform 208, and the generator 202, searcher 204, and model training platform 206 and model inference platform 208 of the search system are deployed in the same cloud computing cluster.
  • the search system may not include the above-mentioned model training platform 206 and model inference platform 208; instead, it can interact with a model training platform 206 and a model inference platform 208 on the user side to implement the neural network architecture search.
  • the generator 202 and the searcher 204 are deployed in a first cloud computing cluster, which may specifically be a cloud computing cluster on the service side.
  • the model training platform 206 and the model inference platform 208 are deployed in the second cloud computing cluster, which may specifically be a cloud computing cluster on the user side.
  • the cloud computing cluster on the service side can be a public cloud
  • the cloud computing cluster on the user side can be a private cloud, so that neural network architecture search can be implemented based on the hybrid cloud formed by the public cloud and the private cloud.
  • the codes of each neural network architecture are transmitted to the model training platform 206 on the user side. The model training platform 206 can parse the codes to obtain the neural network architectures, initialize the weights of each neural network architecture to obtain the initial sub-models, and use the training data to train the initial sub-models to obtain the sub-models.
  • the model inference platform 208 on the user side can perform inference on the sub-model on the first hardware to obtain the evaluation index value of the sub-model on the first hardware.
  • the coding of each neural network architecture and the evaluation index value of the corresponding sub-model on the first hardware can be transmitted to the searcher 204 on the server side.
  • the searcher 204 trains the evaluator 2042 by using the above codes and the corresponding evaluation index values. After the training is completed, the evaluator 2042 can in turn be used to train the controller 2044.
  • the controller 2044 can output at least one candidate neural network architecture when the training is completed, and the first target neural network architecture that meets the preset condition can be determined from the candidate neural network architectures.
  • FIG. 2 and FIG. 3 are only some specific examples of the neural network architecture search application scenario provided by the embodiment of the present application.
  • the generator 202, the searcher 204, the model training platform 206, and the model inference platform 208 can be deployed in different cloud computing clusters respectively, deployed in different cloud computing clusters in pairwise combinations, or deployed with any three in one cloud computing cluster and the remaining one in another cloud computing cluster.
  • the generator 202, the searcher 204, the model training platform 206, and the model inference platform 208 may also not be deployed in cloud computing clusters but directly in physical devices such as servers, or some of them may be deployed in a cloud computing cluster and the others in physical devices.
  • the neural network architecture search method of the embodiments of the present application will be introduced from the perspective of interaction of the generator 202, the searcher 204, the model training platform 206, and the model inference platform 208.
  • the method includes:
  • the generator 202 generates multiple neural network architectures according to the search space.
  • the search space can be characterized by the attribute value space corresponding to each attribute of the neuron (also called node) included in the neural network architecture.
  • the search space can be characterized by the value spaces of the two neuron attributes identity (id) and operation (op).
  • the search space may also be characterized in combination with at least one of the number of layers included in the neural network architecture, the number of unit blocks included in each layer, and the number of neurons included in each unit block.
  • the generator 202 may encode the attribute value of the neuron included in the neural network architecture, and the encoding result may be used to represent the neural network architecture.
  • the search space is defined as: a layer includes 5 blocks (specifically block0-block4), each block includes two nodes x and y, and each of the two nodes x and y has two attributes, id and op.
  • the attribute value space (that is, the value range) of the id attribute in block0 is {0,1}
  • the attribute value space of the id attribute in blocki is {0,1,...,i+1}, which represents the ids that can be selected.
  • the value space of the op attribute of each block is {0,1,...,5}, which means there are 6 optional operations.
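As an editorial illustration only (not part of the embodiment text), the search space defined above can be sketched in Python; the function names `attribute_spaces` and `encode` are hypothetical:

```python
NUM_BLOCKS = 5   # block0-block4 in one layer
NUM_OPS = 6      # op attribute value space {0,1,...,5}

def attribute_spaces():
    """Per-block value spaces for the (id, op) attributes of one node:
    blocki's id space is {0,1,...,i+1}, its op space is {0,...,5}."""
    spaces = []
    for i in range(NUM_BLOCKS):
        id_space = list(range(i + 2))
        op_space = list(range(NUM_OPS))
        spaces.append((id_space, op_space))
    return spaces

def encode(arch):
    """Flatten an architecture (per-block dicts for nodes x and y) into
    an architecture code: [x_id, x_op, y_id, y_op] for each block."""
    code = []
    for block in arch:
        for node in ("x", "y"):
            code.append(block[node]["id"])
            code.append(block[node]["op"])
    return code

arch = [{"x": {"id": 0, "op": 1}, "y": {"id": 1, "op": 2}}
        for _ in range(NUM_BLOCKS)]
```

With 5 blocks, 2 nodes per block, and 2 attributes per node, each architecture code is a vector of 20 integers.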
  • the generator 202 can sample the search space to generate multiple neural network architectures. When sampling the search space, the generator 202 can sample in a random manner. In this way, the sample balance of the training data for the evaluator 2042 can be guaranteed.
  • the generator 202 can intervene in the random sampling process. Specifically, the generator 202 can randomly select an attribute value for each attribute from the attribute value space of each neuron to obtain one neural network architecture, and then randomly select, for each attribute, an attribute value from the attribute value space excluding the already-selected attribute values to obtain another neural network architecture.
  • the generator 202 can randomly select an attribute value for the first attribute from the attribute value space of the first attribute of the neuron, and randomly select an attribute value for the second attribute from the attribute value space of the second attribute of the neuron excluding the already-selected attribute values, so as to generate a new neural network architecture.
  • when the attribute values of each attribute of the neurons in the generated neural network architectures fully cover the corresponding attribute value spaces, the generator 202 has generated one group of neural network architectures.
  • each block uses the above operations.
  • the codes of the multiple neural network architectures (referred to as multiple architecture codes) are used as a batch of architecture data.
  • the generator 202 may perform the above operations in a loop, thereby generating multiple batches of architecture data.
  • each batch of architecture data includes 6 pieces of architecture data.
  • the specific generation process is as follows:
  • the id of x node of block 0 is randomly selected from 0 and 1
  • the op of x node of block 0 is randomly selected from the six values 0-5.
  • the remaining positions are also randomly selected and generated.
  • the value range of the x node id of block0 is 0 and 1. After subtracting the 0 already selected by Arc0, the remaining value range is 1, and only 1 can be selected.
  • the op value range of block0's x node is 0-5; after subtracting the 0 already selected in Arc0, the value 1 is randomly selected from the remaining 1-5.
  • the y node of block 0 and the x/y node of the remaining blocks are randomly selected according to the above method to generate the second piece of architecture data Arc1.
  • for the third piece of architecture data, the value range of block0's x node id is 0 and 1. Since Arc0 and Arc1 have already covered this value range, a value is selected at random directly from it; here 0 is selected.
  • the op value range of block0's x node is 0-5; after subtracting the 0 and 1 already selected by Arc0 and Arc1, 3 is selected from the remaining 2-5.
  • the y node of block 0 and the x/y node of the remaining blocks are randomly selected according to the above method, and the third piece of architecture data Arc2 is generated.
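The coverage-driven random sampling walked through in this example can be sketched as follows; `coverage_sample` is a hypothetical helper, not a name from the embodiment:

```python
import random

def coverage_sample(value_space, num_samples, rng=random):
    """Random sampling that covers the whole value space before any
    value repeats: each pick removes the chosen value from the pool,
    and the pool is refilled from the value space once exhausted."""
    samples, pool = [], []
    for _ in range(num_samples):
        if not pool:                    # value space fully covered
            pool = list(value_space)    # start a fresh pass
        choice = rng.choice(pool)
        pool.remove(choice)
        samples.append(choice)
    return samples

# One batch of 6 picks for block0's x node, as in the 6-architecture
# batch above: the id space is covered every 2 picks, the op space by
# all 6 picks.
ids = coverage_sample([0, 1], 6)
ops = coverage_sample(range(6), 6)
```

Unlike plain uniform sampling, this guarantees that every attribute value appears in each batch, which supports the sample balance mentioned above.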
  • the search space can be user-defined or automatically generated by the search system. Specifically, the user can configure target scenes, such as image classification scenes, target detection scenes, etc.
  • the search system can search for a built-in search space based on the target scene, and then determine a search space matching the target scene based on the built-in search space.
  • the generator 202 sends multiple neural network architectures to the model training platform 206.
  • the generator 202 may send codes of multiple neural network architectures to the model training platform 206.
  • the generator 202 may send the neural network architecture codes in batches, for example, send a batch of architecture data at a time, and send multiple batches of architecture data through multiple sending.
  • the generator 202 can also send multiple batches of neural network architecture codes at one time, for example, send multiple batches of architecture data at one time, which can improve transmission efficiency and save transmission resources.
  • the model training platform 206 obtains multiple sub-models according to multiple neural network architectures.
  • the model training platform 206 may perform weight initialization on the neural network architecture to obtain multiple initial sub-models, and then use the training data to train multiple initial sub-models to obtain multiple sub-models.
  • when the model training platform 206 obtains the codes of the neural network architectures from the generator 202, it also needs to parse the codes first to obtain the neural network architectures.
  • the model training platform 206 parses the code to obtain the neural network architecture of the sub-model: each layer includes 5 blocks, where the id of the x node in block 0 is 1 and its op is 5, the id of the y node in block 1 is 0 and its op is 0, and so on; details are not repeated here.
  • the training data used for training the initial sub-model may be a data set corresponding to the task.
  • the training data can be a public data set ImageNet 1000, other public data sets used for image classification, or a data set provided by the user.
  • the training data may be public data sets such as visual object classification (VOC), common objects in context (COCO) based on content, or data sets provided by users.
  • the model training platform 206 can set the batch size, and then train the initial sub-model in a batch iterative manner, which can improve training efficiency and shorten training convergence time. Specifically, the model training platform 206 may input the training data in batches according to the batch size, and then update the initial sub-model parameters once through the gradient descent method based on the batch of training data, thereby realizing one iteration training. The model training platform 206 performs multiple iteration training according to the above method, and stops training when the training end condition is satisfied, such as the sub-model convergence or the loss value of the sub-model is less than the preset loss value.
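A minimal sketch of the batch-iterative training described above, using a toy one-parameter model in place of a real sub-model; the name `train_sub_model`, the learning rate, and the preset loss value are illustrative assumptions:

```python
import random

def train_sub_model(data, lr=0.5, batch_size=4, max_epochs=200):
    """Toy stand-in for batch-iterative sub-model training: fit y = w*x,
    updating w once per batch via gradient descent, and stop when the
    loss falls below a preset loss value (the training end condition)."""
    w = 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        random.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the mean squared error with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad          # one parameter update per batch
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        if loss < 1e-6:             # preset loss value reached: stop
            break
    return w, loss

data = [(0.1 * i, 0.3 * i) for i in range(1, 9)]   # underlying w = 3.0
w, loss = train_sub_model(data)
```

The key point mirrored from the text is that the parameters are updated once per batch of training data, and training stops on a convergence condition rather than after a fixed count.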
  • the training process of the initial sub-models does not have a mutual dependence relationship. Therefore, the model training platform 206 can train multiple initial sub-models concurrently, which can shorten the training time and improve the training efficiency. For example, when the number of initial sub-models is 600, the model training platform 206 may use 6 machines to perform parallel training on these 600 initial sub-models. Among them, each machine is equipped with 8 V100 graphics processing units (GPUs), which can further increase the concurrency speed and thereby increase the efficiency of parallel training.
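Because the training jobs are independent, the parallel training can be sketched with a worker pool; `train_one` is a hypothetical placeholder for a real training job that would run on a GPU-equipped machine:

```python
from concurrent.futures import ThreadPoolExecutor

def train_one(arch_code):
    """Hypothetical per-model training job. Returns the architecture
    code together with a placeholder trained sub-model."""
    weights = [0.0] * len(arch_code)   # placeholder parameter set
    return arch_code, {"weights": weights}

# 12 independent initial sub-models, trained concurrently by 6 workers
# (mirroring the 6-machine example above, with threads standing in for
# machines).
arch_codes = [tuple([i] * 4) for i in range(12)]
with ThreadPoolExecutor(max_workers=6) as pool:
    sub_models = dict(pool.map(train_one, arch_codes))
```

In a real deployment each worker would be a machine or GPU rather than a thread, but the scheduling pattern is the same because no job depends on another's result.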
  • the model training platform 206 sends multiple sub-models to the model inference platform 208.
  • the model training platform 206 sends the trained sub-model to the model inference platform.
  • Each sub-model can be characterized by its corresponding neural network architecture code and parameter set (a set of model parameters, usually a set of weights). Based on this, when the model training platform 206 sends the sub-model to the model inference platform 208, it can send the code and parameter set of the neural network architecture corresponding to the sub-model.
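A sketch of characterizing a sub-model by its architecture code plus parameter set for transmission between the platforms; the JSON wire format is an assumption, not specified by the embodiment:

```python
import json

def pack_sub_model(arch_code, parameter_set):
    """Serialize a sub-model as its neural network architecture code
    plus its parameter set (a set of weights), which is all the model
    inference platform needs to reconstruct it."""
    return json.dumps({"arch_code": arch_code, "weights": parameter_set})

def unpack_sub_model(payload):
    """Inverse of pack_sub_model."""
    obj = json.loads(payload)
    return obj["arch_code"], obj["weights"]

payload = pack_sub_model([1, 5, 0, 0], [0.12, -0.4, 0.7])
```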
  • the model inference platform 208 performs inference on the multiple sub-models on the first hardware, and obtains the evaluation index values of the multiple sub-models on the first hardware.
  • the model reasoning platform 208 can perform reasoning on the sub-model from at least one dimension on the first hardware to obtain at least one evaluation index value, such as accuracy, parameter amount, computing power, model reasoning time and other evaluation index values.
  • the computing power can be specifically measured by the number of floating-point operations (FLOPs).
  • the model inference platform 208 can perform inference on multiple sub-models on the first hardware in parallel, and obtain the evaluation index values of the multiple sub-models on the first hardware.
  • the evaluation index value obtained by the model reasoning platform 208 performing reasoning on the sub-model is the true index value.
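The inference step that produces evaluation index values can be sketched as follows, measuring accuracy (hardware-independent) and per-sample model inference time (hardware-dependent) on whatever hardware the code runs on; all names are illustrative:

```python
import time

def evaluate_on_hardware(model_fn, test_set):
    """Obtain two evaluation index values for a sub-model on the
    hardware this code runs on: accuracy and per-sample inference time."""
    correct = 0
    start = time.perf_counter()
    for sample, label in test_set:
        correct += int(model_fn(sample) == label)
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / len(test_set),
        "inference_time_s": elapsed / len(test_set),
    }

# Toy "sub-model": predicts whether a number is positive; one of the
# three labels is deliberately wrong, so accuracy is 2/3.
metrics = evaluate_on_hardware(lambda x: x > 0,
                               [(1, True), (-2, False), (3, False)])
```

Measured this way, the accuracy value is a property of the model and data alone, while the inference time changes when the same code runs on different hardware, which is exactly the distinction the second-hardware search below relies on.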
  • the model reasoning platform 208 sends the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware to the searcher 204.
  • when the model reasoning platform 208 sends the neural network architecture, it can send the code of the neural network architecture.
  • the model reasoning platform 208 interacts with the searcher 204 in the search system through coding and evaluation index values, which can meet the needs of protecting privacy.
  • the searcher 204 determines the first target neural network architecture that meets the preset condition according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware.
  • the searcher 204 searches the neural network architecture according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware. Since the training of the controller no longer depends on the actual evaluation index values of previously trained sub-models, generating a new neural network architecture no longer requires initializing the weights of the architecture to obtain an initial sub-model and training that sub-model. This shortens the search time of the neural network architecture, improves the search efficiency, and enables the first target neural network architecture that meets the preset conditions to be determined quickly.
  • the traditional neural network architecture search is to train a controller through reinforcement learning.
  • the excitation used for training the controller is the true evaluation index value of the sub-model obtained according to the neural network architecture, which results in a long delay before the controller obtains the excitation, greatly affecting the training efficiency and thereby the search efficiency.
  • this application proposes a method of directly predicting the evaluation index value of the neural network architecture by using an evaluator, and training the controller based on the predicted evaluation index value, thereby shortening the delay for the controller to obtain the excitation, improving the training efficiency, and in turn improving the search efficiency.
  • the searcher 204 determines the first target neural network architecture that meets the preset conditions according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware, which specifically includes the following steps:
  • the searcher 204 trains the evaluator 2042 according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware.
  • the evaluator 2042 belongs to a kind of neural network.
  • the neural network takes a neural network architecture as an input, specifically the coding of the neural network architecture as an input, and an evaluation index value corresponding to the neural network architecture as an output. That is, the evaluator 2042 is used to predict the evaluation index value of the neural network architecture.
  • the evaluator 2042 can be implemented by a time recursive network such as a gated recurrent unit (GRU) or a long short-term memory (LSTM).
  • the searcher 204 constructs an evaluator 2042 through GRU or LSTM, and then uses the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware to train the evaluator 2042.
  • the training process specifically involves inputting the neural network architectures and evaluation index values into the evaluator 2042, using the input evaluation index values as labels of the neural network architectures for supervised learning, and updating the weights of the evaluator 2042 according to the loss value determined from the evaluation index value predicted by the evaluator 2042 and the label. When the training end condition is met, for example, when the evaluator 2042 tends to converge or the loss value of the evaluator 2042 is less than the preset loss value, the training is stopped, and the trained evaluator 2042 can be used to predict the evaluation index value of a neural network architecture.
  • the embodiments of the present application use a GRU-based implementation for illustration.
  • the evaluator 2042 includes at least one GRU Cell, and the GRU Cell takes a neural network architecture as an input, and takes an evaluation index value as an output.
  • the evaluator 2042 can include multiple GRU Cells, and multiple GRU Cells can be cascaded.
  • the hidden layer state of one GRU Cell can be input into the next GRU Cell, so that the next GRU Cell can infer the evaluation index value based on that hidden layer state.
  • the evaluator 2042 can also use an attention mechanism to introduce an attention layer for regression.
  • FIG. 8 shows a schematic diagram of the structure of the evaluator 2042.
  • the neural network architecture code (arch code) can be fed as input into the GRU Cells. Each GRU Cell processes the currently input arch code together with the hidden state of the previous GRU Cell; regression through the attention layer then outputs the evaluation index value. Based on the output evaluation index value and the evaluation index value in the input training data (that is, the label), the weights of the evaluator 2042 can be updated, so that the training of the evaluator 2042 is realized.
  • when the training end condition is met, the training of the evaluator 2042 can be stopped.
  • the training of the evaluator 2042 belongs to a supervised learning method.
  • the searcher 204 can use a gradient descent method such as a stochastic gradient descent method for iterative training, thereby improving training efficiency and shortening training convergence time.
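As a greatly simplified stand-in for the GRU-based evaluator, the supervised-regression idea can be sketched with a linear surrogate mapping architecture codes to evaluation index values; the embodiment uses a GRU/LSTM with attention, and the training pairs below are hypothetical:

```python
def train_evaluator(samples, lr=0.01, epochs=2000):
    """Supervised training of a surrogate evaluator: per-sample gradient
    descent fitting a linear map from an architecture code to a
    predicted evaluation index value, with the measured values used as
    labels (squared-error loss)."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for code, label in samples:
            pred = sum(wi * xi for wi, xi in zip(w, code)) + b
            err = pred - label                   # d(loss)/d(pred) / 2
            w = [wi - lr * err * xi for wi, xi in zip(w, code)]
            b -= lr * err
    return lambda code: sum(wi * xi for wi, xi in zip(w, code)) + b

# Hypothetical (architecture code, measured accuracy) training pairs.
samples = [([0, 0], 0.3), ([0, 1], 0.5), ([1, 0], 0.6), ([1, 1], 0.8)]
predict = train_evaluator(samples)
```

Once trained, `predict` returns an evaluation index value for an architecture code without training any sub-model, which is the role the evaluator 2042 plays for the controller.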
  • S4144 The searcher 204 uses the trained evaluator 2042 to train the controller 2044.
  • the controller 2044 belongs to a neural network.
  • the controller 2044 may be implemented by GRU.
  • the controller 2044 may adopt a reinforcement learning method for training.
  • the training process is specifically as follows: the controller 2044 provides a neural network architecture code according to the definition of the search space, the evaluator 2042 predicts the evaluation index value of the neural network architecture, the predicted evaluation index value is used as the reward of the controller 2044 instead of the actual evaluation index value of the sub-model obtained from the neural network architecture, and the parameters of the controller 2044 are updated according to the reward.
  • when the trained controller 2044 satisfies the training end condition, for example, when the trained controller 2044 converges, the controller 2044 can output the candidate neural network architecture.
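The reinforcement-learning update of the controller, with the evaluator's prediction as the reward, can be sketched for a single attribute using a softmax policy and the REINFORCE rule; this is a simplification of the embodiment's GRU controller, and the reward function below is hypothetical:

```python
import math
import random

random.seed(0)  # for reproducibility of this sketch

def train_controller(value_space, reward_fn, lr=0.5, steps=2000):
    """Reinforcement-learning sketch for a one-attribute controller:
    a softmax policy over the attribute value space, updated with the
    REINFORCE rule. reward_fn stands in for the trained evaluator's
    predicted evaluation index value (no sub-model is trained)."""
    logits = [0.0] * len(value_space)
    baseline = 0.0                       # moving-average reward baseline
    for _ in range(steps):
        exps = [math.exp(l) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        idx = random.choices(range(len(value_space)), weights=probs)[0]
        reward = reward_fn(value_space[idx])
        baseline = 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        for i in range(len(logits)):     # policy-gradient update
            grad = (1.0 if i == idx else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return logits

# Hypothetical reward: the evaluator predicts op value 3 performs best.
logits = train_controller(list(range(6)), lambda op: 1.0 if op == 3 else 0.2)
```

Because the reward comes from the evaluator's prediction rather than from training and evaluating a real sub-model, each update is nearly instantaneous, which is the speed-up the text describes.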
  • the searcher 204 determines the first target neural network architecture that meets the preset condition according to the trained controller 2044.
  • the searcher 204 obtains the candidate neural network architectures output by the controller 2044, obtains candidate initial sub-models according to the candidate neural network architectures, trains the candidate initial sub-models to obtain candidate sub-models, and then performs inference on the candidate sub-models to obtain their evaluation index values. Based on the evaluation index values, a target sub-model that meets the preset condition can be determined from the candidate sub-models, and the neural network architecture corresponding to the target sub-model is the first target neural network architecture.
  • the controller 2044 may also send the candidate neural network architecture to the user device, and the user determines the first target neural network architecture that meets the preset condition from the candidate neural network architectures through the user device. This embodiment does not limit this.
  • the evaluator 2042 in the neural network architecture search method provided by the present application can reuse training data, which reduces the amount of training data and reduces the requirement for spatial sampling.
  • the evaluator 2042 can quickly converge based on a small number of samples, and the evaluator 2042 can then be used to provide feedback on the neural network architectures provided by the controller 2044, so that the controller 2044 can also converge faster, which improves the training efficiency, further shortens the search time, and improves the search efficiency.
  • users may need to extend the sub-models to other hardware or migrate them to other hardware, for example, due to the need to control costs, or because the relationship between supply and demand has changed (for example, certain types of hardware may not be available on schedule).
  • the design of the second hardware is different from the design of the first hardware, which may cause the first target neural network architecture suitable for the first hardware to not be suitable for the second hardware.
  • the embodiment of the present application also provides a method for searching for a second target neural network architecture suitable for the second hardware.
  • the second hardware may be known hardware or new hardware.
  • the evaluation index value includes two types of evaluation index values.
  • the first type of evaluation index value changes with hardware changes
  • the second type of evaluation index value does not change with hardware changes.
  • when the evaluation index values used in searching for the first target neural network architecture suitable for the first hardware include the second type of evaluation index value, the second type of evaluation index value of a sub-model on the first hardware can also be used as the second type of evaluation index value of that sub-model on the second hardware, which saves the time for performing inference on the sub-model on the second hardware, further shortens the search time, and improves the search efficiency.
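The reuse of the second-type index across hardware can be sketched as follows: accuracy is copied from the first-hardware results, while model inference time is re-measured on the second hardware. All names are illustrative:

```python
def metrics_on_second_hardware(first_hw_metrics, measure_latency_fn):
    """Build evaluation index values for the second hardware: the
    second-type index (accuracy) is reused from the first hardware,
    while the first-type index (model inference time) is re-measured
    by measure_latency_fn, which stands in for inference on the
    second hardware."""
    return {
        arch_code: {
            "accuracy": m["accuracy"],                    # reused as-is
            "inference_time_s": measure_latency_fn(arch_code),
        }
        for arch_code, m in first_hw_metrics.items()
    }

first_hw = {"arc0": {"accuracy": 0.91, "inference_time_s": 0.010},
            "arc1": {"accuracy": 0.87, "inference_time_s": 0.008}}
second_hw = metrics_on_second_hardware(first_hw, lambda code: 0.025)
```

Only the latency measurement touches the second hardware; no sub-model is re-trained or re-scored for accuracy.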
  • the method includes:
  • the model reasoning platform 208 performs reasoning on the multiple sub-models on the second hardware, and obtains the evaluation index values of the multiple sub-models on the second hardware.
  • the model reasoning platform 208 sends the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the second hardware to the searcher 204.
  • the model inference platform 208 performs inference on the multiple sub-models on the second hardware, and sends the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the second hardware to the searcher 204.
  • for the specific implementation, refer to the description of the related content of the embodiment illustrated in FIG. 4.
  • the method of searching for the second target neural network architecture suitable for the second hardware may not execute the above S902 and S904; the searcher 204 may directly use the evaluation index values of the sub-models on the first hardware as the evaluation index values of the sub-models on the second hardware.
  • the searcher 204 determines the second target neural network architecture that meets the preset condition according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the second hardware.
  • that is, the searcher 204 determines, according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the second hardware, a target neural network architecture as the second target neural network architecture suitable for the second hardware.
  • the first target neural network architecture suitable for the first hardware searched by the search system is also suitable for the second hardware. That is, when the evaluation index values include only the second type of evaluation index value, the first target neural network architecture suitable for the first hardware searched by the search system is the same as the second target neural network architecture suitable for the second hardware searched by the search system.
  • this application also provides a specific example of searching for a second target neural network architecture suitable for the second hardware.
  • the evaluation index values used to search for the target neural network architecture suitable for the first hardware include the first type of evaluation index value and the second type of evaluation index value, where the first type of evaluation index value used is a performance value, namely the model inference time, and the second type of evaluation index value used is a precision value, namely the accuracy.
  • the searcher 204 can directly obtain the accuracy value of a sub-model on the first hardware as the accuracy value of the sub-model on the second hardware, and perform inference on the sub-model on the second hardware to obtain the performance value of the sub-model on the second hardware, namely the model inference time.
  • the searcher 204 can obtain the evaluation index values of the sub-models on the second hardware, and train an evaluator 2042 according to the neural network architectures corresponding to the multiple sub-models (specifically the codes of the neural network architectures) and the evaluation index values of the multiple sub-models on the second hardware (i.e., the evaluation index values and architecture codes in the figure). The trained evaluator 2042 can be used to predict the evaluation index value corresponding to the neural network architecture provided by the controller 2044.
  • the searcher 204 uses the evaluation index value as the excitation to update the parameters of the controller 2044, so as to realize the training of the controller 2044.
  • when the controller 2044 converges, it can output the candidate neural network architectures (specifically, the codes of the candidate neural network architectures). Based on the candidate neural network architectures, a second target neural network architecture suitable for the second hardware can be determined.
  • the coding of the candidate neural network architecture suitable for the first hardware is referred to as the first candidate architecture code for short
  • the coding of the candidate neural network architecture suitable for the second hardware is referred to as the second candidate architecture code for short.
  • the search system may also have functions to protect data security and model privacy.
  • the service side of the search system may not include the model training platform 206 and the model inference platform 208; it specifically includes the architecture generation and architecture search modules, that is, the generator 202 and the searcher 204.
  • the remaining steps, such as model training and model inference, can be performed on the user side in the model training platform 206 and the model inference platform 208, so as to protect the privacy of user data and models.
  • the generator 202 can provide the user with an application programming interface (API) and process the search space through the API, thereby generating multiple neural network architectures for the user.
  • the multiple neural network architectures obtained by the user device exist in the form of codes of the neural network architectures.
  • a code can be parsed to obtain the corresponding neural network architecture.
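The code/architecture correspondence just described can be sketched as a simple index-based scheme. The attribute names and value spaces below are purely illustrative (the application does not fix a concrete encoding); the sketch only shows why exchanging codes, rather than architectures or data, is enough for the user side to reconstruct and train a model:

```python
# Hypothetical attribute-value spaces for one block; names are illustrative.
SPACE = {
    "op":    ["conv3x3", "conv5x5", "maxpool", "avgpool"],
    "width": [16, 32, 64],
    "skip":  [False, True],
}

def encode(arch):
    """Architecture -> list of indices: the form exchanged between sides."""
    return [SPACE[k].index(arch[k]) for k in SPACE]

def decode(code):
    """Indices -> concrete architecture; the user-side training platform
    parses the code like this before weight initialization."""
    return {k: SPACE[k][i] for k, i in zip(SPACE, code)}

arch = {"op": "conv5x5", "width": 32, "skip": True}
code = encode(arch)   # [1, 1, 1]
```

A round trip (`decode(encode(arch)) == arch`) holds by construction, so the service side never needs to see the user's data or trained weights, only these index lists and the resulting evaluation index values.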
  • the model training platform 206 on the user side can weight-initialize a neural network architecture to obtain an initial sub-model, and then train the initial sub-model on its own data set to obtain a sub-model.
  • the model inference platform 208 on the user side can perform inference for the sub-model on the first hardware and obtain the sub-model's evaluation index value on the first hardware.
  • the searcher 204 obtains the neural network architectures (specifically their codes) corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware, and trains the evaluator 2042 using these codes and evaluation index values.
  • the trained evaluator 2042 is used to train the controller 2044.
  • the candidate neural network architectures output by the controller 2044 can then be obtained.
  • from the candidate neural network architectures, the first target neural network architecture that meets the preset condition can be determined. Because the user side interacts with the service side only through architecture codes and sub-model evaluation index values, data and model privacy are not leaked, and security is guaranteed.
  • the search system can also search using training data provided by different users without any data exchange, which improves the accuracy/performance of the searched neural network architecture and breaks the data silos that arise in the neural network architecture search process.
  • the generator 202 can generate N neural network architectures, and the N neural network architectures can obtain N initial sub-models after weight initialization, where N is greater than 1. Different users provide a total of M data sets for training the initial sub-model, and M is greater than 1. Considering privacy and data security, no data exchange is performed for the above M data sets. Based on this, the search system can search in the following ways.
  • in the first implementation, the model training platform 206 performs federated learning on each initial sub-model using the M data sets to obtain N sub-models, and the model inference platform 208 performs inference for the N sub-models on the first hardware to obtain each sub-model's evaluation index value on the first hardware.
  • the searcher 204 uses the evaluation index values of the sub-models obtained by federated learning from the N initial sub-models as the labels of the N neural network architectures to form training data, and uses the training data to train the evaluator 2042.
  • when the training of the evaluator 2042 is completed, for example when it converges, the evaluator 2042 is used to predict the evaluation index value corresponding to a neural network architecture provided by the controller 2044, and this value is fed back as the reward to update the parameters of the controller 2044, thereby training the controller 2044 and searching for the first target neural network architecture suited to the first hardware.
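The federated learning step in the first implementation can be illustrated with a toy federated-averaging round. This is a deliberately simplified sketch, not the application's training procedure: each "model" is a plain weight vector, `local_update` stands in for local SGD on one party's data set, and equal-weight averaging is assumed. What it demonstrates is the key property the text relies on: only model updates cross party boundaries, never raw data:

```python
def local_update(weights, dataset, lr=0.1):
    """Stand-in for local training: nudge weights toward the data mean."""
    mean = sum(dataset) / len(dataset)
    return [w + lr * (mean - w) for w in weights]

def federated_round(weights, datasets):
    """One round: each party updates locally, then updates are averaged.
    Equal weighting across parties is assumed for simplicity."""
    local_models = [local_update(weights, d) for d in datasets]
    n = len(datasets)
    return [sum(ws) / n for ws in zip(*local_models)]

weights = [0.0, 0.0]
datasets = [[1.0, 3.0], [5.0, 7.0]]   # two parties, no data exchange
for _ in range(50):
    weights = federated_round(weights, datasets)
# weights drift toward a consensus between both parties' data means
```

In a real system the local step would be several epochs of gradient descent on each user's data set, and the averaging would usually be weighted by data-set size, as in standard federated averaging.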
  • in the second implementation, the model training platform 206 trains each initial sub-model separately on each of the M data sets, so N*M sub-models are obtained.
  • the model inference platform 208 may perform inference for the N*M sub-models on the first hardware and obtain the evaluation index values of the N*M sub-models on the first hardware.
  • the evaluation index values of the N*M sub-models can be used as labels of their corresponding neural network architectures to form training data for the evaluator 2042.
  • the training data may be divided into M groups according to the data set used to train the initial sub-model, which is equivalent to providing M groups of training data for the evaluator 2042.
  • the searcher 204 may use the M groups of training data to train the evaluator 2042 by federated learning.
  • the searcher 204 can then use the evaluator 2042 to predict the evaluation index value corresponding to a neural network architecture provided by the controller 2044, and use that value as the reward to train the controller 2044, thereby searching for the first target neural network architecture suited to the first hardware.
  • in the third implementation, the N initial sub-models are divided into M groups corresponding one-to-one to the M data sets, and the model training platform 206 trains each group of initial sub-models on its corresponding data set to obtain M groups of sub-models.
  • the model inference platform 208 can perform inference for the above N sub-models on the first hardware and obtain the evaluation index values of the N sub-models on the first hardware.
  • the neural network architecture corresponding to each sub-model and that sub-model's evaluation index value form one training sample; in this way, M groups of training data can be obtained.
  • the searcher 204 may use the M groups of training data to train the evaluator 2042 by federated learning.
  • the searcher 204 then uses the evaluator 2042 to predict the evaluation index value corresponding to a neural network architecture provided by the controller 2044, and uses that value as the reward to train the controller 2044, thereby searching for the first target neural network architecture suited to the first hardware.
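The partitioning step of the third implementation can be sketched as follows. The round-robin assignment is one possible choice (the application does not prescribe how the N models are split), so treat the grouping policy here as an assumption:

```python
def split_into_groups(models, datasets):
    """Third implementation: partition the N initial sub-models into M
    groups, one group per data set, with sizes as even as possible.
    The round-robin policy is illustrative only."""
    m = len(datasets)
    groups = [[] for _ in range(m)]
    for i, model in enumerate(models):
        groups[i % m].append(model)
    # Each (group, dataset) pair is trained together; the resulting
    # (code, metric) pairs then form M groups of evaluator training data.
    return list(zip(groups, datasets))

pairs = split_into_groups(models=list(range(6)), datasets=["A", "B", "C"])
# -> [([0, 3], 'A'), ([1, 4], 'B'), ([2, 5], 'C')]
```

Each returned pair corresponds to one party's contribution: the group of sub-models it trains and the data set it keeps private, which is exactly the unit over which the evaluator is then trained federatively.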
  • the embodiments of this application also illustrate the neural network architecture search method using data sets provided separately by two users.
  • the generator 202 can generate multiple neural network architectures based on the search space, and the architectures can be represented by codes.
  • users A and B have their own data sets, and users A and B can provide their data sets to the model training platform 206.
  • the model training platform 206 can parse the code of a neural network architecture to obtain the architecture, weight-initialize the architecture to obtain an initial sub-model, and then train each of the multiple initial sub-models with a federated learning algorithm using user A's data set and user B's data set to obtain multiple sub-models.
  • the model inference platform 208 can perform inference for the multiple sub-models on the first hardware and obtain the evaluation index values of the multiple sub-models on the first hardware.
  • alternatively, two sub-models can be obtained from each neural network architecture: one trained on the training data provided by user A and one trained on the training data provided by user B.
  • the model inference platform 208 performs inference for each sub-model on the first hardware and obtains the sub-model's evaluation index value on the first hardware.
  • each evaluation index value and the corresponding neural network architecture form one training sample, so each architecture yields two training samples; these samples can be divided into two groups according to the data set.
  • the searcher 204 can use the two groups of training data and a federated learning algorithm to train the evaluator 2042.
  • the trained evaluator 2042 is used to predict the evaluation index value corresponding to a neural network architecture provided by the controller 2044.
  • the evaluation index value can be used as feedback for training the controller 2044, thereby searching for the first target neural network architecture suited to the first hardware.
  • the neural network architecture search method provided by the embodiments of the present application has been described in detail above with reference to FIGS. 1 to 12. Next, the neural network architecture search device and equipment provided by the embodiments of the present application will be introduced with reference to the accompanying drawings.
  • the device 1300 is applied to a search system, the search system includes a generator and a searcher, and the device 1300 includes:
  • the generating module 1302 is used to generate multiple neural network architectures according to the search space;
  • the communication module 1304 is configured to obtain the evaluation index values of the multiple sub-models obtained according to the multiple neural network architectures on the first hardware;
  • the search module 1306 is configured to determine the first target neural network architecture that meets the preset condition according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware.
  • for the specific implementation of the generating module 1302, refer to the description of S402 in the embodiment shown in Fig. 4; for the communication module 1304, refer to the description of S412; and for the search module 1306, refer to the description of S414; details are not repeated here.
  • the searcher includes an evaluator and a controller, and the search module 1306 is specifically configured to:
  • the controller is trained using a trained evaluator, and the first target neural network architecture that satisfies a preset condition is determined according to the trained controller.
  • for the specific implementation of the search module 1306, reference may be made to the description of related content in the embodiment shown in Fig. 7, and details are not repeated here.
  • the evaluation index values of the plurality of sub-models on the first hardware represent evaluation index values obtained by inference of the plurality of sub-models on the first hardware.
  • the evaluation index value includes a hardware-related performance value
  • the communication module 1304 is further configured to:
  • the search module 1306 is also used for:
  • a second target neural network architecture that meets the preset condition is determined.
  • the hardware-related performance value includes any one or more of the following: model inference time, activation amount, throughput, power consumption, and video memory occupancy rate.
  • the search space is characterized by the attribute value space of each attribute of the neuron
  • the generating module 1302 is specifically used for:
  • An attribute value is randomly selected for each attribute from the attribute value space of each attribute of the neuron to obtain the multiple neural network architectures.
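The random per-attribute selection described above can be sketched directly. The attribute names and value lists below are hypothetical examples of "attribute value spaces of each attribute of the neuron", not values taken from this application:

```python
import random

# Hypothetical attribute-value spaces characterizing the search space.
ATTR_SPACE = {
    "kernel":  [1, 3, 5, 7],
    "filters": [16, 32, 64],
    "stride":  [1, 2],
}

def generate_architectures(n, seed=None):
    """Randomly pick one value per attribute, n times, as the generator
    does when producing the multiple neural network architectures."""
    rng = random.Random(seed)
    return [{attr: rng.choice(vals) for attr, vals in ATTR_SPACE.items()}
            for _ in range(n)]

archs = generate_architectures(5, seed=42)
```

Because every attribute value is drawn uniformly and independently, the generated set covers the space evenly over many draws, which is what keeps the evaluator's training samples balanced.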
  • the generating module 1302 is specifically configured to:
  • An application programming interface is provided to the user, and multiple neural network architectures are generated for the user through the application programming interface.
  • the search system further includes a model training platform
  • the device 1300 further includes a training module for:
  • the N initial sub-models are obtained from the multiple neural network architectures; the initial sub-models correspond to the neural network architectures one-to-one, N is greater than 1, and M is greater than 1.
  • the neural network architecture search device 1300 may correspond to the methods described in the embodiments of this application, and the above and other operations and/or functions of the modules of the neural network architecture search device 1300 implement the corresponding procedures of the methods in Fig. 4, Fig. 7, and Fig. 9, respectively; for brevity, details are not repeated here.
  • the aforementioned neural network architecture search device 1300 may be implemented by a computer cluster, and the computer cluster includes at least one computer.
  • Fig. 14 shows a computer cluster; for illustration, the computer cluster shown in Fig. 14 includes one computer.
  • the computer cluster 1400 may be specifically used to implement the function of the neural network architecture search device 1300 in the embodiment shown in FIG. 13.
  • the computer cluster 1400 includes a bus 1401, a processor 1402, a communication interface 1403, and a memory 1404.
  • the processor 1402, the memory 1404, and the communication interface 1403 communicate with each other through the bus 1401.
  • the bus 1401 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the communication interface 1403 is used to communicate with the outside.
  • the communication interface 1403 may obtain evaluation index values of multiple sub-models on the first hardware obtained according to the multiple neural network architectures, obtain evaluation index values of multiple sub-models on the second hardware, and so on.
  • the processor 1402 may be a central processing unit (CPU).
  • the memory 1404 may include a volatile memory (volatile memory), such as a random access memory (RAM).
  • the memory 1404 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), flash memory, HDD or SSD.
  • the memory 1404 stores executable code, and the processor 1402 executes the executable code to execute the aforementioned neural network architecture search method.
  • when the functions of the generation module 1302, the search module 1306, and the training module in Fig. 13 are executed, the required software or program code is stored in the memory 1404.
  • the function of the communication module 1304 is implemented through the communication interface 1403.
  • the processor 1402 is configured to execute instructions in the memory 1404, for example instructions corresponding to the generation module 1302, to generate multiple neural network architectures according to the search space; the communication interface 1403 obtains the evaluation index values, on the first hardware, of the multiple sub-models obtained from the multiple neural network architectures and transmits them to the processor 1402 through the bus 1401; and the processor 1402 executes the instructions corresponding to the search module 1306 to determine, according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware, the first target neural network architecture that meets the preset condition, thereby performing the neural network architecture search method provided in the embodiments of this application.
  • when the processor 1402 determines the first target neural network architecture that satisfies the preset condition according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware, it specifically trains the evaluator 2042 on those architectures and evaluation index values, then trains the controller 2044 with the trained evaluator 2042, and determines, according to the trained controller 2044, the first target neural network architecture that satisfies the preset condition.
  • multiple neural network architectures can obtain multiple initial sub-models through weight initialization, where the multiple initial sub-models correspond to multiple neural network architectures one-to-one.
  • the number of initial sub-models is N.
  • the processor 1402 may also execute instructions corresponding to the training module after generating multiple neural network architectures to train the initial sub-models to obtain multiple sub-models.
  • the processor 1402 training the initial sub-model specifically includes the following methods:
  • the first way is to use M data sets for each of the N initial sub-models to perform federated learning to obtain N sub-models; or,
  • the second way is to train each of the N initial sub-models using M data sets to obtain N*M sub-models; or,
  • the third way is to divide the N initial sub-models into M groups of initial sub-models, the M groups of initial sub-models have a one-to-one correspondence with M data sets, and corresponding data are used for the M groups of initial sub-models Set to train to get M groups of sub-models.
  • when the processor 1402 determines, according to the neural network architectures corresponding to the multiple sub-models and their evaluation index values on the first hardware, the first target neural network architecture that satisfies the preset condition, the training data is grouped according to the data set used to train the initial sub-models, and the different groups of training data are then used to train the evaluator 2042 with a federated learning algorithm.
  • the processor 1402 trains the controller 2044 with the trained evaluator, and determines, according to the trained controller 2044, the first target neural network architecture that satisfies the preset condition.
  • An embodiment of the present application also provides a computer-readable storage medium including instructions that instruct the computer cluster 1400 to execute the neural network architecture search method applied to the neural network architecture search device 1300.
  • the embodiment of the present application also provides a computer program product.
  • when the computer program product is executed by a computer, the computer executes any one of the aforementioned neural network architecture search methods.
  • the computer program product may be a software installation package; when any of the aforementioned neural network architecture search methods needs to be used, the package may be downloaded and executed on the computer.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • a unit can be located in one place or distributed across multiple network units; some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by software plus the necessary general-purpose hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • any function completed by a computer program can easily be implemented with corresponding hardware.
  • the specific hardware structures used to achieve the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits.
  • a software implementation is, in most cases, the better implementation.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which can be a personal computer, training device, network device, etc.) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a training device or a data center, integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

This application provides a neural network architecture search method applied to a search system that includes a generator and a searcher. The method includes: the generator generates multiple neural network architectures according to a search space; the searcher obtains the evaluation index values, on first hardware, of multiple sub-models obtained from the multiple neural network architectures; and the searcher determines, according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware, a first target neural network architecture that satisfies a preset condition. In this way, the training processes of different initial sub-models are decoupled from one another, and the neural network architecture search process is decoupled from the initial sub-model training process, which shortens the search time and improves search efficiency.

Description

Neural network architecture search method, apparatus, device, and medium
This application claims priority to Chinese patent application No. 202010148339.1, filed with the China National Intellectual Property Administration on March 5, 2020 and entitled "Neural network architecture search method and system", and to Chinese patent application No. 202010290428.X, filed with the China National Intellectual Property Administration on April 14, 2020 and entitled "Neural network architecture search method, apparatus, device, and medium", both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a neural network architecture search method, apparatus, device, and computer-readable storage medium.
Background
With the rise of artificial intelligence (AI), and in particular of neural networks, considerable progress has been made in fields such as image processing and audio/video recognition. At present, the quality of AI-based image processing or audio/video recognition usually depends on the performance of the neural network used. Building a high-performing neural network typically requires experienced experts to spend considerable effort constructing a neural network architecture and then training on that architecture to obtain a neural network usable for a specific application.
Considering cost and efficiency, the industry has proposed neural architecture search (NAS) for building neural networks. NAS means defining a search space and then automatically searching that space for neural network architectures, so as to obtain a neural network with good performance.
However, the search efficiency of existing NAS is usually low and struggles to meet business needs. The industry therefore urgently needs an efficient neural network architecture search method.
Summary
This application provides a neural network architecture search method that decouples the training processes of different initial sub-models from one another, and decouples the neural network architecture search process from the initial sub-model training process, thereby solving the problem in the related art that search efficiency is low and business needs are hard to meet. This application also provides an apparatus, device, computer-readable storage medium, and computer program product corresponding to the method.
In a first aspect, this application provides a neural network architecture search method. By decoupling the training processes of different initial sub-models so that they can run well in parallel, and decoupling the architecture search process from the initial sub-model training process so that these two processes can also run well in parallel, the method shortens the search time and improves search efficiency.
Specifically, the method is applied to a search system that includes a generator and a searcher. The generator generates multiple neural network architectures according to a search space; multiple initial sub-models can be obtained by performing weight initialization on the multiple architectures; and a model training platform can train the multiple initial sub-models in parallel to obtain multiple sub-models, thereby decoupling the training processes of the multiple initial sub-models.
A model inference platform can perform inference for each sub-model on first hardware to obtain the evaluation index value of the trained sub-model on the first hardware. The searcher can obtain the evaluation index values of the multiple sub-models on the first hardware and, using these values together with the neural network architectures corresponding to the sub-models, determine a first target neural network architecture that satisfies a preset condition. Because searching for a neural network architecture does not depend on the actual evaluation index value of the previous sub-model, the architecture search process and the initial sub-model training process can also be handled in parallel, decoupling the two.
On this basis, the neural network architecture search time is greatly reduced and search efficiency is improved.
In some possible implementations, in the related art, when the controller is trained by reinforcement learning, the reward used is the actual evaluation index value of a sub-model (i.e., the value obtained by the model inference platform performing inference for the sub-model on hardware); the latency of this reward is large and severely limits search efficiency. For this reason, this application provides a searcher that includes an evaluator and a controller. The searcher uses the evaluator to predict the evaluation index value corresponding to a neural network architecture and uses the predicted value as the reward, without waiting for the model inference platform to produce the actual value, which greatly reduces reward latency and improves search efficiency.
Specifically, the searcher includes an evaluator and a controller. The searcher trains the evaluator according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware; the searcher then trains the controller using the trained evaluator, and determines, according to the trained controller, the first target neural network architecture that satisfies the preset condition.
In some possible implementations, the evaluation index values of the multiple sub-models on the first hardware are the values obtained by performing inference for the multiple sub-models on the first hardware. These are real evaluation index values, and an evaluator trained on real values has high credibility and can be used to predict the evaluation index value corresponding to a neural network architecture.
In some possible implementations, for reasons such as cost control or changes in market supply and demand, sub-models may need to be extended or migrated to other hardware, such as second hardware. Because the design of the second hardware differs from that of the first hardware, the first target neural network architecture suited to the first hardware may not suit the second hardware. On this basis, a second target neural network architecture suited to the second hardware can also be searched for. The second hardware may be known hardware or new hardware.
This method can reuse the sub-models trained when searching for the first target neural network architecture suited to the first hardware, without retraining, so the second target neural network architecture suited to the second hardware can be determined quickly. Further, the evaluation index values fall into two types: the first type changes as the hardware changes, and the second type does not. When the evaluation index values used to search for the first target architecture include values of the second type, the second-type values of the sub-models on the first hardware can be reused as their second-type values on the second hardware, which saves the time of performing inference for the sub-models on the second hardware, further shortens the search time, and improves search efficiency.
Specifically, the evaluation index values include hardware-related performance values. The searcher can obtain the performance values of the multiple sub-models on the second hardware, which are obtained by performing inference for the multiple sub-models on the second hardware; the searcher then determines, according to the neural network architectures corresponding to the multiple sub-models and their performance values on the second hardware, a second target neural network architecture that satisfies a preset condition.
In some possible implementations, the hardware-related performance values include any one or more of the following: model inference time, activation amount, throughput, power consumption, and video memory occupancy.
In some possible implementations, the search space is characterized by the attribute value space of each attribute of a neuron. The generator can randomly select an attribute value for each attribute from the attribute value space of each attribute of the neuron to obtain the multiple neural network architectures. This ensures balanced samples for training the evaluator.
Considering that random sampling can generate identical neural network architectures, the generator can intervene in the random sampling process to avoid duplicate samples and reduce the number of samples. Specifically, the generator can randomly select one attribute value for each attribute from the attribute value space of each attribute of the neuron to obtain one neural network architecture, and then randomly select, for each attribute, one value from the attribute value space excluding the already-selected values to obtain another neural network architecture.
Further, when the values of one attribute of the neuron (the first attribute) already cover its attribute value space while another attribute (the second attribute) has not yet covered its space, the generator can randomly select a value for the first attribute from its full attribute value space, and select a value for the second attribute from its attribute value space excluding the already-selected values, to generate a new neural network architecture. When the attribute values of all attributes of the neurons in the generated architectures cover their corresponding attribute value spaces, the generator has generated one group of neural network architectures.
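The duplicate-avoiding generation strategy just described can be sketched as follows. This is an illustrative reading of the strategy, not code from this application; the attribute names and the exact fallback policy for an already-covered attribute are assumptions:

```python
import random

def generate_covering_batch(attr_space, seed=None):
    """Generate architectures by drawing each attribute's value without
    replacement until that attribute's whole value space is covered
    (an exhausted attribute falls back to free sampling), and stop
    once every attribute value space is covered."""
    rng = random.Random(seed)
    remaining = {a: list(vs) for a, vs in attr_space.items()}
    batch = []
    while any(remaining.values()):
        arch = {}
        for attr, values in attr_space.items():
            pool = remaining[attr] or values   # covered -> free sampling
            choice = rng.choice(pool)
            arch[attr] = choice
            if choice in remaining[attr]:
                remaining[attr].remove(choice)
        batch.append(arch)
    return batch

space = {"kernel": [1, 3, 5], "stride": [1, 2]}
batch = generate_covering_batch(space, seed=0)
# every kernel value and every stride value appears at least once
```

The batch length equals the size of the largest attribute value space, and no attribute value is repeated before its space is exhausted, which is exactly the coverage-with-few-samples property the strategy aims for.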
In some possible implementations, to protect data security, such as the security of the search space and the neural network architectures, the generator can provide the user with an application programming interface and generate the multiple neural network architectures for the user through that interface.
In some possible implementations, when users provide different data sets, the search system can also perform neural network architecture search without exchanging data, which breaks data silos and improves the performance/accuracy of the searched architectures.
Specifically, the search system further includes a model training platform, which can train, using M data sets, the N initial sub-models obtained from the multiple neural network architectures, where the architectures correspond to the initial sub-models one-to-one, N is greater than 1, and M is greater than 1.
When training the N initial sub-models, the model training platform can proceed in any of the following ways:
In the first implementation, the model training platform performs federated learning on each of the N initial sub-models using the M data sets to obtain N sub-models; or,
In the second implementation, the model training platform trains each of the N initial sub-models separately on the M data sets to obtain N*M sub-models; or,
In the third implementation, the model training platform divides the N initial sub-models into M groups corresponding one-to-one to the M data sets, and trains each group on its corresponding data set to obtain M groups of sub-models.
In a second aspect, this application provides a neural network architecture search apparatus. The apparatus is applied to a search system that includes a generator and a searcher, and the apparatus includes:
a generation module, configured to generate multiple neural network architectures according to a search space;
a communication module, configured to obtain the evaluation index values, on first hardware, of multiple sub-models obtained from the multiple neural network architectures;
a search module, configured to determine, according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware, a first target neural network architecture that satisfies a preset condition.
In some possible implementations, the searcher includes an evaluator and a controller, and the search module is specifically configured to:
train the evaluator according to the neural network architectures corresponding to the multiple sub-models and the evaluation index values of the multiple sub-models on the first hardware;
train the controller using the trained evaluator, and determine, according to the trained controller, the first target neural network architecture that satisfies the preset condition.
In some possible implementations, the evaluation index values of the multiple sub-models on the first hardware are the values obtained by performing inference for the multiple sub-models on the first hardware.
In some possible implementations, the evaluation index values include hardware-related performance values, and the communication module is further configured to:
obtain the performance values of the multiple sub-models on second hardware, which are obtained by performing inference for the multiple sub-models on the second hardware;
the search module is further configured to:
determine, according to the neural network architectures corresponding to the multiple sub-models and their performance values on the second hardware, a second target neural network architecture that satisfies a preset condition.
In some possible implementations, the hardware-related performance values include any one or more of the following: model inference time, activation amount, throughput, power consumption, and video memory occupancy.
In some possible implementations, the search space is characterized by the attribute value space of each attribute of a neuron;
the generation module is specifically configured to:
randomly select an attribute value for each attribute from the attribute value space of each attribute of the neuron to obtain the multiple neural network architectures.
In some possible implementations, the generation module is specifically configured to:
provide the user with an application programming interface, and generate the multiple neural network architectures for the user through the application programming interface.
In some possible implementations, the search system further includes a model training platform, and the apparatus further includes a training module configured to:
perform federated learning on each of N initial sub-models using M data sets to obtain N sub-models; or,
train each of the N initial sub-models separately on the M data sets to obtain N*M sub-models; or,
divide the N initial sub-models into M groups corresponding one-to-one to the M data sets, and train each group on its corresponding data set to obtain M groups of sub-models;
where the N initial sub-models are obtained from the multiple neural network architectures, the initial sub-models correspond to the architectures one-to-one, N is greater than 1, and M is greater than 1.
In a third aspect, this application provides a computer cluster that includes at least one computer, each computer including a processor and a memory. The processor and the memory communicate with each other. The processor of the at least one computer is configured to execute the instructions stored in the memory of the at least one computer, so that the computer cluster performs the neural network architecture search method of the first aspect or any implementation thereof.
In a fourth aspect, this application provides a computer-readable storage medium storing instructions that, when run on a computer cluster, cause the computer cluster to perform the neural network architecture search method of the first aspect or any implementation thereof.
In a fifth aspect, this application provides a computer program product containing instructions that, when run on a computer cluster, cause the computer cluster to perform the neural network architecture search method of the first aspect or any implementation thereof.
On the basis of the implementations provided in the above aspects, this application may be further combined to provide more implementations.
Brief Description of the Drawings
To describe the technical methods of the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in the embodiments.
Fig. 1 is a schematic flowchart of neural network architecture search according to an embodiment of this application;
Fig. 2 is a schematic architecture diagram 100 of a search system according to an embodiment of this application;
Fig. 3 is a schematic architecture diagram 200 of a search system according to an embodiment of this application;
Fig. 4 is an interaction flowchart of a neural network architecture search method according to an embodiment of this application;
Fig. 5 is a schematic diagram of generating a neural network architecture according to an embodiment of this application;
Fig. 6 is a schematic diagram of generating a neural network architecture according to an embodiment of this application;
Fig. 7 is a schematic flowchart of determining a first target neural network architecture according to an embodiment of this application;
Fig. 8 is a schematic structural diagram of an evaluator according to an embodiment of this application;
Fig. 9 is a schematic flowchart of determining a second target neural network architecture according to an embodiment of this application;
Fig. 10 is a schematic flowchart of neural network architecture search according to an embodiment of this application;
Fig. 11 is a schematic flowchart of neural network architecture search according to an embodiment of this application;
Fig. 12 is a schematic flowchart of neural network architecture search according to an embodiment of this application;
Fig. 13 is a schematic structural diagram of a neural network architecture search apparatus according to an embodiment of this application;
Fig. 14 is a schematic structural diagram of a computer cluster according to an embodiment of this application.
Detailed Description
The solutions in the embodiments provided by this application are described below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects, not necessarily to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely the way objects with the same attribute are distinguished when describing the embodiments of this application.
To facilitate understanding of the technical solutions of this application, some technical terms involved are introduced below.
A neural network (NN) is a mathematical model that simulates the neural network of the human brain with the aim of achieving human-like intelligence; a neural network may also be called a neural network model. A neural network usually simulates the brain's network using multiple interconnected neurons (also called nodes) to accomplish tasks such as image classification and speech recognition.
The structure formed by the connections of the neurons in a neural network is called the neural network architecture of that network. Typical neural network architectures include the recurrent neural network (RNN), the convolutional neural network (CNN), and so on. A neural network architecture can be represented by a directed graph. Each edge of the directed graph has a weight, which characterizes the importance of the edge's input node relative to its output node. The parameters of the neural network include these weights. Note that the weights are usually obtained by training the neural network on samples.
Obtaining a neural network model from a neural network architecture involves two stages. The first stage is weight initialization of the architecture, which yields an initial neural network model, also called an initial sub-model. Weight initialization means initializing the weights (and, in some cases, the biases) of the edges of the architecture; in a concrete implementation, initial weight values can be generated from a Gaussian distribution. The second stage is updating the weights of the initial sub-model with sample data to obtain a child model (sub-model). Specifically, sample data is fed into the initial sub-model, which determines a loss value from its predictions and the ground truth carried by the sample data, and updates its weights based on that loss. After multiple rounds of weight iteration, a sub-model is obtained: a trained neural network model usable for a specific application.
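The two stages just described can be sketched minimally. The layer sizes, standard deviation, and the plain-SGD update below are illustrative assumptions, not parameters from this application; the sketch only shows Gaussian weight initialization followed by a loss-driven weight update:

```python
import random

def init_weights(layer_sizes, std=0.01, seed=None):
    """Stage one: Gaussian weight initialization of an architecture's
    edges, producing the initial sub-model's weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(n)] for n in layer_sizes]

def sgd_step(weights, grads, lr=0.01):
    """Stage two (one iteration): update weights from the loss gradient;
    repeating this over many rounds yields the trained sub-model."""
    return [[w - lr * g for w, g in zip(ws, gs)]
            for ws, gs in zip(weights, grads)]

w = init_weights([3, 2], seed=1)          # two layers with 3 and 2 edges
g = [[1.0, 1.0, 1.0], [1.0, 1.0]]         # stand-in loss gradients
w2 = sgd_step(w, g)
```

In practice the gradient would come from backpropagating a loss computed between the model's predictions and the ground truth carried by the sample data, as described above.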
The quality of a sub-model can be measured by its evaluation index values, i.e., metrics obtained by evaluating the sub-model along at least one dimension. A sub-model's evaluation index values fall into two types: one type changes as the hardware changes, and the other stays the same across hardware. For convenience, this application calls the values that change with hardware the first type of evaluation index values, and the values that remain unchanged across hardware the second type.
The first type of evaluation index values are hardware-related, including hardware-related performance values. In some implementations, the hardware-related performance values include any one or more of model inference time, activation amount, throughput, power consumption, and video memory occupancy. The second type are hardware-independent values, including hardware-independent precision values. In some implementations, the precision values include any one or more of accuracy, precision, and recall. Hardware-independent evaluation index values also include the parameter count and the computational cost, the latter specifically the number of floating-point operations per second (FLOPs).
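The second-type (hardware-independent) precision metrics mentioned above can be computed directly from predictions and labels, which is why they carry over unchanged between hardware. A minimal sketch for binary labels:

```python
def precision_recall_accuracy(y_true, y_pred):
    """Hardware-independent metrics of the second type, binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(y_true)
    return precision, recall, accuracy

p, r, a = precision_recall_accuracy([1, 1, 0, 0], [1, 0, 1, 0])
# p = 0.5, r = 0.5, a = 0.5
```

First-type metrics such as model inference time, by contrast, can only be obtained by actually running inference on the target hardware, which is exactly why they must be re-measured when the sub-model is migrated.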
For current neural-network-based tasks, the main process is still that researchers manually explore new neural network architectures, and a network's performance often depends on the understanding of the task and on the imagination behind the architecture design. The whole architecture design process requires researchers to be well versed in the relevant field, which indirectly raises the entry barrier for practitioners. Moreover, continually refining architectures by hand is very time-consuming.
As the computing power and storage capacity of computer equipment have grown year by year, the industry has proposed the neural architecture search (NAS) method for automatic neural network architecture design. Just as a computer learns a neural network's weights, it can also learn a neural network architecture from a search space through NAS, thereby achieving architecture design.
The search space defines the scope of the search and provides, within that scope, a set of searchable neural network architectures. Depending on the type of network to be built, search spaces come in several types, such as chain architecture spaces, multi-branch architecture spaces, and block-based search spaces. Each of these types of search space can be characterized by the attribute value spaces of the attributes of the neurons (i.e., nodes) in a neural network architecture. For ease of understanding, the following uses a block-based search space as an example.
Fig. 1 shows the principle of searching for a neural network architecture in a search space. As shown in Fig. 1, an RNN-based control network, also called a controller, samples an architecture A from the search space with probability p; the architecture A is weight-initialized to obtain an initial sub-model; the initial sub-model is trained to obtain a sub-model, whose accuracy R on a validation set is measured; the accuracy R is then used to update the controller's parameters. This loop repeats until the controller converges, yielding a high-performing neural network architecture and completing the architecture design.
Here, producing a new neural network architecture depends on training the controller with the accuracy of the previous sub-model, and producing a new sub-model depends on training the initial sub-model obtained by weight-initializing that new architecture; that is, the training processes of multiple initial sub-models cannot be parallelized well. Moreover, the architecture search process depends on the initial sub-model training process: the two are highly coupled and cannot be parallelized well either. These two factors lead to low search efficiency and long search times.
In view of this, the embodiments of this application provide a neural network architecture search method. By decoupling the architecture search process from the initial sub-model training process, and decoupling the training processes of different initial sub-models from one another, the method allows the search process and the training process to run well in parallel, and the training of different initial sub-models to run well in parallel, thereby shortening the search time and improving search efficiency.
Specifically, the method is applied to a search system including a generator and a searcher. The generator can generate multiple neural network architectures based on the search space; weight initialization of these architectures yields multiple initial sub-models; and the model training platform can train the multiple initial sub-models in parallel to obtain multiple sub-models, thereby decoupling the training of the multiple initial sub-models. The model inference platform can perform inference for each sub-model on the first hardware to obtain the trained sub-model's evaluation index value on the first hardware. The searcher can obtain the evaluation index values of the multiple sub-models on the first hardware and, using these values and the corresponding architectures, determine a first target neural network architecture that satisfies a preset condition. Because architecture search does not depend on the actual evaluation index value of the previous sub-model, the search process and the initial sub-model training process can run in parallel, decoupling the two. As a result, the search time is greatly reduced and search efficiency improved.
In some embodiments, the model training platform may be on the service side, for example provided by the provider of the neural network architecture search cloud service. In other embodiments, the model training platform may be on the user side, i.e., provided by the user who needs the architecture search. Training sub-models on a user-provided platform avoids leaking the training data used to train the sub-models, ensuring data security.
Similarly, the model inference platform may be on the service side or the user side. When it is on the user side, the user-provided platform performs inference on the sub-models trained by the model training platform to obtain evaluation index values, without uploading the sub-models to the service side, which avoids sub-model leakage and protects model privacy.
进一步地,已有的NAS是通过强化学习(reinforcement learning,RL)的方式训练控制器。强化学习需要大量的训练数据,训练收敛周期长。并且,在训练控制器时,每个控制策略对应的激励(reward)的延迟较大,通常需要几十毫秒返回。基于此,本申请实施例提供了一种搜索器,该搜索器包括控制器和评估器。其中,控制器也可以称作筛选器。
在对评估器进行训练时,搜索器采用生成器生成的神经网络架构以及与该神经网络架构对应的子模型的评价指标值作为训练数据,进行监督学习,如此可以大幅缩短训练时间,提高训练效率。而且,由于神经网络架构和对应的评价指标值可以重复利用,用于对评估器进行多个轮次(epoch)的训练,如此,减少了训练所需的数据量,提高了训练数据的利用率。
在完成对评估器的训练后,可以利用评估器对控制器进行反馈,具体是以训练好的评估器预测出控制器提供的神经网络架构的评价指标值作为控制器的激励,无需采用子模型的实际评价指标值,如此,提高了训练效率。控制器训练完成时,可以输出候选神经网络架构,从该候选神经网络架构中可以挑选出满足预设条件的第一目标神经网络架构。
由于评估器和控制器能够较快地训练完成,控制器能够较快地输出候选神经网络架构,基于该候选神经网络架构能够快速挑选出满足预设条件的第一目标神经网络架构,进一步缩短了神经网络架构搜索时长,提高了搜索效率。
本申请实施例提供的神经网络架构搜索方法可以包括但不限于应用于如图2所示的应用场景中。
如图2所示,搜索系统包括生成器202、搜索器204、模型训练平台206以及模型推理平台208。其中,生成器202用于根据搜索空间生成多个神经网络架构,模型训练平台206用于对根据多个神经网络架构获得的多个初始子模型进行训练得到多个子模型,模型推理平台208用于对多个子模型在第一硬件上执行推理,获得多个子模型在第一硬件上的评价指标值,搜索器204用于根据多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值确定满足预设条件的第一目标神经网络架构。
在该应用场景中,生成器202、搜索器204和模型训练平台206、模型推理平台208部署于同一个云计算集群(云计算集群包括至少一个云计算设备,如云服务器),具体为服务侧的云计算集群。
具体地,生成器202对搜索空间进行采样,生成多个神经网络架构。该生成器202生成的神经网络架构具体可以通过编码进行表示。模型训练平台206可以对多个神经网络架构进行权重初始化,获得多个初始子模型,然后对多个初始子模型进行训练获得多个子模型。其中,神经网络架构采用编码表示时,模型训练平台206先解析编码得到神经网络架构,然后再执行权重初始化以及训练初始子模型的操作。模型推理平台208可以对训练好的多个子模型在第一硬件上执行推理,得到各个子模型在第一硬件上的评价指标值。
接着,搜索器204可以获取多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值,根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值训练评估器2042。具体地,搜索器204以所述评价指标值作为标签,采用监督学习的方式训练评估器2042。考虑到隐私安全,搜索器204在训练评估器时,还可以采用神经网络架构的编码代替神经网络架构。
在完成评估器2042的训练后,搜索器204可以利用已训练的评估器2042训练控制器2044,然后根据已训练的控制器2044确定满足预设条件的第一目标神经网络架构。具体地,控制器2044可以提供神经网络架构,评估器2042可以预测该神经网络架构的评价指标值,搜索器204可以利用预测得到的评价指标值作为该神经网络架构的激励,更新控制器2044的参数,而不必对该神经网络架构进行权重初始化得到初始子模型,然后训练初始子模型得到子模型,并通过模型推理平台对子模型执行推理,得到实际评价指标值。如此,实现了神经网络架构搜索过程和初始子模型训练过程的解耦,缩短了搜索时长,提高了搜索效率。
在完成控制器2044的训练后,例如控制器2044收敛后,该控制器2044可以输出至少一个候选神经网络架构。搜索器204可以根据该候选神经网络架构确定满足预设条件的第一目标神经网络架构。具体地,根据控制器2044输出的每一个候选神经网络架构可以生成一个初始子模型,该初始子模型经过训练后可以得到一个子模型,当子模型满足预设条件时,例如子模型的评价指标值达到预设值时,可以将该子模型对应的候选神经网络架构确定为第一目标神经网络架构,以用于特定应用。
图2是以搜索系统包括模型训练平台206和模型推理平台208,且搜索系统的生成器202、搜索器204以及模型训练平台206、模型推理平台208部署于同一个云计算集群进行示例说明。在一些可能的实现方式中,考虑到训练初始子模型所采用的训练数据以及训练得到的子模型的安全性,搜索系统也可以不包括上述模型训练平台206和模型推理平台208,其可以通过与用户侧的模型训练平台206和模型推理平台208进行交互实现神经网络架构搜索。
具体地,如图3所示,生成器202和搜索器204部署在一个第一云计算集群,具体可以是服务侧的云计算集群。模型训练平台206和模型推理平台208部署在第二云计算集群,具体可以是用户侧的云计算集群。服务侧的云计算集群可以是公有云,用户侧的云计算集群可以是私有云,如此可以基于公有云和私有云形成的混合云实现神经网络架构搜索。
在图3所示的场景中,生成器202生成多个神经网络架构后,将各神经网络架构的编码传输至用户侧的模型训练平台206,模型训练平台206可以解析神经网络架构的编码,获得神经网络架构,然后对神经网络架构进行权重初始化获得初始子模型,并利用训练数据对初始子模型进行训练得到子模型。然后,用户侧的模型推理平台208可以对子模型在第一硬件上执行推理,得到子模型在第一硬件上的评价指标值。各神经网络架构的编码以及对应的子模型在第一硬件上的评价指标值可以传输至服务侧的搜索器204,搜索器204利用上述编码以及对应的评价指标值训练评估器2042,在训练完成后可以反过来利用该评估器2042训练控制器2044。控制器2044训练完成时可以输出至少一个候选神经网络架构,从候选神经网络架构中可以确定出满足预设条件的第一目标神经网络架构。
需要说明的是,图2和图3仅仅是本申请实施例提供的神经网络架构搜索应用场景的一些具体示例。在一些可能的实现方式中,生成器202、搜索器204、模型训练平台206和模型推理平台208可以分别部署于不同的云计算集群中,或者是,以两两组合的形式部署在不同的云计算集群中,又或者是以任意三个进行组合的形式部署在一个云计算集群,剩下的一个部署在另一个云计算集群。当然,生成器202、搜索器204、模型训练平台206和模型推理平台208也可以不部署在云计算集群,直接部署在物理设备如服务器中,或者是一部分部署在云计算集群,另一部分部署在物理设备中。
为了便于理解本申请实施例的技术方案,接下来,将从生成器202、搜索器204和模型训练平台206、模型推理平台208交互的角度对本申请实施例的神经网络架构搜索方法进行介绍。
参见图4所示的神经网络架构搜索方法的流程图,该方法包括:
S402:生成器202根据搜索空间生成多个神经网络架构。
搜索空间可以通过神经网络架构包括的神经元(也称作节点node)的各属性对应的属性取值空间进行表征。例如,搜索空间可以通过神经元的标识(identity,id)和操作(operation,op)这2种属性的属性取值空间进行表征。在有些情况下,搜索空间还可以结合神经网络架构包括的层(layer)数、每层包括的单元块(block)数以及每个单元块包括的神经元数中的至少一个进行表征。进一步地,生成器202可以对神经网络架构包括的神经元的属性值进行编码,编码结果可以用于表示神经网络架构。
为了便于理解,本申请还提供了搜索空间的一个具体示例。如图5所示,在该示例中,搜索空间定义为:一个layer包括5个block(具体为block0-block4),每个block包括x和y两个node,x和y两个node具有id和op两个属性。
其中,block0中id属性的属性取值空间(即取值范围)为{0,1},blocki中id属性的属性取值空间为{0,1…i+1},表示可以选择的id。此外,每个block的op属性的属性取值空间为{0,1…5},表示每个node的op有6种可选操作。
生成器202可以对搜索空间进行采样,从而生成多个神经网络架构。在对搜索空间进行采样时,生成器202可以采用随机方式进行采样,如此,可以保障训练评估器2042的样本的均衡性。
考虑到随机采样过程可以生成相同的神经网络架构,为了避免生成重复样本,减少样本数量,生成器202可以对随机采样过程进行干预。具体地,生成器202可以从神经元的各属性的属性取值空间中为每个属性随机选择一个属性值,获得一个神经网络架构,然后从所述属性取值空间中除去已选择的属性值以外的属性值为每个属性随机选择一个属性值,获得另一个神经网络架构。
当神经元的一个属性(下文称之为第一属性)的属性值覆盖对应的属性取值空间,而神经元的另一个属性(下文称之为第二属性)尚未覆盖对应的属性取值空间,则生成器202可以从神经元的第一属性的属性取值空间中为第一属性随机选择一个属性值,从神经元的第二属性的属性取值空间中除去已选择的属性值以外的属性值为该第二属性随机选择一个属性值,生成新的神经网络架构。神经网络架构中神经元的各属性的属性值均覆盖对应的属性取值空间时,生成器202即生成一组神经网络架构。
如图5所示,第一个神经网络架构中block0的x/y两个node的id和op是从取值范围内随机选取一个值,第二个神经网络架构中block0的x/y两个node的id和op是从取值范围减去第一个神经网络架构已选取值余下范围中随机选取的值。依此类推,每个block采用上述操作。当神经网络架构中神经元的各属性的属性值覆盖对应的属性取值空间,则将这多个神经网络架构的编码(简称为多个架构编码)作为一批架构数据。生成器202可以循环执行上述操作,从而产生多批架构数据。
为了便于理解,本申请还提供了一具体示例对生成神经网络架构的过程进行详细说明。如图6所示,神经元各属性的取值个数最多为6个,因此,每一批架构数据包括6条架构数据。具体生成过程如下所示:
第一条架构数据中block0的x node的id从0和1中随机选择了0,block0的x node的op从0-5六个值中随机选择了0,剩下的位置也依次随机选择生成了第一条架构数据Arc0。
第二条架构数据中block0的x node的id取值范围为0和1,减去Arc0已经选择的0,剩下的取值范围为1,只能选择1。block0 x node的op取值范围为0-5,减去Arc0中选择的0,从剩下的1-5中随机选择了1。依此类推,block0的y node、剩下的block的x/y node按照上述方式随机选择,生成第二条架构数据Arc1。
第三条架构数据中block0的x node的id取值范围为0和1,由于Arc0和Arc1已经完全覆盖了其取值范围,因此直接从其取值范围中随机选择一个数字,此处选择了0。block0 x node的op取值范围0-5减去Arc0和Arc1已经选择的0和1,从2-5中选择了3。依此类推,block0的y node、剩下的block的x/y node按照上述方式随机选择,生成第三条架构数据Arc2。
依此类推,直到第六条架构数据的每个node的op覆盖其取值范围0-5,则生成一批架构数据。
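上述"从余下取值范围中随机选取"的分批采样过程,可以用如下示意代码表达(仅为假设性草图,属性取值空间做了简化):

```python
import random

def batch_sampler(value_spaces, batch_size):
    # 为一批架构采样属性值:每个属性从"余下取值范围"中随机选取,
    # 取值空间被完全覆盖后再重新放回全空间,避免一批内出现重复取值
    remaining = [list(vs) for vs in value_spaces]
    batch = []
    for _ in range(batch_size):
        arch = []
        for i, vs in enumerate(value_spaces):
            if not remaining[i]:            # 已完全覆盖该属性的取值空间
                remaining[i] = list(vs)     # 重新从全取值范围中选取
            v = random.choice(remaining[i])
            remaining[i].remove(v)
            arch.append(v)
        batch.append(arch)
    return batch

random.seed(0)
# 假设:仅示意block0的x/y两个node,属性顺序为(x.id, x.op, y.id, y.op)
spaces = [[0, 1], [0, 1, 2, 3, 4, 5], [0, 1], [0, 1, 2, 3, 4, 5]]
batch = batch_sampler(spaces, batch_size=6)
```

按此方式生成的一批6条架构数据中,每个op属性恰好覆盖其取值范围0-5各一次。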
还需要说明的是,搜索空间可以是用户自定义的,也可以是搜索系统自动生成的。具体地,用户可以配置目标场景,如图像分类场景、目标检测场景等等,搜索系统可以基于目标场景搜索内置的搜索空间,然后根据该内置的搜索空间确定与目标场景相匹配的搜索空间。
S404:生成器202向模型训练平台206发送多个神经网络架构。
具体地,生成器202可以向模型训练平台206发送多个神经网络架构的编码。生成器202可以分批发送神经网络架构的编码,例如一次发送一批架构数据,通过多次发送实现发送多批架构数据。当然,生成器202也可以一次性地发送多批神经网络架构的编码,例如一次性地发送多批架构数据,如此可以提高传输效率,节省传输资源。
S406:模型训练平台206根据多个神经网络架构获得多个子模型。
具体地,模型训练平台206可以对神经网络架构进行权重初始化得到多个初始子模型,然后利用训练数据训练多个初始子模型,得到多个子模型。其中,模型训练平台206从生成器202获取到神经网络架构的编码时,还需要先对神经网络架构的编码进行解析,得到神经网络架构。例如,神经网络架构的编码为[1 5 0 0 0 2 0 5 0 5 1 1 1 5 4 2 4 5 5 3]时,模型训练平台206解析该编码得到子模型的神经网络架构为:每层包括5个block,其中,block0中x node的id为1,op为5,block0中y node的id为0,op为0,依此类推,在此不再赘述。
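上述架构编码的解析过程可以用如下示意代码说明(假设每层5个block、每个block含x/y两个node、每个node依次由(id, op)两个属性值编码,函数名为假设):

```python
def decode_arch(code, blocks=5):
    # 解析架构编码(示意):每个block含x/y两个node,每个node依次由
    # (id, op)两个属性值表示,因此每个block对应4个编码值
    assert len(code) == blocks * 4
    arch = []
    it = iter(code)
    for _ in range(blocks):
        x = {"id": next(it), "op": next(it)}
        y = {"id": next(it), "op": next(it)}
        arch.append({"x": x, "y": y})
    return arch

code = [1, 5, 0, 0, 0, 2, 0, 5, 0, 5, 1, 1, 1, 5, 4, 2, 4, 5, 5, 3]
arch = decode_arch(code)
```

按此解析,block0中x node的id为1、op为5,block0中y node的id为0、op为0,与正文示例一致。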
训练初始子模型所采用的训练数据可以是与任务对应的数据集。例如,对于图像分类任务,训练数据可以是公开数据集ImageNet 1000,其他用于图像分类的公开数据集,或者是用户提供的数据集。又例如,对于目标检测任务,训练数据可以是视觉目标类(visual object classes,VOC)、上下文中的常见目标(common objects in context,COCO)等公开数据集,或者是用户提供的数据集。
在训练初始子模型时,模型训练平台206可以设置批次大小batch size,然后采用分批迭代方式训练初始子模型,如此可以提高训练效率,缩短训练收敛时间。具体地,模型训练平台206可以按照批次大小分批输入训练数据,然后基于一批训练数据通过梯度下降法更新一次初始子模型参数,从而实现一次迭代训练。模型训练平台206根据上述方式进行多次迭代训练,当满足训练结束条件,如子模型收敛或者子模型的损失值小于预设损失值时,停止训练。
在本申请实施例中,初始子模型的训练过程不存在相互依赖关系,因此,模型训练平台206可以并发地对多个初始子模型进行训练,如此可以缩短训练时长,提高训练效率。例如,初始子模型数量为600个时,模型训练平台206可以使用6台机器对这600个初始子模型进行并行训练。其中,每台机器配备有8张V100图形处理器(graphics processing unit,GPU),如此可以进一步提高并发速度,进而提高并行训练效率。
S408:模型训练平台206向模型推理平台208发送多个子模型。
模型训练平台206向模型推理平台发送训练完成的子模型。每个子模型可以通过其对应的神经网络架构的编码以及参数集(模型参数的集合,通常是权重的集合)进行表征。基于此,模型训练平台206在向模型推理平台208发送子模型时,可以发送子模型对应的神经网络架构的编码以及参数集。
S410:模型推理平台208对多个子模型在第一硬件上执行推理,获得多个子模型在第一硬件上的评价指标值。
模型推理平台208可以从至少一个维度对子模型在第一硬件上执行推理,得到至少一种评价指标值,例如精度、参数量、计算力、模型推理时间等评价指标值。其中,计算力具体可以通过浮点运算次数(FLOPs)进行衡量。
与模型训练平台206类似,模型推理平台208可以并行地对多个子模型在第一硬件上执行推理,获得多个子模型在第一硬件上的评价指标值。其中,该模型推理平台208对子模型执行推理所得评价指标值为真实指标值。
S412:模型推理平台208向搜索器204发送多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值。
其中,模型推理平台208在发送神经网络架构时,可以发送该神经网络架构的编码。模型推理平台208通过编码和评价指标值与搜索系统中的搜索器204交互,可以满足保护隐私的需求。
S414:搜索器204根据多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值确定满足预设条件的第一目标神经网络架构。
搜索器204根据多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值,搜索神经网络架构。由于一个新的神经网络架构的产生不再依赖上一个子模型对控制器的训练,一个新的子模型的产生不再依赖于对上述新的神经网络架构进行权重初始化所得的初始子模型的训练,缩短了搜索神经网络架构的时间,提高了搜索效率,能够较快地确定满足预设条件的第一目标神经网络架构。
传统的神经网络架构搜索是通过强化学习训练一个控制器,训练该控制器所采用的激励是根据神经网络架构获得的子模型的真实评价指标值,如此,导致控制器获得激励的延迟较长,极大地影响了训练效率,从而影响了搜索效率。
基于此,本申请提出了一种利用评估器直接对神经网络架构的评价指标值进行预测的方法,基于该预测的评价指标值训练控制器,从而缩短控制器获得激励的延迟,以提高训练效率,进而提高搜索效率。
参见图7,搜索器204根据多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值确定满足预设条件的第一目标神经网络架构,具体包括如下步骤:
S4142:搜索器204根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值训练评估器2042。
评估器2042属于一种神经网络。该神经网络以神经网络架构为输入,具体是神经网络架构的编码为输入,以神经网络架构对应的评价指标值为输出。也即该评估器2042用于对神经网络架构的评价指标值进行预测。
在实际应用时,评估器2042可以通过门控循环单元(gated recurrent unit,GRU)或者长短期记忆(long short-term memory,LSTM)等时间递归网络实现。具体地,搜索器204通过GRU或者LSTM构建一个评估器2042,然后利用多个子模型对应的神经网络架构和多个子模型在第一硬件上的评价指标值训练该评估器2042。训练过程具体为将神经网络架构和评价指标值输入评估器2042,以输入的评价指标值作为神经网络架构的标签进行监督学习,根据评估器2042预测的评价指标值和标签确定的损失值更新评估器2042的权重。当满足训练结束条件,如评估器2042趋于收敛或评估器2042的损失值小于预设损失值时,停止训练,该训练后的评估器2042可以用于预测神经网络架构的评价指标值。
为了便于理解,本申请实施例以基于GRU实现进行示例说明。具体地,评估器2042包括至少一个GRU Cell,该GRU Cell以神经网络架构为输入,以评价指标值为输出。考虑到序列关联性,评估器2042可以包括多个GRU Cell,并且多个GRU Cell级联,一个GRU Cell的隐藏层状态可以输入至下一个GRU Cell,以便下一个GRU Cell结合该隐藏层状态推理评价指标值。需要说明的是,在一些可能的实现方式中,评估器2042还可以利用注意力机制,引入注意力层进行回归。
图8示出了评估器2042的一个结构示意图,如图8所示,可以将神经网络架构的编码(arch code)作为输入分别输入GRU cell,每个GRU Cell基于当前输入的arch code以及上一GRU的隐藏层状态(hidden state)进行处理,然后通过注意力层(attention layer)进行回归可以输出评价指标值,基于输出的评价指标值以及输入的训练数据中的评价指标值(即训练数据中的标签)可以更新评估器2042的权重,如此可以实现评估器2042的训练。当评估器2042收敛时,可以停止对评估器2042的训练。
其中,对于评估器2042的训练属于监督学习方式。基于此,搜索器204可以采用梯度下降法如随机梯度下降法等进行迭代训练,从而提高训练效率,缩短训练收敛时间。
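评估器的监督训练流程可以用如下极简草图示意(为便于演示,此处用线性模型代替GRU/LSTM与注意力层,训练数据均为假设,仅说明"以评价指标值为标签、训练数据可重复用于多个epoch"的过程):

```python
def train_evaluator(data, dim, epochs=2000, lr=0.05):
    # 以架构编码为输入、评价指标值为标签进行监督学习(示意:
    # 用线性模型 + 随机梯度下降代替GRU等时间递归网络)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):                  # 训练数据重复用于多个epoch
        for code, label in data:
            pred = sum(wi * c for wi, c in zip(w, code)) + b
            e = pred - label                 # 预测值与标签确定的偏差
            w = [wi - lr * e * c for wi, c in zip(w, code)]
            b -= lr * e
    # 返回预测函数:输入架构编码,输出预测的评价指标值
    return lambda code: sum(wi * c for wi, c in zip(w, code)) + b

# 假设的训练数据:架构编码及对应子模型在第一硬件上的评价指标值(标签)
data = [([0, 0], 0.50), ([0, 1], 0.70), ([1, 0], 0.60), ([1, 1], 0.80)]
evaluator = train_evaluator(data, dim=2)
```

训练完成后,评估器可直接根据架构编码预测评价指标值,无需再训练对应的子模型。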
S4144:所述搜索器204利用已训练的评估器2042训练所述控制器2044。
控制器2044属于一种神经网络。在具体实现时,控制器2044可以通过GRU实现。具体地,控制器2044可以采用强化学习的方法进行训练。其训练过程具体为,控制器2044根据搜索空间定义,提供一个神经网络架构的编码,然后通过评估器2042预测该神经网络架构的评价指标值,采用预测的评价指标值代替基于该神经网络架构获得的子模型的评价指标值(真实的评价指标值)作为控制器2044的激励(reward),根据该激励更新控制器2044的参数。重复以上步骤,训练后的控制器2044满足训练结束条件,如训练后的控制器2044收敛时,该控制器2044可以输出候选神经网络架构。
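以评估器预测值作为激励训练控制器的过程,可以用如下草图示意(控制器用简单的softmax偏好表代替GRU,更新方式为REINFORCE式的带基线更新,均为假设性简化):

```python
import math
import random

def train_controller(archs, evaluator, steps=300, lr=0.5):
    # 以评估器预测的评价指标值(而非子模型的真实评价指标值)作为激励,
    # 按softmax偏好采样架构并做REINFORCE式更新(带均值基线,仅为示意)
    pref = {a: 0.0 for a in archs}
    baseline = sum(evaluator(a) for a in archs) / len(archs)
    for _ in range(steps):
        z = sum(math.exp(p) for p in pref.values())
        r = random.random() * z
        for a, p in pref.items():             # 按softmax分布采样一个架构
            r -= math.exp(p)
            if r <= 0:
                break
        reward = evaluator(a)                 # 激励立即可得,无需训练子模型
        pref[a] += lr * (reward - baseline)   # 根据激励更新控制器参数
    return max(pref, key=pref.get)

random.seed(0)
evaluator = {"A": 0.6, "B": 0.9, "C": 0.7}.get   # 假设的已训练评估器
best = train_controller(["A", "B", "C"], evaluator)
```

由于激励来自评估器的即时预测,控制器的每步更新不再等待子模型训练,延迟大幅缩短。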
S4146:所述搜索器204根据已训练的控制器2044确定满足预设条件的第一目标神经网络架构。
具体地,搜索器204获取控制器2044输出的候选神经网络架构,然后根据候选神经网络架构获得候选的初始子模型,接着,对候选的初始子模型进行训练得到候选子模型,然后对候选子模型执行推理,得到候选子模型的评价指标值。基于该评价指标值可以从候选子模型中确定满足预设条件的目标子模型,该目标子模型对应的神经网络架构即为第一目标神经网络架构。
在一些可能的实现方式中,控制器2044也可以将候选神经网络架构发送给用户装置,由用户通过用户装置从候选神经网络架构中确定满足预设条件的第一目标神经网络架构。本实施例对此不作限定。
基于上述内容描述,本申请提供的神经网络架构搜索方法中评估器2042可以重复利用训练数据,减少了训练数据的数量,降低了空间采样量的要求。并且,评估器2042能够基于少量的样本快速收敛,然后评估器2042可以用于对控制器2044提供的神经网络架构进行反馈,使得控制器2044也能较快收敛,如此提高了训练效率,进一步缩短了搜索时长,提高了搜索效率。
在一些实现方式中,用户出于控制成本的需求,或者供需关系发生变化(如某种类型硬件可能无法如期供货)等等,需要将子模型扩展到别的硬件或迁移到别的硬件如第二硬件上。而第二硬件的设计与第一硬件的设计有所区别,有可能导致适于第一硬件的第一目标神经网络架构并不适于第二硬件。基于此,本申请实施例还提供了一种搜索适于第二硬件的第二目标神经网络架构的方法。其中,第二硬件可以是已知硬件,也可以是新硬件。
该方法能够利用搜索适于第一硬件的第一目标神经网络架构时训练得到的子模型,无需重新训练,因此,可以快速确定适于第二硬件的第二目标神经网络架构。进一步地,评价指标值包括两类评价指标值,第一类评价指标值随着硬件变化而发生变化,第二类评价指标值不会随着硬件变化而发生变化,当搜索适于第一硬件的第一目标神经网络架构时采用的评价指标值包括第二类评价指标值时,还可以将子模型在第一硬件上的第二类评价指标值作为子模型在第二硬件上的第二类评价指标值,如此节省了对子模型在第二硬件执行推理的时间,进一步缩短了搜索时间,提高了搜索效率。
下面结合附图,对本申请实施例提供的上述方法进行介绍。
参见图9所示的搜索适于第二硬件的神经网络架构的方法的流程图,该方法包括:
S902:模型推理平台208对多个子模型在第二硬件上执行推理,得到多个子模型在第二硬件上的评价指标值。
S904:模型推理平台208向搜索器204发送多个子模型对应的神经网络架构以及多个子模型在第二硬件上的评价指标值。
模型推理平台208对多个子模型在第二硬件上执行推理,以及向搜索器204发送多个子模型对应的神经网络架构和多个子模型在第二硬件上的评价指标值的具体实现可以参见图4所示实施例相关内容描述。
当评价指标值为第二类评价指标值时,搜索适于第二硬件的第二目标神经网络架构的方法也可以不执行上述S902和S904,搜索器204可以直接将子模型在第一硬件上的评价指标值作为子模型在第二硬件上的评价指标值。
S906:搜索器204根据多个子模型对应的神经网络架构以及多个子模型在第二硬件上的评价指标值确定满足预设条件的第二目标神经网络架构。
搜索器204根据多个子模型对应的神经网络架构以及多个子模型在第二硬件上的评价指标值,确定的目标神经网络架构为适于第二硬件的第二目标神经网络架构。其具体实现可以参见图4、图7所示实施例相关内容描述,在此不再赘述。
当评价指标值仅包括第二类评价指标值时,则搜索系统搜索得到的适于第一硬件的第一目标神经网络架构也适于第二硬件。即评价指标值仅包括第二类评价指标值时,搜索系统搜索得到的适于第一硬件的第一目标神经网络架构和搜索系统搜索得到的适于第二硬件的第二目标神经网络架构相同。
为了便于理解,本申请还提供了搜索适于第二硬件的第二目标神经网络架构的具体示例。
如图10所示,搜索适于第一硬件(device type1)的第一目标神经网络架构采用的评价指标值包括第一类评价指标值和第二类评价指标值,其中,采用的第一类评价指标值为模型推理时间这一性能值,采用的第二类评价指标值为准确率这一精度值。搜索器204可以直接获取子模型在第一硬件上的精度值作为子模型在第二硬件上的精度值,以及对子模型在第二硬件上执行推理,得到子模型在第二硬件上的模型推理时间这一性能值。
如此搜索器204即可获得子模型在第二硬件上的评价指标值,根据多个子模型对应的神经网络架构(具体为神经网络架构的编码)以及多个子模型在第二硬件上的评价指标值(即图中的评价指标值和架构编码)可以训练一个评估器2042,利用已训练的评估器2042对控制器2044提供的神经网络架构进行预测,可以得到该神经网络架构对应的评价指标值,搜索器204以该评价指标值为激励更新控制器2044的参数,从而实现对控制器2044的训练。当控制器2044收敛时,可以输出候选神经网络架构(具体为候选神经网络架构的编码)。基于候选神经网络架构可以确定适于第二硬件的第二目标神经网络架构。为了便于描述,图中将适于第一硬件的候选神经网络架构的编码简称为第一候选架构编码,将适于第二硬件的候选神经网络架构的编码简称为第二候选架构编码。
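复用第二类评价指标值、仅对第一类评价指标值在第二硬件上重新执行推理的做法,可以用如下示意代码说明(模型名与数值均为假设):

```python
def metrics_on_second_hardware(first_hw_metrics, infer_on_second_hw):
    # 第二类评价指标值(如accuracy)不随硬件变化,直接复用;
    # 第一类评价指标值(如模型推理时间)需在第二硬件上重新执行推理得到
    result = {}
    for model, m in first_hw_metrics.items():
        result[model] = {
            "accuracy": m["accuracy"],                # 直接复用,无需重新推理
            "latency_ms": infer_on_second_hw(model),  # 第二硬件上推理所得
        }
    return result

# 假设的数据:多个子模型在第一硬件上的评价指标值
first_hw = {"m0": {"accuracy": 0.91, "latency_ms": 12.0},
            "m1": {"accuracy": 0.88, "latency_ms": 8.0}}
second_hw_latency = {"m0": 20.0, "m1": 9.5}.get       # 假设的第二硬件推理结果
second_hw = metrics_on_second_hardware(first_hw, second_hw_latency)
```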
在一些可能的实现方式中,搜索系统还可以具有保护数据安全、模型隐私的功能。如图11所示,由于子模型训练和神经网络架构搜索过程的解耦,搜索系统的服务侧可以不包括模型训练平台206和模型推理平台208,具体包括架构生成和架构搜索模块,即包括生成器202和搜索器204。其余的步骤如模型训练和模型推理可以在用户侧的模型训练平台206、模型推理平台208中进行,从而实现保护用户数据/模型隐私的目标。
如图11所示,生成器202可以向用户提供应用程序编程接口(application programming interface,API),生成器202可以通过该API对搜索空间进行处理,从而为用户生成多个神经网络架构。
其中,用户装置获得的多个神经网络架构是以神经网络架构的编码形式存在。神经网络架构的编码经过解析可以得到神经网络架构。用户侧的模型训练平台206可以对神经网络架构进行权重初始化,得到初始子模型,然后利用自身的数据集对初始子模型进行训练,得到子模型。用户侧的模型推理平台208可以对子模型在第一硬件上执行推理,得到子模型在第一硬件上的评价指标值。
搜索器204获取多个子模型对应的神经网络架构(具体是神经网络架构的编码)和多个子模型在第一硬件上的评价指标值,根据上述神经网络架构的编码和评价指标值训练评估器2042,当评估器2042训练完成时,利用已训练的评估器2042训练控制器2044,当控制器2044训练完成时,可以获取控制器2044输出的候选神经网络架构,基于该候选神经网络架构可以确定出满足预设条件的第一目标神经网络架构。由于用户侧通过神经网络架构的编码和子模型的评价指标值与服务侧交互,不会泄露数据和模型隐私,保障了安全性。
在一些可能的实现方式中,搜索系统还可以在不需要数据交换的前提下,利用不同用户提供的训练数据进行搜索,如此可以提升搜索出来的神经网络架构的精度/性能,打破神经网络架构搜索过程中的数据孤岛问题。
具体地,生成器202可以生成N个神经网络架构,N个神经网络架构经过权重初始化可以获得N个初始子模型,所述N大于1。不同用户共提供有M个数据集以供训练初始子模型时使用,M大于1。考虑到隐私和数据安全,上述M个数据集不进行数据交换。基于此,搜索系统可以通过以下几种方式进行搜索。
第一种实现方式为,模型训练平台206对每个初始子模型利用M个数据集进行联邦学习得到N个子模型,模型推理平台208对联邦学习所得的N个子模型在第一硬件上执行推理,得到各个子模型在第一硬件上的评价指标值。搜索器204以基于N个初始子模型进行联邦学习所得的子模型的评价指标值作为上述N个神经网络架构的标签,得到训练数据,利用该训练数据训练评估器2042。当评估器2042训练完成时,如评估器2042收敛时,利用评估器2042对控制器2044提供的神经网络架构进行预测,得到神经网络架构对应的评价指标值,将该评价指标值作为激励反馈给控制器2044,以更新控制器2044参数,实现对控制器2044的训练,从而实现搜索适于第一硬件的第一目标神经网络架构。
第二种实现方式为,模型训练平台206对每个初始子模型分别采用M个数据集进行训练,如此,可以得到N*M个子模型。模型推理平台208可以对N*M个子模型在第一硬件上执行推理,得到N*M个子模型在第一硬件上的评价指标值。这N*M个子模型的评价指标值可以作为其对应的神经网络架构的标签,从而形成训练数据,用于训练评估器2042。
在训练评估器2042时,可以基于训练初始子模型所采用的数据集将训练数据划分为M组,如此,相当于为评估器2042提供了M组训练数据。搜索器204可以利用M组训练数据,采用联邦学习方式训练评估器2042。评估器2042训练完成时,搜索器204可以利用该评估器2042对控制器2044提供的神经网络架构进行预测,得到神经网络架构对应的评价指标值,将该评价指标值作为反馈训练控制器2044,从而实现搜索适于第一硬件的第一目标神经网络架构。
第三种实现方式为,将所述N个初始子模型划分为M组初始子模型,所述M组初始子模型与M个数据集一一对应,模型训练平台206对所述M组初始子模型采用对应的数据集进行训练得到M组子模型。模型推理平台208可以对上述N个子模型在第一硬件上执行推理,得到N个子模型在第一硬件上的评价指标值。每一个子模型对应的神经网络架构以及子模型的评价指标值可以形成一个训练数据,如此,可以获得M组训练数据。搜索器204可以利用M组训练数据,采用联邦学习方式训练评估器2042。评估器2042训练完成时,搜索器204利用评估器2042对控制器2044提供的神经网络架构进行预测,得到神经网络架构对应的评价指标值,将该评价指标值作为反馈训练控制器2044,从而实现搜索适于第一硬件的第一目标神经网络架构。
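上述第二、三种实现方式中"分组训练数据、采用联邦学习方式训练评估器"的思路,可以用FedAvg风格的极简草图示意(用线性模型代替GRU,数据与参数均为假设,各组数据不出本地):

```python
def local_update(w, b, data, lr=0.1, epochs=200):
    # 在本地数据集上训练评估器副本(示意:线性模型 + SGD,数据不出本地)
    for _ in range(epochs):
        for code, label in data:
            e = sum(wi * c for wi, c in zip(w, code)) + b - label
            w = [wi - lr * e * c for wi, c in zip(w, code)]
            b -= lr * e
    return w, b

def federated_train(groups, dim, rounds=50):
    # 联邦学习式训练评估器的极简示意(FedAvg思路):每轮各组仅上传
    # 本地更新后的参数,由中心做平均,各数据集之间不进行数据交换
    w, b = [0.0] * dim, 0.0
    for _ in range(rounds):
        locals_ = [local_update(list(w), b, g) for g in groups]
        w = [sum(lw[i] for lw, _ in locals_) / len(locals_) for i in range(dim)]
        b = sum(lb for _, lb in locals_) / len(locals_)
    return w, b

# 假设:两个用户各自提供一组训练数据(架构编码 -> 评价指标值)
group_a = [([0, 0], 0.50), ([0, 1], 0.70)]
group_b = [([1, 0], 0.60), ([1, 1], 0.80)]
w, b = federated_train([group_a, group_b], dim=2)
predict = lambda code: sum(wi * c for wi, c in zip(w, code)) + b
```

交换的只有模型参数而非训练数据,从而在保护数据隐私的同时利用了多方数据。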
为了便于理解,本申请实施例还以两个用户分别提供数据集对神经网络架构搜索方法进行示例说明。如图12所示,生成器202可以基于搜索空间生成多个神经网络架构,这多个神经网络架构可以通过编码进行表示。用户A和B拥有各自的数据集,用户A和用户B可以将各自拥有的数据集提供给模型训练平台206,模型训练平台206可以解析神经网络架构的编码得到神经网络架构,然后对神经网络架构进行权重初始化,得到初始子模型,接着利用用户A的数据集和用户B的数据集,采用联邦学习算法对多个初始子模型中的每个初始子模型进行训练,得到多个子模型。
然后模型推理平台208可以对多个子模型在第一硬件上执行推理,得到多个子模型在第一硬件上的评价指标值。在图12的场景中,根据每个神经网络架构可以获得2个子模型,包括基于用户A提供的训练数据训练得到的子模型和基于用户B提供的训练数据训练得到的子模型。模型推理平台208对每个子模型在第一硬件上执行推理,得到该子模型在第一硬件上的评价指标值,每个评价指标值与对应的神经网络架构可以形成一个训练数据,如此根据一个神经网络架构可以得到两个训练数据。这些训练数据可以基于数据集分为两组,搜索器204可以利用两组训练数据,采用联邦学习算法,训练评估器2042。当评估器2042训练完成时,利用已训练的评估器2042对控制器2044提供的神经网络架构进行预测,得到该神经网络架构对应的评价指标值,该评价指标值可以作为反馈,用于训练控制器2044,从而实现搜索适于第一硬件的第一目标神经网络架构。
以上结合图1至图12对本申请实施例提供的神经网络架构搜索方法进行了详细介绍,接下来,将结合附图对本申请实施例提供的神经网络架构搜索装置和设备进行介绍。
参见图13所示的神经网络架构搜索装置的结构示意图,所述装置1300应用于搜索系统,所述搜索系统包括生成器和搜索器,所述装置1300包括:
生成模块1302,用于根据搜索空间生成多个神经网络架构;
通信模块1304,用于获取根据所述多个神经网络架构获得的多个子模型在第一硬件上的评价指标值;
搜索模块1306,用于根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构。
其中,生成模块1302的具体实现可以参见图4所示实施例中S402相关内容描述,通信模块1304的具体实现可以参见图4所示实施例中S412相关内容描述,搜索模块1306的具体实现可以参见图4所示实施例中S414相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述搜索器包括评估器和控制器,所述搜索模块1306具体用于:
根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值训练所述评估器;
利用已训练的评估器训练所述控制器,根据已训练的控制器确定满足预设条件的所述第一目标神经网络架构。
其中,搜索模块1306的具体实现可以参见图7所示实施例中相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述多个子模型在第一硬件上的评价指标值表示所述多个子模型在所述第一硬件上执行推理获得的评价指标值。
在一些可能的实现方式中,所述评价指标值包括与硬件相关的性能值,所述通信模块1304还用于:
获取所述多个子模型在第二硬件上的性能值,所述多个子模型在第二硬件上的性能值由所述多个子模型在所述第二硬件上执行推理得到;
所述搜索模块1306还用于:
根据所述多个子模型对应的神经网络架构和所述多个子模型在第二硬件上的性能值,确定满足预设条件的第二目标神经网络架构。
其中,通信模块1304的具体实现以及搜索模块1306的具体实现,可以参见图9所示实施例中相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述与硬件相关的性能值包括以下任意一种或多种:模型推理时间、激活量、吞吐量、功耗和显存占用率。
在一些可能的实现方式中,所述搜索空间通过神经元的各属性的属性取值空间进行表征;
所述生成模块1302具体用于:
从所述神经元的各属性的属性取值空间中为每个属性随机选择属性值,获得所述多个神经网络架构。
在一些可能的实现方式中,所述生成模块1302具体用于:
向用户提供应用程序编程接口,通过所述应用程序编程接口为所述用户生成多个神经网络架构。
在一些可能的实现方式中,所述搜索系统还包括模型训练平台,所述装置1300还包括训练模块,用于:
对N个初始子模型中的每个初始子模型利用M个数据集进行联邦学习得到N个子模型;或者,
对N个初始子模型中的每个初始子模型分别采用M个数据集进行训练得到N*M个子模型;或者,
将所述N个初始子模型划分为M组初始子模型,所述M组初始子模型与M个数据集一一对应,对所述M组初始子模型采用对应的数据集进行训练得到M组子模型;
其中,所述N个初始子模型根据所述多个神经网络架构获得,所述初始子模型与所述神经网络架构一一对应,所述N大于1,所述M大于1。
根据本申请实施例的神经网络架构搜索装置1300可对应于执行本申请实施例中描述的方法,并且神经网络架构搜索装置1300的各个模块的上述和其它操作和/或功能分别为了实现图4、图7和图9中的各个方法的相应流程,为了简洁,在此不再赘述。
上述神经网络架构搜索装置1300可以通过计算机集群实现,计算机集群包括至少一台计算机。图14提供了一种计算机集群,图14所示的计算机集群以包括一台计算机进行示例说明。如图14所示,计算机集群1400具体可以用于实现上述图13所示实施例中神经网络架构搜索装置1300的功能。计算机集群1400包括总线1401、处理器1402、通信接口1403和存储器1404。处理器1402、存储器1404和通信接口1403之间通过总线1401通信。总线1401可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图14中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。通信接口1403用于与外部通信。例如,通信接口1403可以获取根据所述多个神经网络架构获得的多个子模型在第一硬件上的评价指标值,获取多个子模型在第二硬件上的评价指标值等。
其中,处理器1402可以为中央处理器(central processing unit,CPU)。存储器1404可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。存储器1404还可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器,HDD或SSD。
存储器1404中存储有可执行代码,处理器1402执行该可执行代码以执行前述神经网络架构搜索方法。
具体地,在实现图13所示实施例的情况下,且图13实施例中所描述的各模块为通过软件实现的情况下,执行图13中的生成模块1302、搜索模块1306、训练模块功能所需的软件或程序代码存储在存储器1404中。通信模块1304功能通过通信接口1403实现。
处理器1402用于执行存储器1404中的指令,例如执行生成模块1302对应的指令,以根据搜索空间生成多个神经网络架构,通信接口1403获取根据多个神经网络架构获得的多个子模型在第一硬件上的评价指标值,通过总线1401传输至处理器1402,处理器1402执行搜索模块1306对应的指令,以根据所述多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构,从而执行本申请实施例提供的神经网络架构搜索方法。
其中,处理器1402在根据所述多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构时,具体是根据多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值(即评估器2042的训练数据)训练所述评估器2042,然后利用已训练的评估器2042训练所述控制器2044,根据已训练的控制器2044确定满足预设条件的所述第一目标神经网络架构。
在一些可能的实现方式中,多个神经网络架构可以通过权重初始化得到多个初始子模型,其中,多个初始子模型与多个神经网络架构一一对应。在本申请实施例中,初始子模型数量为N。处理器1402还可以在生成多个神经网络架构后,执行训练模块对应的指令,对初始子模型进行训练得到多个子模型。
其中,处理器1402对初始子模型进行训练具体包括以下几种方式:
第一种方式为,对N个初始子模型中的每个初始子模型利用M个数据集进行联邦学习得到N个子模型;或者,
第二种方式为,对N个初始子模型中的每个初始子模型分别采用M个数据集进行训练得到N*M个子模型;或者,
第三种方式为,将所述N个初始子模型划分为M组初始子模型,所述M组初始子模型与M个数据集一一对应,对所述M组初始子模型采用对应的数据集进行训练得到M组子模型。
需要说明的是,当采用第二种方式或第三种方式时,处理器1402在根据所述多个子模型对应的神经网络架构以及多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构时,将训练数据按照训练初始子模型采用的数据集进行分组,然后利用不同组训练数据,采用联邦学习算法对评估器2042进行训练。然后处理器1402根据已训练的评估器2042训练控制器2044,根据已训练的控制器2044确定满足预设条件的第一目标神经网络架构。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质包括指令,所述指令指示计算机集群1400执行上述应用于神经网络架构搜索装置1300的神经网络架构搜索方法。
本申请实施例还提供了一种计算机程序产品,所述计算机程序产品被计算机执行时,所述计算机执行前述神经网络架构搜索方法的任一方法。该计算机程序产品可以为一个软件安装包,在需要使用前述神经网络架构搜索方法的任一方法的情况下,可以下载该计算机程序产品并在计算机上执行该计算机程序产品。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (18)

  1. 一种神经网络架构搜索方法,其特征在于,应用于搜索系统,所述搜索系统包括生成器和搜索器,所述方法包括:
    所述生成器根据搜索空间生成多个神经网络架构;
    所述搜索器获取根据所述多个神经网络架构获得的多个子模型在第一硬件上的评价指标值;
    所述搜索器根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构。
  2. 根据权利要求1所述的方法,其特征在于,所述搜索器包括评估器和控制器,所述搜索器根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构,包括:
    所述搜索器根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值训练所述评估器;
    所述搜索器利用已训练的评估器训练所述控制器,根据已训练的控制器确定满足预设条件的所述第一目标神经网络架构。
  3. 根据权利要求1或2所述的方法,其特征在于,所述多个子模型在第一硬件上的评价指标值表示所述多个子模型在所述第一硬件上执行推理获得的评价指标值。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述评价指标值包括与硬件相关的性能值,所述方法还包括:
    所述搜索器获取所述多个子模型在第二硬件上的性能值,所述多个子模型在第二硬件上的性能值由所述多个子模型在所述第二硬件上执行推理得到;
    所述搜索器根据所述多个子模型对应的神经网络架构和所述多个子模型在第二硬件上的性能值,确定满足预设条件的第二目标神经网络架构。
  5. 根据权利要求4所述的方法,其特征在于,所述与硬件相关的性能值包括以下任意一种或多种:模型推理时间、激活量、吞吐量、功耗和显存占用率。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述搜索空间通过神经元的各属性的属性取值空间进行表征;
    所述生成器根据搜索空间生成多个神经网络架构,包括:
    所述生成器从所述神经元的各属性的属性取值空间中为每个属性随机选择属性值,获得所述多个神经网络架构。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述生成器根据搜索空间生成多个神经网络架构,包括:
    所述生成器向用户提供应用程序编程接口,通过所述应用程序编程接口为所述用户生成多个神经网络架构。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述搜索系统还包括模型训练平台,所述方法还包括:
    所述模型训练平台对N个初始子模型中的每个初始子模型利用M个数据集进行联邦学习得到N个子模型;或者,
    所述模型训练平台对N个初始子模型中的每个初始子模型分别采用M个数据集进行训练得到N*M个子模型;或者,
    所述模型训练平台将N个初始子模型划分为M组初始子模型,所述M组初始子模型与M个数据集一一对应,对所述M组初始子模型采用对应的数据集进行训练得到M组子模型;
    其中,所述N个初始子模型根据所述多个神经网络架构获得,所述初始子模型与所述神经网络架构一一对应,所述N大于1,所述M大于1。
  9. 一种神经网络架构搜索装置,其特征在于,应用于搜索系统,所述搜索系统包括生成器和搜索器,所述装置包括:
    生成模块,用于根据搜索空间生成多个神经网络架构;
    通信模块,用于获取根据所述多个神经网络架构获得的多个子模型在第一硬件上的评价指标值;
    搜索模块,用于根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值,确定满足预设条件的第一目标神经网络架构。
  10. 根据权利要求9所述的装置,其特征在于,所述搜索器包括评估器和控制器,所述搜索模块具体用于:
    根据所述多个子模型对应的神经网络架构和所述多个子模型在第一硬件上的评价指标值训练所述评估器;
    利用已训练的评估器训练所述控制器,根据已训练的控制器确定满足预设条件的所述第一目标神经网络架构。
  11. 根据权利要求9或10所述的装置,其特征在于,所述多个子模型在第一硬件上的评价指标值表示所述多个子模型在所述第一硬件上执行推理获得的评价指标值。
  12. 根据权利要求9至11任一项所述的装置,其特征在于,所述评价指标值包括与硬件相关的性能值,所述通信模块还用于:
    获取所述多个子模型在第二硬件上的性能值,所述多个子模型在第二硬件上的性能值由所述多个子模型在所述第二硬件上执行推理得到;
    所述搜索模块还用于:
    根据所述多个子模型对应的神经网络架构和所述多个子模型在第二硬件上的性能值,确定满足预设条件的第二目标神经网络架构。
  13. 根据权利要求12所述的装置,其特征在于,所述与硬件相关的性能值包括以下任意一种或多种:模型推理时间、激活量、吞吐量、功耗和显存占用率。
  14. 根据权利要求9至13任一项所述的装置,其特征在于,所述搜索空间通过神经元的各属性的属性取值空间进行表征;
    所述生成模块具体用于:
    从所述神经元的各属性的属性取值空间中为每个属性随机选择属性值,获得所述多个神经网络架构。
  15. 根据权利要求9至14任一项所述的装置,其特征在于,所述生成模块具体用于:
    向用户提供应用程序编程接口,通过所述应用程序编程接口为所述用户生成多个神经网络架构。
  16. 根据权利要求9至15任一项所述的装置,其特征在于,所述搜索系统还包括模型训练平台,所述装置还包括训练模块,用于:
    对N个初始子模型中的每个初始子模型利用M个数据集进行联邦学习得到N个子模型;或者,
    对N个初始子模型中的每个初始子模型分别采用M个数据集进行训练得到N*M个子模型;或者,
    将所述N个初始子模型划分为M组初始子模型,所述M组初始子模型与M个数据集一一对应,对所述M组初始子模型采用对应的数据集进行训练得到M组子模型;
    其中,所述N个初始子模型根据所述多个神经网络架构获得,所述初始子模型与所述神经网络架构一一对应,所述N大于1,所述M大于1。
  17. 一种计算机集群,其特征在于,所述计算机集群包括至少一台计算机,每台计算机包括处理器和存储器;
    所述至少一台计算机的处理器用于执行所述至少一台计算机的存储器中存储的指令,以使所述计算机集群执行如权利要求1至8任一项所述的方法。
  18. 一种计算机可读存储介质,其特征在于,包括指令,当其在计算机集群上运行时,使得所述计算机集群执行如权利要求1至8中任一项所述的方法。
PCT/CN2021/074533 2020-03-05 2021-01-30 一种神经网络架构搜索方法、装置、设备及介质 WO2021175058A1 (zh)


Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010148339 2020-03-05
CN202010148339.1 2020-03-05
CN202010290428.X 2020-04-14
CN202010290428.XA CN113361680B (zh) 2020-03-05 2020-04-14 一种神经网络架构搜索方法、装置、设备及介质

Publications (1)

Publication Number Publication Date
WO2021175058A1 true WO2021175058A1 (zh) 2021-09-10


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568659A (zh) * 2021-09-18 2021-10-29 深圳比特微电子科技有限公司 参数配置模型的训练方法、参数配置方法和参数配置设备
CN114493052A (zh) * 2022-04-08 2022-05-13 南方电网数字电网研究院有限公司 多模型融合自适应新能源功率预测方法和系统
WO2023082045A1 (zh) * 2021-11-09 2023-05-19 华为技术有限公司 一种神经网络架构搜索的方法和装置
CN116663618A (zh) * 2023-07-28 2023-08-29 之江实验室 一种算子优化方法、装置、存储介质及电子设备
EP4250180A1 (en) * 2022-03-21 2023-09-27 Beijing Tusen Zhitu Technology Co., Ltd. Method and apparatus for generating neural network

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117206B (zh) * 2021-11-09 2023-05-30 北京达佳互联信息技术有限公司 推荐模型处理方法、装置、电子设备及存储介质
CN115170565B (zh) * 2022-09-06 2022-12-27 浙商银行股份有限公司 基于自动神经网络架构搜索的图像欺诈检测方法及装置
CN116679981B (zh) * 2023-08-03 2023-11-24 北京电科智芯科技有限公司 一种基于迁移学习的软件系统配置调优方法及装置
CN116962176B (zh) * 2023-09-21 2024-01-23 浪潮电子信息产业股份有限公司 一种分布式集群的数据处理方法、装置、系统及存储介质
CN117648946A (zh) * 2024-01-30 2024-03-05 南湖实验室 一种面向安全关键系统的dnn模型自动生成方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378464A (zh) * 2019-06-27 2019-10-25 苏州浪潮智能科技有限公司 人工智能平台的配置参数的管理方法和装置
CN110476172A (zh) * 2017-07-21 2019-11-19 谷歌有限责任公司 用于卷积神经网络的神经架构搜索
CN110610231A (zh) * 2019-08-26 2019-12-24 联想(北京)有限公司 一种信息处理方法、电子设备和存储介质
CN110728355A (zh) * 2019-09-11 2020-01-24 北京百度网讯科技有限公司 神经网络架构搜索方法、装置、计算机设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111602148B (zh) * 2018-02-02 2024-04-02 谷歌有限责任公司 正则化神经网络架构搜索
CN109284820A (zh) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 一种深度神经网络的结构搜索方法及装置
CN110175671B (zh) * 2019-04-28 2022-12-27 华为技术有限公司 神经网络的构建方法、图像处理方法及装置
CN110543944B (zh) * 2019-09-11 2022-08-02 北京百度网讯科技有限公司 神经网络结构搜索方法、装置、电子设备和介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110476172A (zh) * 2017-07-21 2019-11-19 谷歌有限责任公司 用于卷积神经网络的神经架构搜索
CN110378464A (zh) * 2019-06-27 2019-10-25 苏州浪潮智能科技有限公司 人工智能平台的配置参数的管理方法和装置
CN110610231A (zh) * 2019-08-26 2019-12-24 联想(北京)有限公司 一种信息处理方法、电子设备和存储介质
CN110728355A (zh) * 2019-09-11 2020-01-24 北京百度网讯科技有限公司 神经网络架构搜索方法、装置、计算机设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4105835A4

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568659A (zh) * 2021-09-18 2021-10-29 深圳比特微电子科技有限公司 参数配置模型的训练方法、参数配置方法和参数配置设备
CN113568659B (zh) * 2021-09-18 2022-02-08 深圳比特微电子科技有限公司 参数配置模型的训练方法、参数配置方法和参数配置设备
WO2023082045A1 (zh) * 2021-11-09 2023-05-19 华为技术有限公司 一种神经网络架构搜索的方法和装置
EP4250180A1 (en) * 2022-03-21 2023-09-27 Beijing Tusen Zhitu Technology Co., Ltd. Method and apparatus for generating neural network
CN114493052A (zh) * 2022-04-08 2022-05-13 南方电网数字电网研究院有限公司 多模型融合自适应新能源功率预测方法和系统
CN114493052B (zh) * 2022-04-08 2022-10-11 南方电网数字电网研究院有限公司 多模型融合自适应新能源功率预测方法和系统
CN116663618A (zh) * 2023-07-28 2023-08-29 之江实验室 一种算子优化方法、装置、存储介质及电子设备
CN116663618B (zh) * 2023-07-28 2023-12-05 之江实验室 一种算子优化方法、装置、存储介质及电子设备

