WO2022037039A1 - Neural network architecture search method and apparatus - Google Patents

Neural network architecture search method and apparatus

Info

Publication number
WO2022037039A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
search
task
tasks
configuration information
Prior art date
Application number
PCT/CN2021/080497
Other languages
English (en)
French (fr)
Inventor
乔萧雅
刘国宝
周雍恺
Original Assignee
中国银联股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国银联股份有限公司
Publication of WO2022037039A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the field of deep learning, in particular to the field of neural network architecture search.
  • the deep learning neural network architecture search technology can automatically build a deep learning neural network that meets the needs, reduce manpower input, improve manpower efficiency, and has high industrial value.
  • the search task is submitted to the cluster management system as a single task and is executed on a single machine; it is difficult to execute search tasks in parallel in a distributed manner on a cluster composed of multiple machines, which results in low search efficiency and poor utilization of cluster resources.
  • the embodiments of the present application provide a neural network architecture search method and device to solve the problems existing in the related art.
  • the technical solutions are as follows:
  • a neural network architecture search method including:
  • the search result corresponding to the search task is obtained.
  • it also includes:
  • the step of generating multiple candidate model structures and corresponding parameters according to the configuration information is returned to.
  • the configuration information is configuration information input by a user, and the configuration information includes search task information and training task information.
  • multiple candidate model structures and corresponding parameters are generated according to the configuration information, including:
  • the search task information includes a search space and a search algorithm of a traditional model structure; a corresponding algorithm instance is created according to the search task information, including:
  • the corresponding algorithm instance is created.
  • the search task information includes a search space and a search algorithm for a single-shot model structure; a corresponding algorithm instance is created according to the search task information, including:
  • the corresponding algorithm instance is created.
  • it also includes:
  • scheduling multiple training tasks to corresponding physical nodes includes:
  • Multiple training tasks are scheduled to corresponding physical nodes according to the scheduling sequence.
  • the training result includes actual training times and actual values of the training index
  • the preset training conditions include the maximum number of searches and the expected value of the training index
  • the training results of multiple training tasks satisfy the preset training in the configuration information
  • the search results corresponding to the search task are obtained, including:
  • the optimal network structure and corresponding parameters are obtained, and the optimal network structure and corresponding parameters are used as search results.
  • it also includes:
  • a corresponding temporary task is created for each training task, and the temporary task is used to store the training results in the data storage.
  • an embodiment of the present application provides a neural network architecture search device, including:
  • the configuration information acquisition module is used to acquire the configuration information of the search task
  • the candidate model structure generation module is used to generate multiple candidate model structures and corresponding parameters according to the configuration information
  • the training task creation module is used to create multiple training tasks based on multiple candidate model structures and parameters
  • the training task scheduling module is used for scheduling multiple training tasks to the corresponding physical nodes, so that each physical node executes the corresponding training tasks in parallel, and counts the training results of the multiple training tasks;
  • the search result generation module is configured to obtain the search result corresponding to the search task under the condition that the training result satisfies the preset training condition in the configuration information.
  • it also includes:
  • the iterative search triggering module is configured to trigger the candidate model structure generation module to perform the step of generating multiple candidate model structures and corresponding parameters according to the configuration information when the training result does not meet the preset training conditions in the configuration information.
  • the configuration information is configuration information input by a user, and the configuration information includes search task information and training task information.
  • the candidate model structure generation module includes:
  • the algorithm instance creation sub-module is used to create the corresponding algorithm instance according to the search task information
  • the candidate model structure generation sub-module is used to load the training task information into the algorithm instance to generate multiple candidate model structures and corresponding parameters.
  • the search task information includes a search space and a search algorithm of a traditional model structure
  • the algorithm instance creation submodule includes:
  • the first algorithm instance creation unit is configured to create a corresponding algorithm instance according to the search space and the search algorithm of the traditional model structure.
  • the search task information includes a search space and a search algorithm for a single-shot model structure
  • the algorithm instance creation submodule also includes:
  • the second algorithm instance creation unit is configured to create a corresponding algorithm instance according to the search space and the search algorithm of the single-shot model structure.
  • it also includes:
  • the model tuning task creation module is used to create multiple model tuning tasks according to multiple candidate model structures and corresponding parameters when the candidate model structure needs to be tuned;
  • the model tuning task scheduling module is used to schedule multiple model tuning tasks to the corresponding physical nodes, so that each physical node executes the corresponding model tuning tasks in parallel, obtains multiple optimized candidate model structures, and counts the training results of the multiple optimized candidate model structures.
  • the training task scheduling module includes:
  • the resource type weight calculation submodule is used to calculate the weights of multiple resource types corresponding to multiple training tasks according to the number of resources used by all training tasks corresponding to the search task;
  • the scheduling sequence determination sub-module is used to determine the scheduling sequence of multiple training tasks according to multiple resource weights corresponding to multiple training tasks;
  • the training task scheduling sub-module is used for scheduling multiple training tasks to corresponding physical nodes according to the scheduling sequence.
  • the training result includes the actual training times and the actual value of the training index
  • the preset training condition includes the maximum number of searches and the expected value of the training index
  • the search result generating module includes:
  • the search result generation sub-module is used to obtain the optimal network structure and corresponding parameters when the actual number of training is greater than or equal to the maximum number of searches, or the actual value of the training index is greater than or equal to the expected value of the training index, and to use the optimal network structure and corresponding parameters as search results.
  • it also includes:
  • the temporary task creation module is used to create a corresponding temporary task for each training task, and the temporary task is used to store the training result in the data storage.
  • an electronic device comprising:
  • the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any of the above.
  • a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform any of the above methods.
  • An embodiment in the above application has the following advantages or beneficial effects: the configuration information of the search task is used to dynamically generate multiple candidate model structures and corresponding parameters, training tasks are created for the multiple candidate model structures, and the multiple training tasks are scheduled to the corresponding physical nodes, which execute them in parallel to obtain the search results. This improves both the search efficiency and the utilization rate of cluster resources, and meets the needs of large-scale search.
  • FIG. 1 is a schematic diagram of a neural network architecture search method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a neural network architecture search scenario according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another neural network architecture search method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another neural network architecture search method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a training task scheduling method according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a neural network architecture search apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a neural network architecture search apparatus according to another embodiment of the present application.
  • FIG. 8 is a schematic diagram of a training task scheduling module according to an embodiment of the present application.
  • FIG. 9 is a block diagram of an electronic device for implementing a neural network architecture search method according to an embodiment of the present application.
  • a neural network architecture search method including:
  • Step S110 obtaining configuration information of the search task
  • Step S120 Generate multiple candidate model structures and corresponding parameters according to the configuration information
  • Step S130 Create multiple training tasks according to multiple candidate model structures and parameters
  • Step S140 Scheduling multiple training tasks to corresponding physical nodes, so that each physical node executes the corresponding training tasks in parallel, and counts the training results of the multiple training tasks;
  • Step S150 Obtain a search result corresponding to the search task when the training result satisfies the preset training condition in the configuration information.
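
Steps S110 to S150 can be sketched in a single process as follows. This is only an illustration under stated assumptions, not the claimed implementation: a thread pool stands in for the physical nodes, the score returned by `train` is a stand-in rather than a real training metric, and `generate_candidates`, `train`, `search_once`, and the configuration keys are hypothetical names introduced for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def generate_candidates(config, rng):
    """S120: draw candidate structures from the search space (hypothetical helper)."""
    return [
        {name: rng.choice(options) for name, options in config["search_space"].items()}
        for _ in range(config["candidates_per_round"])
    ]

def train(candidate):
    """S130/S140: stand-in for one training task; the score is not a real metric."""
    return candidate["layers"] * 0.1 + candidate["width"] * 0.001

def search_once(config, seed=0):
    rng = random.Random(seed)
    candidates = generate_candidates(config, rng)            # S120
    # One worker per assumed "node"; each training task runs in parallel.
    with ThreadPoolExecutor(max_workers=config["nodes"]) as pool:
        scores = list(pool.map(train, candidates))           # S130 and S140
    results = list(zip(candidates, scores))
    best = max(results, key=lambda r: r[1])
    done = best[1] >= config["expected_metric"]              # S150 condition
    return best, done

config = {                                                   # S110
    "search_space": {"layers": [2, 4, 8], "width": [64, 128]},
    "candidates_per_round": 6,
    "nodes": 3,
    "expected_metric": 0.5,
}
best, done = search_once(config)
```

In the described system the parallelism comes from scheduling each training task onto cluster nodes; the thread pool here only mirrors the control flow.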
  • components such as the controller, scheduler, data storage, and API (application programming interface) server in the K8S system (Kubernetes, a container cluster management system).
  • the application container engine container runs on the master node of the K8S system.
  • the controller obtains the configuration information of the search task.
  • the configuration information may be the default configuration information stored in the configuration file, or the configuration information input by the user through an interface operation, or the search task information input by the user through the command line on the client.
  • the configuration information includes search task information and training task information. Missing information is filled in with default initial values so that the configuration information is complete.
  • the controller verifies that the search space entered by the user is appropriate: if the user selects an integer variable, the search space needs to provide the minimum and maximum integers; if the user selects a discrete variable, the search space needs to provide a discrete list of all search options; if the user selects a floating-point variable, the search space needs to provide the minimum, maximum, and step size.
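
The search-space check described above might look like the following sketch. The dictionary layout (`type`, `min`, `max`, `step`, `values`) and the function name `validate_variable` are assumptions made for illustration; the additional sanity checks (non-empty list, `min` not above `max`, positive step) go beyond what the text requires.

```python
def validate_variable(spec):
    """Return a list of problems found in one search-space variable."""
    kind = spec.get("type")
    if kind == "int":
        required = ("min", "max")
    elif kind == "float":
        required = ("min", "max", "step")
    elif kind == "discrete":
        required = ("values",)
    else:
        return [f"unknown variable type: {kind!r}"]

    # Required fields per variable type, as listed in the description.
    problems = [f"missing field: {key}" for key in required if key not in spec]
    if problems:
        return problems

    # Extra sanity checks (assumptions, not stated in the source).
    if kind == "discrete":
        if not spec["values"]:
            problems.append("discrete variable needs at least one option")
    else:
        if spec["min"] > spec["max"]:
            problems.append("min must not exceed max")
        if kind == "float" and spec["step"] <= 0:
            problems.append("step must be positive")
    return problems
```

An empty list means the variable passed; for example, a floating-point variable without a step size yields `["missing field: step"]`.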
  • After verifying the configuration information, the controller sends a request to create a training task to the API server through the Golang client provided by Kubernetes, and the API server persists the request to the data storage.
  • the API server relies on the data store, which is the storage function provided by the ETCD (distributed consistent key-value store).
  • the API server sends the request to create the training task to the processor of the search task for further processing.
  • the processor of the search task uses the component of Informer (a client tool with a local cache and indexing mechanism) in the K8S system to monitor the events of the search task in the API server.
  • the Informer component registers callback handlers for events such as search tasks, training tasks, and algorithms.
  • the processor uses the client of the K8S system to process these changes.
  • After receiving the above events, the processor first puts them into an event queue maintained in memory, and a dedicated Worker (worker server) coroutine is responsible for fetching events from the event queue.
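
The callback-plus-queue pattern described above can be illustrated with a small sketch. The source describes a Kubernetes Informer with a worker coroutine; the Python version below is only an analogy built on the standard library, and `on_event`, `worker`, the `None` stop sentinel, and the event names are hypothetical.

```python
import queue
import threading

events = queue.Queue()   # in-memory event queue
handled = []             # records what the worker processed, for illustration

def on_event(kind, name):
    """Callback registered for watched resources: only enqueue, never block."""
    events.put((kind, name))

def worker():
    """Dedicated worker: take events off the queue one at a time, in order."""
    while True:
        item = events.get()
        if item is None:          # sentinel used in this sketch to stop the worker
            events.task_done()
            break
        handled.append(item)      # a real handler would fetch the config and act on it
        events.task_done()

thread = threading.Thread(target=worker)
thread.start()
on_event("SearchTask", "nas-demo")
on_event("TrainingTask", "nas-demo-trial-0")
events.put(None)
thread.join()
```

Keeping the callback limited to enqueueing means slow handling of one event does not block delivery of the next.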
  • the processor obtains the configuration information of the search task from the API server through the client of the K8S system according to the reference object of the event.
  • the processor of the search task obtains the configuration information of the search task from the API server through the client provided by the K8S system.
  • the processor creates an algorithm instance according to a user-defined search space and a search algorithm in the search task information, and uses the algorithm instance to run the search algorithm.
  • the processor creates corresponding algorithm instances according to different types of search algorithms (search algorithms for traditional model structures, search algorithms for single-pass model structures, and search algorithms for optimized single-pass model structures, etc.).
  • the algorithm instance will use different implementations.
  • the processor uses the communication mechanism of the gRPC (Google remote procedure call) framework to let the controller and the processor communicate: the training task information in the controller interacts with the algorithm instance in the processor to generate multiple candidate model structures and corresponding parameters (e.g., weights).
  • the processor accesses the algorithm instance through the HTTP interface, and obtains multiple candidate model structures and corresponding parameters through the HTTP interface.
  • a training task of the candidate model structures is created.
  • one training task is created for each candidate model structure, and a training task is distributed to multiple container groups (pods) on physical nodes for parallel execution. The training tasks of the multiple candidate model structures are sent to the API server. Since the processor monitors the API server, when the creation of training tasks changes the state of the API server, the processor obtains the training task information in the configuration information from the API server through the client of the K8S system and, based on the training task information, creates the container groups (pods) that carry the training task on physical nodes. This function is implemented by Kubeflow.
  • the scheduler determines the physical nodes to which multiple training tasks need to be scheduled according to the training task information, so that these training tasks can be executed by multiple physical nodes in parallel.
  • the processor determines from the training task information whether the training task requires hardware accelerator resources such as GPU or CPU, which improves the scalability of the hardware resources and hardware acceleration resources used.
  • Each training task will accept the candidate model structure as input, and use the training code in the training task information to train.
  • the training task exits, and the processor modifies the state of the training task to complete. Then, the processor will count the training results, and the training results include: the total number of training tasks completed in the same search task, the actual value of the training indicators obtained by each training task, and the like.
  • the processor compares the maximum number of searches and the expected value of the training index in the search task information with the actual number of trainings and the actual value of the training index in the training result. If the value in the training result is greater than or equal to the corresponding value in the search task information, the processor marks the status of this search task as complete.
  • the deep learning neural network architecture search technology utilizes the distributed execution capability of the K8S system to realize large-scale deep learning neural network architecture search.
  • When performing a search task, the user only needs to provide the configuration information of the search task, such as search task information and training task information, to initiate a search task for a deep learning neural network architecture.
  • Users do not need to build the operating environment for deep learning neural network architecture search, apply for the cluster resources required by training tasks, or design content related to search algorithms; these are handled automatically by the K8S system, which improves the convenience, adaptability, and speed of the search.
  • According to the configuration information of the search task input by the user, multiple candidate model structures and corresponding parameters are dynamically generated, and training tasks for the multiple candidate model structures are created.
  • the K8S system schedules the multiple training tasks to the corresponding physical nodes, which execute them in parallel to obtain the search results. This improves both the search efficiency and the utilization rate of cluster resources, and the hardware resources and hardware acceleration resources used become more scalable, meeting the needs of large-scale search.
  • the step of generating multiple candidate model structures and corresponding parameters according to the configuration information is returned to.
  • the processor compares the maximum number of searches and the expected value of the training index in the search task information with the actual number of trainings and the actual value of the training index in the training result. If the actual number of trainings is less than the maximum number of searches and the actual value of the training index has not reached the expected value, the processor triggers a return to steps S120 to S140. Specifically, the processor takes the training indicators and candidate model structures of the historical training tasks as input and interacts with the algorithm instance again. The algorithm instance recommends new candidate model structures based on this history, until the actual number of trainings reaches the maximum number of searches or the actual value of the training index is better than the expected value.
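
The iterative loop described above, in which past results are fed back to the algorithm instance, could be sketched as follows. The suggestion strategy (sampling near the best candidate seen so far) and the objective in `evaluate` are placeholders chosen for the example, not the search algorithm of the application; all function names are hypothetical.

```python
import random

def suggest(history, rng, n=4):
    """Hypothetical algorithm instance: propose values near the best one seen."""
    if not history:
        return [rng.uniform(0.0, 1.0) for _ in range(n)]
    best_x, _ = max(history, key=lambda h: h[1])
    return [min(1.0, max(0.0, best_x + rng.uniform(-0.1, 0.1))) for _ in range(n)]

def evaluate(x):
    """Stand-in for a training task; this toy metric peaks at x = 0.7."""
    return 1.0 - abs(x - 0.7)

def iterative_search(max_searches, expected_metric, seed=0):
    rng = random.Random(seed)
    history = []                       # (candidate, metric) of past training tasks
    best_metric = float("-inf")
    # Keep searching while neither stop condition holds.
    while len(history) < max_searches and best_metric < expected_metric:
        for x in suggest(history, rng):            # back to S120
            history.append((x, evaluate(x)))       # S130 and S140
        best_metric = max(metric for _, metric in history)
    return history, best_metric

history, best_metric = iterative_search(max_searches=20, expected_metric=0.95)
```

The loop ends as soon as either the search budget is used up or the metric reaches the expected value, matching the two stop conditions in the text.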
  • the configuration information is configuration information input by a user, and the configuration information includes search task information and training task information.
  • the configuration information input by the user may include: search task information and training task information.
  • the search task information includes: the namespace (NameSpace) where the search task is located, the name of the search task, the data set and version that the search task needs to use, the model structure and version required to perform the search task, the training index name of the search task and the expected value of the training index, the definition of the search space (operations, such as convolution operations; operation types, such as discrete variables and integer variables), the number of search failures, the maximum number of searches, and the search algorithm.
  • the search algorithm information can include: the search algorithm name (when a custom algorithm is used, a container image of the search algorithm is provided), the search algorithm type (such as the single-shot algorithm type or the traditional algorithm type), and the hardware resources used by the search algorithm (such as GPU, CPU, persistent storage, and memory).
  • the training task information includes: the training code for a single execution of the training task, the hardware resources for a single execution of the training task (such as CPU, memory, and GPU), and the training mode for a single execution of the training task (such as a mode based on distributed data-flow programming, or the Parameter Server (parameter server) and Worker (worker server) mode).
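
The configuration described above could be represented roughly as follows. The class names, field names, and default values are all assumptions made for this sketch; the source lists the kinds of information involved but does not define a schema, and several listed fields (data set, model version) are omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class SearchTaskInfo:
    namespace: str
    name: str
    metric_name: str
    expected_metric: float
    search_space: dict
    algorithm_name: str
    algorithm_type: str = "traditional"   # or "single-shot"
    max_searches: int = 10                # default chosen for this example
    max_failures: int = 3                 # default chosen for this example

@dataclass
class TrainingTaskInfo:
    training_code: str
    # Per-run hardware request; the defaults are illustrative only.
    resources: dict = field(
        default_factory=lambda: {"cpu": 1, "memory_gb": 2, "gpu": 0}
    )
    training_mode: str = "single"

@dataclass
class SearchConfig:
    search: SearchTaskInfo
    training: TrainingTaskInfo

config = SearchConfig(
    search=SearchTaskInfo(
        namespace="default",
        name="nas-demo",
        metric_name="accuracy",
        expected_metric=0.9,
        search_space={"kernel_size": {"type": "discrete", "values": [3, 5, 7]}},
        algorithm_name="random",
    ),
    training=TrainingTaskInfo(training_code="train.py"),
)
```

Fields with defaults illustrate how missing values can be filled in so that the user only supplies what differs from the defaults.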
  • step S120 includes:
  • Step S121 Create a corresponding algorithm instance according to the search task information
  • Step S122 Load the training task information into the algorithm instance, and generate multiple candidate model structures and corresponding parameters.
  • the processor creates a corresponding algorithm instance according to the algorithm type in the search task information. For example, for a search algorithm of a traditional model structure, the processor uses the algorithm code in the search task information to create a corresponding algorithm instance. For search algorithms that require only one model training, such as DARTS (framework of neural network search technology based on gradient backpropagation), the processor uses the algorithm code in the search task information to create a corresponding algorithm instance. Then, the processor loads the training task information into the created algorithm instance, and generates multiple candidate model structures and corresponding parameters.
  • the user when performing a search task, the user only needs to provide configuration information of the search task, for example, search task information and training task information, to initiate a search task of a deep learning neural network architecture, which improves the convenience of search.
  • the adaptability of the search is improved.
  • the search task information includes a search space and a search algorithm of a traditional model structure; step S121 includes:
  • the corresponding algorithm instance is created.
  • when the algorithm type is a search algorithm for a traditional model structure, for example, a random search algorithm selected by the user, the algorithm instance randomly selects values in the search space.
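
Random selection over a search space with integer, floating-point, and discrete variables can be sketched as follows. The dictionary layout and the names `sample_variable` and `random_candidates` are assumptions for illustration; floating-point variables are sampled on the grid defined by the step size, which is one possible reading of how the step is used.

```python
import math
import random

def sample_variable(spec, rng):
    """Draw one value uniformly from a single search-space variable."""
    kind = spec["type"]
    if kind == "int":
        return rng.randint(spec["min"], spec["max"])       # both bounds inclusive
    if kind == "float":
        # Number of whole steps that fit between min and max.
        steps = math.floor((spec["max"] - spec["min"]) / spec["step"] + 1e-9)
        return spec["min"] + rng.randint(0, steps) * spec["step"]
    if kind == "discrete":
        return rng.choice(spec["values"])
    raise ValueError(f"unknown variable type: {kind!r}")

def random_candidates(search_space, count, seed=None):
    """Generate `count` candidate configurations by independent random draws."""
    rng = random.Random(seed)
    return [
        {name: sample_variable(spec, rng) for name, spec in search_space.items()}
        for _ in range(count)
    ]

space = {
    "num_layers": {"type": "int", "min": 2, "max": 8},
    "dropout": {"type": "float", "min": 0.0, "max": 0.5, "step": 0.1},
    "kernel_size": {"type": "discrete", "values": [3, 5, 7]},
}
candidates = random_candidates(space, count=5, seed=42)
```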
  • the processor will find the corresponding image according to the image list in the configuration information, and use this image to create an algorithm container.
  • the algorithm container is exposed in RESTful (Representational State Transfer) form as an API server that provides services.
  • the user selects a search algorithm based on deep learning, and the processor will create an algorithm instance according to the relevant information in the configuration information.
  • the algorithm instance will establish an LSTM (Long Short-Term Memory) network, and the generated candidate model structure is then a deep learning model structure.
  • the search task information includes a search space and a search algorithm for a single-shot model structure; step S121 includes:
  • the corresponding algorithm instance is created.
  • the algorithm type is a single-shot model structure search algorithm, for example, a subgraph search algorithm such as DARTS selected by the user, both model structures and model weights are generated.
  • the processor creates an instance of the algorithm to complete the training. After the training is completed, the processor uses the obtained subgraph to create a model tuning task, and sends the model tuning task to the API server of K8S. If the user does not need tuning, the searched model is pushed directly to the data store or another persistent store.
  • the processor will create an algorithm instance to complete the training. After the training is completed, the processor uses the obtained model substructure to create a training task, and sends the training task to the K8S system API server.
  • Step S160 when the candidate model structure needs to be tuned, create a plurality of model tuning tasks according to the plurality of candidate model structures and corresponding parameters;
  • Step S170 Scheduling multiple model tuning tasks to corresponding physical nodes, so that each physical node executes the corresponding model tuning tasks in parallel, obtains multiple optimized candidate model structures, and counts multiple optimized candidate models The training results of the structure.
  • after the processor obtains the tuning information, it sends a request to create a model tuning task to the API server, which receives it.
  • the processor listens to the event of creating a model tuning task from the API server, and creates multiple model tuning tasks using multiple candidate model structures and corresponding parameters according to the tuning information. For example, a complete full graph is created from subgraphs, or the complete full graph entered by the user is used for model optimization and evaluation.
  • step S140 includes:
  • Step S141 Calculate multiple resource type weights corresponding to multiple training tasks according to the number of resources used by all training tasks corresponding to the search task;
  • Step S142 Determine the scheduling sequence of the multiple training tasks according to the multiple resource weights corresponding to the multiple training tasks;
  • Step S143 Schedule multiple training tasks to corresponding physical nodes according to the scheduling sequence.
  • the scheduler executes the scheduling task
  • the following formula can be used to calculate the resources being used by all search tasks:
  • the scheduler calculates the resource weight for each training task j:
  • $W_j = w_{CPU} \cdot CPU_i + w_{Mem} \cdot Mem_i + w_{GPU} \cdot GPU_i$
  • where $w_{CPU}$, $w_{Mem}$, and $w_{GPU}$ are the resource weights of CPU, memory, and GPU respectively, all of which are less than 1;
  • $CPU_i$ is the amount of CPU resources used by search task $i$,
  • $Mem_i$ is the amount of memory resources used by search task $i$,
  • $GPU_i$ is the amount of GPU resources used by search task $i$.
  • when the resources are insufficient to run all the training tasks in the queue, the scheduler decides which task to run based on this weighting formula. When the resources needed by the highest-weight task cannot be satisfied, the scheduler moves on to the training or tuning task with the second-highest weight, and so on.
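The weighting and fallback rule described above can be sketched as follows. This is a minimal illustration, not the patent's scheduler: the weight values, the `TrainingTask` fields, and the `schedule` function are assumptions introduced here, and the sketch applies the formula to each task's own resource request, since the source indexes the weight by training task j but the resource amounts by search task i.

```python
from dataclasses import dataclass


@dataclass
class TrainingTask:
    name: str
    cpu: float      # CPU cores requested (hypothetical unit)
    mem_gib: float  # memory requested in GiB (hypothetical unit)
    gpu: int        # GPUs requested


# Hypothetical per-resource weights; the source only requires each to be < 1.
W_CPU, W_MEM, W_GPU = 0.2, 0.1, 0.7


def resource_weight(task):
    """W = w_CPU * CPU + w_Mem * Mem + w_GPU * GPU for one task."""
    return W_CPU * task.cpu + W_MEM * task.mem_gib + W_GPU * task.gpu


def schedule(tasks, free_cpu, free_mem_gib, free_gpu):
    """Admit tasks in descending weight order, skipping any that do not fit."""
    admitted = []
    for task in sorted(tasks, key=resource_weight, reverse=True):
        fits = (task.cpu <= free_cpu
                and task.mem_gib <= free_mem_gib
                and task.gpu <= free_gpu)
        if fits:
            admitted.append(task.name)
            free_cpu -= task.cpu
            free_mem_gib -= task.mem_gib
            free_gpu -= task.gpu
    return admitted


tasks = [
    TrainingTask("a", cpu=4, mem_gib=16, gpu=2),
    TrainingTask("b", cpu=2, mem_gib=8, gpu=1),
    TrainingTask("c", cpu=1, mem_gib=4, gpu=0),
]
# Task "a" has the highest weight but does not fit, so "b" and then "c" run.
print(schedule(tasks, free_cpu=3, free_mem_gib=16, free_gpu=1))
```

Skipping a task that does not fit, rather than blocking the queue on it, is one reading of "and so on" in the text; a real scheduler might instead reserve resources for the highest-weight task.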
  • the training result includes the actual number of times of training and the actual value of the training index
  • the preset training condition includes the maximum number of searches and the expected value of the training index
  • Step S151: when the actual number of trainings is greater than or equal to the maximum number of searches, or the actual value of the training index is greater than or equal to the expected value of the training index, obtain the optimal network structure and corresponding parameters, which are used as the search result.
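The stop condition of step S151 can be written as a small predicate. The sketch below assumes that a larger training index is better (as for accuracy) and that each completed training task is recorded as a `(structure, index_value)` pair; both the function name and that record shape are illustrative choices, not taken from the source.

```python
def search_result(trials, max_searches, expected_index):
    """Return the best (structure, index_value) pair once S151 is satisfied.

    Returns None while the search should continue.
    """
    if not trials:
        return None
    best = max(trials, key=lambda trial: trial[1])
    if len(trials) >= max_searches or best[1] >= expected_index:
        return best
    return None


trials = [("net-a", 0.80), ("net-b", 0.90)]
print(search_result(trials, max_searches=5, expected_index=0.95))  # keep searching
print(search_result(trials, max_searches=5, expected_index=0.90))  # index reached
print(search_result(trials, max_searches=2, expected_index=0.95))  # budget spent
```

For an index where smaller is better (for example, error), the comparison and the `max` would need to be inverted.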
  • the training metrics refer to performance evaluation metrics such as accuracy or speed, including error, accuracy, and variance.
  • Step S180 Create a corresponding temporary task for each training task, and the temporary task is used to store the training result in the data storage.
  • training tasks are executed after scheduling is complete.
  • the processor will create a temporary task for each training task.
  • the temporary task and the training task are in the same namespace.
  • the temporary task is responsible for obtaining the logs of the training task or model tuning task from the API server through the client of the K8S system, and for storing the training results in the data storage. For example, from the streamed logs of a training task, the training indicators are extracted with a processing mechanism specific to each training framework, and the temporary task feeds the training indicators back to the data storage.
  • a neural network architecture search apparatus including:
  • a configuration information obtaining module 110 configured to obtain configuration information of the search task
  • a candidate model structure generation module 120 configured to generate a plurality of candidate model structures and corresponding parameters according to the configuration information
  • a training task creation module 130 configured to create multiple training tasks according to multiple candidate model structures and parameters
  • the training task scheduling module 140 is used for scheduling multiple training tasks to corresponding physical nodes, so that each physical node executes the corresponding training tasks in parallel, and counts the training results of the multiple training tasks;
  • the search result generating module 150 is configured to obtain a search result corresponding to the search task when the training result satisfies the preset training condition in the configuration information.
  • the iterative search trigger module 160 is configured to trigger the candidate model structure generation module 120 to perform the step of generating multiple candidate model structures and corresponding parameters according to the configuration information when the training result does not meet the preset training conditions in the configuration information.
  • the configuration information is configuration information input by a user, and the configuration information includes search task information and training task information.
  • the candidate model structure generation module 120 includes:
  • the algorithm instance creation sub-module 121 is used to create a corresponding algorithm instance according to the search task information
  • the candidate model structure generation sub-module 122 is configured to load the training task information into the algorithm instance to generate multiple candidate model structures and corresponding parameters.
  • the search task information includes a search space and a search algorithm of a traditional model structure
  • the algorithm instance creation sub-module 121 includes:
  • the first algorithm instance creation unit is configured to create a corresponding algorithm instance according to the search space and the search algorithm of the traditional model structure.
  • the search task information includes a search space and a search algorithm for a single-shot model structure
  • the algorithm instance creation submodule 121 also includes:
  • the second algorithm instance creation unit is configured to create a corresponding algorithm instance according to the search space and the search algorithm of the single-shot model structure.
  • the model tuning task creation module 170 is configured to create multiple model tuning tasks according to multiple candidate model structures and corresponding parameters when the candidate model structures need to be tuned;
  • the model tuning task scheduling module 180 is used to schedule multiple model tuning tasks to corresponding physical nodes, so that each physical node executes the corresponding model tuning tasks in parallel, obtains multiple optimized candidate model structures, and counts them. Training results for multiple optimized candidate model structures.
  • the training task scheduling module 140 includes:
  • the resource type weight calculation sub-module 141 is configured to calculate multiple resource type weights corresponding to multiple training tasks according to the number of resources used by all training tasks corresponding to the search task;
  • the scheduling sequence determination sub-module 142 is configured to determine the scheduling sequence of multiple training tasks according to multiple resource weights corresponding to multiple training tasks;
  • the training task scheduling sub-module 143 is configured to schedule multiple training tasks to corresponding physical nodes according to the scheduling sequence.
  • the training result includes the actual training times and the actual value of the training index
  • the preset training condition includes the maximum number of searches and the expected value of the training index
  • the search result generating module 150 includes:
  • the search result generation sub-module 151 is used to obtain the optimal network structure and corresponding parameters when the actual number of trainings is greater than or equal to the maximum number of searches, or the actual value of the training index is greater than or equal to the expected value of the training index; the optimal network structure and the corresponding parameters are used as the search result.
  • the temporary task creation module 190 is used for creating a corresponding temporary task for each training task, and the temporary task is used for storing the training result in the data storage.
  • the present application further provides an electronic device and a readable storage medium.
  • FIG. 9 is a block diagram of an electronic device for a neural network architecture search method according to an embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the application described and/or claimed herein.
  • the electronic device includes: one or more processors 901, a memory 902, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and can be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface.
  • in other implementations, multiple processors and/or multiple buses may be used with multiple memories, if desired.
  • multiple electronic devices may be connected, each providing some of the necessary operations (eg, as a server array, a group of blade servers, or a multiprocessor system).
  • a processor 901 is taken as an example in FIG. 9 .
  • the memory 902 is the non-transitory computer-readable storage medium provided by the present application.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes a neural network architecture search method provided by the present application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to cause a computer to execute a neural network architecture search method provided by the present application.
  • the memory 902 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as program instructions/modules corresponding to a neural network architecture search method in the embodiments of the present application (For example, the configuration information acquisition module 110, the candidate model structure generation module 120, the training task creation module 130, the training task scheduling module 140, and the search result generation module 150 shown in FIG. 6).
  • the processor 901 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 902, ie, implements a neural network architecture search method in the above method embodiments.
  • the memory 902 may include a stored program area and a stored data area, wherein the stored program area may store an operating system and an application program required by at least one function, and the stored data area may store data created by use of the electronic device of a neural network architecture search method, etc. Additionally, memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 902 may optionally include memory located remotely relative to processor 901, and these remote memories may be connected to the electronic device of a neural network architecture search method via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the electronic device of a neural network architecture search method may further include: an input device 903 and an output device 904 .
  • the processor 901 , the memory 902 , the input device 903 and the output device 904 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 9 .
  • the input device 903 can receive input numerical or character information, and generate key signal input related to user settings and function control of an electronic device for a neural network architecture search method, such as a touch screen, a keypad, a mouse, a trackpad, a touchpad , pointing stick, one or more mouse buttons, trackball, joystick and other input devices.
  • Output devices 904 may include display devices, auxiliary lighting devices (eg, LEDs), haptic feedback devices (eg, vibration motors), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor, which may be special-purpose or general-purpose, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, programmable logic devices (PLDs)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals.
  • the term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • to provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (eg, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (eg, a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (eg, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (eg, as a data server), or a computing system that includes middleware components (eg, an application server), or a computing system that includes front-end components (eg, a user computer having a graphical user interface or web browser through which a user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
  • a computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.

Abstract

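The present application discloses a neural network architecture search method and apparatus. The method includes: obtaining configuration information of a search task, and generating multiple candidate model structures and corresponding parameters according to the configuration information; creating multiple training tasks according to the multiple candidate model structures and parameters; scheduling the multiple training tasks to corresponding physical nodes, so that each physical node executes its training tasks in parallel, and collecting the training results of the multiple training tasks; and, when the training results satisfy a preset training condition in the configuration information, obtaining the search result corresponding to the search task. This improves search efficiency and cluster resource utilization, makes the hardware resources and hardware acceleration resources in use more scalable, and meets the needs of large-scale search.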

Description

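Neural network architecture search method and apparatus
Cross-reference to related applications
This application claims priority to patent application No. CN202010829782.5, filed on August 18, 2020 and entitled "Neural network architecture search method and apparatus", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of deep learning, and in particular to the field of neural network architecture search.
Background
With the development of artificial intelligence, deep learning neural network architecture search has gradually matured. It can automatically build deep learning neural networks that meet given requirements, which reduces manual effort, improves productivity, and has high industrial value.
However, with current deep learning neural network architecture search techniques, a search task is submitted to the cluster management system as a single task and executed on a single machine. It is difficult to execute search tasks in parallel, in a distributed manner, on a cluster of multiple machines, so search efficiency is low and cluster resources are poorly utilized.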
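Summary
Embodiments of the present application provide a neural network architecture search method and apparatus to solve the problems in the related art. The technical solutions are as follows:
In a first aspect, a neural network architecture search method is provided, including:
obtaining configuration information of a search task, and generating multiple candidate model structures and corresponding parameters according to the configuration information;
creating multiple training tasks according to the multiple candidate model structures and parameters;
scheduling the multiple training tasks to corresponding physical nodes, so that each physical node executes its training tasks in parallel, and collecting the training results of the multiple training tasks;
when the training results satisfy the preset training condition in the configuration information, obtaining the search result corresponding to the search task.
In one implementation, the method further includes:
when the training results do not satisfy the preset training condition in the configuration information, returning to the step of generating multiple candidate model structures and corresponding parameters according to the configuration information.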
在一种实施方式中,配置信息是用户输入的配置信息,配置信息包括搜索任务信息和训练任务信息。
在一种实施方式中,根据配置信息生成多个候选模型结构以及对应的参数,包括:
根据搜索任务信息创建对应的算法实例;
将训练任务信息加载至算法实例中,生成多个候选模型结构以及对应的参数。
在一种实施方式中,搜索任务信息包括搜索空间和传统模型结构的搜索算法;根据搜索任务信息创建对应的算法实例,包括:
根据搜索空间和传统模型结构的搜索算法,创建对应的算法实例。
在一种实施方式中,搜索任务信息包括搜索空间和单次模型结构的搜索算法;根据搜索任务信息创建对应的算法实例,包括:
根据搜索空间和单次模型结构的搜索算法,创建对应的算法实例。
在一种实施方式中,还包括:
在候选模型结构需要调优的情况下,根据多个候选模型结构以及对应的参数创建多个模型调优任务;
将多个模型调优任务调度至对应的物理节点,以使各物理节点并行执行对应的模型调优任务,得到多个优化后的候选模型结构,并统计多个优化后的候选模型结构的训练结果。
在一种实施方式中,将多个训练任务调度至对应的物理节点,包括:
根据搜索任务对应的全部训练任务所使用的资源数量,计算多个训练任务对应的多个资源类型权重;
根据多个训练任务对应的多个资源权重,确定多个训练任务的调度顺序;
根据调度顺序将多个训练任务调度至对应的物理节点。
在一种实施方式中,训练结果包括实际训练次数和训练指标的实际值,预设训练条件包括最大搜索次数和训练指标的期望值;在多个训练任务的训 练结果满足配置信息中的预设训练条件的情况下,得到搜索任务对应的搜索结果,包括:
在实际训练次数大于或等于最大搜索次数,或训练指标的实际值大于或等于训练指标的期望值的情况下,得到最优网络结构以及对应的参数,最优网络结构以及对应的参数作为搜索结果。
在一种实施方式中,还包括:
针对每个训练任务创建对应的临时任务,临时任务用于将训练结果存储至数据存储器中。
第二方面,本申请实施例提供一种神经网络架构搜索装置,包括:
配置信息获取模块,用于获取搜索任务的配置信息;
候选模型结构生成模块,用于根据配置信息生成多个候选模型结构以及对应的参数;
训练任务创建模块,用于根据多个候选模型结构以及参数创建多个训练任务;
训练任务调度模块,用于将多个训练任务调度至对应的物理节点,以使各物理节点并行执行对应的训练任务,并统计多个训练任务的训练结果;
搜索结果生成模块,用于在训练结果满足配置信息中的预设训练条件的情况下,得到搜索任务对应的搜索结果。
在一种实施方式中,还包括:
迭代搜索触发模块,用于在训练结果未满足配置信息中的预设训练条件的情况下,触发所述候选模型结构生成模块执行根据配置信息生成多个候选模型结构以及对应的参数的步骤。
在一种实施方式中,配置信息是用户输入的配置信息,配置信息包括搜索任务信息和训练任务信息。
在一种实施方式中,候选模型结构生成模块,包括:
算法实例创建子模块,用于根据搜索任务信息创建对应的算法实例;
候选模型结构生成子模块,用于将训练任务信息加载至算法实例中,生成多个候选模型结构以及对应的参数。
在一种实施方式中,搜索任务信息包括搜索空间和传统模型结构的搜索算法,算法实例创建子模块,包括:
第一算法实例创建单元,用于根据搜索空间和传统模型结构的搜索算法,创建对应的算法实例。
在一种实施方式中,搜索任务信息包括搜索空间和单次模型结构的搜索算法,算法实例创建子模块,还包括:
第二算法实例创建单元,用于根据搜索空间和单次模型结构的搜索算法,创建对应的算法实例。
在一种实施方式中,还包括:
模型调优任务创建模块,用于在候选模型结构需要调优的情况下,根据多个候选模型结构以及对应的参数创建多个模型调优任务;
模型调优任务调度模块,用于将多个模型调优任务调度至对应的物理节点,以使各物理节点并行执行对应的模型调优任务,得到多个优化后的候选模型结构,并统计多个优化后的候选模型结构的训练结果。
在一种实施方式中,训练任务调度模块,包括:
资源类型权重计算子模块,用于根据搜索任务对应的全部训练任务所使用的资源数量,计算多个训练任务对应的多个资源类型权重;
调度顺序确定子模块,用于根据多个训练任务对应的多个资源权重,确定多个训练任务的调度顺序;
训练任务调度子模块,用于根据调度顺序将多个训练任务调度至对应的物理节点。
在一种实施方式中,训练结果包括实际训练次数和训练指标的实际值,预设训练条件包括最大搜索次数和训练指标的期望值,搜索结果生成模块,包括:
搜索结果生成子模块,用于在实际训练次数大于或等于最大搜索次数,或训练指标的实际值大于或等于训练指标的期望值的情况下,得到最优网络结构以及对应的参数,最优网络结构以及对应的参数作为搜索结果。
在一种实施方式中,还包括:
临时任务创建模块,用于针对每个训练任务创建对应的临时任务,临时任务用于将训练结果存储至数据存储器中。
第三方面,提供了一种电子设备,包括:
至少一个处理器;以及
与至少一个处理器通信连接的存储器;其中,
存储器存储有可被至少一个处理器执行的指令,指令被至少一个处理器执行,以使至少一个处理器能够执行上述任一项的方法。
第四方面,提供了一种存储有计算机指令的非瞬时计算机可读存储介质,计算机指令用于使计算机执行上述任一项的方法。
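An embodiment of the above application has the following advantages or beneficial effects: multiple candidate model structures and corresponding parameters are generated dynamically from the configuration information of the search task, training tasks are created for the candidate model structures, and the training tasks are scheduled to corresponding physical nodes and executed in parallel to obtain the search result. This improves search efficiency and cluster resource utilization, makes the hardware resources and hardware acceleration resources in use more scalable, and meets the needs of large-scale search.
Other effects of the above optional implementations are described below in connection with specific embodiments.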
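Brief description of the drawings
The drawings are provided for a better understanding of the solution and do not limit the present application. In the drawings:
FIG. 1 is a schematic diagram of a neural network architecture search method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a neural network architecture search scenario according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another neural network architecture search method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another neural network architecture search method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a training task scheduling method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a neural network architecture search apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a neural network architecture search apparatus according to another embodiment of the present application;
FIG. 8 is a schematic diagram of a training task scheduling module according to an embodiment of the present application;
FIG. 9 is a block diagram of an electronic device for implementing a neural network architecture search method according to an embodiment of the present application.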
具体实施方式
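Exemplary embodiments of the present application are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
As shown in FIG. 1, in a specific implementation, a neural network architecture search method is provided, including:
Step S110: obtain configuration information of a search task;
Step S120: generate multiple candidate model structures and corresponding parameters according to the configuration information;
Step S130: create multiple training tasks according to the multiple candidate model structures and parameters;
Step S140: schedule the multiple training tasks to corresponding physical nodes, so that each physical node executes its training tasks in parallel, and collect the training results of the multiple training tasks;
Step S150: when the training results satisfy the preset training condition in the configuration information, obtain the search result corresponding to the search task.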
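In one example, as shown in FIG. 2, components such as a controller, a scheduler, a data storage, and an API (application programming interface) server can run in a K8S system (Kubernetes, a container cluster management system). These components run as Docker (an open-source application container engine) containers on the master node of the K8S system.
As shown in FIG. 3, first, the controller obtains the configuration information of the search task. The configuration information may be default configuration information stored in a configuration file, configuration information entered by the user through a graphical interface, or search task information entered by the user on the command line of a client. The configuration information includes search task information, training task information, and so on. Missing information or default initial values are added so that the configuration information is complete, and the complete configuration information is then validated. For example, the search space entered by the user is checked for validity: if the user selects an integer variable, the search space must provide a minimum and a maximum integer; if the user selects a discrete variable, the search space must provide a discrete list spelling out all search choices; if the user selects a floating-point variable, the search space must provide a minimum, a maximum, and a step size. The operation type is also checked for validity: if the operation type is a convolution, the corresponding parameters include the filter size, the number of filters, the stride, and so on. After validating the configuration information, the controller sends a request to create training tasks to the API server through the Golang client provided by Kubernetes, and the API server persists the request in the data storage. The API server depends on the data storage, whose storage function is provided by ETCD (a distributed, consistent key-value store). The API server sends the request to create training tasks to the processor of the search task for further processing.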
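The search-space checks listed above (integer, discrete, and floating-point variables) can be sketched as a validation function. The dictionary keys `type`, `min`, `max`, `step`, and `values` are hypothetical field names chosen for this illustration; the source does not specify a configuration schema.

```python
def validate_parameter(param):
    """Return a list of error messages for one search-space parameter."""
    errors = []
    kind = param.get("type")
    if kind == "int":
        lo, hi = param.get("min"), param.get("max")
        if not isinstance(lo, int) or not isinstance(hi, int):
            errors.append("int parameter needs integer 'min' and 'max'")
        elif lo > hi:
            errors.append("'min' must not exceed 'max'")
    elif kind == "discrete":
        values = param.get("values")
        if not isinstance(values, list) or not values:
            errors.append("discrete parameter needs a non-empty 'values' list")
    elif kind == "float":
        missing = [key for key in ("min", "max", "step")
                   if not isinstance(param.get(key), (int, float))]
        if missing:
            errors.append("float parameter is missing numeric " + ", ".join(missing))
        elif param["min"] > param["max"] or param["step"] <= 0:
            errors.append("float parameter needs min <= max and step > 0")
    else:
        errors.append(f"unknown parameter type: {kind!r}")
    return errors


print(validate_parameter({"type": "int", "min": 1, "max": 8}))
print(validate_parameter({"type": "float", "min": 0.0, "max": 1.0}))
```

Returning a list of messages rather than raising on the first problem lets the controller report every defect in a submitted configuration at once.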
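The processor of the search task uses the Informer component of the K8S system (a client tool with a local cache and indexing mechanism) to watch the API server for events related to the search task. The Informer component registers callback handlers for search task, training task, and algorithm events; when any of these events changes, the processor uses the client of the K8S system to handle the change. On receiving such an event, the processor first puts it into an event queue maintained in memory, and a dedicated Worker coroutine takes events off the queue. Based on the object referenced by the event, the processor obtains the configuration information of the search task from the API server through the client of the K8S system.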
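The queue-and-worker pattern described above can be illustrated with the Python standard library. This is only an analogy: it does not use the Kubernetes Informer API, a Python thread stands in for the Worker coroutine, and the event kinds and handler functions are hypothetical.

```python
import queue
import threading


def run_event_loop(events, handlers):
    """Enqueue events in memory and let one worker dispatch them in order."""
    work = queue.Queue()
    handled = []

    def worker():
        while True:
            event = work.get()
            if event is None:  # sentinel: no more events
                break
            kind, obj = event
            handler = handlers.get(kind)
            if handler is not None:  # events with no registered handler are ignored
                handled.append(handler(obj))

    thread = threading.Thread(target=worker)
    thread.start()
    for event in events:
        work.put(event)
    work.put(None)
    thread.join()
    return handled


handlers = {
    "search_task": lambda name: f"reconcile search task {name}",
    "training_task": lambda name: f"reconcile training task {name}",
}
events = [("search_task", "s1"), ("training_task", "t1"), ("other", "x")]
print(run_event_loop(events, handlers))
```

Decoupling event receipt from handling through the queue keeps the watch callbacks short, so slow handlers do not delay the delivery of later events.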
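Next, after receiving the request to create training tasks, the processor of the search task obtains the configuration information of the search task from the API server through the client provided by the K8S system. Based on the user-defined search space and search algorithm in the search task information, the processor creates an algorithm instance and uses it to run the search algorithm. For example, the processor creates the algorithm instance that corresponds to the type of search algorithm (a search algorithm for traditional model structures, for single-shot model structures, for optimized single-shot model structures, and so on). The implementation of the algorithm instance differs with the architecture of the algorithm. After the algorithm instance is initialized, the communication mechanism of the gRPC (Google remote procedure call) framework lets the controller and the processor communicate, so that the training task information in the controller interacts with the algorithm instance in the processor to generate multiple candidate model structures and corresponding parameters (for example, weights). The processor accesses the algorithm instance through an HTTP interface and obtains the multiple candidate model structures and corresponding parameters through that interface.
Then, after obtaining the multiple candidate model structures and corresponding parameters from the algorithm instance, the processor creates training tasks for the candidate model structures. One training task is created per candidate model structure, and a training task is distributed across multiple pods on physical nodes for parallel execution. The training tasks of the candidate model structures are sent to the API server. Because the processor watches the API server, when a newly created training task appears there, the processor obtains the training task information in the configuration information from the API server through the client of the K8S system and, according to it, creates the pods that carry the training workload; the pods run on physical nodes, and this function is implemented by Kubeflow. After the pods are created, the scheduler determines, according to the training task information, the physical nodes to which the training tasks are to be scheduled, so that the training tasks can be executed by multiple physical nodes in parallel. The processor determines from the training task information whether a training task needs hardware accelerator resources such as GPU or CPU, which makes the hardware resources and hardware acceleration resources in use more scalable. Each training task takes a candidate model structure as input and trains it with the training code in the training task information. When a training task completes, it exits and the processor sets its state to completed. The processor then collects the training results, which include the total number of training tasks completed in the same search task, the actual value of the training index obtained by each training task, and so on.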
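The fan-out of one training task per candidate structure, followed by collection of the results, can be sketched with a thread pool standing in for the physical nodes. The `train` function is a placeholder that fabricates a training index from the candidate's hypothetical `layers` field purely so the example runs; it is not the training code the source refers to.

```python
from concurrent.futures import ThreadPoolExecutor


def train(candidate):
    """Placeholder training task: derives a fake training index from the candidate."""
    return {
        "candidate": candidate["name"],
        "index": candidate["layers"] / 10.0,
        "status": "completed",
    }


def run_training_tasks(candidates, max_workers=4):
    """Run one training task per candidate in parallel and summarize the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train, candidates))
    return {
        "completed": sum(1 for r in results if r["status"] == "completed"),
        "indices": {r["candidate"]: r["index"] for r in results},
    }


candidates = [{"name": "net-a", "layers": 3}, {"name": "net-b", "layers": 8}]
print(run_training_tasks(candidates))
```

The summary mirrors the two statistics named in the text: how many training tasks of the search task have completed, and the training index each one obtained.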
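Finally, the processor compares the maximum number of searches and the expected value of the training index in the search task information with the actual number of trainings and the actual value of the training index in the training results. If the training results are greater than the corresponding values in the search task information, the processor marks the search task as completed.
In this implementation, the deep learning neural network architecture search technique uses the distributed execution capability of the K8S system to carry out large-scale deep learning neural network architecture search. To run a search task, the user only needs to provide the configuration information of the search task, for example the search task information and training task information. The user does not need to build the runtime environment for the search, request the cluster resources needed by the training tasks, or design anything related to the search algorithm; the K8S system does this automatically, which improves the convenience, adaptability, and speed of the search. Multiple candidate model structures and corresponding parameters are generated dynamically from the configuration information entered by the user, training tasks are created for the candidate model structures, and the K8S system schedules the training tasks to corresponding physical nodes, where they run in parallel to produce the search result. This improves search efficiency and cluster resource utilization, makes the hardware resources and hardware acceleration resources in use more scalable, and meets the needs of large-scale search.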
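In one implementation, as shown in FIG. 4, the method further includes:
when the training results do not satisfy the preset training condition in the configuration information, returning to the step of generating multiple candidate model structures and corresponding parameters according to the configuration information.
In one example, the processor compares the maximum number of searches and the expected value of the training index in the search task information with the actual number of trainings and the actual value of the training index in the training results. If the training results are less than the corresponding values in the search task information, the processor triggers a return to steps S120 to S140. Specifically, the processor takes the training indices and candidate model structures of historical training tasks as input and interacts with the algorithm instance again; based on them, the algorithm instance recommends new candidate model structures, until the maximum number of searches is less than or equal to the actual number of trainings, or the actual value of the training index is better than the expected value.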
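The iteration described above, in which the algorithm instance proposes new candidates from the history until the budget is spent or the expected index is reached, can be sketched as a loop. The `suggest` and `evaluate` callables are hypothetical stand-ins for the algorithm instance and the training tasks, and the sketch assumes a larger index is better.

```python
def search(suggest, evaluate, max_searches, expected_index, batch_size=2):
    """Repeat suggest -> train -> compare until the stop condition holds.

    suggest(history, n) returns n new candidate structures given past results;
    evaluate(structure) returns the training index of one structure.
    """
    history = []  # (structure, index) for every completed training task
    while True:
        candidates = suggest(history, batch_size)
        if not candidates:
            raise ValueError("suggest() returned no candidates")
        for structure in candidates:
            history.append((structure, evaluate(structure)))
        best = max(history, key=lambda item: item[1])
        if len(history) >= max_searches or best[1] >= expected_index:
            return best, len(history)


# Toy stand-ins: candidate k is just an integer and its index is k / 10.
def suggest(history, n):
    return [len(history) + i for i in range(n)]


def evaluate(structure):
    return structure / 10


print(search(suggest, evaluate, max_searches=10, expected_index=0.5))
print(search(suggest, evaluate, max_searches=4, expected_index=2.0))
```

Passing the full history back to `suggest` is what lets a search algorithm, unlike this toy one, recommend better structures in later rounds.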
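In one implementation, the configuration information is configuration information entered by the user, and it includes search task information and training task information.
In one example, the configuration information entered by the user may include search task information and training task information. The search task information includes: the namespace (NameSpace) of the search task, the name of the search task, the dataset and version the search task uses, the model structure and version needed to run the search task, the name and expected value of the training index of the search task, the definition of the search space (operations, such as convolution; operation types, such as discrete variables and integer variables), the number of search failures, the maximum number of searches, and the search algorithm. The search algorithm may include: the search algorithm name (when a custom algorithm is used, the image of the search algorithm is provided), the search algorithm type (for example, the single-shot algorithm type or the traditional algorithm type), and the hardware resources used by the search algorithm (for example, GPU, CPU, persistent storage, memory). The training task information includes: the training code for a single training run, the hardware resources for a single training run (for example, CPU, memory, GPU), and the training mode for a single training run (for example, TensorFlow's dataflow-programming distributed mode, Parameter Server mode, or Worker mode).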
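To make the field list above concrete, the sketch below shows one possible shape for such a configuration and a helper that fills in defaults for missing fields. Every key name, default value, and image or path string is invented for illustration; the source lists the kinds of information but does not define a schema.

```python
# Hypothetical defaults applied when the user leaves a field out.
SEARCH_DEFAULTS = {"max_searches": 10, "max_failed_searches": 3}

EXAMPLE_CONFIG = {
    "search": {
        "namespace": "nas-demo",
        "name": "cnn-search",
        "dataset": {"name": "example-dataset", "version": "v1"},
        "index": {"name": "accuracy", "expected": 0.95},
        "search_space": [
            {"name": "filters", "type": "int", "min": 16, "max": 64},
            {"name": "kernel", "type": "discrete", "values": [3, 5, 7]},
        ],
        "max_searches": 20,
        "algorithm": {"name": "random", "type": "traditional", "resources": {"cpu": 1}},
    },
    "training": {
        "code": "train.py",
        "resources": {"cpu": 4, "memory_gib": 16, "gpu": 1},
        "mode": "worker",
    },
}


def complete_config(user_config):
    """Return a copy of the configuration with missing search fields defaulted."""
    search = {**SEARCH_DEFAULTS, **user_config.get("search", {})}
    training = dict(user_config.get("training", {}))
    return {"search": search, "training": training}


config = complete_config(EXAMPLE_CONFIG)
print(config["search"]["max_searches"], config["search"]["max_failed_searches"])
```

User-supplied values take precedence over defaults because they are merged last, which matches the description of adding only missing information or default initial values.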
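In one implementation, as shown in FIG. 4, step S120 includes:
Step S121: create a corresponding algorithm instance according to the search task information;
Step S122: load the training task information into the algorithm instance to generate multiple candidate model structures and corresponding parameters.
In one example, as shown in FIG. 3, the processor creates the corresponding algorithm instance according to the algorithm type in the search task information. For example, for a search algorithm for traditional model structures, the processor uses the algorithm code in the search task information to create the corresponding algorithm instance. For search algorithms that need only one model training, such as DARTS (a neural network search framework based on gradient backpropagation), the processor likewise uses the algorithm code in the search task information to create the corresponding algorithm instance. The processor then loads the training task information into the created algorithm instance to generate multiple candidate model structures and corresponding parameters.
In this implementation, to run a search task the user only needs to provide the configuration information of the search task, for example the search task information and training task information, which makes the search more convenient. Because the algorithm instance is created according to the algorithm type before the candidate model structures and corresponding parameters are generated, the search is also more adaptable.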
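In one implementation, the search task information includes a search space and a search algorithm for traditional model structures; step S121 includes:
creating a corresponding algorithm instance according to the search space and the search algorithm for traditional model structures.
In one example, as shown in FIG. 3, if the algorithm type is a search algorithm for traditional model structures, for example random search selected by the user, the algorithm instance picks values at random from the search space. The processor looks up the image for random search in the image list in the configuration information and uses that image to create an algorithm container, which exposes its service as a RESTful (REpresentational State Transfer) API server.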
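Random search over a search space of integer, discrete, and floating-point variables can be sketched as below. The parameter names and the dictionary layout of the search space are assumptions made for this example; only the idea of picking values at random from the search space comes from the text.

```python
import random


def sample(search_space, rng):
    """Draw one candidate by picking a random value for every parameter."""
    candidate = {}
    for name, spec in search_space.items():
        if spec["type"] == "int":
            candidate[name] = rng.randint(spec["min"], spec["max"])  # inclusive bounds
        elif spec["type"] == "discrete":
            candidate[name] = rng.choice(spec["values"])
        elif spec["type"] == "float":
            # Pick a random grid point min + k * step that does not exceed max.
            steps = int(round((spec["max"] - spec["min"]) / spec["step"]))
            candidate[name] = spec["min"] + rng.randint(0, steps) * spec["step"]
        else:
            raise ValueError(f"unknown parameter type: {spec['type']!r}")
    return candidate


space = {
    "filters": {"type": "int", "min": 16, "max": 64},
    "kernel": {"type": "discrete", "values": [3, 5, 7]},
    "dropout": {"type": "float", "min": 0.0, "max": 0.5, "step": 0.1},
}
rng = random.Random(0)  # seeded so repeated runs draw the same candidates
print([sample(space, rng) for _ in range(3)])
```

In the deployment the text describes, a function like `sample` would sit behind the algorithm container's RESTful interface rather than be called directly.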
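In another example, the user selects a deep-learning-based search algorithm, and the processor creates the algorithm instance according to the relevant information in the configuration information. The algorithm instance builds an LSTM (Long Short-Term Memory) network, and the candidate model structures it generates are deep learning model structures.
In one implementation, the search task information includes a search space and a search algorithm for single-shot model structures; step S121 includes:
creating a corresponding algorithm instance according to the search space and the search algorithm for single-shot model structures.
In one example, as shown in FIG. 3, if the algorithm type is a search algorithm for single-shot model structures, for example a subgraph search algorithm such as DARTS selected by the user, the algorithm produces both the model structure and the model weights. The processor creates an algorithm instance to complete the training. After training, the processor uses the resulting subgraph to create a model tuning task and sends it to the API server of the K8S system. If the user does not need tuning, the model found by the search is pushed directly to the data storage or to other persistent storage.
In another example, if the user selects a subgraph search algorithm that produces only the model structure and not the model weights, the processor creates an algorithm instance to complete the training. After training, the processor uses the resulting model substructure to create a training task and sends it to the API server of the K8S system.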
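In one implementation, as shown in FIG. 4, the method further includes:
Step S160: when the candidate model structures need to be tuned, create multiple model tuning tasks according to the multiple candidate model structures and corresponding parameters;
Step S170: schedule the multiple model tuning tasks to corresponding physical nodes, so that each physical node executes its model tuning tasks in parallel, obtain multiple optimized candidate model structures, and collect the training results of the multiple optimized candidate model structures.
In one example, if the configuration information entered by the user contains tuning information for the candidate model structures, the processor obtains the tuning information and sends a request to create model tuning tasks to the API server. After the API server receives the request, the processor observes the model-tuning-task creation event from the API server and, according to the tuning information, creates multiple model tuning tasks from the multiple candidate model structures and corresponding parameters. For example, it builds a complete full graph from subgraphs, or uses the complete full graph entered by the user for model optimization and evaluation.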
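In one implementation, as shown in FIG. 5, step 140 includes:
Step 141: calculate multiple resource type weights corresponding to the multiple training tasks according to the amount of resources used by all training tasks corresponding to the search task;
Step 142: determine the scheduling order of the multiple training tasks according to the multiple resource weights corresponding to the multiple training tasks;
Step 143: schedule the multiple training tasks to corresponding physical nodes according to the scheduling order.
In one example, when executing a scheduling task, the scheduler can use the following formula to calculate the resources in use by all search tasks:
Figure PCTCN2021080497-appb-000001
where Resource denotes the resource type, which may be memory, CPU, hardware accelerator resources, and so on; i is a search task; j is a training task of search task i; k is the maximum number of searches defined by search task i; and $Resource_{job_j}$ denotes the resources used by the j-th task.
The scheduler calculates the resource weight of each training task j:
$W_j = w_{CPU} \cdot CPU_i + w_{Mem} \cdot Mem_i + w_{GPU} \cdot GPU_i$
where $w_{CPU}$, $w_{Mem}$, and $w_{GPU}$ are the resource weights of CPU, memory, and GPU respectively, each less than 1; $CPU_i$ is the amount of CPU resources used by search task i, $Mem_i$ is the amount of memory resources used by search task i, and $GPU_i$ is the amount of GPU resources used by search task i.
When resources are insufficient to run all the training tasks in the queue, the scheduler decides which task to run based on this weighting formula. When the resources needed by the highest-weight task cannot be satisfied, the scheduler moves on to the training or tuning task with the second-highest weight, and so on.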
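In one implementation, as shown in FIG. 4, the training result includes the actual number of trainings and the actual value of the training index, and the preset training condition includes the maximum number of searches and the expected value of the training index; step S150 includes:
Step S151: when the actual number of trainings is greater than or equal to the maximum number of searches, or the actual value of the training index is greater than or equal to the expected value of the training index, obtain the optimal network structure and corresponding parameters, which are used as the search result.
In one example, the training index is a performance evaluation index such as accuracy or speed, and includes error, precision, variance, and so on.
In one implementation, as shown in FIG. 4, the method further includes:
Step S180: create a corresponding temporary task for each training task, the temporary task being used to store the training result in the data storage.
In one example, after scheduling is complete, the training tasks are executed. During execution, the processor creates a temporary task for each training task, in the same namespace as the training task. The temporary task obtains the logs of the training task or model tuning task from the API server through the client of the K8S system and stores the training results in the data storage. For example, from the streamed logs of a training task, the training index is extracted with a processing mechanism specific to each training framework, and the temporary task feeds the training index back to the data storage.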
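The log-scraping role of the temporary task can be illustrated with a small parser. The `name=value` log format and the function name are assumptions made here; the source only says that the extraction mechanism differs per training framework.

```python
import re

# Matches "name=value" pairs such as "accuracy=0.83" (hypothetical log format).
METRIC_PAIR = re.compile(r"(\w+)\s*=\s*([-+]?\d+(?:\.\d+)?)")


def collect_indices(log_lines, wanted):
    """Return the last reported value of each wanted training index."""
    latest = {}
    for line in log_lines:
        for name, value in METRIC_PAIR.findall(line):
            if name in wanted:
                latest[name] = float(value)
    return latest


logs = [
    "epoch 1: loss=0.90 accuracy=0.61",
    "epoch 2: loss=0.45 accuracy=0.83",
    "training finished",
]
print(collect_indices(logs, {"accuracy"}))
```

Keeping only the latest value per index suits streamed logs, where later epochs supersede earlier ones; the result is what the temporary task would write to the data storage.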
在另一种具体实施方式中,如图6所示,提供一种神经网络架构搜索装置,包括:
配置信息获取模块110,用于获取搜索任务的配置信息;
候选模型结构生成模块120,用于根据配置信息生成多个候选模型结构以及对应的参数;
训练任务创建模块130,用于根据多个候选模型结构以及参数创建多个训练任务;
训练任务调度模块140,用于将多个训练任务调度至对应的物理节点,以使各物理节点并行执行对应的训练任务,并统计多个训练任务的训练结果;
搜索结果生成模块150,用于在训练结果满足配置信息中的预设训练条件的情况下,得到搜索任务对应的搜索结果。
在一种实施方式中,如图7所示,还包括:
迭代搜索触发模块160,用于在训练结果未满足配置信息中的预设训练条件的情况下,触发候选模型结构生成模块120执行根据配置信息生成多个候选模型结构以及对应的参数的步骤。
在一种实施方式中,配置信息是用户输入的配置信息,配置信息包括搜索任务信息和训练任务信息。
在一种实施方式中,如图7所示,候选模型结构生成模块120,包括:
算法实例创建子模块121,用于根据搜索任务信息创建对应的算法实例;
候选模型结构生成子模块122,用于将训练任务信息加载至算法实例中,生成多个候选模型结构以及对应的参数。
在一种实施方式中,搜索任务信息包括搜索空间和传统模型结构的搜索算法,算法实例创建子模块121,包括:
第一算法实例创建单元,用于根据搜索空间和传统模型结构的搜索算法,创建对应的算法实例。
在一种实施方式中,搜索任务信息包括搜索空间和单次模型结构的搜索算法,算法实例创建子模块121,还包括:
第二算法实例创建单元,用于根据搜索空间和单次模型结构的搜索算法,创建对应的算法实例。
在一种实施方式中,如图7所示,还包括:
模型调优任务创建模块170,用于在候选模型结构需要调优的情况下,根据多个候选模型结构以及对应的参数创建多个模型调优任务;
模型调优任务调度模块180,用于将多个模型调优任务调度至对应的物理节点,以使各物理节点并行执行对应的模型调优任务,得到多个优化后的候选模型结构,并统计多个优化后的候选模型结构的训练结果。
在一种实施方式中,如图8所示,训练任务调度模块140,包括:
资源类型权重计算子模块141,用于根据搜索任务对应的全部训练任务所使用的资源数量,计算多个训练任务对应的多个资源类型权重;
调度顺序确定子模块142,用于根据多个训练任务对应的多个资源权重, 确定多个训练任务的调度顺序;
训练任务调度子模块143,用于根据调度顺序将多个训练任务调度至对应的物理节点。
在一种实施方式中,如图7所示,训练结果包括实际训练次数和训练指标的实际值,预设训练条件包括最大搜索次数和训练指标的期望值,搜索结果生成模块150,包括:
搜索结果生成子模块151,用于在实际训练次数大于或等于最大搜索次数,或训练指标的实际值大于或等于训练指标的期望值的情况下,得到最优网络结构以及对应的参数,最优网络结构以及对应的参数作为搜索结果。
在一种实施方式中,如图7所示,还包括:
临时任务创建模块190,用于针对每个训练任务创建对应的临时任务,临时任务用于将训练结果存储至数据存储器中。
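For the functions of the modules in the apparatuses of the embodiments of the present application, refer to the corresponding descriptions in the method above; they are not repeated here.
According to the embodiments of the present application, the present application further provides an electronic device and a readable storage medium.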
如图9所示,是根据本申请实施例的一种神经网络架构搜索方法的电子设备的框图。电子设备旨在表示各种形式的数字计算机,诸如,膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置,诸如,个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例,并且不意在限制本文中描述的和/或者要求的本申请的实现。
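As shown in FIG. 9, the electronic device includes one or more processors 901, a memory 902, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used with multiple memories, if needed. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). FIG. 9 takes one processor 901 as an example.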
存储器902即为本申请所提供的非瞬时计算机可读存储介质。其中,所述存储器存储有可由至少一个处理器执行的指令,以使所述至少一个处理器执行本申请所提供的一种神经网络架构搜索方法。本申请的非瞬时计算机可读存储介质存储计算机指令,该计算机指令用于使计算机执行本申请所提供的一种神经网络架构搜索方法。
存储器902作为一种非瞬时计算机可读存储介质,可用于存储非瞬时软件程序、非瞬时计算机可执行程序以及模块,如本申请实施例中的一种神经网络架构搜索方法对应的程序指令/模块(例如,附图6所示的配置信息获取模块110、候选模型结构生成模块120、训练任务创建模块130、训练任务调度模块140、搜索结果生成模块150)。处理器901通过运行存储在存储器902中的非瞬时软件程序、指令以及模块,从而执行服务器的各种功能应用以及数据处理,即实现上述方法实施例中的一种神经网络架构搜索方法。
存储器902可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储根据一种神经网络架构搜索方法的电子设备的使用所创建的数据等。此外,存储器902可以包括高速随机存取存储器,还可以包括非瞬时存储器,例如至少一个磁盘存储器件、闪存器件、或其他非瞬时固态存储器件。在一些实施例中,存储器902可选包括相对于处理器901远程设置的存储器,这些远程存储器可以通过网络连接至一种神经网络架构搜索方法的电子设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
一种神经网络架构搜索方法的电子设备还可以包括:输入装置903和输出装置904。处理器901、存储器902、输入装置903和输出装置904可以通过总线或者其他方式连接,图9中以通过总线连接为例。
输入装置903可接收输入的数字或字符信息,以及产生与一种神经网络架构搜索方法的电子设备的用户设置以及功能控制有关的键信号输入, 例如触摸屏、小键盘、鼠标、轨迹板、触摸板、指示杆、一个或者多个鼠标按钮、轨迹球、操纵杆等输入装置。输出装置904可以包括显示设备、辅助照明装置(例如,LED)和触觉反馈装置(例如,振动电机)等。该显示设备可以包括但不限于,液晶显示器(LCD)、发光二极管(LED)显示器和等离子体显示器。在一些实施方式中,显示设备可以是触摸屏。
此处描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、专用ASIC(专用集成电路)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括:实施在一个或者多个计算机程序中,该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释,该可编程处理器可以是专用或者通用可编程处理器,可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令,并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。
这些计算程序(也称作程序、软件、软件应用、或者代码)包括可编程处理器的机器指令,并且可以利用高级过程和/或面向对象的编程语言、和/或汇编/机器语言来实施这些计算程序。如本文使用的,术语“机器可读介质”和“计算机可读介质”指的是用于将机器指令和/或数据提供给可编程处理器的任何计算机程序产品、设备、和/或装置(例如,磁盘、光盘、存储器、可编程逻辑装置(PLD)),包括,接收作为机器可读信号的机器指令的机器可读介质。术语“机器可读信号”指的是用于将机器指令和/或数据提供给可编程处理器的任何信号。
为了提供与用户的交互,可以在计算机上实施此处描述的系统和技术,该计算机具有:用于向用户显示信息的显示装置(例如,CRT(阴极射线管)或者LCD(液晶显示器)监视器);以及键盘和指向装置(例如,鼠标或者轨迹球),用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互;例如,提供给用户的反馈可以是任何形式的传感反馈(例如,视觉反馈、听觉反馈、或者触觉反馈);并且可以用任何形式(包括声输入、语音输入或者、触觉输入)来接收来自用户的输入。
可以将此处描述的系统和技术实施在包括后台部件的计算系统(例如, 作为数据服务器)、或者包括中间件部件的计算系统(例如,应用服务器)、或者包括前端部件的计算系统(例如,具有图形用户界面或者网络浏览器的用户计算机,用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信(例如,通信网络)来将系统的部件相互连接。通信网络的示例包括:局域网(LAN)、广域网(WAN)和互联网。
计算机系统可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。
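It should be understood that steps may be reordered, added, or removed using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present application can be achieved; no limitation is imposed here.
The specific implementations above do not limit the scope of protection of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.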

Claims (22)

  1. 一种神经网络架构搜索方法,其特征在于,包括:
    获取搜索任务的配置信息,根据所述配置信息生成多个候选模型结构以及对应的参数;
    根据所述多个候选模型结构以及参数创建多个训练任务;
    将多个训练任务调度至对应的物理节点,以使各物理节点并行执行对应的训练任务,并统计所述多个训练任务的训练结果;
    在所述训练结果满足所述配置信息中的预设训练条件的情况下,得到所述搜索任务对应的搜索结果。
  2. 根据权利要求1所述的方法,其特征在于,还包括:
    在所述训练结果未满足所述配置信息中的预设训练条件的情况下,返回执行所述根据所述配置信息生成多个候选模型结构以及对应的参数的步骤。
  3. 根据权利要求1所述的方法,其特征在于,所述配置信息是用户输入的配置信息,所述配置信息包括搜索任务信息和训练任务信息。
  4. 根据权利要求3所述的方法,其特征在于,根据所述配置信息生成多个候选模型结构以及对应的参数,包括:
    根据所述搜索任务信息创建对应的算法实例;
    将所述训练任务信息加载至所述算法实例中,生成所述多个候选模型结构以及对应的参数。
  5. 根据权利要求4所述的方法,其特征在于,所述搜索任务信息包括搜索空间和传统模型结构的搜索算法;根据所述搜索任务信息创建对应的算法实例,包括:
    根据所述搜索空间和所述传统模型结构的搜索算法,创建对应的算法实例。
  6. 根据权利要求4所述的方法,其特征在于,所述搜索任务信息包括搜索空间和单次模型结构的搜索算法;根据所述搜索任务信息创建对应的算法实例,包括:
    根据所述搜索空间和所述单次模型结构的搜索算法,创建对应的算法实例。
  7. 根据权利要求6所述的方法,其特征在于,还包括:
    在所述候选模型结构需要调优的情况下,根据所述多个候选模型结构以及对应的参数创建多个模型调优任务;
    将所述多个模型调优任务调度至对应的物理节点,以使各物理节点并行执行对应的模型调优任务,得到多个优化后的候选模型结构,并统计所述多个优化后的候选模型结构的训练结果。
  8. 根据权利要求1所述的方法,其特征在于,将多个训练任务调度至对应的物理节点,包括:
    根据所述搜索任务对应的全部训练任务所使用的资源数量,计算所述多个训练任务对应的多个资源类型权重;
    根据所述多个训练任务对应的多个资源权重,确定所述多个训练任务的调度顺序;
    根据所述调度顺序将所述多个训练任务调度至对应的物理节点。
  9. 根据权利要求1所述的方法,其特征在于,所述训练结果包括实际训练次数和训练指标的实际值,所述预设训练条件包括所述最大搜索次数和所述训练指标的期望值;在所述多个训练任务的训练结果满足所述配置信息中的预设训练条件的情况下,得到所述搜索任务对应的搜索结果,包括:
    在所述实际训练次数大于或等于所述最大搜索次数,或所述训练指标的实际值大于或等于所述训练指标的期望值的情况下,得到最优网络结构以及对应的参数,所述最优网络结构以及对应的参数作为所述搜索结果。
  10. 根据权利要求1所述的方法,其特征在于,还包括:
    针对每个训练任务创建对应的临时任务,所述临时任务用于将所述训练结果存储至数据存储器中。
  11. 一种神经网络架构搜索装置,其特征在于,包括:
    配置信息获取模块,用于获取搜索任务的配置信息;
    候选模型结构生成模块,用于根据所述配置信息生成多个候选模型结构以及对应的参数;
    训练任务创建模块,用于根据所述多个候选模型结构以及参数创建多个训练任务;
    训练任务调度模块,用于将多个训练任务调度至对应的物理节点,以使各物理节点并行执行对应的训练任务,并统计所述多个训练任务的训练结果;
    搜索结果生成模块,用于在所述训练结果满足所述配置信息中的预设训练条件的情况下,得到所述搜索任务对应的搜索结果。
  12. 根据权利要求11所述的装置,其特征在于,还包括:
    迭代搜索触发模块,用于在所述训练结果未满足所述配置信息中的预设训练条件的情况下,触发所述候选模型结构生成模块执行所述根据所述配置信息生成多个候选模型结构以及对应的参数的步骤。
  13. 根据权利要求11所述的装置,其特征在于,所述配置信息是用户输入的配置信息,所述配置信息包括搜索任务信息和训练任务信息。
  14. 根据权利要求13所述的装置,其特征在于,所述候选模型结构生成模块,包括:
    算法实例创建子模块,根据所述搜索任务信息创建对应的算法实例;
    候选模型结构生成子模块,用于将所述训练任务信息加载至所述算法实例中,生成所述多个候选模型结构以及对应的参数。
  15. 根据权利要求14所述的装置,其特征在于,所述搜索任务信息包括搜索空间和传统模型结构的搜索算法,所述算法实例创建子模块,包括:
    第一算法实例创建单元,用于根据所述搜索空间和所述传统模型结构的搜索算法,创建对应的算法实例。
  16. 根据权利要求14所述的装置,其特征在于,所述搜索任务信息包括搜索空间和单次模型结构的搜索算法,所述算法实例创建子模块,还包括:
    第二算法实例创建单元,用于根据所述搜索空间和所述单次模型结构的搜索算法,创建对应的算法实例。
  17. 根据权利要求16所述的装置,其特征在于,还包括:
    模型调优任务创建模块,用于在所述候选模型结构需要调优的情况下,根据所述多个候选模型结构以及对应的参数创建多个模型调优任务;
    模型调优任务调度模块,用于将所述多个模型调优任务调度至对应的物理节点,以使各物理节点并行执行对应的模型调优任务,得到多个优化后的候选模型结构,并统计所述多个优化后的候选模型结构的训练结果。
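  18. The device according to claim 11, wherein the training task scheduling module comprises:
    a resource type weight calculation submodule, configured to calculate a plurality of resource type weights corresponding to the plurality of training tasks according to the amount of resources used by all training tasks of the search task;
    a scheduling order determination submodule, configured to determine a scheduling order of the plurality of training tasks according to the plurality of resource weights corresponding to the plurality of training tasks; and
    a training task scheduling submodule, configured to schedule the plurality of training tasks to the corresponding physical nodes according to the scheduling order.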
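  19. The device according to claim 11, wherein the training results comprise an actual number of training runs and an actual value of a training metric, the preset training condition comprises a maximum number of searches and an expected value of the training metric, and the search result generation module comprises:
    a search result generation submodule, configured to, when the actual number of training runs is greater than or equal to the maximum number of searches, or the actual value of the training metric is greater than or equal to the expected value of the training metric, obtain an optimal network structure and corresponding parameters, the optimal network structure and corresponding parameters serving as the search result.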
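  20. The device according to claim 11, further comprising:
    a temporary task creation module, configured to create a corresponding temporary task for each training task, the temporary task being used to store the training result in a data store.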
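  21. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-10.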
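  22. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method according to any one of claims 1-10.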
PCT/CN2021/080497 2020-08-18 2021-03-12 Neural network architecture search method and device WO2022037039A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010829782.5 2020-08-18
CN202010829782.5A CN112000450A (zh) 2020-08-18 2020-08-18 Neural network architecture search method and device

Publications (1)

Publication Number Publication Date
WO2022037039A1 (zh) 2022-02-24

Family

ID=73472626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080497 WO2022037039A1 (zh) 2020-08-18 2021-03-12 Neural network architecture search method and device

Country Status (3)

Country Link
CN (1) CN112000450A (zh)
TW (1) TWI773100B (zh)
WO (1) WO2022037039A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
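CN112000450A (zh) * 2020-08-18 2020-11-27 中国银联股份有限公司 Neural network architecture search method and device
CN112819138A (zh) * 2021-01-26 2021-05-18 上海依图网络科技有限公司 Method and device for optimizing an image neural network structure
CN114089889B (zh) * 2021-02-09 2024-04-09 京东科技控股股份有限公司 Model training method, device, and storage medium
CN112965803A (zh) * 2021-03-22 2021-06-15 共达地创新技术(深圳)有限公司 AI model generation method and electronic device
CN115563063A (zh) * 2021-07-01 2023-01-03 马上消费金融股份有限公司 Model construction method, device, and electronic device
CN115220899A (zh) * 2022-08-20 2022-10-21 抖音视界有限公司 Scheduling method and device for model training tasks, and electronic device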

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9390370B2 (en) * 2012-08-28 2016-07-12 International Business Machines Corporation Training deep neural network acoustic models using distributed hessian-free optimization
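CN110543944A (zh) * 2019-09-11 2019-12-06 北京百度网讯科技有限公司 Neural network structure search method, device, electronic device, and medium
CN111063000A (zh) * 2019-12-15 2020-04-24 中国科学院深圳先进技术研究院 Fast magnetic resonance imaging method and device based on neural network structure search
CN111324630A (zh) * 2020-03-04 2020-06-23 中科弘云科技(北京)有限公司 MPI-based parallelization method and device for neural network architecture search
CN111325356A (zh) * 2019-12-10 2020-06-23 四川大学 Distributed training system and training method for neural network search based on evolutionary computation
CN112000450A (zh) * 2020-08-18 2020-11-27 中国银联股份有限公司 Neural network architecture search method and device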

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111198754B (zh) * 2018-11-19 2023-07-14 中移(杭州)信息技术有限公司 Task scheduling method and device
CN111325338B (zh) * 2020-02-12 2023-05-05 暗物智能科技(广州)有限公司 Neural network structure evaluation model construction and neural network structure search method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
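CN114640695A (zh) * 2022-04-24 2022-06-17 上海交通大学 Method for efficient transmission of high-frequency time-series data in a smart factory based on long-sequence dual prediction and Informer
CN114640695B (zh) * 2022-04-24 2023-04-07 上海交通大学 Method for efficient transmission of high-frequency time-series data in a smart factory based on long-sequence dual prediction and Informer
WO2023221371A1 (zh) * 2022-05-19 2023-11-23 北京百度网讯科技有限公司 Task search method and device, server, and storage medium
CN116954873A (zh) * 2023-09-21 2023-10-27 浪潮电子信息产业股份有限公司 Heterogeneous computing system and computing power node selection method, device, equipment, and medium thereof
CN116954873B (zh) * 2023-09-21 2024-01-23 浪潮电子信息产业股份有限公司 Heterogeneous computing system and computing power node selection method, device, equipment, and medium thereof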

Also Published As

Publication number Publication date
TWI773100B (zh) 2022-08-01
TW202209152A (zh) 2022-03-01
CN112000450A (zh) 2020-11-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 21857143; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: pct application non-entry in european phase Ref document number: 21857143; Country of ref document: EP; Kind code of ref document: A1