WO2021000244A1

WO2021000244A1 - Hyperparameter recommendation for machine learning method

Info

Publication number: WO2021000244A1
Application number: PCT/CN2019/094317
Authority: WO
Inventors: Yaliang LI; Wang Lin; Facai YAN; Bolin Ding; Weidan KONG; Zhaofu Li; Wei Lin; Jingren Zhou
Original assignee: Alibaba Group Holding Limited
Priority date: 2019-07-02
Filing date: 2019-07-02
Publication date: 2021-01-07
Also published as: CN114341894A

Abstract

Different hyperparameter combinations may be initiated or selected as an input hyperparameter combination set. Corresponding learning model instances are trained through a machine learning method using a training subset set, with each learning model instance corresponding to each hyperparameter combination in the input hyperparameter combination set. Performance results of the respective learning model instances are obtained using a testing set. Based on the performance results, one or more hyperparameter combinations are selected according to at least one selection rule of a plurality of different selection rules, and added into an output hyperparameter combination set. The foregoing process is repeated with the one or more selected hyperparameter combinations being set as the input hyperparameter combination set, until a termination condition is met. A hyperparameter combination having the best performance result in the output hyperparameter combination set is then used as a final or optimal hyperparameter combination for the machine learning method.

Description

Hyperparameter Recommendation for Machine Learning Method

BACKGROUND

With the growth of machine learning methods, machine learning has become widely adopted in a variety of applications, including, for example, social media services, spam and malware filtering, search engine result refinement, and product recommendation. A number of hyperparameters, which are parameters usually fixed before training of a learning model for a particular application starts, need to be defined beforehand. These hyperparameters affect and control the behavior (such as the total time needed for training) and the performance (e.g., an accuracy rate of the learning model) of a machine learning method, and are therefore critical to the success of the machine learning method. If the number of hyperparameters is small and the respective ranges of the hyperparameters are limited, an exhaustive search or a grid search may be used to determine optimal values of the hyperparameters.

However, with increases in the size and complexity of data, finding optimal values of hyperparameters has become increasingly time‐consuming and resource‐intensive, if not impossible. Furthermore, the increasing complexity of machine learning methods and/or models may also increase the number and complexity of the hyperparameter combinations that need to be searched, which further adds to the difficulty of obtaining optimal hyperparameter values for a corresponding machine learning method.

SUMMARY

This summary introduces simplified concepts of hyperparameter recommendation, which will be further described below in the Detailed Description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in limiting the scope of the claimed subject matter.

This application describes example implementations of hyperparameter recommendation for a machine learning method. In implementations, a plurality of different hyperparameter combinations may be initiated or selected as an input hyperparameter combination set. Corresponding learning model instances are trained through the machine learning method using a subset of a training set, with each learning model instance corresponding to each hyperparameter combination in the input hyperparameter combination set. Performance results of the respective learning model instances are obtained using a testing set. Based on the performance results, one or more hyperparameter combinations are selected according to at least one selection rule of a plurality of different selection rules, and added into an output hyperparameter combination set. The foregoing process is repeated with the one or more selected hyperparameter combinations being set as the input hyperparameter combination set, until a termination condition is met. A hyperparameter combination having the best performance result in the output hyperparameter combination set is then used as a final or optimal hyperparameter combination for the machine learning method.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is set forth with reference to the accompanying figures. In the figures, the left‐most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.

FIG. 1 illustrates an example environment in which a hyperparameter recommendation system may be used.

FIG. 2 illustrates an example hyperparameter recommendation system in more detail.

FIG. 3 illustrates a first example method of determining and recommending a hyperparameter combination.

FIG. 4 illustrates a second example method of determining and recommending a hyperparameter combination.

DETAILED DESCRIPTION

Overview

As noted above, existing technologies require determination of a number of hyperparameters prior to starting a machine learning process. Due to the large scale and complexity of the data involved in training and testing, the complexity of a machine learning method, and the large number and vast ranges of hyperparameters, determining a suitable or optimal set of hyperparameters (or hyperparameter combination) requires a large amount of time and a huge amount of computing and storage resources in the existing technologies. With limited resources, this may result in obtaining a sub‐optimal hyperparameter combination for a machine learning method.

This disclosure describes an example hyperparameter recommendation system. The hyperparameter recommendation system samples a subset of a training set that changes (e.g., the subset is progressively increased in size, etc.) after each iteration of a machine learning process. The hyperparameter recommendation system also adopts a plurality of different selection rules for outputting one or more hyperparameter combinations that are obtained at the end of an iteration of the machine learning process, and uses the one or more hyperparameter combinations as a new input for a next iteration of the machine learning process, thereby allowing exploration and exploitation of potential hyperparameter combinations in a balanced manner.

Specifically, given a particular machine learning method, the hyperparameter recommendation system may first obtain a plurality of different hyperparameter combinations (or configurations) as an input for a first iteration of a machine learning process. A hyperparameter is a parameter that is fixed during a machine learning process, and whose value is used for controlling the behavior of the machine learning process. Examples of hyperparameters include, but are not limited to, parameters associated with a machine learning method or algorithm (such as a learning rate, a batch size, etc.), and parameters associated with a learning model (such as the number of hidden layers and the number of nodes in each hidden layer in a neural network model, for example). In implementations, hyperparameters are different from model parameters of a learning model (e.g., weights, etc., of a neural network model) which are to be determined when the learning model is trained, i.e., during a machine learning process. In implementations, depending on a type of a hyperparameter, a value of the hyperparameter may take any form, such as a real value (e.g., a learning rate), an integral value (e.g., the number of layers), a binary value (e.g., whether early stopping is used or not), or a categorical value (e.g., a choice of an optimizer), etc.

During the first iteration of the machine learning process, the hyperparameter recommendation system may train learning model instances through the machine learning method with each hyperparameter combination of the plurality of different hyperparameter combinations using a sampling subset of a training set (also called a training sampling subset). The hyperparameter recommendation system may obtain and evaluate performance results of the learning model instances for the plurality of different hyperparameter combinations using a testing set.

In implementations, the hyperparameter recommendation system may then obtain or select one or more hyperparameter combinations from among the plurality of different hyperparameter combinations based on the performance results through one or more selection rules. In implementations, the one or more selection rules may be chosen strategically (e.g., probabilistically) from a plurality of selection rules. The plurality of selection rules may include a first selection rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the plurality of different hyperparameter combinations, a second rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the plurality of different hyperparameter combinations, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations, and a third rule of randomly selecting a third predetermined number of hyperparameter combinations from the plurality of different hyperparameter combinations.
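As a minimal sketch of choosing a selection rule probabilistically, the following Python fragment draws one rule name per iteration from a weighted distribution. The rule names and the weights are hypothetical; the description above does not specify any particular probabilities.

```python
import random

# Hypothetical rule names and weights (assumptions; not given in the description).
RULE_WEIGHTS = {"top_k": 0.5, "top_k_mutated": 0.3, "random_k": 0.2}


def choose_rule(rng: random.Random, weights=RULE_WEIGHTS) -> str:
    """Draw one selection rule name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Draw a rule for each of 1000 simulated iterations and count the outcomes.
rng = random.Random(0)
counts = {name: 0 for name in RULE_WEIGHTS}
for _ in range(1000):
    counts[choose_rule(rng)] += 1
```

Giving the exploitative rule (top performers) a larger weight than the purely random rule is one possible way to balance exploitation against exploration; other weightings are equally compatible with the description.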

The hyperparameter recommendation system may then add the one or more hyperparameter combinations into an output hyperparameter combination set. In implementations, the hyperparameter recommendation system may determine whether a termination condition is met.

If the termination condition is not met, the hyperparameter recommendation system may use the one or more hyperparameter combinations as an input for a next iteration (e.g., a second iteration) of the machine learning process, and the next iteration of the machine learning process is repeatedly performed as described above until the termination condition is met.

In implementations, after the termination condition is met, the hyperparameter recommendation system may determine a hyperparameter combination having the best performance result in the output hyperparameter combination set as a final hyperparameter combination for the machine learning method.
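The iteration described above may be sketched in Python as follows. This is an illustrative outline only: the function names `recommend`, `evaluate`, and `select` are hypothetical, the termination condition is assumed to be a fixed number of iterations, and `select` is assumed to return scored members of the current input set.

```python
from typing import Callable, Dict, List, Tuple

Combo = Dict[str, object]
Scored = Tuple[float, Combo]


def recommend(
    initial: List[Combo],
    evaluate: Callable[[Combo, int], float],
    select: Callable[[List[Scored]], List[Scored]],
    max_iterations: int,
) -> Scored:
    """Repeat evaluate/select and return the best (score, combination) in the output set."""
    input_set = list(initial)
    output_set: List[Scored] = []
    for iteration in range(max_iterations):  # assumed termination condition
        if not input_set:
            break
        # Train and test one model instance per combination (delegated to `evaluate`).
        scored = [(evaluate(combo, iteration), combo) for combo in input_set]
        selected = select(scored)
        output_set.extend(selected)
        # The selected combinations become the input of the next iteration.
        input_set = [combo for _, combo in selected]
    # Raises ValueError if nothing was ever selected.
    return max(output_set, key=lambda pair: pair[0])


def toy_evaluate(combo: Combo, iteration: int) -> float:
    # Stand-in for training on a subset and scoring on a testing set.
    return 1.0 - abs(combo["learning_rate"] - 0.1)


def keep_top_two(scored: List[Scored]) -> List[Scored]:
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:2]


candidates = [{"learning_rate": lr} for lr in (0.001, 0.01, 0.1, 0.5)]
best_score, best_combo = recommend(candidates, toy_evaluate, keep_top_two, max_iterations=3)
```

In this toy run the combination with a learning rate of 0.1 is returned, since the stand-in evaluation function peaks at that value.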

In the examples described herein, the described hyperparameter recommendation system progressively changes a size of a subset of a training set used in each iteration of a machine learning process, strategically (e.g., probabilistically) adopts one or more different selection rules for outputting one or more hyperparameter combinations that are obtained at the end of an iteration of the machine learning process, and uses the one or more hyperparameter combinations as an input for a next iteration of the machine learning process, thereby allowing exploration and exploitation of potential hyperparameter combinations in a balanced manner.

In implementations, the described hyperparameter recommendation system greatly reduces the running time and the number of iterations for obtaining a suitable or optimal hyperparameter combination for recommending to a machine learning algorithm and/or a machine learning model as compared with other hyperparameter optimization methods (such as the Bayesian optimization algorithm), thus reducing the amount of computing and storage resources used for obtaining the suitable or optimal hyperparameter combination. As an example, a comparison between the disclosed method adopted by the current hyperparameter recommendation system and the Bayesian optimization algorithm using CIFAR‐10 is given in the following two tables. The following results are obtained by averaging ten independent experiments.

Table 1

Table 2

As can be seen from the above tables, the disclosed method obtains accuracy rates that are similar to those of the Bayesian optimization algorithm with much less time and fewer tests or iterations. As compared to existing hyperparameter optimization methods, the disclosed method thus greatly improves the speed of obtaining a suitable or optimal hyperparameter combination for recommendation to a machine learning algorithm and/or a machine learning model on the one hand, and greatly reduces the computing and storage resources used for obtaining the suitable or optimal hyperparameter combination on the other hand.

Furthermore, functions described herein to be performed by the hyperparameter recommendation system may be performed by multiple separate units or services. For example, an input service may obtain a set of hyperparameter combinations as an input for an iteration of a machine learning process, while a training service may train learning models through respective instances of a machine learning method with the set of hyperparameter combinations using a training sampling subset that is increased in size after each iteration of the machine learning process. A testing service may obtain and evaluate performance results of the learning models for the set of hyperparameter combinations using a testing set, while yet a selection service may select one or more hyperparameter combinations from among the set of hyperparameter combinations based on the performance results through one or more selection rules. In implementations, a determination service may determine whether a termination condition is met, and send the one or more hyperparameter combinations as a new input for a next iteration of the machine learning process to the input service, or send a final output hyperparameter combination set to an output service, which may determine and output a hyperparameter combination having the best performance result in the final output hyperparameter combination set as a recommended or optimal hyperparameter combination to be used by the machine learning method.

Moreover, although in the examples described herein, the hyperparameter recommendation system may be implemented as software and/or hardware installed in a single device, in other examples, the hyperparameter recommendation system may be implemented and distributed in multiple devices or as services provided in one or more servers over a network and/or in a cloud computing architecture.

The application describes multiple and varied implementations. The following section describes an example framework that is suitable for practicing various implementations. Next, the application describes example systems, devices, and processes for implementing a hyperparameter recommendation system.

Example Environment

FIG. 1 illustrates an example environment 100 usable to implement a hyperparameter recommendation system. The environment 100 may include a hyperparameter recommendation system 102. In this example, the hyperparameter recommendation system 102 is described to include one or more servers 104. In some instances, the hyperparameter recommendation system 102 may be part of the one or more servers 104, or may be included in and/or distributed among the one or more servers 104, which may communicate data with one another via a network 106. Additionally or alternatively, in some instances, the functions of the hyperparameter recommendation system 102 may be included in and/or distributed among the one or more servers 104. For example, a first server of the one or more servers 104 may include part of the functions of the hyperparameter recommendation system 102, while other functions of the hyperparameter recommendation system 102 may be included in a second server of the one or more servers 104. Furthermore, in some implementations, some or all the functions of the hyperparameter recommendation system 102 may be included in a cloud computing system or architecture, and may be provided as services for determining or recommending suitable or optimal hyperparameter combinations or configurations for machine learning methods.

In implementations, the hyperparameter recommendation system 102 may be part of a client device 108, e.g., software and/or hardware components of the client device 108. In some instances, the hyperparameter recommendation system 102 may include a client device 108. In some implementations, some or all the functions of the hyperparameter recommendation system 102 may be included in the one or more servers 104, the client device 108, and/or a cloud computing system or architecture.

The client device 108 may be implemented as any of a variety of computing devices including, but not limited to, a desktop computer, a notebook or portable computer, a handheld device, a netbook, an Internet appliance, a tablet or slate computer, a mobile device (e.g., a mobile phone, a personal digital assistant, a smart phone, etc.), etc., or a combination thereof.

The network 106 may be a wireless or a wired network, or a combination thereof. The network 106 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet). Examples of such individual networks include, but are not limited to, telephone networks, cable networks, Local Area Networks (LANs), Wide Area Networks (WANs), and Metropolitan Area Networks (MANs). Further, the individual networks may be wireless or wired networks, or a combination thereof. Wired networks may include an electrical carrier connection (such as a communication cable, etc.) and/or an optical carrier or connection (such as an optical fiber connection, etc.). Wireless networks may include, for example, a WiFi network, other radio frequency networks (e.g., Zigbee, etc.), etc.

In implementations, a user 110 may want to obtain a suitable set of hyperparameters for training a learning model using a machine learning method. The user 110 may input information of the machine learning method and the learning model to the hyperparameter recommendation system 102 through the client device 108. In implementations, the user 110 may provide respective names of the machine learning method and the learning model to the hyperparameter recommendation system 102 through the client device 108, and the hyperparameter recommendation system 102 will then search for detailed information such as what hyperparameters exist for the machine learning method and/or the learning model, types of the hyperparameters, etc. In implementations, the user 110 may directly provide these pieces of detailed information to the hyperparameter recommendation system 102. After receiving the information of the machine learning method and the learning model, the hyperparameter recommendation system 102 may perform a determination of a hyperparameter combination or configuration that is suitable or optimal for the machine learning method and the learning model, and return or recommend such suitable or optimal hyperparameter combination or configuration to the client device 108 for presentation to the user 110, or for storage by the client device 108.

Example Hyperparameter Recommendation System

FIG. 2 illustrates the hyperparameter recommendation system 102 in more detail. In implementations, the hyperparameter recommendation system 102 may include, but is not limited to, one or more processing units 202, memory 204, and program data 206. In implementations, the hyperparameter recommendation system 102 may further include a network interface 208 and an input/output interface 210. Additionally or alternatively, some or all of the functionalities of the hyperparameter recommendation system 102 may be implemented using an ASIC (i.e., Application‐Specific Integrated Circuit), an FPGA (i.e., Field‐Programmable Gate Array), or other hardware provided in the hyperparameter recommendation system 102.

In implementations, the one or more processing units 202 are configured to execute instructions received from the network interface 208, received from the input/output interface 210, and/or stored in the memory 204. In implementations, the one or more processing units 202 may be implemented as one or more hardware processors including, for example, a microprocessor, an application‐specific instruction‐set processor, a physics processing unit (PPU), a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor, etc. Additionally or alternatively, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include field‐programmable gate arrays (FPGAs), application‐specific integrated circuits (ASICs), application‐specific standard products (ASSPs), system‐on‐a‐chip systems (SOCs), complex programmable logic devices (CPLDs), etc.

The memory 204 may include computer‐readable media in a form of volatile memory, such as Random Access Memory (RAM) and/or non‐volatile memory, such as read only memory (ROM) or flash RAM. The memory 204 is an example of computer‐readable media.

The computer‐readable media may include volatile or non‐volatile, and removable or non‐removable, media, which may achieve storage of information using any method or technology. The information may include a computer‐readable instruction, a data structure, a program module or other data. Examples of computer storage media include, but are not limited to, phase‐change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random‐access memory (RAM), read‐only memory (ROM), electronically erasable programmable read‐only memory (EEPROM), quick flash memory or other internal storage technology, compact disk read‐only memory (CD‐ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non‐transmission media, which may be used to store information that may be accessed by a computing device. As defined herein, the computer‐readable media does not include transitory media, such as modulated data signals and carrier waves.

Although in this example, only hardware components are described in the hyperparameter recommendation system 102, in other instances, the hyperparameter recommendation system 102 may further include other hardware components and/or other software components such as program units to execute instructions stored in the memory 204 for performing various operations such as training, testing, evaluation, selection, outputting, etc. For example, the hyperparameter recommendation system 102 may further include a hyperparameter database 212 that stores mapping relationships between identification information (e.g., names) of a plurality of machine learning methods and respective sets of hyperparameters. Additionally or alternatively, the hyperparameter database 212 may store mapping relationships between identification information (e.g., names) of a plurality of learning models and respective sets of hyperparameters. Additionally or alternatively, the hyperparameter database 212 may store a corresponding type and a corresponding scope (or range) of each hyperparameter associated with a machine learning method or a learning model.
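One possible in-memory shape for the mapping relationships held by the hyperparameter database 212 is sketched below in Python. The class name `HyperparameterSpec`, the function `lookup`, and all entries are hypothetical and serve only to illustrate a name-to-hyperparameter mapping with a type and a scope per hyperparameter.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class HyperparameterSpec:
    name: str
    kind: str        # "real", "integer", "binary", or "categorical"
    scope: Sequence  # (low, high) for numeric kinds, allowed values otherwise


# Hypothetical contents keyed by the name of a machine learning method or learning model.
HYPERPARAMETER_DB: Dict[str, List[HyperparameterSpec]] = {
    "sgd": [
        HyperparameterSpec("learning_rate", "real", (1e-4, 1e-1)),
        HyperparameterSpec("batch_size", "integer", (16, 256)),
    ],
    "mlp": [
        HyperparameterSpec("hidden_layers", "integer", (1, 4)),
        HyperparameterSpec("early_stopping", "binary", (False, True)),
    ],
}


def lookup(method_name: str, model_name: str) -> List[HyperparameterSpec]:
    """Return the specs registered for the method, followed by those for the model."""
    return HYPERPARAMETER_DB.get(method_name, []) + HYPERPARAMETER_DB.get(model_name, [])


specs = lookup("sgd", "mlp")
```

An unknown name simply contributes no hyperparameters in this sketch; a real system might instead report the missing entry to the user.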

Example Methods

FIG. 3 is a schematic diagram depicting an example method of determining and recommending a hyperparameter combination. FIG. 4 is a schematic diagram depicting another example method of determining and recommending a hyperparameter combination. The methods of FIGS. 3 and 4 may, but need not, be implemented in the environment of FIG. 1 and using the system of FIG. 2. For ease of explanation, methods 300 and 400 are described with reference to FIGS. 3 and 4. However, the methods 300 and 400 may alternatively be implemented in other environments and/or using other systems.

The methods 300 and 400 are described in the general context of computer‐executable instructions. Generally, computer‐executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. Furthermore, each of the example methods is illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof. The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or alternate methods. Additionally, individual blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein. In the context of software, the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations. In the context of hardware, some or all of the blocks may represent application specific integrated circuits (ASICs) or other physical components that perform the recited operations.

Referring back to FIG. 3, at block 302, the hyperparameter recommendation system 102 may obtain information of a machine learning method and/or a learning model.

In implementations, the hyperparameter recommendation system 102 may obtain information of a machine learning method and/or a learning model from a user (such as the user 110) through a client device (such as the client device 108). In one instance, the information of the machine learning method and/or the learning model obtained by the hyperparameter recommendation system 102 may include a name of the machine learning method, a name of the learning model, and information of hyperparameters of the machine learning method and/or the learning model that need to be determined.

In instances, the information of the machine learning method and/or the learning model obtained by the hyperparameter recommendation system 102 may include a name of the machine learning method, and a name of the learning model. In this case, the hyperparameter recommendation system 102 may determine or search for information about hyperparameters of the machine learning method and/or the learning model that need to be determined based on the name of the machine learning method and/or the name of the learning model. For example, the hyperparameter recommendation system 102 may search the hyperparameter database 212 for the information of the hyperparameters of the machine learning method and/or the learning model that need to be determined.

At block 304, the hyperparameter recommendation system 102 may determine or obtain a training set for training and a testing set for cross‐validation.

In implementations, the hyperparameter recommendation system 102 may receive information of the training set and the testing set from the user 110 through the client device 108 in a form of address links. For example, the user 110 may provide address links of storage devices that store the training set and the testing set to the hyperparameter recommendation system 102 through the client device 108. In one instance, the storage devices may be storage spaces that are owned and/or maintained by the user 110 or a third‐party provider. In this case, the hyperparameter recommendation system 102 may download the training set and the testing set via the address links. In some instances, the user 110 may have provided and stored the training set and the testing set in the memory 204 of the hyperparameter recommendation system 102. In this case, the address links may be address information of storage locations of the training set and the testing set in the memory 204 of the hyperparameter recommendation system 102.

At block 306, the hyperparameter recommendation system 102 may determine a plurality of different hyperparameter combinations for the machine learning method and/or the learning model, each hyperparameter combination including a respective set of initial values of hyperparameters to be determined.

In implementations, for each hyperparameter combination, the hyperparameter recommendation system 102 may determine initial values of hyperparameters included in the respective hyperparameter combination by randomly selecting values of the hyperparameters within respective allowable scopes (or ranges) of the hyperparameters according to corresponding types of the hyperparameters. In implementations, the hyperparameter recommendation system 102 may obtain information of the types and the scopes of the hyperparameters from the hyperparameter database 212.
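The random initialization described above may look like the following Python sketch, which draws each value within its allowable scope according to its type. The search space shown is hypothetical, and the sketch does not check that the generated combinations are pairwise different.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical search space: hyperparameter name -> (type, scope).
SEARCH_SPACE: Dict[str, Tuple[str, Sequence]] = {
    "learning_rate": ("real", (1e-4, 1e-1)),
    "num_layers": ("integer", (1, 8)),
    "early_stopping": ("binary", (False, True)),
    "optimizer": ("categorical", ("sgd", "adam", "rmsprop")),
}


def sample_value(kind: str, scope: Sequence, rng: random.Random):
    """Draw one value inside the scope, according to the hyperparameter type."""
    if kind == "real":
        return rng.uniform(scope[0], scope[1])
    if kind == "integer":
        return rng.randint(scope[0], scope[1])  # both bounds inclusive
    if kind in ("binary", "categorical"):
        return rng.choice(list(scope))
    raise ValueError(f"unknown hyperparameter type: {kind}")


def initial_combinations(space, count: int, rng: random.Random) -> List[dict]:
    """Build `count` combinations, each with one random value per hyperparameter."""
    return [
        {name: sample_value(kind, scope, rng) for name, (kind, scope) in space.items()}
        for _ in range(count)
    ]


combos = initial_combinations(SEARCH_SPACE, 5, random.Random(42))
```

Uniform sampling is used here for simplicity; for a hyperparameter such as a learning rate, sampling on a logarithmic scale is a common alternative.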

Additionally or alternatively, the hyperparameter recommendation system 102 may deterministically or strategically obtain the plurality of different hyperparameter combinations of the machine learning method and/or the learning model based on prior knowledge. For example, the user 110 or another user may have requested a recommendation for the hyperparameters of the machine learning method and/or the learning model previously. In this case, the hyperparameter recommendation system 102 may use a previous result as the initial values of the plurality of different hyperparameter combinations.

At block 308, the hyperparameter recommendation system 102 may set the initial values of the plurality of different hyperparameter combinations as an input hyperparameter combination set.

At block 310, the hyperparameter recommendation system 102 may perform a machine learning process for a respective learning model instance through the machine learning method for each hyperparameter combination in the input hyperparameter combination set using a sampling subset of the training set (or simply called a training subset), with at least one of a plurality of selection rules being selected and used at the end of the machine learning process to select one or more hyperparameter combinations.

In implementations, an iteration of the machine learning process may include training a respective learning model instance through the machine learning method for each hyperparameter combination in the input hyperparameter combination set using a sampling subset of the training set, evaluating a performance result of the learning model instance for each hyperparameter combination in the input hyperparameter combination set using the testing set, and selecting at least one selection rule of a plurality of selection rules, and selecting one or more hyperparameter combinations from the input hyperparameter combination set based on the at least one selection rule and the performance result for each hyperparameter combination in the input hyperparameter combination set.

In implementations, given values of hyperparameters in each hyperparameter combination of the input hyperparameter combination set, the hyperparameter recommendation system 102 may train and obtain a respective learning model instance by the machine learning method with the given values of the hyperparameters using a subset of the training set.

In implementations, training subsets may be different for different iterations of the machine learning process. For example, a size of a training subset used for training may be progressively or monotonically increased along with the number of iterations of the machine learning process. For example, a size of a training subset of each iteration of the machine learning process may be predefined and stored in a data structure such as a predefined ratio array, e.g., [s_1, s_2, …, s_n], where n is a positive integer, 0 < s_i ≤ 1, and s_i ≤ s_j for i < j, with i and j being positive integers less than or equal to n.
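As an illustration of the ratio array, the following Python sketch checks the two stated constraints (each ratio lies in (0, 1], and the ratios do not decrease) and converts the ratios into subset sizes. The function names and the example values are assumptions made for illustration.

```python
from typing import List, Sequence


def validate_ratios(ratios: Sequence[float]) -> None:
    """Check 0 < s_i <= 1 and s_i <= s_j for i < j."""
    if not ratios:
        raise ValueError("ratio array must not be empty")
    for ratio in ratios:
        if not 0 < ratio <= 1:
            raise ValueError(f"ratio out of range (0, 1]: {ratio}")
    for earlier, later in zip(ratios, ratios[1:]):
        if earlier > later:
            raise ValueError("ratios must be non-decreasing")


def subset_sizes(ratios: Sequence[float], training_set_size: int) -> List[int]:
    """Translate each ratio into a number of training examples (at least one)."""
    validate_ratios(ratios)
    return [max(1, round(ratio * training_set_size)) for ratio in ratios]


# Hypothetical ratios for four iterations over a training set of 50,000 examples.
sizes = subset_sizes([0.1, 0.25, 0.5, 1.0], 50_000)
```

With these example ratios the subset grows from one tenth of the training set in the first iteration to the full training set in the last.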

In implementations, a size of a training subset for an iteration of the machine learning process may depend on the availability of computing and storage resources, a size of a training subset at a previous iteration of the machine learning process, etc. In implementations, depending on which sampling rule (random sampling, additional sampling on top of existing sampling subset, etc.) is employed, a training subset of an iteration of the machine learning process may be completely included, partially included, or not included in a training subset of a subsequent iteration of the machine learning process.
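As a non-limiting sketch of the "additional sampling on top of existing sampling subset" rule, the following code grows a subset of example indices so that the previous subset is completely included in the next one. The helper name `extend_subset` and the index-based representation are hypothetical choices for this illustration.

```python
import random


def extend_subset(training_set, previous_indices, target_size, rng):
    """Grow a subset by sampling extra indices that were not chosen before."""
    chosen = set(previous_indices)
    remaining = [i for i in range(len(training_set)) if i not in chosen]
    # Never request more new indices than are still available.
    extra = max(0, min(target_size - len(chosen), len(remaining)))
    return sorted(chosen | set(rng.sample(remaining, extra)))


rng = random.Random(1)
data = list(range(100))                      # stand-in for real training examples
first = extend_subset(data, [], 10, rng)     # subset for one iteration
second = extend_subset(data, first, 30, rng) # subset for the following iteration
```

Plain random sampling at each iteration would instead yield subsets that overlap only partially or not at all, which corresponds to the other cases described above.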

In implementations, the hyperparameter recommendation system 102 may evaluate the performance of the learning model instance for each hyperparameter combination using the testing set and an evaluation metric. In implementations, the evaluation metric may include, but is not limited to, an accuracy rate. For example, the hyperparameter recommendation system 102 may apply a learning model instance of a hyperparameter combination on the testing set, and compute a percentage (i.e., an accuracy rate) of cases in the testing set that are accurately recognized or determined by the learning model instance. This accuracy rate is then treated by the hyperparameter recommendation system 102 as a performance result of the learning model instance for the associated hyperparameter combination.
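The accuracy rate described above may be computed, for example, as in the following sketch. The callable `predict` stands in for a trained learning model instance, and the layout of the testing set as (features, label) pairs is an assumption of this illustration.

```python
def accuracy_rate(predict, testing_set):
    """Fraction of testing cases for which the prediction equals the label."""
    if not testing_set:
        raise ValueError("testing set must not be empty")
    correct = sum(1 for features, label in testing_set if predict(features) == label)
    return correct / len(testing_set)


# Toy stand-in for a learning model instance: classifies by the sign of the input.
def toy_predict(x):
    return x >= 0


toy_testing_set = [(-2, False), (-1, False), (1, True), (3, False)]
toy_accuracy = accuracy_rate(toy_predict, toy_testing_set)
```

In this toy example three of the four cases are recognized correctly, so the performance result is 0.75.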

In implementations, the plurality of selection rules may include, but are not limited to, a first rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, a second rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations, and a third rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set.
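By way of example and not limitation, the three selection rules may be sketched as follows. The representation of performance results as a list of (hyperparameter combination, score) pairs, the dictionary `search_space`, and the function names are assumptions of this sketch. For the second rule, the sketch redraws a value from the search space, so a redrawn value may occasionally coincide with the previous value.

```python
import random


def rule_top(results, k):
    """First rule: keep the k combinations with the best performance results."""
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return [dict(combo) for combo, _ in ranked[:k]]


def rule_top_mutate(results, k, search_space, n_changes, rng):
    """Second rule: keep the top k, then randomly redraw n_changes values in each."""
    selected = rule_top(results, k)
    for combo in selected:
        for name in rng.sample(sorted(combo), min(n_changes, len(combo))):
            combo[name] = rng.choice(search_space[name])
    return selected


def rule_random(results, k, rng):
    """Third rule: pick k combinations at random, ignoring performance."""
    return [dict(combo) for combo, _ in rng.sample(results, min(k, len(results)))]


# Hypothetical search space and performance results.
search_space = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 8]}
results = [
    ({"lr": 0.1, "depth": 2}, 0.8),
    ({"lr": 0.01, "depth": 4}, 0.9),
    ({"lr": 1.0, "depth": 8}, 0.5),
]
top_two = rule_top(results, 2)
mutated = rule_top_mutate(results, 2, search_space, 1, random.Random(0))
picked = rule_random(results, 2, random.Random(0))
```

Each rule returns copies of the selected combinations, so that a change made under the second rule does not alter the combinations recorded in `results`.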

In implementations, the hyperparameter recommendation system 102 may choose the at least one selection rule of the plurality of selection rules based on a predefined strategy. For example, the hyperparameter recommendation system 102 may assign different selection probabilities to the plurality of selection rules, each of the different selection probabilities being changed after each iteration of the machine learning process according to the predefined strategy. By way of example and not limitation, the predefined strategy may include assigning different numerical intervals to the plurality of selection rules, randomly selecting a numerical value within a total range of the different numerical intervals, determining a numerical interval of the different numerical intervals to which the selected numerical value belongs, and selecting a selection rule corresponding to the determined numerical interval from among the plurality of selection rules. In implementations, the different numerical intervals assigned to the plurality of selection rules may or may not change along with the number of iterations of the machine learning process.

By way of example and not limitation, the plurality of selection rules include the three selection rules (i.e., the first selection rule, the second selection rule, and the third selection rule) as described above. A range (i.e., R2) of a numerical interval assigned to the second selection rule may be decreased (e.g., linearly) along with the number of iterations of the machine learning process, and a range (i.e., R3) of a numerical interval assigned to the third selection rule may be decreased (e.g., exponentially) along with the number of iterations of the machine learning process. In implementations, a rate of decrease of the range of the numerical interval assigned to the second selection rule may be slower than a rate of decrease of the range of the numerical interval assigned to the third selection rule, so that the likelihood of randomly selecting a hyperparameter combination is relatively high in early iterations of the machine learning process, and is decreased as the number of iterations of the machine learning process increases, to allow exploration and exploitation of potential hyperparameter combinations that are optimal or suitable for the machine learning method and/or the learning model. Since a total range of the different numerical intervals assigned to these selection rules is fixed (e.g., a range of [0, 1]), a range (i.e., R1) of a numerical interval assigned to the first selection rule is increased along with the number of iterations of the machine learning process.
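One possible schedule of the ranges R1, R2, and R3 is sketched below, with R2 decreased linearly, R3 decreased exponentially, and R1 taking the remainder of the fixed total range [0, 1]. The starting values, the linear step, and the decay factor are hypothetical numbers chosen for this illustration and are not specified by the disclosure.

```python
def interval_widths(iteration, r2_start=0.3, r2_step=0.02, r3_start=0.4, r3_decay=0.5):
    """Return (R1, R2, R3) for a zero-based iteration index.

    All default values are illustrative assumptions.
    """
    r2 = max(0.0, r2_start - r2_step * iteration)  # linear decrease, floored at zero
    r3 = r3_start * (r3_decay ** iteration)        # exponential decrease
    r1 = 1.0 - r2 - r3                             # the total range stays fixed at 1
    return r1, r2, r3


schedule = [interval_widths(t) for t in range(10)]
```

Under these assumed values, R3 halves at each iteration while R2 loses only 0.02, so random selection is likely in early iterations and the first selection rule dominates in later iterations.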

Following the above example, the range of [0, 1] may be divided into three numerical intervals having ranges R1, R2, and R3, and an order of these three numerical intervals can be arbitrary, as long as the sum of R1, R2, and R3 is equal to one. After arranging these numerical intervals, a random number within the range of [0, 1] is drawn, and a determination is made as to which numerical interval this random number falls into. A selection rule corresponding to that numerical interval is then selected for selecting the one or more hyperparameter combinations during the machine learning process at block 310. For example, with the numerical intervals arranged in the order of R3, R2, and R1, if the random number is less than R3, the third selection rule is chosen. If the random number is larger than R3 and less than R2+R3, the second selection rule is chosen. Otherwise, i.e., if the random number is larger than R2+R3, the first selection rule is chosen.
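The interval lookup in the above example may be sketched as follows. The function name `choose_rule` is hypothetical, and the handling of a random number that falls exactly on a boundary (assigned here to the later interval) is an assumption, since the example above does not address that case.

```python
import random


def choose_rule(u, r2, r3):
    """Map a number u in [0, 1] to a rule, with intervals ordered R3, R2, R1."""
    if u < r3:
        return "third"
    if u < r2 + r3:
        return "second"
    return "first"


# Usage with a pseudo-random draw; the widths are illustrative values.
rng = random.Random(0)
chosen = choose_rule(rng.random(), r2=0.3, r3=0.2)
```

Because the random number is drawn uniformly from [0, 1], each selection rule is chosen with a probability equal to the range of its numerical interval.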

At block 312, the hyperparameter recommendation system 102 may add the one or more selected hyperparameter combinations into an output hyperparameter combination set.

In implementations, the hyperparameter recommendation system 102 may add one or more hyperparameter combinations selected at the end of each iteration of the machine learning process into an output hyperparameter combination set. In implementations, the hyperparameter recommendation system 102 may further store information (e.g., connection weights, etc., in case of a neural network model) of respective learning model instances that are obtained at the end of a current iteration of the machine learning process using the one or more selected hyperparameter combinations in the memory 204 or peripheral storage devices of the hyperparameter recommendation system 102. In implementations, stored information of a learning model instance corresponding to a selected hyperparameter combination is sufficient for constructing the learning model instance without additional training.
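As one hedged illustration of storing information that is sufficient for constructing a learning model instance without additional training, the following sketch saves the weights of a toy linear model together with its hyperparameter combination and later rebuilds a predictor from the stored record. The JSON file layout, the function names, and the linear model are assumptions of this sketch; an actual implementation would store whatever its model type requires.

```python
import json
import os
import tempfile


def save_instance(directory, combo_id, hyperparameters, weights):
    """Write the hyperparameters and learned weights of one instance to a JSON file."""
    path = os.path.join(directory, f"instance_{combo_id}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"hyperparameters": hyperparameters, "weights": weights}, fh)
    return path


def load_predictor(path):
    """Rebuild a toy linear predictor from a stored record, without retraining."""
    with open(path, encoding="utf-8") as fh:
        record = json.load(fh)
    weights = record["weights"]

    def predict(features):
        return sum(w * x for w, x in zip(weights, features))

    return record["hyperparameters"], predict


# Round trip through a temporary directory standing in for persistent storage.
with tempfile.TemporaryDirectory() as storage:
    stored_path = save_instance(storage, 0, {"lr": 0.1}, [1.0, 2.0])
    restored_hyperparameters, restored_predict = load_predictor(stored_path)
restored_output = restored_predict([3.0, 4.0])
```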

At block 314, the hyperparameter recommendation system 102 may determine whether one or more termination conditions are met.

In implementations, the hyperparameter recommendation system 102 may determine whether to start a new iteration of the machine learning process or to terminate the machine learning process based on whether one or more termination conditions are met. In implementations, the one or more termination conditions may include, but are not limited to, a maximum allowable number of iterations being reached, a difference between hyperparameter combinations of two successive iterations being less than or equal to a predefined threshold, an improvement in performance results between hyperparameter combinations of two successive iterations being less than or equal to a predefined percentage, the likelihood of selecting the first rule being greater than or equal to a predefined value, and an amount of available resources (such as computing and/or storage resources) being less than an expected amount of resources needed for performing a new iteration of the machine learning process. The expected amount of resources needed for performing a new iteration of the machine learning process may be estimated based on an increase in size of a training subset and a size of the testing set for the new iteration, a calculated or measured amount of resources used for the current iteration of the machine learning process, etc.
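Three of the above termination conditions may be checked, for example, as in the following sketch. The function name, the parameter names, and the choice to measure the improvement as an absolute difference between the best performance results of two successive iterations are assumptions of this illustration; the resource-based condition and the difference between hyperparameter combinations are omitted for brevity.

```python
def termination_reason(iteration, max_iterations, best_scores,
                       min_improvement, r1, r1_threshold):
    """Return the name of the first termination condition met, or None."""
    if iteration >= max_iterations:
        return "max_iterations"
    # best_scores holds the best performance result of each finished iteration.
    if len(best_scores) >= 2 and best_scores[-1] - best_scores[-2] <= min_improvement:
        return "small_improvement"
    # r1 is the current likelihood of selecting the first rule.
    if r1 >= r1_threshold:
        return "first_rule_dominant"
    return None
```

A return value of None indicates that none of the checked conditions is met, in which case a new iteration of the machine learning process may be started.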

At block 316, if the one or more termination conditions are not met, the hyperparameter recommendation system 102 may obtain a new training subset (which may include the previous training subset plus additional training data according to a predefined ratio array as described above, for example), set the one or more hyperparameter combinations that are selected above at block 310 as a new input hyperparameter combination set, and start a new iteration of the machine learning process to repeat the above operations of blocks 310–314 until the one or more termination conditions are met.

At block 318, if the one or more termination conditions are met, the hyperparameter recommendation system 102 may select a hyperparameter combination having the best performance result from the output hyperparameter combination set, and recommend this selected hyperparameter combination as a suitable or optimal hyperparameter combination for the machine learning method and/or the learning model.

In implementations, upon determining that the one or more termination conditions are met, the hyperparameter recommendation system 102 may select a hyperparameter combination having the best performance result from among the hyperparameter combinations that have been added into the output hyperparameter combination set, and send this selected hyperparameter combination to the user 110 through the client device 108 as a recommended or optimal hyperparameter combination for the machine learning method and/or the learning model.

In implementations, the hyperparameter recommendation system 102 may further provide information of a corresponding learning model instance associated with the recommended hyperparameter combination to the client device 108, so that this corresponding learning model instance can be directly constructed in the client device 108 without additional training (e.g., without applying the machine learning method with the selected hyperparameter combination again to obtain the learning model instance).

Although the above method blocks are described to be executed in a particular order, in some implementations, some or all of the method blocks can be executed in other orders, or in parallel. For example, the hyperparameter recommendation system 102 may determine whether the one or more termination conditions are met at block 314 before or when adding the one or more selected hyperparameter combinations into the output hyperparameter combination set at block 312. Furthermore, the hyperparameter recommendation system 102 may train the learning model instances for the hyperparameter combinations in parallel at block 310, upon determining or predicting that available computing and/or storage resources are enough for performing such parallel operations.
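Parallel training of the learning model instances may be sketched, for example, with a thread pool from the Python standard library. The callable `train_and_score` is a hypothetical stand-in for training and testing one learning model instance, and the thread pool is merely one possible choice; for compute-bound training, a process pool or a distributed scheduler may be more appropriate.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_in_parallel(combinations, train_and_score, max_workers=4):
    """Train and score every combination concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train_and_score, combinations))
    return list(zip(combinations, scores))


# Toy stand-in for training and testing: the score depends only on the learning rate.
def toy_train_and_score(combo):
    return 1.0 - abs(combo["lr"] - 0.1)


toy_combinations = [{"lr": 0.1}, {"lr": 0.6}, {"lr": 1.1}]
parallel_results = evaluate_in_parallel(toy_combinations, toy_train_and_score)
```

The value of `max_workers` may be derived from the available computing and/or storage resources determined or predicted before the iteration starts.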

Referring to FIG. 4, at block 402, the hyperparameter recommendation system 102 may receive a request for determining or recommending a hyperparameter combination for a machine learning method and/or a learning model.

In implementations, the request may include, but is not limited to, information of the machine learning method and/or the learning model. In implementations, the request may further include information about storage locations of a training set and a testing set to be used.

At block 404, the hyperparameter recommendation system 102 may initiate an input hyperparameter combination set.

In implementations, the hyperparameter recommendation system 102 may randomly select a plurality of different hyperparameter combinations from possible hyperparameter combinations as an input hyperparameter combination set.
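A minimal sketch of this random initiation is given below, assuming that each hyperparameter takes a small number of discrete candidate values so that the possible combinations can be enumerated. The dictionary `search_space` and the function name are hypothetical.

```python
import itertools
import random


def initial_combinations(search_space, count, rng):
    """Draw `count` different combinations uniformly at random from the full grid."""
    names = sorted(search_space)
    grid = list(itertools.product(*(search_space[name] for name in names)))
    if count > len(grid):
        raise ValueError("more combinations requested than exist")
    return [dict(zip(names, values)) for values in rng.sample(grid, count)]


# Hypothetical search space with 4 x 3 = 12 possible combinations.
search_space = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 8]}
input_set = initial_combinations(search_space, 5, random.Random(0))
```

For continuous or very large search spaces, enumerating every combination is impractical, and each hyperparameter value may instead be drawn independently from its range.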

At block 406, the hyperparameter recommendation system 102 may perform a plurality of iterations of a machine learning process for the learning model through the machine learning method, wherein each iteration of the machine learning process comprises training and testing a respective learning model instance for each hyperparameter combination in the input hyperparameter combination set, selecting at least one selection rule from a plurality of selection rules, selecting one or more hyperparameter combinations based on the at least one selection rule, adding the one or more selected hyperparameter combinations into an output hyperparameter combination set, and setting the one or more selected hyperparameter combinations as the input hyperparameter combination set for a next iteration of the machine learning process if one or more termination conditions are not met, wherein training sets are different in at least some of the plurality of iterations of the machine learning process.

In implementations, the plurality of selection rules may include the first selection rule, the second selection rule, and the third selection rule as described in the foregoing implementations.

At block 408, the hyperparameter recommendation system 102 may select a hyperparameter combination having a best performance result from the output hyperparameter combination set.

At block 410, the hyperparameter recommendation system 102 may return the selected hyperparameter combination as the hyperparameter combination recommended for the machine learning method and/or the learning model.

Although the above method blocks are described to be executed in a particular order, in some implementations, some or all of the method blocks can be executed in other orders, or in parallel.
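Blocks 404 through 410 may be combined, by way of illustration only, into the following compact sketch. Everything named here is hypothetical: `recommend`, the callable `train_and_score(combo, ratio)` standing in for training on a subset of the given ratio and testing, the interval schedule constants, and the toy scoring function. The sketch also makes two assumptions that the description leaves open: the only termination condition is that the ratio array is exhausted, and a combination changed under the second selection rule is scored immediately so that every entry of the output set carries a performance result.

```python
import random


def recommend(search_space, train_and_score, ratios, n_initial, keep, rng):
    """Return (best_combination, best_score) after len(ratios) iterations."""
    names = sorted(search_space)
    # Block 404: randomly initiate the input hyperparameter combination set.
    input_set = [{n: rng.choice(search_space[n]) for n in names}
                 for _ in range(n_initial)]
    output_set = []
    for iteration, ratio in enumerate(ratios):
        # Block 406: train and test one instance per combination.
        scored = [(combo, train_and_score(combo, ratio)) for combo in input_set]
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        k = min(keep, len(ranked))
        # Assumed schedule: R2 shrinks linearly, R3 shrinks exponentially.
        r2 = max(0.0, 0.3 - 0.05 * iteration)
        r3 = 0.4 * 0.5 ** iteration
        u = rng.random()
        if u < r3:
            # Third rule: random selection.
            selected = rng.sample(scored, k)
        elif u < r2 + r3:
            # Second rule: top k, with one randomly redrawn value each.
            selected = []
            for combo, _ in ranked[:k]:
                child = dict(combo)
                name = rng.choice(names)
                child[name] = rng.choice(search_space[name])
                selected.append((child, train_and_score(child, ratio)))
        else:
            # First rule: top k as they are.
            selected = ranked[:k]
        output_set.extend(selected)
        input_set = [combo for combo, _ in selected]
    # Blocks 408 and 410: return the best entry of the output set.
    return max(output_set, key=lambda pair: pair[1])


def toy_score(combo, ratio):
    """Stand-in for training and testing; it ignores ratio and trains nothing."""
    return 1.0 - abs(combo["lr"] - 0.1) - 0.05 * abs(combo["depth"] - 4)


search_space = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_combo, best_score = recommend(
    search_space, toy_score, [0.1, 0.25, 0.5, 1.0],
    n_initial=6, keep=3, rng=random.Random(0),
)
```

In practice, `train_and_score` would train a learning model instance on a training subset of the given ratio and evaluate the learning model instance on the testing set, and further termination conditions would be checked at the end of each iteration.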

Any of the acts of any of the methods described herein may be implemented at least partially by a processor or other electronic device based on instructions stored on one or more computer‐readable media. By way of example and not limitation, any of the acts of any of the methods described herein may be implemented under control of one or more processors configured with executable instructions that may be stored on one or more computer‐readable media.

Conclusion

Although implementations have been described in language specific to structural features and/or methodological acts, it is to be understood that the claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed subject matter. Additionally or alternatively, some or all of the operations may be implemented by one or more ASICs, FPGAs, or other hardware.

The present disclosure can be further understood using the following clauses.

Clause 1: A method implemented by one or more computing devices, the method comprising: determining a plurality of different hyperparameter combinations as an input hyperparameter combination set; training a corresponding learning model instance of a machine learning method with each hyperparameter combination in the input hyperparameter combination set using a varying ratio of a training set; testing the corresponding learning model instance for each hyperparameter combination in the input hyperparameter combination set using a testing set to obtain a respective performance result; obtaining one or more hyperparameter combinations from the input hyperparameter combination set based on one or more rules, the one or more rules being selected from a plurality of rules comprising: a first rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set; a second rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations; and a third rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set; adding the one or more hyperparameter combinations into an output hyperparameter combination set; setting the one or more hyperparameter combinations as the input hyperparameter combination set, and recursively performing a new iteration of the training, the testing, the obtaining, and the adding until a termination condition is satisfied; and selecting a hyperparameter combination having a best performance result from the output hyperparameter combination set as a final hyperparameter combination for the machine learning method, wherein the varying ratio and a likelihood of selecting the first rule are increased, and a likelihood of selecting the second rule and a likelihood of selecting the third rule are decreased, after each iteration of the training, the testing, the obtaining, and the adding.

Clause 2: The method of Clause 1, wherein determining the plurality of different hyperparameter combinations comprises: randomly selecting the plurality of different hyperparameter combinations from among all known hyperparameter combinations; or deterministically selecting the plurality of different hyperparameter combinations based on a previous output hyperparameter combination set that is obtained using a different training set and/or a different testing set.

Clause 3: The method of Clause 1, wherein the termination condition comprises at least one of: a maximum number of iterations being reached; a difference between hyperparameter combinations of two successive iterations being less than a predefined threshold; or the likelihood of selecting the first rule being greater than or equal to a predefined threshold.

Clause 4: The method of Clause 1, wherein an amount of increase of the varying ratio after each iteration depends on at least one of a predefined ratio increase rate, a current value of the varying ratio, or computing resources that are available for the training.

Clause 5: A system comprising: one or more processors; memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: receiving a request for recommending a hyperparameter combination for a machine learning method and/or a learning model; initiating an input hyperparameter combination set; performing a plurality of iterations of a machine learning process for the learning model through the machine learning method, wherein each iteration of the machine learning process comprises training and testing a respective learning model instance for each hyperparameter combination in the input hyperparameter combination set, selecting at least one selection rule from a plurality of selection rules, selecting one or more hyperparameter combinations into an output hyperparameter combination set based on the at least one selection rule, adding the one or more selected hyperparameter combinations into an output hyperparameter combination set, and setting the one or more selected hyperparameter combinations as the input hyperparameter combination set for a next iteration of the machine learning process if one or more termination conditions are not met, wherein training sets are different in at least some of the plurality of iterations of the machine learning process; and selecting a hyperparameter combination having a best performance result from the output hyperparameter combination set as the hyperparameter combination recommended for the machine learning method and/or the learning model.

Clause 6: The system of Clause 5, wherein initiating the input hyperparameter combination set comprises: randomly selecting the input hyperparameter combination set from among all known hyperparameter combinations; or deterministically selecting the input hyperparameter combination set based on a previous output hyperparameter combination set that is obtained using a different training set and/or a different testing set.

Clause 7: The system of Clause 5, wherein the plurality of selection rules comprise: a first selection rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set; a second selection rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations; and a third selection rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set.

Clause 8: The system of Clause 7, wherein the one or more termination conditions comprise: a maximum number of iterations being reached; a difference between hyperparameter combinations of two successive iterations being less than a predefined threshold; or a likelihood of selecting the first rule being greater than or equal to a predefined threshold.

Clause 9: The system of Clause 5, wherein selecting the at least one selection rule from the plurality of selection rules comprises randomly selecting the at least one selection rule from the plurality of selection rules.

Clause 10: The system of Clause 5, wherein the plurality of selection rules are assigned with different selection probabilities, selection probabilities of one or more selection rules being decreased after each iteration of the machine learning process.

Clause 11: The system of Clause 5, wherein a size of a training set is monotonically increased after each iteration of the machine learning process.

Clause 12: The system of Clause 5, wherein a size of a training set in each iteration of the machine learning process depends on a predefined ratio increase rate, and/or available computing and storage resources.

Clause 13: One or more computer readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: receiving a request for recommending a hyperparameter combination for a machine learning method and/or a learning model; initiating an input hyperparameter combination set; performing a plurality of iterations of a machine learning process for the learning model through the machine learning method, wherein each iteration of the machine learning process comprises training and testing a respective learning model instance for each hyperparameter combination in the input hyperparameter combination set, selecting at least one selection rule from a plurality of selection rules, selecting one or more hyperparameter combinations into an output hyperparameter combination set based on the at least one selection rule, adding the one or more selected hyperparameter combinations into an output hyperparameter combination set, and setting the one or more selected hyperparameter combinations as the input hyperparameter combination set for a next iteration of the machine learning process if one or more termination conditions are not met, wherein training sets are different in at least some of the plurality of iterations of the machine learning process; and selecting a hyperparameter combination having a best performance result from the output hyperparameter combination set as the hyperparameter combination recommended for the machine learning method and/or the learning model.

Clause 14: The one or more computer readable media of Clause 13, wherein initiating the input hyperparameter combination set comprises: randomly selecting the input hyperparameter combination set from among all known hyperparameter combinations; or deterministically selecting the input hyperparameter combination set based on a previous output hyperparameter combination set that is obtained using a different training set and/or a different testing set.

Clause 15: The one or more computer readable media of Clause 13, wherein the plurality of selection rules comprise: a first selection rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set; a second selection rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations; and a third selection rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set.

Clause 16: The one or more computer readable media of Clause 15, wherein the one or more termination conditions comprise: a maximum number of iterations being reached; a difference between hyperparameter combinations of two successive iterations being less than a predefined threshold; or the likelihood of selecting the first rule being greater than or equal to a predefined threshold.

Clause 17: The one or more computer readable media of Clause 13, wherein selecting the at least one selection rule from the plurality of selection rules comprises randomly selecting the at least one selection rule from the plurality of selection rules.

Clause 18: The one or more computer readable media of Clause 13, wherein the plurality of selection rules are assigned with different selection probabilities, selection probabilities of one or more selection rules being decreased after each iteration of the machine learning process.

Clause 19: The one or more computer readable media of Clause 13, wherein a size of a training set is monotonically increased after each iteration of the machine learning process.

Clause 20: The one or more computer readable media of Clause 13, wherein a size of a training set in each iteration of the machine learning process depends on a predefined ratio increase rate, and/or available computing and storage resources.

Claims

A method implemented by one or more computing devices, the method comprising:

determining a plurality of different hyperparameter combinations as an input hyperparameter combination set;

training a corresponding learning model instance of a machine learning method with each hyperparameter combination in the input hyperparameter combination set using a varying ratio of a training set;

testing the corresponding learning model instance for each hyperparameter combination in the input hyperparameter combination set using a testing set to obtain a respective performance result;

obtaining one or more hyperparameter combinations from the input hyperparameter combination set based on one or more rules, the one or more rules being selected from a plurality of rules comprising:

a first rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set;

a second rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations; and

a third rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set;

adding the one or more hyperparameter combinations into an output hyperparameter combination set;

setting the one or more hyperparameter combinations as the input hyperparameter combination set, and recursively performing a new iteration of the training, the testing, the obtaining, and the adding until a termination condition is satisfied; and

selecting a hyperparameter combination having a best performance result from the output hyperparameter combination set as a final hyperparameter combination for the machine learning method, wherein the varying ratio and a likelihood of selecting the first rule are increased, and a likelihood of selecting the second rule and a likelihood of selecting the third rule are decreased, after each iteration of the training, the testing, the obtaining, and the adding.
The method of claim 1, wherein determining the plurality of different hyperparameter combinations comprises:

randomly selecting the plurality of different hyperparameter combinations from among all known hyperparameter combinations; or

deterministically selecting the plurality of different hyperparameter combinations based on a previous output hyperparameter combination set that is obtained using a different training set and/or a different testing set.
The method of claim 1, wherein the termination condition comprises at least one of:

a maximum number of iterations being reached;

a difference between hyperparameter combinations of two successive iterations being less than a predefined threshold; or

the likelihood of selecting the first rule being greater than or equal to a predefined threshold.
The method of claim 1, wherein an amount of increase of the varying ratio after each iteration depends on at least one of a predefined ratio increase rate, a current value of the varying ratio, or computing resources that are available for the training.
A system comprising:

one or more processors;

memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:

receiving a request for recommending a hyperparameter combination for a machine learning method and/or a learning model;

initiating an input hyperparameter combination set;

performing a plurality of iterations of a machine learning process for the learning model through the machine learning method, wherein each iteration of the machine learning process comprises training and testing a respective learning model instance for each hyperparameter combination in the input hyperparameter combination set, selecting at least one selection rule from a plurality of selection rules, selecting one or more hyperparameter combinations into an output hyperparameter combination set based on the at least one selection rule, adding the one or more selected hyperparameter combinations into an output hyperparameter combination set, and setting the one or more selected hyperparameter combinations as the input hyperparameter combination set for a next iteration of the machine learning process if one or more termination conditions are not met, wherein training sets are different in at least some of the plurality of iterations of the machine learning process; and

selecting a hyperparameter combination having a best performance result from the output hyperparameter combination set as the hyperparameter combination recommended for the machine learning method and/or the learning model.
The system of claim 5, wherein initiating the input hyperparameter combination set comprises:

randomly selecting the input hyperparameter combination set from among all known hyperparameter combinations; or

deterministically selecting the input hyperparameter combination set based on a previous output hyperparameter combination set that is obtained using a different training set and/or a different testing set.
The system of claim 5, wherein the plurality of selection rules comprise:

a first selection rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set;

a second selection rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations; and

a third selection rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set.
The system of claim 7, wherein the one or more termination conditions comprise:

a maximum number of iterations being reached;

a difference between hyperparameter combinations of two successive iterations being less than a predefined threshold; or

a likelihood of selecting the first selection rule being greater than or equal to a predefined threshold.
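One possible way to check these termination conditions is sketched below. The claim does not define how the difference between the combinations of two successive iterations is measured; the Jaccard-style ratio used here is an assumption, as are all function and parameter names.

```python
def set_difference_ratio(previous_set, current_set):
    """Fraction of combinations that appear in only one of the two sets.

    Combinations are dictionaries with hashable values; each is converted to
    a sorted tuple of items so that it can be placed in a Python set.
    """
    prev = {tuple(sorted(c.items())) for c in previous_set}
    curr = {tuple(sorted(c.items())) for c in current_set}
    union = prev | curr
    if not union:
        return 0.0
    return len(prev ^ curr) / len(union)


def should_stop(iterations_done, max_iterations, previous_set, current_set,
                diff_threshold, p_first_rule, p_threshold):
    """Return True if any one of the three termination conditions holds."""
    if iterations_done >= max_iterations:
        return True
    if set_difference_ratio(previous_set, current_set) < diff_threshold:
        return True
    return p_first_rule >= p_threshold
```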
The system of claim 5, wherein selecting the at least one selection rule from the plurality of selection rules comprises randomly selecting the at least one selection rule from the plurality of selection rules.
The system of claim 5, wherein the plurality of selection rules are assigned different selection probabilities, the selection probabilities of one or more of the selection rules being decreased after each iteration of the machine learning process.
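A minimal sketch of such a probability schedule follows. The rule labels `"top"`, `"mutate"`, and `"random"`, the multiplicative decay factor, and the choice to decay the two exploratory rules are assumptions for illustration; the claim only requires that the probabilities of one or more rules decrease after each iteration.

```python
import random


def decay_probabilities(probs, decay=0.8, decayed=("mutate", "random")):
    """Scale down the listed rules and renormalize so the values sum to 1.

    After renormalization the decayed rules are less likely, and the
    remaining rule(s) correspondingly more likely, than before.
    """
    scaled = {name: p * decay if name in decayed else p
              for name, p in probs.items()}
    total = sum(scaled.values())
    return {name: p / total for name, p in scaled.items()}


def pick_rule(probs, rng=None):
    """Randomly pick one rule name according to its selection probability."""
    rng = rng or random.Random()
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

Under these assumptions the search shifts over the iterations from exploration toward keeping the top-performing combinations.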
The system of claim 5, wherein a size of a training set is monotonically increased after each iteration of the machine learning process.
The system of claim 5, wherein a size of a training set in each iteration of the machine learning process depends on a predefined ratio increase rate, and/or available computing and storage resources.
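As an illustrative sketch, a training set size schedule driven by a ratio increase rate and an optional resource limit might be computed as follows. The function name, the geometric growth rule, the default values, and the `resource_cap` parameter (a maximum number of examples that the available computing and storage resources can handle) are all assumptions.

```python
def training_subset_size(iteration, total_examples, initial_ratio=0.1,
                         increase_rate=2.0, resource_cap=None):
    """Number of training examples to use in the given iteration (0-based).

    The sampled ratio grows geometrically by `increase_rate` per iteration,
    is limited to the full data set, and may be further limited by
    `resource_cap`.
    """
    ratio = min(1.0, initial_ratio * increase_rate ** iteration)
    size = max(1, round(total_examples * ratio))
    if resource_cap is not None:
        size = min(size, resource_cap)
    return size


if __name__ == "__main__":
    print([training_subset_size(i, 1000) for i in range(5)])
```

In this sketch the size strictly increases only until the full data set or the resource cap is reached, after which it stays constant.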
One or more computer readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:

receiving a request for recommending a hyperparameter combination for a machine learning method and/or a learning model;

initiating an input hyperparameter combination set;

performing a plurality of iterations of a machine learning process for the learning model through the machine learning method, wherein each iteration of the machine learning process comprises training and testing a respective learning model instance for each hyperparameter combination in the input hyperparameter combination set, selecting at least one selection rule from a plurality of selection rules, selecting one or more hyperparameter combinations based on the at least one selection rule, adding the one or more selected hyperparameter combinations into an output hyperparameter combination set, and setting the one or more selected hyperparameter combinations as the input hyperparameter combination set for a next iteration of the machine learning process if one or more termination conditions are not met, wherein training sets are different in at least some of the plurality of iterations of the machine learning process; and

selecting a hyperparameter combination having a best performance result from the output hyperparameter combination set as the hyperparameter combination recommended for the machine learning method and/or the learning model.
The one or more computer readable media of claim 13, wherein initiating the input hyperparameter combination set comprises:

randomly selecting the input hyperparameter combination set from among all known hyperparameter combinations; or

deterministically selecting the input hyperparameter combination set based on a previous output hyperparameter combination set that is obtained using a different training set and/or a different testing set.
The one or more computer readable media of claim 13, wherein the plurality of selection rules comprise:

a first selection rule of selecting a first predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set;

a second selection rule of selecting a second predetermined number of hyperparameter combinations having top performance results from the input hyperparameter combination set, and randomly changing values of a selected number of hyperparameters in each of the second predetermined number of hyperparameter combinations; and

a third selection rule of randomly selecting a third predetermined number of hyperparameter combinations from the input hyperparameter combination set.
The one or more computer readable media of claim 15, wherein the one or more termination conditions comprise:

a maximum number of iterations being reached;

a difference between hyperparameter combinations of two successive iterations being less than a predefined threshold; or

a likelihood of selecting the first selection rule being greater than or equal to a predefined threshold.
The one or more computer readable media of claim 13, wherein selecting the at least one selection rule from the plurality of selection rules comprises randomly selecting the at least one selection rule from the plurality of selection rules.
The one or more computer readable media of claim 13, wherein the plurality of selection rules are assigned different selection probabilities, the selection probabilities of one or more of the selection rules being decreased after each iteration of the machine learning process.
The one or more computer readable media of claim 13, wherein a size of a training set is monotonically increased after each iteration of the machine learning process.
The one or more computer readable media of claim 13, wherein a size of a training set in each iteration of the machine learning process depends on a predefined ratio increase rate, and/or available computing and storage resources.