CN110503208B - Resource scheduling method and resource scheduling device in multi-model exploration - Google Patents
- Publication number
- Publication number: CN110503208B (application CN201910791358.3A)
- Authority
- CN
- China
- Prior art keywords
- machine learning
- learning model
- model
- resource
- hyper
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Debugging And Monitoring (AREA)
Abstract
A resource scheduling method and a resource scheduling apparatus in multi-model exploration are provided. The resource scheduling method in multi-model exploration comprises the following steps: performing a round of hyper-parameter exploration training on each of a plurality of machine learning models based on the same target data set, wherein each machine learning model explores at least M sets of hyper-parameters in the round, M being a positive integer greater than 1; calculating a current-round performance score and a future potential score for each machine learning model based on the model evaluation indexes respectively corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round; combining the current-round performance score and the future potential score of each machine learning model to determine a resource allocation scheme for allocating available resources to each machine learning model; and performing the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
Description
Technical Field
The present invention relates generally to the field of artificial intelligence, and more particularly, to a resource scheduling method and a resource scheduling apparatus in multi-model exploration.
Background
With the advent of massive data, artificial intelligence technology has developed rapidly. Machine learning is a natural product of artificial intelligence reaching a certain stage of development, and is dedicated to mining valuable potential information from massive data by computational means.
Currently, automated machine learning (AutoML) is a very popular direction in machine learning, which aims to automatically determine optimal parameters and network structures for a given problem. Implementations of AutoML are numerous and can be roughly divided into two categories: one explores the best single model (i.e., machine learning model), represented by Google's Efficient Neural Architecture Search (ENAS); the other is multi-model exploration, i.e., finding the best model among a plurality of models. Here, the plurality of models means a plurality of models with fixed structures: a model's structure is fixed when everything about the model is determined except its hyper-parameters. Multi-model exploration means continuously tuning the hyper-parameters of the plurality of models until the optimal model is found.
However, in current exploration methods each model is tuned independently, and resource planning and scheduling between models is lacking. Whatever a model's behavior, it runs continuously, which undoubtedly wastes considerable computing and time resources. An effective management mechanism for resources and exploration efficiency is lacking.
Disclosure of Invention
The invention aims to provide a resource scheduling method and a resource scheduling device in multi-model exploration.
One aspect of the present invention provides a resource scheduling method in multi-model exploration, the resource scheduling method including: performing a round of hyper-parameter exploration training on each of a plurality of machine learning models based on the same target data set, wherein each machine learning model explores at least M sets of hyper-parameters in the round, M being a positive integer greater than 1; calculating a current-round performance score and a future potential score for each machine learning model based on the model evaluation indexes respectively corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round; combining the current-round performance score and the future potential score of each machine learning model to determine a resource allocation scheme for allocating available resources to each machine learning model; and performing the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
Optionally, calculating the current-round performance score of each machine learning model includes: determining, across the plurality of machine learning models, the K best model evaluation indexes among those corresponding to the sets of hyper-parameters explored in the current round, K being a positive integer; and, for each machine learning model, taking the fraction of those K best model evaluation indexes contributed by that machine learning model as its current-round performance score.
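The proportion-based performance score described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation; the function name `performance_scores`, the dictionary-based input format, and tie-breaking by (value, model-name) ordering are all assumptions:

```python
import heapq
from collections import Counter

def performance_scores(results, k):
    # results: model name -> list of model evaluation indexes explored this round
    # (higher is assumed better, e.g. validation-set accuracy)
    tagged = [(value, name) for name, values in results.items() for value in values]
    top_k = heapq.nlargest(k, tagged)              # the K best indexes across all models
    counts = Counter(name for _, name in top_k)    # how many each model contributed
    return {name: counts[name] / k for name in results}
```

For instance, with three models whose round produced the indexes [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3], [0.5, 0.2, 0.1, 0.4, 0.2, 0.6], and [0.61, 0.67, 0.63, 0.72, 0.8] and K = 5, the third model contributes four of the five best indexes and so scores 0.8.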
Optionally, calculating the future potential score of each machine learning model includes: storing, in exploration order, the model evaluation indexes corresponding to the sets of hyper-parameters explored by each machine learning model in an array, obtaining a plurality of arrays respectively corresponding to the plurality of machine learning models; and, for each machine learning model, extracting a monotonically increasing array from the array corresponding to that model and taking the ratio of the length of the monotonically increasing array to the length of the model's array as the model's future potential score.
Optionally, the plurality of machine learning models includes at least two of: a logistic regression machine learning model with a hyper-parameter selection mechanism, a naive Bayes machine learning model with a hyper-parameter selection mechanism, an ensemble learning model with a hyper-parameter selection mechanism, and a regression-related machine learning model with a hyper-parameter selection mechanism.
Optionally, the resource includes at least one of a central processing unit (CPU), memory space, and a thread.
Optionally, the step of performing a round of hyper-parameter exploration training on the plurality of machine learning models based on the same target data set further includes: determining whether at least one of the plurality of machine learning models satisfies an early-stop condition, wherein, when at least one machine learning model is determined to satisfy the early-stop condition, training of that machine learning model is stopped and the steps of calculating the current-round performance score and the future potential score are not performed for it.
Optionally, the early-stop condition includes: when the model evaluation indexes corresponding to the hyper-parameters explored by a machine learning model in the current round fail to improve for I consecutive explorations, that machine learning model satisfies the early-stop condition; and/or when the model evaluation indexes corresponding to J sets of hyper-parameters explored by one machine learning model in the current round are all higher than the best evaluation index achieved by another machine learning model in the current round, the other machine learning model satisfies the early-stop condition.
Optionally, the array corresponding to a machine learning model sequentially includes a first model evaluation index through an X-th model evaluation index, where X is an integer greater than or equal to M, and the step of extracting the monotonically increasing array from the array corresponding to the machine learning model includes: extracting the first model evaluation index as the first value of the monotonically increasing array; and, for each model evaluation index from the second through the X-th, if that model evaluation index is better than the maximum value currently in the monotonically increasing array, extracting it as a new value of the monotonically increasing array.
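The extraction just described amounts to keeping a running maximum. A minimal Python sketch (the function name and list-based representation are assumptions, not the patent's implementation):

```python
def potential_score(indexes):
    # indexes: one model's evaluation indexes in exploration order (non-empty)
    mono = [indexes[0]]              # the first index always enters the array
    for value in indexes[1:]:
        if value > mono[-1]:         # better than the current maximum of the array
            mono.append(value)       # so it becomes a new value of the array
    # future potential score: length of the monotonically increasing array
    # relative to the length of the full array
    return len(mono) / len(indexes)
```

For the sequence [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3] the extracted array is [0.2, 0.4, 0.5, 0.6, 0.7], giving a potential score of 5/8 = 0.625.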
Optionally, the step of determining the resource allocation scheme includes: calculating a composite score for each machine learning model based on its current-round performance score and future potential score; calculating the ratio of each machine learning model's composite score to the sum of all composite scores as that model's resource allocation coefficient; and determining the resource allocation scheme as follows: the resource corresponding to the product of each machine learning model's resource allocation coefficient and the total resource to be allocated is determined as the resource to be allocated to that machine learning model.
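A minimal sketch of the coefficient computation. The patent does not specify how the two scores are combined into a composite score, so the weighted sum below (and the default equal weighting) is purely an assumption:

```python
def allocation_coefficients(perf, potential, weight=0.5):
    # perf / potential: model name -> current-round performance / future potential score
    # composite score: assumed here to be a weighted sum of the two scores
    composite = {m: weight * perf[m] + (1 - weight) * potential[m] for m in perf}
    total = sum(composite.values())
    # each model's coefficient is its share of the summed composite scores
    return {m: score / total for m, score in composite.items()}
```

Because the coefficients are normalized by the total, they sum to 1 and can be multiplied directly by the total resource count in the next step.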
Optionally, the step of determining the resource corresponding to the product of each machine learning model's resource allocation coefficient and the total resource to be allocated as the resource to be allocated to that model includes: for every machine learning model except the one with the highest resource allocation coefficient, starting from the model with the lowest resource allocation coefficient, rounding down the product of the model's resource allocation coefficient and the total resource to be allocated, and determining the rounded-down value as the number of resources to be allocated to that model; and determining the resources of the total that remain unallocated as the resources to be allocated to the machine learning model with the highest resource allocation coefficient.
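The floor-and-remainder rule can be sketched as follows (a hypothetical helper, assuming integer resource units such as CPU cores or threads):

```python
import math

def allocate(coeffs, total):
    # coeffs: model name -> resource allocation coefficient (assumed to sum to 1)
    # total:  total number of resource units to distribute
    best = max(coeffs, key=coeffs.get)            # model with the highest coefficient
    alloc = {}
    for model in sorted(coeffs, key=coeffs.get):  # from lowest coefficient upward
        if model != best:
            alloc[model] = math.floor(coeffs[model] * total)
    alloc[best] = total - sum(alloc.values())     # remainder goes to the best model
    return alloc
```

With coefficients {lr: 0.25, gbdt: 0.125, dsn: 0.625} and 10 units, lr gets floor(2.5) = 2, gbdt gets floor(1.25) = 1, and dsn, as the highest-coefficient model, receives the remaining 7.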
Optionally, the step further includes a compensation mechanism: when the numbers of resources allocated to the machine learning models include both zero values and values greater than one, the machine learning models whose allocated resource number is greater than one are sorted in increasing order of resource number; then, starting from the model with the fewest resources among those sorted models, that model's resources are reduced (to no fewer than one), and the freed resources are allocated to one or more of the models whose resource number is zero so that each of those models receives one resource; only when the current model's resources have been reduced to one does the reduction move on to the next model in the sequence, and this continues until either no model's resource number is zero or the current model is the last model in the increasing-order sequence.
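The compensation step can be sketched like this; `compensate` is a hypothetical name, and the sketch assumes there are enough surplus units (held by models with more than one resource) to give every zero-resource model one unit:

```python
def compensate(alloc):
    # alloc: model name -> integer number of allocated resource units (modified in place)
    starved = [m for m, n in alloc.items() if n == 0]          # models that got nothing
    donors = sorted((m for m, n in alloc.items() if n > 1),    # models with surplus,
                    key=alloc.get)                             # fewest resources first
    for donor in donors:
        # take units from this donor (never below one) until no model is starved;
        # only then move on to the next donor
        while alloc[donor] > 1 and starved:
            alloc[donor] -= 1
            alloc[starved.pop()] = 1
    return alloc
```

For example, an allocation {a: 0, b: 5, c: 2, d: 0} becomes {a: 1, b: 4, c: 1, d: 1}: the smallest donor c gives its one surplus unit first, then b covers the remaining starved model.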
Optionally, the resource scheduling method further includes: in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds, stopping allocating resources to the machine learning model.
Optionally, the step of performing a round of hyper-parameter exploration training on the plurality of machine learning models based on the same target data set includes: allocating the same number of resources to each of the plurality of machine learning models, and performing a round of hyper-parameter exploration training on each machine learning model based on the same target data set using that same number of resources.
One aspect of the present invention provides a resource scheduling apparatus for multi-model exploration, the resource scheduling apparatus including: a hyper-parameter exploration training unit configured to perform a round of hyper-parameter exploration training on each of a plurality of machine learning models based on the same target data set, wherein each machine learning model explores at least M sets of hyper-parameters in the round, M being a positive integer greater than 1; a score calculation unit configured to calculate a current-round performance score and a future potential score for each machine learning model based on the model evaluation indexes respectively corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round; a resource allocation scheme determination unit configured to combine the current-round performance score and the future potential score of each machine learning model and determine a resource allocation scheme for allocating available resources to each machine learning model; and a resource scheduling unit configured to perform the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
Optionally, the score calculation unit is configured to: determine, across the plurality of machine learning models, the K best model evaluation indexes among those corresponding to the sets of hyper-parameters explored in the current round, K being a positive integer; and, for each machine learning model, take the fraction of those K best model evaluation indexes contributed by that machine learning model as its current-round performance score.
Optionally, the score calculation unit is configured to: store, in exploration order, the model evaluation indexes corresponding to the sets of hyper-parameters explored by each machine learning model in an array, obtaining a plurality of arrays respectively corresponding to the plurality of machine learning models; and, for each machine learning model, extract a monotonically increasing array from the array corresponding to that model and take the ratio of the length of the monotonically increasing array to the length of the model's array as the model's future potential score.
Optionally, the plurality of machine learning models includes at least two of: a logistic regression machine learning model with a hyper-parameter selection mechanism, a naive Bayes machine learning model with a hyper-parameter selection mechanism, an ensemble learning model with a hyper-parameter selection mechanism, and a regression-related machine learning model with a hyper-parameter selection mechanism.
Optionally, the resource includes at least one of a central processing unit (CPU), memory space, and a thread.
Optionally, the hyper-parameter exploration training unit is further configured to: determine whether at least one of the plurality of machine learning models satisfies the early-stop condition, wherein, when at least one machine learning model is determined to satisfy the early-stop condition, the hyper-parameter exploration training unit stops training that machine learning model, and the current-round performance score and future potential score are not calculated for it.
Optionally, the early-stop condition includes: when the model evaluation indexes corresponding to the hyper-parameters explored by a machine learning model in the current round fail to improve for I consecutive explorations, that machine learning model satisfies the early-stop condition; and/or when the model evaluation indexes corresponding to J sets of hyper-parameters explored by one machine learning model in the current round are all higher than the best evaluation index achieved by another machine learning model in the current round, the other machine learning model satisfies the early-stop condition.
Optionally, the array corresponding to a machine learning model sequentially includes a first model evaluation index through an X-th model evaluation index, where X is an integer greater than or equal to M, and the score calculation unit is configured to: extract the first model evaluation index as the first value of the monotonically increasing array; and, for each model evaluation index from the second through the X-th, if that model evaluation index is better than the maximum value currently in the monotonically increasing array, extract it as a new value of the monotonically increasing array.
Optionally, the resource allocation scheme determination unit is configured to: calculate a composite score for each machine learning model based on its current-round performance score and future potential score; calculate the ratio of each machine learning model's composite score to the sum of all composite scores as that model's resource allocation coefficient; and determine the resource allocation scheme as follows: the resource corresponding to the product of each machine learning model's resource allocation coefficient and the total resource to be allocated is determined as the resource to be allocated to that machine learning model.
Optionally, the resource allocation scheme determination unit is configured to: for every machine learning model except the one with the highest resource allocation coefficient, starting from the model with the lowest resource allocation coefficient, round down the product of the model's resource allocation coefficient and the total resource to be allocated, and determine the rounded-down value as the number of resources to be allocated to that model; and determine the resources of the total that remain unallocated as the resources to be allocated to the machine learning model with the highest resource allocation coefficient.
Optionally, the resource allocation scheme determination unit is further configured to: when the numbers of resources allocated to the machine learning models include both zero values and values greater than one, sort the models whose allocated resource number is greater than one in increasing order of resource number; then, starting from the model with the fewest resources, reduce that model's resources by one unit, allocate the freed resource to one of the models whose resource number is zero, and return to the sorting step, repeating until no machine learning model's resource number is zero.
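This one-unit-at-a-time variant, which re-picks the smallest surplus allocation after every transfer, might look like the following sketch (names are hypothetical; it assumes the surplus suffices to cover every zero-resource model):

```python
def compensate_stepwise(alloc):
    # alloc: model name -> integer resource count (modified in place)
    while any(n == 0 for n in alloc.values()):
        # re-pick the donor each iteration: the smallest allocation greater than one
        donor = min((m for m, n in alloc.items() if n > 1), key=alloc.get)
        starved = next(m for m, n in alloc.items() if n == 0)
        alloc[donor] -= 1          # move one unit from the donor...
        alloc[starved] = 1         # ...to a model that had none
    return alloc
```

On an allocation such as {a: 0, b: 5, c: 2, d: 0} this yields the same final result as the batch mechanism described earlier: {a: 1, b: 4, c: 1, d: 1}.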
Optionally, the resource scheduling unit is further configured to: in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds, stopping allocating resources to the machine learning model.
Optionally, the hyperparameter exploration training unit is configured to: and respectively allocating the same number of resources to the multiple machine learning models, and respectively performing a round of hyper-parameter exploration training on the multiple machine learning models based on the same target data set by using the same number of resources.
An aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by one or more computing devices, causes the one or more computing devices to implement any of the resource scheduling methods described above.
An aspect of the present invention provides a resource scheduling system comprising one or more computing devices and one or more storage devices having a computer program recorded thereon, which when executed by the one or more computing devices, causes the one or more computing devices to implement any of the resource scheduling methods described above.
According to the above technical scheme, which schedules resources using both the current-round performance scores and the future potential scores of the machine learning models, on the one hand the results already explored are fully used to evaluate each machine learning model's current performance, so that resources are allocated effectively and resource utilization and exploration efficiency are improved; on the other hand, the explored results are also used to evaluate each machine learning model's future performance, so that resources are allocated more reasonably, further improving resource utilization and exploration efficiency.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings, which illustrate exemplary embodiments, in which:
FIG. 1 illustrates a flow diagram of a method of resource scheduling in multi-model exploration according to the present invention;
FIG. 2 illustrates a flow diagram for calculating performance scores for each of the current rounds of the machine learning model, according to an embodiment of the invention;
FIG. 3 illustrates a flow diagram for calculating a future potential score for each machine learning model, according to an embodiment of the invention;
FIG. 4 shows a flow diagram for determining a resource allocation scheme according to an embodiment of the invention;
FIG. 5 shows a flow diagram of an allocation compensation mechanism according to an embodiment of the invention;
FIG. 6 illustrates a resource scheduling apparatus for multi-model exploration according to an embodiment of the present invention.
Detailed Description
The following description is provided with reference to the accompanying drawings to assist in a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to aid understanding, but these details are to be regarded as illustrative only. Thus, one of ordinary skill in the art will recognize that: various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present invention. Moreover, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
Fig. 1 shows a flow chart of a resource scheduling method in multi-model exploration according to the present invention.
Referring to fig. 1, in step S110, a round of hyper-parameter search training is performed on a plurality of machine learning models respectively based on the same target data set, wherein each machine learning model searches at least M sets of hyper-parameters in the round of search, and M is a positive integer greater than 1.
Here, the plurality of machine learning models may include at least two of: a logistic regression machine learning model with a hyper-parameter selection mechanism, a naive Bayes machine learning model with a hyper-parameter selection mechanism, an ensemble learning model with a hyper-parameter selection mechanism, and a regression-related machine learning model with a hyper-parameter selection mechanism. However, the invention is not so limited, and the plurality of machine learning models may include any other machine learning models.
In the present invention, each of the plurality of machine learning models explores a plurality of sets of hyper-parameters during a round of hyper-parameter exploration. For example, when machine learning model A among the plurality explores M sets of hyper-parameters, model A with the first set of hyper-parameters, model A with the second set of hyper-parameters, ..., and model A with the M-th set of hyper-parameters are each trained. For example, when the machine learning model is a logistic regression model and M is 2, then in one round of exploration the logistic regression model with the first set of hyper-parameters and the logistic regression model with the second set of hyper-parameters are trained based on the same target data set.
Further, it should be understood that the number of sets of hyper-parameters explored by each machine learning model in a round of exploration of multiple machine learning models may or may not be the same, because the machine learning models may take different times to complete training based on a set of hyper-parameters.
Optionally, step S110 may further include: determining whether at least one machine learning model of the plurality of machine learning models satisfies a condition of early stopping, wherein when at least one machine learning model is determined to satisfy the condition of early stopping, training of the at least one machine learning model is stopped, and subsequent steps of calculating the performance score and the future potential score of the current round are not performed on the at least one machine learning model. According to one embodiment of the invention, the training of the machine learning model meeting the condition of early stop and the calculation of the performance score and the future potential score of the current round are stopped, so that the resources are saved, and the training efficiency is improved.
As an example, the early-stop conditions may include: when the model evaluation indexes corresponding to the hyper-parameters explored by a machine learning model in the current round fail to improve for I consecutive explorations, that machine learning model satisfies the early-stop condition; and/or when the model evaluation indexes corresponding to J sets of hyper-parameters explored by one machine learning model in the current round are all higher than the best evaluation index of another machine learning model in the current round, the other machine learning model satisfies the early-stop condition. Here, I and J may be predetermined values. Note that, in one example, the value of I may differ from machine learning model to machine learning model.
Here, the model evaluation index may indicate a training effect of the machine learning model having a set of hyper-parameters. In one example, the training effect may indicate a validation set accuracy rate. In another example, the training effect may indicate a mean square error. However, the above examples are merely exemplary, and the model evaluation index of the present invention may include any parameter and/or value that indicates the training effect of the machine learning model.
More specifically, for example, when the training effect indicates validation-set accuracy, the model evaluation index failing to improve I consecutive times means that the validation-set accuracy of the models corresponding to the I consecutively explored sets of hyper-parameters sets no new record; when the training effect indicates the mean square error, it means that the mean square error of the models corresponding to the I consecutively explored sets of hyper-parameters reaches no new minimum.
To aid understanding of what it means for the model evaluation index to fail to improve I consecutive times, the following example is given, although the invention is not limited thereto (e.g., in some examples, "improvement" may instead mean that a model evaluation index exceeds a predetermined threshold). Assume A is an array of the model evaluation indexes corresponding to the sets of hyper-parameters consecutively explored by one machine learning model, I is 5, W(x, x+4) denotes the maximum of the 5 elements of A from the x-th element to the (x+4)-th element, the model evaluation index indicates validation-set accuracy, and len(A) denotes the length of A. Whether the model evaluation index has failed to improve for I consecutive explorations (i.e., whether the early-stop condition is satisfied) can be judged by the following pseudo code:
x = 1
while x + 4 < len(A):
    if W(x, x + 4) < W(0, x - 1):
        trigger early stop
    x = x + 1
As an example (referred to as the first example for convenience of description), suppose the plurality of machine learning models includes a logistic regression machine learning model lr, a gradient boosting decision tree machine learning model gbdt, and a deep sparse network machine learning model dsn. In this first example, a round of hyper-parameter exploration training is performed on lr, gbdt, and dsn, respectively, based on the same target data set, and each of the three models explores at least 5 sets of hyper-parameters. It should be understood that the first example is merely illustrative; in particular, every specific numerical value in it is exemplary, and according to embodiments of the present invention any other numerical values are also possible.
In an exemplary first example, the model evaluation indexes respectively corresponding to the multiple sets of hyper-parameters explored in the round of the logistic regression machine learning model lr, the gradient boosting decision tree machine learning model gbdt, and the deep sparse network machine learning model dsn may be represented as follows:
lr:[0.2,0.4,0.5,0.3,0.6,0.1,0.7,0.3]
gbdt:[0.5,0.2,0.1,0.4,0.2,0.6]
dsn:[0.61,0.67,0.63,0.72,0.8]
wherein a single value in a single array may indicate a training effect of a machine learning model having a set of hyper-parameters. For example, as an example, a single value (e.g., 0.2) in the array herein may indicate the validation set accuracy. Furthermore, in the first example, the logistic regression machine learning model lr was trained with eight sets of hyper-parameters, the gradient boosting decision tree machine learning model gbdt was trained with six sets of hyper-parameters, and the deep sparse network machine learning model dsn was trained with five sets of hyper-parameters.
In conjunction with the above pseudo code, for the array [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3] of model evaluation indexes corresponding to the logistic regression machine learning model lr, when x in the pseudo code is 2, W(2,6) is 0.7 and W(0,1) is 0.4; since W(0,1) < W(2,6), the early stop condition is not triggered in this case. Similarly, when x takes other values, the early stop condition is not triggered. Thus, in this first example, the logistic regression machine learning model lr does not trigger the early stop condition. By similar calculation, the gradient boosting decision tree machine learning model gbdt and the deep sparse network machine learning model dsn also do not satisfy the early stop condition.
Further, optionally, as described above, the early stop condition may include: when the J-th best model evaluation index among those corresponding to the hyper-parameters explored by one machine learning model in the current round is higher than the optimal model evaluation index of another machine learning model in the current round, the other machine learning model satisfies the early stop condition.
In the exemplary first example, the optimal model evaluation index and the 5 th (in this example, J is 5, although the present invention is not limited thereto) optimal model evaluation index of the logistic regression machine learning model lr are 0.7 and 0.3, respectively, the optimal model evaluation index and the 5 th optimal model evaluation index of the gradient boosting decision tree machine learning model gbdt are 0.6 and 0.2, respectively, and the optimal model evaluation index and the 5 th optimal model evaluation index of the deep sparse network machine learning model dsn are 0.8 and 0.61, respectively. Since the 5 th best model evaluation index 0.61 of the deep sparse network machine learning model dsn is greater than the optimal model evaluation index 0.6 of the gradient boosting decision tree machine learning model gbdt, the gradient boosting decision tree machine learning model gbdt is determined to satisfy the early stop condition. Therefore, the gradient boosting decision tree machine learning model gbdt no longer participates in the model exploration. Therefore, by determining whether the machine learning model satisfies the early-stop condition and stopping the search for the machine learning model satisfying the early-stop condition, the waste of resources can be reduced and the search efficiency can be improved.
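This second early-stop condition may be sketched as follows (a non-limiting illustration; the function name and data layout are assumptions of this sketch):

```python
def early_stopped(indexes, J=5):
    """Return the set of models whose optimal evaluation index this
    round is exceeded by another model's J-th best evaluation index."""
    stopped = set()
    for name, scores in indexes.items():
        if len(scores) < J:
            continue  # fewer than J explorations: no J-th best yet
        jth_best = sorted(scores, reverse=True)[J - 1]
        for other, other_scores in indexes.items():
            if other != name and jth_best > max(other_scores):
                stopped.add(other)
    return stopped

# Arrays of the first example: dsn's 5th best (0.61) exceeds gbdt's
# optimum (0.6), so gbdt satisfies the early stop condition.
round_indexes = {
    "lr":   [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3],
    "gbdt": [0.5, 0.2, 0.1, 0.4, 0.2, 0.6],
    "dsn":  [0.61, 0.67, 0.63, 0.72, 0.8],
}
```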
In one embodiment of the invention, initially, the same number of resources may be allocated to each of the plurality of machine learning models, and a round of hyper-parameter exploration training may be performed on each of the plurality of machine learning models based on the same target data set using the same number of resources. Here, the resources may indicate computing resources for exploring the machine learning model. In one example, the resources may include at least one of a central processor, a memory space, and a thread.
In step S120, a performance score of each machine learning model in the current round and a future potential score of each machine learning model are calculated based on model evaluation indexes respectively corresponding to the plurality of sets of hyper-parameters searched by the plurality of machine learning models in the current round.
In the present invention, a round of performance scores of a machine learning model may be related to one or more results that are optimal for the machine learning model to explore in the round. The steps for calculating the performance scores for the current round of each machine learning model will be described in more detail below in conjunction with fig. 2.
Further, in the present invention, the future potential score of a machine learning model may indicate the ability to explore better results if the machine learning model continues to explore. The step of calculating the future potential score for each machine learning model will be described in detail later with reference to fig. 3.
In step S130, the performance scores of the current round and the potential scores of the future of each machine learning model are integrated, and a resource allocation scheme for allocating available resources to each machine learning model is determined.
That is, in the present invention, a resource allocation scheme that allocates available resources to each machine learning model may be determined based on the current round performance score and the future potential score of each machine learning model. Because the resource allocation scheme is determined by considering the performance score of the current round and the potential score of the future of each machine learning model, resources among the machine learning models are integrated and scheduled, so that the resources are allocated to different machine learning models in a targeted manner, the situation that any machine learning model can continuously run regardless of the performance is avoided, a large amount of calculation and time resources are saved, and further, the resources and the exploration efficiency are effectively managed.
In step S140, corresponding resource scheduling is performed in the next round of hyper-parameter exploration training according to the resource allocation scheme.
For example, in a next round of hyperparametric exploration training, the respective machine learning model may use the resources it was scheduled according to the resource allocation scheme for exploration training.
In addition, optionally, the resource scheduling method may further include: in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds, stopping allocating resources to the machine learning model.
FIG. 2 illustrates a flow diagram for calculating performance scores for each of the current rounds of the machine learning model, according to an embodiment of the invention.
Referring to fig. 2, in step S210, the top K best model evaluation indexes are determined from the model evaluation indexes respectively corresponding to the plurality of sets of hyper-parameters found in the round of search by the plurality of machine learning models, where K is a positive integer.
As described above, in the exemplary first example, the model evaluation indexes respectively corresponding to the plurality of sets of hyper-parameters respectively explored in this round by the logistic regression machine learning model lr, the gradient boosting decision tree machine learning model gbdt, and the deep sparse network machine learning model dsn can be respectively represented as the following arrays: lr: [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3], gbdt: [0.5, 0.2, 0.1, 0.4, 0.2, 0.6], dsn: [0.61, 0.67, 0.63, 0.72, 0.8]. Here, since the gradient boosting decision tree machine learning model gbdt satisfies the early stop condition as described above, it subsequently does not participate in the training exploration. In this case, the top 5 (here, K is 5, although the present invention does not limit this) best model evaluation indexes among all the model evaluation indexes of the logistic regression machine learning model lr and the deep sparse network machine learning model dsn are: 0.7, 0.67, 0.63, 0.72, and 0.8.
In step S220, for each machine learning model, the proportion of the top K best model evaluation indexes that belong to the machine learning model is used as the performance score of the current round of the machine learning model.
In the exemplary first example, "0.7" among the 5 best model evaluation indexes "0.7, 0.67, 0.63, 0.72, and 0.8" is a model evaluation index of the logistic regression machine learning model lr; therefore, the proportion of the top 5 best model evaluation indexes belonging to the logistic regression machine learning model lr is 1/5. In contrast, "0.67", "0.63", "0.72", and "0.8" are model evaluation indexes of the deep sparse network machine learning model dsn, so the proportion of the top 5 best model evaluation indexes belonging to the deep sparse network machine learning model dsn is 4/5. Thus, in the illustrative first example, the performance score of the current round of the logistic regression machine learning model lr may correspond to 1/5, and the performance score of the current round of the deep sparse network machine learning model dsn may correspond to 4/5.
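The steps S210 and S220 may be sketched together as follows (a non-limiting illustration; the function name is an assumption of this sketch):

```python
from heapq import nlargest

def performance_scores(indexes, K=5):
    """Each model's share of the top-K best model evaluation indexes
    across all models still participating in the exploration."""
    # Tag every evaluation index with the model that produced it.
    tagged = [(v, name) for name, scores in indexes.items() for v in scores]
    top_k = nlargest(K, tagged)  # step S210: top K best indexes
    # Step S220: proportion of the top K belonging to each model.
    return {name: sum(1 for _, n in top_k if n == name) / K
            for name in indexes}

# gbdt is excluded since it satisfied the early stop condition.
scores = performance_scores({
    "lr":  [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3],
    "dsn": [0.61, 0.67, 0.63, 0.72, 0.8],
})
```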
Fig. 3 illustrates a flow diagram for calculating a future potential score for each machine learning model according to an embodiment of the invention.
Referring to fig. 3, in step S310, the model evaluation indexes respectively corresponding to the multiple sets of hyper-parameters searched in the round of the machine learning model are stored in an array according to the sequence, so as to obtain multiple arrays respectively corresponding to the multiple machine learning models.
In the exemplary first example, as described above, the array corresponding to the model evaluation indexes of the logistic regression machine learning model lr is [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3], and the array corresponding to the model evaluation indexes of the deep sparse network machine learning model dsn is [0.61, 0.67, 0.63, 0.72, 0.8].
In step S320, for each machine learning model, a monotone enhancement array is extracted from an array corresponding to the machine learning model, and a ratio of a length of the monotone enhancement array to a length of the array corresponding to the machine learning model is used as a future potential score of the machine learning model.
Here, the monotone enhancement array does not necessarily denote a monotonically increasing array. In one example, when the training effect indicates validation set accuracy, the monotone enhancement array may indicate a monotonically increasing array. In another example, when the training effect indicates mean square error, the monotone enhancement array may indicate a monotonically decreasing array. In other words, "enhancement" of a value in the monotone enhancement array indicates an improvement or optimization of the training effect.
For convenience of description, it is assumed below that the array corresponding to the machine learning model sequentially includes a first model evaluation index to an xth model evaluation index, where X is an integer equal to or greater than M.
The step of extracting the monotone enhanced array from the array corresponding to the machine learning model may include: the first model evaluation index is extracted as a first value in a monotonically increasing array.
For example, in the exemplary first example, the array corresponding to the model evaluation index of the logistic regression machine learning model lr is [0.2,0.4,0.5,0.3,0.6,0.1,0.7,0.3], and therefore, 0.2 is extracted as the first value in the monotonic enhancing array.
Furthermore, the step of extracting the monotone enhancing array from the array corresponding to the machine learning model may further comprise: and for any model evaluation index from the second model evaluation index to the Xth model evaluation index, if the model evaluation index is superior to the maximum value in the current monotone enhancement array, extracting the model evaluation index as a new value in the monotone enhancement array.
For example, in the exemplary first example, for the second model evaluation index 0.4 of the logistic regression machine learning model lr, since the second model evaluation index 0.4 is larger than the maximum value (i.e., 0.2) in the current monotone enhancement array (which at this time includes only the first value), 0.4 is extracted as a new value (i.e., the second value) in the monotone enhancement array. At this point, the monotone enhancement array becomes [0.2, 0.4]. Next, for the third model evaluation index 0.5 of the logistic regression machine learning model lr, since the third model evaluation index 0.5 is larger than the maximum value (i.e., 0.4) in the current monotone enhancement array (which now includes the first and second values), 0.5 is extracted as a new value (i.e., the third value) in the monotone enhancement array. At this point, the monotone enhancement array becomes [0.2, 0.4, 0.5]. Next, for the fourth model evaluation index 0.3 of the logistic regression machine learning model lr, since the fourth model evaluation index 0.3 is smaller than the maximum value (i.e., 0.5) in the current monotone enhancement array (which now includes the first, second, and third values), 0.3 is not extracted as a new value in the monotone enhancement array. At this point, the monotone enhancement array is still [0.2, 0.4, 0.5]. Subsequently, the fifth to eighth model evaluation indexes (0.6, 0.1, 0.7, 0.3) are processed similarly to the second to fourth model evaluation indexes, and the monotone enhancement array finally obtained is [0.2, 0.4, 0.5, 0.6, 0.7].
In the present invention, the length of an array may indicate the number of values included in the array. In the illustrative first example, the length of the resulting monotone enhancement array of the logistic regression machine learning model lr is 5, and the length of the array [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3] corresponding to the logistic regression machine learning model lr is 8, so the future potential score of the logistic regression machine learning model lr is 5/8.
In the illustrative first example, based on a method similar to the calculation of the future potential score of the logistic regression machine learning model lr, the length of the resulting monotone enhancement array [0.61, 0.67, 0.72, 0.8] of the deep sparse network machine learning model dsn is 4, and the length of the array [0.61, 0.67, 0.63, 0.72, 0.8] corresponding to the deep sparse network machine learning model dsn is 5; therefore, the future potential score of the deep sparse network machine learning model dsn is 4/5.
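The extraction of the monotone enhancement array and the resulting future potential score (steps S310 to S320) may be sketched as follows (a non-limiting illustration, assuming the training effect is validation set accuracy, so that "enhancement" means "greater than"):

```python
def future_potential(A):
    """Extract the monotone enhancement array from A and return it
    together with the ratio of its length to the length of A."""
    mono = [A[0]]  # the first index is always the first value
    for v in A[1:]:
        # mono[-1] is the current maximum, since mono is increasing.
        if v > mono[-1]:
            mono.append(v)
    return mono, len(mono) / len(A)

mono_lr, score_lr = future_potential(
    [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3])
mono_dsn, score_dsn = future_potential(
    [0.61, 0.67, 0.63, 0.72, 0.8])
```

For a training effect such as mean square error, the comparison would be reversed (`v < mono[-1]`), since smaller values indicate enhancement there.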
Fig. 4 shows a flow diagram for determining a resource allocation scheme according to an embodiment of the invention.
Referring to fig. 4, in step S410, a composite score for each machine learning model is calculated based on the performance score of the current round and the future potential score of each machine learning model.
In one embodiment, a composite score for each machine learning model may be calculated by weighted summation of the performance scores of the current round and the future potential scores for each machine learning model. For example only, the composite score for each machine learning model may be calculated by assigning the same or different weights to the current round of performance scores and the future potential scores of the machine learning models. In one example, a composite score for each machine learning model may be calculated by assigning a weight of "1" to the current round of performance scores and the future potential scores of the machine learning models, i.e., the composite score for each machine learning model is the sum of the current round of performance scores and the future potential scores of the machine learning models. However, the above examples are merely exemplary, and the present invention does not limit the range of the weight. In some examples, when the exploration efficiency is more of a concern, the weight assigned to the current round of performance scores of the machine learning model may be set to be greater than the weight assigned to the future potential scores of the machine learning model. Further, in some examples, when the final exploration results are more focused, the weight assigned to the current round of performance scores of the machine learning model may be set to be less than the weight assigned to the future potential scores of the machine learning model.
For example, in the exemplary first example, the composite score of the logistic regression machine learning model lr may be: 1/5 + 5/8 = 33/40; the composite score of the deep sparse network machine learning model dsn may be: 4/5 + 4/5 = 8/5.
In step S420, the ratio of the composite score of each machine learning model to the sum of all the composite scores is calculated as the resource allocation coefficient of each machine learning model.
For example, in the exemplary first example, the resource allocation coefficient of the logistic regression machine learning model lr may be calculated as: (33/40) ÷ (33/40 + 8/5) = 33/97, and the resource allocation coefficient of the deep sparse network machine learning model dsn may be calculated as: (8/5) ÷ (33/40 + 8/5) = 64/97.
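Steps S410 and S420 with equal weights of 1 can be reproduced exactly with rational arithmetic (a non-limiting sketch; the variable names are assumptions):

```python
from fractions import Fraction as F

# Current-round performance scores and future potential scores
# from the first example.
perf = {"lr": F(1, 5), "dsn": F(4, 5)}
potential = {"lr": F(5, 8), "dsn": F(4, 5)}

# Step S410: composite score as an unweighted (weight 1) sum.
composite = {m: perf[m] + potential[m] for m in perf}

# Step S420: each composite score divided by the sum of all of them.
total = sum(composite.values())
coefficient = {m: composite[m] / total for m in composite}
```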
In step S430, the resource allocation scheme is determined as the following resource allocation scheme: determining a resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model.
For example, in an exemplary first example, the resource allocation scheme of the logistic regression machine learning model lr may be determined as the following resource allocation scheme: determining a resource corresponding to a product of the resource allocation coefficient 33/97 of the logistic regression machine learning model lr and the total resource to be allocated as a resource to be allocated to the logistic regression machine learning model lr; the resource allocation scheme of the deep sparse network machine learning model dsn may be determined as follows: a resource corresponding to the product of the resource allocation coefficient 64/97 of the deep sparse network machine learning model dsn and the total resource to be allocated is determined as a resource to be allocated to the deep sparse network machine learning model dsn.
Here, the total resource to be allocated may indicate a predetermined number of resources. As described above, the resources may include at least one of a central processor, memory space, and threads.
In one example, when the resource indicates a central processor, the total number of resources to be allocated may indicate the number of central processors to be allocated.
In another example, when the resource indicates storage space, the amount of total resources to be allocated may indicate the amount or size of storage space to be allocated.
In yet another example, when a resource indicates a thread (also referred to as a task), the total resource to be allocated may indicate the number of tasks or threads. For example, when the total number of tasks to be allocated is 4, in the illustrative first example, the number of resources to be allocated to the logistic regression machine learning model lr is: 33/97 × 4 = 132/97, and the number of resources to be allocated to the deep sparse network machine learning model dsn is: 64/97 × 4 = 256/97.
Alternatively, the step of determining a resource corresponding to the product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model may include: for all the machine learning models except the machine learning model with the highest resource allocation coefficient, starting from the machine learning model with the lowest resource allocation coefficient, rounding down the product of the resource allocation coefficient of the machine learning model and the total resource to be allocated, and determining the rounded-down value as the number of resources to be allocated to that machine learning model; and determining the resources not yet allocated among the total resources to be allocated as the resources to be allocated to the machine learning model with the highest resource allocation coefficient. Here, since some resources (tasks, as an example only) are operated in units of integers, it is necessary in this case to round down the resource corresponding to the product of the resource allocation coefficient of each machine learning model and the total resource to be allocated.
As an example, as described above, when the total number of tasks to be allocated is 4, in the exemplary first example, the number of resources to be allocated to the logistic regression machine learning model lr is 33/97 × 4 = 132/97, and the number of resources to be allocated to the deep sparse network machine learning model dsn is 64/97 × 4 = 256/97. In this example, among all the machine learning models other than the machine learning model with the highest resource allocation coefficient (i.e., the deep sparse network machine learning model dsn, with coefficient 64/97), starting from the machine learning model with the lowest resource allocation coefficient (i.e., the logistic regression machine learning model lr, with coefficient 33/97), the product of the resource allocation coefficient of the logistic regression machine learning model lr and the total resources to be allocated (i.e., 132/97) is rounded down, and the rounded-down value (i.e., 1) is determined as the number of resources (i.e., tasks) to be allocated to the logistic regression machine learning model lr. The resources not yet allocated among the total resources to be allocated (i.e., 4 - 1 = 3 tasks) are then determined as the resources to be allocated to the machine learning model with the highest resource allocation coefficient (i.e., the deep sparse network machine learning model dsn). That is, the number of tasks to be assigned to the logistic regression machine learning model lr is 1, and the number of tasks to be assigned to the deep sparse network machine learning model dsn is 3.
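This rounding-down allocation may be sketched as follows (a non-limiting illustration; the function name is an assumption of this sketch):

```python
from fractions import Fraction
from math import floor

def allocate(coefficients, total_tasks):
    """Round down every model's share except that of the model with
    the highest coefficient, which receives all remaining tasks."""
    best = max(coefficients, key=coefficients.get)
    allocation = {}
    # Walk the remaining models from lowest to highest coefficient.
    for m in sorted((m for m in coefficients if m != best),
                    key=coefficients.get):
        allocation[m] = floor(coefficients[m] * total_tasks)
    allocation[best] = total_tasks - sum(allocation.values())
    return allocation

# First example: coefficients 33/97 and 64/97, 4 tasks in total.
tasks = allocate({"lr": Fraction(33, 97), "dsn": Fraction(64, 97)}, 4)
```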
Further, in the above rounding-down, the number of resources of some machine learning model may be rounded to 0. To avoid this situation, an allocation compensation mechanism is adopted. With the allocation compensation mechanism, the stronger models (i.e., the machine learning models allocated more resources) retain their advantage, while the weaker models (i.e., the machine learning models allocated fewer resources) still get a chance to catch up. The allocation compensation mechanism used in the step of determining a resource corresponding to the product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model is described in detail below with reference to fig. 5.
FIG. 5 shows a flow diagram of an allocation compensation mechanism according to an embodiment of the invention.
Referring to fig. 5, in step S510, when the numbers of resources allocated to the respective machine learning models include both a value of zero and a value greater than one, the numbers of resources of the machine learning models allocated more than one resource are sorted in increasing order.
For convenience of explanation, the following description will be given taking a second example in which six machine learning models a, b, c, d, e, f are allocated with the number of tasks [1,0,0,0,2,7], however, the present invention is not limited thereto, and the number of machine learning models and the number of specifically allocated resources (e.g., the number of tasks) may be any other number.
In the illustrated second example, the allocation compensation mechanism is triggered because at least one machine learning model (i.e., machine learning models b through d) is allocated 0 tasks. Here, the numbers of resources of the machine learning models allocated more than one resource are sorted in increasing order. That is, in the exemplary second example, the numbers of resources of the machine learning models allocated more than one task (i.e., machine learning model e and machine learning model f) are sorted in increasing order as [2, 7]. In the present invention, a resource count of 1 does not necessarily mean a single resource; it may mean one unit of resources, where one unit corresponds to a predetermined number of resources.
In step S520, starting from the machine learning model with the fewest resources among the resources of the machine learning models sorted in increasing order, the resources of the machine learning model are reduced by one unit, and the reduced resources are allocated to one of the machine learning models whose number of resources is zero, and the process returns to step S510 until all the models have resources that are not 0. The following is explained in more detail in connection with an exemplary second example, however, the present invention is not limited to the exemplary second example.
In the second example, the number of resources of machine learning model e is subtracted by 1 from 2, and the reduced number of resources is allocated to one machine learning model (e.g., machine learning model b) with a number of resources of 0. Since the number of resources of the machine learning model e after subtracting 1 becomes 1, the number of resources of the machine learning model e is subsequently kept to 1, i.e., resources are no longer allocated from the resources of the machine learning model e to other machine learning models. At this time, since there are two machine learning models (i.e., machine learning models c and d) whose number of resources is 0, it is conceivable to continue allocating resources from the other machine learning models to the machine learning model whose number of resources is 0. Since the number of resources of the machine learning model e has become 1, at most the number of resources of the next machine learning model (i.e., the machine learning model f) is reduced to 1. The number of resources of the machine learning model f may be reduced from 7 to 5, and the reduced resources may be allocated to the machine learning model c and the machine learning model d, respectively, such that the number of resources of the machine learning model c and the machine learning model d are both 1.
Through the allocation compensation mechanism, the numbers of resources of machine learning models a through f finally become [1, 1, 1, 1, 1, 5]. Therefore, after the allocation compensation mechanism is adopted, each of the machine learning models a through f is allocated resources: the stronger models retain their advantage while the weaker models still get a chance to catch up, so that a machine learning model that performs poorly in a single round is not immediately stopped from exploring, thereby further improving the accuracy of the exploration.
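The loop over steps S510 and S520 may be sketched as follows (a non-limiting illustration; the function name is an assumption, and the second example above is used as input):

```python
def compensate(allocation):
    """Move one unit at a time from the model with the fewest
    resources among those holding more than one unit to a model
    holding zero, until no model is left with zero resources."""
    allocation = dict(allocation)  # do not mutate the caller's dict
    while True:
        zeros = [m for m, n in allocation.items() if n == 0]
        # Step S510: donors with more than one unit, fewest first.
        donors = sorted((m for m, n in allocation.items() if n > 1),
                        key=allocation.get)
        if not zeros or not donors:
            return allocation
        # Step S520: shift one unit from the smallest donor.
        allocation[donors[0]] -= 1
        allocation[zeros[0]] += 1

# Second example: six models with task counts [1, 0, 0, 0, 2, 7].
result = compensate({"a": 1, "b": 0, "c": 0, "d": 0, "e": 2, "f": 7})
```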
Fig. 6 illustrates a resource scheduling apparatus for multi-model exploration according to an embodiment of the present invention.
Referring to fig. 6, the resource scheduling apparatus 600 for multi-model exploration may include a hyper-parameter exploration training unit 610, a score calculating unit 620, a resource allocation scheme determining unit 630, and a resource scheduling unit 640. Here, the resource scheduling apparatus 600 for multi-model exploration may perform any one of the methods and/or steps described with reference to fig. 1 to 5.
The hyper-parameter exploration training unit 610 may be configured to perform a round of hyper-parameter exploration training on a plurality of machine learning models, respectively, based on the same target dataset, wherein each machine learning model explores at least M sets of hyper-parameters in the round of exploration, M being a positive integer greater than 1. In other words, the hyper-parameter exploration training unit 610 may perform step S110 described with reference to fig. 1. Therefore, for the sake of brevity, the step S110 performed by the hyper-parameter exploration training unit 610 is not described in detail herein, and the description of the step S110 with reference to fig. 1 is also applicable to the hyper-parameter exploration training unit 610.
The score calculation unit 620 is configured to: and calculating the performance score of each machine learning model in the current round and the future potential score of each machine learning model based on the model evaluation indexes respectively corresponding to the plurality of sets of hyper-parameters searched by the plurality of machine learning models in the current round. In other words, the score calculating unit 620 may be configured to perform step S120 described with reference to fig. 1. Therefore, for the sake of brevity, the step S120 performed by the score calculating unit 620 is not described in detail herein, and the description of the step S120 with reference to fig. 1 is also applicable to the score calculating unit 620. Further, as an example, the score calculation unit 620 may also perform the calculation of the performance score of the current round of each machine learning model described with reference to fig. 2 and/or the calculation of the future potential score of each machine learning model described with reference to fig. 3.
The resource allocation scheme determining unit 630 is configured to: and integrating the performance scores of the current round and the potential scores of the future of each machine learning model to determine a resource allocation scheme for allocating the available resources to each machine learning model. In other words, the resource allocation scheme determining unit 630 may be configured to perform step S130 described with reference to fig. 1. Therefore, for the sake of brevity, the resource allocation scheme determining unit 630 is not described in detail herein, and the description of step S130 with reference to fig. 1 may also be applied to the resource allocation scheme determining unit 630. Further, as an example, the resource allocation scheme determining unit 630 may also perform the determination of the resource allocation scheme described with reference to fig. 4 and/or the allocation compensation mechanism described with reference to fig. 5.
The resource scheduling unit 640 may be configured to perform the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme. Optionally, the resource scheduling unit 640 may be further configured to stop allocating resources to the machine learning models in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds.
The resource scheduling method and the resource scheduling apparatus in multi-model exploration according to the exemplary embodiments of the present invention have been described above with reference to figs. 1 to 6. However, it should be understood that the devices, systems, units, etc. described with reference to figs. 1-6 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function. For example, these systems, devices, units, etc. may correspond to dedicated integrated circuits, to pure software code, or to a combination of software and hardware. Further, one or more functions implemented by these systems, apparatuses, units, etc. may also be uniformly executed by components of a physical entity device (e.g., a processor, a client, a server, etc.).
Further, the above-described method may be implemented by a computer program recorded on a computer-readable storage medium. For example, according to an exemplary embodiment of the present invention, a computer-readable storage medium may be provided, having stored thereon a computer program which, when executed by one or more computing devices, causes the one or more computing devices to implement any of the methods disclosed in the present application.
For example, the computer program, when executed by one or more computing devices, causes the one or more computing devices to perform the steps of: performing a round of hyper-parameter exploration training on each of a plurality of machine learning models based on the same target data set, wherein each machine learning model explores at least M sets of hyper-parameters in the round, M being a positive integer greater than 1; calculating the current-round performance score and the future potential score of each machine learning model based on the model evaluation indexes corresponding to the sets of hyper-parameters explored by the machine learning models in the current round; combining the current-round performance score and the future potential score of each machine learning model to determine a resource allocation scheme for allocating available resources among the machine learning models; and performing the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
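Taken together, these four steps can be sketched as a single scheduling round in Python. This is a minimal illustration, not the patent's implementation: the scoring rules are simplified stand-ins (top-K share for the current-round performance score, fraction of record-setting results for the future potential score), and all names are hypothetical.

```python
def schedule_round(round_metrics, total_resources, k=3):
    """round_metrics: {model_name: [model evaluation index for each set of
    hyper-parameters explored this round, in order]}; higher is better."""
    # Current-round performance score: each model's share of the global top-K results.
    top_k = sorted(((v, name) for name, vals in round_metrics.items() for v in vals),
                   reverse=True)[:k]
    perf = {name: sum(1 for _, n in top_k if n == name) / k for name in round_metrics}

    # Future potential score: fraction of results that beat every earlier result.
    def potential(vals):
        best, improving = float("-inf"), 0
        for v in vals:
            if v > best:
                improving, best = improving + 1, v
        return improving / len(vals)

    pot = {name: potential(vals) for name, vals in round_metrics.items()}

    # Combine both scores into allocation coefficients and split the resources.
    combined = {name: perf[name] + pot[name] for name in round_metrics}
    total = sum(combined.values())
    return {name: combined[name] / total * total_resources for name in round_metrics}
```

For two models exploring three hyper-parameter sets each, `schedule_round({"lr": [0.70, 0.72, 0.71], "nb": [0.60, 0.65, 0.68]}, 8)` gives the lr model 5 of the 8 resource units and the nb model 3, reflecting lr's stronger current round and nb's still-improving trajectory.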
The computer program in the computer-readable storage medium may be executed in an environment deployed on a computer device such as a client, host, proxy apparatus, or server. It should be noted that the computer program may also be used to perform additional steps beyond those listed above, or more specific processing within those steps; these additional steps and processes have been mentioned in the description of the related methods and apparatuses with reference to figs. 1 to 6 and are not repeated here.
It should be noted that the resource scheduling method and the resource scheduling apparatus in multi-model exploration according to the exemplary embodiments of the present invention may rely entirely on the execution of a computer program to implement the corresponding functions, with each unit of the apparatus or system corresponding to a step in the functional architecture of the computer program, so that the whole apparatus or system is invoked through a dedicated software package (e.g., a lib library) to implement the corresponding functions.
For example, a resource scheduling system according to an embodiment of the present invention comprises one or more computing devices and one or more storage devices, wherein the one or more storage devices store a computer program that, when executed by the one or more computing devices, causes the one or more computing devices to implement any of the methods disclosed herein, for example, to perform the steps of: performing a round of hyper-parameter exploration training on each of a plurality of machine learning models based on the same target data set, wherein each machine learning model explores at least M sets of hyper-parameters in the round, M being a positive integer greater than 1; calculating the current-round performance score and the future potential score of each machine learning model based on the model evaluation indexes corresponding to the sets of hyper-parameters explored by the machine learning models in the current round; combining the current-round performance score and the future potential score of each machine learning model to determine a resource allocation scheme for allocating available resources among the machine learning models; and performing the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
In particular, the computing devices described above may be deployed in servers as well as on node devices in a distributed network environment. Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, or touch input device). All components of the computing device may be connected to each other via a bus and/or a network.
The computing device here need not be a single device; it may be any collection of devices or circuits that can execute the above instructions (or instruction sets), either individually or jointly. The computing device may also be part of an integrated control computing device or computing device manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
The computing device for performing the resource scheduling method according to the exemplary embodiment of the present invention may be a processor, and such a processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like. The processor may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, storage array, or other storage device usable by any database computing device. The storage device and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage device.
It should be noted that the exemplary embodiments of the present invention focus on solving the current problems of low resource utilization and low exploration efficiency in multi-machine-learning-model exploration. Specifically, in the technical scheme of scheduling resources using the current-round performance score and the future potential score of each machine learning model, on the one hand, the results already explored are fully used to evaluate the current performance of each machine learning model, so that resources are allocated effectively and resource utilization efficiency and exploration efficiency are improved; on the other hand, the explored results are also used to evaluate the future performance of each machine learning model, so that resources are allocated more reasonably and resource utilization efficiency and exploration efficiency are further improved.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.
Claims (28)
1. A method of resource scheduling in multi-model exploration, the method comprising:
performing a round of hyper-parameter exploration training on each of a plurality of machine learning models based on the same target data set, wherein each machine learning model explores at least M sets of hyper-parameters in the round of exploration, and M is a positive integer greater than 1;
calculating a current-round performance score and a future potential score of each machine learning model based on the model evaluation indexes corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round, wherein a model evaluation index indicates the training effect of a machine learning model with the corresponding set of hyper-parameters, the current-round performance score of a machine learning model is related to the best result or results explored by the machine learning model in the current round, and the future potential score of a machine learning model indicates its capability of exploring better results if its exploration training continues;
integrating the current-round performance score and the future potential score of each machine learning model to determine a resource allocation scheme for allocating available resources among the machine learning models;
and performing the corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
2. The resource scheduling method of claim 1, wherein said calculating a performance score for each machine learning model in its turn comprises:
determining, among the model evaluation indexes corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round, the first K best model evaluation indexes, wherein K is a positive integer;
and for each machine learning model, taking the proportion of those first K best model evaluation indexes contributed by the machine learning model as the current-round performance score of the machine learning model.
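A minimal Python sketch of this top-K proportion (the function name and data layout are illustrative assumptions, with higher evaluation indexes taken as better):

```python
def current_round_performance_scores(round_metrics, k):
    """round_metrics: {model_name: [evaluation index per explored
    hyper-parameter set in the current round]}."""
    # Pool every (index, model) pair and keep the K best across all models.
    pooled = sorted(((v, name) for name, vals in round_metrics.items() for v in vals),
                    reverse=True)[:k]
    # A model's score is the proportion of the top-K entries it contributed.
    return {name: sum(1 for _, n in pooled if n == name) / k for name in round_metrics}
```

With K = 2 and two models whose results are `[0.9, 0.5]` and `[0.8, 0.7]`, each model contributes one of the top two results, so both score 0.5.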
3. The resource scheduling method of claim 1, wherein said calculating a future potential score for each machine learning model comprises:
storing the model evaluation indexes corresponding to the sets of hyper-parameters explored by each machine learning model in an array in exploration order, to obtain a plurality of arrays corresponding to the plurality of machine learning models respectively;
for each machine learning model, extracting a monotone enhancement array from an array corresponding to the machine learning model, and taking the ratio of the length of the monotone enhancement array to the length of the array corresponding to the machine learning model as the future potential score of the machine learning model.
4. The resource scheduling method of claim 1, wherein the plurality of machine learning models comprises at least two of a logistic regression machine learning model with a hyper-parameter selection mechanism, a naive bayes machine learning model with a hyper-parameter selection mechanism, an ensemble learning model with a hyper-parameter selection mechanism, and a regression correlation machine learning model with a hyper-parameter selection mechanism.
5. The resource scheduling method of claim 1, wherein the resource comprises at least one of a central processor, a memory space, and a thread.
6. The method of claim 1, wherein performing a round of hyperparametric exploration training on each of the plurality of machine learning models based on the same set of target data further comprises:
determining whether at least one of the plurality of machine learning models satisfies an early-stop condition,
wherein, when at least one machine learning model is determined to satisfy the condition of early stopping, stopping training of the at least one machine learning model and not performing the steps of calculating the performance score of the current round and the potential score of the future for the at least one machine learning model.
7. The resource scheduling method of claim 6, wherein the condition of the early stop comprises:
when the model evaluation indexes corresponding to the hyper-parameters explored by a machine learning model in the current round fail to set a new best value I consecutive times, the machine learning model satisfies the early-stop condition;
and/or,
when the model evaluation indexes corresponding to J hyper-parameters explored by one machine learning model in the current round are all higher than the best evaluation index of another machine learning model in the current round, the other machine learning model satisfies the early-stop condition.
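The two early-stop tests of claim 7 can be sketched as follows (a hedged illustration; the history layout and the lowercase parameters `i` and `j` mirror the claim's I and J, but the functions themselves are assumptions):

```python
def no_improvement_stop(eval_history, i):
    """True if the last i evaluation indexes failed to set a new best value."""
    if len(eval_history) <= i:
        return False
    best_before = max(eval_history[:-i])
    return all(v <= best_before for v in eval_history[-i:])

def dominates_stop(this_round_indexes, other_model_best, j):
    """True for the *other* model when at least j indexes explored by this
    model in the current round exceed the other model's best index."""
    return sum(1 for v in this_round_indexes if v > other_model_best) >= j
```

For example, a model whose history is `[0.9, 0.8, 0.85, 0.7]` has gone three results without beating 0.9 and would early-stop with I = 3.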
8. The resource scheduling method of claim 3, wherein an array corresponding to the machine learning model sequentially includes a first model evaluation index through an Xth model evaluation index, wherein X is an integer equal to or greater than M,
the step of extracting a monotone enhanced array from an array corresponding to the machine learning model includes:
extracting the first model evaluation index as a first value in a monotone enhancement array;
and for any model evaluation index from the second model evaluation index to the Xth model evaluation index, if the model evaluation index is superior to the maximum value in the current monotone enhancement array, extracting the model evaluation index as a new value in the monotone enhancement array.
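Claims 3 and 8 together describe a running-maximum filter over the evaluation history; a minimal sketch, assuming higher evaluation indexes are better:

```python
def monotone_enhancement_array(indexes):
    """Keep the first index, then every index that beats the running maximum."""
    result = []
    for v in indexes:
        if not result or v > result[-1]:  # the running maximum is the last kept value
            result.append(v)
    return result

def future_potential_score(indexes):
    """Ratio of the monotone enhancement array's length to the full array's length."""
    return len(monotone_enhancement_array(indexes)) / len(indexes)
```

For the history `[0.5, 0.4, 0.6, 0.55, 0.7]`, the monotone enhancement array is `[0.5, 0.6, 0.7]`, giving a future potential score of 3/5 = 0.6: a model still setting new records frequently is judged to have more headroom.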
9. The resource scheduling method of claim 1, wherein the step of determining the resource allocation scheme comprises:
calculating a composite score of each machine learning model based on the performance score of the current round and the potential score of the future of each machine learning model;
calculating the ratio of the comprehensive score of each machine learning model to the sum of all the comprehensive scores as the resource distribution coefficient of each machine learning model;
determining the resource allocation scheme as the following resource allocation scheme: determining a resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model.
10. The resource scheduling method of claim 9, wherein the determining of the resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as the resource to be allocated to each machine learning model comprises:
among all the machine learning models except the one with the highest resource allocation coefficient, starting from the machine learning model with the lowest resource allocation coefficient, rounding down the product of the machine learning model's resource allocation coefficient and the total resources to be allocated, and taking the rounded-down value as the number of resources to be allocated to that machine learning model;
and allocating the resources that remain unallocated in the total resources to be allocated to the machine learning model with the highest resource allocation coefficient.
11. The resource scheduling method of claim 10, wherein the determining of the resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as the resource to be allocated to each machine learning model further comprises:
when the numbers of resources allocated to the machine learning models include both zero values and values greater than one, sorting in increasing order the machine learning models whose number of resources is greater than one;
and among the machine learning models so sorted, starting from the machine learning model with the fewest resources, reducing that machine learning model's resources by one unit, allocating the removed unit to one of the machine learning models whose number of resources is zero, and returning to the sorting step until no machine learning model's resources are zero.
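The rounding and compensation mechanism of claims 10 and 11 can be sketched as follows (an illustrative reading of the claims, not the patented implementation; the coefficients are assumed to sum to 1):

```python
import math

def integer_allocation(coeffs, total):
    """coeffs: {model: resource allocation coefficient}; total: resource units."""
    top = max(coeffs, key=coeffs.get)
    alloc = {}
    # Round down every model except the one with the highest coefficient,
    # starting from the lowest coefficient.
    for name in sorted(coeffs, key=coeffs.get):
        if name != top:
            alloc[name] = math.floor(coeffs[name] * total)
    # The highest-coefficient model receives all resources still unallocated.
    alloc[top] = total - sum(alloc.values())

    # Compensation: while some allocation is zero, take one unit from the
    # smallest allocation greater than one and give it to a zero allocation.
    while any(v == 0 for v in alloc.values()):
        donors = sorted((n for n in alloc if alloc[n] > 1), key=lambda n: alloc[n])
        if not donors:
            break  # nothing can be redistributed
        recipient = next(n for n in alloc if alloc[n] == 0)
        alloc[donors[0]] -= 1
        alloc[recipient] += 1
    return alloc
```

With coefficients 0.7, 0.25 and 0.05 over 10 units, the floor step yields 2 and 0 for the two smaller models and the remainder 8 for the largest; the compensation step then moves one unit so that no model is starved of resources entirely.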
12. The resource scheduling method of claim 1, wherein the resource scheduling method further comprises:
in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds, stopping allocating resources to the machine learning model.
13. The resource scheduling method of claim 1, wherein performing a round of hyper-parameter exploration training on each of the plurality of machine learning models based on the same target data set comprises: allocating the same number of resources to each of the plurality of machine learning models, and performing a round of hyper-parameter exploration training on each of them with that same number of resources based on the same target data set.
14. A resource scheduling apparatus for multi-model exploration, the resource scheduling apparatus comprising:
the hyper-parameter exploration training unit is configured to perform one round of hyper-parameter exploration training on the multiple machine learning models respectively based on the same target data set, wherein each machine learning model explores at least M groups of hyper-parameters in the round of exploration, and M is a positive integer greater than 1;
a score calculating unit configured to calculate a current-round performance score and a future potential score of each machine learning model based on the model evaluation indexes corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round, wherein a model evaluation index indicates the training effect of a machine learning model with the corresponding set of hyper-parameters, the current-round performance score of a machine learning model is related to the best result or results explored by the machine learning model in the current round, and the future potential score of a machine learning model indicates its capability of exploring better results if its exploration training continues;
a resource allocation scheme determination unit configured to synthesize the performance score of the current round and the potential score of the future of each machine learning model and determine a resource allocation scheme for allocating available resources to each machine learning model;
and the resource scheduling unit is configured to perform corresponding resource scheduling in the next round of hyper-parameter exploration training according to the resource allocation scheme.
15. The resource scheduling apparatus of claim 14, wherein the score calculating unit is configured to:
determining, among the model evaluation indexes corresponding to the sets of hyper-parameters explored by the plurality of machine learning models in the current round, the first K best model evaluation indexes, wherein K is a positive integer;
and for each machine learning model, taking the proportion of those first K best model evaluation indexes contributed by the machine learning model as the current-round performance score of the machine learning model.
16. The resource scheduling apparatus of claim 14, wherein the score calculating unit is configured to:
storing model evaluation indexes respectively corresponding to a plurality of groups of hyper-parameters searched by each machine learning model in an array according to the sequence to obtain a plurality of arrays respectively corresponding to the plurality of machine learning models;
for each machine learning model, extracting a monotone enhancement array from an array corresponding to the machine learning model, and taking the ratio of the length of the monotone enhancement array to the length of the array corresponding to the machine learning model as the future potential score of the machine learning model.
17. The resource scheduling apparatus of claim 14, wherein the plurality of machine learning models comprises at least two of a logistic regression machine learning model with a hyper-parameter selection mechanism, a naive bayes machine learning model with a hyper-parameter selection mechanism, an ensemble learning model with a hyper-parameter selection mechanism, and a regression correlation machine learning model with a hyper-parameter selection mechanism.
18. The resource scheduling apparatus of claim 14, wherein the resource comprises at least one of a central processor, a memory space, and a thread.
19. The resource scheduling apparatus of claim 14, wherein the hyperparameter exploration training unit is further configured to:
determining whether at least one of the plurality of machine learning models satisfies an early-stop condition,
wherein the hyper-parameter exploration training unit stops the training of the at least one machine learning model and does not perform the steps of calculating the performance score of the current round and the future potential score on the at least one machine learning model when the at least one machine learning model is determined to satisfy the condition of early stop.
20. The resource scheduling apparatus of claim 19, wherein the condition for the early stop comprises:
when the model evaluation indexes corresponding to the hyper-parameters explored by a machine learning model in the current round fail to set a new best value I consecutive times, the machine learning model satisfies the early-stop condition;
and/or,
when the model evaluation indexes corresponding to J hyper-parameters explored by one machine learning model in the current round are all higher than the best evaluation index of another machine learning model in the current round, the other machine learning model satisfies the early-stop condition.
21. The resource scheduling apparatus of claim 16, wherein an array corresponding to the machine learning model sequentially includes a first model evaluation index through an Xth model evaluation index, wherein X is an integer equal to or greater than M,
the score calculation unit is configured to:
extracting the first model evaluation index as a first value in a monotone enhancement array;
and for any model evaluation index from the second model evaluation index to the Xth model evaluation index, if the model evaluation index is superior to the maximum value in the current monotone enhancement array, extracting the model evaluation index as a new value in the monotone enhancement array.
22. The resource scheduling apparatus of claim 14, wherein the resource allocation scheme determining unit is configured to:
calculating a composite score of each machine learning model based on the performance score of the current round and the potential score of the future of each machine learning model;
calculating the ratio of the comprehensive score of each machine learning model to the sum of all the comprehensive scores as the resource distribution coefficient of each machine learning model;
determining the resource allocation scheme as the following resource allocation scheme: determining a resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model.
23. The resource scheduling apparatus of claim 22, wherein the resource allocation scheme determining unit is configured to:
among all the machine learning models except the one with the highest resource allocation coefficient, starting from the machine learning model with the lowest resource allocation coefficient, rounding down the product of the machine learning model's resource allocation coefficient and the total resources to be allocated, and taking the rounded-down value as the number of resources to be allocated to that machine learning model;
and allocating the resources that remain unallocated in the total resources to be allocated to the machine learning model with the highest resource allocation coefficient.
24. The resource scheduling apparatus of claim 23, wherein the resource allocation scheme determining unit is further configured to:
when the numbers of resources allocated to the machine learning models include both zero values and values greater than one, sorting in increasing order the machine learning models whose number of resources is greater than one;
and among the machine learning models so sorted, starting from the machine learning model with the fewest resources, reducing that machine learning model's resources by one unit, allocating the removed unit to one of the machine learning models whose number of resources is zero, and returning to the sorting step until no machine learning model's resources are zero.
25. The resource scheduling apparatus of claim 14, wherein the resource scheduling unit is further configured to:
in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds, stopping allocating resources to the machine learning model.
26. The resource scheduling apparatus of claim 14, wherein the hyperparameter exploration training unit is configured to: and respectively allocating the same number of resources to the multiple machine learning models, and respectively performing a round of hyper-parameter exploration training on the multiple machine learning models based on the same target data set by using the same number of resources.
27. A computer-readable storage medium having stored thereon a computer program that, when executed by one or more computing devices, causes the one or more computing devices to implement the resource scheduling method of any one of claims 1-13.
28. A resource scheduling system comprising one or more computing devices and one or more storage devices having a computer program recorded thereon, which, when executed by the one or more computing devices, causes the one or more computing devices to carry out the resource scheduling method of any one of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910791358.3A CN110503208B (en) | 2019-08-26 | 2019-08-26 | Resource scheduling method and resource scheduling device in multi-model exploration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110503208A CN110503208A (en) | 2019-11-26 |
CN110503208B true CN110503208B (en) | 2022-05-17 |
Family
ID=68589639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910791358.3A Active CN110503208B (en) | 2019-08-26 | 2019-08-26 | Resource scheduling method and resource scheduling device in multi-model exploration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110503208B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111340240A (en) * | 2020-03-25 | 2020-06-26 | 第四范式(北京)技术有限公司 | Method and device for realizing automatic machine learning |
CN112149838A (en) * | 2020-09-03 | 2020-12-29 | 第四范式(北京)技术有限公司 | Method, device, electronic equipment and storage medium for realizing automatic model building |
CN112116104B (en) * | 2020-09-17 | 2024-06-18 | 京东科技控股股份有限公司 | Method, device, medium and electronic equipment for automatically integrating machine learning |
TWI756974B (en) * | 2020-12-09 | 2022-03-01 | 財團法人工業技術研究院 | Machine learning system and resource allocation method thereof |
CN113010312B (en) * | 2021-03-11 | 2024-01-23 | 山东英信计算机技术有限公司 | Super-parameter tuning method, device and storage medium |
US20220414530A1 (en) * | 2021-06-25 | 2022-12-29 | International Business Machines Corporation | Selection of a machine learning model |
CN113780287A (en) * | 2021-07-30 | 2021-12-10 | 武汉中海庭数据技术有限公司 | Optimal selection method and system for multi-depth learning model |
CN114003393B (en) * | 2021-12-30 | 2022-06-14 | 南京大学 | Method and system for improving integrated automatic machine learning operation performance |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108228A (en) * | 2018-01-05 | 2018-06-01 | 安徽师范大学 | A kind of resource allocation methods based on differential evolution algorithm |
CN109144724A (en) * | 2018-07-27 | 2019-01-04 | 众安信息技术服务有限公司 | A kind of micro services resource scheduling system and method |
CN109711548A (en) * | 2018-12-26 | 2019-05-03 | 歌尔股份有限公司 | Choosing method, application method, device and the electronic equipment of hyper parameter |
CN109816116A (en) * | 2019-01-17 | 2019-05-28 | 腾讯科技(深圳)有限公司 | The optimization method and device of hyper parameter in machine learning model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8924964B2 (en) * | 2010-11-01 | 2014-12-30 | Microsoft Corporation | Dynamic allocation and assignment of virtual environment |
US10817259B2 (en) * | 2017-07-31 | 2020-10-27 | Allegro Artificial Intelligence Ltd | System and method for causing actions in a dataset management system |
Non-Patent Citations (2)
Title |
---|
"Cross-Correlation Prediction of Resource Demand for Virtual Machine Resource";Dorian Minarolli.et al;《IEEE》;20130312;全文 * |
"云计算资源调度:策略与算法";储雅等;《计算机科学》;20131130;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN110503208A (en) | 2019-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110503208B (en) | Resource scheduling method and resource scheduling device in multi-model exploration | |
US11720822B2 (en) | Gradient-based auto-tuning for machine learning and deep learning models | |
Hu et al. | Spear: Optimized dependency-aware task scheduling with deep reinforcement learning | |
US20180240043A1 (en) | Model and pattern structure online unital learning: mapsoul | |
EP3866008A1 (en) | Method for processing tasks in parallel, device and storage medium | |
KR20180073669A (en) | Stream-based accelerator processing of computed graphs | |
Arnaiz-González et al. | MR-DIS: democratic instance selection for big data by MapReduce | |
Huang et al. | Task ranking and allocation in list-based workflow scheduling on parallel computing platform | |
Kumar et al. | Multi-objective workflow scheduling scheme: a multi-criteria decision making approach | |
Varghese et al. | Cloud benchmarking for maximising performance of scientific applications | |
CN112580775A (en) | Job scheduling for distributed computing devices | |
CN112529211A (en) | Hyper-parameter determination method and device, computer equipment and storage medium | |
CN109727376B (en) | Method and device for generating configuration file and vending equipment | |
Singh et al. | A machine learning approach for modular workflow performance prediction | |
Rahman et al. | SMBSP: a self-tuning approach using machine learning to improve performance of spark in big data processing | |
US20220391672A1 (en) | Multi-task deployment method and electronic device | |
Feng et al. | Heterogeneity-aware proactive elastic resource allocation for serverless applications | |
CN114217930A (en) | Accelerator system resource optimization management method based on mixed task scheduling | |
Bender et al. | Closing the gap between cache-oblivious and cache-adaptive analysis | |
Rahmani et al. | Machine learning-driven energy-efficient load balancing for real-time heterogeneous systems | |
Ahmed et al. | Fuzzy active learning to detect OpenCL kernel heterogeneous machines in cyber physical systems | |
CN112200653A (en) | Bank transaction amount prediction method, resource allocation method, computing device and medium | |
JP2021182224A (en) | Job scheduling program, information processing device, and job scheduling method | |
Nemirovsky et al. | A deep learning mapper (DLM) for scheduling on heterogeneous systems | |
Khramtsova et al. | Embark on DenseQuest: A System for Selecting the Best Dense Retriever for a Custom Collection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||