CN111539536B - Method and device for evaluating service model hyper-parameters - Google Patents


Info

Publication number
CN111539536B
Authority
CN
China
Prior art keywords
hyper
parameter
value
combinations
combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010566084.0A
Other languages
Chinese (zh)
Other versions
CN111539536A (en
Inventor
张雅淋
李龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010566084.0A
Publication of CN111539536A
Application granted
Publication of CN111539536B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification provide a method and a device for evaluating hyper-parameters of a business model. The method comprises: acquiring a plurality of first hyper-parameter combinations of the business model, the respective scores of the first hyper-parameter combinations, and a second hyper-parameter combination to be evaluated, wherein the score of each first hyper-parameter combination is the performance score of the corresponding business model; calculating the similarity between the second hyper-parameter combination and each first hyper-parameter combination; and calculating a weighted sum of the plurality of scores as an estimated score of the second hyper-parameter combination, wherein the weight of the score of each first hyper-parameter combination is determined based on the corresponding similarity.

Description

Method and device for evaluating service model hyper-parameters
Technical Field
Embodiments of this specification relate to the technical field of machine learning, and in particular to a method and a device for evaluating hyper-parameters of a business model, and a method and a device for determining hyper-parameters of a business model.
Background
In an application scene of the internet, a large amount of business data needs to be analyzed every day, and machine learning is playing a role in more and more scenes as a technical means. For a given task, establishing and deploying an effective model generally includes two main parts, one is to select a suitable model, and the other is to select a suitable hyper-parameter for the model, so as to provide guarantee for the performance of the model.
In current schemes, the most basic and widely used hyper-parameter search algorithms are grid search (GridSearch) and random search (RandomSearch), which look for better hyper-parameters within a given search range. Improvements on these include the genetic algorithm and the differential evolution algorithm, where the genetic algorithm is suited to discrete hyper-parameters and the differential evolution algorithm to continuous hyper-parameters. In addition, the Bayesian optimization algorithm fits a hyper-parameter-performance curve to historical hyper-parameters based on a Gaussian process, so as to guide the selection of hyper-parameters in the next round. In all of these hyper-parameter search algorithms, when the hyper-parameters of a model need to be evaluated, a training sample set is usually used to train the model corresponding to the hyper-parameters, and the performance score of the trained model is then measured on a test sample set to obtain the score of the hyper-parameters.
Therefore, a more efficient solution for evaluating and determining hyper-parameters of a business model is needed.
Disclosure of Invention
The embodiments of the present specification aim to provide a more efficient solution for evaluating and determining hyper-parameters of a business model, so as to solve the deficiencies in the prior art.
To achieve the above object, one aspect of the present specification provides a method for evaluating hyper-parameters of a business model, where the business model is used for processing a service related to any one of the following objects in a network platform: user, merchant, commodity, transaction; the business model comprises a plurality of hyper-parameters, and the method comprises the following steps:
acquiring a plurality of first hyper-parameter combinations and respective scores thereof of a business model, and a second hyper-parameter combination to be evaluated, wherein the first hyper-parameter combination and the second hyper-parameter combination respectively comprise respective values of the plurality of hyper-parameters, and the score of the first hyper-parameter combination is the performance score of the corresponding business model;
calculating the similarity of the second hyper-parameter combination and each first hyper-parameter combination;
calculating a weighted sum of the plurality of scores as an estimated score of the second hyper-parameter combination, wherein the weight of the score of each first hyper-parameter combination is determined based on the corresponding similarity.
In one embodiment, calculating the similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations comprises calculating a distance between the second hyper-parameter combination and each of the first hyper-parameter combinations, and calculating the similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations based on the respective distance.
In one embodiment, the plurality of hyper-parameters includes a first hyper-parameter, and in a case where a value range of the first hyper-parameter is a continuous value range, calculating the distance between the second hyper-parameter combination and each first hyper-parameter combination includes calculating a difference between a value of the first hyper-parameter in the second hyper-parameter combination and a value of the first hyper-parameter in any one of the first hyper-parameter combinations.
In one embodiment, in a case where the value range of the first hyper-parameter is a discrete value range, calculating the distance between the second hyper-parameter combination and each first hyper-parameter combination includes determining whether the value of the first hyper-parameter in the second hyper-parameter combination is equal to the value of the first hyper-parameter in any one of the first hyper-parameter combinations.
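The embodiments above can be sketched in a few lines of Python. This is a minimal illustration, not the claimed implementation: the two-dimensional layout (one continuous hyper-parameter "a", one discrete hyper-parameter "d"), the Euclidean-style aggregation of per-dimension distances, the `exp(-distance)` similarity, and the normalization of the weights are all assumptions chosen for the example.

```python
import math

# Hypothetical layout: each combination is a tuple (a, d), where "a" is a
# continuous hyper-parameter and "d" is a discrete one.
DISCRETE = (False, True)

def distance(x, y, discrete=DISCRETE):
    """Mixed distance: difference on continuous dimensions, 0/1 equality
    test on discrete dimensions, combined Euclidean-style (an assumption)."""
    total = 0.0
    for a, b, is_discrete in zip(x, y, discrete):
        if is_discrete:
            total += 0.0 if a == b else 1.0
        else:
            total += (a - b) ** 2
    return math.sqrt(total)

def similarity(x, y):
    """Map a distance to a similarity in (0, 1]; exp(-d) is an assumption."""
    return math.exp(-distance(x, y))

def estimate_score(candidate, evaluated, scores):
    """Similarity-weighted sum of the known scores. The weights are
    normalized to sum to 1, which is an assumption of this sketch."""
    sims = [similarity(candidate, x) for x in evaluated]
    total = sum(sims)
    return sum(s * score for s, score in zip(sims, scores)) / total
```

For instance, `estimate_score((0.1, 4), [(0.1, 4), (0.3, 7)], [0.9, 0.5])` lies closer to 0.9 than to 0.5, because the candidate coincides with the first evaluated combination.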
Another aspect of the present specification provides a method of determining a hyper-parameter of a business model, the business model comprising a plurality of hyper-parameters, the method comprising:
obtaining a plurality of first hyper-parameter combinations, wherein each first hyper-parameter combination comprises respective values of the plurality of hyper-parameters;
acquiring a performance score of a business model corresponding to each first hyper-parameter combination as a score of each first hyper-parameter combination based on a training sample set and a testing sample set which are prepared in advance, wherein the training sample set and the testing sample set are related to any one of the following objects in the network platform: users, merchants, goods, transactions;
determining a first high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations from the plurality of first hyper-parameter combinations based on the scores of the respective first hyper-parameter combinations;
determining a first value subspace of the plurality of hyper-parameters based on respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations, wherein the first value subspace comprises the first high-performance hyper-parameter combination and does not comprise the plurality of low-performance hyper-parameter combinations;
randomly acquiring a plurality of second hyper-parameter combinations from the first value subspace;
predicting an estimation score of each second hyper-parameter combination by the method for evaluating the hyper-parameters of the business model;
and selecting, based on the estimation scores of the second hyper-parameter combinations, a predetermined number of second hyper-parameter combinations as the first hyper-parameter combinations to be processed the next time the method is executed.
In one embodiment, determining the first value subspace of the plurality of hyper-parameters based on the respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations comprises:
selecting a first low-performance hyper-parameter combination from the plurality of low-performance hyper-parameter combinations;
selecting a first hyper-parameter from selectable hyper-parameters in a plurality of hyper-parameters, wherein the value of the first hyper-parameter of a first high-performance hyper-parameter combination is a first value, the value of the first hyper-parameter of a first low-performance hyper-parameter combination is a second value, and the value range of the selectable hyper-parameter comprises at least two values;
and in the case that the value range of the first hyper-parameter is a continuous value range, randomly selecting a value from the value range between the first value and the second value to be used for contracting the value range of the first hyper-parameter.
In one embodiment, determining the first value subspace of the plurality of hyper-parameters based on the respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations further comprises narrowing the value range of the first hyper-parameter to the first value in case the value range of the first hyper-parameter comprises a plurality of discrete values.
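The contraction described in these embodiments might look like the following Python sketch. The dictionary representation of the value space (tuples for continuous ranges, sets for discrete values), the names `contains` and `contract`, and the rule of cutting only along a hyper-parameter on which the two combinations differ are assumptions made for illustration.

```python
import random

def contains(space, x):
    """True if combination x (a dict) lies inside the value space."""
    for name, values in space.items():
        if isinstance(values, tuple):          # continuous range (low, high)
            low, high = values
            if not (low <= x[name] <= high):
                return False
        elif x[name] not in values:            # discrete set of values
            return False
    return True

def contract(space, good, bad_list, rng=random):
    """Shrink `space` until it holds `good` but none of `bad_list`."""
    sub = dict(space)
    for bad in bad_list:
        if not contains(sub, bad):
            continue                           # removed by an earlier cut
        # Assumption: cut along a hyper-parameter where the values differ.
        differing = [n for n in sub if good[n] != bad[n]]
        name = rng.choice(differing)
        first, second = good[name], bad[name]
        if isinstance(sub[name], tuple):
            low, high = sub[name]
            # Random cut between the first value (kept) and the second (dropped).
            cut = first + rng.random() * (second - first)
            if cut == second:                  # guard against rounding
                cut = first
            if second > first:
                high = cut
            else:
                low = cut
            sub[name] = (low, high)
        else:
            sub[name] = {first}                # discrete: keep the first value only
    return sub
```

Calling `contract({"a": (0.1, 0.3), "d": {4, 5, 6, 7}}, {"a": 0.2, "d": 5}, [{"a": 0.1, "d": 7}])` returns a subspace that still contains the high-performance combination and excludes the low-performance one; which dimension is cut depends on the random draw.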
Another aspect of the present specification provides an apparatus for evaluating a hyper-parameter of a business model, where the business model is used for processing a business related to any one of the following objects in a network platform: user, merchant, commodity, transaction, the business model includes a plurality of hyper-parameters, the device includes:
an acquisition unit configured to acquire a plurality of first hyper-parameter combinations of the business model and their respective scores, and a second hyper-parameter combination to be evaluated, wherein the first hyper-parameter combinations and the second hyper-parameter combination each comprise respective values of the plurality of hyper-parameters, and the score of each first hyper-parameter combination is the performance score of the corresponding business model;
a first calculation unit configured to calculate a similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations;
a second calculation unit configured to calculate a weighted sum of the plurality of scores as an estimated score of the second hyper-parameter combination, wherein a weight of the score of each of the first hyper-parameter combinations is determined based on the corresponding similarity.
In one embodiment, the first calculation unit is further configured to calculate distances between the second hyper-parameter combination and each of the first hyper-parameter combinations, and calculate a similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations based on each of the distances.
In an embodiment, the plurality of hyper-parameters includes a first hyper-parameter, and in a case that a value range of the first hyper-parameter is a continuous value range, the first calculating unit is further configured to calculate a difference between a value of the first hyper-parameter in the second hyper-parameter combination and a value of the first hyper-parameter in any first hyper-parameter combination.
In an embodiment, in a case that the value range of the first hyper-parameter is a discrete value range, the first computing unit is further configured to determine whether the value of the first hyper-parameter in the second hyper-parameter combination is equal to the value of the first hyper-parameter in any one of the first hyper-parameter combinations.
Another aspect of the present specification provides an apparatus for determining a hyper-parameter of a business model, the business model comprising a plurality of hyper-parameters, the apparatus comprising:
a first acquisition unit configured to acquire a plurality of first hyper-parameter combinations, each of the first hyper-parameter combinations including respective values of the plurality of hyper-parameters;
a second obtaining unit, configured to obtain, as a score of each first hyper-parameter combination, a performance score of the business model corresponding to each first hyper-parameter combination based on a training sample set and a test sample set prepared in advance, where the training sample set and the test sample set are related to any one of the following objects in the network platform: users, merchants, goods, transactions;
a first determination unit configured to determine a first high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations from the plurality of first hyper-parameter combinations based on scores of the respective first hyper-parameter combinations;
a second determining unit configured to determine, based on the respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations, a first value subspace of the plurality of hyper-parameters, where the first value subspace includes the first high-performance hyper-parameter combination and does not include the plurality of low-performance hyper-parameter combinations;
a third obtaining unit configured to randomly obtain a plurality of second hyper-parameter combinations from the first value subspace;
a prediction unit configured to predict an estimation score of each of the second hyper-parameter combinations by the above apparatus for evaluating a hyper-parameter of a business model;
and a selecting unit configured to select, based on the estimation scores of the second hyper-parameter combinations, a predetermined number of second hyper-parameter combinations as the first hyper-parameter combinations to be processed the next time the method is executed.
In one embodiment, the second determination unit comprises:
the first selection subunit is configured to select a first low-performance hyper-parameter combination from the plurality of low-performance hyper-parameter combinations;
the second selecting subunit is configured to select a first hyper-parameter from selectable hyper-parameters in the plurality of hyper-parameters, a value of the first hyper-parameter of the first high-performance hyper-parameter combination is a first value, a value of the first hyper-parameter of the first low-performance hyper-parameter combination is a second value, and a value range of the selectable hyper-parameter includes at least two values;
and the first contraction subunit is configured to randomly select one value from the value range between the first value and the second value to be used for contracting the value range of the first hyper-parameter when the value range of the first hyper-parameter is a continuous value range.
In one embodiment, the second determining unit further comprises: a second contraction subunit configured to, in a case where the range of values of the first hyper-parameter includes a plurality of discrete values, contract the range of values of the first hyper-parameter to the first value.
Another aspect of the present specification provides a computer readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform any one of the above methods.
Another aspect of the present specification provides a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements any of the methods described above.
In the scheme for determining the hyper-parameters of the business model according to the embodiments of this specification, the selectable range of the hyper-parameters is mapped to a region in a space, and a suitable region is learned through a model in order to recommend new hyper-parameters, so the method applies to both continuous and discrete hyper-parameters. Furthermore, during learning, the hyper-parameter value space is contracted by a random method, which avoids the dimension explosion problem that arises when fitting a Gaussian process while ensuring that the learned space is good enough. In addition, the estimation scores of the hyper-parameter combinations to be evaluated are predicted from the existing hyper-parameter combinations, so that the combinations selected on the basis of the estimation scores have better performance, which further improves the time efficiency of the hyper-parameter search.
Drawings
The embodiments of the present specification may be made more clear by describing the embodiments with reference to the attached drawings:
FIG. 1 illustrates a schematic diagram of a system for determining hyper-parameters of a business model in accordance with an embodiment of the present description;
FIG. 2 is a schematic diagram of a process for obtaining a new first hyper-parameter set;
FIG. 3 illustrates a schematic diagram of an update process of the predictive model 13 according to an embodiment of the present description;
FIG. 4 illustrates a flow diagram of a method of evaluating business model hyper-parameters, in accordance with an embodiment of the present description;
FIG. 5 illustrates a flow diagram of a method of evaluating business model hyper-parameters, in accordance with an embodiment of the present description;
FIG. 6 illustrates an apparatus 600 for evaluating business model hyper-parameters according to an embodiment of the present description;
FIG. 7 illustrates an apparatus 700 for determining hyper-parameters of a business model in accordance with an embodiment of the present description.
Detailed Description
The embodiments of the present specification will be described below with reference to the accompanying drawings.
The scheme for evaluating and determining model hyper-parameters according to embodiments of the present description may be applied to various business models. The business model is, for example, an XGBoost model. It may be trained on a plurality of training samples corresponding to a plurality of users in the network platform, so as to classify the users for business processing, or on a plurality of training samples corresponding to a plurality of transactions in the network platform, so as to classify the transactions for business processing, and so on. It is understood that the business model is not limited to the XGBoost model, but may be any of various classification models, regression models, neural network models, tree models, and the like. The objects that the model makes predictions about are not limited to users and transactions, but can be various objects in a network platform, such as merchants, commodities, film and television works and the like. The XGBoost model is used as the example below.
Before training the XGBoost model, it is usually necessary to determine a plurality of hyper-parameters, such as eta, max_depth, subsample, colsample_bytree, num_round, etc. For convenience of description, these are referred to below as hyper-parameters a, b, c, d, e, in that order. For example, hyper-parameters a to c are continuous hyper-parameters with predetermined value ranges a: [0.1, 0.3], b: [0.6, 1], c: [0.6, 1], and hyper-parameters d and e are discrete hyper-parameters whose initial selectable values are d: {4, 5, 6, 7} and e: {100, 200, 300}. The value ranges of the respective hyper-parameters form a 5-dimensional value space in which any point is a possible hyper-parameter combination of the XGBoost model. Such a combination can be expressed as a vector x of dimension 5, such as x = [0.1, 0.6, 0.6, 4, 100]. The performance score S of the XGBoost model can be obtained by training the XGBoost model with the hyper-parameter combination x on a training sample set and testing the trained model on a test sample set. The performance score S may be the value of various metrics, such as accuracy, precision, recall, AUC, or a combination of several metrics, which is not limited herein. Thus, in the scheme of the embodiments of the present specification, after a plurality of hyper-parameter combinations (i.e., vectors) x_1, x_2, …, x_n are determined through multiple cycles, hyper-parameter combinations may be selected based on the performance scores S_1, S_2, …, S_n of the XGBoost models corresponding to the respective combinations, so as to finally determine a better hyper-parameter combination x_i of the XGBoost model.
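As a small illustration of the value space just described, the following Python sketch stores the ranges of a to e and draws random hyper-parameter combinations as 5-dimensional vectors. The names `SPACE` and `sample_combination` and the tuple-versus-list convention are assumptions of this example; training and scoring the model are deliberately left out.

```python
import random

# Value ranges from the text: tuples are continuous [low, high] ranges,
# lists are discrete candidate values.
SPACE = {
    "a": (0.1, 0.3),
    "b": (0.6, 1.0),
    "c": (0.6, 1.0),
    "d": [4, 5, 6, 7],
    "e": [100, 200, 300],
}

def sample_combination(space, rng):
    """Draw one hyper-parameter combination x, ordered like `space`."""
    x = []
    for values in space.values():
        if isinstance(values, tuple):
            x.append(rng.uniform(*values))   # continuous dimension
        else:
            x.append(rng.choice(values))     # discrete dimension
    return x

rng = random.Random(0)
# Six random starting combinations (the count is illustrative).
initial = [sample_combination(SPACE, rng) for _ in range(6)]
```

Each element of `initial` is one vector x; in the scheme described here, each would then be scored by training and testing the corresponding model.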
FIG. 1 illustrates a schematic diagram of a system for determining hyper-parameters of a business model according to an embodiment of the present description. As shown in FIG. 1, the system includes a business model 11, a spatial contraction model 12 and a prediction model 13. The arrows connecting the three models in FIG. 1 also show one cycle of the multiple cycles in the method according to the embodiment of the present specification. In this cycle, once a plurality of first hyper-parameter combinations of the business model 11 are obtained, the plurality of business models 11 corresponding to those combinations are obtained as well; they are schematically illustrated by three overlapping boxes in FIG. 1. Then, the scores of the business models 11 corresponding to the respective first hyper-parameter combinations may be obtained based on a training sample set (not shown in FIG. 1) and a test sample set (not shown in FIG. 1). After the scores are obtained, the first hyper-parameter combinations and their scores are provided to the spatial contraction model 12, which contracts the hyper-parameter value space of the business model 11 to obtain a subspace of the value space. The prediction model 13 then predicts the estimation scores of second hyper-parameter combinations in the subspace, and these are used to determine a new first hyper-parameter combination, which is one of the first hyper-parameter combinations processed in the next cycle. In the embodiments of the present specification, to distinguish the two, the hyper-parameter combinations initially acquired in each cycle are referred to as first hyper-parameter combinations, and the hyper-parameter combinations acquired in the subspace are referred to as second hyper-parameter combinations.
FIG. 2 shows a schematic diagram of a process of obtaining a new first hyper-parameter combination. The area within the outermost frame in FIG. 2 represents the hyper-parameter value space determined by the value ranges of the hyper-parameters of the business model 11. In the first cycle of the method, a predetermined number of hyper-parameter combinations are randomly selected in the value space as the first hyper-parameter combinations to be processed, for example, the first hyper-parameter combinations x_1 to x_6 shown as circles in FIG. 2. After x_1 to x_6 are selected, the scores S_1 to S_6 of the first hyper-parameter combinations may be obtained based on the training sample set and the test sample set as described above. Based on its score, each first hyper-parameter combination may be determined to be a high-performance or a low-performance hyper-parameter combination; in FIG. 2, white circles correspond to high-performance hyper-parameter combinations and black circles to low-performance ones. Then, through the spatial contraction model 12, one of the high-performance hyper-parameter combinations, such as the first hyper-parameter combination x_2, is selected, and the hyper-parameter value space is contracted based on x_2 and the low-performance hyper-parameter combinations to obtain the subspace in FIG. 2, which includes x_2 but none of the low-performance hyper-parameter combinations. Each straight line within the large box in FIG. 2 corresponds to one contraction of the hyper-parameter value space.
For example, the vertical line on the right represents the 1st spatial contraction, which moves the low-performance first hyper-parameter combination x_5 out of the subspace; the left-hand horizontal line represents the 2nd spatial contraction, which moves the low-performance combination x_3 out of the subspace; and the vertical line on the left represents the 3rd spatial contraction, which moves the low-performance combination x_1 out of the subspace, finally yielding the subspace shown in FIG. 2. It will be appreciated that FIG. 2 shows the hyper-parameter value space as a two-dimensional space and each contraction as a straight line only for illustration. In practice, the value space is multi-dimensional, and it is divided not by a straight line but by a hyperplane in the value space. In this process, the high-performance hyper-parameter combination (the white circle marked "x_2" in FIG. 2) serves as the axis of the contraction. In the 1st spatial contraction described above, for example, the first hyper-parameter combination x_2 is the axis, so that the extent of the subspace in the contracted dimension moves closer to the axis; the process may therefore also be referred to as axis contraction.
After the subspace in FIG. 2 is obtained through the above spatial contraction process, the subspace includes only high-performance hyper-parameter combinations and no low-performance ones, so it can be considered to correspond to a high-performance value space. A predetermined number of second hyper-parameter combinations may then be randomly acquired from the subspace; FIG. 2 schematically shows four newly acquired second hyper-parameter combinations x_7 to x_10 as diamonds. After x_7 to x_10 are obtained, the prediction model 13 calculates the estimated score of each newly acquired second hyper-parameter combination based on the scores S_1 to S_6 of the existing first hyper-parameter combinations (i.e., x_1 to x_6, shown as circles in FIG. 2) and the similarity between the newly acquired second hyper-parameter combination and each existing first hyper-parameter combination. The second hyper-parameter combination with the highest estimation score (for example, x_8, shown as a white diamond in FIG. 2) can then be selected from x_7 to x_10 as a new first hyper-parameter combination to be processed in the next cycle.
After the second hyper-parameter combination x_8 is selected as a new first hyper-parameter combination, the above spatial contraction process may be repeated to determine a new subspace and another new first hyper-parameter combination. After the process is repeated a predetermined number of times (e.g., 6 times), 6 new first hyper-parameter combinations are acquired. The whole process shown in FIG. 1 can then be re-executed based on the 6 new first hyper-parameter combinations, i.e., a new cycle is started. After a number of cycles have been performed, the first hyper-parameter combination with the highest score across the cycles may be selected as the hyper-parameter combination finally used by the business model 11.
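The cycle structure described above can be outlined as a small driver function. This is only a skeleton under stated assumptions: `evaluate`, `sample_initial`, `propose` and `estimate` are hypothetical callables standing in for model training and testing, random initialization, subspace contraction plus sampling, and the similarity-based prediction model, respectively. The sketch also reuses the same evaluated set for every proposal within a cycle, which is a simplification.

```python
def search(evaluate, sample_initial, propose, estimate, cycles=3, per_cycle=6):
    """Outer loop: score the current combinations, then build the next
    batch by repeatedly proposing candidates and keeping the one with
    the highest estimated score."""
    current = sample_initial(per_cycle)
    best_x, best_s = None, float("-inf")
    for cycle in range(cycles):
        # Real scores: train and test one model per combination.
        scores = [evaluate(x) for x in current]
        for x, s in zip(current, scores):
            if s > best_s:
                best_x, best_s = x, s
        if cycle == cycles - 1:
            break                      # nothing left to evaluate afterwards
        nxt = []
        for _ in range(per_cycle):
            # Candidates sampled from a freshly contracted subspace.
            candidates = propose(current, scores)
            # Keep the candidate the prediction model rates highest.
            nxt.append(max(candidates,
                           key=lambda c: estimate(c, current, scores)))
        current = nxt
    return best_x, best_s
```

The returned pair is the combination with the highest real score seen in any cycle, matching the final selection step in the text.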
FIG. 3 is a schematic diagram illustrating an updating process of the prediction model 13 according to an embodiment of the present specification. With reference to the above description, the updating process of the prediction model 13 includes stage ① and stage ②. In stage ①, the prediction model 13 determines an estimation score of the second hyper-parameter combination to be predicted, based on the similarity between that combination and each first hyper-parameter combination in the current first hyper-parameter combination set and on the score of each first hyper-parameter combination, and a new first hyper-parameter combination to be added to the set is determined based on the estimation score. In stage ②, because the first hyper-parameter combination set was updated in stage ① and the prediction model 13 predicts based on that set, the update of the set updates the prediction model 13 accordingly.
It will be appreciated that, although the prediction process of the prediction model 13 is described above with reference to FIG. 1 and FIG. 2, the prediction model 13 is not limited to use in conjunction with the spatial contraction model 12 described above and may also be used in other hyper-parameter search algorithms. For example, in a genetic algorithm, after a plurality of new hyper-parameter combinations are generated in one cycle based on a plurality of initially acquired hyper-parameter combinations, the similarity between each new hyper-parameter combination and each existing hyper-parameter combination, and the score of each existing hyper-parameter combination, may be acquired. The prediction model 13 then calculates, as the estimated score of the new hyper-parameter combination, the weighted sum of the plurality of scores, using each similarity as the weight of the score of the corresponding existing hyper-parameter combination, and the hyper-parameter combinations to be processed in the next cycle are selected based on the estimated scores. The processes shown in FIG. 1 and FIG. 2 are described in detail below as an example.
FIG. 4 is a flow chart illustrating a method for evaluating business model hyper-parameters, according to an embodiment of the present description, the method comprising:
in step S402, a plurality of first hyper-parameter combinations of the business model and their respective scores, and a second hyper-parameter combination to be evaluated are obtained;
in step S404, calculating the similarity between the second hyper-parameter combination and each first hyper-parameter combination;
in step S406, a weighted sum of the plurality of scores is calculated as an estimated score of the second hyper-parameter combination with each of the similarities as a weight of a score of the corresponding first hyper-parameter combination.
First, in step S402, a plurality of first hyper-parameter combinations of the business model and their respective scores, and a second hyper-parameter combination to be evaluated are obtained.
As described above with reference to FIG. 2, in the first cycle of the method, the first hyper-parameter combinations x1~x6 are randomly acquired in the value space of the hyper-parameters of the business model 11. The business models 11 corresponding to the first hyper-parameter combinations x1~x6 can then be trained separately with the training sample set, and the performance score of each business model 11 can be tested with the test sample set and used as the respective scores S1~S6 of the first hyper-parameter combinations x1~x6. In addition, through the process shown in FIG. 2, after the subspace is determined, the second hyper-parameter combinations x7~x10 may be randomly acquired in the subspace. An estimated score can be predicted for each of the second hyper-parameter combinations x7~x10 by the method shown in FIG. 4; the method is described below taking the second hyper-parameter combination x8 as an example.
In step S404, the similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations is calculated.
Since, in the embodiments of this specification, the first hyper-parameter combination and the second hyper-parameter combination both have the form of vectors as described above, the similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations can be calculated by any of various methods for calculating the similarity between vectors. If every hyper-parameter of the business model 11 is a continuous hyper-parameter, the similarity may be the cosine similarity, a similarity based on a distance such as the Euclidean distance or the Manhattan distance, the Pearson correlation coefficient, and the like. If the hyper-parameters of the business model 11 include discrete hyper-parameters, the similarity between the hyper-parameter combinations can likewise be calculated by the various similarity calculation methods for continuous hyper-parameters described above.
In one embodiment, taking the first hyper-parameter combinations x1~x6 and the second hyper-parameter combination x8 as an example, the distance between the second hyper-parameter combination x8 and each of the first hyper-parameter combinations x1~x6 is calculated first. Here, the distance may be calculated by any of various methods for calculating the distance between vectors. For example, the distance dist(x8, xi) between the second hyper-parameter combination x8 and the first hyper-parameter combination xi, where i is any value from 1 to 6, can be calculated by the following formula (1):
[Formula (1): expression for dist(x8, xi) in terms of the per-hyper-parameter distances d(x8j, xij), j = 1 to k; the equation image is not reproduced in this text]
(1)
where k is the number of hyper-parameters included in the business model 11, x8j is the value of the j-th hyper-parameter in the second hyper-parameter combination x8, xij is the value of the j-th hyper-parameter in the first hyper-parameter combination xi, and d(x8j, xij) is the distance between the j-th hyper-parameter value of the second hyper-parameter combination x8 and the j-th hyper-parameter value of the first hyper-parameter combination xi.
For example, the business model 11 is an XGBoost model, which includes continuous hyper-parameters a to c and discrete hyper-parameters d and e, and it is assumed that the hyper-parameters a to e correspond one-to-one to the values x8j (j = 1 to 5) of the second hyper-parameter combination x8. For the continuous hyper-parameters a to c, i.e., j = 1, 2, or 3, the per-hyper-parameter distance d(x8j, xij) can be calculated by the following formula (2):
$$d(x_{8j}, x_{ij}) = \frac{|x_{8j} - x_{ij}|}{x_j^{\max} - x_j^{\min}}$$
(2),
where $x_j^{\max} - x_j^{\min}$ represents the difference between the maximum selectable value and the minimum selectable value of the j-th hyper-parameter. This term is used to avoid the influence of the different value ranges of different hyper-parameters on the calculation result, so that d(x8j, xij) is in the range of 0 to 1, i.e., the distance is normalized across different hyper-parameters.
For the discrete hyper-parameters d and e, i.e., j = 4 or 5, the per-hyper-parameter distance d(x8j, xij) can be calculated by the following formula (3):
$$d(x_{8j}, x_{ij}) = \begin{cases} 0, & x_{8j} = x_{ij} \\ 1, & x_{8j} \neq x_{ij} \end{cases}$$
(3),
That is, for a discrete hyper-parameter, the distance d(x8j, xij) is also discrete: it is either 0 or 1. For example, for the hyper-parameter d, if the value of d in the second hyper-parameter combination x8 is equal to the value of d in the first hyper-parameter combination xi, the distance is the nearest, i.e., 0; if the two values are not equal, the distance is the farthest, i.e., 1.
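The per-hyper-parameter distances of formulas (2) and (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that formula (2) is the absolute difference divided by the width of the value range, which is consistent with the statement that the result lies between 0 and 1. The function names and the sample values of x8 and xi are hypothetical.

```python
def param_distance(a, b, value_range=None):
    """Distance between two values of a single hyper-parameter.

    value_range=(low, high) marks a continuous hyper-parameter (assumed form
    of formula (2)); value_range=None marks a discrete one (formula (3)).
    """
    if value_range is None:
        return 0.0 if a == b else 1.0
    low, high = value_range
    return abs(a - b) / (high - low)


# Value ranges of a, b, c (continuous) and d, e (discrete) from the example.
RANGES = [(0.1, 0.3), (0.6, 1.0), (0.6, 1.0), None, None]


def per_param_distances(x, y, ranges=RANGES):
    """List of per-hyper-parameter distances between two combinations."""
    return [param_distance(a, b, r) for a, b, r in zip(x, y, ranges)]


# Hypothetical values for the combinations x8 and xi, in the order (a, b, c, d, e).
x8 = (0.20, 0.8, 0.9, 5, 200)
xi = (0.10, 0.8, 0.7, 5, 300)
distances = per_param_distances(x8, xi)
print(distances)
```

How these per-hyper-parameter distances are aggregated into dist(x8, xi) is governed by formula (1) and is left open in this sketch.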
After the distance dist(x8, xi) between the second hyper-parameter combination x8 and the first hyper-parameter combination xi is calculated, the similarity between the second hyper-parameter combination x8 and the first hyper-parameter combination xi may be calculated based on the distance. For example, the similarity sim(x8, xi) between the second hyper-parameter combination x8 and the first hyper-parameter combination xi is calculated by the following formula (4):
[Formula (4): expression for sim(x8, xi) as a decreasing function of dist(x8, xi); the equation image is not reproduced in this text]
(4),
As can be seen from formula (4), the larger the distance between the second hyper-parameter combination x8 and the first hyper-parameter combination xi, the smaller the similarity between them, and vice versa.
In step S406, a weighted sum of the plurality of scores is calculated as an estimated score of the second hyper-parameter combination with each of the similarities as a weight of a score of the corresponding first hyper-parameter combination.
In one embodiment, the estimated score Ŝ8 of the second hyper-parameter combination x8 may be calculated, based on the similarity between the second hyper-parameter combination x8 and each first hyper-parameter combination xi, by the following formula (5):
$$\hat{S}_8 = \sum_{i=1}^{6} \mathrm{sim}(x_8, x_i) \cdot S_i$$
(5)
where Si is the score of the first hyper-parameter combination xi.
As can be appreciated, sim(x8, xi) may be calculated based on the above formula (4), or based on other similarity calculation methods. In addition, although formula (5) uses only sim(x8, xi) as the weight of each Si, the weight is not limited thereto. For example, the weight may further include a normalization term for normalizing over the plurality of Si participating in the calculation. This is because, when estimated scores are calculated for different second hyper-parameter combinations, the acquired pluralities of first hyper-parameter combinations may differ; e.g., for the second hyper-parameter combinations in the subspace of the second iteration in FIG. 2, the number of first hyper-parameter combinations available for calculating their estimated scores is greater than the number available in the first iteration. Normalizing over the plurality of Si therefore makes the estimated score more accurate.
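The similarity-weighted estimate of steps S402 to S406, with and without a normalization term, might be sketched as below. Two choices here are assumptions made only for illustration, because formulas (1) and (4) are not reproduced in this text: the distance is taken as the sum of the per-hyper-parameter distances, and the similarity is taken as 1 / (1 + dist), one of many decreasing functions of the distance. The function names, sample combinations, and scores are hypothetical.

```python
def distance(x, y, ranges):
    """Assumed aggregate distance: sum of per-hyper-parameter distances."""
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if r is None:                      # discrete: 0 if equal, else 1
            total += 0.0 if a == b else 1.0
        else:                              # continuous: range-normalized difference
            total += abs(a - b) / (r[1] - r[0])
    return total


def estimate_score(candidate, evaluated, scores, ranges, normalize=True):
    """Weighted sum of known scores, weighted by similarity to the candidate."""
    # Assumed similarity: decreases as the distance grows.
    weights = [1.0 / (1.0 + distance(candidate, x, ranges)) for x in evaluated]
    weighted = sum(w * s for w, s in zip(weights, scores))
    # With normalization the weights sum to 1, so the estimate stays within
    # the range of the known scores regardless of how many there are.
    return weighted / sum(weights) if normalize else weighted


RANGES = [(0.1, 0.3), (0.6, 1.0), (0.6, 1.0), None, None]
evaluated = [(0.12, 0.7, 0.9, 4, 100), (0.25, 0.9, 0.65, 6, 300)]  # hypothetical
scores = [0.80, 0.60]                                              # hypothetical
candidate = (0.12, 0.7, 0.9, 4, 100)

raw = estimate_score(candidate, evaluated, scores, RANGES, normalize=False)
norm = estimate_score(candidate, evaluated, scores, RANGES)
print(raw, norm)
```

In this sketch the unnormalized estimate grows with the number of evaluated combinations, whereas the normalized estimate remains comparable across iterations, which is the motivation given above for the normalization term.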
FIG. 5 is a flowchart illustrating a method for evaluating hyper-parameters of a business model according to an embodiment of the present disclosure, and as shown in FIG. 5, the method includes the following steps S502-S520.
First, in step S502, n first hyper-parameter combinations are acquired.
The business model 11 is, for example, an XGBoost model. As described above, the value ranges of the 5 hyper-parameters a to e of the XGBoost model are a: [0.1, 0.3], b: [0.6, 1], c: [0.6, 1], d: {4, 5, 6, 7}, and e: {100, 200, 300}, and the value ranges of the respective hyper-parameters together form the value space of the 5 hyper-parameters. As shown in FIG. 5, the method is performed in a plurality of cycles. In the first cycle of the method, n hyper-parameter combinations, for example n = 6, can be obtained from the value space as the 6 first hyper-parameter combinations.
In one embodiment, the 6 first hyper-parameter combinations may be randomly obtained from the value space. Specifically, in the process of acquiring one first hyper-parameter combination, for each hyper-parameter, a value is randomly acquired from its value range as the value of that hyper-parameter, and the randomly acquired values of the 5 hyper-parameters together form one first hyper-parameter combination. By repeating this process 6 times, the 6 first hyper-parameter combinations x1~x6 can be obtained. For example, if a hyper-parameter is a continuous parameter, a value is randomly sampled from its value range as its value; if a hyper-parameter is a discrete parameter, a value is randomly selected from its selectable values as its value.
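The random acquisition described above can be sketched as follows, using the example value space of the 5 hyper-parameters. The representation of the space (a tuple for a continuous range, a list for discrete choices), the function name, and the fixed seed are assumptions made for illustration.

```python
import random

# Example value space: tuples are continuous ranges, lists are discrete choices.
SPACE = {
    "a": (0.1, 0.3),
    "b": (0.6, 1.0),
    "c": (0.6, 1.0),
    "d": [4, 5, 6, 7],
    "e": [100, 200, 300],
}


def sample_combination(space, rng):
    """Draw one hyper-parameter combination from the value space."""
    combination = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):      # continuous: sample within the range
            combination[name] = rng.uniform(*domain)
        else:                              # discrete: pick one selectable value
            combination[name] = rng.choice(domain)
    return combination


rng = random.Random(0)                     # seeded only for reproducibility
first_combinations = [sample_combination(SPACE, rng) for _ in range(6)]
for combination in first_combinations:
    print(combination)
```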
In one embodiment, the initial value space may be divided into 6 portions, and a first hyper-parameter combination may be randomly obtained from each portion. In this embodiment, the method for obtaining the first hyper-parameter combination is not limited.
In step S504, respective scores of the n first hyperparametric combinations are determined.
Specifically, for each first hyper-parameter combination xi, the business model 11 corresponding to that first hyper-parameter combination is trained using a training sample set prepared in advance, and the trained business model 11 is tested using a test sample set prepared in advance, so as to obtain the performance score Si of the business model 11 as the score of the first hyper-parameter combination xi.
In step S506, it is determined whether to end the loop of the method.
Specifically, it is determined whether the number of times of the loop reaches a predetermined number of times, and if the number of times of the loop reaches the predetermined number of times, the loop of the method is ended, or it is determined whether the score of the first hyper-parameter combination acquired in the loop reaches a predetermined score, and if the score reaches the predetermined score, the loop of the method is ended. If it is determined that the next loop is necessary, the flow advances to step S508. If it is determined that the next loop is not necessary, the flow advances to step S520.
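The two termination conditions of step S506 can be expressed as a small predicate. This is only a sketch; the function name, parameter names, and threshold values are hypothetical.

```python
def should_stop(cycles_done, cycle_scores, max_cycles, target_score):
    """Return True if the loop should end (step S520), False to continue (S508).

    cycles_done   -- number of cycles performed so far
    cycle_scores  -- scores of the first combinations evaluated in this cycle
    max_cycles    -- predetermined number of cycles
    target_score  -- predetermined score
    """
    reached_cycle_limit = cycles_done >= max_cycles
    reached_target = max(cycle_scores) >= target_score
    return reached_cycle_limit or reached_target


# Hypothetical example: third cycle out of ten, best score below the target.
print(should_stop(3, [0.71, 0.78, 0.74], max_cycles=10, target_score=0.90))
```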
In step S508, a high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations are determined from the n first hyper-parameter combinations.
After the scores of the respective first hyper-parameter combinations are obtained, high-performance hyper-parameter combinations and low-performance hyper-parameter combinations can be determined based on those scores. For example, the first hyper-parameter combinations whose scores rank in the top p may be determined as high-performance hyper-parameter combinations, and the first hyper-parameter combinations whose scores rank in the bottom p may be determined as low-performance hyper-parameter combinations. Here, p may be predetermined based on the number of first hyper-parameter combinations and the score of each first hyper-parameter combination; for example, when n = 6, p may be set to 3. It is to be understood that this setting is merely illustrative and does not limit the method.
Referring to FIG. 2, three high-performance first hyper-parameter combinations x2, x4, and x6 and three low-performance first hyper-parameter combinations x1, x3, and x5 may be determined based on the scores of the respective first hyper-parameter combinations in FIG. 2. One high-performance first hyper-parameter combination, for example x2, can then be randomly selected from the high-performance hyper-parameter combinations for performing the subsequent steps.
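The split into high-performance and low-performance combinations in step S508 can be sketched as a ranking by score. The scores below are hypothetical and chosen so that x2, x4, and x6 rank highest, as in the FIG. 2 example; the function name is likewise an assumption.

```python
import random


def split_by_score(scores, p):
    """Return (indices of the top-p scores, indices of the bottom-p scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:p], order[-p:]


names = ["x1", "x2", "x3", "x4", "x5", "x6"]
scores = [0.61, 0.83, 0.58, 0.79, 0.55, 0.81]      # hypothetical S1..S6

high_idx, low_idx = split_by_score(scores, p=3)
high = sorted(names[i] for i in high_idx)
low = sorted(names[i] for i in low_idx)
print(high, low)

# One high-performance combination is then picked at random for contraction.
chosen = random.Random(0).choice(high)
print(chosen)
```

With n = 6 and p = 3 the two groups do not overlap; for other settings, p would need to be at most half of n to keep them disjoint.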
In step S510, a subspace is determined.
The initial space of the subspace may be set to the value space of the hyper-parameters of the business model 11 (i.e., the space within the largest rectangular box in FIG. 2). The spatial contraction model 12 described above then contracts the subspace from this value space based on the high-performance first hyper-parameter combination x2 and the low-performance first hyper-parameter combinations x1, x3, and x5, thereby obtaining the subspace shown in FIG. 2.
As shown in FIG. 2, the spatial contraction process includes three spatial contractions. Before each spatial contraction, one low-performance hyper-parameter combination is randomly selected from the low-performance hyper-parameter combinations included in the current subspace, and the spatial contraction is performed with respect to it. For example, as described above, before the first spatial contraction, the low-performance first hyper-parameter combination x5 is selected in the current subspace (i.e., the value space of the hyper-parameters of the business model 11); that is, the first spatial contraction is used to move the first hyper-parameter combination x5 out of the subspace. The first spatial contraction is described in detail below as an example.
Specifically, one hyper-parameter is first selected from the selectable hyper-parameters of the first hyper-parameter combination x2 and the first hyper-parameter combination x5, the value range of that hyper-parameter is contracted based on its respective values in the first hyper-parameter combination x2 and the first hyper-parameter combination x5, and the value space after the contraction is taken as the subspace. Here, a selectable hyper-parameter is a hyper-parameter whose value range includes two or more values. For example, in the first spatial contraction, the value range of each of the 5 hyper-parameters includes two or more values, and thus each of the 5 hyper-parameters is a selectable hyper-parameter.
In one embodiment, the selected hyper-parameter is, for example, a continuous hyper-parameter, such as the hyper-parameter a of the above 5 hyper-parameters. Hereinafter, the value of the hyper-parameter a in the first hyper-parameter combination x2 is denoted a2, and the value of the hyper-parameter a in the first hyper-parameter combination x5 is denoted a5. As mentioned above, the value range of the hyper-parameter a is [0.1, 0.3]. In one case, a2 < a5, e.g., a2 = 0.18 and a5 = 0.24; a value, for example 0.22, can then be randomly sampled between a2 and a5, and the value range of the hyper-parameter a is contracted to [0.1, 0.22], which is used as the value range of the hyper-parameter a in the subspace. Because the contracted value range of the hyper-parameter a does not include the value a5 of the first hyper-parameter combination x5, the first hyper-parameter combination x5 is moved out of the subspace. In another case, a2 > a5, e.g., a2 = 0.24 and a5 = 0.18; similarly, a value, for example 0.22, can be randomly sampled between a2 and a5, and the value range of the hyper-parameter a is contracted to [0.22, 0.3], which is used as the value range of the hyper-parameter a in the subspace.
In one embodiment, the selected hyper-parameter is, for example, a discrete hyper-parameter, such as the hyper-parameter e of the above 5 hyper-parameters. Hereinafter, the value of the hyper-parameter e in the first hyper-parameter combination x2 is denoted e2, and the value of the hyper-parameter e in the first hyper-parameter combination x5 is denoted e5. As described above, the selectable values of the hyper-parameter e are {100, 200, 300}. Assuming e2 = 200 and e5 = 300, the selectable value of the hyper-parameter e is restricted to e2, and the hyper-parameter e is set as a non-selectable hyper-parameter and removed from the list of selectable hyper-parameters; that is, in subsequent spatial contractions, the value range of the hyper-parameter e will no longer be contracted. In this way, the first hyper-parameter combination x5 is likewise moved out of the subspace.
In the spatial contraction process, after each spatial contraction, it is determined whether all low-performance hyper-parameter combinations have been excluded from the current subspace. For example, as shown in FIG. 2, after the first spatial contraction, indicated by the vertical line on the right, it can be determined that low-performance hyper-parameter combinations still remain in the subspace, so the second and third spatial contractions shown in FIG. 2 are performed, until the subspace obtained after the spatial contraction includes only the first hyper-parameter combination x2 and does not include any low-performance hyper-parameter combination, thereby obtaining the subspace shown in FIG. 2.
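The spatial contraction of step S510 can be sketched as a loop that repeatedly picks a remaining low-performance combination and contracts one hyper-parameter until no low-performance combination is left in the subspace. This is a sketch under stated assumptions: the hyper-parameter to contract is chosen only among selectable hyper-parameters on which the two combinations differ (the text does not spell out this restriction), and the concrete values of x1, x2, x3, and x5 are hypothetical. All function names are illustrative.

```python
import random


def in_subspace(combo, space):
    """True if every value of `combo` lies within the corresponding range."""
    for name, domain in space.items():
        value = combo[name]
        if isinstance(domain, tuple):
            if not domain[0] <= value <= domain[1]:
                return False
        elif value not in domain:
            return False
    return True


def contract_once(space, good, bad, rng):
    """One contraction that tries to move `bad` out while keeping `good` in."""
    selectable = [
        name for name, domain in space.items()
        if good[name] != bad[name] and (isinstance(domain, tuple) or len(domain) >= 2)
    ]
    if not selectable:
        raise ValueError("the two combinations cannot be separated")
    name = rng.choice(selectable)
    contracted = dict(space)
    domain = space[name]
    if isinstance(domain, tuple):
        # Continuous: cut at a random point between the two values.
        cut = rng.uniform(min(good[name], bad[name]), max(good[name], bad[name]))
        low, high = domain
        contracted[name] = (low, cut) if good[name] < bad[name] else (cut, high)
    else:
        # Discrete: fix the value to that of the high-performance combination.
        contracted[name] = [good[name]]
    return contracted


def contract_space(space, good, low_performers, rng):
    """Contract until no low-performance combination remains in the subspace."""
    subspace = dict(space)
    while True:
        remaining = [x for x in low_performers if in_subspace(x, subspace)]
        if not remaining:
            return subspace
        subspace = contract_once(subspace, good, rng.choice(remaining), rng)


SPACE = {"a": (0.1, 0.3), "b": (0.6, 1.0), "c": (0.6, 1.0),
         "d": [4, 5, 6, 7], "e": [100, 200, 300]}
x2 = {"a": 0.18, "b": 0.80, "c": 0.90, "d": 5, "e": 200}        # high performance
lows = [
    {"a": 0.12, "b": 0.65, "c": 0.70, "d": 7, "e": 100},         # x1
    {"a": 0.28, "b": 0.95, "c": 0.62, "d": 4, "e": 200},         # x3
    {"a": 0.24, "b": 0.70, "c": 0.95, "d": 5, "e": 300},         # x5
]
sub = contract_space(SPACE, x2, lows, random.Random(0))
print(sub)
```

A single contraction may exclude several low-performance combinations at once, so the number of contractions needed can be smaller than the number of low-performance combinations.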
In step S512, m second hyper-parameter combinations are obtained from the subspace.
For the process of acquiring hyper-parameter combinations in a predetermined space in this step, reference may be made to the description of step S502 above, which is not repeated here. The m second hyper-parameter combinations are, for example, the four second hyper-parameter combinations x7~x10 shown as diamonds in FIG. 2.
In step S514, the estimated scores of the respective second hyperparameter combinations are calculated.
This step may be performed by the prediction model 13 executing the method shown in FIG. 4, thereby obtaining the respective estimated scores of the 4 second hyper-parameter combinations x7~x10.
In step S516, a new first hyper-parameter combination is determined.
After the estimated scores of the second hyper-parameter combinations are obtained, the m second hyper-parameter combinations may be ranked based on the estimated scores; the higher a second hyper-parameter combination is ranked, the better the estimated performance of the business model 11 corresponding to it. Thereafter, a predetermined number of top-ranked second hyper-parameter combinations are determined as new first hyper-parameter combinations for use in the next iteration of the method. For example, the second hyper-parameter combination x8 with the highest estimated score in FIG. 2 may be determined as a new first hyper-parameter combination, or the two second hyper-parameter combinations with the two highest estimated scores in FIG. 2 may be determined as new first hyper-parameter combinations, which is not limited herein.
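The ranking and selection of step S516 can be sketched as follows. The estimated scores are hypothetical and chosen so that x8 ranks first, as in the FIG. 2 example; the function name and the `count` parameter are illustrative.

```python
def select_top(candidates, estimated_scores, count=1):
    """Return the `count` candidates with the highest estimated scores."""
    ranked = sorted(zip(candidates, estimated_scores),
                    key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:count]]


candidates = ["x7", "x8", "x9", "x10"]
estimated_scores = [0.72, 0.81, 0.69, 0.77]   # hypothetical estimated scores

top_one = select_top(candidates, estimated_scores)
top_two = select_top(candidates, estimated_scores, count=2)
print(top_one, top_two)
```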
In step S518, it is determined whether n new first hyper-parameter combinations have been determined.
In this step, whether to perform the next acquisition of a new first hyper-parameter combination is decided by determining whether n new first hyper-parameter combinations have been determined. If n new first hyper-parameter combinations have been determined, the flow returns to step S502, i.e., the next cycle is started, and the n new first hyper-parameter combinations are taken as the n first hyper-parameter combinations processed in the next cycle. If n new first hyper-parameter combinations have not yet been determined, the flow returns to step S508, and one high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations are determined again from the n first hyper-parameter combinations of the current cycle, so as to determine a further new first hyper-parameter combination.
In step S520, business model hyper-parameters are determined.
When it is determined in step S506 that the next loop is not performed, the process proceeds to step S520, where all the existing first hyper-parameter combinations are sorted based on their scores, and each hyper-parameter in the first hyper-parameter combination with the highest score is determined as the hyper-parameter of the business model 11.
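The overall flow of steps S502 to S520 might be sketched end to end as below. To keep the example short it departs from the text in several labeled ways: the value space contains only two hypothetical continuous hyper-parameters; `true_score` is a toy stand-in for training and testing the business model; the distance is assumed to be a sum of range-normalized differences; the similarity is assumed to be 1 / (1 + dist); the weights are normalized; estimates use every combination scored so far; and the loop ends only after a fixed number of cycles.

```python
import random

SPACE = {"a": (0.0, 1.0), "b": (0.0, 1.0)}   # hypothetical continuous space


def true_score(x):
    """Toy stand-in for training and testing the business model."""
    return 1.0 - (x["a"] - 0.7) ** 2 - (x["b"] - 0.2) ** 2


def sample(space, rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}


def inside(x, space):
    return all(lo <= x[k] <= hi for k, (lo, hi) in space.items())


def contract(space, good, lows, rng):
    """Contract the space until it contains no low-performance combination."""
    sub = dict(space)
    while True:
        remaining = [x for x in lows if inside(x, sub)]
        if not remaining:
            return sub
        bad = rng.choice(remaining)
        k = rng.choice([k for k in sub if good[k] != bad[k]])
        cut = rng.uniform(min(good[k], bad[k]), max(good[k], bad[k]))
        lo, hi = sub[k]
        sub[k] = (lo, cut) if good[k] < bad[k] else (cut, hi)


def estimate(x, history, space):
    """Similarity-weighted average of the scores obtained so far."""
    num = den = 0.0
    for y, s in history:
        dist = sum(abs(x[k] - y[k]) / (hi - lo) for k, (lo, hi) in space.items())
        w = 1.0 / (1.0 + dist)               # assumed similarity
        num += w * s
        den += w
    return num / den


def search(n=6, p=3, m=4, cycles=5, seed=0):
    rng = random.Random(seed)
    current = [sample(SPACE, rng) for _ in range(n)]           # S502
    history = []
    for cycle in range(cycles):
        scored = [(x, true_score(x)) for x in current]         # S504
        history.extend(scored)
        if cycle == cycles - 1:                                # S506
            break
        scored.sort(key=lambda t: t[1], reverse=True)          # S508
        highs = [x for x, _ in scored[:p]]
        lows = [x for x, _ in scored[-p:]]
        new = []
        while len(new) < n:                                    # S518
            sub = contract(SPACE, rng.choice(highs), lows, rng)    # S510
            candidates = [sample(sub, rng) for _ in range(m)]      # S512
            best = max(candidates, key=lambda c: estimate(c, history, SPACE))
            new.append(best)                                   # S514, S516
        current = new
    return max(history, key=lambda t: t[1])                    # S520


best_combination, best_score = search()
print(best_combination, best_score)
```

Only the selected candidates are passed to `true_score`, so each cycle costs n real evaluations regardless of m, which reflects the time saving attributed to the estimated scores.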
FIG. 6 illustrates an apparatus 600 for evaluating hyper-parameters of a business model for processing a business related to any one of the following objects, according to an embodiment of the present disclosure: user, merchant, commodity, transaction, the business model includes a plurality of hyper-parameters, the apparatus 600 includes:
an obtaining unit 61, configured to obtain a plurality of first hyper-parameter combinations of the service model and their respective scores, and a second hyper-parameter combination to be evaluated, where the first hyper-parameter combination and the second hyper-parameter combination respectively include respective values of the plurality of hyper-parameters, and the score of the first hyper-parameter combination is a performance score of the service model corresponding to the first hyper-parameter combination;
a first calculation unit 62 configured to calculate a similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations;
a second calculating unit 63 configured to calculate a weighted sum of the plurality of scores as an estimated score of the second hyper-parameter combination, wherein a weight of the score of each of the first hyper-parameter combinations is determined based on the corresponding similarity.
In one embodiment, the first calculating unit 62 is further configured to calculate distances between the second hyper-parameter combination and each of the first hyper-parameter combinations, and calculate a similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations based on each of the distances.
In an embodiment, the plurality of hyper-parameters includes a first hyper-parameter, and in a case that a value range of the first hyper-parameter is a continuous value range, the first calculating unit 62 is further configured to calculate a difference between a value of the first hyper-parameter in the second hyper-parameter combination and a value of the first hyper-parameter in any first hyper-parameter combination.
In an embodiment, in a case that the value range of the first hyperparameter is a discrete value range, the first calculating unit 62 is further configured to determine whether the value of the first hyperparameter in the second hyperparameter combination is equal to the value of the first hyperparameter in any one of the first hyperparameter combinations.
Fig. 7 illustrates an apparatus 700 for determining a hyper-parameter of a business model, the business model including a plurality of hyper-parameters, according to an embodiment of the present description, the apparatus 700 including:
a first obtaining unit 71 configured to obtain a plurality of first hyper-parameter combinations, each of the first hyper-parameter combinations including respective values of the plurality of hyper-parameters;
a second obtaining unit 72, configured to obtain, as the score of each first hyper-parameter combination, a performance score of the business model corresponding to each first hyper-parameter combination based on a training sample set and a test sample set prepared in advance, where the training sample set and the test sample set are related to any one of the following objects in the network platform: users, merchants, goods, transactions;
a first determination unit 73 configured to determine a first high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations from the plurality of first hyper-parameter combinations based on the scores of the respective first hyper-parameter combinations;
a second determining unit 74, configured to determine, based on respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations, a first value subspace of the plurality of hyper-parameters, where the first value subspace includes the first high-performance hyper-parameter combination and does not include the plurality of low-performance hyper-parameter combinations;
a third obtaining unit 75, configured to randomly obtain a plurality of second hyper-parameter combinations from the first value subspace;
a prediction unit 76 configured to predict an estimation score of each of the second hyper-parameter combinations by the above-mentioned means for evaluating hyper-parameters of the business model;
a selecting unit 77 configured to select a predetermined number of second hyper-parameter combinations as first hyper-parameter combinations to be processed next time the method is executed, based on the estimated scores of the respective second hyper-parameter combinations.
In one embodiment, the second determining unit 74 includes:
a first selecting subunit 741, configured to select a first low-performance hyper-parameter combination from the plurality of low-performance hyper-parameter combinations;
a second selecting sub-unit 742 configured to select a first hyper-parameter from selectable hyper-parameters in a plurality of hyper-parameters, where a value of the first hyper-parameter of the first high-performance hyper-parameter combination is a first value, and a value of the first hyper-parameter of the first low-performance hyper-parameter combination is a second value, where a value range of the selectable hyper-parameter includes at least two values;
a first contraction sub-unit 743, configured to, in a case that a value range of the first hyper-parameter is a continuous value range, randomly select a value from the value range between the first value and the second value for contracting the value range of the first hyper-parameter.
In one embodiment, the second determining unit 74 further comprises: a second contraction sub-unit 744 configured to contract the value range of the first hyper-parameter to the first value in a case where the value range of the first hyper-parameter includes a plurality of discrete values.
Another aspect of the present specification provides a computer readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform any one of the above methods.
Another aspect of the present specification provides a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements any of the methods described above.
In the scheme for determining the hyper-parameters of the business model according to the embodiments of this specification, the selectable range of the hyper-parameters is mapped to a region in space, and a suitable region is learned by the model in order to recommend new hyper-parameters, so that the method can be applied to both continuous hyper-parameters and discrete hyper-parameters. Furthermore, in the process of learning the model, the hyper-parameter value space is contracted by a random method, which avoids the dimension explosion problem of Gaussian-fitting algorithms while ensuring that the learned space is good enough. In addition, the estimated scores of the hyper-parameter combinations to be evaluated are predicted based on the existing hyper-parameter combinations, so that the hyper-parameter combinations selected based on the estimated scores have better performance, and the time efficiency of the hyper-parameter search is further improved.
It is to be understood that the terms "first," "second," and the like, herein are used for descriptive purposes only and not for purposes of limitation, to distinguish between similar concepts.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
It will be further appreciated by those of ordinary skill in the art that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. The software modules may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-described embodiments are intended to illustrate the objects, technical solutions and advantages of the embodiments of the present disclosure in further detail, and it should be understood that the above-described embodiments are merely exemplary embodiments of the present disclosure, and are not intended to limit the scope of the embodiments of the present disclosure.

Claims (16)

1. A method of evaluating a business model hyper-parameter, the business model being used for processing a business related to any one of: user, merchant, commodity, transaction, the business model comprising a plurality of hyper-parameters, the method comprising:
acquiring a plurality of first hyper-parameter combinations and respective scores thereof of a business model, and a second hyper-parameter combination to be evaluated, wherein the first hyper-parameter combination and the second hyper-parameter combination respectively comprise respective values of the plurality of hyper-parameters, and the score of the first hyper-parameter combination is the performance score of the corresponding business model;
calculating the similarity of the second hyper-parameter combination and each first hyper-parameter combination;
calculating a weighted sum of the scores of each of the plurality of first hyper-parameter combinations as the estimated score of the second hyper-parameter combination, wherein the weight of the score of each of the first hyper-parameter combinations is equal to the corresponding similarity or equal to a normalized value of the corresponding similarity.
2. The method of claim 1, wherein calculating the similarity of the second hyper-parameter combination to each first hyper-parameter combination comprises calculating a distance of the second hyper-parameter combination to each first hyper-parameter combination, and calculating the similarity of the second hyper-parameter combination to each first hyper-parameter combination based on each of the distances.
3. The method of claim 2, wherein the plurality of hyper-parameters comprises a first hyper-parameter, and wherein, in case the value range of the first hyper-parameter is a continuous value range, calculating the distance of the second hyper-parameter combination from each first hyper-parameter combination comprises calculating the difference of the value of the first hyper-parameter in the second hyper-parameter combination and the value of the first hyper-parameter in any of the first hyper-parameter combinations.
4. The method of claim 2, wherein, in a case where the value range of the first hyper-parameter is a discrete value range, calculating the distance of the second hyper-parameter combination from each first hyper-parameter combination comprises determining whether the value of the first hyper-parameter in the second hyper-parameter combination is equal to the value of the first hyper-parameter in any one of the first hyper-parameter combinations.
5. A method of determining a business model hyper-parameter, the business model comprising a plurality of hyper-parameters, the method comprising:
obtaining a plurality of first hyper-parameter combinations, wherein each first hyper-parameter combination comprises respective values of the plurality of hyper-parameters;
acquiring a performance score of the business model corresponding to each first hyper-parameter combination as a score of each first hyper-parameter combination based on a training sample set and a test sample set which are prepared in advance, wherein the training sample set and the test sample set are related to any one of the following objects: users, merchants, goods, transactions;
determining a first high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations from the plurality of first hyper-parameter combinations based on the scores of the respective first hyper-parameter combinations;
determining a first value subspace of the plurality of hyper-parameters based on respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations, wherein the first value subspace comprises the first high-performance hyper-parameter combination and does not comprise the plurality of low-performance hyper-parameter combinations;
randomly acquiring a plurality of second hyper-parameter combinations from the first value subspace;
predicting an estimated score for each of the second hyper-parameter combinations by the method according to any one of claims 1-4;
and selecting, based on the estimated scores of the respective second hyper-parameter combinations, a preset number of second hyper-parameter combinations as the first hyper-parameter combinations to be processed the next time the method is executed.
6. The method of claim 5, wherein determining a first value subspace of the plurality of hyper-parameters based on respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations comprises:
selecting a first low-performance hyper-parameter combination from the plurality of low-performance hyper-parameter combinations;
selecting a second hyper-parameter from selectable hyper-parameters among the plurality of hyper-parameters, wherein the value of the second hyper-parameter in the first high-performance hyper-parameter combination is a first value, the value of the second hyper-parameter in the first low-performance hyper-parameter combination is a second value, and the value range of each selectable hyper-parameter comprises at least two values;
and in a case where the value range of the second hyper-parameter is a continuous value range, randomly selecting a value from the value range between the first value and the second value, the selected value being used for contracting the value range of the second hyper-parameter.
7. The method of claim 6, wherein determining a first value subspace of the plurality of hyper-parameters based on respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations further comprises narrowing a value range of the second hyper-parameter to the first value in case the value range of the second hyper-parameter comprises a plurality of discrete values.
8. An apparatus for evaluating a business model hyper-parameter, the business model for processing a business related to any one of: user, merchant, commodity, transaction, the business model includes a plurality of hyper-parameters, the device includes:
an acquisition unit configured to acquire a plurality of first hyper-parameter combinations of a business model and their respective scores, and a second hyper-parameter combination to be evaluated, wherein the first hyper-parameter combinations and the second hyper-parameter combination each comprise respective values of the plurality of hyper-parameters, and the score of each first hyper-parameter combination is the performance score of the corresponding business model;
a first calculation unit configured to calculate a similarity between the second hyper-parameter combination and each of the first hyper-parameter combinations;
a second calculation unit configured to calculate, as the estimated score of the second hyper-parameter combination, a weighted sum of the scores of the plurality of first hyper-parameter combinations, wherein the weight of the score of each of the first hyper-parameter combinations is equal to the corresponding similarity or equal to a normalized value of the corresponding similarity.
9. The apparatus according to claim 8, wherein the first calculation unit is further configured to calculate a distance of the second hyper-parameter combination to each first hyper-parameter combination, and to calculate the similarity of the second hyper-parameter combination to each first hyper-parameter combination based on the respective distances.
10. The apparatus according to claim 9, wherein the plurality of hyper-parameters includes a first hyper-parameter, and in a case where a value range of the first hyper-parameter is a continuous value range, the first calculation unit is further configured to calculate a difference between a value of the first hyper-parameter in the second hyper-parameter combination and a value of the first hyper-parameter in any one of the first hyper-parameter combinations.
11. The apparatus according to claim 9, wherein, in a case where the value range of the first hyper-parameter is a discrete value range, the first calculation unit is further configured to determine whether the value of the first hyper-parameter in the second hyper-parameter combination is equal to the value of the first hyper-parameter in any one of the first hyper-parameter combinations.
12. An apparatus for determining a hyper-parameter of a business model, the business model comprising a plurality of hyper-parameters, the apparatus comprising:
a first acquisition unit configured to acquire a plurality of first hyper-parameter combinations, each of the first hyper-parameter combinations including respective values of the plurality of hyper-parameters;
a second acquisition unit configured to acquire, as a score of each first hyper-parameter combination, a performance score of the business model corresponding to each first hyper-parameter combination based on a training sample set and a test sample set prepared in advance, wherein the training sample set and the test sample set are related to any one of the following objects: users, merchants, goods, transactions;
a first determination unit configured to determine a first high-performance hyper-parameter combination and a plurality of low-performance hyper-parameter combinations from the plurality of first hyper-parameter combinations based on scores of the respective first hyper-parameter combinations;
a second determination unit configured to determine a first value subspace of the plurality of hyper-parameters based on respective hyper-parameter values of the first high-performance hyper-parameter combination and the plurality of low-performance hyper-parameter combinations, wherein the first value subspace includes the first high-performance hyper-parameter combination and does not include the plurality of low-performance hyper-parameter combinations;
a third acquisition unit configured to randomly acquire a plurality of second hyper-parameter combinations from the first value subspace;
a prediction unit configured to predict an estimated score of each of the second hyper-parameter combinations by the apparatus according to any one of claims 8-11;
and a selecting unit configured to select, based on the estimated scores of the respective second hyper-parameter combinations, a preset number of second hyper-parameter combinations as the first hyper-parameter combinations to be processed the next time the apparatus is operated.
13. The apparatus of claim 12, wherein the second determination unit comprises:
a first selection subunit configured to select a first low-performance hyper-parameter combination from the plurality of low-performance hyper-parameter combinations;
a second selection subunit configured to select a second hyper-parameter from selectable hyper-parameters among the plurality of hyper-parameters, wherein the value of the second hyper-parameter in the first high-performance hyper-parameter combination is a first value, the value of the second hyper-parameter in the first low-performance hyper-parameter combination is a second value, and the value range of each selectable hyper-parameter includes at least two values;
a first contraction subunit configured to, in a case where the value range of the second hyper-parameter is a continuous value range, randomly select a value from the value range between the first value and the second value, so as to contract the value range of the second hyper-parameter.
14. The apparatus of claim 13, wherein the second determination unit further comprises a second contraction subunit configured to narrow the value range of the second hyper-parameter to the first value in a case where the value range of the second hyper-parameter includes a plurality of discrete values.
15. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
16. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-7.
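
By way of non-limiting illustration only, the score estimation recited in claims 1-4 and the continuous-range contraction recited in claim 6 might be sketched in Python as follows. The claims do not fix how a distance is converted into a similarity, how per-hyper-parameter distances are combined, or how the randomly selected value is applied; the mapping `1 / (1 + d)`, the scaling by range width, the summation over hyper-parameters, the use of the selected value as a new bound, and all names (`distance`, `estimate_score`, `contract_range`, `lr`, `opt`) are assumptions of this sketch and form no part of the claims.

```python
import random
from typing import Dict, List, Tuple, Union

Value = Union[float, str]
Combo = Dict[str, Value]                 # one hyper-parameter combination
Ranges = Dict[str, Tuple[float, float]]  # bounds of continuous hyper-parameters


def distance(a: Combo, b: Combo, continuous: Ranges) -> float:
    """Sum of per-hyper-parameter distances between two combinations."""
    total = 0.0
    for name, value in a.items():
        if name in continuous:
            low, high = continuous[name]
            # Continuous value range (claim 3): difference of the two values.
            # Dividing by the range width is an assumption of this sketch.
            total += abs(float(value) - float(b[name])) / (high - low)
        else:
            # Discrete value range (claim 4): test only whether values are equal.
            total += 0.0 if value == b[name] else 1.0
    return total


def estimate_score(candidate: Combo, evaluated: List[Combo],
                   scores: List[float], continuous: Ranges) -> float:
    """Weighted sum of known scores; weights are normalized similarities."""
    # Assumed mapping from distance to similarity: 1 / (1 + d).
    sims = [1.0 / (1.0 + distance(candidate, e, continuous)) for e in evaluated]
    norm = sum(sims)
    return sum(s / norm * score for s, score in zip(sims, scores))


def contract_range(bounds: Tuple[float, float], good: float, bad: float,
                   rng: random.Random) -> Tuple[float, float]:
    """Contract a continuous range using a random value between good and bad.

    `good` is the first value (high-performance combination) and `bad` the
    second value (low-performance combination); they are assumed to differ.
    Using the random value as the new bound on the `bad` side is an assumption.
    """
    low, high = bounds
    cut = rng.uniform(min(good, bad), max(good, bad))
    return (low, cut) if good < bad else (cut, high)


if __name__ == "__main__":
    evaluated = [{"lr": 0.1, "opt": "adam"}, {"lr": 0.5, "opt": "sgd"}]
    scores = [0.9, 0.5]
    ranges = {"lr": (0.0, 1.0)}
    print(estimate_score({"lr": 0.2, "opt": "adam"}, evaluated, scores, ranges))
    print(contract_range((0.0, 1.0), good=0.1, bad=0.5, rng=random.Random(0)))
```

In this sketch a candidate close to a high-scoring evaluated combination receives an estimate close to that combination's score, so candidates can be ranked without training the business model for each of them. The contracted range always retains the first value; the second value is excluded except in the boundary case where the random value coincides with it.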
CN202010566084.0A 2020-06-19 2020-06-19 Method and device for evaluating service model hyper-parameters Active CN111539536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010566084.0A CN111539536B (en) 2020-06-19 2020-06-19 Method and device for evaluating service model hyper-parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010566084.0A CN111539536B (en) 2020-06-19 2020-06-19 Method and device for evaluating service model hyper-parameters

Publications (2)

Publication Number Publication Date
CN111539536A (en) 2020-08-14
CN111539536B (en) 2020-10-23

Family

ID=71979756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010566084.0A Active CN111539536B (en) 2020-06-19 2020-06-19 Method and device for evaluating service model hyper-parameters

Country Status (1)

Country Link
CN (1) CN111539536B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446741A (en) * 2018-03-29 2018-08-24 中国石油大学(华东) Machine learning hyper parameter importance appraisal procedure, system and storage medium
CN108921207A (en) * 2018-06-20 2018-11-30 中诚信征信有限公司 A kind of hyper parameter determines method, device and equipment
CN111105040A (en) * 2019-11-14 2020-05-05 深圳追一科技有限公司 Hyper-parameter optimization method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12026612B2 (en) * 2017-06-02 2024-07-02 Google Llc Optimization of parameter values for machine-learned models
CN111260074B (en) * 2020-01-09 2022-07-19 腾讯科技(深圳)有限公司 Method for determining hyper-parameters, related device, equipment and storage medium
CN111260077A (en) * 2020-01-14 2020-06-09 支付宝(杭州)信息技术有限公司 Method and device for determining hyper-parameters of business processing model

Also Published As

Publication number Publication date
CN111539536A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
CN110852755B (en) User identity identification method and device for transaction scene
JP6954003B2 (en) Determining device and method of convolutional neural network model for database
CN109461001B (en) Method and device for obtaining training sample of first model based on second model
CN110570111A (en) Enterprise risk prediction method, model training method, device and equipment
CN111127364B (en) Image data enhancement strategy selection method and face recognition image data enhancement method
CN111681067A (en) Long-tail commodity recommendation method and system based on graph attention network
CN110222838B (en) Document sorting method and device, electronic equipment and storage medium
US9842279B2 (en) Data processing method for learning discriminator, and data processing apparatus therefor
CN110413878B (en) User-commodity preference prediction device and method based on adaptive elastic network
CN108171010B (en) Protein complex detection method and device based on semi-supervised network embedded model
CN111160459A (en) Device and method for optimizing hyper-parameters
US20230206054A1 (en) Expedited Assessment and Ranking of Model Quality in Machine Learning
CN116993548A (en) Incremental learning-based education training institution credit assessment method and system for LightGBM-SVM
US11295229B1 (en) Scalable generation of multidimensional features for machine learning
CN113988272A (en) Method and device for generating neural network, computer equipment and storage medium
CN112561569B (en) Dual-model-based store arrival prediction method, system, electronic equipment and storage medium
CN111985616B (en) Image feature extraction method, image retrieval method, device and equipment
Zhao et al. Autodes: Automl pipeline generation of classification with dynamic ensemble strategy selection
CN111126617B (en) Method, device and equipment for selecting fusion model weight parameters
CN113706285A (en) Credit card fraud detection method
CN111539536B (en) Method and device for evaluating service model hyper-parameters
CN112333652B (en) WLAN indoor positioning method and device and electronic equipment
CN111445025A (en) Method and device for determining hyper-parameters of business model
CN115936773A (en) Internet financial black product identification method and system
CN111026935B (en) Cross-modal retrieval reordering method based on adaptive measurement fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40035501; Country of ref document: HK)