CN112488245A - Service model hyper-parameter configuration determining method, device, equipment and storage medium - Google Patents

Service model hyper-parameter configuration determining method, device, equipment and storage medium

Info

Publication number
CN112488245A
Authority
CN
China
Prior art keywords
hyper
training
target
parameter
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011534124.XA
Other languages
Chinese (zh)
Inventor
刘亮
张晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Jiangsu Co Ltd
Priority to CN202011534124.XA
Publication of CN112488245A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/17 Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a method, a device, equipment and a storage medium for determining the hyper-parameter configuration of a business model. The method comprises: determining a plurality of first training tasks in a preset historical task table according to a training target; calculating a distance between the target training task and each of the plurality of first training tasks; taking the hyper-parameter configuration of a second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, wherein the second training task is a first training task, among the plurality of first training tasks, whose distance is smaller than a preset threshold value; and sending the target hyper-parameter sample set to a model training module so that the model training module determines a target model and a target hyper-parameter configuration based on the target hyper-parameter sample set and a first hyper-parameter sample set. With the method provided by the embodiment of the invention, the hyper-parameter configuration of the business model can be determined based on historical experience, the convergence speed of model training is greatly improved, and the number of training runs is reduced.

Description

Service model hyper-parameter configuration determining method, device, equipment and storage medium
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a method, a device, equipment and a storage medium for determining service model hyper-parameter configuration.
Background
Various systems of communication operators generate a large amount of business data every day, and the data has huge application value after being analyzed and processed. Currently, a suitable model is generally established by a machine learning method to analyze and process the business data.
For a given task, establishing and deploying an effective model generally involves two main parts: selecting a suitable model, and selecting suitable hyper-parameters for the model so as to guarantee its performance. For tuning the hyper-parameters of a model, two methods are generally adopted in the industry at present:
The first is manual parameter tuning, in which modeling personnel tune the hyper-parameters of the model according to their own experience; this requires rich personal experience.
The second uses an optimization engine to tune and evaluate parameters automatically. Hyper-parameter tuning is based on methods such as random search, grid search and Bayesian optimization: an automatic hyper-parameter tuning module for the target model is established, and the optimal hyper-parameter configuration of the objective function is obtained through continuous iterative training and automatic evaluation of the model. However, in a tuning scheme based on an optimization engine, the initial hyper-parameter values are set from scratch for every new modeling task, and a large number of hyper-parameter combinations must be substituted and evaluated, so the process is slow, inefficient and time-consuming.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for determining service model hyper-parameter configuration and a computer storage medium, which can form initial hyper-parameter configuration on the basis of historical experience, improve the convergence speed of model training, reduce the training times and improve the system efficiency.
In a first aspect, an embodiment of the present invention provides a method for determining a service model hyper-parameter configuration, where the method includes: acquiring a target training task, wherein the target training task comprises a training target, and the training target comprises at least one of whether to leave the network, whether to downshift and whether to order an unlimited package; determining a plurality of first training tasks in a preset historical task table according to a training target; calculating a distance between the target training task and each of the plurality of first training tasks; taking the hyper-parameter configuration of a second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of a target training task, wherein the second training task is a first training task with the distance smaller than a preset threshold value in a plurality of first training tasks; and sending the target hyper-parameter sample set to a model training module so that the model training module constructs a plurality of business models based on the target hyper-parameter sample set and a first hyper-parameter sample set, training the plurality of business models, and determining a target model and target hyper-parameter configuration in the plurality of trained business models, wherein the first hyper-parameter sample set is obtained according to a preset hyper-parameter search algorithm.
In an optional embodiment, before obtaining the target training task, the method further includes:
a plurality of training targets and a plurality of model characteristic fields are preset, so that a target user selects a required training target and a required model characteristic field from the preset training targets and the preset model characteristic fields to generate a target training task.
In an alternative embodiment, the model characteristic field includes at least one of age, gender, average revenue per user (ARPU), average monthly traffic (DOU) and target application traffic.
In an alternative embodiment, calculating the distance between the target training task and each of the plurality of first training tasks comprises:
distances are calculated between the model feature fields of the target training task and the model feature fields of each of the plurality of first training tasks.
In an optional implementation manner, after the hyper-parameter configuration of the second training task is used as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, the method further includes:
determining a weighted distance of a distance corresponding to each hyper-parameter sample in the target hyper-parameter sample set according to a plurality of preset influence factors;
and deleting the hyper-parameter samples in the target hyper-parameter sample set when the weighting distance corresponding to the hyper-parameter samples does not meet the preset condition.
In an alternative embodiment, taking the hyper-parameter configuration of the second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task includes:
determining a plurality of second training tasks in the plurality of first training tasks according to a preset distance threshold;
according to the plurality of second training tasks, determining the hyper-parameter configuration of each second training task in the plurality of second training tasks in a preset historical hyper-parameter table;
and taking the hyper-parameter configuration of each second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task.
In an alternative embodiment, the method further comprises:
storing the target training task and the target hyper-parameter configuration in the historical training task table and the historical hyper-parameter table, respectively.
In an alternative embodiment, the training objectives of the plurality of first training tasks are the same as the training objectives of the target training task.
In a second aspect, an embodiment of the present invention provides a device for determining a hyper-parameter configuration of a business model, where the device includes:
a first acquisition module configured to acquire a target training task, wherein the target training task comprises a training target, and the training target comprises at least one of whether to leave the network, whether to downshift and whether to order an unlimited package;
a first judgment module configured to determine a plurality of first training tasks in a preset historical task table according to the training target;
a first calculation module configured to calculate a distance between the target training task and each of the plurality of first training tasks;
a hyper-parameter sample construction module configured to take the hyper-parameter configuration of a second training task as a hyper-parameter sample so as to obtain a target hyper-parameter sample set of the target training task, wherein the second training task is a first training task, among the plurality of first training tasks, whose distance is smaller than a preset threshold value;
an information sending module configured to send the target hyper-parameter sample set to the model training module, so that the model training module constructs a plurality of business models based on the target hyper-parameter sample set and a first hyper-parameter sample set, trains the plurality of business models, and determines a target model and a target hyper-parameter configuration among the plurality of trained business models, wherein the first hyper-parameter sample set is obtained according to a preset hyper-parameter search algorithm.
In a third aspect, an embodiment of the present invention provides a device for determining a hyper-parameter configuration of a service model, where the device includes: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the business model hyper-parameter configuration determining method provided by the first aspect and any optional implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, where computer program instructions are stored on the computer storage medium, and when the computer program instructions are executed by a processor, the method for determining the hyper-parameter configuration of the service model provided in the first aspect and any optional implementation manner of the first aspect is implemented.
According to the method, device, equipment and storage medium for determining the service model hyper-parameter configuration, a plurality of first training tasks can be determined in a preset historical task table according to the acquired target training task. A target hyper-parameter sample set of the target training task is then obtained according to the distance between the target training task and each of the plurality of first training tasks and the hyper-parameter configurations of the first training tasks, and the initial hyper-parameter configuration of the model corresponding to the target training task is determined based on the target hyper-parameter sample set. The initial hyper-parameter configuration is substituted, as a high-priority configuration, into the optimization engine of the model training module for hyper-parameter tuning. Because the initial hyper-parameter configuration is formed on the basis of historical experience and is closer to the optimal hyper-parameter configuration of the target task, the convergence speed of model training is greatly improved, the number of training runs is greatly reduced, system efficiency is improved, and a globally optimal solution can be obtained quickly.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for determining a hyper-parameter configuration of a business model according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the functional blocks of a sample controller provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a service model hyper-parameter configuration determining apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a service model hyper-parameter configuration determining device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Data modeling has a relatively high technical threshold, and most of the work is concentrated on tuning the model hyper-parameters. Two methods are generally adopted in the industry at present:
Scheme one uses manual parameter tuning. Modeling personnel tune parameters manually according to their own subjective experience, in combination with the model's degree of fit, the scores of various indices and the like, and compare the model results under various schemes by repeatedly trying different hyper-parameter values, so as to optimize the effect of the model.
However, manual parameter tuning requires strong professional skills and rich modeling experience. The optimal scheme is selected by setting the hyper-parameters many times, substituting them into the training process and comparing the effects side by side. The final tuning result depends heavily on the individual's professional level, and the optimal tuning effect is not easy to obtain. The manual process is also time-consuming and labor-intensive, and tuning efficiency and effect differ greatly between scenarios.
Scheme two uses AI technology to tune the model parameters automatically. The system encapsulates functions such as data extraction, processing and modeling, and tunes and evaluates the model parameters automatically, so as to realize fully automated model mining. Hyper-parameter tuning is mainly based on methods such as random search, grid search and Bayesian optimization: an automatic hyper-parameter tuning module for the objective function is established, and the optimal hyper-parameter scheme of the objective function is obtained through continuous iterative training and automatic evaluation of the models.
However, although automatic tuning methods such as Bayesian optimization, grid search and random search let the system select hyper-parameters automatically, they have certain defects: the optimization scheme of the random search algorithm is too random and its optimization efficiency differs greatly between scenarios; grid search requires a large amount of computing resources and time; and Bayesian optimization algorithms tend to fall into local optima. In addition, each time a new modeling task is performed, the initial parameter values are set from scratch, and a large number of hyper-parameter combinations must be substituted into the surrogate model for calculation, which is slow, inefficient and time-consuming.
Even as more and more model training results accumulate, a new model optimization task still has to start from the beginning and follow the same steps. Earlier training experience is not effectively used in subsequent training, and in scenarios with many training tasks, a large amount of prior knowledge provides no clear guidance for subsequent model optimization tasks.
Based on the above problems, the application provides a method, a device, equipment and a storage medium for determining the hyper-parameter configuration of a business model. Building on scheme two above, a sample controller module is added to the optimization process; through this module, the final optimized hyper-parameter configuration schemes of all previous model training tasks are stored as an experience base.
The following first describes a method for determining a hyper-parameter configuration of a business model, which can be implemented based on the foregoing sample controller. Referring to fig. 1, a flow chart of a method for determining a hyper-parameter configuration of a business model may include steps S101 to S105.
Step S101, a target training task is obtained, wherein the target training task comprises a training target, and the training target comprises at least one of whether to leave the network, whether to downshift and whether to order an unlimited package.
The sample controller obtains a target training task from a user. The target training task can be entered directly by the user, or generated by the user by selecting and combining preset task elements. The target training task includes a training target, which includes at least one of whether to leave the network, whether to downshift and whether to order an unlimited package; as the business develops, other training targets may also be added.
In one example, a functional block diagram of the sample controller can refer to Fig. 2. The sample controller may include a sample experience base module, a probability formula generator, a selection probability calculator, and a sample scheme selector.
The sample experience base module stores all historical task information, mainly in two tables. The historical training task table stores the configuration information, the number of times reused and the like for each historical task, and has a unique primary key, "hyper-parameter configuration scheme", that associates it with the historical hyper-parameter table. The historical hyper-parameter table stores the specific hyper-parameter scheme of each task: the parameters that any of the algorithms may involve are laid out as columns, the optimal parameter values of the task are filled in, and null values are filled in for algorithms that do not contain a given parameter. The fields of the historical training task table are described as follows:
warehousing time: the time at which the parameter configuration scheme was added to the base; this field is included because the training of some models is correlated with time;
region: the region attribute of the parameter configuration scheme; models of similar regions are similar;
target string: the dimension value of the model's training target; only optimization tasks with the same training target are meaningfully comparable;
feature string: the input feature fields of model training. The full library of features commonly used in model training is collated and encoded in order as a binary string in which each feature has a fixed position; if a feature participates in model training, its position is set to 1, otherwise to 0. For convenient storage, the feature string is finally stored as a hexadecimal number;
number of times reused: the number of times the sample scheme has been reused by other training tasks.
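The feature-string encoding described above can be illustrated with a minimal Python sketch. The feature library `FEATURE_LIBRARY`, its ordering, and the function names are assumptions made for illustration; the source only specifies that each feature has a fixed bit position and that the string is stored in hexadecimal.

```python
# Hypothetical feature library; list order fixes each feature's bit position.
FEATURE_LIBRARY = ["age", "gender", "ARPU", "DOU", "target_app_traffic"]


def encode_feature_string(selected):
    """Return the hexadecimal feature string for the selected features."""
    unknown = set(selected) - set(FEATURE_LIBRARY)
    if unknown:
        raise ValueError(f"features not in library: {sorted(unknown)}")
    # One bit per library feature: 1 if it participates in training, else 0.
    bits = "".join("1" if name in selected else "0" for name in FEATURE_LIBRARY)
    # Pad the hex output so that leading zero bits are not lost.
    width = (len(FEATURE_LIBRARY) + 3) // 4
    return format(int(bits, 2), f"0{width}x")


def decode_feature_string(hex_string):
    """Recover the list of participating features from a hex feature string."""
    bits = format(int(hex_string, 16), f"0{len(FEATURE_LIBRARY)}b")
    return [name for name, bit in zip(FEATURE_LIBRARY, bits) if bit == "1"]
```

For example, selecting age, ARPU and DOU gives the bit string 10110, which this sketch stores as the two-digit hexadecimal string "16".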
The probability formula generator uses the sample schemes of the historical training tasks to fit a regression formula relating a new optimization task to a historical task, so that the similarity of the two can be evaluated. The samples of the historical scheme library are divided into two sets A and B in proportions of 80% and 20%. Each sample in B is compared one by one with the samples in A, and the Euclidean distance $DisT_{ab}$ between the sample schemes is calculated to compare similarity with the historical samples, where $a$ and $b$ are sample schemes in A and B respectively:
Figure BDA0002849902100000071
The degree of approximation of $a$ and $b$ is:
Figure BDA0002849902100000072
When $d_{appro} < K$ ($K$ is a configurable constant), $a$ and $b$ are considered approximate hyper-parameter configuration schemes. All sample schemes in A and B are traversed to obtain multiple groups of approximate solutions, which are taken as positive samples; a certain proportion of negative samples (non-approximate solutions) is selected, and both are input to the XGBoost algorithm to obtain a probability model of whether a new task and an old task have approximate solutions: $f_{prob} = \mathrm{Compare\_Func}(D_{new}, D_n)$. The model output is a number between 0 and 1 representing the degree of approximation of the two sample schemes; $D_{new} = \{x_{new\_1}, x_{new\_2}, x_{new\_3}, \ldots, x_{new\_n}\}$ is the training feature set of the new task, and $D_n = \{x_1, x_2, x_3, \ldots, x_n\}$ is the training feature set of the historical task being compared. The probability formula generator does not need to run for every task; it is trained monthly and outputs $f_{prob}$.
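The construction of labelled training pairs for the probability model can be sketched as follows. This is only an illustration under stated assumptions: the formula for $d_{appro}$ is not reproduced in the text, so the sketch uses the Euclidean distance between hyper-parameter vectors directly as a stand-in; the function names and the data layout are hypothetical; and the classifier fit (XGBoost in the source) is omitted.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def build_labelled_pairs(schemes, k, seed=0):
    """Split schemes 80/20 into sets A and B and label every (a, b) pair.

    schemes: list of (task_features, hyper_parameter_vector) tuples.
    Returns a list of (features_a, features_b, label), where label is 1
    when the two schemes count as approximate and 0 otherwise.
    Assumption: the raw distance stands in for d_appro, which the source
    defines in an equation image that is not available here.
    """
    rng = random.Random(seed)
    shuffled = schemes[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    set_a, set_b = shuffled[:cut], shuffled[cut:]
    pairs = []
    for feat_b, hp_b in set_b:          # each sample in B ...
        for feat_a, hp_a in set_a:      # ... is compared with every sample in A
            label = 1 if euclidean(hp_a, hp_b) < k else 0
            pairs.append((feat_a, feat_b, label))
    return pairs
```

The resulting pairs, with their task features as inputs and the labels as targets, would then be passed to a binary classifier to obtain a model playing the role of $f_{prob}$.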
The selection probability calculator receives a new task $D_{new} = \{x_{new\_1}, x_{new\_2}, x_{new\_3}, \ldots, x_{new\_n}\}$ and compares it one by one, via $f_{prob}$, with the historical samples $D_n = \{x_1, x_2, x_3, \ldots, x_n\}$ to obtain the probability $f_{new}$ that each historical sample and the new task have approximate solutions.
The sample scheme selector sets the proportion of the samples generated by the sample controller that participate in the initial training of the model and, according to the comparison result $f_{new}$ of the selection probability calculator, selects hyper-parameter configuration schemes from the historical library in order from high to low to participate in the training of the new task.
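A minimal sketch of the sample scheme selector follows. The function name, the dictionary layout and the way the proportion is converted into a quota are assumptions; the source states only that a proportion is set and that schemes are chosen by $f_{new}$ from high to low.

```python
def select_schemes(scores, n_initial, proportion):
    """Pick historical schemes for the initial training of a new task.

    scores: dict mapping a scheme id to its f_new value in [0, 1].
    n_initial: total number of initial hyper-parameter samples (assumed).
    proportion: share of those samples supplied by the sample controller.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    quota = int(n_initial * proportion)
    # Rank scheme ids by f_new from high to low and keep the top `quota`.
    ranked = sorted(scores, key=lambda sid: scores[sid], reverse=True)
    return ranked[:quota]
```

With four scored schemes, four initial samples and a proportion of 0.5, the two highest-scoring schemes are returned and the remaining initial samples are left to the search algorithm.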
Step S102, determining a plurality of first training tasks in a preset historical task table according to a training target.
The sample controller is stored with a historical task table in advance, and the historical task table is stored with a plurality of trained models and hyper-parameter configurations corresponding to the models. In this step, the sample controller may determine a plurality of first training tasks in the historical task table according to a training goal in the target training tasks.
In one example, the training goal of each of the plurality of first training tasks is the same as the training goal of the target training task. For example, the target training task is compared with the historical training task table to obtain a historical sample set $f_{same}$ with the same training target $Y$.
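As a sketch of this filtering step, the historical task table can be modelled as a list of records and filtered on the target string. The record type `HistoricalTask` and its field names are hypothetical, loosely following the table fields described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalTask:
    task_id: str
    target: str          # "target string" field of the historical task table
    feature_string: str  # hexadecimal feature string
    scheme_id: str       # key into the historical hyper-parameter table


def first_training_tasks(history, training_target):
    """Return the historical tasks whose training target matches the new task."""
    return [task for task in history if task.target == training_target]
```

Only the tasks returned here go on to the distance calculation of step S103, since tasks with different training targets are not meaningfully comparable.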
Step S103, calculating a distance between the target training task and each of the plurality of first training tasks.
After finding a plurality of first training tasks, the sample controller may determine similarity between each first training task and the target training task, specifically, calculate a distance between the first training task and the target training task, and determine the level of the similarity according to a value of the distance.
In one example, each training task may also include model feature fields, which may include one or more of age, gender, average revenue per user (ARPU), average monthly traffic (DOU) and target application traffic. Calculating the distance between the target training task and a first training task may specifically mean calculating the distance between the model feature fields of the target training task and the model feature fields of each of the plurality of first training tasks.
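One possible way to compute such a distance is sketched below. It assumes that the feature fields are available as hexadecimal feature strings and uses the Euclidean distance between the corresponding 0/1 vectors; the source does not fix the distance measure for this step, so both the measure and the function names are assumptions.

```python
import math


def feature_bits(hex_string, n_features):
    """Expand a hexadecimal feature string into a list of 0/1 flags."""
    return [int(bit) for bit in format(int(hex_string, 16), f"0{n_features}b")]


def feature_distance(hex_a, hex_b, n_features):
    """Euclidean distance between two tasks' binary feature vectors."""
    a = feature_bits(hex_a, n_features)
    b = feature_bits(hex_b, n_features)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For 0/1 vectors this distance is the square root of the number of features on which the two tasks differ, so identical feature selections give a distance of 0.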
In one example, the feature field set of the target training task, $D_{new} = \{x_{new\_1}, x_{new\_2}, x_{new\_3}, \ldots, x_{new\_n}\}$, is compared one by one, via $f_{prob}$, with the historical samples $D_n = \{x_1, x_2, x_3, \ldots, x_n\}$ to obtain the estimated probability of similarity between each historical sample and the new task, and the historical samples whose similarity is larger than a set threshold $F$ are recorded as the set $F_{new}$.
Step S104, taking the hyper-parameter configuration of a second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, wherein the second training task is a first training task, among the plurality of first training tasks, whose distance is smaller than a preset threshold value.
In this step, the sample controller may first determine a plurality of second training tasks among the plurality of first training tasks according to the value of each distance and a preset distance threshold. The second training tasks are thus the first training tasks that are more similar to the target training task. The hyper-parameter configuration of each second training task is taken as an initial hyper-parameter sample so as to construct the target hyper-parameter sample set.
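Step S104 can be sketched as a threshold filter followed by a table lookup. The dictionary layouts and names are hypothetical; they stand in for the historical training task table and the historical hyper-parameter table.

```python
def target_sample_set(distances, threshold, scheme_of_task, hyper_parameter_table):
    """Build the target hyper-parameter sample set for a new task.

    distances: task id -> distance to the target training task.
    scheme_of_task: task id -> key of its hyper-parameter configuration scheme.
    hyper_parameter_table: scheme key -> hyper-parameter configuration (dict).
    """
    # Second training tasks: first training tasks closer than the threshold.
    second_tasks = [tid for tid, dist in distances.items() if dist < threshold]
    # Look up each second task's configuration and use it as a sample.
    return [hyper_parameter_table[scheme_of_task[tid]] for tid in second_tasks]
```

Tasks at or beyond the threshold contribute nothing, so a strict threshold can leave the set empty, in which case training would rely on the search algorithm alone.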
Step S105, sending the target hyper-parameter sample set to a model training module so that the model training module constructs a plurality of business models based on the target hyper-parameter sample set and a first hyper-parameter sample set, training the plurality of business models, and determining a target model and target hyper-parameter configuration in the plurality of trained business models, wherein the first hyper-parameter sample set is obtained according to a preset hyper-parameter search algorithm.
A first hyper-parameter sample set is constructed through a preset hyper-parameter search algorithm. The first hyper-parameter sample set and the target hyper-parameter sample set are combined according to a preset proportion to construct a second hyper-parameter sample set. The second hyper-parameter sample set is input into the hyper-parameter optimization engine of the model training module to search for an optimal hyper-parameter sample set; a business model is constructed and trained based on each sample in the optimal hyper-parameter sample set, and the optimal business model, namely the target model, is finally selected according to the training results. The hyper-parameter configuration corresponding to the target model is the target hyper-parameter configuration.
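The combination of the two sample sets can be sketched as below. Random search is used here as the preset search algorithm purely for illustration; the search space format, the function names and the rule for turning the proportion into a count are all assumptions.

```python
import random


def random_search_samples(space, n, seed=0):
    """Draw n configurations uniformly from a {name: (low, high)} search space."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        for _ in range(n)
    ]


def second_sample_set(target_set, space, n_total, proportion, seed=0):
    """Mix history-based samples with searched samples.

    target_set: target hyper-parameter sample set from the sample controller.
    proportion: preset share of n_total reserved for history-based samples.
    """
    n_history = min(len(target_set), int(n_total * proportion))
    # First hyper-parameter sample set: fills whatever history does not cover.
    first_set = random_search_samples(space, n_total - n_history, seed)
    return target_set[:n_history] + first_set
```

The combined list is what would be handed to the optimization engine as its initial population, with the history-based samples placed first.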
In the method for determining the hyper-parameter configuration of a business model provided by this embodiment, a sample controller is placed ahead of the automatic hyper-parameter training task. The sample controller stores previously optimized hyper-parameters and model schemes, namely the preset historical task table. The method determines a plurality of first training tasks in the preset historical task table and obtains a target hyper-parameter sample set from them. Because the target hyper-parameter sample set serves as the initial parameter samples for subsequent training, a large amount of historical experience is applied to the new training process, which shortens hyper-parameter optimization and speeds up model convergence.
In one embodiment, before step S101, the method for determining the hyper-parameter configuration of the business model may further include step S106.
Step S106: preset a plurality of training targets and a plurality of model feature fields, so that a target user selects the required training target and model feature fields from the preset options to generate the target training task.
The user can construct different target training tasks by selecting training targets and feature fields, and the system generates the target training task according to the user's selection. For example, the user sets a training target $Y = \mathrm{Target\_Func}(x_1, x_2, \ldots, x_n)$ for the business model (e.g., whether to leave the network, whether to downshift, whether to order an unlimited package) and a set of feature fields $D = \{x_1, x_2, \ldots, x_n\}$ for training the model (age, gender, ARPU, DOU, micro-traffic, etc.).
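A minimal sketch of how a target training task could be generated from the user's selection is given below. The preset values are taken from the examples in the text; the class name, function name, and identifier spellings are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Preset options offered to the user (identifiers are illustrative).
PRESET_TARGETS = ("leave_network", "downshift", "order_unlimited_package")
PRESET_FEATURES = ("age", "gender", "ARPU", "DOU", "micro_traffic")


@dataclass(frozen=True)
class TrainingTask:
    training_target: str
    feature_fields: Tuple[str, ...]


def make_target_task(target: str, features: Tuple[str, ...]) -> TrainingTask:
    """Validate the user's selection against the presets and build the task."""
    if target not in PRESET_TARGETS:
        raise ValueError(f"unknown training target: {target}")
    unknown = [f for f in features if f not in PRESET_FEATURES]
    if unknown:
        raise ValueError(f"unknown feature fields: {unknown}")
    return TrainingTask(target, tuple(features))


task = make_target_task("leave_network", ("age", "ARPU", "DOU"))
```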
In one embodiment, after the hyper-parameter configuration of the second training task is used as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, the method further includes steps S107-S108.
Step S107: determine, according to a plurality of preset influence factors, a weighted distance for the distance corresponding to each hyper-parameter sample in the target hyper-parameter sample set.
When the distance is determined in step S103, some factors may be unsuitable for inclusion in the distance calculation: adding them directly could complicate the calculation or conflict with other factors, introducing error into the result. In this case, a weighted distance can be computed from the distance using the factors that were left out. For example, a plain feature-string similarity comparison does not take into account time, region, and other information that affects model accuracy; such information can be used to calculate the weighted distance.
Step S108: delete a hyper-parameter sample from the target hyper-parameter sample set when the weighted distance corresponding to that sample does not meet a preset condition.
After the weighted distance is calculated, some samples may no longer satisfy the preset condition; these samples can then be deleted from the target hyper-parameter sample set.
In one example, since a plain feature-string similarity comparison does not consider time, region, and other information affecting model accuracy, different weights $Q = \{q_1, q_2, \ldots, q_n\}$ are assigned to the different attributes, which together serve as inputs to the calculation of the optimal parameter scheme. For the screened sample schemes, all attributes in Table 1 are matched as input variables and summed with these weights, and the support score of each sample scheme, $f = g \sum QD + f_{new}$, is output; this score is the weighted distance.
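Steps S107 and S108 can be illustrated with the sketch below. It does not reproduce the support-score formula of the example, whose terms are not fully defined in the text. Instead it assumes a simpler form: the base distance plus a weighted sum of per-factor gaps (0 for a match, 1 for a mismatch), with a sample kept only if the result does not exceed an upper bound. The weights, factor names, and bound are all hypothetical.

```python
from typing import Dict, List


def weighted_distance(
    base: float, factor_gaps: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Base distance plus a weighted sum of per-factor gaps (assumed form)."""
    return base + sum(weights[name] * gap for name, gap in factor_gaps.items())


def prune_samples(
    samples: List[dict], weights: Dict[str, float], max_weighted: float
) -> List[dict]:
    """Keep only samples whose weighted distance meets the preset condition."""
    return [
        s
        for s in samples
        if weighted_distance(s["distance"], s["gaps"], weights) <= max_weighted
    ]


weights = {"time": 0.5, "region": 0.25}  # hypothetical influence-factor weights
samples = [
    {"task": "task_a", "distance": 0.25, "gaps": {"time": 0.0, "region": 0.0}},
    {"task": "task_c", "distance": 0.25, "gaps": {"time": 1.0, "region": 1.0}},
]
kept = prune_samples(samples, weights, max_weighted=0.5)
```

In this toy data both samples have the same feature-field distance, but `task_c` differs in time and region, so its weighted distance rises above the bound and it is removed.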
In this method, the construction of the target hyper-parameter sample set considers not only the model feature fields but also other information that affects model accuracy, such as time and region, so the resulting target hyper-parameter sample set has higher reference value. Using it as the initial parameter samples for subsequent training gives better results and improves the efficiency of model convergence.
In one embodiment, step S104 may specifically include steps S1041-S1043.
Step S1041: determine a plurality of second training tasks among the plurality of first training tasks according to the preset distance threshold.
Step S1042: determine, in a preset historical hyper-parameter table, the hyper-parameter configuration of each of the plurality of second training tasks.
Step S1043: take the hyper-parameter configuration of each second training task as a hyper-parameter sample to obtain the target hyper-parameter sample set of the target training task.
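Steps S1041 to S1043 can be sketched as a single function. The in-memory dictionaries stand in for the distance results and the historical hyper-parameter table; their layout, the task identifiers, and the configuration values are assumptions made for illustration.

```python
from typing import Dict, List


def build_target_sample_set(
    distances: Dict[str, float],
    history_hyperparams: Dict[str, dict],
    threshold: float,
) -> List[dict]:
    # S1041: second training tasks are the first tasks closer than the threshold.
    second_tasks = [task_id for task_id, d in distances.items() if d < threshold]
    # S1042: look up each second task's configuration in the historical table.
    # S1043: each configuration found becomes one hyper-parameter sample.
    return [history_hyperparams[t] for t in second_tasks if t in history_hyperparams]


distances = {"task_a": 0.10, "task_b": 0.45, "task_c": 0.25}
history_hyperparams = {
    "task_a": {"max_depth": 6, "eta": 0.1},
    "task_b": {"max_depth": 4, "eta": 0.3},
    "task_c": {"max_depth": 8, "eta": 0.05},
}
target_set = build_target_sample_set(distances, history_hyperparams, threshold=0.3)
```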
In one example, the business model hyper-parameter configuration determination method may further include step S109.
Step S109: store the target training task and the target hyper-parameter configuration into the historical training task table and the historical hyper-parameter table.
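A minimal sketch of step S109 follows, assuming both history tables are dictionaries keyed by a task identifier. The storage layout, field names, and identifiers are hypothetical; the text does not specify how the tables are persisted.

```python
from typing import Dict, Sequence


def record_result(
    task_table: Dict[str, dict],
    hyperparam_table: Dict[str, dict],
    task_id: str,
    training_target: str,
    feature_fields: Sequence[str],
    target_config: dict,
) -> None:
    """Add a finished task and its best configuration to the history tables."""
    task_table[task_id] = {
        "training_target": training_target,
        "feature_fields": tuple(feature_fields),
    }
    # Copy the configuration so later changes by the caller do not alter history.
    hyperparam_table[task_id] = dict(target_config)


history_tasks: Dict[str, dict] = {}
history_hyperparams: Dict[str, dict] = {}
record_result(
    history_tasks,
    history_hyperparams,
    "task_d",
    "downshift",
    ["age", "ARPU"],
    {"max_depth": 6, "eta": 0.1},
)
```

Once recorded, the task becomes a candidate first training task for later target training tasks with the same training target.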
In a specific example, to verify the sample-controller-based hyper-parameter optimization method, the applicant took a plurality of model tuning tasks with different algorithms and different training feature variables, applied grid search, random search, and sample-controller-based hyper-parameter optimization to each task, and measured the time each method needed to reach the set f1 value. The experimental results are shown in Table 1.
Table 1. Model tuning time comparison

| Task | Number of features | f1 | Grid search optimization | Random search optimization | Sample controller optimization |
| --- | --- | --- | --- | --- | --- |
| Model 1: xgboost | 50 | 0.81 | 1h21m24s | 58m28s | 12m32s |
| Model 2: xgboost | 35 | 0.79 | 42m22s | 43m17s | 37m02s |
| Model 3: gbdt | 70 | 0.77 | 1h22m37s | 1h14m21s | 47m11s |
| Model 4: random forest | 62 | 0.81 | 1h11m13s | 1h12m12s | 49m12s |
| Model 5: c5.0 | 30 | 0.78 | 37m42s | 41m20s | 2m32s |
The experimental results show that, for the same optimization target, the scheme of the present application takes less time than both random search and grid search on all five tasks in Table 1. The sample-controller-based hyper-parameter optimization method provided by the present application exploits the common characteristics of historical optimization tasks and makes full use of historical experience. Compared with conventional parameter optimization methods, it improves convergence efficiency, so that more inference tasks can be completed in the same period, which accelerates the large-scale rollout of data mining and allows its application effect to be evaluated and improved quickly.
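The timings in Table 1 can be converted into speed-up ratios with a short script. The durations below are copied from Table 1; the parser, which accepts either "m" or "min" for minutes, and the printed layout are illustrative only.

```python
import re


def to_seconds(text: str) -> int:
    """Parse durations such as '1h21m24s' or '42min22s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m(?:in)?)?(?:(\d+)s)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 3600 * hours + 60 * minutes + seconds


# (grid search, random search, sample controller) timings from Table 1.
table_1 = {
    "Model 1": ("1h21m24s", "58m28s", "12m32s"),
    "Model 2": ("42m22s", "43m17s", "37m02s"),
    "Model 3": ("1h22m37s", "1h14m21s", "47m11s"),
    "Model 4": ("1h11m13s", "1h12m12s", "49m12s"),
    "Model 5": ("37m42s", "41m20s", "2m32s"),
}

for model, (grid, rand, controller) in table_1.items():
    c = to_seconds(controller)
    print(
        f"{model}: {to_seconds(grid) / c:.2f}x vs grid search, "
        f"{to_seconds(rand) / c:.2f}x vs random search"
    )
```

The ratios vary widely between tasks, which is consistent with the benefit depending on how closely the historical tasks resemble the new one.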
In the method for determining the business model hyper-parameter configuration, a plurality of first training tasks can be determined in a preset historical task table, and a target hyper-parameter sample set is obtained from them. Because the target hyper-parameter sample set serves as the initial parameter samples for subsequent training, a large amount of historical experience is applied to the new training process, which shortens hyper-parameter optimization and speeds up model convergence. After the target model and the target hyper-parameter configuration are obtained, they are stored in the historical training task table and the historical hyper-parameter table, so that each completed task benefits later ones, forming a virtuous cycle.
Based on the service model hyper-parameter configuration determining method provided in the foregoing embodiment, correspondingly, the embodiment of the present application further provides a service model hyper-parameter configuration determining apparatus, as shown in fig. 3, the apparatus may include a first obtaining module 301, a first judging module 302, a first calculating module 303, a hyper-parameter sample constructing module 304, and an information sending module 305.
A first obtaining module 301 configured to obtain a target training task, the target training task including a training target, the training target including at least one of whether to leave the network, whether to downshift, and whether to order an unlimited package.
The first judging module 302 is configured to determine a plurality of first training tasks in a preset historical task table according to the training target.
A first calculation module 303 configured to calculate a distance between the target training task and each of the plurality of first training tasks.
The hyper-parameter sample construction module 304 is configured to use a hyper-parameter configuration of a second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, where the second training task is a first training task with a distance smaller than a preset threshold value among the plurality of first training tasks.
The information sending module 305 is configured to send the target hyper-parameter sample set to the model training module, so that the model training module constructs a plurality of business models based on the target hyper-parameter sample set and a first hyper-parameter sample set, trains the plurality of business models, and determines a target model and a target hyper-parameter configuration in the plurality of trained business models, wherein the first hyper-parameter sample set is obtained according to a preset hyper-parameter search algorithm.
According to the device for determining the hyper-parameter configuration of the business model provided by the embodiment of the application, the first judging module 302 may determine a plurality of first training tasks according to the data of the first obtaining module 301, and the training targets of the first training tasks are the same as those of the target training tasks. The hyper-parameter sample construction module 304 and the first calculation module 303 can obtain a target hyper-parameter sample set according to the result of the first judgment module 302. The target hyper-parameter sample set is used as an original parameter sample for subsequent training, so that a large amount of historical experience can be applied to a new training process, the time for model hyper-parameter optimization is greatly saved, and the efficiency of model convergence is improved.
In one embodiment, the apparatus may further include a task setting module.
The task setting module is configured to preset a plurality of training targets and a plurality of model feature fields before the target training task is obtained, so that a target user selects the required training target and model feature fields from the preset options to generate the target training task.
In one example, the model feature fields in the task setting module include at least one of age, gender, average revenue per user (ARPU), average monthly traffic (DOU), and target application traffic.
In one example, the first calculation module 303 is specifically configured to calculate a distance between the model feature field of the target training task and the model feature field of each of the plurality of first training tasks.
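One possible way to compute a distance between two sets of model feature fields is the Jaccard distance, sketched below. This metric is an assumption chosen for illustration; the text here does not state which measure the first calculation module uses. The task identifiers and field lists are hypothetical.

```python
from typing import Dict, Iterable, List


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B|; 0.0 means identical field sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty field sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


target_fields: List[str] = ["age", "gender", "ARPU", "DOU"]
history_fields: Dict[str, List[str]] = {
    "task_a": ["age", "gender", "ARPU", "DOU"],
    "task_b": ["age", "region"],
}
distances = {
    task_id: jaccard_distance(target_fields, fields)
    for task_id, fields in history_fields.items()
}
```

Here `task_a` shares every field with the target task, while `task_b` shares one field out of five in the union, so it is much farther away.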
In one embodiment, the apparatus may further include a distance weighting module and a noise data processing module.
The distance weighting module is configured to determine, according to a plurality of preset influence factors, a weighted distance for the distance corresponding to each hyper-parameter sample in the target hyper-parameter sample set, after the hyper-parameter configuration of the second training task has been taken as a hyper-parameter sample to obtain the target hyper-parameter sample set of the target training task.
The noise data processing module is configured to delete a hyper-parameter sample from the target hyper-parameter sample set when the weighted distance corresponding to that sample does not meet the preset condition.
In one embodiment, the hyper-parameter sample construction module 304 may include:
A first judging unit configured to determine a plurality of second training tasks among the plurality of first training tasks according to a preset distance threshold.
A second judging unit configured to determine, in a preset historical hyper-parameter table, the hyper-parameter configuration of each of the plurality of second training tasks.
A hyper-parameter sample set construction unit configured to take the hyper-parameter configuration of each second training task as a hyper-parameter sample to obtain the target hyper-parameter sample set of the target training task.
In one example, the apparatus may further include an information storage module.
The information storage module is configured to store the target training task and the target hyper-parameter configuration into the historical training task table and the historical hyper-parameter table.
In one example, the training targets of the plurality of first training tasks determined by the first judging module 302 are the same as the training target of the target training task.
The service model hyper-parameter configuration determining method provided by the foregoing embodiments may be executed by the service model hyper-parameter configuration determining device shown in fig. 4.
The business model hyper-parameter configuration determining apparatus may comprise a processor 401 and a memory 402 storing computer program instructions.
Specifically, the processor 401 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement the embodiments of the present invention.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. In one example, memory 402 may include removable or non-removable (or fixed) media, or memory 402 may be non-volatile solid-state memory. The memory 402 may be internal or external to the business model hyper-parameter configuration determining device.
In one example, the memory 402 may be a Read Only Memory (ROM). In one example, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The memory 402 may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the present disclosure.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement the service model hyper-parameter configuration determining method provided in any of the above embodiments, and achieve the corresponding technical effect achieved by the method, which is not described herein again for brevity.
In one example, the business model hyper-parameter configuration determining device may further comprise a communication interface 403 and a bus 410. The processor 401, the memory 402, and the communication interface 403 are connected via a bus 410 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.
Bus 410 comprises hardware, software, or both that couple the components of the business model hyper-parameter configuration determining device to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 410 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
The service model hyper-parameter configuration determining equipment can form initial hyper-parameter configuration on the basis of historical experience, improve the convergence speed of model training, reduce the training times and improve the system efficiency.
In combination with the business model hyper-parameter configuration determining method in the foregoing embodiments, an embodiment of the present invention may provide a computer storage medium for implementing the method. The computer storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any one of the business model hyper-parameter configuration determining methods in the above embodiments.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic Circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (11)

1. A method for determining the hyper-parameter configuration of a business model is characterized by comprising the following steps:
acquiring a target training task, wherein the target training task comprises a training target, and the training target comprises at least one of whether to leave a network, whether to downshift and whether to order an unlimited package;
determining a plurality of first training tasks in a preset historical task table according to the training targets;
calculating a distance between the target training task and each of the plurality of first training tasks;
taking the hyper-parameter configuration of a second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, wherein the second training task is the first training task of which the distance is smaller than a preset threshold value in the plurality of first training tasks;
and sending the target hyper-parameter sample set to a model training module so that the model training module constructs a plurality of business models based on the target hyper-parameter sample set and a first hyper-parameter sample set, training the business models, and determining a target model and target hyper-parameter configuration in the trained business models, wherein the first hyper-parameter sample set is obtained according to a preset hyper-parameter search algorithm.
2. The method of claim 1, wherein prior to obtaining the target training task, the method further comprises:
presetting a plurality of training targets and a plurality of model feature fields, so that a target user selects a required training target and required model feature fields from the preset training targets and the preset model feature fields to generate the target training task.
3. The method of claim 2, wherein the model feature fields comprise at least one of age, gender, average revenue per user (ARPU), average monthly traffic (DOU), and target application traffic.
4. The method of claim 2, wherein calculating the distance between the target training task and each of the plurality of first training tasks comprises:
calculating the distance between the model feature fields of the target training task and the model feature fields of each of the plurality of first training tasks.
5. The method of claim 1, wherein after taking the hyper-parameter configuration of the second training task as a hyper-parameter sample to obtain the target hyper-parameter sample set of the target training task, the method further comprises:
determining, according to a plurality of preset influence factors, a weighted distance for the distance corresponding to each hyper-parameter sample in the target hyper-parameter sample set;
and deleting a hyper-parameter sample from the target hyper-parameter sample set when the weighted distance corresponding to the hyper-parameter sample does not meet a preset condition.
6. The method of claim 1, wherein taking the hyper-parameter configuration of the second training task as a hyper-parameter sample to obtain the target hyper-parameter sample set of the target training task comprises:
determining a plurality of second training tasks in the plurality of first training tasks according to a preset distance threshold;
according to the plurality of second training tasks, determining the hyper-parameter configuration of each second training task in the plurality of second training tasks in a preset historical hyper-parameter table;
and taking the hyper-parameter configuration of each second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task.
7. The method of claim 6, further comprising:
storing the target training task and the target hyper-parameter configuration into a historical training task table and a historical hyper-parameter table.
8. The method of claim 1, wherein the training targets of the plurality of first training tasks are the same as the training target of the target training task.
9. A business model hyper-parameter configuration determining apparatus, comprising:
a first obtaining module configured to obtain a target training task, the target training task including a training target, the training target including at least one of whether to leave the network, whether to downshift, and whether to order an unlimited package;
a first judging module configured to determine a plurality of first training tasks in a preset historical task table according to the training target;
a first calculation module configured to calculate a distance between the target training task and each of the plurality of first training tasks;
a hyper-parameter sample construction module configured to use a hyper-parameter configuration of a second training task as a hyper-parameter sample to obtain a target hyper-parameter sample set of the target training task, wherein the second training task is the first training task of which the distance is smaller than a preset threshold value among the plurality of first training tasks;
and an information sending module configured to send the target hyper-parameter sample set to a model training module, so that the model training module constructs a plurality of business models based on the target hyper-parameter sample set and a first hyper-parameter sample set, trains the business models, and determines a target model and a target hyper-parameter configuration among the trained business models, wherein the first hyper-parameter sample set is obtained according to a preset hyper-parameter search algorithm.
10. A business model hyper-parameter configuration determining apparatus, the apparatus comprising: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement a business model hyper-parameter configuration determination method as claimed in any of claims 1-8.
11. A computer storage medium having computer program instructions stored thereon, which when executed by a processor, implement a business model hyper-parameter configuration determination method as claimed in any of claims 1-8.
CN202011534124.XA 2020-12-21 2020-12-21 Service model hyper-parameter configuration determining method, device, equipment and storage medium Pending CN112488245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011534124.XA CN112488245A (en) 2020-12-21 2020-12-21 Service model hyper-parameter configuration determining method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112488245A true CN112488245A (en) 2021-03-12

Family

ID=74915449

Country Status (1)

Country Link
CN (1) CN112488245A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213805A (en) * 2018-09-07 2019-01-15 东软集团股份有限公司 A kind of method and device of implementation model optimization
CN110991658A (en) * 2019-11-28 2020-04-10 重庆紫光华山智安科技有限公司 Model training method and device, electronic equipment and computer readable storage medium
WO2020249125A1 (en) * 2019-06-14 2020-12-17 第四范式(北京)技术有限公司 Method and system for automatically training machine learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
柯志辅;: "运营商移动用户离网预测模型", 科技经济导刊, no. 29, pages 48 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination