CN111639807A - Model training method, duration prediction method, system, device and medium
- Publication number
- CN111639807A
- Authority
- CN
- China
- Prior art keywords
- order
- model
- hotel
- loss function
- training
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0633—Lists, e.g. purchase orders, compilation or processing
- G06Q30/0635—Processing of requisition or of purchase orders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/12—Hotels or restaurants
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Finance (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Primary Health Care (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a model training method, a duration prediction method, a system, a device and a medium. The model training method comprises: acquiring historical data of a plurality of hotel orders; extracting feature data from the historical data; and inputting the extracted feature data into an xgboost model for training, an order reply duration prediction model being generated when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum. The model training method and the order reply duration prediction method provided by the invention overcome the defect that business personnel judge the order-urging time of a hotel order from business experience, which yields low prediction accuracy; they improve the accuracy of hotel order reply duration prediction and ultimately reduce the order-urging rate.
Description
Technical Field
The present invention relates to model training technologies, and in particular to a model training method and to a duration prediction method, system, device and medium.
Background
Large internet companies connect to some suppliers through agent (proxy) connections; hotel service suppliers are the most common example in mainland China. Each hotel may correspond to a plurality of sub-hotels, and the sub-hotels may be sold separately by a plurality of different travel agencies, which is why travel agencies are also referred to as agent traffic. When a user places an order for an agent-supplied hotel on an internet client, the internet company can only confirm whether the hotel product in the order can actually be booked after the supplier has checked the availability of the hotel rooms in the order. During this process the user has to wait; this waiting time is called the confirmation duration. It starts at the user's ordering time point and ends at the supplier's reply time point. If the confirmation duration is too long, the user experience suffers. To address this, the internet company urges suppliers manually (order urging), and the waiting time required for a hotel order is predicted from the type of the hotel order and the order placing time point.
In the prior art, the waiting time is set entirely by business personnel according to their business experience. A large number of unnecessary urging requests increases labor costs, and because the urging time is judged solely from experience, the prediction accuracy is low.
Disclosure of Invention
The invention aims to overcome the defect in the prior art that business personnel judge the order-urging time of a hotel order from business experience, which results in low prediction accuracy, and provides a model training method, a duration prediction method, a system, a device and a medium.
The invention solves the technical problems through the following technical scheme:
In a first aspect, the present invention provides a model training method, the method comprising:
acquiring historical data of a plurality of hotel orders;
extracting feature data from the historical data, wherein the feature data comprise user ordering time point information, order type information, hotel static attribute information, holiday information and channel information of hotel suppliers, and the user ordering time point information indicates whether the ordering time point falls within a working period or a non-working period;
inputting the extracted feature data into an xgboost model for training, and generating an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum; wherein the xgboost model is a tree structure model consisting of a plurality of serial trees constructed from the feature data.
Preferably, the step of inputting the acquired feature data into an xgboost model for training includes:
acquiring the xgboost model and initial hyper-parameter values of the loss function;
initializing the xgboost model based on the initial hyper-parameter values of the loss function;
adjusting, according to the feature data and the loss function, the initial hyper-parameter values of the initialized xgboost model so as to reduce the corresponding loss function;
when the loss function corresponding to the adjusted xgboost model no longer decreases, determining the adjusted hyper-parameter values of the xgboost model as target parameter values, the loss function corresponding to the target parameter values being a target loss function;
adjusting leaf node weights of the target loss function so that the predicted value output by the xgboost model is closer to the true value.
Preferably, the loss function of the xgboost model is obtained by the following formula:
wherein L represents the loss function; T represents the number of leaf nodes in the current tree structure model; f_t(x_i) represents the function of the current tree structure model; γ represents a hyper-parameter controlling the number of leaves; λ represents an L2 hyper-parameter controlling the leaf weights; w represents the leaf weights; y_i represents the true value for the i-th sample; ŷ_i^(t-1) represents the predicted value of the first t-1 tree structure models; ∂ represents the partial derivative; l represents the residual function; and n represents the total amount of historical hotel-order data.
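The formula referred to above did not survive extraction. As a reference point only, the standard regularized XGBoost objective at boosting round t, which uses exactly the symbols defined above, is sketched below together with its second-order approximation and the resulting optimal leaf weight. It is an assumption that the patent's original figure matches this textbook form term for term; g_i, h_i and the leaf sample set I_j are names introduced here for the first and second partial derivatives and for the samples falling into leaf j.

```latex
L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

L^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

g_i = \frac{\partial\, l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad
h_i = \frac{\partial^{2} l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}, \qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

In this form γT penalizes the number of leaves and the λ term shrinks the leaf weights, which is why the text describes γ and λ as the hyper-parameters that control tree size and leaf weights.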
Preferably, the method further comprises:
after the historical data of the plurality of hotel orders are acquired, performing a preprocessing operation on the historical data, wherein the preprocessing operation comprises removing hotel orders whose order reply duration is longer than a preset duration.
In a second aspect, the present invention provides a system for model training, the system comprising:
an acquisition module, configured to acquire historical data of a plurality of hotel orders;
an extraction module, configured to extract feature data from the historical data, wherein the feature data comprise user ordering time point information, order type information, hotel static attribute information, holiday information and channel information of hotel suppliers, and the user ordering time point information indicates whether the ordering time point falls within a working period or a non-working period;
a training module, configured to input the extracted feature data into an xgboost model for training, and to generate an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum; wherein the xgboost model is a tree structure model consisting of a plurality of serial trees constructed from the feature data.
Preferably, the training module comprises:
an acquisition unit, configured to acquire the xgboost model and the initial hyper-parameter values of the loss function;
an initialization unit, configured to initialize the xgboost model based on the initial hyper-parameter values of the loss function;
a first adjusting unit, configured to adjust, according to the feature data and the loss function, the initial hyper-parameter values of the initialized xgboost model so as to reduce the corresponding loss function;
a determining unit, configured to determine, when the loss function corresponding to the adjusted xgboost model no longer decreases, the hyper-parameter values of the adjusted xgboost model as target parameter values, the loss function corresponding to the target parameter values being a target loss function;
and a second adjusting unit, configured to adjust the leaf node weights of the target loss function so that the predicted value output by the xgboost model is closer to the true value.
Preferably, the loss function of the xgboost model is obtained by the following formula:
wherein L represents the loss function; T represents the number of leaf nodes in the current tree structure model; f_t(x_i) represents the function of the current tree structure model; γ represents a hyper-parameter controlling the number of leaves; λ represents an L2 hyper-parameter controlling the leaf weights; w represents the leaf weights; y_i represents the true value for the i-th sample; ŷ_i^(t-1) represents the predicted value of the first t-1 tree structure models; ∂ represents the partial derivative; l represents the residual function; and n represents the total amount of historical hotel-order data.
Preferably, the system further comprises:
a preprocessing module, configured to perform a preprocessing operation on the historical data after the historical data of the plurality of hotel orders are acquired, wherein the preprocessing operation comprises removing hotel orders whose order reply duration is longer than a preset duration.
In a third aspect, the present invention provides a method for predicting an order reply duration, where the method includes:
receiving a target hotel order to be predicted in real time;
and inputting the target hotel order into the order reply duration prediction model trained by the method of the first aspect to obtain a reply duration value.
Preferably, the method further comprises:
acquiring the order type of the target hotel order and the order placing time point of the user;
determining an order-urging time according to the order type, the user's order placing time point and the reply duration value;
and sending order-urging information corresponding to the target hotel order based on the order-urging time.
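The text does not say how the order type, ordering time point and reply duration value are combined into an order-urging time. The sketch below is one hypothetical rule, not the patent's method: urge once the predicted reply duration has elapsed, and for non-same-day orders postpone an urge that would fall outside the 9:00 to 17:00 working period (taken from Example 1) to the next working period. The function names `urge_time` and `next_work_start` are illustrative only.

```python
from datetime import datetime, time, timedelta

# Working period taken from Example 1; treating it as half-open is an assumption.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def next_work_start(t: datetime) -> datetime:
    """Start of the next working period at or after t (called only outside working hours)."""
    start = datetime.combine(t.date(), WORK_START)
    return start if t < start else start + timedelta(days=1)


def urge_time(order_time: datetime, predicted_minutes: float, same_day_order: bool) -> datetime:
    """Hypothetical rule combining order type, ordering time point and predicted reply duration."""
    candidate = order_time + timedelta(minutes=predicted_minutes)
    in_working_period = WORK_START <= candidate.time() < WORK_END
    if same_day_order or in_working_period:
        # Same-day orders are urgent: urge as soon as the predicted reply time has passed.
        return candidate
    # Non-same-day order whose urge would fall outside working hours: wait for the next period.
    return next_work_start(candidate)


example = urge_time(datetime(2020, 5, 1, 16, 50), 30, same_day_order=False)  # 2020-05-02 09:00
```

Using the predicted reply duration as the trigger means suppliers are only urged after they would normally have replied, which is the mechanism by which the text expects the order-urging rate to fall.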
In a fourth aspect, the present invention provides a system for predicting an order reply duration, the system comprising:
the receiving module is used for receiving a target hotel order to be predicted in real time;
and an input module, configured to input the target hotel order into the order reply duration prediction model trained by the system of the second aspect to obtain a reply duration value.
Preferably, the system further comprises:
the acquisition module is used for acquiring the order type of the target hotel order and the order placing time point of the user;
a determining module, configured to determine an order-urging time according to the order type, the user's order placing time point and the reply duration value;
and a sending module, configured to send order-urging information corresponding to the target hotel order based on the order-urging time.
In a fifth aspect, the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the method for model training according to the first aspect, or implements the method for predicting the order reply duration according to the third aspect.
In a sixth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for model training according to the first aspect, or performs the steps of the method for predicting order reply duration according to the third aspect.
The beneficial effects of the invention are as follows: a model training method and a duration prediction method, system, device and medium are provided. The model training method improves prediction accuracy by training an order reply duration prediction model, and the order reply duration prediction method solves the problems of high labor cost and poor accuracy that arise when the order reply duration is predicted manually. As a result, the accuracy of predicting the order-urging time of a hotel order is improved and labor costs are reduced.
Drawings
Fig. 1 is a flowchart of a model training method according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of step S14 of the method for model training according to embodiment 1 of the present invention.
Fig. 3 is a block diagram of a system for model training according to embodiment 2 of the present invention.
Fig. 4 is a flowchart of a method for predicting an order reply duration according to embodiment 3 of the present invention.
Fig. 5 is a block diagram illustrating a system for predicting an order reply duration according to embodiment 4 of the present invention.
Fig. 6 is a schematic diagram of a hardware structure of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention. Experimental methods in the following examples for which no specific conditions are given were carried out according to conventional methods and conditions, or according to the product instructions.
Example 1
The present embodiment provides a method for model training, and referring to fig. 1, the method includes the following steps:
Step S11: acquiring historical data of a plurality of hotel orders.
Historical data of a plurality of hotel orders are received; for each order, the historical data include the ordering time point, the hotel name, the city where the hotel is located, the order information and the reply duration. The historical data may cover a period of 7, 30, 90 or 180 days.
Step S12: preprocessing the historical data, wherein the preprocessing includes removing hotel orders whose order reply duration is longer than a preset duration.
After the historical data of the plurality of hotel orders are acquired: according to the hotel agency business side, a hotel that has rooms available usually replies promptly to confirm whether the order can be booked. However, a number of orders have a reply duration of more than 20 hours, and the historical data of some hotel orders even show reply durations of several days. Therefore, in this embodiment, hotel orders whose reply duration exceeds the preset duration are filtered out of the hotel order information; the preset duration may be 3 hours, 6 hours or 10 hours and is not specifically limited here.
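The filtering in step S12 can be sketched as follows, assuming each historical record carries the ordering time point and the supplier's reply time point. The `HotelOrder` record and its field names are hypothetical, and the 6-hour default is just one of the preset durations the text mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class HotelOrder:
    # Hypothetical record layout; field names are illustrative only.
    hotel_name: str
    city: str
    order_time: datetime   # user's ordering time point
    reply_time: datetime   # supplier's reply time point

    @property
    def reply_hours(self) -> float:
        # Reply (confirmation) duration: from ordering time to supplier reply, in hours.
        return (self.reply_time - self.order_time).total_seconds() / 3600.0


def drop_long_replies(orders: List[HotelOrder], max_hours: float = 6.0) -> List[HotelOrder]:
    """Remove orders whose reply duration is longer than the preset duration."""
    return [o for o in orders if o.reply_hours <= max_hours]


t0 = datetime(2020, 5, 1, 10, 0)
history = [
    HotelOrder("Hotel A", "Shanghai", t0, t0 + timedelta(minutes=30)),
    HotelOrder("Hotel B", "Beijing", t0, t0 + timedelta(hours=20)),
]
cleaned = drop_long_replies(history, max_hours=6.0)  # keeps only Hotel A
```

Dropping the multi-hour and multi-day outliers keeps them from dominating an error-based training objective, which is the motivation the text gives for the preset cut-off.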
Step S13: extracting feature data from the historical data, wherein the feature data comprise user ordering time point information, order type information, hotel static attribute information, holiday information and channel information of hotel suppliers, and the user ordering time point information indicates whether the ordering time point falls within a working period or a non-working period.
In this embodiment, the ordering time point information indicates whether the order was placed within the working period or the non-working period of the hotel room agent; the working period may be 9:00 to 17:00, and the non-working period may refer to times outside 9:00 to 17:00. The order type information indicates whether the order is a same-day hotel order or a next-day (non-same-day) hotel order. The hotel static attribute information may include the star rating of the hotel and the city, business district and country in which the hotel is located. The holiday information may include National Day, New Year's Day, the Mid-Autumn Festival or the Dragon Boat Festival. The supplier channel information includes, for example, information on travel agency A, travel agency B and travel agency C.
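The feature extraction of step S13 can be sketched as building one row per order. Assumptions: the working period is treated as the half-open interval [9:00, 17:00); "same-day order" means the check-in date equals the ordering date; the holiday flag is keyed on the ordering date; and the function and key names are hypothetical. Only two of the listed hotel static attributes are shown.

```python
from datetime import date, datetime, time
from typing import Dict, Set, Union

WORK_START = time(9, 0)   # working period from the embodiment
WORK_END = time(17, 0)


def order_features(order_time: datetime, check_in: date, star: int, city: str,
                   holidays: Set[date], channel: str) -> Dict[str, Union[int, str]]:
    """Build one feature row for a hotel order (hypothetical field names)."""
    t = order_time.time()
    return {
        # User ordering time point: 1 = working period, 0 = non-working period.
        "in_working_period": int(WORK_START <= t < WORK_END),
        # Order type: same-day order if the check-in date equals the ordering date.
        "same_day_order": int(check_in == order_time.date()),
        # Hotel static attributes (star rating and city only, for brevity).
        "hotel_star": star,
        "hotel_city": city,
        # Holiday flag, keyed on the ordering date (assumption).
        "is_holiday": int(order_time.date() in holidays),
        # Supplier channel, e.g. "travel agency A".
        "supplier_channel": channel,
    }


row = order_features(datetime(2020, 10, 1, 20, 15), date(2020, 10, 1), 4, "Hangzhou",
                     {date(2020, 10, 1)}, "travel agency A")
```

String-valued columns such as `hotel_city` and `supplier_channel` would still need a categorical encoding (for example one-hot) before being passed to an xgboost model.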
Step S14: inputting the extracted feature data into an xgboost model for training, and generating an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum, wherein the xgboost model is a tree structure model of a plurality of serial trees constructed from the feature data.
In this embodiment of the application, the period from the moment a user places an order until it is determined that the order can be booked, or cannot be booked because of insufficient inventory, is obtained from the historical data of the hotel order; this period is the true value of the order reply duration.
After all the feature data are input into the xgboost model for training, the hyper-parameter values of the loss function are adjusted and, in combination with the feature data, it is judged whether splitting the current node in the tree structure of the adjusted xgboost model would reduce the loss function. If the loss function keeps decreasing, the current node continues to split; if it no longer decreases, the current node becomes a leaf node of the tree structure. The number of leaf nodes in the tree structure of the xgboost model is thus determined by the hyper-parameter values.
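The split criterion described above can be made concrete with the gain formula of standard XGBoost, which measures how much the regularized loss drops when a node is split into left and right children. The text does not spell out this formula, so using it here is an assumption based on the standard algorithm; γ corresponds to xgboost's `gamma` (alias `min_split_loss`) parameter and λ to `lambda` (alias `reg_lambda`). The helper names are hypothetical.

```python
from typing import Sequence


def structure_score(G: float, H: float, lam: float) -> float:
    # Contribution of one leaf to the (negated) objective: G^2 / (H + lambda).
    return G * G / (H + lam)


def split_gain(g: Sequence[float], h: Sequence[float], go_left: Sequence[bool],
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Loss reduction from splitting a node into left/right children."""
    GL = sum(gi for gi, left in zip(g, go_left) if left)
    HL = sum(hi for hi, left in zip(h, go_left) if left)
    G, H = sum(g), sum(h)
    GR, HR = G - GL, H - HL
    return 0.5 * (structure_score(GL, HL, lam) + structure_score(GR, HR, lam)
                  - structure_score(G, H, lam)) - gamma


# Squared-error example: previous prediction 0, targets [1, 1, 5, 5],
# so g = prediction - target and h = 1 for every sample.
g = [-1.0, -1.0, -5.0, -5.0]
h = [1.0, 1.0, 1.0, 1.0]
mask = [True, True, False, False]
keep_splitting = split_gain(g, h, mask, lam=1.0, gamma=0.0) > 0   # True: the loss decreases
becomes_leaf = split_gain(g, h, mask, lam=1.0, gamma=3.0) <= 0    # True: gamma outweighs the gain
```

A positive gain means splitting lowers the loss, so the node keeps splitting; once no candidate split yields positive gain, the node becomes a leaf. Raising γ therefore directly reduces the number of leaves, matching the statement that the hyper-parameters determine the leaf count.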
Further, through multiple loop iterations over the historical data of each hotel order, the order reply duration prediction model is generated when the error between the predicted value output by the xgboost model and the true value contained in the historical data reaches a preset threshold, the preset threshold being the minimum absolute value of the error between the true value and the predicted value output by the xgboost model.
In this embodiment, referring to fig. 2, the step S14 includes the following steps:
Step S141: acquiring the xgboost model and the initial hyper-parameter values of the loss function.
Step S142: initializing the xgboost model based on the initial hyper-parameter values of the loss function.
Step S143: adjusting, according to the feature data and the loss function, the initial hyper-parameter values of the initialized xgboost model so as to reduce the corresponding loss function.
Step S144: when the loss function corresponding to the adjusted xgboost model no longer decreases, determining the hyper-parameter values of the adjusted xgboost model as target parameter values, and determining the loss function corresponding to the target parameter values as the target loss function.
Each adjustment of the initial hyper-parameter values that reduces the loss function means that the xgboost model continues to split. Therefore, once the hyper-parameter values have been adjusted so that the loss function reaches its minimum, that is, once it no longer decreases, the predicted value of the xgboost model under that loss function is closest to the true value.
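Steps S143 and S144 amount to adjusting hyper-parameters until the loss stops decreasing. The text does not name a search strategy, so the sketch below assumes a simple one-parameter sweep that stops at the first candidate that fails to lower the loss. `tune_until_plateau` and `toy_loss` are hypothetical; in practice `evaluate` would train an xgboost model with the given parameters and return its validation loss.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def tune_until_plateau(name: str, candidates: List[float], base: Params,
                       evaluate: Callable[[Params], float]) -> Tuple[Params, float]:
    """Sweep one hyper-parameter; stop at the first candidate that does not lower the loss."""
    best, best_loss = dict(base), evaluate(base)
    for value in candidates:
        trial = dict(best)
        trial[name] = value
        loss = evaluate(trial)
        if loss < best_loss:
            best, best_loss = trial, loss   # loss still decreasing: keep adjusting
        else:
            break                           # loss no longer decreases: current best is the target
    return best, best_loss


def toy_loss(p: Params) -> float:
    # Hypothetical stand-in for "train xgboost with p, return validation loss".
    return (p["max_depth"] - 5) ** 2


target_params, target_loss = tune_until_plateau("max_depth", [3, 4, 5, 6, 7], {"max_depth": 2}, toy_loss)
```

Stopping at the first non-improving value mirrors the "no longer decreases" criterion in step S144, but it can halt at a local minimum; a grid or randomized search over several hyper-parameters is a common, more robust alternative.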
Step S145: adjusting the leaf node weights of the target loss function so that the predicted value output by the xgboost model is closer to the true value.
Specifically, the leaf node weights of the target loss function are adjusted so that the absolute difference between the predicted value output by the xgboost model and the true value stays within a preset threshold range, for example 10% or 5%. Adjusting the leaf node weights in this way ensures the prediction accuracy of the xgboost model.
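The text does not say what the 10% or 5% threshold is relative to. Assuming it is relative to the true reply duration, the check could be sketched as the share of orders whose prediction falls within that band. The function name `share_within_tolerance` is hypothetical.

```python
from typing import Sequence


def share_within_tolerance(y_true: Sequence[float], y_pred: Sequence[float],
                           rel_tol: float = 0.10) -> float:
    """Fraction of orders whose |prediction - truth| <= rel_tol * truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= rel_tol * abs(t))
    return hits / len(y_true)


# Reply durations in minutes: errors of 0.5, 3 and 0 against 10% bands of 1, 2 and 3.
score = share_within_tolerance([10.0, 20.0, 30.0], [10.5, 23.0, 30.0], rel_tol=0.10)
```

Here two of the three predictions fall inside their 10% band, so the score is 2/3; tightening `rel_tol` to 0.05 would correspond to the stricter threshold mentioned in the text.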
Wherein the loss function of the xgboost model is obtained by the following formula:
wherein L represents the loss function; T represents the number of leaf nodes in the current tree structure model; f_t(x_i) represents the function of the current tree structure model; γ represents a hyper-parameter controlling the number of leaves; λ represents an L2 hyper-parameter controlling the leaf weights; w represents the leaf weights; y_i represents the true value for the i-th sample; ŷ_i^(t-1) represents the predicted value of the first t-1 tree structure models; ∂ represents the partial derivative; l represents the residual function; and n represents the total amount of historical hotel-order data.
In this embodiment, the weight of leaf node j is w_j = -(Σ_{k∈I_j} g_k) / (Σ_{k∈I_j} h_k + λ), where k represents the sample data falling into the leaf node during training, I_j is the set of those samples, g_k and h_k are the first- and second-order partial derivatives of the loss with respect to the previous prediction, and j represents the index of the leaf in the tree. Training of the xgboost model is completed through multiple such loop iterations, and the model can be further optimized by tuning its many hyper-parameters throughout the process.
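The leaf weight expression above can be evaluated directly once a loss is fixed. The sketch assumes squared-error loss l = 0.5 * (y - ŷ)^2, for which g_k = ŷ_k - y_k and h_k = 1; the patent does not state which residual function it uses, and `optimal_leaf_weight` is a hypothetical helper name.

```python
from typing import Sequence


def optimal_leaf_weight(y_true: Sequence[float], y_prev: Sequence[float], lam: float = 1.0) -> float:
    """w_j = -sum(g_k) / (sum(h_k) + lambda) over the samples k that fall into leaf j.

    Assumes squared-error loss l = 0.5 * (y - y_hat)^2, so g_k = y_hat_k - y_k and h_k = 1.
    """
    g = [p - y for y, p in zip(y_true, y_prev)]  # first-order partial derivatives
    h = [1.0] * len(g)                           # second-order partial derivatives
    return -sum(g) / (sum(h) + lam)


# Two samples in the leaf, previous prediction 0, targets 2 and 4.
w_no_reg = optimal_leaf_weight([2.0, 4.0], [0.0, 0.0], lam=0.0)   # 3.0, the mean residual
w_reg = optimal_leaf_weight([2.0, 4.0], [0.0, 0.0], lam=1.0)      # 2.0, shrunk toward 0
```

With λ = 0 the leaf simply predicts the mean residual of its samples; a positive λ shrinks the weight toward zero, which is how the L2 hyper-parameter restrains each tree's correction.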
After the model has been trained with the training data, it is tested with the test data. Treating the task as a regression problem, the model's predicted order reply durations for the hotel-order historical data are compared with the true values, and the mean absolute error between the predicted and true values is used as the criterion for judging how well the model has been trained.
This embodiment of the invention provides a model training method in which feature data are extracted from historical data and used to adjust the hyper-parameters of an xgboost model composed of a plurality of serial trees; when the loss function corresponding to the adjusted xgboost model no longer decreases, the adjusted hyper-parameter values are determined as target parameter values, and the leaf node weights of the target loss function are then adjusted, improving the accuracy with which the xgboost model predicts the reply duration of hotel orders.
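The evaluation criterion described above, the mean absolute error between predicted and true reply durations on the test data, can be written in a few lines. The function below is a plain-Python sketch; scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity, and xgboost also accepts `"mae"` as an evaluation metric.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |true reply duration - predicted reply duration| over the test orders."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])  # (0.5 + 0 + 1) / 3 = 0.5
```

MAE is expressed in the same unit as the reply duration (for example hours), so it reads directly as "on average the prediction is off by this much", which suits a business-facing accuracy criterion.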
Example 2
The present embodiment provides a system for model training, referring to fig. 3, including: an acquisition module 110, a pre-processing module 120, an extraction module 130, and a training module 140.
The obtaining module 110 is configured to obtain historical data of a plurality of hotel orders.
The obtaining module 110 receives historical data of a plurality of hotel orders; for each order, the historical data include the ordering time point, the hotel name, the city where the hotel is located, the order information and the reply duration. The historical data may cover a period of 7, 30, 90 or 180 days.
The preprocessing module 120 is configured to perform preprocessing operation on historical data after obtaining the historical data corresponding to the multiple hotel orders, where the preprocessing operation includes removing hotel orders whose order reply duration is greater than a preset duration.
After the preprocessing module 120 acquires the historical data of the plurality of hotel orders: according to the hotel agency business side, a hotel that has rooms available usually replies promptly to confirm whether the order can be booked. However, a number of orders have a reply duration of more than 20 hours, and the historical data of some hotel orders even show reply durations of several days. Therefore, in this embodiment, hotel orders whose reply duration exceeds the preset duration are filtered out of the hotel order information; the preset duration may be 3 hours, 6 hours or 10 hours and is not specifically limited here.
And an extracting module 130, configured to extract feature data from the historical data, where the feature data includes user ordering time point information, order type information, hotel static attribute information, holiday information, and channel information of a hotel provider, and the user ordering time point information is used to represent that an ordering time point is in a working period or a non-working period.
In this embodiment, the ordering time point information in the feature data extracted by the extraction module 130 indicates that the information is in the working period or the non-working period of the hotel room agent, and the working period may refer to 9: 00 to 17 pm: 00, the period of inactivity may refer to 9: 00 to 17 pm: times other than 00. The order type information indicates whether the order is a hotel order on the current day or a hotel order on an alternate day; the hotel static attribute information can comprise the star level of the hotel, the city where the hotel is located, the business district where the hotel is located, and the country where the hotel is located; holiday information may include national celebration, date, mid-autumn or early afternoon; the provider channel information includes information for travel agency a, travel agency B, and travel agency C.
The training module 140 is configured to input the acquired feature data into an xgboost model for training, and to generate an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimized. The xgboost model is an ensemble of multiple trees built in series from the feature data.
In the embodiment of the application, the true value of the order reply duration is obtained from the historical hotel order data: it is the time from the user placing the order until the order is confirmed as bookable, or as unbookable because of insufficient inventory.
After all the feature data are input into the xgboost model for training, the hyper-parameter values of the loss function are adjusted and, using the feature data, it is judged whether splitting the current node in the tree structure of the adjusted xgboost model reduces the loss function. If the split reduces the loss function, splitting continues; if it does not, the current node becomes a leaf node of the tree. The number of leaf nodes in the tree structure of the xgboost model is thus determined by the size of the hyper-parameter.
Further, through multiple loop iterations over the historical data of each hotel order, the order reply duration prediction model is generated when the error between the predicted value output by the xgboost model and the true value contained in the historical data reaches a preset threshold, namely the minimum absolute error between the true value and the predicted value output by the xgboost model.
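The node-splitting rule described above corresponds, in the standard XGBoost formulation, to a split gain computed from sums of first- and second-order derivatives. The sketch below assumes squared-error loss, for which `g_i = y_hat_i - y_i` and `h_i = 1`; a split is kept only when the gain is positive, so a larger γ yields fewer leaves. It is a simplified illustration, not the xgboost library's internal implementation.

```python
from typing import List, Sequence, Tuple


def grad_hess_squared_error(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """First/second derivatives of l = 1/2 * (y - y_hat)^2 with respect to y_hat."""
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0] * len(g)
    return g, h


def split_gain(
    g_left: Sequence[float], h_left: Sequence[float],
    g_right: Sequence[float], h_right: Sequence[float],
    lam: float, gamma: float,
) -> float:
    """Loss reduction from splitting one node into left/right children."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def should_split(gain: float) -> bool:
    """Keep splitting only while the split still lowers the loss."""
    return gain > 0.0
```

For example, with true reply durations of [1, 1, 10, 10] minutes, a constant current prediction of 5.5, λ = 1 and γ = 0, separating the first two orders from the last two gives a gain of 27; raising γ to 30 makes the gain negative, so the node stays a leaf.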
In this embodiment, the training module 140 further includes:
An obtaining unit 141, configured to obtain the xgboost model and the initial hyper-parameter values of the loss function.
An initialization unit 142, configured to initialize the xgboost model based on the initial hyper-parameter values of the loss function.
A first adjusting unit 143, configured to adjust the initial hyper-parameter values of the initialized xgboost model according to the feature data and the loss function, so that the corresponding loss function decreases.
A determining unit 144, configured to determine, when the loss function corresponding to the adjusted xgboost model no longer decreases, the hyper-parameter values of the adjusted xgboost model as the target parameter values, and to determine the loss function corresponding to the target parameter values as the target loss function.
Each adjustment of the initial hyper-parameter values that lowers the loss function means that the xgboost model continues to split. Therefore, once the hyper-parameter values have been adjusted until the loss function reaches its minimum, that is, until it no longer decreases, the predicted values of the xgboost model under that loss function are closer to the true values.
A second adjusting unit 145, configured to adjust the leaf node weights of the target loss function so that the predicted value output by the xgboost model is closer to the true value.
Specifically, the leaf node weights of the target loss function are adjusted so that the absolute difference between the predicted value output by the xgboost model and the true value stays within a preset threshold range, for example 10% or 5%. Adjusting the leaf node weights in this way safeguards the prediction accuracy of the xgboost model.
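The stop-when-the-loss-no-longer-decreases procedure of units 143 and 144, and the threshold check of unit 145, can be sketched as follows. `loss_fn` is a hypothetical callable standing in for "train the model with this hyper-parameter value and return its loss"; trying candidate values in a fixed order and reading the 10%/5% threshold as relative to the true value are both assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def tune_until_no_improvement(
    candidates: Sequence[float],
    loss_fn: Callable[[float], float],
    tol: float = 0.0,
) -> Tuple[float, float]:
    """Try hyper-parameter values in order; stop once the loss stops decreasing.

    Returns the last value that still lowered the loss (the target value)
    together with its loss (the target loss).
    """
    best_value = candidates[0]
    best_loss = loss_fn(best_value)
    for value in candidates[1:]:
        loss = loss_fn(value)
        if loss < best_loss - tol:
            best_value, best_loss = value, loss
        else:
            break  # loss no longer decreases: keep the previous value
    return best_value, best_loss


def share_within_threshold(
    y_true: Sequence[float], y_pred: Sequence[float], threshold: float = 0.10
) -> float:
    """Fraction of predictions with |pred - true| <= threshold * true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - t) <= threshold * t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Stopping at the first non-improving candidate is a greedy choice; it finds the minimum only when the loss is roughly unimodal in the hyper-parameter being tuned.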
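The loss function of the xgboost model is obtained by the following formula:
$$L = \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^{2},\quad g_i = \frac{\partial\, l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}},\quad h_i = \frac{\partial^{2} l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}$$
where $L$ represents the loss function, $T$ represents the number of leaf nodes in the current tree model, $f_t(x_i)$ represents the function computed by the current tree model, $\gamma$ represents the hyper-parameter controlling the number of leaves, $\lambda$ represents the L2 hyper-parameter controlling the leaf weights, $w$ represents the leaf weights, $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the prediction of the first $t-1$ tree models, $\partial$ represents the partial derivative (so $g_i$ and $h_i$ are the first- and second-order partial derivatives of $l$), $l$ represents the residual (per-sample loss) function, and $n$ represents the total number of historical hotel orders.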
In this embodiment, the weight of leaf node $j$ is
$$w_j^{*} = -\frac{\sum_{k \in I_j} g_k}{\sum_{k \in I_j} h_k + \lambda},$$
where $k$ indexes the sample data falling into the leaf during the training stage, $I_j$ is the set of those samples, and $j$ is the index of the leaf in the tree. Training of the xgboost model is completed through many such loop iterations, and the model can be optimized throughout the process by tuning its many hyper-parameters.
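For squared-error loss (`g_k = y_hat_k - y_k`, `h_k = 1`), the leaf weight formula reduces to a shrunken mean of the residuals in the leaf. The sketch below, a simplified illustration assuming that loss, computes the leaf weight and the minimized objective of a fixed tree structure, `-1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T`, with the constant term `sum_i l(y_i, y_hat_i)` omitted.

```python
from typing import List, Sequence


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Optimal weight w_j* = -sum(g_k) / (sum(h_k) + lambda) over samples k in leaf j."""
    return -sum(g) / (sum(h) + lam)


def tree_objective(
    leaves_g: List[Sequence[float]],
    leaves_h: List[Sequence[float]],
    lam: float,
    gamma: float,
) -> float:
    """Minimized approximate loss for a fixed tree structure (constant term omitted)."""
    total = 0.0
    for g, h in zip(leaves_g, leaves_h):
        G, H = sum(g), sum(h)
        total += -0.5 * G * G / (H + lam)
    # gamma * T penalizes the number of leaves T.
    return total + gamma * len(leaves_g)
```

With residuals of 3 and 5 minutes in one leaf (`g = [-3, -5]`, `h = [1, 1]`), λ = 0 gives a weight of 4, the mean residual, while λ = 2 shrinks it to 2.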
After the model is trained on the training data, it is tested on the test data. Treating the task as a regression problem, the model's predicted order reply durations for the historical hotel orders are compared with the true values, and the mean absolute error between them is used as the criterion for judging how well the model has been trained.
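The evaluation criterion above, the mean absolute error (MAE) on held-out test orders, can be written as a short function. The sample values in the note below are made up for illustration.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum |y_i - y_hat_i|, in the same unit as the durations."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For true reply durations of 3, 5 and 10 minutes and predictions of 4, 5 and 7 minutes, the MAE is (1 + 0 + 3) / 3, about 1.33 minutes; a lower MAE indicates a better-trained model.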
In the embodiment of the invention, the extraction module acquires the feature data from the historical data, and the training module adjusts the hyper-parameters of the xgboost model, which consists of multiple trees built in series. When the loss function corresponding to the adjusted xgboost model no longer decreases, the adjusted hyper-parameter values are determined as the target parameter values and the leaf node weights of the target loss function are adjusted, which improves the accuracy with which the xgboost model predicts the reply duration of hotel orders.
Example 3
This embodiment provides a method for predicting an order reply duration. Referring to Fig. 4, the method includes:
Step S21, receiving the target hotel order to be predicted in real time.
In the OTA (Online Travel Agency) industry, different rooms of a hotel are sold through the agents of Internet company A, Internet company B, Internet company C, travel agency A, travel agency B and travel agency C. If a customer places a hotel order on the client of Internet company A, but the booked hotel is sold through the agent of travel agency A, the inventory must be confirmed with the supplier, travel agency A, before it can be confirmed whether the order is actually available to the customer. The user has to wait during this process, and this waiting time is the confirmation duration.
At present, the agents' order data are stored in a database on a cloud server, and a development tool is used to synchronize the agents' hotel order data from the cloud server database into the server database of Internet company A.
The server of the Internet company receives, synchronously and in real time every day, the hotel order information to be predicted, which includes the hotel name, hotel address, hotel star level, supplier channel information and the like.
Step S22, inputting the target hotel order into the order reply duration prediction model trained by the method in embodiment 1, and obtaining a reply duration value.
The data of the target hotel order are input into the order reply duration prediction model generated by training the xgboost model; the model extracts feature data from the target hotel order data and uses them for the prediction.
Step S23, acquiring the order type of the target hotel order and the user's order placing time point.
The order type, either a same-day check-in order or a next-day check-in order, is obtained from the target hotel order; the user's order placing time point falls in either the working period or the non-working period.
Step S24, determining the order-urging time according to the order type, the user's order placing time point and the reply duration value.
In this embodiment, the xgboost model is used to predict the reply duration of each future hotel order handled by the agents of different travel agencies. The business side of the Internet company then sets a suitable order-urging time based on the reply duration value combined with other relevant information. For example, an order that would have been answered within 4 minutes used to be urged as early as 3 minutes; with the predicted reply duration, the urge can be deferred, which saves unnecessary manual urging and reduces the cost of urging orders.
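How the urging time is derived from the order type, the order placing time point and the predicted reply duration is left to the business side. The sketch below assumes one hypothetical rule: wait for the predicted reply duration plus a safety margin, halve the margin for same-day orders (which are more urgent) and double it outside the agent's working period. The function name, the margin and the multipliers are illustrative assumptions only.

```python
def urging_delay_minutes(
    predicted_reply_min: float,
    same_day_order: bool,
    in_working_period: bool,
    base_margin_min: float = 1.0,
) -> float:
    """Minutes after order placement at which the first urging message is sent.

    Hypothetical rule, not taken from the patent: never urge before the
    predicted reply time, and add a margin that is smaller for urgent
    same-day orders and larger outside the agent's working period.
    """
    if predicted_reply_min < 0:
        raise ValueError("predicted reply duration must be non-negative")
    margin = base_margin_min
    if same_day_order:
        margin *= 0.5  # same-day check-in: urge sooner
    if not in_working_period:
        margin *= 2.0  # off-hours replies assumed to be slower
    return predicted_reply_min + margin
```

Under this rule, the 4-minute order in the example above, placed as a same-day order during working hours, would be urged at 4.5 minutes instead of 3, so no urging is needed if the agent replies on time.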
Step S25, sending the order-urging information corresponding to the target hotel order based on the order-urging time.
In this embodiment, order-urging information is sent to the travel agency handling the hotel order based on the order-urging time. When the business side of the Internet company urges the travel agency of the hotel order, it may send a single urging message each time, or, after the first urging message, send further messages every 1 minute or every 2 minutes, until the travel agency's agent receives the message and returns a confirmation that the order is bookable or not bookable.
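The resend behaviour described above (a first urging message, then repeats every 1 or 2 minutes until the agent confirms) can be sketched as a schedule generator. `confirmed_at_min` and the `max_messages` cap are hypothetical parameters added for illustration.

```python
from typing import List, Optional


def urging_schedule(
    first_urge_min: float,
    interval_min: float,
    confirmed_at_min: Optional[float],
    max_messages: int = 10,
) -> List[float]:
    """Times (minutes after order placement) at which urging messages are sent.

    Messages repeat every `interval_min` minutes and stop once the agent has
    confirmed the order as bookable or not bookable, or after `max_messages`.
    """
    if interval_min <= 0:
        raise ValueError("interval must be positive")
    times: List[float] = []
    t = first_urge_min
    while len(times) < max_messages:
        if confirmed_at_min is not None and t >= confirmed_at_min:
            break  # agent already replied; stop urging
        times.append(t)
        t += interval_min
    return times
```

With a first urge at 4 minutes, a 1-minute interval and a confirmation at 6.5 minutes, messages go out at 4, 5 and 6 minutes.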
This embodiment of the invention provides a method for predicting the order reply duration that uses the trained prediction model, combined with the order placing time point and the order type, to decide whether an urging request needs to be initiated for an order. By predicting the reply duration of each hotel order as accurately as possible, the model gives reliable support to the business side's decisions and thereby reduces the order-urging rate.
Example 4
Referring to fig. 5, the system for predicting an order reply duration according to the present embodiment includes: a receiving module 210, an inputting module 220, an obtaining module 230, a determining module 240, and a sending module 250.
The receiving module 210 is configured to receive, in real time, a target hotel order to be predicted.
In the OTA (Online Travel Agency) industry, different rooms of a hotel are sold through the agents of Internet company A, Internet company B, Internet company C, travel agency A, travel agency B and travel agency C. If a customer places a hotel order on the client of Internet company A, but the booked hotel is sold through the agent of travel agency A, the inventory must be confirmed with the supplier, travel agency A, before it can be confirmed whether the order is actually available to the customer. The user has to wait during this process, and this waiting time is the confirmation duration.
At present, the agents' order data are stored in a database on a cloud server, and a development tool is used to synchronize the agents' hotel order data from the cloud server database into the server database of Internet company A.
The receiving module 210 receives, synchronously and in real time every day, the hotel order information to be predicted, which includes the hotel name, hotel address, hotel star level, supplier channel information and the like.
An input module 220, configured to input the target hotel order into the order reply duration prediction model trained by using the system in embodiment 2, to obtain a reply duration value.
The input module 220 inputs the data of the target hotel order into the order reply duration prediction model generated by training the xgboost model; the model extracts feature data from the target hotel order data and uses them for the prediction.
The obtaining module 230 is configured to obtain an order type of the target hotel order and an order placing time point of the user.
The obtaining module 230 obtains the order type from the target hotel order, namely whether it is a same-day check-in order or a next-day check-in order; the user's order placing time point falls in either the working period or the non-working period.
The determining module 240 is configured to determine the order-urging time according to the order type, the user's order placing time point and the reply duration value.
In this embodiment, the xgboost model is used to predict the reply duration of each future hotel order handled by the agents of different travel agencies. The business side of the Internet company then sets a suitable order-urging time based on the reply duration value combined with other relevant information. For example, an order that would have been answered within 4 minutes used to be urged as early as 3 minutes; with the predicted reply duration, the urge can be deferred, which saves unnecessary manual urging and reduces the cost of urging orders.
The sending module 250 is configured to send the order-urging information corresponding to the target hotel order based on the order-urging time.
In this embodiment, order-urging information is sent to the travel agency handling the hotel order based on the order-urging time. When the business side of the Internet company urges the travel agency of the hotel order, it may send a single urging message each time, or, after the first urging message, send further messages every 1 minute or every 2 minutes, until the travel agency's agent receives the message and returns a confirmation that the order is bookable or not bookable.
This embodiment of the invention provides a system for predicting the order reply duration that uses the trained prediction model, combined with the order placing time point and the order type, to decide whether an urging request needs to be initiated for an order. By predicting the reply duration of each hotel order as accurately as possible, the model gives reliable support to the business side's decisions and thereby reduces the order-urging rate.
Example 5
Fig. 6 is a schematic structural diagram of an electronic device provided in this embodiment. The electronic device includes a memory, a processor and a computer program stored in the memory and executable on the processor; the processor executes the program to implement the method for model training of embodiment 1 or the method for predicting the order reply duration of embodiment 3. The electronic device 30 shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
The electronic device 30 may be embodied in the form of a general-purpose computing device, for example a server device. The components of the electronic device 30 may include, but are not limited to: at least one processor 31, at least one memory 32, and a bus 33 connecting the various system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus and a control bus.
The memory 32 may include volatile memory, such as random access memory (RAM) 321 and/or cache memory 322, and may further include read-only memory (ROM) 323.
The processor 31 executes various functional applications and data processing, such as the method of model training of embodiment 1 of the present invention or the method of order reply duration prediction of embodiment 3, by running the computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard, a pointing device, etc.). Such communication may take place through input/output (I/O) interfaces 35. The electronic device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) via a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the method of model training of embodiment 1 or the steps of the method of order reply duration prediction of embodiment 3.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the invention may also be implemented in the form of a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to perform the steps of the model training method of example 1 or the steps of the order reply duration prediction method of example 3.
Program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.
Claims (14)
1. A method of model training, the method comprising:
acquiring historical data of a plurality of hotel orders;
extracting feature data from the historical data, wherein the feature data comprise user order placing time point information, order type information, hotel static attribute information, holiday information and hotel supplier channel information, and the user order placing time point information indicates whether the order placing time point is in a working period or a non-working period;
inputting the obtained characteristic data into an xgboost model for training, and generating an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum; the xgboost model is a tree structure model of a plurality of serial trees constructed according to the feature data.
2. The method for model training according to claim 1, wherein the step of inputting the acquired feature data into an xgboost model for training comprises:
acquiring the xgboost model and initial hyper-parameter values of the loss function;
initializing the xgboost model based on the initial hyper-parameter values of the loss function;
adjusting, according to the feature data and the loss function, the initial hyper-parameter values of the initialized xgboost model so as to reduce the corresponding loss function;
when the loss function corresponding to the adjusted xgboost model no longer decreases, determining the adjusted hyper-parameter values of the xgboost model as target parameter values, wherein the loss function corresponding to the target parameter values is a target loss function;
adjusting leaf node weights of the objective loss function so that a predicted value output by the xgboost model is closer to the true value.
3. The method of model training according to claim 2, wherein the loss function of the xgboost model is obtained by the following equation:
$$L = \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^{2},\quad g_i = \frac{\partial\, l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}},\quad h_i = \frac{\partial^{2} l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}$$
wherein $L$ represents the loss function, $T$ represents the number of leaf nodes in the current tree model, $f_t(x_i)$ represents the function computed by the current tree model, $\gamma$ represents a hyper-parameter controlling the number of leaves, $\lambda$ represents an L2 hyper-parameter controlling the leaf weights, $w$ represents the leaf weights, $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the predicted values of the first $t-1$ tree models, $\partial$ represents the partial derivative, $l$ represents the residual (per-sample loss) function, and $n$ represents the total number of historical hotel orders.
4. The method of model training of claim 1, the method further comprising:
after the historical data corresponding to the plurality of hotel orders are acquired, performing a preprocessing operation on the historical data, wherein the preprocessing operation comprises removing hotel orders whose order reply duration exceeds a preset duration.
5. A system for model training, the system comprising:
an acquisition module, configured to acquire historical data of a plurality of hotel orders;
an extraction module, configured to extract feature data from the historical data, wherein the feature data comprise user order placing time point information, order type information, hotel static attribute information, holiday information and hotel supplier channel information, and the user order placing time point information indicates whether the order placing time point is in a working period or a non-working period;
a training module, configured to input the acquired feature data into an xgboost model for training and to generate an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimized; the xgboost model is a tree structure model of a plurality of serial trees constructed according to the feature data.
6. The model training system of claim 5, wherein the training module comprises:
an acquisition unit, configured to acquire the xgboost model and the initial hyper-parameter values of the loss function;
an initialization unit, configured to initialize the xgboost model based on the initial hyper-parameter values of the loss function;
a first adjusting unit, configured to adjust the initial hyper-parameter values of the initialized xgboost model according to the feature data and the loss function, so as to reduce the corresponding loss function;
a determining unit, configured to determine, when the loss function corresponding to the adjusted xgboost model no longer decreases, the hyper-parameter values of the adjusted xgboost model as target parameter values, where the loss function corresponding to the target parameter values is a target loss function;
and a second adjusting unit, configured to adjust the leaf node weights of the target loss function so that the predicted value output by the xgboost model is closer to the true value.
7. The model training system of claim 6, wherein the loss function of the xgboost model is obtained by the following equation:
$$L = \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^{2},\quad g_i = \frac{\partial\, l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}},\quad h_i = \frac{\partial^{2} l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}$$
wherein $L$ represents the loss function, $T$ represents the number of leaf nodes in the current tree model, $f_t(x_i)$ represents the function computed by the current tree model, $\gamma$ represents a hyper-parameter controlling the number of leaves, $\lambda$ represents an L2 hyper-parameter controlling the leaf weights, $w$ represents the leaf weights, $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the predicted values of the first $t-1$ tree models, $\partial$ represents the partial derivative, $l$ represents the residual (per-sample loss) function, and $n$ represents the total number of historical hotel orders.
8. The system of model training of claim 5, the system further comprising:
a preprocessing module, configured to preprocess the historical data after the historical data corresponding to the plurality of hotel orders are acquired, wherein the preprocessing operation comprises removing hotel orders whose order reply duration exceeds a preset duration.
9. A method for predicting order reply duration, characterized in that the method comprises:
receiving a target hotel order to be predicted in real time;
inputting the target hotel order into the order reply duration prediction model trained by the method according to any one of claims 1 to 4 to obtain a reply duration value.
10. The method for predicting order reply duration of claim 9, wherein the method further comprises:
acquiring the order type of the target hotel order and the order placing time point of the user;
determining an order-urging time according to the order type, the order placing time point of the user and the reply duration value;
and sending order-urging information corresponding to the target hotel order based on the order-urging time.
11. A system for predicting an order reply duration, the system comprising:
the receiving module is used for receiving a target hotel order to be predicted in real time;
an input module, configured to input the target hotel order into the order reply duration prediction model trained by the system according to any one of claims 5 to 8, so as to obtain a reply duration value.
12. The system for order reply duration prediction according to claim 11, wherein the system further comprises:
the acquisition module, configured to acquire the order type of the target hotel order and the order placing time point of the user;
the determining module, configured to determine an order-urging time according to the order type, the order placing time point of the user and the reply duration value;
and the sending module, configured to send order-urging information corresponding to the target hotel order based on the order-urging time.
13. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing a method of model training according to any one of claims 1-4, or performing a method of order reply duration prediction according to claim 9 or 10.
14. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, carrying out the steps of the method of model training according to any one of claims 1 to 4 or the steps of the method of order reply duration prediction according to claim 9 or 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473630.6A CN111639807A (en) | 2020-05-29 | 2020-05-29 | Model training method, duration prediction method, system, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111639807A true CN111639807A (en) | 2020-09-08 |
Family
ID=72329353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010473630.6A Pending CN111639807A (en) | Model training method, duration prediction method, system, device and medium | 2020-05-29 | 2020-05-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111639807A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112257884A (en) * | 2020-09-25 | 2021-01-22 | 南京意博软件科技有限公司 | Order management method and system |
CN112801763A (en) * | 2021-04-14 | 2021-05-14 | 浙江口碑网络技术有限公司 | Touch and reach scheme generation method and device and electronic equipment |
CN113266952A (en) * | 2021-05-24 | 2021-08-17 | 佛山市顺德区美的洗涤电器制造有限公司 | Temperature control method and system for wall-mounted boiler and server |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002268971A (en) * | 2001-03-08 | 2002-09-20 | Ntt Data Corp | Web service system |
US20140207499A1 (en) * | 2013-01-24 | 2014-07-24 | Room 77, Inc. | Check-in to a hotel room online |
JP2019082872A (en) * | 2017-10-31 | 2019-05-30 | 株式会社日立製作所 | Action proposal system |
CN110110936A (en) * | 2019-05-13 | 2019-08-09 | 拉扎斯网络科技(上海)有限公司 | Order duration estimation method, estimation device, storage medium and electronic equipment |
CN110490357A (en) * | 2019-07-02 | 2019-11-22 | 北京星选科技有限公司 | Confirmation method, device, server, the electronic equipment of waiting time |
CN112257884A (en) * | 2020-09-25 | 2021-01-22 | 南京意博软件科技有限公司 | Order management method and system |
-
2020
- 2020-05-29 CN CN202010473630.6A patent/CN111639807A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112257884A (en) * | 2020-09-25 | 2021-01-22 | 南京意博软件科技有限公司 | Order management method and system |
CN112801763A (en) * | 2021-04-14 | 2021-05-14 | 浙江口碑网络技术有限公司 | Touch and reach scheme generation method and device and electronic equipment |
CN112801763B (en) * | 2021-04-14 | 2021-08-24 | 浙江口碑网络技术有限公司 | Touch and reach scheme generation method and device and electronic equipment |
CN113266952A (en) * | 2021-05-24 | 2021-08-17 | 佛山市顺德区美的洗涤电器制造有限公司 | Temperature control method and system for wall-mounted boiler and server |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200908 |