CN111639807A - Model training method, duration prediction method, system, device and medium - Google Patents


Info

Publication number
CN111639807A
Authority
CN
China
Prior art keywords
order
model
hotel
loss function
training
Legal status
Pending
Application number
CN202010473630.6A
Other languages
Chinese (zh)
Inventor
黎建辉
邹亚鹏
胡泓
Current Assignee
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202010473630.6A
Publication of CN111639807A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model training method, a duration prediction method, and a corresponding system, device and medium. The model training method comprises: acquiring historical data of a plurality of hotel orders; extracting feature data from the historical data; and inputting the extracted feature data into an xgboost model for training, an order reply duration prediction model being generated when the absolute error between the true order reply duration in the historical data and the value predicted by the xgboost model is minimized. The model training method and the order reply duration prediction method overcome the low accuracy that results when business staff judge the order-urging time of a hotel order from business experience; they improve the accuracy of hotel order reply duration prediction and ultimately reduce the order-urging rate.

Description

Model training method, duration prediction method, system, device and medium
Technical Field
The present invention relates to model training technology, and in particular to a model training method and to a duration prediction method, system, device and medium.
Background
Large internet companies refer to some of their suppliers as proxy connections; the most common example is hotel service providers in Chinese metropolitan areas. Each hotel may correspond to a plurality of sub-hotels, and these sub-hotels may each be sold by several different travel agencies, which are therefore also called agent traffic. When a user places an order through an agent on an internet client, the internet company can confirm whether the hotel product in the order can actually be booked only after the supplier has confirmed the availability of the hotel rooms in the order. During this process the user has to wait; this waiting period is called the confirmation duration, which starts when the user places the order and ends when the supplier replies. If the confirmation duration is too long, the user experience suffers. To address this, the internet company urges orders manually, predicting the waiting time required for each hotel order from the order type and the time at which the order was placed.
In the prior art, the waiting time is set entirely by business staff based on their experience. This leads to a large amount of unnecessary order urging, which increases labor costs, and because the urging time is judged purely from experience, prediction accuracy is low.
Disclosure of Invention
The object of the invention is to overcome the low prediction accuracy of the prior art, in which business staff judge the order-urging time of a hotel order from business experience, by providing a model training method, a duration prediction method, and a corresponding system, device and medium.
The invention solves the technical problems through the following technical scheme:
In a first aspect, the present invention provides a model training method, the method comprising:
acquiring historical data of a plurality of hotel orders;
extracting feature data from the historical data, the feature data comprising user order placement time point information, order type information, hotel static attribute information, holiday information and hotel supplier channel information, the user order placement time point information indicating whether the order was placed during a working period or a non-working period;
inputting the extracted feature data into an xgboost model for training, and generating an order reply duration prediction model when the absolute error between the true order reply duration in the historical data and the value predicted by the xgboost model is minimized; the xgboost model is a tree-structured model composed of a plurality of trees built in series from the feature data.
Preferably, the step of inputting the acquired feature data into an xgboost model for training includes:
acquiring the xgboost model and the initial hyperparameter values of the loss function;
initializing the xgboost model based on the initial hyperparameter values of the loss function;
tuning, according to the feature data and the loss function, the initial hyperparameter values of the initialized xgboost model so as to reduce the corresponding loss function;
when the loss function of the tuned xgboost model no longer decreases, taking the tuned hyperparameter values of the xgboost model as target parameter values, the loss function corresponding to the target parameter values being the target loss function;
adjusting the leaf node weights of the target loss function so that the value predicted by the xgboost model is closer to the true value.
Preferably, the loss function of the xgboost model is obtained by the following formula:
$$L^{(t)}\simeq\sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+\partial_{\hat{y}^{(t-1)}}l\left(y_i,\hat{y}_i^{(t-1)}\right)f_t(x_i)+\frac{1}{2}\partial^{2}_{\hat{y}^{(t-1)}}l\left(y_i,\hat{y}_i^{(t-1)}\right)f_t^{2}(x_i)\right]+\Omega(f_t)$$
wherein $L$ represents the loss function,
$$\Omega(f_t)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$
$T$ represents the number of leaf nodes in the current tree structure model, $f_t(x_i)$ represents the function of the current ($t$-th) tree structure model, $\gamma$ represents the hyperparameter controlling the number of leaves, $\lambda$ represents the L2 hyperparameter controlling the leaf weights, $w$ represents the leaf weights, $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the value predicted by the first $t-1$ tree structure models, $\partial_{\hat{y}^{(t-1)}}$ represents the partial derivative with respect to $\hat{y}^{(t-1)}$, $l$ represents the residual (per-sample loss) function, and $n$ represents the total number of historical hotel order records.
Preferably, the method further comprises:
after the historical data of the plurality of hotel orders are acquired, performing a preprocessing operation on the historical data, the preprocessing operation comprising removing hotel orders whose order reply duration is longer than a preset duration.
In a second aspect, the present invention provides a system for model training, the system comprising:
an acquisition module, configured to acquire historical data of a plurality of hotel orders;
an extraction module, configured to extract feature data from the historical data, the feature data comprising user order placement time point information, order type information, hotel static attribute information, holiday information and hotel supplier channel information, the user order placement time point information indicating whether the order was placed during a working period or a non-working period;
a training module, configured to input the extracted feature data into an xgboost model for training and to generate an order reply duration prediction model when the absolute error between the true order reply duration in the historical data and the value predicted by the xgboost model is minimized; the xgboost model is a tree-structured model composed of a plurality of trees built in series from the feature data.
Preferably, the training module comprises:
an acquisition unit, configured to acquire the xgboost model and the initial hyperparameter values of the loss function;
an initialization unit, configured to initialize the xgboost model based on the initial hyperparameter values of the loss function;
a first adjusting unit, configured to tune the initial hyperparameter values of the initialized xgboost model according to the feature data and the loss function, so as to reduce the corresponding loss function;
a determining unit, configured to, when the loss function of the tuned xgboost model no longer decreases, take the tuned hyperparameter values of the xgboost model as target parameter values, the loss function corresponding to the target parameter values being the target loss function;
and a second adjusting unit, configured to adjust the leaf node weights of the target loss function so that the value predicted by the xgboost model is closer to the true value.
Preferably, the loss function of the xgboost model is obtained by the following formula:
$$L^{(t)}\simeq\sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+\partial_{\hat{y}^{(t-1)}}l\left(y_i,\hat{y}_i^{(t-1)}\right)f_t(x_i)+\frac{1}{2}\partial^{2}_{\hat{y}^{(t-1)}}l\left(y_i,\hat{y}_i^{(t-1)}\right)f_t^{2}(x_i)\right]+\Omega(f_t)$$
wherein $L$ represents the loss function,
$$\Omega(f_t)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$
$T$ represents the number of leaf nodes in the current tree structure model, $f_t(x_i)$ represents the function of the current ($t$-th) tree structure model, $\gamma$ represents the hyperparameter controlling the number of leaves, $\lambda$ represents the L2 hyperparameter controlling the leaf weights, $w$ represents the leaf weights, $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the value predicted by the first $t-1$ tree structure models, $\partial_{\hat{y}^{(t-1)}}$ represents the partial derivative with respect to $\hat{y}^{(t-1)}$, $l$ represents the residual (per-sample loss) function, and $n$ represents the total number of historical hotel order records.
Preferably, the system further comprises:
the preprocessing module is used for preprocessing the historical data after acquiring the historical data corresponding to the plurality of hotel orders, wherein the preprocessing operation comprises removing the hotel orders with order reply duration longer than preset duration.
In a third aspect, the present invention provides a method for predicting an order reply duration, where the method includes:
receiving a target hotel order to be predicted in real time;
and inputting the target hotel order into the order reply duration prediction model trained by the model training method described above to obtain a reply duration value.
Preferably, the method further comprises:
acquiring the order type of the target hotel order and the time point at which the user placed the order;
determining an order-urging time according to the order type, the order placement time point and the reply duration value;
and sending an order-urging message for the target hotel order based on the order-urging time.
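The text does not state how the order type, placement time and predicted reply duration are combined into an order-urging time. The Python sketch below assumes the simplest reading: urging becomes due once the predicted reply duration, scaled by a per-order-type slack factor, has elapsed since the order was placed. The order-type labels, the slack values and the function name `order_urging_time` are hypothetical placeholders, not values from the patent.

```python
from datetime import datetime, timedelta

# Hypothetical slack factors per order type; the patent does not say how the
# order type changes the urging time, so these numbers are placeholders.
SLACK_BY_ORDER_TYPE = {"same_day": 1.0, "later_date": 1.5}


def order_urging_time(placed_at: datetime, order_type: str,
                      predicted_reply_hours: float) -> datetime:
    """Time at which an order-urging message would be sent for one order.

    Assumes urging is due once the predicted reply duration, scaled by the
    slack factor for the order type, has elapsed since the order was placed.
    """
    factor = SLACK_BY_ORDER_TYPE[order_type]
    return placed_at + timedelta(hours=predicted_reply_hours * factor)


placed = datetime(2020, 5, 29, 10, 0)
print(order_urging_time(placed, "same_day", 0.5))    # 2020-05-29 10:30:00
print(order_urging_time(placed, "later_date", 2.0))  # 2020-05-29 13:00:00
```

A real deployment would also have to decide what happens when the computed time falls outside the supplier's working period; that rule is not described in the text and is left out here.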
In a fourth aspect, the present invention provides a system for predicting an order reply duration, the system comprising:
the receiving module is used for receiving a target hotel order to be predicted in real time;
and an input module, configured to input the target hotel order into the order reply duration prediction model trained by the model training system described above to obtain a reply duration value.
Preferably, the system further comprises:
an acquisition module, configured to acquire the order type of the target hotel order and the time point at which the user placed the order;
a determining module, configured to determine an order-urging time according to the order type, the order placement time point and the reply duration value;
and a sending module, configured to send an order-urging message for the target hotel order based on the order-urging time.
In a fifth aspect, the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the method for model training according to the first aspect, or implements the method for predicting the order reply duration according to the third aspect.
In a sixth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for model training according to the first aspect, or performs the steps of the method for predicting order reply duration according to the third aspect.
The beneficial effects of the invention are as follows: a model training method and a duration prediction method, system, device and medium are provided. Training the order reply duration prediction model improves prediction accuracy, and the order reply duration prediction method removes the high labor cost and poor accuracy of predicting the order reply duration manually. As a result, the order-urging time of hotel orders is predicted more accurately and labor costs are reduced.
Drawings
Fig. 1 is a flowchart of a model training method according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of step S14 of the method for model training according to embodiment 1 of the present invention.
Fig. 3 is a block diagram of a system for model training according to embodiment 2 of the present invention.
Fig. 4 is a flowchart of a method for predicting an order reply duration according to embodiment 3 of the present invention.
Fig. 5 is a block diagram illustrating a system for predicting an order reply duration according to embodiment 4 of the present invention.
Fig. 6 is a schematic diagram of a hardware structure of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit its scope. Experimental methods in the following examples for which no specific conditions are given were carried out according to conventional methods and conditions, or according to the product instructions.
Example 1
The present embodiment provides a method for model training, and referring to fig. 1, the method includes the following steps:
Step S11, acquiring historical data of a plurality of hotel orders.
Historical data of a plurality of hotel orders is received, including the time at which each order was placed, the hotel name, the city in which the hotel is located, the order information and the hotel's reply duration. The historical data may cover a period of 7, 30, 90 or 180 days.
Step S12, preprocessing the historical data, the preprocessing comprising removing hotel orders whose order reply duration is longer than a preset duration.
According to the hotel agent business team, a hotel that has rooms available normally replies promptly to confirm whether an order can be booked. However, a number of orders have reply durations of more than 20 hours, and some historical records even show reply durations of several days. Therefore, in this embodiment, after the historical data of the plurality of hotel orders have been acquired, hotel orders whose reply duration exceeds the preset duration are filtered out; the preset duration may be 3 hours, 6 hours or 10 hours and is not specifically limited here.
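A minimal sketch of the step S12 filter, assuming each historical order carries its reply duration in hours. The `HotelOrder` record, its field names and the function `drop_slow_replies` are hypothetical, and the 6-hour default is just one of the example preset durations given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HotelOrder:
    # Hypothetical record layout; the text lists the data items, not a schema.
    hotel_name: str
    city: str
    reply_hours: float  # order reply duration in hours


def drop_slow_replies(orders: List[HotelOrder],
                      max_hours: float = 6.0) -> List[HotelOrder]:
    """Step S12: keep only orders whose reply duration does not exceed max_hours."""
    return [o for o in orders if o.reply_hours <= max_hours]


history = [
    HotelOrder("Hotel A", "Shanghai", 0.4),
    HotelOrder("Hotel B", "Beijing", 21.0),   # more than 20 hours
    HotelOrder("Hotel C", "Hangzhou", 72.0),  # several days
]
print([o.hotel_name for o in drop_slow_replies(history)])  # ['Hotel A']
```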
Step S13, extracting feature data from the historical data, the feature data comprising user order placement time point information, order type information, hotel static attribute information, holiday information and hotel supplier channel information, the user order placement time point information indicating whether the order was placed during a working period or a non-working period.
In this embodiment, the order placement time point information indicates whether the order was placed during the working period or the non-working period of the hotel room agent; the working period may be 9:00 to 17:00, and the non-working period refers to all times outside 9:00 to 17:00. The order type information indicates whether the order is for the current day or for a later date. The hotel static attribute information may include the hotel's star rating and the city, business district and country in which the hotel is located. The holiday information may include National Day, New Year's Day, the Mid-Autumn Festival or the Dragon Boat Festival. The supplier channel information includes information on travel agency A, travel agency B and travel agency C.
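The feature extraction of step S13 might look roughly like the sketch below. It assumes a hypothetical check-in date field from which the order type is derived, a hypothetical holiday lookup table keyed by the order date, and a half-open working period [9:00, 17:00), so 17:00 itself counts as non-working; none of these details are fixed by the text, and `extract_features` is an illustrative name.

```python
from datetime import date, datetime, time

# Working period from the text: 9:00 to 17:00 (treated here as [9:00, 17:00)).
WORK_START, WORK_END = time(9, 0), time(17, 0)

# Hypothetical holiday calendar keyed by date; a real system would load it from data.
HOLIDAYS = {date(2020, 10, 1): "national_day"}


def extract_features(placed_at: datetime, check_in: date, star: int,
                     city: str, district: str, country: str, channel: str) -> dict:
    """Build one feature row for a hotel order (field names are illustrative)."""
    return {
        # 1 if the order was placed inside the agent's working period, else 0
        "in_working_period": int(WORK_START <= placed_at.time() < WORK_END),
        # order type: same-day order (1) or an order for a later date (0)
        "same_day_order": int(check_in == placed_at.date()),
        # hotel static attributes
        "star": star,
        "city": city,
        "district": district,
        "country": country,
        # holiday information (assumed to be looked up by the order date)
        "holiday": HOLIDAYS.get(placed_at.date(), "none"),
        # supplier channel, e.g. "A", "B" or "C"
        "channel": channel,
    }


row = extract_features(datetime(2020, 10, 1, 20, 15), date(2020, 10, 1),
                       4, "Shanghai", "Bund", "China", "A")
print(row["in_working_period"], row["same_day_order"], row["holiday"])  # 0 1 national_day
```

Categorical fields such as city or channel would still need encoding (for example one-hot or integer codes) before being fed to a gradient-boosted tree model.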
Step S14, inputting the extracted feature data into an xgboost model for training, and generating an order reply duration prediction model when the absolute error between the true order reply duration in the historical data and the value predicted by the xgboost model is minimized, wherein the xgboost model is a tree-structured model composed of a plurality of trees built in series from the feature data.
In this embodiment of the application, for each order in the historical hotel order data, the time elapsed between the user placing the order and the order being confirmed as bookable, or as not bookable because of insufficient inventory, is determined; this elapsed time is the true value of the order reply duration.
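As a small illustration of how the true value could be derived, the sketch below assumes each historical record stores the order placement time and the supplier reply time as timestamps (hypothetical fields) and expresses the reply duration in hours; `reply_duration_hours` is an illustrative name.

```python
from datetime import datetime


def reply_duration_hours(placed_at: datetime, replied_at: datetime) -> float:
    """True order reply duration: supplier reply time minus order placement time, in hours."""
    if replied_at < placed_at:
        raise ValueError("reply time precedes order time")
    return (replied_at - placed_at).total_seconds() / 3600.0


print(reply_duration_hours(datetime(2020, 5, 29, 10, 0),
                           datetime(2020, 5, 29, 10, 45)))  # 0.75
```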
After all the feature data are input into the xgboost model for training, the hyperparameter values of the loss function are adjusted and, in combination with the feature data, it is judged whether splitting the current node of a tree in the adjusted xgboost model reduces the loss function. If the split keeps reducing the loss function, the node is split and splitting continues; if it does not, the current node becomes a leaf node of the tree. The number of leaf nodes in the trees of the xgboost model is therefore determined by the hyperparameter values.
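The split decision described above matches the standard XGBoost structure-gain criterion. The sketch below assumes that criterion: it sums the first and second partial derivatives of the loss on each side of a candidate split and splits only when the gain is positive, so a larger γ (the per-leaf penalty) yields fewer leaves. In the open-source xgboost Python package, γ and λ are exposed as the `gamma` and `reg_lambda` parameters; the function `split_gain` below is an illustration, not that library's code.

```python
from typing import Sequence


def split_gain(g_left: Sequence[float], h_left: Sequence[float],
               g_right: Sequence[float], h_right: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Reduction of the regularized loss obtained by splitting one node in two.

    g_* / h_* are the first / second partial derivatives of the per-sample
    loss with respect to the previous prediction; lam is the L2 penalty on
    leaf weights and gamma the penalty per extra leaf.
    """
    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)

    def score(g: float, h: float) -> float:
        return g * g / (h + lam)

    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


# Squared-error loss, previous prediction 0, targets [1, 1, 5, 5].
g = [-1.0, -1.0, -5.0, -5.0]
h = [1.0] * 4
print(split_gain(g[:2], h[:2], g[2:], h[2:]) > 0)             # True: split
print(split_gain(g[:2], h[:2], g[2:], h[2:], gamma=3.0) > 0)  # False: keep as leaf
```

With λ = 1 the gain of separating {1, 1} from {5, 5} is 44/15 (about 2.93), so it is worth splitting at γ = 0 but not at γ = 3, which is how γ caps the number of leaves.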
Further, over multiple iterations on the historical data of each hotel order, an order reply duration prediction model is generated once the error between the value predicted by the xgboost model and the true value contained in the historical data reaches a preset threshold, the preset threshold being the minimum absolute error between the true value and the predicted value.
In this embodiment, referring to fig. 2, the step S14 includes the following steps:
Step S141, acquiring the xgboost model and the initial hyperparameter values of the loss function.
Step S142, initializing the xgboost model based on the initial hyperparameter values of the loss function.
Step S143, tuning the initial hyperparameter values of the initialized xgboost model according to the feature data and the loss function so as to reduce the corresponding loss function.
Step S144, when the loss function of the tuned xgboost model no longer decreases, taking the tuned hyperparameter values of the xgboost model as target parameter values and taking the loss function corresponding to the target parameter values as the target loss function.
Each adjustment of the initial hyperparameter values that reduces the loss function means that the xgboost model continues to split. Therefore, once the hyperparameter values have been adjusted so that the loss function reaches its minimum, that is, no longer decreases, the values predicted by the xgboost model under that loss function are closer to the true values.
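Steps S143 and S144 amount to adjusting hyperparameter values until the loss stops decreasing. A hedged sketch of that stopping rule, assuming the candidate values are tried in a fixed order and a caller-supplied function (hypothetical here, standing in for training plus validation) returns the loss for each value:

```python
from typing import Callable, Iterable, Optional, Tuple


def tune_until_no_improvement(
        candidates: Iterable[float],
        loss_for: Callable[[float], float]) -> Tuple[Optional[float], float]:
    """Try hyperparameter values in order and stop as soon as the loss stops
    decreasing; return (target value, its loss)."""
    best_value: Optional[float] = None
    best_loss = float("inf")
    for value in candidates:
        loss = loss_for(value)
        if loss >= best_loss:
            break  # loss no longer decreases: keep the previous value
        best_value, best_loss = value, loss
    return best_value, best_loss


# Toy loss curve with its minimum at 3 (stands in for a validation loss).
print(tune_until_no_improvement([0, 1, 2, 3, 4, 5], lambda v: (v - 3) ** 2 + 1))  # (3, 1)
```

Stopping at the first non-improving value mirrors the "no longer decreases" rule, but on a non-monotone loss curve it can stop at a local minimum; a full grid or random search avoids that at a higher cost.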
Step S145, adjusting the leaf node weights of the target loss function so that the value predicted by the xgboost model is closer to the true value.
Specifically, the leaf node weights of the target loss function are adjusted so that the absolute difference between the value predicted by the xgboost model and the true value lies within a preset threshold range, for example 10% or 5%. Adjusting the leaf node weights in this way ensures the prediction accuracy of the xgboost model.
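The text does not say whether the 10% or 5% threshold is relative or absolute. Assuming it is relative to the true reply duration, the share of predictions inside the threshold could be checked as in the sketch below; `share_within_tolerance` is a hypothetical helper name.

```python
from typing import Sequence


def share_within_tolerance(y_true: Sequence[float], y_pred: Sequence[float],
                           tol: float = 0.10) -> float:
    """Fraction of orders whose |prediction - truth| <= tol * truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tol * t)
    return hits / len(y_true)


print(share_within_tolerance([1.0, 2.0, 4.0], [1.05, 2.5, 3.9]))  # 0.6666666666666666
```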
Wherein the loss function of the xgboost model is obtained by the following formula:
$$L^{(t)}\simeq\sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+\partial_{\hat{y}^{(t-1)}}l\left(y_i,\hat{y}_i^{(t-1)}\right)f_t(x_i)+\frac{1}{2}\partial^{2}_{\hat{y}^{(t-1)}}l\left(y_i,\hat{y}_i^{(t-1)}\right)f_t^{2}(x_i)\right]+\Omega(f_t)$$
wherein $L$ represents the loss function,
$$\Omega(f_t)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$
$T$ represents the number of leaf nodes in the current tree structure model, $f_t(x_i)$ represents the function of the current ($t$-th) tree structure model, $\gamma$ represents the hyperparameter controlling the number of leaves, $\lambda$ represents the L2 hyperparameter controlling the leaf weights, $w$ represents the leaf weights, $y_i$ represents the true value of the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the value predicted by the first $t-1$ tree structure models, $\partial_{\hat{y}^{(t-1)}}$ represents the partial derivative with respect to $\hat{y}^{(t-1)}$, $l$ represents the residual (per-sample loss) function, and $n$ represents the total number of historical hotel order records.
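To make the loss function concrete, the sketch below evaluates the second-order objective for one new tree, assuming the standard XGBoost form with a squared-error residual function $l(y,\hat y)=\tfrac12(y-\hat y)^2$, whose first and second partial derivatives with respect to $\hat y$ are $\hat y-y$ and $1$. For this loss the Taylor expansion is exact, which the example checks against the exact objective. The function names and the numbers are illustrative only.

```python
from typing import Sequence


def squared_loss(y: float, y_hat: float) -> float:
    """Residual function l(y, y_hat) = 0.5 * (y - y_hat)^2."""
    return 0.5 * (y - y_hat) ** 2


def regularizer(w: Sequence[float], gamma: float, lam: float) -> float:
    """Omega(f_t) = gamma * T + 0.5 * lambda * sum(w_j^2), with T = len(w)."""
    return gamma * len(w) + 0.5 * lam * sum(wj * wj for wj in w)


def second_order_objective(y: Sequence[float], y_prev: Sequence[float],
                           leaf_of: Sequence[int], w: Sequence[float],
                           gamma: float, lam: float) -> float:
    total = 0.0
    for yi, pi, j in zip(y, y_prev, leaf_of):
        g = pi - yi  # first partial derivative of l w.r.t. the previous prediction
        h = 1.0      # second partial derivative
        f = w[j]     # f_t(x_i): weight of the leaf that sample i falls into
        total += squared_loss(yi, pi) + g * f + 0.5 * h * f * f
    return total + regularizer(w, gamma, lam)


def exact_objective(y: Sequence[float], y_prev: Sequence[float],
                    leaf_of: Sequence[int], w: Sequence[float],
                    gamma: float, lam: float) -> float:
    loss = sum(squared_loss(yi, pi + w[j]) for yi, pi, j in zip(y, y_prev, leaf_of))
    return loss + regularizer(w, gamma, lam)


args = ([1.0, 3.0], [0.5, 0.5], [0, 1], [0.4, 2.0], 0.1, 1.0)
print(second_order_objective(*args), exact_objective(*args))  # both approximately 2.41
```

For losses that are not quadratic (for example a Huber-type loss on reply durations) the two values would differ, and the second-order form is only a local approximation.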
In this embodiment, the weight of leaf node $j$ is
$$w_j^{*}=-\frac{\sum_{k\in I_j}\partial_{\hat{y}^{(t-1)}}l\left(y_k,\hat{y}_k^{(t-1)}\right)}{\sum_{k\in I_j}\partial^{2}_{\hat{y}^{(t-1)}}l\left(y_k,\hat{y}_k^{(t-1)}\right)+\lambda}$$
where $k$ indexes the samples that fall into the leaf node during training ($I_j$ being the set of those samples) and $j$ is the index of the leaf in the tree. Training of the xgboost model is completed through multiple such iterations, and the model can be optimized throughout by tuning its many hyperparameters.
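The leaf-weight formula can be sketched as follows, again assuming squared-error loss so that each sample contributes $g_k=\hat y_k^{(t-1)}-y_k$ and $h_k=1$. With λ = 0 the optimal weight of each leaf is simply the mean residual of the samples in it, and λ > 0 shrinks it toward zero. The function name `optimal_leaf_weights` is illustrative.

```python
from typing import List, Sequence


def optimal_leaf_weights(g: Sequence[float], h: Sequence[float],
                         leaf_of: Sequence[int], n_leaves: int,
                         lam: float) -> List[float]:
    """w_j* = -sum_{k in I_j} g_k / (sum_{k in I_j} h_k + lambda) for each leaf j."""
    G = [0.0] * n_leaves
    H = [0.0] * n_leaves
    for gk, hk, j in zip(g, h, leaf_of):
        G[j] += gk  # accumulate first derivatives of samples in leaf j
        H[j] += hk  # accumulate second derivatives of samples in leaf j
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]


# Targets [1, 1, 5, 5], previous prediction 0, squared-error loss.
g = [-1.0, -1.0, -5.0, -5.0]
h = [1.0] * 4
print(optimal_leaf_weights(g, h, [0, 0, 1, 1], 2, lam=0.0))  # [1.0, 5.0]
print(optimal_leaf_weights(g, h, [0, 0, 1, 1], 2, lam=1.0))  # [0.6666666666666666, 3.3333333333333335]
```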
After the model has been trained on the training data, it is tested on test data. Treating the task as a regression problem, the model's predicted order reply durations are compared with the true values from the hotel order historical data, and the mean absolute error between the predicted and true values is used as the criterion for judging how well the model has been trained.
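The evaluation criterion above is the mean absolute error. A dependency-free sketch, assuming predicted and true reply durations are available as equal-length lists of hours; the numbers in the example are made up for illustration.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |prediction - truth| over all test orders (same unit as the input)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up reply durations in hours, for illustration only.
print(mean_absolute_error([0.5, 1.0, 2.0], [0.7, 0.8, 2.0]))  # approximately 0.133
```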
This embodiment provides a model training method in which feature data are extracted from historical data and used to tune the hyperparameters of an xgboost model composed of multiple trees in series. When the loss function of the tuned xgboost model no longer decreases, the tuned hyperparameter values are taken as target parameter values, and the leaf node weights of the target loss function are then adjusted, improving the accuracy with which the xgboost model predicts hotel order reply durations.
Example 2
The present embodiment provides a system for model training, referring to fig. 3, including: an acquisition module 110, a pre-processing module 120, an extraction module 130, and a training module 140.
The acquisition module 110 is configured to acquire historical data of a plurality of hotel orders.
The acquisition module 110 receives historical data of a plurality of hotel orders, including the time at which each order was placed, the hotel name, the city in which the hotel is located, the order information and the hotel's reply duration. The historical data may cover a period of 7, 30, 90 or 180 days.
The preprocessing module 120 is configured to perform a preprocessing operation on the historical data after the historical data of the plurality of hotel orders are acquired, the preprocessing operation comprising removing hotel orders whose order reply duration is longer than a preset duration.
According to the hotel agent business team, a hotel that has rooms available normally replies promptly to confirm whether an order can be booked. However, a number of orders have reply durations of more than 20 hours, and some historical records even show reply durations of several days. Therefore, in this embodiment, after the historical data of the plurality of hotel orders have been acquired, the preprocessing module 120 filters out hotel orders whose reply duration exceeds the preset duration; the preset duration may be 3 hours, 6 hours or 10 hours and is not specifically limited here.
The extraction module 130 is configured to extract feature data from the historical data, the feature data comprising user order placement time point information, order type information, hotel static attribute information, holiday information and hotel supplier channel information, the user order placement time point information indicating whether the order was placed during a working period or a non-working period.
In this embodiment, the order placement time point information in the feature data extracted by the extraction module 130 indicates whether the order was placed during the working period or the non-working period of the hotel room agent; the working period may be 9:00 to 17:00, and the non-working period refers to all times outside 9:00 to 17:00. The order type information indicates whether the order is for the current day or for a later date. The hotel static attribute information may include the hotel's star rating and the city, business district and country in which the hotel is located. The holiday information may include National Day, New Year's Day, the Mid-Autumn Festival or the Dragon Boat Festival. The supplier channel information includes information on travel agency A, travel agency B and travel agency C.
The training module 140 is configured to input the acquired feature data into an xgboost model for training, and generate an order reply duration prediction model when an absolute value of an error between a true value of order reply duration in the history data and a predicted value output by the xgboost model is minimum; the xgboost model is a tree structure model of a plurality of series constructed according to the feature data.
In the embodiment of the application, after the order is placed by a user, the time period during which the order can be ordered or cannot be ordered because the stock is insufficient is determined by obtaining the order from the historical data of the hotel order, and the time period is a real value of the order reply time length.
After all the characteristic data are input into an xgboost model to be trained, judging whether the current node splitting in the tree structure of the xgboost model after the adjustment of the super parameter can cause the reduction of the loss function or not by adjusting the super parameter value of the loss function and combining the characteristic data, if the current node splitting in the tree structure of the xgboost model after the adjustment of the super parameter is continuously reduced, the current node splitting is continuously carried out, if the current node splitting is not reduced, the current node is a leaf node in the tree structure of the xgboost model, and the quantity of the leaf nodes in the tree structure of the xgboost model is determined by the size of the super parameter.
Training runs for multiple iterations over the historical data of each hotel order. The order reply duration prediction model is generated once the error between the value predicted by the xgboost model and the true value in the historical data reaches a preset threshold, namely the minimum absolute error between the true value and the predicted value.
In this embodiment, the training module 140 further includes:
an obtaining unit 141, configured to obtain the xgboost model and the initial hyper-parameter value of the loss function.
An initialization unit 142, configured to initialize the xgboost model based on the initial hyper-parameter value of the loss function.
The first adjusting unit 143 is configured to perform parameter adjustment on the initial hyper-parameter value of the initialized xgboost model according to the feature data and the loss function, so that the corresponding loss function is reduced.
The determining unit 144 is configured to act when the loss function corresponding to the adjusted xgboost model no longer decreases. It then determines the hyper-parameter values of the adjusted xgboost model as the target parameter values, and the loss function corresponding to the target parameter values as the target loss function.
Each adjustment of the initial hyper-parameter values that reduces the loss function means that the xgboost model continues to split. The hyper-parameter values are therefore adjusted until the loss function reaches its minimum, that is, until it no longer decreases. At that point, the values predicted by the xgboost model under this loss function are closest to the true values.
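The procedure of adjusting until the loss no longer decreases can be sketched as a simple greedy scan over candidate values of one hyper-parameter. This is only one possible reading, because the patent does not specify the search strategy. `evaluate_loss` is a hypothetical callback that would train the model with a given value and return its loss.

```python
from typing import Callable, Sequence, Tuple


def tune_until_no_improvement(
    candidates: Sequence[float],
    evaluate_loss: Callable[[float], float],
) -> Tuple[float, float]:
    """Scan candidate hyper-parameter values in order and stop at the first one
    that does not reduce the loss. Returns (target value, its loss)."""
    if not candidates:
        raise ValueError("at least one candidate value is required")
    best_value = candidates[0]
    best_loss = evaluate_loss(best_value)
    for value in candidates[1:]:
        loss = evaluate_loss(value)
        if loss >= best_loss:
            break  # the loss no longer decreases: keep the previous value as the target
        best_value, best_loss = value, loss
    return best_value, best_loss
```

For example, with a toy loss `(v - 3) ** 2` and candidates 0 to 5, the scan stops at 4 and returns 3 as the target value.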
A second adjusting unit 145 is configured to adjust the leaf node weights of the target loss function so that the value predicted by the xgboost model comes closer to the true value.
Specifically, the leaf node weights of the target loss function are adjusted so that the absolute difference between the value predicted by the xgboost model and the true value falls within a preset threshold range, for example 10% or 5%. Adjusting the leaf node weights in this way ensures the prediction accuracy of the xgboost model.
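The threshold check can be written as a one-line predicate. This sketch assumes the 10% or 5% threshold is measured relative to the true value; the text does not state the reference explicitly. The function name is hypothetical.

```python
def within_threshold(predicted: float, true: float, rel_threshold: float = 0.10) -> bool:
    """Return True if |predicted - true| is at most rel_threshold times |true|.

    Interpreting the threshold as relative to the true value is an assumption.
    """
    return abs(predicted - true) <= rel_threshold * abs(true)
```

For a true reply duration of 4.0 minutes, a prediction of 4.3 passes at 10% but fails at 5%.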
The loss function of the xgboost model is obtained by the following formula, a second-order approximation of the objective when the t-th tree is added:
$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$
$$g_i=\partial_{\hat{y}^{(t-1)}}\, l\left(y_i,\hat{y}^{(t-1)}\right),\qquad h_i=\partial^{2}_{\hat{y}^{(t-1)}}\, l\left(y_i,\hat{y}^{(t-1)}\right)$$
Here:
- L represents the loss function.
- T represents the number of leaf nodes in the current tree.
- $f_t(x_i)$ represents the function of the current tree.
- $\gamma$ represents a hyper-parameter controlling the number of leaves.
- $\lambda$ represents an L2 hyper-parameter controlling the leaf weights.
- w represents the leaf weights.
- $y_i$ represents the true value of sample i.
- $\hat{y}_i^{(t-1)}$ represents the prediction of the first t-1 trees.
- $\partial$ denotes the partial derivative.
- l represents the per-sample loss (residual) function.
- n represents the total number of historical hotel orders.
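The patent does not state which per-sample loss l is used. The sketch below assumes l(y, ŷ) = ½(y − ŷ)², for which g_i = ŷ_i^(t-1) − y_i and h_i = 1. It evaluates the approximate objective above for a given assignment of samples to leaves, dropping the constant term Σ l(y_i, ŷ_i^(t-1)), which does not depend on the new tree. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def grad_hess_squared_error(
    y_true: Sequence[float], y_prev: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """g_i and h_i for l(y, y_hat) = 0.5 * (y - y_hat)**2, evaluated at the
    prediction y_prev of the first t-1 trees."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0] * len(y_true)
    return g, h


def approx_objective(
    g: Sequence[float],
    h: Sequence[float],
    leaf_of: Sequence[int],
    weights: Sequence[float],
    lam: float,
    gamma: float,
) -> float:
    """Second-order approximation of L^(t) without the constant term.

    leaf_of[i] is the index j of the leaf that sample i falls into,
    so f_t(x_i) = weights[j].
    """
    data_term = sum(
        gi * weights[j] + 0.5 * hi * weights[j] ** 2
        for gi, hi, j in zip(g, h, leaf_of)
    )
    num_leaves = len(weights)  # T in the formula
    penalty = gamma * num_leaves + 0.5 * lam * sum(w * w for w in weights)
    return data_term + penalty
```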
In this embodiment, the weight of leaf node j is
$$w_j^{*}=-\frac{\sum_{k\in I_j} g_k}{\sum_{k\in I_j} h_k+\lambda}$$
where $I_j$ is the set of training samples k that fall into leaf j, and j is the index of the leaf in the tree. Training of the xgboost model is completed through multiple such iterations. Throughout the process, the model's performance can be optimized by tuning its many hyper-parameters.
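The leaf weight formula can be computed by accumulating g and h per leaf. This sketch assumes the gradients g and h have already been computed, for example with the squared-error loss assumed earlier. The function name is hypothetical.

```python
from typing import List, Sequence


def optimal_leaf_weights(
    g: Sequence[float],
    h: Sequence[float],
    leaf_of: Sequence[int],
    num_leaves: int,
    lam: float,
) -> List[float]:
    """w_j* = -G_j / (H_j + lambda), where G_j and H_j are the sums of g_k and h_k
    over the samples k that fall into leaf j (the set I_j).

    Assumes H_j + lambda > 0 for every leaf, i.e. lambda > 0 or no empty leaves.
    """
    G = [0.0] * num_leaves
    H = [0.0] * num_leaves
    for gk, hk, j in zip(g, h, leaf_of):
        G[j] += gk
        H[j] += hk
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]
```

Under the squared-error assumption with λ = 0, each weight equals the mean residual y − ŷ of the samples in that leaf. A positive λ shrinks the weights toward zero.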
After the model has been trained on the training data, it is tested on the test data. Treating the task as a regression problem, the model's predicted order reply durations for the historical hotel orders are compared with the true values. The mean absolute error between them is used as the criterion for judging how well the model has been trained.
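The evaluation criterion described above is the mean absolute error (MAE). A minimal pure-Python version follows; the function name is illustrative.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |true - predicted| over all test orders."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For true reply durations of 4, 10 and 2 minutes and predictions of 3, 12 and 2 minutes, the MAE is 1.0 minute. A lower value indicates a better-trained model.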
In this embodiment of the invention, the extraction module extracts feature data from the historical data. The training module adjusts the hyper-parameters of the xgboost model, which consists of multiple trees built in series. When the loss function of the adjusted xgboost model no longer decreases, the adjusted hyper-parameter values are taken as the target parameter values. The leaf node weights of the target loss function are then adjusted, which improves the accuracy of predicting hotel order reply durations with the xgboost model.
Example 3
The present embodiment provides a method for predicting an order reply duration, referring to fig. 4, the method includes:
Step S21: receive, in real time, the target hotel order to be predicted.
In the OTA (Online Travel Agency) industry, different rooms of the same hotel are sold through agents such as Internet Company A, Internet Company B, Internet Company C, Travel Agency A, Travel Agency B and Travel Agency C. Suppose a customer places a hotel order on the client of Internet Company A, but the room is supplied through the agent of Travel Agency A. The stock must then be confirmed with the supplier, Travel Agency A, before the order can be confirmed as actually available to the customer. This process leaves the user waiting for a period of time, namely the confirmation duration.
At present, the agents' order data are stored in a database on a cloud server. A development tool is used to synchronize the agents' hotel order data from the cloud server database into the server database of Internet Company A.
Every day, the server of the Internet company receives, synchronously and in real time, the hotel order information to be predicted. This information includes the hotel name, hotel address, hotel star rating, supplier channel information and the like.
Step S22: input the target hotel order into the order reply duration prediction model trained by the method of embodiment 1 to obtain a reply duration value.
The data of the target hotel order are input into the order reply duration prediction model generated by training the xgboost model. The model extracts feature data from the target hotel order data and uses them for analysis and prediction.
Step S23: acquire the order type of the target hotel order and the user's order time point.
The order type is obtained from the target hotel order and indicates whether the order is for a same-day check-in or a next-day check-in. The user's order time point indicates whether the order was placed in the working period or the non-working period.
Step S24: determine the order-urging time according to the order type, the user's order time point and the reply duration value.
In this embodiment, the xgboost model predicts the reply duration of each future hotel order handled by the different travel agencies. The business side of the Internet company then sets an appropriate order-urging time based on the reply duration value and other relevant information. For example, an urgent order that would have been answered within 4 minutes may previously have been urged as early as 3 minutes. Knowing the predicted reply duration avoids such unnecessary manual urging and reduces the cost of handling orders.
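The patent leaves the exact decision rule to the business side. The sketch below shows one hypothetical rule: urge only after the predicted reply duration plus a safety buffer. The buffer depends on the order type and on whether the order was placed in the working period. The buffer values, the table `BUFFER_MINUTES` and the function name are all illustrative assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical safety buffers in minutes, keyed by (same_day_order, working_period).
BUFFER_MINUTES = {
    (True, True): 1.0,     # same-day order placed in the working period
    (True, False): 2.0,    # same-day order placed outside the working period
    (False, True): 5.0,    # next-day order placed in the working period
    (False, False): 10.0,  # next-day order placed outside the working period
}


def urging_time(order_time: datetime, same_day: bool, working_period: bool,
                predicted_reply_minutes: float) -> datetime:
    """Schedule the first urge after the predicted reply duration plus a buffer."""
    buffer = BUFFER_MINUTES[(same_day, working_period)]
    return order_time + timedelta(minutes=predicted_reply_minutes + buffer)
```

With a predicted reply of 4 minutes for a same-day order placed at 10:00 during the working period, this rule schedules the first urge at 10:05, rather than urging at a fixed 3 minutes before the supplier would have replied anyway.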
Step S25: send order-urging information for the target hotel order based on the order-urging time.
In this embodiment, order-urging information is sent to the travel agency handling the hotel order based on the order-urging time. When the business side of the Internet company urges the hotel's travel agency, it may send urging messages one after another continuously. Alternatively, after the first message it may send further messages every 1 minute or every 2 minutes. This continues until the travel agency's agent receives the message and replies confirming whether or not the room can be booked.
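The interval-based variant described above can be simulated as a list of send times. The schedule starts at the first urging time, repeats every 1 or 2 minutes, and stops once the supplier's confirmation has arrived. The `max_reminders` cap and the function name are hypothetical additions to keep the sketch bounded.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def reminder_schedule(first_urge: datetime, confirmed_at: Optional[datetime],
                      interval_minutes: float = 1.0,
                      max_reminders: int = 10) -> List[datetime]:
    """Times at which urging messages would be sent.

    Messages start at first_urge and repeat every interval_minutes until the
    supplier has confirmed (confirmed_at) or the hypothetical cap is reached.
    """
    sent: List[datetime] = []
    t = first_urge
    step = timedelta(minutes=interval_minutes)
    while len(sent) < max_reminders and (confirmed_at is None or t < confirmed_at):
        sent.append(t)
        t += step
    return sent
```

If the supplier confirms before the first urging time, the list is empty and no urging message is needed.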
This embodiment of the invention provides a method for predicting order reply duration. It uses the trained order reply duration prediction model, together with the order time point and the order type, to decide whether an order needs to be urged. By predicting the reply duration of each hotel order as accurately as possible, the model provides reliable support for the business side's decisions and thereby reduces the order-urging rate.
Example 4
Referring to fig. 5, the system for predicting an order reply duration according to the present embodiment includes: a receiving module 210, an inputting module 220, an obtaining module 230, a determining module 240, and a sending module 250.
The receiving module 210 is configured to receive, in real time, a target hotel order to be predicted.
In the OTA (Online Travel Agency) industry, different rooms of the same hotel are sold through agents such as Internet Company A, Internet Company B, Internet Company C, Travel Agency A, Travel Agency B and Travel Agency C. Suppose a customer places a hotel order on the client of Internet Company A, but the room is supplied through the agent of Travel Agency A. The stock must then be confirmed with the supplier, Travel Agency A, before the order can be confirmed as actually available to the customer. This process leaves the user waiting for a period of time, namely the confirmation duration.
At present, the agents' order data are stored in a database on a cloud server. A development tool is used to synchronize the agents' hotel order data from the cloud server database into the server database of Internet Company A.
The receiving module 210 receives hotel order information to be predicted in real time and synchronously every day, where the hotel order information includes a hotel name, a hotel address, a hotel star level, supplier channel information, and the like.
An input module 220, configured to input the target hotel order into the order reply duration prediction model trained by using the system in embodiment 2, to obtain a reply duration value.
The input module 220 inputs the data of the target hotel order into the order reply duration prediction model generated by training the xgboost model. The model extracts feature data from the target hotel order data and uses them for analysis and prediction.
The obtaining module 230 is configured to obtain an order type of the target hotel order and an order placing time point of the user.
The obtaining module 230 obtains the order type from the target hotel order, i.e. whether the order is for a same-day check-in or a next-day check-in. The user's order time point indicates whether the order was placed in the working period or the non-working period.
The determining module 240 is configured to determine the order-urging time according to the order type, the user's order time point and the reply duration value.
In this embodiment, the xgboost model predicts the reply duration of each future hotel order handled by the different travel agencies. The business side of the Internet company then sets an appropriate order-urging time based on the reply duration value and other relevant information. For example, an urgent order that would have been answered within 4 minutes may previously have been urged as early as 3 minutes. Knowing the predicted reply duration avoids such unnecessary manual urging and reduces the cost of handling orders.
A sending module 250 is configured to send order-urging information for the target hotel order based on the order-urging time.
In this embodiment, order-urging information is sent to the travel agency handling the hotel order based on the order-urging time. When the business side of the Internet company urges the hotel's travel agency, it may send urging messages one after another continuously. Alternatively, after the first message it may send further messages every 1 minute or every 2 minutes. This continues until the travel agency's agent receives the message and replies confirming whether or not the room can be booked.
This embodiment of the invention provides a system for predicting order reply duration. It uses the trained order reply duration prediction model, together with the order time point and the order type, to decide whether an order needs to be urged. By predicting the reply duration of each hotel order as accurately as possible, the model provides reliable support for the business side's decisions and thereby reduces the order-urging rate.
Example 5
Fig. 6 is a schematic structural diagram of an electronic device provided in this embodiment. The electronic device includes a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor executes the program to implement the method for model training of embodiment 1 or the method for predicting order reply duration of embodiment 3, and the electronic device 30 shown in fig. 6 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
The electronic device 30 may be embodied as a general-purpose computing device, for example a server device. The components of the electronic device 30 may include, but are not limited to: at least one processor 31, at least one memory 32, and a bus 33 connecting the various system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus, and a control bus.
The memory 32 may include volatile memory, such as Random Access Memory (RAM) 321 and/or cache memory 322, and may further include Read Only Memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the method of model training of embodiment 1 of the present invention or the method of order reply duration prediction of embodiment 3, by running the computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard or a pointing device) through input/output (I/O) interfaces 35. The electronic device 30 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. Although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30. These include, but are not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the method of model training of embodiment 1 or the steps of the method of order reply duration prediction of embodiment 3.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the invention may also be implemented as a program product comprising program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the model training method of embodiment 1 or the steps of the order reply duration prediction method of embodiment 3.
Program code for carrying out the invention may be written in any combination of one or more programming languages. It may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (14)

1. A method of model training, the method comprising:
acquiring historical data of a plurality of hotel orders;
extracting characteristic data from the historical data, wherein the characteristic data comprises user ordering time point information, order type information, hotel static attribute information, holiday information and channel information of hotel suppliers, and the user ordering time point information is used for representing that ordering time points are in working periods or non-working periods;
inputting the obtained characteristic data into an xgboost model for training, and generating an order reply duration prediction model when the absolute value of the error between the true value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum; the xgboost model is a tree structure model of a plurality of serial trees constructed according to the feature data.
2. The method for model training according to claim 1, wherein the step of inputting the acquired feature data into an xgboost model for training comprises:
acquiring the xgboost model and initial hyper-parameter values of the loss function;
initializing the xgboost model based on an initial hyper-parameter value of the loss function;
performing parameter adjustment on the initial hyper-parameter values of the initialized xgboost model according to the feature data and the loss function, so as to reduce the corresponding loss function;
when the loss function corresponding to the adjusted xgboost model no longer decreases, determining the adjusted hyper-parameter values of the xgboost model as target parameter values, wherein the loss function corresponding to the target parameter values is a target loss function;
adjusting leaf node weights of the objective loss function so that a predicted value output by the xgboost model is closer to the true value.
3. The method of model training according to claim 2, wherein the loss function of the xgboost model is obtained by the following formula:
$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$
$$g_i=\partial_{\hat{y}^{(t-1)}}\, l\left(y_i,\hat{y}^{(t-1)}\right),\qquad h_i=\partial^{2}_{\hat{y}^{(t-1)}}\, l\left(y_i,\hat{y}^{(t-1)}\right)$$
wherein L represents the loss function, T represents the number of leaf nodes in the current tree, $f_t(x_i)$ represents the function of the current tree, $\gamma$ represents a hyper-parameter controlling the number of leaves, $\lambda$ represents an L2 hyper-parameter controlling the leaf weights, w represents the leaf weights, $y_i$ represents the true value of sample i, $\hat{y}_i^{(t-1)}$ represents the predicted values of the first t-1 trees, $\partial$ represents the partial derivative, l represents the per-sample loss (residual) function, and n represents the total number of historical hotel orders.
4. The method of model training of claim 1, the method further comprising:
after the historical data corresponding to the plurality of hotel orders are obtained, preprocessing operation is carried out on the historical data, wherein the preprocessing operation comprises the step of removing the hotel orders with order reply duration longer than preset duration.
5. A system for model training, the system comprising:
an acquisition module, configured to acquire historical data of a plurality of hotel orders;
the extraction module is used for extracting characteristic data from the historical data, wherein the characteristic data comprises user ordering time point information, order type information, hotel static attribute information, holiday information and channel information of hotel suppliers, and the user ordering time point information is used for representing that ordering time points are in working periods or non-working periods;
the training module is used for inputting the acquired feature data into an xgboost model for training, and when the absolute value of the error between the real value of the order reply duration in the historical data and the predicted value output by the xgboost model is minimum, an order reply duration prediction model is generated; the xgboost model is a tree structure model of a plurality of serial trees constructed according to the feature data.
6. The model training system of claim 5, wherein the training module comprises:
the acquisition unit is used for acquiring the initial hyper-parameter value of the xgboost model and the loss function;
an initialization unit, configured to initialize the xgboost model based on an initial hyper-parameter value of the loss function;
a first adjusting unit, configured to perform parameter adjustment on an initial hyper-parameter value of the initialized xgboost model according to the feature data and the loss function, so as to reduce the corresponding loss function;
a determining unit, configured to determine, when the loss function corresponding to the adjusted xgboost model no longer decreases, the hyper-parameter values of the adjusted xgboost model as target parameter values, wherein the loss function corresponding to the target parameter values is a target loss function;
and the second adjusting unit is used for adjusting the leaf node weight of the target loss function to enable the predicted value output by the xgboost model to be closer to the true value.
7. The model training system of claim 6, wherein the loss function of the xgboost model is obtained by the following formula:
$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$
$$g_i=\partial_{\hat{y}^{(t-1)}}\, l\left(y_i,\hat{y}^{(t-1)}\right),\qquad h_i=\partial^{2}_{\hat{y}^{(t-1)}}\, l\left(y_i,\hat{y}^{(t-1)}\right)$$
wherein L represents the loss function, T represents the number of leaf nodes in the current tree, $f_t(x_i)$ represents the function of the current tree, $\gamma$ represents a hyper-parameter controlling the number of leaves, $\lambda$ represents an L2 hyper-parameter controlling the leaf weights, w represents the leaf weights, $y_i$ represents the true value of sample i, $\hat{y}_i^{(t-1)}$ represents the predicted values of the first t-1 trees, $\partial$ represents the partial derivative, l represents the per-sample loss (residual) function, and n represents the total number of historical hotel orders.
8. The system of model training of claim 5, the system further comprising:
the preprocessing module is used for preprocessing the historical data after acquiring the historical data corresponding to the plurality of hotel orders, wherein the preprocessing operation comprises removing the hotel orders with order reply duration longer than preset duration.
9. A method for predicting order reply duration, characterized in that the method comprises:
receiving a target hotel order to be predicted in real time;
inputting the target hotel order into the order reply duration prediction model trained by the method according to any one of claims 1 to 4 to obtain a reply duration value.
10. The method for predicting order reply duration of claim 9, wherein the method further comprises:
acquiring the order type of the target hotel order and the order placing time point of the user;
determining an order urging time according to the order type, the order placing time point of the user and the reply duration value;
and sending order-urging information corresponding to the target hotel order based on the order-urging time.
11. A system for predicting an order reply duration, the system comprising:
the receiving module is used for receiving a target hotel order to be predicted in real time;
an input module, configured to input the target hotel order into the order reply duration prediction model trained by the system according to any one of claims 5 to 8, so as to obtain a reply duration value.
12. The system for order reply duration prediction according to claim 11, wherein the system further comprises:
the acquisition module is used for acquiring the order type of the target hotel order and the order placing time point of the user;
the determining module is used for determining the order urging time according to the order type, the order placing time point of the user and the reply duration value;
and a sending module, configured to send order-urging information corresponding to the target hotel order based on the order-urging time.
13. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing a method of model training according to any one of claims 1-4, or performing a method of order reply duration prediction according to claim 9 or 10.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of model training according to any one of claims 1 to 4, or the steps of the method of order reply duration prediction according to claim 9 or 10.
CN202010473630.6A 2020-05-29 2020-05-29 Model training method, duration prediction method, system, device and medium Pending CN111639807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010473630.6A CN111639807A (en) 2020-05-29 2020-05-29 Model training method, duration prediction method, system, device and medium

Publications (1)

Publication Number Publication Date
CN111639807A true CN111639807A (en) 2020-09-08

Family

ID=72329353

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200908
