Disclosure of Invention
The main object of the invention is to provide a method for predicting air ticket prices by combining flight information and a price sequence, so as to overcome the problems described above.
In order to achieve the purpose, the invention provides a method for predicting the price of an air ticket by combining flight information and a price sequence, which comprises the following steps:
S10, collecting historical flight features and a recent price sequence; extracting flight continuous features and flight discrete features from the historical flight features, and one-hot encoding the flight discrete features; extracting, on the basis of the flight continuous features, price continuous features and price discrete features from the recent price sequence, and one-hot encoding the price discrete features;
S20, establishing a flight feature prediction model and a price sequence prediction model, each using a machine learning model;
S30, inputting the flight continuous features and the encoded flight discrete features into the flight feature prediction model to train the model and optimize its flight weight α, and inputting the price continuous features and the encoded price discrete features into the price sequence prediction model to train the model and optimize its sequence weight β;
S40, based on the flight predicted price P_static output by the flight feature prediction model and the sequence predicted price P_dynamic output by the price sequence prediction model, constructing a target prediction function in combination with the optimized flight weight α and the optimized sequence weight β to obtain a prediction result.
Preferably, the target prediction function is specifically as follows:
wherein P_static is the flight predicted price output by the flight feature prediction model, and P_dynamic is the sequence predicted price output by the price sequence prediction model.
Preferably, the machine learning model is built as a multilayer perceptron (MLP) neural network; the neural network is a dynamic neural network built with the deep learning framework PyTorch, which uses the multilayer perceptron as the model, the root mean square error (RMSE) as the loss function of the model, and the Adam algorithm as the optimizer for the weight coefficients of the model.
Preferably, the method of S30 includes:
S301, aggregating the flight continuous features and the encoded flight discrete features and inputting them into the flight feature prediction model; aggregating the price continuous features and the encoded price discrete features and inputting them into the price sequence prediction model;
S302, the flight feature prediction model uses the multilayer perceptron of the neural network to represent a mapping function from flight features to flight prices and outputs the flight predicted price; the price sequence prediction model uses the multilayer perceptron of the neural network to represent a mapping function from price sequence features to flight prices and outputs the sequence predicted price;
S303, using the root mean square error (RMSE), calculating the flight error between the flight predicted price and the actual price, and the sequence error between the sequence predicted price and the actual price;
S304, using the Adam algorithm as the optimizer, back-propagating the flight error to optimize the flight weight α of the flight feature prediction model, and back-propagating the sequence error to optimize the sequence weight β of the price sequence prediction model.
Preferably, the flight continuous features include at least the flight duration, the takeoff month, the day of the month of takeoff, whether the takeoff day is a holiday, whether the takeoff day is a weekend, whether the flight crosses into the next day, the number of stops, the airport construction fee, the fuel surcharge, the tax, whether the flight is a code-share flight, and the number of days from the takeoff date.
Preferably, the price continuous features include, in addition to the contents of the flight continuous features, at least the price sequence near the query date and, for each price in the sequence, the number of days from the takeoff date.
Preferably, the price discrete features are the same as the flight discrete features and include at least the takeoff time, the arrival time, the takeoff airport, the arrival airport, the airline, and the actual operating airline, wherein the takeoff time and the arrival time are subjected to segmented one-hot encoding and the other discrete features are subjected to ordinary one-hot encoding.
Preferably, the machine learning model may alternatively be built as a CART regression tree, a kind of decision tree.
Preferably, the recent price sequence consists of the lowest prices of flight F on each of the T days preceding the query date Q: {P_{Q-T}, P_{Q-(T-1)}, P_{Q-(T-2)}, ..., P_{Q-3}, P_{Q-2}, P_{Q-1}}.
Preferably, the segmented one-hot encoding of the takeoff time and the arrival time is performed as follows: the 24 hours of a day are divided into four time periods, [0:00-10:00], [10:00-14:00], [14:00-19:00] and [19:00-24:00], and the takeoff time and the arrival time are each one-hot encoded according to the period in which they fall.
Preferably, the multilayer perceptron comprises:
an input layer comprising a plurality of neurons, the number of which is determined by the actual route; in the flight feature prediction model the input layer receives the flight continuous features and the encoded flight discrete features, and in the price sequence prediction model it receives the price continuous features and the encoded price discrete features;
a first hidden layer h_1 with 32 neurons in total, each connected to the input layer, obtained from the input layer by the nonlinear transformation h_1 = ReLU(w_1 x + b_1), wherein w_1 is the connection coefficient, b_1 is the bias, and ReLU is the rectified linear unit activation function; in the flight feature prediction model, x represents the flight continuous features and the encoded flight discrete features; in the price sequence prediction model, x represents the price continuous features and the encoded price discrete features;
a second hidden layer h_2 with 32 neurons in total, each fully connected to the first hidden layer, obtained from the first hidden layer by the nonlinear transformation h_2 = ReLU(w_2 h_1 + b_2), wherein w_2 is the connection coefficient, b_2 is the bias, and ReLU is the rectified linear unit activation function;
an output layer P with 1 neuron, fully connected to the second hidden layer, obtained from the second hidden layer by the linear transformation P = w_3 h_2 + b_3, wherein w_3 is the connection coefficient, b_3 is the bias, and P is the output predicted price.
Compared with the prior art, the invention has the following beneficial effects: by combining a prediction model based on historical flight features with a prediction model based on the price sequence, the invention predicts air ticket prices effectively and reduces the prediction error. The flight feature prediction model alleviates the accumulated-error failure of the price-sequence-based prediction model, and when recent prices fluctuate sharply, the price-sequence-based prediction model compensates for the insufficient accuracy of the flight-information-based prediction model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, if directional indications (such as up, down, left, right, front, back, and so on) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.
In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The invention provides a method for predicting air ticket prices by combining flight information and price sequences, which comprises the following steps:
S10, collecting historical flight features and a recent price sequence; extracting flight continuous features and flight discrete features from the historical flight features, and one-hot encoding the flight discrete features; extracting, on the basis of the flight continuous features, price continuous features and price discrete features from the recent price sequence, and one-hot encoding the price discrete features;
S20, establishing a flight feature prediction model and a price sequence prediction model, each using a machine learning model;
S30, inputting the flight continuous features and the encoded flight discrete features into the flight feature prediction model to train the model and optimize its flight weight α, and inputting the price continuous features and the encoded price discrete features into the price sequence prediction model to train the model and optimize its sequence weight β;
S40, based on the flight predicted price P_static output by the flight feature prediction model and the sequence predicted price P_dynamic output by the price sequence prediction model, constructing a target prediction function in combination with the optimized flight weight α and the optimized sequence weight β to obtain a prediction result.
In the embodiment of the invention, training samples for the flight feature prediction model are constructed as follows: for each flight with a fixed takeoff date, the flight features and the price records at different numbers of days before takeoff are extracted from the historical price data; the difference in days between the creation date of each record and the takeoff date is calculated and recorded as the "days from the takeoff date"; the continuous features and discrete features of the flight are aggregated as the input of the multilayer perceptron, and the corresponding price is used as the output. Training samples for the price sequence prediction model are constructed similarly: for each flight with a fixed takeoff date, the prices and flight records on different query dates are extracted; the difference in days between the creation date and the takeoff date is calculated and recorded as the "days from the takeoff date"; the continuous features and discrete features of the flight information, the price sequence near the query date Q, and the difference in days between each price in the sequence and the takeoff date are aggregated as the input of the neural network, and the corresponding price is used as the output.
Among the flight features, the discrete features include the takeoff time, arrival time, takeoff airport, arrival airport, airline, and actual operating airline. For example, assuming there are four airlines a, b, c and d, the one-hot code of airline a is [1, 0, 0, 0], that of airline b is [0, 1, 0, 0], that of airline c is [0, 0, 1, 0], and that of airline d is [0, 0, 0, 1]. The continuous features include the flight duration, the takeoff month, the day of the month of takeoff, whether the takeoff day is a holiday, whether the takeoff day is a weekend, whether the flight crosses into the next day, the number of stops, the airport construction fee, the fuel surcharge, the tax, and whether the flight is a code-share flight. For example, if a flight has a duration of 95 minutes, takes off on June 18, does not cross into the next day, has no stops, has an airport construction fee of 50 yuan, a fuel surcharge and tax of 0 yuan each, and is a code-share flight, its feature vector (with the holiday and weekend indicators omitted) is [95, 6, 18, 0, 0, 50, 0, 0, 1]. After the number of days from the takeoff date is set, the continuous features and discrete features are aggregated and input into the prediction model to obtain the corresponding predicted air ticket price P_static. By setting different numbers of days from the takeoff date, predicted prices for different dates can be obtained.
For example, if the query date is March 5 and the ticket price on the 7th day after the query date is to be predicted, the "days from the takeoff date" is set to 8, and the discrete and continuous features of flight F are aggregated as the input [8, 90, 3, 20, 1, 0, 50, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]. After this input is fed to the prediction model, the predicted ticket price for the 7th day after the query date, 820, is obtained. The discrete features are set in the same way as in the flight-feature-based prediction method. The discrete feature code for this example is [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0].
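The aggregation in this example can be sketched as a plain concatenation. The snippet below is a minimal illustration rather than the implementation of the embodiment; the function name `build_flight_input` is hypothetical, and the split of the 22-element input into a 9-element continuous part and a 12-element discrete part is an assumption read off the example values.

```python
def build_flight_input(days_from_takeoff, continuous, discrete):
    """Concatenate days-from-takeoff, continuous features and one-hot codes."""
    return [days_from_takeoff] + list(continuous) + list(discrete)


# Continuous features of flight F as given in the worked example.
continuous_f = [90, 3, 20, 1, 0, 50, 0, 0, 1]
# Assumed discrete part: the trailing 12 elements of the example input.
discrete_f = [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]

x = build_flight_input(8, continuous_f, discrete_f)
print(len(x), x)
```

Changing only the first argument yields the inputs for other target dates of the same flight.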
The invention predicts the future price of a flight by combining the flight feature prediction model and the price sequence prediction model. This provides passengers with a reference for ticket purchase, so that they can choose a better purchasing plan according to the predicted price trend. The predicted price trend can also serve as a reference for airlines, helping them adjust their pricing to increase profit.
When the flight feature prediction model or the price sequence prediction model is used to predict a future air ticket price, the number of days from the takeoff date is set, and the corresponding discrete features and continuous features are aggregated and input into the prediction model to obtain the flight predicted price or the sequence predicted price for the corresponding date. For example, if the takeoff date of flight F is D, the query date is Q, and the ticket price i days after the query date Q is to be predicted, the "days from the takeoff date" is set to D - (Q + i), and the discrete features and continuous features of flight F are aggregated and input into the prediction model to obtain the ticket price i days after the query date Q. With D = March 20, Q = March 5 and i = 7, this gives 20 - (5 + 7) = 8 days.
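The date arithmetic can be checked with a short sketch. It assumes the "days from the takeoff date" is the takeoff date minus the target date (query date plus i days), which reproduces the value 8 used in the worked example for a March 5 query and a March 20 takeoff. The function name and the year 2019 are arbitrary choices made for illustration only.

```python
from datetime import date, timedelta


def days_from_takeoff(takeoff, query, days_ahead):
    """Days between the target date (query + days_ahead) and the takeoff date."""
    target = query + timedelta(days=days_ahead)
    return (takeoff - target).days


# The year is not given in the text; 2019 is an arbitrary assumption.
d = days_from_takeoff(date(2019, 3, 20), date(2019, 3, 5), 7)
print(d)
```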
According to the invention, a new air ticket price prediction model is constructed by linearly combining, with weights, the flight feature prediction model and the price sequence prediction model. This reduces the error of air ticket price prediction, alleviates the accumulated-error problem of a standalone price sequence prediction model when predicting ticket prices far into the future, and, when recent prices fluctuate sharply, lets the price-sequence-based prediction model compensate for the insufficient accuracy of the flight-information-based prediction model.
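A weighted linear combination of the two model outputs might look like the sketch below. The exact form of the target prediction function is not reproduced in this text, so the expression alpha * P_static + beta * P_dynamic is an assumption based on the phrase "linearly combining"; the function name, the weights 0.5 and 0.5, and the dynamic price 840 are hypothetical values chosen for illustration.

```python
def combine_predictions(p_static, p_dynamic, alpha, beta):
    """Assumed linear combination of the two model outputs."""
    return alpha * p_static + beta * p_dynamic


# 820 is the flight predicted price from the worked example; the other
# numbers are hypothetical.
p = combine_predictions(820.0, 840.0, alpha=0.5, beta=0.5)
print(p)
```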
Preferably, the target prediction function is specifically as follows:
wherein P_static is the flight predicted price output by the flight feature prediction model, and P_dynamic is the sequence predicted price output by the price sequence prediction model.
Preferably, the machine learning model is built as a multilayer perceptron (MLP) neural network; the neural network is a dynamic neural network built with the deep learning framework PyTorch, which uses the multilayer perceptron as the model, the root mean square error (RMSE) as the loss function of the model, and the Adam algorithm as the optimizer for the weight coefficients of the model.
Preferably, the method of S30 includes:
S301, aggregating the flight continuous features and the encoded flight discrete features and inputting them into the flight feature prediction model; aggregating the price continuous features and the encoded price discrete features and inputting them into the price sequence prediction model;
S302, the flight feature prediction model uses the multilayer perceptron of the neural network to represent a mapping function from flight features to flight prices and outputs the flight predicted price; the price sequence prediction model uses the multilayer perceptron of the neural network to represent a mapping function from price sequence features to flight prices and outputs the sequence predicted price;
S303, using the root mean square error (RMSE), calculating the flight error between the flight predicted price and the actual price, and the sequence error between the sequence predicted price and the actual price;
S304, using the Adam algorithm as the optimizer, back-propagating the flight error to optimize the flight weight α of the flight feature prediction model, and back-propagating the sequence error to optimize the sequence weight β of the price sequence prediction model.
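The loop of steps S302 to S304 (predict, measure the RMSE, update the parameters with Adam) can be illustrated without a deep learning framework. The sketch below is not the multilayer perceptron of the embodiment: it substitutes a one-feature linear model so that the gradient can be written by hand, and the data, learning rate and step count are hypothetical. The names `beta1` and `beta2` are Adam's usual decay hyperparameters and are unrelated to the sequence weight β.

```python
import math


def rmse(preds, targets):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


def train_with_adam(xs, ys, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y = w * x + b by minimising RMSE with hand-written Adam updates."""
    params = [0.0, 0.0]          # [w, b]
    m = [0.0, 0.0]               # first-moment estimates
    v = [0.0, 0.0]               # second-moment estimates
    history = []                 # loss before each update
    n = len(xs)
    for t in range(1, steps + 1):
        w, b = params
        preds = [w * x + b for x in xs]
        loss = rmse(preds, ys)
        history.append(loss)
        if loss == 0.0:
            break
        # Gradient of RMSE: gradient of MSE divided by (2 * RMSE).
        grads = [
            sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / (n * loss),
            sum(p - y for p, y in zip(preds, ys)) / (n * loss),
        ]
        for k, g in enumerate(grads):
            m[k] = beta1 * m[k] + (1 - beta1) * g
            v[k] = beta2 * v[k] + (1 - beta2) * g * g
            m_hat = m[k] / (1 - beta1 ** t)      # bias correction
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, history


# Hypothetical toy data generated from y = 2x + 1.
params, history = train_with_adam([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(history[0], history[-1])
```

In a PyTorch implementation the gradient computation and the Adam update would be delegated to the framework's automatic differentiation and optimizer instead of being written out.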
In the embodiment of the invention, a multilayer perceptron neural network is taken as an example, and all models are implemented with PyTorch. The multilayer perceptron used in this example has four layers. The first layer is the input layer. Because different routes have different numbers of takeoff and landing airports and, in general, different numbers of operating airlines, the one-hot code lengths of the non-numerical features (takeoff and landing airports, airlines) differ between routes, so the number of neurons in the input layer is determined by the specific route. The second layer is the first hidden layer, with 32 neurons. The third layer is the second hidden layer, with 32 neurons. The fourth layer is the output layer, with 1 neuron, which outputs the predicted air ticket price. Between the input layer and the first hidden layer, and between the two hidden layers, a nonlinear transformation is introduced by a nonlinear activation function; here the ReLU activation function is used. The invention uses the root mean square error (RMSE) as the loss function for evaluating the error of the model on the training set; the RMSE is calculated as follows:
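RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - P_{static,i}\right)^2}
where n is the number of training samples, y_i denotes the true price of the i-th sample, and P_{static,i} denotes the corresponding predicted price. After the error is computed, Adam is used as the optimizer to back-propagate the error and update the parameters of the model.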
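For reference, the standard RMSE computation can be written in a few lines. This is a generic sketch of the loss, with a hypothetical function name and made-up prices, not code taken from the embodiment.

```python
import math


def rmse(true_prices, predicted_prices):
    """Root mean square error between true and predicted prices."""
    n = len(true_prices)
    squared = sum((y - p) ** 2 for y, p in zip(true_prices, predicted_prices))
    return math.sqrt(squared / n)


# Hypothetical prices: each prediction is off by exactly 3.
error = rmse([803.0, 823.0], [800.0, 820.0])
print(error)
```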
Preferably, the flight continuous features include at least the flight duration, the takeoff month, the day of the month of takeoff, whether the takeoff day is a holiday, whether the takeoff day is a weekend, whether the flight crosses into the next day, the number of stops, the airport construction fee, the fuel surcharge, the tax, whether the flight is a code-share flight, and the number of days from the takeoff date.
In the embodiment of the invention, the continuous features of the flight include the flight duration, the takeoff month, the day of the month of takeoff, whether the flight crosses into the next day, the number of stops, the airport construction fee, the fuel surcharge, the tax, and whether the flight is a code-share flight. Flight F has a duration of 90 minutes, takes off on March 20, crosses into the next day (1), has 0 stops, has an airport construction fee of 50 yuan, a fuel surcharge and tax of 0 yuan each, and is a code-share flight (1 if so, 0 otherwise). The continuous feature vector of flight F is therefore [90, 3, 20, 1, 0, 50, 0, 0, 1].
Preferably, the price continuous features include, in addition to the contents of the flight continuous features, at least the price sequence near the query date and, for each price in the sequence, the number of days from the takeoff date.
In the embodiment of the invention, the price continuous features are set in the same way as the continuous features in the flight-feature-based prediction method, with the addition of the price sequence of the days up to the query date and the corresponding sequence of day differences. For example, the price sequence of the last 3 days up to the query date Q is {P_{Q-2}, P_{Q-1}, P_Q}, and the corresponding differences in days from the takeoff date are {d_{Q-2}, d_{Q-1}, d_Q}. In this example, the query date is March 5, the price sequence of the last 3 days is {863, 884, 845}, and the differences in days from the takeoff date are {17, 16, 15}. The continuous features of this example are [90, 3, 20, 1, 0, 50, 0, 0, 1, 863, 17, 884, 16, 845, 15].
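The layout of the example vector suggests that each price is followed directly by its day difference. The sketch below reproduces that interleaving; the function name is hypothetical and the ordering is inferred from the example values rather than stated in the text.

```python
def build_price_continuous(flight_continuous, prices, day_diffs):
    """Append (price, days-from-takeoff) pairs to the flight continuous features."""
    features = list(flight_continuous)
    for price, diff in zip(prices, day_diffs):
        features += [price, diff]
    return features


price_features = build_price_continuous(
    [90, 3, 20, 1, 0, 50, 0, 0, 1],   # continuous features of flight F
    [863, 884, 845],                  # prices on Q-2, Q-1, Q
    [17, 16, 15],                     # days from takeoff for each price
)
print(price_features)
```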
Preferably, the price discrete features are the same as the flight discrete features and include at least the takeoff time, the arrival time, the takeoff airport, the arrival airport, the airline, and the actual operating airline, wherein the takeoff time and the arrival time are subjected to segmented one-hot encoding and the other discrete features are subjected to ordinary one-hot encoding.
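Ordinary one-hot encoding of a categorical feature such as the airline can be sketched as follows, using the four airlines a, b, c and d from the embodiment's example. The helper name `one_hot` and the fixed category order are illustrative assumptions.

```python
AIRLINES = ["a", "b", "c", "d"]  # category order assumed for illustration


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


print(one_hot("c", AIRLINES))
```

The same helper applies to takeoff and arrival airports; the length of each code equals the number of categories on the route.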
The machine learning model may alternatively be built as a CART regression tree, a kind of decision tree.
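To illustrate what a CART regression tree does at each node, the sketch below fits a single split (a depth-one tree) by choosing the threshold that minimises the summed squared error of the two resulting groups. It is a simplified stand-in for a full tree; the function names, the single feature (days from takeoff) and the prices are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the single threshold on x that minimises the summed squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical data: prices are higher close to takeoff.
stump = fit_stump([1, 2, 3, 10, 11, 12], [900, 910, 905, 600, 610, 605])
print(stump, predict_stump(stump, 2), predict_stump(stump, 11))
```

A full CART tree applies this split search recursively to each group and over all features.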
Preferably, the recent price sequence consists of the lowest prices of flight F on each of the T days preceding the query date Q: {P_{Q-T}, P_{Q-(T-1)}, P_{Q-(T-2)}, ..., P_{Q-3}, P_{Q-2}, P_{Q-1}}.
Preferably, the segmented one-hot encoding of the takeoff time and the arrival time is performed as follows: the 24 hours of a day are divided into four time periods, [0:00-10:00], [10:00-14:00], [14:00-19:00] and [19:00-24:00], and the takeoff time and the arrival time are each one-hot encoded according to the period in which they fall.
In the embodiment of the invention, the takeoff time and the arrival time are one-hot encoded after being assigned to one of four time periods (0:00-10:00, 10:00-14:00, 14:00-19:00 and 19:00-24:00). For example, a takeoff time of 15:35 falls in the third period, so its one-hot code is [0, 0, 1, 0] (the third position is 1 and the remaining positions are 0).
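The segmented encoding can be sketched as a lookup over the period boundaries. The text does not say which period a boundary time such as 10:00 belongs to, so the sketch assumes half-open periods (a boundary time goes to the later period); the function name and the "HH:MM" string format are also assumptions.

```python
# Upper bounds of the four periods, in minutes after midnight.
PERIOD_END_MINUTES = [10 * 60, 14 * 60, 19 * 60, 24 * 60]


def encode_time(hhmm):
    """One-hot encode an 'HH:MM' time by the period in which it falls."""
    hour, minute = map(int, hhmm.split(":"))
    minutes = hour * 60 + minute
    for index, end in enumerate(PERIOD_END_MINUTES):
        if minutes < end:  # assumed half-open periods [start, end)
            code = [0] * len(PERIOD_END_MINUTES)
            code[index] = 1
            return code
    raise ValueError(f"time out of range: {hhmm!r}")


print(encode_time("15:35"))
```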
Preferably, the multilayer perceptron comprises:
an input layer comprising a plurality of neurons, the number of which is determined by the actual route; in the flight feature prediction model the input layer receives the flight continuous features and the encoded flight discrete features, and in the price sequence prediction model it receives the price continuous features and the encoded price discrete features;
a first hidden layer h_1 with 32 neurons in total, each connected to the input layer, obtained from the input layer by the nonlinear transformation h_1 = ReLU(w_1 x + b_1), wherein w_1 is the connection coefficient, b_1 is the bias, and ReLU is the rectified linear unit activation function; in the flight feature prediction model, x represents the flight continuous features and the encoded flight discrete features; in the price sequence prediction model, x represents the price continuous features and the encoded price discrete features;
a second hidden layer h_2 with 32 neurons in total, each fully connected to the first hidden layer, obtained from the first hidden layer by the nonlinear transformation h_2 = ReLU(w_2 h_1 + b_2), wherein w_2 is the connection coefficient, b_2 is the bias, and ReLU is the rectified linear unit activation function;
an output layer P with 1 neuron, fully connected to the second hidden layer, obtained from the second hidden layer by the linear transformation P = w_3 h_2 + b_3, wherein w_3 is the connection coefficient, b_3 is the bias, and P is the output predicted ticket price.
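The forward pass through these layers can be written out directly to show the arithmetic. The sketch below is framework-free for illustration only, whereas the embodiment states that the models are implemented with PyTorch; the function names, the random initialisation range and the 22-element input size are assumptions, and the weights are untrained.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def linear(weights, biases, inputs):
    """Compute w x + b for one fully connected layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_mlp(n_inputs, hidden=32, seed=0):
    """Input layer -> 32 -> 32 -> 1, with untrained random weights."""
    rng = random.Random(seed)
    return [init_layer(hidden, n_inputs, rng),
            init_layer(hidden, hidden, rng),
            init_layer(1, hidden, rng)]


def mlp_forward(layers, x):
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(linear(w1, b1, x))      # h_1 = ReLU(w_1 x + b_1)
    h2 = relu(linear(w2, b2, h1))     # h_2 = ReLU(w_2 h_1 + b_2)
    return linear(w3, b3, h2)[0]      # P = w_3 h_2 + b_3


layers = build_mlp(n_inputs=22)       # 22 matches the flight F example input
price = mlp_forward(layers, [8, 90, 3, 20, 1, 0, 50, 0, 0, 1,
                             0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0])
print(price)
```

The printed value is meaningless until the weights are trained; the point is only the shape of the computation, with the input size varying by route.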
Compared with the prior art, the invention has the following advantages and technical effects:
The method combines two types of prediction models and draws on the strengths of both: the flight-feature-based prediction model mitigates the accumulated error of the price-sequence-based prediction model when predicting ticket prices far into the future, and the price-sequence-based prediction model compensates for the insufficient accuracy of the flight-information-based prediction model when recent prices fluctuate sharply.
By combining the two types of prediction models to predict the future price of a flight, the invention provides passengers with a reference for ticket purchase, so that they can choose a better purchasing plan according to the predicted price trend. The predicted price trend can also serve as a reference for airlines, helping them adjust their pricing to increase profit.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.