CN107506435B - Special price air ticket query method based on price prediction - Google Patents

Special price air ticket query method based on price prediction

Info

Publication number
CN107506435B
Authority
CN
China
Prior art keywords
news
date
price
air ticket
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710729735.1A
Other languages
Chinese (zh)
Other versions
CN107506435A (en
Inventor
胡婕茹
李尚锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huoli Tianhui Technology Co ltd
Original Assignee
Shenzhen Huoli Tianhui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huoli Tianhui Technology Co ltd
Priority to CN201710729735.1A
Publication of CN107506435A
Application granted
Publication of CN107506435B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination

Abstract

The invention discloses a special-price air ticket query method based on price prediction, belonging to the technical field of air ticket query. In the method, before an air ticket price query is carried out, the ticket price at the query time is predicted from historical air ticket price data; the query is carried out if the predicted price is lower than a set threshold, and otherwise no query is made. Because prediction precedes the actual query, and the query is carried out only when the prediction shows that the ticket price has fallen below the expected price, the probability of finding the required special-price ticket in an actual query is greatly improved, queries that would fail to find such a ticket are avoided, and the query cost is reduced.

Description

Special price air ticket query method based on price prediction
Technical Field
The invention relates to the field of air ticket price inquiry, in particular to a special price air ticket inquiry method based on price prediction.
Background
Traveling abroad has become an important way for people to relax and enjoy themselves. International air tickets account for a significant share of travel costs, so travelers want to buy relatively cheap tickets. To offer users low-price tickets from all airlines over a future period, travel websites set up special-price ticket channels, which supply relatively cheap tickets by caching recently queried ticket prices and by manually configuring airline promotions. These measures are limited by the volume of user searches and by how many promotions can be discovered manually; they work well for popular routes but poorly for unpopular ones.
Special-price tickets are tickets that are relatively cheap within a given period (typically 2-3 months). If all fares could be obtained in time, finding a relatively cheap one would not be difficult. In practice, however, a price query for an international ticket on a fixed route and date takes from several seconds to tens of seconds, and a query fee must be paid to the interface provider. For a single route, covering round trips that depart within two months and stay for a minimum of 2 days and a maximum of 15 days takes as many as 840 queries. There are hundreds of thousands of international routes, of which popular routes account for only 1%; for the remaining 99% of relatively unpopular routes, cached price data are lacking, so obtaining all prices in real time is very costly. The problem with current international ticket price queries is therefore the high cost of querying prices for relatively unpopular international routes.
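The figure of 840 queries per route can be reproduced with a short calculation. The sketch below assumes that "two months" means 60 candidate departure dates and that every whole-day stay length from 2 to 15 days is queried separately; the function name and parameters are illustrative and do not come from the patent.

```python
def count_round_trip_queries(window_days: int = 60,
                             min_stay: int = 2,
                             max_stay: int = 15) -> int:
    """Count (departure date, stay length) pairs to query for one route."""
    # Stay lengths 2..15 inclusive give 14 options per departure date.
    stay_options = max_stay - min_stay + 1
    return window_days * stay_options


print(count_round_trip_queries())  # 840
```

Under these assumptions, each additional route adds another 840 paid queries, which is why exhaustive querying of unpopular routes is costly.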
Disclosure of Invention
To solve the problems in the prior art, the invention provides a special-price air ticket query method based on price prediction, which predicts the ticket price before the actual special-price ticket query and carries out the actual query only if the predicted price has fallen to the expected price; otherwise no actual query is made. The query cost is thereby reduced.
To achieve this purpose, the invention adopts the following technical solution:
The invention provides a special-price air ticket query method based on price prediction, comprising:
before an air ticket price query is carried out, predicting the ticket price at the query time from historical air ticket price data, and carrying out the query if the predicted price is lower than a set threshold; otherwise, making no query.
Further, the historical air ticket price data comprise: the departure place, the destination, the operating carrier of the flight, the departure date, the departure time, the arrival time, the transit stopover time, the query time, the ticket prices of the different cabin classes, the ticket tax, the remaining ticket numbers of the different cabin classes, and the aircraft type operating the flight.
Further, the method for predicting the air ticket price at the query time from the historical air ticket price data comprises:
training on a data set consisting of the historical price data of the queried air ticket, using a regularized multiple linear regression method, to obtain a fitting formula for predicting the air ticket price. The fitting formula is:
$$y = \sum_{i=0}^{24} \theta_i x_i \qquad (1)$$
In formula (1), $y$ is the tax-exclusive ticket price of the queried cabin class, taken from the ticket prices of the different cabin classes; $\theta_i$ is the coefficient of $x_i$, $i = 0, 1, 2, \ldots, 24$; $x_0 = 1$; $x_1$ is the distance from the departure place to the destination, in kilometers; $x_2$ is the month of the departure date; $x_3$ is the day of the week of the departure date; $x_4$ is the number of minutes from 00:00 to the departure time; $x_5$ is the number of minutes from 00:00 to the arrival time; $x_6$ is the transit time in minutes; $x_7$ is the number of days from the query time to the departure time; $x_8$ is the passenger capacity of the flight, obtained from the aircraft type operating the flight; $x_9$ is the estimated aircraft price, obtained from the aircraft type operating the flight, in billion yuan; $x_{10}$ is the ticket tax, in yuan; $x_{11}$ is the average number of remaining tickets, obtained from the remaining ticket numbers of the different cabin classes, where a cabin class with more than 10 remaining tickets is counted as 10 (meaning that tickets are plentiful), so that the average ranges from 0 to 10;
$$x_i = x_{i-11}^2, \quad i = 12, 13, \ldots, 22;$$
$x_{23} = (10 - x_{11}) \times x_8 / 10$ is an estimate of the minimum number of free seats;
[Equation image BDA0001386832630000023, which by its position defines $x_{24}$, is not reproduced in this text.]
The tax-exclusive price of the ticket at the query time is predicted according to formula (1); in prediction, $x_{11}$ takes the value TNP, where TNP is the average number of remaining tickets at the query time, predicted from the historical price data of the queried air ticket. TNP is obtained as follows:
training on a data set consisting of the historical price data of the queried air ticket, using a regularized multiple linear regression method, to obtain a fitting formula for predicting the average number of remaining tickets at the query time. The fitting formula is:
$$z = \sum_{i=0}^{20} \alpha_i t_i \qquad (2)$$
In formula (2), $z$ is the average number of remaining tickets, obtained from the remaining ticket numbers of the different cabin classes; $\alpha_i$ is the coefficient of $t_i$, $i = 0, 1, 2, \ldots, 20$; $t_0 = 1$; $t_1$ is the distance from the departure place to the destination, in kilometers; $t_2$ is the month of the departure date; $t_3$ is the day of the week of the departure date; $t_4$ is the number of minutes from 00:00 to the departure time; $t_5$ is the number of minutes from 00:00 to the arrival time; $t_6$ is the transit time in minutes; $t_7$ is the number of days from the query time to the departure time; $t_8$ is the passenger capacity of the flight, obtained from the aircraft type operating the flight; $t_9$ is the estimated aircraft price, obtained from the aircraft type operating the flight, in billion yuan; $t_{10}$ is the ticket tax, in yuan;
$$t_i = t_{i-10}^2, \quad i = 11, 12, \ldots, 20.$$
The average number of remaining tickets TNP at the query time is predicted according to formula (2).
Further, when news influence is considered, the method for obtaining TNP comprises:
periodically crawling news websites with a crawler program, traversing the crawled news, obtaining the lowest-level navigation-bar category of each news item, and treating news items with the same lowest-level category as the same news; counting the number of news and the read count and comment count of each news;
when a news appears for the first time in a crawl cycle, traversing its title and body text and taking the most frequently occurring date as the start date of the news, or taking the crawl day as the start date if no date can be extracted; setting the Nth day after the start date as the end date of the news;
obtaining all place names in each news and taking the most frequently occurring place name as the place where the news occurs; finding the airport closest to that place as the occurrence airport, and finding the airports covered by the remaining place names; traversing all routes from the occurrence airport to the airports covered by the remaining place names, the routes whose distance is greater than a threshold $d_{min}$ being the affected routes of the news;
checking whether the news title contains a negative word: if it does, the influence of the news is negative; if it does not, the influence is positive. The negative words include: earthquake, tsunami, debris flow, war, and riot;
predicting the average number of remaining tickets at the query time with the following formula:
[Formula (3): equation image BDA0001386832630000041, not reproduced in this text.]
In formula (3), $TNP_0(date)$ is the average number of remaining tickets on date when news influence is not considered, predicted by formula (2); $\mu$ is a set coefficient; $d$ is the distance of the queried route, in kilometers; $h(date)$ is the combined popularity, on date, of all news whose affected routes include the queried route, and is given by:
[Formula (4): equation image BDA0001386832630000042, not reproduced in this text.]
In formula (4), $h(date)$ is the combined popularity, on date, of all news affecting the queried route; $p(date)$ is the number of such news on date; $h(i, date)$ is the popularity of the $i$-th news on date; $f_i = 1$ when the influence is positive and $f_i = -1$ when it is negative; $d$ is the distance of the queried route, in kilometers; $m(i)$ is the number of affected routes of the $i$-th news; $d_{ij}$ is the distance of the $j$-th affected route of the $i$-th news; $d_{min}$ is the threshold for determining affected routes, in kilometers; power is the average number of flights per day on the queried route. The popularity $h(i, date)$ of the $i$-th news on date is given by:
[Formula (5): equation image BDA0001386832630000043, not reproduced in this text.]
In formula (5), $ss_i$ is the start date of the $i$-th news; $es_i$ is the end date of the $i$-th news; now is the date on which the news is crawled; $h(i, now)$ is the popularity of the $i$-th news on the date now, equal to the difference between the popularity total obtained in the current crawl and the popularity total obtained in the previous crawl, divided by the number of days between the two crawls, where the popularity total is a weighted sum of the read count and comment count of the news.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a special-price air ticket query method based on price prediction: before an air ticket price query is carried out, the ticket price at the query time is predicted from the historical data of the air ticket; the query is carried out if the predicted price is lower than a set threshold, and otherwise no query is made. Because the price is predicted before the actual query, and the actual query is carried out only when the predicted price is lower than the expected price, the cost of special-price ticket queries on unpopular routes is reduced.
Drawings
Fig. 1 shows prediction curves of the average number of remaining tickets: curve 1 and curve 2 are the prediction curves without and with news influence considered, respectively;
Fig. 2 shows the air ticket price prediction curve with news influence considered.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention provides a special-price air ticket query method based on price prediction, comprising:
before an air ticket price query is carried out, predicting the ticket price at the query time from historical air ticket price data, and carrying out the query if the predicted price is lower than a set threshold; otherwise, making no query.
In this embodiment, the ticket price at the query time is predicted before the air ticket price query is carried out; if the predicted price has already fallen to the expected price, the actual query is carried out, and if not, no query is made. The set threshold is the expected price, and its value can be set according to the user's circumstances. Historical price data accumulate through routine operation: the ticket price data for different routes obtained in everyday queries are stored in a historical database, which is continuously updated with the latest data. In practice, the method can predict the ticket price for a single day, or for each of the next several days to obtain a ticket price prediction curve. The curve shows the trend of the ticket price more clearly; in particular, when the departure date is not fixed, the date with the lowest predicted price can be chosen as the departure date.
In the prior art, the query is carried out directly without prediction, so the required special-price ticket may not be found and the query cost is wasted. In the present query method, prediction is carried out first, and the query is made only when the prediction shows that the ticket price has fallen below the expected price. The probability of finding the required special-price ticket in an actual query is therefore greatly improved, queries that would fail to find such a ticket are avoided, and the query cost is reduced.
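The predict-then-query rule of this embodiment can be sketched in a few lines. The function and parameter names below are hypothetical; `predict_price` stands for any price predictor (such as the fitted formula described later) and `run_paid_query` for the call to the paid fare interface.

```python
from typing import Callable, Optional


def query_if_cheap(predict_price: Callable[[], float],
                   run_paid_query: Callable[[], float],
                   threshold: float) -> Optional[float]:
    """Run the paid query only when the predicted price is below the threshold."""
    if predict_price() < threshold:
        return run_paid_query()  # the real, billable fare lookup
    return None                  # skipped: no query cost incurred
```

A caller would pass the user's expected price as `threshold`; a return value of `None` means the paid interface was never contacted.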
As an alternative embodiment, the historical air ticket price data comprise: the departure place, the destination, the operating carrier of the flight, the departure date, the departure time, the arrival time, the transit stopover time, the query time, the ticket prices of the different cabin classes, the ticket tax, the remaining ticket numbers of the different cabin classes, and the aircraft type operating the flight.
This embodiment specifies the content of the historical air ticket price data, which includes the departure place, destination, operating carrier, departure date, departure time, arrival time, transit stopover time, query time, ticket prices of the different cabin classes, ticket tax, remaining ticket numbers of the different cabin classes, aircraft type operating the flight, and the like. The cabin classes of an aircraft typically include first class, business class, and economy class, and finer subdivisions also exist. First class has the highest fare, business class the next highest, and economy class the lowest. The ticket prices of the different cabin classes are tax-exclusive prices. The remaining ticket numbers of the different cabin classes comprise the first-class, business-class, and economy-class remaining ticket numbers.
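One way to hold a single historical observation is a small record type. The sketch below is only an illustration: the class name, field names, and types are assumptions, since the patent lists the fields but does not prescribe a storage format.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict


@dataclass
class FareRecord:
    """One historical price observation (field names are hypothetical)."""
    origin: str
    destination: str
    carrier: str
    departure_date: date
    departure_minutes: int            # minutes after 00:00
    arrival_minutes: int              # minutes after 00:00
    transit_minutes: int
    query_time: datetime
    fare_by_cabin: Dict[str, float]   # tax-exclusive price per cabin class
    tax: float
    seats_left_by_cabin: Dict[str, int]
    aircraft_type: str

    def days_to_departure(self) -> int:
        """Whole days from the query date to the departure date."""
        return (self.departure_date - self.query_time.date()).days
```

Keeping the query time in each record is what allows the number of days between query and departure to be derived later as a regression input.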
As an alternative embodiment, the method for predicting the air ticket price at the query time from the historical air ticket price data comprises: training on a data set consisting of the historical price data of the queried air ticket, using a regularized multiple linear regression method, to obtain a fitting formula for predicting the air ticket price. The fitting formula is:
$$y = \sum_{i=0}^{24} \theta_i x_i \qquad (1)$$
In formula (1), $y$ is the tax-exclusive ticket price of the queried cabin class, taken from the ticket prices of the different cabin classes; $\theta_i$ is the coefficient of $x_i$, $i = 0, 1, 2, \ldots, 24$; $x_0 = 1$; $x_1$ is the distance from the departure place to the destination, in kilometers; $x_2$ is the month of the departure date; $x_3$ is the day of the week of the departure date; $x_4$ is the number of minutes from 00:00 to the departure time; $x_5$ is the number of minutes from 00:00 to the arrival time; $x_6$ is the transit time in minutes; $x_7$ is the number of days from the query time to the departure time; $x_8$ is the passenger capacity of the flight, obtained from the aircraft type operating the flight; $x_9$ is the estimated aircraft price, obtained from the aircraft type operating the flight, in billion yuan; $x_{10}$ is the ticket tax, in yuan; $x_{11}$ is the average number of remaining tickets, obtained from the remaining ticket numbers of the different cabin classes, where a cabin class with more than 10 remaining tickets is counted as 10 (meaning that tickets are plentiful), so that the average ranges from 0 to 10. For example, if the remaining ticket numbers of first class, business class, and economy class are 8, 9, and 10 respectively, the average number of remaining tickets is (8 + 9 + 10)/3 = 9;
$$x_i = x_{i-11}^2, \quad i = 12, 13, \ldots, 22;$$
$x_{23} = (10 - x_{11}) \times x_8 / 10$ is an estimate of the minimum number of free seats;
[Equation image BDA0001386832630000072, which by its position defines $x_{24}$, is not reproduced in this text.]
The tax-exclusive price of the ticket at the query time is predicted according to formula (1); in prediction, $x_{11}$ takes the value TNP, where TNP is the average number of remaining tickets at the query time, predicted from the historical price data of the queried air ticket. TNP is obtained as follows:
training on a data set consisting of the historical price data of the queried air ticket, using a regularized multiple linear regression method, to obtain a fitting formula for predicting the average number of remaining tickets at the query time. The fitting formula is:
$$z = \sum_{i=0}^{20} \alpha_i t_i \qquad (2)$$
In formula (2), $z$ is the average number of remaining tickets, obtained from the remaining ticket numbers of the different cabin classes; $\alpha_i$ is the coefficient of $t_i$, $i = 0, 1, 2, \ldots, 20$; $t_0 = 1$; $t_1$ is the distance from the departure place to the destination, in kilometers; $t_2$ is the month of the departure date; $t_3$ is the day of the week of the departure date; $t_4$ is the number of minutes from 00:00 to the departure time; $t_5$ is the number of minutes from 00:00 to the arrival time; $t_6$ is the transit time in minutes; $t_7$ is the number of days from the query time to the departure time; $t_8$ is the passenger capacity of the flight, obtained from the aircraft type operating the flight; $t_9$ is the estimated aircraft price, obtained from the aircraft type operating the flight, in billion yuan; $t_{10}$ is the ticket tax, in yuan;
$$t_i = t_{i-10}^2, \quad i = 11, 12, \ldots, 20.$$
The average number of remaining tickets TNP at the query time is predicted according to formula (2).
This embodiment provides a method for predicting the air ticket price from historical air ticket price data: a data set consisting of the historical data is trained with a regularized multiple linear regression method to obtain a fitting formula for the predicted ticket price. Regularized multiple linear regression is mature prior art, and its specific algorithm is not described here. The fitting formula is formula (1). During fitting, $y$ takes the tax-exclusive price of the queried cabin class. The right side of the formula is a linear combination of $x_i$, $i = 0, 1, 2, \ldots, 24$, where $\theta_i$ is the coefficient of $x_i$ obtained by fitting. $x_1$ to $x_{11}$ correspond to the historical data. $x_{11}$ is the average number of remaining tickets, obtained by summing the remaining ticket numbers of the cabin classes and taking the mean. Because a single purchase rarely exceeds 10 tickets, the average is taken as 10 when it would exceed 10, so the average number of remaining tickets ranges from 0 to 10. $x_{12}$ to $x_{22}$ are the squares of $x_1$ to $x_{11}$, respectively. $x_{23} = (10 - x_{11}) \times x_8 / 10$ is an estimate of the minimum number of free seats on the aircraft.
[Figure BDA0001386832630000081: image not reproduced in this text.]
When the tax-exclusive price of the ticket at the query time is predicted according to formula (1), $x_{11}$ no longer takes the average number of remaining tickets from the historical data, but the predicted average number of remaining tickets TNP at the query time. Accordingly, the $x_{11}$ contained in $x_{22}$, $x_{23}$, and $x_{24}$ also takes the value TNP.
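The construction of the regression inputs can be illustrated as follows. This is a sketch under stated assumptions: the helper names are hypothetical, remaining tickets are capped at 10 per cabin class, and $x_{24}$ is left out because its defining equation is not recoverable from this text, so the vector stops at $x_{23}$.

```python
from typing import Dict, List


def average_remaining(seats_left_by_cabin: Dict[str, int]) -> float:
    """Mean remaining tickets over cabin classes, counting each class as at most 10."""
    capped = [min(n, 10) for n in seats_left_by_cabin.values()]
    return sum(capped) / len(capped)


def price_features(base: List[float], x11: float) -> List[float]:
    """Return [x0, ..., x23] from base = [x1, ..., x10] and x11.

    x24 is omitted: its definition is not available in the source text.
    """
    if len(base) != 10:
        raise ValueError("base must hold exactly x1..x10")
    x = [1.0] + [float(v) for v in base] + [float(x11)]  # x0..x11
    x += [v * v for v in x[1:12]]                         # x12..x22 = squares of x1..x11
    x.append((10 - x[11]) * x[8] / 10)                    # x23: free-seat estimate
    return x
```

During training, `x11` would come from `average_remaining` on a historical record; at prediction time the same function is called with the predicted TNP in place of `x11`, which automatically propagates TNP into the squared term and into $x_{23}$.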
TNP is likewise obtained by training a data set consisting of the historical data with a regularized multiple linear regression method to obtain a fitting formula for the average number of remaining tickets at the query time, and then predicting with that formula. The fitting formula is formula (2). Compared with formula (1), the right side of formula (2) no longer contains the average number of remaining tickets or the variables derived from it, so the number of terms on the right side drops from the 25 of formula (1) to 21.
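The patent treats regularized multiple linear regression as known art and does not say which regularizer is used. The sketch below assumes an L2 (ridge) penalty, leaves the intercept term unpenalized, and solves the normal equations in plain Python; all function names are illustrative, and a numerical library would normally replace the hand-written solver.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(rows: List[List[float]], targets: List[float],
              lam: float = 1.0) -> List[float]:
    """Coefficients minimising sum (y - theta.x)^2 + lam * sum_{i>=1} theta_i^2."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    for i in range(1, k):          # index 0 is the constant term x0 = 1
        xtx[i][i] += lam
    return solve(xtx, xty)


def predict(theta: List[float], x: List[float]) -> float:
    """Evaluate the fitted linear combination for one feature vector."""
    return sum(t * v for t, v in zip(theta, x))
```

The same routine can serve both fits: once with the 21 inputs of formula (2) to obtain TNP, and once with the inputs of formula (1) to obtain the price. In practice the inputs would usually be rescaled first, since distances in kilometers and months differ by orders of magnitude.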
As an alternative embodiment, when news influence is considered, the method for obtaining TNP comprises:
S1, periodically crawling news websites with a crawler program, traversing the crawled news, obtaining the lowest-level navigation-bar category of each news item, and treating news items with the same lowest-level category as the same news; counting the number of news and the read count and comment count of each news;
In this step, a crawler program periodically crawls news websites, the crawled news items are traversed to obtain the lowest-level navigation-bar category of each, and news items with the same lowest-level category are treated as the same news. For example, for the navigation bar "sports > integrated sports > synchronized swimming > 2017 world swimming championship > text", the lowest-level category is "2017 world swimming championship". All news items whose lowest-level navigation-bar category is "2017 world swimming championship" belong to the same news. The number of news and the read count and comment count of each news are counted for the later calculation of news popularity.
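Step S1 can be sketched as a grouping by the last navigation-bar level. The sketch assumes the navigation bar is available as a single string with ">" separators and a trailing "text" entry, as in the example above; the dictionary keys `nav_bar`, `reads`, and `comments` are hypothetical names for the crawled fields.

```python
from typing import Dict, List


def smallest_category(nav_bar: str) -> str:
    """Return the last navigation level, ignoring a trailing 'text' entry."""
    parts = [p.strip() for p in nav_bar.split(">")]
    if parts and parts[-1].lower() == "text":
        parts = parts[:-1]
    return parts[-1] if parts else ""


def topic_stats(articles: List[dict]) -> Dict[str, Dict[str, int]]:
    """Aggregate article count, reads and comments per lowest-level category."""
    stats: Dict[str, Dict[str, int]] = {}
    for a in articles:
        topic = smallest_category(a["nav_bar"])
        s = stats.setdefault(topic, {"count": 0, "reads": 0, "comments": 0})
        s["count"] += 1
        s["reads"] += a["reads"]
        s["comments"] += a["comments"]
    return stats
```

The per-topic totals of reads and comments are the raw material for the popularity calculation in step S5.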
S2, when a news appears for the first time in a crawl cycle, traversing its title and body text and taking the most frequently occurring date as the start date of the news, or taking the crawl day as the start date if no date can be extracted; setting the Nth day after the start date as the end date of the news;
This step gives a technical solution for obtaining the start date and end date of a news: when a news appears for the first time in a crawl cycle, word segmentation is applied to all crawled titles and body texts of that news, and the most frequently occurring date is taken as its start date. If no date can be extracted, the crawl day is taken as the start date. Because a news event has generally not ended when the news is published, the end date cannot be obtained from the crawled text; it is instead set directly from the start date, that is, the Nth day after the start date is taken as the end date. Different news may use different values of N.
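A minimal sketch of step S2 follows. It assumes dates appear in the text in `YYYY-MM-DD` form and replaces word segmentation with a regular expression; real news text would need a broader date parser. The function name and parameters are illustrative.

```python
import re
from collections import Counter
from datetime import date, timedelta
from typing import Tuple

# Assumption: only ISO-style dates are recognised in this sketch.
DATE_RE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")


def news_period(title: str, body: str, crawl_day: date,
                n_days: int) -> Tuple[date, date]:
    """Return (start_date, end_date) for a news seen for the first time."""
    counts: Counter = Counter()
    for y, m, d in DATE_RE.findall(title + " " + body):
        try:
            counts[date(int(y), int(m), int(d))] += 1
        except ValueError:       # e.g. month 13: not a real calendar date
            continue
    # Most frequent date wins; fall back to the crawl day if none was found.
    start = counts.most_common(1)[0][0] if counts else crawl_day
    return start, start + timedelta(days=n_days)
```

The value passed as `n_days` is the per-news N described above and would be chosen according to how long the event is expected to last.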
S3, obtaining all place names in each news and taking the most frequently occurring place name as the place where the news occurs; finding the airport closest to that place as the occurrence airport, and finding the airports covered by the remaining place names; traversing all routes from the occurrence airport to the airports covered by the remaining place names, the routes whose distance is greater than a threshold $d_{min}$ being the affected routes of the news;
This step gives a technical solution for determining the affected routes of each news. First, all place names appearing in the news are obtained through word segmentation, and the most frequently occurring one is taken as the place where the news occurs. Then, a map API (Application Programming Interface) and airport latitude and longitude data are used to find the airport closest to that place. The remaining place names are matched against an airport information table to obtain the airports they cover. Finally, the route distance from the airport closest to the place of occurrence to each airport covered by the remaining place names is calculated, and the routes whose distance is greater than the threshold $d_{min}$ are the affected routes of the news. In this way the affected routes of each news are obtained. Only routes longer than the threshold are taken as affected routes, mainly because when the distance is too short travelers may choose modes of transport other than aircraft, such as trains or coaches.
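The geometric part of step S3 can be sketched as below. Several things are assumed: the place of occurrence has already been geocoded to latitude and longitude (the patent uses a map API for this), route distance is approximated by great-circle distance, and the airport table is a plain dictionary from airport code to coordinates. All names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def nearest_airport(place: Coord, airports: Dict[str, Coord]) -> str:
    """Code of the airport closest to the given place."""
    return min(airports, key=lambda code: haversine_km(place, airports[code]))


def affected_routes(event_place: Coord, other_airports: List[str],
                    airports: Dict[str, Coord],
                    d_min: float) -> List[Tuple[str, str, float]]:
    """Routes from the occurrence airport that are longer than d_min km."""
    origin = nearest_airport(event_place, airports)
    routes = []
    for code in other_airports:
        if code == origin:
            continue
        dist = haversine_km(airports[origin], airports[code])
        if dist > d_min:         # short hops are excluded, as described above
            routes.append((origin, code, dist))
    return routes
```

The returned distances correspond to the $d_{ij}$ values used later in formula (4).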
S4, checking whether the news title contains a negative word: if it does, the influence of the news is negative; if it does not, the influence is positive. The negative words include: earthquake, tsunami, debris flow, war, and riot;
This step gives a method for judging the nature of a news item's influence: the news title is checked for negative words; if it contains one, the influence is negative, and otherwise it is positive. The negative words include: earthquake, tsunami, debris flow, war, and riot. Only a few commonly used negative words are given here; the list is not limited to these, and other similar negative words can be added. In addition, when negative words are identified through word segmentation, synonyms and near-synonyms are also recognized, for example different words for war and armed unrest. Words that are synonyms or near-synonyms of the negative words above are also treated as negative words.
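Step S4 reduces to a keyword test that yields the sign factor $f_i$. The sketch below uses an English stand-in word list and whole-word matching with an optional plural "s"; the patent works on segmented Chinese text with synonym recognition, which this example does not attempt. The names are illustrative.

```python
import re

# Illustrative English stand-ins for the negative words; extend as needed.
NEGATIVE_WORDS = ("earthquake", "tsunami", "debris flow", "war", "riot")

_NEGATIVE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in NEGATIVE_WORDS) + r")s?\b",
    re.IGNORECASE,
)


def influence_sign(title: str) -> int:
    """Return -1 if the title contains a negative word, otherwise +1."""
    return -1 if _NEGATIVE_RE.search(title) else 1
```

Whole-word matching is a deliberate choice here so that, for example, "award" is not mistaken for "war".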
S5, predicting the average remaining ticket number of the query time, wherein the calculation formula is as follows:
Figure BDA0001386832630000101
in the formula (3), TNP0(date) is the average number of remaining tickets for date when news influence is not considered, predicted from the formula (2); mu is a set coefficient; d is the route distance of the inquired route, and the unit is kilometer; h (date) is the combined heat of all news on date and date for the inquired airline which affects the airline, and the solution formula is as follows:
Figure BDA0001386832630000102
in equation (4), h (date) is the integrated heat of the date of all news that affects the airline in which the airline is queried; p (date) is the number of the news at date; h (i, date) is the popularity of the ith news in date, and f is positive influenceiWhen negative, f is 1i-1; d is the route distance of the inquired route, and the unit is kilometer; m (i) the number of affected routes for the ith news; dijThe route distance of the j th influencing route of the ith news; dminThe unit of the threshold value for judging the influence air route is kilometers; power is the average number of flights per day for the airline being queried; the solving formula of the heat h (i, date) of the ith news at date is as follows:
Figure BDA0001386832630000103
in formula (5), ss_i is the start date of the i-th news item; es_i is the end date of the i-th news item; now is the date on which the news was captured; h(i, now) is the heat of the i-th news item on the date now, equal to the difference between the heat sum obtained in the current capture and the heat sum obtained in the previous capture, divided by the number of days between the two captures, where the heat sum is the weighted sum of the reading count and the comment count of the news.
This step provides a method for predicting the average number of remaining tickets at the query time, given by formula (3). The right side of formula (3) is the difference of two parts: one part is the average number of remaining tickets TNP0(date) without considering news influence, obtained from formula (2); the other part is a correction that accounts for news influence. The correction depends on TNP0(date), the route distance d of the queried route, and the integrated heat h(date), at the query time, of all news affecting the queried route. The route distance d appears in the denominator because the smaller d is, the larger the affected share of the whole route, and the greater the influence. μ is a set coefficient with a reference value of 5000 to 15000.
h(date) is calculated by formula (4). The numerator on the right side of formula (4) superimposes the heat h(i, date) of each news item on date: when the influence of the news is positive, h(i, date) is multiplied by the factor f_i = 1; when it is negative, h(i, date) is multiplied by the factor f_i = -1. All routes affected by each news item are also taken into account in the superposition. The superimposed heat is divided by the average number of flights per day on the queried route to obtain h(date).
The heat of the i-th news item on date is solved by formula (5). The right side of formula (5) contains two factors: one is the heat h(i, now) of the i-th news item on the date now on which the news was captured, and the other is a factor related to the differences between the query date and the start and end dates of the i-th news item. h(i, now) equals the difference between the heat sum of the current capture and that of the previous capture, divided by the number of days between the two captures, where the heat sum is the weighted sum of the reading count and the comment count of the news. For example, if the current capture is on 2016-10-18 with a heat sum of 81543 and the previous capture was on 2016-10-15 with a heat sum of 77843, then h(i, now) = (81543 - 77843)/3 ≈ 1233.
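The daily heat h(i, now) described above is simple arithmetic on two captures, and can be sketched as below. The function names `heat_sum` and `daily_heat` and the default weights of 1.0 are assumptions for illustration; the text states only that the heat sum is a weighted sum of reading count and comment count and does not give the weights.

```python
from datetime import date


def heat_sum(reads, comments, w_read=1.0, w_comment=1.0):
    """Weighted sum of reading count and comment count (weights are assumed)."""
    return w_read * reads + w_comment * comments


def daily_heat(current_sum, previous_sum, current_date, previous_date):
    """Heat h(i, now): growth of the heat sum per day between two captures."""
    days = (current_date - previous_date).days
    if days <= 0:
        raise ValueError("the current capture must be later than the previous one")
    return (current_sum - previous_sum) / days


# Figures from the example in the text: captures on 2016-10-15 and 2016-10-18.
h_now = daily_heat(81543, 77843, date(2016, 10, 18), date(2016, 10, 15))
print(round(h_now, 1))  # 1233.3
```

With the figures from the text, the heat sum grows by 3700 over 3 days, giving roughly 1233 per day.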
The following provides experimental data obtained by applying the method of the invention to forecast the air ticket price.
On Tuesday, August 1, 2017, the economy-class price of flight MU505 from Shanghai to Hong Kong is predicted. First, the average number of remaining tickets of each cabin class is predicted without considering news; the resulting curve of the future average number of remaining tickets is shown as curve 1 in Fig. 1 (the horizontal axes of Fig. 1 and Fig. 2 both indicate the number of days from August 1, 2017). Then, the curve of the average number of remaining tickets under the influence of one news item is predicted. Suppose the news, obtained in the latest round of news capture, is that a "Century Global Shopping Festival" will be held in Hong Kong starting on September 1, 2017 and ending on September 10, 2017; the place names mentioned include Shanghai, Beijing, Hangzhou, Chengdu, and others; and the average daily heat of the news reaches 40 million. The Hong Kong airport is 1995 km from the Beijing Capital airport, 1257 km from the Shanghai airport, 1096 km from the Hangzhou airport, and 1352 km from the Chengdu airport, and the threshold for judging an affected route is set to d_min = 300 km. The curve of the average number of remaining tickets of the different cabin classes under the influence of this news is shown as curve 2 in Fig. 1. Evidently, the average number of remaining tickets drops significantly on the day the news event occurs and for several days before and after it. Finally, the change in the ticket price with news influence taken into account is predicted; the curve is shown in Fig. 2.
The above description is only for the purpose of illustrating a few embodiments of the present invention, and should not be taken as limiting the scope of the present invention, in which all equivalent changes, modifications, or equivalent scaling-up or down, etc. made in accordance with the spirit of the present invention should be considered as falling within the scope of the present invention.

Claims (1)

1. A special price air ticket query method based on price prediction, characterized in that: before an air ticket price query is carried out, the air ticket price at the query time is predicted from historical air ticket price data; the query is carried out if the predicted price is lower than a set threshold, and is not carried out otherwise; the historical air ticket price data comprise: the departure place, the destination, the operating airline, the departure date, the departure time, the arrival time, the transit stop time, the query time, the ticket prices of different cabin classes, the ticket tax, the numbers of remaining tickets of different cabin classes, and the type of aircraft operating the flight;
the method for predicting the air ticket price at the query time according to the historical data of the air ticket price comprises the following steps:
training a data set consisting of historical data of the inquired air ticket prices by utilizing a regularized multiple linear regression method to obtain a fitting formula for predicting the air ticket prices; the fitting equation is expressed as follows:
Figure FDA0002459590900000011
in formula (1), y is the tax-exclusive ticket price of the queried cabin class, obtained from the ticket prices of different cabin classes; θ_i is the coefficient of x_i, i = 0, 1, 2, …, 24; x_0 = 1; x_1 is the distance from the departure place to the destination, in kilometers; x_2 is the month of the departure date; x_3 is the day of the week of the departure date; x_4 is the number of minutes from 00:00 to the departure time; x_5 is the number of minutes from 00:00 to the arrival time; x_6 is the transit time in minutes; x_7 is the number of days from the query time to the departure time; x_8 is the passenger capacity of the flight, obtained from the type of aircraft operating the flight; x_9 is the estimated aircraft price, obtained from the type of aircraft operating the flight, in units of 100 million yuan; x_10 is the ticket tax, in yuan; x_11 is the average number of remaining tickets, obtained from the numbers of remaining tickets of different cabin classes, where a cabin class with more than 10 remaining tickets is counted as 10 (meaning that tickets are plentiful), so that the average ranges from 0 to 10;
Figure FDA0002459590900000012
i = 12, 13, …, 22; x_23 = (10 - x_11) × x_8 / 10, an estimate of the minimum number of free seats;
Figure FDA0002459590900000013
predicting the tax-exclusive ticket price at the query time according to formula (1), where at prediction time x_11 takes the value TNP, the average number of remaining tickets at the query time predicted from the historical data of the queried air ticket prices; the method for solving TNP comprises the following steps:
training a data set consisting of historical data of the inquired air ticket prices by utilizing a regularized multiple linear regression method to obtain a fitting formula of the average remaining ticket number of the predicted inquiry time; the fitting equation is expressed as follows:
Figure FDA0002459590900000021
in formula (2), z is the average number of remaining tickets, obtained from the numbers of remaining tickets of different cabin classes; α_i is the coefficient of t_i, i = 0, 1, 2, …, 20; t_0 = 1; t_1 is the distance from the departure place to the destination, in kilometers; t_2 is the month of the departure date; t_3 is the day of the week of the departure date; t_4 is the number of minutes from 00:00 to the departure time; t_5 is the number of minutes from 00:00 to the arrival time; t_6 is the transit time in minutes; t_7 is the number of days from the query time to the departure time; t_8 is the passenger capacity of the flight, obtained from the type of aircraft operating the flight; t_9 is the estimated aircraft price, obtained from the type of aircraft operating the flight, in units of 100 million yuan; t_10 is the ticket tax, in yuan;
Figure FDA0002459590900000022
i=11,12,…,20;
predicting the average remaining ticket number TNP of the query time according to the formula (2), which specifically comprises the following steps:
periodically capturing news websites by using a crawler program, traversing the captured news, obtaining the lowest-level navigation-bar category of each news item, and treating news items with the same lowest-level navigation-bar category as the same news; counting the number of news items and the reading count and comment count of each news item;
when a certain news appears for the first time in a capturing period, traversing related news titles and texts, taking the date with the highest frequency of appearance as the start date of the news, and taking the capturing day as the start date if the date acquisition fails; setting N days after the start date as the end date of the news;
acquiring all place names in each news item, taking the place name with the highest frequency of occurrence as the place where the news event occurs, finding the airport closest to that place as the event airport, finding the airports covered by the remaining place names, and traversing all routes from the event airport to the airports covered by the remaining place names, wherein the routes whose route distance is greater than the threshold d_min are the affected routes of the news;
checking whether the news title contains a negative word: if it does, the influence of the news is negative; if it does not, the influence is positive; the negative words include: earthquake, tsunami, debris flow, war, and riot;
predicting the average remaining ticket number of the query time, wherein the calculation formula is as follows:
Figure FDA0002459590900000031
in formula (3), TNP0(date) is the average number of remaining tickets on date when news influence is not considered, predicted from formula (2); μ is a set coefficient; d is the route distance of the queried route, in kilometers; h(date) is the integrated heat, on date, of all news affecting the queried route, and its solution formula is as follows:
Figure FDA0002459590900000032
in formula (4), h(date) is the integrated heat, on date, of all news affecting the queried route; p(date) is the number of such news items on date; h(i, date) is the heat of the i-th news item on date; f_i = 1 when its influence is positive and f_i = -1 when it is negative; d is the route distance of the queried route, in kilometers; m(i) is the number of routes affected by the i-th news item; d_ij is the route distance of the j-th route affected by the i-th news item; d_min is the threshold for judging an affected route, in kilometers; power is the average number of flights per day on the queried route; the heat h(i, date) of the i-th news item on date is solved by the following formula:
Figure FDA0002459590900000033
in formula (5), ss_i is the start date of the i-th news item; es_i is the end date of the i-th news item; now is the date on which the news was captured; h(i, now) is the heat of the i-th news item on the date now, equal to the difference between the heat sum obtained in the current capture and the heat sum obtained in the previous capture, divided by the number of days between the two captures, where the heat sum is the weighted sum of the reading count and the comment count of the news.
CN201710729735.1A 2017-08-23 2017-08-23 Special price air ticket query method based on price prediction Active CN107506435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710729735.1A CN107506435B (en) 2017-08-23 2017-08-23 Special price air ticket query method based on price prediction

Publications (2)

Publication Number Publication Date
CN107506435A CN107506435A (en) 2017-12-22
CN107506435B true CN107506435B (en) 2020-07-07

Family

ID=60692486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710729735.1A Active CN107506435B (en) 2017-08-23 2017-08-23 Special price air ticket query method based on price prediction

Country Status (1)

Country Link
CN (1) CN107506435B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472399A (en) * 2018-10-23 2019-03-15 上海交通大学 Consider the air ticket purchase decision method and system of uncertainty in traffic
CN112561594A (en) * 2020-12-22 2021-03-26 北京天九共享航空服务咨询集团有限公司 Method for generating quoted price information of aircraft customized service

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022825A (en) * 2015-07-22 2015-11-04 中国人民解放军国防科学技术大学 Financial variety price prediction method capable of combining financial news mining and financial historical data
CN106030626A (en) * 2013-12-11 2016-10-12 天巡有限公司 Method and system for providing fare availabilities, such as air fare availabilities
CN106104615A (en) * 2013-12-11 2016-11-09 天巡有限公司 For providing method and the server of one group of price evaluation value, such as air fare price evaluation value
CN106355428A (en) * 2016-08-16 2017-01-25 吕栋雷 Method for predicting timing for airline ticket purchase
CN106682934A (en) * 2016-11-18 2017-05-17 云南电网有限责任公司电力科学研究院 Bidding strategy for electricity purchase

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant