CN106779214B - Multi-factor fusion civil aviation passenger trip prediction method based on topic model - Google Patents
- Publication number: CN106779214B
- Application number: CN201611159984.3A
- Authority: CN (China)
- Prior art keywords
- passenger
- airline
- passengers
- theme
- model
- Legal status: Active
Classifications
- G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem" (under G06Q10/00, Administration; Management)
- G06Q50/40: Business processes related to the transportation industry (under G06Q50/00, Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors)
Abstract
A multi-factor fusion civil aviation passenger trip prediction method based on a topic model. The invention first constructs an association graph among passengers and performs topic modeling of passenger preferences. This yields a Passenger Graph based Travel Topic Model (PGTTM), which enriches topic information and alleviates the sparsity of civil aviation data. Second, a Bayesian probability model is used to build a multi-factor fusion prediction framework. The framework predicts passengers' future trips by fusing four factors: route popularity, the passenger route preference obtained from PGTTM, passenger loyalty and airline market share. The invention can predict the airlines and routes of passengers' future trips, provide decision support for the aviation and related industries, and offer personalized services to passengers.
Description
Technical Field
The invention belongs to the technical field of computer applications and relates to data mining and civil aviation data analysis. In particular, it relates to a multi-factor fusion civil aviation passenger trip prediction method based on a topic model.
Background
Rising living standards and the development of the Internet have caused large amounts of booking data to accumulate in civil aviation passenger booking systems. These data are massive, sparse and long-tailed, which makes civil aviation data analysis challenging. One of the most important tasks in this field is to analyze passengers' travel characteristics from such data and predict their future travel behavior. Research on civil aviation passengers, in China and abroad, is still at a preliminary stage, and little work has addressed the prediction of civil aviation passenger trips.
Existing research on civil aviation data includes the following. Maalouf and Mansour applied cluster analysis and association rules to real frequent-flyer data from airlines and proposed recommendations and improvement strategies for customer relationship management[1]. Wangchun et al. combined questionnaire surveys with statistical methods to analyze the consumption motivation, airline preferences and purchasing behavior of civil aviation passengers in Changchun[2]. Feng et al. constructed a heterogeneous information network from civil aviation data and used random walks to discover the value of infrequent passengers[3]. Etzioni et al. explored the relationship between purchase time and ticket price and used a multi-strategy data mining algorithm to tell passengers the best time to buy tickets[4].
Among topic models, LDA (latent Dirichlet allocation) models text topics well and is highly extensible[5]. Rosen-Zvi et al. proposed the LDA-based Author-Topic Model (ATM), which models authors, documents and words jointly[6]. Blei and McAuliffe proposed a supervised LDA model for text classification, adding the document labels of the training corpus to LDA as observed variables[7]. Extended topic models and LDA have also been applied to recommendation. Liu et al. incorporated implicit features of travel package data into a topic model and proposed a personalized travel package recommendation method[8]. Tan et al. represented passenger information as feature-value pairs, learned passengers' latent interest distributions with a topic model, and recommended travel packages in combination with collaborative filtering[9].
Social relationships among passengers benefit modeling. For example, one study built a co-travel network and proposed a civil aviation passenger seat preference modeling method that combines individual passenger preferences with relationship preferences[10]. Zhou et al. proposed a semi-supervised relationship classification algorithm based on an information graph, which yields more accurate passenger relationships and supports targeted, high-quality services[11].
It is therefore worth applying topic models to the analysis and prediction of civil aviation passenger travel, since they can discover latent topic distributions and cope with massive data. Integrating the relationships among passengers into topic modeling can enrich topic information and reduce sparsity, which improves the modeling effect. In addition, a probabilistic model framework that fuses multiple factors influencing travel is expected to improve prediction further.
References:
[1] Maalouf L, Mansour N. Mining airline data for CRM strategies[C]//Proceedings of the 7th WSEAS International Conference on Simulation, Modeling and Optimization. Beijing, China, 2007: 345-350.
[2] Civil aviation passenger characterization and behavioral analysis[D]. Jilin University, 2010.
[3] Feng X, Xu B Y, Lu M, et al. Infrequent passenger value discovery by random walk on passenger-route heterogeneous network[J]. Journal of Computational and Theoretical Nanoscience, 2015, 2(1): 10-17.
[4] Etzioni O, Tuchinda R, et al. To buy or not to buy: mining airfare data to minimize ticket purchase price[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, USA, 2003: 119-128.
[5] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[6] Rosen-Zvi M, Griffiths T, Steyvers M, et al. The author-topic model for authors and documents[C]//Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2004: 487-494.
[7] Blei D M, McAuliffe J D. Supervised topic models[J]. Advances in Neural Information Processing Systems, 2010, 3: 327-332.
[8] Liu Q, Ge Y, Li Z, et al. Personalized travel package recommendation[C]//IEEE International Conference on Data Mining. IEEE Computer Society, 2011: 407-416.
[9] Tan C, Liu Q, Chen E, et al. Object-oriented travel package recommendation[J]. ACM Transactions on Intelligent Systems and Technology, 2014, 5(3): 1-26.
[10] Research on civil aviation passenger seat preference modeling and application[D]. Beijing Jiaotong University, 2015.
[11] Zhou Yuanwei. Design and implementation of a relation classification algorithm for civil aviation social networks[D]. Beijing Jiaotong University, 2013.
Disclosure of Invention
The invention addresses two properties of civil aviation passenger booking data: the data are massive, sparse and long-tailed, and the factors that influence travel are diverse. To this end, it provides a multi-factor fusion civil aviation passenger trip prediction method based on a topic model, which predicts the airlines and routes passengers will take in the future.
The invention uses a topic model to model passengers together with the airlines and routes they choose. By introducing a constructed passenger association graph, it proposes the passenger association graph travel topic model PGTTM (Passenger Graph based Travel Topic Model). PGTTM yields passengers' preferences for routes and airlines, enriches topic information, and alleviates the sparsity of civil aviation data.
A Bayesian probability model is then introduced to fuse four factors into a multi-factor fusion prediction framework: route popularity, the passenger route preference obtained from PGTTM, passenger loyalty and airline market share. The framework predicts and recommends the airlines and routes passengers will take in the future more accurately. This constitutes the main inventive content of the method.
Technical scheme of the invention
A multi-factor fusion civil aviation passenger trip prediction method based on a topic model comprises the following steps:
Step 1): construct the passenger association graph travel topic model. This step constructs a passenger association graph, performs topic modeling of passenger travel preferences, and finally obtains the passenger association graph travel topic model:
Step 1.1): construct the passenger association graph;
Constructing the graph means computing the association degree between passengers. The association degree is determined jointly by the route co-occurrence degree and the attribute co-occurrence degree. The route co-occurrence degree is determined by the number of routes two passengers have in common. The attribute co-occurrence degree reflects whether the two passengers have the same age, gender, average discount and average mileage. Age, average discount and average mileage are first discretized with a variance-based segmentation method;
Step 1.2): model passenger travel preferences with topics;
A topic model is applied to passengers and the airlines and routes they fly, to discover the latent topic distributions of passengers, routes and airlines. Combining a passenger's topic distribution with the topic distributions of routes and airlines then yields the passenger's travel preferences for routes and airlines;
Step 1.3): construct the passenger association graph travel topic model;
The passenger association graph from step 1.1) is incorporated into the topic modeling of step 1.2), giving the Passenger Graph based Travel Topic Model (PGTTM). When PGTTM assigns topics to the airlines and routes of a passenger, a topic may come from that passenger or from a passenger associated with them. This enriches the topic information, improves prediction performance, and mitigates the sparsity of civil aviation passenger travel data;
Step 2): build models for route popularity, passenger loyalty and airline market share. These supply prior knowledge that supports accurate prediction in the following steps:
Step 2.1): compute route popularity;
For route popularity, count the number of times a route is taken by all passengers, and the total number of times all routes are taken by all passengers. Route popularity is computed from these counts;
Step 2.2): compute passenger loyalty to airlines;
For passenger loyalty, count the number of times the passenger flies with an airline, and the total number of times the passenger flies with any airline. The passenger's loyalty to the airline is computed from these counts with smoothing;
Step 2.3): compute airline market share on routes;
For airline market share, count the number of times all passengers take a given airline-route word pair. Divide this by the number of times all passengers take that route, regardless of airline, to obtain the airline's market share on the route;
Step 3): fuse route popularity, passenger route preference, passenger loyalty and airline market share with a Bayesian probability model into a multi-factor fusion prediction framework, and predict the airlines and routes passengers will choose in the future:
Step 3.1): multi-factor fusion based on a Bayesian probability model;
Build a Bayesian probability model from four factors: the passenger route preference obtained by PGTTM in step 1), the route popularity from step 2.1), the passenger loyalty from step 2.2) and the airline market share from step 2.3). Fusing the four factors models passenger travel behavior better;
Step 3.2): multi-factor prediction based on the Bayesian probability model;
For each passenger and each airline-route word pair, compute the probability that the passenger takes that pair using the Bayesian probability function. Then, for each passenger, select the airline-route word pairs with the highest probabilities as the prediction and recommendation.
The advantages and positive effects of the invention are:
Proposing the passenger association graph travel topic model PGTTM
The invention performs topic modeling of the travel behavior of civil aviation passengers and discovers the latent topic distributions of passengers and of the airlines and routes they take. On this basis it predicts the airlines and routes passengers will choose for future trips. A passenger association graph is then constructed and introduced to obtain PGTTM, which draws on the rich topic information of similar passengers. This improves prediction accuracy and mitigates the sparsity of civil aviation passenger travel data.
Proposing a multi-factor fusion prediction framework based on a Bayesian probability model function
The invention derives a multi-factor fusion prediction framework from a Bayesian probability model function. The framework fuses the passenger route preference obtained by PGTTM with prior knowledge of route popularity, passenger loyalty and airline market share. Compared with baseline methods, it predicts the airlines and routes passengers choose for future trips more accurately.
Drawings
FIG. 1 is a diagram of the overall model system of the present invention.
Fig. 2 is a flow chart of the algorithm of the present invention.
Detailed Description
Example 1:
The multi-factor fusion civil aviation passenger trip prediction method based on a topic model is described in detail below, with reference to the accompanying drawings and a specific implementation.
The invention mainly applies data mining theory and methods to analyze passenger travel behavior in civil aviation data. For the system to run normally in a specific implementation, the computer platform requires at least 8 GB of memory and a CPU with at least 4 cores and a clock speed of at least 2.6 GHz. It also requires a 64-bit Windows 7 or later operating system and the necessary software: an Oracle database, Java 1.7 or later, and MATLAB R2011b or later.
The multi-factor fusion passenger trip prediction method based on a topic model is described below with reference to Figs. 1 and 2.
Step 1): construction of the passenger association graph travel topic model PGTTM
Step 1.1) (stage S1.1): data preprocessing and construction of the passenger association graph;
Step 1.11): data import and preprocessing
Each passenger booking record contains the passenger's personal information and travel information. The personal information includes an encrypted ID number that uniquely identifies the passenger, the passenger's age, gender, etc. The travel information includes the airline, departure airport, arrival airport, discount, etc.
The data are preprocessed by removing low-frequency passengers, duplicate records and abnormal records. A portion of the historical data is then taken as the training set and the remaining data as the test set.
Step 1.12): variance-based segmentation method;
Take age segmentation as an example. All ages in the passenger travel records of the training set are extracted into a sorted list. The list is traversed from the minimum age to the maximum age, and each age visited is treated as a candidate split point. For each candidate, compute the weighted average of the variances of the two resulting segments, where each weight is the number of ages in the segment divided by the number of ages before splitting. The optimal split point is the candidate for which the variance before splitting exceeds this weighted average by the largest amount.
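The split search above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. It assumes that a candidate value v sends values < v to the left segment and values >= v to the right segment, and `best_split` is a hypothetical helper name. Applying the function again inside each segment would produce more than two segments; that recursive use is an assumption, since the text describes a single split point.

```python
from statistics import pvariance

def best_split(values):
    """Return the split value with the largest variance reduction, or None.

    Assumed convention: a candidate v puts values < v in the left segment
    and values >= v in the right segment.
    """
    xs = sorted(values)
    n = len(xs)
    before = pvariance(xs)  # variance before splitting
    best_value, best_gain = None, float("-inf")
    # skip the minimum, which would leave the left segment empty
    for v in sorted(set(xs))[1:]:
        left = [x for x in xs if x < v]
        right = [x for x in xs if x >= v]
        # weighted mean of segment variances; weights are segment shares
        after = len(left) / n * pvariance(left) + len(right) / n * pvariance(right)
        gain = before - after
        if gain > best_gain:
            best_value, best_gain = v, gain
    return best_value

ages = [20, 21, 22, 60, 61, 62]
print(best_split(ages))  # 60
```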
Step 1.13): construction of the passenger association graph;
The association degree between two passengers is determined jointly by the route co-occurrence degree and the attribute co-occurrence degree. Statistics over the training set from step 1.11) give a sparse matrix of the number of co-occurring routes between passengers; normalizing each row yields the route co-occurrence matrix. The attribute co-occurrence degree indicates whether two passengers fall into the same segments for age, gender, average discount and average mileage, after the segmentation of step 1.12). For each passenger, the passengers with the highest route co-occurrence degree are taken as the associated passengers. The association degree between the passenger and each associated passenger is the weighted average of their route co-occurrence degree and attribute co-occurrence degree. This yields the association graph between passengers.
A route taken by a passenger is determined by its departure airport and arrival airport. Its mileage is the distance between the two cities served by these airports. The fare is determined by mileage and discount, and a passenger's average discount is determined by the passenger's total mileage and total fare.
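Step 1.13) can be sketched in Python as shown below. The sketch rests on several assumptions that the text does not fix. Route co-occurrence counts the distinct routes two passengers share, and attribute co-occurrence is the fraction of the four segmented attributes that match. The weight `w`, the neighbour count `top_k` and the function name `build_association_graph` are hypothetical illustrative parameters.

```python
from collections import defaultdict

def build_association_graph(trips, attrs, top_k=2, w=0.5):
    """Return passenger -> [(associated passenger, association degree), ...].

    trips: passenger -> set of routes flown (training set)
    attrs: passenger -> tuple of segmented (age, gender, discount, mileage)
    """
    passengers_on = defaultdict(set)  # route -> passengers who flew it
    for p, routes in trips.items():
        for route in routes:
            passengers_on[route].add(p)

    graph = {}
    for p, routes in trips.items():
        shared = defaultdict(int)  # other passenger -> number of shared routes
        for route in routes:
            for q in passengers_on[route]:
                if q != p:
                    shared[q] += 1
        total = sum(shared.values())
        if total == 0:
            graph[p] = []
            continue
        # row-normalised route co-occurrence degree
        cooc = {q: k / total for q, k in shared.items()}
        # keep the top_k passengers by route co-occurrence (ties broken by id)
        top = sorted(cooc, key=lambda q: (-cooc[q], q))[:top_k]
        edges = []
        for q in top:
            # attribute co-occurrence: share of segmented attributes that match
            same = sum(a == b for a, b in zip(attrs[p], attrs[q])) / len(attrs[p])
            edges.append((q, w * cooc[q] + (1 - w) * same))
        graph[p] = edges
    return graph
```

The degrees are left unnormalised here. Sampling in step 1.24) can normalise them into multinomial parameters.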
Step 1.2): modeling passenger travel preferences with PGTTM
Step 1.21) (stage S1.21): obtaining the input data;
The booking records of the training set involve U distinct passengers (distinguished by encrypted ID numbers), C airlines and R routes. Three fields are extracted from each booking record: the ID number, the airline and the route. They are replaced by indices 1 to U, 1 to C and 1 to R respectively. The result is three vectors $u$, $c$ and $r$, each of length $N$ (the number of booking records in the training set); these are the input data. Their $i$-th components mean that in the $i$-th booking record, passenger $u_i$ took route $r_i$ operated by airline $c_i$ ($1 \le u_i \le U$, $1 \le c_i \le C$, $1 \le r_i \le R$, $i = 1, 2, \ldots, N$).
$T$ is the preset number of topics. $z$ is the topic vector of length $N$, and $x$ is the vector, also of length $N$, of passengers used to generate the topics. The components of $u$, $c$, $r$ relate to those of $z$, $x$ as follows: the topic $z_i$ of airline $c_i$ and route $r_i$ taken by passenger $u_i$ is assigned by passenger $x_i$. Here $x_i$ may be $u_i$ itself or a passenger associated with $u_i$ ($1 \le z_i \le T$, $1 \le x_i \le U$, $i = 1, 2, \ldots, N$).
The following is the process by which a passenger generates each travel behavior in PGTTM:
(1) Each passenger $u$ has a topic distribution, and each topic $t$ has an airline distribution and a route distribution. The topic distribution of passenger $u$ is $\theta_u \sim \mathrm{Dirichlet}(\alpha)$. The airline distribution of topic $t$ is $\phi_t \sim \mathrm{Dirichlet}(\mu)$, and the route distribution of topic $t$, denoted here $\psi_t$, is $\psi_t \sim \mathrm{Dirichlet}(\beta)$. Here $u = 1, \ldots, U$ and $t = 1, \ldots, T$; $\theta_u$ is a $T$-dimensional vector, $\phi_t$ a $C$-dimensional vector and $\psi_t$ an $R$-dimensional vector; $\alpha$, $\mu$, $\beta$ are the parameters of the Dirichlet distributions.
(2) Passenger $u_i$ first samples a passenger $s$. Then $s$ samples a travel topic, and finally the airline and route to be taken are chosen according to that topic: $z_i \sim \mathrm{Multinomial}(\theta_s)$, $c_i \sim \mathrm{Multinomial}(\phi_{z_i})$, $r_i \sim \mathrm{Multinomial}(\psi_{z_i})$. In PGTTM, $s$ may be $u_i$ itself or a passenger associated with $u_i$ ($1 \le u_i \le U$, $1 \le z_i \le T$, $1 \le c_i \le C$, $1 \le r_i \le R$, $i = 1, 2, \ldots, N$).
PGTTM must infer three parameters: the passenger-topic distribution $\theta$ ($U \times T$), the topic-airline distribution $\phi$ ($T \times C$) and the topic-route distribution $\psi$ ($T \times R$). That is, given the passengers $u$ and their observed boarding behaviors $c$ and $r$, the topic distributions are inferred in reverse.
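A small forward simulation in Python makes the generative story concrete. It is a sketch under stated assumptions, not the inference procedure. The parameter `tau` is taken as the probability that the traveller generates the topic (the τ of step 1.24). Priors are symmetric, Dirichlet draws use normalised Gamma variates, and indices are 0-based. The names `dirichlet` and `generate` are hypothetical.

```python
import random

def dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) sample via Gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def generate(passengers, assoc, U, C, R, T, alpha, mu, beta, tau, seed=0):
    """Sample one (u, c, r) record per entry of `passengers` (0-based ids).

    assoc[u] is a list of (v, weight) pairs. With probability tau the
    traveller u generates the topic; otherwise an associated passenger does.
    """
    rng = random.Random(seed)
    theta = [dirichlet(alpha, T, rng) for _ in range(U)]  # passenger -> topic
    phi = [dirichlet(mu, C, rng) for _ in range(T)]       # topic -> airline
    psi = [dirichlet(beta, R, rng) for _ in range(T)]     # topic -> route
    records = []
    for u in passengers:
        s = u
        if assoc.get(u) and rng.random() >= tau:
            vs, ws = zip(*assoc[u])
            s = rng.choices(vs, weights=ws)[0]  # associated passenger s
        z = rng.choices(range(T), weights=theta[s])[0]
        c = rng.choices(range(C), weights=phi[z])[0]
        r = rng.choices(range(R), weights=psi[z])[0]
        records.append((u, c, r))
    return records
```

Inference runs this story in reverse: only the `(u, c, r)` triples are observed, and θ, φ and ψ must be recovered from them.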
Step 1.22) (stage S1.22): initialization;
Initially, the passenger $x_i$ who generates each topic is set equal to the boarding passenger $u_i$, i.e., $x = u$. The topic vector $z$ is then initialized randomly with the $T$ topics ($1 \le z_i \le T$, $i = 1, 2, \ldots, N$).
Three count matrices are defined. $C^{UT}$ is a $U \times T$ matrix giving the number of times each passenger has been assigned each topic, obtained by counting over $x$ and $z$. $C^{TC}$ is a $T \times C$ matrix giving the number of times each topic has been assigned to each airline, obtained by counting over $z$ and $c$. $C^{TR}$ is a $T \times R$ matrix giving the number of times each topic has been assigned to each route, obtained by counting over $z$ and $r$.
Set a maximum number of iterations $NN$. Construct a vector $order$ of length $N$ that contains the values 1 to $N$ in random order.
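The initialization can be sketched in Python as follows. This is an illustrative sketch using 0-based indices (the text uses 1 to U, 1 to C, 1 to R and 1 to T). The count matrices are plain nested lists, and `initialise` is a hypothetical name.

```python
import random

def initialise(u, c, r, U, C, R, T, seed=0):
    """Initial state for the PGTTM Gibbs sampler (0-based indices)."""
    rng = random.Random(seed)
    n = len(u)
    x = list(u)                               # x = u: each traveller generates its own topic
    z = [rng.randrange(T) for _ in range(n)]  # random initial topics
    C_UT = [[0] * T for _ in range(U)]        # passenger x topic counts
    C_TC = [[0] * C for _ in range(T)]        # topic x airline counts
    C_TR = [[0] * R for _ in range(T)]        # topic x route counts
    for i in range(n):
        C_UT[x[i]][z[i]] += 1
        C_TC[z[i]][c[i]] += 1
        C_TR[z[i]][r[i]] += 1
    order = list(range(n))
    rng.shuffle(order)                        # random visiting order
    return x, z, C_UT, C_TC, C_TR, order
```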
Step 1.23) (stage S1.23): remove the topic of the current passenger, airline and route, and update the topic count matrices;
For the record with index $order_i$, exclude its current topic $z_{order_i}$ from the counts. That is, $C^{UT}_{x_{order_i} z_{order_i}}$, $C^{TC}_{z_{order_i} c_{order_i}}$ and $C^{TR}_{z_{order_i} r_{order_i}}$ are each decremented by 1.
Step 1.24) (stage S1.24): sample the passenger who generates the new topic for the current airline and route;
A Bernoulli distribution with parameter $\tau$ determines where the topic resampled for the current airline $c_{order_i}$ and route $r_{order_i}$ comes from. It is generated either by the current passenger $u_{order_i}$ or by one of $u_{order_i}$'s associated passengers in the association graph. In the second case, a multinomial distribution selects the associated passenger; its parameters are the association degrees between $u_{order_i}$ and its associated passengers. Denote the sampled passenger by $s$. Its sampling probability $P(s)$ depends on the parameters of these two distributions.
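A minimal Python sketch of this two-stage draw follows. It assumes that `tau` is the probability of choosing the current passenger, and that association degrees are normalised to sum to 1 before the multinomial draw. `sample_source_passenger` is a hypothetical name. The function also returns P(s), because the Gibbs step uses it.

```python
import random

def sample_source_passenger(u, assoc, tau, rng):
    """Return (s, P(s)): the passenger who generates the topic and its probability."""
    neighbours = assoc.get(u, [])
    if not neighbours:
        return u, 1.0                 # no associated passengers: u generates the topic
    if rng.random() < tau:            # Bernoulli(tau): keep the current passenger
        return u, tau
    total = sum(wt for _, wt in neighbours)
    vs = [v for v, _ in neighbours]
    ws = [wt / total for _, wt in neighbours]  # normalised association degrees
    k = rng.choices(range(len(vs)), weights=ws)[0]
    return vs[k], (1 - tau) * ws[k]
```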
Step 1.25) (stage S1.25): reassign a new topic to the current airline and route using the Gibbs sampling formula;
The Gibbs sampling formula gives, for each $t = 1, 2, \ldots, T$, the probability that passenger $s$ reassigns topic $t$ to the current airline $c_{order_i}$ and current route $r_{order_i}$. The formula is as follows:
The formula gives the probability that passenger $s$ is sampled for the current passenger and that the new topic $t$ is assigned to the current airline and route. The subscript $\neg order_i$ marks quantities computed without the component $order_i$. $C^{UT}_{st,\neg order_i}$ is the number of times passenger $s$ has been assigned topic $t$. $C^{TC}_{t c_{order_i},\neg order_i}$ is the number of times topic $t$ has been assigned to airline $c_{order_i}$. $C^{TR}_{t r_{order_i},\neg order_i}$ is the number of times topic $t$ has been assigned to route $r_{order_i}$. These counts are obtained in step 1.23), and $P(s)$ is the probability with which passenger $s$ was sampled in step 1.24).
Finally, the $T$ probability values are used as the parameters of a multinomial distribution, from which a new topic, $topic$, is sampled.
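The normalising terms of the Gibbs formula are not spelled out here. The Python sketch below therefore assumes the usual LDA-style collapsed form, which is consistent with the counts listed above: $p(t) \propto P(s)\,\frac{C^{UT}_{st}+\alpha}{\sum_{t'} C^{UT}_{st'}+T\alpha}\cdot\frac{C^{TC}_{tc}+\mu}{\sum_{c'} C^{TC}_{tc'}+C\mu}\cdot\frac{C^{TR}_{tr}+\beta}{\sum_{r'} C^{TR}_{tr'}+R\beta}$. All counts exclude the current record. The function names are hypothetical.

```python
import random

def topic_distribution(s, ci, ri, p_s, C_UT, C_TC, C_TR, alpha, mu, beta):
    """Normalised probabilities over the T topics for the current record.

    Counts must already exclude the current record (step 1.23). The
    smoothed ratios below are an assumed, LDA-style form of the formula.
    """
    T = len(C_UT[s])
    n_s = sum(C_UT[s])
    weights = []
    for t in range(T):
        p_t = (C_UT[s][t] + alpha) / (n_s + T * alpha)                    # passenger s -> topic t
        p_c = (C_TC[t][ci] + mu) / (sum(C_TC[t]) + len(C_TC[t]) * mu)      # topic t -> airline
        p_r = (C_TR[t][ri] + beta) / (sum(C_TR[t]) + len(C_TR[t]) * beta)  # topic t -> route
        weights.append(p_s * p_t * p_c * p_r)
    total = sum(weights)
    return [w / total for w in weights]

def resample_topic(probs, rng):
    """Draw the new topic from the multinomial defined by `probs`."""
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Because $P(s)$ multiplies every topic equally, it cancels after normalisation once $s$ is fixed. It matters only if $s$ and $t$ are sampled jointly.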
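Step 1.26) (stage S1.26): update the topic-generating passenger vector and the topic vector;
Set $x_{order_i} = s$ and $z_{order_i} = topic$, i.e., record the passenger sampled in step 1.24) and the topic sampled in step 1.25).
Step 1.27) (stage S1.27): update the three topic count matrices;
Using the topic-generating passenger and topic updated in step 1.26), increment $C^{UT}_{x_{order_i} z_{order_i}}$, $C^{TC}_{z_{order_i} c_{order_i}}$ and $C^{TR}_{z_{order_i} r_{order_i}}$ by 1.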
Step 1.28) (stage S1.28): after the iterations finish, compute the passenger-topic, topic-airline and topic-route distributions;
The outer loop runs over iterations 1 to $NN$ and the inner loop over $i$ from 1 to $N$. In each pass, the passengers that generate topics and the topics assigned to airlines and routes are resampled, i.e., steps 1.23) to 1.27) are repeated. After the iterations finish, the passenger-topic distribution $\theta$, the topic-airline distribution $\phi$ and the topic-route distribution are obtained from the following formulas:
where $u = 1, 2, \ldots, U$, $c = 1, 2, \ldots, C$, $r = 1, 2, \ldots, R$, $t = 1, 2, \ldots, T$.
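The estimation formulas are not reproduced in this text. A common choice for LDA-style samplers is to add the Dirichlet pseudo-count to each final count and normalise each row, e.g. $\theta_{ut} = \frac{C^{UT}_{ut}+\alpha}{\sum_{t'} C^{UT}_{ut'}+T\alpha}$. The Python sketch below assumes that form. It uses `psi` for the topic-route distribution, and its names are hypothetical.

```python
def smoothed_rows(counts, prior):
    """Row-normalise a count matrix after adding a symmetric pseudo-count."""
    rows = []
    for row in counts:
        denom = sum(row) + len(row) * prior
        rows.append([(n + prior) / denom for n in row])
    return rows

def estimate_parameters(C_UT, C_TC, C_TR, alpha, mu, beta):
    """Point estimates of theta (U x T), phi (T x C) and psi (T x R)."""
    theta = smoothed_rows(C_UT, alpha)  # passenger-topic
    phi = smoothed_rows(C_TC, mu)       # topic-airline
    psi = smoothed_rows(C_TR, beta)     # topic-route
    return theta, phi, psi
```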
Step 1.29) (stage S1.29): compute the passenger's degree of preference for routes;
PGTTM models passenger preferences for airlines and routes. For example, $P(u \mid r)$ denotes how strongly passenger $u$ prefers route $r$, i.e., how attractive route $r$ is to passenger $u$. It is calculated as follows:
where $u = 1, 2, \ldots, U$ and $r = 1, 2, \ldots, R$.
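The formula for $P(u \mid r)$ is not given in this text. One plausible way to obtain it from the PGTTM parameters is Bayes' rule: $P(r \mid u) = \sum_t \theta_{ut}\psi_{tr}$ and $P(u \mid r) \propto P(r \mid u)P(u)$. The Python sketch below assumes this, and further assumes that $P(u)$ is proportional to each passenger's number of training records. Treat it as an illustration, not the patented formula; `passenger_given_route` is a hypothetical name.

```python
def passenger_given_route(theta, psi, records_per_passenger):
    """Assumed P(u|r) via Bayes' rule; returns a U x R matrix whose columns sum to 1."""
    U, T, R = len(theta), len(psi), len(psi[0])
    n = sum(records_per_passenger)
    prior = [k / n for k in records_per_passenger]  # assumed P(u)
    # P(r|u) = sum over topics of theta[u][t] * psi[t][r]
    p_r_given_u = [[sum(theta[u][t] * psi[t][r] for t in range(T)) for r in range(R)]
                   for u in range(U)]
    result = [[0.0] * R for _ in range(U)]
    for r in range(R):
        joint = [p_r_given_u[u][r] * prior[u] for u in range(U)]
        norm = sum(joint)
        for u in range(U):
            result[u][r] = joint[u] / norm
    return result
```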
Step 2): compute route popularity, passenger loyalty and airline market share:
Step 2.1) (stage S2.1): compute route popularity;
Route popularity is denoted $P(r)$. It is the probability that a traveling passenger chooses route $r$, and is given by the following formula:
where $count(r)$ is the number of times route $r$ appears in the 2010 passenger booking records, $r = 1, 2, \ldots, R$.
Step 2.2) (stage S2.2): compute passenger loyalty;
Passenger loyalty is denoted $P(c \mid u)$. It is the probability that passenger $u$ chooses airline $c$ when traveling, and is given by the following formula:
where $count(u, c)$ is the number of times passenger $u$ chooses airline $c$ in the 2010 passenger booking records, $c = 1, 2, \ldots, C$.
Step 2.3) (stage S2.3): compute airline market share;
Airline market share is denoted $P(c \mid r)$. It is the probability that route $r$ is flown with airline $c$, i.e., airline $c$'s share of route $r$, and is given by the following formula:
where $count(c, r)$ is the number of records in which airline $c$ and route $r$ appear together in the 2010 passenger booking records, $c = 1, 2, \ldots, C$.
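The three prior factors are simple ratios of counts over booking records. A minimal Python sketch follows, assuming records are `(passenger, airline, route)` tuples. The text only says loyalty is "smoothed", so the additive smoothing with pseudo-count `lam` is an assumption. All function names are hypothetical.

```python
from collections import Counter

def route_popularity(records):
    """P(r): share of all booking records that are on route r."""
    counts = Counter(r for _, _, r in records)
    n = len(records)
    return {r: k / n for r, k in counts.items()}

def passenger_loyalty(records, airlines, lam=1.0):
    """P(c|u) with additive smoothing; the pseudo-count `lam` is an assumption."""
    per_passenger = Counter(u for u, _, _ in records)
    pair = Counter((u, c) for u, c, _ in records)
    k = len(airlines)
    return {(u, c): (pair[(u, c)] + lam) / (n + lam * k)
            for u, n in per_passenger.items() for c in airlines}

def market_share(records):
    """P(c|r): share of route r's records that were flown with airline c."""
    per_route = Counter(r for _, _, r in records)
    pair = Counter((c, r) for _, c, r in records)
    return {(c, r): k / per_route[r] for (c, r), k in pair.items()}
```

Smoothing the loyalty term keeps $P(c \mid u)$ positive for airlines a passenger has never flown. This matters later, when the factors are combined under a logarithm.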
Step 3): introducing a Bayesian probability model, constructing a multi-factor fusion prediction framework, calculating the probability of passengers taking airlines and airlines, and predicting and recommending S3:
step 3.1), constructing a multi-factor fusion prediction framework by using a Bayesian probability model;
fusing the passenger preferences of the PGTTM in the step 1) and the passenger heat, passenger loyalty and airline market share in the step 2) together by using a Bayesian probability model to construct a multi-factor fusion prediction framework. The Bayesian probability model used in the invention is derived as follows:
first for a fixed passenger u, P (u) is a constant, which can be derived
Next, by the chain rule of probability,
$$P(r,c,u) = P(r)\,P(u \mid r)\,P(c \mid u,r) \approx P(r)\,P(u \mid r)\,[\alpha P(c \mid u) + (1-\alpha)\,P(c \mid r)],$$
so the desired Bayesian probability function is:
$$\log P(r,c \mid u) \propto \log\{P(r)\,P(u \mid r)\,[\alpha P(c \mid u) + (1-\alpha)\,P(c \mid r)]\}$$
where P(r, c | u) is the probability that passenger u selects route r with airline c, α is a tunable weight that balances passenger loyalty P(c | u) against airline market share P(c | r), and logarithms are taken on both sides to keep very small probability values from underflowing.
This final formula is the required Bayesian probability model, i.e., the multi-factor fusion prediction framework. It integrates route popularity P(r), passenger route preference P(u | r), passenger loyalty P(c | u) and airline market share P(c | r), with c = 1, 2, ..., C; r = 1, 2, ..., R; u = 1, 2, ..., U.
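To make the fusion concrete, the score for one passenger-airline-route triple can be computed directly from the four factors. A minimal Python sketch, assuming the four probabilities have already been estimated as described in steps 1) and 2), with α = 0.5 and the factor values chosen arbitrarily for illustration:

```python
import math


def fused_log_score(p_r, p_u_given_r, p_c_given_u, p_c_given_r, alpha):
    """Return log{P(r) * P(u|r) * [alpha*P(c|u) + (1 - alpha)*P(c|r)]}.

    All factors (and the alpha mixture) must be > 0, because math.log
    raises ValueError on 0. alpha weights loyalty against market share.
    """
    mix = alpha * p_c_given_u + (1.0 - alpha) * p_c_given_r
    # Adding logs instead of multiplying raw probabilities avoids underflow.
    return math.log(p_r) + math.log(p_u_given_r) + math.log(mix)


# Hypothetical factor values for one (passenger, airline, route) triple.
score = fused_log_score(p_r=0.5, p_u_given_r=0.2, p_c_given_u=0.5, p_c_given_r=2 / 3, alpha=0.5)
```

Because the logarithm is monotonic, ranking candidate pairs by this log score gives the same order as ranking by the product itself.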
Step 3.2): predict the routes and airlines passengers will select in the future;
Using the multi-factor prediction framework of step 3.1), and assuming the training set contains W airline-route word pairs in total, compute for each passenger u the probability of taking each airline-route word pair, sort the pairs in descending order of this value, take the top K (Top-K) pairs with the largest values as prediction targets and recommend them, and compare the predictions with the test set to obtain the prediction accuracy.
For example, for passenger 17464755 (an encrypted ID number), the pair (c, r) formed by airline 290 (a code standing for the airline's real name) and route CTU-CAN (IATA three-letter airport codes: Chengdu Shuangliu Airport to Guangzhou Baiyun Airport) in the booking data is substituted into the multi-factor fusion prediction function of step 3.1). If its value is larger than that of the other W-1 word pairs, this pair becomes the Top-1 prediction; if the passenger actually flies route CTU-CAN with airline 290 in the test set, the Top-1 prediction accuracy for this passenger is 1. Here c = 1, 2, ..., C; r = 1, 2, ..., R; u = 1, 2, ..., U.
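The ranking and evaluation procedure can be sketched in Python as follows. The source only defines the Top-1 hit; this sketch assumes a Top-K hit means any of the K predicted pairs appears in the passenger's test-set trips, and that accuracy is averaged over passengers. Passenger "p2", airline "781" and all scores are hypothetical:

```python
import heapq


def top_k_pairs(scores, k):
    """Return the k (airline, route) pairs with the largest scores, best first.

    scores: dict mapping (airline, route) -> fused score for one passenger.
    """
    return heapq.nlargest(k, scores, key=scores.get)


def top_k_accuracy(all_scores, test_trips, k):
    """Fraction of passengers whose Top-K list contains a pair from their test trips.

    all_scores: dict passenger -> {(airline, route): score}.
    test_trips: dict passenger -> set of (airline, route) pairs actually flown.
    """
    hits = 0
    for u, scores in all_scores.items():
        actual = test_trips.get(u, set())
        if any(pair in actual for pair in top_k_pairs(scores, k)):
            hits += 1
    return hits / len(all_scores)


# Hypothetical scores for two passengers.
all_scores = {
    "17464755": {("290", "CTU-CAN"): -2.1, ("781", "CTU-CAN"): -3.0, ("290", "PEK-SHA"): -4.2},
    "p2": {("290", "CTU-CAN"): -0.5, ("781", "PEK-SHA"): -1.0},
}
test_trips = {"17464755": {("290", "CTU-CAN")}, "p2": {("781", "PEK-SHA")}}
print(top_k_accuracy(all_scores, test_trips, k=1))  # 0.5
print(top_k_accuracy(all_scores, test_trips, k=2))  # 1.0
```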
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, but other embodiments derived from the technical solutions of the present invention by those skilled in the art are also within the scope of the present invention.
Claims (1)
1. A multi-factor fusion civil aviation passenger trip prediction method based on a topic model, characterized in that data mining theory and methods are used to analyze passenger travel behavior in civil aviation data; the computing platform of the operating environment must have at least 8 GB of memory and a CPU with at least 4 cores and a clock frequency of at least 2.6 GHz, must run a 64-bit Windows 7 or later operating system, and must have an Oracle database, Java 1.7 or later, and Matlab 2011b or later installed as the required software environment; the method comprises the following steps:
Step 1): construct the passenger association graph travel topic model, i.e., build a passenger association graph, perform topic modeling of passengers' travel-choice probability distributions, and finally obtain the passenger association graph travel topic model:
Step 1.1): construct a passenger association graph;
Constructing the passenger association graph means computing the degree of association between passengers, which is determined jointly by their route co-occurrence degree and their attribute co-occurrence degree. The route co-occurrence degree is determined by the number of routes two passengers have in common; the attribute co-occurrence degree reflects whether the passengers share the same age, gender, average discount and average mileage. Passenger age, average discount and average mileage are divided into segments using a variance-based segmentation method;
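The claim names the two ingredients of the association degree but not how they are combined. The sketch below is a hypothetical illustration: it counts distinct shared routes, measures the fraction of the four attributes that match (assuming age, discount and mileage were already binned by the variance-based segmentation), and combines the two with an arbitrary weighted sum. The class, field names and `weight` parameter are all assumptions:

```python
from dataclasses import dataclass

# Attribute fields compared for the attribute co-occurrence degree.
ATTRIBUTES = ("age_bin", "gender", "discount_bin", "mileage_bin")


# Hypothetical passenger record; the *_bin fields are assumed to be
# segment indices produced by a variance-based segmentation step.
@dataclass(frozen=True)
class Passenger:
    pid: str
    routes: frozenset  # distinct routes the passenger has flown
    age_bin: int
    gender: str
    discount_bin: int
    mileage_bin: int


def route_cooccurrence(a, b):
    """Number of distinct routes both passengers have flown."""
    return len(a.routes & b.routes)


def attribute_cooccurrence(a, b):
    """Fraction of the four attributes on which the two passengers agree."""
    return sum(getattr(a, f) == getattr(b, f) for f in ATTRIBUTES) / len(ATTRIBUTES)


def association_degree(a, b, weight=0.5):
    """Hypothetical weighted sum of route and attribute co-occurrence."""
    return weight * route_cooccurrence(a, b) + (1 - weight) * attribute_cooccurrence(a, b)


a = Passenger("p1", frozenset({"CTU-CAN", "PEK-SHA"}), age_bin=3, gender="M", discount_bin=2, mileage_bin=4)
b = Passenger("p2", frozenset({"CTU-CAN"}), age_bin=3, gender="F", discount_bin=2, mileage_bin=4)
print(association_degree(a, b))  # 0.5 * 1 + 0.5 * 0.75 = 0.875
```

Note that the raw shared-route count is unbounded while the attribute term lies in [0, 1]; a practical version would probably normalize the route term before mixing.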
Step 1.2): perform topic modeling of passengers' travel-choice probability distributions;
Apply a topic model to passengers and the routes and airlines they take to infer the latent topic distributions of passengers, routes and airlines; finally, combine the passengers' latent topic distributions with those of the routes and airlines to obtain each passenger's travel-choice probability distribution over routes and airlines;
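In a standard LDA-style topic model, combining a passenger's topic distribution with the topics' item distributions means mixing them: P(w | u) = Σ_z θ_u,z φ_z,w. The sketch below assumes the patent's combination follows this usual form, with the items w being airline-route word pairs and θ, φ already inferred; the numbers are hypothetical:

```python
def choice_distribution(theta_u, phi):
    """Return P(w | u) = sum over topics z of theta_u[z] * phi[z][w].

    theta_u: K topic probabilities for one passenger (sums to 1).
    phi: K rows, each a distribution over the same W items (airline-route pairs).
    """
    num_items = len(phi[0])
    return [
        sum(theta_u[z] * phi[z][w] for z in range(len(theta_u)))
        for w in range(num_items)
    ]


# Hypothetical two-topic example over three airline-route pairs.
theta_u = [0.7, 0.3]
phi = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
print(choice_distribution(theta_u, phi))
```

Because each row of φ and the vector θ_u are distributions, the result is itself a distribution over the W items.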
Step 1.3): construct the passenger association graph travel topic model;
Incorporate the passenger association graph from step 1.1) into the topic modeling process of step 1.2) to construct the Passenger association Graph Travel Topic Model (PGTTM). When the PGTTM assigns a topic to each route or airline taken by a passenger, the topic may come not only from that passenger but also from other passengers associated with them. This enriches the topic information, improves prediction performance, and mitigates the sparsity of civil aviation passengers' travel records;
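The idea that a topic may come from the passenger or from an associated passenger can be illustrated with a generative-style draw: first pick a source passenger (u itself or a neighbor in the association graph, weighted by association degree), then draw the topic from that source's topic distribution. This is only a sketch of that single step under assumed weights (`self_weight` is hypothetical), not the full PGTTM inference procedure:

```python
import random


def draw_topic(u, theta, neighbors, self_weight=1.0, rng=random):
    """Draw (source passenger, topic) for one of passenger u's trips.

    theta: dict passenger -> list of topic probabilities.
    neighbors: dict passenger -> {associated passenger: association degree}.
    The source is u itself (weight self_weight, a hypothetical parameter) or an
    associated passenger (weight = association degree); the topic is then drawn
    from the source's topic distribution.
    """
    linked = neighbors.get(u, {})
    candidates = [u] + list(linked)
    weights = [self_weight] + list(linked.values())
    source = rng.choices(candidates, weights=weights, k=1)[0]
    topic = rng.choices(range(len(theta[source])), weights=theta[source], k=1)[0]
    return source, topic


# Hypothetical topic distributions and association graph.
theta = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
neighbors = {"p1": {"p2": 0.875}}
rng = random.Random(0)
print([draw_topic("p1", theta, neighbors, rng=rng) for _ in range(3)])
```

A passenger with no associated passengers always draws from their own topic distribution, which reduces to an ordinary topic model.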
Step 2): build calculation models for route popularity, passenger loyalty and airline market share, so that this prior knowledge supports accurate prediction in the subsequent steps:
Step 2.1): calculate route popularity;
For route popularity, first count the number of times the route is taken by all passengers and the total number of times all routes are taken by all passengers, then compute the route popularity from these counts;
Step 2.2): calculate the passenger's loyalty to each airline;
For passenger loyalty, first count the number of times the passenger flew with the airline and the total number of times the passenger flew with any airline, then compute the passenger's loyalty to the airline from these counts with smoothing;
Step 2.3): calculate each airline's market share on each route;
For airline market share, first count the number of times all passengers took the airline and route together as a word pair, then compute the airline's market share on the route relative to the number of times all passengers took the route regardless of airline;
Step 3): construct a multi-factor fusion prediction framework with a Bayesian probability model, and predict the routes and airlines the passenger will select in the future by fusing route popularity, the passenger's route-selection probability distribution, passenger loyalty and airline market share:
Step 3.1): multi-factor fusion based on a Bayesian probability model;
Construct a Bayesian probability model from the passenger route-selection probability distribution obtained by the PGTTM in step 1.3), the route popularity from step 2.1), the passenger loyalty from step 2.2) and the airline market share from step 2.3), fusing these four factors to better model passenger travel behavior;
Step 3.2): multi-factor prediction based on a Bayesian probability model;
For each passenger and each airline-route word pair, compute the probability that the passenger takes that pair using the Bayesian probability model function; then select, for each passenger, the several airline-route word pairs with the highest probability as predictions and recommendations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611159984.3A CN106779214B (en) | 2016-12-15 | 2016-12-15 | Multi-factor fusion civil aviation passenger trip prediction method based on topic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106779214A (en) | 2017-05-31 |
CN106779214B (en) | 2020-08-28 |
Family
ID=58889245
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611159984.3A Active CN106779214B (en) | 2016-12-15 | 2016-12-15 | Multi-factor fusion civil aviation passenger trip prediction method based on topic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106779214B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108876049A (en) * | 2018-06-27 | 2018-11-23 | Nanjing University of Aeronautics and Astronautics | A kind of airport market share variation prediction method in new demand servicing nurturing period |
CN110751523A (en) * | 2019-10-21 | 2020-02-04 | TravelSky Technology Limited | Method and device for discovering potential high-value passengers |
CN110852650B (en) * | 2019-11-19 | 2021-11-02 | Research Institute of Highway, Ministry of Transport | Comprehensive passenger transport hub group network modeling method based on dynamic graph hybrid automaton |
CN112948161B (en) * | 2021-03-09 | 2022-06-03 | Sichuan University | Deep learning-based aviation message error correction and correction method and system |
CN118350858A (en) * | 2024-03-20 | 2024-07-16 | Zhonghangxin Digital Intelligence Technology (Beijing) Co., Ltd. | Method and device for predicting airline passenger quantity, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105488597A (en) * | 2015-12-28 | 2016-04-13 | TravelSky Technology Limited | Passenger destination prediction method and system |
CN105512773A (en) * | 2015-12-25 | 2016-04-20 | TravelSky Technology Limited | Passenger travel destination prediction method and device |
CN106055807A (en) * | 2016-06-06 | 2016-10-26 | Sichuan University | Civil aviation passenger movement model based on potential trip purposes |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5946394B2 (en) * | 2012-11-09 | 2016-07-06 | International Business Machines Corporation | Statistical inference method, computer program, and computer of path start and end points using multiple types of data sources. |
Non-Patent Citations (1)
Title |
---|
Bayesian network prediction of trip chaining; Zhao Yingchang et al.; Road Traffic and Safety; 2015-12-31; Vol. 15, No. 1; full text * |
Similar Documents
Publication | Title |
---|---|
CN106779214B (en) | Multi-factor fusion civil aviation passenger trip prediction method based on topic model | |
Ni et al. | Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks | |
Gupta | Evaluating service quality of airline industry using hybrid best worst method and VIKOR | |
CN108363804B (en) | Local model weighted fusion Top-N movie recommendation method based on user clustering | |
Gao et al. | Location-centered house price prediction: A multi-task learning approach | |
Çavdar et al. | Airline customer lifetime value estimation using data analytics supported by social network information | |
CN105893406A (en) | Group user profiling method and system | |
Jenab et al. | A graph-based model for manufacturing complexity | |
Zhang | Design of a sports culture data fusion system based on a data mining algorithm | |
Gao et al. | Application of artificial intelligence and big data technology in digital marketing | |
CN107633035B (en) | Shared traffic service reorder estimation method based on K-Means and LightGBM model | |
Mattila et al. | Maintenance scheduling of a fleet of fighter aircraft through multi-objective simulation-optimization | |
CN111177538A (en) | Unsupervised weight calculation-based user interest tag construction method | |
Liu et al. | Personalized air travel prediction: A multi-factor perspective | |
Kang et al. | LA-CTR: A limited attention collaborative topic regression for social media | |
CN111429161A (en) | Feature extraction method, feature extraction device, storage medium, and electronic apparatus | |
CN115222433A (en) | Information recommendation method and device and storage medium | |
CN115293920A (en) | Multi-modal data-based social relationship analysis method, system and storage medium | |
Pu et al. | Improved tourism recommendation system | |
Liu | E‐Commerce Precision Marketing Model Based on Convolutional Neural Network | |
Ugochi et al. | Customer opinion mining in electricity distribution company using twitter topic modeling and logistic regression | |
CN112784177B (en) | Spatial distance adaptive next interest point recommendation method | |
Cheng et al. | [Retracted] Using Clustering Analysis and Association Rule Technology in Cross‐Marketing | |
CN117495482A (en) | Secondhand mobile phone sales recommendation method and system based on user portrait | |
CN114358807A (en) | User portrayal method and system based on predictable user characteristic attributes |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |