CN106779214B - Multi-factor fusion civil aviation passenger trip prediction method based on topic model - Google Patents


Info

Publication number
CN106779214B
Authority
CN
China
Prior art keywords
passenger
airline
passengers
theme
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611159984.3A
Other languages
Chinese (zh)
Other versions
CN106779214A (en)
Inventor
刘杰 (Liu Jie)
王嫄 (Wang Yuan)
冯丽娜 (Feng Lina)
陈会朋 (Chen Huipeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University
Priority to CN201611159984.3A
Publication of CN106779214A
Application granted
Publication of CN106779214B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A multi-factor fusion civil aviation passenger trip prediction method based on a topic model. The invention first constructs an association graph among passengers and performs topic modeling of passenger preferences, yielding a Passenger association Graph Travel Topic Model (PGTTM); this enriches the topic information and effectively alleviates the sparsity of civil aviation data. Second, a multi-factor fusion prediction framework is constructed with a Bayesian probability model, which predicts passengers' future trips by fusing route popularity with the passengers' route preferences obtained from PGTTM, passenger loyalty to airlines, and airline market share on each route. The invention can effectively predict the airlines and routes that passengers will take on future trips, provide decision support for the aviation and related industries, and offer personalized services to passengers.

Description

Multi-factor fusion civil aviation passenger trip prediction method based on topic model
Technical Field
The invention belongs to the technical field of computer applications, relates to data mining and civil aviation data analysis, and in particular relates to a multi-factor fusion civil aviation passenger trip prediction method based on a topic model.
Background
With rising living standards and the development of the Internet, large amounts of booking data have accumulated in civil aviation passenger booking systems. These data are massive, sparse and long-tailed, which makes civil aviation data analysis challenging. Analyzing passengers' travel characteristics from these data and predicting their future travel behavior is one of the most important tasks in civil aviation data analysis. Research on civil aviation passengers, both in China and abroad, is still at a preliminary stage, and there is little work on predicting civil aviation passenger trips.
Existing research on civil aviation data includes the following. Maalouf et al. applied cluster analysis and association rules to real frequent-flyer data of airlines and proposed recommendations and improvement strategies for customer relationship management[1]. Wangchun et al. combined a questionnaire survey with statistical methods to analyze the consumption motivation, airline preference and purchasing behavior of the civil aviation passenger population in Changchun[2]. Feng et al. constructed a heterogeneous information network on civil aviation data and discovered the value of infrequent passengers by random walks[3]. Etzioni et al. explored the relationship between time and ticket price and used a multi-strategy data mining algorithm to tell passengers the best time to buy tickets[4].
Among topic models, the LDA (latent Dirichlet allocation) model performs well at modeling text topics and is readily extensible[5]. Rosen-Zvi et al. proposed the LDA-based ATM (Author-Topic Model), which models authors, documents and words jointly[6]. Blei et al. proposed a supervised LDA model for text classification that adds the document labels of the training corpus to LDA as observed variables[7]. Extended topic models and LDA have also been applied to recommendation: Liu et al. incorporated implicit features of travel package data into a topic model and proposed a personalized travel package recommendation method[8], and Tan et al. represented passenger information as feature-value pairs, learned passengers' latent interest distributions with a topic model, and recommended travel packages in combination with collaborative filtering[9].
Social relationships among passengers are also useful for modeling. For example, one study built a co-travel network and proposed a civil aviation passenger seat preference modeling method that combines passengers' individual preferences with their relationship preferences[10]. Zhou et al. proposed a semi-supervised relationship classification algorithm based on an information graph to obtain more accurate passenger relationships and thus provide targeted, high-quality service[11].
Applying a topic model to the analysis and prediction of civil aviation passenger trips, so as to discover latent topic distributions and cope with the massive data volume, is therefore worth attempting. Integrating the relationships among passengers into topic modeling enriches the topic information and reduces sparsity, thereby improving the modeling effect. In addition, building a probabilistic model framework that fuses multiple factors influencing travel promises to further improve prediction.
Reference documents:
[1] Maalouf L, Mansour N. Mining airline data for CRM strategies. In Proceedings of the 7th WSEAS International Conference on Simulation, Modeling and Optimization, Beijing, China, pages 345-350, 2007.
[2] Civil aviation passenger characterization and behavioral analysis [D]. Jilin University, 2010.
[3] Feng X, Xu B Y, Lu M, et al. Infrequent Passenger Value Discovery by Random Walk on Passenger-route Heterogeneous Network. Journal of Computational and Theoretical Nanoscience, 2(1): 10-17, 2015.
[4] Etzioni O, Tuchinda R, et al. To buy or not to buy: mining airfare data to minimize ticket purchase price [C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, USA, August 2003: 119-128.
[5] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation [J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[6] Rosen-Zvi M, Griffiths T, Steyvers M, et al. The author-topic model for authors and documents [C]//Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2004: 487-494.
[7] Blei D M, McAuliffe J D. Supervised Topic Models [J]. Advances in Neural Information Processing Systems, 2010, 3: 327-332.
[8] Liu Q, Ge Y, Li Z, et al. Personalized Travel Package Recommendation [C]//IEEE International Conference on Data Mining. IEEE Computer Society, 2011: 407-416.
[9] Tan C, Liu Q, Chen E, et al. Object-Oriented Travel Package Recommendation [J]. ACM Transactions on Intelligent Systems and Technology, 2014, 5(3): 1-26.
[10] Civil aviation passenger seat preference modeling and application research [D]. Beijing Jiaotong University, 2015.
[11] Zhou Yuanwei. Design and implementation of a relation classification algorithm for civil aviation social networks [D]. Beijing Jiaotong University, 2013.
Disclosure of Invention
The invention aims to address the massive, sparse and long-tailed nature of civil aviation passenger booking data and the diversity of factors that influence travel, and provides a multi-factor fusion civil aviation passenger trip prediction method based on a topic model for accurately predicting the airlines and routes that passengers will take in the future.
The invention uses a topic model to model passengers together with the airlines and routes they choose, and, by introducing a constructed passenger association graph, proposes the passenger association graph travel topic model PGTTM (Passenger Graph based Topic Model). PGTTM captures passengers' preferences for airlines and routes, enriches the topic information, and alleviates the sparsity of civil aviation data.
A Bayesian probability model is then introduced to fuse four factors, namely route popularity, the passengers' route preferences obtained from PGTTM, passenger loyalty and airline market share, into a multi-factor fusion prediction framework that predicts and recommends more accurately the airlines and routes passengers will take in the future. This constitutes the main inventive content of the multi-factor fusion civil aviation passenger trip prediction method based on a topic model.
Technical scheme of the invention
A multi-factor fusion civil aviation passenger trip prediction method based on a topic model comprises the following steps:
Step 1): Construct the passenger association graph travel topic model. This mainly comprises constructing the passenger association graph and performing topic modeling of passengers' travel preferences, which together yield the passenger association graph travel topic model:
Step 1.1): construct the passenger association graph;
Constructing the passenger association graph means computing the association degree between passengers, which is determined jointly by the route co-occurrence degree and the attribute co-occurrence degree. The route co-occurrence degree is determined by the number of route co-occurrences between passengers; the attribute co-occurrence degree indicates whether the passengers' age, gender, average discount and average mileage are the same. Age, average discount and average mileage are first discretized with a variance-based segmentation method;
Step 1.2): model the passengers' travel preference topics;
Based on a topic model, topic modeling is performed on passengers and the airlines and routes they take, discovering the latent topic distributions of passengers and of airlines and routes. Combining the passengers' latent topic distributions with those of airlines and routes then yields the passengers' travel preferences for airlines and routes;
Step 1.3): construct the passenger association graph travel topic model;
The passenger association graph from step 1.1) is incorporated into the topic modeling of step 1.2) to construct the Passenger association Graph Travel Topic Model (PGTTM). When PGTTM assigns topics to each passenger's airlines and routes, the topics may come from the passenger or from other passengers associated with that passenger. This enriches the topic information, improves prediction performance and alleviates the sparsity of civil aviation passenger trip data;
Step 2): Build calculation models for route popularity, passenger loyalty and airline market share, and use this prior knowledge to support accurate prediction in the following steps:
Step 2.1): calculate route popularity;
For route popularity, first count the number of times each route is taken by all passengers and the total number of trips over all routes; the route popularity is calculated from these counts;
Step 2.2): calculate passenger loyalty to airlines;
For passenger loyalty, first count the number of times the passenger takes a given airline and the total number of times the passenger takes any airline; the passenger's loyalty to that airline is then calculated from these counts with smoothing;
Step 2.3): calculate the market share of an airline on a route;
For airline market share, first count the number of times all passengers take a given airline and route as a pair; the airline's market share on the route is then calculated relative to the number of times the route is taken by all passengers regardless of airline;
Step 3): Fuse route popularity, passengers' route preferences, passenger loyalty and airline market share through a Bayesian probability model to construct a multi-factor fusion prediction framework, and predict the airlines and routes passengers will choose in the future:
Step 3.1): multi-factor fusion based on a Bayesian probability model;
A Bayesian probability model is constructed from the passengers' route preferences obtained by PGTTM in step 1), the route popularity from step 2.1), the passenger loyalty from step 2.2) and the airline market share from step 2.3); fusing these four factors models passengers' travel behavior better;
Step 3.2): multi-factor prediction based on a Bayesian probability model;
For each passenger and each airline-route pair, the probability that the passenger takes that pair is calculated with the Bayesian probability model function; the airline-route pairs with the highest probabilities are then selected for each passenger as the prediction and recommendation.
The invention has the following advantages and positive effects:
Proposing the passenger association graph travel topic model PGTTM
The invention performs topic modeling of civil aviation passengers' travel behavior, discovers the latent topic distributions of passengers and of the airlines and routes they take, and accurately predicts behaviors such as the airlines and routes passengers will choose on future trips. On this basis, a passenger association graph is constructed and introduced to obtain PGTTM, which can draw on the rich topic information of similar passengers, improves prediction accuracy, and alleviates the sparsity of civil aviation passenger travel data.
Proposing a multi-factor fusion prediction framework based on a Bayesian probability model function
The invention obtains a multi-factor fusion prediction framework from a Bayesian probability model function that fuses the passengers' route preferences obtained by PGTTM with the prior knowledge of route popularity, passenger loyalty and airline market share. Compared with baseline methods, the framework predicts more accurately the airlines and routes passengers choose for future trips.
Drawings
FIG. 1 is a diagram of the overall model system of the present invention.
FIG. 2 is a flow chart of the algorithm of the present invention.
Detailed Description
Example 1:
The following describes in detail the multi-factor fusion civil aviation passenger trip prediction method based on a topic model with reference to the accompanying drawings and a specific implementation.
The invention mainly uses data mining theory and methods to analyze passengers' travel behavior in civil aviation data. To ensure that the system runs normally, the specific implementation requires a computer with at least 8 GB of memory and a CPU with at least 4 cores and a clock speed of at least 2.6 GHz, running a 64-bit operating system (Windows 7 or later), with the necessary software environment installed, such as an Oracle database, Java 1.7 or later, and Matlab 2011b or later.
With reference to FIG. 1 and FIG. 2, the multi-factor fusion method for predicting passenger trip behavior based on a topic model is described as follows.
Step 1): Construction of the passenger association graph travel topic model PGTTM
Step 1.1) (stage S1.1): data preprocessing and construction of the passenger association graph;
Step 1.11): data import and preprocessing
Each record in the passenger booking data contains the passenger's personal information and travel information. The personal information includes an encrypted identity number that uniquely identifies the passenger, the passenger's age, gender, and so on; the travel information includes the airline, departure airport, arrival airport, discount, and so on.
After preprocessing operations such as removing low-frequency passengers, duplicate records and abnormal records, the historical data of a certain period are taken as the training set and the remaining data as the test set.
Step 1.12): variance-based segmentation method;
Taking age segmentation as an example: all ages in the training-set passenger travel records are extracted into a sorted list. Each age from the minimum to the maximum is taken in turn as a candidate split point, and the weighted average of the variances of the two resulting age segments is calculated, where each weight is the proportion of ages falling into that segment relative to the number of ages before splitting. The split value for which the difference between the variance before splitting and the weighted average variance after splitting is largest is the optimal split point.
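The split search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that values below a candidate split go to the left segment and values at or above it to the right, uses population variance, and finds a single split point. The function name `best_split` is hypothetical.

```python
from statistics import pvariance


def best_split(values):
    """Return (split_value, variance_reduction) for the best single split.

    Values below split_value form the left segment, the rest the right one.
    """
    data = sorted(values)
    n = len(data)
    before = pvariance(data)  # variance before splitting
    best_value, best_gain = None, float("-inf")
    # every distinct value except the minimum is a candidate split point
    for v in sorted(set(data))[1:]:
        left = [a for a in data if a < v]
        right = [a for a in data if a >= v]
        # weights are the share of values falling into each segment
        after = len(left) / n * pvariance(left) + len(right) / n * pvariance(right)
        if before - after > best_gain:
            best_value, best_gain = v, before - after
    return best_value, best_gain
```

For ages such as `[20, 21, 22, 50, 51, 52]` the best split falls at 50, separating the two age groups.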
Step 1.13): construct the passenger association graph;
The association degree between passengers is determined jointly by the route co-occurrence degree and the attribute co-occurrence degree. Statistics over the training set from step 1.11) yield a sparse matrix of the number of routes co-occurring between passengers; normalizing each row gives the route co-occurrence matrix. The attribute co-occurrence degree indicates whether two passengers fall into the same segments after their age, gender, average discount and average mileage are segmented as in step 1.12). Finally, the passengers with the highest route co-occurrence degree are taken as a passenger's associated passengers, and the association degree between the passenger and each associated passenger is the weighted average of their route co-occurrence degree and attribute co-occurrence degree. This constructs the association graph between passengers.
A route taken by a passenger is determined by its departure airport and arrival airport. The mileage is the distance between the two cities served by the departure and arrival airports, the price is determined by the mileage and the discount, and a passenger's average discount is determined from the passenger's total mileage and total price.
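A sketch of the association-graph construction, assuming each passenger's trips are given as a list of route strings and each passenger's segmented attributes as a dict. Counting distinct shared routes, measuring attribute co-occurrence as the fraction of the four attributes that match, and the hypothetical parameters `k` (number of associated passengers kept) and `w` (weight of route co-occurrence) are all assumptions; the source does not fix them.

```python
from collections import defaultdict

ATTRS = ("age_seg", "gender", "discount_seg", "mileage_seg")


def build_association_graph(routes_by_passenger, attrs_by_passenger, k=5, w=0.5):
    """Return {passenger: {associated passenger: association degree}}."""
    # invert: which passengers flew each route
    passengers_by_route = defaultdict(set)
    for p, routes in routes_by_passenger.items():
        for r in set(routes):
            passengers_by_route[r].add(p)
    # sparse route co-occurrence counts between pairs of passengers
    co = defaultdict(lambda: defaultdict(int))
    for ps in passengers_by_route.values():
        for a in ps:
            for b in ps:
                if a != b:
                    co[a][b] += 1
    graph = {}
    for a, row in co.items():
        total = sum(row.values())
        norm = {b: n / total for b, n in row.items()}  # row-normalized degree
        # keep the k passengers with the highest route co-occurrence degree
        top = sorted(norm, key=lambda b: (-norm[b], b))[:k]
        graph[a] = {}
        for b in top:
            same = sum(attrs_by_passenger[a][f] == attrs_by_passenger[b][f] for f in ATTRS)
            # weighted average of route and attribute co-occurrence degrees
            graph[a][b] = w * norm[b] + (1 - w) * same / len(ATTRS)
    return graph
```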
Step 1.2): modeling passenger travel preferences with PGTTM
Step 1.21) (stage S1.21): obtaining the input data;
The training-set booking records contain U distinct passengers (distinguished by encrypted identity numbers), C airlines and R routes. Three fields, the identity number, the airline and the route, are extracted from each booking record and replaced by indices, i.e. represented by the numbers 1 to U, 1 to C and 1 to R respectively. This yields three vectors u, c and r, each of length N (the number of booking records in the training set), which form the input data. The i-th components state that in the i-th booking record passenger $u_i$ takes airline $c_i$ on route $r_i$ ($1 \le u_i \le U$, $1 \le c_i \le C$, $1 \le r_i \le R$, $i = 1, 2, \ldots, N$).
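The index encoding can be sketched as below. `encode_records` is a hypothetical helper that assigns 1-based indices in order of first appearance, which is one possible mapping; the source does not specify the order.

```python
def encode_records(records):
    """Encode (passenger_id, airline, route) tuples as 1-based index vectors.

    Returns ((u, c, r), (passenger_map, airline_map, route_map)).
    """
    maps = ({}, {}, {})
    vectors = ([], [], [])
    for rec in records:
        for value, table, vec in zip(rec, maps, vectors):
            # a value seen for the first time gets the next free index
            vec.append(table.setdefault(value, len(table) + 1))
    return vectors, maps
```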
T is the preset number of topics. z is the topic vector of length N, and x is the vector, also of length N, of passengers used to generate the topics. The components of u, c and r relate to z and x as follows: the topic $z_i$ of airline $c_i$ and route $r_i$ taken by passenger $u_i$ is generated by passenger $x_i$, and $x_i$ may be $u_i$ itself or a passenger associated with $u_i$ ($1 \le z_i \le T$, $1 \le x_i \le U$, $i = 1, 2, \ldots, N$).
The following is the process by which a passenger generates each travel behavior in PGTTM:
(1) Each passenger u corresponds to a topic distribution, and each topic t corresponds to an airline distribution and a route distribution: the topic distribution of passenger u is $\theta_u \sim \mathrm{Dirichlet}(\alpha)$, the airline distribution of topic t is $\phi_t \sim \mathrm{Dirichlet}(\mu)$, and the route distribution of topic t is $\psi_t \sim \mathrm{Dirichlet}(\beta)$ ($u = 1, 2, \ldots, U$; $t = 1, 2, \ldots, T$; $\theta_u$ is a T-dimensional vector, $\phi_t$ is a C-dimensional vector, $\psi_t$ is an R-dimensional vector, and $\alpha$, $\mu$, $\beta$ are the parameters of the Dirichlet distributions).
(2) Passenger $u_i$ first samples a passenger s, then samples a travel topic from the topic distribution of s, and finally selects the airline and route to take according to that topic, i.e. topic $z_i \sim \mathrm{Multinomial}(\theta_s)$, airline $c_i \sim \mathrm{Multinomial}(\phi_{z_i})$ and route $r_i \sim \mathrm{Multinomial}(\psi_{z_i})$. In PGTTM, s may be $u_i$ itself or a passenger associated with $u_i$ ($1 \le u_i \le U$, $1 \le z_i \le T$, $1 \le c_i \le C$, $1 \le r_i \le R$, $i = 1, 2, \ldots, N$).
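To make the generative story concrete, the following sketch simulates it with Python's standard library. It is illustrative only: the prior values, the uniform choice of the booking passenger, and the reading of τ (`tau`) as the probability that s is the passenger themself are assumptions, and all indices are 0-based.

```python
import random


def dirichlet(concentration, dim, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_records(U, C, R, T, assoc, n_records,
                     alpha=0.5, mu=0.5, beta=0.5, tau=0.8, seed=0):
    """Simulate the PGTTM generative process; returns (u, c, r) triples.

    assoc maps a passenger to {associated passenger: association degree}.
    ASSUMPTION: tau is the probability that s is the passenger themself.
    """
    rng = random.Random(seed)
    theta = [dirichlet(alpha, T, rng) for _ in range(U)]  # passenger -> topics
    phi = [dirichlet(mu, C, rng) for _ in range(T)]       # topic -> airlines
    psi = [dirichlet(beta, R, rng) for _ in range(T)]     # topic -> routes
    records = []
    for _ in range(n_records):
        u = rng.randrange(U)  # ASSUMPTION: booking passenger drawn uniformly
        neighbours = assoc.get(u, {})
        if not neighbours or rng.random() < tau:
            s = u
        else:
            s = rng.choices(list(neighbours), weights=list(neighbours.values()))[0]
        z = rng.choices(range(T), weights=theta[s])[0]  # topic from theta_s
        c = rng.choices(range(C), weights=phi[z])[0]    # airline from phi_z
        r = rng.choices(range(R), weights=psi[z])[0]    # route from psi_z
        records.append((u, c, r))
    return records
```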
The passenger-topic distribution θ (U × T), the topic-airline distribution φ (T × C) and the topic-route distribution ψ (T × R) are the parameters PGTTM must infer; that is, given each passenger u and the passenger's observed boarding behaviors c and r, the topic distributions are inferred in reverse.
Step 1.22) (stage S1.22): initialization;
The passenger x that generates each topic is initialized to the boarding passenger u, i.e. x = u. The topic vector z is then randomly initialized with the T topics ($1 \le z_i \le T$, $i = 1, 2, \ldots, N$).
Let $C^{UT}$ be a U × T matrix giving the number of times each passenger is assigned each topic, counted from vectors x and z; $C^{TC}$ a T × C matrix giving the number of times each topic is assigned to each airline, counted from vectors z and c; and $C^{TR}$ a T × R matrix giving the number of times each topic is assigned to each route, counted from vectors z and r.
Set a maximum number of iterations NN, and construct a vector order of length N that contains each of the values 1 to N exactly once, in random order.
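A possible initialization is sketched below with 0-based indices, so topics run from 0 to T-1; `initialise` is a hypothetical name, and count matrices are plain nested lists.

```python
import random


def initialise(u, c, r, U, C, R, T, seed=0):
    """Return x, z, the three count matrices and a random visiting order."""
    rng = random.Random(seed)
    N = len(u)
    x = list(u)                               # topic-generating passenger = booking passenger
    z = [rng.randrange(T) for _ in range(N)]  # random initial topics
    CUT = [[0] * T for _ in range(U)]         # passenger-topic counts
    CTC = [[0] * C for _ in range(T)]         # topic-airline counts
    CTR = [[0] * R for _ in range(T)]         # topic-route counts
    for i in range(N):
        CUT[x[i]][z[i]] += 1
        CTC[z[i]][c[i]] += 1
        CTR[z[i]][r[i]] += 1
    order = list(range(N))
    rng.shuffle(order)                        # random order for the Gibbs sweeps
    return x, z, CUT, CTC, CTR, order
```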
Step 1.23) (stage S1.23): update the topic count matrices, excluding the topic assignment of the current passenger, airline and route;
Let $j = \mathrm{order}_i$. Excluding the component $z_j$, the three topic count matrices are updated, i.e. $C^{UT}_{x_j z_j}$, $C^{TC}_{z_j c_j}$ and $C^{TR}_{z_j r_j}$ are each decremented by 1.
Step 1.24) (stage S1.24): sample the passenger that generates a new topic for the current airline and route;
A Bernoulli distribution with parameter τ determines whether the resampled topic of the current airline $c_j$ and route $r_j$ (with $j = \mathrm{order}_i$) is generated by the current passenger $u_j$ or by a passenger associated with $u_j$ in the association graph. If it is generated by an associated passenger, which one is determined by a multinomial distribution whose parameters are the association degrees between $u_j$ and its associated passengers. Denoting the sampled passenger by s, its sampling probability $P(s)$ depends on the parameters of these two distributions.
Step 1.25) (stage S1.25): reassign a new topic to the current airline and route using the Gibbs sampling formula;
According to the Gibbs sampling formula, the probability that passenger s assigns the new topic t (t = 1, 2, ..., T) to the current airline $c_j$ and the current route $r_j$ (with $j = \mathrm{order}_i$) is:
$$P(x_j = s, z_j = t \mid \mathbf{x}_{\neg j}, \mathbf{z}_{\neg j}, \mathbf{c}, \mathbf{r}) \propto P(s) \cdot \frac{C^{UT}_{st,\neg j} + \alpha}{\sum_{t'=1}^{T} C^{UT}_{st',\neg j} + T\alpha} \cdot \frac{C^{TC}_{tc_j,\neg j} + \mu}{\sum_{c'=1}^{C} C^{TC}_{tc',\neg j} + C\mu} \cdot \frac{C^{TR}_{tr_j,\neg j} + \beta}{\sum_{r'=1}^{R} C^{TR}_{tr',\neg j} + R\beta}$$
The formula gives the probability of sampling passenger s for the current passenger and the new topic t for the current airline and route. The subscript $\neg j$ marks counts computed without component j: $C^{UT}_{st}$ is the number of times passenger s is assigned topic t, $C^{TC}_{tc_j}$ is the number of times topic t is assigned to airline $c_j$, and $C^{TR}_{tr_j}$ is the number of times topic t is assigned to route $r_j$. These counts are the ones obtained in step 1.23), and $P(s)$ is the probability of sampling passenger s from step 1.24).
Finally, the T probability values are used as the parameters of a multinomial distribution, from which a new topic, topic, is sampled.
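The topic conditional of step 1.25) can be sketched as below, assuming the standard collapsed-Gibbs form for a topic model with symmetric Dirichlet priors α, μ and β, and count matrices that already exclude the current record (step 1.23). Indices are 0-based and `sample_topic` is a hypothetical helper.

```python
import random


def sample_topic(s, c, r, p_s, CUT, CTC, CTR, alpha, mu, beta, rng):
    """Sample a new topic for (airline c, route r) generated by passenger s.

    Returns (topic, unnormalized probabilities). Counts must exclude the
    current record.
    """
    T, C, R = len(CTC), len(CTC[0]), len(CTR[0])
    row_s = sum(CUT[s])
    probs = []
    for t in range(T):
        p = (p_s
             * (CUT[s][t] + alpha) / (row_s + T * alpha)       # passenger-topic
             * (CTC[t][c] + mu) / (sum(CTC[t]) + C * mu)       # topic-airline
             * (CTR[t][r] + beta) / (sum(CTR[t]) + R * beta))  # topic-route
        probs.append(p)
    # rng.choices normalizes the weights, so the T values act as the
    # parameters of a multinomial distribution over topics
    return rng.choices(range(T), weights=probs)[0], probs
```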
Step 1.26) (stage S1.26): update the vector of topic-generating passengers and the topic vector;
With $j = \mathrm{order}_i$, $x_j$ in x is updated to s following step 1.24), and $z_j$ in z is updated to topic following step 1.25).
Step 1.27) (stage S1.27): update the three topic count matrices;
Using the topic-generating passenger vector and topic vector updated in step 1.26), and with $j = \mathrm{order}_i$, $C^{UT}_{x_j z_j}$, $C^{TC}_{z_j c_j}$ and $C^{TR}_{z_j r_j}$ are each incremented by 1.
Step 1.28) (stage S1.28): compute the passenger-topic, topic-airline and topic-route distributions after the iterations finish;
With the iteration count running from 1 to NN as the outer loop and i running from 1 to N as the inner loop, the passengers that generate topics and the topics assigned to airlines and routes are resampled repeatedly, i.e. steps 1.23) to 1.27) are repeated. After the iterations are complete, the passenger-topic distribution θ, the topic-airline distribution φ and the topic-route distribution ψ are obtained with the following formulas:
$$\theta_{ut} = \frac{C^{UT}_{ut} + \alpha}{\sum_{t'=1}^{T} C^{UT}_{ut'} + T\alpha}$$
$$\phi_{tc} = \frac{C^{TC}_{tc} + \mu}{\sum_{c'=1}^{C} C^{TC}_{tc'} + C\mu}$$
$$\psi_{tr} = \frac{C^{TR}_{tr} + \beta}{\sum_{r'=1}^{R} C^{TR}_{tr'} + R\beta}$$
where u = 1, 2, ..., U; c = 1, 2, ..., C; r = 1, 2, ..., R; t = 1, 2, ..., T.
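The three point estimates share one shape, so a single helper suffices. This sketch assumes the usual smoothed form for all three (count plus prior, divided by row total plus dimension times prior); `estimate` is a hypothetical name.

```python
def estimate(counts, prior):
    """Row-wise smoothed estimate: (count + prior) / (row_sum + K * prior)."""
    out = []
    for row in counts:
        denom = sum(row) + len(row) * prior
        out.append([(n + prior) / denom for n in row])
    return out


# theta = estimate(CUT, alpha); phi = estimate(CTC, mu); psi = estimate(CTR, beta)
```

Each returned row sums to 1, so every row is a proper probability distribution even for passengers or topics with no counts.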
Step 1.29) (stage S1.29): compute passengers' preferences for routes;
PGTTM models passengers' preferences for airlines and routes. For example, P(u|r) represents both passenger u's preference for route r and the attraction of route r to passenger u; it is calculated as follows:
[Equation image: formula for P(u|r)]
where u = 1, 2, ..., U; r = 1, 2, ..., R.
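The source gives P(u|r) only as an equation image. One plausible way to obtain it from the PGTTM outputs, shown here purely as an assumption and not as the patented formula, is Bayes' rule over the topic mixture: P(u|r) ∝ P(u) Σ_t θ_ut ψ_tr, normalized over passengers. The passenger prior `p_user` is a hypothetical input.

```python
def passenger_given_route(theta, psi, p_user):
    """ASSUMED form: P(u|r) proportional to P(u) * sum_t theta[u][t] * psi[t][r].

    theta is U x T, psi is T x R, p_user has length U; returns a U x R table
    whose columns sum to 1.
    """
    U, T, R = len(theta), len(psi), len(psi[0])
    result = [[0.0] * R for _ in range(U)]
    for r in range(R):
        joint = [p_user[u] * sum(theta[u][t] * psi[t][r] for t in range(T))
                 for u in range(U)]
        total = sum(joint)
        for u in range(U):
            result[u][r] = joint[u] / total  # normalize over passengers
    return result
```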
Step 2): Calculate route popularity, passenger loyalty and airline market share:
Step 2.1) (stage S2.1): calculate route popularity;
The popularity of a route, denoted P(r), is the probability that a passenger chooses route r when traveling:
$$P(r) = \frac{\mathrm{count}(r)}{\sum_{r'=1}^{R} \mathrm{count}(r')}$$
where count(r) is the number of times route r appears in the 2010 passenger booking records, r = 1, 2, ..., R.
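Route popularity is a normalized frequency count. A minimal sketch, assuming the booking records are given as a list of route strings (`route_popularity` is a hypothetical name):

```python
from collections import Counter


def route_popularity(routes):
    """P(r): share of all booking records that are on route r."""
    counts = Counter(routes)
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()}
```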
Step 2.2) (stage S2.2): calculate passenger loyalty;
Passenger loyalty, denoted P(c|u), is the probability that passenger u chooses airline c when traveling, and is calculated as follows:
[Equation image: smoothed formula for P(c|u)]
where count(u, c) is the number of times passenger u chose airline c in the 2010 passenger booking records, c = 1, 2, ..., C.
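The source states only that smoothing is applied; the exact formula is not recoverable. The sketch below assumes additive (Laplace-style) smoothing over the set of airlines with a hypothetical parameter `lam`.

```python
from collections import Counter


def passenger_loyalty(pairs, airlines, lam=1.0):
    """P(c|u) with ASSUMED additive smoothing.

    pairs holds one (passenger, airline) tuple per booking record.
    """
    counts = Counter(pairs)
    per_passenger = Counter(u for u, _ in pairs)
    C = len(airlines)
    return {(u, c): (counts[(u, c)] + lam) / (per_passenger[u] + C * lam)
            for u in per_passenger for c in airlines}
```

Smoothing keeps P(c|u) above zero for airlines a passenger has never flown, which matters because the fused score in step 3.1) takes a logarithm.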
Step 2.3) (stage S2.3): calculate airline market share;
The airline market share, denoted P(c|r), is the probability that a trip on route r is flown with airline c:
$$P(c \mid r) = \frac{\mathrm{count}(c, r)}{\mathrm{count}(r)}$$
where count(c, r) is the number of records in the 2010 passenger booking records in which airline c and route r appear together, and count(r) is the number of times route r appears regardless of airline; c = 1, 2, ..., C; r = 1, 2, ..., R.
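A minimal sketch of airline market share as the fraction of a route's bookings flown with each airline, assuming no smoothing (`market_share` is a hypothetical name):

```python
from collections import Counter


def market_share(pairs):
    """P(c|r) from one (airline, route) tuple per booking record."""
    pair_counts = Counter(pairs)
    route_counts = Counter(r for _, r in pairs)  # route counts regardless of airline
    return {(c, r): n / route_counts[r] for (c, r), n in pair_counts.items()}
```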
Step 3) (stage S3): introduce a Bayesian probability model, construct the multi-factor fusion prediction framework, compute the probability that passengers take each airline and route, and make predictions and recommendations:
Step 3.1): construct the multi-factor fusion prediction framework with a Bayesian probability model;
The passengers' route preferences from PGTTM in step 1) and the route popularity, passenger loyalty and airline market share from step 2) are fused with a Bayesian probability model to construct the multi-factor fusion prediction framework. The Bayesian probability model used in the invention is derived as follows:
First, for a fixed passenger u, P(u) is a constant, so
$$P(r, c \mid u) = \frac{P(r, c, u)}{P(u)} \propto P(r, c, u)$$
Next, by the chain rule, and approximating P(c|u, r) by a linear interpolation of passenger loyalty P(c|u) and airline market share P(c|r),
$$P(r, c, u) = P(r)\,P(u \mid r)\,P(c \mid u, r) \approx P(r)\,P(u \mid r)\,[\alpha P(c \mid u) + (1 - \alpha) P(c \mid r)]$$
so the desired Bayesian probability function is
$$\log P(r, c \mid u) \propto \log\{P(r)\,P(u \mid r)\,[\alpha P(c \mid u) + (1 - \alpha) P(c \mid r)]\}$$
where P(r, c|u) is the probability that passenger u chooses route r with airline c, α is a tunable weight, and the logarithm is taken on both sides to keep the resulting probability values from becoming too small.
The final formula is a required Bayesian probability model and a multi-factor fusion prediction framework, and integrates airline heat P (r), airline preference P (u | r), airline loyalty P (c | u) and airline market share P (c | r). (C1, 2., C, R1, 2., R, U1, 2., U).
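The fused score for one (passenger, airline, route) triple can be sketched as below. The four inputs are assumed to be precomputed: P(r), P(c | u) and P(c | r) from step 2, and P(u | r) from the PGTTM topic model, which is not reproduced here. The function name `fusion_score` and the default α = 0.5 are hypothetical, and the constant −log P(u) is dropped because it does not change the ranking for a fixed passenger:

```python
import math

def fusion_score(p_r, p_u_given_r, p_c_given_u, p_c_given_r, alpha=0.5):
    """Return log{P(r) * P(u|r) * [alpha * P(c|u) + (1 - alpha) * P(c|r)]}.

    Summing logs instead of multiplying probabilities avoids underflow.
    Returns -inf if any factor is zero, since log(0) is undefined.
    """
    mix = alpha * p_c_given_u + (1.0 - alpha) * p_c_given_r
    factors = (p_r, p_u_given_r, mix)
    if min(factors) <= 0.0:
        return float("-inf")
    return sum(math.log(f) for f in factors)

score = fusion_score(p_r=0.5, p_u_given_r=0.2, p_c_given_u=0.6, p_c_given_r=0.4)
print(round(math.exp(score), 6))  # 0.05
```

Setting α = 1 ranks airlines purely by the passenger's own loyalty, while α = 0 falls back on each airline's market share on the route; intermediate values trade off personal history against route-level evidence.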
Step 3.2), predicting the airline-route pairs that passengers will select in the future;
According to the multi-factor prediction framework of step 3.1), and assuming the training set contains W airline-route word pairs in total, the probability of each airline-route word pair is calculated for each passenger u. The pairs are sorted in descending order of the calculated values, and the top K (Top-K) pairs with the largest values are taken as the prediction and recommended. The predictions are then compared with the test set to obtain the prediction accuracy.
For example, for passenger 17464755 (an encrypted identification number), the pair (c, r) formed by airline 290 (a code standing for the airline's real name) and route CTU-CAN (IATA three-letter airport codes: Chengdu Shuangliu International Airport to Guangzhou Baiyun International Airport) in the booking data is substituted into the multi-factor fusion prediction function of step 3.1). If its value is larger than those of the other W - 1 word pairs, this pair is taken as the prediction; if the passenger actually flies this route with this airline in the test set, the Top-1 prediction accuracy for this passenger is 1.
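The Top-K step and the Top-1 check in the example can be sketched as follows. The scores are assumed to be fused log-scores keyed by (airline, route) pairs. Reading "prediction accuracy" for K > 1 as a hit rate (1 if any of the top K pairs appears among the passenger's test-set trips, else 0, averaged over passengers) is an assumption, and all numbers are made up:

```python
import heapq

def top_k_pairs(scores, k):
    """Return the k (airline, route) pairs with the highest scores, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [pair for pair, _ in best]

def hit_at_k(scores, test_pairs, k):
    """1.0 if any of the top-k predicted pairs occurs in the test set, else 0.0."""
    return 1.0 if any(p in test_pairs for p in top_k_pairs(scores, k)) else 0.0

def mean_accuracy(all_scores, all_test_pairs, k):
    """Average hit_at_k over test passengers (assumes at least one passenger)."""
    hits = [hit_at_k(all_scores[u], all_test_pairs[u], k) for u in all_test_pairs]
    return sum(hits) / len(hits)

scores = {
    ("290", "CTU-CAN"): -1.2,
    ("A2", "CTU-CAN"): -2.5,
    ("290", "CTU-PEK"): -3.0,
}
print(top_k_pairs(scores, 1))                       # [('290', 'CTU-CAN')]
print(hit_at_k(scores, {("290", "CTU-CAN")}, k=1))  # 1.0
```

`heapq.nlargest` avoids sorting all W pairs when only the top K are needed.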
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, but other embodiments derived from the technical solutions of the present invention by those skilled in the art are also within the scope of the present invention.

Claims (1)

1. A multi-factor fusion civil aviation passenger trip prediction method based on a topic model, characterized in that data mining theory and methods are used to analyze passenger travel behavior in civil aviation data; the operating environment requires a computer platform with at least 8 GB of memory, a CPU with at least 4 cores and a clock frequency of at least 2.6 GHz, and a 64-bit Windows 7 or later operating system, with an Oracle database, Java 1.7 or later, and MATLAB R2011b or later installed as the required software environment; the method comprises the following steps:
step 1): constructing a passenger association graph travel topic model, which comprises constructing a passenger association graph, performing topic modeling of the passenger trip-selection probability distribution, and finally obtaining the passenger association graph travel topic model:
step 1.1), constructing a passenger association graph;
constructing the passenger association graph means calculating the association degree between passengers, which is determined jointly by the route co-occurrence degree and the attribute co-occurrence degree of the passengers; the route co-occurrence degree is determined by the number of routes the passengers have in common; the attribute co-occurrence degree reflects whether the passengers' age, gender, average discount and average mileage are the same; the age, average-discount and average-mileage segments are obtained by a variance-based segmentation method;
step 1.2), topic modeling of the passenger trip-selection probability distribution;
based on a topic model, modeling passengers together with the routes and airlines they take, inferring the latent topic distributions of passengers, routes and airlines, and finally combining the passengers' latent topic distributions with those of the routes and airlines to obtain each passenger's trip-selection probability distribution over routes and airlines;
step 1.3), constructing the passenger association graph travel topic model;
incorporating the passenger association graph of step 1.1) into the topic modeling process of step 1.2) to construct a Passenger association Graph Travel Topic Model (PGTTM); when PGTTM assigns a topic to each route or airline taken by a passenger, the topic may come not only from that passenger but also from other passengers associated with that passenger; this enriches the topic information, improves prediction performance, and addresses the sparsity of civil aviation passengers' travel records;
step 2): building calculation models for route popularity, passenger loyalty and airline market share, which provide prior knowledge that supports accurate prediction, as follows:
step 2.1), calculating route popularity;
for route popularity, first counting the number of times each route is taken by all passengers and the total number of times all routes are taken by all passengers, and computing route popularity from these counts;
step 2.2), calculating passenger loyalty to airlines;
for passenger loyalty, first counting the number of times the passenger flies with each airline and the total number of times the passenger flies with any airline, and computing the passenger's loyalty to each airline from these counts with smoothing applied;
step 2.3), calculating the market share of each airline on each route;
for airline market share, first counting the number of times all passengers take each airline-route word pair, and computing the airline's market share on the route relative to the number of times all passengers take that route regardless of airline;
step 3): constructing a multi-factor fusion prediction framework through a Bayesian probability model, and predicting the airline-route pairs the passenger will select in the future by fusing route popularity, the passenger's selection probability distribution over airlines and routes, passenger loyalty and airline market share through the Bayesian probability model:
step 3.1), multi-factor fusion based on a Bayesian probability model;
constructing a Bayesian probability model from the passenger's selection probability distribution over airlines and routes obtained by PGTTM in step 1.3), route popularity from step 2.1), passenger loyalty from step 2.2) and airline market share from step 2.3), thereby fusing the four factors to better model passenger travel behavior;
step 3.2), multi-factor prediction based on a Bayesian probability model;
for each passenger and each airline-route word pair, calculating the probability that the passenger takes that pair using the Bayesian probability model function; and selecting for each passenger the several airline-route word pairs with the highest probability as the prediction and recommending them.
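Step 1.1 of the claim combines route co-occurrence with attribute co-occurrence but does not state how the two are weighted. The sketch below is one hypothetical reading: the route component is the number of routes two passengers share, the attribute component is the fraction of the four segmented attributes (age, gender, average discount, average mileage) that match, and the two are mixed with an assumed weight `beta`. Attribute values are assumed to be already segmented (for example by the variance-based method), and all names and data are illustrative:

```python
from itertools import combinations

# Segmented attributes compared for attribute co-occurrence (names are hypothetical).
ATTRIBUTES = ("age_seg", "gender", "discount_seg", "mileage_seg")

def association_degrees(routes, attrs, beta=0.5):
    """Return {(u, v): degree} for passenger pairs with a non-zero association.

    routes: passenger -> set of routes taken
    attrs:  passenger -> dict mapping each name in ATTRIBUTES to a segment/value
    beta:   assumed weight between route and attribute co-occurrence
    """
    edges = {}
    for u, v in combinations(sorted(routes), 2):
        shared = len(routes[u] & routes[v])  # route co-occurrence count
        matched = sum(attrs[u][a] == attrs[v][a] for a in ATTRIBUTES)
        degree = beta * shared + (1 - beta) * matched / len(ATTRIBUTES)
        if degree > 0:
            edges[(u, v)] = degree
    return edges

routes = {"u1": {"CTU-CAN", "CAN-PEK"}, "u2": {"CTU-CAN"}, "u3": {"SHA-PEK"}}
attrs = {
    "u1": {"age_seg": 2, "gender": "F", "discount_seg": 1, "mileage_seg": 3},
    "u2": {"age_seg": 2, "gender": "F", "discount_seg": 0, "mileage_seg": 3},
    "u3": {"age_seg": 0, "gender": "M", "discount_seg": 0, "mileage_seg": 1},
}
print(association_degrees(routes, attrs))
# {('u1', 'u2'): 0.875, ('u2', 'u3'): 0.125}
```

Because the shared-route count is unbounded while the attribute fraction lies in [0, 1], a practical version would likely need to normalize the route term or tune `beta`; the patent text does not specify this.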
CN201611159984.3A 2016-12-15 2016-12-15 Multi-factor fusion civil aviation passenger trip prediction method based on topic model Active CN106779214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611159984.3A CN106779214B (en) 2016-12-15 2016-12-15 Multi-factor fusion civil aviation passenger trip prediction method based on topic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611159984.3A CN106779214B (en) 2016-12-15 2016-12-15 Multi-factor fusion civil aviation passenger trip prediction method based on topic model

Publications (2)

Publication Number Publication Date
CN106779214A CN106779214A (en) 2017-05-31
CN106779214B true CN106779214B (en) 2020-08-28

Family

ID=58889245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611159984.3A Active CN106779214B (en) 2016-12-15 2016-12-15 Multi-factor fusion civil aviation passenger trip prediction method based on topic model

Country Status (1)

Country Link
CN (1) CN106779214B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876049A (en) * 2018-06-27 2018-11-23 南京航空航天大学 A kind of airport market share variation prediction method in new demand servicing nurturing period
CN110751523A (en) * 2019-10-21 2020-02-04 中国民航信息网络股份有限公司 Method and device for discovering potential high-value passengers
CN110852650B (en) * 2019-11-19 2021-11-02 交通运输部公路科学研究所 Comprehensive passenger transport hub group network modeling method based on dynamic graph hybrid automaton
CN112948161B (en) * 2021-03-09 2022-06-03 四川大学 Deep learning-based aviation message error correction and correction method and system
CN118350858A (en) * 2024-03-20 2024-07-16 中航信数智科技(北京)有限公司 Method and device for predicting airline passenger quantity, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105488597A (en) * 2015-12-28 2016-04-13 中国民航信息网络股份有限公司 Passenger destination prediction method and system
CN105512773A (en) * 2015-12-25 2016-04-20 中国民航信息网络股份有限公司 Passenger travel destination prediction method and device
CN106055807A (en) * 2016-06-06 2016-10-26 四川大学 Civil aviation passenger movement model based on potential trip purposes

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP5946394B2 (en) * 2012-11-09 2016-07-06 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Statistical inference method, computer program, and computer of path start and end points using multiple types of data sources.

Non-Patent Citations (1)

Title
出行链化的贝叶斯网络预测 [Bayesian network prediction of trip chains]; 赵应场 et al.; 《道路交通与安全》 [Road Traffic & Safety]; 2015-12-31; Vol. 15, No. 1; full text *

Also Published As

Publication number Publication date
CN106779214A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN106779214B (en) Multi-factor fusion civil aviation passenger trip prediction method based on topic model
Ni et al. Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks
Gupta Evaluating service quality of airline industry using hybrid best worst method and VIKOR
CN108363804B (en) Local model weighted fusion Top-N movie recommendation method based on user clustering
Gao et al. Location-centered house price prediction: A multi-task learning approach
Çavdar et al. Airline customer lifetime value estimation using data analytics supported by social network information
CN105893406A (en) Group user profiling method and system
Jenab et al. A graph-based model for manufacturing complexity
Zhang Design of a sports culture data fusion system based on a data mining algorithm
Gao et al. Application of artificial intelligence and big data technology in digital marketing
CN107633035B (en) Shared traffic service reorder estimation method based on K-Means and LightGBM model
Mattila et al. Maintenance scheduling of a fleet of fighter aircraft through multi-objective simulation-optimization
CN111177538A (en) Unsupervised weight calculation-based user interest tag construction method
Liu et al. Personalized air travel prediction: A multi-factor perspective
Kang et al. LA-CTR: A limited attention collaborative topic regression for social media
CN111429161A (en) Feature extraction method, feature extraction device, storage medium, and electronic apparatus
CN115222433A (en) Information recommendation method and device and storage medium
CN115293920A (en) Multi-modal data-based social relationship analysis method, system and storage medium
Pu et al. Improved tourism recommendation system
Liu E‐Commerce Precision Marketing Model Based on Convolutional Neural Network
Ugochi et al. Customer opinion mining in electricity distribution company using twitter topic modeling and logistic regression
CN112784177B (en) Spatial distance adaptive next interest point recommendation method
Cheng et al. [Retracted] Using Clustering Analysis and Association Rule Technology in Cross‐Marketing
CN117495482A (en) Secondhand mobile phone sales recommendation method and system based on user portrait
CN114358807A (en) User portrayal method and system based on predictable user characteristic attributes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant