CN106779214B - A Multi-factor Fusion Civil Aviation Passenger Travel Prediction Method Based on a Topic Model
- Publication number
- CN106779214B (application number CN201611159984.3)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G - PHYSICS
- G06 - COMPUTING; CALCULATING OR COUNTING
- G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00 - Administration; Management
- G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40 - Business processes related to the transportation industry
Description
Technical field
The invention belongs to the technical field of computer applications and relates to data mining and civil aviation data analysis, in particular to a multi-factor fusion method for predicting civil aviation passenger travel based on a topic model.
Background
Rising living standards and the growth of the Internet have caused civil aviation booking systems to accumulate large volumes of booking data. The data are massive, sparse, and long-tailed, which makes civil aviation data analysis challenging. Analyzing passenger travel characteristics and predicting future travel behavior from these data is one of the most important tasks in civil aviation data analysis. Research on civil aviation passenger analysis, in China and abroad, is still at an early stage, and little work addresses passenger travel prediction.
Existing analyses of civil aviation data include the following. Maalouf et al. applied cluster analysis and association rules to real airline frequent-flyer data and proposed recommendations and improvement strategies for customer relationship management [1]. Wang Chaoen et al. used questionnaire surveys combined with statistical methods to analyze the consumption motivation, airline preferences, and purchasing behavior of civil aviation passengers in Changchun [2]. Feng et al. built a heterogeneous information network over civil aviation data and used random walks to discover the value of infrequent passengers [3]. Etzioni et al. studied the relationship between time and fare and used a multi-strategy data mining algorithm to advise passengers on the best time to buy a ticket [4].
Among topic models, Latent Dirichlet Allocation (LDA) performs well at modeling text topics and is readily extensible [5]. Rosen-Zvi et al. proposed the Author-Topic Model (ATM) based on LDA, which jointly models the topics of authors, documents, and words [6]. Blei et al. proposed a supervised LDA model for text classification, adding the document labels in the training corpus to LDA as observed values [7]. Topic models and LDA have also been extended to recommendation: Liu et al. explicitly incorporated the latent features of travel package data into a topic model and proposed a personalized travel recommendation method [8]. Tan et al. represented passenger information as feature-value pairs, used a topic model to learn passengers' latent interest distributions, and combined it with collaborative filtering to recommend travel packages [9].
Social relationships among passengers also help modeling. Wang Kunkun et al. built a co-travel network and proposed a seat-preference modeling method for civil aviation passengers that combines individual preference with relationship preference [10]. Zhou Yuanwei et al. proposed a semi-supervised relationship classification algorithm based on an information graph that obtains more accurate passenger relationships and supports targeted, high-quality services [11].
Applying topic models to the analysis and prediction of civil aviation passenger travel is therefore worth trying: it can uncover latent topic distributions and cope with the sheer volume of data. Incorporating the relationships among passengers into topic modeling enriches the topic information and mitigates sparsity, which improves the model. Building a probabilistic framework that fuses multiple factors influencing travel is likewise expected to improve prediction.
References:
[1] Maalouf L, Mansour N. Mining airline data for CRM strategies[C]//Proceedings of the 7th WSEAS International Conference on Simulation, Modeling and Optimization, Beijing, China, 2007: 345-350.
[2] Wang Chaoen. Analysis of the characteristics and behavior of Changchun civil aviation passengers[D]. Jilin University, 2010.
[3] Feng X, Xu B Y, Lu M, et al. Infrequent passenger value discovery by random walk on passenger-route heterogeneous network[J]. Journal of Computational and Theoretical Nanoscience, 2015, 2(1): 10-17.
[4] Etzioni O, Tuchinda R, et al. To buy or not to buy: mining airfare data to minimize ticket purchase price[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, USA, August 2003: 119-128.
[5] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[6] Rosen-Zvi M, Griffiths T, Steyvers M, et al. The author-topic model for authors and documents[C]//Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2004: 487-494.
[7] Blei D M, McAuliffe J D. Supervised topic models[J]. Advances in Neural Information Processing Systems, 2010, 3: 327-332.
[8] Liu Q, Ge Y, Li Z, et al. Personalized travel package recommendation[C]//IEEE International Conference on Data Mining. IEEE Computer Society, 2011: 407-416.
[9] Tan C, Liu Q, Chen E, et al. Object-oriented travel package recommendation[J]. ACM Transactions on Intelligent Systems and Technology, 2014, 5(3): 1-26.
[10] Wang Kunkun. Modeling and application of civil aviation passenger seat preference[D]. Beijing Jiaotong University, 2015.
[11] Zhou Yuanwei. Design and implementation of a civil aviation social network relationship classification algorithm[D]. Beijing Jiaotong University, 2013.
Summary of the invention
The purpose of the invention is to address the massive volume, sparsity, and long-tailed nature of civil aviation booking data, together with the diversity of factors that influence travel, by providing a topic-model-based multi-factor fusion method that accurately predicts the airlines and routes passengers will take in the future.
The invention uses a topic model to model passengers together with the airlines and routes they choose. By introducing a constructed passenger association graph, it proposes the Passenger Graph based Travel Topic Model (PGTTM), which yields passengers' preferences for routes and airlines, enriches the topic information, and mitigates the sparsity of civil aviation data.
A Bayesian probability model is then introduced to fuse four factors: route popularity, the passenger's route preference obtained from PGTTM, passenger loyalty, and airline market share. The resulting multi-factor fusion prediction framework predicts and recommends the airlines and routes a passenger will take more accurately. This is the main content of the invention.
Technical solution of the invention
A multi-factor fusion method for predicting civil aviation passenger travel based on a topic model, comprising:
Step 1): Construct the passenger association graph travel topic model. This comprises constructing the passenger association graph, modeling passenger travel preferences with a topic model, and finally obtaining the passenger association graph travel topic model:
Step 1.1): Construct the passenger association graph;
Constructing the passenger association graph means computing the degree of association between passengers, which is determined jointly by their route co-occurrence and attribute co-occurrence. Route co-occurrence is determined by the number of route co-occurrences between passengers. Attribute co-occurrence indicates whether two passengers have the same age, gender, average discount, and average mileage; age, average discount, and average mileage are discretized by a variance-based segmentation method;
Step 1.2): Model passenger travel preferences with a topic model;
Using a topic model, passengers and the routes and airlines they take are modeled jointly to discover and estimate the latent topic distributions of passengers, routes, and airlines. Combining a passenger's latent topic distribution with the latent topic distributions of airlines and routes gives the passenger's travel preferences for airlines and routes;
Step 1.3): Construct the passenger association graph travel topic model;
The passenger association graph from step 1.1) is added to the topic modeling of step 1.2) to construct the Passenger Graph based Travel Topic Model (PGTTM). When PGTTM assigns topics to each passenger's routes and airlines, a topic can come not only from the passenger but also from other passengers associated with that passenger. This enriches the topic information, improves prediction performance, and mitigates the sparsity of civil aviation travel data;
Step 2): Build calculation models for route popularity, passenger loyalty, and airline market share. This prior knowledge supports accurate prediction in the later steps:
Step 2.1): Compute route popularity;
For route popularity, first count the number of times the route was taken by all passengers, and the sum over all routes of the number of times each was taken; route popularity is computed from these two counts;
Step 2.2): Compute passenger loyalty to an airline;
For passenger loyalty, first count the number of times the passenger flew with the airline, and the total number of times the passenger flew with any airline; after smoothing, the passenger's loyalty to the airline is computed from these counts;
Step 2.3): Compute the airline's market share on a route;
For airline market share, first count the number of times the airline and the route, taken together as a word pair, were taken by all passengers, and the number of times the route was taken by all passengers regardless of airline; the airline's market share on the route is computed from these counts;
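The three priors of step 2) can be sketched as simple count ratios. This is a minimal illustration, not the patent's exact formulas: it assumes that popularity and market share are plain ratios of the counts named above and that the unspecified "smoothing" for loyalty is additive (Laplace) smoothing. The function names, the `smoothing` parameter, and the sample records are hypothetical.

```python
from collections import Counter


def route_popularity(records):
    """records: list of (passenger, airline, route). Returns {route: share of all trips}."""
    route_counts = Counter(route for _, _, route in records)
    total = sum(route_counts.values())
    return {route: n / total for route, n in route_counts.items()}


def passenger_loyalty(records, airlines, smoothing=1.0):
    """Returns {(passenger, airline): smoothed share of the passenger's trips}.

    Additive smoothing is an assumption; the text only says 'smoothing'.
    """
    pair_counts = Counter((p, a) for p, a, _ in records)
    trips_per_passenger = Counter(p for p, _, _ in records)
    extra = smoothing * len(airlines)
    return {
        (p, a): (pair_counts[(p, a)] + smoothing) / (n + extra)
        for p, n in trips_per_passenger.items()
        for a in airlines
    }


def market_share(records):
    """Returns {(airline, route): trips on the route with the airline / all trips on the route}."""
    pair_counts = Counter((a, r) for _, a, r in records)
    route_counts = Counter(r for _, _, r in records)
    return {(a, r): n / route_counts[r] for (a, r), n in pair_counts.items()}


# Hypothetical booking records: (passenger, airline, route).
records = [
    ("p1", "CA", "PEK-SHA"),
    ("p1", "CA", "PEK-CAN"),
    ("p1", "MU", "PEK-SHA"),
    ("p2", "MU", "PEK-SHA"),
]
popularity = route_popularity(records)
loyalty = passenger_loyalty(records, airlines=["CA", "MU"])
share = market_share(records)
```

Smoothing keeps the loyalty of a passenger toward an airline they have never flown with above zero, which matters if the factors are later multiplied together.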
Step 3): Fuse route popularity, passenger route preference, passenger loyalty, and airline market share through a Bayesian probability model to build a multi-factor fusion prediction framework, and predict the routes and airlines passengers will choose in the future:
Step 3.1): Multi-factor fusion based on the Bayesian probability model;
A Bayesian probability model is built on the passenger route preference obtained from PGTTM in step 1), the route popularity from step 2.1), the passenger loyalty from step 2.2), and the airline market share from step 2.3). Fusing these four factors models passenger travel behavior better;
Step 3.2): Multi-factor prediction based on the Bayesian probability model;
For each passenger and each airline-route word pair, the Bayesian probability model function is used to compute the probability that the passenger takes that pair. For each passenger, the airline-route word pairs with the highest probabilities are selected for prediction and recommendation.
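The exact Bayesian probability model function is not given in this part of the text. As a hedged sketch only, the four factors can be combined as an unnormalized product and the highest-scoring airline-route pairs returned. The product form, the names `fused_score` and `recommend`, and all numbers below are assumptions made for illustration.

```python
def fused_score(passenger, airline, route, popularity, preference, loyalty, share):
    """Unnormalized score for one (airline, route) pair; the product form is an assumption."""
    return (
        popularity.get(route, 0.0)
        * preference.get((passenger, route), 0.0)
        * loyalty.get((passenger, airline), 0.0)
        * share.get((airline, route), 0.0)
    )


def recommend(passenger, pairs, k, popularity, preference, loyalty, share):
    """Return the k airline-route pairs with the highest fused score for the passenger."""
    scored = [
        (fused_score(passenger, a, r, popularity, preference, loyalty, share), a, r)
        for a, r in pairs
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(a, r) for _, a, r in scored[:k]]


# Hypothetical factor tables.
popularity = {"PEK-SHA": 0.6, "PEK-CAN": 0.4}
preference = {("p1", "PEK-SHA"): 0.7, ("p1", "PEK-CAN"): 0.3}
loyalty = {("p1", "CA"): 0.8, ("p1", "MU"): 0.2}
share = {("CA", "PEK-SHA"): 0.5, ("MU", "PEK-SHA"): 0.5, ("CA", "PEK-CAN"): 1.0}

top = recommend("p1", list(share), 2, popularity, preference, loyalty, share)
```

Because only the ranking is needed for recommendation, the scores do not have to be normalized into probabilities.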
Advantages and positive effects of the invention:
· Proposes the passenger association graph travel topic model PGTTM
The invention applies topic modeling to the travel behavior of civil aviation passengers, discovers the latent topic distributions of passengers and of the airlines and routes they take, and accurately predicts future behavior such as the routes a passenger will choose. On this basis a passenger association graph is constructed and introduced to obtain PGTTM, which draws on similar passengers to enrich the topic information, improves prediction accuracy, and addresses the sparsity of civil aviation travel data.
· Proposes a multi-factor fusion prediction framework based on a Bayesian probability model function
Through a Bayesian probability model function, the invention obtains a multi-factor fusion prediction framework that fuses the passenger route preferences obtained from PGTTM with prior knowledge of route popularity, passenger loyalty, and airline market share. Compared with baseline methods, the framework predicts the airlines and routes a passenger will choose more accurately.
Brief description of the drawings
FIG. 1 is the overall model system diagram of the invention.
FIG. 2 is the algorithm flowchart of the invention.
Detailed description
Embodiment 1:
The topic-model-based multi-factor fusion method for predicting civil aviation passenger travel provided by the invention is described in detail below with reference to the drawings and a specific embodiment.
The invention mainly uses data mining theory and methods to analyze passenger travel behavior in civil aviation data. To ensure that the system runs properly, the computer platform used in a specific implementation should have at least 8 GB of memory, at least 4 CPU cores clocked at no less than 2.6 GHz, and a 64-bit operating system (Windows 7 or later), with the required software installed: an Oracle database, Java 1.7 or later, and Matlab 2011b or later.
The method is as follows and is described with reference to FIG. 1 and FIG. 2.
Step 1): Construct the passenger association graph travel topic model PGTTM
Step 1.1): Data preprocessing and construction of the passenger association graph (stage S1.1);
Step 1.11): Data description and preprocessing
Each record in the passenger booking data contains the passenger's personal information and travel information. Personal information includes the encrypted ID number that uniquely identifies the passenger, the passenger's age, gender, and so on; travel information includes the airline taken, the departure airport, the arrival airport, the discount, and so on.
After preprocessing (removing low-frequency passengers, duplicate records, and abnormal records), a portion of the historical data is taken as the training set and the remaining data as the test set.
Step 1.12): Variance-based segmentation method;
Taking age as an example: all ages in the passenger travel records of the training set are extracted into a sorted list. Each age from the minimum to the maximum is tried as a split point, and the weighted average of the variances of the two resulting segments is computed, where each segment's weight is the fraction of the pre-split ages that it contains. The split age for which the weighted average of the post-split variances differs most from the pre-split variance is the best split point.
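The variance-based split described above can be sketched as follows. The sketch assumes population variance, tries only the distinct values present in the data as split points, and puts values strictly below the split point in the left segment; these details and the name `best_split` are assumptions, not taken from the text.

```python
from statistics import pvariance


def best_split(values):
    """Return (split_point, variance_reduction) for the best single split.

    The left segment holds values < split_point, the right segment the rest.
    """
    data = sorted(values)
    n = len(data)
    total_variance = pvariance(data)
    best_point, best_gain = None, float("-inf")
    # Skip the minimum so that the left segment is never empty.
    for point in sorted(set(data))[1:]:
        left = [v for v in data if v < point]
        right = [v for v in data if v >= point]
        weighted = (len(left) / n) * pvariance(left) + (len(right) / n) * pvariance(right)
        gain = total_variance - weighted
        if gain > best_gain:
            best_point, best_gain = point, gain
    return best_point, best_gain


ages = [20, 21, 22, 23, 50, 51, 52, 53]
split_age, gain = best_split(ages)
```

For this sample the split falls at 50, which separates the two age clusters. Applying the same function recursively to each segment would give more than two bins, if more are needed.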
Step 1.13): Construct the passenger association graph;
The degree of association between passengers is determined jointly by route co-occurrence and attribute co-occurrence. Statistics computed on the training set from step 1.11) give a sparse matrix of route co-occurrence counts between passengers; normalizing each column yields the route co-occurrence matrix. Attribute co-occurrence indicates whether two passengers match on age, gender, average discount, and average mileage after the segmentation of step 1.12). For each passenger, the few passengers with the highest route co-occurrence are taken as its associated passengers, and the degree of association with each of them is the weighted average of their route co-occurrence and attribute co-occurrence. This yields the association graph among passengers.
The route a passenger takes is determined by the departure and arrival airports. Mileage is derived from the distance between the two cities that these airports represent, price is determined by mileage and discount, and a passenger's average discount is determined by the passenger's total mileage and total price.
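The graph construction of step 1.13) can be sketched as below. Several details are assumptions made for illustration: the co-occurrence count is taken as the number of distinct routes two passengers share, normalization is done per passenger, attribute co-occurrence is 1 only when all discretized attributes match, and `top_k` and `weight` are hypothetical parameters.

```python
def build_association_graph(routes, attrs, top_k=2, weight=0.5):
    """routes: {passenger: set of routes}; attrs: {passenger: tuple of binned attributes}.

    Returns {passenger: {associated passenger: degree of association}}.
    """
    graph = {}
    for a in routes:
        # Raw route co-occurrence counts between a and every other passenger.
        counts = {
            b: len(routes[a] & routes[b])
            for b in routes
            if b != a and routes[a] & routes[b]
        }
        total = sum(counts.values())
        # Normalize so that a's route co-occurrence degrees sum to 1.
        degree = {b: n / total for b, n in counts.items()}
        nearest = sorted(degree, key=degree.get, reverse=True)[:top_k]
        graph[a] = {
            b: weight * degree[b] + (1 - weight) * (1.0 if attrs[a] == attrs[b] else 0.0)
            for b in nearest
        }
    return graph


# Hypothetical passengers with binned (age, gender, discount, mileage) attributes.
routes = {
    "p1": {"PEK-SHA", "PEK-CAN", "SHA-CAN"},
    "p2": {"PEK-SHA", "PEK-CAN"},
    "p3": {"SHA-CAN"},
    "p4": {"CTU-XIY"},
}
attrs = {
    "p1": ("30-39", "M", "low", "short"),
    "p2": ("30-39", "M", "low", "short"),
    "p3": ("50-59", "F", "high", "long"),
    "p4": ("30-39", "F", "low", "long"),
}
graph = build_association_graph(routes, attrs)
```

A passenger with no shared routes, such as `p4` here, ends up with no associated passengers and would draw topics only from its own history.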
Step 1.2): Model passenger travel preferences with PGTTM
Step 1.21): Obtain the input data (stage S1.21);
Suppose the passenger booking records of the training set contain U distinct passengers (distinguished by encrypted ID number), C airlines, and R routes. The ID number, airline, and route fields are extracted from the booking records and replaced by indices, so that the three fields take the values 1 to U, 1 to C, and 1 to R respectively. This gives three vectors u, c, and r, each of length N (the number of booking records in the training set), which form the input data. The i-th components of the three vectors state that passenger $u_i$ in the i-th booking record took route $r_i$ with airline $c_i$ ($1 \le u_i \le U$, $1 \le c_i \le C$, $1 \le r_i \le R$, $i = 1, 2, \dots, N$).
T is the chosen number of topics. z is the topic vector, of length N, and x is the vector of passengers used to generate the topics, also of length N. The vectors u, c, r relate to z and x component by component: the topic $z_i$ of the airline $c_i$ and route $r_i$ taken by passenger $u_i$ is assigned by $x_i$, where $x_i$ may be $u_i$ itself or one of $u_i$'s associated passengers ($1 \le z_i \le T$, $1 \le x_i \le U$, $i = 1, 2, \dots, N$).
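The index encoding that produces u, c, and r can be sketched as follows. The function name `encode` and the sample records are hypothetical; indices are assigned in order of first appearance, which is one possible choice among many.

```python
def encode(records):
    """Map (passenger_id, airline, route) records to 1-based index vectors u, c, r."""

    def index_of(table, key):
        # Assign the next free 1-based index the first time a key is seen.
        return table.setdefault(key, len(table) + 1)

    passengers, airlines, routes = {}, {}, {}
    u, c, r = [], [], []
    for passenger_id, airline, route in records:
        u.append(index_of(passengers, passenger_id))
        c.append(index_of(airlines, airline))
        r.append(index_of(routes, route))
    return u, c, r, passengers, airlines, routes


records = [
    ("id_a", "CA", "PEK-SHA"),
    ("id_a", "CA", "PEK-CAN"),
    ("id_a", "MU", "PEK-SHA"),
    ("id_b", "MU", "PEK-SHA"),
]
u, c, r, passengers, airlines, routes = encode(records)
U, C, R, N = len(passengers), len(airlines), len(routes), len(u)
```

The returned lookup tables make it possible to translate predicted indices back to the original airline and route codes.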
The process by which a passenger generates each travel behavior in PGTTM is as follows:
(1) Each passenger u has a topic distribution, and each topic t has an airline distribution and a route distribution. The topic distribution of passenger u is $\theta_u \sim \mathrm{Dirichlet}(\alpha)$, the airline distribution of topic t is $\phi_t \sim \mathrm{Dirichlet}(\mu)$, and the route distribution of topic t is $\psi_t \sim \mathrm{Dirichlet}(\beta)$ ($u = 1, 2, \dots, U$; $t = 1, 2, \dots, T$; $\theta_u$ is a T-dimensional vector, $\phi_t$ a C-dimensional vector, and $\psi_t$ an R-dimensional vector; $\alpha$, $\mu$, $\beta$ are the parameters of the Dirichlet distributions).
(2) Passenger $u_i$ first samples a passenger s, then s samples a travel topic, and finally the airline and route are chosen according to the travel topic. That is, topic $z_i \sim \mathrm{Multinomial}(\theta_s)$, airline $c_i \sim \mathrm{Multinomial}(\phi_{z_i})$, and route $r_i \sim \mathrm{Multinomial}(\psi_{z_i})$. In PGTTM, s may be $u_i$ itself or one of $u_i$'s associated passengers ($1 \le u_i \le U$, $1 \le z_i \le T$, $1 \le c_i \le C$, $1 \le r_i \le R$, $i = 1, 2, \dots, N$).
The passenger-topic distribution $\theta$ (U×T), the topic-airline distribution $\phi$ (T×C), and the topic-route distribution $\psi$ (T×R) are the parameters that PGTTM infers. That is, given the observed passengers u and their travel behaviors c and r, their topic distributions are inferred in reverse.
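The generative process can be simulated with the standard library alone. This is a sketch under assumptions: indices are 0-based for convenience, the symbol `psi` stands for the topic-route distribution, and `tau` is read as the probability of keeping the traveling passenger as the topic source. All names and numbers are hypothetical.

```python
import random


def dirichlet(rng, concentration, size):
    """Draw one symmetric Dirichlet sample by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(rng, probs):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_trip(rng, passenger, neighbors, tau, theta, phi, psi):
    """Generate (source passenger, topic, airline, route) for one trip of `passenger`."""
    linked = neighbors.get(passenger, [])
    if linked and rng.random() >= tau:
        # Borrow the topic source from an associated passenger, weighted by association degree.
        others, degrees = zip(*linked)
        s = rng.choices(others, weights=degrees, k=1)[0]
    else:
        s = passenger
    z = categorical(rng, theta[s])
    return s, z, categorical(rng, phi[z]), categorical(rng, psi[z])


rng = random.Random(7)
U, C, R, T = 3, 2, 4, 2
theta = [dirichlet(rng, 0.5, T) for _ in range(U)]   # passenger-topic
phi = [dirichlet(rng, 0.5, C) for _ in range(T)]     # topic-airline
psi = [dirichlet(rng, 0.5, R) for _ in range(T)]     # topic-route
neighbors = {0: [(1, 0.8), (2, 0.2)]}                # passenger 0 is linked to 1 and 2
s, z, airline, route = generate_trip(rng, 0, neighbors, 0.6, theta, phi, psi)
```

Inference runs this process in reverse: the trips are observed, and the distributions that most plausibly produced them are estimated.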
Step 1.22): Initialization (stage S1.22);
The vector x of topic-assigning passengers is initialized to equal the traveling-passenger vector u. The topic vector z is then initialized at random over the T topics (that is, $1 \le z_i \le T$, $i = 1, 2, \dots, N$).
Let $C^{UT}$ be a U×T matrix giving the number of times each passenger is assigned each topic, computed from the vectors x and z; let $C^{TC}$ be a T×C matrix giving the number of times each topic is assigned to each airline, computed from the vectors z and c; and let $C^{TR}$ be a T×R matrix giving the number of times each topic is assigned to each route, computed from the vectors z and r. These three matrices are the topic count matrices for passengers, airlines, and routes respectively.
Set the maximum number of iterations NN. Construct a vector order of length N that contains the values 1 to N in randomly shuffled order.
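The initialization can be sketched as follows, keeping the 1-based values of the text inside the vectors and subtracting 1 only when indexing the count matrices. The function name `initialize`, the `seed` parameter, and the sample data are hypothetical.

```python
import random


def initialize(u, c, r, U, C, R, T, seed=0):
    """Build x, z, the three topic count matrices, and the shuffled visiting order."""
    rng = random.Random(seed)
    N = len(u)
    x = list(u)                                  # topic source starts as the traveler
    z = [rng.randint(1, T) for _ in range(N)]    # random topic in 1..T
    C_UT = [[0] * T for _ in range(U)]
    C_TC = [[0] * C for _ in range(T)]
    C_TR = [[0] * R for _ in range(T)]
    for i in range(N):
        C_UT[x[i] - 1][z[i] - 1] += 1
        C_TC[z[i] - 1][c[i] - 1] += 1
        C_TR[z[i] - 1][r[i] - 1] += 1
    order = list(range(1, N + 1))
    rng.shuffle(order)
    return x, z, C_UT, C_TC, C_TR, order


u = [1, 1, 1, 2]
c = [1, 1, 2, 2]
r = [1, 2, 1, 1]
x, z, C_UT, C_TC, C_TR, order = initialize(u, c, r, U=2, C=2, R=2, T=3)
```

Each of the three matrices sums to N after initialization, and this stays true after every complete sampling step.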
Step 1.23): Update the topic count matrices so that they exclude the topic assignment of the current passenger, airline, and route (stage S1.23);
Excluding the component of the topic vector z with index $\mathrm{order}_i$, update the three topic count matrices: $C^{UT}_{x_{\mathrm{order}_i},\, z_{\mathrm{order}_i}}$, $C^{TC}_{z_{\mathrm{order}_i},\, c_{\mathrm{order}_i}}$, and $C^{TR}_{z_{\mathrm{order}_i},\, r_{\mathrm{order}_i}}$ are each decreased by 1.
Step 1.24): Sample, for the current airline and route, a passenger who will generate the new topic (stage S1.24);
A Bernoulli distribution with parameter $\tau$ decides whether the topic resampled for the current airline $c_{\mathrm{order}_i}$ and route $r_{\mathrm{order}_i}$ is generated by the current passenger $u_{\mathrm{order}_i}$ or by one of the associated passengers of $u_{\mathrm{order}_i}$ in the association graph. Which associated passenger generates it is decided by a multinomial distribution whose parameters are the degrees of association between the passenger and its associated passengers. Let the sampled passenger be s; $P(s \mid u_{\mathrm{order}_i})$ is the sampling probability, which depends on the parameters of these two distributions.
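The sampling probability of step 1.24) can be written out explicitly. The form below is a sketch under two assumptions that the text does not state: $\tau$ is read as the probability of keeping the current passenger, and the multinomial parameters are the association degrees normalized to sum to 1. The symbols $G(u)$ (the set of associated passengers of $u$) and $w_{u,s}$ (the degree of association between $u$ and $s$) are introduced here for illustration.

```latex
P(s \mid u) =
\begin{cases}
  \tau, & s = u, \\[4pt]
  (1 - \tau)\,\dfrac{w_{u,s}}{\sum_{v \in G(u)} w_{u,v}}, & s \in G(u),
\end{cases}
\qquad u = u_{\mathrm{order}_i}.
```

Under this reading, a passenger without associated passengers always generates its own topics.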
Step 1.25) Stage S1.25: use the Gibbs sampling formula to assign a new topic to the current airline and route;
Using the Gibbs sampling formula, compute the probability that passenger s assigns topic t (t=1,2,...,T) as the new topic of the current airline and current route. The formula is as follows:
The formula gives the probability of sampling passenger s for the current passenger and of sampling the new topic t for the current airline and route. A vector carrying the subscript -order_i excludes the component with index order_i. The quantities in the formula are the number of times passenger s is assigned topic t, the number of times topic t is assigned to the current airline, the number of times topic t is assigned to the current route, and the probability, obtained in step 1.24), of sampling passenger s given the current passenger.
Finally, a multinomial distribution is formed with these T probability values as its parameters, and a new topic, denoted topic, is sampled from it.
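The sketch below shows one common shape for such an update, built only from the quantities listed above. It is an assumption, not the patent's formula: it uses the usual collapsed-Gibbs form of LDA-style models, and the symmetric smoothing hyperparameters `a`, `b`, and `g` are hypothetical names. The count matrices are expected to already exclude the current record (step 1.23).

```python
import numpy as np

def topic_probabilities(s, c, r, C_UT, C_TC, C_TR, p_s=1.0, a=0.1, b=0.01, g=0.01):
    """Normalised probability of each topic t for airline c and route r,
    given generating passenger s (all indices 0-based).

    ASSUMPTION: standard collapsed-Gibbs form with symmetric smoothing a, b, g.
    """
    T, n_airlines = C_TC.shape
    n_routes = C_TR.shape[1]
    passenger_term = (C_UT[s] + a) / (C_UT[s].sum() + T * a)
    airline_term = (C_TC[:, c] + b) / (C_TC.sum(axis=1) + n_airlines * b)
    route_term = (C_TR[:, r] + g) / (C_TR.sum(axis=1) + n_routes * g)
    # p_s is constant in t, so it cancels once the weights are normalised.
    weights = p_s * passenger_term * airline_term * route_term
    return weights / weights.sum()

# Toy counts (hypothetical): 2 passengers, 2 topics, 2 airlines, 2 routes.
C_UT = np.array([[3, 0], [1, 2]])
C_TC = np.array([[4, 0], [0, 2]])
C_TR = np.array([[2, 2], [1, 1]])

probs = topic_probabilities(s=0, c=0, r=0, C_UT=C_UT, C_TC=C_TC, C_TR=C_TR)
rng = np.random.default_rng(0)
topic = rng.choice(len(probs), p=probs)  # multinomial draw over the T topics
```

In this toy state passenger 0 and airline 0 have only ever been counted under topic 0, so `probs[0]` dominates.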
Step 1.26) Stage S1.26: update the vector of topic-generating passengers and the topic vector;
Following step 1.24), set the component of x with index order_i to s; following step 1.25), set the component of z with index order_i to topic.
Step 1.27) Stage S1.27: update the three topic count matrices;
After the vector of topic-generating passengers and the topic vector have been updated in step 1.26), the entries of C^UT, C^TC, and C^TR that correspond to the new assignment are each increased by 1.
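The bookkeeping around one resampling step (steps 1.23, 1.26, and 1.27) can be sketched as below. The helper names are hypothetical and indices are 0-based; the sampling of s and topic in steps 1.24 and 1.25 is replaced by fixed values for illustration.

```python
import numpy as np

def remove_record(i, x, z, c, r, C_UT, C_TC, C_TR):
    """Step 1.23: take record i out of the three count matrices."""
    C_UT[x[i], z[i]] -= 1
    C_TC[z[i], c[i]] -= 1
    C_TR[z[i], r[i]] -= 1

def add_record(i, s, topic, x, z, c, r, C_UT, C_TC, C_TR):
    """Steps 1.26 and 1.27: store the new assignment, then count it."""
    x[i], z[i] = s, topic
    C_UT[s, topic] += 1
    C_TC[topic, c[i]] += 1
    C_TR[topic, r[i]] += 1

# Toy state (hypothetical): 2 records, 2 passengers, 2 topics, 2 airlines, 2 routes.
x = np.array([0, 1]); z = np.array([0, 1])
c = np.array([0, 1]); r = np.array([1, 0])
C_UT = np.array([[1, 0], [0, 1]])
C_TC = np.array([[1, 0], [0, 1]])
C_TR = np.array([[0, 1], [1, 0]])

remove_record(0, x, z, c, r, C_UT, C_TC, C_TR)
# Steps 1.24 and 1.25 would sample s and topic here; fixed values stand in for them.
add_record(0, 1, 1, x, z, c, r, C_UT, C_TC, C_TR)
```

Decrementing before sampling and incrementing afterwards keeps the total of each matrix equal to the number of records at the end of every step.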
Step 1.28) Stage S1.28: after the iterations finish, compute the passenger-topic, topic-airline, and topic-route distributions;
With the iteration count running from 1 to NN as the outer loop and i running from 1 to N as the inner loop, repeatedly resample the topic-generating passenger and the topic assigned to each airline and route; that is, repeat steps 1.23) to 1.27). After the iterations are complete, the passenger-topic distribution θ, the topic-airline distribution φ, and the topic-route distribution are obtained from the following formulas:
where u=1,2,...,U, c=1,2,...,C, r=1,2,...,R, t=1,2,...,T.
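One way to turn the final count matrices into the three distributions is row normalisation with additive smoothing, sketched below. This is an assumption rather than the patent's formulas: the smoothing values `a`, `b`, and `g` and the name `psi` for the topic-route distribution are hypothetical.

```python
import numpy as np

def estimate_distributions(C_UT, C_TC, C_TR, a=0.1, b=0.01, g=0.01):
    """Row-normalise the count matrices with additive smoothing.

    ASSUMPTION: symmetric smoothing constants a, b, g; psi is a hypothetical
    name for the topic-route distribution.
    """
    T = C_UT.shape[1]
    C = C_TC.shape[1]
    R = C_TR.shape[1]
    theta = (C_UT + a) / (C_UT.sum(axis=1, keepdims=True) + T * a)  # passenger-topic
    phi = (C_TC + b) / (C_TC.sum(axis=1, keepdims=True) + C * b)    # topic-airline
    psi = (C_TR + g) / (C_TR.sum(axis=1, keepdims=True) + R * g)    # topic-route
    return theta, phi, psi

# Toy counts (hypothetical).
C_UT = np.array([[3, 0], [1, 2]])
C_TC = np.array([[4, 0], [0, 2]])
C_TR = np.array([[2, 2], [1, 1]])
theta, phi, psi = estimate_distributions(C_UT, C_TC, C_TR)
```

Each row of `theta`, `phi`, and `psi` sums to 1, so every row is a proper probability distribution.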
Step 1.29) Stage S1.29: compute each passenger's degree of preference for each route;
PGTTM is used to model passengers' preferences for airlines and routes. For example, P(u|r) denotes the passenger's degree of preference for a route, which is also the route's degree of attraction for the passenger. The calculation formula is as follows:
where u=1,2,...,U, r=1,2,...,R.
Step 2): compute route popularity, passenger loyalty, and airline market share:
Step 2.1) Stage S2.1: compute route popularity;
Route popularity is denoted P(r), the probability that a passenger chooses route r when travelling. The formula is as follows:
where count(r) is the number of times route r appears in the 2010 passenger booking records, r=1,2,...,R.
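A form consistent with the definition of count(r) is the relative frequency of each route. The sketch below assumes exactly that, with no smoothing; the record layout, the function name, airline code 781, and route CTU-PEK are hypothetical.

```python
from collections import Counter

def route_popularity(records):
    """records: iterable of (passenger, airline, route) booking tuples.

    ASSUMPTION: P(r) = count(r) / total number of records, without smoothing.
    """
    counts = Counter(route for _, _, route in records)
    total = sum(counts.values())
    return {route: n / total for route, n in counts.items()}

# Toy booking records (hypothetical).
records = [
    ("u1", "290", "CTU-CAN"),
    ("u1", "290", "CTU-CAN"),
    ("u2", "781", "CTU-PEK"),
    ("u2", "290", "CTU-CAN"),
]
P_r = route_popularity(records)
```

Here three of the four records use CTU-CAN, so `P_r["CTU-CAN"]` is 0.75 and `P_r["CTU-PEK"]` is 0.25.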
Step 2.2) Stage S2.2: compute passenger loyalty;
Passenger loyalty is denoted P(c|u), the probability that passenger u chooses airline c when travelling. The formula is as follows:
where count(u,c) is the number of times passenger u chooses airline c in the 2010 passenger booking records, c=1,2,...,C, u=1,2,...,U.
Step 2.3) Stage S2.3: compute airline market share;
Airline market share is denoted P(c|r), the probability that route r is flown under airline c. The formula is as follows:
where count(c,r) is the number of 2010 passenger booking records in which airline c and route r appear together, c=1,2,...,C, r=1,2,...,R.
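Passenger loyalty P(c|u) and market share P(c|r) are both conditional frequencies, so a single helper can illustrate the two. The sketch assumes each is the joint count divided by the total for the conditioning value, with no smoothing; the helper name, record layout, and toy records are hypothetical.

```python
from collections import Counter

def conditional_frequency(pairs):
    """pairs: iterable of (given, target); returns P[given][target].

    ASSUMPTION: P(target | given) = count(given, target) / count(given).
    """
    pairs = list(pairs)                      # the data is traversed twice
    joint = Counter(pairs)
    totals = Counter(given for given, _ in pairs)
    table = {}
    for (given, target), n in joint.items():
        table.setdefault(given, {})[target] = n / totals[given]
    return table

# Toy booking records (hypothetical): (passenger, airline, route).
records = [
    ("u1", "290", "CTU-CAN"),
    ("u1", "290", "CTU-CAN"),
    ("u2", "781", "CTU-PEK"),
    ("u2", "290", "CTU-CAN"),
]
loyalty = conditional_frequency((u, c) for u, c, _ in records)  # P(c | u)
share = conditional_frequency((r, c) for _, c, r in records)    # P(c | r)
```

In this toy data passenger u1 always flies airline 290, so `loyalty["u1"]["290"]` is 1.0, while u2 splits evenly between the two airlines.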
Step 3) Stage S3: introduce a Bayesian probability model, build a multi-factor fusion prediction framework, compute the probability that a passenger takes each airline and route, and make predictions and recommendations:
Step 3.1): use the Bayesian probability model to build the multi-factor fusion prediction framework;
The passenger route preference obtained from PGTTM in step 1) and the route popularity, passenger loyalty, and airline market share obtained in step 2) are fused by a Bayesian probability model to form the multi-factor fusion prediction framework. The Bayesian probability model used in the present invention is derived as follows:
First, for a fixed passenger u, P(u) is a constant, so
P(r,c|u)=P(r,c,u)/P(u)∝P(r,c,u)
Furthermore, since
P(r,c,u)=P(r)*P(u|r)*P(c|u,r)≈P(r)*P(u|r)*[αP(c|u)+(1-α)P(c|r)],
the required Bayesian probability function is:
logP(r,c|u)∝log{P(r)*P(u|r)*[αP(c|u)+(1-α)P(c|r)]}
Here P(r,c|u) is the probability that passenger u chooses route r under airline c, and α is a configurable parameter. The logarithm is taken on both sides so that the computed values do not become too small; because P(u) is fixed for a given passenger, dropping it does not change the ranking of (c,r) pairs.
This last formula is the required Bayesian probability model and also the multi-factor fusion prediction framework. It fuses route popularity P(r), passenger route preference P(u|r), passenger loyalty P(c|u), and airline market share P(c|r), with c=1,2,...,C, r=1,2,...,R, u=1,2,...,U.
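The fused score can be evaluated directly from its four factors. The sketch below is an illustration with hypothetical input values; returning negative infinity when the product is zero is a design choice made here, not something the text specifies.

```python
import math

def fusion_log_score(p_r, p_u_given_r, p_c_given_u, p_c_given_r, alpha=0.5):
    """log{P(r) * P(u|r) * [alpha*P(c|u) + (1-alpha)*P(c|r)]}."""
    mix = alpha * p_c_given_u + (1.0 - alpha) * p_c_given_r
    product = p_r * p_u_given_r * mix
    # ASSUMPTION: a zero product is mapped to -inf so such pairs rank last.
    return math.log(product) if product > 0 else float("-inf")

# Hypothetical factor values for one (airline, route) pair and one passenger.
score = fusion_log_score(p_r=0.5, p_u_given_r=0.2,
                         p_c_given_u=0.6, p_c_given_r=0.4, alpha=0.5)
```

With these inputs the mixture term is 0.5 and the product is 0.05, so `score` equals log(0.05). Setting `alpha` closer to 1 weights the passenger's own airline history more heavily than the airline's share of the route.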
Step 3.2): predict the airline and route that a passenger will choose in the future;
Using the multi-factor prediction framework of step 3.1), suppose the training set contains W airline-route word pairs in total. For each passenger u, compute the score of every airline-route word pair, sort the scores in descending order, and take the K pairs with the largest scores (Top-K) as the predictions to recommend. Comparing the predictions with the test set gives the prediction accuracy.
For example, for passenger 17464755. (an encrypted ID-card number), substitute the pair (c,r) formed by airline 290 (a code standing for the real airline name) and route CTU-CAN (three-letter airport codes, Chengdu Shuangliu Airport to Guangzhou Baiyun Airport) from the booking data into the multi-factor fusion prediction function of step 3.1). If the resulting value is the largest among all W word pairs, that pair is taken as the prediction. If the passenger actually flew that route with that airline in the test set, the Top-1 prediction accuracy for this passenger is 1. Here c=1,2,...,C, r=1,2,...,R, u=1,2,...,U.
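The ranking and evaluation described above can be sketched as follows. The accuracy measure here is an assumption: a passenger counts as correctly predicted when at least one Top-K pair appears among the pairs that passenger flew in the test set. All names and scores are hypothetical.

```python
def top_k(scores, k):
    """scores: dict mapping (airline, route) -> fused log score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def hit_rate(predictions, truth):
    """Share of passengers with at least one Top-K pair in their test-set pairs.

    ASSUMPTION: this hit-rate definition stands in for the unspecified
    accuracy measure.
    """
    hits = sum(1 for u, preds in predictions.items()
               if truth.get(u, set()) & set(preds))
    return hits / len(predictions)

# Hypothetical fused scores for one passenger over W = 3 airline-route pairs.
scores_u1 = {
    ("290", "CTU-CAN"): -1.2,
    ("290", "CTU-PEK"): -2.5,
    ("781", "CTU-CAN"): -3.0,
}
predictions = {"u1": top_k(scores_u1, k=1)}
truth = {"u1": {("290", "CTU-CAN")}}  # pairs actually flown in the test set
accuracy = hit_rate(predictions, truth)
```

The highest-scoring pair is ("290", "CTU-CAN"), which matches the test set, so the Top-1 accuracy for this single passenger is 1.0.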
It should be emphasized that the embodiments described in the present invention are illustrative rather than restrictive. The present invention is therefore not limited to the embodiments described in the detailed description; any other embodiment that a person skilled in the art derives from the technical solution of the present invention likewise falls within the scope of protection of the present invention.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611159984.3A CN106779214B (en) | 2016-12-15 | 2016-12-15 | A Multi-factor Fusion Civil Aviation Passenger Travel Prediction Method Based on Theme Model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106779214A CN106779214A (en) | 2017-05-31 |
CN106779214B true CN106779214B (en) | 2020-08-28 |
Family
ID=58889245
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611159984.3A Active CN106779214B (en) | 2016-12-15 | 2016-12-15 | A Multi-factor Fusion Civil Aviation Passenger Travel Prediction Method Based on Theme Model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106779214B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108876049A (en) * | 2018-06-27 | 2018-11-23 | 南京航空航天大学 | A kind of airport market share variation prediction method in new demand servicing nurturing period |
CN110751523A (en) * | 2019-10-21 | 2020-02-04 | 中国民航信息网络股份有限公司 | Method and device for discovering potential high-value passengers |
CN110852650B (en) * | 2019-11-19 | 2021-11-02 | 交通运输部公路科学研究所 | Modeling Method of Integrated Passenger Hub Group Network Based on Dynamic Graph Hybrid Automata |
CN112948161B (en) * | 2021-03-09 | 2022-06-03 | 四川大学 | A method and system for error correction and correction of aviation message based on deep learning |
CN118350858B (en) * | 2024-03-20 | 2025-02-21 | 中航信数智科技(北京)有限公司 | Route passenger volume prediction method, device, electronic device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105488597A (en) * | 2015-12-28 | 2016-04-13 | 中国民航信息网络股份有限公司 | Passenger destination prediction method and system |
CN105512773A (en) * | 2015-12-25 | 2016-04-20 | 中国民航信息网络股份有限公司 | Passenger travel destination prediction method and device |
CN106055807A (en) * | 2016-06-06 | 2016-10-26 | 四川大学 | Civil aviation passenger movement model based on potential trip purposes |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5946394B2 (en) * | 2012-11-09 | 2016-07-06 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Statistical inference method, computer program, and computer of path start and end points using multiple types of data sources. |
Non-Patent Citations (1)
Title |
---|
出行链化的贝叶斯网络预测;赵应场 等;《道路交通与安全》;20151231;第15卷(第1期);全文 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||