CN116503086A - Method, system and medium for processing air route ticket fare data based on machine learning - Google Patents
- Publication number
- CN116503086A CN202310490403.8A
- Authority
- CN
- China
- Prior art keywords
- data
- air ticket
- freight
- time
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a machine-learning-based method, system and medium for processing air route ticket fare data, comprising: acquiring historical air ticket fare data and extracting the time-series change sequence of air ticket fares for each route; obtaining the update cycle and update frequency of historical air ticket fares from the time-series change sequence, obtaining the visit volume of each route in each update cycle by big-data means, and associating the visit volume with the time-series change sequence; building an air ticket fare data update model based on machine learning, acquiring multi-source air ticket fare data, obtaining the next update time of the fare data and the predicted air ticket fare through the update model, and storing the data; and retrieving the air ticket fare data from the stored data based on the timestamp of the user's search and returning the result to the client. By processing air ticket fare data in this way, the invention obtains reasonable price predictions and improves price accuracy and the success rate when users purchase travel products.
Description
Technical field
The present invention relates to the technical field of data processing, and more specifically to a machine-learning-based method, system and medium for processing air route ticket fare data.
Background
With the vigorous development of China's civil aviation industry, more and more passengers choose to travel by air. Airline passenger volume is growing rapidly, and the air route ticket fare data generated alongside it is growing explosively, which poses a challenge for processing that data. Long-term accumulation has left the fare data with both many dimensions and a large volume. Dynamic pricing, one of the main techniques of revenue management, is an important means by which airlines adjust ticket prices to different supply levels in order to maximize revenue, and in recent years it has been widely applied to the sale of airline tickets.
At present, most travel service providers earn their profit by selling tickets on behalf of airlines and charging an additional commission, and many travel companies try to increase profit by adjusting commissions based on their own industry experience. However, because ticket demand and user behavior patterns in the real world are very complex, methods such as expert experience have many shortcomings when applied to price-adjustment decisions, including in the accuracy of risk assessment, and an efficient and reasonable air route ticket fare management system is needed. Therefore, in air route ticket fare management, how to use machine learning to process fare-related data, extract correlated features and make periodic intelligent judgments, so as to improve price accuracy and the success rate when users purchase travel products, is a problem that urgently needs to be solved.
Summary of the invention
In order to solve the above technical problems, the present invention proposes a machine-learning-based method, system and medium for processing air route ticket fare data.
A first aspect of the present invention provides a machine-learning-based method for processing air route ticket fare data, comprising:
acquiring historical air ticket fare data, dividing the historical air ticket fare data into different data sets according to route information, performing time-series analysis on the different data sets, and obtaining the time-series change sequence of air ticket fares for each route;
obtaining the update cycle and update frequency of historical air ticket fares from the time-series change sequence, obtaining the visit volume of each route in each update cycle by big-data means, and associating the visit volume with the time-series change sequence to obtain a feature sample data set;
building an air ticket fare data update model based on machine learning, training it with the feature sample data set, and outputting the trained air ticket fare data update model once testing meets the standard;
acquiring multi-source air ticket fare data, filtering the multi-source air ticket fare data according to a preset parameterized benchmark, obtaining the next update time of the air ticket fare data through the air ticket fare data update model, obtaining the predicted air ticket fare based on the next update time, and setting data labels for data storage;
retrieving the air ticket fare data from the stored data based on the timestamp of the user's search, and returning the output result to the client.
In this scheme, dividing the historical air ticket fare data into different data sets according to route information, performing time-series analysis on the different data sets, and obtaining the time-series change sequence of air ticket fares for each route is specifically:
extracting keyword information from the historical air ticket fare data, determining departure city information and destination city information from the keyword information, extracting route information, and setting classification labels based on the route information;
classifying the historical air ticket fare data according to the classification labels to obtain air ticket fare data sets under the different classification labels, and marking the corresponding air ticket fare data according to holiday information and ordinary-day information;
performing time-series analysis on the air ticket fare data under the different marks, obtaining the change timestamps and price differences of the air ticket fares, generating the time-series change sequence of each route, and obtaining the update cycle and update frequency of historical air ticket fares.
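As an illustration only, the time-series analysis step above can be sketched in Python as follows. The function names, the sample snapshots, and the choice to express the update cycle as the mean gap between fare changes and the update frequency as changes per hour are assumptions made for this sketch and are not specified in the disclosure.

```python
from datetime import datetime
from statistics import mean


def change_sequence(observations):
    """Return (timestamp, price_delta) for each point where the fare changed.

    `observations` is a list of (datetime, fare) tuples sorted by time.
    """
    changes = []
    for (_, prev_fare), (ts, fare) in zip(observations, observations[1:]):
        if fare != prev_fare:
            changes.append((ts, fare - prev_fare))
    return changes


def update_stats(changes, window_hours):
    """Assumed definitions: cycle = mean gap between changes in hours,
    frequency = number of changes per hour over the observation window."""
    gaps = [(b[0] - a[0]).total_seconds() / 3600
            for a, b in zip(changes, changes[1:])]
    cycle = mean(gaps) if gaps else None
    frequency = len(changes) / window_hours
    return cycle, frequency


# Hypothetical fare snapshots for a single route (one classification label).
obs = [
    (datetime(2023, 5, 1, 0), 800),
    (datetime(2023, 5, 1, 6), 800),
    (datetime(2023, 5, 1, 12), 850),
    (datetime(2023, 5, 2, 0), 820),
    (datetime(2023, 5, 2, 12), 820),
]
changes = change_sequence(obs)
cycle, freq = update_stats(changes, window_hours=36)
```

In a fuller pipeline the same two functions would be applied separately to the holiday and ordinary-day subsets of each route.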
In this scheme, building the air ticket fare data update model based on machine learning, training it with the feature sample data set, and outputting the trained air ticket fare data update model once testing meets the standard is specifically:
obtaining, by big-data means, the query volume of each route's information on websites related to air ticket booking, setting heterogeneous-information retrieval labels according to the destination information of the route information, obtaining the retrieval time step according to each update cycle, and obtaining the query volume of the heterogeneous information;
setting a conversion coefficient and combining it with the query volume of the heterogeneous information, matching the combined data with the query volume of each route's information, obtaining the final visit volume in each update cycle, associating it with the time-series change sequence, obtaining the correlation features between visit volume and update cycle, and constructing the feature sample data set;
building the air ticket fare data update model on an LSTM network optimized by the particle swarm algorithm, setting up particles according to the number of hidden-layer neurons, the learning rate and the maximum number of iterations of the LSTM network, initializing the particle parameters, and setting the initial positions and velocities;
setting the fitness function according to the mean square error, searching for the optimal particle positions by continually updating each particle's individual best and the global best, and determining the parameters of the LSTM network from the optimal particle position;
dividing the feature sample data set into a training set and a test set according to a preset ratio, and, after iterative training, outputting the air ticket fare data update model whose accuracy meets the preset standard.
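The particle swarm search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each particle is a triple (hidden-layer neurons, learning rate, maximum iterations), and the fitness function `surrogate_mse` is a hypothetical stand-in for the validation mean square error that would in practice come from training and testing an LSTM with those settings. The inertia and acceleration constants, the search bounds and all names are illustrative and are not taken from the disclosure.

```python
import random


def pso(fitness, bounds, n_particles=20, n_iters=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # individual best positions
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    history = [gbest_val]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def surrogate_mse(p):
    """Hypothetical stand-in for the LSTM validation MSE (assumption)."""
    hidden, lr, iters = p
    return (((hidden - 64) / 64) ** 2
            + ((lr - 0.01) / 0.01) ** 2
            + ((iters - 200) / 200) ** 2)


# Assumed search ranges: neurons, learning rate, maximum iterations.
bounds = [(8, 256), (0.0001, 0.1), (50, 500)]
best, best_val, history = pso(surrogate_mse, bounds)
```

When the result is used to configure a network, the neuron count and iteration limit would be rounded to integers; the learning rate can be used as is.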
In this scheme, obtaining the next update time of the air ticket fare data through the air ticket fare data update model is specifically:
filtering the multi-source air ticket fare data to obtain the target air ticket fare data, extracting the route information, time information and cabin class information of the target air ticket fare data, and obtaining the time-series change sequence and the visit volume change sequence over a preset past period as the input of the air ticket fare data update model;
introducing a self-attention mechanism into the air ticket fare data update model, constructing a self-attention layer, taking the hidden-layer state outputs of the different time steps as the input of the self-attention layer, and calculating the self-attention weights;
using the self-attention weights to characterize the importance of each time step to the prediction target, and, after iterative calculation, outputting the next update time of the target air ticket fare data.
In this scheme, obtaining the predicted air ticket fare based on the next update time and setting data labels for data storage is specifically:
obtaining the influencing factors of air ticket fares through data mining on the time-series change sequence and the visit volume change sequence of each route's historical air ticket fares, and filtering the influencing factors to obtain the set of influencing factors that affected the target air ticket fare within a preset past period;
building an air ticket fare prediction network based on a temporal convolutional neural network, taking the obtained next update time as the target prediction time, matching the influencing factors in the set with the time-series change sequence of the target air ticket fare over the preset past period, and normalizing the result;
importing the normalized data into the air ticket fare prediction network, obtaining the predicted air ticket fare at the target prediction time, and storing the next update time and the predicted air ticket fare after setting data labels.
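Two building blocks of the prediction step above, normalization and the causal dilated convolution that temporal convolutional networks are built from, can be sketched in Python as follows. This is an illustrative sketch only: the use of min-max scaling, the fixed kernel values, the dilation factors and the sample fares are assumptions, and a real network would learn its kernels and stack more layers with residual connections.

```python
def min_max(xs):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def causal_dilated_conv(xs, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], zero-padded on the left.

    The output at time t depends only on inputs at time t or earlier,
    so no future fare leaks into the prediction.
    """
    out = []
    for t in range(len(xs)):
        acc = 0.0
        for k, wk in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += wk * xs[idx]
        out.append(acc)
    return out


# Hypothetical fare history for the target ticket.
fares = [800, 800, 850, 820, 820, 900]
x = min_max(fares)
# Two stacked layers with growing dilation widen the receptive field.
h1 = causal_dilated_conv(x, kernel=[0.5, 0.5], dilation=1)
h2 = causal_dilated_conv(h1, kernel=[0.5, 0.5], dilation=2)
```

Stacking layers with dilations 1 and 2 and a kernel of length 2 lets each output see the current step and the three preceding ones.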
In this scheme, retrieving the air ticket fare data from the stored data based on the timestamp of the user's search and returning the output result to the client is specifically:
acquiring the user's historical behavior data, obtaining from it the interaction information between the user and air ticket item nodes within a preset time step, and generating a bipartite graph structure of users and air ticket items from the interaction information;
obtaining the user's basic information, flight segment information and advance-purchase time information as additional features of the nodes in the bipartite graph structure;
learning a representation of the bipartite graph structure with a graph convolutional neural network to obtain initial vector representations of users and air ticket items, and concatenating the initial vector representations of users and air ticket items to construct an adjacency matrix;
passing features between nodes based on the adjacency matrix through the message-passing mechanism and neighbor-aggregation mechanism of the graph convolutional neural network, learning the features of neighbor nodes, and updating the embedded representation of the user nodes;
in addition, obtaining the air ticket items the user interacted with at each timestamp within the preset time step, concatenating the corresponding bipartite graph structures, constructing a meta-path within the preset time step, and matching the meta-path with the user;
obtaining the similarity between users by calculating the mean square distance of the nodes on the users' meta-paths, using the similarity as the attention weight, and aggregating the embedded representations of the user nodes with a graph attention structure to output the final user preference features;
obtaining the corresponding air ticket fare data according to the user's search information, analyzing the fare trend of the ticket according to the user's preference features combined with the obtained next update time and predicted air ticket fare, and returning the queried information and the fare trend to the client.
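The similarity-weighted aggregation of user embeddings described above can be sketched in Python as follows. The disclosure does not state how a mean square distance is converted into an attention weight, so this sketch assumes a softmax over negative distances; that choice, the function names, and the toy meta-paths and embeddings are all assumptions for illustration.

```python
import math


def mean_square_distance(path_a, path_b):
    """Mean squared Euclidean distance between aligned node vectors
    on two meta-paths of equal length."""
    if len(path_a) != len(path_b):
        raise ValueError("meta-paths must have the same length")
    total = 0.0
    for u, v in zip(path_a, path_b):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / len(path_a)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def user_preference(target, paths, embeddings):
    """Aggregate user embeddings, weighting each user by meta-path similarity
    to `target`. Assumption: weight = softmax(-mean square distance)."""
    users = list(paths)
    scores = [-mean_square_distance(paths[target], paths[u]) for u in users]
    weights = softmax(scores)
    dim = len(embeddings[target])
    aggregated = [sum(w * embeddings[u][d] for w, u in zip(weights, users))
                  for d in range(dim)]
    return dict(zip(users, weights)), aggregated


# Hypothetical meta-paths (lists of node vectors) and user embeddings.
paths = {
    "u1": [[0.0, 0.0], [1.0, 1.0]],
    "u2": [[0.0, 0.0], [1.0, 2.0]],
    "u3": [[5.0, 5.0], [6.0, 6.0]],
}
embeddings = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
weights, preference = user_preference("u1", paths, embeddings)
```

Users whose meta-paths lie close to the target user's contribute most to the aggregated preference vector, while distant users contribute almost nothing.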
A second aspect of the present invention further provides a machine-learning-based system for processing air route ticket fare data, the system comprising a memory and a processor, wherein the memory contains a program of the machine-learning-based method for processing air route ticket fare data, and the program, when executed by the processor, implements the following steps:
acquiring historical air ticket fare data, dividing the historical air ticket fare data into different data sets according to route information, performing time-series analysis on the different data sets, and obtaining the time-series change sequence of air ticket fares for each route;
obtaining the update cycle and update frequency of historical air ticket fares from the time-series change sequence, obtaining the visit volume of each route in each update cycle by big-data means, and associating the visit volume with the time-series change sequence to obtain a feature sample data set;
building an air ticket fare data update model based on machine learning, training it with the feature sample data set, and outputting the trained air ticket fare data update model once testing meets the standard;
acquiring multi-source air ticket fare data, filtering the multi-source air ticket fare data according to a preset parameterized benchmark, obtaining the next update time of the air ticket fare data through the air ticket fare data update model, obtaining the predicted air ticket fare based on the next update time, and setting data labels for data storage;
retrieving the air ticket fare data from the stored data based on the timestamp of the user's search, and returning the output result to the client.
A third aspect of the present invention further provides a computer-readable storage medium containing a program of the machine-learning-based method for processing air route ticket fare data; when the program is executed by a processor, it implements the steps of the machine-learning-based method for processing air route ticket fare data described in any of the above.
The invention discloses a machine-learning-based method, system and medium for processing air route ticket fare data, comprising: acquiring historical air ticket fare data and extracting the time-series change sequence of air ticket fares for each route; obtaining the update cycle and update frequency of historical air ticket fares from the time-series change sequence, obtaining the visit volume of each route in each update cycle by big-data means, and associating the visit volume with the time-series change sequence; building an air ticket fare data update model based on machine learning, acquiring multi-source air ticket fare data, obtaining the next update time of the fare data and the predicted air ticket fare through the update model, and storing the data; and retrieving the air ticket fare data from the stored data based on the timestamp of the user's search and returning the result to the client. By processing air ticket fare data in this way, the invention obtains reasonable price predictions and improves price accuracy and the success rate when users purchase travel products.
Description of the drawings
Fig. 1 shows a flowchart of the machine-learning-based method for processing air route ticket fare data of the present invention;
Fig. 2 shows a flowchart of the method of the present invention for obtaining the next update time of air ticket fare data;
Fig. 3 shows a flowchart of the method of the present invention for retrieving air ticket fare data from the stored data based on the timestamp of the user's search;
Fig. 4 shows a block diagram of the machine-learning-based system for processing air route ticket fare data of the present invention.
Detailed description
In order that the above objects, features and advantages of the present invention may be understood more clearly, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, and the scope of protection of the present invention is therefore not limited by the specific embodiments disclosed below.
Fig. 1 shows a flowchart of the machine-learning-based method for processing air route ticket fare data of the present invention.
As shown in Fig. 1, a first aspect of the present invention provides a machine-learning-based method for processing air route ticket fare data, comprising:
S102, acquiring historical air ticket fare data, dividing the historical air ticket fare data into different data sets according to route information, performing time-series analysis on the different data sets, and obtaining the time-series change sequence of air ticket fares for each route;
S104, obtaining the update cycle and update frequency of historical air ticket fares from the time-series change sequence, obtaining the visit volume of each route in each update cycle by big-data means, and associating the visit volume with the time-series change sequence to obtain a feature sample data set;
S106, building an air ticket fare data update model based on machine learning, training it with the feature sample data set, and outputting the trained air ticket fare data update model once testing meets the standard;
S108, acquiring multi-source air ticket fare data, filtering the multi-source air ticket fare data according to a preset parameterized benchmark, obtaining the next update time of the air ticket fare data through the air ticket fare data update model, obtaining the predicted air ticket fare based on the next update time, and setting data labels for data storage;
S110, retrieving the air ticket fare data from the stored data based on the timestamp of the user's search, and returning the output result to the client.
It should be noted that keyword information is extracted from the historical air ticket fare data, departure city information and destination city information are determined from the keyword information, route information is extracted, and classification labels are set based on the route information. The historical air ticket fare data is classified according to the classification labels to obtain air ticket fare data sets under the different classification labels, and the corresponding air ticket fare data is marked according to holiday information and ordinary-day information. Time-series analysis is performed on the air ticket fare data under the different marks to obtain the change timestamps and price differences of the air ticket fares, generate the time-series change sequence of each route, and obtain the update cycle and update frequency of historical air ticket fares.
Fig. 2 shows a flow chart of the method of the present invention for obtaining the next update time of air ticket fare data.
According to an embodiment of the present invention, the next update time of the air ticket fare data is obtained through the air ticket fare data update model, specifically:
S202: filter the multi-source air ticket fare data to obtain the target air ticket fare data, extract the route information, time information and cabin class information of the target air ticket fare data, and obtain the time-series change sequence and the visit volume change sequence within a past preset time as the input of the air ticket fare data update model;
S204: introduce a self-attention mechanism into the air ticket fare data update model, construct a self-attention layer, take the hidden-layer state outputs at different time steps as the input of the self-attention layer, and compute the self-attention weights;
S206: use the self-attention weights to represent the importance of each time step to the prediction target, and output the next update time of the target air ticket fare data after iterative computation.
It should be noted that the self-attention weights represent the importance of each time step to the prediction target. In the formula for the self-attention weight, the attention score of the hidden-layer state at time step t is computed with the activation function tanh, the self-attention layer parameters W_c and V_c, and the bias b_c; T denotes matrix transposition.
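The formula itself is not legible in this text, so the sketch below assumes the common additive-attention form that matches the listed symbols: a score e_t = V_c^T tanh(W_c h_t + b_c) for each hidden state h_t, followed by a softmax over time steps. The softmax normalization, the symbols e_t and h_t, and all numeric values are assumptions for illustration only.

```python
import math


def attention_weights(hidden_states, W_c, V_c, b_c):
    """Softmax-normalized scores e_t = V_c^T tanh(W_c h_t + b_c) (assumed form)."""
    scores = []
    for h in hidden_states:
        # u = tanh(W_c h + b_c), computed row by row
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(W_c, b_c)]
        scores.append(sum(v * ui for v, ui in zip(V_c, u)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy hidden states for three time steps (hypothetical values).
hidden_states = [[0.1, -0.2], [0.4, 0.3], [0.9, 0.8]]
W_c = [[0.5, -0.1], [0.2, 0.7]]
V_c = [1.0, 1.0]
b_c = [0.0, 0.0]

alpha = attention_weights(hidden_states, W_c, V_c, b_c)
# Attention-weighted summary of the hidden states.
context = [sum(a * h[i] for a, h in zip(alpha, hidden_states)) for i in range(2)]
```

A larger weight in `alpha` marks a time step whose hidden state contributes more to the predicted next update time.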
The query volume of each route on air ticket booking websites is obtained by big data means. Heterogeneous information retrieval labels are set according to the destination in the route information, the retrieval time step is derived from each update cycle, and the query volume of heterogeneous information is obtained; the heterogeneous information may come from, for example, hotel reservation information and travel search information of travel service providers. A conversion coefficient is applied to the heterogeneous query volume, and the result is matched with the query volume of each route to obtain the final visit volume in each update cycle; the conversion coefficient is set to the ratio of air passenger flow to total passenger flow for a given city in the same historical period. The final visit volume is associated with the time-series change sequence to obtain the association features between visit volume and update cycle, and the feature sample data set is constructed.
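As a worked illustration of the conversion coefficient, the sketch below scales the heterogeneous query volume by the air-to-total passenger ratio and adds it to the route query volume for each update cycle. The text only says the two volumes are "combined" and "matched", so the additive rule, the function names and the numbers are assumptions.

```python
def conversion_coefficient(air_passengers, total_passengers):
    """Ratio of air passenger flow to total passenger flow for one city and period."""
    if total_passengers <= 0:
        raise ValueError("total_passengers must be positive")
    return air_passengers / total_passengers


def final_visits(route_queries, hetero_queries, coefficient):
    """Assumed combination rule: route queries + coefficient * heterogeneous queries,
    evaluated per update cycle."""
    return [r + coefficient * h for r, h in zip(route_queries, hetero_queries)]


# Hypothetical figures for three consecutive update cycles.
k = conversion_coefficient(air_passengers=120_000, total_passengers=480_000)
route_queries = [1000, 1500, 900]      # queries for the route itself
hetero_queries = [4000, 2000, 8000]    # hotel and travel searches for the destination
visits = final_visits(route_queries, hetero_queries, k)
```

With these numbers the coefficient is 0.25, so a quarter of the destination-related searches is counted as latent demand for the route.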
The air ticket fare data update model is built on an LSTM network optimized by the particle swarm algorithm. Particles are defined by the number of hidden-layer neurons, the learning rate and the maximum number of iterations of the LSTM network, and the particle swarm parameters are initialized, including the maximum number of iterations of the particle swarm algorithm, the population size, the acceleration coefficients and the inertia weight, together with the initial positions and velocities. The network structure of the LSTM network is determined, and the fitness function is set to the mean square error between the actual and predicted values. Particle positions are optimized by continually updating each particle's individual best and the global best; the search stops once the optimal position has been obtained, and the parameters of the LSTM network are determined from the optimal particle position. The feature sample data set is divided into a training set and a test set according to a preset ratio, and after iterative training the air ticket fare data update model whose accuracy meets the preset standard is output.
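The particle swarm search over LSTM hyperparameters can be sketched as below. To keep the example self-contained, the fitness function is a hypothetical surrogate (`surrogate_mse`) standing in for "train the LSTM with these hyperparameters and return the validation mean square error"; the bounds, swarm settings and the fixed iteration budget used as the stopping rule are likewise assumptions.

```python
import random


def pso(fitness, bounds, n_particles=20, max_iter=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # individual best positions
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best position
    for _ in range(max_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_mse(params):
    """Hypothetical stand-in for the LSTM validation error."""
    hidden, lr, epochs = params
    return (((hidden - 64) / 64) ** 2
            + ((lr - 0.01) / 0.01) ** 2
            + ((epochs - 200) / 200) ** 2)


# (hidden neurons, learning rate, maximum training iterations)
bounds = [(8, 256), (1e-4, 0.1), (50, 500)]
best, best_val = pso(surrogate_mse, bounds)
```

In a real setup the integer-valued coordinates (hidden neurons, iterations) would be rounded before the LSTM is built, and each fitness evaluation would involve an actual training run.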
It should be noted that the influencing factors of air ticket fares are obtained through data mining on the time-series change sequence and the visit volume change sequence of the historical air ticket fares of each route. The data mining can be carried out by big data means, and influencing factors can also be gathered through retrieval of literature and related materials, summaries of expert experience and similar methods; they generally include factors of the flight itself, holiday factors, day-of-week factors and so on. The influencing factors are screened by methods such as random forest or principal component analysis to obtain the set of factors that influenced the target air ticket fare within the past preset time period. An air ticket fare prediction network is constructed based on a temporal convolutional network: it combines a dilated convolution residual block formed from 3 causal convolutions with a convolutional layer with one 1×1 convolution kernel, and outputs the final prediction result through 1 fully connected layer;
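The structure of the prediction network can be illustrated with a single-channel sketch. The placement of the 1×1 convolution on the skip path, the dilations (1, 2, 4), the ReLU activations and all weights are assumptions; the text only specifies three causal convolutions in a residual block, one 1×1 convolution and one fully connected layer.

```python
def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation]; inputs before t = 0 count as 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, kernels, dilations=(1, 2, 4), skip_weight=1.0):
    """Three stacked causal dilated convolutions plus a skip connection.
    On a single channel, a 1x1 convolution reduces to scaling by skip_weight."""
    out = x
    for kernel, dilation in zip(kernels, dilations):
        out = relu(causal_conv1d(out, kernel, dilation))
    return [a + skip_weight * b for a, b in zip(out, x)]


def predict(x, kernels, fc_weights, fc_bias=0.0):
    """Fully connected layer over the residual block output."""
    h = residual_block(x, kernels)
    return sum(w * v for w, v in zip(fc_weights, h)) + fc_bias


kernels = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3]]     # hypothetical weights
fares = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # normalized fare history
changed = fares[:-1] + [100.0]                     # only the last value differs

out_a = residual_block(fares, kernels)
out_b = residual_block(changed, kernels)
prediction = predict(fares, kernels, fc_weights=[0.125] * 8)
```

Because every convolution is causal, `out_a` and `out_b` agree at all positions before the changed value, which is the property that keeps future fares from leaking into earlier outputs.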
The obtained next update time is taken as the target prediction time. The influencing factors in the set are matched with the time-series change sequence of the target air ticket fare within the past preset time and normalized; the normalized data is fed into the air ticket fare prediction network to obtain the predicted air ticket fare at the target prediction time, and the next update time and the predicted air ticket fare are stored after data labels are set.
Fig. 3 shows a flow chart of the method of the present invention for retrieving air ticket fare data from the stored data based on the timestamp of the user's search.
According to an embodiment of the present invention, the air ticket fare data in the stored data is retrieved based on the timestamp of the user's search and the output result is returned to the user terminal, specifically:
S302: obtain the user's historical behavior data, obtain from it the interaction information between the user and air ticket item nodes within a preset time step, and generate a bipartite graph structure of users and air ticket items from the interaction information;
S304: obtain the user's basic information, flight segment information and advance ticket purchase time information as additional features of the nodes in the bipartite graph structure;
S306: learn a representation of the bipartite graph structure with a graph convolutional neural network to obtain the initial vector representations of users and air ticket items, concatenate these initial vector representations, and construct an adjacency matrix;
S308: through the message passing mechanism and neighbor aggregation mechanism of the graph convolutional neural network, propagate features between nodes based on the adjacency matrix, learn the features of neighbor nodes, and update the embedded representations of the user nodes;
S310: in addition, obtain the air ticket items the user interacted with at each timestamp within the preset time step, concatenate the corresponding bipartite graph structures, construct the meta-paths within the preset time step, and match the meta-paths with users;
S312: obtain the similarity between users by computing the mean square distance between nodes on the users' meta-paths, use the similarity as the attention weight, and aggregate the embedded representations of user nodes with a graph attention structure to output the final user preference features;
S314: obtain the corresponding air ticket fare data according to the user's search information, analyze the fare change trend of the air ticket according to the user's preference features together with the obtained next update time and predicted air ticket fare, and return the queried information and the fare change trend to the user terminal.
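Steps S302 to S308 can be illustrated with a small user and ticket-item graph. The sketch builds the bipartite adjacency matrix and applies one neighbor-aggregation step; the mean aggregation with self-loops, the absence of learned weights and the toy features are assumptions, not the patent's exact graph convolution.

```python
def build_adjacency(n_users, n_items, interactions):
    """Symmetric adjacency matrix of the bipartite graph.
    Users occupy indices 0..n_users-1, ticket items follow."""
    n = n_users + n_items
    A = [[0.0] * n for _ in range(n)]
    for user, item in interactions:
        A[user][n_users + item] = 1.0
        A[n_users + item][user] = 1.0
    return A


def propagate(A, X):
    """One message-passing step: each node takes the mean of its own
    features and those of its neighbors (assumed aggregation rule)."""
    n, dim = len(A), len(X[0])
    H = []
    for r in range(n):
        neighbors = [c for c in range(n) if A[r][c] > 0 or c == r]
        H.append([sum(X[c][k] for c in neighbors) / len(neighbors)
                  for k in range(dim)])
    return H


# Two users, two ticket items; (user, item) interaction pairs.
interactions = [(0, 0), (0, 1), (1, 1)]
A = build_adjacency(n_users=2, n_items=2, interactions=interactions)

# Hypothetical node features: user 0, user 1, item 0, item 1.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
H = propagate(A, X)
user_embeddings = H[:2]
```

After one step each user embedding already mixes in the features of the tickets that user interacted with; stacking further steps would also reach users who share those tickets.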
It should be noted that the graph attention structure aggregates the embedded representations of user nodes to output the final user preference features. In the formula for the user preference feature, f_u denotes the preference feature of user u, σ denotes a nonlinear activation function, W_s denotes the shared structure matrix, h_v denotes the embedded representation of another user node v, α_uv denotes the attention weight, and the aggregation ranges over the set of neighbor nodes of user u.
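The formula is not legible in this text; the listed symbols suggest the usual graph-attention aggregation f_u = σ(Σ_v α_uv W_s h_v) over the neighbors v of user u, which the sketch below assumes. The mapping from mean square distance to similarity (1 / (1 + distance)), the normalization of the weights, the choice of the sigmoid for σ and all values are further assumptions for illustration.

```python
import math


def mean_square_distance(path_a, path_b):
    """Mean squared Euclidean distance between corresponding nodes on two
    meta-paths of equal length (each node is a feature vector)."""
    total = 0.0
    for a, b in zip(path_a, path_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(path_a)


def attention_from_paths(user_path, neighbor_paths):
    """Assumed rule: similarity = 1 / (1 + distance), normalized to sum to 1."""
    sims = [1.0 / (1.0 + mean_square_distance(user_path, p)) for p in neighbor_paths]
    total = sum(sims)
    return [s / total for s in sims]


def preference_feature(alphas, W_s, neighbor_embeddings):
    """f_u = sigmoid(sum_v alpha_uv * W_s h_v) (assumed form)."""
    agg = [0.0] * len(W_s)
    for alpha, h in zip(alphas, neighbor_embeddings):
        for r, row in enumerate(W_s):
            agg[r] += alpha * sum(w * x for w, x in zip(row, h))
    return [1.0 / (1.0 + math.exp(-a)) for a in agg]


user_path = [[0.0, 0.0], [1.0, 1.0]]
neighbor_paths = [
    [[0.0, 0.0], [1.0, 1.0]],   # identical path, distance 0
    [[1.0, 0.0], [1.0, 2.0]],   # distance 1
]
alphas = attention_from_paths(user_path, neighbor_paths)

W_s = [[1.0, 0.0], [0.0, 1.0]]               # identity as a placeholder
neighbor_embeddings = [[3.0, 0.0], [0.0, 3.0]]
f_u = preference_feature(alphas, W_s, neighbor_embeddings)
```

The neighbor with the identical meta-path receives twice the weight of the other one, so its embedding dominates the resulting preference feature.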
While the target user is searching for an air ticket, the time difference between the timestamp of the search and the departure time of the target air ticket is obtained, the next update time and the predicted air ticket fare of the target air ticket are obtained, and the fare change trend within that time difference is analyzed from predictions over multiple time steps.
According to an embodiment of the present invention, a personalized database is constructed according to the user's personal preference features, specifically:
Obtain the target user's interaction behavior on the air ticket booking website to construct a personalized data set, perform preference analysis on the interaction behavior data within a preset time, and obtain the target user's preference features for the current preset time;
While the target user is searching for air tickets, recommend air tickets to the target user according to the preference features, flag air ticket information as key according to the target user's search behavior, and monitor the flagged air tickets;
During monitoring, obtain the update cycle and update frequency of the air ticket fare, judge the price trend, and output the best purchase time based on the target user's preference features;
Obtain the target user's interaction behavior in real time, update the personalized database and the corresponding preference features, and delete the target user's personalized data when the timestamp corresponding to the interaction behavior exceeds the preset storage time threshold.
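The retention rule in the last step can be sketched as a simple pruning pass. The record layout, the function name and the 90-day threshold are hypothetical; the text only states that data older than a preset storage time threshold is deleted.

```python
from datetime import datetime, timedelta


def prune_expired(records, now, max_age):
    """Keep only interaction records whose timestamp is within max_age of now."""
    return [r for r in records if now - r["timestamp"] <= max_age]


# Hypothetical personalized data for one user.
records = [
    {"user": "u1", "action": "search", "timestamp": datetime(2023, 1, 1)},
    {"user": "u1", "action": "click", "timestamp": datetime(2023, 4, 20)},
]
kept = prune_expired(records, now=datetime(2023, 5, 4), max_age=timedelta(days=90))
```

Preference features would then be recomputed from `kept`, so stale behavior no longer influences recommendations.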
It should be noted that user preferences include the user's preferred departure time, advance booking time, booking price characteristics, travel location characteristics and the like.
Fig. 4 shows a block diagram of a machine learning-based air ticket fare data processing system of the present invention.
The second aspect of the present invention further provides a machine learning-based air ticket fare data processing system 4, comprising a memory 41 and a processor 42. The memory stores a program of a machine learning-based air ticket fare data processing method, and when the program is executed by the processor, the following steps are implemented:
Obtain historical air ticket fare data, divide it into different data sets according to route information, perform time series analysis on the different data sets, and obtain the time-series change sequence of the air ticket fare of each route;
Obtain the update cycle and update frequency of historical air ticket fares from the time-series change sequence; in addition, obtain the visit volume of each route in each update cycle by big data means, associate the visit volume with the time-series change sequence, and obtain the feature sample data set;
Construct an air ticket fare data update model based on machine learning, train it with the feature sample data set, and output the trained air ticket fare data update model once testing meets the standard;
Obtain multi-source air ticket fare data, filter it against a preset parameterized benchmark, obtain the next update time of the air ticket fare data through the air ticket fare data update model, obtain the predicted air ticket fare based on the next update time, and set data labels for data storage;
Retrieve the air ticket fare data from the stored data based on the timestamp of the user's search, and return the output result to the user terminal.
It should be noted that keyword information is extracted from the historical air ticket fare data, the departure city and destination city are determined from the keyword information, route information is extracted, and classification labels are set based on the route information. The historical air ticket fare data is classified according to the classification labels to obtain the air ticket fare data sets under the different labels, and the corresponding fare data is marked as holiday data or ordinary-day data. Time series analysis is performed on the fare data under each mark to obtain the timestamps at which fares change and the corresponding price differences, generate the time-series change sequence of each route, and obtain the update cycle and update frequency of historical air ticket fares.
According to an embodiment of the present invention, the next update time of the air ticket fare data is obtained through the air ticket fare data update model, specifically:
Filter the multi-source air ticket fare data to obtain the target air ticket fare data, extract the route information, time information and cabin class information of the target air ticket fare data, and obtain the time-series change sequence and the visit volume change sequence within a past preset time as the input of the air ticket fare data update model;
Introduce a self-attention mechanism into the air ticket fare data update model, construct a self-attention layer, take the hidden-layer state outputs at different time steps as the input of the self-attention layer, and compute the self-attention weights;
Use the self-attention weights to represent the importance of each time step to the prediction target, and output the next update time of the target air ticket fare data after iterative computation.
It should be noted that the self-attention weights represent the importance of each time step to the prediction target. In the formula for the self-attention weight, the attention score of the hidden-layer state at time step t is computed with the activation function tanh, the self-attention layer parameters W_c and V_c, and the bias b_c; T denotes matrix transposition.
The query volume of each route on air ticket booking websites is obtained by big data means. Heterogeneous information retrieval labels are set according to the destination in the route information, the retrieval time step is derived from each update cycle, and the query volume of heterogeneous information is obtained; the heterogeneous information may come from, for example, hotel reservation information and travel search information of travel service providers. A conversion coefficient is applied to the heterogeneous query volume, and the result is matched with the query volume of each route to obtain the final visit volume in each update cycle; the conversion coefficient is set to the ratio of air passenger flow to total passenger flow for a given city in the same historical period. The final visit volume is associated with the time-series change sequence to obtain the association features between visit volume and update cycle, and the feature sample data set is constructed.
The air ticket fare data update model is built on an LSTM network optimized by the particle swarm algorithm. Particles are defined by the number of hidden-layer neurons, the learning rate and the maximum number of iterations of the LSTM network, and the particle swarm parameters are initialized, including the maximum number of iterations of the particle swarm algorithm, the population size, the acceleration coefficients and the inertia weight, together with the initial positions and velocities. The network structure of the LSTM network is determined, and the fitness function is set to the mean square error between the actual and predicted values. Particle positions are optimized by continually updating each particle's individual best and the global best; the search stops once the optimal position has been obtained, and the parameters of the LSTM network are determined from the optimal particle position. The feature sample data set is divided into a training set and a test set according to a preset ratio, and after iterative training the air ticket fare data update model whose accuracy meets the preset standard is output.
It should be noted that the influencing factors of air ticket fares are obtained through data mining on the time-series change sequence and the visit volume change sequence of the historical air ticket fares of each route. The data mining can be carried out by big data means, and influencing factors can also be gathered through retrieval of literature and related materials, summaries of expert experience and similar methods; they generally include factors of the flight itself, holiday factors, day-of-week factors and so on. The influencing factors are screened by methods such as random forest or principal component analysis to obtain the set of factors that influenced the target air ticket fare within the past preset time period. An air ticket fare prediction network is constructed based on a temporal convolutional network: it combines a dilated convolution residual block formed from 3 causal convolutions with a convolutional layer with one 1×1 convolution kernel, and outputs the final prediction result through 1 fully connected layer;
The obtained next update time is taken as the target prediction time. The influencing factors in the set are matched with the time-series change sequence of the target air ticket fare within the past preset time and normalized; the normalized data is fed into the air ticket fare prediction network to obtain the predicted air ticket fare at the target prediction time, and the next update time and the predicted air ticket fare are stored after data labels are set.
According to an embodiment of the present invention, the air ticket fare data in the stored data is retrieved based on the timestamp of the user's search and the output result is returned to the user terminal, specifically:
Obtain the user's historical behavior data, obtain from it the interaction information between the user and air ticket item nodes within a preset time step, and generate a bipartite graph structure of users and air ticket items from the interaction information;
Obtain the user's basic information, flight segment information and advance ticket purchase time information as additional features of the nodes in the bipartite graph structure;
Learn a representation of the bipartite graph structure with a graph convolutional neural network to obtain the initial vector representations of users and air ticket items, concatenate these initial vector representations, and construct an adjacency matrix;
Through the message passing mechanism and neighbor aggregation mechanism of the graph convolutional neural network, propagate features between nodes based on the adjacency matrix, learn the features of neighbor nodes, and update the embedded representations of the user nodes;
In addition, obtain the air ticket items the user interacted with at each timestamp within the preset time step, concatenate the corresponding bipartite graph structures, construct the meta-paths within the preset time step, and match the meta-paths with users;
Obtain the similarity between users by computing the mean square distance between nodes on the users' meta-paths, use the similarity as the attention weight, and aggregate the embedded representations of user nodes with a graph attention structure to output the final user preference features;
Obtain the corresponding air ticket fare data according to the user's search information, analyze the fare change trend of the air ticket according to the user's preference features together with the obtained next update time and predicted air ticket fare, and return the queried information and the fare change trend to the user terminal.
It should be noted that the graph attention structure aggregates the embedded representations of user nodes to output the final user preference features. In the formula for the user preference feature, f_u denotes the preference feature of user u, σ denotes a nonlinear activation function, W_s denotes the shared structure matrix, h_v denotes the embedded representation of another user node v, α_uv denotes the attention weight, and the aggregation ranges over the set of neighbor nodes of user u.
While the target user is searching for an air ticket, the time difference between the timestamp of the search and the departure time of the target air ticket is obtained, the next update time and the predicted air ticket fare of the target air ticket are obtained, and the fare change trend within that time difference is analyzed from predictions over multiple time steps.
The third aspect of the present invention further provides a computer-readable storage medium storing a program of a machine learning-based air ticket fare data processing method; when the program is executed by a processor, the steps of the machine learning-based air ticket fare data processing method described in any of the above are implemented.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as removable storage devices, read-only memory (ROM), random access memory (RAM), magnetic disks and optical discs.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as removable storage devices, ROM, RAM, magnetic disks and optical discs.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310490403.8A CN116503086A (en) | 2023-05-04 | 2023-05-04 | Method, system and medium for data processing of air ticket freight rate based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310490403.8A CN116503086A (en) | 2023-05-04 | 2023-05-04 | Method, system and medium for data processing of air ticket freight rate based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116503086A (en) | 2023-07-28 |
Family
ID=87329968
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310490403.8A (Pending), CN116503086A (en) | 2023-05-04 | 2023-05-04 | Method, system and medium for data processing of air ticket freight rate based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116503086A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117271886A (en) * | 2023-08-25 | 2023-12-22 | 广东美亚旅游科技集团股份有限公司 | Data search method, system, equipment and media based on ticket order management |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210073283A1 (en) | Machine learning and prediction using graph communities | |
CN104169950B (en) | Utilize the Database Systems of the calculating towards batch processing | |
CN115002200B (en) | Message pushing method, device, equipment and storage medium based on user portrait | |
CN109582876B (en) | Tourist industry user portrait construction method and device and computer equipment | |
WO2017190610A1 (en) | Target user orientation method and device, and computer storage medium | |
CN109189904A (en) | Individuation search method and system | |
CN114912948B (en) | Cloud service-based cross-border e-commerce big data intelligent processing method, device and equipment | |
CN115204971B (en) | Product recommendation method, device, electronic equipment and computer readable storage medium | |
CN111400613A (en) | Article recommendation method, device, medium and computer equipment | |
CN110992097A (en) | Processing method and device for revenue product price, computer equipment and storage medium | |
Guo et al. | Concurrent order dispatch for instant delivery with time-constrained actor-critic reinforcement learning | |
CN113707302B (en) | Service recommendation method, device, equipment and storage medium based on associated information | |
CN113592605B (en) | Product recommendation method, device, equipment and storage medium based on similar products | |
CN115222433A (en) | Information recommendation method and device and storage medium | |
CN114997916A (en) | Prediction method, system, electronic device and storage medium of potential user | |
CN115131101A (en) | A Personalized Intelligent Recommendation System for Insurance Products | |
CN113781149A (en) | Information recommendation method and device, computer-readable storage medium and electronic equipment | |
Singh et al. | Analysis of machine learning techniques for airfare prediction | |
CN116503086A (en) | Method, system and medium for data processing of air ticket freight rate based on machine learning | |
CN111859113A (en) | Personalized content pushing method and device based on online taxi appointment | |
CN113779241A (en) | Information acquisition method and apparatus, computer-readable storage medium, and electronic device | |
KR102554886B1 (en) | Open market data analysis-based advertising efficiency maximization platform | |
KR102563095B1 (en) | AI-based open market integrated management system | |
CN117312670A (en) | Recommendation generation method, device, equipment and medium based on static and dynamic data | |
CN114429384B (en) | Intelligent product recommendation method and system based on e-commerce platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20230728 |