CN118072518A

CN118072518A - Traffic flow prediction method based on big data

Info

Publication number: CN118072518A
Application number: CN202410319851.6A
Authority: CN
Inventors: 鲍俊颖
Original assignee: Chongqing University of Education
Current assignee: Chongqing University of Education
Priority date: 2024-03-20
Filing date: 2024-03-20
Publication date: 2024-05-24

Abstract

The application provides a traffic flow prediction method based on big data. To construct the traffic flow prediction model, a monitoring area is divided into M sample road sections, and the historical traffic data (including the corresponding traffic flow labels) of each sample road section is determined from a historical traffic data set; real-time features and global features are determined for each sample road section; feature fusion yields N training features per sample road section, forming a training data set of M×N training features; and a GRU model is constructed, trained, and tested to obtain the trained traffic flow prediction model. Combining real-time features with global features and predicting with the GRU model can improve the accuracy of traffic flow prediction and make the prediction results more reliable.

Description

Traffic flow prediction method based on big data

Technical Field

The present application relates to the field of big data technology, and in particular to a traffic flow prediction method based on big data.

Background

As urbanization continues to accelerate, urban traffic congestion has become a problem that seriously affects daily life and economic development. Frequent congestion leads to long commutes, increases vehicle exhaust emissions, and causes serious environmental pollution; it also lowers quality of life and adds to everyday stress.

Traditional traffic management methods can usually provide only static information and cannot respond promptly to dynamically changing traffic conditions. How to use modern information technology to improve traffic management (for example, driving route planning) and to improve the accuracy of traffic flow prediction has therefore become an urgent problem in the transportation field.

With the rapid development of big data technology, the transportation field has gradually begun to apply it to traffic flow prediction and management. With its strong data processing and analysis capabilities, big data technology gives traffic management departments more diverse, real-time data support. For example, by collecting vehicle trajectory data, road monitoring data, and mobile terminal data, traffic management departments can understand traffic conditions more accurately and take corresponding measures in time. Analysis of massive data can also reveal the patterns and trends of traffic flow changes, providing a scientific basis for future traffic planning.

Although big data technology has brought new hope for traffic flow prediction, problems remain in practice. Current big data processing and analysis techniques perform poorly when applied to traffic flow prediction: the prediction results are inaccurate, so downstream applications (for example, guiding driving route planning to avoid congestion in advance) also perform poorly, which does not help relieve traffic congestion.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a traffic flow prediction method based on big data, so as to improve the accuracy of traffic flow prediction, improve its applicability at the application level, and relieve traffic congestion.

To achieve the above purpose, the embodiments of the present application are implemented as follows:

In a first aspect, an embodiment of the present application provides a traffic flow prediction method based on big data, comprising: obtaining real-time traffic data of a target road section, wherein the target road section includes a monitored section for which flow is predicted and adjacent sections connected to the monitored section, and, in each travel direction of the target road section, a vehicle can start from the adjacent section at the starting point of that direction, pass through the monitored section, and reach the adjacent section at the end point of that direction; preprocessing the real-time traffic data of the target road section and determining the real-time features corresponding to the target road section; obtaining the global features corresponding to the target road section, and fusing the global features and the real-time features to obtain the input features corresponding to the target road section; and inputting the input features into a preset traffic flow prediction model, which determines the traffic flow prediction result corresponding to the target road section.

In combination with the first aspect, in a first possible implementation of the first aspect, the traffic flow prediction model must be constructed before the real-time traffic data is obtained. The model is constructed as follows: obtaining a historical traffic data set, wherein the historical traffic data set contains historical traffic information collected over N consecutive time steps within the monitoring area; dividing the monitoring area to form M sample road sections and determining the historical traffic data corresponding to each sample road section, wherein each piece of historical traffic data of each sample road section contains a traffic flow label indicating the traffic flow of that sample road section at that time step, each sample road section contains one calibrated section and the adjacent sections connected to it, and, in each travel direction of the sample road section, a vehicle can start from the adjacent section at the starting point of that direction, pass through the calibrated section, and reach the adjacent section at the end point of that direction; preprocessing the historical traffic data of each sample road section in the historical traffic data set and determining the real-time features and global features of each sample road section, obtaining N real-time features for each sample road section and M global features for the M sample road sections; fusing the global feature and the real-time features of each sample road section to obtain N training features per sample road section, forming a training data set of M×N training features; dividing the training data set into a training set and a test set; and constructing a GRU model and training and testing it with the training set and the test set to obtain the trained traffic flow prediction model.
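The construction procedure above ends with a GRU model. As a reminder of what such a model computes at each time step, the following is a minimal pure-Python sketch of one standard GRU cell update. It is not the configuration used in this application: the helper names (`gru_step`, `init_params`, `run_gru`), the layer sizes, and the random initialization are all illustrative assumptions, and training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h, p):
    """One standard GRU update; returns the new hidden state."""
    # Update gate z and reset gate r.
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    # Candidate state computed from the reset-gated previous state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # Blend old state and candidate (one common sign convention for z).
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_in, n_hid, seed=0):
    """Hypothetical initialization: small uniform weights, zero biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in "zrh":
        p["W" + gate] = mat(n_hid, n_in)
        p["U" + gate] = mat(n_hid, n_hid)
        p["b" + gate] = [0.0] * n_hid
    return p


def run_gru(sequence, p, n_hid):
    """Feed a sequence of feature vectors through the cell, starting from zeros."""
    h = [0.0] * n_hid
    for x in sequence:
        h = gru_step(x, h, p)
    return h


params = init_params(n_in=3, n_hid=4)
seq = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]  # two toy training features
final_h = run_gru(seq, params, n_hid=4)
```

In a full model, the final hidden state would typically feed an output layer that predicts the traffic flow label; that layer and the training loop are left out here.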

In combination with the first possible implementation of the first aspect, in a second possible implementation of the first aspect, dividing the monitoring area to form M sample road sections and determining the historical traffic data corresponding to each sample road section includes: dividing the monitoring area into road sections to obtain M sections; for each section, taking the current section as the calibrated section and determining the calibrated section together with its adjacent sections as one sample road section, giving M sample road sections in total, wherein each sample road section contains one calibrated section and the adjacent sections connected to it, and, in each travel direction of the sample road section, a vehicle can start from the adjacent section at the starting point of that direction, pass through the calibrated section, and reach the adjacent section at the end point of that direction; and, for each sample road section, taking the part of each piece of historical traffic information that relates to the current sample road section as that sample road section's historical traffic data, so that each sample road section has N pieces of historical traffic data, wherein each piece of historical traffic data contains a traffic flow label determined by the traffic flow of the calibrated section of the current sample road section at the corresponding time step.

In combination with the first possible implementation of the first aspect, in a third possible implementation of the first aspect, preprocessing the historical traffic data of each sample road section in the historical traffic data set and determining the real-time features and global features of each sample road section includes: cleaning the historical traffic data of each sample road section in the historical traffic data set; standardizing the cleaned historical traffic data of each sample road section; extracting real-time features from the standardized historical traffic data of each sample road section to determine the N real-time features of that sample road section; and extracting global features from the historical traffic data of the same sample road section over all time steps to determine the M global features of the M sample road sections.

In combination with the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, fusing the global feature and the real-time features of each sample road section to obtain N training features per sample road section and form a training data set of M×N training features includes: for each sample road section, using a fully connected layer to embed the global feature and each real-time feature of the current sample road section, obtaining N groups of real-time embedded features and one group of global embedded features for the current sample road section; for each group of real-time embedded features, fusing the real-time embedded feature with the corresponding global embedded feature to obtain the corresponding training feature; and integrating all training features of the M sample road sections to form a training data set of M×N training features.
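The embed-then-fuse loop described above can be sketched as follows, mainly to show how M sample road sections with N real-time features each produce M×N training features. The fusion step here is a deliberately simple stand-in (a fixed convex combination with a hypothetical `alpha`), not the attention-based fusion of the later implementations; the weights, dimensions, and function names are likewise assumptions.

```python
import math


def dense(W, b, v):
    """Fully connected layer with a sigmoid activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, v)) + bias)))
        for row, bias in zip(W, b)
    ]


def build_training_set(realtime, global_feats, Wx, bx, Wy, by, alpha=0.5):
    """Return a list of ((section j, step i), training feature) pairs.

    realtime[j][i] is the i-th real-time feature of sample section j;
    global_feats[j] is that section's single global feature.
    """
    dataset = []
    for j, section_feats in enumerate(realtime):
        g_emb = dense(Wy, by, global_feats[j])   # one global embedding per section
        for i, x in enumerate(section_feats):
            r_emb = dense(Wx, bx, x)             # one real-time embedding per step
            # Stand-in fusion: fixed weighted sum of the two embeddings.
            fused = [alpha * r + (1 - alpha) * g for r, g in zip(r_emb, g_emb)]
            dataset.append(((j, i), fused))
    return dataset


# Toy data: M = 2 sample sections, N = 3 time steps, 2-dimensional features.
M, N = 2, 3
realtime = [[[0.1 * (i + 1), 0.2 * (j + 1)] for i in range(N)] for j in range(M)]
global_feats = [[0.5, 0.1], [0.3, 0.7]]
Wx, bx = [[0.4, -0.2], [0.1, 0.3]], [0.0, 0.0]
Wy, by = [[0.2, 0.5], [-0.3, 0.1]], [0.0, 0.0]

training_set = build_training_set(realtime, global_feats, Wx, bx, Wy, by)
```

Each sample road section contributes exactly one global embedding that is reused for all of its N time steps, which is why the result has M×N entries.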

In combination with the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, using a fully connected layer to embed the global feature and each real-time feature of the current sample road section, obtaining N groups of real-time embedded features and one group of global embedded features for the current sample road section, includes: for the i-th real-time feature of the current sample road section j and the global feature Y^j of the current sample road section j, performing feature embedding with the following formula:

[formula image not reproduced]

where i ∈ [1, N] and j ∈ [1, M]. For the i-th real-time feature and for the global feature Y^j respectively, the formula defines the feature representation obtained after embedding, the weight matrix applied during embedding, and the bias term applied during embedding. σ is the activation function.
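Since the formula itself is not available in this text, the following LaTeX is only a sketch of the form that the definitions above imply for a fully connected embedding layer (a weight matrix, a bias term, and an activation function applied to each feature). Only $Y^j$ and $\sigma$ appear in the source; the names $X_i^j$, $E_i^j$, $E_Y^j$, $W_X$, $W_Y$, $b_X$, and $b_Y$ are assumptions introduced for illustration.

```latex
% Assumed notation: X_i^j = i-th real-time feature of sample section j,
% E_i^j / E_Y^j = embedded representations, W_* = weight matrices, b_* = bias terms.
\begin{aligned}
E_i^j &= \sigma\!\left(W_X X_i^j + b_X\right), \\
E_Y^j &= \sigma\!\left(W_Y Y^j + b_Y\right),
\qquad i \in [1, N],\; j \in [1, M].
\end{aligned}
```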

In combination with the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, fusing the real-time embedded feature with the corresponding global embedded feature to obtain the corresponding training feature includes: for the real-time embedded feature and the global embedded feature, performing feature fusion with the following formula:

[formula image not reproduced]

where the result is the i-th training feature of sample road section j, that is, the training feature obtained by fusing the real-time embedded feature with the global embedded feature; α is the attention weight; the formula applies one nonlinear transformation to the real-time embedded feature and another to the global embedded feature; the real-time embedded feature and the global embedded feature each have a corresponding weight parameter; the formula includes a lag term of the real-time embedded feature at time step t, t ∈ [1, N], and a lag term of the global embedded feature at sample road section s.

In combination with the sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the attention weight α satisfies:

[formula image not reproduced]

where the quantity being defined is the k-th dimension parameter of the attention weight α at time step t; F(·) is the attention function; h^(t-1) is the hidden state at the previous time step; the formula uses the k-th dimension parameter of the i-th training feature of sample road section j; C is the total number of dimensions of the training feature; and l indexes the l-th dimension parameter of the i-th training feature of sample road section j.
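The definitions above (a per-dimension weight, an attention function F taking the previous hidden state, and a second index l running over all C dimensions) are consistent with a softmax-style normalization over feature dimensions. The LaTeX below is one plausible reading, offered as an assumption rather than as the application's formula; the symbols $\alpha_k^t$ and $Z_{i,k}^j$ (the k-th dimension of the i-th training feature of sample road section j) are names introduced here.

```latex
% Assumed softmax form; F, h^{(t-1)}, and C are from the source, the rest is assumed notation.
\alpha_k^t
  = \frac{\exp\!\left(F\!\left(h^{(t-1)},\, Z_{i,k}^j\right)\right)}
         {\sum_{l=1}^{C} \exp\!\left(F\!\left(h^{(t-1)},\, Z_{i,l}^j\right)\right)},
\qquad k \in [1, C].
```

Under this reading the C weights at each time step are positive and sum to 1, so the attention weight redistributes emphasis among feature dimensions rather than rescaling the feature as a whole.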

In combination with the sixth possible implementation of the first aspect, in an eighth possible implementation of the first aspect, the weight parameter satisfies:

[formula image not reproduced]

where the quantity being defined is the weight parameter.

In combination with the sixth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, the weight parameter satisfies:

[formula image not reproduced]

where the quantity being defined is the weight parameter.

Beneficial effects:

1. To construct the traffic flow prediction model, this scheme divides the monitoring area into M sample road sections and uses the historical traffic data set to determine the historical traffic data (including the corresponding traffic flow labels) of each sample road section; preprocesses the historical traffic data and determines the real-time features and global features of each sample road section; performs feature fusion to obtain N training features per sample road section, forming a training data set of M×N training features; and constructs, trains, and tests a GRU model to obtain the trained traffic flow prediction model. Combining real-time features with global features and predicting with the GRU model can improve the accuracy of traffic flow prediction and make the prediction results more reliable. Predicting from real-time traffic data better reflects the dynamic changes of traffic flow, and fusing global and real-time features makes full use of both historical and current data. This improves the model's perception of complex traffic conditions, so that it can account for sudden events such as congestion and accidents and predict traffic flow more accurately.

2. The feature embedding and feature fusion design lets the model make better use of the relationship between global and real-time features and improves feature expression, strengthening the model's ability to weigh different factors together. By fusing the real-time embedded features and the global embedded features with weights, the model considers both the real-time information at the current moment and the historical global information, and so benefits from both. This helps the model grasp the characteristics of traffic data as a whole and understand road conditions more completely. The attention weight dynamically learns the relative importance of the real-time embedded features and the global embedded features and drives the weighted fusion, so the model adaptively focuses on the feature information most critical to the current prediction task, improving its adaptability and generalization across scenarios. Applying nonlinear transformations to the real-time and global embedded features introduces additional nonlinearity, strengthens the modeling of complex relationships between features, and improves the model's expressive power, so that it fits the nonlinear characteristics of the data better and predicts more accurately. The lag term of the real-time embedded feature and the lag term of the global embedded feature are designed according to how the training features are formed and how the area is divided (each sample road section contains a calibrated section and its adjacent sections). They help the model capture the temporal dependence and long-term memory of the data and better understand its time-series characteristics, which improves the model's ability to understand and predict how traffic data changes over time. The weight parameters of the real-time and global embedded features can be learned from the training data (or the weight allocation scheme designed in this scheme can be used), so they adjust automatically to the characteristics of the data, making the feature fusion process more flexible and adaptable to different data distributions.

3. The attention weight formula lets the model adjust weights dynamically according to the importance of different features and focus on the information most critical to the current prediction task. By computing over the hidden state and the training features, the model decides the importance of each feature from the relationship between historical information and current features, which improves its sensitivity to data features and the accuracy of prediction. One weight parameter is designed with the tanh function: as the difference between time step t and feature index i (the training feature corresponding to the i-th time step) grows, the weight gradually decreases. This makes the model pay more attention to recent feature information and depend less on distant features, helping it capture short-term trends and respond to real-time information. The other weight parameter takes different values according to the position s of sample road section j in the overall sequence, where M is the total number of sample road sections. This lets the model assign different weights to road sections at different positions (mainly to distinguish, by travel direction, the influence of preceding adjacent sections from that of succeeding adjacent sections), better separate the contributions of different road sections to the prediction task, and focus on the road section information that matters most for the overall prediction. It improves the model's generalization and its understanding of global information, and ultimately improves the accuracy of traffic flow prediction, improves applicability at the application level (for example, driving route planning and guiding the design of intelligent traffic management), and relieves traffic congestion.
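The tanh-based weight formula is not reproduced in this text, so the following is only an illustrative sketch of a weight with the property described above: it is largest when the feature index i equals the current time step t and decays as |t - i| grows. The functional form `1 - tanh(|t - i| / scale)` and the `scale` parameter are assumptions chosen to show the behavior, not the application's formula.

```python
import math


def recency_weight(t, i, scale=5.0):
    """Illustrative decay: 1 when i == t, approaching 0 as |t - i| grows."""
    return 1.0 - math.tanh(abs(t - i) / scale)


# Weights given at time step t = 10 to the features of steps 1..10.
weights = [recency_weight(t=10, i=i) for i in range(1, 11)]
```

With this shape, features from the most recent time steps dominate the fused representation, which matches the stated goal of emphasizing short-term trends.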

To make the above objects, features, and advantages of the present application easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

Figure 1 is a flow chart of constructing the traffic flow prediction model.

Figure 2 is a flow chart of the traffic flow prediction method based on big data provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.

Because the traffic flow prediction method of this scheme relies mainly on the traffic flow prediction model that the scheme constructs, the process of constructing the model is introduced first to make the scheme easier to understand.

Refer to Figure 1, which is a flow chart of constructing the traffic flow prediction model. In this embodiment, constructing the traffic flow prediction model may include steps S11, S12, S13, S14, S15, and S16.

To construct the traffic flow prediction model, step S11 may be executed first.

Step S11: Obtain a historical traffic data set, wherein the historical traffic data set contains historical traffic information collected over N consecutive time steps within the monitoring area.

In this embodiment, historical traffic information may be collected over N consecutive time steps within the monitoring area. The length of a time step is not limited; it may be, for example, 1 minute or 10 minutes. In theory, to maintain high prediction accuracy, the data should span one year or more. Because that would produce too much data, this embodiment uses a time span of one month as an example and gradually refines the long-term features through continuous learning and updating while the model is in use (that is, the global features can be updated and optimized as the model is used, or scheduled for regular updates). The historical traffic information includes data collected by sensors (such as traffic cameras, geomagnetic sensors, and radar), GPS data (sent by GPS devices mounted on vehicles), geographic information system (GIS) data, weather data (rainfall, wind speed, visibility, and so on), event data (traffic accidents, construction work, special events, and so on), and vehicle data (vehicle operating status, driving data, and so on).

After the historical traffic information is obtained, step S12 may be executed.

Step S12: Divide the monitoring area to form M sample road sections, and determine the historical traffic data corresponding to each sample road section, wherein each piece of historical traffic data of each sample road section contains a traffic flow label indicating the traffic flow of that sample road section at that time step, each sample road section contains one calibrated section and the adjacent sections connected to it, and, in each travel direction of the sample road section, a vehicle can start from the adjacent section at the starting point of that direction, pass through the calibrated section, and reach the adjacent section at the end point of that direction.

In this embodiment, to predict the flow of local road sections within the monitoring area, the area may be divided into M sample road sections and the historical traffic data corresponding to each sample road section determined.

For example, the monitoring area may be divided into road sections to obtain M sections.

For each section, the current section may be taken as the calibrated section, and the calibrated section together with its adjacent sections determined as one sample road section, giving M sample road sections in total. In this embodiment, the sections directly connected to the calibrated section serve as the adjacent sections; in other embodiments, to further improve prediction accuracy, a wider range of sections may be used, for example the three nearest sections connected to the calibrated section in each travel direction. Each sample road section contains one calibrated section and the adjacent sections connected to it, and, in each travel direction of the sample road section, a vehicle can start from the adjacent section at the starting point of that direction, pass through the calibrated section, and reach the adjacent section at the end point of that direction.

Then, for each sample road section, the part of each piece of historical traffic information that relates to the current sample road section is taken as that sample road section's historical traffic data, so that each sample road section has N pieces of historical traffic data. Each piece of historical traffic data contains a traffic flow label determined by the traffic flow of the calibrated section of the current sample road section at the corresponding time step. This yields M×N pieces of historical traffic data for the M sample road sections (N pieces per sample road section).
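The division in step S12 can be sketched as follows, assuming the road network is available as an adjacency map and that only directly connected sections count as adjacent (as in this embodiment). The function names, the dictionary layout, and the toy flow values are hypothetical; real records would also carry the other collected data (weather, events, vehicle data), which is omitted here.

```python
def build_sample_sections(adjacency):
    """Turn each section into a sample section: itself plus its direct neighbors."""
    return [
        {"calibrated": sec, "adjacent": sorted(neighbors)}
        for sec, neighbors in sorted(adjacency.items())
    ]


def historical_records(sample, flows_by_step):
    """Build one record per time step; the label is the calibrated section's flow."""
    members = [sample["calibrated"]] + sample["adjacent"]
    records = []
    for step, flows in enumerate(flows_by_step):
        records.append({
            "step": step,
            "readings": {sec: flows[sec] for sec in members},
            "label": flows[sample["calibrated"]],
        })
    return records


# Hypothetical toy network: three sections in a row, A - B - C.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# Hypothetical flows (vehicles per time step) for N = 2 time steps.
flows_by_step = [
    {"A": 30, "B": 42, "C": 25},
    {"A": 33, "B": 47, "C": 28},
]

samples = build_sample_sections(adjacency)  # M = 3 sample sections
dataset = {s["calibrated"]: historical_records(s, flows_by_step) for s in samples}
```

Note that sample road sections overlap: section B is the calibrated section of one sample and an adjacent section in the samples built around A and C.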

After the historical traffic data corresponding to each sample road section is determined, step S13 may be executed.

Step S13: Preprocess the historical traffic data of each sample road section in the historical traffic data set, and determine the real-time features and global features of each sample road section, obtaining N real-time features for each sample road section and M global features for the M sample road sections.

在本实施例中，可以对历史交通数据集中的每个样本路段对应的历史交通数据进行数据清洗，如处理缺失值，识别和剔除异常值等，以确保数据质量的可靠性和准确性。In this embodiment, data cleaning may be performed on the historical traffic data corresponding to each sample road section in the historical traffic data set, such as processing missing values, identifying and removing outliers, etc., to ensure the reliability and accuracy of data quality.

然后对清洗后的每个样本路段对应的历史交通数据进行标准化，例如对数转换、归一化等操作。Then, the historical traffic data corresponding to each cleaned sample road section is standardized, such as logarithmic transformation, normalization and other operations.
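As a rough sketch of the cleaning and standardization just described, the snippet below forward-fills missing readings, applies a logarithmic transformation, and then min-max normalizes the result. The application does not specify which cleaning or scaling methods are used, so the helper names and the choice of forward fill, `log1p`, and min-max scaling are assumptions made for illustration.

```python
import math

def fill_missing(values):
    """Forward-fill None entries; leading gaps take the first observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to fill from")
    filled, last = [], observed[0]
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def log_transform(values):
    """Compress large flow counts; log1p keeps zero counts at zero."""
    return [math.log1p(v) for v in values]

def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw_flows = [0, 12, None, 300, 95]  # hypothetical vehicle counts per time step
scaled = min_max_normalize(log_transform(fill_missing(raw_flows)))
print(scaled)
```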

After standardization, real-time feature extraction may be performed on the standardized historical traffic data of each sample section to determine the N real-time features of that sample section, MN real-time features in total. Real-time feature extraction here includes, for example, aggregation, extraction of statistical features (such as mean and variance), and construction of spatiotemporal features (built jointly from the adjacent sections and the calibration section in each travel direction, with each travel direction contributing part of the feature parameters); the results are integrated into the real-time features of the sample section.

In addition, global feature extraction may be performed on the historical traffic data of the same sample section over all time steps to determine the M global features corresponding to the M sample sections. In the training phase, this embodiment uses data spanning 1 to 3 months as an example for extracting the global features. After the trained model is put into use, the global feature of each sample section can be updated as the time span of the data grows, further improving model accuracy over time.
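The difference between the two feature types can be illustrated with a small sketch: real-time features are computed once per time step, while the global feature is computed once per sample section over its whole history. The data layout (`observations[t]` holding the readings of the calibration section and its adjacent sections at time step t) and the chosen statistics are assumptions for illustration only.

```python
from statistics import mean, pvariance

def real_time_features(observations):
    """Return one feature vector per time step (N vectors for N steps)."""
    return [[mean(step), pvariance(step)] for step in observations]

def global_feature(observations):
    """Return one feature vector summarising all time steps of a sample section."""
    flat = [v for step in observations for v in step]
    return [mean(flat), pvariance(flat), max(flat)]

# Hypothetical readings: 3 time steps, each covering 3 sections of one sample section.
observations = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
rt = real_time_features(observations)  # N = 3 real-time features
gl = global_feature(observations)      # 1 global feature for this sample section
print(rt, gl)
```

Recomputing `global_feature` on a longer history corresponds to the update of the global feature described for the deployed model.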

After the N real-time features corresponding to each sample section and the M global features corresponding to the M sample sections are obtained, step S14 may be executed.

Step S14: Perform feature fusion on the global feature and the real-time features corresponding to each sample section to obtain N training features for each sample section, forming a training data set containing MN training features.

In this embodiment, for each sample section:

A fully connected layer may be used to embed the global feature and each real-time feature of the current sample section, obtaining N groups of real-time embedded features and one group of global embedded features for the current sample section.

For example, for the i-th real-time feature of the current sample section j and the global feature Y^j of the current sample section j, the following formula may be used for feature embedding:

where i ∈ [1, N] and j ∈ [1, M]. The quantities defined for this formula are: the feature representation obtained by embedding the real-time feature; the feature representation obtained by embedding the global feature Y^j; the weight matrix used when embedding the real-time feature; the weight matrix used when embedding the global feature Y^j; the bias term used when embedding the real-time feature; and the bias term used when embedding the global feature Y^j. σ is the activation function (the Sigmoid function in this embodiment).
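The embedding formula itself is not present in this text. Given the quantities defined for it (one fully connected layer per feature type, each with a weight matrix, a bias term, and a Sigmoid activation), a conventional form would be the one below. The symbol names X_i^j, E, W_X, W_Y, b_X, and b_Y are assumed for illustration and are not taken from the application.

```latex
E_{X,i}^{j} = \sigma\left(W_X X_i^{j} + b_X\right), \qquad
E_{Y}^{j} = \sigma\left(W_Y Y^{j} + b_Y\right), \qquad
\sigma(u) = \frac{1}{1 + e^{-u}}
```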

For each group of real-time embedded features, the real-time embedded feature may be fused with the corresponding global embedded feature to obtain the corresponding training feature.

Specifically, for a real-time embedded feature and the global embedded feature, the following formula may be used for feature fusion:

The attention weight α satisfies:

The weight parameters satisfy:

Then, all training features corresponding to the M sample sections may be integrated to form a training data set containing MN training features.

After the training data set is obtained, step S15 may be executed.

Step S15: Divide the training data set into a training set and a test set.

In this embodiment, the training data set is divided 8.5:1.5 into a training set (85%) and a test set (15%). The division is applied in the same proportion within each sample section, so that every sample section in the monitoring area is represented in both training and testing and sample imbalance is avoided.
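A per-section split of this kind might look like the sketch below. The application only states the 85/15 ratio and that it is applied section by section; splitting each section chronologically (earlier steps for training, later steps for testing) is an assumption made here because the data is a time series, and the function name `split_by_section` is hypothetical.

```python
def split_by_section(dataset, train_ratio=0.85):
    """Split every section's samples with the same ratio.

    `dataset` maps a section id to its time-ordered list of training features.
    """
    train, test = {}, {}
    for section, samples in dataset.items():
        cut = int(round(len(samples) * train_ratio))
        train[section] = samples[:cut]   # earlier time steps
        test[section] = samples[cut:]    # later time steps
    return train, test

# Two hypothetical sample sections with N = 20 training features each.
dataset = {"s1": list(range(20)), "s2": list(range(100, 120))}
train, test = split_by_section(dataset)
print({k: (len(train[k]), len(test[k])) for k in dataset})
```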

After the training set and the test set are obtained, step S16 may be executed.

Step S16: Construct a GRU model, and train and test it with the training set and the test set to obtain the trained traffic flow prediction model.

In this embodiment, a GRU model may be constructed. A GRU (Gated Recurrent Unit) is a recurrent neural network suited to processing time series data. The update formulas of its gates are as follows:

Reset gate:

where r^(t) is the reset gate at time step t, σ is the activation function (again the Sigmoid function), W_r is the weight of the reset gate, h^(t-1) is the hidden state at time step t-1, b_r is the bias of the reset gate, and the remaining input is the training feature of sample section j at time step t (the training feature obtained above, re-indexed by time step to fit the parameter description of the GRU model).

Update gate:

where z^(t) is the update gate at time step t, σ is the activation function (again the Sigmoid function), W_z is the weight of the update gate, h^(t-1) is the hidden state at time step t-1, the remaining input is the training feature of sample section j at time step t, and b_z is the bias of the update gate.

Candidate hidden state:

where the formula yields the candidate hidden state at time step t, W_k is the weight corresponding to the candidate hidden state, ⊙ denotes element-wise multiplication, and b_k is the bias corresponding to the candidate hidden state.

Hidden state update:

where h^(t) is the hidden state at time step t.
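The gate formulas are missing from this text. The symbols that are defined match the standard GRU formulation, which is commonly written as below. Here X_j^{(t)} stands for the training feature of sample section j at time step t and \tilde{h}^{(t)} for the candidate hidden state; both symbol names are assumed. The last line uses one of two common conventions for combining the old state and the candidate, so the application's own formula may swap the roles of z^{(t)} and 1 - z^{(t)}.

```latex
\begin{aligned}
r^{(t)} &= \sigma\left(W_r\,[h^{(t-1)}, X_j^{(t)}] + b_r\right) \\
z^{(t)} &= \sigma\left(W_z\,[h^{(t-1)}, X_j^{(t)}] + b_z\right) \\
\tilde{h}^{(t)} &= \tanh\left(W_k\,[r^{(t)} \odot h^{(t-1)}, X_j^{(t)}] + b_k\right) \\
h^{(t)} &= \left(1 - z^{(t)}\right) \odot h^{(t-1)} + z^{(t)} \odot \tilde{h}^{(t)}
\end{aligned}
```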

The mean squared error is used as the loss function of the GRU model.

Finally, the constructed GRU model is trained with the training set and tested with the test set, yielding the trained traffic flow prediction model.
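To make the forward pass and the loss concrete, the following sketch implements one step of a standard GRU cell and the mean squared error in plain Python. It assumes the standard GRU formulation with the weights acting on the concatenation of the previous hidden state and the input; the parameter names, the random initialisation, and the use of the first hidden unit as a stand-in prediction are illustrative assumptions. It shows no training procedure, which in practice would be handled by a deep learning framework with automatic differentiation.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def gru_step(params, h_prev, x):
    """One GRU update; weights act on the concatenated vector [h_prev, x]."""
    hx = h_prev + x
    r = [sigmoid(u) for u in affine(params["W_r"], hx, params["b_r"])]  # reset gate
    z = [sigmoid(u) for u in affine(params["W_z"], hx, params["b_z"])]  # update gate
    rhx = [ri * hi for ri, hi in zip(r, h_prev)] + x
    cand = [math.tanh(u) for u in affine(params["W_k"], rhx, params["b_k"])]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def init_params(hidden, inputs, seed=0):
    """Small random weights and zero biases for the three GRU transforms."""
    rng = random.Random(seed)
    params = {}
    for gate in ("r", "z", "k"):
        params["W_" + gate] = [
            [rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
            for _ in range(hidden)
        ]
        params["b_" + gate] = [0.0] * hidden
    return params

params = init_params(hidden=4, inputs=3)
h = [0.0] * 4
for x in ([0.1, 0.2, 0.3], [0.2, 0.1, 0.0], [0.4, 0.4, 0.1]):  # hypothetical features
    h = gru_step(params, h, x)
loss = mse(h[:1], [0.5])  # first hidden unit used as a stand-in prediction
print(h, loss)
```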

After the trained traffic flow prediction model is obtained, it may be deployed on a server together with the program that runs the big-data-based traffic flow prediction method, so that the server can execute the method.

Please refer to Figure 2, which is a flowchart of the traffic flow prediction method based on big data provided in an embodiment of the present application. In this embodiment, the method may include steps S21, S22, S23, and S24.

First, the server may execute step S21.

Step S21: Acquire real-time traffic data of the target section, where the target section includes a monitored section for which traffic flow is predicted and the adjacent sections connected to the monitored section; in each travel direction of the target section, it is possible to travel from the adjacent section at the starting point of that direction, through the monitored section, to the adjacent section at the end point of that direction.

In this embodiment, predicting the traffic flow of the monitored section requires real-time traffic data of the target section (the monitored section together with its connected adjacent sections, as defined in step S21). The real-time traffic data includes data collected by sensors (such as traffic cameras, geomagnetic sensors, and radar), GPS data (sent by GPS devices carried on vehicles), geographic information system (GIS) data, weather data (rainfall, wind speed, visibility, etc.), event data (traffic accidents, construction work, special events, etc.), and vehicle data (vehicle operating status, driving data, etc.), matching the historical traffic data of the sample sections described above.

After the real-time traffic data of the target section is obtained, the server may execute step S22.

Step S22: Preprocess the real-time traffic data of the target section, and determine the real-time features corresponding to the target section.

In this embodiment, the server may preprocess the real-time traffic data of the target section, for example by data cleaning and standardization, which are not described again here. Real-time feature extraction is then performed to obtain the real-time features corresponding to the target section.

After the real-time features corresponding to the target section are obtained, the server may execute step S23.

Step S23: Acquire the global feature corresponding to the target section, and perform feature fusion on the global feature and the real-time features corresponding to the target section to obtain the input features corresponding to the target section.

In this embodiment, the server may acquire the global feature corresponding to the target section. Because the target section corresponds to one of the sample sections in the monitoring area, the global feature of that sample section can be used directly as the global feature of the target section.

After the global feature corresponding to the target section is obtained, it may be fused with the real-time features of the target section to obtain the input features corresponding to the target section. The fusion of global and real-time features follows the process shown in step S14 above and is not described again here.

After the input features are obtained, the server may execute step S24.

Step S24: Input the input features corresponding to the target section into the preset traffic flow prediction model, and determine, through the traffic flow prediction model, the traffic flow prediction result corresponding to the target section.

In this embodiment, the input features may be fed into the preset traffic flow prediction model, which determines the traffic flow prediction result corresponding to the target section. In practice, when the model has only just been put into use (that is, there are no preceding prediction results or input data), it is best to make several consecutive predictions before relying on the results, so as to obtain more accurate predictions.

Once the traffic flow prediction result is obtained, it can be used at the application layer, for example for route planning, congestion alerts, or guiding traffic management; these uses are not elaborated here.

In summary, the embodiments of the present application provide a traffic flow prediction method based on big data. To construct the traffic flow prediction model, the monitoring area is divided into M sample sections, and the historical traffic data set is used to determine the historical traffic data corresponding to each sample section (including the corresponding traffic flow labels); the historical traffic data is preprocessed, and the real-time features and global feature corresponding to each sample section are determined; feature fusion is then performed to obtain N training features for each sample section, forming a training data set containing MN training features; and a GRU model is constructed, trained, and tested to obtain the trained traffic flow prediction model. Combining real-time and global features and using a GRU model for prediction can improve the accuracy of traffic flow prediction and make the results more reliable. Predicting from real-time traffic data better reflects the dynamic changes in traffic flow, and fusing global and real-time features makes full use of both historical and current data and improves the model's perception of complex traffic conditions, so that the effects of congestion, accidents, and other sudden events are taken into account and prediction accuracy improves.

The feature embedding and feature fusion design lets the model make better use of the relationship between global and real-time features and improves feature representation, strengthening the model's ability to weigh different factors together. By fusing the real-time embedded features and the global embedded feature with weights, the model considers both the real-time information of the current moment and the historical global information, and draws on the strengths of both. This helps the model grasp the characteristics of traffic data as a whole and understand road conditions more completely. Attention weights are introduced to learn dynamically the relative importance of the real-time embedded features and the global embedded feature, and the weighted fusion follows from them, so the model adaptively attends to the feature information most relevant to the current prediction task, improving its adaptability and generalization across scenarios. The nonlinear transformations applied to the real-time and global embedded features introduce additional nonlinearity, strengthen the modeling of complex relationships between features, and improve the model's expressive power, so that it fits the nonlinear characteristics of the data better and predicts more accurately. Based on how the training features are formed and how the area is divided (each sample section consists of a calibration section and its adjacent sections), lag terms are designed for both the real-time embedded features and the global embedded feature; these help the model capture temporal dependence and long-term memory in the data, better understand its time series characteristics, and improve its understanding and prediction of how traffic data changes over time. The weight parameters of the real-time and global embedded features can be learned from the training data (the weight allocation scheme designed in this solution can also be used), so the weights adjust automatically to the characteristics of the data, making feature fusion more flexible and adaptable to different data distributions.

The formula for the attention weight lets the model adjust weights dynamically according to the importance of different features and attend to the information most relevant to the current prediction task. By computing over the hidden state and the training features, the model determines the importance of each feature from the relationship between historical information and current features, which increases its sensitivity to data features and improves prediction accuracy. The weight parameter of the real-time embedded feature is designed with the tanh function: as the difference between time step t and feature index i (the training feature corresponding to the i-th time step) grows, the weight gradually decreases, so the model attends more to recent feature information and depends less on distant features, which helps it capture short-term trends in the data and makes it more sensitive to real-time information. The weight parameter of the global embedded feature takes different values depending on the position s of sample section j in the overall sequence, where M is the total number of sample sections. This lets the model assign different weights to sections at different positions (mainly to distinguish, by travel direction, the influence of preceding adjacent sections from that of following adjacent sections), better separate the contributions of different sections to the prediction task, and attend to the section information that matters most for the overall prediction. This improves the model's generalization and its understanding of global information, and ultimately improves the accuracy of traffic flow prediction and its applicability at the application level (for example, driving route planning and guiding the design of intelligent traffic management), helping to relieve traffic congestion.

In this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.

The above is only an embodiment of the present application and is not intended to limit its scope of protection. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within its scope of protection.

Claims

1. A traffic flow prediction method based on big data, comprising:

acquiring real-time traffic data of a target section, wherein the target section comprises a monitored section for which traffic flow is predicted and adjacent sections connected to the monitored section, and in each travel direction of the target section, it is possible to travel from the adjacent section at the starting point of that travel direction, through the monitored section, to the adjacent section at the end point of that travel direction;

preprocessing the real-time traffic data of the target section, and determining real-time features corresponding to the target section;

acquiring a global feature corresponding to the target section, and performing feature fusion on the global feature and the real-time features corresponding to the target section to obtain input features corresponding to the target section; and

inputting the input features corresponding to the target section into a preset traffic flow prediction model, and determining, through the traffic flow prediction model, a traffic flow prediction result corresponding to the target section.

2. The traffic flow prediction method based on big data according to claim 1, wherein before the real-time traffic data is acquired, the traffic flow prediction model is constructed by:

acquiring a historical traffic data set, wherein the historical traffic data set comprises historical traffic information collected over N consecutive time steps in a monitoring area;

dividing the monitoring area to form M sample sections, and determining historical traffic data corresponding to each sample section, wherein each piece of historical traffic data corresponding to a sample section comprises a corresponding traffic flow label indicating the traffic flow of the sample section at that time step, each sample section comprises a calibration section and adjacent sections connected to the calibration section, and in each travel direction of the sample section, it is possible to travel from the adjacent section at the starting point of that travel direction, through the calibration section, to the adjacent section at the end point of that travel direction;

preprocessing the historical traffic data corresponding to each sample section in the historical traffic data set, and determining real-time features and a global feature corresponding to each sample section, to obtain N real-time features corresponding to each sample section and M global features corresponding to the M sample sections;

performing feature fusion on the global feature and the real-time features corresponding to each sample section to obtain N training features corresponding to each sample section, forming a training data set containing MN training features;

dividing the training data set into a training set and a test set; and

constructing a GRU model, and training and testing the GRU model with the training set and the test set to obtain a trained traffic flow prediction model.

3. The traffic flow prediction method based on big data according to claim 2, wherein dividing the monitoring area to form M sample sections and determining the historical traffic data corresponding to each sample section comprises:

dividing the monitoring area into road sections to obtain M road sections;

for each road section, taking the current road section as a calibration section and taking the calibration section together with its adjacent sections as one sample section, obtaining M sample sections in total, wherein each sample section comprises one calibration section and the adjacent sections connected to the calibration section, and in each travel direction of the sample section, it is possible to travel from the adjacent section at the starting point of that travel direction, through the calibration section, to the adjacent section at the end point of that travel direction; and

for each sample section, taking the information corresponding to each piece of historical traffic information within the current sample section as the historical traffic data corresponding to the sample section, wherein each sample section corresponds to N pieces of historical traffic data, and each piece of historical traffic data comprises a traffic flow label determined by the traffic flow of the calibration section of the current sample section at the corresponding time step.

4. The traffic flow prediction method based on big data according to claim 2, wherein preprocessing the historical traffic data corresponding to each sample section in the historical traffic data set and determining the real-time features and the global feature corresponding to each sample section comprises:

performing data cleaning on the historical traffic data corresponding to each sample section in the historical traffic data set;

standardizing the cleaned historical traffic data corresponding to each sample section;

performing real-time feature extraction on the standardized historical traffic data corresponding to each sample section, and determining the N real-time features corresponding to the sample section; and

performing global feature extraction on the historical traffic data of the same sample section over all time steps, and determining the M global features corresponding to the M sample sections.

5. The traffic flow prediction method based on big data according to claim 2, wherein performing feature fusion on the global feature and the real-time features corresponding to each sample section to obtain N training features corresponding to each sample section, forming a training data set containing MN training features, comprises:

for each sample section: performing feature embedding on the global feature and each real-time feature corresponding to the current sample section by using a fully connected layer, to obtain N groups of real-time embedded features and one group of global embedded features corresponding to the current sample section;

for each group of real-time embedded features: performing feature fusion on the real-time embedded feature and the corresponding global embedded feature to obtain the corresponding training feature; and

integrating all training features corresponding to the M sample sections to form the training data set containing MN training features.

6. The traffic flow prediction method based on big data according to claim 5, wherein performing feature embedding on the global feature and each real-time feature corresponding to the current sample section by using a fully connected layer, to obtain N groups of real-time embedded features and one group of global embedded features corresponding to the current sample section, comprises:

for the i-th real-time feature corresponding to the current sample section j and the global feature Y^j corresponding to the current sample section j, performing feature embedding with the following formula:

wherein i ∈ [1, N] and j ∈ [1, M]; the quantities defined for the formula are: the feature representation obtained by embedding the real-time feature; the feature representation obtained by embedding the global feature Y^j; the weight matrix used when embedding the real-time feature; the weight matrix used when embedding the global feature Y^j; the bias term used when embedding the real-time feature; and the bias term used when embedding the global feature Y^j; and σ is the activation function.

7. The traffic flow prediction method based on big data according to claim 6, wherein performing feature fusion on the real-time embedded feature and the corresponding global embedded feature to obtain the corresponding training feature comprises:

for a real-time embedded feature and the global embedded feature, performing feature fusion with the following formula:

wherein the quantities defined for the formula are: the i-th training feature corresponding to sample section j, namely the training feature obtained by fusing the real-time embedded feature and the global embedded feature; the attention weight α; the nonlinear transformation of the real-time embedded feature; the nonlinear transformation of the global embedded feature; the weight parameter corresponding to the real-time embedded feature; the weight parameter corresponding to the global embedded feature; the lag term of the real-time embedded feature at time step t, t ∈ [1, N]; and the lag term of the global embedded feature at sample section s.

8. The traffic flow prediction method based on big data according to claim 7, wherein the attention weight α satisfies:

wherein the quantities defined for the formula are: the k-th dimension parameter of the attention weight α at time step t; the attention function F(·); the hidden state h^(t-1) of the previous time step; the k-th dimension parameter of the i-th training feature corresponding to sample section j; the total number of dimensions C of the training feature; and the l-th dimension parameter of the i-th training feature corresponding to sample section j.

9. The traffic flow prediction method based on big data according to claim 7, wherein the weight parameter satisfies:

wherein the symbol defined for the formula is a weight parameter.

10. The traffic flow prediction method based on big data according to claim 7, wherein the weight parameter satisfies:

wherein the symbol defined for the formula is a weight parameter.