CN117875674A

CN117875674A - A bus dispatching method based on Q-learning

Info

Publication number: CN117875674A
Application number: CN202410269459.5A
Authority: CN
Inventors: 颜建强; 赵仁琪; 高原; 曲卜婷
Original assignee: NORTHWEST UNIVERSITY
Current assignee: NORTHWEST UNIVERSITY
Priority date: 2024-03-11
Filing date: 2024-03-11
Publication date: 2024-04-12
Anticipated expiration: 2044-03-11
Also published as: CN117875674B

Abstract

The present invention discloses a bus dispatching method based on Q-learning, comprising the following steps: step 1, obtaining historical operation data of a bus system; step 2, obtaining expected passenger flow data within a preset time according to the historical operation data; step 3, using a Q-learning algorithm to construct a dispatching model according to the expected passenger flow data; step 4, applying the dispatching model to the actual operation of the bus system. The bus dispatching method based on Q-learning proposed in the present application predicts the expected passenger flow data within a preset time through the historical operation data of the bus system, and then uses a Q-learning algorithm to construct a dispatching model according to the expected passenger flow data, thereby improving the correlation between the collected data and the dispatching decision, and improving the accuracy of bus dispatching.

Description

A bus dispatching method based on Q-learning

技术领域Technical Field

本发明涉及交通调度技术领域，特别涉及一种基于Q-learning的公交调度方法。The present invention relates to the technical field of traffic dispatching, and in particular to a public transportation dispatching method based on Q-learning.

背景技术Background technique

随着我国城市化进程的加速，公共交通系统在解决城市交通拥堵问题中发挥着越来越重要的作用。快速公交（Bus Rapid Transit，简称BRT）作为一种大容量、高效率的公共交通方式，已经在世界范围内得到了广泛的应用。如何有效地对快速公交进行调度，以提高运行效率和满足乘客需求，成为了一个亟待解决的问题。With the acceleration of my country's urbanization process, public transportation systems play an increasingly important role in solving urban traffic congestion problems. Bus Rapid Transit (BRT), as a large-capacity and high-efficiency public transportation mode, has been widely used around the world. How to effectively dispatch BRT to improve operating efficiency and meet passenger needs has become an urgent problem to be solved.

现有技术中，专利公开号为CN116895144A的《一种基于深度强化学习的电动公交动态调度系统及方法》采用深度强化学习DQN算法，设计电动公交调度的深度强化学习程序，代价函数的设计考虑了服务效率、服务可靠性、运营成本，并对其进行训练，使得智能体能够根据训练好的神经网络产生调度决策；考虑了不同于燃油公交的电动公交所面临的短续航里程、长充电时间的问题，该方法同时考虑乘客的请求、电动公交本身的电量以及充电站的位置和状态，能够提供最优的充电规划以及调度决策。In the prior art, the patent publication number CN116895144A, "A dynamic dispatching system and method for electric buses based on deep reinforcement learning", adopts the deep reinforcement learning DQN algorithm to design a deep reinforcement learning program for electric bus dispatching. The design of the cost function takes into account service efficiency, service reliability, and operating costs, and trains them so that the intelligent agent can make dispatching decisions based on the trained neural network; considering the problems of short cruising range and long charging time faced by electric buses, which are different from fuel buses, this method also takes into account the passenger's request, the power of the electric bus itself, and the location and status of the charging station, and can provide the best charging plan and dispatching decision.

在进行公交调度时，除了考虑运营收入和运营成本这些直接因素的影响，也不能忽略预期客流量这种间接因素的影响，预期客流量能够反映预期时间内到达站点的乘客数量，进而对调度决策产生影响，然而，上述现有技术并未考虑对预期客流量的影响进行研究，采集数据与调度决策的相关度较差；此外，上述现有技术中，DQN算法存在无法保证收敛性的问题，在进行公交调度的强化学习研究时，容易陷入次优策略，公交调度的准确性较差。When conducting bus dispatch, in addition to considering the impact of direct factors such as operating income and operating costs, the impact of indirect factors such as expected passenger flow cannot be ignored. The expected passenger flow can reflect the number of passengers arriving at the station within the expected time, which in turn affects the dispatch decision. However, the above-mentioned prior art does not consider the impact of expected passenger flow, and the correlation between the collected data and the dispatch decision is poor. In addition, in the above-mentioned prior art, the DQN algorithm has the problem of not being able to guarantee convergence. When conducting reinforcement learning research on bus dispatch, it is easy to fall into suboptimal strategies, and the accuracy of bus dispatch is poor.

发明内容Summary of the invention

本发明提供了一种基于Q-learning的公交调度方法，用以解决现有技术中没有比较可靠的针对公交调度采集数据与调度决策的相关度较差，公交调度的准确性较差的问题。The present invention provides a bus dispatching method based on Q-learning, which is used to solve the problem that there is no relatively reliable bus dispatching collection data and the correlation between the dispatching decision is poor, and the bus dispatching accuracy is poor in the prior art.

一方面，本发明提供了一种基于Q-learning的公交调度方法，包括以下步骤：On the one hand, the present invention provides a bus scheduling method based on Q-learning, comprising the following steps:

步骤一，获取公交系统的历史运行数据。Step 1: Obtain historical operation data of the public transportation system.

步骤二，根据所述历史运行数据得到预设时间内的预期客流量数据。Step 2: Obtain expected passenger flow data within a preset time based on the historical operation data.

步骤三，利用Q-learning算法根据所述预期客流量数据构建得到调度模型。Step three: Use the Q-learning algorithm to construct a scheduling model based on the expected passenger flow data.

步骤四，将所述调度模型应用于实际的公交系统运行中。Step 4: Apply the scheduling model to the actual operation of the public transportation system.

在一种可能的实现方式中，步骤一中，所述历史运行数据包括：发车时间数据、线路数据、到站时间数据、GPS轨迹数据和刷卡数据。In a possible implementation, in step one, the historical operation data includes: departure time data, route data, arrival time data, GPS track data and card swiping data.

在一种可能的实现方式中，步骤二包括：In a possible implementation, step 2 includes:

根据所述历史运行数据得到历史客流数据。The historical passenger flow data is obtained according to the historical operation data.

采用时空图卷积网络根据所述历史客流数据得到所述预期客流量数据。The expected passenger flow data is obtained according to the historical passenger flow data using a spatiotemporal graph convolutional network.

在一种可能的实现方式中，步骤三中，所述利用Q-learning算法根据所述预期客流量数据构建得到调度模型的步骤包括：In a possible implementation, in step 3, the step of using the Q-learning algorithm to construct a scheduling model according to the expected passenger flow data includes:

创建Q矩阵，行表示状态，列表示动作。Create a Q matrix with rows representing states and columns representing actions.

采用所述预期客流量数据对所述Q矩阵进行训练，得到训练好的Q矩阵，即所述调度模型。The Q matrix is trained using the expected passenger flow data to obtain a trained Q matrix, namely the scheduling model.

在一种可能的实现方式中，步骤三中，所述采用所述预期客流量数据对所述Q矩阵进行训练包括：In a possible implementation, in step 3, the using the expected passenger flow data to train the Q matrix includes:

A，初始化当前状态为起始状态。A, initialize the current state to the starting state.

B1，根据当前状态和Q矩阵，使用ε-greedy策略选择决策动作。B1, according to the current state and Q matrix, use the ε-greedy strategy to select the decision action.

B2，执行所述决策动作，得到新状态。B2, execute the decision action to obtain a new state.

B3，观察新状态和即时奖励。B3, observe the new state and immediate reward.

B4，将新的Q值更新到Q矩阵中。B4, updates the new Q value into the Q matrix.

B5，将新状态设为当前状态。B5, sets the new state as the current state.

B6，若到达预设训练步数或到达终点状态，进入下一步骤，否则回到步骤B1。B6, if the preset number of training steps is reached or the end state is reached, proceed to the next step, otherwise return to step B1.

C，若到达预设训练次数，训练完成，否则回到步骤A。C. If the preset number of training times is reached, the training is completed, otherwise return to step A.

在一种可能的实现方式中，步骤B3中，所述即时奖励是通过预设计的奖励函数得到。In a possible implementation, in step B3, the instant reward is obtained through a pre-designed reward function.

所述奖励函数包括：运营收入奖励函数、运营成本奖励函数和乘客时间成本奖励函数。The reward function includes: an operating income reward function, an operating cost reward function and a passenger time cost reward function.

在一种可能的实现方式中，步骤B4中，使用Q-learning更新策略进行Q值更新。In a possible implementation, in step B4, the Q-value is updated using a Q-learning update strategy.

在一种可能的实现方式中，步骤三中，在得到所述调度模型后，还采用所述预期客流量数据对所述调度模型进行测试。In a possible implementation, in step three, after obtaining the scheduling model, the scheduling model is tested using the expected passenger flow data.

在一种可能的实现方式中，步骤四包括：In a possible implementation, step 4 includes:

获取公交系统的实时运行数据。Get real-time operation data of the public transportation system.

得到对应所述实时运行数据的预期客流量数据并输入所述调度模型，输出调度决策。The expected passenger flow data corresponding to the real-time operation data is obtained and input into the scheduling model, and a scheduling decision is output.

根据所述调度决策进行实际调度。Actual scheduling is performed according to the scheduling decision.

在一种可能的实现方式中，步骤四之后还包括：对所述调度模型进行性能评估。In a possible implementation manner, after step 4, the method further includes: performing a performance evaluation on the scheduling model.

当性能评估结果低于预设预期时，回到步骤一进行循环，直到性能评估结果高于或等于预设预期。When the performance evaluation result is lower than the preset expectation, return to step 1 and repeat until the performance evaluation result is higher than or equal to the preset expectation.

本发明中的一种基于Q-learning的公交调度方法，具有以下优点：A bus dispatching method based on Q-learning in the present invention has the following advantages:

通过公交系统的历史运行数据进行预测得到预设时间内的预期客流量数据，再利用Q-learning算法根据预期客流量数据构建得到调度模型，提高了采集数据与调度决策的相关度，并提高了公交调度的准确性；提出的采用时空图卷积网络根据历史客流数据得到预期客流量数据，提高了预期客流量数据的准确性；提出的奖励函数包括：运营收入奖励函数、运营成本奖励函数和乘客时间成本奖励函数，通过考虑乘客时间成本的影响，提高了公交调度的准确性；提出的对所述调度模型进行性能评估，当性能评估结果低于预设预期时，回到步骤一进行循环，直到性能评估结果高于或等于预设预期，提高了调度模型的可优化性。The expected passenger flow data within a preset time is predicted by using the historical operation data of the public transportation system, and then the scheduling model is constructed based on the expected passenger flow data using the Q-learning algorithm, which improves the correlation between the collected data and the scheduling decision, and improves the accuracy of public transportation scheduling; the proposed spatiotemporal graph convolutional network is used to obtain the expected passenger flow data based on the historical passenger flow data, which improves the accuracy of the expected passenger flow data; the proposed reward functions include: operating income reward function, operating cost reward function and passenger time cost reward function, which improves the accuracy of public transportation scheduling by considering the impact of passenger time cost; the proposed scheduling model is subjected to performance evaluation, and when the performance evaluation result is lower than the preset expectation, it returns to step one and loops until the performance evaluation result is higher than or equal to the preset expectation, thereby improving the optimizability of the scheduling model.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for use in the embodiments or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For ordinary technicians in this field, other drawings can be obtained based on these drawings without paying creative work.

图1为本发明实施例提供的一种基于Q-learning的公交调度方法的流程示意图；FIG1 is a schematic diagram of a process flow of a bus dispatching method based on Q-learning provided by an embodiment of the present invention;

图2为本发明实施例提供的一种基于Q-learning的公交调度方法的又一流程示意图。FIG. 2 is another schematic flow chart of a bus dispatching method based on Q-learning provided in an embodiment of the present invention.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.

如图1所示，本发明实施例提供了一种基于Q-learning的公交调度方法，包括以下步骤：As shown in FIG1 , an embodiment of the present invention provides a bus scheduling method based on Q-learning, comprising the following steps:

示例性地，步骤一中，所述历史运行数据包括：发车时间数据、线路数据、到站时间数据、GPS轨迹数据和刷卡数据。Exemplarily, in step one, the historical operation data includes: departure time data, route data, arrival time data, GPS track data and card swiping data.

示例性地，步骤二包括：Exemplarily, step two includes:

具体地，在本实施例中，以5分钟为时间间隔统计历史运行数据，根据发车时间数据、线路数据、到站时间数据、GPS轨迹数据和刷卡数据综合分析得到每5分钟间隔的历史客流数据，包括每个站点在某一历史时刻的客流数据。Specifically, in this embodiment, historical operation data is counted at 5-minute time intervals, and historical passenger flow data every 5 minutes is obtained based on a comprehensive analysis of departure time data, route data, arrival time data, GPS trajectory data and card swiping data, including passenger flow data for each station at a certain historical moment.

采用训练好的时空图卷积网络对历史客流数据进行分析预测，得到预期客流量数据。时空图卷积网络的训练过程包括：将历史客流训练集（即根据历史客流数据建立的训练数据集）中的数据作为时空图卷积网络的输入，输出预期客流量训练数据；将输入数据对应预期时间后的实际客流量数据作为目标值，经过若干次训练迭代后，使得损失函数达到最小，得到训练好的时空图卷积网络。The trained spatiotemporal graph convolutional network is used to analyze and predict the historical passenger flow data to obtain the expected passenger flow data. The training process of the spatiotemporal graph convolutional network includes: taking the data in the historical passenger flow training set (i.e., the training data set established based on the historical passenger flow data) as the input of the spatiotemporal graph convolutional network, and outputting the expected passenger flow training data; taking the actual passenger flow data after the expected time corresponding to the input data as the target value, and after several training iterations, the loss function is minimized to obtain the trained spatiotemporal graph convolutional network.

示例性地，步骤三中，所述利用Q-learning算法根据所述预期客流量数据构建得到调度模型的步骤包括：Exemplarily, in step 3, the step of constructing a scheduling model according to the expected passenger flow data using a Q-learning algorithm includes:

具体地，在本实施例中，将Q矩阵的智能体表示为公交车辆，状态表示为各个公交线路的客流量，动作表示为选择某个时刻和某条公交线路进行发车。智能体会遍历所有线路，智能体获取当前状态下动作组合的最大Q值，获取最大Q值对应的动作并执行，然后转移到下一个状态。Specifically, in this embodiment, the agent of the Q matrix is represented as a bus, the state is represented as the passenger flow of each bus line, and the action is represented as selecting a certain time and a certain bus line for departure. The agent will traverse all lines, obtain the maximum Q value of the action combination in the current state, obtain the action corresponding to the maximum Q value and execute it, and then transfer to the next state.

在本实施例中，所有的初始Q值都设为0。设定学习率（alpha）为0.5，折扣因子（gamma）为0.9；设置训练次数为100、最大训练步数为20和ε-greedy策略中的ε值为0.2。在其它可能的实施例中，还可以根据实际情况调整上述参数。In this embodiment, all initial Q values are set to 0. Set the learning rate (alpha) to 0.5, the discount factor (gamma) to 0.9, set the number of training times to 100, the maximum number of training steps to 20, and the ε value in the ε-greedy strategy to 0.2. In other possible embodiments, the above parameters can also be adjusted according to actual conditions.

示例性地，步骤三中，所述采用所述预期客流量数据对所述Q矩阵进行训练包括：Exemplarily, in step 3, the using the expected passenger flow data to train the Q matrix includes:

示例性地，步骤B3中，所述即时奖励是通过预设计的奖励函数得到。Exemplarily, in step B3, the instant reward is obtained through a pre-designed reward function.

具体地，综合考虑乘客乘车带来的运营收入、运营成本、乘客时间成本，奖励函数R可表示为，其中R_I表示运营收入，R_O表示运营成本，R_P表示乘客时间成本。Specifically, considering the operating income, operating costs, and passenger time costs brought by passengers, the reward function R can be expressed as , where _RI represents operating revenue, _RO represents operating costs, and _RP represents passenger time cost.

运营收入奖励函数如下式：The operating income reward function is as follows:

其中k表示站点j的乘客数量，s表示票价。Where k is the number of passengers at station j and s is the fare.

公交企业的运营成本包括固定成本和车辆运行成本，车辆运行成本和运营里程具有正相关性，直接使用车辆运行成本来表示运营成本，运营成本奖励函数如下式：The operating costs of bus companies include fixed costs and vehicle operating costs. The vehicle operating costs and operating mileage are positively correlated. The vehicle operating costs are directly used to represent the operating costs. The operating cost reward function is as follows:

其中表示当前站点i与站点j之间的运营成本，/>表示站点i与站点j的距离，p表示单位油耗费用，取实际价格，n表示站点的个数，发车场点记为第0个站点，停车场点记为第n+1个站点。in represents the current operating cost between site i and site j,/> represents the distance between station i and station j, p represents the unit fuel consumption cost, taking the actual price, n represents the number of stations, the departure point is recorded as the 0th station, and the parking lot is recorded as the n+1th station.

设乘客均在公交到站时间内准时到达站点，乘客的时间成本为公交车晚到站点带来的乘客等待的时间成本，乘客时间成本奖励函数如下式：Assume that all passengers arrive at the station on time within the bus arrival time, and the passenger's time cost is the time cost of the passenger waiting caused by the late arrival of the bus at the station. The passenger time cost reward function is as follows:

其中表示站点j的乘客的时间成本；k为站点j的乘客数量；/>表示公交车到达站点j的实际时间；/>为站点j时间窗的最晚时间；/>为乘客的时间价值，即预先设置的乘客乘坐公交所节约时间对应的价值，在本实施例中，设置乘客的时间价值为每小时50元；/>为接近0的正数，在本实施例中取为0.0001，避免分母为0。in represents the time cost of passengers at station j; k is the number of passengers at station j; /> Indicates the actual time when the bus arrives at station j; /> is the latest time of the time window of station j;/> The time value of the passenger is the preset value of the time saved by the passenger when taking the bus. In this embodiment, the time value of the passenger is set to 50 yuan per hour; It is a positive number close to 0, and in this embodiment is taken as 0.0001 to avoid the denominator being 0.

综上，奖励函数如下式：In summary, the reward function is as follows:

示例性地，步骤B4中，使用Q-learning更新策略进行Q值更新。Exemplarily, in step B4, the Q-value is updated using a Q-learning update strategy.

具体地，首先定义记忆矩阵来依次记录智能体所经历过的所有状态st与相应动作at ；设记忆矩阵为h行2列的矩阵，其中h表示从初始时刻到当前时刻所经历的状态数量；以记忆矩阵中的/>为索引找到前一个“状态-动作”所对应的Q值并更新；然后令t减1，并判断t-1是否为0，如果为0，说明状态st前续所经历过的所有“状态-动作”的Q值已更新完毕；如果不为0，则寻找其下一个“状态-动作”的Q值进行更新，直至所有Q值更新完毕。Q-learning更新策略如下式：Specifically, first define the memory matrix To record all the states st and corresponding actions at experienced by the agent in sequence; let the memory matrix be a matrix with h rows and 2 columns, where h represents the number of states experienced from the initial moment to the current moment; with the memory matrix in/> Find the Q value corresponding to the previous "state-action" for the index and update it; then subtract 1 from t and determine whether t-1 is 0. If it is 0, it means that the Q values of all the "state-actions" that the state st has experienced before have been updated; if it is not 0, find the Q value of the next "state-action" and update it until all Q values are updated. The Q-learning update strategy is as follows:

其中表示状态/>采取动作/>后更新的Q值，/>表示g时刻的状态，/>表示状态/>采取的动作，/>表示状态/>采取动作/>获得的即时奖励，/>为折扣系数，/>表示状态/>采取动作a可获得的最大Q值，g=t-1,t-2,… ,2,1。in Display status/> Take Action/> The updated Q value, /> Indicates the state at time g, /> Display status/> Actions taken, /> Display status/> Take Action/> Instant rewards received, /> is the discount factor, /> Display status/> The maximum Q value that can be obtained by taking action a is g=t-1,t-2,… ,2,1.

示例性地，步骤三中，在得到所述调度模型后，还采用所述预期客流量数据对所述调度模型进行测试。Exemplarily, in step three, after obtaining the scheduling model, the scheduling model is tested using the expected passenger flow data.

具体地，测试步骤包括：Specifically, the testing steps include:

A`，初始化当前状态为起始状态。A`, initialize the current state to the starting state.

B1`，根据当前状态和训练完成的Q矩阵，选择具有最大Q值的动作作为决策动作。B1`, according to the current state and the trained Q matrix, select the action with the maximum Q value as the decision action.

B2`，执行所述决策动作，得到新状态。B2`, execute the decision action to obtain a new state.

B3`，观察新状态和即时奖励。B3`, observe the new state and immediate reward.

B4`，将新状态设为当前状态。B4`, set the new state as the current state.

B5`，若到达终点状态，结束测试，否则回到步骤B1`。B5`, if the end state is reached, end the test, otherwise return to step B1`.

示例性地，步骤四包括：Exemplarily, step four includes:

具体地，将对应所述实时运行数据的预期客流量数据作为调度模型的当前状态，选择具有最大Q值的动作作为决策动作，即调度决策；根据调度决策生成公交车发车时间表或线路规划。Specifically, the expected passenger flow data corresponding to the real-time operation data is used as the current state of the scheduling model, and the action with the maximum Q value is selected as the decision action, that is, the scheduling decision; and a bus departure timetable or route planning is generated according to the scheduling decision.

如图2所示，示例性地，步骤四之后还包括：对所述调度模型进行性能评估。As shown in FIG. 2 , exemplarily, step 4 further includes: performing performance evaluation on the scheduling model.

具体地，使用生成的公交车发车时间表或线路规划进行模拟运行。根据模拟运行的结果，评估公交系统的性能指标，包括乘客等待时间、车辆利用率。当性能评估结果低于预设预期时，回到步骤一进行循环，调整学习率、折扣因子等超参数，并重新进行训练和测试，直到性能评估结果高于或等于预设预期。Specifically, the generated bus departure timetable or route planning is used for simulation. Based on the results of the simulation, the performance indicators of the bus system are evaluated, including passenger waiting time and vehicle utilization. When the performance evaluation result is lower than the preset expectation, return to step 1 for a loop, adjust the hyperparameters such as learning rate and discount factor, and retrain and test until the performance evaluation result is higher than or equal to the preset expectation.

本发明实施例通过公交系统的历史运行数据进行预测得到预设时间内的预期客流量数据，再利用Q-learning算法根据预期客流量数据构建得到调度模型，提高了采集数据与调度决策的相关度，并提高了公交调度的准确性；提出的采用时空图卷积网络根据历史客流数据得到预期客流量数据，提高了预期客流量数据的准确性；提出的奖励函数包括：运营收入奖励函数、运营成本奖励函数和乘客时间成本奖励函数，通过考虑乘客时间成本的影响，提高了公交调度的准确性；提出的对所述调度模型进行性能评估，当性能评估结果低于预设预期时，回到步骤一进行循环，直到性能评估结果高于或等于预设预期，提高了调度模型的可优化性。The embodiment of the present invention predicts the expected passenger flow data within a preset time through the historical operation data of the public transportation system, and then uses the Q-learning algorithm to construct a scheduling model according to the expected passenger flow data, thereby improving the correlation between the collected data and the scheduling decision, and improving the accuracy of public transportation scheduling; the proposed spatiotemporal graph convolutional network is used to obtain the expected passenger flow data according to the historical passenger flow data, thereby improving the accuracy of the expected passenger flow data; the proposed reward function includes: an operating income reward function, an operating cost reward function and a passenger time cost reward function, and improves the accuracy of public transportation scheduling by considering the influence of the passenger time cost; the proposed scheduling model is subjected to performance evaluation, and when the performance evaluation result is lower than the preset expectation, it returns to step one and loops until the performance evaluation result is higher than or equal to the preset expectation, thereby improving the optimizability of the scheduling model.

在一种可能的实施例中，本申请的研究原理如下：In a possible embodiment, the research principle of this application is as follows:

根据公交公司是否调整线路运力配置，调度运营模式主要包括两种：运力不变型和运力增加型。Depending on whether the bus company adjusts the line capacity configuration, the dispatching operation mode mainly includes two types: capacity unchanged type and capacity increased type.

运力不变型。在运力不变型的运营模式下，公交线路的配车数不做调整，通过调整全程车的发车间隔，预留部分运力用于快速公交的调度。快速公交与全程车在运营线路、车辆配置和路权使用方面保持一致，仅在停靠站点和发车频率有所区别。该模式适用于现状运力水平较高的线路，在这种调度模式下不会额外增加公交公司的车辆购置成本。Capacity unchanged type. Under the capacity unchanged operation mode, the number of buses on the bus routes will not be adjusted. By adjusting the departure intervals of full-distance buses, some capacity will be reserved for the dispatch of rapid buses. Rapid buses and full-distance buses are consistent in terms of operating routes, vehicle configuration and road rights, and only differ in stops and departure frequencies. This mode is suitable for routes with relatively high current capacity levels, and under this dispatch mode, the vehicle purchase costs of bus companies will not be increased.

运力增加型。在运力增加型的运营模式下，现有的全程车的发车计划不做调整，公交公司额外增加配车数用于快速公交调度。快速公交车与全程车在运营线路和路权使用方面保持一致，在停靠站点、发车频率和车辆配置方面会有所区别。该模式适用于现状运力水平较低的线路，在这种调度模式下会一定程度上增加公交公司的车辆购置成本。Capacity increase type. Under the capacity increase operation mode, the departure plan of the existing full-distance buses will not be adjusted, and the bus company will add additional vehicles for rapid bus dispatch. Rapid buses and full-distance buses are consistent in terms of operating routes and road rights, but will differ in terms of stops, departure frequency and vehicle configuration. This mode is suitable for routes with low current capacity levels. Under this dispatch mode, the vehicle acquisition cost of the bus company will increase to a certain extent.

当公交线路客流具有需求集中性，呈现多峰型分布形态，且中长距离出行乘客比例较大时，需要采用快速公交调度模式。一般采用方向不均衡系数和站点不均衡系数等参数来确定快速公交的开设方向和服务站点。When the passenger flow on a bus route has concentrated demand, presents a multi-peak distribution pattern, and the proportion of passengers traveling medium and long distances is large, it is necessary to adopt the rapid bus dispatching mode. Generally, parameters such as direction imbalance coefficient and station imbalance coefficient are used to determine the opening direction and service stations of rapid buses.

开设方向确定。快速公交开设方向主要依据方向不均匀系数来确定，其值为该线路上下行的客流量中的较大值与上下行平均客流量的比值。通过比较线路上下行方向不均匀系数，当其中一个方向上的客流量显著大于另外一个方向的客流量时，在客流量较少的方向开设快速公交以增加车辆的周转。Determination of the opening direction. The opening direction of the BRT is mainly determined by the directional unevenness coefficient, which is the ratio of the larger value of the up and down passenger flow of the line to the average up and down passenger flow. By comparing the up and down direction unevenness coefficients of the line, when the passenger flow in one direction is significantly greater than the passenger flow in the other direction, the BRT is opened in the direction with less passenger flow to increase vehicle turnover.

停靠站点确定。快速公交服务站点数量的多少对乘客和公交公司均有直接的影响，若站点数量选择过少，将会造成客流量不足、资源浪费和收益过少等问题；若站点数量选择过多，将增加快速公交的运行时间，且相对于全程车的优势不明显。站点的选择以大客流上、下车站点和枢纽站点为主。在实际选择过程中，有一部分比较靠近起点站或终点站的中间站点，由于上客量较大但下客量较小，或下客量较大但上客量较小，导致站点不均衡系数较小，无法进入快速公交站点集合，在一定程度上将损失部分客流。因此，在快速公交站点选取时，需将这些客流量较大的站点纳入快速公交站点集合。Determine the stop sites. The number of BRT service sites has a direct impact on both passengers and bus companies. If the number of sites is too small, it will cause problems such as insufficient passenger flow, waste of resources and too little revenue; if the number of sites is too large, the operation time of BRT will increase, and the advantage over full-route buses will not be obvious. The selection of sites is mainly based on large passenger flow boarding and alighting sites and hub sites. In the actual selection process, some intermediate sites that are relatively close to the starting station or the terminal station have a small station imbalance coefficient due to a large number of passengers boarding but a small number of passengers disembarking, or a large number of passengers disembarking but a small number of passengers boarding. They cannot enter the BRT site set, and will lose some passenger flow to a certain extent. Therefore, when selecting BRT sites, these sites with large passenger flow need to be included in the BRT site set.

单一的全程车调度无法解决公交沿线客流在时间、站点、方向上分布不均衡的问题，研究单向线路高峰时期的站点客流分布规律，提出高峰时期单向公交线路快速公交调度策略，可以为平峰时期运力配置和潮汐性强的公交线路提供依据。A single full-route vehicle dispatch cannot solve the problem of uneven distribution of passenger flow in time, stations and directions along the bus line. By studying the passenger flow distribution pattern at stations during peak periods on one-way lines and proposing a rapid bus dispatch strategy for one-way bus lines during peak periods, a basis can be provided for capacity allocation during off-peak periods and for bus lines with strong tidal characteristics.

快速公交调度策略也会给线路运营带来负面影响，比如：快速公交的调度减缓了全程车的周转、非快速公交服务站点的乘客等待时间相对增加等。因此，快速公交调度问题主要为了研究最佳的服务频率以及最优服务站点，以兼顾乘客效益和公交公司的运营效益。The BRT dispatching strategy will also have a negative impact on line operations. For example, the BRT dispatching slows down the turnover of full-route vehicles, and the waiting time of passengers at non-BRT service stations increases. Therefore, the BRT dispatching problem is mainly to study the optimal service frequency and the optimal service station, so as to take into account both passenger benefits and bus company operating benefits.

在一种可能的实施例中，调度策略的约束条件主要包含：满载率、站点乘客等待时间和发车频率。In a possible embodiment, the constraints of the dispatching strategy mainly include: full load rate, passenger waiting time at the station and departure frequency.

满载率：满载率是车内实际载客量与车辆额定载客量的比值，是衡量车辆利用率的指标。满载率过高，乘客的乘车舒适度将大大降低；满载率过低，公交公司的经济效益无法保证。因此，在保证社会效益的前提下，需通过满载率约束，保证公交公司的经济效益，实现乘客和公交公司的效益最优。根据《城市公交管理规范》，从乘客舒适度方面考虑，满载率应不超过120%，从公交公司的经济效益方面考虑，满载率应不低于50%。Full load rate: The full load rate is the ratio of the actual passenger capacity in the vehicle to the rated passenger capacity of the vehicle, and is an indicator to measure the vehicle utilization rate. If the full load rate is too high, the passenger comfort will be greatly reduced; if the full load rate is too low, the economic benefits of the bus company cannot be guaranteed. Therefore, under the premise of ensuring social benefits, it is necessary to ensure the economic benefits of the bus company through full load rate constraints to achieve the best benefits for passengers and bus companies. According to the "Urban Public Transport Management Regulations", considering the passenger comfort, the full load rate should not exceed 120%, and considering the economic benefits of the bus company, the full load rate should not be less than 50%.

站点乘客等待时间：根据乘客在候车过程中心理状态变化，一般乘客在等车15分钟后感到非常焦虑、烦躁，想换其他交通方式，而在50分钟后会绝望，决定不再坐公交车。因此，本申请取候车时间上下限分别为25min和2min。Passenger waiting time at the station: According to the changes in the psychological state of passengers while waiting for the bus, passengers generally feel very anxious and irritable after waiting for 15 minutes and want to change other modes of transportation, while they will despair after 50 minutes and decide not to take the bus anymore. Therefore, the upper and lower limits of the waiting time for this application are 25 minutes and 2 minutes respectively.

发车频率：发车频率与乘客和公交公司的利益联系紧密。发车频率过低，乘客等候时间将增加，可能会流失一部分客流；发车频率过高，公交公司的经济效益无法得到保证。根据相关研究，乘客满意度达到80%情况下的高峰时期公交发车间隔为3min，因此，取发车频率的上限为20 辆/小时。Departure frequency: The departure frequency is closely related to the interests of passengers and bus companies. If the departure frequency is too low, the waiting time of passengers will increase, and some passengers may be lost; if the departure frequency is too high, the economic benefits of the bus company cannot be guaranteed. According to relevant research, when passenger satisfaction reaches 80%, the bus departure interval during peak hours is 3 minutes. Therefore, the upper limit of the departure frequency is 20 buses/hour.

在一种可能的实施例中，采用python作为仿真搭建的语言，利用公交车gps轨迹数据、公交车到站表、公交刷卡数据、公交线网信息多源数据融合分析得到各公交线路实时客流信息，当线路客流量出现过多或路段拥堵时动态改变车次，以满足乘客乘车需求同时减少路途拥堵。本申请使用运力增加型公交调度方式，强化学习通过观察环境中每个状态的变化，并从环境中获得动作反馈，学习最优的控制策略以获得最大的总体奖励。In a possible embodiment, Python is used as the language for simulation construction, and real-time passenger flow information of each bus line is obtained by fusion analysis of multi-source data such as bus GPS trajectory data, bus arrival table, bus card swiping data, and bus line network information. When the passenger flow of the line is too much or the road section is congested, the bus number is dynamically changed to meet the passenger's demand for riding and reduce road congestion. This application uses a capacity-increasing bus scheduling method, and reinforcement learning learns the optimal control strategy to obtain the maximum overall reward by observing the changes in each state in the environment and obtaining action feedback from the environment.

尽管已描述了本发明的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例作出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。Although the preferred embodiments of the present invention have been described, those skilled in the art may make other changes and modifications to these embodiments once they have learned the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

显然，本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样，倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内，则本发明也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include these modifications and variations.

Claims

1. A bus dispatching method based on Q-learning is characterized by comprising the following steps:

step one, acquiring historical operation data of a public transportation system;

step two, expected passenger flow data in preset time are obtained according to the historical operation data;

thirdly, constructing a scheduling model according to the expected passenger flow data by utilizing a Q-learning algorithm;

and step four, applying the scheduling model to actual bus system operation.

2. The bus dispatching method based on Q-learning according to claim 1, wherein in the first step, the historical operation data includes: departure time data, line data, arrival time data, GPS track data, and swipe card data.

3. The bus dispatching method based on Q-learning according to claim 1, wherein the second step comprises:

obtaining historical passenger flow data according to the historical operation data;

and obtaining the expected passenger flow data according to the historical passenger flow data by adopting a space-time diagram convolution network.

4. The bus dispatching method based on Q-learning according to claim 1, wherein in the third step, the step of constructing a dispatching model according to the expected passenger flow data by using Q-learning algorithm includes:

creating a Q matrix, wherein rows represent states and columns represent actions;

and training the Q matrix by adopting the expected passenger flow data to obtain a trained Q matrix, namely the scheduling model.

5. The bus dispatching method based on Q-learning according to claim 4, wherein in step three, the training the Q matrix using the expected passenger flow data comprises:

a, initializing a current state as an initial state;

b1, selecting a decision action by using an epsilon-greedy strategy according to the current state and the Q matrix;

b2, executing the decision action to obtain a new state;

b3, observing a new state and instant rewards;

b4, updating the new Q value into the Q matrix;

b5, setting the new state as the current state;

if the preset training step number is reached or the end point state is reached, entering the next step, otherwise returning to the step B1;

and C, if the preset training times are reached, finishing training, otherwise, returning to the step A.

6. The bus dispatching method based on Q-learning according to claim 5, wherein in step B3, the instant rewards are obtained by a pre-designed rewarding function;

the reward function includes: an operating revenue rewards function, an operating cost rewards function, and a passenger time cost rewards function.

7. The bus dispatching method based on Q-learning according to claim 5, wherein in step B4, Q value update is performed by using Q-learning update strategy.

8. The bus dispatching method based on Q-learning according to claim 4, wherein in step three, after the dispatching model is obtained, the expected passenger flow data is further adopted to test the dispatching model.

9. The bus dispatching method based on Q-learning as set forth in claim 1, wherein the fourth step comprises:

acquiring real-time operation data of a public transport system;

obtaining expected passenger flow data corresponding to the real-time operation data, inputting the expected passenger flow data into the scheduling model, and outputting a scheduling decision;

and carrying out actual scheduling according to the scheduling decision.

10. The bus dispatching method based on Q-learning according to claim 1, further comprising, after the fourth step: performing performance evaluation on the scheduling model;

and when the performance evaluation result is lower than the preset expectation, returning to the step one for circulation until the performance evaluation result is higher than or equal to the preset expectation.