CN110570672B - A method of regional traffic light control based on graph neural network

Info

Publication number: CN110570672B
Application number: CN201910881544.6A
Authority: CN
Inventors: 余正旭; 蔡登; 魏龙; 谢亮; 金仲明; 黄建强; 华先胜; 何晓飞
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2020-12-01
Anticipated expiration: 2039-09-18
Also published as: CN110570672A

Abstract

The invention provides a regional traffic light control method based on a graph neural network. A traffic flow predictor and a traffic light controller are trained simultaneously. The traffic flow predictor's forecast of how future traffic will change under the current intervention helps the traffic light controller generate a new control scheme, and its estimate of the value of the current action assists in training the controller to maximize both the long-term and short-term benefits of the control scheme. Both the traffic flow predictor and the traffic light controller are built on a deep message propagation graph network. The invention can continuously optimize the system to adapt to changing traffic flow, improving the smoothness of the road network and traffic efficiency.

Description

A method of regional traffic light control based on graph neural network

Technical field

The invention belongs to the field of traffic signal control, and in particular relates to a regional traffic light control method based on a graph neural network.

Background

Traffic light control is a critical and challenging real-world problem. Its goal is to maximize the traffic efficiency of the road network while avoiding possible traffic conflicts inside intersections. In recent years, signalized intersections have become one of the biggest bottlenecks to improving traffic efficiency in urban road networks. A practical traffic signal control method that can learn to adjust itself automatically according to current and future traffic conditions could therefore significantly relieve congestion and bring substantial economic, environmental and social benefits.

The traffic signal control systems widely used in many modern cities today, such as SCATS and SCOOT, derive their signal schemes mainly from control algorithms based on statistical historical traffic data. The control strategy of these methods (a strategy based on predefined phases) is to change the parameters of each predefined phase, such as its green time. Here a phase refers to the combination of signal states for every lane in every direction during that phase's green time. These methods are inflexible in constantly changing, time-dependent traffic scenarios and cannot dynamically adjust the signal control scheme to real-time traffic. There are also other, less widely used adaptive traffic signal control methods, which use signals from inductive loop detectors near an intersection to decide which lanes (phases) should be released in the next second (or time period). However, these methods also intervene in traffic based only on the current traffic situation and do not make full use of historical traffic data to help optimize and design control schemes. They are also affected by the fragility and unstable online rate of the loop detectors they depend on.

To solve the problems of traditional traffic control methods, many signal control methods based on reinforcement learning have been proposed recently. Most of them still use a phase-switching strategy in which the combination of lanes allowed to pass in the next second (or time period) is decided every second (or time period). These methods are more flexible than control strategies based on predefined phases, but their frequent and abrupt phase switching can easily cause traffic accidents and seriously degrades the driving experience. To avoid these problems, some reinforcement learning methods based on predefined phases have been proposed recently.

However, both traditional methods and reinforcement learning methods focus mainly on the independent control of one or a small number of traffic lights and ignore the effect that a control action may have on the surrounding road network. In addition, current methods are still limited to learning from historical traffic data and responding promptly to current traffic, and ignore the effect that a control action may have on future regional traffic flow.

Summary of the invention

The invention provides a regional traffic light control method based on a graph neural network, which can continuously optimize the system to adapt to changing traffic flow and improve the smoothness of the road network and traffic efficiency.

A regional traffic light control method based on a graph neural network, characterized by comprising:

(1) Obtain from the signal light control system the current signal control scheme and the traffic data of the target area road network over the past several cycles. The signal control scheme includes the cycle length, the phase scheme and the green time of each phase. Traffic data from the past five cycles is generally used, and the number of cycles can be changed according to application requirements (one cycle is the total time needed for every phase to be executed once);

(2) Input the current phase timing, the traffic data of each target intersection over the past several cycles, and the road network connection graph into the traffic flow predictor MPTF, which is built on the deep message propagation graph network MPGNN, to obtain predicted traffic flow for each direction of each intersection in the current cycle;

(3) Input the current signal control scheme, the traffic data of each target intersection over the past several cycles, the road network connection graph, and the traffic flow prediction obtained in step (2) into the traffic light controller RTSC, which is built on the deep message propagation graph network MPGNN, and take the generated phase timings of the current cycle as the adjusted control scheme; the RTSC builds one control sub-network for each controlled intersection, and every control sub-network takes the same MPGNN output as its input;

(4) Using the traffic flow predictor MPTF, input the control scheme adjusted in step (3), the traffic data of each target intersection over the past several cycles, and the road network connection graph, and evaluate the value of the adjustment action of step (3);

(5) Use the control scheme adjusted in step (3) to control the road network for one cycle;

(6) Collect the current road network traffic data from the signal light control system and, together with the road network traffic data from before the cycle started, compute the benefit of the adjustment made in step (3);

(7) Train the traffic flow predictor MPTF using the road network traffic data collected in step (6) and the benefit of the adjustment, combined with the value estimate obtained in step (4);

(8) Train the traffic light controller RTSC using the benefit of the adjustment obtained in step (6) and the value estimate obtained in step (4);

(9) Start the next cycle; steps (1) to (8) are repeated in every cycle.
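Steps (1) to (8) amount to one online control-and-training iteration per signal cycle. The following is a minimal sketch of that iteration, not the patented implementation. The objects `system`, `mptf` and `rtsc` and all of their method names are hypothetical placeholders introduced only for illustration.

```python
def run_cycle(system, mptf, rtsc, graph, n_history=5):
    """One online iteration; every object and method name here is hypothetical."""
    # (1) current control scheme and the last few cycles of traffic data
    plan = system.get_plan()
    history = system.get_history(n_history)
    queue_before = history[-1]
    # (2) predicted traffic for this cycle under the current timing
    predicted = mptf.predict_flow(plan, history, graph)
    # (3) adjusted phase timings from the controller
    new_plan = rtsc.act(plan, history, graph, predicted)
    # (4) estimated value of the adjustment
    value = mptf.predict_value(new_plan, history, graph)
    # (5) run the road network for one cycle under the adjusted scheme
    system.apply(new_plan)
    # (6) benefit: queue length before the cycle minus queue length after it
    queue_after = system.get_history(1)[-1]
    reward = [before - after for before, after in zip(queue_before, queue_after)]
    # (7) and (8) one training update for each network
    mptf.train(queue_after, reward, value, predicted)
    rtsc.train(reward, value)
    return new_plan, reward
```

Calling `run_cycle` once per signal cycle corresponds to step (9).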

The invention uses current and historical traffic data together with the timing scheme to generate the phase timing scheme of the signal lights. A traffic flow predictor and a signal light controller are trained simultaneously, so that multiple traffic lights can be controlled at once and every traffic signal control scheme is optimized cooperatively. Through online training, the traffic flow predictor forecasts how future traffic will change under the current intervention, which helps the traffic light controller generate a new control scheme, and the action value predictor of the traffic flow predictor evaluates the value of the new control scheme, which assists in training the traffic light controller to maximize the long-term and short-term benefits of the control scheme. The whole algorithm optimizes every step with the goal of minimizing waiting time at intersections.

Both the traffic flow predictor and the signal light controller are built on the deep message propagation graph network (MPGNN) proposed in the present invention.

The deep message propagation graph network MPGNN consists of multiple graph neural network layers. Its input is a graph formed from the traffic value at each node of the road network and the road network connection graph that represents how the nodes are connected. Each MPGNN layer performs two operations on every node of the input graph, information propagation and information aggregation.

These operations are defined in terms of the output of node v after the k-th layer's information propagation operation, the output of node v after the k-th layer's information aggregation operation, and the traffic value of node v on the input graph; N(v) denotes the set of all nodes directly connected to node v, and MLP denotes a multilayer perceptron consisting of three fully connected neural network layers.
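One reading of the two operations, consistent with the verbal description given in the detailed embodiment (the information of the neighbors of node v is summed, added to the node's previous-layer output, and passed through the MLP), can be written as follows. The symbols $m_v^{(k)}$ (propagation output), $h_v^{(k)}$ (aggregation output) and $x_v$ (input traffic value) are assumed names, and any learned edge weights inside the sum are omitted, so this is a sketch rather than the exact formula of the invention.

```latex
\begin{aligned}
m_v^{(k)} &= \sum_{u \in N(v)} h_u^{(k-1)} \\
h_v^{(k)} &= \mathrm{MLP}\left(h_v^{(k-1)} + m_v^{(k)}\right), \qquad h_v^{(0)} = x_v
\end{aligned}
```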

In step (2), the traffic flow predictor MPTF obtains the traffic flow prediction as follows. First, the phase timings of the current signal control scheme are encoded into a feature code by a multi-layer fully connected neural network. Next, the traffic data of each target intersection over the past several cycles, the road network connection graph, and the phase timing feature code are input into the MPGNN, which extracts a feature vector describing the current traffic situation of the regional road network. Finally, the feature vector is input into the future traffic predictor to obtain the predicted future traffic.

In step (3), the traffic light controller RTSC generates the phase timings as follows:

(3-1) Input the traffic flow prediction obtained in step (2), the current phase timing, the traffic data of each target intersection over the past several cycles, and the road network connection graph into an MPGNN to obtain a feature vector describing the current traffic situation of the regional road network;

(3-2) Build one intersection control sub-network for each intersection to be controlled. The phase timing output by each sub-network depends on the number of phases at its intersection. Specifically, if an intersection has 6 phases, the sub-network controlling it generates timings for 6 phases, while a sub-network controlling an intersection with only 2 phases generates timings for 2 phases. Every sub-network takes the feature vector of the current regional traffic situation as input and outputs the mean and variance of a high-dimensional continuous distribution over the control actions of its intersection;

(3-3) Each sub-network samples the phase timings from the high-dimensional continuous distribution of control actions for its intersection, normalizes them with the Softmax function to obtain the share of the cycle given to each phase, and multiplies these shares by the cycle length to obtain the duration of each phase.
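Step (3-3) can be illustrated with a short NumPy sketch. It assumes that the high-dimensional continuous distribution is a diagonal Gaussian parameterized by the mean and variance from step (3-2); the patent does not name the distribution family, and the function name and example numbers are hypothetical.

```python
import numpy as np

def sample_phase_durations(mean, variance, cycle_length, rng):
    """Sample raw timings from a diagonal Gaussian (an assumption),
    normalize them with softmax, and scale them to the cycle length."""
    mean = np.asarray(mean, dtype=float)
    std = np.sqrt(np.asarray(variance, dtype=float))
    raw = rng.normal(mean, std)          # one raw value per phase
    z = np.exp(raw - raw.max())          # numerically stable softmax
    ratios = z / z.sum()                 # share of the cycle for each phase
    return ratios * cycle_length         # duration of each phase

# Example: an intersection with 6 phases and a 120-second cycle.
rng = np.random.default_rng(0)
durations = sample_phase_durations(
    [0.2, 0.0, -0.1, 0.4, 0.0, 0.1], [0.05] * 6, 120.0, rng
)
```

Because the softmax shares sum to one, the sampled durations always add up to the cycle length, whatever the number of phases.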

In step (4), the traffic flow predictor MPTF is used in a similar way to step (2), except that the current phase timing is replaced by the new timing generated in step (3). Specifically, after the control scheme adjusted in step (3), the traffic data of each target intersection over the past several cycles, and the road network connection graph are input, the resulting feature vector of the current regional traffic situation is fed into the action value predictor, which predicts the value of the new timing generated in step (3).

In step (6), the benefit of the adjustment is given by:

$$R(O^{(t-1)}, O^{(t)}, A^{(t)}) = O^{(t-1)} - O^{(t)}$$

where $O^{(i)}$ is the vehicle queue length in each direction of each intersection of the regional road network during the i-th cycle, and $A^{(i)}$ is the phase timing generated in step (3) before the i-th cycle begins. The benefit represents the change in queue length at each node of the regional road network, so R is a vector.
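As a small worked example of this benefit, the sketch below subtracts the queue lengths observed after the cycle from those observed before it. The function name and the queue values are hypothetical.

```python
import numpy as np

def adjustment_reward(queue_prev, queue_curr):
    """R = O^(t-1) - O^(t), element-wise over all intersection approaches."""
    return np.asarray(queue_prev, dtype=float) - np.asarray(queue_curr, dtype=float)

# Queue lengths on three approaches before and after one cycle.
reward = adjustment_reward([8, 3, 5], [6, 4, 5])
```

A positive entry means the queue on that approach got shorter during the cycle, a negative entry means it grew.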

In step (7), the loss function used to train the traffic flow predictor is defined in terms of $V_\theta(O^{(t-1)}, A^{(t)})$, the value of the new timing predicted in step (4); the traffic flow prediction (vehicle queue length) obtained in step (2); and $|\cdot|_{L1}$, the mean absolute error function.

In step (8), the objective function used to train the signal light controller is defined over N, the set of all intersections. Training follows an online strategy: every network is optimized once after each cycle.

Compared with the prior art, the present invention has the following beneficial effects:

1. The method uses a graph neural network to mine the correlated features of traffic flow across the regional road network and models how that traffic evolves, which improves the ability of the traffic light controller and the traffic flow predictor to perceive trends in road network traffic.

2. All of the graph neural network based networks proposed by the invention need only the road network connection graph to express the road network topology, while the traffic transfer weights on the edges connecting intersections are learned dynamically by the network. This overcomes a drawback of other methods based on graph convolutional networks, which need a Laplacian eigenbasis to construct the graph network structure.

3. The traffic flow predictor proposed by the invention models the evolution of road network traffic dynamically and can therefore better capture dynamic changes in traffic. Tested on a widely used public dataset, it outperforms other state-of-the-art traffic forecasting methods.

4. The traffic light controller proposed by the invention can control a large number of intersection signals at the same time. Compared with the previous best methods, it can optimize all signals cooperatively to balance traffic across the road network and effectively improve its smoothness. The current traffic features extracted by the graph neural network improve the controller's global perception.

5. The invention combines traffic flow prediction and traffic light control. Through online training it continuously optimizes the system to adapt to changing traffic and improves the smoothness of the road network and traffic efficiency. Using the predicted traffic improves the controller's ability to perceive the future benefit of an action, which helps raise the value of its actions. Predicting the action value also helps improve the long-term benefit of the actions generated by the controller.

Description of drawings

Figure 1 is a flow chart of the regional traffic light control method based on a graph neural network according to the present invention;

Figure 2 is a structural diagram of the traffic flow predictor MPTF in the present invention;

Figure 3 is a structural diagram of the traffic light controller RTSC in the present invention;

Figure 4 is a diagram of the simulated road network built with the SUMO simulator in an embodiment of the present invention;

Figure 5 is a plot of the average road network speed measured under simulated traffic configuration 1 in an embodiment of the present invention;

Figure 6 is a plot of the average road network queue length measured under simulated traffic configuration 1 in an embodiment of the present invention.

Detailed description

The present invention is described in further detail below with reference to the drawings and embodiments. The embodiments described below are intended to aid understanding of the invention and do not limit it in any way.

As shown in Figure 1, a regional traffic light control method based on a graph neural network uses a control strategy of predefined phases in which the duration of each phase is adjusted. The invention uses online training to keep learning during operation. In every cycle (the total time needed for every phase to be executed once), the method includes the following steps:

S01: Obtain the current signal control scheme and traffic indicator data from the signal light control system. The signal control scheme includes the cycle length, the phase scheme, the green time of each phase, and structured static data about the signal lights such as GPS position and version number. The traffic indicator data consists of the vehicle queue lengths in each direction of each intersection of the target area road network over the past several cycles. Depending on application requirements, it can be replaced by other indicators of traffic conditions, such as the number of vehicles passing through.

S02: Input the current phase timing, the traffic data of each target intersection over the past several cycles, and the road network connection graph into the MPGNN-based traffic flow predictor MPTF to obtain predicted vehicle queue lengths for each direction of each intersection in the current cycle. The structure of MPTF is shown in Figure 2. The deep message propagation graph network MPGNN consists of multiple graph neural network layers. Its input is a graph formed from the traffic value at each node of the road network and the road network connection graph that represents how the nodes are connected. Each MPGNN layer performs two operations on every node of the input graph, information propagation and information aggregation.

These operations are defined in terms of the output of node v after the k-th layer's information propagation operation, the output of node v after the k-th layer's information aggregation operation, and the traffic value of node v on the input graph; N(v) denotes the set of all nodes directly connected to node v, and MLP denotes a multilayer perceptron consisting of three fully connected neural network layers. In the information propagation operation, the information of all nodes directly connected to node v is summed linearly. In the information aggregation operation, the output of the propagation operation is added to the node information output by the previous layer, and the sum is fed into the MLP to obtain the output graph of the current layer.
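The two operations of one MPGNN layer can be sketched in NumPy as below. This is an illustrative reading, not the patented network: the unweighted neighbor sum leaves out any learned edge weights, the ReLU between the MLP layers is an assumption, and the function names and parameters are hypothetical.

```python
import numpy as np

def mlp3(x, weights, biases):
    """Three fully connected layers; the ReLU between layers is an assumption."""
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def mpgnn_layer(h, adjacency, weights, biases):
    """h: (nodes, features) node states from the previous layer.
    adjacency: (nodes, nodes) 0/1 road network connection graph."""
    m = adjacency @ h                      # propagation: sum over direct neighbors
    return mlp3(h + m, weights, biases)    # aggregation: add previous state, apply MLP

# Three intersections on a line (0-1-2), one feature per node.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h0 = np.array([[1.0], [2.0], [3.0]])
identity = [np.eye(1)] * 3
zero_bias = [np.zeros(1)] * 3
h1 = mpgnn_layer(h0, adjacency, identity, zero_bias)
```

With identity weights the MLP leaves its input unchanged, so `h1` is simply each node's own value plus the sum of its neighbors' values. Stacking several such layers lets information travel across more than one hop of the road network.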

S03: Input the current signal control scheme, the traffic data of each target intersection over the past several cycles, the traffic prediction obtained in S02, and the road network connection graph into the MPGNN-based traffic light controller RTSC to generate the phase timings of the current cycle. The structure of RTSC is shown in Figure 3. The RTSC builds a dedicated control sub-network for each controlled intersection, and every control sub-network takes the same MPGNN output as its input. The phase timing output by each sub-network is sampled from its own independent high-dimensional continuous distribution.

S04: Using the traffic flow predictor, input the control scheme adjusted in S03, the traffic data of each target intersection over the past several cycles, and the road network connection graph. The deep message propagation graph network MPGNN produces a feature vector of the current road network situation, which is then fed into an action value predictor made of three fully connected layers to obtain the predicted value of the adjustment action. A ReLU activation function sits between each pair of consecutive fully connected layers.

S05: Use the control scheme adjusted in S03 to control the road network for one cycle.

S06: Collect the current road network traffic data from the signal light control system again and, together with the road network traffic data from before the cycle started, compute the benefit of the adjustment made in S03. The benefit is computed as:

$$R(O^{(t-1)}, O^{(t)}, A^{(t)}) = O^{(t-1)} - O^{(t)}$$

where $O^{(i)}$ is the vehicle queue length in each direction of each intersection of the regional road network during the i-th cycle, and $A^{(i)}$ is the phase timing generated in S03 before the i-th cycle begins. The benefit represents the change in queue length at each node of the regional road network, so R is a vector.

S07: Train the traffic flow predictor using the road network traffic data collected in S06 and the computed benefit, combined with the value estimate obtained in S04. The loss function used to train the traffic flow predictor is defined in terms of $V_\theta(O^{(t-1)}, A^{(t)})$, the value of the new timing predicted in S04; the traffic flow prediction (vehicle queue length) obtained in S02; and $|\cdot|_{L1}$, the mean absolute error function.
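One plausible reading of this loss, given the quantities it names, is a sum of two mean absolute error terms: one comparing the predicted value with the benefit observed in S06, and one comparing the predicted queue lengths with the queue lengths observed in S06. The sketch below follows that reading. The equal weighting of the two terms and the function name are assumptions and are not taken from the patent.

```python
import numpy as np

def mptf_loss(value_pred, reward, queue_pred, queue_obs):
    """Sum of two mean-absolute-error terms (an assumed combination)."""
    value_term = np.mean(np.abs(np.asarray(value_pred, dtype=float)
                                - np.asarray(reward, dtype=float)))
    flow_term = np.mean(np.abs(np.asarray(queue_pred, dtype=float)
                               - np.asarray(queue_obs, dtype=float)))
    return value_term + flow_term

loss = mptf_loss(value_pred=[1.0, 0.0], reward=[2.0, -1.0],
                 queue_pred=[4.0, 4.0], queue_obs=[3.0, 6.0])
```

In this example the value term is 1.0 and the flow term is 1.5.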

S08: Train the traffic light controller using the benefit obtained in S06 and the value estimate obtained in S04. The objective function used to train the signal light controller is defined over N, the set of all intersections. The invention uses an online training strategy: every network is optimized once after each cycle.

S09: Start the next cycle and repeat S01 to S09.

To verify that the invention improves the operating efficiency of a road network, we built a simulated road network with 21 intersections and 72 roads in the SUMO simulator, shown in Figure 4. We generated challenging simulated traffic based on real traffic patterns and compared the invention, controlling all 21 intersections simultaneously, with the most effective comparable methods currently available. SUMO (Simulation of Urban Mobility) is developed by the Institute of Transportation Systems at the German Aerospace Center. It is currently the most commonly used traffic simulation software in the transportation field and is known for its realistic simulation. We generated traffic with three configurations in SUMO to verify the method in different scenarios. The three traffic configurations are listed in Table 1:

Table 1

The main traffic trend indicates the dominant route direction of traffic in the road network during the given period. For example, west to east means that most of that traffic travels from west to east across the network, which simulates a tidal traffic scenario. The vehicle arrival rate determines the traffic volume: the larger the value, the greater the traffic pressure.

The results of comparing the invention with the most effective comparable methods under the three traffic configurations are shown in Table 2:

Table 2

Traffic Configuration is the configuration number from Table 1. Avg.Speed is the mean of the average vehicle speed on the road network over the whole simulation. Avg.Queue is the mean of the average intersection queue length over the whole simulation. Avg.Waiting is the average time vehicles spend waiting in the road network (both in congestion and at traffic lights) over the whole simulation. Time Duration is the time taken to complete the whole simulation. The results show that our method (GraphRTSC) outperforms all selected comparable methods on every metric under all three traffic configurations. We also tested the variant that does not use the traffic prediction provided by MPTF (GraphRTSC-noMPTF). As Table 2 shows, it performs worse than the version that uses the MPTF prediction (GraphRTSC).

We also recorded the average vehicle speed and average queue length of the road network at every second under configuration 1 and compared the invention with comparable methods. As shown in Figures 5 and 6, the invention (GraphRTSC) outperforms comparable methods in both average vehicle speed and average queue length.

另外，为验证本发明中流量预测器的流量预测准确有效性，在METR-LA数据集与目前世界最优秀的流量预测方法进行了对比实验。由南加州大学公开的METR-LA数据集包含从洛杉矶县高速公路环形探测器收集的交通信息，包括207个传感器的从2012年3月1日到2012年6月30日的车流数据。In addition, in order to verify the accuracy and validity of the flow prediction of the flow predictor in the present invention, a comparative experiment was carried out on the METR-LA data set and the current world's best flow prediction method. The METR-LA dataset, published by the University of Southern California, contains traffic information collected from the Los Angeles County Freeway Ring Detector, including traffic flow data for 207 sensors from March 1, 2012 to June 30, 2012.

On the test set, this experiment compares the 15-minute, 30-minute and 60-minute prediction accuracy against the most effective comparable methods to date: DCRNN, STGCN and ST-UNet. The comparison results are shown in Table 3.

Table 3

The results show that the method of the present invention (MPTF) achieves markedly higher accuracy on this dataset than all of the selected comparable methods.
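A horizon-wise comparison of this kind can be sketched as follows. The sketch makes two assumptions that the text does not state: that accuracy is scored with mean absolute error, and that readings are spaced 5 minutes apart, so that the 15-, 30- and 60-minute horizons correspond to the 3rd, 6th and 12th predicted steps. The function names and sample values are hypothetical.

```python
STEP_MINUTES = 5  # assumption: one reading every 5 minutes


def mae(predicted, observed):
    """Mean absolute error across sensors for one time step."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mae_at_horizon(pred_seq, true_seq, horizon_minutes):
    """pred_seq[i] holds the per-sensor predictions made i + 1 steps ahead."""
    step = horizon_minutes // STEP_MINUTES - 1
    return mae(pred_seq[step], true_seq[step])


# Hypothetical data: 12 future steps, 2 sensors, error growing with the horizon.
true_seq = [[60.0, 50.0] for _ in range(12)]
pred_seq = [[60.0 + (i + 1) * 0.5, 50.0 - (i + 1) * 0.5] for i in range(12)]

scores = {h: mae_at_horizon(pred_seq, true_seq, h) for h in (15, 30, 60)}
```

A lower score at a given horizon means a more accurate forecast at that horizon.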

The embodiments described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit it; any modifications, additions and equivalent substitutions made within the scope of the principles of the present invention shall fall within its scope of protection.

Claims

1. A regional traffic signal lamp control method based on a graph neural network, characterized by comprising the following steps:

(1) acquiring a current signal control scheme and flow data of a target area road network in a plurality of past periods from a signal lamp control system, wherein the signal control scheme comprises a period length, a phase scheme and release time of each phase;

(2) inputting the current phase timing, flow data of a plurality of past periods of each target intersection and a road network connection graph into a traffic flow predictor MPTF constructed based on a deep message propagation graph network MPGNN to obtain traffic flow prediction data of each intersection in each direction in the current period;

(3) inputting the current signal control scheme, flow data of a plurality of past periods of each target intersection, the road network connection graph and the traffic flow prediction data obtained in the step (2) into a traffic signal lamp controller RTSC constructed based on a deep message propagation graph network MPGNN, and taking the generated phase timings of the current period as an adjusted control scheme; the RTSC constructs a control sub-network for each controlled intersection, and each control sub-network uses the same MPGNN output as input; the traffic signal lamp controller RTSC generates the phase timings in the following steps:

(3-1) inputting the traffic flow prediction data obtained in the step (2), the current phase timing, the flow data of a plurality of past periods of each target intersection and the road network connection diagram into an MPGNN to obtain a feature vector of the road network traffic condition of the current region;

(3-2) constructing an intersection control sub-network for each intersection to be controlled, wherein the number of phase timings output by each sub-network corresponds to the number of phases of the intersection, each sub-network uses the feature vector of the traffic condition of the current regional road network as input, and the output is the mean and variance of the high-dimensional continuous distribution of the control action of each intersection;

(3-3) each sub-network samples each phase timing from the control action high-dimensional continuous distribution of the corresponding intersection, normalizes each phase timing by using a Softmax function to obtain each phase timing proportion, and multiplies the phase timing proportion by a period length to obtain each phase timing length;

(4) inputting the control scheme adjusted in the step (3), flow data of a plurality of past periods of each target intersection and the road network connection graph into the traffic flow predictor MPTF, and evaluating the value of the adjustment action in the step (3);

(5) controlling the road network for one period time by using the control scheme adjusted in the step (3);

(6) collecting current road network flow data from a signal lamp control system, and calculating the benefit of the adjusting scheme in the step (3) by combining the road network flow data before the period starts;

(7) training the traffic flow predictor MPTF by using the road network flow data collected and the adjustment scheme benefit calculated in the step (6), combined with the value estimate obtained in the step (4);

(8) training the traffic signal lamp controller RTSC by using the adjustment scheme benefit obtained in the step (6) and the value estimate obtained in the step (4);

(9) starting the next period, and repeating the steps (1) to (8) every period.
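Steps (3-2) and (3-3) can be illustrated with a minimal sketch for a single intersection. It assumes the "high-dimensional continuous distribution" is an independent Gaussian per phase, which the claim does not specify; the function names, the means and variances, and the 120-second period length are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax: positive outputs that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_phase_timing(means, variances, period_length, rng):
    """Sample one raw action per phase, then convert it into phase durations."""
    # Assumption: each phase's raw action is drawn from its own Gaussian.
    raw = [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]
    proportions = softmax(raw)  # share of the period given to each phase
    return [p * period_length for p in proportions]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
timing = sample_phase_timing(
    means=[0.5, 0.0, -0.5, 0.0],      # hypothetical sub-network output
    variances=[0.1, 0.1, 0.1, 0.1],   # hypothetical sub-network output
    period_length=120.0,              # hypothetical period length in seconds
    rng=rng,
)
```

Normalizing with Softmax keeps every phase duration positive and makes the durations add up to the period length, whatever values are sampled.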

2. The method according to claim 1, wherein the MPGNN comprises a plurality of neural network layers, and the network input is an input graph comprising the flow values of the nodes on the road network and a network connection graph representing the connection relationships between the nodes; at each layer in the MPGNN, two operations, information propagation and information aggregation, are performed on each node of the input graph, and the mathematical expressions of the two operations are respectively:

wherein,

is the output of node v undergoing a k-th layer information propagation operation,

is the output of node v undergoing a k-th layer information aggregation operation,

is the flow value of node v on the input graph, N(v) denotes the set of all nodes directly connected to node v, and MLP denotes a multilayer perceptron consisting of three fully connected neural network layers.
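One layer of this kind can be sketched as follows. The expressions for the two operations are not reproduced in this text, so the sketch assumes a common message-passing form: propagation applies a shared three-layer MLP to each node state, and aggregation sums a node's own message with the messages of its neighbours N(v). The function names, the weights and the three-node graph are hypothetical.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def linear(weights, bias, xs):
    """One fully connected layer: weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def mlp(layers, xs):
    """Three fully connected layers with ReLU between them."""
    for i, (weights, bias) in enumerate(layers):
        xs = linear(weights, bias, xs)
        if i < len(layers) - 1:
            xs = relu(xs)
    return xs


def mpgnn_layer(states, neighbors, layers):
    # Propagation (assumed form): transform every node state with the shared MLP.
    messages = {v: mlp(layers, h) for v, h in states.items()}
    # Aggregation (assumed form): add the messages of all nodes in N(v).
    out = {}
    for v in states:
        agg = list(messages[v])
        for u in neighbors[v]:
            agg = [a + m for a, m in zip(agg, messages[u])]
        out[v] = agg
    return out


# Hypothetical 3-node road network A - B - C with scalar flow values.
states = {"A": [10.0], "B": [20.0], "C": [5.0]}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
identity_layers = [([[1.0]], [0.0])] * 3  # identity weights keep the example readable
updated = mpgnn_layer(states, neighbors, identity_layers)
```

Stacking several such layers lets information from intersections further away reach each node, which is the usual motivation for multi-layer message passing.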

3. The regional traffic signal lamp control method based on the graph neural network according to claim 1, wherein in the step (2), the traffic flow predictor MPTF obtains the traffic flow prediction data as follows: first, each phase timing in the current signal control scheme is encoded into a feature code by a multilayer fully connected neural network; then the flow data of each target intersection in past periods, the road network connection graph and the phase timing feature codes are input into the MPGNN to extract a feature vector of the road network traffic condition of the current region; finally, the feature vector is input into a future flow predictor to obtain a predicted value of the future flow.

4. The regional traffic signal lamp control method based on the graph neural network as claimed in claim 1, wherein the specific steps of step (4) are as follows: after the control scheme adjusted in the step (3), the flow data of each target intersection in a plurality of past periods and the road network connection graph are input, the obtained feature vector of the current regional road network traffic condition is input into an action value predictor to predict the value of the new timing generated in the step (3).

5. The regional traffic signal light control method based on the graph neural network as claimed in claim 1, wherein in the step (6), the benefit expression of the adjusting scheme is as follows:

R(O^(t-1), O^(t), A^(t)) = O^(t-1) - O^(t)

wherein O^(t) is the vehicle queue length in each direction of each intersection of the regional road network in the t-th period, and A^(t) is the phase timing generated in step (3) before the start of the t-th period.
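The benefit expression can be sketched directly: the queue length observed after the period is subtracted from the queue length observed before it, for each direction of each intersection. The intersection names, directions and queue values below are hypothetical, and summing the entries into one scalar is an illustrative assumption that the claim does not state.

```python
def adjustment_benefit(queues_before, queues_after):
    """R = O^(t-1) - O^(t), element-wise per (intersection, direction)."""
    return {key: queues_before[key] - queues_after[key] for key in queues_before}


# Hypothetical queue lengths (vehicles) before and after one control period.
before = {("J1", "north"): 12, ("J1", "east"): 7, ("J2", "north"): 4}
after = {("J1", "north"): 9, ("J1", "east"): 8, ("J2", "north"): 4}

benefit = adjustment_benefit(before, after)
total_benefit = sum(benefit.values())  # assumption: one scalar for the whole region
```

A positive entry means the queue in that direction shrank during the period; a negative entry means it grew.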

6. The regional traffic signal lamp control method based on the graph neural network as claimed in claim 1, wherein in the step (7), the loss function used for training the traffic flow predictor is as follows:

wherein V_θ(O^(t-1), A^(t)) is the value of the new timing predicted in step (4),

is the predicted flow value calculated in step (2), and ‖·‖_L1 is the mean absolute error function.

7. The regional traffic signal lamp control method based on the graph neural network as claimed in claim 1, wherein in step (8), the objective function used for training the signal lamp controller is:

wherein N is the set of all intersections; an online training strategy is adopted, and all networks are optimized once after each period.