CN112489464A - Crossing traffic signal lamp regulation and control method with position sensing function - Google Patents

Crossing traffic signal lamp regulation and control method with position sensing function

Info

Publication number
CN112489464A
CN112489464A CN202011302815.7A CN202011302815A CN112489464A CN 112489464 A CN112489464 A CN 112489464A CN 202011302815 A CN202011302815 A CN 202011302815A CN 112489464 A CN112489464 A CN 112489464A
Authority
CN
China
Prior art keywords
intersection
network
traffic
intersections
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011302815.7A
Other languages
Chinese (zh)
Other versions
CN112489464B (en)
Inventor
郭健
李克秋
郝建业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202011302815.7A
Publication of CN112489464A
Application granted
Publication of CN112489464B
Legal status: Active

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/07: Controlling traffic signals
    • G08G1/081: Plural intersections under common control
    • G08G1/09: Arrangements for giving variable traffic instructions
    • G08G1/095: Traffic lights

Abstract

The invention discloses a position-aware method for regulating intersection traffic signal lights, comprising: step 1, using a reinforcement learning network model to mathematically model the traffic signal control problem, which includes modeling the traffic network as a graph and defining the state space, action space and reward according to the traffic signal control problem; step 2, preprocessing the original observation $o_i$ of each agent; step 3, obtaining position-aware edge features between agents; step 4, implementing the Pos-Light message passing model between agents; step 5, making intersection traffic signal control decisions with a Q network; and step 6, training the control objective based on the Q network. Compared with the prior art, (1) the agents make decisions more efficiently and converge faster, and the resulting policy effectively relieves traffic congestion; (2) the method outperforms current methods that use graph neural networks to solve the traffic signal control problem, converging faster and giving more stable results.

Description

Crossing traffic signal lamp regulation and control method with position sensing function
Technical Field
The invention relates to the technical field of intelligent traffic signal control, in particular to a position-aware method for regulating and controlling intersection traffic signal lights.
Background
Traditional traffic light control formulates a control scheme in advance from collected historical traffic data. For example, the SCATS traffic signal control system calculates two indexes for an intersection, the degree of saturation and the flow, from traffic detector data, and selects a suitable signal control scheme for the intersection from a set of preset schemes. The SMOOTH traffic signal control system uses a short-term prediction strategy to obtain the current state of the intersection from the collected traffic data, and then selects a corresponding scheme according to that state. Both SCATS and SMOOTH require the various traffic conditions of an intersection to be considered in advance, with a coping strategy prepared for each; such solutions need manual intervention and do not adapt well to dynamic traffic flow.
With the continuing progress of science and technology, research on traffic systems is becoming more intelligent. Researchers have begun to explore deep reinforcement learning for traffic signal control, combining artificial intelligence with traffic systems. One prior approach assigns a deep reinforcement learning model to each intersection, and each model learns to make decisions from its own observed state, thereby regulating the traffic lights of multiple intersections. Because each intersection has an independent model, this approach does not scale to large traffic networks, and independently learned models ignore the mutual influence of adjacent intersections, so coordinated control of the intersections cannot be achieved. Another approach, recognizing that the decision at each intersection is also influenced by other intersections, concatenates the traffic conditions of the adjacent intersections with those of the intersection itself to form that intersection's traffic condition information, with all intersections sharing the parameters of the network model. In most cases, however, concatenation is not a reasonable way to fuse information from adjacent intersections, because it is difficult to determine the order in which the adjacent intersections should be concatenated. In addition, long short-term memory (LSTM) networks have been used to integrate historical traffic information into the current traffic state.
Among the traffic signal control methods based on deep reinforcement learning, some improve the state and reward design but ignore the spatial relationship between intersections. Others do consider the spatial relationship: they model the traffic network as a graph, extract the information of adjacent intersections with a graph neural network, and integrate it into the traffic information of the central intersection for decision making. However, these methods consider only the connectivity of the traffic network and aggregate the traffic conditions of adjacent intersections equally, neglecting the spatial positions of those intersections, even though spatial position is important for cooperative traffic signal control.
In summary, the present invention aims to provide a traffic signal control method that takes the spatial position of intersections into account.
Disclosure of Invention
To address the defect that the prior art ignores the position information of intersections, the invention provides a position-aware method for regulating intersection traffic signal lights. By taking the position information in the traffic network graph into account, it combines a graph neural network with reinforcement learning and achieves coordinated control of the traffic signal lights of multiple intersections in the traffic network.
The position-aware method for regulating intersection traffic signal lights is implemented as follows:
step 1, performing mathematical modeling of the traffic signal control problem using a reinforcement learning network model:
the traffic network is modeled as a graph $G = (V, E)$, where $V$ is the set of intersections and $E$ is the set of edges connecting two intersections; each intersection is regarded as an agent, and there are $N$ intersections in total;
the state space, the action space and the reward are defined according to the traffic signal control problem as follows:
the state space is denoted $S$: $s_t \in S$ is the system state at time $t$ and consists of the traffic condition information of all intersections in the traffic signal network;
the observation space is denoted $O$: $o_{i,t} \in O$ is the observation of agent $i$ at time $t$, which consists of two parts: (1) the phase of the intersection at the current time; (2) the number of vehicles on each entering lane connected to the intersection;
the action space is denoted $A$: $a_t \in A$ is the joint action of all agents at time $t$, i.e., the set of the individual actions $a_{i,t}$;
the reward is denoted $R$: at each time $t$, $r_{i,t}$ is the reward of agent $i$, specifically the negative of the total number of vehicles on the entering lanes of the intersection represented by agent $i$, namely
$$r_{i,t} = -\sum_{l} u_{l,t},$$
where the sum runs over the entering lanes of that intersection and $u_{l,t}$ is the number of vehicles on entering lane $l$ at time $t$;
step 2, preprocessing the original observation $o_i$ of each agent:
at time $t$, the original local observation of each agent is the concatenation of the number of vehicles on each lane and the current phase of the traffic signal; a multilayer perceptron maps the $k$-dimensional original observation $o_{i,t} \in \mathbb{R}^k$ of agent $i$ into an $m$-dimensional hidden space, and the output hidden state $h_{i,t} \in \mathbb{R}^m$ represents the traffic condition of the $i$-th intersection at time $t$:
$$h_{i,t} = \sigma(o_{i,t} W_o + b_o),$$
where $k$ is the feature dimension of $o_{i,t}$, $W_o \in \mathbb{R}^{k \times m}$ and $b_o \in \mathbb{R}^m$ are respectively the weight matrix and bias of the hidden layer of the multilayer perceptron, and $\sigma$ is the ReLU activation function;
step 3, obtaining position-aware edge features between agents:
all intersections within $k$ hops of a target intersection $i$ are selected as the neighbor node set $N(i)$, and the Euclidean distance $d(i,j)$ between the target intersection $i$ and each adjacent intersection $j \in N(i)$ is calculated from their coordinates according to:
$$d(i,j) = f_{\text{distance}}(i, j; G_w)$$
$d(i,j)$ is then mapped to a number $p_{i,j}$ in the range $[0,1]$ to express the relative position relationship between intersections:
Figure BDA0002787434150000038
finally, the edge feature $e_{i,j} = (p_{i,j}, s_{i,j})$ is obtained, representing the relative position and structure information of the adjacent intersection $j$ with respect to the target intersection $i$;
step 4, implementing the Pos-Light message passing model between agents to fuse traffic information, in the following two stages:
1) integrating edge feature information with adjacent intersection information:
for any neighbor intersection $j \in N(i)$, where $N(i)$ is the set of adjacent intersections of the target intersection $i$, $e_{i,j} = (p_{i,j}, s_{i,j})$ is the feature information of the edge $(i, j)$; the traffic information of the neighbor intersection is encoded according to the two types of edge features as follows:
Figure BDA0002787434150000041
Figure BDA0002787434150000042
where a multilayer perceptron is used to retain the spatial structure information of the adjacent intersection $j$ relative to the target intersection $i$,
Figure BDA0002787434150000043
$s_{i,j} \in \mathbb{R}^l$, $l$ is the number of neighbor nodes of the target intersection, $W_s \in \mathbb{R}^{l \times m}$ is the weight matrix of the network, and $b_s \in \mathbb{R}^m$ is the bias of the network,
Figure BDA0002787434150000044
the two traffic messages of the neighbor intersection,
Figure BDA0002787434150000045
and
Figure BDA0002787434150000046
are combined, and the combined information is then encoded to obtain the final traffic message $h_{i,j}$, containing location information, of the adjacent intersection $j$:
Figure BDA0002787434150000047
where
Figure BDA0002787434150000048
$W_e \in \mathbb{R}^{m \times n}$ is the weight matrix of the network, $b_e \in \mathbb{R}^n$ is the bias of the network, and $h_{i,j} \in \mathbb{R}^n$ is the location-aware message of the adjacent intersection $j$ for the target intersection $i$;
2) updating the traffic condition representation of the target intersection:
at this stage, the traffic condition representation of each intersection is updated by aggregating the traffic information around the target intersection $i$:
Figure BDA0002787434150000049
Figure BDA00027874341500000410
where $W_h \in \mathbb{R}^{n \times c}$ is the weight matrix of the network, $b_h \in \mathbb{R}^c$ is the bias of the network, and
Figure BDA00027874341500000411
aggregates the important traffic condition information around the target intersection $i$, so that the agent can make decisions more efficiently;
step 5, making intersection traffic signal control decisions with the Q network:
for each agent (i.e., target intersection $i$),
Figure BDA0002787434150000051
is input to the Q network, and the agent selects an action from the output of the Q network using the ε-greedy algorithm: let $\epsilon = p$ with $p \in [0,1]$, and generate a random number $q$ in the range $[0,1]$; if $q < \epsilon$, an action is selected at random from the selectable actions; otherwise, the action with the maximum Q value is selected as the action of the agent at the current time;
at time $t$, the Q value of each agent is:
Figure BDA0002787434150000052
where $W_d \in \mathbb{R}^{c \times d}$ is the weight matrix of the Q network, $b_d \in \mathbb{R}^d$ is the bias of the Q network, $d$ is the size of the action space, $Q_{i,t} \in \mathbb{R}^{|A|}$, and $Q_{i,t}(a)$ is the Q value corresponding to action $a$;
step 6, training the control objective based on the Q network:
at each time $t$, the transition $(s_t, a_t, s_{t+1}, r_t)$ is stored in the experience pool $D$, where the global observation is
Figure BDA0002787434150000053
the joint action is
Figure BDA0002787434150000054
and the reward is
Figure BDA0002787434150000055
The loss function for updating the model is:
Figure BDA0002787434150000056
Figure BDA0002787434150000057
where $T$ is the total number of time steps for model updating and $N$ is the total number of intersections in the whole traffic network; the algorithm updates the parameters of the training network according to the loss function:
Figure BDA0002787434150000058
After every $g$ iterations, the parameters of the prediction network are copied to the parameters of the target network:
Figure BDA0002787434150000059
Compared with the prior art, the invention achieves the following positive technical effects:
(1) the agents make decisions more efficiently and converge faster, and the resulting policy effectively relieves traffic congestion;
(2) the method outperforms current methods that use graph neural networks to solve the traffic signal control problem, converging faster and giving more stable results.
Drawings
FIG. 1 is an overall flow chart of a method for regulating and controlling intersection traffic lights with position sensing according to the present invention;
FIG. 2 is a schematic diagram of an interaction process model of a traffic environment and an agent;
FIG. 3 is a schematic view of a road network: (3a) a parallelogram ABCD, (3b) a square A'B'C'D';
FIG. 4 is a schematic diagram of a grid-type 4 × 4 road network;
FIG. 5 is a schematic view of an intersection configuration;
FIG. 6 is a schematic diagram of a model Pos-Light framework;
FIG. 7 is a graph comparing the performance of Pos-Light, PositionWithAtt, and the other 3 RL methods (dashed lines) on a 3 × 3 road network during training;
FIG. 8 is a graph comparing the performance of Pos-Light, PositionWithAtt, and the other 3 RL methods (dashed lines) on a 4 × 4 road network during training;
FIG. 9 is a graph comparing the performance of Pos-Light, PositionWithAtt, and the other 3 RL methods (dashed lines) on the Jinan road network during training;
FIG. 10 is a graph comparing the performance of Pos-Light, PositionWithAtt, and the other 3 RL methods (dashed lines) on the New York road network during training.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
The reinforcement learning agent interacts with the environment in discrete time steps; the interaction process is shown in Fig. 2. At each time step $t$, the agent obtains the current state $s_t$ and the reward $r_t$ fed back by the environment, and then selects an action from the set of selectable actions and applies it to the environment. The environment then transitions to the next state $s_{t+1}$ according to the selected action and feeds back a reward $r_{t+1}$ to the agent. The goal of reinforcement learning is to maximize the cumulative reward.
Fig. 2 is a schematic diagram of an interaction process model of a traffic environment and an agent.
Fig. 3 is a schematic diagram of a road network. It illustrates two observations. (1) The traffic conditions of neighboring intersections at different distances from the target intersection affect the traffic light decisions at the target intersection differently. Let A be the target intersection, and let intersections B and D be directly adjacent to A, with distances satisfying $l_{A,B} > l_{A,D}$. Given the speed limit of urban roads, a vehicle takes more time to travel from intersection B to intersection A than from intersection D to intersection A, from which it is concluded that the influence of a neighboring intersection on the target intersection decreases with increasing distance. (2) Even if the distance between any two adjacent intersections in the traffic network is given, the basic structural information of the intersections is still lost. For example, given only the distances between directly adjacent intersections of a quadrilateral with vertices A, B, C, D, i.e., $l_{A,B} > l_{D,E}$, $l_{A,D} > l_{B,E}$, the shape of the quadrilateral cannot be determined: it may be the parallelogram ABCD of (3a) or the square A'B'C'D' of (3b) in Fig. 3. However, if the distance between A and E is known, the shape of the quadrilateral can be uniquely identified. Accordingly, the invention proposes adding connecting edges between diagonal intersections to reduce the uncertainty of the graph modeling. Moreover, the influence of the traffic condition of intersection E on the signal control of target intersection A cannot be ignored: vehicles at intersection E affect the signal control of target intersection A by affecting the intersections directly adjacent to A.
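The ambiguity described in observation (2) can be checked numerically. The following is a minimal sketch with hypothetical coordinates (not taken from the patent figures): a unit square and a sheared rhombus with corners A, B, E, D have identical side lengths, and only the diagonal A–E tells them apart.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical layouts: E is the corner diagonal to A in both shapes.
square = {"A": (0.0, 0.0), "B": (1.0, 0.0), "E": (1.0, 1.0), "D": (0.0, 1.0)}
c, s = math.cos(math.radians(60)), math.sin(math.radians(60))
rhombus = {"A": (0.0, 0.0), "B": (1.0, 0.0), "E": (1.0 + c, s), "D": (c, s)}

SIDES = [("A", "B"), ("B", "E"), ("E", "D"), ("D", "A")]

def side_lengths(shape):
    """Lengths of the four sides, rounded to absorb floating-point noise."""
    return [round(dist(shape[u], shape[v]), 9) for u, v in SIDES]

def diagonal(shape):
    """Length of the diagonal edge between A and E."""
    return dist(shape["A"], shape["E"])

print(side_lengths(square) == side_lengths(rhombus))  # same sides in both shapes
print(diagonal(square), diagonal(rhombus))            # diagonals differ
```

With only the four side lengths the two layouts are indistinguishable; adding the diagonal edge resolves the shape, which is the motivation given above for connecting diagonal intersections.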
Step 1, performing mathematical modeling of the traffic signal control problem using a reinforcement learning network model, comprising the following process:
The traffic network is modeled as a graph $G = (V, E)$, where $V$ is the set of intersections and $E$ is the set of edges connecting two intersections; each intersection is regarded as an agent, and there are $N$ intersections in total. The state space, action space and reward are defined according to the traffic signal control problem:
the state space is denoted $S$: $s_t \in S$ is the system state at time $t$ and consists of the traffic condition information of all intersections in the traffic signal network. Each agent can observe only its own traffic condition at the current time;
the observation space is denoted $O$: $o_{i,t} \in O$ is the observation of agent $i$ at time $t$. It consists of two parts: (1) the phase of the intersection at the current time; (2) the number of vehicles on each entering lane connected to the intersection;
the action space is denoted $A$: $a_t \in A$ is the joint action of all agents at time $t$, i.e., the set of the individual actions $a_{i,t}$;
the reward is denoted $R$: at each time, each agent has its own reward. The reward $r_{i,t}$ of agent $i$ at time $t$ is specifically the negative of the total number of vehicles on the entering lanes of the intersection represented by agent $i$, namely
$$r_{i,t} = -\sum_{l} u_{l,t},$$
where the sum runs over the entering lanes of that intersection and $u_{l,t}$ is the number of vehicles on entering lane $l$ at time $t$.
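As an illustration of the observation and reward definitions in step 1, the sketch below builds a local observation and computes the reward for one intersection. The one-hot phase encoding and the function names are assumptions made for this example; the text only states that the observation contains the current phase and the per-lane vehicle counts.

```python
def make_observation(phase_index, num_phases, lane_counts):
    """Concatenate a one-hot phase encoding (assumed) with the vehicle
    counts of the entering lanes."""
    phase = [1.0 if p == phase_index else 0.0 for p in range(num_phases)]
    return phase + [float(count) for count in lane_counts]

def reward(lane_counts):
    """Negative of the total number of vehicles on the entering lanes."""
    return -float(sum(lane_counts))

# Hypothetical intersection: 4 phases, 12 entering lanes.
counts = [3, 0, 5, 1, 0, 0, 2, 4, 0, 1, 0, 0]
obs = make_observation(phase_index=2, num_phases=4, lane_counts=counts)
print(len(obs), reward(counts))
```

A longer queue on the entering lanes gives a more negative reward, so maximizing the reward corresponds to keeping the entering lanes clear.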
Step 2, preprocessing the original observation $o_i$ of each agent:
At time $t$, the original local observation of each agent is the concatenation of the number of vehicles on each lane and the current phase of the traffic signal. A multilayer perceptron maps the $k$-dimensional original observation $o_{i,t} \in \mathbb{R}^k$ of agent $i$ into an $m$-dimensional hidden space, and the output hidden state $h_{i,t} \in \mathbb{R}^m$ represents the traffic condition of the $i$-th intersection at time $t$:
$$h_{i,t} = \sigma(o_{i,t} W_o + b_o),$$
where $k$ is the feature dimension of $o_{i,t}$, $W_o \in \mathbb{R}^{k \times m}$ and $b_o \in \mathbb{R}^m$ are respectively the weight matrix and bias of the hidden layer of the multilayer perceptron, and $\sigma$ is the ReLU activation function;
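The preprocessing in step 2 is a single affine map followed by ReLU. A minimal pure-Python sketch, assuming the row-vector convention implied by $W_o \in \mathbb{R}^{k \times m}$ (the weights below are arbitrary illustrative values, not trained parameters):

```python
def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]

def linear(x, W, b):
    """Row-vector affine map x W + b, with W stored as a list of k rows of length m."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) + b[c]
            for c in range(len(b))]

def encode_observation(o, W_o, b_o):
    """Map a k-dimensional observation to an m-dimensional hidden state."""
    return relu(linear(o, W_o, b_o))

# Illustrative values with k = 2, m = 2.
o = [1.0, 2.0]
W_o = [[1.0, -1.0],
       [0.5, -1.0]]
b_o = [0.0, 0.5]
print(encode_observation(o, W_o, b_o))
```

In practice this layer would be implemented in a deep learning framework and trained jointly with the rest of the model; the sketch only shows the shape of the computation.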
Step 3, obtaining position-aware edge features between agents:
All intersections within $k$ hops of a target intersection $i$ are selected as the neighbor node set $N(i)$, and the Euclidean distance $d(i,j)$ between the target intersection $i$ and each adjacent intersection $j \in N(i)$ is calculated. To preserve the structure of the original traffic network, the invention also uses a 0-1 vector $s_{i,j}$ to distinguish neighboring intersections. Together, the edge distance and the 0-1 vector that distinguishes adjacent intersections can uniquely identify an intersection in the traffic network, i.e., they provide a position-aware function. For example, Fig. 4 shows a schematic diagram of a 4 × 4 grid traffic network. Given the distances between any two intersections in the colored set of intersections, the topology between those intersections can be recovered using multidimensional scaling or similar techniques, and the spatial structure of other similar sets of four intersections can be recovered in the same way. The topology can be recovered by considering only second-hop neighbor information, without considering higher-hop neighbors. Thus, the position of an intersection can be located from the proposed edge feature information: each edge is augmented with a distance attribute that implicitly accounts for the coordinate values of the nodes, thereby encoding the location information of each target intersection $i$. In summary, the invention represents the edge feature between the target intersection $i$ and its adjacent intersection $j$ as $e_{i,j}$, which consists of a distance-related value between $i$ and $j$ and the connection information of $j$ relative to $i$.
First, the Euclidean distance $d(i,j)$ between intersections $i$ and $j$ is calculated from their coordinates:
$$d(i,j) = f_{\text{distance}}(i, j; G_w)$$
Because a neighbor intersection at a shorter distance has a larger influence on the target intersection, the invention maps $d(i,j)$ to a number $p_{i,j}$ in the range $[0,1]$ to express the relative position relationship between intersections:
Figure BDA0002787434150000091
Furthermore, an $l$-dimensional 0-1 vector $s_{i,j}$ is added to distinguish adjacent intersections. Specifically, for the target intersection $i$, $N(i)$ is the set of adjacent intersections and $l$ is the maximum hop count, relative to the target intersection $i$, within $N(i)$. If the adjacent intersection $j$ in $N(i)$ is located at the $r$-th hop from the target intersection $i$ in the network, the $r$-th dimension of $s_{i,j}$ is set to 1 and the remaining dimensions are 0.
Finally, the edge feature $e_{i,j} = (p_{i,j}, s_{i,j})$ is obtained, representing the relative position and structure information of the adjacent intersection $j$ with respect to the target intersection $i$.
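The edge feature construction in step 3 can be sketched as follows. The hop counts come from a breadth-first search and the hop vector is one-hot, as described above. The distance mapping $p_{i,j} = 1/(1 + d(i,j))$ is an assumption chosen only because it is decreasing in distance and lies in $[0,1]$; the patent's actual mapping is given in a formula image that is not reproduced here. The toy 2 × 2 grid and all function names are likewise hypothetical.

```python
import math
from collections import deque

def hop_counts(adj, source, max_hops):
    """Breadth-first search: hop count of every node within max_hops of source."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    del hops[source]
    return hops

def edge_feature(coords, i, j, hop, max_hops):
    """Return (p_ij, s_ij) for the edge between target i and neighbor j."""
    d = math.dist(coords[i], coords[j])   # Euclidean distance d(i, j)
    p = 1.0 / (1.0 + d)                   # ASSUMED mapping into [0, 1]
    s = [1.0 if r == hop else 0.0 for r in range(1, max_hops + 1)]
    return p, s

# Hypothetical 2 x 2 grid; intersection 3 is diagonal to intersection 0.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (1.0, 1.0)}
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

neighbors = hop_counts(adj, source=0, max_hops=2)
features = {j: edge_feature(coords, 0, j, hop, 2) for j, hop in neighbors.items()}
print(features)
```

The diagonal intersection is a second-hop neighbor with a larger distance, so it receives a smaller $p_{i,j}$ and a different hop vector than the two directly adjacent intersections.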
Step 4, implementing message passing between agents:
Communication between intersections in a multi-intersection traffic network is vital to cooperative control of traffic lights. The position-aware message passing model (Pos-Light) provided by the invention preserves intersection location and structure information when transmitting messages. The Pos-Light model fuses traffic information in the following two stages:
1) Integrating edge feature information with adjacent intersection information
For any neighbor intersection $j \in N(i)$, where $N(i)$ is the set of adjacent intersections of the target intersection $i$, $e_{i,j} = (p_{i,j}, s_{i,j})$ is the feature information of the edge $(i, j)$. The traffic information of the neighbor intersection is encoded according to the two types of edge features as follows:
Figure BDA0002787434150000092
Figure BDA0002787434150000093
where a multilayer perceptron is used to retain the spatial structure information of the adjacent intersection $j$ relative to the target intersection $i$,
Figure BDA0002787434150000094
$s_{i,j} \in \mathbb{R}^l$, $l$ is the number of neighbor nodes of the target intersection, $W_s \in \mathbb{R}^{l \times m}$ is the weight matrix of the network, and $b_s \in \mathbb{R}^m$ is the bias of the network,
Figure BDA0002787434150000095
the two traffic messages of the neighbor intersection,
Figure BDA0002787434150000096
and
Figure BDA0002787434150000097
are combined, and the combined information is then encoded to obtain the final traffic message $h_{i,j}$, containing location information, of the adjacent intersection $j$:
Figure BDA0002787434150000101
where
Figure BDA0002787434150000102
$W_e \in \mathbb{R}^{m \times n}$ is the weight matrix of the network, $b_e \in \mathbb{R}^n$ is the bias of the network, and $h_{i,j} \in \mathbb{R}^n$ is the location-aware message of the adjacent intersection $j$ for the target intersection $i$.
2) Updating the traffic condition representation of the target intersection
At this stage, the traffic condition representation of each intersection is updated by aggregating the traffic information around the target intersection $i$:
Figure BDA0002787434150000103
Figure BDA0002787434150000104
where $W_h \in \mathbb{R}^{n \times c}$ is the weight matrix of the network, $b_h \in \mathbb{R}^c$ is the bias of the network, and
Figure BDA0002787434150000105
aggregates the important traffic condition information around the target intersection $i$, so that the agent can make decisions more efficiently.
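The exact Pos-Light formulas are contained in formula images that are not reproduced here, so the following is only an illustrative sketch of a two-stage message passing step that respects the stated dimensions ($W_s \in \mathbb{R}^{l \times m}$, $W_e \in \mathbb{R}^{m \times n}$, $W_h \in \mathbb{R}^{n \times c}$). The specific choices made below (scaling the neighbor state by $p_{i,j}$, adding the encoded hop vector, summing over neighbors, ReLU everywhere) are assumptions for the example and may differ from the patented formulas.

```python
def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]

def linear(x, W, b):
    """Row-vector affine map x W + b, with W stored as a list of rows."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) + b[c]
            for c in range(len(b))]

def neighbor_message(h_j, p_ij, s_ij, W_s, b_s, W_e, b_e):
    """Stage 1 (assumed form): fuse the edge feature with the neighbor's state."""
    position_part = [p_ij * v for v in h_j]            # distance-weighted state, R^m
    structure_part = relu(linear(s_ij, W_s, b_s))      # encoded hop vector, R^l -> R^m
    combined = [a + b for a, b in zip(position_part, structure_part)]
    return relu(linear(combined, W_e, b_e))            # message h_ij, R^m -> R^n

def update_intersection(messages, W_h, b_h):
    """Stage 2 (assumed form): sum neighbor messages, then project R^n -> R^c."""
    summed = [sum(column) for column in zip(*messages)]
    return relu(linear(summed, W_h, b_h))

# Illustrative values with m = l = n = 2 and c = 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
msg = neighbor_message([1.0, 2.0], 0.5, [1.0, 0.0], identity, zeros, identity, zeros)
updated = update_intersection([msg, msg], [[1.0], [1.0]], [0.0])
print(msg, updated)
```

Because the weights are shared across all edges, the same functions can be applied to every target intersection, which matches the parameter sharing described for the agents.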
Step 5, making intersection traffic signal control decisions with the Q network:
A decision is made for the traffic signal of each intersection according to the learned traffic conditions. For each agent (i.e., target intersection $i$),
Figure BDA0002787434150000106
is input to the Q network, and the agent selects an action from the output of the Q network using the ε-greedy algorithm: let $\epsilon = p$ with $p \in [0,1]$, and generate a random number $q$ in the range $[0,1]$; if $q < \epsilon$, an action is selected at random from the selectable actions; otherwise, the action with the maximum Q value is selected as the action of the agent at the current time.
At time $t$, the Q value of each agent is:
Figure BDA0002787434150000107
where $W_d \in \mathbb{R}^{c \times d}$ is the weight matrix of the Q network, $b_d \in \mathbb{R}^d$ is the bias of the Q network, and $d$ is the size of the action space. $Q_{i,t} \in \mathbb{R}^{|A|}$, and $Q_{i,t}(a)$ is the Q value corresponding to action $a$.
Giving each agent its own model is not suitable for a traffic network with a large number of intersections. To scale up, the invention lets all agents share parameters and maintains a single model.
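The ε-greedy selection rule described in step 5 can be sketched in a few lines. The function name and the optional `rng` parameter are illustrative choices, not part of the source.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:            # random number q drawn from [0, 1)
        return rng.randrange(len(q_values))
    # Otherwise choose the action index with the largest Q value.
    return max(range(len(q_values)), key=q_values.__getitem__)

# Example with four actions, matching the four cases of the action space.
q_values = [0.1, 0.9, 0.3, 0.2]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
explored_action = epsilon_greedy(q_values, epsilon=1.0)
print(greedy_action, explored_action)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is purely random; training typically uses a value in between so that the agent keeps exploring.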
Step 6, training the control objective based on the Q network:
The Q network comprises a target network and a main network, which have the same structure but whose parameters are not updated synchronously. The parameters of the target network and the main network are respectively
Figure BDA0002787434150000111
At each time $t$, the transition $(s_t, a_t, s_{t+1}, r_t)$ is stored in the experience pool $D$, where the global observation is
Figure BDA0002787434150000112
the joint action is
Figure BDA0002787434150000113
and the reward is
Figure BDA0002787434150000114
The loss function for updating the model is:
Figure BDA0002787434150000115
Figure BDA0002787434150000116
Figure BDA0002787434150000117
where $T$ is the total number of time steps for model updating and $N$ is the total number of intersections in the whole traffic network. The algorithm updates the parameters of the training network according to the loss function:
Figure BDA0002787434150000118
After every $g$ iterations, the parameters of the prediction network are copied to the parameters of the target network:
Figure BDA0002787434150000119
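The loss formulas themselves are contained in formula images, so the sketch below uses the standard deep Q-learning temporal-difference loss, averaged over time steps and intersections, as a stand-in. The discount factor `gamma`, the tuple layout of the batch, and the function names are assumptions for illustration and are not stated in the text.

```python
def td_loss(batch, q_main, q_target, gamma=0.99):
    """Mean squared TD error over all time steps and all agents.

    batch: list of time steps; each time step is a list with one
           (obs, action, reward, next_obs) tuple per agent.
    q_main, q_target: callables mapping an observation to a list of Q values.
    gamma: discount factor (ASSUMED; not given in the text).
    """
    total, count = 0.0, 0
    for step in batch:
        for obs, action, reward, next_obs in step:
            target = reward + gamma * max(q_target(next_obs))
            total += (target - q_main(obs)[action]) ** 2
            count += 1
    return total / count

def sync_target(main_params, target_params):
    """Copy the main-network parameters into the target network (every g iterations)."""
    target_params.clear()
    target_params.update(main_params)

# Toy check with constant Q functions and a single transition.
loss = td_loss(
    batch=[[("obs", 1, -3.0, "next_obs")]],
    q_main=lambda o: [1.0, 2.0],
    q_target=lambda o: [0.5, 1.5],
    gamma=1.0,
)
print(loss)
```

Keeping the target network fixed between synchronizations stabilizes the regression target, which is the usual reason for maintaining two networks with the same structure.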
The agents communicate with each other to achieve coordinated control of the signal lights in the traffic network. The invention adopts a message passing neural network framework to implement communication between agents: it first preprocesses the initial traffic condition of each intersection, then, based on the position-aware edge features, aggregates the surrounding traffic conditions of each target intersection as the final traffic information of that intersection, and finally inputs this information into the Q network for decision making. The whole implementation process is shown in Fig. 6.
The invention verifies the validity of the algorithm on intersections with four roads, where the entering lanes of each road consist of a straight lane, a left-turn lane and a right-turn lane, as shown in Fig. 5. Other types of intersections, such as those with only three roads or with only straight and left-turn lanes on each road, can be unified into the intersection type used in the experiment by zero padding.
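Zero padding of irregular intersections can be sketched as follows. The approach and movement names and their ordering are hypothetical; the text only states that missing roads or lanes are filled with zeros so that every intersection has the same observation layout.

```python
APPROACHES = ("north", "east", "south", "west")   # assumed ordering
MOVEMENTS = ("left", "straight", "right")         # assumed ordering

def padded_lane_counts(counts):
    """Return a fixed-length vector of 12 lane counts.

    counts maps (approach, movement) to a vehicle count; lanes that do not
    exist at this intersection are simply absent and are filled with 0.
    """
    return [float(counts.get((a, m), 0)) for a in APPROACHES for m in MOVEMENTS]

# Hypothetical three-road intersection with no northern approach.
t_junction = {
    ("east", "straight"): 4,
    ("south", "left"): 1,
    ("west", "right"): 2,
}
vector = padded_lane_counts(t_junction)
print(vector)
```

Because every intersection then produces a vector of the same length, a single shared model can be applied to all of them.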
As shown in Table 1, the action space $A_i$ of agent $i$ consists of four cases.
TABLE 1
Figure BDA00027874341500001110
Figure BDA0002787434150000121
The achieved control effect was evaluated as follows:
The invention performs experiments on CityFlow, a simulation platform that supports large-scale traffic signal control. CityFlow provides traffic conditions to the signal control method and executes the traffic signal actions that the method returns. The average travel time, in seconds, is used to evaluate the performance of the model. It is the most commonly used metric for evaluating algorithms in the traffic field and is calculated as the average of the travel time spent by all vehicles in the traffic network.
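The evaluation metric is a plain average, which a short sketch makes concrete. The representation of a trip as an (enter time, leave time) pair in seconds is an assumption for the example and is not tied to any simulator API.

```python
def average_travel_time(trips):
    """Average time in seconds that vehicles spend in the network.

    trips: list of (enter_time, leave_time) pairs, one per vehicle.
    """
    durations = [leave - enter for enter, leave in trips]
    return sum(durations) / len(durations)

# Three hypothetical vehicles.
trips = [(0.0, 100.0), (20.0, 80.0), (50.0, 200.0)]
print(average_travel_time(trips))
```

A lower average travel time indicates that the signal control policy moves vehicles through the network more quickly.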
Experimental data: experiments were performed using both synthetic and real collected traffic data; more traffic data is available on public websites.
Synthetic data: road networks of different scales (3 × 3 and 4 × 4) were used for performance analysis. Each intersection has four directions (east, south, west, north), each with three entering lanes (a left-turn lane, a right-turn lane and a straight lane) of width 4 meters. The traffic flow of the road network was sampled from a Gaussian distribution, based on an analysis of actual traffic flow patterns.
Real data: the Jinan and New York road networks from OpenStreetMap were used in the experiments. Table 2 summarizes the traffic flow statistics of these real-world road networks.
TABLE 2
Figure BDA0002787434150000122
Figure BDA0002787434150000131
In order to evaluate the performance of the model in the traffic light control problem, the model is compared with two classical heuristic methods and three recently proposed reinforcement learning methods.
FixedTime: the traffic light scheme of the intersection is selected from a predefined rule set with a fixed cycle; it is widely used in scenarios with stable traffic flow.
MaxPressure: the best-performing traffic light control method in the traffic field; at each intersection, the direction with the maximum pressure is set to green.
SimpleDqnOne: each intersection is controlled by its own agent, and agents do not exchange traffic conditions with each other.
NeighborDqnOne: building on SimpleDqnOne, the traffic conditions of the neighbor intersections of each central intersection are concatenated into the agent's input. All agents share the same network parameters but cannot distinguish the different traffic conditions of the neighbor intersections.
CoLight: the method selects a fixed number of neighbor intersections and uses an attention mechanism to aggregate the traffic condition information of the neighbors.
Pos-Light: the model proposed by the invention. It uses the proposed position-aware edge feature $e_{i,j} = (p_{i,j}, s_{i,j})$ to fuse the traffic conditions around each target intersection into a state representation of the intersection, which is then used as the input of the decision Q network.
PositionwithAtt: adds an attention mechanism on top of Pos-Light. Attention coefficients are learned dynamically from the traffic conditions of the surrounding intersections, so that the traffic conditions around the target intersection are aggregated more effectively.
Table 3 lists the performance of each model on the synthetic and real data, and Figures 7 to 10 show the convergence of each model on the different data sets. Pos-Light and its variant PositionwithAtt achieve consistent performance improvements over the most advanced traffic-field method (MaxPressure) and reinforcement learning method (CoLight) in all road networks and traffic settings. The greatest improvement on the synthetic data is 23.43% in the 4 × 4 road network, and the greatest improvement on the real data is 15.42% on the New York data set.
SimpleDqnOne is in most cases inferior to the other reinforcement learning methods, and even to the MaxPressure method from the traffic field, because each agent in that model makes decisions only from its own traffic conditions and does not communicate with other intersections in the traffic network. Compared with SimpleDqnOne, NeighborDqnOne considers the traffic conditions of adjacent intersections, but it directly concatenates the traffic information from upstream and downstream intersections without considering the different importance of adjacent intersections to the target intersection, so it performs poorly. CoLight, despite its attention mechanism, ignores the spatial location of intersections and is therefore less effective than Pos-Light and PositionwithAtt in all cases.
In addition to considering the location of the intersection, PositionwithAtt incorporates an attention mechanism to dynamically adjust the influence of neighboring intersections on the target intersection. PositionwithAtt can therefore extract better information from neighboring intersections with different traffic conditions, which makes learning more stable (see Figures 7 and 8 for details), and it performs better than Pos-Light in some cases.
Figure BDA0002787434150000141
The invention provides a position-aware deep reinforcement learning model for solving the multi-intersection traffic signal lamp control problem. In particular, the model takes into account the spatial location of each intersection and introduces position-aware edge information to help locate the intersection in the traffic network. In addition, the invention dynamically adjusts the influence of adjacent intersections on the target intersection based on an attention mechanism. The invention is the first to propose studying the spatial position of intersections to promote the coordinated control of traffic signal lamps. Extensive experiments on synthetic and real data sets show that the proposed model is more effective and more efficient than the latest methods.

Claims (1)

1. A method for regulating and controlling intersection traffic signal lamps with position perception is characterized by comprising the following concrete implementation processes:
step 1, performing mathematical modeling of a traffic signal control problem by using a reinforcement learning network model:
modeling the traffic network in the form of a graph, denoted $G = (V, E)$, where V is the set of intersections and E is the set of edges connecting two intersections; each intersection is regarded as an agent, and there are N intersections in total;
the state space, the action space and the reward are defined according to the traffic signal control problem as follows:
the state space is denoted S: $s_t \in S$ is the system state at time t and consists of the traffic condition information of all intersections in the traffic signal network;
the observation space is denoted O: $o_{i,t} \in O$ is the observation of agent i at time t; it consists of two parts: (1) the phase of the intersection at the current moment; (2) the number of vehicles on the entering lanes connected to the intersection;
the action space is denoted A: $a_t = \{a_{i,t}\}_{i=1}^{N}$ is the joint action of all agents at time t, i.e. the set of the individual actions $a_{i,t}$;
the reward is denoted R: $r_i^t$ is the reward of agent i at time t; specifically, it is the negative of the total number of vehicles on the entering lanes of the intersection represented by agent i, namely
$r_i^t = -\sum_{l} u_l^t$
where the sum runs over the entering lanes l of intersection i and $u_l^t$ is the number of vehicles in the entering lane l at time t;
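The observation and reward of step 1 can be sketched as below. The one-hot encoding of the phase and the function names are assumptions for illustration; the claim only states that the observation contains the current phase and the per-lane vehicle counts.

```python
def observation(phase_index, num_phases, lane_counts):
    """Concatenate a one-hot phase vector with the per-lane vehicle counts.

    The one-hot phase encoding is an assumption of this sketch.
    """
    one_hot = [1 if p == phase_index else 0 for p in range(num_phases)]
    return one_hot + list(lane_counts)


def reward(lane_counts):
    """Negative of the total number of vehicles on the entering lanes."""
    return -sum(lane_counts)


lane_counts = [2, 5, 0, 1, 3, 0]
o = observation(phase_index=1, num_phases=4, lane_counts=lane_counts)
r = reward(lane_counts)
```

Because the reward is the negated queue size, maximizing it pushes each agent to keep its entering lanes empty.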
step 2, preprocessing the original observation $o_i$ of the agent:
at time t, the original local observation of each agent is the concatenation of the number of vehicles on each lane and the current phase of the traffic signal; a multilayer perceptron maps the k-dimensional original observation $o_{i,t} \in R^k$ of agent i into an m-dimensional hidden space, and the output hidden state $h_{i,t} \in R^m$ represents the traffic condition of the i-th intersection at time t, where m is the dimension, according to the formula:
$h_{i,t} = \sigma(o_{i,t} W_o + b_o)$
where k is the feature dimension of $o_{i,t}$, $W_o \in R^{k \times m}$ and $b_o \in R^m$ are respectively the weight matrix and the bias of the hidden layer of the multilayer perceptron, and $\sigma$ is the ReLU activation function;
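The embedding of step 2 is a single ReLU layer, which the following pure-Python sketch reproduces with the shapes given above. The function names and the numeric values are illustrative assumptions.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def embed(o, W_o, b_o):
    """h = ReLU(o W_o + b_o); o has length k, W_o is k x m, b_o has length m."""
    k, m = len(W_o), len(b_o)
    if len(o) != k or any(len(row) != m for row in W_o):
        raise ValueError("shape mismatch")
    pre = [sum(o[r] * W_o[r][c] for r in range(k)) + b_o[c] for c in range(m)]
    return relu(pre)


o = [1.0, 2.0, 0.0]                            # k = 3
W_o = [[0.5, -1.0], [0.25, 0.5], [1.0, 1.0]]   # k x m with m = 2
b_o = [0.0, -0.5]
h = embed(o, W_o, b_o)
```

The second component is clipped to zero by the ReLU, since its pre-activation is negative.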
step 3, obtaining edge characteristics with position perception between the intelligent agents:
selecting all intersections within k hops of a target intersection i as the neighbor node set N(i), and then calculating, based on the intersection coordinates, the Euclidean distance d(i, j) between the target intersection i and each adjacent intersection j ∈ N(i) according to the following formula:
$d(i,j) = f_{\mathrm{distance}}(i, j; G_w)$
mapping d(i, j) to a number $p_{i,j}$ in the range [0, 1] to express the relative position relationship between intersections, according to the following formula:
Figure FDA0002787434140000021
finally, the edge feature $e_{i,j} = (p_{i,j}, s_{i,j})$ is obtained, representing the relative position and structure information of the adjacent intersection j with respect to the target intersection i;
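The position part of the edge feature in step 3 can be sketched as a breadth-first neighbor search followed by a distance normalization. Two points are assumptions of this sketch: the neighbor set is taken as all nodes within k hops, and the distance is mapped to [0, 1] by dividing by the largest neighbor distance, since the mapping formula itself is not reproduced in the text. The structure feature $s_{i,j}$ is not covered here.

```python
import math
from collections import deque


def k_hop_neighbors(adj, source, k):
    """Breadth-first search returning all nodes within k hops of `source`."""
    seen = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if seen[node] == k:
            continue  # do not expand beyond k hops
        for nxt in adj[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                frontier.append(nxt)
    return sorted(n for n in seen if n != source)


def relative_positions(coords, adj, i, k):
    """Map Euclidean distances from i to its neighbors into [0, 1].

    Assumption: distances are divided by the largest neighbor distance.
    """
    neighbors = k_hop_neighbors(adj, i, k)
    dist = {j: math.dist(coords[i], coords[j]) for j in neighbors}
    d_max = max(dist.values(), default=0.0)
    return {j: (d / d_max if d_max > 0 else 0.0) for j, d in dist.items()}


coords = {0: (0, 0), 1: (300, 0), 2: (600, 0), 3: (300, 400)}
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
p = relative_positions(coords, adj, i=0, k=2)
```

Any monotone mapping into [0, 1] would serve the same purpose of giving the network a bounded, comparable position signal.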
step 4, implementing the Pos-Light message passing model between agents to fuse traffic information, which comprises the following two stages:
1) fusing the edge feature information with the adjacent intersection information:
for any neighbor intersection j ∈ N(i), where N(i) is the set of adjacent intersections of the target intersection i and $e_{i,j} = (p_{i,j}, s_{i,j})$ is the feature of the edge (i, j), the traffic information of the neighbor intersection is encoded according to the two kinds of edge features, expressed as follows:
Figure FDA0002787434140000022
Figure FDA0002787434140000023
where a multilayer perceptron is used to preserve the spatial structure information of the adjacent intersection j relative to the target intersection i
Figure FDA0002787434140000024
$s_{i,j} \in R^l$, l is the number of neighbor nodes of the target intersection, $W_s \in R^{l \times m}$ is the weight matrix of the network, $b_s \in R^m$ is the bias of the network,
Figure FDA0002787434140000025
the traffic messages of the neighbor intersection
Figure FDA0002787434140000026
and
Figure FDA0002787434140000027
are combined, and the combined information is then encoded to obtain the final traffic message $h_{i,j}$ of the adjacent intersection j, which contains position information, expressed as follows:
Figure FDA0002787434140000028
where
Figure FDA0002787434140000031
$W_e \in R^{m \times n}$ is the weight matrix of the network, $b_e \in R^n$ is the bias of the network, and $h_{i,j} \in R^n$ is the position-aware message of the adjacent intersection j for the target intersection i;
2) updating the traffic condition representation of the target intersection:
at this stage, the traffic condition characterization of each intersection is updated by aggregating the traffic information around the target intersection i
Figure FDA0002787434140000032
Figure FDA0002787434140000033
where $W_h \in R^{n \times c}$ is the weight matrix of the network, $b_h \in R^c$ is the bias of the network, and
Figure FDA0002787434140000034
aggregates the important information about the traffic conditions around the target intersection i, so that the agent can make decisions more efficiently;
step 5, making the intersection traffic signal lamp regulation decision with the Q network:
for each agent (i.e., target intersection i),
Figure FDA00027874341400000310
is input to the Q network, and the agent selects an action from the output of the Q network using the ε-greedy algorithm: let ε = p, p ∈ [0, 1]; generate a random number q in the range [0, 1]; if q is less than ε, an action is selected at random from the selectable actions; otherwise, the action with the largest Q value is selected as the action of the agent at the current moment;
at time t, the Q value of each agent is:
Figure FDA0002787434140000035
where $W_d \in R^{c \times d}$ is the weight matrix of the Q network, $b_d \in R^d$ is the bias of the Q network, d is the size of the action space, $Q_{i,t} \in R^{|A|}$, and $Q_{i,t}(a)$ is the Q value corresponding to the action a;
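The decision rule of step 5 can be sketched as a Q head followed by ε-greedy selection. Treating the Q head as a single linear layer is an assumption based on the stated shapes of the weight matrix and bias; the input vector `x` stands in for the updated intersection representation, and all names and values are illustrative.

```python
import random


def q_head(x, W_d, b_d):
    """Q = x W_d + b_d; x has length c, W_d is c x d, b_d has length d.

    Assumption: a single linear layer, inferred from the stated shapes.
    """
    return [sum(x[r] * W_d[r][a] for r in range(len(x))) + b_d[a]
            for a in range(len(b_d))]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else argmax Q."""
    q = rng.random()  # random number in [0, 1)
    if q < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


x = [1.0, 2.0]                               # c = 2
W_d = [[1.0, 0.0, -1.0], [0.0, 0.5, 1.0]]    # c x d with d = 3
b_d = [0.0, 0.5, 0.0]
q_values = q_head(x, W_d, b_d)
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
explore_action = epsilon_greedy(q_values, epsilon=1.0, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps exploration reproducible during experiments.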
step 6, training toward the regulation target based on the Q network:
at each time t, the transition $(s_t, a_t, s_{t+1}, r_t)$ is stored into the experience pool D, where the global observation
Figure FDA0002787434140000036
Figure FDA0002787434140000037
Joint action
Figure FDA0002787434140000038
Reward
Figure FDA0002787434140000039
The loss function for updating the model is:
Figure FDA0002787434140000041
Figure FDA0002787434140000042
where T is the total number of time steps for a model update and N is the total number of intersections in the whole traffic network; the algorithm updates the parameters of the training network according to the loss function
Figure FDA0002787434140000043
after every g iterations, the parameters of the prediction network are copied to the parameters of the target network
Figure FDA0002787434140000044
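The training loop of step 6 follows the usual experience replay and target network pattern, which can be sketched as below. The discount factor `gamma`, the squared-error form of the loss and all class and function names are assumptions of this sketch; the exact loss of the claim is given only as an image.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool D of (s, a, s_next, r) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, s_next, r):
        self.data.append((s, a, s_next, r))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.data), batch_size)


def td_target(r, q_next_target, gamma):
    """One-step target r + gamma * max_a' Q_target(s', a') (gamma is assumed)."""
    return r + gamma * max(q_next_target)


def squared_td_loss(batch_q, batch_targets):
    """Mean squared error between predicted Q(s, a) and the TD targets."""
    return sum((t - q) ** 2 for q, t in zip(batch_q, batch_targets)) / len(batch_q)


def maybe_sync(step, g, online_params, target_params):
    """Copy the prediction network parameters into the target every g steps."""
    if step % g == 0:
        target_params.clear()
        target_params.update(copy.deepcopy(online_params))
        return True
    return False


pool = ReplayBuffer(capacity=3)
for t in range(5):
    pool.push(s=t, a=0, s_next=t + 1, r=-t)

y = td_target(r=-2.0, q_next_target=[0.5, 1.5, 1.0], gamma=0.5)
loss = squared_td_loss([1.0, -1.0], [y, -1.0])
online, target = {"W_d": [1, 2]}, {"W_d": [0, 0]}
synced = maybe_sync(step=10, g=5, online_params=online, target_params=target)
```

Keeping the target parameters fixed between synchronizations stabilizes the regression targets while the prediction network is being updated.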
CN202011302815.7A 2020-11-19 2020-11-19 Crossing traffic signal lamp regulation and control method with position sensing function Active CN112489464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011302815.7A CN112489464B (en) 2020-11-19 2020-11-19 Crossing traffic signal lamp regulation and control method with position sensing function

Publications (2)

Publication Number Publication Date
CN112489464A true CN112489464A (en) 2021-03-12
CN112489464B CN112489464B (en) 2022-06-28

Family

ID=74932102

Country Status (1)

Country Link
CN (1) CN112489464B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant