CN112489464A

CN112489464A - Crossing traffic signal lamp regulation and control method with position sensing function

Info

Publication number: CN112489464A
Application number: CN202011302815.7A
Authority: CN
Inventors: 郭健; 李克秋; 郝建业
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2021-03-12
Anticipated expiration: 2040-11-19
Also published as: CN112489464B

Abstract

The invention discloses a position-aware method for regulating and controlling intersection traffic signal lamps, comprising: step 1, using a reinforcement learning network model to carry out mathematical modeling of the traffic signal control problem, which comprises modeling the traffic network as a graph and defining a state space, an action space and a reward according to the traffic signal control problem; step 2, preprocessing the original observation value o_i of each agent; step 3, obtaining position-aware edge features between agents; step 4, implementing the Pos-Light message-passing model between agents; step 5, making the intersection traffic signal control decision with a Q network; and step 6, training toward the control objective based on the Q network. Compared with the prior art, (1) the agents make decisions more efficiently and converge faster, and the resulting policy effectively relieves traffic congestion; (2) the method outperforms current methods that use graph neural networks to solve the traffic signal control problem, converging faster and giving more stable results.

Description

Crossing traffic signal lamp regulation and control method with position sensing function

Technical Field

The invention relates to the technical field of traffic signal control, in particular to a position-aware method for regulating and controlling intersection traffic signal lamps.

Background

Traditional traffic light control formulates a control scheme in advance from collected historical traffic data. For example, the SCATS traffic signal control system calculates two indexes, the degree of saturation and the flow of the intersection, from the detection data of traffic detectors, and selects a suitable signal control scheme for the intersection from preset schemes. The SMOOTH traffic signal control system adopts a short-term prediction strategy to obtain the current state of the intersection from the collected traffic data, and then selects a corresponding scheme according to that state. Both SCATS and SMOOTH need to consider the various traffic conditions of an intersection in advance and provide a coping strategy for each; this solution requires manual intervention and cannot adapt well to dynamic traffic flow.

With the continuous progress of science and technology, traffic system research is becoming more intelligent. Researchers have begun exploring deep reinforcement learning methods for traffic signal control systems, combining artificial intelligence with traffic systems. In one prior approach, a deep reinforcement learning model is set up for each intersection, and each model learns to make decisions from its own observed state, thereby controlling the traffic lights of multiple intersections. Because every intersection has an independent model, this approach cannot be used in a large-scale traffic network; moreover, the independently learned models ignore the fact that adjacent intersections influence each other, so coordinated control of the intersections cannot be achieved. Another approach, considering that the decision at each intersection is also influenced by other intersections, concatenates the traffic conditions of the adjacent intersections with those of the intersection itself to form the traffic condition information of that intersection, and all intersections share the parameters of the network model. In most cases, however, concatenation may not be a reasonable way to fuse information from adjacent intersections, because it is difficult to determine the order in which the adjacent intersections should be concatenated. In addition, long short-term memory (LSTM) networks have been used to integrate historical traffic information into the current traffic state.

Among the traffic signal control methods based on deep reinforcement learning, some improve the state and reward settings but ignore the spatial relationship between intersections. Others do consider the spatial relationship among intersections: they model the traffic network as a graph, extract the information of adjacent intersections with a graph neural network, and integrate it into the traffic information of the central intersection for decision making. However, these methods consider only the connectivity of the traffic network and aggregate the traffic conditions of adjacent intersections equally, neglecting the spatial positions of these intersections, even though the spatial position of an intersection is important for cooperative traffic signal control.

In summary, existing deep-reinforcement-learning methods for traffic signal control ignore the position information of intersections, and the present invention is directed to remedying this defect.

Disclosure of Invention

Aiming at the defect that the prior art ignores the position information of the intersection, the invention provides a method for regulating and controlling the traffic signal lamp of the intersection with position perception, which combines the graph neural network with reinforcement learning by considering the position information in the traffic network graph and realizes the coordination control of the traffic signal lamps of a plurality of intersections in the traffic network.

The invention discloses a method for regulating and controlling intersection traffic lights with position perception, which comprises the following concrete implementation processes:

step 1, performing mathematical modeling of a traffic signal control problem by using a reinforcement learning network model:

the traffic network is modeled as a graph G = (V, E), where V is the set of intersections and E is the set of edges connecting two intersections; each intersection is regarded as an agent, and there are N intersections in total;

the state space, the action space and the reward are defined according to the traffic signal control problem as follows:

the state space is denoted as S: s_t ∈ S is the system state at time t and consists of the traffic condition information of all intersections in the traffic signal network;

the observation space is denoted as O: o_i^t ∈ O is the observed value of agent i at time t; it consists of two parts: (1) the phase of the intersection at the current time; (2) the number of vehicles on each entering lane connected to the intersection;

the action space is denoted as A: a_t ∈ A is the joint action of all agents at time t, i.e. the set of the individual actions a_{i,t};

the reward is denoted as R: at each time, r_i^t is the reward of agent i at time t; specifically, it is the negative of the total number of vehicles on the entering lanes of the intersection represented by agent i, i.e. the negated sum, over the entering lanes l, of the number of vehicles on lane l at time t;

step 2, preprocessing the original observation value o_i of each agent:

at time t, the original local observation of each agent is the concatenation of the number of vehicles on each lane and the current phase of the traffic signal; a multilayer perceptron maps the k-dimensional original observation o_i^t of agent i into an m-dimensional hidden space and outputs the hidden state h_{i,t} ∈ R^m, which represents the traffic condition of the i-th intersection at time t:

h_{i,t} = σ(o_i^t W_o + b_o)

wherein k is the feature dimension of o_i^t, W_o ∈ R^{k×m} and b_o ∈ R^m are respectively the weight matrix and bias of the hidden layer of the multilayer perceptron, and σ is the ReLU activation function;

step 3, obtaining edge characteristics with position perception between the intelligent agents:

all intersections within k hops of a target intersection i are selected as the neighbor node set N(i), and the Euclidean distance d(i, j) between the target intersection i and each adjacent intersection j ∈ N(i) is calculated from their coordinates as follows:

d(i, j) = f_distance(i, j; G_w)

d(i, j) is then mapped to a number p_{i,j} in the range [0, 1] to express the relative position relationship between the intersections;

finally, the edge feature e_{i,j} = (p_{i,j}, s_{i,j}) is obtained, representing the relative position and structure information of the adjacent intersection j with respect to the target intersection i;

step 4, implementing the Pos-Light message-passing model between agents to fuse traffic information, in the following two stages:

1) integrating edge feature information with adjacent-intersection information:

for any neighbor intersection j ∈ N(i), where N(i) is the set of adjacent intersections of the target intersection i, e_{i,j} = (p_{i,j}, s_{i,j}) is the feature of the edge (i, j); the traffic information of the neighbor intersection is encoded according to these two kinds of edge features;

a multilayer perceptron is used to preserve the spatial structure information of the adjacent intersection j relative to the target intersection i, wherein s_{i,j} ∈ R^l, l is the number of neighbor nodes of the target intersection, W_s ∈ R^{l×m} is the weight matrix of the network, and b_s ∈ R^m is the bias of the network;

the traffic message of the neighbor intersection and the encoded structure information are combined, and the combined information is then encoded to obtain the final traffic message h_{i,j} of the adjacent intersection j, which contains position information, wherein W_e ∈ R^{m×n} is the weight matrix of the network, b_e ∈ R^n is the bias of the network, and h_{i,j} ∈ R^n is the position-aware message of the adjacent intersection j for the target intersection i;

2) updating the traffic condition representation of the target intersection:

at this stage, the traffic condition representation of each intersection is updated by aggregating the traffic information around the target intersection i, wherein W_h ∈ R^{n×c} is the weight matrix of the network and b_h ∈ R^c is the bias of the network; aggregating the important information about the traffic conditions around the target intersection i enables the agent to make decisions more efficiently;

step 5, making the intersection traffic signal control decision with the Q network:

for each agent (i.e. target intersection i), the updated traffic condition representation is input into the Q network, and the agent selects an action from the output of the Q network using the ε-greedy algorithm: let ε = p with p ∈ [0, 1], and generate a random number q in the range [0, 1]; if q < ε, an action is selected at random from the selectable actions; otherwise, the action with the maximum Q value is selected as the action of the agent at the current time;

at time t, the Q value of each agent is computed by the Q network, wherein W_d ∈ R^{c×d} is the weight matrix of the Q network, b_d ∈ R^d is the bias of the Q network, d is the size of the action space, Q_{i,t} ∈ R^{|A|}, and Q_{i,t}(a) is the Q value corresponding to action a;

step 6, training toward the control objective based on the Q network:

the transition (s_t, a_t, s_{t+1}, r_t) at each time t is stored in the experience pool D, comprising the global observation, the joint action and the reward;

the model is updated by minimizing a loss function, wherein T is the total number of time steps used for the model update and N is the total number of intersections in the whole traffic network; the algorithm updates the parameters of the training network according to the update formula of the loss function;

after every g iterations, the parameters of the prediction network are copied to the target network.

Compared with the prior art, the invention can achieve the following positive technical effects:

(1) the intelligent agent has higher decision efficiency and higher convergence speed, and the finally obtained strategy effectively relieves traffic jam;

(2) the method outperforms current methods that use graph neural networks to solve the traffic signal control problem, with faster convergence and more stable results.

Drawings

FIG. 1 is an overall flow chart of a method for regulating and controlling intersection traffic lights with position sensing according to the present invention;

FIG. 2 is a schematic diagram of an interaction process model of a traffic environment and an agent;

FIG. 3 is a schematic view of a road network: (3a) a parallelogram ABCD; (3b) a parallelogram A'B'C'D';

FIG. 4 is a schematic diagram of a grid-type 4 × 4 road network;

FIG. 5 is a schematic view of an intersection configuration;

FIG. 6 is a schematic diagram of a model Pos-Light framework;

FIG. 7 is a graph comparing the performance of Pos-Light, PositionWithAtt, and other 3 RL methods (dashed lines) on a 3 × 3 road network during training;

FIG. 8 is a graph comparing the performance of Pos-Light, PositionWithAtt, and other 3 RL methods (dashed lines) on a 4 × 4 road network during training;

FIG. 9 is a graph comparing the performance of Pos-Light, PositionWithAtt, and other 3 RL methods (dashed lines) on the Jinan road network during training;

FIG. 10 is a graph comparing the performance of Pos-Light, PositionWithAtt, and other 3 RL methods (dashed lines) on the New York road network during training.

Detailed Description

The technical solution of the present invention is further explained with reference to the drawings and the embodiments.

The reinforcement learning agent interacts with the environment in discrete time steps; the interaction process is shown in Fig. 2. At each time step t, the agent obtains from the environment the state s_t of the current time and a reward r_t fed back by the environment, and then selects an action from the set of selectable actions and applies it to the environment. The environment then transitions to the next state s_{t+1} according to the selected action and feeds back a reward r_{t+1} to the agent. The goal of reinforcement learning is to maximize the cumulative reward.

Fig. 2 is a schematic diagram of an interaction process model of a traffic environment and an agent.

Fig. 3 is a schematic diagram of a road network. (1) Traffic conditions at neighboring intersections at different distances from the target intersection affect the traffic light decision at the target intersection differently. Let A be the target intersection and let intersections B and D be directly adjacent to A, with distances satisfying l_{A,B} > l_{A,D}. Given the speed limit of urban roads, a vehicle takes more time to travel from intersection B to intersection A than from intersection D to intersection A, so it is concluded that the impact of a neighboring intersection on the target intersection decreases as the distance increases. (2) Even if the distance between any two directly adjacent intersections in a given traffic network is known, the basic structural information of the network is still lost. For example, for a quadrilateral with given vertices A, B, C, D, the distances between directly adjacent intersections, i.e. l_{A,B} > l_{D,E}, l_{A,D} > l_{B,E}, do not determine the shape of the quadrilateral: it may be the parallelogram ABCD of (3a) or the square A'B'C'D' of (3b) in Fig. 3. However, if the distance between A and E is known, the shape of the quadrilateral can be uniquely identified. Accordingly, the invention proposes adding connecting edges between diagonal intersections to reduce the uncertainty of graph modeling. Moreover, the influence of the traffic condition at intersection E on traffic signal control at the target intersection A cannot be ignored: vehicles at intersection E affect signal control at the target intersection A by affecting the directly adjacent intersections and living areas.
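The ambiguity described above can be checked numerically. The following is a minimal sketch with hypothetical coordinates (a unit square and a rhombus whose sides all have length 1); the vertex labels follow the text, with E diagonal to A. The side lengths agree while the diagonal A-E differs, which is why a diagonal edge helps identify the shape.

```python
import math

def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical layouts: both quadrilaterals have four sides of length 1.
square = {"A": (0.0, 0.0), "B": (1.0, 0.0), "E": (1.0, 1.0), "D": (0.0, 1.0)}
angle = math.radians(60)
rhombus = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "E": (1.0 + math.cos(angle), math.sin(angle)),
    "D": (math.cos(angle), math.sin(angle)),
}

def side_lengths(quad):
    """Lengths of the four sides A-B, B-E, E-D, D-A."""
    sides = (("A", "B"), ("B", "E"), ("E", "D"), ("D", "A"))
    return [distance(quad[u], quad[v]) for u, v in sides]

def diagonal(quad):
    """Length of the diagonal between A and E."""
    return distance(quad["A"], quad["E"])

same_sides = all(
    math.isclose(s, r) for s, r in zip(side_lengths(square), side_lengths(rhombus))
)
different_diagonal = not math.isclose(diagonal(square), diagonal(rhombus))
```

Here `same_sides` and `different_diagonal` are both true: the side distances alone cannot separate the two layouts, but the A-E distance can.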

Step 1, performing mathematical modeling of a traffic signal control problem by using a reinforcement learning network model, comprising the following processes:

The traffic network is modeled as a graph G = (V, E), where V is the set of intersections and E is the set of edges connecting two intersections; each intersection is regarded as an agent, and there are N intersections in total. The state space, action space and reward are defined according to the traffic signal control problem:

the state space is denoted as S: s_t ∈ S is the system state at time t and consists of the traffic condition information of all intersections in the traffic signal network. Each agent can only observe its own traffic condition at the current time;

the observation space is denoted as O: o_i^t ∈ O is the observed value of agent i at time t. It consists of two parts: (1) the phase of the intersection at the current time; (2) the number of vehicles on each entering lane connected to the intersection;

the action space is denoted as A: a_t ∈ A is the joint action of all agents at time t, i.e. the set of the individual actions a_{i,t};

the reward is denoted as R: at each time, each agent has its own reward. The reward r_i^t of agent i at time t is the negative of the total number of vehicles on the entering lanes of the intersection represented by agent i, i.e. the negated sum, over the entering lanes l, of the number of vehicles on lane l at time t.
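As an illustration of the modeling in Step 1, the following minimal sketch builds a two-intersection graph and computes each agent's local observation and reward. The class name, lane identifiers and vehicle counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Intersection:
    """One agent: current signal phase plus vehicle counts on its entering lanes."""
    phase: int
    entering_lane_counts: dict  # hypothetical lane id -> number of vehicles

def observation(node):
    """Local observation: current phase followed by the per-lane vehicle counts."""
    lanes = sorted(node.entering_lane_counts)
    return [node.phase] + [node.entering_lane_counts[lane] for lane in lanes]

def reward(node):
    """Negative of the total number of vehicles on the entering lanes."""
    return -sum(node.entering_lane_counts.values())

# Graph G = (V, E) for a hypothetical two-intersection network.
V = {
    "i0": Intersection(phase=0, entering_lane_counts={"n": 3, "e": 1, "s": 0, "w": 2}),
    "i1": Intersection(phase=2, entering_lane_counts={"n": 0, "e": 4, "s": 1, "w": 0}),
}
E = {("i0", "i1")}

rewards = {name: reward(node) for name, node in V.items()}
```

With these counts, intersection `i0` holds six waiting vehicles and `i1` holds five, so the rewards are -6 and -5: fewer queued vehicles means a higher reward.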

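Step 2, preprocessing the original observation value o_i of each agent: a multilayer perceptron maps the k-dimensional original observation o_i^t of agent i into an m-dimensional hidden space and outputs the hidden state h_{i,t} = σ(o_i^t W_o + b_o) ∈ R^m, wherein k is the feature dimension of o_i^t, W_o ∈ R^{k×m} and b_o ∈ R^m are respectively the weight matrix and bias of the hidden layer of the multilayer perceptron, and σ is the ReLU activation function;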
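The preprocessing layer with weight matrix W_o, bias b_o and ReLU activation can be sketched in plain Python as follows. This is a single-layer illustration with hand-picked, hypothetical weights (k = 2, m = 2); a real model would learn W_o and b_o during training.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in vector]

def linear(x, weights, bias):
    """Compute x @ weights + bias for a length-k vector and a k x m matrix."""
    return [
        sum(x[row] * weights[row][col] for row in range(len(x))) + bias[col]
        for col in range(len(bias))
    ]

def preprocess(observation, W_o, b_o):
    """Map a k-dimensional raw observation to an m-dimensional hidden state."""
    return relu(linear(observation, W_o, b_o))

# Hypothetical parameters and observation.
W_o = [[1.0, -1.0],
       [0.5, 0.5]]
b_o = [0.0, -1.0]
hidden = preprocess([1.0, 2.0], W_o, b_o)
```

For this input the pre-activation values are 2.0 and -1.0, so the ReLU clips the second component and `hidden` is `[2.0, 0.0]`.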

Step 3, obtaining position-aware edge features between agents: all intersections within k hops of a target intersection i are selected as the neighbor node set N(i), and the Euclidean distance d(i, j) between the target intersection i and each adjacent intersection j ∈ N(i) is then calculated. To preserve the structure of the original traffic network, the invention also uses a 0-1 vector s_{i,j} to distinguish neighboring intersections. The edge distance together with this 0-1 vector can uniquely identify an intersection in the traffic network, i.e. the edge feature is position-aware. For example, Fig. 4 is a schematic diagram of a 4 × 4 grid traffic network. Given the distances between any two intersections in the colored intersection set, the topological structure between those intersections can be restored using techniques such as multi-dimensional scaling, and the spatial structure of other similar sets of four intersections can be restored in the same way. Moreover, the topology can be recovered by considering only second-hop neighbor information, without considering higher-hop neighbors. Thus, the position of an intersection can be located from the proposed edge features. Each edge is augmented with a distance attribute that implicitly accounts for the coordinates of the nodes and thereby encodes the position information of each target intersection i. In summary, the invention represents the edge feature between a target intersection i and its adjacent intersection j as e_{i,j}, which is composed of a distance-related value between i and j and the connection information of j relative to i.

First, the Euclidean distance d(i, j) between intersections i and j is calculated from their coordinates:

d(i, j) = f_distance(i, j; G_w)

Because a neighbor intersection at a shorter distance has a larger influence on the target intersection, the invention maps d(i, j) to a number p_{i,j} in the range [0, 1] to express the relative position relationship between the intersections.

Furthermore, an l_1-dimensional 0-1 vector s_{i,j} is added to distinguish adjacent intersections. Specifically, for a target intersection i, N(i) is the set of adjacent intersections, and l_1 is the maximum hop count relative to the target intersection i in N(i). If the adjacent intersection j in N(i) is located r hops from the target intersection i in the network, the r-th dimension of s_{i,j} is set to 1 and the remaining dimensions are set to 0.

Finally, the edge feature e_{i,j} = (p_{i,j}, s_{i,j}) is obtained, indicating the relative position and structure information of the adjacent intersection j with respect to the target intersection i.
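A minimal sketch of the edge-feature computation follows. The mapping from distance to [0, 1] is not reproduced in this text, so the exponential decay `exp(-d / scale)` used here, and the `scale` parameter, are assumptions chosen only because they shrink with distance; the 2 × 2 grid, its coordinates and the helper names are likewise hypothetical.

```python
import math
from collections import deque

def hop_counts(adjacency, source, max_hops):
    """Breadth-first search: hop distance to every node within max_hops of source."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[source]
    return hops

def edge_feature(coords, i, j, hop, max_hops, scale=300.0):
    """Return (p_ij, s_ij): a distance score in (0, 1] and a one-hot hop vector."""
    (xi, yi), (xj, yj) = coords[i], coords[j]
    d = math.hypot(xi - xj, yi - yj)      # Euclidean distance d(i, j)
    p = math.exp(-d / scale)              # assumed mapping, not the patent's formula
    s = [1 if r == hop else 0 for r in range(1, max_hops + 1)]
    return p, s

# Hypothetical 2 x 2 grid with 300 m blocks; E is diagonal to A.
adjacency = {"A": ["B", "D"], "B": ["A", "E"], "D": ["A", "E"], "E": ["B", "D"]}
coords = {"A": (0, 0), "B": (300, 0), "D": (0, 300), "E": (300, 300)}

neighbors = hop_counts(adjacency, "A", max_hops=2)
features = {j: edge_feature(coords, "A", j, hop, 2) for j, hop in neighbors.items()}
```

In this example B and D are one hop from A and E is two hops away, so E receives the one-hot vector `[0, 1]` and a smaller distance score than B.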

Step 4, implementing message passing between agents:

Communication between intersections in a multi-intersection traffic network is vital to cooperatively controlling traffic lights. The position-aware message-passing model (Pos-Light) provided by the invention preserves intersection position and structure information when passing messages. The Pos-Light model fuses traffic information in the following two stages:

1) Integrating edge feature information with adjacent-intersection information

For any neighbor intersection j ∈ N(i), where N(i) is the set of adjacent intersections of the target intersection i, e_{i,j} = (p_{i,j}, s_{i,j}) is the feature of the edge (i, j). The traffic information of the neighbor intersection is encoded according to these two kinds of edge features: the traffic message of the neighbor intersection and the structure information are combined, and the combined information is then encoded to obtain the message h_{i,j}, wherein W_e ∈ R^{m×n} is the weight matrix of the network, b_e ∈ R^n is the bias of the network, and h_{i,j} ∈ R^n is the position-aware message of the adjacent intersection j for the target intersection i.

2) Updating the traffic condition representation of the target intersection

Aggregating the important information about the traffic conditions around the target intersection i enables the agent to make decisions more efficiently.
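The two message-passing stages can be sketched as below. The exact expressions are not reproduced in this text, so two choices here are assumptions rather than the patented formulas: the neighbor state is scaled by p_{i,j} and added to the encoded structure vector, and neighbor messages are aggregated by summation. All weights are hypothetical identity-like values so the arithmetic is easy to follow.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in vector]

def linear(x, weights, bias):
    """Compute x @ weights + bias for a vector and a matrix given as a list of rows."""
    return [
        sum(x[row] * weights[row][col] for row in range(len(x))) + bias[col]
        for col in range(len(bias))
    ]

def neighbor_message(h_j, p_ij, s_ij, params):
    """Stage 1: fuse neighbor state h_j with edge feature (p_ij, s_ij)."""
    structure = relu(linear(s_ij, params["W_s"], params["b_s"]))   # R^l -> R^m
    combined = [p_ij * h + z for h, z in zip(h_j, structure)]      # assumed combination
    return relu(linear(combined, params["W_e"], params["b_e"]))    # R^m -> R^n

def update_target(messages, params):
    """Stage 2: aggregate neighbor messages (sum is an assumption) and re-encode."""
    total = [sum(column) for column in zip(*messages)]
    return relu(linear(total, params["W_h"], params["b_h"]))       # R^n -> R^c

# Hypothetical parameters with l = m = n = 2 and c = 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = {
    "W_s": identity, "b_s": [0.0, 0.0],
    "W_e": identity, "b_e": [0.0, 0.0],
    "W_h": [[1.0], [1.0]], "b_h": [0.0],
}

messages = [
    neighbor_message([2.0, 4.0], 0.5, [1, 0], params),  # nearby one-hop neighbor
    neighbor_message([1.0, 1.0], 1.0, [0, 1], params),  # two-hop neighbor
]
updated = update_target(messages, params)
```

With these values the two messages are `[2.0, 2.0]` and `[1.0, 2.0]`, and the updated representation of the target intersection is `[7.0]`.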

Step 5, making the intersection traffic signal control decision with the Q network:

A decision is made for the traffic signal lamp of each intersection according to the learned traffic conditions. For each agent (i.e. target intersection i), the updated traffic condition representation is input into the Q network, and the agent selects an action from the output of the Q network using the ε-greedy algorithm: let ε = p with p ∈ [0, 1], and generate a random number q in the range [0, 1]; if q < ε, one action is selected at random from the selectable actions; otherwise, the action with the maximum Q value is selected as the action of the agent at the current time.

At time t, the Q value of each agent is computed by the Q network, wherein W_d ∈ R^{c×d} is the weight matrix of the Q network, b_d ∈ R^d is the bias of the Q network, and d is the size of the action space; Q_{i,t} ∈ R^{|A|}, and Q_{i,t}(a) is the Q value corresponding to action a.
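The ε-greedy selection described in this step can be sketched as follows. The function name and the example Q values are hypothetical; the rule itself follows the text: draw a random number, explore if it is below ε, otherwise pick the action with the largest Q value.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))            # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

rng = random.Random(0)                 # seeded generator for reproducibility
q_values = [0.1, 0.9, 0.3, -0.2]       # hypothetical Q values for four phases
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
explored_action = epsilon_greedy(q_values, epsilon=1.0, rng=rng)
```

With ε = 0 the call always returns the index of the largest Q value (index 1 here); with ε = 1 it always returns a uniformly random valid action index.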

If each agent has its own model, it is not suitable for a traffic network with large-scale intersections. To scale up, the present invention allows all agents to share parameters and maintain one model.

Step 6, training toward the control objective based on the Q network: the Q network comprises a target network and a main network that have the same structure but whose parameters are not updated synchronously. The transition (s_t, a_t, s_{t+1}, r_t) at each time t is stored in the experience pool D, comprising the global observation, the joint action and the reward.

The model is updated by minimizing a loss function, wherein T is the total number of time steps used for the model update and N is the total number of intersections in the whole traffic network; the algorithm updates the parameters of the training network according to the update formula of the loss function.
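The loss expression is not reproduced in this text. As a hedged illustration only, the sketch below assumes the standard deep Q-learning squared temporal-difference error, averaged over T time steps and N agents, with a hypothetical discount factor `gamma`, together with a periodic copy of the main-network parameters into the target network. The function names and the toy Q functions are illustrative and not taken from the patent.

```python
import copy

def td_loss(batch, q_main, q_target, gamma=0.9):
    """Mean squared TD error over all time steps and agents in the batch.

    batch: list over time steps; each entry is a list over agents of
    (observation, action, reward, next_observation) tuples.
    """
    total, count = 0.0, 0
    for step in batch:
        for obs, action, reward, next_obs in step:
            target = reward + gamma * max(q_target(next_obs))  # assumed DQN target
            total += (target - q_main(obs)[action]) ** 2
            count += 1
    return total / count

def sync_target(iteration, g, main_params, target_params):
    """Copy main-network parameters into the target network every g iterations."""
    if iteration % g == 0:
        return copy.deepcopy(main_params)
    return target_params

# Toy example: constant Q functions and a single transition.
q_main = lambda obs: [0.0, 1.0]
q_target = lambda obs: [2.0, 0.0]
loss = td_loss([[(None, 1, -3.0, None)]], q_main, q_target)

main_params = {"W_d": [[0.5]]}
target_params = sync_target(iteration=10, g=5,
                            main_params=main_params,
                            target_params={"W_d": [[0.0]]})
```

In the toy transition the target is -3 + 0.9 × 2 = -1.2 and the predicted value is 1.0, so the loss is about 4.84. Keeping the target network fixed between synchronizations is a common way to stabilize the bootstrapped target.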

The agents communicate with each other to realize coordinated control of the signal lamps in the traffic network. The invention adopts a message-passing neural network framework to realize this communication: the initial traffic condition of each intersection is first preprocessed; the surrounding traffic conditions of each target intersection are then aggregated, based on the position-aware edge features, into the final traffic information of that intersection; and this information is finally input into a Q network for decision making. The whole implementation process is shown in Fig. 6.

The invention verifies the validity of the algorithm on intersections with four roads, where the entering lanes of each road consist of three lanes (straight, left-turn and right-turn), as shown in Fig. 5. Other types of intersections, such as intersections with only three roads or with only straight and left-turn lanes on each road, can be unified into the intersection type used in the experiment by zero padding.
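The zero-padding idea can be sketched as follows: every intersection is described by a fixed-length vector of 4 roads × 3 lanes, and lanes that do not exist contribute zeros. The road and movement names and the example counts are hypothetical.

```python
ROADS = ("north", "east", "south", "west")
MOVEMENTS = ("left", "straight", "right")

def padded_counts(lane_counts):
    """Return a fixed-length 12-entry vector; missing lanes are filled with 0."""
    return [lane_counts.get((road, move), 0) for road in ROADS for move in MOVEMENTS]

# Hypothetical T-junction: no northern road and no right-turn lane from the east.
t_junction = {
    ("east", "left"): 2, ("east", "straight"): 5,
    ("south", "left"): 1, ("south", "straight"): 0, ("south", "right"): 3,
    ("west", "left"): 0, ("west", "straight"): 4, ("west", "right"): 1,
}
vector = padded_counts(t_junction)
```

The resulting vector always has 12 entries, so a shared model can process three-road and four-road intersections with the same input layer.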

As shown in Table 1, the action space A_i of agent i consists of four cases.

TABLE 1

The achieved regulatory effect was evaluated as follows:

The invention performs experiments on CityFlow, a simulation platform that supports large-scale traffic signal control. CityFlow provides traffic conditions to the signal control method and executes the traffic signal actions issued by that method. The average travel time, in seconds, is used to evaluate the performance of the model. The average travel time of all vehicles is the most common metric for evaluating algorithms in the traffic field, and is calculated as the average time spent by all vehicles in the traffic network.

Experimental data: experiments were performed using synthetic and real-world traffic data; more traffic data are available on public websites.

Synthetic data: road networks of different scales (3 × 3 and 4 × 4) were used for performance analysis. Each intersection has four directions (east, south, west and north), each direction has three entering lanes (a left-turn lane, a right-turn lane and a straight lane), and each lane is 4 meters wide. The traffic flow of the road network was sampled from a Gaussian distribution according to an analysis of actual traffic flow patterns.

Real data: the Jinan and New York road networks from OpenStreetMap were used in the experiments. Table 2 summarizes the traffic flow statistics of these real-world road networks.
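The evaluation metric can be illustrated with a short sketch. The function name and the trip records (entry and exit times in seconds) are hypothetical; the sketch does not use the simulator's own API.

```python
def average_travel_time(trips):
    """Mean time in seconds that vehicles spend in the network.

    trips: iterable of (enter_time_s, leave_time_s) pairs, one per vehicle.
    """
    durations = [leave - enter for enter, leave in trips]
    return sum(durations) / len(durations)

# Two hypothetical vehicles: one spends 100 s in the network, the other 60 s.
metric = average_travel_time([(0, 100), (10, 70)])
```

For these two trips the metric is 80.0 seconds; a lower value indicates better signal control.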

TABLE 2

In order to evaluate the performance of the model in the traffic light control problem, the model is compared with two classical heuristic methods and three recently proposed reinforcement learning methods.

FixedTime: the traffic light control scheme of the intersection is selected from a predefined set of rules with a fixed cycle; it is widely applied in scenes with stable traffic flow.

MaxPressure: the best traffic light control method currently available in the traffic field; at each intersection, the phase with the maximum pressure is set to green.

SimpleDqnOne: each intersection is controlled by its own agent, and agents do not exchange traffic conditions.

NeighborDqnOne: on the basis of SimpleDqnOne, the traffic conditions of the neighbor intersections of each central intersection are concatenated with those of the agent itself; all agents share the same network parameters, but the method cannot distinguish the traffic conditions of different neighbor intersections.

CoLight: the method selects a fixed number of neighbor intersections and uses an attention mechanism to aggregate the traffic condition information of the neighbors.

Pos-Light: the model provided by the invention uses the proposed position-aware edge feature e_{i,j} = (p_{i,j}, s_{i,j}) to fuse the traffic conditions around each target intersection into a state representation of the intersection, and takes this state representation as the input of the decision Q network.

PositionWithAtt: an attention mechanism is added on the basis of Pos-Light; attention coefficients are learned dynamically according to the traffic conditions of the surrounding intersections, so that the surrounding traffic conditions of the target intersection are aggregated better.

Table 3 lists the performance of each model on the synthetic data and the real data, and Figs. 7 to 10 show the convergence of each model on the different data sets. Pos-Light and its variant PositionWithAtt achieve consistent performance improvements over the most advanced traffic-field method (MaxPressure) and reinforcement learning method (CoLight) in all road networks and traffic areas; the greatest improvement on the synthetic data sets is 23.43% on the 4 × 4 road network, and the greatest improvement on the real data sets is 15.42% on the New York data set.

SimpleDqnOne is in most cases inferior to the other reinforcement learning methods, and even to the MaxPressure method from the traffic field, because each agent in the model makes decisions only according to its own traffic conditions and does not communicate with other intersections in the traffic network. Compared with SimpleDqnOne, NeighborDqnOne considers the traffic conditions of adjacent intersections, but it directly concatenates the traffic information from upstream and downstream intersections without considering the different importance of the adjacent intersections to the target intersection, so its effect is poor. CoLight, which uses an attention mechanism, ignores the spatial position of the intersection and is therefore less effective than Pos-Light and PositionWithAtt in all cases.

In addition to considering the position of the intersection, PositionWithAtt incorporates an attention mechanism to dynamically adjust the impact of neighboring intersections on the target intersection. It can therefore extract better information from neighboring intersections with different traffic conditions, which makes learning more stable (see Figs. 7 and 8 for details), and it performs better than Pos-Light in some cases.

The invention provides a position-aware deep reinforcement learning model for solving the multi-intersection traffic signal control problem. In particular, the model takes the spatial position of the intersection into account and introduces position-aware edge features to help locate the intersection in the traffic network. In addition, the invention dynamically adjusts the influence of adjacent intersections on the target intersection based on the attention mechanism. The invention is the first to propose studying the spatial position of intersections to promote coordinated control of traffic signal lamps. Extensive experiments on synthetic and real data sets show that the proposed model is more effective and more efficient than the latest methods.

Claims

1. A method for regulating and controlling intersection traffic signal lamps with position perception is characterized by comprising the following concrete implementation processes:

the observation space is denoted as O;

the action space is denoted as A: the joint action at time t is the set of the actions a_{i,t} of all agents;

the reward is denoted as R: at each time, r_i^t is the reward of agent i at time t; specifically, it is the negative of the total number of vehicles on the entering lanes of the intersection represented by agent i, computed from the number of vehicles on each entering lane l at time t;

wherein k is the feature dimension of the original observation, W_o ∈ R^{k×m} and b_o ∈ R^m respectively represent the weight matrix and bias of the hidden layer of the multilayer perceptron, and σ is the ReLU activation function;

d(i, j) = f_distance(i, j; G_w)