CN116612636B - Signal lamp cooperative control method based on multi-agent reinforcement learning - Google Patents

Info

Publication number: CN116612636B
Application number: CN202310582760.7A
Authority: CN (China)
Prior art keywords: data, signal lamp, vehicle, agent, reinforcement learning
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other versions: CN116612636A
Other languages: Chinese (zh)
Inventors: 欧阳雅捷, 殷力, 郭艺雯, 赵阔
Current and original assignee: Jinan University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Jinan University, with priority to CN202310582760.7A
Publication of application CN116612636A; application granted; publication of grant CN116612636B
Classifications

    • G06F18/253 Pattern recognition; fusion techniques of extracted features
    • G06F18/213 Pattern recognition; feature extraction, e.g. by transforming the feature space
    • G06F18/251 Pattern recognition; fusion techniques of input or preprocessed data
    • G06N3/042 Neural networks; knowledge-based neural networks; logical representations of neural networks
    • G06N3/0464 Neural networks; convolutional networks [CNN, ConvNet]
    • G06N3/092 Neural network learning methods; reinforcement learning
    • G08G1/0104 Traffic control systems for road vehicles; measuring and analysing of parameters relative to traffic conditions
    • G08G1/0125 Traffic control systems for road vehicles; traffic data processing
    • G08G1/052 Detecting movement of traffic, with provision for determining speed or overspeed
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles
    • G08G1/095 Arrangements for giving variable traffic instructions; traffic lights

Abstract

The invention provides a signal lamp cooperative control method based on multi-agent reinforcement learning and multi-modal signal perception, which comprises the following steps: collecting data from various sensors, performing multi-modal definition, and acquiring information in real time through data fusion; cooperatively controlling the signal lamps and vehicles with a collaborative vehicle-road multi-agent reinforcement learning algorithm; preprocessing the collected sensor data and fusing the data of different modalities with a feature fusion method to construct a local state space for each agent; designing action spaces for the signal lamp agents and vehicle agents; designing a reward function for multi-agent reinforcement learning according to the traffic flow control target; designing a communication protocol suitable for the vehicle-road cooperative control scenario; and training the multi-agent reinforcement learning model with historical data or a simulation environment to find an optimal strategy. By introducing vehicles as agents, the invention achieves more effective vehicle-road coordination and further improves the traffic control effect.

Description

Signal lamp cooperative control method based on multi-agent reinforcement learning
Technical Field
The invention belongs to the field of vehicle-road coordination, and particularly relates to a signal lamp cooperative control method based on multi-agent reinforcement learning.
Background
With increasing urban traffic, traditional signal lamp control methods can no longer meet the efficient traffic demands of modern cities. To solve this problem, researchers have begun to employ Intelligent Transportation Systems (ITS) to improve road traffic efficiency. Among these, signal lamp control systems based on multi-agent reinforcement learning and multi-modal signal perception are attracting attention.
Conventional signal control methods are generally based on fixed signal periods or predetermined traffic flow patterns and lack adaptability to real-time traffic conditions. Therefore, a cooperative method that breaks through the limitations of conventional traffic signal control and improves adaptability to real-time traffic conditions is urgently needed.
Disclosure of Invention
The invention aims to provide a signal lamp cooperative control method based on multi-agent reinforcement learning, which realizes more effective vehicle-road cooperation by introducing vehicles as agents and further improves the traffic control effect.
In order to achieve the above object, the present invention provides a signal lamp cooperative control method based on multi-agent reinforcement learning, the method comprising:
s1, collecting data of various sensors, performing multi-mode definition, and acquiring information in real time through a data fusion technology;
s2, adopting a cooperative vehicle-road multi-agent reinforcement learning algorithm to cooperatively control the signal lamp and the vehicle, and finding an optimal strategy through learning to realize efficient traffic flow control;
s3, preprocessing the collected data of various sensors, and fusing the data of different modes by utilizing a characteristic fusion method to construct a local state space for each intelligent agent;
s4, designing an action space for the intelligent body;
s5, designing a reward function for multi-agent reinforcement learning according to the traffic flow control target;
s6, designing a communication protocol;
and S7, training the multi-agent reinforcement learning model by using historical data or simulation environment to find out an optimal strategy.
Further, the multi-modal definition in S1 includes a visual modality and a radar modality; the visual modality collects image data through a camera; the radar modality collects distance and speed information through a radar; the information includes road condition information, vehicle position, and speed.
Further, the data fusion technique specifically includes:
s1.1, modeling a scene into a graph structure;
s1.2, extracting features of data of each mode;
s1.3, feature fusion based on a graph convolution neural network;
s1.4, outputting the reinforcement learning state.
Further, the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm is defined as:
the bonus function is defined as R (s, a), where s represents a state and a represents an action of an agent, expressed as:
J(θ)=∑_t R(s_t,a_t)
wherein J (θ) represents an objective function; s_t represents the state of the agent at the moment t; a_t represents actions made by the agent at time t;
the loss function L (θ) is expressed as:
L(θ)=0.5*E[(R(s,a)+γ*max_a'Q(s',a';θ')-Q(s,a;θ))^2]
wherein E[·] represents an expected value; θ represents the parameters of the current network and θ' the parameters of the target network; γ is the discount factor; a' represents the action taken by the agent in state s'; Q(s, a; θ) is the action-value function, estimating the cumulative reward of taking action a in state s; Q(s', a'; θ') is the target network's value estimate for action a' in state s', used to measure how good action a' is in state s'.
Further, the implementation steps of the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm include:
s2.1, utilizing a centralized training and a distributed execution strategy, realizing the cooperation among the intelligent agents by performing the centralized training in a training stage, and making a decision according to a local state by using the distributed strategy by each intelligent agent in an execution stage;
s2.2, acquiring a comprehensive state space containing multi-mode information through the data of the various sensors in the step S1, so that an intelligent agent can more accurately sense traffic conditions;
s2.3, the vehicle and the signal lamp are used as different intelligent agents, so that cooperative control between the vehicle and the signal lamp is realized, and the traffic fluency is improved.
Further, the data in the step S3 includes vehicle data, road condition data and signal lamp data;
further, the state space is represented as follows:
S = {vehicle data, road condition data, signal lamp data}.
Further, the step S4 specifically includes:
s4.1, discretizing the control strategy of the signal lamp into a series of optional actions;
s4.2, dynamically adjusting the phase setting of the signal lamp according to the real-time road condition data;
s4.3, designing a self-adaptive signal lamp control strategy.
Further, the communication protocol includes vehicle-to-roadside-facility communication, communication between signal lamps, communication between the central controller and the signal lamps, and data fusion and processing.
Further, the step S7 specifically includes: training the multi-agent reinforcement learning model by using historical data or a simulation environment, finding an optimal strategy, and deploying the optimal strategy to a signal lamp control system.
The beneficial technical effects of the invention are at least as follows:
(1) The multi-mode signal perception technology provides abundant real-time traffic information for the signal lamp control system. By fusing data from various sensors such as cameras, radars, vehicle-mounted sensors and the like, the system can more accurately sense traffic conditions and provide more targeted decision basis for signal lamp control.
(2) By the multi-mode feature fusion method based on the graph convolution neural network, the topological relation between the vehicle and the signal lamp can be captured better. Meanwhile, through the method, the multi-mode signal perception data can be fused into a unified state space, and richer and more accurate information is provided for the multi-agent reinforcement learning algorithm.
(3) Through the optimal strategy of the invention, the signal lamp control system has stronger dynamic adjustment capability, can better adapt to continuously-changing traffic conditions in practical application, and improves the overall signal lamp control effect.
Drawings
The invention will be further described with reference to the accompanying drawings. The embodiments shown do not constitute any limitation of the invention, and one of ordinary skill in the art can obtain other drawings from the following drawings without inventive effort.
FIG. 1 is a flow chart of a signal lamp cooperative control method based on multi-agent reinforcement learning.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Example 1
In one or more embodiments, as shown in fig. 1, a signal lamp cooperative control method based on multi-agent reinforcement learning is disclosed, which includes the following steps:
s1, collecting data of various sensors, performing multi-mode definition, and acquiring information in real time through a data fusion technology; .
Specifically, step S1 is responsible for collecting data from various sensors (such as cameras, radars, vehicle-mounted sensors, etc.) and acquiring road condition information, vehicle positions, speeds, and other information in real time through data fusion. This information will be used to construct the state space for multi-agent reinforcement learning.
The multi-modal definition specifically includes:
a. Visual modality: the image data collected by the camera provides information on the position, shape, speed, etc. of vehicles. Cameras can be installed on the roadside or on vehicles and transmit image data in real time.
b. Radar modality: the distance and speed information collected by radar helps accurately detect the position, speed, and distance of vehicles. The radar may be installed on the roadside or on a vehicle and transmits distance data in real time. These data may be transmitted in real time to the signal control system via a vehicular communication system (e.g., V2X communication).
The step of data fusion in the step S1 specifically includes:
s1.1, modeling a scene into a graph structure. In the traffic light control scenario, there is a certain topological relationship between the vehicle and the lights. We can model a scene as a graph structure with vehicles and signal lights as nodes and their interrelationships as edges;
s1.2, extracting features of data of each mode. For each modality of data, feature extraction is first performed. For visual modalities, features may be extracted using Convolutional Neural Networks (CNNs); for radar modalities, features may be extracted using one-dimensional convolutional neural networks or Recurrent Neural Networks (RNNs); for the in-vehicle sensor modality, features may be extracted using a Fully Connected Neural Network (FCNN). The extracted features are then represented as node features.
S1.3, feature fusion based on the graph convolutional network. The extracted multi-modal features are input as node features into a graph convolutional network (GCN). The GCN can capture the topological relationships between nodes while preserving the node features. Through the propagation and aggregation operations of the GCN, comprehensive features containing the multi-modal information and the topological relations among the nodes are obtained.
S1.4, outputting the reinforcement learning state. The GCN-fused features are used as the reinforcement learning state space for the subsequent multi-agent reinforcement learning algorithm.
With this multi-modal feature fusion method based on the graph convolutional network, the topological relation between vehicles and signal lamps can be better captured. Meanwhile, this method fuses the multi-modal signal perception data into a unified state space, providing richer and more accurate information for the multi-agent reinforcement learning algorithm.
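The fusion pipeline of S1.1 to S1.4 can be sketched numerically. The following is a minimal, illustrative single-layer GCN propagation over a toy scene graph; the graph layout, feature dimensions, and random feature values are assumptions for illustration, not the patent's actual networks:

```python
import numpy as np

def gcn_fuse(adj, node_feats, weight):
    # One GCN propagation/aggregation step:
    # H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
    a_hat = adj + np.eye(adj.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt              # symmetric normalization
    return np.maximum(norm_adj @ node_feats @ weight, 0.0)  # ReLU

# Toy scene graph: two vehicles and one signal lamp, fully connected,
# each node carrying a 4-dim feature vector extracted from its modality.
adj = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
rng = np.random.default_rng(0)
feats = rng.random((3, 4))       # stand-in for CNN/RNN/FCNN features of S1.2
w = rng.random((4, 8))           # GCN weight matrix (learned; random here)
state = gcn_fuse(adj, feats, w)  # fused state, one row per agent
```

In a real system the node features would come from the CNN, RNN, and fully connected extractors of S1.2, and the GCN weights would be learned jointly with the reinforcement learning objective.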
S2, a cooperative vehicle-road multi-agent reinforcement learning algorithm is adopted to cooperatively control the signal lamp and the vehicle, wherein the signal lamp and the vehicle are regarded as a plurality of agents, and an optimal strategy is found through learning to realize efficient traffic flow control.
Specifically, the collaborative vehicle-road multi-agent reinforcement learning algorithm is called CVR-MARL, short for Collaborative Vehicle-Road Multi-Agent Reinforcement Learning; its objective is to maximize the system's cumulative reward. In this model, signal lamps and vehicles are treated as multiple agents that must find optimal strategies through learning to achieve efficient traffic flow control.
The objective function of CVR-MARL is defined as:
the bonus function is defined as R (s, a), where s represents the state (based on the multi-modal perceived feature fusion result), and a represents the actions of the agent, expressed as:
J(θ)=∑_t R(s_t,a_t)
wherein J (θ) represents an objective function; s_t represents the state of the agent at the moment t; a_t represents actions made by the agent at time t;
the loss function L (θ) is expressed as:
L(θ)=0.5*E[(R(s,a)+γ*max_a'Q(s',a';θ')-Q(s,a;θ))^2]
wherein E[·] represents an expected value; θ represents the parameters of the current network and θ' the parameters of the target network; γ is the discount factor; a' represents the action taken by the agent in state s'; Q(s, a; θ) is the action-value function, estimating the cumulative reward of taking action a in state s; Q(s', a'; θ') is the target network's value estimate for action a' in state s', used to measure how good action a' is in state s'.
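For one sampled transition, the loss L(θ) above reduces to a squared temporal-difference error. A minimal numeric sketch (tabular Q values instead of the patent's networks; all numbers are made up for illustration):

```python
def td_loss(r, gamma, q_next, q_sa):
    # 0.5 * (R(s,a) + gamma * max_a' Q(s',a';theta') - Q(s,a;theta))^2
    target = r + gamma * max(q_next)   # bootstrapped target from the target network
    return 0.5 * (target - q_sa) ** 2

# r = 1.0, gamma = 0.9, target-network values Q(s',.;theta') = [0.5, 2.0, 1.0],
# current estimate Q(s,a;theta) = 2.5  ->  target = 2.8, loss = 0.5 * 0.3^2
loss = td_loss(r=1.0, gamma=0.9, q_next=[0.5, 2.0, 1.0], q_sa=2.5)
```

In training, this per-transition error is averaged over a batch to approximate the expectation E[·] in L(θ).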
The implementation of this objective specifically includes the following steps:
s2.1, multi-agent cooperation: the intelligent training system has the advantages that the centralized training and the distributed execution strategy are utilized, the cooperation among the intelligent agents is realized through the centralized training in the training stage, and in the execution stage, each intelligent agent (including vehicles and signal lamps) uses the distributed strategy to make a decision according to the local state;
s2.2, constructing a state space: the comprehensive state space containing multi-mode information is obtained through the data of the various sensors in the step S1, so that an intelligent agent can more accurately sense traffic conditions and construct the state space;
s2.3, vehicle-road cooperation: the vehicle and the signal lamp are used as different intelligent bodies, so that cooperative control between the vehicle and the signal lamp is realized, and the traffic fluency is improved.
S3, preprocessing the collected data of various sensors, and fusing the data of different modalities with a feature fusion method to construct a local state space for each agent.
Specifically, a local state space is built for each agent (signal lamps and vehicles) from the data collected by the multi-modal signal perception module. CVR-MARL collects multi-modal data covering three aspects, namely vehicles, road conditions, and signal lamps, comprising the following data:
vehicle data:
a) Position information: longitude and latitude, heading angle, etc. of the vehicle.
b) Speed information: real-time speed of the vehicle.
c) Acceleration information: real-time acceleration of the vehicle.
d) Vehicle type: such as cars, trucks, buses, etc.
e) Vehicle communication data: communication information between vehicles, such as vehicle-to-vehicle (V2V) messages.
Road condition data:
a) Road structure: the width of the road, the number of lanes, the dividing strips and the like.
b) Traffic flow: the number, density, etc. of vehicles in each lane.
c) Road environment information: road conditions, weather, lighting, etc.
Signal lamp data:
a) Status information: the current state of the signal lamp (red, green, yellow).
b) Remaining time: the remaining time of the signal lamp state change.
c) Signal lamp control strategy: such as fixed cycle control, inductive control, etc.
d) Vehicle-signal lamp communication data: communication information between the vehicle and the signal lamp, such as vehicle-to-infrastructure (V2I) data.
From the collected multimodal data, a state space can be constructed as follows:
S = {vehicle data, road condition data, signal lamp data}
When constructing the state space, the collected multi-modal data must be preprocessed to eliminate differences in units and scale between the data; for example, the data may be normalized into the same range. A feature fusion method is then used to fuse the data of different modalities into a comprehensive state representation. This representation makes full use of the multi-modal information and provides richer environment perception for CVR-MARL, so as to better realize cooperative control of vehicles and roads.
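One way the normalization and fusion could look, as a sketch; the field names, value ranges, and the simple concatenation below are assumptions for illustration:

```python
def min_max(x, lo, hi):
    # Scale a raw reading into [0, 1] to remove unit/scale differences.
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def build_state(vehicle, road, light):
    # Fuse the three data sources into one flat, dimensionless state vector.
    return [
        min_max(vehicle["speed"], 0.0, 30.0),    # m/s, assumed max 30
        min_max(road["queue_len"], 0.0, 50.0),   # vehicles, assumed max 50
        min_max(light["remaining"], 0.0, 60.0),  # seconds, assumed max 60
    ]

s = build_state({"speed": 15.0}, {"queue_len": 10.0}, {"remaining": 30.0})
```

Every component of the resulting vector lies in [0, 1], so no single modality dominates the learned value function by scale alone.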
S4, designing a feasible action space for the signal lamp agents and the vehicle agents.
Specifically, a feasible action space is designed for the signal lamp agents and vehicle agents. When designing the action space, the control strategy of the signal lamp must be considered so that the action space can adapt to different road conditions. To realize a dynamic signal lamp action space, the following methods are adopted:
s4.1, discretizing the control strategy of the signal lamp into a series of optional actions, for example, the signal lamp can be divided into a plurality of discrete actions according to parameters such as phase, duration, change rate and the like of the signal lamp. The method simplifies the representation of the action space and is convenient for the reinforcement learning algorithm to explore and optimize.
And S4.2, dynamically adjusting the phase setting of the signal lamp according to the real-time road condition data, for example, increasing green light time on a lane with larger traffic flow so as to relieve congestion. In addition, the phase sequence and the time length can be adjusted according to the road structure, the vehicle type, the weather and other factors so as to improve the traffic efficiency.
S4.3, designing a self-adaptive signal lamp control strategy, for example, adopting induction control at the intersection with smaller flow and adopting cooperative control at the intersection with larger flow. In addition, parameters of the control strategy can be dynamically adjusted according to the real-time traffic data so as to adapt to the change of road conditions.
The action space is expressed as:
A = {action 1, action 2, ...}
Wherein each action corresponds to a signal lamp control strategy or parameter setting. By dynamically adjusting the action space and the control strategy, intelligent control of the signal lamp can be realized, so that traffic efficiency and safety are improved.
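The discretization of S4.1 could be enumerated as follows; the phase names and durations are assumed example values, not prescribed by the patent:

```python
from itertools import product

PHASES = ["NS_green", "EW_green"]   # assumed two-phase intersection
DURATIONS = [20, 30, 40]            # assumed green durations in seconds

# Each discrete action fixes one phase and one duration for the next cycle.
ACTIONS = [{"phase": p, "duration": d} for p, d in product(PHASES, DURATIONS)]
```

This yields 2 x 3 = 6 discrete actions per signal-lamp agent; adding further parameters such as change rate or phase offsets would enlarge the set the same way.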
S5, designing a proper reward function for multi-agent reinforcement learning according to the traffic flow control target.
Specifically, a suitable reward function is designed for multi-agent reinforcement learning based on the goals of traffic flow control (e.g., reducing congestion and emissions). The reward function needs to balance various factors to achieve an optimal vehicle-road cooperative control effect. Specifically, the transition reward function R(s, a, s') is designed as follows:
R(s,a,s')=w1*T(s,a,s')+w2*D(s,a,s')+w3*S(s,a,s')
wherein:
s: the current state;
a: the action performed by the agent;
s': the new state after the action is executed;
w1, w2, w3: weight parameters used to balance the importance of each indicator;
T(s, a, s'): traffic efficiency indicator, such as the average speed of vehicles passing through the intersection or their waiting time;
D(s, a, s'): traffic congestion indicator, such as the queue length or the number of waiting vehicles at the intersection;
S(s, a, s'): traffic safety indicator, such as the probability of a traffic accident or the safety distance between vehicles and pedestrians.
The design of the reward function needs to take into account aspects such as traffic efficiency, congestion level and safety, so as to guide the intelligent agent to make decisions beneficial to the overall traffic condition. In practical application, the weight parameters and the index function can be adjusted according to specific scenes and requirements so as to achieve a better control effect.
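A direct transcription of R(s, a, s') = w1*T + w2*D + w3*S as code; the weight values, the sample indicator values, and the choice to penalize congestion with a negative sign are all assumptions (the patent leaves the sign and scale of each term to the weight settings):

```python
def reward(t_eff, d_cong, s_safe, w1=0.5, w2=0.3, w3=0.2):
    # Weighted sum of efficiency, congestion, and safety indicators.
    # Congestion hurts the objective, so it enters negatively here
    # (a design assumption, not mandated by the patent).
    return w1 * t_eff + w2 * (-d_cong) + w3 * s_safe

# Example: good efficiency (0.8), mild congestion (0.4), fully safe (1.0).
r = reward(t_eff=0.8, d_cong=0.4, s_safe=1.0)
```

Tuning w1, w2, w3 per scenario is exactly the adjustment the preceding paragraph describes.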
S6, designing a communication protocol suitable for a vehicle-road cooperative control scene.
Specifically, a communication protocol suitable for the vehicle-road cooperative control scenario is designed so that information can be exchanged efficiently and safely between agents. The communication protocol needs to account for low latency, high reliability, and security. Any specific devices capable of communicating in these ways may be used, provided the purpose of the invention is achieved.
The communication protocol comprises vehicle and road side facility communication, communication between signal lamps, communication between a central controller and the signal lamps, and data fusion and processing, and specifically comprises the following steps:
a) Vehicle-to-roadside-facility communication (vehicle-road communication): two-way communication between vehicles and roadside equipment (e.g., signal lamps, sensors) is achieved using Dedicated Short Range Communication (DSRC) or vehicle-to-everything (V2X) technology. A vehicle may send its own status information (e.g., position, speed, direction of travel) to the roadside facility while receiving instructions from it (e.g., signal changes, speed limit information).
b) Communication between signal lamps: information exchange between the signal lamps can be realized through a Wireless Sensor Network (WSN) or a cellular network so as to carry out cooperative control. With this communication mechanism, adjacent traffic lights may share local traffic information, such as traffic volume, waiting time, etc.
c) The central controller communicates with the signal lamp: the central controller communicates with the individual signal lights via a wired or wireless network. The central controller is responsible for processing information from the signal lights and the vehicle, running a multi-agent reinforcement learning algorithm (CVR-MARL), and sending control strategies to the corresponding signal lights.
d) Data fusion and processing: the multi-mode signal sensing module sends the collected data to the data processing module. The data processing module is responsible for carrying out feature fusion on the multi-mode data, constructing a state space and transmitting the state space to the multi-agent reinforcement learning algorithm.
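A sketch of what a vehicle-to-roadside status report from item a) might look like on the wire; the message schema and field names are invented for illustration and are not part of any V2X standard cited by the patent:

```python
import json

def make_v2i_message(vehicle_id, lat, lon, speed, heading):
    # Vehicle -> roadside unit status report, serialized for transport.
    return json.dumps({
        "type": "V2I_STATUS",
        "vehicle_id": vehicle_id,
        "position": {"lat": lat, "lon": lon},
        "speed": speed,      # m/s
        "heading": heading,  # degrees clockwise from north
    }, sort_keys=True)

msg = make_v2i_message("veh-42", 23.13, 113.26, 12.5, 90.0)
decoded = json.loads(msg)    # what the roadside unit would parse
```

The data processing module of item d) would aggregate such messages with signal-lamp and sensor reports before feature fusion.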
And S7, training the multi-agent reinforcement learning model by using historical data or simulation environment to find out an optimal strategy.
Specifically, the multi-agent reinforcement learning model is trained using historical data or a simulation environment to find an optimal strategy. In practical applications, besides deploying the optimal strategy to the signal lamp control system, we need to consider the dynamic adjustment capability to better adapt to the continuously changing traffic conditions in practical applications. To achieve this goal, we can take the following strategies:
s7.1, online learning and updating: by online learning and updating the strategy, we can update the reinforcement learning model in real time according to the current traffic conditions. This means that our system can constantly learn and adapt to the actual traffic environment, thereby improving the effect of signal lamp control.
S7.2, exploration-utilization tradeoff: in the implementation phase we need to make a trade-off between exploration and utilization. By introducing a certain degree of exploration, we can make the model try new strategies continuously in practical application to find a better solution. However, excessive exploration may reduce the stability of the system. Thus, we need to find a suitable balance between exploration and utilization.
S7.3, abnormal condition processing: in practical applications, some abnormal situations may occur, such as traffic accidents, road closure, etc. For these situations, it is desirable to design a set of exception handling mechanisms so that the system can automatically adjust when these problems are encountered. For example, when a road closure is detected, the system may automatically adjust the traffic light strategy to guide the vehicle around.
S7.4, real-time feedback and adjustment: to further enhance the dynamic tuning capabilities of the system, we can introduce a real-time feedback mechanism into the signal lamp control system. By collecting real-time traffic data and comparing the real-time traffic data with the model prediction result, the system can be self-adjusted according to actual conditions, so that the system is better suitable for actual traffic conditions.
Through the strategy, the signal lamp control system has strong dynamic adjustment capability, can be better adapted to continuously changing traffic conditions in practical application, and improves the overall signal lamp control effect.
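The training-and-exploration loop of S7.1 and S7.2 can be caricatured in a few lines. The single-state Q-learning update and the toy "simulator" below are stand-ins for the patent's CVR-MARL model and simulation environment, chosen only to make the epsilon-greedy exploration-utilization trade-off concrete:

```python
import random

def train(env_step, actions, episodes=50, eps=0.2, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}      # single-state Q-table for brevity
    for _ in range(episodes):
        # exploration-utilization trade-off (S7.2)
        a = rng.choice(actions) if rng.random() < eps else max(q, key=q.get)
        r = env_step(a)                # reward from history or simulation
        # online update of the value estimate (S7.1)
        q[a] += alpha * (r + gamma * max(q.values()) - q[a])
    return q

# Toy simulator: at this intersection a longer green performs better.
q = train(lambda a: 1.0 if a == "long_green" else 0.1,
          ["short_green", "long_green"])
```

Raising eps increases exploration of new strategies at the cost of stability, which is precisely the balance S7.2 calls for.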
While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (7)

1. The signal lamp cooperative control method based on multi-agent reinforcement learning is characterized by comprising the following steps:
s1, collecting data of various sensors, performing multi-mode definition, and acquiring information in real time through a data fusion technology;
s2, adopting a cooperative vehicle-road multi-agent reinforcement learning algorithm to cooperatively control the signal lamp and the vehicle, and finding an optimal strategy through learning to realize efficient traffic flow control;
s3, preprocessing the collected data of various sensors, and fusing the data of different modes by utilizing a characteristic fusion method to construct a local state space for each intelligent agent;
s4, designing an action space for the intelligent body;
s5, designing a reward function for multi-agent reinforcement learning according to the traffic flow control target;
s6, designing a communication protocol;
s7, training the multi-agent reinforcement learning model by using historical data or simulation environment to find an optimal strategy;
the multi-mode definition in S1 comprises a visual mode and a radar mode; the visual mode collects image data through a camera; the radar mode collects distance and speed information through radar; the information comprises road condition information and vehicle position and speed;
the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm is defined through the reward function R(s, a), where s represents a state and a represents an action of an agent:
J(θ) = ∑_t R(s_t, a_t)
wherein J(θ) represents the objective function; s_t represents the state of the agent at time t; a_t represents the action taken by the agent at time t;
the loss function L(θ) is expressed as:
L(θ) = 0.5 * E[(R(s,a) + γ * max_a' Q(s',a';θ') - Q(s,a;θ))^2]
wherein E[·] denotes the expected value, θ denotes the parameters of the current network, θ' denotes the parameters of the target network, γ is the discount factor, a' denotes the action taken by the agent in state s', Q(s, a; θ) is the action-value function estimating the cumulative reward for taking action a in state s, and Q(s', a'; θ') denotes the target network's value estimate for action a' in state s', used to measure the quality of taking action a' in state s';
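The loss above is the standard DQN temporal-difference loss. A minimal numerical sketch with tabular Q-values standing in for the current and target networks (the transitions, state/action counts, and values below are made-up illustrations, not data from the patent):

```python
import numpy as np

def dqn_loss(batch, q_current, q_target, gamma=0.9):
    """L(θ) = 0.5 * E[(R(s,a) + γ * max_a' Q(s',a';θ') - Q(s,a;θ))^2].

    q_current -- Q(s,a;θ), the current agent's value table
    q_target  -- Q(s',a';θ'), the target network's value table
    batch     -- iterable of (s, a, R, s') transitions
    """
    errors = []
    for s, a, r, s_next in batch:
        td_target = r + gamma * np.max(q_target[s_next])   # bootstrapped return
        errors.append((td_target - q_current[s, a]) ** 2)  # squared TD error
    return 0.5 * float(np.mean(errors))                    # 0.5 * E[...]

# Two states, two actions; values chosen only for illustration.
q_current = np.array([[1.0, 0.5], [0.2, 0.8]])
q_target  = np.array([[1.0, 0.5], [0.2, 0.8]])
batch = [(0, 0, 1.0, 1),   # (s, a, R, s')
         (1, 1, 0.0, 0)]
loss = dqn_loss(batch, q_current, q_target)  # ≈ 0.1321
```

In training, the gradient of this loss with respect to θ updates the current network, while θ' is periodically copied from θ.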
the implementation steps of the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm comprise:
s2.1, utilizing a centralized-training, distributed-execution strategy: cooperation among the agents is achieved through centralized training in the training stage, and in the execution stage each agent makes decisions from its local state using its distributed policy;
s2.2, acquiring a comprehensive state space containing multi-mode information from the data of the various sensors in step S1, so that the agents can perceive traffic conditions more accurately;
s2.3, taking the vehicles and the signal lamps as different agents, thereby realizing cooperative control between vehicles and signal lamps and improving traffic fluency.
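The distributed-execution side of S2.1 can be sketched as below: after centralized training, each agent (a signal lamp or a vehicle) acts only on its local observation. The toy threshold rule stands in for the learned policy network, and all names and values here are illustrative assumptions.

```python
def local_policy(agent_id, local_state):
    """Distributed execution (S2.1): decide from the local state only.

    A trained system would evaluate the agent's learned policy network
    here; this toy rule extends green when the local queue is long.
    """
    return "extend_green" if local_state["queue_len"] > 5 else "switch_phase"

# Each agent observes only its own approach queues (local state).
agents = {"light_1": {"queue_len": 8},
          "light_2": {"queue_len": 2}}
actions = {aid: local_policy(aid, s) for aid, s in agents.items()}
```

Centralized training would additionally expose the joint state of all agents to a shared critic; only the per-agent policies above are deployed at execution time.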
2. The signal lamp cooperative control method based on multi-agent reinforcement learning according to claim 1, wherein the data fusion technology is specifically:
s1.1, modeling a scene into a graph structure;
s1.2, extracting features of data of each mode;
s1.3, feature fusion based on a graph convolution neural network;
s1.4, outputting the reinforcement learning state.
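Steps S1.1 through S1.4 (graph modeling, per-mode feature extraction, graph-convolution fusion, state output) can be sketched as one graph-convolution layer H' = ReLU(Â H W). The adjacency, feature dimensions, and weights below are illustrative assumptions, not the patented network.

```python
import numpy as np

def gcn_fuse(adj, features, weight):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W),
    where A_hat is the symmetrically normalized adjacency with self-loops."""
    a = adj + np.eye(adj.shape[0])                  # add self-loops (S1.1 graph)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt             # symmetric normalization
    return np.maximum(a_hat @ features @ weight, 0.0)  # ReLU activation

# 3 graph nodes (e.g. camera, radar, signal lamp) with 2-dim per-mode
# features (S1.2), fused into the reinforcement-learning state (S1.4).
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])
w = np.eye(2)                 # identity weight, purely for illustration
state = gcn_fuse(adj, feats, w)
```

In practice W is learned, and the fused node features form the local state space each agent consumes in step S3.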
3. The method according to claim 1, wherein the data in step S3 includes vehicle data, road condition data, and signal lamp data.
4. The signal lamp cooperative control method based on multi-agent reinforcement learning and multi-mode signal sensing according to claim 3, wherein the state space is represented as follows:
s= { vehicle data, road condition data, signal lamp data }.
5. The signal lamp cooperative control method based on multi-agent reinforcement learning according to claim 1, wherein the step S4 specifically includes:
s4.1, discretizing the control strategy of the signal lamp into a series of optional actions;
s4.2, dynamically adjusting the phase setting of the signal lamp according to the real-time road condition data;
s4.3, designing a self-adaptive signal lamp control strategy.
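The discretized action space of S4.1 and the adaptive adjustment of S4.2-S4.3 can be sketched as follows; the phase names, queue thresholds, and 5-second increments are illustrative assumptions, not values from the patent.

```python
# Discretized signal-lamp action space (S4.1): a series of optional actions.
ACTIONS = ("keep_phase", "next_phase", "extend_green_5s", "shorten_green_5s")

def adaptive_action(ns_queue, ew_queue, current_phase):
    """Pick a discrete action from real-time queue lengths (S4.2-S4.3)."""
    if current_phase == "NS_green" and ns_queue > ew_queue + 3:
        return "extend_green_5s"   # keep serving the heavier direction
    if current_phase == "NS_green" and ew_queue > ns_queue + 3:
        return "next_phase"        # switch to relieve the cross street
    return "keep_phase"            # balanced demand: hold the current phase

a1 = adaptive_action(ns_queue=10, ew_queue=2, current_phase="NS_green")
a2 = adaptive_action(ns_queue=1, ew_queue=9, current_phase="NS_green")
```

In the full method this hand-written rule is replaced by the learned policy of claim 1, which selects from the same discrete action set.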
6. The multi-agent reinforcement learning based signal lamp cooperative control method according to claim 1, wherein the communication protocol includes vehicle-to-road side facility communication, communication between signal lamps, central controller-to-signal lamp communication, and data fusion and processing.
7. The signal lamp cooperative control method based on multi-agent reinforcement learning according to claim 1, wherein the step S7 is specifically: training the multi-agent reinforcement learning model by using historical data or a simulation environment, finding an optimal strategy, and deploying the optimal strategy to a signal lamp control system.
CN202310582760.7A 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning Active CN116612636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310582760.7A CN116612636B (en) 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310582760.7A CN116612636B (en) 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning

Publications (2)

Publication Number Publication Date
CN116612636A CN116612636A (en) 2023-08-18
CN116612636B true CN116612636B (en) 2024-01-23

Family

ID=87677681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310582760.7A Active CN116612636B (en) 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN116612636B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559530A (en) * 2019-01-07 2019-04-02 大连理工大学 A multi-intersection signal lamp cooperative control method based on Q-value-transfer deep reinforcement learning
CN111583675A (en) * 2020-05-14 2020-08-25 吴钢 Regional road network traffic signal lamp coordination control system and method
CN112700663A (en) * 2020-12-23 2021-04-23 大连理工大学 Multi-agent intelligent signal lamp road network control method based on deep reinforcement learning strategy
WO2022110611A1 (en) * 2020-11-26 2022-06-02 东南大学 Pedestrian road-crossing behavior prediction method for plane intersection
CN114627657A (en) * 2022-03-09 2022-06-14 哈尔滨理工大学 Adaptive traffic signal control method based on deep graph reinforcement learning
CN114638339A (en) * 2022-03-10 2022-06-17 中国人民解放军空军工程大学 Intelligent agent task allocation method based on deep reinforcement learning
KR20220102395A (en) * 2021-01-13 2022-07-20 부경대학교 산학협력단 System and Method for Improving of Advanced Deep Reinforcement Learning Based Traffic in Non signalalized Intersections for the Multiple Self driving Vehicles
CN115731724A (en) * 2022-11-17 2023-03-03 北京航空航天大学 Regional traffic signal timing method and system based on reinforcement learning
CN115861632A (en) * 2022-12-20 2023-03-28 清华大学 Three-dimensional target detection method based on visual laser fusion of graph convolution


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Research on a cooperative control algorithm for traffic signal lights based on multi-agent reinforcement learning; Cong Shan; China Masters' Theses Full-text Database, Engineering Science & Technology II (No. 1); pp. 19-24 *
Research on traffic signal control algorithms based on multi-agent deep reinforcement learning; Yang Shantian; China Doctoral Dissertations Full-text Database, Engineering Science & Technology II (No. 1); pp. 66-72 *
A traffic signal control method based on deep reinforcement learning; Sun Hao et al.; Computer Science; Vol. 47, No. 2; pp. 169-174 *
Big-data-driven "cloud-edge" coordinated decision-making and optimization for fast-moving consumer goods terminal visits; Zhao Kuo et al.; Journal of Mechanical Engineering; Vol. 59; pp. 1-11 *

Also Published As

Publication number Publication date
CN116612636A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN107507430B (en) Urban intersection traffic control method and system
CN109767630B (en) A kind of traffic signal control system based on bus or train route collaboration
CN111222630B (en) Autonomous driving rule learning method based on deep reinforcement learning
CN114418468B (en) Smart city traffic scheduling strategy control method and Internet of things system
CN108010307A (en) Fleet controls
CN111311959B (en) Multi-interface cooperative control method and device, electronic equipment and storage medium
CN108961803A (en) Vehicle drive assisting method, device, system and terminal device
EP2276012A2 (en) Methods for transmission power control in vehicle-to-vehicle communication
CN104575035A (en) Intersection self-adaptation control method based on car networking environment
CN110444015B (en) Intelligent network-connected automobile speed decision method based on no-signal intersection partition
CN114360266B (en) Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
Mo et al. Simulation and analysis on overtaking safety assistance system based on vehicle-to-vehicle communication
CN111028504A (en) Urban expressway intelligent traffic control method and system
CN112925309A (en) Intelligent networking automobile data interaction method and system
CN113362605B (en) Distributed highway optimization system and method based on potential homogeneous area analysis
CN108733063B (en) Autonomous cooperative driving decision method for automatic driving vehicle
CN116612636B (en) Signal lamp cooperative control method based on multi-agent reinforcement learning
CN116434524A (en) Unmanned system, method and vehicle for hybrid traffic flow down-road side perception cloud planning
CN113409567B (en) Traffic assessment method and system for mixed traffic lane of public transport and automatic driving vehicle
Alkhatib et al. A New System for Road Traffic Optimisation Using the Virtual Traffic Light Technology.
Kavas Torris Eco-Driving of Connected and Automated Vehicles (CAVs)
Torris Eco-Driving of Connected and Automated Vehicles (CAVs)
CN114202949B (en) Method for identifying adjacent vehicles and adjusting reference paths of expressway intelligent network-connected automobile
Cao et al. The Design of Vehicle Profile Based on Multivehicle Collaboration for Autonomous Vehicles in Roundabouts
Wang Research on self-organising control method of urban intelligent traffic signal based on vehicle networking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant