CN116612636B - Signal lamp cooperative control method based on multi-agent reinforcement learning - Google Patents

Info

Publication number: CN116612636B
Application number: CN202310582760.7A
Authority: CN (China)
Prior art keywords: data, signal lamp, vehicle, agent, reinforcement learning
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other versions: CN116612636A
Other languages: Chinese (zh)
Inventors: 欧阳雅捷, 殷力, 郭艺雯, 赵阔
Current and original assignee: Jinan University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Jinan University, with priority to CN202310582760.7A
Publication of application CN116612636A; application granted; publication of grant CN116612636B
Classifications

    • G06F18/253 Pattern recognition; fusion techniques of extracted features
    • G06F18/213 Pattern recognition; feature extraction, e.g. by transforming the feature space
    • G06F18/251 Pattern recognition; fusion techniques of input or preprocessed data
    • G06N3/042 Neural networks; knowledge-based neural networks; logical representations of neural networks
    • G06N3/0464 Neural networks; convolutional networks [CNN, ConvNet]
    • G06N3/092 Neural network learning methods; reinforcement learning
    • G08G1/0104 Traffic control systems for road vehicles; measuring and analysing of parameters relative to traffic conditions
    • G08G1/0125 Traffic control systems for road vehicles; traffic data processing
    • G08G1/052 Detecting movement of traffic, with provision for determining speed or overspeed
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles
    • G08G1/095 Arrangements for giving variable traffic instructions; traffic lights

Abstract

The invention provides a signal lamp cooperative control method based on multi-agent reinforcement learning and multi-modal signal perception, which comprises the following steps: collecting data from various sensors, performing multi-modal definition, and acquiring information in real time through data fusion; cooperatively controlling the signal lamps and vehicles with a collaborative vehicle-road multi-agent reinforcement learning algorithm; preprocessing the collected sensor data and fusing the data of different modalities with a feature fusion method to construct a local state space for each agent; designing action spaces for the signal lamp agents and vehicle agents; designing a reward function for multi-agent reinforcement learning according to the traffic flow control target; designing a communication protocol suitable for the vehicle-road cooperative control scenario; and training the multi-agent reinforcement learning model with historical data or a simulation environment to find an optimal strategy. By introducing vehicles as agents, the invention achieves more effective vehicle-road coordination and further improves the traffic control effect.

Description

Signal lamp cooperative control method based on multi-agent reinforcement learning
Technical Field
The invention belongs to the field of vehicle-road coordination, and particularly relates to a signal lamp cooperative control method based on multi-agent reinforcement learning.
Background
With increasing urban traffic, traditional signal lamp control methods can no longer meet the efficient traffic demands of modern cities. To solve this problem, researchers have begun to employ Intelligent Transportation Systems (ITS) to improve road traffic efficiency. Among these, signal lamp control systems based on multi-agent reinforcement learning and multi-modal signal perception are attracting attention.
Conventional signal control methods are generally based on fixed signal periods or predetermined traffic flow patterns and lack adaptability to real-time traffic conditions. Therefore, a cooperative method that breaks through the limitations of conventional traffic signal control and improves adaptability to real-time traffic conditions is urgently needed.
Disclosure of Invention
The invention aims to provide a signal lamp cooperative control method based on multi-agent reinforcement learning, which realizes more effective vehicle-road cooperation by introducing vehicles as agents and further improves the traffic control effect.
In order to achieve the above object, the present invention provides a signal lamp cooperative control method based on multi-agent reinforcement learning, the method comprising:
s1, collecting data of various sensors, performing multi-mode definition, and acquiring information in real time through a data fusion technology;
s2, adopting a cooperative vehicle-road multi-agent reinforcement learning algorithm to cooperatively control the signal lamp and the vehicle, and finding an optimal strategy through learning to realize efficient traffic flow control;
s3, preprocessing the collected data of various sensors, and fusing the data of different modes by utilizing a characteristic fusion method to construct a local state space for each intelligent agent;
s4, designing an action space for the intelligent body;
s5, designing a reward function for multi-agent reinforcement learning according to the traffic flow control target;
s6, designing a communication protocol;
and S7, training the multi-agent reinforcement learning model by using historical data or simulation environment to find out an optimal strategy.
Further, the multi-modal definition in S1 includes a visual modality and a radar modality; the visual modality collects image data through a camera; the radar modality collects distance and speed information through a radar; the information includes road condition information, vehicle position, and speed.
Further, the data fusion technique specifically includes:
s1.1, modeling a scene into a graph structure;
s1.2, extracting features of data of each mode;
s1.3, feature fusion based on a graph convolution neural network;
s1.4, outputting the reinforcement learning state.
Further, the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm is defined as:
the bonus function is defined as R (s, a), where s represents a state and a represents an action of an agent, expressed as:
J(θ)=∑_t R(s_t,a_t)
wherein J (θ) represents an objective function; s_t represents the state of the agent at the moment t; a_t represents actions made by the agent at time t;
the loss function L (θ) is expressed as:
L(θ)=0.5*E[(R(s,a)+γ*max_a'Q(s',a';θ')-Q(s,a;θ))^2]
wherein E[·] represents an expected value; θ represents the parameters of the current network and θ' the parameters of the target network; γ is the discount factor; a' represents the action taken by the agent in state s'; Q(s, a; θ) is the action-value function, estimating the cumulative reward of taking action a in state s; Q(s', a'; θ') is the target network's value estimate for action a' in state s', used to measure how good action a' is in state s'.
Further, the implementation steps of the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm include:
s2.1, utilizing a centralized training and a distributed execution strategy, realizing the cooperation among the intelligent agents by performing the centralized training in a training stage, and making a decision according to a local state by using the distributed strategy by each intelligent agent in an execution stage;
s2.2, acquiring a comprehensive state space containing multi-mode information through the data of the various sensors in the step S1, so that an intelligent agent can more accurately sense traffic conditions;
s2.3, the vehicle and the signal lamp are used as different intelligent agents, so that cooperative control between the vehicle and the signal lamp is realized, and the traffic fluency is improved.
Further, the data in the step S3 includes vehicle data, road condition data and signal lamp data;
further, the state space is represented as follows:
S = {vehicle data, road condition data, signal lamp data}.
Further, the step S4 specifically includes:
s4.1, discretizing the control strategy of the signal lamp into a series of optional actions;
s4.2, dynamically adjusting the phase setting of the signal lamp according to the real-time road condition data;
s4.3, designing a self-adaptive signal lamp control strategy.
Further, the communication protocol includes vehicle-to-roadside-facility communication, communication between signal lamps, communication between the central controller and the signal lamps, and data fusion and processing.
Further, the step S7 specifically includes: training the multi-agent reinforcement learning model by using historical data or a simulation environment, finding an optimal strategy, and deploying the optimal strategy to a signal lamp control system.
The beneficial technical effects of the invention are at least as follows:
(1) The multi-mode signal perception technology provides abundant real-time traffic information for the signal lamp control system. By fusing data from various sensors such as cameras, radars, vehicle-mounted sensors and the like, the system can more accurately sense traffic conditions and provide more targeted decision basis for signal lamp control.
(2) By the multi-mode feature fusion method based on the graph convolution neural network, the topological relation between the vehicle and the signal lamp can be captured better. Meanwhile, through the method, the multi-mode signal perception data can be fused into a unified state space, and richer and more accurate information is provided for the multi-agent reinforcement learning algorithm.
(3) Through the optimal strategy of the invention, the signal lamp control system has stronger dynamic adjustment capability, can better adapt to continuously-changing traffic conditions in practical application, and improves the overall signal lamp control effect.
Drawings
The invention will be further described with reference to the accompanying drawings. The embodiments shown do not constitute any limitation of the invention, and one of ordinary skill in the art can obtain other drawings from the following drawings without inventive effort.
FIG. 1 is a flow chart of a signal lamp cooperative control method based on multi-agent reinforcement learning.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Example 1
In one or more embodiments, as shown in fig. 1, a signal lamp cooperative control method based on multi-agent reinforcement learning is disclosed, which includes the following steps:
s1, collecting data of various sensors, performing multi-mode definition, and acquiring information in real time through a data fusion technology; .
Specifically, step S1 is responsible for collecting data from various sensors (such as cameras, radars, vehicle-mounted sensors, etc.) and acquiring road condition information, vehicle positions, speeds, and other information in real time through data fusion. This information will be used to construct the state space for multi-agent reinforcement learning.
The multi-modal definition specifically includes:
a. Visual modality: the image data collected by the camera provides information on the position, shape, speed, etc. of vehicles. Cameras can be installed on the roadside or on vehicles and transmit image data in real time.
b. Radar modality: the distance and speed information collected by radar helps accurately detect the position, speed, and distance of vehicles. The radar may be installed on the roadside or on a vehicle and transmits distance data in real time. These data may be transmitted in real time to the signal control system via a vehicular communication system (e.g., V2X communication).
The step of data fusion in the step S1 specifically includes:
s1.1, modeling a scene into a graph structure. In the traffic light control scenario, there is a certain topological relationship between the vehicle and the lights. We can model a scene as a graph structure with vehicles and signal lights as nodes and their interrelationships as edges;
s1.2, extracting features of data of each mode. For each modality of data, feature extraction is first performed. For visual modalities, features may be extracted using Convolutional Neural Networks (CNNs); for radar modalities, features may be extracted using one-dimensional convolutional neural networks or Recurrent Neural Networks (RNNs); for the in-vehicle sensor modality, features may be extracted using a Fully Connected Neural Network (FCNN). The extracted features are then represented as node features.
S1.3, feature fusion based on the graph convolutional network. The extracted multi-modal features are input as node features into a graph convolutional network (GCN). The GCN can capture the topological relationships between nodes while preserving the node features. Through the propagation and aggregation operations of the GCN, comprehensive features containing the multi-modal information and the topological relations among the nodes are obtained.
S1.4, outputting the reinforcement learning state. The GCN-fused features are used as the reinforcement learning state space for the subsequent multi-agent reinforcement learning algorithm.
With this multi-modal feature fusion method based on the graph convolutional network, the topological relation between vehicles and signal lamps can be better captured. Meanwhile, this method fuses the multi-modal signal perception data into a unified state space, providing richer and more accurate information for the multi-agent reinforcement learning algorithm.
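The fusion pipeline of S1.1 to S1.4 can be sketched numerically. The following is a minimal, illustrative single-layer GCN propagation over a toy scene graph; the graph layout, feature dimensions, and random feature values are assumptions for illustration, not the patent's actual networks:

```python
import numpy as np

def gcn_fuse(adj, node_feats, weight):
    # One GCN propagation/aggregation step:
    # H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
    a_hat = adj + np.eye(adj.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt              # symmetric normalization
    return np.maximum(norm_adj @ node_feats @ weight, 0.0)  # ReLU

# Toy scene graph: two vehicles and one signal lamp, fully connected,
# each node carrying a 4-dim feature vector extracted from its modality.
adj = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
rng = np.random.default_rng(0)
feats = rng.random((3, 4))       # stand-in for CNN/RNN/FCNN features of S1.2
w = rng.random((4, 8))           # GCN weight matrix (learned; random here)
state = gcn_fuse(adj, feats, w)  # fused state, one row per agent
```

In a real system the node features would come from the CNN, RNN, and fully connected extractors of S1.2, and the GCN weights would be learned jointly with the reinforcement learning objective.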
S2, a cooperative vehicle-road multi-agent reinforcement learning algorithm is adopted to cooperatively control the signal lamp and the vehicle, wherein the signal lamp and the vehicle are regarded as a plurality of agents, and an optimal strategy is found through learning to realize efficient traffic flow control.
Specifically, the collaborative vehicle-road multi-agent reinforcement learning algorithm is called CVR-MARL, short for Collaborative Vehicle-Road Multi-Agent Reinforcement Learning; its objective is to maximize the system's cumulative reward. In this model, signal lamps and vehicles are treated as multiple agents that must find optimal strategies through learning to achieve efficient traffic flow control.
The objective function of CVR-MARL is defined as:
the bonus function is defined as R (s, a), where s represents the state (based on the multi-modal perceived feature fusion result), and a represents the actions of the agent, expressed as:
J(θ)=∑_t R(s_t,a_t)
wherein J (θ) represents an objective function; s_t represents the state of the agent at the moment t; a_t represents actions made by the agent at time t;
the loss function L (θ) is expressed as:
L(θ)=0.5*E[(R(s,a)+γ*max_a'Q(s',a';θ')-Q(s,a;θ))^2]
wherein E[·] represents an expected value; θ represents the parameters of the current network and θ' the parameters of the target network; γ is the discount factor; a' represents the action taken by the agent in state s'; Q(s, a; θ) is the action-value function, estimating the cumulative reward of taking action a in state s; Q(s', a'; θ') is the target network's value estimate for action a' in state s', used to measure how good action a' is in state s'.
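For one sampled transition, the loss L(θ) above reduces to a squared temporal-difference error. A minimal numeric sketch (tabular Q values instead of the patent's networks; all numbers are made up for illustration):

```python
def td_loss(r, gamma, q_next, q_sa):
    # 0.5 * (R(s,a) + gamma * max_a' Q(s',a';theta') - Q(s,a;theta))^2
    target = r + gamma * max(q_next)   # bootstrapped target from the target network
    return 0.5 * (target - q_sa) ** 2

# r = 1.0, gamma = 0.9, target-network values Q(s',.;theta') = [0.5, 2.0, 1.0],
# current estimate Q(s,a;theta) = 2.5  ->  target = 2.8, loss = 0.5 * 0.3^2
loss = td_loss(r=1.0, gamma=0.9, q_next=[0.5, 2.0, 1.0], q_sa=2.5)
```

In training, this per-transition error is averaged over a batch to approximate the expectation E[·] in L(θ).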
The implementation of this objective specifically includes the following steps:
s2.1, multi-agent cooperation: the intelligent training system has the advantages that the centralized training and the distributed execution strategy are utilized, the cooperation among the intelligent agents is realized through the centralized training in the training stage, and in the execution stage, each intelligent agent (including vehicles and signal lamps) uses the distributed strategy to make a decision according to the local state;
s2.2, constructing a state space: the comprehensive state space containing multi-mode information is obtained through the data of the various sensors in the step S1, so that an intelligent agent can more accurately sense traffic conditions and construct the state space;
s2.3, vehicle-road cooperation: the vehicle and the signal lamp are used as different intelligent bodies, so that cooperative control between the vehicle and the signal lamp is realized, and the traffic fluency is improved.
S3, preprocessing the collected data of various sensors, and fusing the data of different modalities with a feature fusion method to construct a local state space for each agent.
Specifically, a local state space is built for each agent (signal lamps and vehicles) from the data collected by the multi-modal signal perception module. CVR-MARL collects multi-modal data covering three aspects, namely vehicles, road conditions, and signal lamps, comprising the following data:
vehicle data:
a) Position information: longitude and latitude, heading angle, etc. of the vehicle.
b) Speed information: real-time speed of the vehicle.
c) Acceleration information: real-time acceleration of the vehicle.
d) Vehicle type: such as cars, trucks, buses, etc.
e) Vehicle communication data: communication information between vehicles, such as vehicle-to-vehicle (V2V) messages.
Road condition data:
a) Road structure: the width of the road, the number of lanes, the dividing strips and the like.
b) Traffic flow: the number, density, etc. of vehicles in each lane.
c) Road environment information: road conditions, weather, lighting, etc.
Signal lamp data:
a) Status information: the current state of the signal lamp (red, green, yellow).
b) Remaining time: the remaining time of the signal lamp state change.
c) Signal lamp control strategy: such as fixed cycle control, inductive control, etc.
d) Vehicle-signal lamp communication data: communication information between the vehicle and the signal lamp, such as vehicle-to-infrastructure (V2I) data.
From the collected multimodal data, a state space can be constructed as follows:
S = {vehicle data, road condition data, signal lamp data}
When constructing the state space, the collected multi-modal data must be preprocessed to eliminate differences in units and scale between the data; for example, the data may be normalized into the same range. A feature fusion method is then used to fuse the data of different modalities into a comprehensive state representation. This representation makes full use of the multi-modal information and provides richer environment perception for CVR-MARL, so as to better realize cooperative control of vehicles and roads.
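One way the normalization and fusion could look, as a sketch; the field names, value ranges, and the simple concatenation below are assumptions for illustration:

```python
def min_max(x, lo, hi):
    # Scale a raw reading into [0, 1] to remove unit/scale differences.
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def build_state(vehicle, road, light):
    # Fuse the three data sources into one flat, dimensionless state vector.
    return [
        min_max(vehicle["speed"], 0.0, 30.0),    # m/s, assumed max 30
        min_max(road["queue_len"], 0.0, 50.0),   # vehicles, assumed max 50
        min_max(light["remaining"], 0.0, 60.0),  # seconds, assumed max 60
    ]

s = build_state({"speed": 15.0}, {"queue_len": 10.0}, {"remaining": 30.0})
```

Every component of the resulting vector lies in [0, 1], so no single modality dominates the learned value function by scale alone.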
S4, designing a feasible action space for the signal lamp agents and the vehicle agents.
Specifically, a feasible action space is designed for the signal lamp agents and vehicle agents. When designing the action space, the control strategy of the signal lamp must be considered so that the action space can adapt to different road conditions. To realize a dynamic signal lamp action space, the following methods are adopted:
s4.1, discretizing the control strategy of the signal lamp into a series of optional actions, for example, the signal lamp can be divided into a plurality of discrete actions according to parameters such as phase, duration, change rate and the like of the signal lamp. The method simplifies the representation of the action space and is convenient for the reinforcement learning algorithm to explore and optimize.
And S4.2, dynamically adjusting the phase setting of the signal lamp according to the real-time road condition data, for example, increasing green light time on a lane with larger traffic flow so as to relieve congestion. In addition, the phase sequence and the time length can be adjusted according to the road structure, the vehicle type, the weather and other factors so as to improve the traffic efficiency.
S4.3, designing a self-adaptive signal lamp control strategy, for example, adopting induction control at the intersection with smaller flow and adopting cooperative control at the intersection with larger flow. In addition, parameters of the control strategy can be dynamically adjusted according to the real-time traffic data so as to adapt to the change of road conditions.
The action space is expressed as:
A = {action 1, action 2, ...}
Wherein each action corresponds to a signal lamp control strategy or parameter setting. By dynamically adjusting the action space and the control strategy, intelligent control of the signal lamp can be realized, so that traffic efficiency and safety are improved.
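The discretization of S4.1 could be enumerated as follows; the phase names and durations are assumed example values, not prescribed by the patent:

```python
from itertools import product

PHASES = ["NS_green", "EW_green"]   # assumed two-phase intersection
DURATIONS = [20, 30, 40]            # assumed green durations in seconds

# Each discrete action fixes one phase and one duration for the next cycle.
ACTIONS = [{"phase": p, "duration": d} for p, d in product(PHASES, DURATIONS)]
```

This yields 2 x 3 = 6 discrete actions per signal-lamp agent; adding further parameters such as change rate or phase offsets would enlarge the set the same way.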
S5, designing a proper reward function for multi-agent reinforcement learning according to the traffic flow control target.
Specifically, a suitable reward function is designed for multi-agent reinforcement learning based on the goals of traffic flow control (e.g., reducing congestion and emissions). The reward function needs to balance various factors to achieve an optimal vehicle-road cooperative control effect. Specifically, the transition reward function R(s, a, s') is designed as follows:
R(s,a,s')=w1*T(s,a,s')+w2*D(s,a,s')+w3*S(s,a,s')
wherein:
s: the current state;
a: the action performed by the agent;
s': the new state after the action is executed;
w1, w2, w3: weight parameters used to balance the importance of each indicator;
T(s, a, s'): traffic efficiency indicator, such as the average speed of vehicles passing through the intersection or their waiting time;
D(s, a, s'): traffic congestion indicator, such as the queue length or the number of waiting vehicles at the intersection;
S(s, a, s'): traffic safety indicator, such as the probability of a traffic accident or the safety distance between vehicles and pedestrians.
The design of the reward function needs to take into account aspects such as traffic efficiency, congestion level and safety, so as to guide the intelligent agent to make decisions beneficial to the overall traffic condition. In practical application, the weight parameters and the index function can be adjusted according to specific scenes and requirements so as to achieve a better control effect.
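A direct transcription of R(s, a, s') = w1*T + w2*D + w3*S as code; the weight values, the sample indicator values, and the choice to penalize congestion with a negative sign are all assumptions (the patent leaves the sign and scale of each term to the weight settings):

```python
def reward(t_eff, d_cong, s_safe, w1=0.5, w2=0.3, w3=0.2):
    # Weighted sum of efficiency, congestion, and safety indicators.
    # Congestion hurts the objective, so it enters negatively here
    # (a design assumption, not mandated by the patent).
    return w1 * t_eff + w2 * (-d_cong) + w3 * s_safe

# Example: good efficiency (0.8), mild congestion (0.4), fully safe (1.0).
r = reward(t_eff=0.8, d_cong=0.4, s_safe=1.0)
```

Tuning w1, w2, w3 per scenario is exactly the adjustment the preceding paragraph describes.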
S6, designing a communication protocol suitable for a vehicle-road cooperative control scene.
Specifically, a communication protocol suitable for the vehicle-road cooperative control scenario is designed so that information can be exchanged efficiently and safely between agents. The communication protocol needs to account for low latency, high reliability, and security. Any specific devices capable of communicating in these ways may be used, provided the purpose of the invention is achieved.
The communication protocol comprises vehicle and road side facility communication, communication between signal lamps, communication between a central controller and the signal lamps, and data fusion and processing, and specifically comprises the following steps:
a) Vehicle-to-roadside-facility communication (vehicle-road communication): two-way communication between vehicles and roadside equipment (e.g., signal lamps, sensors) is achieved using Dedicated Short Range Communication (DSRC) or vehicle-to-everything (V2X) technology. A vehicle may send its own status information (e.g., position, speed, direction of travel) to the roadside facility while receiving instructions from it (e.g., signal changes, speed limit information).
b) Communication between signal lamps: information exchange between the signal lamps can be realized through a Wireless Sensor Network (WSN) or a cellular network so as to carry out cooperative control. With this communication mechanism, adjacent traffic lights may share local traffic information, such as traffic volume, waiting time, etc.
c) The central controller communicates with the signal lamp: the central controller communicates with the individual signal lights via a wired or wireless network. The central controller is responsible for processing information from the signal lights and the vehicle, running a multi-agent reinforcement learning algorithm (CVR-MARL), and sending control strategies to the corresponding signal lights.
d) Data fusion and processing: the multi-mode signal sensing module sends the collected data to the data processing module. The data processing module is responsible for carrying out feature fusion on the multi-mode data, constructing a state space and transmitting the state space to the multi-agent reinforcement learning algorithm.
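A sketch of what a vehicle-to-roadside status report from item a) might look like on the wire; the message schema and field names are invented for illustration and are not part of any V2X standard cited by the patent:

```python
import json

def make_v2i_message(vehicle_id, lat, lon, speed, heading):
    # Vehicle -> roadside unit status report, serialized for transport.
    return json.dumps({
        "type": "V2I_STATUS",
        "vehicle_id": vehicle_id,
        "position": {"lat": lat, "lon": lon},
        "speed": speed,      # m/s
        "heading": heading,  # degrees clockwise from north
    }, sort_keys=True)

msg = make_v2i_message("veh-42", 23.13, 113.26, 12.5, 90.0)
decoded = json.loads(msg)    # what the roadside unit would parse
```

The data processing module of item d) would aggregate such messages with signal-lamp and sensor reports before feature fusion.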
And S7, training the multi-agent reinforcement learning model by using historical data or simulation environment to find out an optimal strategy.
Specifically, the multi-agent reinforcement learning model is trained using historical data or a simulation environment to find an optimal strategy. In practical applications, besides deploying the optimal strategy to the signal lamp control system, we need to consider the dynamic adjustment capability to better adapt to the continuously changing traffic conditions in practical applications. To achieve this goal, we can take the following strategies:
s7.1, online learning and updating: by online learning and updating the strategy, we can update the reinforcement learning model in real time according to the current traffic conditions. This means that our system can constantly learn and adapt to the actual traffic environment, thereby improving the effect of signal lamp control.
S7.2, exploration-utilization tradeoff: in the implementation phase we need to make a trade-off between exploration and utilization. By introducing a certain degree of exploration, we can make the model try new strategies continuously in practical application to find a better solution. However, excessive exploration may reduce the stability of the system. Thus, we need to find a suitable balance between exploration and utilization.
S7.3, abnormal condition processing: in practical applications, some abnormal situations may occur, such as traffic accidents, road closure, etc. For these situations, it is desirable to design a set of exception handling mechanisms so that the system can automatically adjust when these problems are encountered. For example, when a road closure is detected, the system may automatically adjust the traffic light strategy to guide the vehicle around.
S7.4, real-time feedback and adjustment: to further enhance the dynamic tuning capabilities of the system, we can introduce a real-time feedback mechanism into the signal lamp control system. By collecting real-time traffic data and comparing the real-time traffic data with the model prediction result, the system can be self-adjusted according to actual conditions, so that the system is better suitable for actual traffic conditions.
Through the strategy, the signal lamp control system has strong dynamic adjustment capability, can be better adapted to continuously changing traffic conditions in practical application, and improves the overall signal lamp control effect.
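The training-and-exploration loop of S7.1 and S7.2 can be caricatured in a few lines. The single-state Q-learning update and the toy "simulator" below are stand-ins for the patent's CVR-MARL model and simulation environment, chosen only to make the epsilon-greedy exploration-utilization trade-off concrete:

```python
import random

def train(env_step, actions, episodes=50, eps=0.2, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}      # single-state Q-table for brevity
    for _ in range(episodes):
        # exploration-utilization trade-off (S7.2)
        a = rng.choice(actions) if rng.random() < eps else max(q, key=q.get)
        r = env_step(a)                # reward from history or simulation
        # online update of the value estimate (S7.1)
        q[a] += alpha * (r + gamma * max(q.values()) - q[a])
    return q

# Toy simulator: at this intersection a longer green performs better.
q = train(lambda a: 1.0 if a == "long_green" else 0.1,
          ["short_green", "long_green"])
```

Raising eps increases exploration of new strategies at the cost of stability, which is precisely the balance S7.2 calls for.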
While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (7)

1. The signal lamp cooperative control method based on multi-agent reinforcement learning is characterized by comprising the following steps:
s1, collecting data of various sensors, performing multi-mode definition, and acquiring information in real time through a data fusion technology;
s2, adopting a cooperative vehicle-road multi-agent reinforcement learning algorithm to cooperatively control the signal lamp and the vehicle, and finding an optimal strategy through learning to realize efficient traffic flow control;
s3, preprocessing the collected data of various sensors, and fusing the data of different modes by utilizing a characteristic fusion method to construct a local state space for each intelligent agent;
s4, designing an action space for the intelligent body;
s5, designing a reward function for multi-agent reinforcement learning according to the traffic flow control target;
s6, designing a communication protocol;
s7, training the multi-agent reinforcement learning model by using historical data or simulation environment to find an optimal strategy;
the multi-mode definition in S1 comprises a visual mode and a radar mode; the visual mode collects image data through a camera; the radar mode collects distance and speed information through radar; the information comprises road condition information and vehicle position and speed;
the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm is defined through the reward function R(s, a), where s represents a state and a represents an action of an agent:
J(θ) = ∑_t R(s_t, a_t)
wherein J(θ) represents the objective function; s_t represents the state of the agent at time t; a_t represents the action taken by the agent at time t;
the loss function L(θ) is expressed as:
L(θ) = 0.5 * E[(R(s,a) + γ * max_a' Q(s',a';θ') - Q(s,a;θ))^2]
wherein E[·] denotes the expected value, θ denotes the parameters of the current network, θ' denotes the parameters of the target network, γ is the discount factor, a' denotes the action taken by the agent in state s', Q(s, a; θ) is the action-value function estimating the cumulative reward for taking action a in state s, and Q(s', a'; θ') denotes the target network's value estimate for action a' in state s', used to measure the quality of taking action a' in state s';
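The loss above is the standard DQN temporal-difference loss. A minimal numerical sketch with tabular Q-values standing in for the current and target networks (the transitions, state/action counts, and values below are made-up illustrations, not data from the patent):

```python
import numpy as np

def dqn_loss(batch, q_current, q_target, gamma=0.9):
    """L(θ) = 0.5 * E[(R(s,a) + γ * max_a' Q(s',a';θ') - Q(s,a;θ))^2].

    q_current -- Q(s,a;θ), the current agent's value table
    q_target  -- Q(s',a';θ'), the target network's value table
    batch     -- iterable of (s, a, R, s') transitions
    """
    errors = []
    for s, a, r, s_next in batch:
        td_target = r + gamma * np.max(q_target[s_next])   # bootstrapped return
        errors.append((td_target - q_current[s, a]) ** 2)  # squared TD error
    return 0.5 * float(np.mean(errors))                    # 0.5 * E[...]

# Two states, two actions; values chosen only for illustration.
q_current = np.array([[1.0, 0.5], [0.2, 0.8]])
q_target  = np.array([[1.0, 0.5], [0.2, 0.8]])
batch = [(0, 0, 1.0, 1),   # (s, a, R, s')
         (1, 1, 0.0, 0)]
loss = dqn_loss(batch, q_current, q_target)  # ≈ 0.1321
```

In training, the gradient of this loss with respect to θ updates the current network, while θ' is periodically copied from θ.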
the implementation steps of the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm comprise:
s2.1, utilizing a centralized-training, distributed-execution strategy: cooperation among the agents is achieved through centralized training in the training stage, and in the execution stage each agent makes decisions from its local state using its distributed policy;
s2.2, acquiring a comprehensive state space containing multi-mode information from the data of the various sensors in step S1, so that the agents can perceive traffic conditions more accurately;
s2.3, taking the vehicles and the signal lamps as different agents, thereby realizing cooperative control between vehicles and signal lamps and improving traffic fluency.
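The distributed-execution side of S2.1 can be sketched as below: after centralized training, each agent (a signal lamp or a vehicle) acts only on its local observation. The toy threshold rule stands in for the learned policy network, and all names and values here are illustrative assumptions.

```python
def local_policy(agent_id, local_state):
    """Distributed execution (S2.1): decide from the local state only.

    A trained system would evaluate the agent's learned policy network
    here; this toy rule extends green when the local queue is long.
    """
    return "extend_green" if local_state["queue_len"] > 5 else "switch_phase"

# Each agent observes only its own approach queues (local state).
agents = {"light_1": {"queue_len": 8},
          "light_2": {"queue_len": 2}}
actions = {aid: local_policy(aid, s) for aid, s in agents.items()}
```

Centralized training would additionally expose the joint state of all agents to a shared critic; only the per-agent policies above are deployed at execution time.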
2. The signal lamp cooperative control method based on multi-agent reinforcement learning according to claim 1, wherein the data fusion technology is specifically:
s1.1, modeling a scene into a graph structure;
s1.2, extracting features of data of each mode;
s1.3, feature fusion based on a graph convolution neural network;
s1.4, outputting the reinforcement learning state.
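Steps S1.1 through S1.4 (graph modeling, per-mode feature extraction, graph-convolution fusion, state output) can be sketched as one graph-convolution layer H' = ReLU(Â H W). The adjacency, feature dimensions, and weights below are illustrative assumptions, not the patented network.

```python
import numpy as np

def gcn_fuse(adj, features, weight):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W),
    where A_hat is the symmetrically normalized adjacency with self-loops."""
    a = adj + np.eye(adj.shape[0])                  # add self-loops (S1.1 graph)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt             # symmetric normalization
    return np.maximum(a_hat @ features @ weight, 0.0)  # ReLU activation

# 3 graph nodes (e.g. camera, radar, signal lamp) with 2-dim per-mode
# features (S1.2), fused into the reinforcement-learning state (S1.4).
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])
w = np.eye(2)                 # identity weight, purely for illustration
state = gcn_fuse(adj, feats, w)
```

In practice W is learned, and the fused node features form the local state space each agent consumes in step S3.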
3. The method according to claim 1, wherein the data in step S3 includes vehicle data, road condition data, and signal lamp data.
4. The signal lamp cooperative control method based on multi-agent reinforcement learning and multi-mode signal sensing according to claim 3, wherein the state space is represented as follows:
s= { vehicle data, road condition data, signal lamp data }.
5. The signal lamp cooperative control method based on multi-agent reinforcement learning according to claim 1, wherein the step S4 specifically includes:
s4.1, discretizing the control strategy of the signal lamp into a series of optional actions;
s4.2, dynamically adjusting the phase setting of the signal lamp according to the real-time road condition data;
s4.3, designing a self-adaptive signal lamp control strategy.
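The discretized action space of S4.1 and the adaptive adjustment of S4.2-S4.3 can be sketched as follows; the phase names, queue thresholds, and 5-second increments are illustrative assumptions, not values from the patent.

```python
# Discretized signal-lamp action space (S4.1): a series of optional actions.
ACTIONS = ("keep_phase", "next_phase", "extend_green_5s", "shorten_green_5s")

def adaptive_action(ns_queue, ew_queue, current_phase):
    """Pick a discrete action from real-time queue lengths (S4.2-S4.3)."""
    if current_phase == "NS_green" and ns_queue > ew_queue + 3:
        return "extend_green_5s"   # keep serving the heavier direction
    if current_phase == "NS_green" and ew_queue > ns_queue + 3:
        return "next_phase"        # switch to relieve the cross street
    return "keep_phase"            # balanced demand: hold the current phase

a1 = adaptive_action(ns_queue=10, ew_queue=2, current_phase="NS_green")
a2 = adaptive_action(ns_queue=1, ew_queue=9, current_phase="NS_green")
```

In the full method this hand-written rule is replaced by the learned policy of claim 1, which selects from the same discrete action set.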
6. The multi-agent reinforcement learning based signal lamp cooperative control method according to claim 1, wherein the communication protocol includes vehicle-to-road side facility communication, communication between signal lamps, central controller-to-signal lamp communication, and data fusion and processing.
7. The signal lamp cooperative control method based on multi-agent reinforcement learning according to claim 1, wherein the step S7 is specifically: training the multi-agent reinforcement learning model by using historical data or a simulation environment, finding an optimal strategy, and deploying the optimal strategy to a signal lamp control system.
CN202310582760.7A 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning Active CN116612636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310582760.7A CN116612636B (en) 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310582760.7A CN116612636B (en) 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning

Publications (2)

Publication Number Publication Date
CN116612636A CN116612636A (en) 2023-08-18
CN116612636B true CN116612636B (en) 2024-01-23

Family

ID=87677681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310582760.7A Active CN116612636B (en) 2023-05-22 2023-05-22 Signal lamp cooperative control method based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN116612636B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559530A (en) * 2019-01-07 2019-04-02 大连理工大学 A multi-intersection signal lamp cooperative control method based on Q-value-transfer deep reinforcement learning
CN111583675A (en) * 2020-05-14 2020-08-25 吴钢 Regional road network traffic signal lamp coordination control system and method
CN112700663A (en) * 2020-12-23 2021-04-23 大连理工大学 Multi-agent intelligent signal lamp road network control method based on deep reinforcement learning strategy
WO2022110611A1 (en) * 2020-11-26 2022-06-02 东南大学 Pedestrian road-crossing behavior prediction method for plane intersection
CN114627657A (en) * 2022-03-09 2022-06-14 哈尔滨理工大学 Adaptive traffic signal control method based on deep graph reinforcement learning
CN114638339A (en) * 2022-03-10 2022-06-17 中国人民解放军空军工程大学 Intelligent agent task allocation method based on deep reinforcement learning
KR20220102395A (en) * 2021-01-13 2022-07-20 부경대학교 산학협력단 System and Method for Improving of Advanced Deep Reinforcement Learning Based Traffic in Non signalalized Intersections for the Multiple Self driving Vehicles
CN115731724A (en) * 2022-11-17 2023-03-03 北京航空航天大学 Regional traffic signal timing method and system based on reinforcement learning
CN115861632A (en) * 2022-12-20 2023-03-28 清华大学 Three-dimensional target detection method based on visual laser fusion of graph convolution


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Research on a cooperative control algorithm for traffic signal lights based on multi-agent reinforcement learning; Cong Shan; China Masters' Theses Full-text Database, Engineering Science & Technology II (No. 1); pp. 19-24 *
Research on traffic signal control algorithms based on multi-agent deep reinforcement learning; Yang Shantian; China Doctoral Dissertations Full-text Database, Engineering Science & Technology II (No. 1); pp. 66-72 *
A traffic signal control method based on deep reinforcement learning; Sun Hao et al.; Computer Science; Vol. 47, No. 2; pp. 169-174 *
Big-data-driven "cloud-edge" coordinated decision-making and optimization for fast-moving consumer goods terminal visits; Zhao Kuo et al.; Journal of Mechanical Engineering; Vol. 59; pp. 1-11 *

Also Published As

Publication number Publication date
CN116612636A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN107507430B (en) Urban intersection traffic control method and system
CN109767630B (en) A kind of traffic signal control system based on bus or train route collaboration
CN111222630B (en) Autonomous driving rule learning method based on deep reinforcement learning
CN114418468B (en) Smart city traffic scheduling strategy control method and Internet of things system
CN108010307A (en) Fleet controls
CN111311959B (en) Multi-interface cooperative control method and device, electronic equipment and storage medium
CN108961803A (en) Vehicle drive assisting method, device, system and terminal device
EP2276012A2 (en) Methods for transmission power control in vehicle-to-vehicle communication
CN104575035A (en) Intersection self-adaptation control method based on car networking environment
CN110444015B (en) Intelligent network-connected automobile speed decision method based on no-signal intersection partition
CN114360266B (en) Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
Mo et al. Simulation and analysis on overtaking safety assistance system based on vehicle-to-vehicle communication
CN111028504A (en) Urban expressway intelligent traffic control method and system
CN112925309A (en) Intelligent networking automobile data interaction method and system
CN113362605B (en) Distributed highway optimization system and method based on potential homogeneous area analysis
CN108733063B (en) Autonomous cooperative driving decision method for automatic driving vehicle
CN116612636B (en) Signal lamp cooperative control method based on multi-agent reinforcement learning
CN116434524A (en) Unmanned system, method and vehicle for hybrid traffic flow down-road side perception cloud planning
CN113409567B (en) Traffic assessment method and system for mixed traffic lane of public transport and automatic driving vehicle
Alkhatib et al. A New System for Road Traffic Optimisation Using the Virtual Traffic Light Technology.
Kavas Torris Eco-Driving of Connected and Automated Vehicles (CAVs)
Torris Eco-Driving of Connected and Automated Vehicles (CAVs)
CN114202949B (en) Method for identifying adjacent vehicles and adjusting reference paths of expressway intelligent network-connected automobile
Cao et al. The Design of Vehicle Profile Based on Multivehicle Collaboration for Autonomous Vehicles in Roundabouts
Wang Research on self-organising control method of urban intelligent traffic signal based on vehicle networking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant