CN115691167A - Single-point traffic signal control method based on intersection holographic data - Google Patents


Info

Publication number
CN115691167A
CN115691167A (application CN202211253243.7A)
Authority
CN
China
Prior art keywords
network, data, intersection, signal control, holographic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211253243.7A
Other languages
Chinese (zh)
Inventor
王涛
赵晓寅
程瑞
徐奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202211253243.7A
Publication of CN115691167A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention relates to the technical field of traffic control, and in particular to a single-point traffic signal control method based on intersection holographic data: a deep reinforcement learning model comprising multiple agents is constructed, the neural networks are trained with the MARDDPG algorithm, and the trained networks are then used for signal control.

Description

Single-point traffic signal control method based on intersection holographic data
Technical Field
The invention relates to the technical field of traffic control, in particular to a single-point traffic signal control method based on intersection holographic data.
Background
Currently, the actuated signal control systems widely used in China are mainly the British SCOOT (Split Cycle Offset Optimisation Technique) system and the Australian SCATS (Sydney Coordinated Adaptive Traffic System). With the continuing acceleration of urbanization and the exponential growth in the number of motor vehicles, the traditional actuated traffic signal control mode can hardly cope with, let alone effectively manage, traffic flows that change enormously in real time. The traditional actuated control principle is to adjust the green time based on the headway, occupancy, queue length and degree of congestion.
The disadvantages are mainly shown in the following aspects:
1. The detection data of traditional actuated traffic signal control systems cannot comprehensively and effectively represent the traffic demand of an intersection; coil detectors have large errors and low reliability, so the effectiveness of the algorithms and control methods is difficult to realize.
2. Conventional deep reinforcement learning traffic control methods are generally weak in cooperative optimization: a single agent has to optimize the traffic signal control of the whole system, and the discrete decision space of the joint optimization problem is too complex.
3. The state space dimension of a single agent is too high and the computation time too long for practical deployment.
Disclosure of Invention
The invention aims to provide a single-point traffic signal control method based on intersection holographic data, which uses holographic detection of vehicles at the intersection to acquire vehicle position and speed information in real time and to control traffic signals more effectively and accurately.
To achieve this purpose, the invention provides a single-point traffic signal control method based on intersection holographic data, comprising the following steps:
collecting initial holographic traffic data;
processing holographic traffic data;
constructing a deep reinforcement learning model comprising a plurality of agents;
training a neural network by using a MARDDPG algorithm;
and performing signal control by using the trained neural network.
The holographic traffic data comprises target vehicle operation data, lane-level traffic data, the current intersection design, and the current traffic signal control scheme, wherein the target vehicle operation data comprises the identification ID of a target vehicle, the vehicle type, the vehicle longitudinal speed, the number of the lane where the vehicle is located and the distance between the vehicle and the stop line; the lane-level traffic data comprises the target lane queue length, the total vehicle waiting time, the average delay and the number of vehicles passing the stop line; the current intersection design comprises the number of lanes of each entrance approach and the lane function allocation; and the current traffic signal control scheme comprises the current phase sequence at the intersection and the duration allocation of each phase.
The processing of the holographic traffic data comprises the following steps:
deleting redundant data;
deleting abnormal track data;
and completing noisy track data using linear function interpolation.
In the process of constructing the deep reinforcement learning model comprising a plurality of agents, the traffic signal control agents use the MARDDPG algorithm for deep reinforcement learning, and a state space S, an action space A and a reward value R are defined respectively.
The process of training the neural networks with the MARDDPG algorithm comprises the following steps:
Step 1: the actor network and the critic network initialize the parameterized action selection strategy μ_θi(h_t,i), where h_t,i is the historical memory data of the i-th agent in the actor network at time step t, and the critic network likewise holds the historical memory data of the i-th agent at time step t; the value function Q_i(h_t, a_t) is constructed and parameterized in the critic network, its inputs comprising the historical state h and the action a selected by the actor network; separately, the weights θ'_i and φ'_i of all target networks are initialized; the replay buffer D is initialized, the time step is cleared, and the actor network reads the initial state s_t,i.
Step 2: every 5 seconds, a single agent selects an action by the strategy a_t,i = μ_θi(h_t,i) + N_t, where N_t is the exploration noise at the current time step; the actor network selects action a_t,i in the action set Ai and executes it, and after execution receives the reward value r_t,i and the new state s_t+1,i, generating the new history data h_t+1,i.
Step 3: the executed tuples {s_1,i, a_1,i, r_1,i, s_2,i, a_2,i, r_2,i, …} are stored in the replay buffer D, whose capacity is not less than 20000 samples; agent i samples M historical training steps stored in the replay buffer D for training the actor and critic networks.
Step 4: after the agent selects the M historical training-step records, a minibatch (denoted m) within M is used so that the critic network estimates the Q value through the value function Q_i(h_t, a_t) constructed in step 1 and updates the critic network parameters φ_i by minimizing the average loss function; similarly, the actor network updates its parameter θ_i by computing the policy gradient through a loss function.
The loss function for updating φ_i is specifically as follows:
L(φ_i) = (1/m) Σ_j (y_j - Q_i(h_j, a_j; φ_i))², with y_j = r_j + γ Σ_i Q'_i(h_{j+1}, a'_{j+1})
where (1/m) Σ_j Q_i denotes the average of the Q values of the m minibatch selected actions over all agent critics; r_j is the total reward value; γ is the discount coefficient; Σ_i Q'_i(h_{j+1}, a'_{j+1}) denotes the sum of the Q values of all agents at the next time step.
The objective function for updating θ_i is specifically as follows:
∇_θi J(θ_i) = (1/m) Σ_j ∇_θi μ_θi(h_j) ∇_a Q_i(h_j, a)|_(a = μ_θi(h_j))
To update the actor network to maximize the future expected reward, J(θ_i) is defined so as to find the direction that maximizes the cumulative reward.
Step 5: the agent updates the target network parameters φ'_i and θ'_i by soft update; first, a rate τ (0 < τ < 1) for updating the target network from the main network is defined; the target network is updated with a convex combination of the current network parameters and the target network parameters, as follows:
φ'_i = τφ_i + (1 - τ)φ'_i
θ'_i = τθ_i + (1 - τ)θ'_i
Steps 2 to 5 are repeated; when the change in the Q value between successive updates is no greater than Δp, the optimum in the traversable state space has been reached and the agent completes the training.
The process of performing signal control with the trained neural network specifically comprises: acquiring real-time holographic data according to the agent construction requirements, inputting it to the signal control agents, and generating the phase time matrix G' from the duration-change actions of each phase signal output by the agents:
G' = [G1' G2' G3' … Gn'].
The invention provides a single-point traffic signal control method based on intersection holographic data: a deep reinforcement learning model comprising multiple agents is constructed, the neural networks are trained with the MARDDPG algorithm, and the trained networks are then used for signal control.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a single-point traffic signal control method based on intersection holographic data according to the present invention.
FIG. 2 is a schematic diagram of a convolution network structure of the deep reinforcement learning model of the present invention.
FIG. 3 is a schematic diagram comparing vehicle positions at an intersection with the position matrix and the speed matrix of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1, the invention provides a single-point traffic signal control method based on intersection holographic data, which comprises the following steps:
s1: collecting initial holographic traffic data;
s2: processing holographic traffic data;
s3: constructing a deep reinforcement learning model comprising a plurality of agents;
s4: training a neural network by using a MARDDPG algorithm;
s5: and performing signal control by using the trained neural network.
Specifically, in step S1, the holographic traffic data includes target vehicle operation data, lane-level traffic data, the current intersection design, and the current traffic signal control scheme.
Wherein the target vehicle operation data includes: the target vehicle identification number C_id, vehicle type C_s, vehicle longitudinal speed V_p, number L_i of the lane where the vehicle is located, and distance Y_i between the vehicle and the stop line.
The lane-level traffic data includes: target lane queue length Q_i, total vehicle waiting time W_i, average delay D_i, and number N_c of vehicles passing the stop line.
The current intersection design includes: the number of lanes of each entrance approach at the intersection and the lane function allocation.
The current traffic signal control scheme includes: the current phase sequence at the intersection and the duration allocation of each phase.
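For illustration only, one target vehicle operation record could be organised as the following minimal Python sketch (the field names mirror the symbols C_id, C_s, V_p, L_i and Y_i above; the layout itself is an assumption, not a format prescribed by the invention):

```python
from dataclasses import dataclass

@dataclass
class VehicleRecord:
    """One target vehicle operation record; fields mirror C_id, C_s, V_p, L_i, Y_i."""
    c_id: str    # target vehicle identification number
    c_s: str     # vehicle type
    v_p: float   # longitudinal speed, m/s
    l_i: int     # number of the lane the vehicle occupies
    y_i: float   # distance between the vehicle and the stop line, metres
```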
In step S2, the holographic traffic data is processed to improve data accuracy and thus traffic control efficiency, specifically comprising the following steps:
Step 2.1, deleting redundant and corrupted data: abnormal records may occur under the influence of transmission interference and limited equipment recognition rates. Normal output data is a character string; if the number of digits does not match or the data is garbled, all traffic data of the related target vehicle is deleted.
Step 2.2, deleting abnormal track data: if the lane turning movement does not correspond to the data collected by the equipment, the track is abnormal, and all traffic data of the related target vehicle is deleted.
Step 2.3, completing noisy track data: since the equipment collects traffic data at a fixed timing step, linear function interpolation can be adopted. First, the consecutive position points of the target vehicle whose track offset exceeds a threshold form a track; the formed track is fitted to a linear function; the offset track points are then inserted at the median positions between the preceding and following track points on the fitted curve.
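As an illustration of step 2.3, the fit-and-interpolate completion could look as follows (a minimal sketch assuming numpy arrays, a least-squares line fit and an example offset threshold; none of these specifics are fixed by the text):

```python
import numpy as np

def complete_track(timestamps, positions, offset_threshold=2.0):
    """Replace track points whose offset from the fitted line exceeds a
    threshold with values interpolated between their clean neighbours."""
    t = np.asarray(timestamps, dtype=float)
    y = np.asarray(positions, dtype=float)
    a, b = np.polyfit(t, y, deg=1)          # fit the track to a linear function
    noisy = np.abs(y - (a * t + b)) > offset_threshold
    y_clean = y.copy()
    # Interpolate each noisy point from the surrounding clean points.
    y_clean[noisy] = np.interp(t[noisy], t[~noisy], y[~noisy])
    return y_clean
```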
In step S3, the deep reinforcement learning model comprising a plurality of agents is constructed, specifically as follows:
Assume the intersection is currently controlled by a classical n-phase signal and the number of passable lanes in each phase is m. Without adjusting the phase sequence, one signal control agent is deployed per phase to adjust that phase's green duration and thereby control the traffic flow; after each adjustment the state and feedback are obtained from the environment. All signal control agents are optimized cooperatively to reduce the total traffic congestion at the intersection.
Step 3.1, the traffic signal control agents use a multi-agent deep reinforcement learning method. Given the high precision of holographic traffic data and the temporal continuity of traffic flow, the multi-agent recurrent deep deterministic policy gradient algorithm (MARDDPG) is selected; MARDDPG is the MADDPG algorithm with an LSTM (long short-term memory) network added.
The input of the actor network is the agent state S_t,i = (P, V, L), where P and V are matrices of the same dimension, 45 × m; the P matrix and the V matrix are combined into a two-channel image and input to the stacked sub-network. The sub-network comprises two convolutional layers: the first contains 32 filters of size 4 × 4 with stride (2, 2); the second contains 64 filters of size 2 × 2 with stride (2, 2). The phase matrix L is encoded into an 8-dimensional vector by a fully connected layer. The outputs of all sub-networks are then concatenated into one vector; the vectors synthesized by all agents are fed into an LSTM with 64 hidden units, and the action value prediction Q(S, A) is output through a softmax activation function. The specific convolutional network structure is shown in FIG. 2.
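A minimal PyTorch sketch of this actor structure is given below (the filter counts, kernel sizes, strides, 8-dimensional phase encoding and 64-unit LSTM follow the text; the default lane count m, the derived flattened feature size and the 11-action output are assumptions):

```python
import torch
import torch.nn as nn

class ActorNet(nn.Module):
    """Actor: two-channel (P, V) image plus phase vector to action distribution."""
    def __init__(self, m_lanes=8, n_phases=4, n_actions=11):  # m_lanes >= 6 needed by the convolutions
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=4, stride=2), nn.ReLU(),   # 32 filters, 4x4, stride (2, 2)
            nn.Conv2d(32, 64, kernel_size=2, stride=2), nn.ReLU(),  # 64 filters, 2x2, stride (2, 2)
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size for a 45 x m input
            feat = self.conv(torch.zeros(1, 2, 45, m_lanes)).shape[1]
        self.phase_fc = nn.Linear(n_phases, 8)  # encode phase matrix L as an 8-dim vector
        self.lstm = nn.LSTM(feat + 8, 64, batch_first=True)
        self.head = nn.Sequential(nn.Linear(64, n_actions), nn.Softmax(dim=-1))

    def forward(self, pv, phase, hidden=None):
        # pv: (batch, 2, 45, m) position/speed channels; phase: (batch, n_phases)
        x = torch.cat([self.conv(pv), self.phase_fc(phase)], dim=-1)
        out, hidden = self.lstm(x.unsqueeze(1), hidden)  # one LSTM step per control decision
        return self.head(out.squeeze(1)), hidden
```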
the reviewer (Critic) network structure is similar to the participant network, and in addition to inputting the state space, the global action set a of all the intelligent agents at the intersection needs to be input.
The replay buffer D is used to randomly sample stored experience at each training step while the actor network and the critic network are updated.
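A minimal sketch of such a buffer is given below (the 20000-sample capacity follows step 4.3; the tuple layout is an assumption):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience store with uniform random sampling."""
    def __init__(self, capacity=20000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between samples.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```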
Step 3.2, defining the state space S: the collected target vehicle operation data is read, and each agent defines its state from the vehicle positions and speeds in the passable lanes of its assigned phase.
As shown in fig. 3, to represent vehicle positions, the passable lane is divided into discrete units of 6 meters at equal intervals, each discrete unit being a cell; if a vehicle is in a cell, the corresponding position value is 1, otherwise it is 0. The matrices Pi are laid horizontally with the stop line on the right for each direction, and the agent's position matrix P consists of the matrices Pi of all directions:
P = [P1 P2 P3 …]
The target vehicle speeds are read to represent vehicle speed; speed matrices Vi are formed following the arrangement of matrix P, and the agent's speed matrix V consists of the matrices Vi of each direction:
V = [V1 V2 V3 …]
The intersection phase scheme is expressed in the state space: the green time of each phase is Gi, its ratio to the cycle length is Li, and these form the green time matrix G and the phase matrix L:
G=[G1 G2 G3 … Gn]
L=[L1 L2 L3 … Ln]
in summary, the intersection state is defined as St = (P, V, L) at discrete time step t.
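A minimal sketch of filling one lane's rows of P and V is given below (the 6-metre cell and the 45-cell depth follow the text; mapping each vehicle by its distance to the stop line is an assumption about the exact encoding):

```python
import numpy as np

CELL_LEN = 6.0  # metres per discrete cell

def encode_lane(distances, speeds, n_cells=45):
    """Build one lane's row of the position matrix P and speed matrix V.
    distances: each vehicle's distance to the stop line, metres."""
    p_row = np.zeros(n_cells)
    v_row = np.zeros(n_cells)
    for d, v in zip(distances, speeds):
        cell = int(d // CELL_LEN)
        if 0 <= cell < n_cells:
            p_row[cell] = 1.0  # occupancy indicator
            v_row[cell] = v    # longitudinal speed of the occupying vehicle
    return p_row, v_row
```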
Step 3.3, defining the action space A: the agent controlling the current phase selects an action when the phase green time ends. To keep the system stable, the phase green time may only change within a small range, and the green time Gi of each phase is limited between the maximum green time Gmax and the minimum green time Gmin (Gmin ≤ Gi ≤ Gmax).
The maximum green time is calculated as:
Gmax = (Cmax - L) × y / Y
in the formula: Gmax - the maximum green time,
Cmax - the maximum cycle time, recommended to be 180 seconds;
L - the total lost time,
y - the critical flow ratio of this phase,
Y - the sum of the critical flow ratios of all phases.
The minimum green time is calculated as:
Gmin = 7 + PLp / Pvp - I
in the formula: Gmin - the minimum green time,
PLp - the longitudinal length of the crosswalk,
Pvp - the pedestrian crossing speed, taken as 1.2 m/s from experience,
I - the green interval time.
The action set is Ai = (-5, -4, -3, -2, -1, 0, +1, +2, +3, +4, +5); if the agent selects Ai = +3, the current phase green duration is increased by 3 seconds, and the changed phase time is converted and used to update the phase matrix in the state space (a sketch of this bounded adjustment follows).
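A minimal sketch of the bounded adjustment is given below (the bound formulas follow the text above, but the 7-second pedestrian start-up term in g_min and all example parameter values are assumptions for illustration):

```python
ACTIONS = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]  # seconds, the action set Ai

def g_max(c_max=180.0, lost_time=12.0, y_phase=0.25, y_sum=0.85):
    # Gmax = (Cmax - L) * y / Y; flow ratios here are made-up examples.
    return (c_max - lost_time) * y_phase / y_sum

def g_min(crosswalk_len=24.0, ped_speed=1.2, green_interval=3.0):
    # Gmin = 7 + PLp / Pvp - I; the 7 s start-up term is an assumption.
    return 7.0 + crosswalk_len / ped_speed - green_interval

def apply_action(green_time, action_index):
    """Shift the phase green time by the chosen action, clipped to [Gmin, Gmax]."""
    return min(max(green_time + ACTIONS[action_index], g_min()), g_max())
```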
Step 3.4, defining the reward value R: different indicators influence each agent's strategy to different degrees, so a weighted combination of several intersection traffic parameters is used as the agent's reward (a computation sketch follows this list), i.e. R = W1·Rl + W2·Rw + W3·Rd + W4·Rc (in the invention W1 = -0.25, W2 = 0.2, W3 = -1 and W4 = 1), wherein:
(1) Queue length Rl: the sum of the queue lengths lij of all relevant lanes controlled by the agent, obtained from the lane-level traffic data:
Rl = Σi Σj lij
lij - the queue length of lane j in direction i controlled by the agent;
(2) Waiting time Rw: the sum of the waiting times Wijn of the vehicles on all relevant lanes controlled by the agent, obtained by combining the target vehicle operation data with the lane-level traffic data:
Rw = Σi Σj Σn Wijn
Wijn - the queuing time of every vehicle n on the lanes controlled by the agent;
(3) Average delay Rd: the average delay of all relevant lanes controlled by the agent, obtained from the lane-level traffic data:
Rd = (1/N) Σi Σj dij
dij - the average delay of lane j in direction i controlled by the agent, N - the number of such lanes;
(4) Vehicles passing the intersection Rc, obtained from the target vehicle operation data:
Rc = Σi Σj Σn cijn
cijn - the sum of all vehicles passing the stop line during the phase green time on all lanes controlled by the agent.
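A computation sketch of this weighted reward is given below (the weights follow the text; the per-lane and per-vehicle input structures are assumptions):

```python
W1, W2, W3, W4 = -0.25, 0.2, -1.0, 1.0  # weights as stated in the text

def reward(queue_lengths, wait_times, delays, passed_counts):
    """queue_lengths, delays: one value per controlled lane;
    wait_times: one value per queued vehicle;
    passed_counts: vehicles past the stop line per lane."""
    r_l = sum(queue_lengths)                 # (1) total queue length
    r_w = sum(wait_times)                    # (2) total waiting time
    r_d = sum(delays) / max(len(delays), 1)  # (3) average delay
    r_c = sum(passed_counts)                 # (4) throughput during green
    return W1 * r_l + W2 * r_w + W3 * r_d + W4 * r_c
```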
In step S4, the neural networks are trained: the traffic signal control agents are trained with the MARDDPG deep reinforcement learning algorithm, which selects the optimal action by updating the action strategy. This specifically comprises the following sub-steps:
Step 4.1, the actor network and the critic network initialize the parameterized action selection strategy μ_θi(h_t,i), where h_t,i is the historical memory data (comprising the state, reward and action information) of the i-th agent in the network at time step t. The value function Q_i(h_t, a_t) is constructed and parameterized in the critic network, its inputs comprising the historical state h and the action a selected by the actor network.
Separately, the weights θ'_i and φ'_i of all target networks are initialized; the replay buffer D is initialized, the time step is cleared, and the actor network reads the initial state s_t,i.
Step 4.2, every 5 seconds, a single agent selects an action by the strategy a_t,i = μ_θi(h_t,i) + N_t, where N_t is the exploration noise at the current time step; the actor network selects action a_t,i in the action set Ai and executes it, and after execution receives the reward value r_t,i and the new state s_t+1,i, generating the new history data h_t+1,i.
Step 4.3, the executed tuples {s_1,i, a_1,i, r_1,i, s_2,i, a_2,i, r_2,i, …} are stored in the replay buffer D; in the invention the buffer capacity is set to 20000. Agent i then samples M historical training steps stored in the replay buffer D for training the actor and critic networks.
Step 4.4, after the agent selects the M historical training-step records (in the invention the sample number M is 64), the 64 historical records are used for random batching into several minibatches (denoted m): 64 records are first drawn from the buffer without being updated, randomly split into several minibatches, and the loss function is then updated with the data of each minibatch. The state and the action selected by the actor network are input into the critic network, which estimates the Q value through the value function Q_i(h_t, a_t) constructed in step 4.1 and updates the critic network parameters φ_i by minimizing the average loss function.
The actor network uses the value returned by the critic network to compute the policy gradient through a loss function and thereby update its parameter θ_i.
The loss function for updating φ_i is specifically as follows:
L(φ_i) = (1/m) Σ_j (y_j - Q_i(h_j, a_j; φ_i))², with y_j = r_j + γ Σ_i Q'_i(h_{j+1}, a'_{j+1})
where (1/m) Σ_j Q_i denotes the average of the Q values of the m minibatch selected actions over all agent critics; r_j is the total reward value; γ is the discount coefficient, taken as 0.99 in the invention; Σ_i Q'_i(h_{j+1}, a'_{j+1}) denotes the sum of the Q values of all agents at the next time step.
The objective function for updating θ_i is specifically as follows:
∇_θi J(θ_i) = (1/m) Σ_j ∇_θi μ_θi(h_j) ∇_a Q_i(h_j, a)|_(a = μ_θi(h_j))
To update the actor network to maximize the future expected reward, J(θ_i) is defined so as to find the direction that maximizes the cumulative reward.
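As a sketch of the updates in step 4.4, one critic/actor training step might look as follows (the attribute names agent.critic, agent.actor_target, the optimizers and the joint-action handling are assumptions, not the patent's reference implementation):

```python
import torch
import torch.nn.functional as F

def maddpg_update(agent, minibatch, gamma=0.99):
    """One critic and actor update for a single agent from a sampled minibatch."""
    h, a, r, h_next = minibatch  # batched histories, joint actions, rewards

    # Critic: minimise the mean squared TD error over the minibatch.
    with torch.no_grad():
        a_next = agent.actor_target(h_next)
        y = r + gamma * agent.critic_target(h_next, a_next)  # target Q value
    critic_loss = F.mse_loss(agent.critic(h, a), y)
    agent.critic_opt.zero_grad()
    critic_loss.backward()
    agent.critic_opt.step()

    # Actor: ascend the deterministic policy gradient through the critic.
    actor_loss = -agent.critic(h, agent.actor(h)).mean()
    agent.actor_opt.zero_grad()
    actor_loss.backward()
    agent.actor_opt.step()
```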
Step 4.5, all agents in the system update the target network parameters φ'_i and θ'_i by soft update. First, the rate τ (0 < τ < 1) at which the target network is updated from the main network is defined, taken as 0.001 in the invention; the target network is updated with a convex combination of the current network parameters and the target network parameters, as follows:
φ'_i = τφ_i + (1 - τ)φ'_i
θ'_i = τθ_i + (1 - τ)θ'_i
Steps 4.2 to 4.5 are repeated; when the change in the Q value between successive updates is no greater than Δp (Δp is taken as 0.05 in the invention), the optimum in the traversable state space has been reached and the agent completes the training.
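A minimal sketch of the soft update and the stopping test is given below (reading the stopping criterion as a Q-value change of at most Δp is an assumption about the hidden formula):

```python
def soft_update(target_net, main_net, tau=0.001):
    """Step 4.5: target <- tau * main + (1 - tau) * target, per parameter."""
    for tp, p in zip(target_net.parameters(), main_net.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)

def converged(q_prev, q_curr, delta_p=0.05):
    """Stop repeating steps 4.2 to 4.5 once the Q value has settled."""
    return abs(q_curr - q_prev) <= delta_p
```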
In step S5, signal control is performed with the trained neural network: real-time holographic data is acquired according to the agent construction requirements of steps 3.2 and 3.4 and input to the signal control agents, and the phase time matrix is generated from the duration-change actions of each phase signal output by the agents:
G' = [G1' G2' G3' … Gn']
Gn' is the optimized green time of the agent controlling phase n; the green times are combined and input to the signal controller before the next cycle begins, and the optimized signal control scheme is used to control the traffic signals.
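The resulting online loop might look as follows (read_holographic_state and push_to_signal_controller are hypothetical stand-ins for the roadside data feed and the signal controller interface; apply_action is the clipping sketch from step 3.3):

```python
def control_cycle(agents, current_greens):
    """One control cycle: read state, query each phase's trained agent, emit G'."""
    new_greens = []
    for phase, agent in enumerate(agents):
        state = read_holographic_state(phase)      # hypothetical: P, V, L per steps 3.2/3.4
        action_index = agent.select_action(state)  # trained actor, exploration noise off
        new_greens.append(apply_action(current_greens[phase], action_index))
    push_to_signal_controller(new_greens)          # hypothetical: G' before the next cycle
    return new_greens
```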
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A single-point traffic signal control method based on intersection holographic data is characterized by comprising the following steps:
collecting initial holographic traffic data;
processing holographic traffic data;
constructing a deep reinforcement learning model comprising a plurality of agents;
training a neural network by using a MARDDPG algorithm;
and performing signal control by using the trained neural network.
2. The single-point traffic signal control method based on intersection holographic data according to claim 1,
the holographic traffic data comprises target vehicle operation data, lane-level traffic data, the current intersection design and the current traffic signal control scheme, wherein the target vehicle operation data comprises the identification ID of a target vehicle, the vehicle type, the vehicle longitudinal speed, the number of the lane where the vehicle is located and the distance between the vehicle and the stop line; the lane-level traffic data comprises the target lane queue length, the total vehicle waiting time, the average delay and the number of vehicles passing the stop line; the current intersection design comprises the number of lanes of each entrance approach at the intersection and the lane function allocation; and the current traffic signal control scheme comprises the current phase sequence at the intersection and the duration allocation of each phase.
3. The single-point traffic signal control method based on intersection holographic data according to claim 1,
the processing of the holographic traffic data comprises the following steps:
deleting redundant data;
deleting abnormal track data;
and completing noisy track data using linear function interpolation.
4. The single-point traffic signal control method based on intersection holographic data according to claim 1, wherein
in the process of constructing the deep reinforcement learning model comprising a plurality of agents, the traffic signal control agents use the MARDDPG algorithm for deep reinforcement learning, and a state space S, an action space A and a reward value R are defined respectively.
5. The single-point traffic signal control method based on intersection holographic data according to claim 1, wherein
the process of training the neural networks with the MARDDPG algorithm comprises the following steps:
Step 1: the actor network and the critic network initialize the parameterized action selection strategy μ_θi(h_t,i), where h_t,i is the historical memory data of the i-th agent in the actor network at time step t, and the critic network likewise holds the historical memory data of the i-th agent at time step t; the value function Q_i(h_t, a_t) is constructed and parameterized in the critic network, its inputs comprising the historical state h and the action a selected by the actor network; separately, the weights θ'_i and φ'_i of all target networks are initialized; the replay buffer D is initialized, the time step is cleared, and the actor network reads the initial state s_t,i;
Step 2: every 5 seconds, a single agent selects an action by the strategy a_t,i = μ_θi(h_t,i) + N_t, where N_t is the exploration noise at the current time step; the actor network selects action a_t,i in the action set Ai and executes it, and after execution receives the reward value r_t,i and the new state s_t+1,i, generating the new history data h_t+1,i;
Step 3: the executed tuples {s_1,i, a_1,i, r_1,i, s_2,i, a_2,i, r_2,i, …} are stored in the replay buffer D, and the number of samples is not less than 20000; agent i samples M historical training steps stored in the replay buffer D for training the actor and critic networks;
Step 4: after the agent selects the M historical training-step records, a minibatch (denoted m) within M is used so that the critic network estimates the Q value through the value function Q_i(h_t, a_t) constructed in step 1 and updates the critic network parameters φ_i by minimizing the average loss function; similarly, the actor network updates its parameter θ_i by computing the policy gradient through a loss function;
the loss function for updating φ_i is specifically as follows:
L(φ_i) = (1/m) Σ_j (y_j - Q_i(h_j, a_j; φ_i))², with y_j = r_j + γ Σ_i Q'_i(h_{j+1}, a'_{j+1})
where (1/m) Σ_j Q_i denotes the average of the Q values of the m minibatch selected actions over all agent critics; r_j is the total reward value; γ is the discount coefficient; Σ_i Q'_i(h_{j+1}, a'_{j+1}) denotes the sum of the Q values of all agents in the critic network at the next time step;
the objective function for updating θ_i is specifically as follows:
∇_θi J(θ_i) = (1/m) Σ_j ∇_θi μ_θi(h_j) ∇_a Q_i(h_j, a)|_(a = μ_θi(h_j))
to update the actor network to maximize the future expected reward, J(θ_i) is defined so as to find the direction that maximizes the cumulative reward;
Step 5: the agent updates the target network parameters φ'_i and θ'_i by soft update; first, the rate τ (0 < τ < 1) for updating the target network from the main network is defined; the target network is updated with a convex combination of the current network parameters and the target network parameters, as follows:
φ'_i = τφ_i + (1 - τ)φ'_i
θ'_i = τθ_i + (1 - τ)θ'_i
steps 2 to 5 are repeated; when the change in the Q value between successive updates is no greater than Δp, the optimum in the traversable state space has been reached and the agent completes the training.
6. The single-point traffic signal control method based on intersection holographic data according to claim 1,
the process of performing signal control with the trained neural network specifically comprises: acquiring real-time holographic data according to the agent construction requirements, inputting it to the signal control agents, and generating the phase time matrix G' from the duration-change actions of each phase signal output by the agents:
G' = [G1' G2' G3' … Gn'].
CN202211253243.7A 2022-10-13 2022-10-13 Single-point traffic signal control method based on intersection holographic data Pending CN115691167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253243.7A CN115691167A (en) 2022-10-13 2022-10-13 Single-point traffic signal control method based on intersection holographic data

Publications (1)

Publication Number Publication Date
CN115691167A true CN115691167A (en) 2023-02-03

Family

ID=85064353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253243.7A Pending CN115691167A (en) 2022-10-13 2022-10-13 Single-point traffic signal control method based on intersection holographic data

Country Status (1)

Country Link
CN (1) CN115691167A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071939A (en) * 2023-03-24 2023-05-05 华东交通大学 Traffic signal control model building method and control method
CN117114079A (en) * 2023-10-25 2023-11-24 中泰信合智能科技有限公司 Method for migrating single intersection signal control model to target environment
CN117114079B (en) * 2023-10-25 2024-01-26 中泰信合智能科技有限公司 Method for migrating single intersection signal control model to target environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination