CN112991750B - Local traffic optimization method based on reinforcement learning and generative adversarial network - Google Patents

Local traffic optimization method based on reinforcement learning and generative adversarial network

Info

Publication number
CN112991750B
CN112991750B
Authority
CN
China
Prior art keywords
traffic
training
learning
local traffic
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110526842.0A
Other languages
Chinese (zh)
Other versions
CN112991750A (en)
Inventor
刘新成
宣帆
肖通
徐璀
周国冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Boyuxin Transportation Technology Co Ltd
Original Assignee
Suzhou Boyuxin Transportation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Boyuxin Transportation Technology Co Ltd filed Critical Suzhou Boyuxin Transportation Technology Co Ltd
Priority to CN202110526842.0A priority Critical patent/CN112991750B/en
Publication of CN112991750A publication Critical patent/CN112991750A/en
Application granted granted Critical
Publication of CN112991750B publication Critical patent/CN112991750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

A local traffic optimization method based on reinforcement learning and a generative adversarial network comprises: establishing a training model, using a generative adversarial network to automatically improve the accuracy of the model, and predicting traffic flow data at a specified time by training on real traffic flow data detected at a certain intersection; training on the real traffic flow data and virtual traffic flow data with Q learning and outputting actions to form a Q-value table; and obtaining the optimal local traffic optimization strategy with a reward function. By exploiting the interactive nature of reinforcement learning, the efficiency of traffic-signal cycle adjustment is greatly improved: whether congestion is relieved is verified by adjusting the traffic-light timing ratio of a certain intersection according to its current congestion level, and the optimal timing ratio is obtained by repeated optimization. Inspired by the self-play idea of the generative adversarial network, optimal training of Q learning is achieved in limited time, local traffic optimization is realized, and an optimal adjustment scheme is finally obtained, thereby improving the local traffic optimization capability.

Description

Local traffic optimization method based on reinforcement learning and generative adversarial network
Technical Field
The invention belongs to the field of traffic optimization, and particularly relates to a local traffic optimization method based on reinforcement learning and a generative adversarial network.
Background
Traditional local traffic optimization methods include several typical control systems such as TRANSYT and SCOOT: signal timing is optimized mainly from real-time data obtained by vehicle-detection equipment, and control is realized through various communication and signal-control devices.
At present, various artificial intelligence methods are applied to traffic control and optimization; however, these methods have limitations when solving the local traffic optimization problem. Local traffic is a huge system: the large amount of empirical knowledge reasoning and knowledge-base construction required by an expert system is difficult to obtain, and traffic parameters are not easily described by qualitative knowledge and relations. A traditional artificial neural network is prone to falling into local optima because of the way it traverses its learning samples, so it needs to be combined with other methods to improve generalization capability. Existing methods work well for traffic optimization at a single intersection, but their capability is clearly insufficient for complex road sections and local traffic control. It is therefore of great significance to design an optimization scheme that can efficiently solve the local traffic problem.
Disclosure of Invention
The invention aims to provide a local traffic optimization method based on reinforcement learning and a generative adversarial network.
In order to solve the above technical problems, the invention adopts the following technical scheme. A local traffic optimization method based on reinforcement learning and a generative adversarial network comprises the following steps:
S1, establishing a training model and using a generative adversarial network to optimize the training speed of the model: the real traffic-flow data state set S detected at a certain intersection is input and virtual traffic flow data are output.
S2, training on the real traffic flow data and the virtual traffic flow data with Q learning and outputting an action set A to form a Q-value table, the action-value function being $Q(s, a; \theta)$, where $\theta$ are the parameters of a neural network that takes the state set s as input and outputs the action value Q of the corresponding action, which yields a local traffic optimization scheme. The local traffic optimization scheme is trained with a reward function: the return value R of the previous action is calculated with the reward-function formula
$R = k\left(\sum_{i=1}^{N} \frac{f_i}{F}\,\bar v_i - v_0\right)$,
where k is a constant, $\bar v_i$ represents the average vehicle speed of the lane with lane number i, $f_i$ is the traffic flow of lane i, F is the total flow of all lanes in the local traffic network, and $v_0$ is the set standard average speed; a weighted speed above $v_0$ gives a positive return and one below it gives a negative return. The optimal strategy, i.e. the set of all optimal actions, is found with the learning update formula
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$,
where $\alpha$ is the learning rate (the larger $\alpha$ is, the more the new Q value is influenced by the next state), R is the reward value, $\max_{a} Q(s_{t+1}, a)$ represents the selection policy over the next state set, and $\gamma$ is the discount rate (the smaller $\gamma$ is, the more the update is affected by the immediate reward value); the best local traffic optimization scheme is thereby obtained.
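By way of illustration only (not part of the claims), the S2 formulas above can be sketched in Python as follows; the dictionary-based Q table and the example values of the learning rate alpha and discount rate gamma are assumptions.

```python
# Minimal sketch of the Q-value table and the S2 update rule above; the dict-based
# table and the default alpha/gamma values are assumptions made for this example.
from collections import defaultdict

Q = defaultdict(float)          # Q-value table keyed by (state, action)

def q_update(s, a, R, s_next, actions, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, b)] for b in actions)        # max_a Q(s_{t+1}, a)
    Q[(s, a)] += alpha * (R + gamma * best_next - Q[(s, a)])
```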
In some embodiments, the specific steps of step S1 are: establishing a generative adversarial network model; initializing the generator and the discriminator in the generative adversarial network; fixing one side during training while updating the parameters of the other network, the two alternating iteratively so as to maximize the error of the other side; and finally generating a virtual data distribution that is the same as the real data distribution.
In some embodiments, the side that is fixed during the generative adversarial network training process is the generator.
In some embodiments, the state set S of an intersection is the set of all states $s_t$ at time t, a state $s_t$ being the traffic flow of all lanes at a one-way exit of the crossroad at time t; the action, i.e. the Q value, is the cycle adjustment, one cycle being one traffic-light switch; the action set A is the set of all Q values, and the action return value R is the vehicle speed on the road.
In some embodiments, there are four intersection states: north-south through, north-south left turn, east-west through and east-west left turn; 1 indicates that the green light allows passing and 0 indicates that the red light forbids passing, so the four states have four actions, represented by one-dimensional binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0] and [0,0,0,1]. Simulation of the time control of the traffic signal is realized by changing the input array, with one second as the unit.
The scope of the present invention is not limited to the specific combinations of the features described above; other embodiments formed by arbitrarily combining the above features or their equivalents are also intended to be encompassed, for example technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present application.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
the invention utilizes the advantages of reinforcement learning interactive learning to set period adjustment as action and set traffic flow and local traffic operation condition as state and return, greatly improves the efficiency of traffic signal lamp period adjustment, trains a model through basic data, obtains corresponding reward by state and action, namely checks whether the congestion condition is relieved or not by adjusting the current congestion level and the time ratio of a traffic light at a certain intersection, obtains the optimal time ratio of the traffic light through reciprocating adjustment, utilizes the inspiring self-game thought of a generative confrontation network, can train and generate the confrontation network by utilizing limited basic data, then utilizes new data generated by the generated confrontation network to form virtual data and combines the basic data to improve the reinforcement learning speed, creatively uses the generated confrontation network to realize the optimal training of Q learning, the two are combined with each other, local traffic optimization is realized in the aspect of traffic signal lamp period, the best adjustment scheme is finally obtained, and the local traffic optimization efficiency can be greatly improved.
Drawings
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a diagram of the generative adversarial network architecture;
FIG. 3 is a diagram of the generative adversarial network training process;
FIG. 4 is a schematic view of a local traffic network;
FIG. 5 is a schematic view of the traffic optimization principle.
Detailed Description
The invention is described below with reference to the accompanying drawings:
(1) data set and feature selection
The traffic flow of an intersection is taken as the data set; the invention studies a typical crossroad, as shown in fig. 4. The state space of the intersection is the traffic flow of all its roads, the action is set as a red or green light, and the quality of an action is judged with the road speed as the reward. One traffic-light switch is regarded as one cycle, and an action adjustment, i.e. an adjustment of the traffic-light timing ratio, is made every three cycles. Through a large amount of training an optimal Q-value table is found and applied to the specific intersection, so that the signal timing ratio can be adjusted in time to optimize traffic.
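A minimal sketch of such a training loop, for illustration only: SimulatedIntersection is a hypothetical stand-in for a traffic simulator whose step() advances three signal cycles and returns the next state, the reward and a termination flag; the epsilon-greedy exploration and all numeric values are assumptions.

```python
# Hedged sketch of the training loop described above (not the patent's exact procedure).
import random
from collections import defaultdict

N_ACTIONS, EPSILON, ALPHA, GAMMA = 4, 0.1, 0.1, 0.9
Q = defaultdict(float)                     # Q-value table keyed by (state, action)

def choose_action(s):
    if random.random() < EPSILON:                               # explore
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(s, a)])       # exploit

def train(env, episodes=1000):
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = choose_action(s)
            # the timing ratio is adjusted once every three signal cycles
            s_next, R, done = env.step(a, cycles=3)
            best_next = max(Q[(s_next, b)] for b in range(N_ACTIONS))
            Q[(s, a)] += ALPHA * (R + GAMMA * best_next - Q[(s, a)])
            s = s_next
```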
(2) Detailed description of the invention
The method introduces a generative adversarial network to improve the training effect of the model on normal data while suppressing the generalization of the model to abnormal data. As shown in fig. 2, the generative adversarial network comprises a generator G and a discriminator D: the generator G tries to produce traffic-flow sample data that are closer to reality, while the discriminator D tries to distinguish the real data from the generated data perfectly, so that the data that are needed can be generated.
The objective function of the generative adversarial network model is as follows:
$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
where $p_{data}(x)$ is the distribution of the real data x, $p_z(z)$ is the noise distribution, D is the discriminant function, D(x) is the probability that real data are judged real, and D(G(z)) is the probability that generated data are judged real. D is trained to maximize $\log D(x)$ and $\log(1 - D(G(z)))$, while G is trained to minimize $\log(1 - D(G(z)))$, i.e. to maximize the loss of D. Equivalently, $\log D(x)$ together with $\log(1 - D(G(z)))$ can be understood as the loss of D, and $\log(1 - D(G(z)))$ as the loss of G. One side is fixed during training while the parameters of the other network are updated, the two alternating iteratively so as to maximize the error of the other side; finally G can estimate the distribution of the sample data, i.e. the generated samples become more realistic.
In this embodiment, the idea of the generative adversarial network algorithm is to first initialize G and D; then, in each iteration, G is fixed and D is trained: m sample points are selected from the data set and m vectors are sampled from a prior distribution (uniform, normal, etc.); each of the m vectors z is taken as the input of the generator network to obtain m generated data; D is trained to maximize $\log D(x)$ and $\log(1 - D(G(z)))$, and G is trained to minimize $\log(1 - D(G(z)))$. G hopes that D(G(z)) approaches 1, i.e. the positive class, so that the loss of G becomes minimal, while D expects its output on the real data to approach 1 and its output on the generated data, D(G(z)), to approach 0.
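For concreteness only, the alternating updates just described can be sketched with PyTorch as follows; the network sizes, optimizers, learning rates and the dimension of a traffic-flow sample are assumptions, and the non-saturating generator loss (maximizing log D(G(z))) is used in place of directly minimizing log(1 - D(G(z))).

```python
# Hedged sketch of one alternating GAN training step (PyTorch); sizes are assumed.
import torch
import torch.nn as nn

FLOW_DIM, NOISE_DIM = 16, 8   # assumed dimensions of a traffic-flow sample and of noise z

G = nn.Sequential(nn.Linear(NOISE_DIM, 32), nn.ReLU(), nn.Linear(32, FLOW_DIM))
D = nn.Sequential(nn.Linear(FLOW_DIM, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def train_step(real_batch):
    m = real_batch.size(0)
    fake = G(torch.randn(m, NOISE_DIM))

    # fix G, update D: maximize log D(x) + log(1 - D(G(z)))
    opt_d.zero_grad()
    loss_d = bce(D(real_batch), torch.ones(m, 1)) + bce(D(fake.detach()), torch.zeros(m, 1))
    loss_d.backward()
    opt_d.step()

    # fix D, update G (non-saturating form: push D(G(z)) towards 1)
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(m, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```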
The generative adversarial network training process is shown in fig. 3, where the light dotted line represents the discriminator's response to the generated data, the dark dotted line represents the distribution of the real data, and the solid line represents the distribution of the generated data. Fig. 3(a) shows that the classification capability of the system is limited when training starts. In fig. 3(b), D is better trained and can clearly distinguish the generated data. In fig. 3(c), the solid line and the dark dotted line still deviate and the light dotted line drops, indicating that the probability assigned to the generated data drops; the solid line moves in the direction of the light dotted line, G improves during training, and G also affects the distribution of D. If G is fixed and D is trained to the optimum, the formula is:
$D_G^{*}(x) = \dfrac{p_{data}(x)}{p_{data}(x) + p_g(x)}$
where $p_{data}(x)$ is the distribution of the real data x and $p_g(x)$ is the distribution of the generated data x. As $p_g(x)$ approaches $p_{data}(x)$, $D_G^{*}(x)$ approaches 0.5, i.e. the state of fig. 3(d); the training result is finally obtained, the generated distribution being the same as the real one, and agent training for reinforcement learning is then performed with the generated data and the real data simultaneously.
As shown in fig. 4, the basic traffic implementation principle is as follows: the signal controller issues an action by controlling the state of the signal lamp in the next second, thereby changing the vehicle-speed state of the lane detected by the roadside detector, and a reward is then obtained in interaction with the environment; this is a cyclic process, so the Markov property is simply expressed as $M = \langle S, A, P_{s,a}, R \rangle$.
Specifically, as shown in fig. 5, a certain intersection has N lanes at a one-way exit; a detector is installed on each road in each direction to detect vehicles and obtain the vehicle speed V, and a road of length L is divided into M sections. The state of the exit at time t can therefore be obtained and is defined as $s_t$, a matrix of size $N \times M$ of detected speeds, and the state set S of the intersection is the set of all $s_t$.
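As an illustrative sketch only, such an N x M speed state could be assembled as follows; the lane and section counts and the detector_speeds mapping are assumptions introduced for the example.

```python
# Sketch of building the state s_t: one detected average speed per (lane, section).
import numpy as np

N_LANES, M_SECTIONS = 4, 10   # assumed N lanes, road of length L split into M sections

def build_state(detector_speeds):
    """detector_speeds: mapping (lane i, section j) -> measured speed at time t."""
    s_t = np.zeros((N_LANES, M_SECTIONS))
    for (i, j), v in detector_speeds.items():
        s_t[i, j] = v
    return s_t                # the state set S is the collection of all such s_t
```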
In this embodiment, right turns are set not to be controlled by the signal lamp, so an intersection has four states: north-south through, north-south left turn, east-west through and east-west left turn; 1 indicates that the green light allows passing and 0 indicates that the red light forbids passing, so the four states have four actions, represented by one-dimensional binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0] and [0,0,0,1]. Simulation of the time control of the traffic-light signal is realized by changing the input array; for example, inputting [1,0,0,0], [1,0,0,0] represents two seconds of green for the north-south through movement.
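The one-hot phase arrays above can be sketched as follows; the phase names and the signal_plan helper are hypothetical, introduced only for illustration.

```python
# Sketch of the one-hot phase encoding: one array per second of simulated signal time.
PHASES = {
    "NS_through": [1, 0, 0, 0],
    "NS_left":    [0, 1, 0, 0],
    "EW_through": [0, 0, 1, 0],
    "EW_left":    [0, 0, 0, 1],
}

def signal_plan(phase_seconds):
    """phase_seconds: list of (phase name, duration in seconds) -> per-second arrays."""
    plan = []
    for name, seconds in phase_seconds:
        plan.extend([PHASES[name]] * seconds)
    return plan

# e.g. two seconds of north-south through green:
# signal_plan([("NS_through", 2)]) == [[1, 0, 0, 0], [1, 0, 0, 0]]
```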
The reward function needs to reflect whether the local traffic network is congested or flowing freely. Under normal conditions the traffic condition can be judged well from the lane speeds: the faster the average speed, the better the traffic. Because the traffic flow differs from lane to lane, the average speed cannot simply be computed over all lanes in the area; a lane with a large flow contributes more to the average speed of the whole local network and is therefore given a larger weight. The reward-function formula is:
$R = k\left(\sum_{i=1}^{N} \frac{f_i}{F}\,\bar v_i - v_0\right)$
where k is a constant, $\bar v_i$ represents the average vehicle speed of the lane with lane number i, $f_i$ is the traffic flow of lane i, F is the total flow of all lanes in the local traffic network, and $v_0$ is the set standard average speed, above which the calculated speed gives a positive return and below which it gives a negative return.
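A sketch of this weighted-speed reward, with the constant k and the standard speed v0 given assumed example values:

```python
# Sketch of the flow-weighted reward above; k and v0 are illustrative values only.
def reward(mean_speeds, flows, k=1.0, v0=40.0):
    """mean_speeds[i]: average speed of lane i; flows[i]: traffic flow of lane i."""
    F = sum(flows)                                   # total flow of all lanes
    weighted = sum(f / F * v for f, v in zip(flows, mean_speeds))
    return k * (weighted - v0)                       # positive above v0, negative below
```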
For the storage of Q values, the input is each state and the output is an action, i.e. the Q value $Q(s, a; \theta)$, where $\theta$ are the parameters of the neural network whose input is the state set S and whose output is the action-value function Q of the corresponding action. The return value R of the previous action is calculated with the reward-function formula, and the virtual traffic data and the real traffic data are used together to train the neural network, so that the true action-value function is approximated and the optimal strategy, i.e. the set of all optimal actions, is found.
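As a hedged sketch (not the patent's exact network), a small neural network approximating Q over the combined real and virtual states might look like this; the architecture, optimizer and the way the training targets are formed are assumptions.

```python
# Sketch of a Q network trained on the union of real and GAN-generated (virtual) states.
import torch
import torch.nn as nn

N_LANES, M_SECTIONS, N_ACTIONS = 4, 10, 4

q_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(N_LANES * M_SECTIONS, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),            # one Q value per signal-phase action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def fit_batch(real_states, virtual_states, targets):
    """targets: tensor of shape (len(real)+len(virtual), N_ACTIONS), e.g. R + gamma * max Q."""
    states = torch.cat([real_states, virtual_states])   # combine real and virtual data
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(q_net(states), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```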
The learning algorithm update formula is:
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$
where $\alpha$ is the learning rate: the larger $\alpha$ is, the more the new Q value is influenced by the next state. R is the reward value, $\max_{a} Q(s_{t+1}, a)$ represents the selection policy over the next state set, and $\gamma$ is the discount rate: the smaller $\gamma$ is, the more the update is affected by the immediate reward value.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (5)

1. A local traffic optimization method based on reinforcement learning and a generative adversarial network, characterized by comprising the following steps: S1, establishing a training model, using a generative adversarial network to optimize the training speed of the model, inputting the real traffic-flow data state set S detected at a certain crossroad, and outputting virtual traffic flow data; S2, training the real traffic flow data and the virtual traffic flow data with Q learning and outputting an action set to form a Q-value table, the action-value function being $Q(s, a; \theta)$, where $\theta$ are the parameters of a neural network whose input is the state set s and whose output is the action value Q of the corresponding action, thereby obtaining a local traffic optimization scheme; training the local traffic optimization scheme with a reward function, the return value of the previous action being calculated with the reward-function formula $R = k\left(\sum_{i=1}^{N} \frac{f_i}{F}\,\bar v_i - v_0\right)$, where k is a constant, $\bar v_i$ represents the average vehicle speed of the lane with lane number i, $f_i$ is the traffic flow of lane i, F is the total flow of all lanes in the local traffic network, and $v_0$ is the set standard average speed, the calculated speed above $v_0$ giving a positive return and below it a negative return; and finding the optimal strategy, i.e. the set of all optimal actions, with the learning update formula $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$, where $\alpha$ is the learning rate (the larger $\alpha$ is, the more the new Q value is influenced by the next state), R is the reward value, $\max_{a} Q(s_{t+1}, a)$ represents the selection policy of the next state set, and $\gamma$ is the discount rate (the smaller $\gamma$ is, the more the update is affected by the immediate reward value), thereby obtaining the best local traffic optimization scheme.
2. The local traffic optimization method based on reinforcement learning and a generative adversarial network as claimed in claim 1, wherein the specific steps of using the generative adversarial network in step S1 to optimize the model training speed are: establishing a generative adversarial network model; initializing the generator and the discriminator in the generative adversarial network; fixing one side during the training of the generative adversarial network while updating the parameters of the other network, the two alternating iteratively so as to maximize the error of the other side; and finally generating a virtual data distribution that is the same as the real data distribution.
3. The local traffic optimization method based on reinforcement learning and a generative adversarial network as claimed in claim 2, wherein the side that is fixed during the generative adversarial network training process is the generator.
4. The local traffic optimization method based on reinforcement learning and a generative adversarial network as claimed in claim 1, wherein the state set S of a certain crossroad is the set of all states $s_t$ at time t; the action, i.e. the Q value, is the cycle adjustment, one cycle being one traffic-light switch; the action set is the set of all Q values; and the action return value R is the vehicle speed on the road.
5. The local traffic optimization method based on reinforcement learning and a generative adversarial network as claimed in claim 1, wherein there are four intersection states: north-south through, north-south left turn, east-west through and east-west left turn; 1 indicates that the green light allows passing and 0 indicates that the red light forbids passing, so the four states have four actions, represented by one-dimensional binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0] and [0,0,0,1]; simulation of the time control of the traffic signal is realized by changing the input array, with one second as the unit.
CN202110526842.0A 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generative adversarial network Active CN112991750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110526842.0A CN112991750B (en) 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110526842.0A CN112991750B (en) 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generative adversarial network

Publications (2)

Publication Number Publication Date
CN112991750A (en) 2021-06-18
CN112991750B true CN112991750B (en) 2021-11-30

Family

ID=76336522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110526842.0A Active CN112991750B (en) Local traffic optimization method based on reinforcement learning and generative adversarial network

Country Status (1)

Country Link
CN (1) CN112991750B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506450B (en) * 2021-07-28 2022-05-17 浙江海康智联科技有限公司 Qspare-based single-point signal timing scheme selection method
CN114613170B (en) * 2022-03-10 2023-02-17 湖南大学 Traffic signal lamp intersection coordination control method based on reinforcement learning
CN115662152B (en) * 2022-09-27 2023-07-25 哈尔滨理工大学 Urban traffic management self-adaptive system based on deep learning driving

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN111191654A (en) * 2019-12-30 2020-05-22 重庆紫光华山智安科技有限公司 Road data generation method and device, electronic equipment and storage medium
CN111311577A (en) * 2020-02-14 2020-06-19 迈拓仪表股份有限公司 Intelligent water seepage detection method based on generation of confrontation network and reinforcement learning
CN112700664A (en) * 2020-12-19 2021-04-23 北京工业大学 Traffic signal timing optimization method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10832140B2 (en) * 2019-01-30 2020-11-10 StradVision, Inc. Method and device for providing information for evaluating driving habits of driver by detecting driving scenarios occurring during driving

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN111191654A (en) * 2019-12-30 2020-05-22 重庆紫光华山智安科技有限公司 Road data generation method and device, electronic equipment and storage medium
CN111311577A (en) * 2020-02-14 2020-06-19 迈拓仪表股份有限公司 Intelligent water seepage detection method based on generation of confrontation network and reinforcement learning
CN112700664A (en) * 2020-12-19 2021-04-23 北京工业大学 Traffic signal timing optimization method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112991750A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN112991750B (en) Local traffic optimization method based on reinforcement learning and generative adversarial network
CN111739284B (en) Traffic signal lamp intelligent timing method based on genetic algorithm optimization fuzzy control
CN112216124A (en) Traffic signal control method based on deep reinforcement learning
CN113538910B (en) Self-adaptive full-chain urban area network signal control optimization method
George et al. Traffic prediction using multifaceted techniques: A survey
CN114360266B (en) Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN113744527B (en) Intelligent targeting dredging method for highway confluence area
CN113554875B (en) Variable speed-limiting control method for heterogeneous traffic flow of expressway based on edge calculation
CN113257016B (en) Traffic signal control method and device and readable storage medium
CN111126687B (en) Single-point offline optimization system and method for traffic signals
Ma et al. A deep reinforcement learning approach to traffic signal control with temporal traffic pattern mining
Li et al. Deep imitation learning for traffic signal control and operations based on graph convolutional neural networks
CN112950963A (en) Self-adaptive signal control optimization method for main branch intersection of city
Zeng et al. Training reinforcement learning agent for traffic signal control under different traffic conditions
Cao et al. Design of a traffic junction controller using classifier system and fuzzy logic
Song et al. Traffic signal control under mixed traffic with connected and automated vehicles: a transfer-based deep reinforcement learning approach
Zhang et al. Direction-decision learning based pedestrian flow behavior investigation
Nishikawa et al. Improvements of the traffic signal control by complex-valued Hopfield networks
Shi et al. Improving the generalizability and robustness of large-scale traffic signal control
Wu et al. Deep Reinforcement Learning Based Traffic Signal Control: A Comparative Analysis
Shahriar et al. Intersection traffic efficiency enhancement using deep reinforcement learning and V2X communications
Gahlan et al. A review on various issues, challenges and different methodologies in vehicular environment
Faqir et al. Deep q-learning approach for congestion problem in smart cities
Ahmed Continuous genetic algorithm for traffic signal control
CN115691110B (en) Intersection signal period stable timing method based on reinforcement learning and oriented to dynamic traffic flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant