CN112991750A - Local traffic optimization method based on reinforcement learning and generative adversarial network - Google Patents

Local traffic optimization method based on reinforcement learning and generative adversarial network

Info

Publication number
CN112991750A
CN112991750A CN202110526842.0A CN202110526842A CN 112991750 A
Authority
CN
China
Prior art keywords
training
traffic
local traffic
network
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110526842.0A
Other languages
Chinese (zh)
Other versions
CN112991750B (en)
Inventor
刘新成
宣帆
肖通
徐璀
周国冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Boyuxin Information Technology Co ltd
Original Assignee
Suzhou Boyuxin Transportation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Boyuxin Transportation Technology Co Ltd filed Critical Suzhou Boyuxin Transportation Technology Co Ltd
Priority to CN202110526842.0A priority Critical patent/CN112991750B/en
Publication of CN112991750A publication Critical patent/CN112991750A/en
Application granted granted Critical
Publication of CN112991750B publication Critical patent/CN112991750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G 1/00: Traffic control systems for road vehicles
    • G08G 1/01: Detecting movement of traffic to be counted or controlled
    • G08G 1/0104: Measuring and analyzing of parameters relative to traffic conditions
    • G08G 1/0125: Traffic data processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

A local traffic optimization method based on reinforcement learning and a generative adversarial network comprises the following steps: a training model is established, the accuracy of the model is automatically improved by adopting a generative adversarial network, and traffic flow data at a specified time are predicted by training on real traffic flow data detected at a certain intersection; the real traffic flow data and the virtual traffic flow data are trained with Q learning to output actions and form a Q value table, and the optimal local traffic optimization strategy is obtained through a reward function. By exploiting the interactive learning of reinforcement learning, the efficiency of traffic signal lamp period adjustment is greatly improved: whether the congestion condition is relieved is verified by adjusting the current congestion level and the traffic signal lamp time ratio at a certain intersection, and the adjustment is repeated until the optimal traffic light time ratio is obtained. The self-game idea behind the generative adversarial network is used to realize optimal training of Q learning within limited time, local traffic optimization is realized, and an optimal adjustment scheme is finally obtained, thereby improving the local traffic optimization capability.

Description

Local traffic optimization method based on reinforcement learning and generative adversarial network
Technical Field
The invention belongs to the field of traffic optimization, and particularly relates to a local traffic optimization method based on reinforcement learning and a generative adversarial network.
Background
Traditional local traffic optimization methods include several typical control systems such as TRANSYT and SCOOT; signal timing is optimized mainly from real-time data obtained by vehicle detection equipment, and control is realized through various communication and signal control devices.
At present, various artificial intelligence methods are applied to traffic control and optimization; however, these methods have limitations when applied to local traffic optimization. Local traffic is a huge system: the large amount of empirical knowledge reasoning and knowledge-base construction required by an expert system is difficult, and traffic parameters are not easily described by qualitative knowledge and relations. The traditional artificial neural network easily falls into local optima because of the traversal of its learning samples, so other methods need to be combined to improve generalization capability. Existing methods solve the traffic optimization problem well at a single intersection, but in the face of complex road sections and local traffic control their capacity is clearly insufficient. Therefore, it is of great significance to design an optimization scheme that can efficiently solve the local traffic problem.
Disclosure of Invention
The invention aims to provide a local traffic optimization method based on reinforcement learning and a generative adversarial network.
In order to solve the above technical problems, the invention adopts the following technical scheme. A local traffic optimization method based on reinforcement learning and a generative adversarial network specifically comprises the following steps: S1, establishing a training model, optimizing the training speed of the model by adopting a generative adversarial network, with real traffic flow data detected at a certain intersection as input and virtual traffic flow data as output; and S2, training on the real traffic flow data and the virtual traffic flow data by adopting Q learning to output actions and form a Q value table, obtaining a local traffic optimization scheme, and training the local traffic optimization scheme with a reward function to obtain the optimal local traffic optimization scheme.
In some embodiments, the specific steps of step S1 are: establishing a generative adversarial network model, initializing the generator and the discriminator in the generative adversarial network, fixing one party during training while updating the parameters of the other network, and iterating alternately so that each side maximizes the error of the other, until the generated virtual data distribution is the same as the real data distribution.
In some embodiments, the specific steps of step S2 are: setting the virtual traffic flow data as a state set S, inputting the state set S to a neural network, and outputting an action set A, wherein a is the learning rate; weights are set for the N lanes, action return values R are obtained through a reward function, and the real data and the virtual data are input into the Q learning algorithm to obtain an action set that approaches the real actions, so that the best action set is found and the optimal local traffic optimization scheme is obtained.
In some embodiments, the party fixed during the generative adversarial network training process is the generator.
In some embodiments, the state set S is the collection of all states s_t; a state s_t is the traffic flow of a single intersection; the action, namely the Q value, is used as the period adjustment, where one period is one traffic light switching; the action set A is the set of all Q values, and the action return value R is the vehicle speed on the road.
The scope of the present invention is not limited to the specific combinations of the above-described features; other embodiments in which the above-described features or their equivalents are arbitrarily combined are also intended to be encompassed. For example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application are likewise covered.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
The invention utilizes the advantages of interactive learning in reinforcement learning by setting the period adjustment as the action and the traffic flow and local traffic operation condition as the state and return, which greatly improves the efficiency of traffic signal lamp period adjustment. A model is trained on basic data, and the reward corresponding to a state and an action is obtained; that is, whether the congestion condition is relieved is checked by adjusting the current congestion level and the traffic light time ratio at a certain intersection, and the optimal traffic light time ratio is obtained through repeated adjustment. Using the self-game idea behind the generative adversarial network, the adversarial network can be trained with limited basic data, and the new virtual data it generates is combined with the basic data to improve the reinforcement learning speed. The generative adversarial network is thus creatively used to realize optimal training of Q learning; the two are combined with each other, local traffic optimization is realized in terms of the traffic signal lamp period, the best adjustment scheme is finally obtained, and local traffic optimization efficiency can be greatly improved.
Drawings
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a diagram of the generative adversarial network architecture;
FIG. 3 is a diagram of the generative adversarial network training process;
FIG. 4 is a schematic view of a local traffic network;
fig. 5 is a schematic view of the traffic optimization principle.
Detailed Description
The invention is described below with reference to the accompanying drawings:
(1) Data set and feature selection
The traffic flow of an intersection is set as the data set. The invention studies a typical intersection, as shown in fig. 4. The size of the state space of the intersection is the traffic flow on all roads, the action is set as a red light or a green light, and the quality of an action is judged by using the road speed as the reward return. One traffic light switching is regarded as one period, and an action adjustment, namely the adjustment of the traffic light time ratio, is carried out every three periods. An optimal Q value table is found through a large amount of training and applied to the specific intersection, so that the traffic light time ratio of the signal can be adjusted in time to optimize traffic.
(2) Detailed description of the invention
To improve the training effect of the model on normal data while suppressing the generalization of the model to abnormal data, the method introduces a generative adversarial network, whose structure is shown in fig. 2. The generative adversarial network comprises a generator G and a discriminator D: the generator G tries to generate traffic flow sample data ever closer to reality, while the discriminator D tries to distinguish the real data from the generated data as well as possible, so that the required data is eventually generated.
The objective function of the generative adversarial network model is as follows:

\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

wherein p_{data}(x) is the distribution of the real data x, p_z(z) is the noise distribution, D is the discriminant function, x is the real data, D(x) is the probability that real data is judged real, and D(G(z)) is the probability that generated data is judged real. D is trained to maximize log D(x) and log(1 - D(G(z))), while G is trained to minimize log(1 - D(G(z))), i.e. to maximize the loss of D. Equivalently, log D(x) + log(1 - D(G(z))) can be read as the loss of D and log(1 - D(G(z))) as the loss of G. One side is fixed during training while the parameters of the other network are updated; the two iterate alternately, each maximizing the error of the other, and finally G can estimate the distribution of the sample data, i.e. the generated samples become more realistic.
In this embodiment, the idea of the generative adversarial network algorithm is to first initialize G and D; then, in each iteration, G is fixed and D is trained: m sample points are selected from the data set, and m vectors are sampled from a prior distribution (uniform distribution, normal distribution, etc.); each vector z is taken as the input of the generator network to obtain m generated data; D is trained to maximize log D(x) + log(1 - D(G(z))), and G is trained to minimize log(1 - D(G(z))). G hopes that D(G(z)) approaches 1, i.e. the positive class, so that the loss of G becomes minimal; D expects its output on real data to approach 1 and its output on generated data, D(G(z)), to approach 0.
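As an illustration of the alternating procedure above, a minimal PyTorch sketch follows. The network sizes, noise dimension, batch size and the sample_real_traffic helper are assumptions made for the sketch rather than details from the patent, and the generator update uses the common non-saturating form of the G loss.

```python
# Minimal sketch of the alternating GAN training described above (assumptions:
# traffic samples are flattened to vectors of length FEAT; sample_real_traffic()
# is a hypothetical stand-in for a batch of real detector data).
import torch
import torch.nn as nn

FEAT, NOISE, BATCH = 32, 16, 64

G = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(), nn.Linear(64, FEAT))
D = nn.Sequential(nn.Linear(FEAT, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def sample_real_traffic(n):
    return torch.rand(n, FEAT)          # placeholder for real traffic-flow vectors

for step in range(1000):
    # fix G, update D: maximize log D(x) + log(1 - D(G(z)))
    x = sample_real_traffic(BATCH)
    z = torch.randn(BATCH, NOISE)
    fake = G(z).detach()
    loss_d = bce(D(x), torch.ones(BATCH, 1)) + bce(D(fake), torch.zeros(BATCH, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # fix D, update G: push D(G(z)) towards 1 (non-saturating generator loss)
    z = torch.randn(BATCH, NOISE)
    loss_g = bce(D(G(z)), torch.ones(BATCH, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```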
The generative adversarial network training process is shown in fig. 3, in which the light dotted line represents the discriminator's response to the generated data, the dark dotted line represents the distribution of the real data, and the solid line represents the distribution of the generated data. Diagram (a) shows the situation when training has just started and the classification capability of D is still limited; in diagram (b) D has been trained well and can clearly distinguish the generated data; in diagram (c) the solid line moves towards the dark dotted line and the light dotted line drops, which indicates that the probability assigned to the generated data drops and the generated distribution moves towards the real one: G improves during training and in turn affects the distribution learned by D. If G is fixed and D is trained to the optimum, the formula is as follows:

D^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}

wherein p_{data}(x) is the distribution of the real data x and p_g(x) is the distribution of the generated data. As p_g(x) approaches p_{data}(x) ever more closely, D^{*}(x) approaches 0.5, i.e. the state of diagram (d), and the training result is finally obtained: the generated distribution is the same as the real distribution. The generated data and the real data are then used simultaneously for agent training in reinforcement learning.
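The 0.5 value can be checked with a short, standard derivation that is not spelled out in the patent text itself: for a fixed generator, the value function is maximized pointwise in x.

```latex
V(D) = \int_x \Big[\, p_{data}(x)\,\log D(x) + p_g(x)\,\log\big(1 - D(x)\big) \,\Big]\, dx
```

For each fixed x the integrand has the form a log y + b log(1 - y) with a = p_{data}(x) and b = p_g(x); its derivative a/y - b/(1 - y) vanishes at y* = a/(a + b), giving the expression for D^{*}(x) above, and when p_g = p_{data} this equals 1/2, matching the 0.5 limit in diagram (d).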
As shown in fig. 4, the basic implementation principle is as follows: the signal controller sends an action by controlling the state of the signal lamp in the next second, which changes the vehicle speed state of the lane detected by the roadside detector; a reward is then obtained in interaction with the environment. This cyclic process has the Markov property and is simply expressed as: M = <S, A, P_{s,a}, R>.
Specifically, as shown in fig. 5, a certain intersection has N lanes at a one-way exit, a detector is provided on the road in each direction to detect vehicles and obtain the vehicle speed V, and the road of length L is divided into M regional sections; the state s_t of the exit at time t can thus be obtained from the detected values over the N lanes and M sections, and the state set S of the intersection is the collection of all s_t.
In this embodiment, right turns are set to not be controlled by the signal lamp, so an intersection has four signal states: north-south straight, north-south left turn, east-west straight, and east-west left turn. Here 1 indicates that the green light allows passing and 0 indicates that the red light forbids passing, so the four states correspond to four actions represented by one-dimensional binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0] and [0,0,0,1]. Time control of the traffic light signal is simulated by changing the input array; for example, inputting [1,0,0,0], [1,0,0,0] represents the north-south straight-going green light for two seconds, as sketched below.
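A minimal sketch of this phase encoding follows; the apply_phase hook into a simulator and the phase names are illustrative assumptions, not part of the patent.

```python
# Sketch of the one-array-per-second phase encoding described above
# (assumption: a hypothetical simulator exposes apply_phase(phase) that holds
# the signal state for one second).
PHASES = {
    "NS_straight": [1, 0, 0, 0],
    "NS_left":     [0, 1, 0, 0],
    "EW_straight": [0, 0, 1, 0],
    "EW_left":     [0, 0, 0, 1],
}

def run_phase(apply_phase, name, seconds):
    """Hold one signal phase for the given number of seconds."""
    for _ in range(seconds):
        apply_phase(PHASES[name])       # one array per second of green time

# e.g. two seconds of north-south straight-going green light:
# run_phase(sim.apply_phase, "NS_straight", 2)
```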
The reward function needs to reflect how congested or unobstructed the local traffic network is. Under normal conditions the traffic situation can be judged well from the lane speeds: the faster the average speed, the better the traffic. Because the traffic flows of the lanes differ in size, the average speed cannot be computed directly over all lanes in the area; a lane with a large traffic flow contributes more to the average speed of the whole local network and is therefore given a larger proportion. The reward function is of the form:

R = c \sum_{i=1}^{N} \frac{q_i}{Q} (\bar{v}_i - v_s)

wherein c is a constant, \bar{v}_i is the average vehicle speed of the lane with lane number i, q_i is the traffic flow of lane i, Q is the total flow of all lanes in the local traffic network, and v_s is a set standard average speed: a calculated speed above this standard gives a positive return and one below it gives a negative return.
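A small sketch of such a flow-weighted reward is given below, under the assumption that the formula has the flow-weighted form reconstructed above; the constants and variable names are illustrative only.

```python
# Flow-weighted reward sketch (assumption: c and v_std are tuning constants,
# not values taken from the patent).
def reward(avg_speeds, flows, v_std=8.0, c=1.0):
    """avg_speeds[i]: average speed of lane i (m/s); flows[i]: its traffic flow."""
    total_flow = sum(flows)
    if total_flow == 0:
        return 0.0
    # Lanes with larger flow contribute proportionally more to the return.
    return c * sum(q / total_flow * (v - v_std)
                   for v, q in zip(avg_speeds, flows))

# Example: two busy lanes above the standard speed, one slow minor lane.
print(reward([10.0, 9.0, 4.0], [30, 25, 5]))   # positive -> congestion relieved
```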
For the storage of Q values, the input is each state and the output is the actions, i.e. the Q values:

Q(s, a; \theta)

wherein \theta denotes the parameters of the neural network; the input is the state s and the output is the action value function Q corresponding to each action. The return value R of the last action is calculated with the reward function formula, and the virtual data and the real data are used together to train the neural network, so that the real action value function is approximated and the optimal strategy, namely the set of all optimal actions, is found.
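The combination of real detector data with GAN-generated virtual data for training can be sketched as follows; the half-and-half batch split and the transition-tuple layout are assumptions made for the sketch, not details from the patent.

```python
# Sketch: build a training batch from real transitions plus GAN-generated
# ("virtual") transitions. Each element is assumed to be a
# (state, action, reward, next_state) tuple.
import random

def mixed_batch(real_buffer, virtual_buffer, batch_size=64):
    half = batch_size // 2
    batch = random.sample(real_buffer, min(half, len(real_buffer)))
    batch += random.sample(virtual_buffer,
                           min(batch_size - len(batch), len(virtual_buffer)))
    random.shuffle(batch)
    return batch
```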
The Q learning algorithm updates with the formula:

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

wherein \alpha is the learning rate: when \alpha is large, Q(s, a) is strongly affected by the next state. R is the reward value, \max_{a'} Q(s', a') represents the selection policy over the actions of the next state, and \gamma is the discount rate: the lower \gamma is, the more the update is influenced by the immediate reward value.
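A minimal tabular sketch of this update rule follows; the hashable state representation, the epsilon-greedy exploration and the constants are illustrative assumptions.

```python
# Tabular Q-learning sketch implementing the update above. States are assumed
# to be hashable (e.g. tuples of discretized lane speeds); the exploration
# scheme is an assumption, not specified in the patent.
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
ACTIONS = list(range(4))                       # the four signal phases

Q = defaultdict(lambda: [0.0] * len(ACTIONS))  # the Q value table

def choose_action(state):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (target - Q[s][a])
```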
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (5)

1. A local traffic optimization method based on reinforcement learning and a generative adversarial network, characterized by comprising the following steps: S1, establishing a training model, optimizing the training speed of the model by adopting a generative adversarial network, with real traffic flow data detected at a certain intersection as input and virtual traffic flow data as output; and S2, training on the real traffic flow data and the virtual traffic flow data by adopting Q learning to output actions and form a Q value table, obtaining a local traffic optimization scheme, and training the local traffic optimization scheme with a reward function to obtain the optimal local traffic optimization scheme.
2. The local traffic optimization method based on reinforcement learning and a generative adversarial network of claim 1, wherein the specific steps of step S1 are as follows: establishing a generative adversarial network model, initializing the generator and the discriminator in the generative adversarial network, fixing one party during training while updating the parameters of the other network, and iterating alternately so that each side maximizes the error of the other, until the generated virtual data distribution is the same as the real data distribution.
3. The local traffic optimization method based on reinforcement learning and a generative adversarial network of claim 1, wherein the specific steps of step S2 are as follows: setting the virtual traffic flow data as a state set S, inputting the state set S to a neural network, and outputting an action set A, wherein a is the learning rate; setting weights for the N lanes, obtaining action return values R through a reward function, and inputting the real data and the virtual data into the Q learning algorithm to obtain an action set that approaches the real actions, so that the best action set is found, thereby obtaining the optimal local traffic optimization scheme.
4. The local traffic optimization method based on reinforcement learning and a generative adversarial network as claimed in claim 2, wherein: the party fixed during the generative adversarial network training process is the generator.
5. The local traffic optimization method based on reinforcement learning and a generative adversarial network as claimed in claim 3, wherein: the state set S is the collection of all states s_t; a state s_t is the traffic flow of a single intersection; the action, namely the Q value, is used as the period adjustment, where one period is one traffic light switching; the action set A is the set of all Q values, and the action return value R is the vehicle speed on the road.
CN202110526842.0A 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generation type countermeasure network Active CN112991750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110526842.0A CN112991750B (en) 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generation type countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110526842.0A CN112991750B (en) 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generation type countermeasure network

Publications (2)

Publication Number Publication Date
CN112991750A true CN112991750A (en) 2021-06-18
CN112991750B CN112991750B (en) 2021-11-30

Family

ID=76336522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110526842.0A Active CN112991750B (en) 2021-05-14 2021-05-14 Local traffic optimization method based on reinforcement learning and generation type countermeasure network

Country Status (1)

Country Link
CN (1) CN112991750B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506450A (en) * 2021-07-28 2021-10-15 浙江海康智联科技有限公司 Qspare-based single-point signal timing scheme selection method
CN114613170A (en) * 2022-03-10 2022-06-10 湖南大学 Traffic signal lamp intersection coordination control method based on reinforcement learning
CN115662152A (en) * 2022-09-27 2023-01-31 哈尔滨理工大学 Urban traffic management self-adaptive system based on deep learning drive

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN111191654A (en) * 2019-12-30 2020-05-22 重庆紫光华山智安科技有限公司 Road data generation method and device, electronic equipment and storage medium
CN111311577A (en) * 2020-02-14 2020-06-19 迈拓仪表股份有限公司 Intelligent water seepage detection method based on generation of confrontation network and reinforcement learning
US20200242477A1 (en) * 2019-01-30 2020-07-30 StradVision, Inc. Method and device for providing information for evaluating driving habits of driver by detecting driving scenarios occurring during driving
CN112700664A (en) * 2020-12-19 2021-04-23 北京工业大学 Traffic signal timing optimization method based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
US20200242477A1 (en) * 2019-01-30 2020-07-30 StradVision, Inc. Method and device for providing information for evaluating driving habits of driver by detecting driving scenarios occurring during driving
CN111191654A (en) * 2019-12-30 2020-05-22 重庆紫光华山智安科技有限公司 Road data generation method and device, electronic equipment and storage medium
CN111311577A (en) * 2020-02-14 2020-06-19 迈拓仪表股份有限公司 Intelligent water seepage detection method based on generation of confrontation network and reinforcement learning
CN112700664A (en) * 2020-12-19 2021-04-23 北京工业大学 Traffic signal timing optimization method based on deep reinforcement learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506450A (en) * 2021-07-28 2021-10-15 浙江海康智联科技有限公司 Qspare-based single-point signal timing scheme selection method
CN114613170A (en) * 2022-03-10 2022-06-10 湖南大学 Traffic signal lamp intersection coordination control method based on reinforcement learning
CN114613170B (en) * 2022-03-10 2023-02-17 湖南大学 Traffic signal lamp intersection coordination control method based on reinforcement learning
CN115662152A (en) * 2022-09-27 2023-01-31 哈尔滨理工大学 Urban traffic management self-adaptive system based on deep learning drive

Also Published As

Publication number Publication date
CN112991750B (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN112991750B (en) Local traffic optimization method based on reinforcement learning and generation type countermeasure network
CN112216124B (en) Traffic signal control method based on deep reinforcement learning
Wang et al. Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning
CN114360266B (en) Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN113744527B (en) Intelligent targeting dredging method for highway confluence area
CN113538910A (en) Self-adaptive full-chain urban area network signal control optimization method
CN113554875B (en) Variable speed-limiting control method for heterogeneous traffic flow of expressway based on edge calculation
Ma et al. A deep reinforcement learning approach to traffic signal control with temporal traffic pattern mining
CN113257016B (en) Traffic signal control method and device and readable storage medium
Grover et al. Traffic control using V-2-V based method using reinforcement learning
CN111126687B (en) Single-point offline optimization system and method for traffic signals
CN105185106A (en) Road traffic flow parameter prediction method based on granular computing
Srinivasan et al. Cooperative multi-agent system for coordinated traffic signal control
CN112950963A (en) Self-adaptive signal control optimization method for main branch intersection of city
Lukoševicius et al. Time warping invariant echo state networks
Song et al. Traffic signal control under mixed traffic with connected and automated vehicles: a transfer-based deep reinforcement learning approach
Li et al. Deep imitation learning for traffic signal control and operations based on graph convolutional neural networks
Cao et al. Design of a traffic junction controller using classifier system and fuzzy logic
CN114970058A (en) Large-scale network signal control optimization method based on belief domain Bayes
Ghods et al. A genetic-fuzzy control application to ramp metering and variable speed limit control
Arabi et al. Reinforcement learning-driven attack on road traffic signal controllers
CN117133138A (en) Multi-intersection traffic signal cooperative control method
Nishikawa et al. Improvements of the traffic signal control by complex-valued Hopfield networks
CN115796017A (en) Interpretable traffic cognition method based on fuzzy theory
Tung et al. Novel traffic signal timing adjustment strategy based on genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 35 / F, block a, Suzhou City Life Plaza, 251 pinglong Road, Gusu District, Suzhou City, Jiangsu Province 215000

Patentee after: Jiangsu Boyuxin Information Technology Co.,Ltd.

Country or region after: China

Address before: 35 / F, block a, Suzhou City Life Plaza, 251 pinglong Road, Gusu District, Suzhou City, Jiangsu Province 215000

Patentee before: Suzhou BOYUXIN Transportation Technology Co.,Ltd.

Country or region before: China