CN112991750B - Local traffic optimization method based on reinforcement learning and generative adversarial network - Google Patents
Local traffic optimization method based on reinforcement learning and generative adversarial network
- Publication number
- CN112991750B (application CN202110526842.0A)
- Authority
- CN
- China
- Prior art keywords
- traffic
- training
- learning
- local traffic
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A local traffic optimization method based on reinforcement learning and a generative adversarial network. A training model is established, a generative adversarial network is used to automatically improve the model's accuracy, and traffic flow data at a specified time is predicted by training on real traffic flow data detected at an intersection. Q-learning is used to train on the real and virtual traffic flow data and output actions, forming a Q-value table; a reward function is used to obtain the optimal local traffic optimization strategy. Exploiting the interactive nature of reinforcement learning greatly improves the efficiency of traffic-signal cycle adjustment: the method checks whether congestion is relieved by adjusting the signal timing ratio against the current congestion level at the intersection, and iterates until the optimal green-time ratio is found. Inspired by the self-play idea of generative adversarial networks, Q-learning can be optimally trained in limited time, realizing local traffic optimization and finally yielding the best adjustment scheme, thereby improving local traffic optimization capability.
Description
Technical Field
The invention belongs to the field of traffic optimization, and in particular relates to a local traffic optimization method based on reinforcement learning and a generative adversarial network.
Background
Traditional local traffic optimization methods include several typical control systems such as TRANSYT and SCOOT: signal timing is optimized mainly from real-time data obtained by vehicle-detection equipment, and control is realized through various communication and signal-control devices.
At present, various artificial-intelligence methods are applied to traffic control and optimization; however, they have limitations for local traffic optimization. A local traffic system is large and complex: the extensive empirical knowledge and knowledge bases required by expert systems are difficult to build, and traffic parameters are not easily described through qualitative knowledge and relations. Traditional artificial neural networks tend to fall into local optima because of the way learning samples are traversed, so they must be combined with other methods to improve generalization. Existing methods work well for traffic optimization at a single intersection, but their capacity is clearly insufficient for complex road sections and local traffic control. Designing an optimization scheme that can efficiently solve the local traffic problem is therefore of great significance.
Disclosure of Invention
The invention aims to provide a local traffic optimization method based on reinforcement learning and a generative adversarial network.
In order to solve the technical problem, the invention adopts the following technical scheme. A local traffic optimization method based on reinforcement learning and a generative adversarial network comprises the following steps:

S1, establishing a training model and using a generative adversarial network to speed up model training: the real traffic-flow state set S detected at an intersection is taken as input, and virtual traffic flow data is output.

S2, training on the real and virtual traffic flow data with Q-learning and outputting an action set to form a Q-value table, with the formula

\( Q(s, a) = f(s; \theta) \)

where \( \theta \) are the parameters of a neural network that takes the state set s as input and outputs the action-value function Q corresponding to each action, yielding a local traffic optimization scheme. The scheme is trained with a reward function, and the return value of the previous action is computed by the reward formula

\( R = c \sum_i \frac{f_i}{F} \, (\bar{v}_i - v_0) \)

where c is a constant, \( \bar{v}_i \) is the average vehicle speed of the lane with lane number i, \( f_i \) is the traffic flow of lane i, F is the total flow of all lanes in the local traffic network, and \( v_0 \) is a set standard average speed: a calculated speed above it gives a positive return and one below it a negative return. The optimal strategy, i.e. the set of all optimal actions, is found with the learning-algorithm update

\( Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \)

where \( \alpha \) is the learning rate (the larger \( \alpha \), the more the update is influenced by the next state), R is the reward value, \( \max_{a'} Q(s', a') \) represents the greedy selection over the next state set, and \( \gamma \) is the discount rate (the smaller \( \gamma \), the more the update is influenced by the immediate reward), thereby obtaining the best local traffic optimization scheme.
In some embodiments, the specific steps of step S1 are: establish a generative adversarial network model and initialize the generator and the discriminator; during training, fix one party while updating the parameters of the other network, alternately iterating so as to maximize the other party's error, until a virtual data distribution the same as the real data distribution is generated.
In some embodiments, the fixed party during generative adversarial network training is the generator.
In some embodiments, the state set S of the intersection is the set of all states \( s_t \) at time t; the state \( s_t \) is the traffic flow of all lanes at the one-way exits of the intersection at time t. An action, i.e. a Q value, is a cycle adjustment, where one cycle is one traffic-light switch; the action set is the set of all Q values, and the action return value R is the vehicle speed on the road.
In some embodiments, there are four intersection states: north-south straight, north-south left turn, east-west straight, and east-west left turn. A 1 means the green light allows passage and a 0 means the red light forbids it, so the four states have four actions, represented by one-hot binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1]. Time control of the traffic signal is simulated by changing the input array, with one second as the unit.
The scope of the present invention is not limited to the specific combinations of the above-described features, and other embodiments in which the above-described features or their equivalents are arbitrarily combined are also intended to be encompassed. For example, the above features and the technical features (but not limited to) having similar functions disclosed in the present application are mutually replaced to form the technical solution.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
the invention utilizes the advantages of reinforcement learning interactive learning to set period adjustment as action and set traffic flow and local traffic operation condition as state and return, greatly improves the efficiency of traffic signal lamp period adjustment, trains a model through basic data, obtains corresponding reward by state and action, namely checks whether the congestion condition is relieved or not by adjusting the current congestion level and the time ratio of a traffic light at a certain intersection, obtains the optimal time ratio of the traffic light through reciprocating adjustment, utilizes the inspiring self-game thought of a generative confrontation network, can train and generate the confrontation network by utilizing limited basic data, then utilizes new data generated by the generated confrontation network to form virtual data and combines the basic data to improve the reinforcement learning speed, creatively uses the generated confrontation network to realize the optimal training of Q learning, the two are combined with each other, local traffic optimization is realized in the aspect of traffic signal lamp period, the best adjustment scheme is finally obtained, and the local traffic optimization efficiency can be greatly improved.
Drawings
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a diagram of the generative adversarial network architecture;
FIG. 3 is a diagram of the generative adversarial network training process;
FIG. 4 is a schematic view of a local traffic network;
fig. 5 is a schematic view of the traffic optimization principle.
Detailed Description
The invention is described below with reference to the accompanying drawings:
(1) data set and feature selection
The traffic flow of an intersection is taken as the data set; the invention studies a typical intersection, as shown in fig. 4. The state space of the intersection is the traffic flow of all roads; the action is setting a red or green light; and the action quality is judged using the road speed as the reward. One traffic-light switch is regarded as one cycle, and an action adjustment, i.e. an adjustment of the signal timing ratio, is made every three cycles. An optimal Q-value table is found through extensive training and applied to the specific intersection, so that the signal timing ratio can be adjusted in time to optimize traffic.
(2) Detailed description of the invention
The method introduces a generative adversarial network to improve the model's training effect on normal data while suppressing its generalization to abnormal data. As shown in fig. 2, the generative adversarial network comprises a generator G and a discriminator D: G tries to generate traffic flow sample data ever closer to reality, while D tries to perfectly distinguish the real data from the generated data, so that the desired data is eventually produced. The network structure is shown in fig. 2.
The objective function of the generative adversarial network model is:

\( \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \)

where \( p_{data}(x) \) is the distribution of the real data, \( p_z(z) \) is the noise distribution, D is the discriminant function, x is real data, D(x) is the probability that the discriminator judges real data as real, and D(G(z)) is the probability it assigns to generated data. D is trained to maximize \( \log D(x) \) and \( \log(1 - D(G(z))) \); G is trained to minimize \( \log(1 - D(G(z))) \), i.e. to maximize the loss of D. One side is fixed during training while the parameters of the other network are updated; the two alternate so as to maximize each other's error, and finally G can estimate the distribution of the sample data, i.e. the generated samples become more realistic.
In this embodiment, the idea of the generative adversarial network algorithm is: first initialize G and D; then, in each iteration, fix G and train D. Select m sample points from the data set and m vectors from a prior distribution (uniform, normal, etc.); feed each vector z of the m vectors into the network to obtain m generated data points; train D to maximize \( \log D(x) \) and \( \log(1 - D(G(z))) \); train G to minimize \( \log(1 - D(G(z))) \). G hopes that D(G(z)) approaches 1, i.e. the positive class, so that G's loss is minimal, while D expects its output on real data to approach 1 and its output D(G(z)) on generated data to approach 0.
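The alternating procedure above can be sketched on a toy 1-D problem. This is a minimal illustration under stated assumptions, not the patent's implementation: the "generator" is a two-parameter affine map G(z) = μ + σz, the "discriminator" is logistic regression, and the analytic gradients follow the objective described above (with the common non-saturating generator objective log D(G(z))).

```python
import numpy as np

# Toy 1-D GAN: generator G(z) = mu + s*z, discriminator D(x) = sigmoid(w*x + b).
# Real data ~ N(3, 1); the generator starts at N(0, 1) and should drift toward it.
rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

w, b = 0.1, 0.0          # discriminator parameters
mu, s = 0.0, 1.0         # generator parameters
lr_d, lr_g = 0.1, 0.01   # D adapts faster than G, a common stabilizing choice
real_mu = 3.0

for step in range(3000):
    # Fix G, update D: gradient ascent on  E[log D(x)] + E[log(1 - D(G(z)))]
    x = rng.normal(real_mu, 1.0, size=32)   # m real sample points
    z = rng.normal(0.0, 1.0, size=32)       # m noise vectors
    xf = mu + s * z                         # m generated data points
    d_real, d_fake = sigmoid(w * x + b), sigmoid(w * xf + b)
    w += lr_d * (np.mean((1 - d_real) * x) - np.mean(d_fake * xf))
    b += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))
    # Fix D, update G: gradient ascent on the non-saturating  E[log D(G(z))]
    z = rng.normal(0.0, 1.0, size=32)
    xf = mu + s * z
    d_fake = sigmoid(w * xf + b)
    mu += lr_g * w * np.mean(1 - d_fake)
    s += lr_g * w * np.mean((1 - d_fake) * z)

# After training, mu should sit near the real mean (3.0).
```

The alternation is visible directly in the loop body: D's parameters move while G is held fixed, then G's parameters move against the updated D.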
The generative adversarial network training process is shown in fig. 3, where the light dotted line is the discriminator's response to the generated data, the dark dotted line is the distribution of the real data, and the solid line is the distribution of the generated data. Fig. 3 (a): at the start of training, the classification capability of the system is limited. Fig. 3 (b): D is better trained and can clearly distinguish generated data. Fig. 3 (c): the solid line deviates from the dark dotted line and the light dotted line drops, indicating that the probability assigned to generated data drops; the solid line moves toward the light dotted line; G improves during training, and G in turn affects the distribution of D. If G is fixed and D is trained to the optimum, the formula is:
\( D^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} \)

where \( p_{data}(x) \) is the distribution of the real data x and \( p_g(x) \) is the distribution of the generated data x. As \( p_g \) increasingly approaches \( p_{data} \), \( D^{*}(x) \) approaches 0.5, i.e. the state of fig. 3 (d), and the training result is finally obtained with the two distributions the same; agent training for reinforcement learning is then performed on the generated data and the real data simultaneously.
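As a quick numerical check of the optimal-discriminator formula (an illustrative sketch, not part of the patent), evaluating \( D^{*}(x) = p_{data}(x) / (p_{data}(x) + p_g(x)) \) with Gaussian densities shows it collapses to 0.5 once the generator's distribution matches the data:

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def d_star(x, p_data, p_g):
    # Optimal discriminator for a fixed generator: D*(x) = p_data(x) / (p_data(x) + p_g(x)).
    return p_data(x) / (p_data(x) + p_g(x))

p_data = lambda x: normal_pdf(x, 3.0, 1.0)
p_same = lambda x: normal_pdf(x, 3.0, 1.0)   # generator has matched the data
p_far  = lambda x: normal_pdf(x, 0.0, 1.0)   # generator still far off

print(d_star(3.0, p_data, p_same))  # 0.5: D cannot tell real from fake
print(d_star(3.0, p_data, p_far))   # close to 1: the sample looks real to D
```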
As shown in fig. 4, the basic principle is a cyclic process: the signal controller issues an action by controlling the next-second state of the signal lamp, thereby changing the vehicle-speed state of the lane measured by the roadside detector, and then obtains a reward from the interaction with the environment. The Markov decision process is thus simply expressed as: M = <S, A, P_{s,a}, R>.
Specifically, as shown in fig. 5, an intersection has N lanes at a one-way exit, a detector is provided on each road in each direction to detect vehicles and obtain the vehicle speed V, and the road of length L is divided into M sections, so that the size of the state space of the exit at time t can be obtained; it is defined as:
In this embodiment, right turns are not controlled by the signal lamp, so the intersection has four states: north-south straight, north-south left turn, east-west straight, and east-west left turn. A 1 means the green light allows passage and a 0 means the red light forbids it, so the four states have four actions, represented by one-hot binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1]. Time control of the signal is simulated by changing the input array; for example, inputting [1,0,0,0], [1,0,0,0] represents two seconds of north-south straight green.
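The one-hot phase encoding above can be written down directly. The helper below is a hypothetical illustration (the names `PHASES` and `phase_plan` are not from the patent); it expands a schedule into the per-second input arrays described above:

```python
# Signal phases as one-hot arrays, one entry per second of simulated time.
PHASES = {
    "NS_straight": [1, 0, 0, 0],
    "NS_left":     [0, 1, 0, 0],
    "EW_straight": [0, 0, 1, 0],
    "EW_left":     [0, 0, 0, 1],
}

def phase_plan(schedule):
    """Expand (phase_name, seconds) pairs into a per-second list of one-hot arrays."""
    plan = []
    for name, seconds in schedule:
        plan.extend([PHASES[name]] * seconds)
    return plan

# Two seconds of north-south straight green, then one second of east-west straight:
plan = phase_plan([("NS_straight", 2), ("EW_straight", 1)])
# plan == [[1,0,0,0], [1,0,0,0], [0,0,1,0]]
```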
The reward function must reflect how congested or free-flowing the local traffic network is. Under normal conditions the traffic state can be judged well from lane speeds: the faster the average speed, the better the traffic. Because lane traffic volumes differ, the average speed cannot be computed directly over all lanes in the area; a lane with large traffic flow contributes more to the average speed of the whole local network and is given a larger weight. The reward function formula is:

\( R = c \sum_i \frac{f_i}{F} \, (\bar{v}_i - v_0) \)

where c is a constant, \( \bar{v}_i \) is the average vehicle speed of the lane with lane number i, \( f_i \) is the traffic flow of lane i, F is the total flow of all lanes in the local traffic network, and \( v_0 \) is a set standard average speed, above which the calculated speed gives a positive return and below which a negative return.
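Under the reconstruction of the reward formula above, the flow-weighted reward can be sketched as follows. The function name and the linear form are assumptions based on the surrounding description: lanes are weighted by their share of the total flow, and each lane's speed is compared with the standard speed.

```python
def reward(avg_speeds, flows, v_std, c=1.0):
    """Flow-weighted speed reward: R = c * sum_i (f_i / F) * (v_i - v_std).

    Lanes carrying more traffic contribute proportionally more; average speeds
    above the standard speed v_std yield positive reward, below it negative.
    """
    total_flow = sum(flows)  # F: total flow of all lanes in the local network
    return c * sum(f / total_flow * (v - v_std) for v, f in zip(avg_speeds, flows))

# Busy lane above standard speed, light lane below it:
# (300/400)*(40-30) + (100/400)*(20-30) = 7.5 - 2.5 = 5.0
print(reward(avg_speeds=[40, 20], flows=[300, 100], v_std=30))  # 5.0
```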
For the storage of Q values, the input is each state and the output is the actions, i.e. the Q values:

\( Q(s, a) = f(s; \theta) \)

where \( \theta \) are the parameters of the neural network, the input is the state set S, and the output is the action-value function Q corresponding to each action.
The virtual traffic data and the real traffic data are used to train the neural network, thereby approximating the true action-value function and finding the optimal strategy, i.e. the set of all optimal actions.
The learning-algorithm update formula is:

\( Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \)

where \( \alpha \) is the learning rate: the larger \( \alpha \), the more the update is influenced by the next state. R is the reward value, \( \max_{a'} Q(s', a') \) represents the greedy selection over the next state set, and \( \gamma \) is the discount rate: the smaller \( \gamma \), the more the update is influenced by the immediate reward.
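The tabular update can be sketched in a few lines. The state labels and action indices below are hypothetical; only the update rule itself follows the formula above.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy value of next state
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)      # Q-value table, default 0.0 for unseen (state, action)
actions = [0, 1, 2, 3]      # indices of the four one-hot signal phases

# One update from an all-zero table with reward 5.0:
# Q = 0 + 0.1 * (5.0 + 0.9 * 0 - 0) = 0.5
q_update(Q, s="low_flow", a=0, r=5.0, s_next="low_flow", actions=actions)
print(Q[("low_flow", 0)])   # 0.5
```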
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.
Claims (5)
1. A local traffic optimization method based on reinforcement learning and a generative adversarial network, characterized by comprising the following steps: S1, establishing a training model, using a generative adversarial network to speed up model training, taking the real traffic-flow state set S detected at an intersection as input, and outputting virtual traffic flow data; S2, training on the real and virtual traffic flow data with Q-learning and outputting an action set to form a Q-value table, with the formula \( Q(s, a) = f(s; \theta) \), where \( \theta \) are the parameters of a neural network that takes the state set s as input and outputs the action-value function Q corresponding to each action, obtaining a local traffic optimization scheme; training the scheme with a reward function and computing the return value of the previous action by the reward formula \( R = c \sum_i \frac{f_i}{F} (\bar{v}_i - v_0) \), where c is a constant, \( \bar{v}_i \) is the average vehicle speed of the lane with lane number i, \( f_i \) is the traffic flow of lane i, F is the total flow of all lanes in the local traffic network, and \( v_0 \) is a set standard average speed above which the calculated speed gives a positive return and below which a negative return; and finding the optimal strategy, i.e. the set of all optimal actions, with the learning-algorithm update \( Q(s, a) \leftarrow Q(s, a) + \alpha [ R + \gamma \max_{a'} Q(s', a') - Q(s, a) ] \), where \( \alpha \) is the learning rate (the larger \( \alpha \), the more the update is influenced by the next state), R is the reward value, \( \max_{a'} Q(s', a') \) represents the greedy selection over the next state set, and \( \gamma \) is the discount rate (the smaller \( \gamma \), the more the update is influenced by the immediate reward), thereby obtaining the best local traffic optimization scheme.
2. The local traffic optimization method based on reinforcement learning and a generative adversarial network according to claim 1, wherein using the generative adversarial network to speed up model training in step S1 comprises: establishing a generative adversarial network model, initializing the generator and the discriminator in the generative adversarial network, fixing one party during training while updating the parameters of the other network, alternately iterating so as to maximize the other party's error, and finally generating a virtual data distribution the same as the real data distribution.
3. The local traffic optimization method based on reinforcement learning and a generative adversarial network according to claim 2, wherein: the fixed party during generative adversarial network training is the generator.
4. The local traffic optimization method based on reinforcement learning and a generative adversarial network according to claim 1, wherein: the state set S of the intersection is the set of all states \( s_t \) at time t, the state \( s_t \) being the traffic flow of all lanes at the one-way exits of the intersection at time t; an action, i.e. a Q value, is a cycle adjustment, where one cycle is one traffic-light switch; the action set is the set of all Q values; and the action return value R is the vehicle speed on the road.
5. The local traffic optimization method based on reinforcement learning and a generative adversarial network according to claim 1, wherein there are four intersection states: north-south straight, north-south left turn, east-west straight, and east-west left turn; a 1 means the green light allows passage and a 0 means the red light forbids it, so the four states have four actions, represented by one-hot binary arrays: [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1]; and time control of the traffic signal is simulated by changing the input array, with one second as the unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110526842.0A CN112991750B (en) | 2021-05-14 | 2021-05-14 | Local traffic optimization method based on reinforcement learning and generation type countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112991750A CN112991750A (en) | 2021-06-18 |
CN112991750B true CN112991750B (en) | 2021-11-30 |
Family
ID=76336522
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110526842.0A Active CN112991750B (en) | 2021-05-14 | 2021-05-14 | Local traffic optimization method based on reinforcement learning and generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112991750B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113506450B (en) * | 2021-07-28 | 2022-05-17 | 浙江海康智联科技有限公司 | Qspare-based single-point signal timing scheme selection method |
CN114613170B (en) * | 2022-03-10 | 2023-02-17 | 湖南大学 | Traffic signal lamp intersection coordination control method based on reinforcement learning |
CN115662152B (en) * | 2022-09-27 | 2023-07-25 | 哈尔滨理工大学 | Urban traffic management self-adaptive system based on deep learning driving |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194612A (en) * | 2017-06-20 | 2017-09-22 | 清华大学 | A kind of train operation dispatching method learnt based on deeply and system |
CN111191654A (en) * | 2019-12-30 | 2020-05-22 | 重庆紫光华山智安科技有限公司 | Road data generation method and device, electronic equipment and storage medium |
CN111311577A (en) * | 2020-02-14 | 2020-06-19 | 迈拓仪表股份有限公司 | Intelligent water seepage detection method based on generation of confrontation network and reinforcement learning |
CN112700664A (en) * | 2020-12-19 | 2021-04-23 | 北京工业大学 | Traffic signal timing optimization method based on deep reinforcement learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10832140B2 (en) * | 2019-01-30 | 2020-11-10 | StradVision, Inc. | Method and device for providing information for evaluating driving habits of driver by detecting driving scenarios occurring during driving |
- 2021-05-14 (CN): application CN202110526842.0A, granted as CN112991750B, status active
Also Published As
Publication number | Publication date |
---|---|
CN112991750A (en) | 2021-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||