CN106097733B - A traffic signal optimization control method based on policy iteration and clustering - Google Patents
- Publication number
- CN106097733B (application number CN201610696748.9A)
- Authority
- CN
- China
- Prior art keywords
- control
- traffic state
- matrix
- traffic
- strategy
- Prior art date
- Legal status
- Expired - Fee Related
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
- G08G1/08—Controlling traffic signals according to detected number or speed of vehicles
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Traffic Control Systems (AREA)
Abstract
The present invention proposes a traffic signal optimization control method based on policy iteration and clustering, in the field of intelligent optimization technology. The method comprises: step 1, selecting a control scheme and defining the traffic state, control action, immediate reward and Q value; step 2, controlling the traffic lights with actuated control and recording, at each sampling instant, the traffic state, the control action and the number of vehicles leaving the stop line; step 3, preprocessing the traffic states and then performing k-means clustering; step 4, optimizing the policy in the intersection unit by policy iteration, and storing the optimized policy and the centroids obtained in step 3 in the traffic signal controller; step 5, replacing actuated control with the control policy obtained in step 4: at the start of each sampling period the traffic signal controller receives the traffic state acquired by the intersection unit, looks up the control policy using the discrete state corresponding to the nearest centroid, and sends the resulting control action to the intersection unit for execution.
Description
Technical field
The present invention relates to the field of intelligent optimization technology.
Background art
Optimal control of traffic signals is an important component of urban traffic management and control systems. The quality of the signal control strategy directly affects the transport efficiency of the whole road network and people's travel experience, so various intelligent optimal control methods have been proposed and applied to optimizing traffic signal control strategies.
Dynamic programming is a family of methods for solving for an optimal control policy and includes two approaches, value iteration and policy iteration. It samples the traffic state, phase and immediate reward under a policy and then uses the samples to further optimize the control policy, which makes it well suited to the traffic signal optimization control problem. When policy iteration is applied to traffic signal control, continuous variables such as vehicle queue length must be discretized. The traditional discretization method divides the entire state space uniformly, whereas the states that actually occur are concentrated in only some regions of the state space. Using k-means clustering to partition the regions where states concentrate therefore achieves higher discretization precision with the same number of discrete states, which improves the optimization result.
Summary of the invention
The purpose of the present invention is to discretize the traffic state with k-means clustering so as to improve the optimization performance of policy iteration and better optimize the control policy of intersection traffic lights. The ultimate goal is to increase the number of vehicles passing through the intersection per unit time and to reduce the number of stops and the average delay caused by waiting at red lights.
The present invention first controls the intersection traffic lights with an actuated control method. At short fixed intervals, the intersection unit records the vehicle queue lengths of the current phase and the next phase, the number of vehicles leaving the stop line, and the control action of the traffic signal controller. After the intersection unit has collected enough samples, k-means clustering is applied to the vehicle queue lengths in the samples to obtain discrete traffic states. The policy is then optimized by policy iteration, and the optimized policy is stored in the traffic signal controller. Thereafter, at short fixed intervals, the intersection unit sends the detected vehicle queue lengths of the current and next phases to the traffic signal controller, which selects the appropriate phase action according to these queue lengths and the previously stored optimized policy, for execution by the intersection unit.
The present invention proposes a traffic signal optimization control method based on policy iteration and clustering, comprising the following steps:
Step 1: Select fixed-phase-sequence control as the signal timing plan to be optimized. Define the traffic state as the vehicle queue lengths of the current phase and the next phase; define the control action as either keeping the current phase or switching to the next phase; define the immediate reward as a variable related to the number of vehicles leaving the stop line in a single sampling period; define a state-action pair as the data vector composed of a discrete traffic state and a control action; define the Q value of each state-action pair as the expected cumulative reward obtained after taking the control action in the corresponding discrete traffic state; and define the control policy as the control action that should be executed in each discrete traffic state;
Step 2: The intersection unit sets the control strategy of the traffic signal controller to actuated control, with the minimum and maximum green times set to positive integer multiples of the sampling period and the unit green extension equal to the sampling period. The intersection unit samples the traffic state, the phase action executed and the number of vehicles leaving the stop line, and records them as samples. Sampling method: at each sampling instant, record the traffic state, the control action and the number of vehicles that left the stop line during the sampling period;
Step 3: After the intersection unit has collected the specified number of samples, discretize the traffic states in the samples. Discretization method: first normalize the sampled traffic states and remove traffic states that lie within a preset threshold distance of an already retained state; then perform k-means clustering and number the resulting centroids, each centroid corresponding to one discrete traffic state; finally represent each normalized traffic state in the samples by the number of its nearest centroid, which gives the corresponding discrete traffic state;
Step 4: The intersection unit optimizes the policy by policy iteration and stores the optimized policy and the centroids obtained in step 3 in the traffic signal controller;
Step 5: The intersection unit sets the control policy of the traffic signal controller to the policy obtained in step 4 and sets the decision period equal to the sampling period. At each decision instant, the traffic signal controller receives the traffic state detected by the intersection unit, normalizes it, computes the distance from the normalized state to each centroid, finds the nearest centroid, looks up the control policy with the discrete traffic state corresponding to that centroid, and sends the resulting control action to the intersection unit for execution.
Advantages of the present invention over the prior art:
Before a traffic signal control policy can be optimized with policy iteration, the traffic state must first be discretized, that is, the continuous state space formed by the vehicle queue lengths of the two phases must be converted into a discrete state space, and the precision of this discretization affects the optimization result of policy iteration. In each typical time period, the actual traffic states are not spread over the entire state space but concentrate in certain regions. Discrete traffic states obtained with the k-means clustering algorithm cover only the regions where traffic states actually occur, instead of also covering regions with no actual traffic states as conventional discretization methods do. Compared with conventional methods, k-means clustering therefore achieves higher discretization precision with the same number of discrete traffic states, which improves the optimization result of policy iteration.
Description of the drawings
Fig. 1 is a schematic diagram of traffic signal control at an urban road intersection.
Fig. 2 is a flow chart of the traffic signal optimization control method based on policy iteration and clustering.
1, first geomagnetic vehicle detector; 2, second geomagnetic vehicle detector; 3, third geomagnetic vehicle detector; 4, fourth geomagnetic vehicle detector; 5, fifth geomagnetic vehicle detector; 6, sixth geomagnetic vehicle detector; 7, seventh geomagnetic vehicle detector; 8, eighth geomagnetic vehicle detector; 9, ninth geomagnetic vehicle detector; 10, tenth geomagnetic vehicle detector; 11, eleventh geomagnetic vehicle detector; 12, twelfth geomagnetic vehicle detector; 13, thirteenth geomagnetic vehicle detector; 14, fourteenth geomagnetic vehicle detector; 15, fifteenth geomagnetic vehicle detector; 16, sixteenth geomagnetic vehicle detector; 17, seventeenth geomagnetic vehicle detector; 18, eighteenth geomagnetic vehicle detector; 19, nineteenth geomagnetic vehicle detector; 20, twentieth geomagnetic vehicle detector; 21, twenty-first geomagnetic vehicle detector; 22, twenty-second geomagnetic vehicle detector; 23, twenty-third geomagnetic vehicle detector; 24, twenty-fourth geomagnetic vehicle detector; 25, lane one; 26, lane two; 27, lane three; 28, lane four; 29, lane five; 30, lane six; 31, lane seven; 32, lane eight; 33, lane nine; 34, lane ten; 35, lane eleven; 36, lane twelve.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Two geomagnetic vehicle detectors are placed in each lane. One is placed at the stop line and counts the vehicles passing the stop line; the other is placed 120 meters upstream of the stop line and counts the vehicles passing the cross-section 120 meters upstream. From these two detectors, the number of vehicles in the lane between the stop line and the section 120 meters upstream can be calculated at any time and converted into a vehicle queue length. As shown in Fig. 1, the first geomagnetic vehicle detector 1 and the second geomagnetic vehicle detector 2 detect the vehicle queue length of lane one 25; the third geomagnetic vehicle detector 3 and the fourth geomagnetic vehicle detector 4, that of lane two 26; the fifth 5 and sixth 6, that of lane three 27; the seventh 7 and eighth 8, that of lane four 28; the ninth 9 and tenth 10, that of lane five 29; the eleventh 11 and twelfth 12, that of lane six 30; the thirteenth 13 and fourteenth 14, that of lane seven 31; the fifteenth 15 and sixteenth 16, that of lane eight 32; the seventeenth 17 and eighteenth 18, that of lane nine 33; the nineteenth 19 and twentieth 20, that of lane ten 34; the twenty-first 21 and twenty-second 22, that of lane eleven 35; and the twenty-third geomagnetic vehicle detector 23 and the twenty-fourth geomagnetic vehicle detector 24, that of lane twelve 36.
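The queue computation described above can be sketched as follows, assuming each detector reports a cumulative count of vehicles that have passed it since a common start time at which the section was empty. The text does not say how a vehicle count is converted into a length, so the average spacing `AVG_SPACING_M` (7.5 m) is a hypothetical parameter, as are the function names.

```python
"""Sketch: estimate one lane's queue from its two geomagnetic detectors.

Assumptions (not specified in the patent): both detectors report cumulative
counts from a common start with an empty section, and queue length in metres
is the vehicle count times a hypothetical average spacing.
"""

SECTION_LENGTH_M = 120.0  # upstream detector sits 120 m before the stop line
AVG_SPACING_M = 7.5       # hypothetical metres occupied by one queued vehicle


def vehicles_in_section(upstream_count: int, stop_line_count: int) -> int:
    """Vehicles that have passed the upstream detector but not the stop line."""
    return max(upstream_count - stop_line_count, 0)


def queue_length_m(upstream_count: int, stop_line_count: int) -> float:
    """Convert the vehicle count to metres, capped at the monitored section."""
    n = vehicles_in_section(upstream_count, stop_line_count)
    return min(n * AVG_SPACING_M, SECTION_LENGTH_M)


if __name__ == "__main__":
    # 42 vehicles passed the 120 m detector and 35 passed the stop line.
    print(vehicles_in_section(42, 35), queue_length_m(42, 35))
```

With these counts, 7 vehicles are between the two detectors, which the assumed spacing turns into 52.5 m.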
The intersection unit receives the information sent by the 24 geomagnetic vehicle detectors, from the first geomagnetic vehicle detector 1 to the twenty-fourth geomagnetic vehicle detector 24, and forwards it to the traffic signal controller. Every 10 seconds, the traffic signal controller determines the control action according to the received traffic state and the control policy set by the intersection unit.
As shown in the flow chart of Fig. 2, the traffic signal optimization control method based on policy iteration and clustering comprises the following steps:
Step 1: Select the signal control scheme and define the traffic state, control action, immediate reward and Q value.
The signal timing plan to be optimized uses a fixed-phase-sequence control scheme. The scheme is described below for the case of four symmetric phases, but the present invention is limited neither to four phases nor to symmetric phases. Phase 1: vehicles in lane one 25 and lane four 28 may go straight and turn right, and vehicles in lane two 26 and lane five 29 may go straight; Phase 2: vehicles in lane three 27 and lane six 30 may turn left; Phase 3: vehicles in lane seven 31 and lane ten 34 may go straight and turn right, and vehicles in lane eight 32 and lane eleven 35 may go straight; Phase 4: vehicles in lane nine 33 and lane twelve 36 may turn left. At any moment the signal is in exactly one of the four phases, and the phases are executed in order. Although the phase sequence is fixed, the green time of each phase is not. The control action is defined as keeping the current phase or switching to the next phase. If the current phase is phase 1, then after 10 seconds the traffic signal controller must decide between keeping phase 1 and switching to phase 2. If it selects phase 2, another decision is made 10 seconds later: keep phase 2 or switch to phase 3. If it selects phase 3, then 10 seconds later: keep phase 3 or switch to phase 4. If it selects phase 4, then 10 seconds later: keep phase 4 or switch to phase 1, and so on in a cycle. The minimum green time of every phase is 10 seconds and the maximum green time is 60 seconds.
The vehicle queue length of each phase is defined as the maximum vehicle queue length over all lanes of that phase: the queue length of phase 1 is the maximum of the queue lengths of lane one 25, lane two 26, lane four 28 and lane five 29; that of phase 2 is the maximum of lane three 27 and lane six 30; that of phase 3 is the maximum of lane seven 31, lane eight 32, lane ten 34 and lane eleven 35; and that of phase 4 is the maximum of lane nine 33 and lane twelve 36.
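A minimal sketch of the four-phase scheme and the traffic-state definition, with lanes numbered one to twelve as in Fig. 1. The dictionary layout and function names are illustrative assumptions; queue values can be in whatever unit the detector conversion produces.

```python
"""Sketch of the fixed four-phase sequence and the (current, next) state."""

# Lanes served by each phase, as listed in the embodiment (lane one = 1, ...).
PHASE_LANES = {
    1: [1, 2, 4, 5],    # through + right on lanes 1 and 4, through on 2 and 5
    2: [3, 6],          # left turn on lanes 3 and 6
    3: [7, 8, 10, 11],  # through + right on lanes 7 and 10, through on 8 and 11
    4: [9, 12],         # left turn on lanes 9 and 12
}


def next_phase(phase: int) -> int:
    """Phases run in the fixed order 1 -> 2 -> 3 -> 4 -> 1."""
    return phase % 4 + 1


def phase_queue(phase: int, lane_queues: dict) -> float:
    """Queue of a phase = the largest queue among the lanes it serves."""
    return max(lane_queues[lane] for lane in PHASE_LANES[phase])


def traffic_state(current: int, lane_queues: dict) -> tuple:
    """Traffic state = (queue of current phase, queue of next phase)."""
    return (phase_queue(current, lane_queues),
            phase_queue(next_phase(current), lane_queues))
```

For example, if every lane's queue equals its lane number, `traffic_state(1, ...)` gives `(5.0, 6.0)`: phase 1 takes the maximum over lanes 1, 2, 4 and 5, and phase 2 the maximum over lanes 3 and 6.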
The traffic state is defined as the vehicle queue lengths of the current phase and the next phase. For example, if the current phase is phase 1, the current traffic state is the vector formed by two variables, the queue lengths of phase 1 and phase 2. The start of a sampling period is defined as the instant at which the traffic signal controller decides the control action; the sampling period equals the decision period, 10 seconds. The immediate reward is defined as a quantity related to the number of vehicles leaving the stop line in a single sampling period; it represents the direct benefit of taking a control action in a given traffic state. A state-action pair is defined as the data vector composed of a discrete traffic state and a control action. The Q value of each state-action pair is defined as the expected cumulative reward obtained after taking the control action in the corresponding discrete traffic state, i.e., the expectation of the sum of the immediate rewards obtained over several sampling periods after the control action is taken; the Q value thus represents the long-term benefit of taking a control action in a discrete traffic state. The control policy is defined as the control action that should be taken in a given discrete traffic state.
The immediate reward r is calculated as follows:
In the formula above, n_p denotes the number of vehicles passing the stop line in one sampling period; the constants 6.5, 4.5 and -1.0 in the formula keep r within [-1, 1]. The traffic signal controller computes n_p from the traffic states sent by the intersection unit at two consecutive instants and then computes r from the formula above.
The Q value of a state-action pair is defined as follows:
Here s denotes a discrete traffic state, a the control action executed in traffic state s, Q(s, a) the Q value of the state-action pair s-a, E the expectation, r(s, a) the immediate reward obtained by executing control action a in state s, and γ the discount factor, a real number between 0 and 1. The index k refers to the k-th sampling period after traffic state s is encountered: after state s is encountered and control action a is executed, the first sampling period corresponds to k = 1. T means that the process ends at the T-th sampling period after state s is encountered, i.e., the cumulative reward uses only the immediate rewards of T sampling periods.
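Taken together, these symbol definitions are consistent with the following form of the Q value. This form assumes that the reward of the k-th sampling period is weighted by γ^{k-1}. Under that convention, letting T → ∞ gives exactly the quantity that step 4 computes in matrix form as Q = (I - γPΠ)^{-1}R.

```latex
Q(s, a) = E\left[ \sum_{k=1}^{T} \gamma^{k-1} r_k \;\middle|\; s_0 = s,\ a_0 = a \right],
\qquad r_1 = r(s, a)
```

Here r_k is the immediate reward collected in the k-th sampling period.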
Step 2: Sample the traffic state, the control action executed and the number of vehicles leaving the stop line.
Sampling is carried out for a period of time within a specified typical period, such as the morning or evening peak. During the sampling phase, the intersection unit sets the control strategy of the traffic signal controller to actuated control. The minimum and maximum green times are set to positive integer multiples of the sampling period, and the unit green extension equals the sampling period. Specifically, the minimum green time of each phase is 10 seconds, the maximum green time is 60 seconds and the unit green extension is 10 seconds. The phase is decided every second as follows. If the green time of the current phase is less than 10 seconds, the current phase is kept. If it is greater than or equal to 60 seconds, the signal switches to the next phase. If it is between 10 and 60 seconds, the green is extended by 10 seconds when vehicles are arriving on the current phase; otherwise the signal switches directly to the next phase. Every 10 seconds, the intersection unit detects and stores the following as a sample: the vehicle queue lengths of the current phase and the next phase, the control action executed, and the number of vehicles that left the stop line during the sampling period. The number of samples to collect is set to 9000.
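The actuated rule used during sampling can be sketched as a single decision function. The labels "a1" (keep) and "a2" (switch) mirror the actions a_1 and a_2 named in step 4. How a vehicle "arriving" on the current phase is detected is left to the caller. The function name is hypothetical.

```python
"""Sketch of the actuated control rule used while collecting samples."""

MIN_GREEN_S = 10
MAX_GREEN_S = 60
UNIT_EXTENSION_S = 10  # equal to the 10 s sampling period

KEEP, SWITCH = "a1", "a2"  # a_1: keep current phase, a_2: switch to next phase


def actuated_action(green_elapsed_s: int, vehicle_arriving: bool) -> str:
    """Decide whether to keep or switch the current phase.

    green_elapsed_s: green time already given to the current phase.
    vehicle_arriving: whether a vehicle is detected approaching on the
    current phase (the detection logic is an assumption left to the caller).
    Keeping the phase while vehicles arrive grants one more unit extension.
    """
    if green_elapsed_s < MIN_GREEN_S:
        return KEEP      # minimum green not yet served
    if green_elapsed_s >= MAX_GREEN_S:
        return SWITCH    # maximum green reached
    return KEEP if vehicle_arriving else SWITCH  # extend, or gap out
```

Because the maximum green is an integer multiple of the extension, repeated extensions land exactly on 60 s. The signal then switches at that point.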
Step 3: After the intersection unit has collected 9000 samples, discretize the traffic states in the samples. Each sample is arranged as a data vector (l, a, l', r). Here l denotes the traffic state at a sampling instant and a the control action executed in traffic state l. l' denotes the traffic state at the next sampling instant after l. r denotes the immediate reward obtained in the sampling period during which the traffic state moves from l to l'. r is computed with the immediate-reward formula of step 1 from the number of vehicles that left the stop line in that sampling period, as recorded in the original sample.
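A sketch of how consecutive records can be paired into (l, a, l', r) tuples. The immediate-reward formula of step 1 is passed in as a function argument. The `lambda n: n / 10` in the usage note is therefore a hypothetical placeholder, not the patent's formula.

```python
"""Sketch: pair consecutive sampling instants into (l, a, l', r) samples."""
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (queue of current phase, queue of next phase)


def build_samples(states: Sequence[State],
                  actions: Sequence[str],
                  departures: Sequence[int],
                  reward_fn: Callable[[int], float]) -> List[tuple]:
    """Build (l, a, l', r) tuples from a log of sampling instants.

    states[t], actions[t]: state observed and action taken at instant t.
    departures[t]: vehicles that left the stop line during period t -> t+1.
    reward_fn: maps that count n_p to the immediate reward r.
    """
    samples = []
    for t in range(len(states) - 1):
        r = reward_fn(departures[t])
        samples.append((states[t], actions[t], states[t + 1], r))
    return samples
```

Usage: `build_samples([(1, 2), (3, 4), (5, 6)], ["a1", "a2", "a1"], [4, 5, 0], lambda n: n / 10)` returns `[((1, 2), "a1", (3, 4), 0.4), ((3, 4), "a2", (5, 6), 0.5)]`. The last instant has no successor, so it yields no sample.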
The traffic states in the samples are preprocessed: they are first normalized, and then traffic states lying within a preset threshold distance of an already retained state are removed. Euclidean distance is used as the distance measure and the threshold is set to 0.1. One normalized traffic state is first chosen at random from the samples and added to an empty data set, called the traffic state data set. The remaining traffic states in the samples are then added according to the following rule: a traffic state is added only if its distance to every traffic state already in the data set is greater than 0.1.
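The preprocessing can be sketched in plain Python as below. The text does not specify the normalization method, so min-max scaling to [0, 1] is an assumption. The thinning follows the rule above with Euclidean distance and a 0.1 threshold. It picks the first state at random and then scans the others in their original order; that scan order is also an assumption.

```python
"""Sketch: normalize traffic states, then thin out near-duplicates."""
import math
import random
from typing import List, Sequence, Tuple


def min_max_normalize(states: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Scale each component to [0, 1] (min-max scaling is an assumption)."""
    dims = len(states[0])
    lo = [min(s[d] for s in states) for d in range(dims)]
    hi = [max(s[d] for s in states) for d in range(dims)]
    return [tuple((s[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
                  for d in range(dims))
            for s in states]


def thin_states(states, threshold=0.1, seed=None):
    """Keep a state only if it is farther than `threshold` from every kept state."""
    rng = random.Random(seed)
    pool = list(states)
    first = rng.randrange(len(pool))   # the first state is chosen at random
    kept = [pool[first]]
    for i, s in enumerate(pool):
        if i != first and all(math.dist(s, k) > threshold for k in kept):
            kept.append(s)
    return kept
```

For instance, among `(0, 0)`, `(0.05, 0)`, `(0.5, 0.5)` and `(1, 1)`, one of the first two is always dropped because they are only 0.05 apart. Three states remain whichever one is picked first.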
k-means clustering is performed on the traffic states in the traffic state data set. A cluster is defined as a set of nearby traffic states, and each cluster corresponds to one discrete traffic state. A centroid is defined as the mean of all traffic states contained in a cluster. The number of centroids is set to 30, and clustering proceeds as follows:
Step a: randomly select 30 distinct traffic states from the traffic state data set as the initial centroids;
Step b: compute the distance from each traffic state to each centroid and assign each traffic state to its nearest centroid, forming 30 clusters;
Step c: recompute the centroid of each cluster;
Step d: compute the change of each centroid, i.e., the distance between its old and new position; if no centroid has changed, k-means clustering ends; otherwise return to step b.
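Steps a to d are ordinary k-means (Lloyd's algorithm). The NumPy sketch below assumes the thinned states form an (N, 2) array. Step d's exact "no change" test is replaced with `np.allclose`, a floating-point tolerance check. The text does not say what happens to an empty cluster; keeping its old centroid is an assumption.

```python
"""Sketch of k-means clustering of normalized traffic states (steps a-d)."""
import numpy as np


def _assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(points, n_clusters=30, seed=0, max_iter=300):
    """Return (centroids, labels) for an (N, dims) array of distinct states."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step a: choose n_clusters distinct states as the initial centroids.
    centroids = points[rng.choice(len(points), size=n_clusters, replace=False)]
    for _ in range(max_iter):
        # Step b: assign every state to its nearest centroid.
        labels = _assign(points, centroids)
        # Step c: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid: an assumption).
        new = np.array([points[labels == c].mean(axis=0)
                        if np.any(labels == c) else centroids[c]
                        for c in range(n_clusters)])
        # Step d: stop once no centroid moves (within float tolerance).
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return centroids, _assign(points, centroids)
```

With `n_clusters=2` and two well-separated groups of three points each, the sketch puts each group in its own cluster.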
After k-means clustering, l and l' in each sample (l, a, l', r) are each assigned to their nearest centroid, i.e., converted into discrete traffic states s and s', and the sample is rearranged as the data vector (s, a, s', r).
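Mapping samples to discrete states is a nearest-centroid lookup. In this sketch, centroid indices are 0-based, whereas the text numbers discrete states from 1. The function names are hypothetical.

```python
"""Sketch: turn (l, a, l', r) samples into discrete (s, a, s', r) samples."""
import numpy as np


def nearest_centroid(state, centroids: np.ndarray) -> int:
    """0-based index of the centroid closest (Euclidean) to a normalized state."""
    d = np.linalg.norm(centroids - np.asarray(state, dtype=float), axis=1)
    return int(d.argmin())


def discretize_samples(samples, centroids):
    """Replace both traffic states of each sample by their centroid indices."""
    return [(nearest_centroid(l, centroids), a, nearest_centroid(l_next, centroids), r)
            for (l, a, l_next, r) in samples]
```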
Step 4: Initialize an arbitrary traffic signal control policy in the intersection unit, optimize it by policy iteration, and store the optimized policy and the centroids obtained in step 3 in the traffic signal controller;
In this isolated-intersection signal control optimization problem there are 30 discrete traffic states in total. Each discrete traffic state has two control actions: a_1, keeping the current phase, and a_2, switching to the next phase. The policy is optimized in the intersection unit by policy iteration, as follows:
Step a: set the iteration count to 1, initialize the Q values and the control policy, and compute the state transition matrix and the immediate reward matrix. The Q value of every state-action pair is initialized to zero and stored in matrix Q. The immediate reward matrices R_1 and R_2 are estimated from the samples (s, a, s', r); R_1 and R_2 hold the expected immediate rewards obtained after executing control actions a_1 and a_2, respectively. With i = 1, 2, ..., 30, j = 1, 2, ..., 30 and k = 1, 2, Q, R_1 and R_2 are defined as follows:
Here Q(s_i, a_k) denotes the Q value of the state-action pair s_i-a_k. r(s_i, a_k, s_j) denotes the immediate reward obtained when control action a_k is executed in discrete traffic state s_i and the system moves to discrete traffic state s_j. The control policy is initialized to an arbitrary policy and stored in matrix Π, defined as follows:
Here π(s_i, a_k) denotes the probability of executing action a_k in discrete state s_i, and the elements of each row of Π sum to 1. The state transition matrix P is estimated from the samples (s, a, s', r) and defined as follows:
Here the matrix element p(s_j | s_i, a_k) is a conditional probability. It is the probability that, after control action a_k is executed in discrete traffic state s_i, the system is in discrete traffic state s_j at the next sampling instant. The immediate reward matrix R can be computed from the elements of R_1, R_2 and P and is defined as follows:
Here r(s_i, a_k) denotes the expected immediate reward obtained after executing control action a_k in discrete traffic state s_i, calculated as:
$$r(s_i, a_k) = \sum_{j=1}^{30} p(s_j \mid s_i, a_k)\, r(s_i, a_k, s_j)$$
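The estimation in step a can be sketched with NumPy as below. The layout is an assumption, chosen so that the update Q = (I - γPΠ)^{-1}R of step b is dimensionally consistent. P has one row per state-action pair (2n rows, ordered (s_1, a_1), (s_1, a_2), (s_2, a_1), ...) and n columns. R is a vector of length 2n, and R_1, R_2 are n x n matrices of mean observed rewards. State-action pairs never seen in the samples are left as all-zero rows, which is also an assumption.

```python
"""Sketch: estimate P, R_1, R_2 and R from discretized (s, a, s', r) samples."""
import numpy as np

N_ACTIONS = 2                      # a_1 = keep current phase, a_2 = switch
ACTION_INDEX = {"a1": 0, "a2": 1}  # hypothetical labels for a_1 and a_2


def estimate_model(samples, n_states):
    """Return (P, R_1, R_2, R) under the assumed layout.

    Row i * 2 + k of P and R is the pair (s_i, a_k), 0-based.
    P: (2n, n); R: (2n,); R_1, R_2: (n, n) mean reward on s_i --a_k--> s_j.
    Pairs never observed keep all-zero rows (an assumption).
    """
    counts = np.zeros((n_states * N_ACTIONS, n_states))
    reward_sum = np.zeros((N_ACTIONS, n_states, n_states))
    for s, a, s_next, r in samples:
        k = ACTION_INDEX[a]
        counts[s * N_ACTIONS + k, s_next] += 1
        reward_sum[k, s, s_next] += r

    # p(s_j | s_i, a_k): normalize each row of transition counts.
    visits = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, visits, out=np.zeros_like(counts), where=visits > 0)

    # r(s_i, a_k, s_j): mean reward per observed transition (R_1 and R_2).
    R_k = np.zeros_like(reward_sum)
    for k in range(N_ACTIONS):
        c = counts[k::N_ACTIONS]  # rows (s_0, a_k), (s_1, a_k), ... -> (n, n)
        np.divide(reward_sum[k], c, out=R_k[k], where=c > 0)

    # r(s_i, a_k) = sum_j p(s_j | s_i, a_k) * r(s_i, a_k, s_j)
    R = np.array([P[i * N_ACTIONS + k] @ R_k[k, i]
                  for i in range(n_states) for k in range(N_ACTIONS)])
    return P, R_k[0], R_k[1], R
```

Weighting each transition's mean reward by its estimated probability reproduces the plain average reward observed for the pair (s_i, a_k).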
Step b: update the Q values by updating matrix Q according to:
$$Q = (I - \gamma P \Pi)^{-1} R$$
where I is the identity matrix, γ is the discount factor, set to 0.95, and (·)^{-1} denotes matrix inversion;
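Under the same assumed layout, Π is an n x 2n matrix. Row j holds π(s_j, a_1) and π(s_j, a_2) in columns 2j and 2j+1 (0-based), so each row sums to 1 as required. The sketch below builds Π and evaluates Q. It uses `np.linalg.solve` on (I - γPΠ)Q = R, which gives the same result as the explicit inverse and is numerically preferable. The function names are hypothetical.

```python
"""Sketch of step b: policy evaluation Q = (I - gamma P Pi)^-1 R."""
import numpy as np


def policy_matrix(policy: np.ndarray, n_actions: int = 2) -> np.ndarray:
    """Build Pi (n x n*n_actions) with Pi[j, j*n_actions + k] = pi(s_j, a_k)."""
    n = policy.shape[0]
    Pi = np.zeros((n, n * n_actions))
    for j in range(n):
        Pi[j, j * n_actions:(j + 1) * n_actions] = policy[j]
    return Pi


def evaluate_policy(P: np.ndarray, R: np.ndarray, Pi: np.ndarray,
                    gamma: float = 0.95) -> np.ndarray:
    """Solve (I - gamma P Pi) Q = R instead of forming the inverse explicitly."""
    I = np.eye(P.shape[0])
    return np.linalg.solve(I - gamma * P @ Pi, R)
```

For a single state that always returns to itself, with reward 1 for a_1 and 0 for a_2 and a policy that always picks a_1, this gives Q(s, a_1) = 1/(1 - 0.95) = 20 and Q(s, a_2) = 0.95 · 20 = 19.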
Step c: update the control policy from the Q values by updating the elements of matrix Π according to the following formula:
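A plausible form of this update is the standard greedy improvement step of policy iteration. This is an assumption, but it is consistent with the deterministic stopping test D = 0 in step d. Ties are broken toward the lower action index, a_1, as an additional assumption.

```latex
\pi(s_i, a_k) =
\begin{cases}
1, & \text{if } k = \arg\max_{k' \in \{1, 2\}} Q(s_i, a_{k'}) \\
0, & \text{otherwise}
\end{cases}
\qquad i = 1, \dots, 30
```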
Step d: if the iteration count is 1, save matrix Π to a matrix Π' of the same dimensions, increase the iteration count by 1 and return to step b; otherwise compute the 2-norm of the difference between matrices Π and Π':
$$D = \lVert \Pi - \Pi' \rVert$$
If D equals 0, policy iteration ends; if D is not 0, save matrix Π to matrix Π', increase the iteration count by 1 and return to step b.
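Putting steps a to d together, a compact policy-iteration loop might look as follows. It assumes the P and R layout sketched for step a. It starts from a uniform random policy, which step a permits since any policy is allowed, and uses the greedy update for step c. Π is just a rearrangement of the n x 2 table of action probabilities, so the stopping test compares those tables directly.

```python
"""Sketch of policy iteration (steps a-d) on an estimated model."""
import numpy as np


def policy_iteration(P, R, n_states, n_actions=2, gamma=0.95, max_iter=1000):
    """Return (Q as an n x m array, policy as an n x m probability table).

    P: (n*m, n) transitions, R: (n*m,) expected rewards, rows ordered
    (s_0, a_0), (s_0, a_1), (s_1, a_0), ...  (assumed layout).
    """
    # Step a: arbitrary initial policy (here: uniform over the actions).
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    previous = None
    I = np.eye(n_states * n_actions)
    for _ in range(max_iter):
        # Expand the policy table into the (n, n*m) block matrix Pi.
        Pi = np.zeros((n_states, n_states * n_actions))
        for j in range(n_states):
            Pi[j, j * n_actions:(j + 1) * n_actions] = policy[j]
        # Step b: policy evaluation, Q = (I - gamma P Pi)^-1 R.
        Q = np.linalg.solve(I - gamma * P @ Pi, R)
        # Step c: greedy improvement (argmax picks the lower index on ties).
        best = Q.reshape(n_states, n_actions).argmax(axis=1)
        policy = np.eye(n_actions)[best]
        # Step d: stop once the policy no longer changes (D = 0).
        if previous is not None and np.linalg.norm(policy - previous) == 0:
            break
        previous = policy.copy()
    return Q.reshape(n_states, n_actions), policy
```

As a usage check, take two states where a_1 stays put and a_2 moves to the other state, and only "stay in s_2" earns a reward of 1. The loop settles on switching in s_1 and staying in s_2, with Q(s_2, a_1) = 20 and Q(s_1, a_2) = 19 for γ = 0.95.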
After policy iteration ends, the resulting control policy is represented by matrix Q, and matrix Q together with the centroids obtained in step 3 is stored in the traffic signal controller;
Step 5: The intersection unit sets the control policy of the traffic signal controller to the policy obtained in step 4. Every 10 seconds, the traffic signal controller receives the traffic state detected by the intersection unit and normalizes it. It then computes the distance from the normalized traffic state to each centroid and finds the number of the nearest centroid, i.e., the index i of the discrete traffic state s_i. Finally, it selects the control action a* according to the following formula:
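Assuming the greedy policy produced in step 4, the selection rule amounts to choosing the action with the larger Q value in the current discrete state. Breaking ties toward a_1 is an additional assumption.

```latex
a^{*} = \arg\max_{a \in \{a_1, a_2\}} Q(s_i, a)
```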
The traffic signal controller sends control action a* to the intersection unit for execution: if a* is a_1, the current phase is kept; if a* is a_2, the signal switches to the next phase.
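The 10-second online decision of step 5 can be sketched as below. It assumes the same min-max normalization bounds `lo` and `hi` that were applied to the training samples. It also assumes Q is stored as an n x 2 array, with column 0 for a_1 and column 1 for a_2. The function name and argument layout are hypothetical.

```python
"""Sketch of the online decision in step 5 (one call every 10 s)."""
import numpy as np


def decide(raw_state, lo, hi, centroids, Q):
    """Normalize the state, find the nearest centroid, pick the argmax-Q action.

    lo, hi: per-component bounds used for min-max normalization in training
    (an assumption). Q: (n, 2) array, column 0 = a_1 (keep), 1 = a_2 (switch).
    Returns "a1" or "a2"; ties go to "a1" (an assumption).
    """
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    x = (np.asarray(raw_state, dtype=float) - lo) / np.where(hi > lo, hi - lo, 1.0)
    i = int(np.linalg.norm(centroids - x, axis=1).argmin())  # discrete state s_i
    return "a1" if Q[i, 0] >= Q[i, 1] else "a2"
```

Usage: with bounds 0 and 20 for both queues, centroids `[[0.1, 0.1], [0.9, 0.2]]` and `Q = [[1.0, 0.5], [0.2, 0.8]]`, a raw state `(18, 4)` normalizes to `(0.9, 0.2)`. It therefore maps to the second centroid, and the function returns `"a2"` (switch).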
Claims (1)
1. A traffic signal optimization control method based on policy iteration and clustering, characterized in that it comprises the following steps:
Step 1: Select fixed-phase-sequence control as the signal timing plan to be optimized. Define the traffic state as the vehicle queue lengths of the current phase and the next phase; define the control action as either keeping the current phase or switching to the next phase; define the immediate reward as a variable related to the number of vehicles leaving the stop line in a single sampling period; define a state-action pair as the data vector composed of a discrete traffic state and a control action; define the Q value of each state-action pair as the expected cumulative reward obtained after taking the control action in the corresponding discrete traffic state; and define the control policy as the control action that should be executed in each discrete traffic state;
Step 2: The intersection unit sets the control strategy of the traffic signal controller to actuated control, with the minimum and maximum green times set to positive integer multiples of the sampling period and the unit green extension equal to the sampling period. The intersection unit samples the traffic state, the phase action executed and the number of vehicles leaving the stop line, and records them as samples. Sampling method: at each sampling instant, record the traffic state, the control action and the number of vehicles that left the stop line during the sampling period;
Step 3: After the intersection unit has collected the specified number of samples, discretize the traffic states in the samples. Discretization method: first normalize the sampled traffic states and remove traffic states that lie within a preset threshold distance of an already retained state; then perform k-means clustering and number the resulting centroids, each centroid corresponding to one discrete traffic state; finally represent each normalized traffic state in the samples by the number of its nearest centroid, which gives the corresponding discrete traffic state;
Step 4: The intersection unit optimizes the policy by policy iteration and stores the optimized policy and the centroids obtained in step 3 in the traffic signal controller;
Step 5: The intersection unit sets the control policy of the traffic signal controller to the policy obtained in step 4 and sets the decision period equal to the sampling period. At each decision instant, the traffic signal controller receives the traffic state detected by the intersection unit, normalizes it, computes the distance from the normalized state to each centroid, finds the nearest centroid, looks up the control policy with the discrete traffic state corresponding to that centroid, and sends the resulting control action to the intersection unit for execution;
wherein the policy iteration method used comprises the following steps:
Step a: set the iteration count to 1, initialize the Q values and the control policy, and compute the state transition matrix and the immediate reward matrix. The Q value of every state-action pair is initialized to zero and stored in matrix Q. The immediate reward matrices R_1 and R_2 are estimated from the samples (s, a, s', r). Here s denotes the discrete traffic state at a sampling instant. a denotes the control action executed in discrete traffic state s; there are two control actions in total, a_1 keeping the current phase and a_2 switching to the next phase. s' denotes the discrete traffic state at the next sampling instant after s. r denotes the immediate reward obtained in the sampling period during which the discrete traffic state moves from s to s', calculated as follows:
where n_p denotes the number of vehicles passing the stop line in one sampling period. R_1 and R_2 hold the expected immediate rewards obtained after executing control actions a_1 and a_2, respectively, and Q, R_1 and R_2 are defined as follows:
where n denotes the number of centroids used in the clustering of step 3 and Q(s_i, a_k) denotes the Q value of the state-action pair s_i-a_k. r(s_i, a_k, s_j) denotes the immediate reward obtained when control action a_k is executed in discrete traffic state s_i and the system moves to discrete traffic state s_j. i and j take integer values in [1, n], and k takes the values 1 and 2. The control policy is initialized to an arbitrary policy and stored in matrix Π, defined as follows:
where π(s_i, a_k) denotes the probability of executing action a_k in discrete state s_i, and the elements of each row of Π sum to 1. The state transition matrix P is estimated from the samples (s, a, s', r) and defined as follows:
where the matrix element p(s_j | s_i, a_k) is a conditional probability. It is the probability that, after control action a_k is executed in discrete traffic state s_i, the system is in discrete traffic state s_j at the next sampling instant. The immediate reward matrix R can be computed from the elements of R_1, R_2 and P and is defined as follows:
where $r(s_i, a_k)$ is the expected immediate reward obtained after executing control action $a_k$ in discrete traffic state $s_i$, calculated as follows:
$$r(s_i, a_k) = \sum_{j=1}^{n} p(s_j \mid s_i, a_k)\, r(s_i, a_k, s_j)$$
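The estimation of $P$, $R_1$ and $R_2$ from the samples, and the expectation that yields $R$, can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the patent's implementation:

- States are 0-based centroid indices, and actions are indexed 0 ($a_1$) and 1 ($a_2$).
- The state-action pair $(s_i, a_k)$ occupies row $2i + k$ of $P$ and entry $2i + k$ of $R$. This state-major layout is chosen for illustration.
- Transition probabilities are estimated by relative frequencies.
- $r(s_i, a_k, s_j)$ is taken as the mean of the observed rewards for that transition.

The function names `estimate_model` and `expected_rewards` are hypothetical, and the sample values are invented toy data.

```python
from collections import defaultdict

NUM_ACTIONS = 2  # action index 0 = a1 (keep phase), 1 = a2 (switch phase)


def estimate_model(samples, n):
    """Counting estimates of P, R_1 and R_2 from (s, a, s_next, r) samples.

    ASSUMED layout (hypothetical): row 2*i + k of P holds p(s_j | s_i, a_k);
    R_k[i][j] is the mean reward observed for the transition s_i -> s_j
    under action a_k. Pairs (s_i, a_k) never observed keep a zero row in P.
    """
    counts = defaultdict(int)          # (i, k, j) -> transitions seen
    reward_sums = defaultdict(float)   # (i, k, j) -> summed rewards
    visits = defaultdict(int)          # (i, k)    -> times a_k ran in s_i
    for s, a, s_next, r in samples:
        counts[(s, a, s_next)] += 1
        reward_sums[(s, a, s_next)] += r
        visits[(s, a)] += 1

    P = [[0.0] * n for _ in range(NUM_ACTIONS * n)]
    R_k = [[[0.0] * n for _ in range(n)] for _ in range(NUM_ACTIONS)]
    for (i, k, j), c in counts.items():
        P[NUM_ACTIONS * i + k][j] = c / visits[(i, k)]   # relative frequency
        R_k[k][i][j] = reward_sums[(i, k, j)] / c        # mean observed reward
    return P, R_k[0], R_k[1]


def expected_rewards(P, R1, R2):
    """Flat R: r(s_i, a_k) = sum_j p(s_j | s_i, a_k) * r(s_i, a_k, s_j)."""
    n = len(R1)
    R = [0.0] * (NUM_ACTIONS * n)
    for i in range(n):
        for k, R_mat in enumerate((R1, R2)):
            row = P[NUM_ACTIONS * i + k]
            R[NUM_ACTIONS * i + k] = sum(row[j] * R_mat[i][j] for j in range(n))
    return R


# Toy samples for n = 2 states; the reward values are made up for illustration.
samples = [(0, 0, 0, 4), (0, 0, 1, 6), (0, 1, 1, 2),
           (1, 0, 1, 3), (1, 1, 0, 5), (1, 1, 0, 7)]
P, R1, R2 = estimate_model(samples, n=2)
print(P)                            # [[0.5, 0.5], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
print(expected_rewards(P, R1, R2))  # [5.0, 2.0, 3.0, 6.0]
```

Relative-frequency counting is the maximum-likelihood estimate of each row of $P$. State-action pairs that never appear in the samples are left as all-zero rows here, which is only one possible convention.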
Step b: update the Q values by updating matrix $Q$ according to
$$Q = (I - \gamma P \Pi)^{-1} R$$
where $I$ is the identity matrix, $\gamma$ is the discount factor (set to 0.95), and $(\cdot)^{-1}$ denotes matrix inversion.
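Step b can be sketched in plain Python as below. The sketch assumes the following layout, which the text itself does not state:

- $P$ is $2n \times n$, with one row per state-action pair.
- $\Pi$ is $n \times 2n$. Row $i$ holds $\pi(s_i, a_1)$ and $\pi(s_i, a_2)$ in the columns of pairs $(s_i, a_1)$ and $(s_i, a_2)$.
- $Q$ and $R$ are vectors of length $2n$, with pair $(s_i, a_k)$ at index $2i + k$.

Rather than forming $(I - \gamma P \Pi)^{-1}$ explicitly, the sketch solves the linear system $(I - \gamma P \Pi) Q = R$. This gives the same $Q$ and is the usual numerically preferable choice. The helper names `matmul`, `solve` and `evaluate_policy` are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(A[r]) + [b[r]] for r in range(m)]   # augmented matrix [A | b]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for cc in range(c, m + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):                 # back substitution
        x[r] = (M[r][m] - sum(M[r][cc] * x[cc] for cc in range(r + 1, m))) / M[r][r]
    return x


def evaluate_policy(P, Pi, R, gamma=0.95):
    """Step b: Q = (I - gamma * P * Pi)^{-1} R, obtained by a linear solve."""
    PPi = matmul(P, Pi)          # (2n x n) @ (n x 2n) -> 2n x 2n
    m = len(PPi)
    A = [[(1.0 if r == c else 0.0) - gamma * PPi[r][c] for c in range(m)]
         for r in range(m)]
    return solve(A, R)


# One traffic state, reward 1 for keeping the phase and 0 for switching;
# both actions return to the same state. Policy: always keep the phase.
Q = evaluate_policy(P=[[1.0], [1.0]], Pi=[[1.0, 0.0]], R=[1.0, 0.0])
print([round(q, 6) for q in Q])   # [20.0, 19.0]
```

Every row of $P\Pi$ sums to at most 1, and $\gamma < 1$. Therefore $I - \gamma P \Pi$ is always invertible, and the singularity check acts only as a safeguard.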
Step c: update the control strategy according to the Q values, updating the elements of matrix $\Pi$ according to the following formula:
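A minimal sketch of step c follows. It assumes the standard greedy improvement rule: $\pi(s_i, a_k) = 1$ for the action with the larger Q value in state $s_i$, and 0 otherwise. This rule is consistent with the exact $D = 0$ stopping test of step d, but the patent's own update formula may differ. The sketch also assumes a state-major layout, with $Q(s_i, a_k)$ at index $2i + k$ of a flat vector and $\Pi$ of size $n \times 2n$. Ties are broken toward $a_1$. The name `greedy_policy` is hypothetical.

```python
NUM_ACTIONS = 2  # 0 = a1 (keep current phase), 1 = a2 (switch to next phase)


def greedy_policy(Q, n):
    """Deterministic greedy policy read off a flat Q vector of length 2n.

    ASSUMPTIONS: state-major layout (Q[2*i + k] = Q(s_i, a_k)), and ties are
    broken toward the lower action index so repeated runs give the same Pi.
    """
    Pi = [[0.0] * (NUM_ACTIONS * n) for _ in range(n)]
    for i in range(n):
        q_row = Q[NUM_ACTIONS * i : NUM_ACTIONS * (i + 1)]
        best_k = max(range(NUM_ACTIONS), key=lambda k: q_row[k])  # first maximum wins
        Pi[i][NUM_ACTIONS * i + best_k] = 1.0
    return Pi


print(greedy_policy([20.0, 19.0, 3.0, 7.5], n=2))
# [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

Each row of the resulting $\Pi$ contains a single 1. The rows therefore still sum to 1, as the definition of $\Pi$ requires.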
Step d: if the iteration counter equals 1, copy $\Pi$ into a matrix $\Pi'$ of the same dimensions, increment the iteration counter by 1, and return to step b. Otherwise, compute the 2-norm of the difference between matrices $\Pi$ and $\Pi'$:
$$D = \lVert \Pi - \Pi' \rVert_2$$
If $D = 0$, policy iteration terminates. If $D \neq 0$, copy $\Pi$ into $\Pi'$, increment the iteration counter by 1, and return to step b.
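Steps b to d can be combined into one loop, sketched below in self-contained plain Python. It includes the iteration counter and the copy of $\Pi$ into $\Pi'$ described in step d. The sketch rests on these assumptions:

- State-action pair $(s_i, a_k)$ is at index $2i + k$.
- Step c uses greedy improvement, with ties broken toward $a_1$.
- $D$ is computed with the Frobenius norm. For the test $D = 0$ this is interchangeable with the spectral 2-norm, because both vanish only when $\Pi = \Pi'$.

The `max_iter` guard and all function names are hypothetical additions for illustration.

```python
import math

NUM_ACTIONS = 2  # 0 = a1 (keep current phase), 1 = a2 (switch to next phase)


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    m = len(A)
    M = [list(A[r]) + [b[r]] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for cc in range(c, m + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][cc] * x[cc] for cc in range(r + 1, m))) / M[r][r]
    return x


def greedy_policy(Q, n):
    """Assumed greedy improvement rule; ties go to a1."""
    Pi = [[0.0] * (NUM_ACTIONS * n) for _ in range(n)]
    for i in range(n):
        q_row = Q[NUM_ACTIONS * i : NUM_ACTIONS * (i + 1)]
        best_k = max(range(NUM_ACTIONS), key=lambda k: q_row[k])
        Pi[i][NUM_ACTIONS * i + best_k] = 1.0
    return Pi


def policy_iteration(P, R, Pi, gamma=0.95, max_iter=1000):
    """Steps b-d with the iteration counter and Pi' bookkeeping of step d."""
    n, m = len(Pi), NUM_ACTIONS * len(Pi)
    iteration, Pi_prev = 1, None
    while iteration <= max_iter:  # max_iter: safety guard, not part of the text
        # Step b: policy evaluation, Q = (I - gamma P Pi)^{-1} R.
        PPi = matmul(P, Pi)
        A = [[(1.0 if r == c else 0.0) - gamma * PPi[r][c] for c in range(m)]
             for r in range(m)]
        Q = solve(A, R)
        # Step c: policy improvement.
        Pi = greedy_policy(Q, n)
        # Step d: from the second iteration on, stop when ||Pi - Pi'|| = 0.
        if iteration > 1:
            D = math.sqrt(sum((a - b) ** 2
                              for row, row_prev in zip(Pi, Pi_prev)
                              for a, b in zip(row, row_prev)))
            if D == 0:
                return Pi, Q, iteration
        Pi_prev = [row[:] for row in Pi]  # save Pi into Pi'
        iteration += 1
    raise RuntimeError("policy did not stabilise within max_iter iterations")


# Toy 2-state model (P and R in the state-major layout), starting from
# the arbitrary policy "always keep the current phase".
P = [[0.5, 0.5], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
R = [5.0, 2.0, 3.0, 6.0]
Pi, Q, iterations = policy_iteration(P, R, Pi=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
print(Pi, iterations)  # [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]] 2
```

In exact arithmetic, policy iteration with a deterministic greedy rule terminates after finitely many iterations. The policy never gets worse, and there are only $2^n$ deterministic policies over two actions. In this toy example the policy stops changing at iteration 2.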
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610696748.9A CN106097733B (en) | 2016-08-22 | 2016-08-22 | A kind of traffic signal optimization control method based on Policy iteration and cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610696748.9A CN106097733B (en) | 2016-08-22 | 2016-08-22 | A kind of traffic signal optimization control method based on Policy iteration and cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106097733A CN106097733A (en) | 2016-11-09 |
CN106097733B true CN106097733B (en) | 2018-12-07 |
Family
ID=58070003
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610696748.9A Expired - Fee Related CN106097733B (en) | 2016-08-22 | 2016-08-22 | A kind of traffic signal optimization control method based on Policy iteration and cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106097733B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2566098B (en) * | 2017-09-05 | 2020-10-07 | Jaguar Land Rover Ltd | Apparatus and method for determining following vehicle information |
CN110378460B (en) * | 2018-04-13 | 2022-03-08 | 北京智行者科技有限公司 | Decision making method |
CN108806287B (en) * | 2018-06-27 | 2021-02-02 | 沈阳理工大学 | Traffic signal timing method based on cooperative optimization |
CN109859475B (en) * | 2019-03-14 | 2021-08-31 | 江苏中设集团股份有限公司 | Intersection signal control method, device and system based on DBSCAN density clustering |
CN112652164B (en) * | 2020-12-02 | 2022-12-30 | 北京北大千方科技有限公司 | Traffic time interval dividing method, device and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090040128A (en) * | 2007-10-19 | 2009-04-23 | 한국전자통신연구원 | Apaaratus and system for traffic signal controlling and method thereof |
CN103699933A (en) * | 2013-12-05 | 2014-04-02 | 北京工业大学 | Traffic signal timing optimization method based on minimum spanning tree clustering genetic algorithm |
CN105118308A (en) * | 2015-10-12 | 2015-12-02 | 青岛大学 | Method based on clustering reinforcement learning and used for optimizing traffic signals of urban road intersections |
CN105405303A (en) * | 2015-12-18 | 2016-03-16 | 佛山市高明区云大机械科技有限公司 | Traffic control method based on traffic flow |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101436345B (en) * | 2008-12-19 | 2010-08-18 | 天津市市政工程设计研究院 | System for forecasting harbor district road traffic requirement based on TransCAD macroscopic artificial platform |
EP2801963B1 (en) * | 2013-05-09 | 2016-01-20 | The Boeing Company | Providing a description of aircraft intent |

2016
- 2016-08-22 CN CN201610696748.9A (CN106097733B), not active, Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
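基于聚类思想的交通信号相位组合优化研究 (Research on traffic signal phase combination optimization based on clustering); 陈文斌 (Chen Wenbin) et al.; 《西华大学学报(自然科学版)》 (Journal of Xihua University, Natural Science Edition); 2016-05-31; Vol. 35, No. 3; pp. 40-44 * |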
Also Published As
Publication number | Publication date |
---|---|
CN106097733A (en) | 2016-11-09 |
Similar Documents
Publication | Title |
---|---|
CN106097733B (en) | A kind of traffic signal optimization control method based on Policy iteration and cluster |
CN104933859B (en) | A kind of method of the determination network carrying power based on macroscopical parent map |
CN110648527B (en) | Traffic speed prediction method based on deep learning model |
CN105118308B (en) | Urban road intersection traffic signal optimization method based on cluster intensified learning |
CN110222873A (en) | A kind of subway station passenger flow forecast method based on big data |
CN111243271A (en) | Single-point intersection signal control method based on deep cycle Q learning |
CN109272146A (en) | A kind of Forecasting Flood method corrected based on deep learning model and BP neural network |
CN105679037B (en) | A kind of dynamic path planning method based on user's trip habit |
CN101123038A (en) | A dynamic information collection method for associated road segments of intersection |
CN104464304A (en) | Urban road vehicle running speed forecasting method based on road network characteristics |
CN103295404B (en) | Road section pedestrian traffic signal control system based on pedestrian crossing clearance time |
CN105279982A (en) | Single intersection dynamic traffic signal control method based on data driving |
KR102329826B1 (en) | Device and method for artificial intelligence-based traffic signal control |
CN102819958B (en) | Cellular simulation method for control of urban road motor vehicle traffic signals |
CN110188936A (en) | Short-time Traffic Flow Forecasting Methods based on multifactor spatial choice deep learning algorithm |
CN106856049A (en) | Crucial intersection demand clustering analysis method based on bayonet socket number plate identification data |
WO2021073526A1 (en) | Trajectory data-based signal control period division method |
CN109544916A (en) | A kind of road network vehicle OD estimation method based on sample path data |
CN108806290A (en) | Dynamic bidirectional green wave control method based on traffic state judging |
CN113378486B (en) | Regional traffic signal optimization method and device, computing equipment and storage medium |
CN108389406A (en) | Signal control time Automated Partition Method |
CN102890866A (en) | Traffic flow speed estimation method based on multi-core support vector regression machine |
CN107945534A (en) | A kind of special bus method for predicting based on GMDH neutral nets |
CN103914981A (en) | Method for predicting confliction between pedestrians and left-turn vehicles at plane intersection |
CN113724507B (en) | Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2018-12-07; termination date: 2020-08-22 |