CN106097733B - Traffic signal optimization control method based on policy iteration and clustering - Google Patents

Traffic signal optimization control method based on policy iteration and clustering

Info

Publication number
CN106097733B
Authority
CN
China
Prior art keywords
control
traffic behavior
matrix
traffic
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610696748.9A
Other languages
Chinese (zh)
Other versions
CN106097733A (en)
Inventor
王冬青
张震
董心壮
丁军航
宋婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University
Original Assignee
Qingdao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University
Priority to CN201610696748.9A
Publication of CN106097733A
Application granted
Publication of CN106097733B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention proposes a traffic signal optimization control method based on policy iteration and clustering, in the field of intelligent optimization technology, comprising: step 1, select the control scheme and define the traffic state, control action, immediate reward and Q value; step 2, control the traffic signals by actuated control and record, at each sampling instant, the traffic state, the control action and the number of vehicles leaving the stop line; step 3, preprocess the traffic states and then perform k-means clustering; step 4, optimize the strategy in the intersection machine by policy iteration and store the optimized strategy and the centroids obtained in step 3 in the traffic signal controller; step 5, replace actuated control with the control strategy obtained in step 4: at the start of each sampling period, the traffic signal controller receives the traffic state collected by the intersection machine, looks up the control strategy for the discrete state corresponding to the nearest centroid, and sends the resulting control action to the intersection machine for execution.

Description

Traffic signal optimization control method based on policy iteration and clustering
Technical field
The present invention relates to the field of intelligent optimization technology.
Background art
Optimal control of traffic signals is an important component of urban traffic management and control systems. The quality of the signal control strategy directly affects the transport efficiency of the whole road network and the travel experience of road users, so various intelligent optimal control methods have been proposed and applied to the optimization of traffic signal control strategies.
Dynamic programming is a method for solving optimal control policies and includes two approaches, value iteration and policy iteration. Under a given policy, the traffic state, phase and immediate reward are sampled, and the samples are then used to further optimize the control policy, which makes the approach well suited to traffic signal optimization. Applying policy iteration to traffic signal control requires discretizing continuous variables such as vehicle queue length. The traditional discretization method divides the whole state space uniformly, whereas the states that actually occur are concentrated in only some regions of the state space. Using k-means clustering to partition the regions where states concentrate therefore gives higher discretization precision for the same number of discrete states, which improves the optimization result.
Summary of the invention
The purpose of the present invention is to discretize the traffic state with k-means clustering so as to improve the optimization result of policy iteration and better optimize the control strategy of the traffic signals at a road intersection. The ultimate goal is to increase the number of vehicles passing through the intersection per unit time and to reduce the number of stops and the average delay caused by waiting at red lights.
The invention first controls the intersection signals with an actuated control method. At short, fixed time intervals, the intersection machine records the vehicle queue lengths of the current phase and the next phase, the number of vehicles leaving the stop line, and the control action of the traffic signal controller. After the intersection machine has collected enough samples, k-means clustering is applied to the queue lengths in the samples to obtain discrete traffic states. The strategy is then optimized by policy iteration, and the optimized strategy is stored in the traffic signal controller. Thereafter, at the same short intervals, the intersection machine sends the detected queue lengths of the current and next phases to the traffic signal controller, which selects a suitable phase action according to the queue lengths and the stored optimized strategy and passes it to the intersection machine for execution.
The present invention proposes a traffic signal optimization control method based on policy iteration and clustering, comprising the following steps:
Step 1, select fixed phase sequence control as the signal control scheme to be optimized; define the traffic state as the vehicle queue lengths of the current phase and the next phase; define the control action as keeping the current phase or switching to the next phase; define the immediate reward as a variable related to the number of vehicles leaving the stop line in a single sampling period; define a state-action pair as the data vector formed by a discrete traffic state and a control action; define the Q value of each state-action pair as the expected cumulative return obtained after taking the control action in the corresponding discrete traffic state; and define the control strategy as the control action that should be executed in each discrete traffic state;
Step 2, the intersection machine sets the control strategy of the traffic signal controller to actuated control, with the minimum and maximum green times set to positive integer multiples of the sampling period and the unit green extension equal to the sampling period; the intersection machine samples the traffic state, the executed phase action and the number of vehicles leaving the stop line and records them as samples, the sampling method being: at each sampling instant, record the traffic state and the control action, and for each sampling period, record the number of vehicles leaving the stop line;
Step 3, after the intersection machine has collected the specified number of samples, the traffic states in the samples are discretized as follows: the sampled traffic states are first normalized and filtered according to a preset spacing threshold, k-means clustering is then performed, and the resulting centroids are numbered, each centroid corresponding to one discrete traffic state; each normalized traffic state in the samples is represented by the number of its nearest centroid, which gives the corresponding discrete traffic state;
Step 4, the intersection machine optimizes the strategy by policy iteration, and the optimized strategy and the centroids obtained in step 3 are stored in the traffic signal controller;
Step 5, the intersection machine sets the control strategy of the traffic signal controller to the control strategy obtained in step 4 and sets the decision period equal to the sampling period; at each decision instant, the traffic signal controller receives the traffic state detected by the intersection machine, normalizes it, calculates the distance from the normalized traffic state to each centroid, finds the nearest centroid, looks up the control strategy for the discrete traffic state corresponding to that centroid, and sends the resulting control action to the intersection machine for execution.
Advantages of the present invention over the prior art:
Before a traffic signal control strategy is optimized by policy iteration, the traffic state must first be discretized, that is, the continuous state space formed by the vehicle queue lengths of two phases must be converted into a discrete state space. The precision of this discretization affects the optimization result of policy iteration. Within a given typical period, the actual traffic states are not spread over the entire state space but are concentrated in certain regions. The discrete traffic states obtained with the k-means clustering algorithm cover only the regions where actual traffic states are concentrated, unlike traditional discretization methods, which also cover regions where no actual traffic state occurs. Compared with traditional methods, k-means clustering therefore achieves higher discretization precision with the same number of discrete traffic states, which improves the optimization result of policy iteration.
Brief description of the drawings
Fig. 1 is a schematic diagram of traffic signal control at an urban road intersection.
Fig. 2 is a flow chart of the traffic signal optimization control method based on policy iteration and clustering.
Reference numerals: 1 to 24, first to twenty-fourth geomagnetic vehicle detectors; 25, lane one; 26, lane two; 27, lane three; 28, lane four; 29, lane five; 30, lane six; 31, lane seven; 32, lane eight; 33, lane nine; 34, lane ten; 35, lane eleven; 36, lane twelve.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Each lane requires two geomagnetic vehicle detectors. One is placed at the stop line and counts the vehicles passing the stop line; the other is placed 120 meters upstream of the stop line and counts the vehicles passing that cross-section. From these two detectors, the number of vehicles between the stop line and the cross-section 120 meters upstream can be calculated for the lane at any time and converted into a vehicle queue length. As shown in Fig. 1, the detectors are paired with the lanes as follows: the first and second geomagnetic vehicle detectors 1 and 2 measure the vehicle queue length of lane one 25; detectors 3 and 4, lane two 26; detectors 5 and 6, lane three 27; detectors 7 and 8, lane four 28; detectors 9 and 10, lane five 29; detectors 11 and 12, lane six 30; detectors 13 and 14, lane seven 31; detectors 15 and 16, lane eight 32; detectors 17 and 18, lane nine 33; detectors 19 and 20, lane ten 34; detectors 21 and 22, lane eleven 35; and detectors 23 and 24, lane twelve 36.
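The counting scheme above can be sketched in Python as follows. This is only an illustration: the function names, the optional initial vehicle count, and the 6.0 m of road assumed per queued vehicle are hypothetical, since the text does not state how the vehicle count is converted into a queue length.

```python
def vehicles_in_section(upstream_count: int, stopline_count: int, initial: int = 0) -> int:
    """Vehicles currently between the upstream detector and the stop line.

    upstream_count: cumulative vehicles counted 120 m upstream of the stop line.
    stopline_count: cumulative vehicles counted at the stop line.
    initial: vehicles already in the section when counting started (assumed 0).
    """
    return max(initial + upstream_count - stopline_count, 0)


def queue_length_m(n_vehicles: int, metres_per_vehicle: float = 6.0,
                   section_m: float = 120.0) -> float:
    """Convert a vehicle count into a queue length, capped at the section length.

    metres_per_vehicle is an illustrative assumption, not a value from the text.
    """
    return min(n_vehicles * metres_per_vehicle, section_m)
```

For example, 15 vehicles counted upstream and 9 at the stop line leave 6 vehicles in the section, which under the assumed spacing corresponds to a 36 m queue.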
The intersection machine receives the information sent by all 24 geomagnetic vehicle detectors, from the first geomagnetic vehicle detector 1 to the twenty-fourth geomagnetic vehicle detector 24, and forwards it to the traffic signal controller. Every 10 seconds, the traffic signal controller determines the control action according to the received traffic state and the control strategy set by the intersection machine.
The traffic signal optimization control method based on policy iteration and clustering shown in Fig. 2 comprises the following steps:
Step 1, select the signal control scheme and define the traffic state, control action, immediate reward and Q value:
The signal control scheme to be optimized uses fixed phase sequence control. The scheme is described below for the case of four symmetric phases, but the invention is limited neither to four phases nor to symmetric phases. Phase 1: vehicles in lane one 25 and lane four 28 may go straight and turn right, and vehicles in lane two 26 and lane five 29 may go straight. Phase 2: vehicles in lane three 27 and lane six 30 may turn left. Phase 3: vehicles in lane seven 31 and lane ten 34 may go straight and turn right, and vehicles in lane eight 32 and lane eleven 35 may go straight. Phase 4: vehicles in lane nine 33 and lane twelve 36 may turn left. At any moment the traffic signal is in exactly one of the four phases, and the phases are executed in order. Although the phase order is fixed, the green duration of each phase need not be. The control action is defined as keeping the current phase or switching to the next phase. If the current phase is phase 1, then after 10 seconds the traffic signal controller must decide on a control action: keep phase 1 or switch to phase 2. If phase 2 is selected, another control action is required 10 seconds later: keep phase 2 or switch to phase 3. The same choice is made in phase 3 (keep phase 3 or switch to phase 4) and in phase 4 (keep phase 4 or switch to phase 1), and the cycle repeats. The minimum green time of all phases is defined as 10 seconds and the maximum green time as 60 seconds.
The vehicle queue length of a phase is defined as the maximum of the vehicle queue lengths of all lanes of that phase. The vehicle queue length of phase 1 equals the maximum of the vehicle queue lengths of lane one 25, lane two 26, lane four 28 and lane five 29; that of phase 2 equals the maximum over lane three 27 and lane six 30; that of phase 3 equals the maximum over lane seven 31, lane eight 32, lane ten 34 and lane eleven 35; and that of phase 4 equals the maximum over lane nine 33 and lane twelve 36.
The traffic state is defined as the vehicle queue lengths of the current phase and the next phase. For example, if the current phase is phase 1, the current traffic state is the data vector formed by two variables, the vehicle queue lengths of phase 1 and phase 2.
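The phase queue length and traffic state definitions above can be sketched as follows. The lane-to-phase mapping uses the reference numerals given in the text for the four-phase example; the function and variable names are hypothetical.

```python
# Lanes (by reference numeral) served by each phase in the four-phase example.
PHASE_LANES = {1: (25, 26, 28, 29), 2: (27, 30), 3: (31, 32, 34, 35), 4: (33, 36)}


def next_phase(phase: int) -> int:
    """Fixed phase sequence 1 -> 2 -> 3 -> 4 -> 1."""
    return phase % 4 + 1


def phase_queue(lane_queue: dict, phase: int) -> float:
    """Queue length of a phase: the maximum over the lanes it serves."""
    return max(lane_queue[lane] for lane in PHASE_LANES[phase])


def traffic_state(lane_queue: dict, current_phase: int) -> tuple:
    """Traffic state: queue lengths of the current phase and the next phase."""
    return (phase_queue(lane_queue, current_phase),
            phase_queue(lane_queue, next_phase(current_phase)))
```

With a 40 m queue in lane two 26 and a 15 m queue in lane six 30, the traffic state during phase 1 would be `(40.0, 15.0)`.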
The initial instant of a sampling period is defined as the instant at which the traffic signal controller decides on a control action; the sampling period has the same duration as the decision period, namely 10 seconds. The immediate reward is defined as a quantity related to the number of vehicles leaving the stop line in a single sampling period; it represents the direct benefit obtained after taking a control action in a given traffic state. A state-action pair is defined as the data vector formed by a discrete traffic state and a control action. The Q value of each state-action pair is defined as the expected cumulative return obtained after taking the control action in the corresponding discrete traffic state, that is, the expected sum of the immediate rewards obtained over several sampling periods after the control action is taken; the Q value represents the long-term benefit of taking the control action in that discrete traffic state. The control strategy is defined as the control action that should be taken in a given discrete traffic state.
The calculation formula of the immediate reward r is as follows:
In the above formula, n_p denotes the number of vehicles passing the stop line in one sampling period, and the constants 6.5, 4.5 and -1.0 in the formula serve to keep the immediate reward r within [-1, 1]. The traffic signal controller calculates n_p from two consecutive traffic states sent by the intersection machine and then calculates the immediate reward r from the above formula.
The Q value of a state-action pair is defined as follows:
Here s denotes a discrete traffic state, a denotes the control action executed in traffic state s, Q(s, a) denotes the Q value of the state-action pair s-a, E denotes expectation, r(s, a) denotes the immediate reward obtained by executing control action a in state s, and γ is the discount factor, a real number between 0 and 1. The index k denotes the k-th sampling period after traffic state s is encountered and control action a is executed, with k = 1 corresponding to one sampling period later. T denotes the sampling period at which the accumulation stops, that is, the cumulative return uses only the immediate rewards of T sampling periods.
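The cumulative return behind the Q value can be illustrated with a short Python sketch. The exact discounting convention is not recoverable from the text here, so weighting the k-th reward by γ^(k-1), which leaves the first reward undiscounted, is an assumption, as is the function name. The Q value would then be the expectation of this quantity over sampled trajectories.

```python
def discounted_return(rewards, gamma=0.95):
    """Discounted sum of the immediate rewards of T consecutive sampling periods.

    rewards[0] is the reward of the first sampling period (k = 1); the weight
    gamma ** (k - 1) is an assumed convention, not one confirmed by the text.
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards))
```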
Step 2, sample the traffic state, the executed control action and the number of vehicles leaving the stop line.
In a specified typical period, such as the morning or evening peak, sampling is carried out for a period of time. During sampling, the intersection machine sets the control strategy of the traffic signal controller to actuated control. The minimum and maximum green times are set to positive integer multiples of the sampling period, and the unit green extension equals the sampling period: the minimum green time of each phase is 10 seconds, the maximum green time is 60 seconds, and the unit green extension is 10 seconds. Every second, the phase is decided as follows: when the green time of the current phase is less than 10 seconds, the current phase is kept; when it is greater than or equal to 60 seconds, the signal switches to the next phase; when it is between 10 and 60 seconds, the green time is extended by 10 seconds if vehicles are arriving on the current phase, and otherwise the signal switches directly to the next phase. Every 10 seconds, the intersection machine detects and stores the following information as a sample: the vehicle queue lengths of the current phase and the next phase, the executed control action, and the number of vehicles leaving the stop line in each sampling period. The number of samples to be collected is set to 9000.
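The actuated control rule used during sampling can be written as a small decision function. This is a minimal sketch: the action constants and the boolean `vehicle_present` input (whether the current phase has arriving vehicles) are hypothetical names, and the 10-second extension is represented simply by keeping the phase until the next decision.

```python
KEEP, SWITCH = 1, 2  # illustrative codes for actions a_1 and a_2


def actuated_action(green_elapsed_s: float, vehicle_present: bool,
                    min_green_s: float = 10, max_green_s: float = 60) -> int:
    """Actuated control decision for the current phase."""
    if green_elapsed_s < min_green_s:
        return KEEP        # minimum green time not yet served
    if green_elapsed_s >= max_green_s:
        return SWITCH      # maximum green time reached
    # Between minimum and maximum green: extend only if vehicles are arriving.
    return KEEP if vehicle_present else SWITCH
```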
Step 3, after the intersection machine has collected 9000 samples, the traffic states in the samples are discretized. Each sample is arranged as a data vector (l, a, l', r), where l denotes the traffic state at a sampling instant, a denotes the control action executed when the traffic state is l, l' denotes the traffic state at the next sampling instant after l, and r denotes the immediate reward obtained in the sampling period during which the traffic state moves from l to l'. The reward r is calculated from the number of vehicles leaving the stop line in each sampling period, as recorded in the original samples, using the formula for the immediate reward r in step 1.
The traffic states in the samples are preprocessed: they are first normalized and then filtered by spacing against a preset threshold. Euclidean distance is used as the distance measure and the threshold is set to 0.1. A normalized traffic state is first selected at random from the samples and added to an initially empty data set, called the traffic state data set. Each remaining traffic state in the samples is then added according to the following rule: if its distance to every traffic state already in the traffic state data set is greater than 0.1, it is added to the data set; otherwise it is not added.
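The preprocessing rule above can be sketched in Python. The normalization method is not specified in the text, so dividing each queue length by the 120 m detection section is an assumption, and the function names are hypothetical. The random choice of the first state corresponds to shuffling the input list before calling `spacing_filter`.

```python
import math


def normalise(state, max_queue_m=120.0):
    """Scale queue lengths to [0, 1]; dividing by 120 m is an assumed choice."""
    return tuple(min(q / max_queue_m, 1.0) for q in state)


def spacing_filter(states, threshold=0.1):
    """Keep a state only if it is farther than `threshold` (Euclidean distance)
    from every state already kept; states are processed in the given order."""
    kept = []
    for s in states:
        if all(math.dist(s, t) > threshold for t in kept):
            kept.append(s)
    return kept
```

The effect is to thin out near-duplicate states so that the data set passed to clustering covers the occupied regions of the state space more evenly.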
K-means clustering is performed on the traffic states in the traffic state data set. A cluster is defined as a set of similar traffic states, each cluster corresponds to one discrete traffic state, and the centroid of a cluster is defined as the centroid of all traffic states the cluster contains. The number of centroids is set to 30, after which clustering proceeds as follows:
Step a, randomly select 30 different traffic states from the traffic state data set as the initial centroids;
Step b, calculate the distance from each traffic state to each centroid and assign each traffic state to its nearest centroid, forming 30 clusters;
Step c, recalculate the centroid of each cluster;
Step d, calculate the change of the centroids, that is, the distance between each original centroid and the new centroid; if the centroids of all clusters no longer change, k-means clustering ends; otherwise return to step b.
After k-means clustering, l and l' in each sample (l, a, l', r) are each assigned to their nearest centroid, that is, converted into discrete traffic states s and s', and the sample is rearranged as the data vector (s, a, s', r).
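Steps a to d can be sketched in plain Python as follows. The function names, the fixed random seed, the iteration cap and the handling of an empty cluster (its centroid is left unchanged) are assumptions added for the illustration; traffic states are represented as tuples.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, seed=0, max_iter=100):
    """Plain k-means following steps a to d; `points` is a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)        # step a: k distinct states
    for _ in range(max_iter):                             # safety cap (assumption)
        clusters = [[] for _ in range(k)]
        for p in points:                                  # step b: assign to nearest
            clusters[nearest(p, centroids)].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]          # step c: recompute centroids
        if new == centroids:                              # step d: no change, stop
            break
        centroids = new
    return centroids
```

In the embodiment described here, `k` would be 30 and `points` the filtered traffic state data set.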
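Converting the continuous samples into discrete ones amounts to a nearest-centroid lookup on both the state and the successor state. A minimal sketch, assuming zero-based centroid numbering and a hypothetical function name:

```python
import math


def discretise(samples, centroids):
    """Map each (l, a, l', r) sample to (s, a, s', r), where s and s' are the
    indices of the centroids nearest to l and l'."""
    def index_of(x):
        return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
    return [(index_of(l), a, index_of(l_next), r) for l, a, l_next, r in samples]
```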
Step 4, an arbitrary traffic signal control strategy is initialized in the intersection machine and then optimized by policy iteration; the optimized strategy and the centroids obtained in step 3 are stored in the traffic signal controller.
In the traffic signal control optimization problem for an isolated intersection there are 30 discrete traffic states, and in each discrete traffic state there are two control actions: a_1 denotes keeping the current phase and a_2 denotes switching to the next phase. The strategy is optimized in the intersection machine by policy iteration, with the following steps:
Step a, set the iteration count to 1, initialize the Q values and the control strategy, and calculate the state transition matrix and the immediate reward matrix. The Q value of each state-action pair is initialized to zero and stored in matrix Q. The immediate reward matrices R_1 and R_2 are estimated from the samples (s, a, s', r); R_1 and R_2 store the expected immediate rewards obtained after executing control actions a_1 and a_2, respectively. With i = 1, 2, ..., 30, j = 1, 2, ..., 30 and k = 1, 2, the matrices Q, R_1 and R_2 are defined as follows:
Here Q(s_i, a_k) denotes the Q value of the state-action pair s_i-a_k, and r(s_i, a_k, s_j) denotes the immediate reward obtained when control action a_k is executed in discrete traffic state s_i and the system moves to discrete traffic state s_j. The control strategy is initialized to an arbitrary strategy and stored in matrix Π, which is defined as follows:
Here π(s_i, a_k) denotes the probability of executing action a_k in discrete state s_i, and the elements of each row of Π sum to 1. The state transition matrix P is estimated from the samples (s, a, s', r) and is defined as follows:
Here the matrix element p(s_j | s_i, a_k) is a conditional probability: the probability that the system moves to discrete traffic state s_j at the next sampling instant after control action a_k is executed in discrete traffic state s_i. From the elements of R_1, R_2 and P, the immediate reward matrix R can be obtained; R is defined as follows:
Here r(s_i, a_k) denotes the expected immediate reward obtained after executing control action a_k in discrete traffic state s_i, calculated as follows:
Step b, update the Q values; matrix Q is updated according to the following formula:
$$Q = (I - \gamma P \Pi)^{-1} R$$
where I denotes the identity matrix, γ is the discount factor, set to 0.95, and $(\cdot)^{-1}$ denotes matrix inversion;
Step c, update the control strategy according to the Q values; the elements of matrix Π are updated according to the following formula:
Step d, if the iteration count is 1, save matrix Π to a matrix Π' of the same dimensions, increase the iteration count by 1 and return to step b; otherwise, compute the 2-norm of the difference between matrix Π and matrix Π':
$$D = \lVert \Pi - \Pi' \rVert_2$$
If D equals 0, policy iteration ends; if D is not equal to 0, save matrix Π to matrix Π', increase the iteration count by 1 and return to step b.
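Steps a to d can be sketched end to end in plain Python. Several points are assumptions rather than details confirmed by the text: states and actions are zero-based indices, the policy is kept deterministic (one action per state), the greedy update in step c picks the action with the largest Q value, unvisited state-action pairs get zero reward and zero transition probability, and the linear system is solved by Gauss-Jordan elimination instead of forming the inverse explicitly. For a deterministic policy, stopping when the policy is unchanged is equivalent to D = 0.

```python
def estimate_model(samples, n_states, n_actions=2):
    """Estimate P[s][a][s'] and the expected reward R[s][a] from (s, a, s', r)."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, s_next, r in samples:
        counts[s][a][s_next] += 1
        reward_sum[s][a] += r
    P = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            total = sum(counts[s][a])
            if total:  # unvisited pairs stay at zero (assumption)
                P[s][a] = [c / total for c in counts[s][a]]
                R[s][a] = reward_sum[s][a] / total
    return P, R


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def policy_iteration(P, R, gamma=0.95, max_iter=100):
    """Return (Q, policy); policy[s] is the action index chosen in state s."""
    n, m = len(R), len(R[0])
    policy = [0] * n                                   # step a: arbitrary policy
    Q = [[0.0] * m for _ in range(n)]
    for _ in range(max_iter):
        # Step b: solve (I - gamma * P * Pi) Q = R over all state-action pairs.
        A = [[0.0] * (n * m) for _ in range(n * m)]
        for s in range(n):
            for a in range(m):
                row = s * m + a
                A[row][row] += 1.0
                for s2 in range(n):
                    A[row][s2 * m + policy[s2]] -= gamma * P[s][a][s2]
        q = solve(A, [R[s][a] for s in range(n) for a in range(m)])
        Q = [q[s * m:(s + 1) * m] for s in range(n)]
        # Step c: greedy policy update (assumed form of the update rule).
        new_policy = [max(range(m), key=lambda a: Q[s][a]) for s in range(n)]
        if new_policy == policy:                       # step d: policy unchanged
            break
        policy = new_policy
    return Q, policy
```

With 30 discrete states and 2 actions, the linear system in step b has 60 unknowns, which is small enough for a direct solve on modest hardware.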
After policy iteration ends, the resulting control strategy is held in matrix Q; matrix Q and the centroids obtained in step 3 are stored in the traffic signal controller.
Step 5, the intersection machine sets the control strategy of the traffic signal controller to the control strategy obtained in step 4. Every 10 seconds, the traffic signal controller receives the traffic state detected by the intersection machine, normalizes it, calculates the distance from the normalized traffic state to each centroid, and finds the number i of the nearest centroid, that is, of the discrete traffic state s_i. It then selects the control action a* according to the following formula:
The traffic signal controller sends the control action a* to the intersection machine for execution: if a* is a_1, the current phase is kept; if a* is a_2, the signal switches to the next phase.
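The online decision in step 5 can be sketched as a lookup. The sketch assumes that a* is the action with the largest stored Q value in the nearest discrete state, that normalization divides by the 120 m detection section, and that actions are indexed 0 (keep, a_1) and 1 (switch, a_2); the function name is hypothetical.

```python
import math


def select_action(raw_state, centroids, Q, max_queue_m=120.0):
    """Return 0 (keep current phase) or 1 (switch to next phase)."""
    # Normalize the detected queue lengths (assumed scaling).
    x = tuple(min(q / max_queue_m, 1.0) for q in raw_state)
    # Discrete state: index of the nearest centroid.
    i = min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
    # Assumed greedy rule: action with the largest Q value in state i.
    return max(range(len(Q[i])), key=lambda a: Q[i][a])
```

Because the decision reduces to a nearest-centroid search over 30 points and a comparison of two Q values, it is cheap enough to run every 10 seconds on the signal controller.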

Claims (1)

1. A traffic signal optimization control method based on policy iteration and clustering, characterized in that
the method comprises the following steps:
Step 1, select fixed phase sequence control as the signal control scheme to be optimized; define the traffic state as the vehicle queue lengths of the current phase and the next phase; define the control action as keeping the current phase or switching to the next phase; define the immediate reward as a variable related to the number of vehicles leaving the stop line in a single sampling period; define a state-action pair as the data vector formed by a discrete traffic state and a control action; define the Q value of each state-action pair as the expected cumulative return obtained after taking the control action in the corresponding discrete traffic state; and define the control strategy as the control action that should be executed in each discrete traffic state;
Step 2, the intersection machine sets the control strategy of the traffic signal controller to actuated control, with the minimum and maximum green times set to positive integer multiples of the sampling period and the unit green extension equal to the sampling period; the intersection machine samples the traffic state, the executed phase action and the number of vehicles leaving the stop line and records them as samples, the sampling method being: at each sampling instant, record the traffic state and the control action, and for each sampling period, record the number of vehicles leaving the stop line;
Step 3, after the intersection machine has collected the specified number of samples, the traffic states in the samples are discretized as follows: the sampled traffic states are first normalized and filtered according to a preset spacing threshold, k-means clustering is then performed, and the resulting centroids are numbered, each centroid corresponding to one discrete traffic state; each normalized traffic state in the samples is represented by the number of its nearest centroid, which gives the corresponding discrete traffic state;
Step 4, the intersection machine optimizes the strategy by policy iteration, and the optimized strategy and the centroids obtained in step 3 are stored in the traffic signal controller;
Step 5, the intersection machine sets the control strategy of the traffic signal controller to the control strategy obtained in step 4 and sets the decision period equal to the sampling period; at each decision instant, the traffic signal controller receives the traffic state detected by the intersection machine, normalizes it, calculates the distance from the normalized traffic state to each centroid, finds the nearest centroid, looks up the control strategy for the discrete traffic state corresponding to that centroid, and sends the resulting control action to the intersection machine for execution,
wherein the policy iteration method comprises the following steps:
Step a, set the iteration count to 1, initialize the Q values and the control strategy, and compute the state transition matrix and the immediate reward matrix: the Q value of every state-action pair is initialized to zero and stored in matrix Q; the immediate reward matrices R_1 and R_2 are estimated from the samples (s, a, s', r), where s denotes the discrete traffic state at a sampling instant, a denotes the control action executed in discrete traffic state s, there being two control actions in total, a_1 (keep the current phase) and a_2 (switch to the next phase), s' denotes the discrete traffic state at the next sampling instant after s, and r denotes the immediate reward obtained during the sampling period in which the discrete traffic state moves from s to s', calculated as follows:
where n_p denotes the number of vehicles passing the stop line in one sampling period; R_1 and R_2 store the expected immediate reward obtained after executing control actions a_1 and a_2 respectively; Q, R_1 and R_2 are defined as follows:
where n denotes the number of centroids used for clustering in step 3, Q(s_i, a_k) denotes the Q value of the state-action pair (s_i, a_k), r(s_i, a_k, s_j) denotes the immediate reward obtained when control action a_k is executed in discrete traffic state s_i and the state moves to discrete traffic state s_j, i and j are integers in [1, n], and k is 1 or 2; the control strategy is initialized to an arbitrary strategy and stored in matrix Π, defined as follows:
where π(s_i, a_k) denotes the probability of executing action a_k in discrete state s_i, and the elements of each row of Π sum to 1; the state transition matrix P is estimated from the samples (s, a, s', r) and is defined as follows:
where the matrix element p(s_j | s_i, a_k) is a conditional probability, namely the probability that the discrete traffic state at the next sampling instant is s_j after control action a_k is executed in discrete traffic state s_i; the immediate reward matrix R is obtained from R_1, R_2 and the elements of P, and is defined as follows:
where r(s_i, a_k) denotes the expected immediate reward obtained after executing control action a_k in discrete traffic state s_i, calculated as follows:
$$r(s_i, a_k) = \sum_{j=1}^{n} p(s_j \mid s_i, a_k)\, r(s_i, a_k, s_j)$$
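One way to estimate the quantities of step a from samples is sketched below. It is an assumption-laden illustration: transition probabilities are taken as relative frequencies, per-transition rewards as sample means, unvisited state-action pairs fall back to a uniform distribution, and the array layout `P[k, i, j]` as well as the name `estimate_model` are hypothetical. States and actions are 0-indexed, and each sample already carries its reward r.

```python
import numpy as np

def estimate_model(samples, n_states, n_actions=2):
    """Estimate P, the per-transition rewards, and expected rewards from (s, a, s', r)."""
    counts = np.zeros((n_actions, n_states, n_states))
    reward_sum = np.zeros_like(counts)
    for s, a, s_next, r in samples:
        counts[a, s, s_next] += 1
        reward_sum[a, s, s_next] += r
    # R_k[k, i, j]: mean reward observed on the transition (s_i, a_k) -> s_j.
    R_k = np.divide(reward_sum, counts,
                    out=np.zeros_like(counts), where=counts > 0)
    visits = counts.sum(axis=2, keepdims=True)
    # P[k, i, j]: relative frequency of s_j after (s_i, a_k);
    # pairs never observed get a uniform row (an assumption).
    P = np.divide(counts, visits,
                  out=np.full_like(counts, 1.0 / n_states), where=visits > 0)
    # r_exp[i, k] = sum_j p(s_j | s_i, a_k) * r(s_i, a_k, s_j)
    r_exp = (P * R_k).sum(axis=2).T
    return P, R_k, r_exp

# Tiny hand-made sample set: (s, a, s', r)
samples = [(0, 0, 0, 2.0), (0, 0, 1, 4.0), (0, 1, 1, 6.0),
           (1, 0, 0, 1.0), (1, 1, 0, 3.0)]
P, R_k, r_exp = estimate_model(samples, n_states=2)
```

For the pair (state 0, action 0) the two observed transitions are equally frequent, so the expected reward is 0.5 * 2.0 + 0.5 * 4.0 = 3.0.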
Step b, update the Q values; matrix Q is updated according to the following formula:
$$Q = (I - \gamma P \Pi)^{-1} R$$
where I denotes the identity matrix, γ is the discount factor, set to 0.95, and (·)^{-1} denotes matrix inversion;
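The closed form in step b follows from the Bellman equation for a fixed strategy. The derivation below assumes a particular stacking that the text does not spell out: Q and R are column vectors with one entry per state-action pair (2n entries), P has size 2n × n, and Π has size n × 2n.

```latex
\begin{aligned}
Q(s_i, a_k) &= r(s_i, a_k) + \gamma \sum_{j=1}^{n} p(s_j \mid s_i, a_k)
               \sum_{l=1}^{2} \pi(s_j, a_l)\, Q(s_j, a_l) \\
Q &= R + \gamma P \Pi Q \\
(I - \gamma P \Pi)\, Q &= R \\
Q &= (I - \gamma P \Pi)^{-1} R
\end{aligned}
```

Because every row of PΠ sums to 1 and γ = 0.95 < 1, the spectral radius of γPΠ is at most γ, so I − γPΠ is invertible.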
Step c, update the control strategy according to the Q values; the elements of matrix Π are updated according to the following formula:
Step d, if the iteration count is 1, save matrix Π to a matrix Π' of the same dimensions, increase the iteration count by 1, and return to step b; otherwise, compute the 2-norm of the difference between matrix Π and matrix Π':
$$D = \lVert \Pi - \Pi' \rVert_2$$
If D equals 0, policy iteration terminates; if D is not equal to 0, save matrix Π to matrix Π', increase the iteration count by 1, and return to step b.
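Steps a to d can be put together as in the following sketch. Several details are assumptions rather than statements of the claimed method: state-action pairs are stacked at index i * A + k, the strategy update in step c is taken to be the standard greedy choice (probability 1 on the action with the largest Q value), the linear system is solved with `numpy.linalg.solve` instead of forming the inverse explicitly, and the toy two-state model is invented for illustration.

```python
import numpy as np

def policy_iteration(P, r_exp, gamma=0.95, max_iter=1000):
    """P: (A, n, n) transition probabilities; r_exp: (n, A) expected rewards."""
    n_actions, n, _ = P.shape
    m = n * n_actions
    # Stack state-action pairs at index i * A + k (layout is an assumption).
    P_sa = P.transpose(1, 0, 2).reshape(m, n)    # shape (nA, n)
    R = r_exp.reshape(m)                         # shape (nA,)

    def build_pi(policy):
        """Strategy matrix of shape (n, nA); every row sums to 1."""
        Pi = np.zeros((n, m))
        Pi[np.arange(n), np.arange(n) * n_actions + policy] = 1.0
        return Pi

    policy = np.zeros(n, dtype=int)              # step a: arbitrary initial strategy
    Pi, Pi_prev = build_pi(policy), None
    for _ in range(max_iter):
        # Step b: Q = (I - gamma * P * Pi)^(-1) * R
        Q = np.linalg.solve(np.eye(m) - gamma * P_sa @ Pi, R)
        # Step c (assumed greedy update): pick the action with the largest Q value.
        policy = Q.reshape(n, n_actions).argmax(axis=1)
        Pi = build_pi(policy)
        # Step d: stop when the strategy matrix no longer changes.
        if Pi_prev is not None and np.linalg.norm(Pi - Pi_prev, 2) == 0:
            break
        Pi_prev = Pi
    return policy, Q.reshape(n, n_actions)

# Toy model: action 0 keeps the state, action 1 moves to the other state;
# only (state 0, action 0) pays a reward of 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
r_exp = np.array([[1.0, 0.0],
                  [0.0, 0.0]])
policy, Q = policy_iteration(P, r_exp)
```

In this toy model the loop stops on the second pass with the strategy "keep in state 0, switch in state 1", because the strategy matrix produced by step c is identical to the saved one and D is exactly 0.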
CN201610696748.9A 2016-08-22 2016-08-22 A kind of traffic signal optimization control method based on Policy iteration and cluster Expired - Fee Related CN106097733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610696748.9A CN106097733B (en) 2016-08-22 2016-08-22 A kind of traffic signal optimization control method based on Policy iteration and cluster

Publications (2)

Publication Number Publication Date
CN106097733A CN106097733A (en) 2016-11-09
CN106097733B true CN106097733B (en) 2018-12-07

Family

ID=58070003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610696748.9A Expired - Fee Related CN106097733B (en) 2016-08-22 2016-08-22 A kind of traffic signal optimization control method based on Policy iteration and cluster

Country Status (1)

Country Link
CN (1) CN106097733B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2566098B (en) * 2017-09-05 2020-10-07 Jaguar Land Rover Ltd Apparatus and method for determining following vehicle information
CN110378460B (en) * 2018-04-13 2022-03-08 北京智行者科技有限公司 Decision making method
CN108806287B (en) * 2018-06-27 2021-02-02 沈阳理工大学 Traffic signal timing method based on cooperative optimization
CN109859475B (en) * 2019-03-14 2021-08-31 江苏中设集团股份有限公司 Intersection signal control method, device and system based on DBSCAN density clustering
CN112652164B (en) * 2020-12-02 2022-12-30 北京北大千方科技有限公司 Traffic time interval dividing method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090040128A (en) * 2007-10-19 2009-04-23 한국전자통신연구원 Apaaratus and system for traffic signal controlling and method thereof
CN103699933A (en) * 2013-12-05 2014-04-02 北京工业大学 Traffic signal timing optimization method based on minimum spanning tree clustering genetic algorithm
CN105118308A (en) * 2015-10-12 2015-12-02 青岛大学 Method based on clustering reinforcement learning and used for optimizing traffic signals of urban road intersections
CN105405303A (en) * 2015-12-18 2016-03-16 佛山市高明区云大机械科技有限公司 Traffic control method based on traffic flow

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436345B (en) * 2008-12-19 2010-08-18 天津市市政工程设计研究院 System for forecasting harbor district road traffic requirement based on TransCAD macroscopic artificial platform
EP2801963B1 (en) * 2013-05-09 2016-01-20 The Boeing Company Providing a description of aircraft intent

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
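Research on traffic signal phase combination optimization based on clustering (基于聚类思想的交通信号相位组合优化研究); Chen Wenbin et al.; Journal of Xihua University (Natural Science Edition); 2016-05-31; Vol. 35, No. 3; pp. 40-44 *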

Also Published As

Publication number Publication date
CN106097733A (en) 2016-11-09

Similar Documents

Publication Publication Date Title
CN106097733B (en) A kind of traffic signal optimization control method based on Policy iteration and cluster
CN104933859B (en) A kind of method of the determination network carrying power based on macroscopical parent map
CN110648527B (en) Traffic speed prediction method based on deep learning model
CN105118308B (en) Urban road intersection traffic signal optimization method based on cluster intensified learning
CN110222873A (en) A kind of subway station passenger flow forecast method based on big data
CN111243271A (en) Single-point intersection signal control method based on deep cycle Q learning
CN109272146A (en) A kind of Forecasting Flood method corrected based on deep learning model and BP neural network
CN105679037B (en) A kind of dynamic path planning method based on user's trip habit
CN101123038A (en) A dynamic information collection method for associated road segments of intersection
CN104464304A (en) Urban road vehicle running speed forecasting method based on road network characteristics
CN103295404B (en) Road section pedestrian traffic signal control system based on pedestrian crossing clearance time
CN105279982A (en) Single intersection dynamic traffic signal control method based on data driving
KR102329826B1 (en) Device and method for artificial intelligence-based traffic signal control
CN102819958B (en) Cellular simulation method for control of urban road motor vehicle traffic signals
CN110188936A (en) Short-time Traffic Flow Forecasting Methods based on multifactor spatial choice deep learning algorithm
CN106856049A (en) Crucial intersection demand clustering analysis method based on bayonet socket number plate identification data
WO2021073526A1 (en) Trajectory data-based signal control period division method
CN109544916A (en) A kind of road network vehicle OD estimation method based on sample path data
CN108806290A (en) Dynamic bidirectional green wave control method based on traffic state judging
CN113378486B (en) Regional traffic signal optimization method and device, computing equipment and storage medium
CN108389406A (en) Signal control time Automated Partition Method
CN102890866A (en) Traffic flow speed estimation method based on multi-core support vector regression machine
CN107945534A (en) A kind of special bus method for predicting based on GMDH neutral nets
CN103914981A (en) Method for predicting confliction between pedestrians and left-turn vehicles at plane intersection
CN113724507B (en) Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181207

Termination date: 20200822