CN105279978A - Intersection traffic signal control method and device - Google Patents
Intersection traffic signal control method and device
- Publication number: CN105279978A
- Application number: CN201510665966.1A
- Authority: CN (China)
- Prior art keywords: network, training, critic, weights, action
- Classification: Traffic Control Systems
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to an intersection traffic signal control method and device. The method learns from environmental feedback according to the traffic state, so as to achieve adaptive control of traffic signals. The method comprises: defining system parameters; setting up an Action network and a Critic network; initializing a controller; obtaining the corresponding system control parameter according to the system state; obtaining a performance index according to the state and actions; alternately training the Critic network and the Action network; recording the network weights after the training goal is reached; and using the trained Critic network and Action network to conduct on-line control. The method and device use the ADHDP method to provide an effective approach for adaptive control of intersection traffic signals.
Description
Technical field
The present invention relates to the field of urban traffic signal control, and in particular to an intersection traffic signal control method and device.
Background technology
With the rapid growth of China's economy and the acceleration of urbanization, a large population has poured into cities. The construction and improvement of transportation facilities lag far behind people's growing transport needs, and traffic congestion has become increasingly serious.
The causes of traffic congestion are manifold. Besides insufficient transportation facilities, unreasonable traffic planning and weak public traffic awareness, a very important factor is that existing urban traffic signal control systems do not realize their full potential. Owing to the particularity of urban traffic problems, it is difficult to establish an accurate mathematical model, and simple fixed-time or actuated control methods are unable to adapt to increasingly complex traffic.
Adaptive dynamic programming (ADP) merges dynamic programming, reinforcement learning, function approximation and other methods. Using on-line or off-line data, it estimates the system's performance index function with a function-approximation structure and then obtains a near-optimal control policy according to the principle of optimality. Action-dependent heuristic dynamic programming (ADHDP) is a typical adaptive dynamic programming method; because it is model-free and adaptive, it can meet the control requirements of traffic systems whose parameters change frequently, whose real-time requirements are high, and for which an accurate model is difficult to establish.
Summary of the invention
One aspect of the present invention provides an off-line training method for an ADHDP controller for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network. The method comprises: in step S1, defining the system state, the reward function, the splits and the system control parameters; in step S2, establishing the Action network and the Critic network, wherein the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, whose output layer has P-1 neurons and whose hidden layer has M_a neurons, M_a being an empirical value, and the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, whose output layer has 1 neuron and whose hidden layer has M_c neurons, M_c being an empirical value; in step S3, initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights; in step S4, before each control cycle ends, obtaining the system state, feeding it to the Action network, outputting the corresponding system control parameter u(k), and passing u(k) to the simulation software to guide the operation of the next cycle; in step S5, feeding the system state S(k) and the system control parameter u(k) to the Critic network, which outputs the performance index J(k); in step S6, alternately training the Critic network, according to the performance index and the reward function, and the Action network, according to the performance index, so as to update the weights of both networks; and in step S7, judging whether the expected target has been reached: if so, in step S8, the off-line training ends and the final weights of the Action network and the Critic network are recorded; otherwise, returning to step S6 to continue the training.
Another aspect of the present invention provides a method of using an ADHDP controller trained by the above method to control intersection traffic signals on line, comprising: initializing the Action network and the Critic network with the final weights of the Action network and the Critic network, respectively; inputting the real-time traffic data of the on-line system to the ADHDP controller; and, according to the definitions in step S1, obtaining the system state from the real-time traffic data, inputting the system state to the Action network, and using the output of the Action network as the system control parameter to control the intersection traffic signals.
Another aspect provides an off-line training device for an ADHDP controller for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the device comprising: a first device for defining the system state, the reward function, the splits and the system control parameters; a second device for establishing the Action network and the Critic network, wherein the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, whose output layer has P-1 neurons and whose hidden layer has M_a neurons, M_a being an empirical value, and the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, whose output layer has 1 neuron and whose hidden layer has M_c neurons, M_c being an empirical value; a third device for initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights; a fourth device for obtaining the system state before each control cycle ends, feeding it to the Action network, outputting the corresponding system control parameter u(k), and passing u(k) to the simulation software to guide the operation of the next cycle; a fifth device for feeding the system state S(k) and the system control parameter u(k) to the Critic network, which outputs the performance index J(k); a sixth device for alternately training the Critic network, according to the performance index and the reward function, and the Action network, according to the performance index, so as to update the weights of both networks; and a seventh device for judging whether the expected target has been reached: if so, the off-line training ends and the final weights of the Action network and the Critic network are recorded; otherwise, the sixth device continues the training.
Another aspect of the present invention provides a device that uses an ADHDP controller trained by the above device to control intersection traffic signals on line, comprising: an eighth device that initializes the Action network and the Critic network with the final weights of the Action network and the Critic network, respectively; a ninth device that inputs the real-time traffic data of the on-line system to the ADHDP controller; and a tenth device that, according to the definitions in the first device, obtains the system state from the real-time traffic data, inputs the system state to the Action network, and uses the output of the Action network as the system control parameter to control the intersection traffic signals.
The present invention effectively overcomes the deficiencies of the prior art. The intersection traffic signal control method of the present invention has on-line learning ability. In complex practical environments, such as changing traffic flow or a large proportion of non-motorized vehicles, it learns from environmental feedback and computes the timing parameters of the intersection, thereby effectively controlling an intersection with variable traffic flow. The method requires no traffic model; according to the traffic state, it imitates the human brain in learning from environmental feedback, thereby achieving adaptive control of the traffic signals.
Accompanying drawing explanation
Fig. 1 schematically illustrates the flow of the off-line training method of the present invention.
Fig. 2 schematically illustrates the ADHDP structure and its training.
Fig. 3 schematically illustrates the structures of the Action network and the Critic network.
Embodiment
The technical scheme of the present invention is described in further detail below in conjunction with the drawings and embodiments. The following embodiments are implemented on the premise of the technical scheme of the present invention and give detailed implementations and processes, but the protection scope of the present invention is not limited to the following embodiments.
Embodiments of the present invention are described with reference to Fig. 1 and Fig. 2. Fig. 1 schematically illustrates the flow of the ADHDP controller off-line training method of the present invention. Fig. 2 schematically illustrates the ADHDP structure and its training. Hereinafter, a two-phase intersection is taken as an example.
As shown in Figure 1, the method starts from step S0.
In step S1, the system state, the reward function, the splits and the system control parameters are defined.
Define system state as follows.Suppose there be P phase place in each control cycle, phase time length is T
i, each phase place has L
iindividual track obtains right of access, and each track maximum queue length is h
i, phase place queue length H
i=max{h
i, phase average queue length
the flow in each track is q
j, phase place flow is Q
i=max{q
j, definition phase place saturation degree is
wherein 1≤i≤P, 1≤j≤L
i, ε is normaliztion constant.
Define system state is S (k)={ s
i(k) }, 1≤i≤P, wherein k is emulation step number, and step-length is the time span C of a kth control cycle
k, Cycle Length can be determined according to historical traffic Webster method, and value is usually between 30 seconds to 120 seconds.
The reward function is defined as r(k), where N = P-1 and P ≥ 2.
The splits are defined as a_i, 1 ≤ i ≤ P-1; the split of the last phase is a_P = 1 - (a_1 + ... + a_{P-1}).
The system control parameter is u(k) = {a_i(k)}, 1 ≤ i ≤ P.
In the two-phase example, the system state is S(k) = {s_i(k)}, i = 1, 2. If the split of the first phase is a_1, then the split of the second phase is a_2 = 1 - a_1.
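A minimal sketch of these definitions in Python (the saturation formula appears in the source only as an image, so `Q_i / eps` below is an illustrative stand-in, and all function names are hypothetical):

```python
def system_state(lane_flows_per_phase, eps=100.0):
    """s_i for each of the P phases.

    lane_flows_per_phase: a list of per-lane flows q_j for each phase i.
    The patent's exact saturation formula is not reproduced in the source
    text; Q_i / eps (Q_i = max lane flow, eps a normalization constant)
    is an illustrative stand-in.
    """
    return [max(q_j) / eps for q_j in lane_flows_per_phase]

def control_parameters(a_head):
    """Build u(k) = {a_i(k)} from the Action network's P-1 outputs.

    The last phase's split is the remainder, so the splits sum to 1
    (a_2 = 1 - a_1 in the two-phase example).
    """
    return a_head + [1.0 - sum(a_head)]

# Two-phase example: P = 2, two lanes per phase.
state = system_state([[30.0, 45.0], [20.0, 60.0]])  # [0.45, 0.6]
u_k = control_parameters([0.5])                      # [0.5, 0.5]
```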
In step S2, the Action network and the Critic network are established. As shown in Fig. 3, the Action network is a BP neural network with one hidden layer; its input layer has P neurons, its output layer has P-1 neurons, and its hidden layer has M_a neurons, where M_a is an empirical value, usually between 5 and 20. The Critic network is a BP neural network with one hidden layer; its input layer has 2P-1 neurons, its output layer has 1 neuron, and its hidden layer has M_c neurons, where M_c is an empirical value, usually between 5 and 20.
In the two-phase example, the Action network is a BP neural network with one hidden layer whose input layer has 2 neurons, whose output layer has 1 neuron and whose hidden layer has 8 neurons. The Critic network is a BP neural network with one hidden layer whose input layer has 3 neurons, whose output layer has 1 neuron and whose hidden layer has 8 neurons.
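These layer sizes, following the general rule of step S2 (P inputs and P-1 outputs for the Action network; 2P-1 inputs and 1 output for the Critic network), can be sketched as plain weight matrices (a hypothetical helper, not the patent's implementation):

```python
import random

def make_bp_network(n_in, n_hidden, n_out, init=random.random):
    """One-hidden-layer BP network held as two weight matrices.

    Weights are initialized to random numbers in (0, 1), as in the
    two-phase example; layer sizes follow step S2.
    """
    w_hidden = [[init() for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[init() for _ in range(n_hidden)] for _ in range(n_out)]
    return {"w_hidden": w_hidden, "w_out": w_out}

P, M_a, M_c = 2, 8, 8                        # two-phase example
action = make_bp_network(P, M_a, P - 1)      # state -> splits a_1 .. a_{P-1}
critic = make_bp_network(2 * P - 1, M_c, 1)  # (state, action) -> J(k)
```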
In step S3, the controller is initialized, including initializing the Action network weights and the Critic network weights. The learning rate of the Action network can be set to l_a, generally a constant between 0 and 1, and the number of training iterations per step set to N_a, an empirical value usually between 5 and 50. The learning rate of the Critic network can be set to l_c, generally a constant between 0 and 1, and the number of training iterations per step set to N_c, an empirical value usually between 5 and 50. For both the Action network and the Critic network, a Sigmoid function can be adopted as the activation function, with β usually taken as 1.
In the two-phase example, the Action network weights are initialized to random numbers between 0 and 1, the learning rate is 0.3, and the number of training iterations per step is 5. The Critic network weights are initialized to random numbers between 0 and 1, the learning rate is 0.1, and the number of training iterations per step is 5.
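The activation function appears in the source only as an image; a minimal sketch, assuming the common logistic form with slope parameter β (an assumption, not confirmed by the source), is:

```python
import math

def sigmoid(x, beta=1.0):
    """Assumed logistic activation f(x) = 1 / (1 + exp(-beta * x)).

    The exact formula in the source is an image; beta = 1 is the
    usual value stated in the text.
    """
    return 1.0 / (1.0 + math.exp(-beta * x))

# f(0) = 0.5 for any beta; a larger beta steepens the transition.
```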
In step S4, before each control cycle ends, the system state is obtained and fed to the Action network, which outputs the corresponding system control parameter u(k). For example, the collected flow q_j and queue length h_j of each lane of the intersection can be received from the simulation software to obtain the system state S(k); the system state is used as the input of the Action network to obtain the corresponding output u(k), and u(k) is passed to the simulation software to guide the operation of the next cycle. In the present embodiment, the Paramics simulation software is connected to the controller, and the controller and the simulation software exchange information through a shared file.
In step S5, the system state S(k) and the system control parameter u(k) are fed to the Critic network, which outputs the performance index J(k).
In step S6, the Critic network and the Action network are trained alternately, as follows.
The training error of the Critic network is defined as e_c(k) = αJ(k) - J(k-1) + r(k), where α usually takes a value between 0 and 1; α = 0.2 in the two-phase example.
The weights of the Critic network are updated as w_c(k+1) = w_c(k) + Δw_c(k).
The training error of the Action network is defined as e_a(k) = J(k) - G_c(k), where G_c(k) is the control target; G_c(k) = 0 in the two-phase example.
The weights of the Action network are updated as w_a(k+1) = w_a(k) + Δw_a(k).
The alternating training flow is as follows. The system state, built from intersection traffic data such as the flow q_j and queue length h_j of each lane, is fed to the Action network to obtain the system control parameter u(k); the system state and u(k) are then fed to the Critic network to obtain the performance index. The training error of the Critic network is computed from the performance index and the reward function, and the weights of the Critic network are updated according to this error. The training error of the Action network is computed from the performance index, and the weights of the Action network are updated according to this error. This cycle repeats until the expected target is reached.
In step S7, it is judged whether the training target has been reached. If the expected target has been reached, the off-line training ends in step S8, and the final weights of the Action network and the Critic network are recorded. Otherwise, the method returns to step S6 to continue the training.
In the present embodiment, the expected target is |e_a| < 0.05 and |e_c| < 0.05, where e_a = J(k) and e_c = αJ(k) - J(k-1) + r(k). The weights of the Action network and the Critic network are recorded after the target is reached.
The present invention also provides a method of using an ADHDP controller trained by the above method to control intersection traffic signals on line, comprising: initializing the Action network and the Critic network with the final weights of the Action network and the Critic network, respectively; inputting the real-time data of the on-line system (including the flow q_j and queue length h_j of each lane of the intersection) to the ADHDP controller; obtaining the system state according to the definitions in step S1; inputting the system state to the Action network; and using the output of the Action network as the system control parameter to control the intersection traffic signals. Optionally, the method may further comprise performing on-line training according to steps S5 and S6, so as to update the weights of the Action network and the Critic network in real time.
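The on-line control loop can be sketched as one function per control cycle (`action_net` and `state_fn` are hypothetical stand-ins for the trained Action network and the step-S1 state definition):

```python
def online_control_cycle(action_net, state_fn, traffic_data):
    """One on-line control cycle: real-time data -> state -> splits u(k).

    action_net maps the state to the first P-1 splits; the last split
    is the remainder so that the splits sum to 1.
    """
    state = state_fn(traffic_data)
    a_head = action_net(state)
    return a_head + [1.0 - sum(a_head)]

# Stub usage for the two-phase example: a fixed Action "network"
# that always proposes an equal split.
u_k = online_control_cycle(lambda s: [0.5], lambda d: d, [0.4, 0.6])
# u_k == [0.5, 0.5]
```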
The above method steps of the present invention need not be performed in the illustrated order. Without departing from the spirit of the present invention, in alternative embodiments the above steps may be performed in a different order and/or some steps may be performed in parallel. Such variations all fall within the protection scope of the present invention.
The above methods of the present invention may be realized by a device with computing capability (such as a processor) executing computer instructions stored in a storage device. One example of this implementation is an off-line training device for an ADHDP controller for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the device comprising: a first device for defining the system state, the reward function, the splits and the system control parameters; a second device for establishing the Action network and the Critic network, wherein the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, whose output layer has P-1 neurons and whose hidden layer has M_a neurons, M_a being an empirical value, and the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, whose output layer has 1 neuron and whose hidden layer has M_c neurons, M_c being an empirical value; a third device for initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights; a fourth device for obtaining the system state before each control cycle ends, feeding it to the Action network, outputting the corresponding system control parameter u(k), and passing u(k) to the simulation software to guide the operation of the next cycle; a fifth device for feeding the system state S(k) and the system control parameter u(k) to the Critic network, which outputs the performance index J(k); a sixth device for alternately training the Critic network, according to the performance index and the reward function, and the Action network, according to the performance index, so as to update the weights of both networks; and a seventh device for judging whether the expected target has been reached: if so, the off-line training ends and the final weights of the Action network and the Critic network are recorded; otherwise, the sixth device continues the training.
Another example of this implementation is a device that uses an ADHDP controller trained by the above device to control intersection traffic signals on line, comprising: an eighth device that initializes the Action network and the Critic network with the final weights of the Action network and the Critic network, respectively; a ninth device that inputs the real-time traffic data of the on-line system to the ADHDP controller; and a tenth device that, according to the definitions in the first device, obtains the system state from the real-time traffic data, inputs the system state to the Action network, and uses the output of the Action network as the system control parameter to control the intersection traffic signals.
In this implementation, each of the above devices is a functional module produced by the computing device executing instructions.
Although the present invention has been shown and described with reference to certain exemplary embodiments thereof, those skilled in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention as defined by the claims and their equivalents. Therefore, the scope of the present invention should not be limited to the above embodiments, but should be determined not only by the claims but also by their equivalents.
Claims (30)
1. An off-line training method for an ADHDP controller for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the method comprising:
in step S1, defining the system state, the reward function, the splits and the system control parameters;
in step S2, establishing the Action network and the Critic network, wherein:
the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, whose output layer has P-1 neurons and whose hidden layer has M_a neurons, M_a being an empirical value; and
the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, whose output layer has 1 neuron and whose hidden layer has M_c neurons, M_c being an empirical value;
in step S3, initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights;
in step S4, before each control cycle ends, obtaining the system state, feeding it to the Action network, outputting the corresponding system control parameter u(k), and passing u(k) to the simulation software to guide the operation of the next cycle;
in step S5, feeding the system state S(k) and the system control parameter u(k) to the Critic network, which outputs the performance index J(k);
in step S6, alternately training the Critic network, according to the performance index and the reward function, and the Action network, according to the performance index, so as to update the weights of both networks; and
in step S7, judging whether the expected target has been reached: if so, in step S8, the off-line training ends and the final weights of the Action network and the Critic network are recorded; otherwise, returning to step S6 to continue the training.
2. The method according to claim 1, wherein defining the system state, the reward function, the splits and the system control parameters comprises:
defining the system state, including: supposing that each control cycle has P phases, the duration of phase i is T_i, and L_i lanes have the right of way in phase i; the maximum queue length of lane j is h_j, the phase queue length is H_i = max_j{h_j}, the phase average queue length is H̄_i, the flow of each lane is q_j, the phase flow is Q_i = max{q_j}, and the phase saturation degree s_i is defined in terms of Q_i and a normalization constant ε, where 1 ≤ i ≤ P and 1 ≤ j ≤ L_i; and defining the system state as S(k) = {s_i(k)}, 1 ≤ i ≤ P, where k is the simulation step number and the step length is the duration C_k of the k-th control cycle, C_k being determined by the Webster method from historical traffic data;
defining the reward function as r(k), where N = P-1 and P ≥ 2;
defining the splits as a_i, 1 ≤ i ≤ P-1, a split being the ratio of the green-light duration of the i-th phase to the duration of the control cycle, the split of the last phase being a_P = 1 - (a_1 + ... + a_{P-1}); and
defining the system control parameter as u(k) = {a_i(k)}, 1 ≤ i ≤ P.
3. The method according to claim 2, wherein each control cycle is one complete traffic signal change period of a given intersection.
4. The method according to claim 2, wherein each phase corresponds to one traffic signal state of a given intersection.
5. The method according to claim 1, wherein initializing the ADHDP controller further comprises:
setting the learning rate of the Action network to l_a, which takes a value between 0 and 1, and the number of training iterations per step to N_a, which takes a value between 5 and 50;
setting the learning rate of the Critic network to l_c, which takes a value between 0 and 1, and the number of training iterations per step to N_c, which takes a value between 5 and 50; and
for both the Action network and the Critic network, using a Sigmoid function as the activation function, with β equal to 1.
6. The method according to claim 2, wherein obtaining the system state comprises: receiving the flow q_j and queue length h_j data of each lane of the intersection from the simulation software to obtain the system state S(k).
7. The method according to claim 2, wherein training the Critic network and the Action network comprises:
calculating the training error of the Critic network according to the performance index and the reward function;
updating the weights of the Critic network according to this training error;
calculating the training error of the Action network according to the performance index; and
updating the weights of the Action network according to this training error.
8. The method according to claim 7, wherein:
the training error of the Critic network is defined as e_c(k) = αJ(k) - J(k-1) + r(k);
the weights of the Critic network are updated as w_c(k+1) = w_c(k) + Δw_c(k);
the training error of the Action network is defined as e_a(k) = J(k) - G_c(k), where G_c(k) is the control target and G_c(k) = 0; and
the weights of the Action network are updated as w_a(k+1) = w_a(k) + Δw_a(k).
9. The method according to claim 1, wherein M_a takes a value between 5 and 20 and M_c takes a value between 5 and 20.
10. The method according to claim 1, wherein:
the expected target is the total delay time of the intersection or the average vehicle speed of each lane;
if the expected target is the total delay time of the intersection, then in step S7, when the total delay time is less than or close to the expected total delay time, the method proceeds to step S8; otherwise, it returns to step S6 to continue the training; and
if the expected target is the average vehicle speed of each lane, then when the average vehicle speed of each lane is greater than or close to the expected average vehicle speed, the method proceeds to step S8; otherwise, it returns to step S6 to continue the training.
11. The method according to claim 2, wherein C_k takes a value between 30 and 120 seconds.
12. A method of using an ADHDP controller trained by the method of any one of claims 1-11 to control intersection traffic signals on line, comprising:
initializing the Action network and the Critic network with the final weights of the Action network and the Critic network, respectively;
inputting the real-time traffic data of the on-line system to the ADHDP controller; and
according to the definitions in step S1, obtaining the system state from the real-time traffic data, inputting the system state to the Action network, and using the output of the Action network as the system control parameter to control the intersection traffic signals.
13. The method according to claim 12, wherein the real-time traffic data comprises the flow q_j and queue length h_j of each lane of the intersection.
14. The method according to claim 12, further comprising performing on-line training according to steps S5 and S6, so as to update the weights of the Action network and the Critic network in real time.
15. The method according to claim 12, wherein the real-time traffic data of the on-line system comprises the flow q_j and queue length h_j of each lane of the intersection.
16. An off-line training device for an ADHDP controller for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the device comprising:
a first device for defining the system state, the reward function, the splits and the system control parameters;
a second device for establishing the Action network and the Critic network, wherein:
the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, whose output layer has P-1 neurons and whose hidden layer has M_a neurons, M_a being an empirical value; and
the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, whose output layer has 1 neuron and whose hidden layer has M_c neurons, M_c being an empirical value;
a third device for initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights;
a fourth device for obtaining the system state before each control cycle ends, feeding it to the Action network, outputting the corresponding system control parameter u(k), and passing u(k) to the simulation software to guide the operation of the next cycle;
a fifth device for feeding the system state S(k) and the system control parameter u(k) to the Critic network, which outputs the performance index J(k);
a sixth device for alternately training the Critic network, according to the performance index and the reward function, and the Action network, according to the performance index, so as to update the weights of both networks; and
a seventh device for judging whether the expected target has been reached: if so, the off-line training ends and the final weights of the Action network and the Critic network are recorded; otherwise, the sixth device continues the training.
17. The device according to claim 16, wherein defining the system state, the reward function, the splits and the system control parameters comprises:
defining the system state, including: supposing that each control cycle has P phases, the duration of phase i is T_i, and L_i lanes have the right of way in phase i; the maximum queue length of lane j is h_j, the phase queue length is H_i = max_j{h_j}, the phase average queue length is H̄_i, the flow of each lane is q_j, the phase flow is Q_i = max{q_j}, and the phase saturation degree s_i is defined in terms of Q_i and a normalization constant ε, where 1 ≤ i ≤ P and 1 ≤ j ≤ L_i; and defining the system state as S(k) = {s_i(k)}, 1 ≤ i ≤ P, where k is the simulation step number and the step length is the duration C_k of the k-th control cycle, C_k being determined by the Webster method from historical traffic data;
defining the reward function as r(k), where N = P-1 and P ≥ 2;
defining the splits as a_i, 1 ≤ i ≤ P-1, a split being the ratio of the green-light duration of the i-th phase to the duration of the control cycle, the split of the last phase being a_P = 1 - (a_1 + ... + a_{P-1}); and
defining the system control parameter as u(k) = {a_i(k)}, 1 ≤ i ≤ P.
18. The device according to claim 17, wherein each control cycle is one complete traffic signal change period of a given intersection.
19. The device according to claim 17, wherein each phase corresponds to one traffic signal state of a given intersection.
20. The device according to claim 16, wherein initializing the ADHDP controller further comprises:
setting the learning rate of the Action network to l_a, which takes a value between 0 and 1, and the number of training iterations per step to N_a, which takes a value between 5 and 50;
setting the learning rate of the Critic network to l_c, which takes a value between 0 and 1, and the number of training iterations per step to N_c, which takes a value between 5 and 50; and
for both the Action network and the Critic network, using a Sigmoid function as the activation function, with β equal to 1.
21. The device according to claim 17, wherein obtaining the system state comprises: receiving the flow q_j and queue length h_j data of each lane of the intersection from the simulation software to obtain the system state S(k).
22. The device according to claim 17, wherein training the Critic network and the Action network comprises:
calculating the training error of the Critic network according to the performance index and the reward function;
updating the weights of the Critic network according to this training error;
calculating the training error of the Action network according to the performance index; and
updating the weights of the Action network according to this training error.
23. The device according to claim 22, wherein:
the training error of the Critic network is calculated from the performance index and the reward function;
the weights of the Critic network are updated as:
w_c(k+1) = w_c(k) + Δw_c(k)
the training error of the Action network is calculated from the performance index and the control objective G_c(k) in the formula, where G_c(k) = 0; and
the weights of the Action network are updated as:
w_a(k+1) = w_a(k) + Δw_a(k)
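The alternating updates of claims 22-23 can be sketched numerically. This is an illustrative sketch, not part of the claims: the claims give only the update form w(k+1) = w(k) + Δw(k), so the error definitions below follow the standard ADHDP formulation (a discounted temporal-difference error for the Critic, and the distance of the performance index J(k) from the control objective G_c(k) = 0 for the Action network); all function names and the discount factor are assumptions.

```python
def gradient_step(w, error, grad_error_wrt_w, lr):
    """One weight update of the claimed form w(k+1) = w(k) + Δw(k),
    with Δw(k) = -lr * error * ∂error/∂w (gradient descent on ½ error²)."""
    delta_w = -lr * error * grad_error_wrt_w
    return w + delta_w

def critic_error(J_k, J_prev, r_k, alpha=0.95):
    # Assumed standard ADHDP Critic error: e_c(k) = α·J(k) - (J(k-1) - r(k)),
    # where r(k) is the reward and α a discount factor.
    return alpha * J_k - (J_prev - r_k)

def action_error(J_k, G_c=0.0):
    # Action error per claim 23: e_a(k) = J(k) - G_c(k), with G_c(k) = 0.
    return J_k - G_c
```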
24. The device according to claim 16, wherein M_a takes a value between 5 and 20, and M_c takes a value between 5 and 20.
25. The device according to claim 16, wherein:
the preset target is the total intersection delay time or the average vehicle speed of each lane;
if the preset target is the total intersection delay time, then in the seventh device, when the total delay time is less than, or close to, the preset total delay time, offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise the sixth device continues training; and
if the preset target is the average vehicle speed of each lane, then when the average vehicle speed of each lane is greater than, or close to, the preset average vehicle speed, offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise the sixth device continues training.
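For illustration only (not part of the claims): the two stopping modes of claim 25 differ only in the direction of the comparison. A sketch, where the tolerance used to model "close to" and the function name are assumptions:

```python
def training_done(metric, target, mode, tol=0.05):
    """Stopping criterion sketch for offline training.

    mode "delay": stop once total delay is below (or within tol of) the target.
    mode "speed": stop once average speed is above (or within tol of) the target.
    """
    if mode == "delay":
        return metric <= target * (1 + tol)
    if mode == "speed":
        return metric >= target * (1 - tol)
    raise ValueError(f"unknown mode: {mode}")
```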
26. The device according to claim 17, wherein C_k takes a value between 30 seconds and 120 seconds.
27. A device for online control of intersection traffic signals using an ADHDP controller trained by the device of any one of claims 16-26, comprising:
an eighth device, which initializes the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively;
a ninth device, which inputs the real-time traffic data of the online system into the ADHDP controller; and
a tenth device, which, according to the definitions in the first device, obtains the system state from the real-time traffic data of the online system, inputs the system state into the Action network, and uses the output of the Action network as the system control parameters for controlling the traffic signals of the intersection.
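For illustration only (not part of the claims): the online control step of claim 27 maps real-time lane data to the state S(k) and reads the control parameters u(k) off the trained Action network. All names are hypothetical, and `dummy_net` merely stands in for a trained Action network:

```python
def online_control_step(action_net, flows, queues):
    """One online step: build S(k) from lane flows q_j and queue lengths h_i,
    then return u(k), the Action network's output."""
    state = list(flows) + list(queues)  # S(k)
    return action_net(state)            # u(k) = {a_i(k)}

def dummy_net(state):
    # Stand-in "trained" Action network: uniform splits for a 4-phase
    # intersection (it outputs a_1..a_{P-1}; the last split is implied).
    return [0.25, 0.25, 0.25]

u = online_control_step(dummy_net, flows=[10, 12], queues=[3, 5])
```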
28. The device according to claim 27, wherein the real-time traffic data comprises the flow q_j and the queue length h_i of each lane of the intersection.
29. The device according to claim 27, further comprising using the fifth device and the sixth device to perform online training, so as to update the weights of the Action network and of the Critic network in real time.
30. The device according to claim 27, wherein the real-time traffic data of the online system comprises the flow q_j and the queue length h_i of each lane of the intersection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510665966.1A CN105279978B (en) | 2015-10-15 | 2015-10-15 | Intersection traffic signal control method and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105279978A true CN105279978A (en) | 2016-01-27 |
CN105279978B CN105279978B (en) | 2018-05-25 |
Family
ID=55148906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510665966.1A Active CN105279978B (en) | 2015-10-15 | 2015-10-15 | Intersection traffic signal control method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105279978B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20010046291A (en) * | 1999-11-11 | 2001-06-15 | 정환도 | Traffic signal control system and method using cdma wireless communication network |
KR20050051956A (en) * | 2003-11-28 | 2005-06-02 | 주식회사 비츠로시스 | Control system and methdod for local divisional traffic signal |
JP2007122584A (en) * | 2005-10-31 | 2007-05-17 | Sumitomo Electric Ind Ltd | Traffic signal control system and control method of traffic signal control system |
CN102568220A (en) * | 2010-12-17 | 2012-07-11 | 上海市长宁区少年科技指导站 | Self-adaptive traffic control system |
CN104882006A (en) * | 2014-07-03 | 2015-09-02 | 中国科学院沈阳自动化研究所 | Message-based complex network traffic signal optimization control method |
Non-Patent Citations (2)
Title |
---|
Zhang Pengcheng: "Kernel-based reinforcement learning methods for continuous spaces and their applications", China Masters' Theses Full-text Database, Information Science and Technology series * |
Qi Chi: "Approximate dynamic programming methods and their applications in transportation", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II series * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108459506A (en) * | 2018-03-20 | 2018-08-28 | 清华大学 | A kind of parameter tuning method of the virtual inertia controller of wind turbine |
CN108459506B (en) * | 2018-03-20 | 2020-12-08 | 清华大学 | Parameter setting method of virtual inertia controller of fan |
CN114973698A (en) * | 2022-05-10 | 2022-08-30 | 阿波罗智联(北京)科技有限公司 | Control information generation method and machine learning model training method and device |
CN114973698B (en) * | 2022-05-10 | 2024-04-16 | 阿波罗智联(北京)科技有限公司 | Control information generation method and machine learning model training method and device |
Also Published As
Publication number | Publication date |
---|---|
CN105279978B (en) | 2018-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Belletti et al. | Expert level control of ramp metering based on multi-task deep reinforcement learning | |
CN109492814B (en) | Urban traffic flow prediction method, system and electronic equipment | |
CN110570672B (en) | Regional traffic signal lamp control method based on graph neural network | |
US11182676B2 (en) | Cooperative neural network deep reinforcement learning with partial input assistance | |
CN109284812B (en) | Video game simulation method based on improved DQN | |
JP6092477B2 (en) | An automated method for correcting neural dynamics | |
CN106781489A (en) | A kind of road network trend prediction method based on recurrent neural network | |
CN105700526A (en) | On-line sequence limit learning machine method possessing autonomous learning capability | |
KR20160125967A (en) | Method and apparatus for efficient implementation of common neuron models | |
US20230367934A1 (en) | Method and apparatus for constructing vehicle dynamics model and method and apparatus for predicting vehicle state information | |
CN112172813A (en) | Car following system and method for simulating driving style based on deep inverse reinforcement learning | |
Tagliaferri et al. | A real-time strategy-decision program for sailing yacht races | |
CN114415507B (en) | Deep neural network-based smart hand-held process dynamics model building and training method | |
CN105279978A (en) | Intersection traffic signal control method and device | |
Wang et al. | Dynamic-horizon model-based value estimation with latent imagination | |
Hilleli et al. | Toward deep reinforcement learning without a simulator: An autonomous steering example | |
CN113821903A (en) | Temperature control method and device, modular data center and storage medium | |
CN117008620A (en) | Unmanned self-adaptive path planning method, system, equipment and medium | |
US20230162539A1 (en) | Driving decision-making method and apparatus and chip | |
CN105513380B (en) | The off-line training method and system and its On-Line Control Method and system of EADP controllers | |
KR102624710B1 (en) | Structural response estimation method using gated recurrent unit | |
Koltovska et al. | Intelligent Agent Based Traffic Signal Control on Isolated Intersections. | |
Li et al. | Prediction for short-term traffic flow based on optimized wavelet neural network model | |
CN115512558A (en) | Traffic light signal control method based on multi-agent reinforcement learning | |
JP2017513110A (en) | Contextual real-time feedback for neuromorphic model development |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |