CN105279978A - Intersection traffic signal control method and device - Google Patents

Intersection traffic signal control method and device

Info

Publication number
CN105279978A
Authority
CN
China
Prior art keywords
network
training
critic
weights
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510665966.1A
Other languages
Chinese (zh)
Other versions
CN105279978B (en)
Inventor
王飞跃
刘裕良
段艳杰
吕宜生
朱凤华
苟超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Huicheng Intelligent Technology Co Ltd
Qingdao Intelligent Industry Institute For Research And Technology
Original Assignee
Qingdao Huicheng Intelligent Technology Co Ltd
Qingdao Intelligent Industry Institute For Research And Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Huicheng Intelligent Technology Co Ltd and Qingdao Intelligent Industry Institute For Research And Technology
Priority to CN201510665966.1A priority Critical patent/CN105279978B/en
Publication of CN105279978A publication Critical patent/CN105279978A/en
Application granted granted Critical
Publication of CN105279978B publication Critical patent/CN105279978B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention relates to an intersection traffic signal control method and device. The method learns from environmental feedback according to the traffic state, so as to achieve adaptive control of traffic signals. The method comprises: defining system parameters; setting up an Action network and a Critic network; initializing the controller; obtaining the corresponding system control parameters according to the system state; obtaining a performance index from the state and the control action; alternately training the Critic network and the Action network; recording the network weights after the training goal is reached; and using the trained Critic network and Action network for online control. Based on the ADHDP method, the method and the device provide an effective approach to adaptive control of intersection traffic signals.

Description

Intersection traffic signal control method and equipment
Technical field
The present invention relates to the field of urban traffic signal control, and in particular to an intersection traffic signal control method and equipment.
Background technology
With the rapid growth of China's economy and the acceleration of urbanization, a large population is pouring into cities. The construction and improvement of transportation facilities cannot keep up with people's growing travel demand, and traffic congestion is becoming increasingly prominent.
The causes of traffic congestion are manifold. Beyond inadequate transportation facilities, unreasonable traffic planning, and weak public awareness of traffic rules, a very important factor is that existing urban traffic signal control systems do not realize their full potential. Because of the particular nature of urban traffic problems, it is difficult to establish an accurate mathematical model, and simple fixed-time or actuated control methods cannot adapt to increasingly complex traffic conditions.
Adaptive dynamic programming (ADP) combines dynamic programming, reinforcement learning, and function approximation. Using online or offline data, it estimates the system's performance index function with a function-approximation structure and then derives a near-optimal control law from the principle of optimality. Action-dependent heuristic dynamic programming (ADHDP) is a typical ADP method; because it is model-free and adaptive, it can meet the control requirements of traffic systems whose parameters change frequently, which demand real-time performance, and for which accurate models are hard to establish.
Summary of the invention
One aspect of the present invention provides an ADHDP controller offline training method for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network. The method comprises: in step S1, defining the system state, the reward function, the splits, and the system control parameters; in step S2, establishing the Action network and the Critic network, wherein the Action network is a BP neural network with one hidden layer whose input layer has P neurons, output layer has P-1 neurons, and hidden layer has M_a neurons (M_a is an empirical value), and the Critic network is a BP neural network with one hidden layer whose input layer has 2P-1 neurons, output layer has 1 neuron, and hidden layer has M_c neurons (M_c is an empirical value); in step S3, initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights; in step S4, before each control cycle ends, obtaining the system state, feeding it to the Action network, outputting the corresponding system control parameters u(k), and passing u(k) to the simulation software to guide the operation of the next cycle; in step S5, feeding the system state S(k) and the system control parameters u(k) to the Critic network to output the performance index J(k); in step S6, alternately training the Critic network according to the performance index and the reward function and training the Action network according to the performance index, so as to update the weights of the Critic network and of the Action network; and in step S7, judging whether the expected target is reached: when the expected target is reached, in step S8 the offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise, the method returns to step S6 and training continues.
Another aspect of the present invention provides a method for online control of intersection traffic signals using an ADHDP controller trained by the above method, comprising: initializing the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively; inputting the real-time traffic data of the online system to the ADHDP controller; and, according to the definitions in step S1, obtaining the system state from the real-time traffic data of the online system, feeding the system state to the Action network, and using the output of the Action network as the system control parameters to control the intersection traffic signals.
Another aspect of the present invention provides ADHDP controller offline training equipment for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the equipment comprising: a first device, which defines the system state, the reward function, the splits, and the system control parameters; a second device, which establishes the Action network and the Critic network, wherein the Action network is a BP neural network with one hidden layer whose input layer has P neurons, output layer has P-1 neurons, and hidden layer has M_a neurons (an empirical value), and the Critic network is a BP neural network with one hidden layer whose input layer has 2P-1 neurons, output layer has 1 neuron, and hidden layer has M_c neurons (an empirical value); a third device, which initializes the ADHDP controller, including initializing the Action network weights and the Critic network weights; a fourth device, which, before each control cycle ends, obtains the system state, feeds it to the Action network, outputs the corresponding system control parameters u(k), and passes u(k) to the simulation software to guide the operation of the next cycle; a fifth device, which feeds the system state S(k) and the system control parameters u(k) to the Critic network and outputs the performance index J(k); a sixth device, which alternately trains the Critic network according to the performance index and the reward function and trains the Action network according to the performance index, so as to update the weights of the Critic network and of the Action network; and a seventh device, which judges whether the expected target is reached: when the expected target is reached, the offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise, the sixth device continues training.
Another aspect of the present invention provides equipment for online control of intersection traffic signals using an ADHDP controller trained by the above equipment, comprising: an eighth device, which initializes the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively; a ninth device, which inputs the real-time traffic data of the online system to the ADHDP controller; and a tenth device, which, according to the definitions in the first device, obtains the system state from the real-time traffic data of the online system, feeds the system state to the Action network, and uses the output of the Action network as the system control parameters to control the intersection traffic signals.
The present invention effectively overcomes the deficiencies of the prior art. The intersection traffic signal control method of the present invention has online learning capability: in complex practical environments, such as when the traffic volume changes or the proportion of non-motorized traffic is large, it learns from environmental feedback, computes the timing parameters of the intersection, and achieves effective control of intersections with variable traffic flow. The method requires no traffic model; according to the traffic state, it imitates the way the human brain learns from environmental feedback, thereby achieving adaptive control of traffic signals.
Brief description of the drawings
Fig. 1 schematically illustrates a flow chart of the offline training method of the present invention.
Fig. 2 schematically illustrates the ADHDP structure and its training.
Fig. 3 schematically illustrates the structures of the Action network and the Critic network.
Embodiment
The technical scheme of the present invention is described in further detail below in conjunction with the drawings and embodiments. The following embodiments are implemented on the premise of the technical solution of the present invention and give detailed implementations and processes, but the protection scope of the present invention is not limited to the following embodiments.
Embodiments of the invention are described with reference to Fig. 1 and Fig. 2. Fig. 1 schematically illustrates a flow chart of the ADHDP controller offline training method of the present invention. Fig. 2 schematically illustrates the ADHDP structure and its training. In the following, an intersection with two phases is taken as an example.
As shown in Fig. 1, the method starts at step S0.
In step S1, the system state, the reward function, the splits, and the system control parameters are defined.
The system state is defined as follows. Suppose each control cycle has P phases, the duration of phase i is T_i, and L_i lanes obtain the right of way during phase i. The maximum queue length of each lane is h_j, the phase queue length is H_i = max{h_j}, and the phase average queue length is $\bar{H}_i$. The flow of each lane is q_j and the phase flow is Q_i = max{q_j}. The phase saturation degree is defined as s_i, where 1 <= i <= P, 1 <= j <= L_i, and ε is a normalization constant.
The system state is defined as S(k) = {s_i(k)}, 1 <= i <= P, where k is the simulation step number and the step length is the time span C_k of the k-th control cycle. The cycle length can be determined from historical traffic data by the Webster method and usually takes a value between 30 and 120 seconds.
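For reference, a minimal statement of the classical Webster optimum cycle length (standard traffic engineering background, not reproduced in the patent text) is

$C_0 = \frac{1.5L + 5}{1 - Y}$

where $L$ is the total lost time per cycle and $Y$ is the sum of the critical flow ratios of the phases; a cycle length obtained this way would then typically be kept within the 30 to 120 second range mentioned above.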
The reward function is defined as r(k), where N = P-1 and P >= 2.
The splits are defined as a_i, where 1 <= i <= P-1. The split of the last phase is $a_P = 1 - \sum_{i=1}^{P-1} a_i$.
The system control parameters are u(k) = {a_i(k)}, 1 <= i <= P.
In the two-phase example, the system state is S(k) = {s_i(k)}, where i = 1, 2. The split of the first phase is a_1, so the split of the second phase is a_2 = 1 - a_1.
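To make the state construction concrete, the following is a minimal Python sketch for the two-phase example. The per-phase quantities Q_i, H_i and the average queue length follow the definitions above; because the patent's exact saturation formula (with the normalization constant ε) is not reproduced in this text, `phase_saturation` is a hypothetical stand-in, and the helper names and sample numbers are illustrative only.

```python
# Minimal sketch of building S(k) for the two-phase example; `phase_saturation`
# is a hypothetical stand-in, NOT the patented saturation formula.
from typing import List


def phase_saturation(phase_flow: float, phase_avg_queue: float, eps: float = 100.0) -> float:
    # Assumed normalization of phase flow and average queue length by the constant eps.
    return (phase_flow * phase_avg_queue) / eps


def system_state(lane_flows: List[List[float]],
                 lane_queues: List[List[float]],
                 eps: float = 100.0) -> List[float]:
    """Build S(k) = {s_i(k)} for the P phases from per-lane flows q_j and queue lengths h_j."""
    state = []
    for q_lanes, h_lanes in zip(lane_flows, lane_queues):
        Q_i = max(q_lanes)                    # phase flow          Q_i = max{q_j}
        H_i = max(h_lanes)                    # phase queue length  H_i = max{h_j} (not used below)
        H_bar = sum(h_lanes) / len(h_lanes)   # phase average queue length (assumed lane average)
        state.append(phase_saturation(Q_i, H_bar, eps))
    return state


# Two phases, two lanes with right of way in each phase (illustrative numbers).
S_k = system_state(lane_flows=[[12.0, 8.0], [15.0, 10.0]],
                   lane_queues=[[6.0, 4.0], [9.0, 5.0]])
```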
In step S2, the Action network and the Critic network are established. As shown in Fig. 3, the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, output layer has P-1 neurons, and hidden layer has M_a neurons; M_a is an empirical value, usually between 5 and 20. The Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, output layer has 1 neuron, and hidden layer has M_c neurons; M_c is an empirical value, usually between 5 and 20.
In the two-phase example, the Action network is a BP neural network with one hidden layer, whose input layer has 2 neurons, output layer has 1 neuron (P - 1 = 1), and hidden layer has 8 neurons. The Critic network is a BP neural network with one hidden layer, whose input layer has 3 neurons, output layer has 1 neuron, and hidden layer has 8 neurons.
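The two-phase network structures can be sketched as below. This is a minimal NumPy illustration, not the patented implementation: both networks are one-hidden-layer BP networks with sigmoid activations (β = 1), the Action network maps the 2-dimensional state to the single split a_1, and the Critic network maps the 3-dimensional state-plus-control vector to the scalar J(k). The class and variable names are assumptions.

```python
import numpy as np


def sigmoid(x, beta=1.0):
    # Sigmoid activation; beta = 1 as in the embodiment.
    return 1.0 / (1.0 + np.exp(-beta * x))


class BPNetwork:
    """One-hidden-layer BP network, used here for both the Action and the Critic network."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # Initial weights are random numbers between 0 and 1, as in the two-phase example.
        self.w1 = rng.random((n_hidden, n_in))
        self.w2 = rng.random((n_out, n_hidden))

    def forward(self, x):
        self.x = np.asarray(x, dtype=float)
        self.h = sigmoid(self.w1 @ self.x)   # hidden layer
        self.y = sigmoid(self.w2 @ self.h)   # output layer
        return self.y


rng = np.random.default_rng(0)
# Two-phase example: Action net 2 -> 8 -> 1 (outputs a_1), Critic net 3 -> 8 -> 1 (outputs J(k)).
action_net = BPNetwork(n_in=2, n_hidden=8, n_out=1, rng=rng)
critic_net = BPNetwork(n_in=3, n_hidden=8, n_out=1, rng=rng)
```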
In step S3, the controller is initialized, including initializing the Action network weights and the Critic network weights. The learning rate of the Action network can be set to l_a, generally a constant between 0 and 1, and the number of training iterations per step is set to N_a, an empirical value usually between 5 and 50. The learning rate of the Critic network can be set to l_c, generally a constant between 0 and 1, and the number of training iterations per step is set to N_c, an empirical value usually between 5 and 50. Both the Action network and the Critic network can use the Sigmoid function as the activation function, with β usually set to 1.
In the two-phase example, the initial Action network weights are random numbers between 0 and 1, the learning rate is 0.3, and the number of training iterations per step is 5. The initial Critic network weights are random numbers between 0 and 1, the learning rate is 0.1, and the number of training iterations per step is 5.
In step S4, before each control cycle ends, the system state is obtained and fed to the Action network, which outputs the corresponding system control parameters u(k). For example, the flow q_j and queue length h_j data of each lane at the intersection, collected by the simulation software, are received; the system state S(k) is computed from them and used as the input of the Action network to obtain the corresponding output u(k); u(k) is then passed to the simulation software to guide the operation of the next cycle. In this embodiment, the Paramics simulation software is connected with the controller, and the controller and the simulation software exchange information through a shared file, as sketched below.
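A possible per-cycle exchange with the simulator is sketched here. The patent only states that the controller and the simulation software exchange information through a shared file; the CSV layout, file paths, and helper names (`read_traffic_data`, `write_control`) are assumptions for illustration, and `system_state` and `action_net` are reused from the sketches above.

```python
import csv


def read_traffic_data(path="shared/traffic_out.csv"):
    """Read per-lane flows q_j and queue lengths h_j written by the simulator
    (assumed layout: one row per phase, first half q_j values, second half h_j values)."""
    lane_flows, lane_queues = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            vals = [float(v) for v in row]
            half = len(vals) // 2
            lane_flows.append(vals[:half])
            lane_queues.append(vals[half:])
    return lane_flows, lane_queues


def write_control(u_k, path="shared/control_in.csv"):
    """Write the splits u(k) for the simulator to apply in the next cycle."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(u_k)


# One control step before the current cycle ends (two-phase example):
flows, queues = read_traffic_data()
S_k = system_state(flows, queues)        # state per the step-S1 definitions
a1 = float(action_net.forward(S_k)[0])   # Action network output: split of phase 1
write_control([a1, 1.0 - a1])            # a_2 = 1 - a_1
```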
In step S5, the system state S(k) and the system control parameters u(k) are fed to the Critic network, which outputs the performance index J(k).
In step S6, the Critic network and the Action network are trained alternately, as follows.
The training error of the Critic network is defined as
$E_c(k) = \frac{1}{2}\left[\alpha J(k) - J(k-1) + r(k)\right]^2$
where α usually takes a value between 0 and 1; α = 0.2 in the two-phase example.
The weights of the Critic network are updated as
$w_c(k+1) = w_c(k) + \Delta w_c(k)$
$\Delta w_c(k) = -\frac{\partial E_c(k)}{\partial w_c(k)} = -\frac{\partial E_c(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial w_c(k)}$
The training error of the Action network is defined as
$E_a(k) = \frac{1}{2}\left[J(k) - G_c(k)\right]^2$
where G_c(k) is the control target; G_c(k) = 0 in the two-phase example.
The weights of the Action network are updated as
$w_a(k+1) = w_a(k) + \Delta w_a(k)$
$\Delta w_a(k) = -\frac{\partial E_a(k)}{\partial w_a(k)} = -\frac{\partial E_a(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial u(k)}\,\frac{\partial u(k)}{\partial w_a(k)}$
The alternating training procedure is as follows: the system state, computed from traffic data such as the flow q_j and queue length h_j of each lane at the intersection, is fed to the Action network to obtain the system control parameters u(k); the system state and u(k) are then fed to the Critic (evaluation) network to obtain the performance index. The training error of the Critic network is calculated from the performance index and the reward function, and the Critic network weights are updated from this error. The training error of the Action network is calculated from the performance index, and the Action network weights are updated from this error. This cycle repeats until the expected target is reached. A sketch of one such training step is given below.
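The following is a minimal sketch of one alternating training step, reusing the `BPNetwork` objects defined above. It assumes the learning rates l_c = 0.1 and l_a = 0.3 and the per-step training counts N_c = N_a = 5 of the two-phase example, applies the learning rates to the negative gradients of E_c and E_a as defined above, and obtains ∂J/∂u(k) by backpropagating through the Critic network; it is an illustration of the update rules, not the patented code.

```python
import numpy as np


def backprop(net, dL_dy):
    """Gradients of a scalar loss through a one-hidden-layer sigmoid BPNetwork.
    Uses the activations cached by net.forward(); returns (dL/dw1, dL/dw2, dL/dx)."""
    dy = dL_dy * net.y * (1.0 - net.y)              # through the output sigmoid
    dW2 = np.outer(dy, net.h)
    dh = (net.w2.T @ dy) * net.h * (1.0 - net.h)    # through the hidden sigmoid
    dW1 = np.outer(dh, net.x)
    dx = net.w1.T @ dh
    return dW1, dW2, dx


def adhdp_train_step(S_k, J_prev, r_k, alpha=0.2, l_c=0.1, l_a=0.3, N_c=5, N_a=5, G_c=0.0):
    """One alternating step: train the Critic N_c times, then the Action network N_a times."""
    S_k = np.asarray(S_k, dtype=float)
    for _ in range(N_c):                             # Critic: E_c = 0.5*(alpha*J(k) - J(k-1) + r(k))^2
        u_k = action_net.forward(S_k)
        J_k = critic_net.forward(np.concatenate([S_k, u_k]))[0]
        e_c = alpha * J_k - J_prev + r_k
        dW1, dW2, _ = backprop(critic_net, np.array([alpha * e_c]))   # dE_c/dJ = alpha*e_c
        critic_net.w1 -= l_c * dW1
        critic_net.w2 -= l_c * dW2
    for _ in range(N_a):                             # Action: E_a = 0.5*(J(k) - G_c)^2
        u_k = action_net.forward(S_k)
        J_k = critic_net.forward(np.concatenate([S_k, u_k]))[0]
        e_a = J_k - G_c
        _, _, dJ_dz = backprop(critic_net, np.array([1.0]))           # dJ w.r.t. the Critic input
        dJ_du = dJ_dz[len(S_k):]                                      # part w.r.t. the control u(k)
        dW1, dW2, _ = backprop(action_net, e_a * dJ_du)               # dE_a/du = e_a * dJ/du
        action_net.w1 -= l_a * dW1
        action_net.w2 -= l_a * dW2
    return J_k, e_a, e_c
```

An offline loop over steps S4 to S7 would call this once per control cycle and stop once |e_a| < 0.05 and |e_c| < 0.05, recording the final weights at that point.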
In step S7, it is judged whether the training target is reached. When the expected target is reached, in step S8 the offline training ends and the final weights of the Action network and of the Critic network are recorded. Otherwise, the method returns to step S6 and training continues.
In this embodiment, the expected target is |e_a| < 0.05 and |e_c| < 0.05, where e_a = J(k) and e_c = αJ(k) - J(k-1) + r(k). The weights of the Action network and of the Critic network are recorded after the target is reached.
The present invention also provides a method for online control of intersection traffic signals using an ADHDP controller trained by the above method, comprising:
initializing the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively; inputting the real-time data of the online system (including the flow q_j and queue length h_j of each lane at the intersection) to the ADHDP controller; obtaining the system state according to the definitions in step S1; feeding the system state to the Action network; and using the output of the Action network as the system control parameters to control the intersection traffic signals. Optionally, the method may also perform online training according to steps S5 and S6, so as to update the weights of the Action network and of the Critic network in real time. A sketch of this online control loop is given below.
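A sketch of the online control loop, reusing the helpers above (with `action_net` assumed to hold the final offline weights), could look like the following; the detector interface `get_realtime_data`, the signal interface `apply_splits`, and the reward callback `r_of` are hypothetical placeholders, since the patent does not specify them.

```python
def online_control(get_realtime_data, apply_splits, online_learning=True,
                   r_of=lambda S: 0.0, alpha=0.2):
    """Online control with the trained ADHDP controller (two-phase example)."""
    J_prev = 0.0
    while True:                                      # one iteration per control cycle
        flows, queues = get_realtime_data()          # real-time q_j and h_j from field detectors
        S_k = system_state(flows, queues)            # state per the step-S1 definitions
        a1 = float(action_net.forward(S_k)[0])       # split of the first phase
        apply_splits([a1, 1.0 - a1])                 # a_2 = 1 - a_1
        if online_learning:                          # optional online update (steps S5 and S6)
            J_k, _, _ = adhdp_train_step(S_k, J_prev, r_of(S_k), alpha=alpha)
            J_prev = J_k
```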
The method steps of the present invention need not be performed in the illustrated order. Without departing from the spirit of the present invention, in alternative embodiments the above steps may be performed in a different order and/or some steps may be performed in parallel. Such variations all fall within the protection scope of the present invention.
The above methods of the present invention may be implemented by equipment with computing capability (for example a processor) executing computer instructions stored in a memory. One example of this implementation is ADHDP controller offline training equipment for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the equipment comprising: a first device, which defines the system state, the reward function, the splits, and the system control parameters; a second device, which establishes the Action network and the Critic network, wherein the Action network is a BP neural network with one hidden layer whose input layer has P neurons, output layer has P-1 neurons, and hidden layer has M_a neurons (an empirical value), and the Critic network is a BP neural network with one hidden layer whose input layer has 2P-1 neurons, output layer has 1 neuron, and hidden layer has M_c neurons (an empirical value); a third device, which initializes the ADHDP controller, including initializing the Action network weights and the Critic network weights; a fourth device, which, before each control cycle ends, obtains the system state, feeds it to the Action network, outputs the corresponding system control parameters u(k), and passes u(k) to the simulation software to guide the operation of the next cycle; a fifth device, which feeds the system state S(k) and the system control parameters u(k) to the Critic network and outputs the performance index J(k); a sixth device, which alternately trains the Critic network according to the performance index and the reward function and trains the Action network according to the performance index, so as to update the weights of the Critic network and of the Action network; and a seventh device, which judges whether the expected target is reached: when the expected target is reached, the offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise, the sixth device continues training.
Another example of this implementation is equipment for online control of intersection traffic signals using an ADHDP controller trained by the above equipment, comprising: an eighth device, which initializes the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively; a ninth device, which inputs the real-time traffic data of the online system to the ADHDP controller; and a tenth device, which, according to the definitions in the first device, obtains the system state from the real-time traffic data of the online system, feeds the system state to the Action network, and uses the output of the Action network as the system control parameters to control the intersection traffic signals.
In this implementation, each of the above devices is a functional module produced by the computing equipment executing the instructions.
Although the present invention has been shown and described with reference to certain exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention as defined by the claims and their equivalents. Therefore, the scope of the present invention should not be limited to the above embodiments, but should be determined not only by the claims but also by their equivalents.

Claims (30)

1. An ADHDP controller offline training method for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the method comprising:
in step S1, defining the system state, the reward function, the splits, and the system control parameters;
in step S2, establishing the Action network and the Critic network, wherein:
the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, output layer has P-1 neurons, and hidden layer has M_a neurons, M_a being an empirical value; and
the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, output layer has 1 neuron, and hidden layer has M_c neurons, M_c being an empirical value;
in step S3, initializing the ADHDP controller, including initializing the Action network weights and the Critic network weights;
in step S4, before each control cycle ends, obtaining the system state, feeding it to the Action network, outputting the corresponding system control parameters u(k), and passing u(k) to simulation software to guide the operation of the next cycle;
in step S5, feeding the system state S(k) and the system control parameters u(k) to the Critic network and outputting the performance index J(k);
in step S6, alternately training the Critic network according to the performance index and the reward function and training the Action network according to the performance index, so as to update the weights of the Critic network and of the Action network; and
in step S7, judging whether the expected target is reached: when the expected target is reached, in step S8, ending the offline training and recording the final weights of the Action network and of the Critic network; otherwise, returning to step S6 to continue training.
2. The method according to claim 1, wherein defining the system state, the reward function, the splits, and the system control parameters comprises:
defining the system state, including: supposing each control cycle has P phases, the duration of phase i is T_i, and L_i lanes obtain the right of way during phase i; the maximum queue length of each lane is h_j, the phase queue length is H_i = max{h_j}, and the phase average queue length is $\bar{H}_i$; the flow of each lane is q_j, the phase flow is Q_i = max{q_j}, and the phase saturation degree is defined as s_i, where 1 <= i <= P, 1 <= j <= L_i, and ε is a normalization constant; and defining the system state as S(k) = {s_i(k)}, 1 <= i <= P, where k is the simulation step number, the step length is the time span C_k of the k-th control cycle, and C_k is determined from historical traffic data by the Webster method;
defining the reward function r(k), where N = P-1 and P >= 2;
defining the splits a_i, where 1 <= i <= P-1; the split is the ratio of the green-light duration of the i-th phase to the duration of the control cycle, and the split of the last phase is $a_P = 1 - \sum_{i=1}^{P-1} a_i$; and
defining the system control parameters as u(k) = {a_i(k)}, 1 <= i <= P.
3. The method according to claim 2, wherein each control cycle is one complete traffic signal change period of a given intersection.
4. The method according to claim 2, wherein each phase corresponds to one traffic signal state of the given intersection.
5. The method according to claim 1, wherein initializing the ADHDP controller further comprises:
setting the learning rate of the Action network to l_a, l_a taking a value between 0 and 1, and setting the number of training iterations per step to N_a, N_a taking a value between 5 and 50;
setting the learning rate of the Critic network to l_c, l_c taking a value between 0 and 1, and setting the number of training iterations per step to N_c, N_c taking a value between 5 and 50; and
for both the Action network and the Critic network, using the Sigmoid function as the activation function, with β equal to 1.
6. The method according to claim 2, wherein obtaining the system state comprises: receiving the flow q_j and queue length h_j data of each lane at the intersection from the simulation software, and obtaining the system state S(k).
7. The method according to claim 2, wherein training the Critic network and the Action network comprises:
calculating the training error of the Critic network according to the performance index and the reward function;
updating the weights of the Critic network according to this training error;
calculating the training error of the Action network according to the performance index; and
updating the weights of the Action network according to this training error.
8. The method according to claim 7, wherein:
the training error of the Critic network is defined as
$E_c(k) = \frac{1}{2}\left[\alpha J(k) - J(k-1) + r(k)\right]^2$, where α takes a value between 0 and 1;
the weights of the Critic network are updated as
$w_c(k+1) = w_c(k) + \Delta w_c(k)$
$\Delta w_c(k) = -\frac{\partial E_c(k)}{\partial w_c(k)} = -\frac{\partial E_c(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial w_c(k)}$;
the training error of the Action network is defined as
$E_a(k) = \frac{1}{2}\left[J(k) - G_c(k)\right]^2$, where G_c(k) is the control target and G_c(k) = 0; and
the weights of the Action network are updated as
$w_a(k+1) = w_a(k) + \Delta w_a(k)$
$\Delta w_a(k) = -\frac{\partial E_a(k)}{\partial w_a(k)} = -\frac{\partial E_a(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial u(k)}\,\frac{\partial u(k)}{\partial w_a(k)}$.
9. The method according to claim 1, wherein M_a takes a value between 5 and 20, and M_c takes a value between 5 and 20.
10. The method according to claim 1, wherein:
the expected target is the total intersection delay time or the average vehicle speed of each lane;
if the expected target is the total intersection delay time, then in step S7, when the total delay time is less than or close to the expected total delay time, the method proceeds to step S8; otherwise it returns to step S6 to continue training; and
if the expected target is the average vehicle speed of each lane, then when the average vehicle speed of each lane is greater than or close to the expected average vehicle speed, the method proceeds to step S8; otherwise it returns to step S6 to continue training.
11. The method according to claim 2, wherein C_k takes a value between 30 and 120 seconds.
12. A method for online control of intersection traffic signals using an ADHDP controller trained by the method according to any one of claims 1-11, comprising:
initializing the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively;
inputting the real-time traffic data of the online system to the ADHDP controller; and
according to the definitions in step S1, obtaining the system state from the real-time traffic data of the online system, feeding the system state to the Action network, and using the output of the Action network as the system control parameters to control the intersection traffic signals.
13. The method according to claim 12, wherein the real-time traffic data comprises the flow q_j and queue length h_j of each lane at the intersection.
14. The method according to claim 12, further comprising performing online training according to steps S5 and S6, so as to update the weights of the Action network and of the Critic network in real time.
15. The method according to claim 12, wherein the real-time traffic data of the online system comprises the flow q_j and queue length h_j of each lane at the intersection.
16. ADHDP controller offline training equipment for intersection traffic signal control, the ADHDP controller comprising an Action network and a Critic network, the equipment comprising:
a first device, which defines the system state, the reward function, the splits, and the system control parameters;
a second device, which establishes the Action network and the Critic network, wherein:
the Action network is a BP neural network with one hidden layer, whose input layer has P neurons, output layer has P-1 neurons, and hidden layer has M_a neurons, M_a being an empirical value; and
the Critic network is a BP neural network with one hidden layer, whose input layer has 2P-1 neurons, output layer has 1 neuron, and hidden layer has M_c neurons, M_c being an empirical value;
a third device, which initializes the ADHDP controller, including initializing the Action network weights and the Critic network weights;
a fourth device, which, before each control cycle ends, obtains the system state, feeds it to the Action network, outputs the corresponding system control parameters u(k), and passes u(k) to simulation software to guide the operation of the next cycle;
a fifth device, which feeds the system state S(k) and the system control parameters u(k) to the Critic network and outputs the performance index J(k);
a sixth device, which alternately trains the Critic network according to the performance index and the reward function and trains the Action network according to the performance index, so as to update the weights of the Critic network and of the Action network; and
a seventh device, which judges whether the expected target is reached: when the expected target is reached, the offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise, the sixth device continues training.
17. The equipment according to claim 16, wherein defining the system state, the reward function, the splits, and the system control parameters comprises:
defining the system state, including: supposing each control cycle has P phases, the duration of phase i is T_i, and L_i lanes obtain the right of way during phase i; the maximum queue length of each lane is h_j, the phase queue length is H_i = max{h_j}, and the phase average queue length is $\bar{H}_i$; the flow of each lane is q_j, the phase flow is Q_i = max{q_j}, and the phase saturation degree is defined as s_i, where 1 <= i <= P, 1 <= j <= L_i, and ε is a normalization constant; and defining the system state as S(k) = {s_i(k)}, 1 <= i <= P, where k is the simulation step number, the step length is the time span C_k of the k-th control cycle, and C_k is determined from historical traffic data by the Webster method;
defining the reward function r(k), where N = P-1 and P >= 2;
defining the splits a_i, where 1 <= i <= P-1; the split is the ratio of the green-light duration of the i-th phase to the duration of the control cycle, and the split of the last phase is $a_P = 1 - \sum_{i=1}^{P-1} a_i$; and
defining the system control parameters as u(k) = {a_i(k)}, 1 <= i <= P.
18. The equipment according to claim 17, wherein each control cycle is one complete traffic signal change period of a given intersection.
19. The equipment according to claim 17, wherein each phase corresponds to one traffic signal state of the given intersection.
20. The equipment according to claim 16, wherein initializing the ADHDP controller further comprises:
setting the learning rate of the Action network to l_a, l_a taking a value between 0 and 1, and setting the number of training iterations per step to N_a, N_a taking a value between 5 and 50;
setting the learning rate of the Critic network to l_c, l_c taking a value between 0 and 1, and setting the number of training iterations per step to N_c, N_c taking a value between 5 and 50; and
for both the Action network and the Critic network, using the Sigmoid function as the activation function, with β equal to 1.
21. The equipment according to claim 17, wherein obtaining the system state comprises: receiving the flow q_j and queue length h_j data of each lane at the intersection from the simulation software, and obtaining the system state S(k).
22. The equipment according to claim 17, wherein training the Critic network and the Action network comprises:
calculating the training error of the Critic network according to the performance index and the reward function;
updating the weights of the Critic network according to this training error;
calculating the training error of the Action network according to the performance index; and
updating the weights of the Action network according to this training error.
23. The equipment according to claim 22, wherein:
the training error of the Critic network is defined as
$E_c(k) = \frac{1}{2}\left[\alpha J(k) - J(k-1) + r(k)\right]^2$, where α takes a value between 0 and 1;
the weights of the Critic network are updated as
$w_c(k+1) = w_c(k) + \Delta w_c(k)$
$\Delta w_c(k) = -\frac{\partial E_c(k)}{\partial w_c(k)} = -\frac{\partial E_c(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial w_c(k)}$;
the training error of the Action network is defined as
$E_a(k) = \frac{1}{2}\left[J(k) - G_c(k)\right]^2$, where G_c(k) is the control target and G_c(k) = 0; and
the weights of the Action network are updated as
$w_a(k+1) = w_a(k) + \Delta w_a(k)$
$\Delta w_a(k) = -\frac{\partial E_a(k)}{\partial w_a(k)} = -\frac{\partial E_a(k)}{\partial J(k)}\,\frac{\partial J(k)}{\partial u(k)}\,\frac{\partial u(k)}{\partial w_a(k)}$.
24. The equipment according to claim 16, wherein M_a takes a value between 5 and 20, and M_c takes a value between 5 and 20.
25. The equipment according to claim 16, wherein:
the expected target is the total intersection delay time or the average vehicle speed of each lane;
if the expected target is the total intersection delay time, then in the seventh device, when the total delay time is less than or close to the expected total delay time, the offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise, the sixth device continues training; and
if the expected target is the average vehicle speed of each lane, then when the average vehicle speed of each lane is greater than or close to the expected average vehicle speed, the offline training ends and the final weights of the Action network and of the Critic network are recorded; otherwise, the sixth device continues training.
26. The equipment according to claim 17, wherein C_k takes a value between 30 and 120 seconds.
27. Equipment for online control of intersection traffic signals using an ADHDP controller trained by the equipment according to any one of claims 16-26, comprising:
an eighth device, which initializes the Action network and the Critic network with the final weights of the Action network and of the Critic network, respectively;
a ninth device, which inputs the real-time traffic data of the online system to the ADHDP controller; and
a tenth device, which, according to the definitions in the first device, obtains the system state from the real-time traffic data of the online system, feeds the system state to the Action network, and uses the output of the Action network as the system control parameters to control the intersection traffic signals.
28. The equipment according to claim 27, wherein the real-time traffic data comprises the flow q_j and queue length h_j of each lane at the intersection.
29. The equipment according to claim 27, further comprising using the fifth device and the sixth device to perform online training, so as to update the weights of the Action network and of the Critic network in real time.
30. The equipment according to claim 27, wherein the real-time traffic data of the online system comprises the flow q_j and queue length h_j of each lane at the intersection.
CN201510665966.1A 2015-10-15 2015-10-15 Intersection traffic signal control method and equipment Active CN105279978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510665966.1A CN105279978B (en) 2015-10-15 2015-10-15 Intersection traffic signal control method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510665966.1A CN105279978B (en) 2015-10-15 2015-10-15 Intersection traffic signal control method and equipment

Publications (2)

Publication Number Publication Date
CN105279978A (en) 2016-01-27
CN105279978B CN105279978B (en) 2018-05-25

Family

ID=55148906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510665966.1A Active CN105279978B (en) 2015-10-15 2015-10-15 Intersection traffic signal control method and equipment

Country Status (1)

Country Link
CN (1) CN105279978B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108459506A (en) * 2018-03-20 2018-08-28 清华大学 A kind of parameter tuning method of the virtual inertia controller of wind turbine
CN114973698A (en) * 2022-05-10 2022-08-30 阿波罗智联(北京)科技有限公司 Control information generation method and machine learning model training method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010046291A (en) * 1999-11-11 2001-06-15 정환도 Traffic signal control system and method using CDMA wireless communication network
KR20050051956A (en) * 2003-11-28 2005-06-02 주식회사 비츠로시스 Control system and method for local divisional traffic signal
JP2007122584A (en) * 2005-10-31 2007-05-17 Sumitomo Electric Ind Ltd Traffic signal control system and control method of traffic signal control system
CN102568220A (en) * 2010-12-17 2012-07-11 上海市长宁区少年科技指导站 Self-adaptive traffic control system
CN104882006A (en) * 2014-07-03 2015-09-02 中国科学院沈阳自动化研究所 Message-based complex network traffic signal optimization control method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张鹏程: "Kernel-based reinforcement learning methods for continuous spaces and their applications", China Master's Theses Full-text Database, Information Science and Technology Series *
齐驰: "Approximate dynamic programming methods and their applications in transportation", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108459506A (en) * 2018-03-20 2018-08-28 清华大学 A kind of parameter tuning method of the virtual inertia controller of wind turbine
CN108459506B (en) * 2018-03-20 2020-12-08 清华大学 Parameter setting method of virtual inertia controller of fan
CN114973698A (en) * 2022-05-10 2022-08-30 阿波罗智联(北京)科技有限公司 Control information generation method and machine learning model training method and device
CN114973698B (en) * 2022-05-10 2024-04-16 阿波罗智联(北京)科技有限公司 Control information generation method and machine learning model training method and device

Also Published As

Publication number Publication date
CN105279978B (en) 2018-05-25

Similar Documents

Publication Publication Date Title
Belletti et al. Expert level control of ramp metering based on multi-task deep reinforcement learning
CN109492814B (en) Urban traffic flow prediction method, system and electronic equipment
CN110570672B (en) Regional traffic signal lamp control method based on graph neural network
US11182676B2 (en) Cooperative neural network deep reinforcement learning with partial input assistance
CN109284812B (en) Video game simulation method based on improved DQN
JP6092477B2 (en) An automated method for correcting neural dynamics
CN106781489A (en) A kind of road network trend prediction method based on recurrent neural network
CN105700526A (en) On-line sequence limit learning machine method possessing autonomous learning capability
KR20160125967A (en) Method and apparatus for efficient implementation of common neuron models
US20230367934A1 (en) Method and apparatus for constructing vehicle dynamics model and method and apparatus for predicting vehicle state information
CN112172813A (en) Car following system and method for simulating driving style based on deep inverse reinforcement learning
Tagliaferri et al. A real-time strategy-decision program for sailing yacht races
CN114415507B (en) Deep neural network-based smart hand-held process dynamics model building and training method
CN105279978A (en) Intersection traffic signal control method and device
Wang et al. Dynamic-horizon model-based value estimation with latent imagination
Hilleli et al. Toward deep reinforcement learning without a simulator: An autonomous steering example
CN113821903A (en) Temperature control method and device, modular data center and storage medium
CN117008620A (en) Unmanned self-adaptive path planning method, system, equipment and medium
US20230162539A1 (en) Driving decision-making method and apparatus and chip
CN105513380B (en) The off-line training method and system and its On-Line Control Method and system of EADP controllers
KR102624710B1 (en) Structural response estimation method using gated recurrent unit
Koltovska et al. Intelligent Agent Based Traffic Signal Control on Isolated Intersections.
Li et al. Prediction for short-term traffic flow based on optimized wavelet neural network model
CN115512558A (en) Traffic light signal control method based on multi-agent reinforcement learning
JP2017513110A (en) Contextual real-time feedback for neuromorphic model development

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant