CN112927522B - Internet of things equipment-based reinforcement learning variable-duration signal lamp control method - Google Patents

Internet of things equipment-based reinforcement learning variable-duration signal lamp control method Download PDF

Info

Publication number
CN112927522B
CN112927522B CN202110067478.6A CN202110067478A CN112927522B CN 112927522 B CN112927522 B CN 112927522B CN 202110067478 A CN202110067478 A CN 202110067478A CN 112927522 B CN112927522 B CN 112927522B
Authority
CN
China
Prior art keywords
lane
intersection
phase
intensity
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110067478.6A
Other languages
Chinese (zh)
Other versions
CN112927522A (en
Inventor
陈铭松
张雯倩
赵吴攀
叶豫桐
胡铭
韩定定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202110067478.6A priority Critical patent/CN112927522B/en
Publication of CN112927522A publication Critical patent/CN112927522A/en
Application granted granted Critical
Publication of CN112927522B publication Critical patent/CN112927522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/07Controlling traffic signals
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/07Controlling traffic signals
    • G08G1/08Controlling traffic signals according to detected number or speed of vehicles
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/07Controlling traffic signals
    • G08G1/085Controlling traffic signals using a free-running cyclic timer
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00IoT characterised by the purpose of the information processing
    • G16Y40/30Control
    • G16Y40/35Management of things, i.e. controlling in accordance with a policy or in order to achieve specified objectives
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B20/00Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B20/40Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a reinforcement learning variable-duration signal lamp control method based on Internet of things equipment, which mainly comprises the following contents: a reinforcement learning method based on the concept of 'strength' of an intersection is designed, and various real-time traffic information (such as the position, speed and the like of a vehicle) collected by Internet of things equipment is used for controlling phase selection of a signal lamp. And simultaneously, the most reasonable green light duration can be selected according to the number of vehicles on each lane. The invention can quickly converge to an excellent signal lamp control strategy under the condition of traffic dynamic change, greatly shortens the learning time of the strategy and improves the control quality of the strategy.

Description

Internet of things equipment-based reinforcement learning variable-duration signal lamp control method
Technical Field
The invention belongs to the technical field of computers, relates to a deep reinforcement learning algorithm and a signal lamp control problem, and particularly relates to an effective signal lamp control strategy generated by learning according to real-time traffic data obtained by Internet of things equipment in a highly complex real-time traffic environment.
Background
In recent years, with the rapid increase of automobile holding capacity in our country, more and more road traffic problems frequently occur, such as traffic planning problems, road safety problems, road congestion problems, traffic control problems, and the like. Traffic congestion has been a key issue in designing efficient infrastructure, but due to the rapid increase in traffic demand, traffic congestion has become a prominent issue. In addition, traffic jam also causes a series of problems of traffic environmental pollution, traffic peace confusion and the like, and the travel quality and the life quality of people are seriously influenced. And the signal lamp of the intersection is used as the minimum unit for controlling the traffic, and how to make a reasonable control strategy becomes the key point of research.
Conventional fixed logic traffic signal controllers, which control signal lights using predefined signal phases and green light times, cannot be flexibly adjusted according to changes in traffic conditions, which makes it difficult to effectively control and guide traffic in the context of dynamic changes in traffic flow. Therefore, how to control the signal lamps in real time according to the traffic conditions so as to achieve dynamic control of the overall traffic flow is one of the current research hotspots. Reinforcement learning, as a learning method of "trial and error", is increasingly applied to the problem of signal lamp control.
However, due to inaccurate modeling, the existing reinforcement learning-based signal lamp control method has difficulty in quickly extracting effective contents from complex traffic information to guide the model to converge to an excellent control strategy. Meanwhile, in order to simplify traffic modeling, the existing method usually sets a fixed green time period for the signal lamp, which actually causes a waste of control time. Therefore, how to design an accurate and reasonable reinforcement learning method to quickly learn an effective variable-duration signal lamp control strategy is a problem to be solved urgently.
Disclosure of Invention
In order to solve the technical defects, the invention aims to provide a novel reinforcement learning variable-duration signal lamp control method based on internet of things equipment, designs a reinforcement learning method based on the concept of intersection strength, and controls the phase selection of a signal lamp by utilizing various real-time traffic information (such as the position, speed and the like of a vehicle) acquired by the internet of things equipment. And simultaneously, the most reasonable green light duration can be selected according to the number of vehicles on each lane.
The method comprises the following specific steps:
step 1: acquiring real-time traffic data based on the Internet of things equipment, processing the acquired traffic information, and generating newly defined intensity information according to the acquired traffic data; the Internet of things equipment comprises a velometer and a sensor; real-time traffic data includes the location and speed of the vehicle; the intensity information includes the intensity of the vehicle, lane, action, phase, and intersection.
Through extensive research and study, most of the current reinforcement learning methods tend to design a complex state to include as much traffic information as possible, however, the complex design is usually accompanied by a long learning process. The invention provides a brand-new concept of 'strength' under the condition of fully considering various traffic data, and can calculate the strength of the vehicle, the lane, the action, the phase and the intersection on the basis of the vehicle position and the speed data which can be acquired by the equipment of the Internet of things. Designing reinforcement learning methods based on the definition of strength can greatly shorten the strategy learning process.
The intensity of the vehicle is first defined. Assuming that the current vehicle speed is v, the allowable maximum driving speed of the current lane is vmaxThe length of the lane is L, the distance between the vehicle and the intersection is x, and a weight coefficient delta is introduced, so that the vehicle strength is as follows:
Figure GDA0002981437830000021
on the basis, the invention defines the lane strength as the sum of all vehicle strengths on the current lane, namely
Figure GDA0002981437830000022
vehicleiIndicating the i-th vehicle on the lane,
Figure GDA0002981437830000023
indicating the intensity of the i-th vehicle on the lane.
The action intensity is the difference between the 'intensity of a lane driving into the intersection' and the 'average value of the intensity of the lane driving out of the intersection' under the current action, namely
Figure GDA0002981437830000024
Wherein laneinIndicating a set of driving lanes, lane, under the actionoutRepresenting a set of outgoing lanes, lane, reachable from an incoming laneiIndicating the ith lane, in the set of lanesjRepresents the jth lane, | lane in the lane setoutL represents the number of exiting lanes,
Figure GDA0002981437830000025
indicating the strength of the ith lane,
Figure GDA0002981437830000026
indicating the intensity of the jth lane.
The phase strength being the sum of the motion strengths of the permissible movements in that phase, i.e.
Figure GDA0002981437830000027
movementiRepresenting the ith action that constitutes phase,
Figure GDA0002981437830000028
indicating the intensity corresponding to action i.
The invention defines the intersection intensity as the sum of all the vehicle intensities entering the intersection minus the sum of the vehicle intensities exiting the intersection, and the intersection intensity is expressed as:
Figure GDA0002981437830000029
wherein, laneinIndicating a set of incoming lanes at the crossing, laneoutSet of outgoing lanes, lane, representing an intersectioniIndicating the ith lane, in the set of lanesjRepresenting the jth lane in the set of lanes,
Figure GDA00029814378300000210
indicating the strength of the ith lane,
Figure GDA00029814378300000211
indicating the intensity of the jth lane.
In addition, in order to realize the control cooperation between the signal lamps of the adjacent intersections, the strength of the adjacent intersection of the intersection I is defined as follows:
Figure GDA0002981437830000031
wherein laneinComposed of the lanes of adjacent crossings, to which vehicles will drive toward the crossings I, laneiRepresenting the ith lane in the set of lanes,
Figure GDA0002981437830000032
indicating the strength of the ith lane. n is0Indicating the number of vehicles passing through the intersection in unit time, t indicating the remaining time of green light at the adjacent intersection, and N being the time of laneinThe total number of vehicles, ω, is a weight coefficient.
Step 2: designing a reinforcement learning method, generating a reinforcement learning state:
reinforcement learning methods generally include three elements: state, action and reward, the invention is designed as follows:
the state is as follows: the state is calculated after the intelligent agent observes the environment through the Internet of things equipment, and the state comprises the strength of each phase, the strength of a direct neighbor intersection and the current phase of the intersection. The intensity of each phase and the intensity of a direct neighbor intersection can be obtained by calculating the speed and the position of a vehicle collected by a road test speed sensor and an intersection camera, and the current state of a signal lamp can be directly read by the current phase of the intersection.
Taking a typical four-way intersection as an example, if there are 4 selectable phases and the current phase of the intersection is p, the state is represented as
Figure GDA0002981437830000033
If there is no direct neighbor crossing in a certain direction, the neighbor crossing in the direction is strongThe degree is 0.
The actions are as follows: representing the behavior taken by the model in interacting with the environment, and in signal light control problems, the actions are typically set to phase numbers. If there are 4 selectable phases, the motion space is {0,1,2,3 }.
Rewarding: rewards reflect how well an action is performed in a certain state, reflecting the quality of the action taken in the current state, to guide the learning process. The negative value of the crossing intensity is set. This means that if an action can reduce the intersection intensity more greatly, the action is considered as a better action.
The key point of the design is the state and the reward, and the state and the reward are designed through various intensity information obtained through real-time traffic data calculation.
And step 3: each intersection is provided with an intensified learning agent to control the phase selection of the traffic signal lamp.
When the green light duration of the current phase is used up, the intelligent agent selects a new optimal phase for the signal lamp by processing traffic data acquired by the intersection and the road Internet of things equipment. The simultaneously acquired traffic data and the selected phase are stored for use in training the agent.
The selection strategy of the phase is trained by the reinforcement learning method based on the intensity mechanism. The intelligent agent interacts with the traffic environment, the intelligent agent is trained through the traffic data acquired in real time, model parameters are continuously optimized while signal lamps are controlled, a more excellent control strategy is gradually learned, the strategy can be adjusted according to traffic changes, and the average waiting time of all traveling vehicles is minimized. The phase of the signal lamp can be optimally selected according to real-time traffic conditions by using the model.
And 4, step 4: selecting the green light time length according to the selected phase, calculating the number of vehicles on each lane, and applying the selected phase and the green light time length to the traffic signal lamp;
the agent will select the most reasonable time for the selected phase according to the number of vehicles on each lane at the current moment, so as to ensure that waiting vehicles on the passable lane under the selected phase can smoothly pass through the intersection and avoid time waste.
The intelligent agent firstly obtains the number of vehicles on each lane of the intersection, and then selects a most reasonable time length from the selectable duration set according to the number. The specific calculation mode is designed as follows:
first, the minimum value t of the duration of the green light is setminAnd maximum value tmaxAnd a number M of selectable durations, based on which the set of selectable durations is defined as:
Figure GDA0002981437830000041
where Δ t is equal to tminAnd tmaxThe time periods between are evenly divided according to M-1 sections, tiIndicating the final selectable duration.
After the agent selects a phase, it is observed that the total number of vehicles on the lane of the crossing that are allowed to move for that phase is N, and a green duration is assigned to that phase
Figure GDA0002981437830000042
Wherein n is0Representing the number of vehicles passing through the intersection per unit time, operator
Figure GDA0002981437830000043
(a.ltoreq.b) means that y is x when x.ltoreq.a and x.ltoreq.b, a when x.ltoreq.a, and b when x.ltoreq.b. t is a positive integer (N)*Representing a set of positive integers) and is not less than
Figure GDA0002981437830000044
And 5: storing the data and updating the network parameters through a playback mechanism of the reinforcement learning agent.
The detailed process of step 5 is as follows: the learning-intensive experience playback buffer M is first initialized, and the duration t is initialized. Whenever the current green duration runs out, the agent needs to select the next phase of the traffic signal and the green duration. The intelligent agent firstly obtains a state s through interaction with the environment (information uploaded by Internet of things equipment such as a velometer and a sensor), then the state is input into a reinforcement learning model, the phase action a is obtained through model calculation, and the green light duration time t is calculated. Then apply phase a with duration t for the traffic signal, after time t the agent may take the next state s 'and calculate the reward r gained by taking action a, then store experience < s, a, r, s' > into the experience playback buffer. When the number of stored experiences is not less than the number available for training, the agent will randomly select a batch of samples from the experience replay buffer for model training and update the network weights using a random gradient descent each round.
In each cycle of interaction between the agent and the environment, the learning process can be roughly divided into five steps:
1) observing the traffic environment to obtain the state required by reinforcement learning;
2) selecting the optimal phase action by the reinforcement learning model;
3) calculating the duration time of the green light with the most reasonable phase;
4) applying the selected phase and green duration for the signal light;
5) storing the data and updating the network parameters through a reinforcement learning playback mechanism.
The invention has the beneficial effects that:
the invention provides a novel and effective 'strength' mechanism, dynamic data of a vehicle acquired in real time are abstracted into strength information, and a reinforcement learning method is designed on the basis of the strength information. The method greatly improves the learning convergence speed of the model, realizes the rapid convergence of the reinforcement learning process, greatly shortens the learning time of the strategy, ensures the control quality of the final strategy and improves the control quality of the final learning strategy.
The invention can not only control the phase of the traffic signal lamp in real time, but also distribute the most reasonable green time for the selected phase according to the observed real-time traffic condition. Different from the traditional fixed signal time length, the variable time length signal lamp control strategy adopted by the invention can further utilize time and shorten the average waiting time of the traveling vehicle. Compared with the traditional signal lamp control method and other signal lamp control algorithms based on reinforcement learning, the method can quickly converge to an excellent signal lamp control strategy under the condition of traffic dynamic change, improves the control quality of the strategy, and continuously optimizes the control strategy along with the change of traffic environment.
The method of the invention not only can control the phase of the traffic signal lamp in real time, but also can allocate the most reasonable green light time for the selected phase, thereby further refining and utilizing the time. The phase of the signal lamp can be controlled, and meanwhile, the green time can be dynamically adjusted according to the number of vehicles on each lane, rather than the fixed time length is allocated to the phase as in the traditional method.
Drawings
Fig. 1 is a schematic intersection diagram depicting the concept of motion, signal and phase. Wherein, the left graph road junction with arrow long line represents the action, and the left graph road junction enters a certain driving-in lane and a certain driving-out lane through the road junction. The signal is used to determine which actions are allowed at a time, where light dots indicate that movement is allowed and dark dots indicate that movement is prohibited. The phase is defined as the combination of non-conflicting signals, and as shown in the right diagram, the four phases adopted by a classical signal light control scheme, namely north-south going straight, east-west going straight, north-south turning left, east-west turning left.
Fig. 2 is a flow chart of signal control and policy learning.
FIG. 3 shows the results of the performance test of the method of the present invention.
Detailed Description
The invention is described in further detail in connection with the following specific examples and the accompanying drawings. The procedures, conditions, experimental methods and the like for carrying out the present invention are general knowledge and common general knowledge in the art except for the contents specifically mentioned below, and the present invention is not particularly limited.
The invention relates to a reinforcement learning variable-duration signal lamp control method based on Internet of things equipment, which comprises the following steps:
1. traffic data definition intensity information based on internet of things equipment acquisition:
the invention first defines the strength of the vehicle. Assuming that the current vehicle speed is v, the allowable maximum driving speed of the current lane is vmaxThe length of the lane is L, the distance between the vehicle and the intersection is x, and a weight coefficient delta is introduced, so that the vehicle strength is as follows:
Figure GDA0002981437830000061
on the basis, the invention defines the lane strength as the sum of all vehicle strengths on the current lane, namely
Figure GDA0002981437830000062
vehicleiIndicating the i-th vehicle on the lane,
Figure GDA0002981437830000063
indicating the intensity of the i-th vehicle on the lane.
The action intensity is the difference between the 'intensity of a lane driving into the intersection' and the 'average value of the intensity of the lane driving out of the intersection' under the current action, namely
Figure GDA0002981437830000064
Wherein laneinIndicating a set of driving lanes, lane, under the actionoutRepresenting a set of outgoing lanes reachable from an incoming lane, laneiIndicating the ith lane, in the set of lanesjRepresents the jth lane, | lane in the lane setoutL represents the number of exiting lanes,
Figure GDA0002981437830000065
indicating the strength of the ith lane,
Figure GDA0002981437830000066
indicating the intensity of the jth lane.
The phase strength being the sum of the motion strengths of the permissible movements in that phase, i.e.
Figure GDA0002981437830000067
movementiRepresenting the ith action that constitutes phase,
Figure GDA0002981437830000068
indicating the intensity corresponding to action i.
The invention defines the intersection intensity as the sum of all vehicle intensities entering the intersection minus the sum of vehicle intensities exiting the intersection, and the intersection intensity is expressed as:
Figure GDA0002981437830000069
wherein, laneinIndicating a set of incoming lanes, lane, at the intersectionoutSet of outgoing lanes, lane, representing an intersectioniIndicating the ith lane, in the set of lanesjRepresents the jth lane in the set of lanes,
Figure GDA00029814378300000610
indicating the strength of the ith lane,
Figure GDA00029814378300000611
indicating the intensity of the jth lane.
In addition, in order to realize the control cooperation between the signal lamps of the adjacent intersections, the strength of the adjacent intersection of the intersection I is defined as follows:
Figure GDA0002981437830000071
wherein laneinComposed of the lanes of adjacent crossings, to which vehicles will drive toward the crossings I, laneiRepresenting the ith lane in the set of lanes,
Figure GDA0002981437830000072
indicating the strength of the ith lane. n is0Indicating the number of vehicles passing through the intersection in unit time, t indicating the remaining time of green light at the adjacent intersection, and N being the time of laneinThe total number of vehicles, ω, is a weight coefficient.
2. Designing a reinforcement learning method:
the three element states, actions and rewards of the reinforcement learning method are specifically as follows:
the state is as follows: the state is calculated after the intelligent agent observes the environment, and comprises the strength of each phase, the strength of a direct neighbor intersection and the current phase of the intersection. The phase intensity and the intensity of the direct neighbor intersection can be obtained by calculating the speed and the position of the vehicle collected by the road measuring speed sensor and the intersection camera, and the current phase of the intersection can be obtained by directly reading the current state of the signal lamp.
Taking a typical four-way intersection as an example, if there are 4 selectable phases and the current phase of the intersection is p, the state is represented as
Figure GDA0002981437830000073
If there is no direct neighbor intersection in a certain direction, the strength of the neighbor intersection in the direction is 0.
The method comprises the following steps: the main task of the signal lamp control strategy is to select the optimal phase position to minimize the intensity of the intersection. When the time for the current phase runs out, the agent needs to take action to select a new phase (phase may be repeatedly selected). The invention thus defines the action as the best signal phase selectable by the signal light. The invention sets 4 selectable phases, four phases as shown on the right side of fig. 1, so that each agent has four different predefined allowed actions, which are spatially coded as {0,1,2,3 }. The key to the design is generally the state and the reward, and the state and the reward are designed according to various intensity information calculated by real-time traffic data.
Rewarding: the reward should reflect the quality of the action taken in the current state to guide the learning process. Higher rewards mean better action. The optimization goal of existing signal light control problems is generally to minimize the average waiting time of the vehicle, which is an indicator that can be obtained only by long-term, continuous signal light action, and is not immediately available and therefore not directly rewarded. According to existing research, the convergence trend of the reinforcement learning algorithm is consistent whether the optimization goal is to shorten the vehicle driving time or minimize the intersection intensity. Therefore, based on the correlation between the strength and the average waiting time established by our model, the reward is set to be the opposite number of the strength value of the intersection. This means that if an action can reduce the intersection intensity more greatly, the action is considered as a better action.
In addition, the invention adopts a classic DQN network structure when designing the reinforcement learning network structure.
3. Control of phase and green duration of traffic lights:
the invention configures a reinforcement learning agent for each intersection. The intelligent agent interacts with the traffic environment, constantly optimizes model parameters while controlling the signal lamp, and learns more excellent control strategies. When the green light time of the current phase is used up, the intelligent agent selects an optimal phase for the next period of time by processing traffic data acquired by the intersection and road Internet of things equipment. The simultaneously acquired traffic data and the selected phase action are stored for use in training the agent. In addition, the agent will select the most reasonable time period for the selected phase based on the number of vehicles in each lane at the current time.
In each cycle of interaction between the agent and the environment, the learning process can be roughly divided into five steps: 1) observing the traffic environment to obtain the state required by reinforcement learning; 2) making the best phase action by the reinforcement learning model; 3) calculating the duration time of the green light with the most reasonable phase; 4) selected phase and green duration for the signal lamp application, and 5) storing the data and updating the network parameters through a reinforcement learning playback mechanism.
The detailed process of the step 5 is as follows: the learning-intensive experience playback buffer M is first initialized, and the duration t is initialized. Whenever the current green duration runs out, the agent needs to select the next phase of the traffic signal and the green duration. The intelligent agent firstly obtains a state s through interaction with the environment (information uploaded by Internet of things equipment such as a velometer and a sensor), then the state is input into a reinforcement learning model, the phase action a is obtained through model calculation, and the duration time t of a green light is calculated. Then apply phase a with duration t for the traffic signal, after time t the agent may take the next state s 'and calculate the reward r gained by taking action a, then store experience < s, a, r, s' > into the experience playback buffer. When the number of stored experiences is not less than the number available for training, the agent will randomly select a batch of samples from the experience replay buffer for model training and update the network weights using a random gradient descent each round.
The selection strategy of the phase is trained by the reinforcement learning method based on the intensity mechanism. And the phase green light duration is calculated according to the number of vehicles on each lane. The specific calculation mode is designed as follows:
first, the minimum value t of the duration of the green light is setminAnd maximum value tmaxAnd a number M of selectable durations, based on which the set of selectable durations is defined as
Figure GDA0002981437830000081
Where Δ t is equal to tminAnd tmaxThe time periods between are evenly divided according to M-1 sections, tiIndicating the final selectable duration.
After the agent selects a phase, it is observed that the total number of vehicles on the lane of the crossing that are allowed to move for that phase is N, and a green duration is assigned to that phase
Figure GDA0002981437830000082
Wherein n is0Indicating a path of passage per unit timeNumber of vehicles per mouth, operator
Figure GDA0002981437830000083
(a.ltoreq.b) means that y is x when x.ltoreq.a and x.ltoreq.b, a when x.ltoreq.a, and b when x.ltoreq.b. t is a positive integer (N)*Representing a set of positive integers) and is not less than
Figure GDA0002981437830000091
Examples
The invention provides a reinforcement learning variable-duration signal lamp control method based on equipment of the Internet of things, which comprises the following code implementation parts (important interception):
as shown in code 1, this section includes the code for reinforcement learning method state acquisition:
Figure GDA0002981437830000092
Figure GDA0002981437830000101
Figure GDA0002981437830000111
Figure GDA0002981437830000121
Figure GDA0002981437830000131
Figure GDA0002981437830000141
code 1
In code 1, the state that the intensity information is generated through the traffic data acquired in real time and then the intensity information is processed to obtain the reinforcement learning method is mainly given, namely the state is used as the observation of the traffic condition of the intersection by the intelligent agent. The main functions are interaction _ info, get _ lanepressure, get _ neighbor _ compression, get _ state. The intersectioninfo obtains partial traffic data of the current intersection, including the number of vehicles on each lane, the vehicle traveling speed, the vehicle position, and the like. get _ Lanepressure returns the calculated lane strength, and get _ neighbor _ pressure returns the strength of the neighboring intersection. get _ state calculates to get the phase intensity, then combines the phase intensity, the intensity of the neighboring crossing and the current signal lamp phase to get the state and returns.
The calculation method of the phase green duration is shown as code 2:
Figure GDA0002981437830000142
Figure GDA0002981437830000151
code 2
In code 2, a calculation method of the green light duration is mainly given. The intelligent agent firstly obtains the number of vehicles on each lane of the intersection, and then selects a most reasonable time length from the selectable duration set according to the number.
The selection method of the action under the reinforcement learning control strategy and the return of the reward corresponding to the action are shown as code 3.
Figure GDA0002981437830000152
Figure GDA0002981437830000161
Code 3
In code 3, a selection function, choose _ action, reinforcement learning experience playback function, play, and reward function, get _ reward, of the beacon phase action are given. Wherein, the choose _ action function processes the state through the model and returns to the optimal phase action in the current state. The replay function updates the network parameters through the stored historical data.
In addition, in order to comprehensively test the performance of the invention, a Cityflow traffic simulation platform is used, simulation control is performed on 4 simulation data sets (1x3 intersection, 2x2 intersection, 3x3 intersection, 4x4 intersection) and 2 real data sets (Jinan 3x3 intersection, Hangzhou 4x4 intersection), and performance comparison is performed with a traditional signal lamp control method and other advanced reinforcement learning methods. The test results simulate the average waiting time of all traveling vehicles within one hour of traffic. Fig. 3 is a result of performance test of the method of the present invention, and it can be seen that the average waiting time of traveling vehicles can be minimized by applying the method.
The invention provides a reinforcement learning variable-duration signal lamp control method based on Internet of things equipment. The method is characterized in that strength information is designed based on various real-time traffic data collected by Internet of things equipment, and a reinforcement learning method is designed on the basis. The invention gets rid of the limit of fixed green time of the traditional signal lamp and can select the most reasonable green time according to the real-time traffic condition. The invention configures a reinforcement learning agent for each intersection. The intelligent agent interacts with the traffic environment, constantly optimizes model parameters while controlling the signal lamp, and learns more excellent control strategies. Under the condition of traffic dynamic change, the intelligent agent can quickly converge to an excellent signal lamp control strategy, so that the learning time of the strategy is greatly shortened, and the control quality of the strategy is improved.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, and the scope of the appended claims is intended to be protected.

Claims (7)

1. A reinforcement learning variable-duration signal lamp control method based on Internet of things equipment is characterized by comprising the following steps:
step 1: generating newly defined intensity information through real-time traffic data acquired by the Internet of things equipment; the Internet of things equipment in the step 1 comprises a velometer and a sensor;
the real-time traffic data includes a position and a speed of the vehicle;
the intensity information comprises the intensity of the vehicle, the lane, the action, the phase and the intersection;
the calculation formula of the strength of the vehicle is as follows:
Figure FDA0003580412910000011
wherein, the vehicle speed is v, and the allowable maximum driving speed of the current lane is vmaxThe length of a lane is L, the distance between a vehicle and an intersection is x, and a weight coefficient delta is introduced;
the intensity of the lane is the sum of the intensities of all vehicles on the current lane, namely
Figure FDA0003580412910000012
vehicleiIndicating the ith vehicle on the lane and,
Figure FDA0003580412910000013
indicating the intensity of the i-th vehicle on the lane;
the action intensity is the difference value between the 'intensity of a lane entering the intersection' and the 'average value of the intensity of the lane leaving the intersection' under the current action, namely
Figure FDA0003580412910000014
Wherein, laneinIndicating a set of driving lanes, lane, under the actionoutRepresenting a set of outgoing lanes, lane, reachable from an incoming laneiIndicating the ith lane, in the set of lanesjRepresents the jth lane, | lane in the lane setoutL represents the number of exiting lanes,
Figure FDA0003580412910000015
indicating the strength of the ith lane,
Figure FDA0003580412910000016
representing the intensity of the jth lane;
the intensity of the phase being the sum of the intensities of the movements permitted for that phase, i.e.
Figure FDA0003580412910000017
movementiRepresenting the ith action that constitutes phase,
Figure FDA00035804129100000110
representing the intensity corresponding to the action i;
the intersection intensity is the sum of all the vehicle intensities entering the intersection minus the sum of the vehicle intensities exiting the intersection, and is expressed as:
Figure FDA0003580412910000018
wherein, laneinIndicating a set of incoming lanes, lane, at the intersectionoutSet of outgoing lanes, lane, representing an intersectioniIndicating the ith lane, in the set of lanesjRepresenting the jth lane in the set of lanes,
Figure FDA0003580412910000019
indicates the intensity of the i-th lane,
Figure FDA00035804129100000111
representing the strength of the jth lane;
step 2: designing a reinforcement learning method based on the step 1;
and 3, step 3: each intersection is provided with an intensive learning agent, when the green light duration of the current phase is used up, the agent selects an optimal phase for a signal lamp by processing traffic data acquired by the intersection and road Internet of things equipment, and the acquired traffic data and the selected phase action are stored to train the agent;
and 4, step 4: step 3, the intelligent agent selects the most reasonable green light time length according to the selected phase, and applies the selected phase and the green light time length to the traffic signal lamp; the green light duration is obtained by calculating the number of vehicles on each lane at the current moment;
and 5: storing the data and updating the network parameters through a playback mechanism of the reinforcement learning agent.
2. The method of claim 1, wherein the strength of the neighbor intersection of intersection I is:
Figure FDA0003580412910000021
wherein, laneinComposed of the lanes of adjacent crossings, to which vehicles will drive toward the crossings I, laneiRepresenting the ith lane in the set of lanes,
Figure FDA0003580412910000022
representing the intensity of the ith lane; n is0Indicating the number of vehicles passing through the intersection in unit time, t indicating the remaining time of green light at the adjacent intersection, and N being the time of laneinThe total number of vehicles, ω, is a weight coefficient.
3. The method of claim 1, wherein the reinforcement learning method in step 2 comprises three elements: status, actions, and rewards;
the state is calculated after the intelligent agent observes the environment through the Internet of things equipment, and comprises the strength of each phase, the strength of a direct neighbor intersection and the current phase of the intersection; the intensity information of each phase and the intensity of the direct neighbor intersection can be obtained by calculating the speed and the position of the vehicle collected by a road test speed sensor and an intersection camera; the current phase of the intersection can be obtained by directly reading the current state of the signal lamp;
the action represents the action taken by the interaction of the model and the environment and is set as a phase number;
the reward represents the degree of quality of executing a certain action in a certain state, and is set as a negative value of the intersection intensity.
4. The method as claimed in claim 3, wherein the key to the reinforcement learning method design is status and reward, which are designed by various intensity information calculated from real-time traffic data, and designed using DQN network structure.
5. The method of claim 1, wherein the agent interacts with the traffic environment in step 3, trains the agent through the traffic data acquired in real time, continuously optimizes model parameters while controlling the signal lights, learns an optimized control strategy, and the strategy can be adjusted to adapt to traffic changes, minimize average waiting time of all traveling vehicles, and make the best choice for the phase of the signal lights according to real-time traffic conditions.
6. The method of claim 1, wherein in step 4, the agent first obtains the number of vehicles on each lane entering the intersection, and then selects a most reasonable time length from the selectable time duration set according to the number of vehicles, wherein the most reasonable time length is to ensure that the waiting vehicles on the passable lane in the selected phase can pass through the intersection smoothly without wasting time, and the specific calculation method is as follows:
first, the minimum value t of the duration of the green light is setminAnd maximum value tmaxAnd a number M of selectable durations, the set of selectable durations being:
Figure FDA0003580412910000031
where Δ t is equal to tminAnd tmaxThe time periods between are evenly divided according to M-1 sections, tiRepresents the final selectable duration;
after the agent selects a phase, it can be observed that the total number of vehicles on the lane of the entrance intersection allowed to move in the phase is N, and then the green duration is assigned to the phase as follows:
Figure FDA0003580412910000032
wherein n is0Representing the number of vehicles passing through the intersection per unit time, operator
Figure FDA0003580412910000033
(a ≦ b) means y ═ x when x ≦ a and x ≦ b, a when x ≦ a, and b when x ≦ b; t is positive integer and not less than
Figure FDA0003580412910000034
N*Representing a set of positive integers.
7. The method of claim 1, wherein the detailed process of step 5 is: firstly, initializing an experience playback buffer zone M for reinforcement learning, and initializing duration t; whenever the current green duration runs out, the agent needs to select the next phase of the traffic signal and the green duration; the intelligent agent firstly interacts with the environment, uploads information through the Internet of things equipment including a velocimeter and a sensor to obtain a state s, then inputs the state s into a reinforcement learning model, and calculates a phase action a and a green light duration t by the model; then applying a phase a with a duration t to the traffic light, after the time t, the agent may obtain the next state s ', and calculate a reward r obtained by taking the action a, and then store the experience < s, a, r, s' > into an experience playback buffer; when the number of stored experiences is not less than the number available for training, the agent will randomly select a batch of samples from the experience replay buffer for model training and update the network weights using a random gradient descent each round.
CN202110067478.6A 2021-01-19 2021-01-19 Internet of things equipment-based reinforcement learning variable-duration signal lamp control method Active CN112927522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067478.6A CN112927522B (en) 2021-01-19 2021-01-19 Internet of things equipment-based reinforcement learning variable-duration signal lamp control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110067478.6A CN112927522B (en) 2021-01-19 2021-01-19 Internet of things equipment-based reinforcement learning variable-duration signal lamp control method

Publications (2)

Publication Number Publication Date
CN112927522A CN112927522A (en) 2021-06-08
CN112927522B true CN112927522B (en) 2022-07-05

Family

ID=76163355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110067478.6A Active CN112927522B (en) 2021-01-19 2021-01-19 Internet of things equipment-based reinforcement learning variable-duration signal lamp control method

Country Status (1)

Country Link
CN (1) CN112927522B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083149B (en) * 2022-05-19 2023-07-28 华东师范大学 Reinforced learning variable duration signal lamp control method for real-time monitoring

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106898147A (en) * 2015-12-18 2017-06-27 英业达集团(北京)电子技术有限公司 Vehicle and intersection information is collected to control the system and method for car speed
CN110114806A (en) * 2018-02-28 2019-08-09 华为技术有限公司 Signalized control method, relevant device and system
CN111243271A (en) * 2020-01-11 2020-06-05 多伦科技股份有限公司 Single-point intersection signal control method based on deep cycle Q learning
KR102155055B1 (en) * 2019-10-28 2020-09-11 라온피플 주식회사 Apparatus and method for controlling traffic signal based on reinforcement learning

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103280114B (en) * 2013-06-24 2015-01-07 电子科技大学 Signal lamp intelligent control method based on BP-PSO fuzzy neural network
CN105844927A (en) * 2016-04-06 2016-08-10 深圳榕亨实业集团有限公司 Novel control system and novel control method for sensing and controlling road intersection group signals
US11593659B2 (en) * 2018-03-30 2023-02-28 Visa International Service Association Method, system, and computer program product for implementing reinforcement learning
US11131992B2 (en) * 2018-11-30 2021-09-28 Denso International America, Inc. Multi-level collaborative control system with dual neural network planning for autonomous vehicle control in a noisy environment
CN109559530B (en) * 2019-01-07 2020-07-14 大连理工大学 Multi-intersection signal lamp cooperative control method based on Q value migration depth reinforcement learning
CN111260937B (en) * 2020-02-24 2021-09-14 武汉大学深圳研究院 Cross traffic signal lamp control method based on reinforcement learning
CN111696370B (en) * 2020-06-16 2021-09-03 西安电子科技大学 Traffic light control method based on heuristic deep Q network
CN111785045B (en) * 2020-06-17 2022-07-05 南京理工大学 Distributed traffic signal lamp combined control method based on actor-critic algorithm

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106898147A (en) * 2015-12-18 2017-06-27 英业达集团(北京)电子技术有限公司 Vehicle and intersection information is collected to control the system and method for car speed
CN110114806A (en) * 2018-02-28 2019-08-09 华为技术有限公司 Signalized control method, relevant device and system
KR102155055B1 (en) * 2019-10-28 2020-09-11 라온피플 주식회사 Apparatus and method for controlling traffic signal based on reinforcement learning
CN111243271A (en) * 2020-01-11 2020-06-05 多伦科技股份有限公司 Single-point intersection signal control method based on deep cycle Q learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
POMDP环境下交通信号自适应控制的策略梯度学习方法;夏新海;《武汉理工大学学报》;20120731(第07期);全文 *
信号控制交叉口的复合动态车道管理方法;徐洪峰等;《吉林大学学报(工学版)》;20180228(第02期);全文 *
多相位交通信号模糊控制规则的优化;陈淑燕等;《公路交通科技》;20031220(第06期);全文 *

Also Published As

Publication number Publication date
CN112927522A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN109559530B (en) Multi-intersection signal lamp cooperative control method based on Q value migration depth reinforcement learning
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
Qiao et al. Automatically generated curriculum based reinforcement learning for autonomous vehicles in urban environment
CN113643553B (en) Multi-intersection intelligent traffic signal lamp control method and system based on federal reinforcement learning
CN110570672B (en) Regional traffic signal lamp control method based on graph neural network
CN113223305B (en) Multi-intersection traffic light control method and system based on reinforcement learning and storage medium
CN113643528B (en) Signal lamp control method, model training method, system, device and storage medium
CN112863206B (en) Traffic signal lamp control method and system based on reinforcement learning
CN113963555A (en) Deep reinforcement learning traffic signal control method combined with state prediction
Aragon-Gómez et al. Traffic-signal control reinforcement learning approach for continuous-time Markov games
GB2583747A (en) Traffic control system
CN112927522B (en) Internet of things equipment-based reinforcement learning variable-duration signal lamp control method
CN115019523B (en) Deep reinforcement learning traffic signal coordination optimization control method based on minimized pressure difference
Huo et al. Cooperative control for multi-intersection traffic signal based on deep reinforcement learning and imitation learning
Shi et al. Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization
CN115083149B (en) Reinforced learning variable duration signal lamp control method for real-time monitoring
CN113870589B (en) Intersection signal lamp and variable lane joint control system and method
CN116639124A (en) Automatic driving vehicle lane changing method based on double-layer deep reinforcement learning
Zhao et al. Imitation of real lane-change decisions using reinforcement learning
CN115330064A (en) Human-machine decision logic online optimization method for highly automatic driving
Yuan et al. Deep reinforcement learning based green wave speed guidance for human-driven connected vehicles at signalized intersections
CN115512558A (en) Traffic light signal control method based on multi-agent reinforcement learning
Shahriar et al. Intersection traffic efficiency enhancement using deep reinforcement learning and V2X communications
CN115705771A (en) Traffic signal control method based on reinforcement learning
CN114360290B (en) Reinforced learning-based method for selecting vehicle group lanes in front of intersection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Chen Mingsong

Inventor after: Zhang Wenqian

Inventor after: Zhao Wupan

Inventor after: Ye Yutong

Inventor after: Hu Ming

Inventor after: Han Dingding

Inventor before: Chen Mingsong

Inventor before: Zhao Wupan

Inventor before: Ye Yutong

Inventor before: Hu Ming

Inventor before: Xia Jun

Inventor before: Han Dingding

GR01 Patent grant
GR01 Patent grant