CN109269516B - Dynamic path induction method based on multi-target Sarsa learning - Google Patents
- Publication number
- CN109269516B (application CN201810992284.5A)
- Authority
- CN
- China
- Prior art keywords
- traffic
- induction
- target
- driver
- road
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/3453—Special cost functions, i.e. other than distance or default speed limit of road segments
- G01C21/3492—Special cost functions, i.e. other than distance or default speed limit of road segments employing speed data or traffic data, e.g. real-time or historical
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/3407—Route searching; Route guidance specially adapted for specific applications
- G01C21/3415—Dynamic re-routing, e.g. recalculating the route when the user deviates from calculated route or after detecting real-time traffic data or accidents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Landscapes
- Engineering & Computer Science (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Marketing (AREA)
- Theoretical Computer Science (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention provides a dynamic path induction method based on multi-target Sarsa learning, comprising: information initialization; information updating; and induction path calculation, which includes normalizing the Q vector table, calculating scalar values based on driver preferences, calculating a Boltzmann probability distribution, and selecting by the roulette method the next road section that meets the driver's personal preferences, until the driver's vehicle reaches the destination. The method optimizes the running path of the vehicle according to the current traffic condition of the traffic system, improving the efficiency of the traffic system and relieving traffic congestion. From a practical perspective, it performs dynamic path induction for multiple induction targets simultaneously, meeting the induction requirements of real life. It also takes the driver's induction preference into account and provides a dynamic induction path that matches personal preference, which raises the acceptance rate of the induction path, further improves the traffic efficiency of the traffic system, and relieves traffic congestion.
Description
Technical Field
The invention belongs to the technical field of intelligent transportation, and particularly relates to a dynamic path induction method based on multi-target Sarsa learning.
Background
In recent years, with the rapid development of China's economy, the number of privately owned automobiles has grown continuously, and problems such as rising urban traffic pressure, traffic congestion, and frequent traffic accidents have become increasingly serious. Furthermore, drivers, as important participants in the traffic system, often pursue several induction targets at once during a trip and weight those targets differently. Whether the driver's personal preferences are considered strongly influences the acceptance of the induction information and therefore the traffic efficiency of the traffic system. Efficient, dynamic route guidance that both relieves traffic congestion and satisfies the individual preferences of drivers is therefore necessary.
Reinforcement learning is highly adaptive and self-learning: it continuously adjusts its control strategy as the system environment changes, without prior knowledge or a model, and learns from the dynamic information of the system. It therefore meets the control requirements of a traffic guidance system with high randomness and complexity. Sarsa learning, an on-policy reinforcement learning algorithm, is particularly suitable for searching for an optimal path and dynamically inducing vehicles in a traffic induction system that is complex, variable, and strongly real-time.
Most currently proposed route guidance models and guidance algorithms are single-target methods built only on road section travel time; they ignore real-life induction requirements and the individual preferences of drivers. Multi-objective reinforcement learning is commonly used to solve such multi-objective optimization problems, and its methods for finding an optimal solution set are mainly classified into single-strategy and multi-strategy methods. Compared with a single-strategy method, a multi-strategy method learns a set of optimal solutions that approximates the Pareto frontier at every interaction with the environment, which requires a large amount of computation time. When a multi-strategy method is used in on-policy learning, both the computation and the storage required for the corresponding solution set are large, so it is not suitable for a dynamic path induction system. Single-strategy multi-target Sarsa learning is therefore suitable for solving the dynamic path induction problem that covers multiple induction targets and considers driver preference.
Disclosure of Invention
In light of the above technical problems, an object of the present invention is to provide a dynamic path induction method based on multi-objective Sarsa learning. The method makes full use of real-time traffic data and the driver's personal preference information to provide route guidance information that matches personal preference, coordinates traffic across the whole traffic system, relieves traffic congestion, and improves the efficiency of the traffic system.
The technical scheme is as follows: a dynamic path induction method based on multi-target Sarsa learning comprises steps 1 to 3:
Step 1: initializing information, specifically comprising steps 1.1 to 1.3:
Step 1.1: confirming the induction targets: selecting one or more of minimized travel time, minimized travel distance and minimized cost;
Step 1.2: for the induction targets, a traffic information center initializes, for each candidate destination on the road network, a Q vector table covering every induction target, using a Q-value-based dynamic programming algorithm together with the road network information in a geographic information base and the historically collected static data of each road section, wherein one Q vector table corresponds to one candidate destination;
Step 1.3: setting the time interval T at which the traffic information center issues updated Q value information;
the road network information includes: road network topological structure, road length and number of lanes;
the static data of each road section comprises: historical vehicle transit time, distance, cost;
Step 2: the information updating specifically includes: defining the induction target weights, calculating the traffic congestion coefficient of the current road network, and updating the Q vector table by the Sarsa learning method at every interval T:
(1) defining an induced target weight:
recording the current information of all vehicles in the road network, the real-time traffic information of the current road section, and the preference of each driver in the road network; assuming that there are n induction targets in total, the preference of each driver is denoted as the weight vector $\omega = (\omega_1, \ldots, \omega_n)$, where $\omega_o \in [0,1]$ represents the preference weight of the o-th induction target; the weight of each induction target is defined as follows:
each driver defines the degree of attention paid to each induction target, that is, the preference weights of that driver;
the current information of all vehicles includes: location, desired destination, and all next traffic nodes that can be reached;
the real-time traffic information of the current road section comprises: travel time, distance, cost;
(2) calculating the traffic congestion coefficient of the current road network: counting the number NV of vehicles in the current road network, and calculating the traffic congestion coefficient ε of the current road network from NV:
wherein β and γ are parameters; the traffic congestion coefficient ε represents the current traffic condition of the traffic system; ε increases as the total number NV of vehicles in the current road network increases, so a larger ε indicates more congested traffic and a smaller ε less congested traffic.
(3) updating the Q vector table by the Sarsa learning method at every interval T: at every interval T, using the most recently recorded current information of all vehicles in the road network and real-time traffic information of the current road section acquired in (1), together with the next driving road section assigned in steps 3.3 and 3.4, the Q vector table of the corresponding destination is updated for each induction target o according to the Sarsa learning method, whose formula is as follows:
$$Q_d^o(i,j) \leftarrow Q_d^o(i,j) + \alpha\left[r_v^o(i,j) + Q_d^o(j,k) - Q_d^o(i,j)\right]$$
wherein $Q_d^o(i,j)$ is the Q value, for induction target o and destination d, of travelling from traffic node i to the adjacent traffic node j; k is an adjacent traffic node of traffic node j; α is the learning rate; and $r_v^o(i,j)$ is the actual reward value obtained by vehicle v on passing road section $s_{ij}$;
the actual reward value is one of travel time, distance, or cost; only one of them is selected.
Step 3: the calculation of the induction path comprises steps 3.1 to 3.5:
Step 3.1: Q vector table normalization: according to the Q vector table updated in step 2, the Q values corresponding to the different induction targets are each normalized by the dispersion standardization (min-max) method, with the formula:
$$\bar{Q}_d^o(i,j) = \frac{Q_d^o(i,j) - Q_{d,\min}^{o}}{Q_{d,\max}^{o} - Q_{d,\min}^{o}}$$
wherein $\bar{Q}_d^o(i,j)$ is the normalized Q value of passing road section $s_{ij}$ for induction target o with destination d, and $Q_{d,\min}^{o}$ and $Q_{d,\max}^{o}$ are respectively the minimum and maximum of the Q values of all road sections corresponding to destination d and induction target o.
Step 3.2: calculating a scalar value based on driver preferences: according to the corresponding driver preference, namely the weight vector ω obtained in step 2, and the Q vector table normalized in step 3.1, a linear scalarization function converts the Q vectors of all road sections adjacent to the traffic node where the vehicle is currently located, in the Q vector table with destination d, into scalar values $SQ_d(i,j)$ based on the driver preference, with the formula:
$$SQ_d(i,j) = \sum_{o=1}^{n} \omega_o\,\bar{Q}_d^o(i,j)$$
wherein n represents the number of induction targets, $\omega_o$ indicates the preference weight corresponding to target o, and $\bar{Q}_d^o(i,j)$ represents the normalized Q value of passing road section $s_{ij}$ for target o with destination d;
Step 3.3: calculating the Boltzmann probability distribution: using the current vehicle information obtained in step 2 and the scalar values $SQ_d(i,j)$ based on the driver's preference, the Boltzmann probability distribution over the road sections adjacent to the current traffic node is calculated, with the formula:
wherein $P_d(i,j)$ is the probability that a vehicle with destination d selects road section $s_{ij}$; i and j are traffic nodes; A(i) is the set of end nodes of the road sections that start at traffic node i, that is, the end nodes of the road sections adjacent to the current node obtained from the road network topological structure; ε is the traffic congestion coefficient; and $ESQ_d(i)$ is the average of the scalar values $SQ_d(i,j)$, based on driver preference, over the road sections around node i leading to destination d.
Step 3.4: selecting the next driving road section that meets the driver's personal preferences: based on the Boltzmann probability distribution of each road section calculated in step 3.3, the next driving road section is selected by the roulette method;
Step 3.5: if the vehicle has not reached the destination, steps 3.2-3.3 are repeated until the vehicle reaches the destination.
The beneficial technical effects are as follows:
1. The dynamic route induction method based on multi-target Sarsa learning makes full use of the real-time information of the current traffic system and optimizes the running route of the vehicle according to current traffic conditions, improving the efficiency of the traffic system and relieving traffic congestion.
2. The method takes a practical view and performs dynamic path induction for multiple induction targets simultaneously, meeting the induction requirements of real life.
3. The method considers the induction preference of the driver and provides a dynamic induction route that matches personal preference, which raises the acceptance rate of the induction route, further improves the traffic efficiency of the traffic system, and relieves traffic congestion.
Drawings
FIG. 1 is a flowchart of a dynamic path induction method based on multi-objective Sarsa learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of dynamic path induction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a vehicle path calculation according to an embodiment of the present invention;
FIG. 4 is a diagram comparing traffic congestion under an embodiment of the present invention with that under conventional induction methods.
Detailed Description
The invention is further explained with reference to the drawings and a specific embodiment. The whole process of information interaction between the dynamic path guidance system and the vehicle is shown in FIG. 2. A vehicle in the road network sends data such as its position, destination, and personal preference to the dynamic path guidance system; the system uses the data transmitted by the vehicle, together with collected information such as the real-time traffic conditions of the road network, to calculate with a path induction algorithm an induction path that matches the personal preference, and sends the induction path to the vehicle, completing the information interaction between the two parties. As shown in FIG. 1, a dynamic path induction method based on multi-target Sarsa learning comprises steps 1 to 3:
Step 1: initializing information, specifically comprising steps 1.1 to 1.3:
Step 1.1: confirming the induction targets: travel time and travel cost;
Step 1.2: for the induction targets, a traffic information center initializes, for each candidate destination on the road network, a Q vector table covering every induction target, using a Q-value-based dynamic programming algorithm together with the road network information in a geographic information base and the historically collected static data of each road section, wherein one Q vector table corresponds to one candidate destination. First, the Q vectors of the road sections around each possible destination d are initialized, as follows:
wherein $Q_d^{(0)}(i,j)$ is the initialized Q vector for reaching destination d through road section $s_{ij}$; i and j are traffic nodes; $time_{ij}$ and $cost_{ij}$ are respectively the travel time and cost of a vehicle passing road section $s_{ij}$; D is the set of destinations; A(i) is the set of end nodes of the road sections starting from traffic node i; and B(i) is the set of start nodes of the road sections ending at traffic node i.
Then, the Q vectors of all road sections corresponding to destination d are updated through multiple iterations, with the following update formula:
wherein $Q_d^{(n)}(i,j)$ is the Q vector obtained at the n-th iteration for road section $s_{ij}$ with destination d, and k is an adjacent traffic node of traffic node j.
Step 1.3: setting the time interval T at which the traffic information center issues updated Q value information;
the road network information includes: road network topological structure, road length and number of lanes;
the static data of each road section comprises: historical vehicle transit time, cost;
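The Q-value-based dynamic programming of step 1.2 can be sketched as follows. This is a minimal illustration for a single induction target, assuming the common Bellman-style form in which the Q value of a road section is its static cost plus the smallest Q value among the road sections that follow it. The function name `init_q_table`, the toy network, and the sweep limit are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # road section s_ij as (start node i, end node j)


def init_q_table(edges: Dict[Edge, float], dest: str, sweeps: int = 50) -> Dict[Edge, float]:
    """Initialise Q_d(i, j) for one induction target by repeated sweeps.

    Assumed update (not confirmed by the source text):
        Q_d(i, j) = cost(i, j) + min_k Q_d(j, k),  with Q_d(i, d) = cost(i, d).
    """
    successors: Dict[str, List[str]] = {}
    for (i, j) in edges:
        successors.setdefault(i, []).append(j)

    inf = float("inf")
    # Sections that end at the destination start with their own static cost.
    q = {edge: (cost if edge[1] == dest else inf) for edge, cost in edges.items()}

    for _ in range(sweeps):
        changed = False
        for (i, j), cost in edges.items():
            if j == dest:
                continue
            best_next = min((q[(j, k)] for k in successors.get(j, [])), default=inf)
            candidate = cost + best_next
            if candidate < q[(i, j)]:
                q[(i, j)] = candidate
                changed = True
        if not changed:  # values have converged
            break
    return q


# Hypothetical travel times (seconds) on a toy network with destination "d".
toy_times = {("a", "b"): 2.0, ("b", "d"): 3.0, ("a", "d"): 10.0,
             ("a", "c"): 1.0, ("c", "d"): 1.0}
print(init_q_table(toy_times, "d"))
```

Running the same routine once per induction target (for example once on travel times and once on costs) and pairing the results per road section would give one Q vector table for the destination, under the same assumption.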
As shown in FIG. 3, taking a vehicle v with destination d located at traffic node j as an example, the dynamic induction process is as follows:
Step 2: the information updating specifically includes: defining the induction target weights, calculating the traffic congestion coefficient of the current road network, and updating the Q vector table by the Sarsa learning method at every interval T:
(1) defining the induction target weights: recording the current information of all vehicles in the road network, the real-time traffic information of the current road section, and the preference of each driver in the road network; assuming that there are n induction targets in total, the preference of each driver is denoted as the weight vector $\omega = (\omega_1, \ldots, \omega_n)$, where $\omega_o \in [0,1]$ represents the preference weight of the o-th induction target; the weight of each induction target is defined as follows:
the current information of all vehicles includes: location, desired destination, and all next traffic nodes that can be reached;
the real-time traffic information of the current road section comprises: travel time, cost;
each driver defines the degree of attention paid to each induction target, that is, the preference weights of that driver;
recording the current information of vehicle v, such as position: traffic node j; desired destination: traffic node d; all next reachable traffic nodes: k, k′, k″; the travel time and cost of passing the current road section $s_{ij}$; and the driver's preferences. The preference weight vector of each driver is ω = (0.8, 0.2), where 0.8 and 0.2 are the preference weights of the travel time target and the cost target respectively.
(2) Calculating the traffic congestion coefficient of the current road network: counting the number NV of vehicles in the current road network, and calculating the traffic congestion coefficient ε of the current road network from NV:
wherein β and γ are parameters; the traffic congestion coefficient ε represents the current traffic condition of the traffic system; ε increases as the total number NV of vehicles in the current road network increases, so a larger ε indicates more congested traffic and a smaller ε less congested traffic. Here β and γ are set to 0.3 and 0.005 respectively; assuming that the number of vehicles NV in the current road network is 500, ε = 0.8.
(3) At every interval T, the most recently recorded current information of all vehicles in the road network and real-time traffic information of the current road section acquired in (1) are used, for example the immediate travel time and immediate cost of vehicle v on road section $s_{ij}$, together with the next driving road section $s_{jk}$ assigned by the route selection method of step 3. Assume that the learning rate α is 0.7 and that the current Q vector table holds $Q_d(i,j) = (250\,\mathrm{s}, 21)$ and $Q_d(j,k) = (200\,\mathrm{s}, 20)$. The Q vector table of the corresponding destination d is then updated for each induction target according to the Sarsa learning method. The Sarsa learning formula is as follows:
$$Q_d(i,j) \leftarrow Q_d(i,j) + \alpha\left[r_v(i,j) + Q_d(j,k) - Q_d(i,j)\right]$$
wherein $Q_d(i,j)$ is the Q vector for travelling from traffic node i through the adjacent traffic node j toward destination d, and $r_v(i,j)$ is the vector of immediate travel time and immediate cost obtained by vehicle v on road section $s_{ij}$.
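The Sarsa step of (3) can be illustrated with a short sketch. It assumes the standard undiscounted Sarsa form, applied to each induction target separately, and it reads the two Q vectors quoted above as belonging to $s_{ij}$ and $s_{jk}$ respectively. The function name `sarsa_update` is hypothetical, and the immediate reward `(60.0, 2.0)` is a made-up placeholder because the embodiment's observed values are not given in this text.

```python
from typing import Tuple

Vector = Tuple[float, ...]  # one entry per induction target, e.g. (time, cost)


def sarsa_update(q_ij: Vector, reward: Vector, q_jk: Vector, alpha: float) -> Vector:
    """Undiscounted Sarsa step, element-wise over the induction targets.

    Assumed form: Q(i,j) <- Q(i,j) + alpha * (r + Q(j,k) - Q(i,j)).
    """
    return tuple(q + alpha * (r + q_next - q)
                 for q, r, q_next in zip(q_ij, reward, q_jk))


# alpha and the two Q vectors come from the embodiment; the reward is hypothetical.
new_q = sarsa_update(q_ij=(250.0, 21.0), reward=(60.0, 2.0),
                     q_jk=(200.0, 20.0), alpha=0.7)
print(new_q)  # roughly (257.0, 21.7) for this placeholder reward
```

Because the update is applied per target, the time and cost entries of the Q vector never mix at this stage; they are only combined later, after normalization, through the driver's weights.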
Step 3: the calculation of the induction path comprises steps 3.1 to 3.5:
Step 3.1: Q vector table normalization: according to the Q vector table updated in step 2, the Q values corresponding to the different induction targets are each normalized by the dispersion standardization (min-max) method, which resolves the problem that different induction targets have different units and dimensions, with the formula:
$$\bar{Q}_d^o(i,j) = \frac{Q_d^o(i,j) - Q_{d,\min}^{o}}{Q_{d,\max}^{o} - Q_{d,\min}^{o}}$$
wherein $\bar{Q}_d^o(i,j)$ is the normalized Q value of passing road section $s_{ij}$ for induction target o with destination d, and $Q_{d,\min}^{o}$ and $Q_{d,\max}^{o}$ are respectively the minimum and maximum of the Q values of all road sections corresponding to destination d and induction target o.
Based on the minimum and maximum values in the Q vector table, the normalized Q vector corresponding to road section $s_{ij}$ in the normalized Q vector table of destination d is updated.
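Dispersion standardization (min-max normalization) for one induction target can be sketched as below. The function name `min_max_normalize`, the example Q values, and the choice to return 0.0 when all Q values are equal are illustrative assumptions rather than details from the patent.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # road section s_ij as (start node i, end node j)


def min_max_normalize(q: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rescale the Q values of one induction target and one destination to [0, 1]."""
    q_min, q_max = min(q.values()), max(q.values())
    if q_max == q_min:
        # Degenerate table: every section looks identical for this target.
        return {edge: 0.0 for edge in q}
    span = q_max - q_min
    return {edge: (value - q_min) / span for edge, value in q.items()}


# Hypothetical travel-time Q values (seconds) for destination d.
time_q = {("i", "j"): 250.0, ("j", "k"): 200.0, ("j", "m"): 300.0}
print(min_max_normalize(time_q))
# {('i', 'j'): 0.5, ('j', 'k'): 0.0, ('j', 'm'): 1.0}
```

Normalizing each target on its own scale is what lets seconds and monetary cost be added together in the weighted sum of step 3.2.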
Step 3.2: calculating a scalar value based on driver preferences: according to the corresponding driver preference, namely the weight vector ω obtained in step 2, and the Q vector table normalized in step 3.1, a linear scalarization function converts the Q vectors of all road sections adjacent to the traffic node where vehicle v is currently located, in the Q vector table with destination d, into scalar values $SQ_d(i,j)$ based on the driver preference. According to FIG. 3, the specific calculation is as follows:
$SQ_d(j,k) = 0.8 \times 0.195 + 0.2 \times 0.388 = 0.2336$
$SQ_d(j,k') = 0.8 \times 0.253 + 0.2 \times 0.306 = 0.2636$
$SQ_d(j,k'') = 0.8 \times 0.310 + 0.2 \times 0.306 = 0.3092$
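The three scalar values above can be reproduced with a few lines of code. The function name `scalarize` and the dictionary layout are hypothetical; the weights and normalized Q values are the ones quoted in the embodiment.

```python
from typing import Dict, Sequence, Tuple


def scalarize(weights: Sequence[float], normalized_q: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of normalized Q values over all targets."""
    if len(weights) != len(normalized_q):
        raise ValueError("one weight is needed per induction target")
    return sum(w * q for w, q in zip(weights, normalized_q))


omega = (0.8, 0.2)  # driver preference: (travel time, cost)
# Normalized (time, cost) Q values of the three candidate next nodes.
candidates: Dict[str, Tuple[float, float]] = {
    "k": (0.195, 0.388),
    "k'": (0.253, 0.306),
    "k''": (0.310, 0.306),
}
sq = {node: scalarize(omega, q) for node, q in candidates.items()}
for node, value in sq.items():
    print(node, round(value, 4))
# k 0.2336
# k' 0.2636
# k'' 0.3092
```

With a time weight of 0.8, node k scores lowest (best) even though its cost component is the highest of the three, which shows how the preference vector shifts the ranking.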
Step 3.3: calculating the Boltzmann probability distribution: using the current vehicle information obtained in step 2 and the scalar values $SQ_d(i,j)$ based on the driver's preference, the Boltzmann probability distribution over the road sections adjacent to the current traffic node is calculated, with the formula:
wherein $P_d(i,j)$ is the probability that a vehicle with destination d selects road section $s_{ij}$; i and j are traffic nodes; A(i) is the set of end nodes of the road sections that start at traffic node i, that is, the end nodes of the road sections adjacent to the current node obtained from the road network topological structure; ε is the traffic congestion coefficient; and $ESQ_d(i)$ is the average of the scalar values $SQ_d(i,j)$, based on driver preference, over the road sections around node i leading to destination d.
This gives $P_d(j,k) = 0.3705$, $P_d(j,k') = 0.3387$, $P_d(j,k'') = 0.2908$.
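A Boltzmann (softmax) distribution over the scalar values can be sketched as follows. The exact way ε and $ESQ_d(i)$ enter the exponent is not recoverable from this text, so the sketch takes a single `temperature` argument; using `0.8 * esq` below is only one assumed combination, and the resulting numbers need not match the probabilities quoted in the embodiment. The function name `boltzmann_probabilities` is hypothetical.

```python
import math
from typing import Dict


def boltzmann_probabilities(sq: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over negated scalar values: a lower SQ gets a higher probability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = min(sq.values())  # shift by the minimum for numerical stability
    weights = {node: math.exp(-(value - best) / temperature)
               for node, value in sq.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


sq = {"k": 0.2336, "k'": 0.2636, "k''": 0.3092}  # scalar values from step 3.2
esq = sum(sq.values()) / len(sq)                  # average scalar value ESQ_d(j)
epsilon = 0.8                                     # congestion coefficient from step 2
probs = boltzmann_probabilities(sq, temperature=epsilon * esq)  # assumed temperature
print(probs)
```

Whatever the exact temperature, the qualitative behaviour is the same: a high temperature spreads vehicles almost evenly over the candidate road sections, while a low temperature concentrates them on the section with the smallest scalar value.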
Step 3.4: selecting the next driving road section that meets the driver's personal preferences: based on the Boltzmann probability distribution of each road section calculated in step 3.3, the next driving road section is selected by the roulette method;
Step 3.5: if the vehicle has not reached the destination, steps 3.2-3.3 are repeated until the vehicle reaches the destination.
FIG. 4 compares traffic congestion under the method of the invention with that under conventional induction methods; the abscissa is the simulation time step and the ordinate is the current total number of vehicles in the road network, where more vehicles mean a more congested road network. Compared with the traditional path induction method Dijk, SMOSWU and the dynamic path induction method based on multi-target Sarsa learning disclosed by the invention make full use of real-time traffic information, improve the efficiency of the traffic system, and effectively relieve traffic congestion while considering the personal preference of the user.
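The roulette method of step 3.4 amounts to sampling one candidate in proportion to its probability. The sketch below is a generic roulette-wheel selection, not code from the patent; the function name `roulette_select` and the fixed seed are illustrative, and the probabilities are the ones quoted in the embodiment.

```python
import random
from collections import Counter
from typing import Dict, Optional


def roulette_select(probs: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick one key with probability proportional to its value."""
    if not probs:
        raise ValueError("no candidate road sections")
    rng = rng or random.Random()
    threshold = rng.random() * sum(probs.values())
    cumulative = 0.0
    chosen = ""
    for node, p in probs.items():
        chosen = node
        cumulative += p
        if threshold < cumulative:
            break
    return chosen  # the last node also absorbs any floating-point shortfall


probs = {"k": 0.3705, "k'": 0.3387, "k''": 0.2908}
rng = random.Random(0)  # seeded only to make the demonstration repeatable
counts = Counter(roulette_select(probs, rng) for _ in range(10_000))
print(counts)  # frequencies approach the given probabilities
```

Sampling rather than always taking the most probable road section means vehicles with the same destination and preferences are spread over several routes, which is the mechanism by which the method avoids sending everyone onto one road.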
Claims (6)
1. A dynamic path induction method based on multi-target Sarsa learning, characterized by comprising the following steps:
Step 1: initializing information, specifically comprising steps 1.1 to 1.3:
Step 1.1: confirming the induction targets: selecting one or more of minimized travel time, minimized travel distance and minimized cost;
Step 1.2: for the induction targets, a traffic information center initializes, for each candidate destination on the road network, a Q vector table covering every induction target, using a Q-value-based dynamic programming algorithm together with the road network information in a geographic information base and the historically acquired static data of each road section, wherein one Q vector table corresponds to one candidate destination;
Step 1.3: setting the time interval T at which the traffic information center issues updated Q value information;
Step 2: the information updating specifically includes: defining the induction target weights, calculating the traffic congestion coefficient of the current road network, and updating the Q vector table by the Sarsa learning method at every interval T:
(1) defining an induced target weight:
recording the current information of all vehicles in the road network, the real-time traffic information of the current road section, and the preference of each driver in the road network; assuming that there are n induction targets in total, the preference of each driver is denoted as the weight vector $\omega = (\omega_1, \ldots, \omega_n)$, where $\omega_o \in [0,1]$ represents the preference weight of the o-th induction target; the weight of each induction target is defined as follows:
each driver defines the degree of attention paid to each induction target, that is, the preference weights of that driver;
(2) calculating the traffic congestion coefficient of the current road network: counting the number NV of vehicles in the current road network, and calculating the traffic congestion coefficient ε of the current road network from NV:
wherein β and γ are parameters, and the traffic congestion coefficient ε represents the current traffic condition of the traffic system;
(3) updating the Q vector table by the Sarsa learning method at every interval T: at every interval T, using the most recently recorded current information of all vehicles in the road network and real-time traffic information of the current road section acquired in (1), together with the next driving road section assigned in steps 3.3 and 3.4, the Q vector table of the corresponding destination is updated for each induction target o according to the Sarsa learning method, whose formula is as follows:
$$Q_d^o(i,j) \leftarrow Q_d^o(i,j) + \alpha\left[r_v^o(i,j) + Q_d^o(j,k) - Q_d^o(i,j)\right]$$
wherein $Q_d^o(i,j)$ is the Q value, for induction target o and destination d, of travelling from traffic node i to the adjacent traffic node j; k is an adjacent traffic node of traffic node j; α is the learning rate; and $r_v^o(i,j)$ is the actual reward value obtained by vehicle v on passing road section $s_{ij}$;
step 3: calculating the induction path, comprising the following steps 3.1 to 3.5:
step 3.1: normalizing the Q vector table: according to the Q vector table updated in step 2, normalizing the Q values corresponding to the different induction targets separately by the dispersion (min-max) standardization method, according to the following formula:
where the normalized Q value is that for induction target o with destination d over road section $s_{ij}$, and the minimum and maximum values are taken over the Q values of all road sections corresponding to destination d and induction target o;
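As an illustration of step 3.1, the sketch below applies ordinary min-max normalization to one Q table (a single induction target and destination). The function name `min_max_normalize`, the dictionary layout, and the handling of a table whose values are all equal are assumptions made for the example, not details taken from the claims.

```python
def min_max_normalize(q_values):
    """Rescale the Q values of one (target, destination) table to [0, 1]."""
    lo, hi = min(q_values.values()), max(q_values.values())
    if hi == lo:
        # Assumption: a flat table maps to 0.0 to avoid division by zero.
        return {section: 0.0 for section in q_values}
    return {section: (value - lo) / (hi - lo) for section, value in q_values.items()}


# Toy Q table for the "time" target: keys are (from_node, to_node) sections.
time_q = {("A", "B"): 10.0, ("A", "C"): 20.0, ("B", "D"): 30.0}
normalized = min_max_normalize(time_q)
print(normalized)
```

Normalizing each target separately puts quantities with different units (for example seconds, metres, and currency) on a common scale before they are weighted together.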
step 3.2: calculating a scalar value based on driver preference: according to the corresponding driver preference, i.e., the weight vector ω obtained in step 2, and the Q vector table normalized in step 3.1, applying a linear scalarization function to convert the Q vectors of all road sections adjacent to the traffic node where the vehicle is currently located, in the Q vector table with destination d, into a scalar value $SQ_d(i, j)$ based on driver preference, according to the following formula:
where n is the number of induction targets, $\omega_o$ is the preference weight corresponding to target o, and the normalized Q value is that of target o with destination d over road section $s_{ij}$;
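A linear scalarization of this kind amounts to a weighted sum of the normalized Q values over the n induction targets. The sketch below shows that reading under stated assumptions: the function name `scalarize`, the target names, and the data layout are hypothetical, and the weighted-sum form is inferred from the description rather than copied from the claimed formula.

```python
def scalarize(normalized_q, weights, node, neighbors):
    """Return {j: SQ(node, j)} as a preference-weighted sum over targets."""
    return {
        j: sum(w * normalized_q[o][(node, j)] for o, w in weights.items())
        for j in neighbors
    }


# Normalized Q values per induction target for the sections leaving node "A".
normalized_q = {
    "time": {("A", "B"): 0.0, ("A", "C"): 1.0},
    "distance": {("A", "B"): 1.0, ("A", "C"): 0.5},
}
# Hypothetical driver preference: time matters three times as much as distance.
weights = {"time": 0.75, "distance": 0.25}
sq = scalarize(normalized_q, weights, "A", ["B", "C"])
print(sq)
```

A driver who only cares about one target would set that weight to 1 and the others to 0, which reduces the scalar value to the normalized Q value of that single target.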
step 3.3: calculating the Boltzmann probability distribution: using the current vehicle information obtained in step 2 and the scalar value $SQ_d(i, j)$ based on driver preference, calculating the Boltzmann probability distribution over the road sections adjacent to the current traffic node, according to the following formula:
where $P_d(i, j)$ is the probability that a vehicle with destination d selects road section $s_{ij}$; i is a traffic node; A(i) is the set of end nodes of the road sections that start at traffic node i, i.e., the set of end nodes of the road sections adjacent to the current node, obtained from the road network topology; ε is the traffic congestion coefficient; and $ESQ_d(i)$ is the average of the driver-preference-based scalar values $SQ_d(i, j)$ of the road sections around node i toward destination d;
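The claimed Boltzmann formula is not reproduced in the text above, so the sketch below is only one plausible reading of it. It assumes a softmax with temperature ε · ESQ and assumes that a lower scalar value (lower cost) should receive a higher probability; both the sign and the exact role of ε and ESQ are assumptions, as is the function name `boltzmann_probabilities`.

```python
import math


def boltzmann_probabilities(sq, epsilon):
    """Map scalar values {next_node: SQ} to selection probabilities.

    Assumed form (not confirmed by the source): temperature is
    epsilon * mean(SQ), and lower SQ gets higher probability.
    """
    mean_sq = sum(sq.values()) / len(sq)
    temperature = epsilon * mean_sq
    if temperature <= 0:
        # Degenerate case: fall back to a uniform choice.
        return {j: 1.0 / len(sq) for j in sq}
    weights = {j: math.exp(-value / temperature) for j, value in sq.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


probs = boltzmann_probabilities({"B": 0.25, "C": 0.875}, epsilon=0.5)
print(probs)
```

Under this reading, a larger congestion coefficient raises the temperature and flattens the distribution, which spreads vehicles over more of the adjacent road sections when the network is busy.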
step 3.4: selecting the next driving road section that meets the driver's personal preference: based on the Boltzmann probability distribution over the road sections calculated in step 3.3, selecting the next driving road section that meets the driver's personal preference by the roulette method;
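Roulette selection draws one candidate with probability proportional to its share of the total. A minimal sketch follows; the function name `roulette_select` and the option to pass a seeded `random.Random` instance are illustrative choices, not part of the claims.

```python
import random


def roulette_select(probabilities, rng=random):
    """Pick one key with probability proportional to its value."""
    threshold = rng.random() * sum(probabilities.values())
    cumulative = 0.0
    for choice, p in probabilities.items():
        cumulative += p
        if threshold < cumulative:
            return choice
    return choice  # guards against floating-point round-off on the last slot


# Seeded generator so the example is repeatable.
rng = random.Random(0)
picks = [roulette_select({"B": 0.9, "C": 0.1}, rng) for _ in range(1000)]
print(picks.count("B"), picks.count("C"))
```

Because the choice is random rather than greedy, vehicles with identical preferences at the same node are not all sent down the same road section.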
step 3.5: if the vehicle has not reached the destination, repeating steps 3.2-3.3 until the vehicle reaches the destination.
2. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the road network information in step 1 comprises: road network topology, road length, number of lanes.
3. The dynamic path induction method based on multi-target Sarsa learning as claimed in claim 1, wherein the static data of each road section in step 1 comprises: historical vehicle transit time, distance, cost.
4. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the current information of all vehicles in step 2 comprises: location, desired destination, and all next traffic nodes that can be reached.
5. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the real-time traffic information in step 2 comprises: travel time, distance, cost.
6. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the actual reward value of step 2 comprises: travel time, distance, or cost, only one of which is selected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810992284.5A CN109269516B (en) | 2018-08-29 | 2018-08-29 | Dynamic path induction method based on multi-target Sarsa learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109269516A (en) | 2019-01-25 |
CN109269516B (en) | 2022-03-04 |
Family
ID=65154604
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810992284.5A Expired - Fee Related CN109269516B (en) | 2018-08-29 | 2018-08-29 | Dynamic path induction method based on multi-target Sarsa learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109269516B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110631599A (en) * | 2019-08-29 | 2019-12-31 | 重庆长安汽车股份有限公司 | Navigation method, system, server and automobile based on air pollution |
CN114664086B (en) * | 2019-12-18 | 2023-11-24 | 北京嘀嘀无限科技发展有限公司 | Method, device, electronic equipment and storage medium for controlling information release |
CN112039767B (en) * | 2020-08-11 | 2021-08-31 | 山东大学 | Multi-data center energy-saving routing method and system based on reinforcement learning |
CN113503888A (en) * | 2021-07-09 | 2021-10-15 | 复旦大学 | Dynamic path guiding method based on traffic information physical system |
CN114267176A (en) * | 2021-12-24 | 2022-04-01 | 中电金信软件有限公司 | Navigation method, navigation device, electronic equipment and computer readable storage medium |
CN114459498A (en) * | 2022-03-14 | 2022-05-10 | 南京理工大学 | New energy vehicle charging station selection and self-adaptive navigation method based on reinforcement learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104658297A (en) * | 2015-02-04 | 2015-05-27 | 沈阳理工大学 | Central type dynamic path inducing method based on Sarsa learning |
CN106096756A (en) * | 2016-05-31 | 2016-11-09 | 武汉大学 | A kind of urban road network dynamic realtime Multiple Intersections routing resource |
CN107977738A (en) * | 2017-11-21 | 2018-05-01 | 合肥工业大学 | A kind of multiobjective optimization control method for conveyer belt feed processing station system |
US10024675B2 (en) * | 2016-05-10 | 2018-07-17 | Microsoft Technology Licensing, Llc | Enhanced user efficiency in route planning using route preferences |
CN108389419A (en) * | 2018-03-02 | 2018-08-10 | 辽宁工业大学 | A kind of Dynamic Route Guidance Method of Vehicle |
Also Published As
Publication number | Publication date |
---|---|
CN109269516A (en) | 2019-01-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220304 |