CN109269516B - Dynamic path induction method based on multi-target Sarsa learning - Google Patents

Dynamic path induction method based on multi-target Sarsa learning

Info

Publication number
CN109269516B
CN109269516B CN201810992284.5A CN201810992284A CN109269516B CN 109269516 B CN109269516 B CN 109269516B CN 201810992284 A CN201810992284 A CN 201810992284A CN 109269516 B CN109269516 B CN 109269516B
Authority
CN
China
Prior art keywords
traffic
induction
target
driver
road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810992284.5A
Other languages
Chinese (zh)
Other versions
CN109269516A (en)
Inventor
文峰
封筱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Ligong University
Original Assignee
Shenyang Ligong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Ligong University filed Critical Shenyang Ligong University
Priority to CN201810992284.5A priority Critical patent/CN109269516B/en
Publication of CN109269516A publication Critical patent/CN109269516A/en
Application granted granted Critical
Publication of CN109269516B publication Critical patent/CN109269516B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3453Special cost functions, i.e. other than distance or default speed limit of road segments
    • G01C21/3492Special cost functions, i.e. other than distance or default speed limit of road segments employing speed data or traffic data, e.g. real-time or historical
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3407Route searching; Route guidance specially adapted for specific applications
    • G01C21/3415Dynamic re-routing, e.g. recalculating the route when the user deviates from calculated route or after detecting real-time traffic data or accidents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047Optimisation of routes or paths, e.g. travelling salesman problem
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a dynamic path induction method based on multi-target Sarsa learning, which comprises the following steps: initializing information; updating information; inducement path calculation, including Q vector table normalization, calculation of scalar values based on driver preferences, calculation of Boltzmann probability distribution, selection of the next road segment for the driver to meet his personal preferences by roulette method until the driver's vehicle reaches the destination. According to the traffic condition of the current traffic system, the running path of the vehicle is optimized, the efficiency of the traffic system is improved, and the traffic jam condition is relieved. From a practical perspective, the dynamic path induction of multiple induction targets is carried out simultaneously, and the induction requirements in practical life are met. The driver induction preference is considered, and a dynamic induction path which accords with the personal preference is provided for the driver, so that the acceptance rate of the induction path is improved, the traffic efficiency of a traffic system is further improved, and the traffic jam condition is relieved.

Description

Dynamic path induction method based on multi-target Sarsa learning
Technical Field
The invention belongs to the technical field of intelligent transportation, and particularly relates to a dynamic path induction method based on multi-target Sarsa learning.
Background
In recent years, with the rapid development of China's economy, private car ownership has grown continuously, and problems such as mounting urban traffic pressure, traffic congestion, and frequent traffic accidents have become increasingly serious. Furthermore, drivers, as important participants in the traffic system, often pursue several induction targets at once during a trip and weigh those targets differently. Whether the driver's personal preferences are taken into account strongly affects the acceptance of induction information, and hence the efficiency of the traffic system. It is therefore necessary to realize efficient, dynamic route guidance that both relieves traffic congestion and satisfies the individual preferences of drivers.
Reinforcement learning is strongly adaptive and self-learning: without prior knowledge or an explicit model, it continuously adjusts its control policy as the system environment changes and learns from the dynamic information of the system, which meets the control requirements of a highly random and complex traffic guidance system. Sarsa learning, an on-policy reinforcement learning algorithm, is particularly suitable for searching for optimal paths and dynamically guiding vehicles in a traffic induction system that is complex, variable, and strongly real-time.
Most currently proposed route guidance models and algorithms are single-target methods built only on road-section travel time, and they ignore both real-life induction requirements and the individual preferences of drivers. Multi-objective reinforcement learning is commonly used to solve such multi-objective optimization problems, and its methods for finding an optimal solution set fall mainly into single-policy and multi-policy methods. Compared with a single-policy method, however, a multi-policy method learns a whole set of optimal solutions approximating the Pareto frontier at every interaction with the environment, which requires a large amount of computation time. Used with on-policy learning, the multi-policy method also incurs large computation and storage costs for the corresponding solution set, so it is not suitable for a dynamic path induction system. Single-policy multi-target Sarsa learning is therefore suited to the dynamic path induction problem that covers multiple induction targets while taking driver preference into account.
Disclosure of Invention
In light of the above technical problems, an object of the present invention is to provide a dynamic path induction method based on multi-objective Sarsa learning. The real-time traffic data information and the personal preference information of the driver are fully utilized, the route guidance information according to the personal preference is provided for the driver, the whole traffic system is coordinated to pass, the traffic jam is relieved, and the passing efficiency of the traffic system is improved.
The technical scheme is as follows: a dynamic path induction method based on multi-target Sarsa learning comprises steps 1 to 3:
step 1: initializing information, specifically comprising steps 1.1 to 1.3:
step 1.1: confirming the induction targets: one or more of minimized travel time, minimized travel distance, and minimized cost are selected;
Step 1.2: for the induction targets, the traffic information center uses a Q-value-based dynamic programming algorithm to initialize, for each candidate destination on the road network, a Q vector table covering every induction target, according to the road network information in the geographic information base and the historically collected static data of each road section; one Q vector table corresponds to one candidate destination;
step 1.3: setting a Q value information updating time interval T issued by a traffic information center;
the road network information includes: road network topological structure, road length and number of lanes;
the static data of each path segment comprises: historical vehicle transit time, distance, cost;
step 2: the information updating specifically includes: defining the weight of an induced target, calculating the current road network traffic jam coefficient and updating a Q vector table by using a Sarsa learning method at every T moment:
(1) defining an induced target weight:
recording current information of all vehicles in the road network, real-time traffic information of the current road section, and the preference of each driver in the road network; assuming that there are n induction targets in total, the preference of each driver is denoted as the weight vector ω = (ω_1, …, ω_n), where ω_o ∈ [0, 1] is the preference weight of the o-th induction target, and the weights of the induction targets are defined to satisfy:
$$\sum_{o=1}^{n} \omega_o = 1$$
each driver defines the degree of attention paid to each induction target, i.e. the preference weights;
the current information of all vehicles includes: location, desired destination, and all next traffic nodes that can be reached;
the real-time traffic information of the current road section comprises: travel time, distance, cost;
(2) calculating the traffic congestion coefficient of the current road network: counting the number NV of vehicles in the current road network, and calculating the traffic congestion coefficient ε of the current road network from that number as follows:
Figure GDA0003374110940000022
wherein β and γ are parameters; the traffic congestion coefficient ε represents the current traffic condition of the traffic system; the value of ε increases with the total number NV of vehicles in the current road network, so a larger ε means that traffic is currently more congested, and a smaller ε means that it is less congested.
(3) updating the Q vector table by the Sarsa learning method every time interval T: every T, using the most recently recorded current information of all vehicles in the road network and real-time traffic information of the current road section acquired in step (1), together with the next driving road section assigned in steps 3.3 and 3.4, the Q vector table of the corresponding destination is updated for each induction target o according to the Sarsa learning method, whose formula is as follows:
Figure GDA0003374110940000023
wherein
Figure GDA0003374110940000024
is the Q value, for induction target o and destination d, of travelling from traffic node i to the adjacent traffic node j; k is a traffic node adjacent to traffic node j; α is the learning rate; and
Figure GDA0003374110940000031
is the actual reward value obtained when vehicle v passes over road section s_ij;
the actual reward value is one of travel time, distance, or cost; only one of these is used at a time.
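The update formula itself survives only as a figure, so the following Python sketch is an illustration under a stated assumption: it uses the standard undiscounted Sarsa form Q(i, j) ← Q(i, j) + α·[r + Q(j, k) − Q(i, j)], applied separately to each induction target. The helper name `sarsa_update` and the reward values are hypothetical; the learning rate 0.7 and the Q values (250 s, 21) and (200 s, 20) are the ones quoted in the embodiment.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]        # (from_node, to_node)
QTable = Dict[Edge, float]    # Q values for one destination and one induction target


def sarsa_update(q: QTable, edge: Edge, next_edge: Edge,
                 reward: float, alpha: float) -> float:
    """Move Q(i, j) toward reward + Q(j, k) by a fraction alpha.

    Assumption: no discount factor, because the text names only a learning rate.
    """
    target = reward + q[next_edge]
    q[edge] += alpha * (target - q[edge])
    return q[edge]


# One table per induction target, both for destination d.
q_time: QTable = {("i", "j"): 250.0, ("j", "k"): 200.0}   # seconds
q_cost: QTable = {("i", "j"): 21.0, ("j", "k"): 20.0}     # cost units

# Hypothetical observed rewards for vehicle v on road section s_ij.
new_time = sarsa_update(q_time, ("i", "j"), ("j", "k"), reward=60.0, alpha=0.7)
new_cost = sarsa_update(q_cost, ("i", "j"), ("j", "k"), reward=2.0, alpha=0.7)
```

Because the next road section (j, k) is the one actually assigned to the vehicle by the route selection step, the update is on-policy, which is what distinguishes Sarsa from a Q-learning update that would take the minimum over all successors.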
Step 3: calculating the induction path, comprising steps 3.1 to 3.5:
step 3.1: normalizing the Q vector table: based on the Q vector table updated in step 2, the Q values of the different induction targets are normalized separately by min-max (dispersion) standardization, according to the formula:
$$\bar{Q}_d^{o}(i,j) = \frac{Q_d^{o}(i,j) - Q_{d,\min}^{o}}{Q_{d,\max}^{o} - Q_{d,\min}^{o}}$$
wherein $\bar{Q}_d^{o}(i,j)$ is the normalized Q value of passing through road section s_ij for induction target o with destination d, and $Q_{d,\min}^{o}$ and $Q_{d,\max}^{o}$ are respectively the minimum and maximum of the Q values of all road sections corresponding to destination d and induction target o.
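As a minimal sketch of the dispersion (min-max) standardization in step 3.1, the Python function below rescales the Q values of one table, i.e. one destination and one induction target, into [0, 1]. The function name, the node labels, and the sample values are hypothetical, and mapping a table whose values are all equal to 0 is an assumption the text does not address.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def min_max_normalize(q: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rescale every Q value of one (destination, target) table into [0, 1]."""
    lo, hi = min(q.values()), max(q.values())
    if hi == lo:
        # Assumption: a table with identical values carries no preference signal.
        return {edge: 0.0 for edge in q}
    return {edge: (value - lo) / (hi - lo) for edge, value in q.items()}


# Hypothetical travel-time Q values (seconds) for destination d.
q_time = {("j", "k"): 200.0, ("j", "k1"): 260.0, ("j", "k2"): 320.0}
norm_time = min_max_normalize(q_time)
```

Normalizing each target separately is what allows seconds and cost units to be combined by a weighted sum in the next step.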
Step 3.2: calculating a scalar value based on driver preference: according to the driver's preference, i.e. the weight vector ω obtained in step 2, and the Q vector table normalized in step 3.1, a linear scalarization function converts the Q vectors of all road sections adjacent to the vehicle's current traffic node, in the Q vector table for destination d, into the preference-based scalar value SQ_d(i, j):
$$SQ_d(i,j) = \sum_{o=1}^{n} \omega_o \, \bar{Q}_d^{o}(i,j)$$
wherein n is the number of induction targets, ω_o is the preference weight of target o, and $\bar{Q}_d^{o}(i,j)$ is the normalized Q value of passing through road section s_ij for target o with destination d;
step 3.3: calculating the Boltzmann probability distribution: using the current vehicle information obtained in step 2 and the preference-based scalar values SQ_d(i, j), the Boltzmann probability distribution over the road sections adjacent to the current traffic node is calculated by the formula:
Figure GDA0003374110940000038
wherein P_d(i, j) is the probability that a vehicle with destination d selects road section s_ij; i and j are traffic nodes; A(i) is the set of end points of the road sections that start at traffic node i, i.e. the end points of the road sections adjacent to the current node, obtained from the road network topology; ε is the traffic congestion coefficient; and ESQ_d(i) is the average of the preference-based scalar values SQ_d(i, ·) of the road sections around node i toward destination d.
Step 3.4: selecting the next travel segment that meets his personal preferences: calculating Boltzmann probability distribution of each road section based on the step 3.3, and selecting a next driving road section which meets personal preference of a driver by a roulette method;
step 3.5: if the vehicle has not reached the destination, steps 3.2-3.3 are repeated until the vehicle reaches the destination.
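Steps 3.3 and 3.4 can be sketched in Python as follows. The exact Boltzmann expression is available only as a figure, so this sketch assumes one plausible form: the probability of a road section is proportional to exp(−SQ / (ε · ESQ)), where ESQ is the mean scalar value, so that road sections with smaller scalar values are more likely and a larger congestion coefficient ε flattens the distribution. The function names are hypothetical, and the probabilities produced under this assumed form need not equal the values reported in the embodiment.

```python
import math
import random
from typing import Dict


def boltzmann_probabilities(sq: Dict[str, float], epsilon: float) -> Dict[str, float]:
    """Selection probability for each candidate next node (assumed temperature form)."""
    mean_sq = sum(sq.values()) / len(sq)
    temperature = epsilon * mean_sq          # assumption: temperature = epsilon * ESQ
    if temperature <= 0.0:
        return {node: 1.0 / len(sq) for node in sq}   # fall back to a uniform choice
    weights = {node: math.exp(-value / temperature) for node, value in sq.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def roulette_select(probs: Dict[str, float], rng: random.Random) -> str:
    """Roulette-wheel selection: draw u in [0, 1) and walk the cumulative sum."""
    threshold = rng.random()
    cumulative = 0.0
    chosen = ""
    for node, p in probs.items():
        chosen = node
        cumulative += p
        if threshold < cumulative:
            break
    return chosen                             # last node guards against rounding error


# Scalar values and congestion coefficient quoted in the embodiment.
sq = {"k": 0.2336, "k'": 0.2636, "k''": 0.3092}
probs = boltzmann_probabilities(sq, epsilon=0.8)
next_node = roulette_select(probs, random.Random(0))
```

Selecting stochastically rather than always taking the smallest scalar value spreads vehicles over several road sections, which is how the method aims to avoid sending every driver onto the same road.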
The beneficial technical effects are as follows:
1. a dynamic route induction method based on multi-target Sarsa learning can fully utilize real-time information of a current traffic system, optimize a running route of a vehicle according to traffic conditions of the current traffic system, improve efficiency of the traffic system and relieve traffic jam conditions.
2. A dynamic path induction method based on multi-target Sarsa learning is based on the practical angle, and simultaneously carries out dynamic path induction of multiple induction targets, so that the induction requirements in the practical life are met.
3. A dynamic route induction method based on multi-target Sarsa learning considers the induction preference of a driver and provides a dynamic induction route which accords with the personal preference for the driver, so that the acceptance rate of the induction route is improved, the traffic efficiency of a traffic system is further improved, and the traffic jam condition is relieved.
Drawings
FIG. 1 is a flowchart of a dynamic path induction method based on multi-objective Sarsa learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of dynamic path induction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a vehicle path calculation according to an embodiment of the present invention;
FIG. 4 compares traffic congestion under the embodiment of the present invention with that under conventional induction methods.
Detailed Description
The invention is further explained with reference to the drawings and the specific embodiment, and the whole process of information interaction between the dynamic path guidance system and the vehicle is shown in fig. 2. The vehicle in the road network sends data such as self position, terminal point, personal preference and the like to the dynamic path guidance system, the dynamic path guidance system calculates the guidance path which accords with the personal preference by using a path guidance algorithm through the data transmitted by the vehicle and the collected information such as real-time traffic conditions of the road network and the like, and sends the guidance path to the vehicle to complete information interaction between the two parties. A dynamic path induction method based on multi-target Sarsa learning comprises steps 1-3, as shown in FIG. 1:
step 1: initializing information, specifically comprising steps 1.1 to 1.3:
step 1.1: confirming an induction target: travel time and travel cost;
step 1.2: for the induction targets, the traffic information center uses a Q-value-based dynamic programming algorithm to initialize, for each candidate destination on the road network, a Q vector table covering every induction target, according to the road network information in the geographic information base and the historically collected static data of each road section; one Q vector table corresponds to one candidate destination. First, the Q vectors of the road sections around each possible destination d are initialized as follows:
Figure GDA0003374110940000041
Figure GDA0003374110940000042
Figure GDA0003374110940000043
wherein
Figure GDA0003374110940000044
is the initialized Q vector of reaching destination d through road section s_ij; i and j are traffic nodes; time_ij and cost_ij are respectively the time and cost of a vehicle passing road section s_ij; D is the set of destinations; A(i) is the set of end points of the road sections starting from traffic node i; and B(i) is the set of start points of the road sections ending at traffic node i.
Then, the Q vectors of all the road segments corresponding to the destination d are updated through a plurality of iterations, and the update formula is as follows:
Figure GDA0003374110940000051
Figure GDA0003374110940000052
wherein
Figure GDA0003374110940000053
is the Q vector obtained in the n-th iteration for road section s_ij with destination d, and k is a traffic node adjacent to traffic node j.
Step 1.3: setting a Q value information updating time interval T issued by a traffic information center;
the road network information includes: road network topological structure, road length and number of lanes;
the static data of each path segment comprises: historical vehicle transit time, cost;
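The initialization and iteration formulas of step 1.2 appear only as figures, so the following Python sketch is an assumption rather than the patented procedure. It fills one Q table for one destination and one induction target by a Bellman-style iteration, in which the Q value of a road section is its own static cost plus the smallest Q value among the road sections leaving its end node. The function name `init_q_table`, the toy network, and its costs are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]
INF = float("inf")


def init_q_table(costs: Dict[Edge, float], dest: str,
                 max_iters: int = 100) -> Dict[Edge, float]:
    """Q[(i, j)] = static cost of (i, j) + best known cost-to-go from j to dest."""
    # Road sections entering the destination start at their own cost;
    # every other road section starts as unknown (infinite).
    q = {edge: (cost if edge[1] == dest else INF) for edge, cost in costs.items()}
    for _ in range(max_iters):
        changed = False
        for (i, j), cost in costs.items():
            if j == dest:
                continue
            # Q values of the road sections that leave node j.
            onward = [q[(a, b)] for (a, b) in costs if a == j]
            best = cost + min(onward, default=INF)
            if best < q[(i, j)]:
                q[(i, j)] = best
                changed = True
        if not changed:      # converged: no Q value improved in this sweep
            break
    return q


# Hypothetical historical travel times (seconds) on a four-link network.
time_costs = {("a", "b"): 30.0, ("b", "d"): 40.0, ("a", "d"): 90.0, ("b", "a"): 30.0}
q_time = init_q_table(time_costs, dest="d")
```

Calling the function once per induction target (for example with time costs and again with monetary costs) yields the components of the Q vector for each road section.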
as shown in fig. 3, taking a vehicle v with a destination d and located at a traffic node j as an example, the dynamic induction process is as follows:
step 2: the information updating specifically includes: defining the weight of an induced target, calculating the current road network traffic jam coefficient and updating a Q vector table by using a Sarsa learning method at every T moment:
(1) defining an induced target weight: recording current information of all vehicles in the road network, real-time traffic information of the current road section, and the preference of each driver in the road network; assuming that there are n induction targets in total, the preference of each driver is denoted as the weight vector ω = (ω_1, …, ω_n), where ω_o ∈ [0, 1] is the preference weight of the o-th induction target, and the weights of the induction targets are defined to satisfy:
$$\sum_{o=1}^{n} \omega_o = 1$$
the current information of all vehicles includes: location, desired destination, and all next traffic nodes that can be reached;
the real-time traffic information of the current road section comprises: travel time, cost;
each driver self-defines the attention degree of each induction target, namely, the preference of each driver is weighted;
recording the current information of vehicle v, such as its position (traffic node j), desired destination (traffic node d), and all next reachable traffic nodes (k, k′, k″), together with the travel time and cost of passing through the current road section s_ij and the driver's preference. The preference weight vector of the driver is ω = (0.8, 0.2), where 0.8 and 0.2 are the preference weights for the time and cost induction targets, respectively.
(2) Calculating the traffic congestion coefficient of the current road network: counting the number NV of vehicles in the current road network, and calculating the traffic congestion coefficient ε of the current road network from that number as follows:
Figure GDA0003374110940000055
wherein β and γ are parameters; the traffic congestion coefficient ε represents the current traffic condition of the traffic system; the value of ε increases with the total number NV of vehicles in the current road network, so a larger ε means that traffic is currently more congested, and a smaller ε means that it is less congested. Here β and γ are set to 0.3 and 0.005, respectively; assuming that the number of vehicles NV in the current road network is 500, ε = 0.8.
(3) Every time interval T, using the most recently recorded current information of all vehicles in the road network and real-time traffic information of the current road section acquired in step (1), for example the immediate travel time of vehicle v on road section s_ij
Figure GDA0003374110940000061
and the immediate cost
Figure GDA0003374110940000062
together with the next driving road section s_jk assigned by the route selection method of step 3, and assuming that the learning rate α is 0.7 and that, in the current Q vector table,
Figure GDA0003374110940000063
and
Figure GDA0003374110940000064
take the values (250 s, 21) and (200 s, 20), respectively, the Q vector table of the corresponding destination d is updated for each induction target according to the Sarsa learning method. The Sarsa learning formula is as follows:
Figure GDA0003374110940000065
Figure GDA0003374110940000066
wherein
Figure GDA0003374110940000067
is the Q vector of starting from traffic node i, passing through the adjacent traffic node j, and ending at destination d.
Step 3: calculating the induction path, comprising steps 3.1 to 3.5:
step 3.1: normalizing the Q vector table: based on the Q vector table updated in step 2, the Q values of the different induction targets are normalized separately by min-max (dispersion) standardization, which removes the problem that different induction targets have different units and dimensions; the formula is:
$$\bar{Q}_d^{o}(i,j) = \frac{Q_d^{o}(i,j) - Q_{d,\min}^{o}}{Q_{d,\max}^{o} - Q_{d,\min}^{o}}$$
wherein $\bar{Q}_d^{o}(i,j)$ is the normalized Q value of passing through road section s_ij for induction target o with destination d, and $Q_{d,\min}^{o}$ and $Q_{d,\max}^{o}$ are respectively the minimum and maximum of the Q values of all road sections corresponding to destination d and induction target o.
Based on the values in the Q vector table and the value of
Figure GDA00033741109400000612
obtained in step 2, the normalized Q vector of road section s_ij in the normalized Q vector table for destination d is updated.
Step 3.2: calculating a scalar value based on driver preference: according to the driver's preference, i.e. the weight vector ω obtained in step 2, and the Q vector table normalized in step 3.1, a linear scalarization function converts the Q vectors of all road sections adjacent to the current traffic node of vehicle v, in the Q vector table for destination d, into the preference-based scalar values SQ_d(i, j); according to FIG. 3, the specific calculation is:
SQd(j,k)=0.8×0.195+0.2×0.388=0.2336
SQd(j,k′)=0.8×0.253+0.2×0.306=0.2636
SQd(j,k″)=0.8×0.310+0.2×0.306=0.3092
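The three weighted sums above can be checked with a short Python sketch of the linear scalarization. The function name `scalarize` and the dictionary layout are hypothetical; the weights and normalized Q values are the ones used in the worked figures above.

```python
from typing import Dict, Sequence, Tuple


def scalarize(weights: Sequence[float], normalized_q: Sequence[float]) -> float:
    """Weighted sum of normalized Q values, one term per induction target."""
    if len(weights) != len(normalized_q):
        raise ValueError("one weight is required per induction target")
    return sum(w * q for w, q in zip(weights, normalized_q))


weights = (0.8, 0.2)   # driver preference: (time, cost)

# Normalized (time, cost) Q values of the three road sections leaving node j.
normalized: Dict[str, Tuple[float, float]] = {
    "k": (0.195, 0.388),
    "k'": (0.253, 0.306),
    "k''": (0.310, 0.306),
}

sq = {node: scalarize(weights, vec) for node, vec in normalized.items()}
```

A driver who cared more about cost, for example with weights (0.2, 0.8), would obtain a different ranking from the same Q vectors, which is the purpose of keeping the targets separate until this step.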
step 3.3: calculating the Boltzmann probability distribution: using the current vehicle information obtained in step 2 and the preference-based scalar values SQ_d(i, j), the Boltzmann probability distribution over the road sections adjacent to the current traffic node is calculated by the formula:
Figure GDA0003374110940000071
wherein P_d(i, j) is the probability that a vehicle with destination d selects road section s_ij; i and j are traffic nodes; A(i) is the set of end points of the road sections that start at traffic node i, i.e. the end points of the road sections adjacent to the current node, obtained from the road network topology; ε is the traffic congestion coefficient; and ESQ_d(i) is the average of the preference-based scalar values SQ_d(i, ·) of the road sections around node i toward destination d.
This gives P_d(j, k) = 0.3705, P_d(j, k′) = 0.3387, and P_d(j, k″) = 0.2908.
Step 3.4: selecting the next travel segment that meets his personal preferences: calculating Boltzmann probability distribution of each road section based on the step 3.3, and selecting a next driving road section which meets personal preference of a driver by a roulette method;
step 3.5: if the vehicle has not reached the destination, steps 3.2-3.3 are repeated until the vehicle reaches the destination.
As shown in fig. 4, for the traffic congestion situation compared with the conventional guidance method, the abscissa is the simulation time step, and the ordinate is the current total vehicle number of the road network; the more vehicles are, the more congested the road network is. Compared with the traditional path induction method Dijk, the SMOSWU and the dynamic path induction method based on multi-target Sarsa learning, disclosed by the invention, have the advantages that real-time traffic information is fully utilized, the efficiency of a traffic system is improved, and the traffic jam condition is effectively relieved on the basis of considering the personal preference of a user.

Claims (6)

1. A dynamic path induction method based on multi-target Sarsa learning is characterized by comprising the following procedures:
step 1: initializing information, specifically comprising steps 1.1 to 1.3:
step 1.1: confirming an induction target: the method comprises the steps of selecting one or more of minimized travel time, minimized travel distance and minimized cost;
step 1.2: for the induction targets, the traffic information center uses a Q-value-based dynamic programming algorithm to initialize, for each candidate destination on the road network, a Q vector table covering every induction target, according to the road network information in the geographic information base and the historically collected static data of each road section; one Q vector table corresponds to one candidate destination;
step 1.3: setting a Q value information updating time interval T issued by a traffic information center;
step 2: the information updating specifically includes: defining the weight of an induced target, calculating the current road network traffic jam coefficient and updating a Q vector table by using a Sarsa learning method at every T moment:
(1) defining an induced target weight:
recording current information of all vehicles in the road network, real-time traffic information of the current road section, and the preference of each driver in the road network; assuming that there are n induction targets in total, the preference of each driver is denoted as the weight vector ω = (ω_1, …, ω_n), where ω_o ∈ [0, 1] is the preference weight of the o-th induction target, and the weights of the induction targets are defined to satisfy:
$$\sum_{o=1}^{n} \omega_o = 1$$
each driver defines the degree of attention paid to each induction target, i.e. the preference weights;
(2) calculating the traffic congestion coefficient of the current road network: counting the number NV of vehicles in the current road network, and calculating the traffic congestion coefficient ε of the current road network from that number as follows:
Figure FDA0003374110930000012
wherein β and γ are parameters, and the traffic congestion coefficient ε represents the current traffic condition of the traffic system;
(3) and updating a Q vector table by using a Sarsa learning method every T time: and (2) updating the Q vector table of the corresponding terminal point according to a Sarsa learning method for each guidance target o by the current information of all vehicles in the recorded road network which is acquired in the step (1) and is closest to the updating time and the real-time traffic information of the current road section and by using the next driving road section distributed in the step (3.3) and the step (3.4) every T moment, wherein the formula of the Sarsa learning method is as follows:
Figure FDA0003374110930000013
where
Figure FDA0003374110930000014
is the Q value from traffic node i to the adjacent traffic node j, with o as the induction target and d as the end point; k is a traffic node adjacent to traffic node j; α is the learning rate; and
Figure FDA0003374110930000015
is the actual reward value obtained by vehicle v for passing through road section s_ij;
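The update in (3) can be sketched in Python as below. This is the textbook Sarsa rule, not a transcription of the formula in the figure: the `discount` parameter and its default of 1.0 are assumptions, as are the function name `sarsa_update`, the dictionary layout, and the node labels.

```python
from collections import defaultdict


def sarsa_update(q, i, j, k, reward, alpha=0.1, discount=1.0):
    """Apply one Sarsa update to Q(i, j) for a single induction target and end point.

    q        : dict mapping (from_node, to_node) -> Q value
    i, j     : end nodes of the road section s_ij the vehicle just traversed
    k        : node actually assigned after j (the on-policy next choice)
    reward   : actual reward observed on s_ij (for example, travel time)
    alpha    : learning rate
    discount : assumed discount factor; not named in the source text
    """
    td_target = reward + discount * q[(j, k)]
    q[(i, j)] += alpha * (td_target - q[(i, j)])
    return q[(i, j)]


# Hypothetical Q table for one target (say, travel time) and one end point.
q_time = defaultdict(float)
q_time[("B", "C")] = 10.0
new_value = sarsa_update(q_time, "A", "B", "C", reward=4.0, alpha=0.5)
```

Because the update uses the section (j, k) that was actually assigned rather than the best available one, it is on-policy, which is what distinguishes Sarsa from Q-learning.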
step 3: calculation of the induction path, comprising steps 3.1 to 3.5:
step 3.1: normalizing the Q vector table: based on the Q vector table updated in step 2, the Q values of the different induction targets are each normalized by the min-max (dispersion) standardization method, according to the formula:
Figure FDA0003374110930000021
where
Figure FDA0003374110930000022
is the normalized Q value of road section s_ij for induction target o with end point d, and
Figure FDA0003374110930000023
and
Figure FDA0003374110930000024
are, respectively, the minimum and maximum Q values over all road sections for end point d and induction target o;
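A minimal Python sketch of the min-max normalization in step 3.1, applied to the Q values of one induction target and one end point. The function name `min_max_normalize`, the dictionary layout, and the handling of the degenerate case where all Q values are equal are assumptions made for illustration.

```python
def min_max_normalize(q_values):
    """Min-max normalize the Q values of one induction target and end point.

    q_values : dict mapping (from_node, to_node) -> Q value
    Returns a new dict whose values lie in [0, 1].
    """
    q_min = min(q_values.values())
    q_max = max(q_values.values())
    span = q_max - q_min
    if span == 0:
        # All sections share the same Q value; mapping them to 0.0 is an
        # arbitrary choice made here, not something the source specifies.
        return {section: 0.0 for section in q_values}
    return {section: (q - q_min) / span for section, q in q_values.items()}


normalized = min_max_normalize(
    {("A", "B"): 12.0, ("A", "C"): 20.0, ("B", "D"): 16.0}
)
```

Normalizing each target separately puts quantities with different units, such as time, distance and cost, on a common scale before they are combined.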
step 3.2: calculating a scalar value based on driver preference: using the corresponding driver preference, i.e. the weight vector ω obtained in step 2, and the Q vector table normalized in step 3.1, a linear scalarization function is applied to convert the Q vectors of all road sections adjacent to the vehicle's current traffic node, in the Q vector table with end point d, into a scalar value SQ_d(i, j) based on the driver preference, according to the formula:
Figure FDA0003374110930000025
where n is the number of induction targets, ω_o is the preference weight corresponding to induction target o, and
Figure FDA0003374110930000026
is the normalized Q value of road section s_ij for induction target o with end point d;
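The linear scalarization of step 3.2 amounts to a weighted sum of the normalized Q values over the n induction targets. A short Python sketch follows; the function name `scalarize`, the list-of-dicts layout, and the example numbers are hypothetical.

```python
def scalarize(normalized_q_by_target, weights, i, j):
    """Combine per-target normalized Q values of section s_ij into one scalar.

    normalized_q_by_target : list of dicts, one per induction target, each
                             mapping (from_node, to_node) -> normalized Q value
    weights                : driver preference weights, one per induction target
    """
    if len(normalized_q_by_target) != len(weights):
        raise ValueError("one weight is required per induction target")
    return sum(w * q[(i, j)] for w, q in zip(weights, normalized_q_by_target))


# Hypothetical normalized Q values of section (A, B) for three targets.
normalized_tables = [
    {("A", "B"): 0.2},
    {("A", "B"): 0.5},
    {("A", "B"): 1.0},
]
sq_ab = scalarize(normalized_tables, [0.5, 0.25, 0.25], "A", "B")
```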
step 3.3: calculating the Boltzmann probability distribution: using the current vehicle information obtained in step 2 and the preference-based scalar values SQ_d(i, j), the Boltzmann probability distribution over the road sections adjacent to the current traffic node is calculated by the formula:
Figure FDA0003374110930000027
where P_d(i, j) is the probability that a vehicle with end point d selects road section s_ij; i is a traffic node; A(i) is the set of end nodes of the road sections that start at traffic node i, i.e. the set formed by the end points of the road sections adjacent to the current node, obtained from the road network topology; ε is the traffic congestion coefficient; and ESQ_d(i) is the average of the preference-based scalar values SQ_d(i, j) of the road sections around node i leading to end point d;
step 3.4: selecting the next driving road section that matches the driver's personal preference: based on the Boltzmann probability distribution over the road sections calculated in step 3.3, the next driving road section matching the driver's personal preference is selected by the roulette wheel method;
step 3.5: if the vehicle has not reached the destination, repeating the steps 3.2-3.3 until the vehicle reaches the destination.
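Since the formula itself is only available as a figure, the Python sketch below shows one plausible form of such a distribution rather than the claimed one. It assumes that smaller scalar values are better (as with travel time, distance or cost) and that the temperature is the product ε · ESQ_d(i). Both points, along with the function name `boltzmann_probabilities`, are assumptions.

```python
import math


def boltzmann_probabilities(sq, epsilon):
    """Boltzmann distribution over the neighbours j in A(i).

    sq      : dict mapping neighbour node j -> scalar value SQ_d(i, j)
    epsilon : traffic congestion coefficient, assumed positive

    Assumed form: P(j) is proportional to exp(-SQ_d(i, j) / (epsilon * ESQ_d(i))),
    where ESQ_d(i) is the mean of the scalar values over the neighbours.
    """
    if not sq:
        raise ValueError("node has no adjacent road sections")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    mean_sq = sum(sq.values()) / len(sq)
    temperature = epsilon * mean_sq
    if temperature == 0:
        # All scalar values are zero: no basis for preferring any section.
        return {j: 1.0 / len(sq) for j in sq}
    exps = {j: math.exp(-value / temperature) for j, value in sq.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


probs = boltzmann_probabilities({"B": 0.2, "C": 0.6}, epsilon=1.0)
```

Under this assumed form, a larger ε flattens the distribution, so vehicles are spread more evenly across the adjacent road sections.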
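Roulette wheel selection, as used in step 3.4, draws one candidate with probability equal to its share of the wheel. A minimal Python sketch, with the hypothetical function name `roulette_select` and a seeded generator so that the example is repeatable:

```python
import random


def roulette_select(probabilities, rng=random):
    """Pick one candidate according to the given probabilities.

    probabilities : dict mapping candidate -> probability (summing to 1)
    rng           : object with a random() method returning a float in [0, 1)
    """
    if not probabilities:
        raise ValueError("no candidates to select from")
    threshold = rng.random()
    cumulative = 0.0
    last = None
    for candidate, p in probabilities.items():
        cumulative += p
        last = candidate
        if threshold < cumulative:
            return candidate
    # Floating-point round-off can leave the cumulative sum slightly below 1;
    # fall back to the last candidate in that case.
    return last


rng = random.Random(42)
choice = roulette_select({"B": 0.7, "C": 0.3}, rng)
```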
2. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the road network information in step 1 comprises: road network topology, road length, number of lanes.
3. The dynamic path induction method based on multi-target Sarsa learning as claimed in claim 1, wherein the static data of each road section in step 1 comprises: historical vehicle transit time, distance, cost.
4. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the current information of all vehicles in step 2 comprises: location, desired destination, and all next traffic nodes that can be reached.
5. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the real-time traffic information in step 2 comprises: travel time, distance, cost.
6. The dynamic path induction method based on multi-target Sarsa learning of claim 1, wherein the actual reward value of step 2 comprises: travel time, distance, or cost, only one of which is selected.
CN201810992284.5A 2018-08-29 2018-08-29 Dynamic path induction method based on multi-target Sarsa learning Expired - Fee Related CN109269516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810992284.5A CN109269516B (en) 2018-08-29 2018-08-29 Dynamic path induction method based on multi-target Sarsa learning

Publications (2)

Publication Number Publication Date
CN109269516A CN109269516A (en) 2019-01-25
CN109269516B true CN109269516B (en) 2022-03-04

Family

ID=65154604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810992284.5A Expired - Fee Related CN109269516B (en) 2018-08-29 2018-08-29 Dynamic path induction method based on multi-target Sarsa learning

Country Status (1)

Country Link
CN (1) CN109269516B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110631599A (en) * 2019-08-29 2019-12-31 重庆长安汽车股份有限公司 Navigation method, system, server and automobile based on air pollution
CN114664086B (en) * 2019-12-18 2023-11-24 北京嘀嘀无限科技发展有限公司 Method, device, electronic equipment and storage medium for controlling information release
CN112039767B (en) * 2020-08-11 2021-08-31 山东大学 Multi-data center energy-saving routing method and system based on reinforcement learning
CN113503888A (en) * 2021-07-09 2021-10-15 复旦大学 Dynamic path guiding method based on traffic information physical system
CN114267176A (en) * 2021-12-24 2022-04-01 中电金信软件有限公司 Navigation method, navigation device, electronic equipment and computer readable storage medium
CN114459498A (en) * 2022-03-14 2022-05-10 南京理工大学 New energy vehicle charging station selection and self-adaptive navigation method based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104658297A (en) * 2015-02-04 2015-05-27 沈阳理工大学 Central type dynamic path inducing method based on Sarsa learning
CN106096756A (en) * 2016-05-31 2016-11-09 武汉大学 A kind of urban road network dynamic realtime Multiple Intersections routing resource
CN107977738A (en) * 2017-11-21 2018-05-01 合肥工业大学 A kind of multiobjective optimization control method for conveyer belt feed processing station system
US10024675B2 (en) * 2016-05-10 2018-07-17 Microsoft Technology Licensing, Llc Enhanced user efficiency in route planning using route preferences
CN108389419A (en) * 2018-03-02 2018-08-10 辽宁工业大学 A kind of Dynamic Route Guidance Method of Vehicle

Also Published As

Publication number Publication date
CN109269516A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109269516B (en) Dynamic path induction method based on multi-target Sarsa learning
CN108197739B (en) Urban rail transit passenger flow prediction method
CN112629533B (en) Fine path planning method based on road network rasterization road traffic prediction
CN109102124B (en) Dynamic multi-target multi-path induction method and system based on decomposition and storage medium
CN111080018B (en) Intelligent network-connected automobile speed prediction method based on road traffic environment
CN105260785B (en) Logistics distribution vehicle path optimization method based on improved cuckoo algorithm
CN110570672A (en) regional traffic signal lamp control method based on graph neural network
CN106845703B (en) Urban road network time-varying K shortest path searching method considering steering delay
CN109115220B (en) Method for parking lot system path planning
CN109238297B (en) Dynamic path selection method for user optimization and system optimization
CN113516277B (en) Internet intelligent traffic path planning method based on road network dynamic pricing
CN109489679B (en) Arrival time calculation method in navigation path
CN115409256A (en) Route recommendation method for congestion area avoidance based on travel time prediction
CN108830401B (en) Dynamic congestion charging optimal rate calculation method based on cellular transmission model
CN112562363B (en) Intersection traffic signal optimization method based on V2I
CN114120670A (en) Method and system for traffic signal control
CN113724507A (en) Traffic control and vehicle induction cooperation method and system based on deep reinforcement learning
US11237008B2 (en) System and method for controlling vehicular pollution concentration and providing maximum traffic flow throughput
CN117116064A (en) Passenger delay minimization signal control method based on deep reinforcement learning
CN108256662A (en) The Forecasting Methodology and device of arrival time
Martynova et al. Ant colony algorithm for rational transit network design of urban passenger transport
CN111862657A (en) Method and device for determining road condition information
CN113343358B (en) Electric vehicle charging load space-time distribution modeling method considering road information
CN115481777A (en) Multi-line bus dynamic schedule oriented collaborative simulation optimization method, device and medium
CN113628455B (en) Intersection signal optimization control method considering number of people in vehicle under Internet of vehicles environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220304