CN109269516A

CN109269516A - A dynamic route guidance method based on multi-objective Sarsa learning

Info

Publication number: CN109269516A
Application number: CN201810992284.5A
Authority: CN
Inventors: 文峰; 封筱
Original assignee: Shenyang Ligong University
Current assignee: Shenyang Ligong University
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2019-01-25
Anticipated expiration: 2038-08-29
Also published as: CN109269516B

Abstract

The present invention proposes a dynamic route guidance method based on multi-objective Sarsa learning. The process comprises: information initialization; information update; and guidance path computation, which includes normalizing the Q vector table, computing a scalar value based on driver preference, computing a Boltzmann probability distribution, and selecting by roulette wheel selection the next road segment that matches the driver's personal preference, until the driver's vehicle reaches its destination. The method optimizes vehicle routes according to the current condition of the traffic system, improving traffic system efficiency and alleviating congestion. It performs dynamic route guidance for several guidance objectives at once, which better matches real-world guidance needs. It also takes drivers' guidance preferences into account and provides each driver with a dynamic guidance path that matches personal preference, which raises acceptance of the guidance path and thereby further improves traffic efficiency and alleviates congestion.

Description

A dynamic route guidance method based on multi-objective Sarsa learning

Technical field

The invention belongs to the field of intelligent transportation technology, and in particular relates to a dynamic route guidance method based on multi-objective Sarsa learning.

Background art

In recent years, with the rapid development of China's society and economy, private car ownership has risen steadily. As a result, urban traffic pressure has increased, and problems such as urban congestion, gridlock, and frequent traffic accidents have grown more serious. In addition, drivers, as important participants in the traffic system, often have several guidance objectives at once while travelling to their destinations, and have different preferences among those objectives. Whether a driver's personal preference is taken into account strongly affects how readily guidance information is accepted, and therefore affects the traffic efficiency of the whole system. Efficient dynamic route guidance that both alleviates congestion and satisfies drivers' personal preferences is therefore necessary.

Reinforcement learning is highly adaptive and self-learning. It requires neither prior knowledge nor a model, continually adjusts its control policy as the system environment changes, and learns from the dynamic information of the system, which meets the control requirements of a highly stochastic, complex traffic guidance system. Sarsa learning, an on-policy reinforcement learning algorithm, is particularly well suited to searching for optimal paths and dynamically guiding vehicles in a traffic guidance system that is complex, changeable, and strongly real-time.

Most route guidance models and algorithms proposed so far are single-objective methods built only on link travel time, and they ignore real-world guidance needs and drivers' personal preferences. Multi-objective reinforcement learning is commonly used to solve such multi-objective optimization problems. Methods for finding its optimal solution set fall into single-policy methods and multi-policy methods. Compared with single-policy methods, multi-policy methods learn a set of optimal solutions at every interaction with the environment in order to approximate the Pareto front, which requires a large amount of computation time and a correspondingly large amount of computation. When multi-policy methods are used in on-policy learning, the computation and the time needed to store the corresponding solution sets make them unsuitable for a dynamic route guidance system. Single-policy multi-objective Sarsa learning is therefore suitable for dynamic route guidance problems that involve several guidance objectives and take driver preference into account.

Summary of the invention

In view of the above technical problem, the object of the present invention is to provide a dynamic route guidance method based on multi-objective Sarsa learning. The method makes full use of real-time traffic data and drivers' personal preference information: while providing each driver with route guidance information according to personal preference, it coordinates traffic across the whole system, alleviates congestion, and improves the traffic efficiency of the system.

The technical solution adopted is as follows. A dynamic route guidance method based on multi-objective Sarsa learning comprises Step 1 to Step 3:

Step 1: Information initialization, comprising Step 1.1 to Step 1.3:

Step 1.1: Confirm the guidance objectives, selecting one or more of: minimizing travel time, minimizing travel distance, and minimizing cost;

Step 1.2: For the guidance objectives, the traffic information center uses a Q-value-based dynamic programming algorithm, together with the road network information in the geographic information database and the historically collected static data of each road segment, to initialize for each guidance objective the Q vector table of every candidate destination on the road network; each Q vector table corresponds to one candidate destination;

Step 1.3: Set the time interval T at which the traffic information center updates the Q value information it publishes;

The road network information includes: the road network topology, segment lengths, and the number of lanes;

The static data of each road segment include: historical vehicle travel time, distance, and cost;

Step 2: Information update, comprising: defining the guidance objective weights, computing the current road network traffic congestion coefficient, and updating the Q vector table by the Sarsa learning method every interval T:

(1) Define the guidance objective weights:

Record the current information of all vehicles in the road network, the real-time traffic information of the current road segments, and the preference of each driver currently in the road network. Assuming there are n guidance objectives in total, the preference of each driver is denoted by the weight vector ω = (ω₁, ..., ω_n), where ω_o ∈ [0, 1] is the preference weight of the o-th guidance objective. The weights of the guidance objectives are defined as:

Each driver defines how much he or she cares about each guidance objective; this degree of concern is the driver's preference and is recorded as the weight;

The current information of all vehicles includes: position, expected destination, and all next traffic nodes that can be reached;

The real-time traffic information of the current road segment includes: travel time, distance, and cost;

(2) Compute the current road network traffic congestion coefficient: count the number of vehicles NV in the current road network, and compute from it the current traffic congestion coefficient ε:

where β and γ are parameters. The traffic congestion coefficient ε indicates the current traffic condition of the system: its value increases as the total number of vehicles NV in the road network increases, so a larger ε means heavier congestion, and vice versa.

(3) Every interval T, update the Q vector table by the Sarsa learning method: using the real-time information, obtained in (1), of the vehicles on each road segment closest to the update time, together with the next road segment assigned by Step 3.3 and Step 3.4, update the Q vector table of the corresponding destination for each guidance objective o separately. The Sarsa learning formula is as follows:

where the Q value being updated is that of travelling, under guidance objective o, from traffic node i via adjacent traffic node j to destination d; k is a traffic node adjacent to traffic node j; α is the learning rate; and the reward is the actual reward value obtained by vehicle v on passing road segment s_ij;

The actual reward value is exactly one of: travel time, distance, or cost.
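The update formula itself is not reproduced in this text. As a minimal sketch, assuming the standard undiscounted Sarsa form Q(i, j) ← (1 − α)·Q(i, j) + α·[r + Q(j, k)] applied to each guidance objective independently, the update could look as follows. The function names are hypothetical, the absence of a discount factor is an assumption (the text defines none), and the reward values in the example are invented for illustration. The learning rate 0.7 and the two Q vectors are the numbers quoted in the embodiment; assigning them to segments s_ij and s_jk respectively is also an assumption.

```python
def sarsa_update(q_ij, q_jk, reward, alpha):
    """One Sarsa step for a single guidance objective.

    q_ij:   current Q value of segment (i, j) for the destination
    q_jk:   Q value of the segment (j, k) actually assigned as the next segment
    reward: cost observed on (i, j), e.g. travel time or toll
    alpha:  learning rate
    Assumption: undiscounted update, since no discount factor is defined.
    """
    return (1 - alpha) * q_ij + alpha * (reward + q_jk)


def sarsa_update_vector(q_ij, q_jk, rewards, alpha):
    """Apply the single-objective update to every component of a Q vector."""
    return tuple(
        sarsa_update(a, b, r, alpha) for a, b, r in zip(q_ij, q_jk, rewards)
    )


# Q vectors are (time in s, cost in $); the rewards (60 s, 2 $) are made up.
new_q = sarsa_update_vector((250.0, 21.0), (200.0, 20.0), (60.0, 2.0), alpha=0.7)
print(new_q)
```

Because the next segment (j, k) is the one actually assigned by the selection policy rather than the greedy minimum, the update is on-policy, which is what distinguishes Sarsa from Q-learning.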

Step 3: Guidance path computation, comprising Step 3.1 to Step 3.5:

Step 3.1: Normalize the Q vector table: based on the Q vector table updated in Step 2, normalize the Q values of each guidance objective separately by min-max normalization, as follows:

where the normalized quantity is the normalized Q value of road segment s_ij for guidance objective o and destination d, and the minimum and maximum are taken over the Q values of all road segments for destination d and guidance objective o.
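The normalization described above can be sketched as ordinary min-max scaling, (Q − Q_min) / (Q_max − Q_min), computed per objective and per destination. The function name and the example values are hypothetical, and mapping a constant table to zeros is a choice made here to avoid division by zero rather than something the text specifies.

```python
def min_max_normalize(q_values):
    """Scale the Q values of one objective and one destination into [0, 1].

    q_values: dict mapping a road segment to its Q value.
    """
    lo = min(q_values.values())
    hi = max(q_values.values())
    if hi == lo:
        # Assumption: with no spread, every segment is treated as equally good.
        return {segment: 0.0 for segment in q_values}
    return {segment: (q - lo) / (hi - lo) for segment, q in q_values.items()}


# Travel-time Q values in seconds for three segments (made-up numbers).
norm = min_max_normalize({"s_ij": 300.0, "s_jk": 200.0, "s_jk2": 250.0})
print(norm)
```

Normalizing each objective separately puts seconds and currency on the same dimensionless scale, which is what allows the weighted sum in the next step.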

Step 3.2: Compute the scalar value based on driver preference: using the driver's preference obtained in Step 2, that is, the weight vector ω, and the Q vector table normalized in Step 3.1, apply a linear scalarization function to convert the Q vectors of all segments adjacent to the vehicle's current traffic node, in the Q vector table for destination d, into the preference-based scalar value SQ_d(i, j), as follows:

where n is the number of guidance objectives, ω_o is the preference weight of objective o, and the term weighted by ω_o is the normalized Q value of road segment s_ij for objective o and destination d;
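Linear scalarization is a weighted sum of the normalized Q values, SQ_d(i, j) = Σ_o ω_o · (normalized Q for objective o). The sketch below uses a hypothetical function name; the weights (0.8, 0.2) and the normalized values are the ones given in the worked embodiment, so the results can be compared with the figures stated there.

```python
def scalarize(normalized_q, weights):
    """Weighted sum of one segment's normalized Q vector."""
    return sum(w * q for w, q in zip(weights, normalized_q))


weights = (0.8, 0.2)  # preference for (time, cost)
normalized = {
    "k": (0.195, 0.388),
    "k1": (0.253, 0.306),
    "k2": (0.310, 0.306),
}
sq = {node: scalarize(vec, weights) for node, vec in normalized.items()}
for node, value in sq.items():
    print(node, round(value, 4))
```

Rounded to four decimals these agree with the embodiment's values 0.2336, 0.2636 and 0.3092.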

Step 3.3: Compute the Boltzmann probability distribution: using the vehicle's current information obtained in Step 2 and the preference-based scalar values SQ_d(i, j), compute the Boltzmann probability distribution over the segments adjacent to the current traffic node, as follows:

where P_d(i, j) is the probability that a vehicle with destination d selects road segment s_ij; i and j are traffic nodes; A(i) is the set of end nodes of the road segments that start at traffic node i, obtained from the road network topology as the end nodes of the segments adjacent to the current node; ε is the traffic congestion coefficient; and ESQ_d(i) is the average of the preference-based scalar values SQ_d(i, j) of the segments around node i for destination d.
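The exact Boltzmann expression is not reproduced in this text, so the following is only a sketch under a stated assumption: that the probability is proportional to exp(−SQ_d(i, j) / (ε · ESQ_d(i))), with ε · ESQ_d(i) acting as the temperature. This form is consistent with the symbols defined above (lower scalar cost gives higher probability, and a larger ε spreads traffic more evenly), but it is not confirmed by the source and does not exactly reproduce the probabilities quoted in the embodiment. The function name is hypothetical.

```python
import math


def boltzmann_probs(sq, eps):
    """Selection probabilities over candidate next nodes.

    sq:  dict mapping each candidate next node to its scalar value SQ
    eps: traffic congestion coefficient
    Assumed form: p(j) proportional to exp(-SQ(j) / (eps * mean(SQ))).
    """
    mean_sq = sum(sq.values()) / len(sq)
    temperature = eps * mean_sq
    if temperature == 0:
        # Degenerate case: fall back to a uniform distribution.
        return {node: 1.0 / len(sq) for node in sq}
    scores = {node: math.exp(-value / temperature) for node, value in sq.items()}
    total = sum(scores.values())
    return {node: score / total for node, score in scores.items()}


probs = boltzmann_probs({"k": 0.2336, "k1": 0.2636, "k2": 0.3092}, eps=0.8)
print(probs)
```

Dividing by the mean scalar value makes the temperature relative to the local spread of costs, so the same ε behaves comparably at nodes whose segments have very different absolute Q values.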

Step 3.4: Select the next road segment that matches the driver's personal preference: based on the Boltzmann probability distribution over the segments computed in Step 3.3, select the driver's next road segment by roulette wheel selection;

Step 3.5: If the vehicle has not reached its destination, repeat Steps 3.2 to 3.3 until it does.
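Roulette wheel selection in Step 3.4 draws one candidate with probability equal to its share of the wheel. A minimal sketch follows, with a hypothetical function name; the probabilities are the ones quoted in the worked embodiment, and the fixed random source exists only to make the example reproducible.

```python
import random


def roulette_select(probs, rng=random):
    """Pick one key of probs with probability equal to its value.

    probs: non-empty dict of candidate -> probability (values sum to about 1)
    rng:   any object with a random() method returning a float in [0, 1)
    """
    r = rng.random()
    cumulative = 0.0
    chosen = None
    for candidate, p in probs.items():
        chosen = candidate
        cumulative += p
        if r < cumulative:
            break
    # If rounding leaves cumulative slightly below r, the last candidate is used.
    return chosen


class FixedRandom:
    """Stand-in random source that always returns the same number."""

    def __init__(self, value):
        self.value = value

    def random(self):
        return self.value


probs = {"k": 0.3705, "k1": 0.3387, "k2": 0.2908}
print(roulette_select(probs, FixedRandom(0.5)))  # 0.5 falls in the second slice
print(roulette_select(probs))                    # random draw
```

Sampling instead of always taking the lowest-cost segment distributes vehicles with the same destination over several routes, which is how the method avoids sending everyone onto one road.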

Advantageous effects:

1. The dynamic route guidance method based on multi-objective Sarsa learning makes full use of the real-time information of the current traffic system and optimizes vehicle routes according to current traffic conditions, improving traffic system efficiency and alleviating congestion.

2. The method performs dynamic route guidance for several guidance objectives at once, which from a practical standpoint better matches real-world guidance needs.

3. The method takes drivers' guidance preferences into account and provides each driver with a dynamic guidance path that matches personal preference, which raises acceptance of the guidance path, further improves the traffic efficiency of the system, and alleviates congestion.

Brief description of the drawings

Fig. 1 is a flow chart of the dynamic route guidance method based on multi-objective Sarsa learning according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of dynamic route guidance according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of vehicle route computation according to an embodiment of the present invention;

Fig. 4 is a schematic comparison of traffic congestion under the embodiment of the present invention and under a traditional guidance method.

Detailed description of the embodiments

The invention is further described below with reference to the drawings and a specific embodiment. The information exchange between the dynamic route guidance system and the vehicles is shown in Fig. 2. Vehicles in the road network send data such as their position, destination, and personal preference to the dynamic route guidance system. From these data and the collected real-time traffic conditions of the road network, the system uses the route guidance algorithm to compute a guidance path that matches personal preference and sends it to the vehicle, completing the information exchange between the two sides. A dynamic route guidance method based on multi-objective Sarsa learning comprises Step 1 to Step 3, as shown in Fig. 1:

Step 1: Information initialization, comprising Step 1.1 to Step 1.3:

Step 1.1: Confirm the guidance objectives: travel time (time) and travel cost (cost);

Step 1.2: For the guidance objectives, the traffic information center uses a Q-value-based dynamic programming algorithm, together with the road network information in the geographic information database and the historically collected static data of each road segment, to initialize for each guidance objective the Q vector table of every candidate destination on the road network; each Q vector table corresponds to one candidate destination. First, the Q vectors of the road segments around each possible destination d are initialized as follows:

where the initialized quantity is the initial Q vector of reaching destination d via road segment s_ij; i and j are traffic nodes; time_ij and cost_ij are the time and cost for a vehicle to traverse road segment s_ij; D is the set of destinations; A(i) is the set of end nodes of the road segments that start at traffic node i; and B(i) is the set of start nodes of the road segments that end at traffic node i.

Then the Q vectors of all road segments for destination d are updated over several iterations, using the following update formula:

where the iterated quantity is the Q vector of road segment s_ij for destination d at the n-th iteration, and k is a traffic node adjacent to traffic node j.
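The initialization and iteration formulas are not reproduced in this text. The sketch below assumes the usual Q-value-based dynamic programming recursion, applied to one objective at a time: a segment that ends at the destination starts from its own cost, and every other segment is repeatedly set to its own cost plus the smallest Q value among the segments leaving its end node. The function name, the convergence check, and the toy network are all hypothetical.

```python
import math


def init_q_table(edge_costs, dest, max_iters=100):
    """Initialise the Q table of one objective for one destination.

    edge_costs: dict mapping a segment (i, j) to its static cost
    dest:       destination node d
    Assumed recursion: Q(i, j) = cost(i, j) + min over k of Q(j, k),
    with Q(i, d) = cost(i, d).
    """
    successors = {}
    for (i, j) in edge_costs:
        successors.setdefault(i, []).append(j)

    # Segments entering the destination are known; the rest start at infinity.
    q = {
        (i, j): (cost if j == dest else math.inf)
        for (i, j), cost in edge_costs.items()
    }
    for _ in range(max_iters):
        updated = {}
        for (i, j), cost in edge_costs.items():
            if j == dest:
                updated[(i, j)] = cost
            else:
                best_next = min(
                    (q[(j, k)] for k in successors.get(j, [])), default=math.inf
                )
                updated[(i, j)] = cost + best_next
        if updated == q:  # no value changed, so the table has converged
            break
        q = updated
    return q


# Toy network with travel times in seconds (made-up numbers).
times = {("a", "b"): 30, ("b", "d"): 40, ("a", "d"): 100, ("b", "a"): 30}
q_time = init_q_table(times, dest="d")
print(q_time)
```

Running the same routine once on the time costs and once on the monetary costs gives the two components of each initial Q vector in the embodiment.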

The static data of each road segment include: historical vehicle travel time and cost;

As shown in Fig. 3, taking a vehicle v whose destination is d and which is located at traffic node j as an example, the dynamic guidance process is as follows:

(1) Define the guidance objective weights: record the current information of all vehicles in the road network, the real-time traffic information of the current road segments, and the preference of each driver currently in the road network. Assuming there are n guidance objectives in total, the preference of each driver is denoted by the weight vector ω = (ω₁, ..., ω_n), where ω_o ∈ [0, 1] is the preference weight of the o-th guidance objective. The weights of the guidance objectives are defined as:

The real-time traffic information of the current road segment includes: travel time and cost;

Record the current information of vehicle v: position (traffic node j), expected destination (traffic node d), and all next traffic nodes that can be reached (k, k′, k″); also record the real-time traffic information of the current road segment s_ij, such as travel time and cost, and the driver's preference. The preference weight vector of each driver is ω = (0.8, 0.2), where 0.8 and 0.2 are the preference weights of the time and cost guidance objectives respectively.

(2) Compute the current road network traffic congestion coefficient ε from the number of vehicles NV in the current road network, where β and γ are parameters. The traffic congestion coefficient ε indicates the current traffic condition of the system: its value increases as the total number of vehicles NV in the road network increases, so a larger ε means heavier congestion, and vice versa. Here β and γ are set to 0.3 and 0.005 respectively; assuming the number of vehicles NV in the current road network is 500, ε = 0.8.

(3) Every interval T, use the real-time information, obtained in (1), of the vehicles on each road segment closest to the update time (for example, the immediate travel time and immediate cost of vehicle v on road segment s_ij), together with the next road segment s_jk assigned by the path selection method of Step 3. Assume that the learning rate α is 0.7 and that the two Q vectors entering the update currently have the values (250 s, 21 $) and (200 s, 20 $) in the Q vector table. The Q vector table of destination d is then updated for each guidance objective separately by the Sarsa learning method. The Sarsa learning formula is as follows:

where the updated quantity is the Q vector of travelling from traffic node i via adjacent traffic node j to destination d.

Step 3: Guidance path computation, comprising Step 3.1 to Step 3.5:

Step 3.1: Normalize the Q vector table: based on the Q vector table updated in Step 2, normalize the Q values of each guidance objective separately by min-max normalization, which resolves the problem that different guidance objectives have different units and dimensions, as follows:

The normalized values are obtained from the values in the Q vector table and Step 2. They are then used to update the normalized Q vector of road segment s_ij in the normalized Q vector table for destination d.

Step 3.2: Compute the scalar value based on driver preference: using the driver's preference obtained in Step 2, that is, the weight vector ω, and the Q vector table normalized in Step 3.1, apply the linear scalarization function to convert the Q vectors of all segments adjacent to the current traffic node of vehicle v, in the Q vector table for destination d, into the preference-based scalar values SQ_d(i, j). According to Fig. 3, the computation is as follows:

SQ_d(j, k)=0.8 × 0.195+0.2 × 0.388=0.2336

SQ_d(j, k ')=0.8 × 0.253+0.2 × 0.306=0.2636

SQ_d(j, k ")=0.8 × 0.310+0.2 × 0.306=0.3092

From these scalar values, the selection probabilities are computed as p_d(j, k) = 0.3705, p_d(j, k′) = 0.3387, and p_d(j, k″) = 0.2908.

Fig. 4 compares the present invention with a traditional guidance method with respect to traffic congestion. The horizontal axis is the simulation time step and the vertical axis is the current total number of vehicles in the road network; the more vehicles, the more congested the network. In the comparison, Dijk denotes the traditional route guidance method and SMOSWU denotes the method of the present invention. Compared with the traditional Dijkstra-based route guidance method, the proposed dynamic route guidance method based on multi-objective Sarsa learning makes full use of real-time traffic information while taking individual user preference into account, improving the efficiency of the traffic system and effectively relieving congestion.

Claims

1. A dynamic route guidance method based on multi-objective Sarsa learning, characterized by comprising the following process:

Step 1: Information initialization, comprising Step 1.1 to Step 1.3:

Step 1.1: Confirm the guidance objectives, selecting one or more of: minimizing travel time, minimizing travel distance, and minimizing cost;

Step 1.2: For the guidance objectives, the traffic information center uses a Q-value-based dynamic programming algorithm, together with the road network information in the geographic information database and the historically collected static data of each road segment, to initialize for each guidance objective the Q vector table of every candidate destination on the road network; each Q vector table corresponds to one candidate destination;

Step 2: Information update, comprising: defining the guidance objective weights, computing the current road network traffic congestion coefficient, and updating the Q vector table by the Sarsa learning method every interval T:

(1) Define the guidance objective weights:

Record the current information of all vehicles in the road network, the real-time traffic information of the current road segments, and the preference of each driver currently in the road network. Assuming there are n guidance objectives in total, the preference of each driver is denoted by the weight vector ω = (ω₁, ..., ω_n), where ω_o ∈ [0, 1] is the preference weight of the o-th guidance objective. The weights of the guidance objectives are defined as:

Each driver defines how much he or she cares about each guidance objective; this degree of concern is the driver's preference and is recorded as the weight;

(2) Compute the current road network traffic congestion coefficient: count the number of vehicles NV in the current road network, and compute from it the current traffic congestion coefficient ε:

where β and γ are parameters, and the traffic congestion coefficient ε indicates the current traffic condition of the system;

(3) Every interval T, update the Q vector table by the Sarsa learning method: using the real-time information, obtained in (1), of the vehicles on each road segment closest to the update time, together with the next road segment assigned by Step 3.3 and Step 3.4, update the Q vector table of the corresponding destination for each guidance objective o separately. The Sarsa learning formula is as follows:

where the Q value being updated is that of travelling, under guidance objective o, from traffic node i via adjacent traffic node j to destination d; k is a traffic node adjacent to traffic node j; α is the learning rate; and the reward is the actual reward value obtained by vehicle v on passing road segment s_ij;

Step 3: Guidance path computation, comprising Step 3.1 to Step 3.5:

Step 3.1: Normalize the Q vector table: based on the Q vector table updated in Step 2, normalize the Q values of each guidance objective separately by min-max normalization, as follows:

where the normalized quantity is the normalized Q value of road segment s_ij for guidance objective o and destination d, and the minimum and maximum are taken over the Q values of all road segments for destination d and guidance objective o;

Step 3.2: Compute the scalar value based on driver preference: using the driver's preference obtained in Step 2, that is, the weight vector ω, and the Q vector table normalized in Step 3.1, apply a linear scalarization function to convert the Q vectors of all segments adjacent to the vehicle's current traffic node, in the Q vector table for destination d, into the preference-based scalar value SQ_d(i, j), as follows:

where n is the number of guidance objectives, ω_o is the preference weight of objective o, and the term weighted by ω_o is the normalized Q value of road segment s_ij for objective o and destination d;

Step 3.3: Compute the Boltzmann probability distribution: using the vehicle's current information obtained in Step 2 and the preference-based scalar values SQ_d(i, j), compute the Boltzmann probability distribution over the segments adjacent to the current traffic node, as follows:

where P_d(i, j) is the probability that a vehicle with destination d selects road segment s_ij; i and j are traffic nodes; A(i) is the set of end nodes of the road segments that start at traffic node i, obtained from the road network topology as the end nodes of the segments adjacent to the current node; ε is the traffic congestion coefficient; and ESQ_d(i) is the average of the preference-based scalar values SQ_d(i, j) of the segments around node i for destination d;

Step 3.4: Select the next road segment that matches the driver's personal preference: based on the Boltzmann probability distribution over the segments computed in Step 3.3, select the driver's next road segment by roulette wheel selection;

2. The dynamic route guidance method based on multi-objective Sarsa learning according to claim 1, characterized in that the road network information in Step 1 includes: the road network topology, segment lengths, and the number of lanes.

3. The dynamic route guidance method based on multi-objective Sarsa learning according to claim 1, characterized in that the static data of each road segment in Step 1 include: historical vehicle travel time, distance, and cost.

4. The dynamic route guidance method based on multi-objective Sarsa learning according to claim 1, characterized in that the current information of all vehicles in Step 2 includes: position, expected destination, and all next traffic nodes that can be reached.

5. The dynamic route guidance method based on multi-objective Sarsa learning according to claim 1, characterized in that the real-time traffic information of the current road segment in Step 2 includes: travel time, distance, and cost.

6. The dynamic route guidance method based on multi-objective Sarsa learning according to claim 1, characterized in that the actual reward value in Step 2 is exactly one of: travel time, distance, or cost.