CN109579861B

CN109579861B - Path navigation method and system based on reinforcement learning

Info

Publication number: CN109579861B
Application number: CN201811504732.9A
Authority: CN
Inventors: 余辰; 金海; 谢晓然; 邹俊峰; 郝童博
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2018-12-10
Filing date: 2018-12-10
Publication date: 2020-05-19
Anticipated expiration: 2038-12-10
Also published as: CN109579861A

Abstract

The invention discloses a path navigation method and system based on reinforcement learning. The method comprises the following steps: constructing a road adjacency graph of a city according to map data of the city; predicting congestion indexes of different road sections of the city in different time periods according to vehicle track data and the road adjacency graph; constructing a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph; and generating a navigation path based on reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map. The urban road congestion condition is first quantified and then converted into a probability, which is more intuitive and easier to visualize; the road congestion calculation uses only road conditions and historical vehicle track data, and is therefore easy to put into practice; unlike general obstacle-based path finding, probability-based path finding uses more precise values and can find routes that a general path-finding algorithm cannot; reinforcement learning, used as a heuristic algorithm, considers both the time consumption and the smoothness of the route so as to obtain a globally optimal solution, which improves the accuracy of the path-finding algorithm.

Description

Path navigation method and system based on reinforcement learning

Technical Field

The invention belongs to the technical field of path navigation, and particularly relates to a path navigation method and system based on reinforcement learning.

Background

Using mobile phone navigation to find efficient driving routes has become part of daily life. A good driving route saves not only the driver's time but also energy. The wide use of GPS devices makes it easy to acquire detailed road information of a city, such as traffic volume and speed. Such data provides extremely important guidance for path navigation.

In the prior art, patent CN108847037A discloses an urban road network path planning method oriented to non-global information, which uses reinforcement learning to give the distribution of traffic flow over the road network the capability of adaptive adjustment, so that the road network stays in a flow-balanced state. However, the A*-R routing algorithm in that method estimates the time from the current position to the target position only coarsely in its evaluation function, so its accuracy is insufficient and its time and space complexity is high. Another prior work proposes a dynamic real-time multi-intersection path selection model for an urban traffic network, which combines each vehicle's preference among the selectable routes ahead with the real-time traffic states of those routes, and uses an adaptive learning algorithm to play a game so that the dynamic route selection strategies of all running vehicles reach a Nash equilibrium. However, that method requires multiple assumptions about the application scenario (for example, each vehicle routes independently according to a certain fixed probability, and each vehicle can observe the routing of other vehicles) and considers too many factors (for example, road illumination, road flatness and other indexes that are difficult to measure), so it is difficult to implement.

In summary, the existing path navigation methods based on reinforcement learning all have the problems that the application scene of the algorithm needs a plurality of assumptions as a premise, and the space-time complexity is too high.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to solve the problems that prior-art routing algorithms require many preconditions and have incomplete decision functions.

In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a method for navigating a route based on reinforcement learning, where the method includes the following steps:

S1, constructing a road adjacency graph of a city according to map data of the city;

S2, predicting congestion indexes of different road sections of the city in different time periods according to vehicle track data and the road adjacency graph;

S3, constructing a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph;

S4, generating a navigation path based on reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map.

Specifically, in the road adjacency graph, the vertices are the common endpoints of the roads, the edges are the roads, and each vertex holds a set of other vertices that it can reach.

Specifically, step S2 includes the steps of:

S201, mapping the track data of the vehicle to the road adjacency graph, and establishing a correspondence between the track data of the vehicle and the roads;

S202, calculating the congestion index of the current road according to the road type of the current road.

Specifically, the step S201 specifically includes the following steps:

(1) extracting inflection points of the vehicle track;

(2) calculating the perpendicular distance between the inflection point and each edge in the road adjacency graph, and taking the edge with the minimum distance as the current edge;

(3) mapping the inflection point to the closest vertex of the current edge;

(4) calculating, in time order, the speed between consecutive inflection points of the track from the distance between the two inflection points and the time difference between them, and taking this speed as the speed of the vehicle on the road section in the current hour.
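Step (4) can be illustrated with a minimal Python sketch. The function names `haversine_m` and `hourly_speeds`, the use of the great-circle (haversine) distance, and the convention that timestamps are seconds and that a segment belongs to the hour in which it starts are all assumptions made for illustration; the source does not specify them.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def hourly_speeds(inflections):
    """Average speed (m/s) per hour bucket for one time-ordered track.

    inflections: list of (timestamp_s, lat, lon).
    Assumption: a segment is assigned to the hour in which it starts.
    """
    buckets = defaultdict(list)
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(inflections, inflections[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        speed = haversine_m((lat0, lon0), (lat1, lon1)) / dt
        buckets[int(t0 // 3600)].append(speed)
    return {hour: sum(v) / len(v) for hour, v in buckets.items()}


track = [(0, 30.0, 114.0), (100, 30.0, 114.01)]
print(hourly_speeds(track))  # one entry for hour bucket 0
```

In practice the distance could also be taken along the matched road edges rather than as a straight line; the straight-line form is used here only to keep the sketch short.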

Specifically, step S202 specifically includes the following steps:

(1) setting corresponding weights of different road types;

(2) predicting the congestion index of the current road in different time periods by using the average vehicle speed and the number of passing vehicles of the current road section in each hour, the weight corresponding to the type of the road, and the travel time ratio.

Specifically, step S3 includes the steps of:

S301, converting the congestion index into the congestion probability at the current moment through a Logistic function;

S302, mapping the congestion probability onto the edges of the road adjacency graph as weights, and generating an urban road congestion probability map for each hour.
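Steps S301 and S302 can be sketched as follows. The source only states that a Logistic function is used; the parameters `midpoint` and `steepness`, their default values, and the dictionary layout of the hourly maps are hypothetical choices for illustration.

```python
from math import exp


def congestion_probability(index, midpoint=5.0, steepness=1.0):
    """Logistic mapping of a congestion index to a probability in [0, 1].

    midpoint and steepness are assumed parameters, not values from the source.
    The two branches avoid overflow for large negative or positive inputs.
    """
    z = steepness * (index - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def hourly_probability_maps(index_by_hour):
    """Weight each edge with its congestion probability, one map per hour.

    index_by_hour: hour -> {(u, v): congestion index}
    returns:       hour -> {(u, v): congestion probability}
    """
    return {
        hour: {edge: congestion_probability(ix) for edge, ix in edges.items()}
        for hour, edges in index_by_hour.items()
    }


maps = hourly_probability_maps({8: {("a", "b"): 9.0}, 3: {("a", "b"): 1.0}})
print(maps[8][("a", "b")] > maps[3][("a", "b")])  # True
```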

Specifically, step S4 includes the steps of:

S401, defining the state space as a three-dimensional space comprising the urban road congestion probability map and time, defining an action as selecting an adjacent edge from the current vertex to reach the next vertex, and defining the reward function as the expected time consumed by the path from the starting point to the current vertex;

S402, selecting the action strategy: for a given vertex, the edge with the minimum expected time consumption for reaching that vertex is selected as the direction from which the vertex is reached;

S403, after the search has expanded to the end point, following parent nodes back from the end point until the starting point is reached, the resulting route from the starting point to the end point being the navigation path.

In order to achieve the above object, according to a second aspect, an embodiment of the present invention provides a reinforcement learning-based path navigation system, which includes a server and a client,

the server comprises: a road adjacency graph construction module, a congestion index prediction module, a road congestion probability map construction module and a navigation path generation module;

the road adjacency relation graph construction module is used for constructing a road adjacency relation graph of a city according to map data of the city;

the congestion index prediction module is used for predicting congestion indexes of different sections of the city in different time periods according to vehicle track data and a road adjacency graph;

the road congestion probability map building module is used for building a road congestion probability map of the city according to the congestion index on the basis of the road adjacency map;

the navigation path generation module is used for generating a navigation path based on reinforcement learning and sending the navigation path to a client for path navigation, and the reinforcement learning state space comprises the road congestion probability map;

the client comprises: a navigation module, a guide module and a track data extraction module;

the navigation module is used for acquiring a navigation route from a server;

the track data extraction module is used for acquiring track data of the vehicle;

and the guide module is used for indicating the current position and the advancing direction of the vehicle owner according to the navigation route and the track data.

Specifically, the trajectory data extraction module of the client may also feed back the acquired trajectory data to the congestion index prediction module of the server in real time.

In order to achieve the above object, according to a third aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the path navigation method according to the first aspect.

Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:

1. The invention first quantifies the urban road congestion condition and then converts it into a probability. When the client displays the road condition, the probability that a road is congested is easier for people to understand than a raw numerical index, and is also easier to visualize. Meanwhile, the road congestion calculation uses only road conditions and historical vehicle track data, and is therefore easy to put into practice.

2. The invention designs a path-finding algorithm based on reinforcement learning. General obstacle-based path finding distinguishes only passable from impassable, with no other options; probability-based path finding uses more precise values and can therefore find routes that a general path-finding algorithm cannot. Reinforcement learning, used as a heuristic algorithm, considers the time consumption and smoothness of the route from a global view so as to obtain a globally optimal solution. Unlike the A* algorithm, it does not need to estimate the cost from the current position to the target position, which improves the accuracy of the path-finding algorithm.

Drawings

Fig. 1 is a flowchart of a method for navigating a route based on reinforcement learning according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a path navigation system based on reinforcement learning according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, a method for navigating a route based on reinforcement learning includes the following steps:

S1, constructing a road adjacency relation graph of a city according to map data of the city.

The map data used in the embodiment of the invention is OpenStreetMap. The map data includes: road id, road type, whether bi-directional travel is possible, and the set of endpoints that the current road contains. Each endpoint has latitude and longitude information. Each end point can be shared by multiple roads. By the end points, the adjacency relation of the roads can be obtained, and the length of the roads can also be calculated. The purpose of constructing the road adjacency graph is to obtain road network information so as to map the track data of vehicles onto real roads, thereby predicting and navigating the road traffic condition.

S101, extracting road information from map data.

Road information is extracted from OpenStreetMap, and includes road adjacency information, road type and road length. The road types include: highway, branch road, first-level road, second-level road, third-level road and residential area.

S102, constructing a road adjacency graph according to the road information.

The road adjacency graph is constructed according to the end-to-end adjacency of the roads. In the road adjacency graph, the vertices are the common endpoints of the roads, the edges are the roads, and each vertex stores the set of other vertices that it can reach. The graph is a directed graph.
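The construction described above can be sketched in Python. The input layout (a list of road records with `id`, `type`, `bidirectional` and an ordered `endpoints` list, plus a separate coordinate table) is a hypothetical simplification of the OpenStreetMap data, and the haversine formula is an assumed way of computing road length from endpoint coordinates.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def build_road_graph(roads, coords):
    """Build a directed adjacency structure: vertex -> {reachable vertex: edge attributes}.

    roads:  list of dicts with keys id, type, bidirectional, endpoints (assumed layout)
    coords: endpoint id -> (lat, lon)
    """
    graph = {}
    for road in roads:
        pts = road["endpoints"]
        for u, v in zip(pts, pts[1:]):
            edge = {
                "road_id": road["id"],
                "type": road["type"],
                "length_m": haversine_m(coords[u], coords[v]),
            }
            graph.setdefault(u, {})[v] = edge
            graph.setdefault(v, {})  # every endpoint is a vertex, even a dead end
            if road["bidirectional"]:
                graph[v][u] = dict(edge)  # reverse direction for two-way roads
    return graph


coords = {"a": (30.0, 114.0), "b": (30.0, 114.01), "c": (30.01, 114.01)}
roads = [
    {"id": 1, "type": "first_level", "bidirectional": True, "endpoints": ["a", "b"]},
    {"id": 2, "type": "residential", "bidirectional": False, "endpoints": ["b", "c"]},
]
graph = build_road_graph(roads, coords)
print(sorted(graph["b"]))  # ['a', 'c']
```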

S2, predicting the congestion indexes of different road sections of the city in different time periods according to the vehicle track data and the road adjacency graph.

The vehicle trajectory data may be collected in real time or may be an off-line data set. The trajectory data includes: track id, vehicle id, current longitude of the vehicle, current latitude of the vehicle and current time information. The vehicle may be a taxi, a private car, or the like.

Data cleaning may be performed on the collected trajectory data before the congestion index is predicted. Data cleaning refers to removing repeated or missing track data. When the GPS signal is interrupted or the vehicle drives up to an intersection, the GPS receiver may keep collecting a large amount of identical or nearly identical data within a short period. Such redundant data directly reduces the efficiency of the algorithm. When a vehicle moves inside a building or a forest, or when the GPS signal is interrupted and another positioning method such as base-station positioning is used, the vehicle position can drift, producing a large number of noise points and distorting the track. Therefore, redundant data and drift data need to be removed to correct the trajectory data.
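A minimal sketch of such cleaning for a single vehicle is given below. The thresholds `min_move_m` and `max_speed_mps`, and the rule that a point implying an implausible speed relative to the last kept point is treated as drift, are assumptions for illustration; the source does not state how redundant or drifting points are detected.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def clean_track(points, min_move_m=5.0, max_speed_mps=50.0):
    """Drop missing, redundant and drifting points from one vehicle's track.

    points: list of (timestamp_s, lat, lon); any field may be None.
    Both thresholds are assumed values, not taken from the source.
    The first valid point is trusted as a reference, which is a simplification.
    """
    valid = sorted(p for p in points if None not in p)  # remove missing records
    cleaned = []
    for t, lat, lon in valid:
        if not cleaned:
            cleaned.append((t, lat, lon))
            continue
        t0, lat0, lon0 = cleaned[-1]
        dt = t - t0
        dist = haversine_m((lat0, lon0), (lat, lon))
        if dt <= 0 or dist < min_move_m:
            continue  # redundant: same time or practically the same place
        if dist / dt > max_speed_mps:
            continue  # drift: implied speed is implausible
        cleaned.append((t, lat, lon))
    return cleaned


raw = [
    (0, 30.0, 114.0),
    (1, 30.0, 114.0),      # duplicate position
    (10, 30.0, 114.001),
    (11, 31.0, 115.0),     # drift
    (20, None, 114.002),   # missing latitude
    (30, 30.0, 114.002),
]
print(len(clean_track(raw)))  # 3
```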

S201, mapping the track data of the vehicle to a road adjacency graph, and establishing a corresponding relation between the vehicle track data and the road. The method specifically comprises the following steps:

(1) Inflection points of the vehicle track are extracted.

The inflection points of the vehicle track are the characteristic points of the track.

(2) The perpendicular distance between the inflection point and each edge in the road adjacency graph is calculated, and the edge with the minimum distance is taken as the current edge.

(3) The inflection point is mapped to the vertex closest to the current edge.

After the vertex of the graph corresponding to the inflection point is determined, the side corresponding to the taxi track segment is also determined.
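Steps (2) and (3) can be sketched as follows. The sketch assumes a local equirectangular projection to metres, measures the distance to each edge as the distance to the line segment between its two vertices, and reads "the vertex closest to the current edge" as the endpoint of the current edge nearest to the inflection point; these are interpretations, and all function names are hypothetical.

```python
from math import cos, hypot, radians

EARTH_RADIUS_M = 6_371_000.0


def to_xy(lat, lon, lat_ref):
    """Equirectangular projection to metres, adequate at city scale."""
    return (radians(lon) * EARTH_RADIUS_M * cos(radians(lat_ref)),
            radians(lat) * EARTH_RADIUS_M)


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all given as (x, y) in metres."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        t = 0.0
    else:
        # Clamp the foot of the perpendicular to the segment.
        t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def match_inflection_point(point, edges, coords):
    """Return (current_edge, matched_vertex) for one inflection point.

    point:  (lat, lon)
    edges:  list of (u, v) vertex pairs
    coords: vertex -> (lat, lon)
    """
    lat_ref = point[0]
    p = to_xy(point[0], point[1], lat_ref)
    xy = {v: to_xy(lat, lon, lat_ref) for v, (lat, lon) in coords.items()}
    u, v = min(edges, key=lambda e: point_segment_distance(p, xy[e[0]], xy[e[1]]))
    vertex = min((u, v), key=lambda n: hypot(p[0] - xy[n][0], p[1] - xy[n][1]))
    return (u, v), vertex


coords = {"a": (30.0, 114.0), "b": (30.0, 114.01), "c": (30.01, 114.0)}
edges = [("a", "b"), ("a", "c")]
print(match_inflection_point((30.0002, 114.008), edges, coords))  # (('a', 'b'), 'b')
```

A linear scan over all edges is used for brevity; a spatial index would normally be needed for a whole city.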

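S202, calculating the congestion index of the current road according to the road type of the current road. The method specifically comprises the following steps:

(1) Corresponding weights for different road types are set.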

Road type	Weight
Highway, branch road	5
First-level road	4
Second-level road	3
Third-level road	1
Residential area	0.5

(2) The congestion index of the current road in different time periods is predicted by using the average vehicle speed and the number of passing vehicles of the current road section in each hour, the weight corresponding to the type of the road, and the travel time ratio (the length of the road section divided by the time the current vehicle takes to traverse the road section).

The predictive model may be a neural network model, a decision tree, or a Logistic regression. The congestion condition of the same road section differs between time periods, for example between working days and rest days, or between commuting peak hours and other hours. Every road of the current city is predicted, so that the road congestion index of the whole city is predicted at a time granularity of one hour. The congestion index is used to reflect the road environment.
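The inputs named in step (2) can be assembled into one feature vector per road section and hour, as in the sketch below. The dictionary keys, the function name `road_hour_features` and the averaging of the travel time ratio over all traversals are assumptions; the source lists the inputs but not how they are encoded, and the choice and training of the predictive model are left out here.

```python
# Road-type weights from the table above; the key names are hypothetical labels.
ROAD_TYPE_WEIGHT = {
    "highway_or_branch": 5.0,
    "first_level": 4.0,
    "second_level": 3.0,
    "third_level": 1.0,
    "residential": 0.5,
}


def road_hour_features(road_type, length_m, traversals):
    """Feature vector for one road section in one hour.

    traversals: list of (speed_mps, travel_time_s), one entry per passing vehicle.
    Returns [average speed, vehicle count, road-type weight, mean travel time ratio],
    or None when no usable traversal was observed in that hour.
    """
    valid = [(s, t) for s, t in traversals if t > 0]
    if not valid:
        return None
    n = len(valid)
    avg_speed = sum(s for s, _ in valid) / n
    # Travel time ratio: section length divided by the time taken to traverse it.
    avg_ratio = sum(length_m / t for _, t in valid) / n
    return [avg_speed, float(n), ROAD_TYPE_WEIGHT[road_type], avg_ratio]


print(road_hour_features("first_level", 1000.0, [(10.0, 100.0), (5.0, 200.0)]))
# [7.5, 2.0, 4.0, 7.5]
```

Such vectors, collected per hour and per road, would then be passed to whichever predictive model is chosen.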

S3, constructing a road congestion probability map of the city according to the congestion index on the basis of the road adjacency graph.

After the road congestion index has been predicted, the traffic condition of the urban roads over a future period of time is known, and the urban road congestion probability map is generated from it through the following steps:

S301, converting the congestion index into the congestion probability at the current moment through a Logistic function.

The urban road congestion probability map comprises the following contents: each vertex represents an intersection between road sections, each edge represents a road section, and each edge carries a congestion probability and a road length. The urban road congestion probability map is based on the road adjacency graph, with the following contents added: the current time, and the congestion probability of each edge at the current time.

S401, a state space is defined to be a three-dimensional space comprising an urban road congestion probability graph and time, an action is defined to be that an adjacent edge is selected from a vertex where the current position is located to reach the next vertex, and a reward function is defined to be the expectation of the time consumed by a path from a starting point to the current vertex.

S402, selecting the action strategy: for a given vertex, the edge with the minimum expected time consumption for reaching that vertex is selected as the direction from which the vertex is reached.

For a vertex, there are multiple directions to reach the point. And selecting the direction with the expected minimum time consumption for reaching the point as the parent node of the vertex. Through greedy value iteration, for any point, the path from the starting point to the point is the shortest time-consuming path.
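One possible reading of S401 to S403 is sketched below. It is not confirmed by the source as the exact procedure: a Dijkstra-style relaxation with a priority queue is swapped in for the "greedy value iteration", the expected time of an edge is assumed to be a two-outcome expectation over a congested speed and a free-flow speed (`jam_mps` and `free_mps` are hypothetical values), and the hour used to look up the congestion probability is derived from the departure time plus the expected time already consumed.

```python
import heapq


def expected_edge_time(length_m, p_congested, free_mps=13.9, jam_mps=2.8):
    """Expected traversal time in seconds; both speeds are assumed values."""
    return p_congested * length_m / jam_mps + (1.0 - p_congested) * length_m / free_mps


def plan_route(graph, prob_by_hour, start, goal, depart_s=0.0):
    """Return (path, expected_time_s), or (None, inf) if the goal is unreachable.

    graph:        vertex -> {neighbour: {"length_m": float}}
    prob_by_hour: hour of day -> {(u, v): congestion probability}
    """
    best = {start: 0.0}       # minimum expected time found so far per vertex
    parent = {start: None}    # direction from which each vertex is reached
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, float("inf")):
            continue  # stale queue entry
        if u == goal:
            break
        hour = int((depart_s + cost) // 3600) % 24
        for v, edge in graph.get(u, {}).items():
            p = prob_by_hour.get(hour, {}).get((u, v), 0.0)
            new_cost = cost + expected_edge_time(edge["length_m"], p)
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                parent[v] = u  # keep the edge with the minimum expected time
                heapq.heappush(heap, (new_cost, v))
    if goal not in parent:
        return None, float("inf")
    # S403: follow parent nodes back from the end point to the starting point.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], best[goal]


graph = {
    "s": {"a": {"length_m": 1000.0}, "b": {"length_m": 1500.0}},
    "a": {"g": {"length_m": 1000.0}},
    "b": {"g": {"length_m": 1500.0}},
    "g": {},
}
congested = {0: {("s", "a"): 0.9, ("a", "g"): 0.9}}
print(plan_route(graph, congested, "s", "g")[0])  # ['s', 'b', 'g']
```

In this example the shorter route through "a" is avoided because its high congestion probability makes its expected time larger than that of the longer route through "b".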

As shown in fig. 2, a reinforcement learning-based path navigation system includes a server and a client,

the navigation path generation module is used for generating a navigation path based on reinforcement learning and sending the navigation path to a client for path navigation, and the reinforcement learning state space comprises the road congestion probability map.

The client comprises: a navigation module, a guide module and a track data extraction module.

The navigation module is used for acquiring a navigation route from a server;

The server side can further comprise a data cleaning module used for cleaning the track data and eliminating repeated or missing track data before the congestion index is predicted.

The track data extraction module of the client can also feed back the acquired track data to the congestion index prediction module of the server in real time.

The preferred embodiments of the present invention are described in detail, but the scope of the present invention is not limited thereto, and any modifications or substitutions that can be easily made by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A path navigation method based on reinforcement learning is characterized by comprising the following steps:

S4, generating a navigation path based on reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map;

step S4 includes the following steps:

2. The path navigation method according to claim 1, wherein in the road adjacency graph, the vertices are the common endpoints of the roads, the edges are the roads, and each vertex holds a set of other vertices it can reach.

3. The path navigation method according to claim 1, wherein step S2 includes the steps of:

4. The path navigation method according to claim 3, wherein the step S201 specifically includes the steps of:

(1) extracting inflection points of the vehicle track;

(3) mapping the inflection point to a vertex closest to the current edge;

5. The path navigation method according to claim 3, wherein the step S202 specifically comprises the steps of:

(1) setting corresponding weights of different road types;

6. The path navigation method according to claim 1, wherein step S3 includes the steps of:

7. A path navigation system based on reinforcement learning, comprising a server and a client, characterized in that,

the navigation module is used for acquiring a navigation path from a server;

the guide module is used for indicating the current position and the advancing direction of the vehicle owner according to the navigation path and the track data;

the navigation path generation based on reinforcement learning is realized by the following steps:

(1) defining the state space as a three-dimensional space comprising the urban road congestion probability map and time, defining an action as selecting an adjacent edge from the current vertex to reach the next vertex, and defining the reward function as the expected time consumed by the path from the starting point to the current vertex;

(2) selecting the action strategy: for a given vertex, the edge with the minimum expected time consumption for reaching that vertex is selected as the direction from which the vertex is reached;

(3) after the search has expanded to the end point, following parent nodes back from the end point until the starting point is reached, the resulting route from the starting point to the end point being the navigation path.

8. The path navigation system of claim 7, wherein the trajectory data extraction module of the client is further configured to feed back the acquired trajectory data to the congestion index prediction module of the server in real time.

9. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, implements the path navigation method according to any one of claims 1 to 6.