CN109579861B - A Reinforcement Learning-Based Path Navigation Method and System - Google Patents

A Reinforcement Learning-Based Path Navigation Method and System

Info

Publication number
CN109579861B
Authority
CN
China
Prior art keywords
road
congestion
navigation
reinforcement learning
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811504732.9A
Other languages
Chinese (zh)
Other versions
CN109579861A (en)
Inventor
余辰
金海
谢晓然
邹俊峰
郝童博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201811504732.9A
Publication of CN109579861A
Application granted
Publication of CN109579861B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3453Special cost functions, i.e. other than distance or default speed limit of road segments
    • G01C21/3492Special cost functions, i.e. other than distance or default speed limit of road segments employing speed data or traffic data, e.g. real-time or historical
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34Route searching; Route guidance
    • G01C21/3446Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags or using precalculated routes

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Navigation (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a path navigation method and system based on reinforcement learning. The method comprises the following steps: constructing a road adjacency graph of a city from the city's map data; predicting the congestion indexes of different road sections of the city in different time periods from vehicle trajectory data and the road adjacency graph; constructing a road congestion probability map of the city from the congestion indexes on the basis of the road adjacency graph; and generating a navigation path by reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map. The quantified urban road congestion condition is converted into a probability, which is more intuitive and easier to visualize; the road congestion calculation uses only road conditions and historical vehicle trajectory data, so it is easy to put into practice; unlike general obstacle-based routing, probabilistic routing works with finer-grained values and can find routes that a general routing algorithm cannot; and reinforcement learning, used as a heuristic algorithm, takes both the time consumption and the smoothness of the route into account so as to obtain a globally optimal solution, which improves the accuracy of the routing algorithm.

Description

Path navigation method and system based on reinforcement learning
Technical Field
The invention belongs to the technical field of path navigation, and particularly relates to a path navigation method and system based on reinforcement learning.
Background
Using mobile phone navigation to find an efficient driving route has become part of daily life. A good driving route saves the driver time and also reduces energy consumption. The widespread use of GPS devices makes it easy to acquire detailed road information for a city, such as traffic volume and speed. Such data provides important guidance for path navigation.
In the prior art, patent CN108847037A discloses an urban road network path planning method oriented to non-global information, which uses reinforcement learning to let the distribution of traffic flow over the road network adjust adaptively, so that the road network reaches a flow equilibrium. However, the A*R routing algorithm in that method estimates the time from the current position to the target position only coarsely in its evaluation function, so its accuracy is insufficient and its time and space complexity is high. Yan et al. propose a dynamic real-time multi-intersection route selection model for urban traffic networks, which combines each vehicle's preference among the selectable routes ahead with the real-time traffic states of those routes, and uses an adaptive learning algorithm in a game so that the dynamic route selection strategies of all running vehicles reach a Nash equilibrium. However, that method requires many assumptions about the application scenario (for example, that each vehicle selects its route independently with a fixed probability and that each vehicle can observe the route choices of the other vehicles), considers too many factors (for example, indicators that are difficult to measure, such as road illumination and road flatness), and is difficult to implement.
In summary, existing reinforcement-learning-based path navigation methods share two problems: the algorithm's application scenario presupposes many assumptions, and the time and space complexity is too high.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to solve the problems that prior-art routing algorithms rely on many preconditions and have incomplete decision functions.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a path navigation method based on reinforcement learning, the method comprising the following steps:
S1. constructing a road adjacency graph of a city according to map data of the city;
S2. predicting congestion indexes of different road sections of the city in different time periods according to vehicle trajectory data and the road adjacency graph;
S3. constructing a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph;
S4. generating a navigation path based on reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map.
Specifically, in the road adjacency graph, the vertices are the common endpoints of the roads, the edges are the roads, and each vertex holds a set of other vertices that it can reach.
Specifically, step S2 includes the following steps:
S201. mapping the trajectory data of the vehicle onto the road adjacency graph, and establishing a correspondence between the vehicle trajectory data and the roads;
S202. calculating the congestion index of the current road according to the road type of the current road.
Specifically, step S201 includes the following steps:
(1) extracting the inflection points of the vehicle trajectory;
(2) calculating the perpendicular distance between an inflection point and each edge in the road adjacency graph, and taking the edge with the minimum distance as the current edge;
(3) mapping the inflection point to the vertex closest to the current edge;
(4) in chronological order, calculating the speed between trajectory inflection points from the distance and the time difference between consecutive inflection points, and taking it as the speed of the taxi on that road section in the current hour.
Specifically, step S202 includes the following steps:
(1) setting a weight for each road type;
(2) predicting the congestion index of the current road in different time periods using the hourly average vehicle speed and number of passing vehicles on the current road section, the weight of its road type, and the travel time ratio.
Specifically, step S3 includes the following steps:
S301. converting the congestion index into a congestion probability at the current moment through a logistic function;
S302. mapping the congestion probability onto the edges of the road adjacency graph as weights, and generating an urban road congestion probability map for each hour.
Specifically, step S4 includes the following steps:
S401. defining the state space as a three-dimensional space comprising the urban road congestion probability map and time, defining an action as selecting an adjacent edge from the current vertex to reach the next vertex, and defining the reward function as the expected time consumed by the path from the starting point to the current vertex;
S402. adopting the following action selection strategy: for a given vertex, selecting the edge with the minimum expected time to reach that vertex as the direction of arrival at that vertex;
S403. after the expansion reaches the end point, visiting parent nodes successively until the starting point is reached, the route from the starting point to the end point being the navigation path.
In order to achieve the above object, in a second aspect, an embodiment of the present invention provides a reinforcement learning-based path navigation system, which includes a server and a client.
The server comprises: a road adjacency graph construction module, a congestion index prediction module, a road congestion probability map construction module and a navigation path generation module;
the road adjacency graph construction module is used for constructing a road adjacency graph of a city according to map data of the city;
the congestion index prediction module is used for predicting congestion indexes of different road sections of the city in different time periods according to vehicle trajectory data and the road adjacency graph;
the road congestion probability map construction module is used for constructing a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph;
the navigation path generation module is used for generating a navigation path based on reinforcement learning and sending the navigation path to the client for path navigation, wherein the state space of the reinforcement learning comprises the road congestion probability map.
The client comprises: a navigation module, a guide module and a trajectory data extraction module;
the navigation module is used for acquiring the navigation path from the server;
the trajectory data extraction module is used for acquiring trajectory data of the vehicle;
the guide module is used for indicating the vehicle owner's current position and heading according to the navigation path and the trajectory data.
Specifically, the trajectory data extraction module of the client may also feed back the acquired trajectory data to the congestion index prediction module of the server in real time.
In order to achieve the above object, according to a third aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the path navigation method according to the first aspect.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1. The invention converts the quantified urban road congestion condition into a probability. When the client displays road conditions, the probability that an urban road is congested is more intuitive to people than a raw numerical value, and it is easier to visualize. In addition, the road congestion calculation uses only road conditions and historical vehicle trajectory data, so it is easy to put into practice.
2. The invention designs a routing algorithm based on reinforcement learning. General obstacle-based routing offers only two options for a road, passable or impassable. Probabilistic routing works with finer-grained values and can therefore find routes that a general routing algorithm cannot. Reinforcement learning, used as a heuristic algorithm, considers the time consumption and the smoothness of the route from a global perspective so as to obtain a globally optimal solution. Unlike the A* algorithm, it does not need to estimate the cost from the current position to the target position, which improves the accuracy of the routing algorithm.
Drawings
Fig. 1 is a flowchart of a method for navigating a route based on reinforcement learning according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a reinforcement learning-based path navigation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in Fig. 1, a path navigation method based on reinforcement learning includes the following steps:
S1. constructing a road adjacency graph of a city according to map data of the city;
S2. predicting congestion indexes of different road sections of the city in different time periods according to vehicle trajectory data and the road adjacency graph;
S3. constructing a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph;
S4. generating a navigation path based on reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map.
S1. Construct a road adjacency graph of a city according to map data of the city.
The map data used in the embodiment of the invention comes from OpenStreetMap. The map data includes: the road id, the road type, whether the road can be travelled in both directions, and the set of endpoints that the road contains. Each endpoint has latitude and longitude information, and an endpoint can be shared by multiple roads. From the endpoints, the adjacency of the roads can be obtained and the length of each road can be calculated. The road adjacency graph is constructed to obtain road network information, so that vehicle trajectory data can be mapped onto real roads and used to predict road traffic conditions and to navigate.
S101. Extract road information from the map data.
Road information is extracted from OpenStreetMap, including road adjacency information, road type and road length. The road types include: highway, branch road, first-level road, second-level road, third-level road and residential area.
S102. Construct the road adjacency graph from the road information.
The road adjacency graph is constructed from the end-to-end adjacency of the roads. In the road adjacency graph, the vertices are the common endpoints of roads, the edges are the roads, and each vertex stores the set of other vertices it can reach. The graph is directed.
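Steps S101 and S102 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the record keys (`id`, `type`, `bidirectional`, `endpoints`), the `coords` lookup and the function names are hypothetical, and the haversine formula is one common way to obtain road length from latitude and longitude; the patent does not say how the length is computed.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def build_road_graph(roads, coords):
    """Build a directed adjacency graph: graph[u][v] holds the edge from u to v.

    roads  : iterable of dicts with hypothetical keys
             'id', 'type', 'bidirectional', 'endpoints' (ordered endpoint ids)
    coords : endpoint id -> (lat, lon)
    """
    graph = {}
    for road in roads:
        pts = road["endpoints"]
        for u, v in zip(pts, pts[1:]):
            edge = {
                "road_id": road["id"],
                "type": road["type"],
                "length_m": haversine_m(coords[u], coords[v]),
            }
            graph.setdefault(u, {})[v] = edge
            graph.setdefault(v, {})  # make sure every endpoint is a vertex
            if road["bidirectional"]:
                graph[v][u] = dict(edge)  # reverse direction only if allowed
    return graph
```

Each vertex maps to the vertices it can reach, which matches the description of the graph as directed: a one-way road contributes a single edge, a two-way road contributes one edge per direction.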
S2. Predict the congestion indexes of different road sections of the city in different time periods according to the vehicle trajectory data and the road adjacency graph.
The vehicle trajectory data may be collected in real time or may be an off-line data set. The trajectory data includes: track id, vehicle id, current longitude of the vehicle, current latitude of the vehicle and current time information. The vehicle may be a taxi, a private car, or the like.
Data cleaning may be performed on the collected trajectory data before the congestion index is predicted. Data cleaning means removing duplicate or incomplete trajectory records. When the GPS signal is interrupted or the vehicle reaches an intersection, the GPS receiver keeps collecting a large amount of identical or nearly identical redundant data within a short period. Such redundant data directly reduces the running efficiency of the algorithm. When a vehicle moves in a building or a forest, or when the GPS signal is interrupted and another positioning method such as base-station positioning is used, the vehicle's position fix can drift, producing many noise points and distorting the trajectory. Redundant data and drift data therefore need to be removed to correct the trajectory data.
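The cleaning described above could look roughly like the following Python sketch. The patent gives no concrete rules or thresholds, so everything here is an assumption for illustration: the tuple layout `(lat, lon, t_seconds)`, the 5 m redundancy threshold, the 50 m/s speed limit used to flag drift, and the function names.

```python
import math


def approx_dist_m(p, q):
    """Equirectangular distance in metres between (lat, lon) points in degrees.

    Adequate for consecutive GPS fixes that are close together.
    """
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000.0 * math.hypot(x, y)


def clean_track(points, min_move_m=5.0, max_speed_mps=50.0):
    """Drop incomplete, redundant and drifting fixes from one vehicle's track.

    points: list of (lat, lon, t_seconds) sorted by time; fields may be None.
    Both thresholds are illustrative assumptions, not values from the patent.
    """
    cleaned = []
    for p in points:
        if any(v is None for v in p):
            continue  # incomplete record
        if not cleaned:
            cleaned.append(p)
            continue
        last = cleaned[-1]
        dt = p[2] - last[2]
        d = approx_dist_m(last[:2], p[:2])
        if dt <= 0 or d < min_move_m:
            continue  # repeated timestamp or near-identical redundant fix
        if d / dt > max_speed_mps:
            continue  # implausible jump, treated as positioning drift
        cleaned.append(p)
    return cleaned
```

Comparing each fix with the last accepted fix, rather than with the immediately preceding raw fix, keeps a single drifting point from causing the next valid point to be rejected as well.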
S201. Map the trajectory data of the vehicle onto the road adjacency graph, and establish a correspondence between the vehicle trajectory data and the roads. This specifically comprises the following steps:
(1) Extract the inflection points of the vehicle trajectory.
The inflection points of the vehicle trajectory are its feature points.
(2) Calculate the perpendicular distance between an inflection point and each edge in the road adjacency graph, and take the edge with the minimum distance as the current edge.
(3) Map the inflection point to the vertex closest to the current edge.
Once the graph vertices corresponding to the inflection points are determined, the edge corresponding to the taxi trajectory segment is determined as well.
(4) In chronological order, calculate the speed between trajectory inflection points from the distance and the time difference between consecutive inflection points, and take it as the speed of the taxi on that road section in the current hour.
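Sub-steps (2) to (4) of S201 can be illustrated with the Python sketch below. It assumes that coordinates have already been projected to planar metres, that the distance to an edge is measured to the road segment (clamped at its endpoints), and that sub-step (3) means choosing the endpoint of the current edge nearest to the inflection point. These are interpretations for illustration, and all function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all given as planar (x, y) in metres."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_to_vertex(p, edges, coords):
    """Return (current_edge, matched_vertex) for inflection point p.

    edges  : list of (u, v) vertex-id pairs
    coords : vertex id -> planar (x, y)
    """
    edge = min(edges, key=lambda e: point_segment_distance(p, coords[e[0]], coords[e[1]]))
    vertex = min(edge, key=lambda n: math.dist(p, coords[n]))
    return edge, vertex


def segment_speeds(inflections):
    """Speeds in m/s between consecutive inflection points.

    inflections: time-ordered list of ((x, y), t_seconds).
    """
    speeds = []
    for (p0, t0), (p1, t1) in zip(inflections, inflections[1:]):
        if t1 > t0:  # skip pairs without a positive time difference
            speeds.append(math.dist(p0, p1) / (t1 - t0))
    return speeds
```

Scanning every edge for every inflection point is linear in the number of edges; a spatial index would normally be used for a city-scale graph, but that is outside what the text describes.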
S202. Calculate the congestion index of the current road according to its road type.
(1) Set a weight for each road type.
Road type               Weight
Highway, branch road    5
First-level road        4
Second-level road       3
Third-level road        1
Residential area        0.5
(2) Predict the congestion index of the current road in different time periods using the hourly average vehicle speed and number of passing vehicles on the current road section, the weight of its road type, and the travel time ratio (the length of the road section divided by the time the current vehicle takes to traverse it).
The prediction model may be a neural network, a decision tree or logistic regression. The congestion condition of the same road section differs between time periods, for example between working days and rest days, or between commuting peak hours and other hours. Every road in the city is predicted in this way, giving a city-wide road congestion index prediction at a time granularity of one hour. The congestion index reflects the road environment.
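The inputs to the prediction model can be assembled per road section and per hour as in the Python sketch below. The text names the four inputs but not their exact computation, so this is a sketch under assumptions: the observation tuple layout, the road type keys, sharing weight 5 between highway and branch road, and averaging the travel time ratio over vehicles are all illustrative choices. The prediction model itself is deliberately left out.

```python
from collections import defaultdict

# Road type weights from the table in S202; the key names are hypothetical.
ROAD_TYPE_WEIGHT = {
    "highway": 5,
    "branch": 5,
    "first_level": 4,
    "second_level": 3,
    "third_level": 1,
    "residential": 0.5,
}


def hourly_features(observations, road_info):
    """Assemble [mean speed, vehicle count, type weight, travel time ratio].

    observations: iterable of (road_id, hour, vehicle_id, speed_mps, traverse_s)
    road_info   : road_id -> {'type': str, 'length_m': float}
    Returns a dict keyed by (road_id, hour).
    """
    buckets = defaultdict(list)
    for road_id, hour, vehicle_id, speed, traverse_s in observations:
        buckets[(road_id, hour)].append((vehicle_id, speed, traverse_s))

    features = {}
    for (road_id, hour), rows in buckets.items():
        info = road_info[road_id]
        mean_speed = sum(r[1] for r in rows) / len(rows)
        vehicles = len({r[0] for r in rows})  # distinct vehicles in the hour
        # Travel time ratio: section length divided by traversal time,
        # averaged over the observed traversals (an assumption).
        ratio = sum(info["length_m"] / r[2] for r in rows) / len(rows)
        features[(road_id, hour)] = [
            mean_speed, vehicles, ROAD_TYPE_WEIGHT[info["type"]], ratio,
        ]
    return features
```

The resulting vectors could be fed to any of the model families the text mentions, with one prediction per road section and hour.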
S3. Construct a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph.
Once the road congestion indexes have been predicted, the traffic condition of the urban roads over a future period is known, and an urban road congestion probability map is generated from it through the following steps:
S301. Convert the congestion index into a congestion probability at the current moment through a logistic function.
S302. Map the congestion probability onto the edges of the road adjacency graph as weights, and generate an urban road congestion probability map for each hour.
The urban road congestion probability map comprises the following contents: the vertex represents the intersection point of the road section and the road section, the edge represents the road section, and each edge contains the congestion probability and the road length. The urban road congestion probability map is based on a road adjacency map, and the following contents are added: current time, congestion probability of each edge at the current time.
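Steps S301 and S302 can be sketched in Python as below. The standard logistic function is used; the `midpoint` and `steepness` parameters are assumptions added for illustration, since the text does not state how the congestion index is scaled before conversion. The graph layout and function names are likewise hypothetical.

```python
import math


def congestion_probability(index, midpoint=0.0, steepness=1.0):
    """Map a congestion index to a probability in (0, 1) with a logistic function.

    midpoint and steepness are illustrative parameters, not values from the patent.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (index - midpoint)))


def build_probability_graph(graph, hourly_index, hour):
    """Weight each edge of the road adjacency graph with its congestion probability.

    graph        : u -> {v: {'road_id': ..., 'length_m': float}}
    hourly_index : (road_id, hour) -> predicted congestion index
    Returns u -> {v: {'length_m': float, 'p_congested': float}} for the given hour.
    """
    weighted = {}
    for u, neighbours in graph.items():
        weighted[u] = {}
        for v, edge in neighbours.items():
            p = congestion_probability(hourly_index[(edge["road_id"], hour)])
            weighted[u][v] = {"length_m": edge["length_m"], "p_congested": p}
    return weighted
```

Calling `build_probability_graph` once per hour yields the hourly series of probability maps, each carrying the congestion probability and road length on every edge.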
S4. Generate a navigation path based on reinforcement learning, wherein the state space of the reinforcement learning comprises the road congestion probability map.
S401. The state space is defined as a three-dimensional space comprising the urban road congestion probability map and time, an action is defined as selecting an adjacent edge from the current vertex to reach the next vertex, and the reward function is defined as the expected time consumed by the path from the starting point to the current vertex.
S402. The action selection strategy is: for a given vertex, select the edge with the minimum expected time to reach that vertex as the direction of arrival at that vertex.
A vertex can be reached from several directions. The direction with the minimum expected time to reach the vertex is selected, and the preceding vertex in that direction becomes the parent node of the vertex. Through greedy value iteration, for any vertex the path from the starting point to that vertex is the least time-consuming path.
S403. After the expansion reaches the end point, parent nodes are visited successively until the starting point is reached; the route from the starting point to the end point is the navigation path.
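The expansion and backtracking in S401 to S403 can be illustrated with the Python sketch below. It is not the patent's implementation: a Dijkstra-style priority-queue expansion stands in for the greedy value iteration, a single hourly probability map is used instead of the full time dimension, and the expected edge time is assumed to be a mix of a free-flow speed and a congested speed weighted by the congestion probability. The speed values and all names are hypothetical.

```python
import heapq


def expected_time_s(edge, free_mps=13.9, jam_mps=2.8):
    """Expected traversal time of an edge in seconds.

    Assumed model: with probability p the edge is driven at jam_mps,
    otherwise at free_mps. Both speeds are illustrative values.
    """
    p = edge["p_congested"]
    return edge["length_m"] * (p / jam_mps + (1.0 - p) / free_mps)


def plan_route(graph, start, goal):
    """Return (path, expected_seconds); path is None if goal is unreachable.

    graph: u -> {v: {'length_m': float, 'p_congested': float}}
    Vertex ids must be mutually comparable, because ties in the queue
    fall back to comparing ids.
    """
    best = {start: 0.0}    # least expected time found so far per vertex
    parent = {start: None}  # parent node on the least-time path
    queue = [(0.0, start)]
    done = set()
    while queue:
        t, u = heapq.heappop(queue)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            break
        for v, edge in graph.get(u, {}).items():
            cand = t + expected_time_s(edge)
            if cand < best.get(v, float("inf")):
                best[v] = cand
                parent[v] = u  # keep the direction with least expected time
                heapq.heappush(queue, (cand, v))
    if goal not in parent:
        return None, float("inf")
    # Visit parent nodes from the end point back to the starting point.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], best[goal]
```

With this cost model a longer but lightly congested route can beat a shorter, heavily congested one, which is the behaviour the probability map is meant to enable. To reflect the time dimension of the state space, the probability map could be looked up by the hour of arrival at each vertex.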
As shown in Fig. 2, a reinforcement learning-based path navigation system includes a server and a client.
The server comprises: a road adjacency graph construction module, a congestion index prediction module, a road congestion probability map construction module and a navigation path generation module;
the road adjacency graph construction module is used for constructing a road adjacency graph of a city according to map data of the city;
the congestion index prediction module is used for predicting congestion indexes of different road sections of the city in different time periods according to vehicle trajectory data and the road adjacency graph;
the road congestion probability map construction module is used for constructing a road congestion probability map of the city according to the congestion indexes on the basis of the road adjacency graph;
the navigation path generation module is used for generating a navigation path based on reinforcement learning and sending the navigation path to the client for path navigation, wherein the state space of the reinforcement learning comprises the road congestion probability map.
The client comprises: a navigation module, a guide module and a trajectory data extraction module.
The navigation module is used for acquiring the navigation path from the server;
the trajectory data extraction module is used for acquiring trajectory data of the vehicle;
the guide module is used for indicating the vehicle owner's current position and heading according to the navigation path and the trajectory data.
The server side can further comprise a data cleaning module used for cleaning the track data and eliminating repeated or missing track data before the congestion index is predicted.
The track data extraction module of the client can also feed back the acquired track data to the congestion index prediction module of the server in real time.
The preferred embodiments of the present invention are described in detail, but the scope of the present invention is not limited thereto, and any modifications or substitutions that can be easily made by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1.一种基于强化学习的路径导航方法,其特征在于,该方法包括以下步骤:1. a route navigation method based on reinforcement learning, is characterized in that, this method comprises the following steps: S1.根据城市的地图数据,构建所述城市的道路邻接关系图;S1. According to the map data of the city, construct the road adjacency relationship diagram of the city; S2.根据车辆轨迹数据和道路邻接关系图,预测所述城市不同路段不同时段的拥塞指数;S2. According to the vehicle trajectory data and the road adjacency graph, predict the congestion index of different road sections and different time periods in the city; S3.以道路邻接关系图为基础,根据拥塞指数构建所述城市的道路拥塞概率图;S3. Based on the road adjacency graph, construct a road congestion probability map of the city according to the congestion index; S4.基于强化学习生成导航路径,所述强化学习的状态空间包括所述道路拥塞概率图;S4. Generate a navigation path based on reinforcement learning, and the state space of the reinforcement learning includes the road congestion probability map; 步骤S4包括以下步骤:Step S4 includes the following steps: S401.规定状态空间为包括城市道路拥塞概率图和时间的三维空间,规定动作为从当前所在顶点选择一条相邻边到达下一个顶点,规定奖励函数为从起始点到当前顶点的路径所耗时间的期望;S401. The state space is specified as a three-dimensional space including the urban road congestion probability map and time, the specified action is to select an adjacent edge from the current vertex to reach the next vertex, and the specified reward function is the time taken by the path from the starting point to the current vertex expectations; S402.选取动作的策略为对于某一顶点,选取到达该点耗时期望最小的边作为到达该点的方向;S402. The strategy for selecting an action is, for a certain vertex, selecting the edge with the least expected time-consuming to reach the point as the direction to reach the point; S403.扩展到终点后,访问其父节点直到回到起始点,起始点到终点的路线即为导航路径。S403. After expanding to the end point, visit its parent node until it returns to the start point, and the route from the start point to the end point is the navigation path. 2.如权利要求1所述路径导航方法,其特征在于,所述道路邻接关系图中,顶点为道路的公共端点,边为道路,每个顶点都保存了它能到达的其他顶点的集合。2 . 
The route navigation method according to claim 1 , wherein, in the road adjacency graph, vertices are public endpoints of roads, edges are roads, and each vertex stores a set of other vertices that it can reach. 3 . 3.如权利要求1所述的路径导航方法,其特征在于,步骤S2包括以下步骤:3. The route navigation method according to claim 1, wherein step S2 comprises the following steps: S201.将车辆的轨迹数据映射到道路邻接关系图中,建立车辆轨迹数据与道路的对应关系;S201. Map the trajectory data of the vehicle to a road adjacency relationship graph, and establish a corresponding relationship between the vehicle trajectory data and the road; S202.根据当前道路的道路类型计算当前道路的拥塞指数。S202. Calculate the congestion index of the current road according to the road type of the current road. 4.如权利要求3所述的路径导航方法,其特征在于,所述步骤S201具体包括以下步骤:4. The route navigation method according to claim 3, wherein the step S201 specifically comprises the following steps: (1)提取车辆轨迹的拐点;(1) Extract the inflection point of the vehicle trajectory; (2)计算拐点与道路邻接关系图中边的垂直距离,求得距离最小的边作为当前边;(2) Calculate the vertical distance between the inflection point and the edge in the road adjacency graph, and obtain the edge with the smallest distance as the current edge; (3)将拐点映射到与当前边最靠近的顶点;(3) Map the inflection point to the vertex closest to the current edge; (4)按时间顺序,利用前后拐点间的距离和拐点间的时间差,计算轨迹拐点间的速度,作为出租车在当前小时在该路段的速度。(4) In chronological order, using the distance between the front and rear inflection points and the time difference between the inflection points, the speed between the inflection points of the trajectory is calculated as the speed of the taxi on the road section at the current hour. 5.如权利要求3所述的路径导航方法,其特征在于,步骤S202具体包括以下步骤:5. 
The route navigation method according to claim 3, wherein step S202 specifically comprises the following steps:
(1) setting a corresponding weight for each road type;
(2) predicting the congestion index of the current road in different time periods from the hourly average vehicle speed and number of passing vehicles on the current road section, the weight corresponding to its road type, and the travel time ratio.
6. The route navigation method according to claim 1, wherein step S3 comprises the following steps:
S301. converting the congestion index into the congestion probability at the current moment through a Logistic function;
S302. mapping the congestion probability onto the edges of the road adjacency graph as weights, thereby generating an urban road congestion probability map for each hour.
7. A route navigation system based on reinforcement learning, the system comprising a server and a client, characterized in that:
the server comprises a road adjacency graph construction module, a congestion index prediction module, a road congestion probability map construction module, and a navigation path generation module;
the road adjacency graph construction module is configured to construct the road adjacency graph of a city from the map data of the city;
the congestion index prediction module is configured to predict the congestion index of different road sections of the city in different time periods from vehicle trajectory data and the road adjacency graph;
the road congestion probability map construction module is configured to construct the road congestion probability map of the city from the congestion index, on the basis of the road adjacency graph;
the navigation path generation module is configured to generate a navigation path based on reinforcement learning and send it to the client for route navigation, the state space of the reinforcement learning including the road congestion probability map;
the client comprises a navigation module, a guidance module, and a trajectory data extraction module;
the navigation module is configured to obtain the navigation path from the server;
the trajectory data extraction module is configured to obtain the trajectory data of the vehicle;
the guidance module is configured to indicate the driver's current position and heading according to the navigation path and the trajectory data;
the navigation path is generated based on reinforcement learning through the following steps:
(1) the state space is defined as a three-dimensional space comprising the urban road congestion probability map and time, an action is defined as selecting an adjacent edge from the current vertex to reach the next vertex, and the reward function is defined as the expected time consumed by the path from the starting point to the current vertex;
(2) the action-selection policy is, for a given vertex, to select the edge with the smallest expected time to reach that vertex as the direction for reaching it;
(3) after the expansion reaches the end point, its parent nodes are visited in turn until the starting point is reached, and the route from the starting point to the end point is the navigation path.
8. The route navigation system according to claim 7, wherein the trajectory data extraction module of the client is further configured to feed the acquired trajectory data back to the congestion index prediction module of the server in real time.
9. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the route navigation method according to any one of claims 1 to 6 is implemented.
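As an illustration only, step S301 and steps (1) to (3) of the path generation could be sketched as below. The claims do not give the Logistic parameters or say how an expected travel time is derived from a congestion probability, so `midpoint`, `steepness`, the two-point time model (free-flow time versus congested time), and all function names are assumptions introduced here. The priority-queue expansion is one plausible reading of steps (2) and (3), not necessarily the claimed procedure.

```python
import heapq
import math


def congestion_probability(index, midpoint=5.0, steepness=1.0):
    """S301 sketch: Logistic mapping from a congestion index to a probability.

    midpoint and steepness are illustrative values, not taken from the claims.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (index - midpoint)))


def expected_edge_time(free_s, jam_s, p):
    """Assumed model: expected seconds on an edge with congestion probability p."""
    return (1.0 - p) * free_s + p * jam_s


def plan_route(graph, prob_by_hour, start, goal, depart_s=0.0):
    """Expand vertices in order of expected elapsed time, then backtrack.

    graph:        {u: {v: (free_flow_seconds, congested_seconds)}}
    prob_by_hour: {hour: {(u, v): congestion probability}}  (hourly map, S302)
    Returns (path, expected_seconds), or (None, inf) if goal is unreachable.
    """
    best = {start: 0.0}     # smallest expected time found so far per vertex
    parent = {start: None}  # edge chosen as "the direction to reach" a vertex
    heap = [(0.0, start)]
    while heap:
        elapsed, u = heapq.heappop(heap)
        if elapsed > best.get(u, math.inf):
            continue  # stale queue entry
        if u == goal:
            break
        # Time is part of the state: pick the probability map for this hour.
        hour = int((depart_s + elapsed) // 3600) % 24
        for v, (free_s, jam_s) in graph.get(u, {}).items():
            p = prob_by_hour.get(hour, {}).get((u, v), 0.0)
            t = elapsed + expected_edge_time(free_s, jam_s, p)
            if t < best.get(v, math.inf):
                best[v] = t
                parent[v] = u
                heapq.heappush(heap, (t, v))
    if goal not in parent:
        return None, math.inf
    # Step (3): follow parent links from the end point back to the start.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path, best[goal]


# Hypothetical four-vertex network: A-B-D is shorter but likely congested.
demo_graph = {
    "A": {"B": (60, 300), "C": (90, 120)},
    "B": {"D": (60, 300)},
    "C": {"D": (90, 120)},
    "D": {},
}
demo_probs = {0: {("A", "B"): 0.9, ("B", "D"): 0.9,
                  ("A", "C"): 0.1, ("C", "D"): 0.1}}
print(plan_route(demo_graph, demo_probs, "A", "D"))
```

In this toy network the search prefers the longer but less congested route through C. Because the edge probabilities change with the hour of arrival, a strict shortest-expected-time guarantee would additionally require that leaving a vertex later never leads to arriving earlier, which this sketch does not check.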
CN201811504732.9A 2018-12-10 2018-12-10 A Reinforcement Learning-Based Path Navigation Method and System Active CN109579861B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811504732.9A CN109579861B (en) 2018-12-10 2018-12-10 A Reinforcement Learning-Based Path Navigation Method and System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811504732.9A CN109579861B (en) 2018-12-10 2018-12-10 A Reinforcement Learning-Based Path Navigation Method and System

Publications (2)

Publication Number Publication Date
CN109579861A CN109579861A (en) 2019-04-05
CN109579861B CN109579861B (en) 2020-05-19

Family

ID=65927980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811504732.9A Active CN109579861B (en) 2018-12-10 2018-12-10 A Reinforcement Learning-Based Path Navigation Method and System

Country Status (1)

Country Link
CN (1) CN109579861B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113252054B (en) * 2020-02-11 2023-11-28 株式会社日立制作所 Navigation method and navigation system
CN112525213B (en) * 2021-02-10 2021-05-14 腾讯科技(深圳)有限公司 ETA prediction method, model training method, device and storage medium
CN113516865B (en) * 2021-03-17 2022-07-05 北京易控智驾科技有限公司 Method and device for queuing vehicles on unmanned road network in mines based on high-precision map
CN113503888A (en) * 2021-07-09 2021-10-15 复旦大学 Dynamic path guiding method based on traffic information physical system
CN113643535B (en) * 2021-08-02 2023-02-21 宝方云科技(浙江)有限公司 Road traffic prediction method, device, equipment and medium based on smart city
CN115808178B (en) * 2021-09-13 2025-06-27 北京四维图新科技股份有限公司 Navigation map updating method, device and automatic driving system
CN116793376B (en) * 2023-04-13 2024-03-19 北京邮电大学 Path prediction method, device and storage medium based on shortest path and historical experience

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929281A (en) * 2012-11-05 2013-02-13 西南科技大学 Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment
CN104157139A (en) * 2014-08-05 2014-11-19 中山大学 Prediction method and visualization method of traffic jam
CN104620078A (en) * 2012-06-29 2015-05-13 通腾发展德国公司 Generating alternative routes
CN106530694A (en) * 2016-11-07 2017-03-22 深圳大学 Traffic congestion prediction method and system based on traffic congestion propagation model
CN107747947A (en) * 2017-10-23 2018-03-02 电子科技大学 A kind of collaboration itinerary based on user's history GPS track recommends method
JP2018112900A (en) * 2017-01-11 2018-07-19 Kddi株式会社 Program, vehicle terminal, mobile terminal, estimation server, and method for estimating behaviors based on driving characteristics of users
CN108540384A (en) * 2018-04-13 2018-09-14 西安交通大学 Intelligent heavy route method and device based on congestion aware in software defined network
CN108847037A (en) * 2018-06-27 2018-11-20 华中师范大学 A kind of city road network paths planning method towards non-global information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110574046A (en) * 2017-05-19 2019-12-13 渊慧科技有限公司 Efficient imitation of data for various behaviors

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104620078A (en) * 2012-06-29 2015-05-13 通腾发展德国公司 Generating alternative routes
CN102929281A (en) * 2012-11-05 2013-02-13 西南科技大学 Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment
CN104157139A (en) * 2014-08-05 2014-11-19 中山大学 Prediction method and visualization method of traffic jam
CN106530694A (en) * 2016-11-07 2017-03-22 深圳大学 Traffic congestion prediction method and system based on traffic congestion propagation model
JP2018112900A (en) * 2017-01-11 2018-07-19 Kddi株式会社 Program, vehicle terminal, mobile terminal, estimation server, and method for estimating behaviors based on driving characteristics of users
CN107747947A (en) * 2017-10-23 2018-03-02 电子科技大学 A kind of collaboration itinerary based on user's history GPS track recommends method
CN108540384A (en) * 2018-04-13 2018-09-14 西安交通大学 Intelligent heavy route method and device based on congestion aware in software defined network
CN108847037A (en) * 2018-06-27 2018-11-20 华中师范大学 A kind of city road network paths planning method towards non-global information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
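Modeling learning and adaptation processes in activity-travel choice: A framework and numerical experiment; Arentze, Theo et al.; Transportation; 20030228; vol. 30, no. 1; pp. 37-62 *
Mobile robot navigation control based on hierarchical reinforcement learning; Chen Chunlin et al.; Journal of Nanjing University of Aeronautics and Astronautics; 20060228; vol. 38, no. 1; pp. 70-75 *
Research on urban traffic congestion probability estimation based on the cumulative Logistic model; Cui Chengying; China Master's Theses Full-text Database, Social Sciences I; 20151015; no. 10; pp. G113-26 *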

Also Published As

Publication number Publication date
CN109579861A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109579861B (en) A Reinforcement Learning-Based Path Navigation Method and System
JP4896981B2 (en) A method for predicting a destination from a partial trajectory using open world modeling and closed world modeling methods
JP6094543B2 (en) Origin / Destination Extraction Device, Origin / Destination Extraction Method
JP6332287B2 (en) Route prediction apparatus and route prediction method
JP5602856B2 (en) Distributed traffic navigation using vehicle communication
KR101994496B1 (en) Providing routes through information collection and retrieval
JP2019184583A (en) Dynamic lane-level vehicle navigation using lane group identification
US10424202B1 (en) Parking strategy recommendation based on parking space availability data
US10739152B2 (en) Dynamic reporting of location data for a vehicle based on a fitted history model
WO2015013042A2 (en) Familiarity modeling
CN104121918A (en) Real-time path planning method and system
CN104197948A (en) Navigation system and method based on traffic information prediction
US20230051766A1 (en) Method, apparatus, and computer program product for predicting electric vehicle charge point utilization
CN116698054B (en) Road matching method, device, electronic equipment and storage medium
RU2664034C1 (en) Traffic information creation method and system, which will be used in the implemented on the electronic device cartographic application
KR20080093580A (en) Route navigation system and method
KR101728447B1 (en) Apparatus and method of search for vehicle route using predicted traffic volume
GB2556876A (en) Vehicle route guidance
WO2018179956A1 (en) Parking lot information management system, parking lot guidance system, parking lot information management program, and parking lot guidance program
CN108256662A (en) The Forecasting Methodology and device of arrival time
GB2560487A (en) Vehicle route guidance
JP5860136B2 (en) Image processing device, image processing management device, terminal device, and image processing method
Chmiel et al. A multicriteria model for dynamic route planning
JP2019211484A (en) Image processing apparatus, image processing management apparatus, terminal device, and image processing method
JP2018109631A (en) Image processing unit, image processing management device, terminal device, and image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant