CN114384901B - Reinforced learning aided driving decision-making method oriented to dynamic traffic environment - Google Patents

Reinforced learning aided driving decision-making method oriented to dynamic traffic environment Download PDF

Info

Publication number
CN114384901B
CN114384901B CN202210032222.6A CN202210032222A CN114384901B CN 114384901 B CN114384901 B CN 114384901B CN 202210032222 A CN202210032222 A CN 202210032222A CN 114384901 B CN114384901 B CN 114384901B
Authority
CN
China
Prior art keywords
road
intersection
reinforcement learning
dynamic traffic
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210032222.6A
Other languages
Chinese (zh)
Other versions
CN114384901A (en
Inventor
侯卫锋
叶建位
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhongzhida Technology Co ltd
Original Assignee
Zhejiang Zhongzhida Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhongzhida Technology Co ltd filed Critical Zhejiang Zhongzhida Technology Co ltd
Priority to CN202210032222.6A priority Critical patent/CN114384901B/en
Publication of CN114384901A publication Critical patent/CN114384901A/en
Application granted granted Critical
Publication of CN114384901B publication Critical patent/CN114384901B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/80Technologies aiming to reduce greenhouse gasses emissions common to all road transportation technologies
    • Y02T10/84Data processing systems or methods, management, administration

Abstract

The application discloses a reinforcement learning auxiliary driving decision-making method facing to a dynamic traffic environment, which comprises the following steps: extracting urban area road information from an environment road map, and abstracting the urban area road information into a road area undirected graph after simplifying preprocessing; acquiring the traffic density distribution condition of regional roads by using an API (application programming interface) for real-time road condition query; constructing a dynamic traffic model according to the traffic flow density distribution condition and the road traffic rule; obtaining the transfer cost of the vehicle between any two intersections; in the road area undirected graph, a path planning based on reinforcement learning is carried out according to a dynamic traffic model and corresponding transfer cost, and a minimum cost path from a starting point to a terminal point is solved. Therefore, the dynamic traffic environment is modeled, an auxiliary driving decision scheme is provided for a driver by using the dynamic traffic flow information, the dynamic route planning under the local information view angle is realized, and the problem that the path planning result is inaccurate easily caused by only using static information is effectively avoided.

Description

Dynamic traffic environment-oriented reinforcement learning auxiliary driving decision method
Technical Field
The invention relates to the technical field of intelligent traffic, in particular to a reinforcement learning auxiliary driving decision method for a dynamic traffic environment.
Background
An Intelligent Transportation System (ITS) aims to construct a safer, more comfortable and more stable traffic environment. Advanced Driver Assistance System (ADAS) of vehicles has also been developed rapidly in recent years, wherein driving environment information acquisition, driving environment characterization modeling and driving Assistance decision making are important research directions of driving Assistance decision making.
At present, navigation planning problems about travel are mainly to make driver auxiliary driving route decisions based on static road information, and mainly focus on path planning in a graph theory. The studies related to vehicle travel planning can be roughly divided into two categories: the first category is mainly to recommend a travel planning scheme for a vehicle cluster, optimize the overall operation target of a fleet and not reflect the driving requirements of drivers. The second category of problems is that for a single vehicle, the optimal driving route is recommended for the driver in consideration of the objectives of shortest distance, shortest time, or shortest energy consumption, but only the deterministic environment is considered.
However, most of the dynamic information of the existing map navigation software is based on iterative update of a historical database, and a data-driven method is utilized to perform road traffic modeling aiming at the batch process of traffic flow, so that road vehicle transfer rules brought by a dynamic traffic environment are not really considered, and an auxiliary driving decision result of factors such as comprehensive coverage area dynamic information and the like is difficult to provide for a driver.
Therefore, how to implement the driving assistance decision method for the dynamic traffic environment is a technical problem to be urgently solved by the technical personnel in the field.
Disclosure of Invention
In view of the above, the present invention provides a method for assisting driving decision by reinforcement learning in a dynamic traffic environment, which can provide a driving decision-assisting scheme for a driver in the dynamic traffic environment, and effectively avoid the problem that a path planning result is inaccurate due to the use of only static information. The specific scheme is as follows:
a reinforcement learning auxiliary driving decision method facing to a dynamic traffic environment comprises the following steps:
extracting urban area road information from an environmental road map, and abstracting the urban area road information into a road area undirected graph after simplified preprocessing;
acquiring the traffic density distribution condition of regional roads by using an API (application programming interface) for real-time road condition query;
constructing a dynamic traffic model according to the traffic flow density distribution condition and the road traffic rule;
obtaining the transfer cost of the vehicle between any two intersections;
and in the road area undirected graph, performing a path planning based on reinforcement learning according to the dynamic traffic model and the corresponding transfer cost, and solving a minimum cost path from a starting point to a terminal point.
Preferably, in the reinforcement learning aided driving decision method for a dynamic traffic environment provided in the embodiment of the present invention, obtaining a traffic density distribution of a regional road by using an API interface for real-time road condition query includes:
according to the longitude and latitude of each road in the range from the driving starting point to the destination, dividing the road into longitude and latitude points corresponding to the road according to certain longitude and latitude step lengths, and returning the congestion evaluation of each intersection node through an API (application program interface) for real-time road condition query;
utilizing congestion evaluation of each point in the whole environmental road map range to draw a road congestion evaluation layered thermodynamic diagram;
calculating the traffic flow density of each road according to the congestion condition of each road in the thermodynamic diagram;
and acquiring the traffic density distribution condition of the regional roads according to the calculated traffic density of each road.
Preferably, in the method for assisting driving decision by reinforcement learning oriented to a dynamic traffic environment according to the embodiment of the present invention, the following formula is used to calculate the traffic density of each road:
Figure BDA0003466870140000021
where ρ is m The traffic density of the mth longitude and latitude point in the current road, N is the total number of the longitude and latitude points, c m A road congestion evaluation value l corresponding to the mth longitude and latitude point in the road m For the longitude and latitude step length of the road, sigma m Is the total length of the road.
Preferably, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment provided in an embodiment of the present invention, in a process of constructing a dynamic traffic model, the method includes:
and when the traffic flow is updated every time, the information of the current intersection and the road vehicles is put into the dynamic traffic model, and the inter-intersection vehicle transfer information of each iteration is obtained through the dynamic traffic model so as to update the state information of each intersection and road vehicle after transfer.
Preferably, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment provided in an embodiment of the present invention, in a process of constructing a dynamic traffic model, the method further includes:
and normalizing the intersection turning probability in the dynamic traffic model to meet the condition that the sum of the turning-out probabilities of all intersections is 1, wherein the turning-out probability of each intersection has four directions.
Preferably, the reinforcement learning assistance for the dynamic traffic environment provided in the embodiment of the present inventionIn the driving decision method, the dynamic traffic model comprises an intersection turning probability model; the intersection turning probability model P i,j Comprises the following steps:
Figure BDA0003466870140000031
wherein each road is represented by intersection labels i, j at two ends of the road;
Figure BDA0003466870140000032
as a function of speed and density, p i,j Is the traffic density, beta, of the communication road between the ith intersection and the jth intersection 1 、β 2 、β 3 、β 4 Are all weighting coefficients, β 4 rand denotes a random factor, V max Representing the road speed upper limit, p max The upper density limit is indicated, and the corresponding speed is 0.
Preferably, in the reinforcement learning aided driving decision method for a dynamic traffic environment provided in the embodiment of the present invention, the dynamic traffic model further includes an intersection outflow vehicle model; the intersection outflow vehicle model is a discrete model based on state, the time interval of each iteration updating cycle is 1, and the number L of the intersection outflow vehicles is calculated by adopting the following formula j
Figure BDA0003466870140000033
Wherein alpha is i,j And the width matrix of the connecting road between the ith intersection and the jth intersection.
Preferably, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment provided by the embodiment of the present invention, obtaining a transfer cost of a vehicle between any two intersections includes:
calculating the running time of the road with two connected road ports according to the relation between the traffic density and the running speed;
calculating the waiting time of the intersection according to the relationship between the waiting time of the intersection and the number of vehicles at the intersection;
calculating the running time from the target intersection to the terminal according to the coordinates of the target intersection, the coordinates of the terminal and the running speed of the current road;
linearly weighting the driving time of the road with the two road junctions connected, the waiting time of the road junctions and the driving time from the target road junction to the terminal point to obtain the transfer cost of the two adjacent road junctions;
and obtaining the transfer cost between any two intersections according to the product of the transfer cost of two adjacent intersections and the reciprocal of the reachable matrix element.
Preferably, in the reinforcement learning aided driving decision method for a dynamic traffic environment provided in an embodiment of the present invention, the path planning based on reinforcement learning is performed, including:
and performing path planning by using a Q-learning reinforcement learning algorithm.
Preferably, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment provided in an embodiment of the present invention, in a process of performing a path planning by using a Q-learning reinforcement learning algorithm, the method includes:
defining the reinforcement learning state as the intersection serial number from the starting point to the end point of the driving;
defining the action of reinforcement learning as the state of the next moment;
defining elements in the reinforcement learning reward matrix as the transfer cost of each road of each intersection, wherein under the intersection where the reinforcement learning agent is located at present, the reinforcement learning agent can only know the information of the surrounding connected roads;
obtaining a transition probability according to the dynamic traffic model; the environmental model interacted by the reinforcement learning agent is described by the transition probability;
and updating the Q value matrix in the Q-learning reinforcement learning algorithm according to the Bellman equation.
According to the technical scheme, the reinforcement learning auxiliary driving decision method for the dynamic traffic environment comprises the following steps: extracting urban area road information from an environment road map, and abstracting the urban area road information into a road area undirected graph after simplifying preprocessing; acquiring the traffic density distribution condition of regional roads by using an API (application programming interface) for real-time road condition query; constructing a dynamic traffic model according to the traffic flow density distribution condition and the road traffic rule; obtaining the transfer cost of the vehicle between any two intersections; in the road area undirected graph, a path planning based on reinforcement learning is carried out according to a dynamic traffic model and corresponding transfer cost, and a minimum cost path from a starting point to a terminal point is solved.
According to the reinforcement learning auxiliary driving decision method for the dynamic traffic environment, provided by the invention, the dynamic traffic environment is modeled according to the environment road map and the real-time road condition query API interface data, and a driving decision assisting scheme is provided for a driver by fully utilizing simple and easily-obtained dynamic traffic flow information in the dynamic traffic environment, so that dynamic route planning under a local information view angle is realized, and the problem that a path planning result is inaccurate easily caused by only using static information is effectively avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or technical solutions in related arts, the drawings used in the description of the embodiments or related arts will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a driving decision method assisted by reinforcement learning for a dynamic traffic environment according to an embodiment of the present invention;
FIG. 2 is a general framework diagram of a driving decision method assisted by reinforcement learning for dynamic traffic environment according to an embodiment of the present invention;
FIG. 3 is a comparison diagram of a full map and a processed binary rasterized map provided by an embodiment of the present invention;
fig. 4 is a simplified extracted road area undirected graph provided by the embodiment of the present invention;
fig. 5 is a map road congestion evaluation layered thermodynamic diagram provided by an embodiment of the present invention;
fig. 6 is a triangular matrix on the map road traffic density according to the embodiment of the present invention;
FIG. 7 is a dynamic traffic update process provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of intersection modeling provided by an embodiment of the present invention;
fig. 9 is a schematic diagram of a decision result of the Q-Learning reinforcement assisted driving provided by the embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a dynamic traffic environment-oriented reinforcement learning auxiliary driving decision method, which comprises the following steps as shown in figure 1:
s101, extracting urban area road information from an environment road map, and abstracting the urban area road information into a road area undirected graph after simplifying preprocessing;
specifically, according to an environment road map, extracting urban area road information, simplifying the road information, and abstracting the road information into a road area undirected graph;
s102, acquiring the traffic density distribution condition of regional roads by using an API (application program interface) for real-time road condition query;
s103, constructing a dynamic traffic model according to the traffic flow density distribution condition and the road traffic rule;
the steps are used for modeling the dynamic environment of road traffic, and the modeling process of the dynamic traffic model can be divided into an intersection turning probability model and an intersection flowing vehicle model;
s104, obtaining the transfer cost of the vehicle between any two intersections;
the steps are modeling for the vehicle running cost, and a transfer cost function of the vehicle from one intersection to another intersection can be established according to the known road and intersection information;
and S105, in the road area undirected graph, performing a path planning based on reinforcement learning according to the dynamic traffic model and the corresponding transfer cost, and solving a minimum cost path from the starting point to the end point.
Specifically, a route planning based on reinforcement learning may be performed according to the undirected connectivity graph model of the road environment obtained in step S101 and step S102 by using the local road information in step S103 and step S104, and a minimum cost route from the start point to the end point is solved.
If the starting point or the end point of the traveling crane changes, returning to the step S101;
if the starting point or the end point of the traveling crane is not changed, the step S106 is executed;
and S106, drawing the minimum cost path obtained in the step S105 in an environment road map or a road area undirected graph, thereby providing a driving assistance decision scheme for a driver.
In the reinforcement learning auxiliary driving decision method for the dynamic traffic environment provided by the embodiment of the invention, the dynamic traffic environment is modeled according to the environment road map and the real-time road condition query API interface data, and the simple and easily obtained dynamic traffic flow information is fully utilized to provide an auxiliary driving decision scheme for a driver, so that the dynamic route planning under the local information view angle is realized, and the problem that the path planning result is inaccurate easily caused by only using static information is effectively avoided.
Taking the decision-making for assisting driving from a city area a to a city area B in a certain province as an example, the decision-making problem for assisting driving can be decomposed into the following three parts:
firstly, a dynamic traffic area is defined as a part of area from an area A to an area B, so that the road is simplified to a certain extent, and a main road with large traffic flow is considered. At the same time, the area is defined without vehicle exchange with the outside, only vehicle flow and transfer within the area are considered.
Secondly, performing mathematical modeling on traffic flow distribution and road traffic rules, and considering factors such as road vehicle density, lane width, road length, destination guiding information and the like to construct a dynamic traffic model (which can be divided into an intersection turning probability model and an intersection outflow vehicle model).
And finally, based on the established dynamic traffic model, a driving scheme with the minimum cost is sought for vehicles driving from the area A to the area B with the aim of directing to a target point, avoiding congestion and the like as a comprehensive target and with the constraint of different traffic flow distribution (dynamic change) and road traffic rules.
Fig. 2 shows an overall framework diagram of a reinforcement learning aided driving decision method for a dynamic traffic environment.
Further, in a concrete implementation, in the method for assisting driving decision through reinforcement learning for a dynamic traffic environment provided by the embodiment of the present invention, the step S101 is abstracted to a road area undirected graph after simplified preprocessing, and may specifically include the following steps:
firstly, dyeing an environmental road map, and converting the environmental road map into a gray picture;
expanding and convolving the gray level picture, so that when pixels with gray values larger than 0 appear in a convolution kernel range, the pixels in the convolution kernel range are white, and filtering to obtain a binary rasterized map;
abstracting the binary rasterized map into a road area undirected graph; wherein, every crossing is abstracted to a node, if there is a road intercommunication between the crossings, abstract the road to the side.
Specifically, taking an area a as a yuquan and an area B as a hong kong as an example, as shown in fig. 3, a map full view of an area from the yuquan to the hong kong is first obtained through a certain map API, and then the area map is dyed and converted into a gray image; and expanding and convolving the gray map, so that when pixels with the gray values larger than 0 appear in the convolution kernel range, the pixels in the convolution kernel range are white (namely, can pass paths), and filtering to obtain a binary rasterized map. And then abstracting the binary rasterized map into an undirected graph. Wherein each intersection is abstracted as a node, and if there is a road communication between intersections, the road is abstracted as an edge, so as to obtain the abstract undirected graph shown in fig. 4. The horizontal and vertical coordinates of each intersection depend on its pixel location in the picture. Because all intersections are uniformly marked by the same picture, and the scales of the intersections are the same, the undirected graph can well represent the original road information graph. So far, the extraction and simplification preprocessing of the road information are completed.
Suppose that there are 30 intersections in the area, which are numbered p in sequence 1 ~p 30 I.e. P ═ P 1 ,p 2 ,…,p 30 }; the starting point of the vehicle is Yuquan p 30 The end point is hong Kong p 11 (ii) a The vehicle flow at the boundary of the area is not considered, i.e. the area is assumed to be closed, and the vehicle only runs on the road in the figure. The formalized optimization target of the driving decision-making assisting method is
Figure BDA0003466870140000071
Wherein C is ij The cost function is transferred for the intersection, i.e., the total cost of vehicle operation from the starting point to the destination is minimized.
Further, in a specific implementation, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment provided in the embodiment of the present invention, the step S102 obtains a traffic density distribution of an area road by using an API interface for real-time road condition query, which may specifically include the following steps:
dividing the road into longitude and latitude points corresponding to the road according to certain longitude and latitude step length according to the longitude and latitude of each road in the range from the driving starting point to the destination point, and returning congestion evaluation of each intersection node through an API (application program interface) for real-time road condition query; the congestion evaluation is classified into 5 categories, and the corresponding relationship is as follows: 0-unblocked, 1-good, 2-slow walking, 3-congested, 4-severe congestion; the specific conditions are shown in the following table I and the following table II.
Table a certain map API request parameter (excerpt)
Figure BDA0003466870140000081
API return parameter for a certain map of Table two (alternate)
Figure BDA0003466870140000082
Fifthly, drawing a road congestion evaluation layered thermodynamic diagram by using congestion evaluation of each point in the whole environment road map range;
step six, calculating the traffic flow density of each road according to the congestion condition of each road in the thermodynamic diagram; in particular, the following formula can be used to calculate the traffic density of each road:
Figure BDA0003466870140000083
where ρ is m The traffic density of the mth longitude and latitude point in the current road, N is the total number of the longitude and latitude points, c m A road congestion evaluation value l corresponding to the mth longitude and latitude point in the road m For the longitude and latitude step length of the road, sigma m The total length of the road.
And step seven, acquiring the traffic density distribution condition of the regional roads according to the calculated traffic density of each road.
Specifically, taking the area a as jade spring and the area B as purple hong kong as an example, according to the longitude and latitude of each road in the range from jade spring to purple hong kong, the road is divided into longitude and latitude points corresponding to the road according to certain longitude and latitude step lengths, and the congestion evaluation of the points is returned through an API; the method comprises the steps of utilizing congestion evaluation of all points in the range from the whole Yuquan to the purple gold harbor to make a road congestion evaluation layered thermodynamic diagram; and calculating the total traffic flow density of each road according to the congestion condition of each road in the thermodynamic diagram. Fig. 5 shows the mapped road congestion evaluation hierarchical thermodynamic diagram. According to the calculation, a triangular matrix on the map road traffic density shown in fig. 6 is obtained. Therefore, the reading and the arrangement of the initial traffic state information are completed, and an upper triangular matrix of the map road traffic flow density is obtained.
Therefore, an initial value of the traffic flow distribution condition of the urban area is obtained according to the environmental road map and the road congestion evaluation layered thermodynamic diagram, then a dynamic traffic model is constructed, and the model is driven according to the road dynamic change condition.
Further, in a specific implementation, in the method for assisting driving decision through reinforcement learning for a dynamic traffic environment provided in the embodiment of the present invention, the step S103 may specifically include, in the process of constructing a dynamic traffic model: and when the traffic flow is updated every time, the information of the current intersection and the road vehicles is put into the dynamic traffic model, and the inter-intersection vehicle transfer information of each iteration is obtained through the dynamic traffic model so as to update the state information of each intersection and road vehicle after transfer. It can be understood that the dynamic traffic environment modeling of the invention refers to the idea of cellular automata to dynamically simulate the traffic flow, wherein the updating of the traffic flow data and the updating of the road traffic flow rules are mutually iterated along with the program operation.
The state updating formulas of the traffic flow and the traffic flow density in the dynamic traffic model are respectively as follows:
Figure BDA0003466870140000091
Figure BDA0003466870140000092
wherein each road is represented by intersection numbers i, j at two ends of the road, and X n×1 For a matrix of crossing vehicles, L n×1 Matrix of number of vehicles coming out of intersection, P n×n For the egress vehicle steering state matrix, ρ n×n And t represents the current time as a road traffic density matrix.
Specifically, taking the area a as yuquan and the area B as hong kong as an example, the discrete modeling based on the state is performed on the dynamic traffic. Given that the hong Kong-Yuquan area has n intersections, the model basic variable information is obtained according to the actual traffic network conditions as follows: intersection vehicle number matrix X n×1 Road traffic density matrix ρ n×n Road length matrix Y n×n Road widthMatrix alpha n×n (ii) a Wherein each road is represented by intersection labels i, j at two ends of the road, and the matrix Y n×nn×n Can be a fixed value, obtained by abstract modeling of an actual road in the early stage, X n×1n×n In order to update the variables in real time, inter-intersection vehicle transfer information of each iteration is obtained through a dynamic traffic model, and further state information of each intersection and road vehicle after transfer is updated, the specific process is shown in fig. 7, and an intersection modeling schematic diagram is shown in fig. 8.
When the traffic flow is updated every time, the information of the current intersection and the road vehicles is put into a dynamic traffic network model to obtain the information of the vehicles transferred between the intersections, including an intersection outflow vehicle number matrix L n×1 Matrix P of steering states of outflowing vehicle n×n . Based on the markov process, its successor state S' is completely determined by the current state S, and the traffic successor updates the state.
In specific implementation, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment provided in the embodiment of the present invention, in order to simulate a vehicle flow situation between intersections and convert a single intersection flowing out of a vehicle to a probability problem, step S103 may specifically include the following steps in a process of constructing a dynamic traffic model: the method comprises the following steps of carrying out normalization processing on the intersection turning probability in a dynamic traffic model to meet the requirement that the sum of all intersection turning probabilities is 1, wherein each intersection turning probability has four directions, and the specific formula is as follows:
Figure BDA0003466870140000101
in specific implementation, in the reinforcement learning aided driving decision method for a dynamic traffic environment provided by the embodiment of the present invention, the dynamic traffic model may include an intersection turning probability model and an intersection outgoing vehicle model. Wherein, the turning rules of the intersection turning probability model comprise generality and particularity. The generality means that the larger the traffic density is, the larger the probability is; and the greater the degree of the skeleton, the greater the probability. The particularity is obstacle avoidance, and the probability is higher when the driving speed is higher; and destination randomness.
Because the dynamic traffic network simulates the running state of a general traffic flow, the destination is unknown and has randomness, when a crossing turning probability model is constructed, general thinking and special condition consideration are integrated, and reasonable conjecture is made on the turning selection of the vehicles at the crossing. Based on general thinking, for a vehicle with an unknown destination, the probability that the vehicle enters a main road and the road with a large traffic flow is obviously higher than that of other roads, namely, the probability is higher when the traffic flow density is higher and the main road degree is higher. Based on special case considerations, since the vehicle destination is unknown; if the vehicle is not unique in the intersection steering mode, a road with a small traffic flow is selected possibly based on the obstacle avoidance idea, that is, the probability is larger when the driving speed is larger. On the contrary, if the turning mode of the vehicle at the intersection is unique and determined, the turning selection is only purposefully unique guide, and the model is generated by random digital simulation.
Considering the above factors, the intersection turning probability model P i,j Can be as follows:
Figure BDA0003466870140000111
wherein each road is represented by intersection labels i, j at two ends of the road;
Figure BDA0003466870140000112
as a function of speed and density, p i,j Is the traffic density, beta, of the communication road between the ith intersection and the jth intersection 1 、β 2 、β 3 、β 4 Are all weighting coefficients, β 4 rand denotes a random factor, V max Representing the road speed upper limit, p max The upper density limit is shown, and the corresponding speed is 0.
Specifically, the outgoing vehicle of the intersection outgoing vehicle model includes the density of the traffic flow and the vehicle running speed; the vehicle travel speed includes a base travel speed and a congestion-affected speed. The principle of the number of vehicles flowing out of the intersection is as follows: the number of vehicles flowing out of the intersection is equal to the width of a road, the density of the traffic flow, the speed of the vehicles and the time.
The crossroad outflow vehicle model is a discrete model based on state, the time interval of each iteration updating period is 1, and the number L of crossroad outflow vehicles is calculated by adopting the following formula j
Figure BDA0003466870140000113
Wherein alpha is i,j And the width matrix of the connecting road between the ith intersection and the jth intersection.
Further, in a specific implementation, in the above method for assisting driving decision by reinforcement learning oriented to a dynamic traffic environment provided by the embodiment of the present invention, since the operation cost is equal to the time spent, and the time spent on transferring the intersection needs to consider not only the time in the transferring process (the longer the time, the greater the cost), but also the time spent on going to the destination after transferring (the farther the distance, the greater the cost), so the step S104 obtains the transfer cost of the vehicle between any two intersections, which specifically includes the following steps:
firstly, calculating the running time of a road with two connected road openings according to the relation between the traffic flow density and the running speed;
the steps are modeling for the road running time between intersections: knowing a traffic density matrix rho and a road length matrix Y, and assuming that the road density and the running speed are in a linear relation, the running time of a road connected between two road ports is the ratio of the road length to the running speed; specifically, the inter-intersection road running time t 1 The expression of (c) is:
Figure BDA0003466870140000114
wherein, y ij Indicating the length of the road between intersections.
Secondly, calculating the waiting time of the intersection according to the relationship between the waiting time of the intersection and the number of vehicles at the intersection;
the steps are modeling for the intersection waiting time: knowing the matrix X of the number of vehicles at the intersection, assuming a roadThe intersection waiting time is in a linear relation with the number of the intersection vehicles, and the intersection waiting time is the product of k and the number of the intersection vehicles; in particular, the road waiting time t 2 The expression of (a) is: t is t 2 =k·x i Wherein x is i And k is a linear relation coefficient, and is the number of vehicles at the intersection corresponding to the ith intersection.
Thirdly, calculating the driving time from the target intersection to the terminal according to the coordinates of the target intersection, the coordinates of the terminal and the driving speed of the current road;
the steps are modeling for the driving time from the target intersection to the terminal: assuming that the running speed is the average running speed of the current road, the running time from the target intersection to the terminal point is the ratio of the distance to the running speed; specifically, the travel time t from the target intersection to the end point 3 The expression of (a) is:
Figure BDA0003466870140000121
the distance d can be calculated according to the coordinates of the target intersection and the coordinates of the terminal point.
Fourthly, linearly weighting the driving time of the road with two connected road junctions, the waiting time of the road junctions and the driving time from the target road junction to the terminal point to obtain the transfer cost of two adjacent road junctions;
specifically, according to the calculation results of the first step to the third step, since the three partial functions have the same dimension, the linear weighting is performed on the three partial functions, and the total transfer cost expression C from the ith intersection to the jth intersection can be obtained ij
Figure BDA0003466870140000122
And fifthly, obtaining the transfer cost between any two intersections according to the product of the transfer cost of two adjacent intersections and the reciprocal of the reachable matrix element.
Specifically, on the basis of the result obtained in the fourth step, the reachability of a non-adjacent intersection is considered, and a transfer cost expression is further processedCorrecting, and for two non-adjacent intersections, the transfer cost is expected to be infinite, and the transfer cost C between any two intersections can be obtained ij
Figure BDA0003466870140000123
Wherein, a ij And the intersection j and the intersection i are respectively indicated whether to be reachable, and can reach 1 and not reach 0.
Further, in a specific implementation, in the method for assisting driving decision by reinforcement learning for a dynamic traffic environment according to the embodiment of the present invention, the step S105 of performing a route planning based on reinforcement learning may specifically include: and performing path planning by using a Q-learning reinforcement learning algorithm. Therefore, in the auxiliary driving decision method part, the auxiliary driving route decision under the local information view angle is considered, and the problem that global information cannot be obtained frequently in the real situation can be solved through a Q-Learning reinforcement Learning algorithm, so that the auxiliary driving decision effect is improved.
In specific implementation, the process of performing the path planning by using the Q-learning reinforcement learning algorithm may specifically include: the reinforcement learning state S is set to { S } 1 ,s 2 ,...,s m Defining the intersection sequence number from the starting point to the end point of the driving; the action A of reinforcement learning is { a } 1 ,a 2 ,...,a n Defining the state of the next moment, namely the sequence number of the intersection to go to at the next moment; reward matrix to reinforce learning
Figure BDA0003466870140000131
The element in (1) is defined as the transfer cost of each road of each intersection, wherein under the intersection where the reinforcement learning agent is located at present, the reinforcement learning agent can only know the information of the surrounding connected roads; deriving transition probabilities from dynamic traffic models
Figure BDA0003466870140000132
The interactive environment model of the reinforcement learning agent is described by transition probability; according to the theory of reinforcement learningThe Bellman equation in (1) updates the Q value matrix in the Q-learning reinforcement learning algorithm. Q value matrix
Figure BDA0003466870140000133
The step-by-step iteration process of the middle Q value updating process is described as follows:
Figure BDA0003466870140000134
wherein α is the learning rate of Q-learning reinforcement learning.
It should be noted that, based on the analysis of the dynamic traffic model, the problem has been transformed to solve the minimum cost path from the starting point to the end point under the undirected connected graph model, and requires reasonable use of global or local information. Therefore, a Q-learning reinforcement learning algorithm is selected and local information is used for path planning. The prime symbols are defined as follows:
and a state s: the reinforcement learning agent is used to make decisions about the information to be used for the next time strategy to be taken. The state at the appointed time t is recorded as st, and the action set of the whole problem is S ═ S 1 ,s 2 ,...,s m }。
Action a: the strategies taken by the reinforcement learning agent in different states enable the reinforcement learning agent to transition between different states by taking different actions. The action at contract time t is denoted as a t The action set for the entire problem is a ═ a 1 ,a 2 ,...,a n }。
Observation o: and the reinforcement learning agent learns the environmental information observed by the agent at different moments and different states. The observation at contract time t is denoted as o t
Strategy pi: reinforcement learning agent in state s i Take action a j Probability of (a), i.e. pi (a) j |s i )=P(A t =a j |S t =s i ). For deterministic processes, for state s i There should be only one optimal strategy, namely:
Figure BDA0003466870140000141
state transition matrix P: each row of the state transition matrix corresponds to one state and each column corresponds to the other state, element p ij Represents a state of being i To state s j Corresponding probability P(s) j |s i ). It is obvious that the state transition matrix has properties
Figure BDA0003466870140000142
Figure BDA0003466870140000143
m:number of states
Cost value matrix R: each row of the cost value matrix corresponds to a state, each column corresponds to an action, and the element r ij Is represented in state s i Take action a j The corresponding cost value.
Figure BDA0003466870140000144
m:number of states;n:number of actions
Discount factor γ: indicating the relative degree of importance between the future award and the current award.
Report G t
Figure BDA0003466870140000145
Representing a weighted sum of future prize values.
Action value matrix Q: each row of the cost value matrix corresponds to a state and each column corresponds to an action, where the element q ij =q(s i ,a j )=E π [R t+1 +γq(s t+1 ,a t+1 )|s t =s i ,a t =a j ]Is represented in state s i Take action a j The corresponding action value is desired.
Figure BDA0003466870140000146
m:number of states;n:number of actions
In the problem solving process, if the reinforcement learning agent can only obtain local information, when the reinforcement learning agent is positioned at a Cross1, the considerable information of the reinforcement learning agent only comprises the vehicle information of Cross at an adjacent Cross and the traffic density information of Road (1, i) connected with the adjacent Cross, and the advantages of the reinforcement learning method can be well embodied.
In the learning process, the reinforcement learning agent needs to build an environment model step by step to estimate the change of the environment. One model needs to include transition probabilities
Figure BDA0003466870140000151
And reward value estimation
Figure BDA0003466870140000152
Two parts; wherein the content of the first and second substances,
Figure BDA0003466870140000153
is shown at S t =s,A t Under a to S t+1 As the probability value of s',
Figure BDA0003466870140000154
is shown at S t =s,A t The estimate of the prize value for the next moment under a.
Figure BDA0003466870140000155
Can be generally obtained by Monte Carlo stochastic simulation, and
Figure BDA0003466870140000156
typically through multiple iterative convergence of the markov chain.
Considering the path planning problem based on local information, the path planning problem can be abstracted into a markov decision process containing 30 states and 30 actions, and a Q-Learning method is adopted in the problem to solve the markov decision process.
After the environment model is obtained, the Q-Learning algorithm may be used to select a policy. In reinforcement learning, some reinforcement learning agents can only obtain a series of action values, so that the reinforcement learning agents need to take actions in consideration of not only the reward of performing the current action but also the future reward that may be brought by reaching the next state after the action is performed. This can be expressed in mathematical language as:
q π (s,a)=E π [R t+1 +γq π (S t+1 ,A t+1 )|S t =s,A t =a]
therefore, the Q-Learning iteration needs to involve quantities including: a specific state S t Perform action A under s t Prize value R corresponding to a t+1 (ii) a A certain specific state S t Perform action A under s t A next state S reached t+1 (ii) a In a state S t+1 Then execute action A according to the current strategy pi or random selection action t+1 Corresponding Q value Q π (S t+1 ,A t+1 )。
First, for a reward value R t+1 In the problem solving, the cost value of each road of each intersection is used as a reward value. For the intersection crosssi where the reinforcement learning agent is located, the reinforcement learning agent only knows the cost value of the Road (i, j) j belonging to the access to crosssi around the reinforcement learning agent.
Second, learning agent is in a particular state S t Perform action A under s t A next state S reached t+1 This, in turn, requires solving using the previously obtained estimation model. The estimation model can give the state S t Perform action A under s t A to a different S t+1 Probability of s ═ s
Figure BDA0003466870140000157
So that the next state of the reinforcement learning agent can be obtained.
Again, the agent is in state S for reinforcement learning t+1 Execute action A according to the current strategy pi or random selection action t+1 Corresponding Q value Q π (S t+1 ,A t+1 ). Since Q-Learning is an iterative process, in pair Q π (S t ,A t ) When iterative update is carried out, q obtained by last iteration can be adopted π (S t+1 ,A t+1 ). The updating process of the Q value matrix can be gradually and iteratively learned according to the Bellman equation.
The decision result of assisted driving based on Q-Learning reinforcement Learning is shown in fig. 9. The path with the darkest color represents a suggested route generated by the driving assistance decision, and the gray value of each road section represents the congestion degree of the road section. Note that since the actual program is a dynamic process of state switching, it cannot be described here in picture volume.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
To sum up, the reinforcement learning aided driving decision method for the dynamic traffic environment provided by the embodiment of the invention comprises the following steps: extracting urban area road information from an environment road map, and abstracting the urban area road information into a road area undirected graph after simplifying preprocessing; acquiring the traffic density distribution condition of regional roads by using an API (application programming interface) for real-time road condition query; constructing a dynamic traffic model according to the traffic flow density distribution condition and the road traffic rule; obtaining the transfer cost of the vehicle between any two intersections; in the road area undirected graph, a path planning based on reinforcement learning is carried out according to a dynamic traffic model and corresponding transfer cost, and a minimum cost path from a starting point to a terminal point is solved. Therefore, dynamic traffic environment modeling is carried out according to the environment road map and real-time road condition query API interface data, simple and easily-obtained dynamic traffic flow information is fully utilized to provide an auxiliary driving decision scheme for a driver, dynamic route planning under a local information view angle is further realized, and the problem that a path planning result is inaccurate easily caused by only using static information is effectively avoided.
Finally, it should also be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The dynamic traffic environment-oriented reinforcement learning aided driving decision method provided by the invention is described in detail above, a specific example is applied in the method to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (8)

1. A reinforcement learning auxiliary driving decision method for a dynamic traffic environment is characterized by comprising the following steps:
extracting urban area road information from an environment road map, and abstracting the urban area road information into a road area undirected graph after simplifying preprocessing;
acquiring the traffic density distribution condition of regional roads by using an API (application programming interface) for real-time road condition query;
constructing a dynamic traffic model according to the traffic flow density distribution condition and the road traffic rule; the dynamic traffic model comprises an intersection turning probability model; the intersection turning probability model P i,j Comprises the following steps:
Figure FDA0003741618850000011
wherein, each road is represented by intersection labels i and j at two ends of the road;
Figure FDA0003741618850000012
as a function of speed and density, p i,j Is the traffic density, beta, of the communication road between the ith intersection and the jth intersection 1 、β 2 、β 3 、β 4 Are all weighting coefficients, β 4 rand denotes a random factor, V max Representing the road speed upper limit, p max Represents the upper density limit, corresponding to a speed of 0;
the dynamic traffic model further comprises an intersection outflow vehicle model; the intersection outflow vehicle model is a discrete model based on state, the time interval of each iteration updating cycle is 1, and the number L of the intersection outflow vehicles is calculated by adopting the following formula j
Figure FDA0003741618850000013
Wherein alpha is i,j The width matrix of a communication road between the ith intersection and the jth intersection is obtained;
obtaining the transfer cost of the vehicle between any two intersections;
and in the road area undirected graph, performing a path planning based on reinforcement learning according to the dynamic traffic model and the corresponding transfer cost, and solving a minimum cost path from a starting point to a terminal point.
2. The dynamic traffic environment-oriented reinforcement learning aided driving decision-making method according to claim 1, wherein an API (application program interface) for real-time road condition query is used for obtaining the traffic flow density distribution condition of regional roads, and the method comprises the following steps:
according to the longitude and latitude of each road in the range from the driving starting point to the destination, dividing the road into longitude and latitude points corresponding to the road according to a certain longitude and latitude step length, and returning the congestion evaluation of each intersection node through an API (application program interface) for real-time road condition query;
utilizing congestion evaluation of each point in the whole environmental road map range to draw a road congestion evaluation layered thermodynamic diagram;
calculating the traffic flow density of each road according to the congestion condition of each road in the thermodynamic diagram;
and acquiring the traffic density distribution condition of the regional roads according to the calculated traffic density of each road.
3. The dynamic traffic environment-oriented reinforcement learning aided driving decision method according to claim 2, is characterized in that the traffic flow density of each road is calculated by the following formula:
Figure FDA0003741618850000021
where ρ is m For the current roadThe traffic density of the mth longitude and latitude point, N is the total number of the longitude and latitude points, c m A road congestion evaluation value l corresponding to the mth longitude and latitude point in the road m For the longitude and latitude step length of the road, ∑ l m The total length of the road.
4. The dynamic traffic environment-oriented reinforcement learning aided driving decision method according to claim 1, is characterized in that in the process of constructing a dynamic traffic model, the method comprises the following steps:
and when the traffic flow is updated every time, the information of the current intersection and the road vehicles is put into the dynamic traffic model, and the inter-intersection vehicle transfer information of each iteration is obtained through the dynamic traffic model so as to update the state information of each intersection and road vehicle after transfer.
5. The dynamic traffic environment-oriented reinforcement learning aided driving decision method according to claim 4, wherein in the process of constructing the dynamic traffic model, the method further comprises the following steps:
and normalizing the probability of intersection turning in the dynamic traffic model to meet the sum of the turning-out probabilities of all intersections as 1, wherein the turning-out probability of each intersection has four directions.
6. The dynamic traffic environment-oriented reinforcement learning aided driving decision method according to claim 1, wherein the step of obtaining the transfer cost of the vehicle between any two intersections comprises the following steps:
calculating the running time of the road with two connected road ports according to the relation between the traffic density and the running speed;
calculating the waiting time of the intersection according to the relationship between the waiting time of the intersection and the number of vehicles at the intersection;
calculating the running time from the target intersection to the terminal according to the coordinates of the target intersection, the coordinates of the terminal and the running speed of the current road;
linearly weighting the driving time of the road with the two road junctions connected, the waiting time of the road junctions and the driving time from the target road junction to the terminal point to obtain the transfer cost of two adjacent road junctions;
and obtaining the transfer cost between any two intersections according to the product of the transfer cost of two adjacent intersections and the reciprocal of the reachable matrix element.
7. The dynamic traffic environment-oriented reinforcement learning aided driving decision method according to claim 1, wherein the reinforcement learning-based path planning is performed by:
and performing path planning by using a Q-learning reinforcement learning algorithm.
8. The dynamic traffic environment-oriented reinforcement learning aided driving decision method according to claim 7, wherein in the process of performing path planning by using Q-learning reinforcement learning algorithm, the method comprises the following steps:
defining the reinforcement learning state as the intersection serial number from the starting point to the end point of the driving;
defining the action of reinforcement learning as the state of the next moment;
defining elements in the reinforcement learning reward matrix as the transfer cost of each road of each intersection, wherein under the current intersection, the reinforcement learning intelligent agent only can know the information of the surrounding connected roads;
obtaining a transition probability according to the dynamic traffic model; the environmental model interacted by the reinforcement learning agent is described by the transition probability;
and updating the Q value matrix in the Q-learning reinforcement learning algorithm according to the Bellman equation.
CN202210032222.6A 2022-01-12 2022-01-12 Reinforced learning aided driving decision-making method oriented to dynamic traffic environment Active CN114384901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210032222.6A CN114384901B (en) 2022-01-12 2022-01-12 Reinforced learning aided driving decision-making method oriented to dynamic traffic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210032222.6A CN114384901B (en) 2022-01-12 2022-01-12 Reinforced learning aided driving decision-making method oriented to dynamic traffic environment

Publications (2)

Publication Number Publication Date
CN114384901A CN114384901A (en) 2022-04-22
CN114384901B true CN114384901B (en) 2022-09-06

Family

ID=81201282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210032222.6A Active CN114384901B (en) 2022-01-12 2022-01-12 Reinforced learning aided driving decision-making method oriented to dynamic traffic environment

Country Status (1)

Country Link
CN (1) CN114384901B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116518989B (en) * 2023-07-05 2023-09-12 新唐信通(浙江)科技有限公司 Method for vehicle navigation based on sound and thermal imaging

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104157139B (en) * 2014-08-05 2016-01-13 中山大学 A kind of traffic congestion Forecasting Methodology and method for visualizing
CN106767858A (en) * 2015-11-24 2017-05-31 英业达科技有限公司 Traffic route planning system
CN108281023B (en) * 2016-12-30 2020-08-21 中国移动通信集团公司 Method and system for displaying real-time road conditions through mobile terminal
US10796204B2 (en) * 2017-02-27 2020-10-06 Huawei Technologies Co., Ltd. Planning system and method for controlling operation of an autonomous vehicle to navigate a planned path
CN108847037B (en) * 2018-06-27 2020-11-17 华中师范大学 Non-global information oriented urban road network path planning method
CN109765820B (en) * 2019-01-14 2019-08-09 南栖仙策(南京)科技有限公司 A kind of training system for automatic Pilot control strategy
CN110363984B (en) * 2019-06-25 2021-04-02 讯飞智元信息科技有限公司 Traffic flow prediction method and apparatus

Also Published As

Publication number Publication date
CN114384901A (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN112216108B (en) Traffic prediction method based on attribute-enhanced space-time graph convolution model
CN109887282B (en) Road network traffic flow prediction method based on hierarchical timing diagram convolutional network
Lu et al. A bi-criterion dynamic user equilibrium traffic assignment model and solution algorithm for evaluating dynamic road pricing strategies
Qin et al. Reinforcement learning for ridesharing: An extended survey
Chen et al. Reliable shortest path finding in stochastic time-dependent road network with spatial-temporal link correlations: A case study from Beijing
CN108256914A (en) A kind of point of interest category forecasting method based on tensor resolution model
Huang et al. Eco-routing based on a data driven fuel consumption model
Qin et al. Reinforcement learning for ridesharing: A survey
CN114384901B (en) Reinforced learning aided driving decision-making method oriented to dynamic traffic environment
Tsagkis et al. Analysing urban growth using machine learning and open data: An artificial neural network modelled case study of five Greek cities
Zhou et al. A novel spatio-temporal cellular automata model coupling partitioning with CNN-LSTM to urban land change simulation
Chen et al. Systematizing heterogeneous expert knowledge, scenarios and goals via a goal-reasoning artificial intelligence agent for democratic urban land use planning
Tang et al. Modeling parking search on a network by using stochastic shortest paths with history dependence
Madadi et al. Multi-stage optimal design of road networks for automated vehicles with elastic multi-class demand
Li et al. Non‐linear fixed and multi‐level random effects of origin–destination specific attributes on route choice behaviour
Guan et al. HGAT-VCA: Integrating high-order graph attention network with vector cellular automata for urban growth simulation
Gupta et al. Urban traffic light scheduling for pedestrian–vehicle mixed-flow networks using discrete sine–cosine algorithm and its variants
Quan et al. An optimized task assignment framework based on crowdsourcing knowledge graph and prediction
CN116486624A (en) Traffic flow prediction method and system based on space-time diagram convolutional neural network
Yu et al. A batch reinforcement learning approach to vacant taxi routing
CN115600421A (en) Construction method and device and medium of autonomous traffic system evolution model based on improved Petri network
CN114153216A (en) Lunar surface path planning system and method based on deep reinforcement learning and block planning
Guo et al. Simulating urban growth by coupling macro-processes and micro-dynamics: a case study on Wuhan, China
Shi et al. An adaptive route guidance model considering the effect of traffic signals based on deep reinforcement learning
Hussain et al. A Novel Graph Convolutional Gated Recurrent Unit Framework for Network-Based Traffic Prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant