CN113053122A - WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme - Google Patents
- Publication number
- CN113053122A (application CN202110305552.3A)
- Authority
- CN
- China
- Prior art keywords
- agent
- road network
- state
- agents
- experience
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0137—Measuring and analyzing of parameters relative to traffic conditions for specific applications
- G08G1/0145—Measuring and analyzing of parameters relative to traffic conditions for specific applications for active traffic flow control
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/065—Traffic control systems for road vehicles by counting the vehicles in a section of the road or in a parking area, i.e. comparing incoming count with outgoing count
Abstract
The invention discloses a method for predicting regional traffic flow distribution under a variable traffic control scheme based on a Weighted Multi-agent Group Inverse Reinforcement Learning (WMGIRL) algorithm, which comprises the following steps: S1, modeling the urban road network of the area to be predicted, and collecting driving track data in the area; S2, extracting flow characteristics of the area to be predicted through a weighted maximum-entropy inverse reinforcement learning method, based on the urban road network and the driving track data corresponding to the area; and S3, processing the urban road network with a multi-agent-group forward reinforcement learning method, based on the extracted flow characteristics and the current traffic control scheme, to obtain a flow distribution prediction result for the area to be predicted.
Description
Technical Field
The invention belongs to the technical field of urban traffic flow prediction, and particularly relates to a method for predicting regional flow distribution under a variable traffic control scheme based on the WMGIRL algorithm (weighted multi-agent inverse reinforcement learning based on intra-group evolution).
Background
With the continuous improvement of China's comprehensive strength, many cities have successively hosted and operated large-scale events, and urban roads cannot meet the rapidly growing traffic demand in the short term. In recent years, as research on urban intelligent traffic has deepened, traffic guidance technology and traffic control have become its two core research directions, and their current core technology still centers on short-term traffic flow. With the continuous practice and development of urban traffic flow prediction methods, experts and scholars around the world are working on short-term traffic flow prediction methods, which mainly include the following types:
(1) the naive mathematical statistical method carries out prediction research on the urban traffic flow;
(2) carrying out prediction research on urban traffic flow based on an artificial intelligence method;
(3) carrying out prediction research on urban traffic flow based on a nonlinear theory method;
urban traffic flow prediction will remain a research hotspot for decades to come. Current urban traffic types are highly complex and the traffic network is large in scale. How to build, on the basis of the above methods, a model that comprehensively accounts for the various traffic modes and the complex traffic network in one environment, and to collect and screen traffic flow data so as to predict traffic flow more accurately, is a problem worth considering.
Disclosure of Invention
In order to overcome the defects in the prior art, the method for predicting regional flow distribution under a variable traffic control scheme based on the WMGIRL algorithm starts from the perspective of driving trajectories and comprehensively considers multiple aspects such as the environment model and road attributes, so as to improve the accuracy of flow distribution prediction.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: the method for predicting regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm comprises the following steps:
s1, carrying out urban road network modeling on the area to be predicted, and collecting the driving track data in the area;
s2, extracting flow characteristics in the area to be predicted through a maximum entropy inverse reinforcement learning method based on multiple weights based on the urban road network and the traffic track data corresponding to the area to be predicted;
and S3, processing the urban road network based on the extracted traffic characteristics and the current traffic control scheme by a forward reinforcement learning method to obtain a traffic distribution prediction result of the area to be predicted.
Further, in step S1, the method for modeling an urban road network for the area to be predicted specifically includes:
a1, in the area to be predicted, regarding the intersection as a point V and regarding the road between two adjacent intersections as a side E to obtain an entity road network G (V, E);
a2, regarding the crossing in the entity road network G (V, E) as the state in the Markov process, regarding the road as the behavior required by the intelligent agent in the state transition, and further analogizing the entity road network G (V, E) to the environment model;
a3, determining an environment matrix F of the environment model based on the connectivity of the road network in the entity road network G (V, E), and determining a state transition matrix P according to the attributes of the roads in the entity road network G (V, E), thereby completing the modeling of the city road network;
wherein the environment matrix F is:

F = [(e_1, l_1), (e_2, l_2), ..., (e_f, l_f)]

where (e_{f'}, l_{f'}) ∈ F, e_{f'} is a road network node, l_{f'} is the number of motor-vehicle lanes at node e_{f'}, and the subscript f' is the road network node ordinal, f' = 1, 2, 3, ..., f.

The state transition matrix P is a three-dimensional matrix of size m × n × m built on the environment matrix, where m is the number of states and n is the number of behaviors; each element P_{sas'} of P indicates that taking action a in state s transitions to the new state s', and P_{sas'} ∈ [0, 1].
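To make the two matrices concrete, a minimal sketch follows; the toy topology, lane counts, number of behaviors, and the lane-biased transition heuristic are illustrative assumptions, not taken from the invention:

```python
import numpy as np

# Environment matrix F: (node, motor-vehicle lane count) pairs for each
# intersection (assumed toy values).
F = [("e1", 2), ("e2", 3), ("e3", 2), ("e4", 4)]

m = len(F)   # number of states (intersections)
n = 3        # number of behaviors (outgoing roads per intersection, assumed)

# Road-network connectivity (assumed toy topology): state -> reachable states.
adjacency = {0: [1, 2], 1: [2, 3], 2: [3], 3: [0]}

# State transition matrix P of size m x n x m; P[s, a, s'] is the probability
# that taking road (action) a at intersection s leads to intersection s'.
P = np.zeros((m, n, m))
for s, successors in adjacency.items():
    for a, s_next in enumerate(successors):
        # Bias transitions toward roads with more lanes (a road attribute).
        P[s, a, s_next] = F[s_next][1]
for s in range(m):
    for a in range(n):
        total = P[s, a].sum()
        if total > 0:
            P[s, a] /= total   # normalise so each (s, a) row is in [0, 1]
```

Each populated (s, a) slice sums to one, so P behaves as a proper transition kernel over the road network.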
Further, the step S2 is specifically:
S21, constructing an expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} based on the urban road network and the collected driving track data;

where ξ_i ∈ T, ξ_i is an expert strategy with index i = 1, 2, 3, ..., m, and ξ_i = {(s_0, a_0), (s_1, a_1), ..., (s_n, a_n)}, (s_t, a_t) ∈ ξ_i; (s_t, a_t) is a binary pair in which s_t is a state and a_t is a behavior, and the subscript t is the t-th time, t = 1, 2, 3, ..., n.
S22, setting each expert strategy xi in the expert strategy set TiState of(s)tSetting a characteristic expectation weight, and calculating the characteristic expectation of an expert strategy;
s23, setting a characteristic weight coefficient x based on the condition that the lengths of the expert strategies in the expert strategy set are different, and initializing the characteristic weight coefficient x;
S24, determining the state return rate c according to the feature weight coefficient x, and further obtaining the reward function r and the state action value function p(s' | s, a);

S25, updating the feature weight coefficient using the gradient vector, based on the expected state visitation frequency;

S26, repeating steps S24–S25 until convergence, and then entering step S27;

S27, taking the current reward function as the optimal reward function to obtain the flow characteristics.
Further, in step S22, the feature expectation weight z_n of state s_t in expert strategy ξ_i is computed from l, the length weight of expert strategy ξ_i, γ^t, the learning rate of the algorithm discounted over time, and the reward function value r(s_t^i), i.e. the reward given to the state s_t^i of the i-th trajectory at time t;

the length weight l of expert strategy ξ_i is determined from len(·), the length of the expert strategy, so that strategies of different lengths carry the same weight in the set;
the state action value function p(s' | s, a) in step S24 denotes the probability that agent i, taking action a in state s, reaches state s';

the feature weight coefficient is updated by maximizing the likelihood function L(θ), whose gradient is the difference between the visitation frequency actually computed from the expert trajectories and the expected frequency of the binary pairs (s_t, a_t), weighted by the corresponding feature vectors.
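The weighted maximum-entropy IRL loop of steps S22–S26 can be sketched as follows; the per-state feature vectors, the specific length-weight normalisation, and the softmax visitation model are illustrative assumptions standing in for the formulas above:

```python
import numpy as np

gamma = 0.9                                  # discount factor applied as gamma^t
n_states, n_features = 5, 3
rng = np.random.default_rng(0)
phi = rng.random((n_states, n_features))     # per-state feature vectors (assumed)

# Expert strategies as state sequences of different lengths.
T = [[0, 1, 2], [0, 2, 3, 4], [1, 3]]
max_len = max(len(xi) for xi in T)

def feature_expectation(xi):
    """Length-weighted, discounted feature expectation of one expert strategy."""
    l = len(xi) / max_len                    # length weight (one plausible choice)
    return l * sum(gamma**t * phi[s] for t, s in enumerate(xi))

mu_expert = np.mean([feature_expectation(xi) for xi in T], axis=0)

x = np.zeros(n_features)                     # feature weight coefficient x
for _ in range(100):
    reward = phi @ x                         # state return rate c = phi . x
    policy = np.exp(reward - reward.max())
    policy /= policy.sum()                   # soft state-visitation distribution
    mu_expected = policy @ phi               # expected visitation features
    grad = mu_expert - mu_expected           # gradient of the log-likelihood L(theta)
    x += 0.1 * grad                          # step S25: update feature weights
```

The gradient is exactly the expert-minus-expected visitation difference described above; iteration stops in practice when x no longer changes.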
Further, the step S3 is specifically:
S31, constructing an environment model M based on the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} corresponding to the urban road network under the current traffic control scheme;

S32, arranging a plurality of agents in the constructed environment model, and providing three experience cache pools in each agent;

wherein the three experience cache pools of each agent are the priority cache pool R_priority, the revenue cache pool R_success, and the loss cache pool R_failure;
S33, each agent in turn selects a behavior a_t, modifies the agent's current state s_t, and receives the corresponding return value r_t, thereby determining the corresponding experience (s_t, a_t, r_t, s_{t+1});

S34, storing the experience (s_t, a_t, r_t, s_{t+1}) and its priority Y into the corresponding experience cache pool of the agent, based on the return value r_t;

S35, extracting experiences, in a set proportion, from the experience cache pools of the agent and of the agent nearest to its state s_t, to form a training set, and training the DDPG network structure of the agent;

S36, clustering the trained agents by their trajectory similarity D_s to group the agents;

S37, performing an intra-group evolution operation on each group of agents to obtain a number of new agent subgroups, forming the trajectory set of the current urban road network;

S38, counting the flow of each road section in the urban road network based on the trajectory set of the current urban road network, thereby realizing the flow distribution prediction.
Further, step S34 specifically includes:

R1, storing the experience (s_t, a_t, r_t, s_{t+1}) into the priority cache pool R_priority, calculating the priority Y of the experience, and storing Y into R_priority as well;

R2, when the return value r_t of the experience is positive, storing the experience (s_t, a_t, r_t, s_{t+1}) into the revenue cache pool R_success according to its priority Y; when the return value r_t is negative, storing the experience into the loss cache pool R_failure according to its priority Y.
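The routing rule of steps R1–R2 can be sketched as follows; the class and pool data structures are illustrative assumptions (the text specifies only the three pools and the sign-of-return rule):

```python
import heapq

class ExperiencePools:
    """Three experience cache pools per agent, as described in step S34."""

    def __init__(self):
        self.R_priority = []   # every experience, ordered by priority Y
        self.R_success = []    # experiences with positive return value
        self.R_failure = []    # experiences with negative return value

    def store(self, experience, Y):
        s, a, r, s_next = experience
        # Max-heap behaviour via negated priority.
        heapq.heappush(self.R_priority, (-Y, experience))
        if r > 0:
            heapq.heappush(self.R_success, (-Y, experience))
        elif r < 0:
            heapq.heappush(self.R_failure, (-Y, experience))

pools = ExperiencePools()
pools.store(("s0", "a0", +1.0, "s1"), Y=0.8)   # positive return -> R_success
pools.store(("s1", "a1", -0.5, "s2"), Y=0.3)   # negative return -> R_failure
```

Storing by priority keeps the highest-Y experiences at the front of each pool when training batches are drawn in step S35.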
Further, in step R1, the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is calculated as:

Y = A·z_t + B·δ_t

where A and B are error coefficients, z_t is the qualification (eligibility) factor, and δ_t is the TD error;

the qualification factor z_t is accumulated from γ, the discount coefficient, λ, the trace-decay parameter, and r(s_t, a_t), the return value obtained by taking action a_t in state s_t;

the TD error δ_t is:

δ_t = R_t + γ·Q'(s_{t+1}, argmax_a Q(s_{t+1}, a)) − Q(s_t, a_t)

where R_t is the reward obtained at time t, and Q'(·) is the behavior-selection function that, for a given state, selects the behavior maximizing the reward.
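A minimal sketch of the priority computation Y = A·z_t + B·δ_t; the recursive accumulating-trace form of z_t, the coefficient values, and the use of |δ_t| are assumptions, since the patent's own formula image for z_t is not recoverable:

```python
gamma, lam = 0.99, 0.9    # discount coefficient and trace-decay parameter
A, B = 0.5, 0.5           # error coefficients (assumed values)

def eligibility(z_prev, r):
    """Qualification factor z_t: accumulating trace fed by r(s_t, a_t) (assumed form)."""
    return gamma * lam * z_prev + r

def td_error(R_t, Q_target_next, Q_current):
    """delta_t = R_t + gamma * Q'(s_{t+1}, argmax_a Q(s_{t+1}, a)) - Q(s_t, a_t)."""
    return R_t + gamma * Q_target_next - Q_current

z = eligibility(z_prev=0.2, r=1.0)
delta = td_error(R_t=1.0, Q_target_next=2.0, Q_current=1.5)
Y = A * z + B * abs(delta)   # |delta| keeps the priority non-negative (assumption)
```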
Further, step S36 specifically includes:

B1, calculating the trajectory similarity D_s between the trajectories generated by the agents through a dynamic time warping algorithm;

B2, taking the trajectory similarity D_s between any two trajectories as the inter-class distance;

B3, dividing agents whose inter-class distance is smaller than a set threshold into the same group, thereby grouping the agents.
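Steps B1–B3 can be sketched as follows; the trajectories, the threshold value, and the greedy first-fit grouping rule are illustrative assumptions (the text specifies only DTW similarity and a distance threshold):

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance (step B1)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

trajectories = [[0, 1, 2, 3], [0, 1, 2, 4], [7, 8, 9]]   # agent trajectories (toy)
threshold = 2.0                                          # set threshold (assumed)

# Steps B2-B3: an agent joins the first group whose representative trajectory
# is within the threshold; otherwise it starts a new group.
groups = []
for idx, traj in enumerate(trajectories):
    for group in groups:
        if dtw(trajectories[group[0]], traj) < threshold:
            group.append(idx)
            break
    else:
        groups.append([idx])
```

With the toy data, agents 0 and 1 (DTW distance 1) fall in one group and agent 2 starts its own.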
Further, in step S37, performing an intra-group evolution operation on each group of agents to obtain a new group of agents specifically includes:
C1, regarding all agents as the population pop, and encoding each agent;

C2, calculating the fitness of the agents in the population pop based on their encodings;

C3, initializing an empty population newpop and adding the agent with the maximum fitness to it;

C4, selecting among the remaining agents in the population pop, adding the selected agents to the population newpop, and updating newpop;

C5, replacing the population pop with the updated population newpop, and decoding each agent in newpop to obtain the new agent group.
Further, in step C1, the method for encoding each agent in the population pop is:

performing real-number encoding on the weights of the agent's neural network, in the following manner:

Agent = [w_{11} ... w_{x'y'} ... w_{xy}, w_{11} ... w_{y'q'} ... w_{yq}]

where w_{x'y'} is the weight from input-layer neuron x' to hidden-layer neuron y' in the agent's neural network, and w_{y'q'} is the weight from hidden-layer neuron y' to output-layer neuron q';
in step C2, the fitness f(b) of agent b is:

f(b) = R_b + q·Σδ_b

where R_b is the total reward of agent b over each round, Σδ_b is the total loss value of agent b, and q is the learning rate;
step C4 specifically includes:

C41, selecting agents by roulette-wheel selection among the remaining agents in the population pop;

C42, generating new agents from the selected agents based on the set crossover probability and mutation probability;

wherein the crossover probability P_c is:

P_c = 1 − 0.5 / (1 + e^{Δf})

and the mutation probability P_m is:

P_m = 1 − 0.05 / (1 + e^{Δf})

where Δf is the difference between the maximum fitness value and the average fitness value in the population pop;
C43, adding the generated new agents to the population newpop until the number of agents in newpop reaches the set number, completing the update of newpop.
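The intra-group evolution of steps C1–C5 can be sketched as follows; the population size, the stand-in fitness, and the single-point crossover / Gaussian mutation operators are illustrative assumptions, while the adaptive P_c and P_m follow the formulas above:

```python
import math
import random

random.seed(0)

# C1: real-number-encoded agents (each list stands for flattened network weights).
pop = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
# C2: fitness per agent (stand-in for f(b) = R_b + q * sum(delta_b)).
fitness = [sum(w * w for w in agent) for agent in pop]

delta_f = max(fitness) - sum(fitness) / len(fitness)
P_c = 1 - 0.5 / (1 + math.exp(delta_f))    # crossover probability
P_m = 1 - 0.05 / (1 + math.exp(delta_f))   # mutation probability

def roulette(pop, fitness):
    """C41: roulette-wheel selection proportional to fitness."""
    r = random.uniform(0, sum(fitness))
    acc = 0.0
    for agent, f in zip(pop, fitness):
        acc += f
        if acc >= r:
            return agent
    return pop[-1]

# C3: elitism — the fittest agent enters newpop directly.
newpop = [pop[fitness.index(max(fitness))]]
# C4/C42/C43: fill newpop with offspring of roulette-selected parents.
while len(newpop) < len(pop):
    a, b = roulette(pop, fitness), roulette(pop, fitness)
    child = list(a)
    if random.random() < P_c:              # single-point crossover
        cut = random.randrange(1, len(a))
        child = a[:cut] + b[cut:]
    if random.random() < P_m:              # small Gaussian mutation
        i = random.randrange(len(child))
        child[i] += random.gauss(0, 0.1)
    newpop.append(child)
pop = newpop                               # C5: replace pop with newpop
```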
The invention has the beneficial effects that:
(1) the method starts from the perspective of driving trajectories and comprehensively considers multiple aspects such as the environment model and road attributes, so as to improve the accuracy of flow distribution prediction;
(2) the method first models the road network and extracts the connection relations among road points as the basis for simplifying it, then performs the simplification; this reduces the workload of the subsequent maximum-entropy algorithm and avoids an oversized environment model, without destroying the link relations among intersections in the road network;
(3) the method exploits the characteristic of inverse reinforcement learning that an expert strategy set is used to mine the return function of the environment from the research object, thereby mining the distinctive driving-trajectory characteristics of urban residents;
(4) the invention proposes, on the basis of the DDPG algorithm, a multi-experience-discrimination multi-agent reinforcement learning algorithm based on intra-group evolution; through mechanisms such as information interaction and intra-group inheritance, multiple agents can quickly find a target in an unknown environment and, compared with traditional learning algorithms, can quickly and effectively find the optimal strategy in the environment.
Drawings
Fig. 1 is a flowchart of a method for predicting traffic distribution in a variable traffic control scheme based on the WMGIRL algorithm according to the present invention.
FIG. 2 is a diagram of a simulation platform framework in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a regional road network in an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating a traffic control scheme according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of experimental comparison results in an example provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes within the spirit and scope of the invention as defined by the appended claims are apparent, and all matter produced using the inventive concept is protected.
Example 1:
as shown in fig. 1, the method for predicting regional traffic distribution in a variable traffic control scheme based on the WMGIRL algorithm includes the following steps:
s1, carrying out urban road network modeling on the area to be predicted, and collecting the driving track data in the area;
s2, extracting flow characteristics in the area to be predicted through a maximum entropy inverse reinforcement learning method based on multiple weights based on the urban road network and the traffic track data corresponding to the area to be predicted;
and S3, processing the urban road network based on the extracted traffic characteristics and the current traffic control scheme by a forward reinforcement learning method to obtain a traffic distribution prediction result of the area to be predicted.
In step S1, when modeling the urban road network, all entities and their relations in the real world are reasonably abstracted, according to corresponding criteria, into a form composed of nodes and edges, and the corresponding entity network can be expressed as G(V, E). Based on this, the method for modeling the urban road network of the area to be predicted in step S1 specifically includes:

A1, in the area to be predicted, regarding each intersection as a point V and the road between two adjacent intersections as an edge E, obtaining the entity road network G(V, E);

where V represents a finite set of nodes V = {v_1, v_2, ..., v_m} and E represents a finite set of edges E = {e_1, e_2, ..., e_n}, in which e = (v_i, v_j) denotes a directed edge determined by the two nodes v_i and v_j; for an unweighted network of n nodes, the graph adjacency matrix is A(G) = [a_ij]_{n×n}, where a_ij = 1 if there is an edge from v_i to v_j and a_ij = 0 otherwise;
A2, regarding the intersections in the entity road network G(V, E) as the states in a Markov process and the roads as the behaviors required by the agent for state transitions, thereby treating the entity road network G(V, E) as the environment model;

A3, determining the environment matrix F of the environment model based on the connectivity of the entity road network G(V, E), and determining the state transition matrix P according to the attributes of the roads in G(V, E), thereby completing the urban road network modeling;
wherein the environment matrix F is:

F = [(e_1, l_1), (e_2, l_2), ..., (e_f, l_f)]

where (e_{f'}, l_{f'}) ∈ F, e_{f'} is a road network node, l_{f'} is the number of motor-vehicle lanes at node e_{f'}, and the subscript f' is the road network node ordinal, f' = 1, 2, 3, ..., f.

The state transition matrix P is a three-dimensional matrix of size m × n × m built on the environment matrix, where m is the number of states and n is the number of behaviors; each element P_{sas'} of P indicates that taking action a in state s transitions to the new state s', and P_{sas'} ∈ [0, 1].
In the urban road network modeling process above, the degree k_i of a node e_i in such a network is defined as the number of all nodes interconnected with it; a larger degree means the node is more important. For a directed graph, the degree of a node can be divided into two parts: the in-degree k_i^in of node e_i is the number of nodes pointing to e_i, and the out-degree k_i^out is the number of nodes that node e_i points to; hence k_i = k_i^in + k_i^out.
In general, in a complex network an edge represents an association between nodes. For an urban traffic network, expressing the association between adjacent intersections merely by whether a road section connects them, and in which direction, is in practice not comprehensive; at a finer granularity, lanes should also be divided into motor-vehicle lanes and non-motor-vehicle lanes, and the most important attributes of a lane include the actual number of lanes, the width, the alignment, and the gradient of the road.
Because of the constraints of the relevant research conditions, obtaining all parameters at the current road-network level is still difficult, so the practical influence of the number of motor-vehicle lanes is considered first, and other elements will be incorporated through empirical analysis as the research advances; this is also typical practice in road-element analysis. Only the lane-number attribute is therefore included in this study, giving:

E = {(e_1, l_1), (e_2, l_2), ..., (e_n, l_n)}

where l is the number of motor-vehicle lanes.
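The directed graph and degree quantities described above can be sketched as follows; the edge list is an illustrative toy topology:

```python
import numpy as np

n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]   # directed roads v_i -> v_j (toy)

# Adjacency matrix A(G) of an unweighted n-node network.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1            # a_ij = 1 iff there is a directed edge v_i -> v_j

k_out = A.sum(axis=1)      # out-degree: number of nodes pointed to by node i
k_in = A.sum(axis=0)       # in-degree: number of nodes pointing at node i
k = k_in + k_out           # total degree k_i = k_i_in + k_i_out
```

A node with larger k (here node 0 or node 2) is the more important intersection in the simplified road network.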
The step S2 is specifically:
S21, constructing an expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} based on the urban road network and the collected driving track data;

where ξ_i ∈ T, ξ_i is an expert strategy with index i = 1, 2, 3, ..., m, and ξ_i = {(s_0, a_0), (s_1, a_1), ..., (s_n, a_n)}, (s_t, a_t) ∈ ξ_i; (s_t, a_t) is a binary pair in which s_t is a state and a_t is a behavior, and the subscript t is the t-th time, t = 1, 2, 3, ..., n.
S22, setting a feature expectation weight for each state s_t of each expert strategy ξ_i in the expert strategy set T, and calculating the feature expectation of each expert strategy;

S23, setting a feature weight coefficient x in view of the differing lengths of the expert strategies in the set, and initializing x;
S24, determining the state return rate c according to the feature weight coefficient x, and further obtaining the reward function r and the state action value function p(s' | s, a);

S25, updating the feature weight coefficient using the gradient vector, based on the expected state visitation frequency;

S26, repeating steps S24–S25 until convergence, and then entering step S27;

wherein convergence is reached when the feature weight coefficient no longer changes;

S27, taking the current reward function as the optimal reward function to obtain the flow characteristics.
In the above process, when the agent learns the reward function in the environment model, the most intuitive approach is to learn directly from the expert strategy set and mine potential reward-function information from the expert strategies, provided the expert strategy set can visit all states in the environment. Since the expert strategy set can essentially never cover all states in the environment, the states must be classified: analyze which states the expert strategies often pass through, vote on states with strategy conflicts to find the optimal ones, weight the states that the expert strategies frequently visit, and guide the agent to focus on these high-frequency states while learning the reward function. In addition, a given expert strategy may be poor, so the expert strategies must be analyzed and the information of all trajectories combined to evaluate each expert strategy, forming a brand-new expert strategy set. This embodiment therefore introduces two weights, z_n for the states in the environment and l for the expert strategies;
from the perspective of a probability model, inverse reinforcement learning assumes that a hidden probability distribution generates the trajectory distribution, with the known expert strategies as its samples, and solves for that distribution. Some states are passed through by most trajectories, yet the importance of states in the actual application environment is not uniform; for example, vehicles tend to drive toward roads with wider lanes and avoid downtown areas that are easily congested. The weight of each state in the environment therefore differs, which motivates the feature expectation weight z_n of state s_t in expert strategy ξ_i.
Because the m expert strategies in the input expert strategy set have different lengths, the invention introduces a weight factor so that each expert strategy carries the same weight in the set. Suppose the expert strategy set contains m expert strategies:

T = {ξ_0, ξ_1, ..., ξ_m}
Given the m expert strategies, each of length up to q, the feature expectation of expert strategy ξ_i in step S22 is computed from l, the length weight of ξ_i, γ^t, the learning rate of the algorithm discounted over time, and the reward function value r(s_t^i), i.e. the reward given to the state s_t^i of the i-th trajectory at time t; the length weight l of ξ_i is determined from len(·), the length of the expert strategy.
the state action value function p(s'|s, a) in step S24 is:
in the formula, p(s'|s, a) represents the probability that agent i, taking action a in state s, reaches state s'. In step S25, the formula for updating the feature weight coefficients with the gradient vector is:
where L(θ) is the maximum likelihood function; the remaining terms are, respectively, the visitation frequency actually computed from the expert trajectories, the expected frequency of the pair (s_t, a_t), and the corresponding feature vector.
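The multi-weight scheme above can be sketched in code. The exact formulas in the source are figures that did not survive extraction, so the normalisations below (a reciprocal length weight l, a discounted visit count for the state weight z_n) are assumptions for illustration only; `feature_fn`, `gamma`, and the trajectory layout are likewise hypothetical.

```python
import numpy as np

def length_weight(trajectory):
    # Length weight l of one expert strategy; a reciprocal-length
    # normalisation is assumed so that strategies of different lengths
    # contribute equally to the strategy set.
    return 1.0 / len(trajectory)

def feature_expectation(trajectory, feature_fn, gamma=0.9):
    # Discounted, length-weighted feature expectation of one expert
    # strategy; trajectory is a list of (state, action) pairs.
    l = length_weight(trajectory)
    return l * sum(gamma ** t * feature_fn(s)
                   for t, (s, _a) in enumerate(trajectory))

def state_weight(trajectories, state, gamma=0.9):
    # Expected weight z_n of one state: discounted visit frequency over
    # all expert strategies (assumed aggregation rule).
    return sum(gamma ** t
               for traj in trajectories
               for t, (s, _a) in enumerate(traj) if s == state)
```

A state visited early and often by many expert tracks thus receives a large z_n, which is the "high-frequency state" emphasis described above.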
Step S3 of this embodiment specifically includes:
S31, based on the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} corresponding to the urban road network under the current traffic control scheme, construct an environment model M;
S32, place a plurality of agents in the constructed environment model, and set up three experience cache pools in each agent;
wherein the three experience cache pools in each agent are the priority cache pool R_priority, the profit cache pool R_success, and the loss cache pool R_failure;
S33, each agent in turn selects a behavior a_t, which modifies the current agent's state s_t and yields a corresponding return value r_t, thereby determining the corresponding experience (s_t, a_t, r_t, s_{t+1});
S34, based on the return value r_t, store the experience (s_t, a_t, r_t, s_{t+1}) and its priority Y into the corresponding experience cache pool of the agent;
S35, extract experiences, in a set proportion, from the experience cache pool of the agent whose state is nearest to the agent's own state s_t, form a training set, and train the agent's DDPG network structure;
S36, cluster the trained agents by their track similarity D_s to group the agents;
S37, perform an intra-group evolution operation on each group of agents obtained by the grouping to obtain a plurality of new agent subgroups, forming the track set of the current urban road network;
S38, count the flow of each road section in the urban road network based on the track set of the current urban road network, realizing the flow distribution prediction.
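The interaction loop of steps S33 and S34 can be sketched as below. Everything here, the toy integer states, the `env_step` signature, and the reward-sign routing into the pools, is an illustrative assumption rather than the patent's implementation.

```python
class Agent:
    """Toy agent holding the three experience cache pools of step S32."""
    def __init__(self):
        self.state = 0
        self.pools = {"priority": [], "success": [], "failure": []}

def run_round(agents, env_step, select_action):
    # S33: each agent in turn picks a behaviour; the environment returns
    # the next state and a return value r_t.
    # S34: the experience is stored, routed by the sign of r_t
    # (detailed later in steps R1/R2).
    for ag in agents:
        a = select_action(ag.state)
        s_next, r = env_step(ag.state, a)
        exp = (ag.state, a, r, s_next)
        ag.pools["priority"].append(exp)
        ag.pools["success" if r > 0 else "failure"].append(exp)
        ag.state = s_next
```

Steps S35 to S38 (training, clustering, and evolution) then operate on the pools and trajectories accumulated by this loop.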
In the step S32, the invention introduces an experience-cache-pool mechanism: the original priority cache pool R_priority is retained, and two new experience cache pools are created, the profit cache pool R_success and the loss cache pool R_failure. The profit cache pool stores experiences that earned rewards, and the loss cache pool stores experiences that incurred penalties. While an experience's priority is computed and the experience is placed into the priority cache pool, the experience is also classified: if it is a profit experience or a penalty experience, it is additionally stored into the corresponding experience cache pool.
The step S34 is specifically:
R1, store the experience (s_t, a_t, r_t, s_{t+1}) into the priority cache pool R_priority: compute the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) and store it into the priority cache pool R_priority;
wherein the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is calculated as:
Y = A·z_t + B·δ_t
where A and B are error coefficients, z_t is the eligibility trace factor, and δ_t is the TD error;
wherein the eligibility factor z_t is:
where γ is the discount coefficient, λ is the trace-decay parameter, and R(s_t, a_t) is the return value obtained by taking action a_t in state s_t;
the TD error δ_t is:
δ_t = R_t + γ·Q'(s_t, argmax_a Q(s_t, a_t)) − Q(s_{t-1}, a_{t-1})
where R_t is the reward obtained at time t, and Q'(·) selects, for the given state, the behavior that maximizes the reward.
R2, when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is positive, store the experience (s_t, a_t, r_t, s_{t+1}) with its priority Y into the profit cache pool R_success;
when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is negative, store the experience (s_t, a_t, r_t, s_{t+1}) with its priority Y into the loss cache pool R_failure.
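The priority Y = A·z_t + B·δ_t and the R1/R2 routing can be sketched as follows. The eligibility-trace update and the pool extraction-ratio formula are figures in the source, so an accumulating trace and a linear decay for the ratio α are assumed here purely for illustration.

```python
def td_error(R_t, q_target_next, q_current, gamma=0.99):
    # delta_t = R_t + gamma * Q'(.) - Q(.)
    return R_t + gamma * q_target_next - q_current

def eligibility(z_prev, gamma=0.99, lam=0.9):
    # Accumulating eligibility trace (assumed form):
    # z_t = gamma * lambda * z_{t-1} + 1
    return gamma * lam * z_prev + 1.0

def priority(z_t, delta_t, A=0.5, B=0.5):
    # Y = A*z_t + B*|delta_t|, with error coefficients A and B
    return A * z_t + B * abs(delta_t)

def route(experience, Y, pools):
    # R1: always keep (Y, experience) in the priority pool;
    # R2: additionally file it by the sign of the return value r_t.
    _s, _a, r, _s_next = experience
    pools["priority"].append((Y, experience))
    if r > 0:
        pools["success"].append((Y, experience))
    elif r < 0:
        pools["failure"].append((Y, experience))

def extraction_ratio(episode_n, episode_max):
    # The share alpha drawn from the priority pool shrinks with the
    # episode count (a linear decay is assumed for illustration).
    return max(0.0, 1.0 - episode_n / episode_max)
```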
In step S35, when an agent learns, a certain amount of experience is extracted from the three experience pools in a specific ratio; this removes the correlation between data, improves the stability of neural-network training, and reduces training oscillation. The extraction ratios from the priority cache pool, the profit cache pool, and the loss cache pool are as follows:
where episode_n is the current round number and episode_max is the total number of rounds.
α shrinks as the number of rounds grows, so that early in training the agent attends mainly to the priority cache pool, while unguided exploratory actions contribute less experience to the profit cache. When extracting experience from other agents' cache pools, the agent selects, according to its own state, the pool whose state is closest, making full use of the other agents' exploration near the state s and effectively avoiding repeated exploration of the same environment. The step S36 is therefore specifically:
B1, compute the track similarity D_s between the tracks generated by the agents with a dynamic time warping algorithm;
B2, take the track similarity D_s of any two tracks as the inter-class distance;
specifically, when two tracks are compared, the distance is computed for each point in the tracks until the end point is reached, and all computed distances are accumulated to obtain the track similarity used in the invention. Using the idea of dynamic programming, the warping path with the minimum cumulative distance is found, and that cumulative distance is the path similarity between the two tracks; the specific formula is:
B3, divide the agents whose inter-class distance is smaller than a set threshold into one group, realizing the grouping of the agents.
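Steps B1 to B3 can be sketched with the classical DTW recurrence. The point-to-point distance `dist` and the flat threshold clustering below are illustrative assumptions; the patent's own similarity formula is a figure in the source.

```python
def dtw_distance(track_a, track_b, dist=lambda p, q: abs(p - q)):
    """DTW similarity D_s between two tracks (step B1): the minimum
    cumulative point-to-point distance over all warping paths."""
    n, m = len(track_a), len(track_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(track_a[i - 1], track_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def group_agents(tracks, threshold):
    # B2/B3: agents whose pairwise DTW distance stays below the
    # threshold share a group (a simple greedy clustering, assumed).
    groups = []
    for idx, tr in enumerate(tracks):
        for g in groups:
            if all(dtw_distance(tr, tracks[j]) < threshold for j in g):
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups
```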
In the step S37, the method of performing the intra-group evolution operation in each group of agents to obtain new agent groups is specifically:
C1, regard all agents as the population pop, and encode each agent;
C2, calculate the fitness of the agents in the population pop based on their encodings;
C3, initialize an empty population newpop;
C3, add the agent with the maximum fitness to the population newpop;
C4, select among the remaining agents in the population pop, add the selected agents to the population newpop, and update the population newpop;
C5, replace the population pop with the updated population newpop, and decode each agent in newpop to obtain the new agent group.
In this process, a number of agents are placed into the environment and trained at once. At the end of each round, the DDPG models trained in the environment vary in quality; an elite-retention strategy preserves the models that perform well, a genetic-evolution strategy applies crossover and mutation to the poorer models, and the resulting agents are placed back into the environment as a new population for the next round of training.
First, real-number coding is performed on each agent's neural-network weights. In step C1, the method of encoding each agent in the population pop is:
perform real-number coding on the weights of the agent's neural network, in the following form:
Agent = [w_11 ... w_x'y' ... w_xy, w_11 ... w_y'q' ... w_yq]
where w_x'y' is the weight from input-layer neuron x' to hidden-layer neuron y' in the agent's neural network, and w_y'q' is the weight from hidden-layer neuron y' to output-layer neuron q';
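The real-number coding of step C1 amounts to flattening the two weight matrices into one chromosome and back. A minimal sketch, where the shapes x, y, q follow the notation above and the use of NumPy is an assumption:

```python
import numpy as np

def encode(W_in_hidden, W_hidden_out):
    """Real-number coding (step C1): flatten both weight matrices into
    one chromosome Agent = [w_11 ... w_xy, w_11 ... w_yq]."""
    return np.concatenate([W_in_hidden.ravel(), W_hidden_out.ravel()])

def decode(chromosome, x, y, q):
    """Recover the two weight matrices from the chromosome (step C5)."""
    W_in_hidden = chromosome[: x * y].reshape(x, y)
    W_hidden_out = chromosome[x * y:].reshape(y, q)
    return W_in_hidden, W_hidden_out
```

Because the chromosome is a flat real vector, the arithmetic crossover and uniform mutation described below operate on it elementwise.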
in step C2, the genetic algorithm is combined with the optimization of the agent's DDPG neural network, and individuals are evaluated by a fitness function so as to achieve survival of the fittest; the fitness f(b) of agent b is:
f(b) = R_b + q·Σδ_b
where R_b is the total reward of agent b over each round, Σδ_b is the total loss value of agent b, and q is the learning rate;
when agents are selected, in order to prevent the optimal solutions produced during evolution from being destroyed by crossover and mutation, the invention adopts an elite-retention strategy: the individual with the maximum fitness in each group is copied to the next generation unchanged. The step C4 is therefore specifically:
C41, select agents by roulette selection among the remaining agents in the population pop;
the probability of roulette selection is:
C42, generate new agents from the selected agents based on the set crossover probability and mutation probability;
because the settings of the crossover probability P_c and the mutation probability P_m strongly influence the convergence of the genetic algorithm and can enlarge the error between the optimal solution and the true solution, the invention uses adaptively adjustable crossover and mutation probabilities to preserve the diversity of the population:
wherein the crossover probability P_c is:
P_c = 1 − 0.5/(1 + e^Δf)
and the mutation probability P_m is:
P_m = 1 − 0.05/(1 + e^Δf)
where Δf is the difference between the maximum fitness value and the average fitness value in the population pop;
C43, add the generated new agents to the population newpop until the number of agents in newpop reaches the set number, completing the update of the population newpop.
Since this embodiment uses a real-number coding scheme, the crossover and mutation operations are, respectively, arithmetic crossover and uniform mutation. Arithmetic crossover linearly combines two agents to produce two new agents; assuming Agent_a and Agent_b undergo arithmetic crossover, the two new individuals produced by the crossover operation are:
Agent'_a = P_c·Agent_a + (1 − P_c)·Agent_b
Agent'_b = P_c·Agent_b + (1 − P_c)·Agent_a
This embodiment adopts uniform mutation. Given an agent encoded in the form above, let w_x'y' be a mutation point with range w_x'y' ∈ [ω_min, ω_max]. For the mutation point w_x'y', a random number is drawn and compared with the mutation probability P_m; if it is smaller, a random value w'_x'y' is chosen in [ω_min, ω_max] to replace w_x'y'.
Example 2:
this embodiment provides a flow simulation platform under variable traffic control schemes, built on the basis of the above flow distribution prediction method.
The platform predicts regional flow distribution under different traffic control schemes on the basis of a real urban road network and checkpoint (bayonet) data. The framework of the flow simulation platform under the variable traffic control scheme is shown in Fig. 2; it comprises three sub-modules: a road network module, a flow module, and an algorithm module. The road network module models the urban road network within a specified range of a specified city; the flow module mines driving-track data from the checkpoint data of a given time period within that range; and the algorithm module is divided into a flow-feature-extraction module and a flow-simulation module. The flow-feature-extraction module extracts the flow features of the area at a specific time, on the basis of the road network and flow modules, with the improved maximum-entropy inverse reinforcement learning algorithm; the flow-simulation module takes the computed flow features and the urban road network under the traffic control scheme as input, performs flow simulation through forward reinforcement learning, and finally displays the flow visually.
This embodiment simulates the traffic control scheme applied during the Mianyang Science and Technology Expo in September 2019 to predict the regional traffic of a control scheme based on road-section prohibition, and verifies the accuracy of regional traffic prediction for a control scheme based on intersection turn prohibition by simulating the turn-prohibition scheme used during the traffic maneuver in the CBD area of Mianyang in December 2019.
(1) Road network selection explanation:
regional traffic prediction for the road-section-prohibition control scheme was carried out over the main urban area of Mianyang, selected after consulting the Mianyang traffic police and Mianyang smart-city project staff. The selected range is the rectangle enclosed by four points: the Dragon Stream logistics park (104.717637, 31.513032), the Mianyang suburban airport (104.749689, 31.435669), the Miller hub (104.603876, 31.450148), and the Youth Square (104.774194, 31.46093). As shown in Fig. 3 (a, road network range; b, Mianyang road network), the regional road network was crawled and the Mianyang road network simplified; it contains 3200 road segments and 1016 intersections.
(2) Preparing data:
the experiment predicts the flow distribution for the morning peak from 8:00 to 9:00 on September 6, 2019, during the Expo. From the checkpoint data of September 1 to September 5, 2019, 302 expert strategy sets were assembled for the morning peak; the minimum number of tracks in one expert strategy set is 43, the maximum is 809, and the average track length is 46.
(3) Traffic control scheme description:
during the Mianyang Science and Technology Expo, the Mianyang traffic police implemented road prohibition on the following roads: (a) one-way traffic from the Flying Cloud Great Road winery intersection, via the mountain road, to the rubdown intersection; (b) one-way traffic from the Liaoning Dadao Li Hospital intersection to the Old Yong'an Lu Li Hospital intersection; (c) one-way traffic from the Old Yong'an Lu Xin automobile intersection to the Bubo Garden intersection, as shown in Fig. 4.
(4) Simulation experiment process:
to verify the accuracy and reliability of the proposed algorithm, multiple flow simulations were run for the Mianyang urban area with the Expo traffic control scheme applied; the experimental results are shown in Fig. 5. The no-road-closure prediction for 9 a.m. of the first day of the Expo shows congestion in the core and control circles during the morning peak, with the thermodynamic diagram showing large red and orange congested areas around the Expo exhibition center. After one-way road closure, only a small number of road sections in the core and control area show orange congestion, and most areas show light-green basic smoothness. In the diversion area, different road sections show different traffic states without road closure, and after closure the road sections near the control area show orange congestion.
As can be seen from the comparison of Fig. 5(a) and 5(b), the predicted flow distribution is similar to the actual flow distribution. Fig. 5(c) compares the predicted data of all one-way road segments in the road network with the actual data; the predicted and actual data share the same distribution trend across the road network. Fig. 5(d) then examines the 20 road segments with the most vehicles in the road network: the data predicted by the algorithm is slightly larger than the actual data, where f(s) denotes the predicted flow rate and the remaining symbol the actual flow rate. In summary, the accuracy of the platform's regional traffic prediction for the Mianyang urban area under the Expo traffic control scheme is:
Claims (10)
1. the method for predicting regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm is characterized by comprising the following steps of:
s1, carrying out urban road network modeling on the area to be predicted, and collecting the driving track data in the area;
s2, extracting the flow features of the area to be predicted by a multi-weight maximum-entropy inverse reinforcement learning method, based on the urban road network and the driving-track data corresponding to the area to be predicted;
and S3, processing the urban road network based on the extracted traffic characteristics and the current traffic control scheme by a forward reinforcement learning method to obtain a traffic distribution prediction result of the area to be predicted.
2. The method for predicting the regional traffic distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 1, wherein in the step S1, the method for modeling the urban road network of the region to be predicted specifically comprises:
a1, in the area to be predicted, regard each intersection as a point V and the road between two adjacent intersections as an edge E, obtaining the entity road network G(V, E);
a2, regard the intersections in the entity road network G(V, E) as the states in a Markov process and the roads as the behaviors the agent needs for state transitions, thereby treating the entity road network G(V, E) as the environment model;
a3, determine the environment matrix F of the environment model based on the connectivity of the entity road network G(V, E), and determine the state transition matrix P according to the attributes of the roads in the entity road network G(V, E), completing the modeling of the urban road network;
wherein the environment matrix F is:
F = [(e_1, l_1), (e_2, l_2), ..., (e_f, l_f)]
where (e_f', l_f') ∈ F, e_f' is a road network node, l_f' is the number of motor-vehicle lanes at the road network node e_f', and the subscript f' is the road network node ordinal, f' = 1, 2, 3, ..., f;
the state transition matrix P is a three-dimensional matrix of size m × n × m built on the basis of the environment matrix, where m is the number of states and n is the number of behaviors; each element P_sas' in the state transition matrix P indicates that taking behavior a in state s transitions to a new state s', and P_sas' ∈ [0, 1].
3. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 1, wherein the step S2 specifically comprises:
s21, construct the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} based on the urban road network and the collected driving-track data;
wherein ξ_i ∈ T, ξ_i is an expert strategy, and the subscript i = 1, 2, 3, ..., m; an expert strategy ξ_i is ξ_i = {(s_0, a_0), (s_1, a_1), ..., (s_n, a_n)}, (s_t, a_t) ∈ ξ_i, where (s_t, a_t) is a pair in which s_t is a state and a_t a behavior, and the subscript t is the t-th time, t = 1, 2, 3, ..., n;
S22, set a feature expectation weight for the state s_t of each expert strategy ξ_i in the expert strategy set T, and calculate the feature expectation of each expert strategy;
s23, set a feature weight coefficient x for the condition that the expert strategies in the expert strategy set have different lengths, and initialize the feature weight coefficient x;
s24, determine the state return rate c from the feature weight coefficient x, thereby obtaining the reward function and the state action value function p(s'|s, a);
S25, update the feature weight coefficients with the gradient vector, based on the expected state visitation frequency;
s26, repeat steps S24-S25 until convergence, then go to step S27;
and S27, take the current reward function as the optimal reward function to obtain the flow features.
4. The method for predicting regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm of claim 3, wherein in the step S22, the feature expectation weight z_n of the state s_t of expert strategy ξ_i is:
in the formula, l is the length weight of expert strategy ξ_i, γ^t is the learning rate of the algorithm, R(s_t^i) is the reward function, i.e. the reward given to the i-th track for its state at time t, and s_t^i is the state of the i-th track at time t;
wherein the length weight l of expert strategy ξ_i is:
in the formula, len(·) is the length of an expert strategy;
the state action value function p(s'|s, a) in step S24 is:
in the formula, p(s'|s, a) represents the probability that agent i, taking action a in state s, reaches state s'.
5. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 3, wherein the step S3 specifically comprises:
s31, based on the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} corresponding to the urban road network under the current traffic control scheme, construct an environment model M;
s32, place a plurality of agents in the constructed environment model, and set up three experience cache pools in each agent;
wherein the three experience cache pools in each agent are the priority cache pool R_priority, the profit cache pool R_success, and the loss cache pool R_failure;
S33, each agent in turn selects a behavior a_t, which modifies the current agent's state s_t and yields a corresponding return value r_t, thereby determining the corresponding experience (s_t, a_t, r_t, s_{t+1});
S34, based on the return value r_t, store the experience (s_t, a_t, r_t, s_{t+1}) and its priority Y into the corresponding experience cache pool of the agent;
s35, extract experiences, in a set proportion, from the experience cache pool of the agent whose state is nearest to the agent's own state s_t, form a training set, and train the agent's DDPG network structure;
s36, cluster the trained agents by their track similarity D_s to group the agents;
s37, perform an intra-group evolution operation on each group of agents obtained by the grouping to obtain a plurality of new agent subgroups, forming the track set of the current urban road network;
and S38, count the flow of each road section in the urban road network based on the track set of the current urban road network, realizing the flow distribution prediction.
6. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 5, wherein the step S34 specifically comprises:
r1, store the experience (s_t, a_t, r_t, s_{t+1}) into the priority cache pool R_priority: compute the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) and store it into the priority cache pool R_priority;
R2, when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is positive, store the experience (s_t, a_t, r_t, s_{t+1}) with its priority Y into the profit cache pool R_success;
when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is negative, store the experience (s_t, a_t, r_t, s_{t+1}) with its priority Y into the loss cache pool R_failure.
7. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 6, wherein in the step R1, the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is calculated as:
Y = A·z_t + B·δ_t
where A and B are error coefficients, z_t is the eligibility trace factor, and δ_t is the TD error;
wherein the eligibility factor z_t is:
where γ is the discount coefficient, λ is the trace-decay parameter, and R(s_t, a_t) is the return value obtained by taking action a_t in state s_t;
the TD error δ_t is:
δ_t = R_t + γ·Q'(s_t, argmax_a Q(s_t, a_t)) − Q(s_{t-1}, a_{t-1})
where R_t is the reward obtained at time t, and Q'(·) selects, for the given state, the behavior that maximizes the reward.
8. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 6, wherein the step S36 specifically comprises:
b1, compute the track similarity D_s between the tracks generated by the agents with a dynamic time warping algorithm;
B2, take the track similarity D_s of any two tracks as the inter-class distance;
and B3, divide the agents whose inter-class distance is smaller than a set threshold into one group, realizing the grouping of the agents.
9. The method for predicting regional traffic distribution in a variable traffic control scheme based on the WMGIRL algorithm according to claim 5, wherein in the step S37, performing an intra-group evolution operation on each group of agents to obtain a new group of agents specifically comprises:
c1, regard all agents as the population pop, and encode each agent;
c2, calculate the fitness of the agents in the population pop based on their encodings;
c3, initialize an empty population newpop;
c3, add the agent with the maximum fitness to the population newpop;
c4, select among the remaining agents in the population pop, add the selected agents to the population newpop, and update the population newpop;
and C5, replace the population pop with the updated population newpop, and decode each agent in newpop to obtain the new agent group.
10. The method for predicting the regional traffic distribution in the variable traffic control scheme based on the WMGIRL algorithm of claim 9, wherein in the step C1, the method for encoding each Agent in the group pop comprises:
carry out real-number coding on the weights of the agent's neural network, in the following form:
Agent = [w_11 ... w_x'y' ... w_xy, w_11 ... w_y'q' ... w_yq]
where w_x'y' is the weight from input-layer neuron x' to hidden-layer neuron y' in the agent's neural network, and w_y'q' is the weight from hidden-layer neuron y' to output-layer neuron q';
in step C2, the fitness f(b) of agent b is:
f(b) = R_b + q·Σδ_b
where R_b is the total reward of agent b over each round, Σδ_b is the total loss value of agent b, and q is the learning rate;
the step C4 specifically includes:
c41, select agents by roulette selection among the remaining agents in the population pop;
c42, generate new agents from the selected agents based on the set crossover probability and mutation probability;
wherein the crossover probability P_c is:
P_c = 1 − 0.5/(1 + e^Δf)
and the mutation probability P_m is:
P_m = 1 − 0.05/(1 + e^Δf)
where Δf is the difference between the maximum fitness value and the average fitness value in the population pop;
and C43, add the generated new agents to the population newpop until the number of agents in newpop reaches the set number, completing the update of the population newpop.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110305552.3A CN113053122B (en) | 2021-03-23 | 2021-03-23 | WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113053122A true CN113053122A (en) | 2021-06-29 |
CN113053122B CN113053122B (en) | 2022-02-18 |
Family
ID=76514332
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110305552.3A Active CN113053122B (en) | 2021-03-23 | 2021-03-23 | WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113053122B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113759902A (en) * | 2021-08-17 | 2021-12-07 | 中南民族大学 | Multi-agent local interaction path planning method, device, equipment and storage medium |
CN113791612A (en) * | 2021-08-17 | 2021-12-14 | 中南民族大学 | Intelligent agent real-time path planning method, device, equipment and storage medium |
CN114627648A (en) * | 2022-03-16 | 2022-06-14 | 中山大学·深圳 | Federal learning-based urban traffic flow induction method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110164128A (en) * | 2019-04-23 | 2019-08-23 | 银江股份有限公司 | A kind of City-level intelligent transportation analogue system |
US20190329772A1 (en) * | 2018-04-27 | 2019-10-31 | Daniel Mark Graves | Method and system for adaptively controlling object spacing |
CN110570672A (en) * | 2019-09-18 | 2019-12-13 | 浙江大学 | regional traffic signal lamp control method based on graph neural network |
US10627823B1 (en) * | 2019-01-30 | 2020-04-21 | StradVision, Inc. | Method and device for performing multiple agent sensor fusion in cooperative driving based on reinforcement learning |
CN111243297A (en) * | 2020-01-17 | 2020-06-05 | 苏州科达科技股份有限公司 | Traffic light phase control method, system, device and medium |
CN111582469A (en) * | 2020-03-23 | 2020-08-25 | 成都信息工程大学 | Multi-agent cooperation information processing method and system, storage medium and intelligent terminal |
CN112241814A (en) * | 2020-10-20 | 2021-01-19 | 河南大学 | Traffic prediction method based on reinforced space-time diagram neural network |
Also Published As
Publication number | Publication date |
---|---|
CN113053122B (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113053122B (en) | WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme | |
CN112216108B (en) | Traffic prediction method based on attribute-enhanced space-time graph convolution model | |
Wang et al. | Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning | |
CN109754605B (en) | Traffic prediction method based on attention temporal graph convolution network | |
Silva et al. | Complexity, emergence and cellular urban models: lessons learned from applying SLEUTH to two Portuguese metropolitan areas | |
Yeh et al. | Simulation of development alternatives using neural networks, cellular automata, and GIS for urban planning | |
Lam et al. | Decision support system for contractor pre‐qualification—artificial neural network model | |
CN113762595B (en) | Traffic time prediction model training method, traffic time prediction method and equipment | |
CN111160622A (en) | Scenic spot passenger flow prediction method and device based on hybrid neural network model | |
Li et al. | International roughness index prediction based on multigranularity fuzzy time series and particle swarm optimization | |
CN115796007A (en) | Traffic flow prediction method based on a spatio-temporal graph network |
CN115270506B (en) | Method and system for predicting passing time of crowd ascending along stairs | |
Sarnataro et al. | A portfolio approach for the selection and the timing of urban planning projects | |
Zhang et al. | Direction-decision learning based pedestrian flow behavior investigation | |
Cui et al. | An interpretation framework for autonomous vehicles decision-making via SHAP and RF | |
Furtlehner et al. | Spatial and temporal analysis of traffic states on large scale networks | |
Zhao et al. | Short-term traffic flow prediction based on VMD and IDBO-LSTM | |
CN111507499B (en) | Method, device and system for constructing model for prediction and testing method | |
Xing et al. | RL-GCN: Traffic flow prediction based on graph convolution and reinforcement learning for smart cities | |
Liu et al. | EvoTSC: An evolutionary computation-based traffic signal controller for large-scale urban transportation networks | |
CN108108554A (en) | Multi-material vehicle-body assembly sequence planning optimization method | |
CN115762128B (en) | Deep reinforcement learning traffic signal control method based on self-attention mechanism | |
CN114861368B (en) | Construction method of a railway longitudinal-section design learning model based on proximal policy optimization | |
CN116612633A (en) | Self-adaptive dynamic path planning method based on vehicle-road cooperative sensing | |
Davidson | A new approach to transport modelling: the Stochastic Segmented Slice Simulation (4S) model and its recent applications
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||