CN113053122A - WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme

Info

Publication number: CN113053122A (application CN202110305552.3A, China); granted as CN113053122B
Other languages: Chinese (zh)
Prior art keywords: agent, road network, state, agents, experience
Inventors: 郑皎凌, 张中雷, 李军, 吴昊昇, 乔少杰, 刘双侨
Assignees: Sichuan Yifang Intelligent Technology Co., Ltd.; Chengdu University of Information Technology
Application filed by Sichuan Yifang Intelligent Technology Co., Ltd. and Chengdu University of Information Technology
Legal status: Granted; Active

Classifications

    • G08G1/0125 - Traffic control systems for road vehicles; detecting movement of traffic to be counted or controlled; measuring and analyzing parameters relative to traffic conditions; traffic data processing
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q50/40 - ICT specially adapted for business processes of the transportation industry
    • G08G1/0145 - Measuring and analyzing parameters relative to traffic conditions for specific applications, for active traffic flow control
    • G08G1/065 - Traffic control systems for road vehicles by counting the vehicles in a section of the road or in a parking area, i.e. comparing incoming count with outgoing count


Abstract

The invention discloses a method for predicting regional traffic flow distribution under a variable traffic control scheme, based on a Weighted Multi-agent Group Inverse Reinforcement Learning (WMGIRL) algorithm, which comprises the following steps: S1, modeling the urban road network of the area to be predicted and collecting the driving-trajectory data within the area; S2, extracting the traffic-flow features of the area to be predicted through a weighted maximum-entropy inverse reinforcement learning method, based on the urban road network and the trajectory data corresponding to the area; and S3, processing the urban road network with a multi-agent-group forward reinforcement learning method, based on the extracted traffic features and the current traffic control scheme, to obtain a traffic flow distribution prediction result for the area to be predicted.

Description

WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme
Technical Field
The invention belongs to the technical field of urban traffic flow prediction, and particularly relates to a regional flow distribution prediction method under a variable traffic control scheme based on the WMGIRL (weighted multi-agent inverse reinforcement learning with in-group evolution) algorithm.
Background
With the continuous improvement of China's comprehensive strength, many cities have successively hosted large-scale events, and urban roads cannot meet the rapidly growing traffic demand in the short term. In recent years, as research on urban intelligent transportation has deepened, traffic guidance and traffic control have become its two core research problems. The core technology behind both still rests on short-term traffic flow research, and with the continuing practice and development of urban traffic flow prediction methods, experts and scholars worldwide are working on short-term traffic flow prediction methods, which mainly fall into the following types:
(1) prediction of urban traffic flow by naive mathematical-statistical methods;
(2) prediction of urban traffic flow based on artificial-intelligence methods;
(3) prediction of urban traffic flow based on nonlinear-theory methods.
Urban traffic flow prediction will remain a research hotspot for decades to come. Given the high complexity of present urban traffic types and the large scale of the traffic network, how to build, on the basis of the above methods, a model that comprehensively accommodates multiple traffic modes and a complex traffic network in one environment, and how to acquire and filter traffic flow data so as to predict traffic flow more accurately, is a problem worth considering.
Disclosure of Invention
In order to overcome the above defects in the prior art, the WMGIRL-based method for predicting regional flow distribution under a variable traffic control scheme starts from the perspective of the driving trajectory and comprehensively considers the environment model, road attributes and other aspects, so as to improve the accuracy of flow distribution prediction.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: the method for predicting regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm comprises the following steps:
s1, carrying out urban road network modeling on the area to be predicted, and collecting the driving track data in the area;
s2, extracting flow characteristics in the area to be predicted through a maximum entropy inverse reinforcement learning method based on multiple weights based on the urban road network and the traffic track data corresponding to the area to be predicted;
and S3, processing the urban road network based on the extracted traffic characteristics and the current traffic control scheme by a forward reinforcement learning method to obtain a traffic distribution prediction result of the area to be predicted.
Further, in step S1, the method for modeling an urban road network for the area to be predicted specifically includes:
a1, in the area to be predicted, regarding the intersection as a point V and regarding the road between two adjacent intersections as a side E to obtain an entity road network G (V, E);
a2, regarding the crossing in the entity road network G (V, E) as the state in the Markov process, regarding the road as the behavior required by the intelligent agent in the state transition, and further analogizing the entity road network G (V, E) to the environment model;
a3, determining an environment matrix F of the environment model based on the connectivity of the road network in the entity road network G (V, E), and determining a state transition matrix P according to the attributes of the roads in the entity road network G (V, E), thereby completing the modeling of the city road network;
wherein the environment matrix F is:

F = [(e_1, l_1), (e_2, l_2), ..., (e_f, l_f)]

where (e_f', l_f') ∈ F, e_f' is a road network node, l_f' is the number of motor-vehicle lanes at node e_f', and the subscript f' = 1, 2, 3, ... is the ordinal number of the road network node.

The state transition matrix P is a three-dimensional matrix of size m × n × m built on the basis of the environment matrix, where m is the number of states and n is the number of behaviors; each element P_sas' in the state transition matrix P denotes the probability of taking behavior a in state s and transitioning to the new state s', with P_sas' ∈ [0, 1].
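By way of illustration, a minimal Python sketch of how the environment matrix F and the state transition matrix P could be assembled follows (not from the patent; the toy network, lane counts and deterministic transitions are assumptions):

```python
import numpy as np

# Minimal sketch (assumptions: a toy 4-intersection network, behaviors encoded
# as target intersections, deterministic transitions along existing roads).
edges = {(0, 1): 2, (1, 2): 3, (2, 3): 2, (3, 0): 1, (1, 3): 2}  # (s, s'): lanes

m, n = 4, 4                              # m states (intersections), n behaviors

# Environment matrix F: one (road-network edge, lane count) entry per road.
F = [(edge, lanes) for edge, lanes in edges.items()]

# State transition tensor P of size m x n x m; P[s, a, s'] is the probability
# of reaching s' after taking behavior a in state s.
P = np.zeros((m, n, m))
for (s, s_next), lanes in edges.items():
    a = s_next                           # behavior a: "head toward node s_next"
    P[s, a, s_next] = 1.0                # deterministic along an existing road

assert np.all((P >= 0.0) & (P <= 1.0))   # every P_sas' lies in [0, 1]
```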
Further, the step S2 is specifically:
S21, constructing an expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} based on the urban road network and the collected driving trajectory data;

wherein ξ_i ∈ T, ξ_i is an expert strategy with index i = 1, 2, 3, ..., m, and the expert strategy ξ_i is ξ_i = {(s_0, a_0), (s_1, a_1), ..., (s_n, a_n)}, with (s_t, a_t) ∈ ξ_i; (s_t, a_t) is a duple in which s_t is a state and a_t is a behavior, the subscript t being the t-th moment, t = 1, 2, 3, ..., n.
S22, setting each expert strategy xi in the expert strategy set TiState of(s)tSetting a characteristic expectation weight, and calculating the characteristic expectation of an expert strategy;
s23, setting a characteristic weight coefficient x based on the condition that the lengths of the expert strategies in the expert strategy set are different, and initializing the characteristic weight coefficient x;
s24, determining the state return rate c according to the characteristic weight coefficient x, and further obtaining a reward function
Figure BDA0002987664320000031
And a state action value function p (s' | s, a);
s25, calculating the expected state access frequency through a forward and backward algorithm
Figure BDA0002987664320000037
S26, accessing frequency based on expected state
Figure BDA0002987664320000038
Updating the feature weight coefficients by using the gradient vectors;
s26, repeating the steps S24-S25 until convergence, and entering the step S27;
and S27, taking the current reward function as the optimal reward function to obtain the flow characteristics.
Further, in step S22, the feature expectation weight z_n of the state s_t of expert strategy ξ_i is given by a formula presented as an image in the original document (not reproduced here).

In step S22, the feature expectation μ(ξ_i) of the expert strategy ξ_i is:

μ(ξ_i) = l · Σ_t γ^t · r(s_t^i)

where l is the length weight of expert strategy ξ_i, γ^t involves the learning rate of the algorithm, r(s_t^i) is the reward function, i.e. the reward given to the i-th trajectory for its state at moment t, and s_t^i is the state of the i-th trajectory at moment t;

wherein the length weight l of expert strategy ξ_i is defined through len(ξ_i) (the exact formula is given as an image in the original; len(·) is the length of the expert strategy);

The state-action value function p(s'|s, a) in step S24 is given as an image formula in the original, where p(s'|s, a) denotes the probability that agent i takes behavior a in state s and reaches state s'.

In step S26, the formula for updating the feature weight coefficient with the gradient vector ∇L(θ) is:

∇_θ L(θ) = f̃ - Σ_{s,a} D_{s,a} · f_{s,a}

where L(θ) is the maximum-likelihood function, f̃ is the visitation frequency actually computed from the expert trajectories, D_{s,a} is the expected frequency of the duple (s_t, a_t), and f_{s,a} is the corresponding feature vector.
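As a concrete illustration of the weighted feature expectation of step S22, a minimal sketch follows (the normalization used for the length weight l is an assumption, since the patent gives l only as an image formula; the reward table and γ are toy stand-ins):

```python
import numpy as np

# Sketch of the length weight l and the feature expectation mu(xi) of S22.
def length_weight(xi, expert_set):
    # Assumed form: normalize so every expert strategy contributes equally
    # regardless of its length; the patent's exact image formula may differ.
    mean_len = np.mean([len(traj) for traj in expert_set])
    return mean_len / len(xi)

def feature_expectation(xi, reward, gamma, l):
    # mu(xi) = l * sum_t gamma^t * r(s_t), following the symbols in the text.
    return l * sum((gamma ** t) * reward[s] for t, (s, a) in enumerate(xi))

expert_set = [[(0, 1), (1, 2)], [(0, 1), (1, 3), (3, 2)]]  # lists of (s, a)
reward = {0: 0.2, 1: 0.5, 3: -0.1}                          # toy reward table
xi = expert_set[1]
mu = feature_expectation(xi, reward, gamma=0.9, l=length_weight(xi, expert_set))
```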
Further, the step S3 is specifically:
S31, constructing an environment model M based on the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} corresponding to the urban road network under the current traffic control scheme;

S32, arranging a plurality of agents in the constructed environment model, and arranging three experience buffer pools in each agent;

wherein the three experience buffer pools in each agent are a priority buffer pool R_priority, a profit buffer pool R_success, and a loss buffer pool R_failure;

S33, each agent in turn selects a behavior a_t, which modifies the current agent's state s_t and yields a corresponding return value r_t, thereby determining the corresponding experience (s_t, a_t, r_t, s_{t+1});

S34, based on the return value r_t, storing the experience (s_t, a_t, r_t, s_{t+1}) and its corresponding priority Y into the matching experience buffer pool of the agent;

S35, extracting experiences in a set proportion from the agent's own buffer pools and from the buffer pool of the agent whose state is nearest to its state s_t, forming a training set, and training the agent's DDPG network structure;

S36, clustering the trained agents by trajectory similarity D_s to group the agents;

S37, performing an intra-group evolution operation on each group of agents obtained by the grouping, yielding several new agent subgroups and forming the trajectory set of the current urban road network;

S38, counting the flow of each road section in the urban road network based on the trajectory set of the current urban road network, thereby realizing the flow distribution prediction.
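To make the flow of S32-S34 concrete, a minimal runnable outline follows (illustrative stubs only: the random placeholder policy, toy environment and simplified storage are assumptions; the actual method trains DDPG networks and computes priorities Y as detailed below):

```python
import random

# Runnable mini-outline of S32-S34.
class Agent:
    def __init__(self, n_states):
        self.state = 0
        self.pools = {"priority": [], "success": [], "failure": []}  # S32
        self.n_states = n_states

    def select_behavior(self):
        return random.randrange(self.n_states)       # placeholder policy

    def store(self, exp):                            # S34 routing (Y omitted)
        self.pools["priority"].append(exp)
        _, _, r_t, _ = exp
        if r_t > 0:
            self.pools["success"].append(exp)
        elif r_t < 0:
            self.pools["failure"].append(exp)

def run_round(agents, transition, reward):
    for agent in agents:                             # S33: agents act in turn
        a_t = agent.select_behavior()
        s_next = transition[agent.state][a_t]
        r_t = reward[agent.state][a_t]
        agent.store((agent.state, a_t, r_t, s_next))
        agent.state = s_next

transition = [[1, 2, 0], [2, 0, 1], [0, 1, 2]]       # toy 3-state environment
reward = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0], [-1.0, 0.0, 1.0]]
agents = [Agent(n_states=3) for _ in range(4)]
run_round(agents, transition, reward)
```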
Further, step S34 is specifically:

R1, storing the experience (s_t, a_t, r_t, s_{t+1}) into the priority buffer pool R_priority: the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is calculated and stored together with it in R_priority;

R2, when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is positive, storing the experience (s_t, a_t, r_t, s_{t+1}) into the profit buffer pool R_success according to its priority Y;

when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is negative, storing the experience (s_t, a_t, r_t, s_{t+1}) into the loss buffer pool R_failure according to its priority Y.

Further, in step R1, the formula for calculating the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is:

Y = A·z_t + B·δ_t

where A and B are error coefficients, z_t is the eligibility factor, and δ_t is the TD error;

the eligibility factor z_t is given by a formula presented as an image in the original, in which γ is the discount coefficient, λ is the trace-decay parameter, and r(s_t, a_t) is the return value of taking behavior a_t in state s_t;

the TD error δ_t is:

δ_t = R_t + γ·Q'(s_t, argmax_a Q(s_t, a_t)) - Q(s_{t-1}, a_{t-1})

where R_t is the reward obtained at moment t and Q'(·) is the behavior-selection function that, for a given state, selects the behavior maximizing the reward.
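A minimal sketch of the priority computation follows (the eligibility-trace update shown is an assumed accumulating form, since the patent gives z_t only as an image; A, B, γ, λ are toy values):

```python
# Sketch of the priority Y = A*z_t + B*delta_t and the TD error of step R1.
A, B, GAMMA, LAMBDA = 0.5, 0.5, 0.99, 0.9

def td_error(R_t, q_next, q_prev):
    # delta_t = R_t + gamma * Q'(s_t, argmax_a Q(s_t, a_t)) - Q(s_{t-1}, a_{t-1})
    return R_t + GAMMA * q_next - q_prev

def eligibility(z_prev, r_sa):
    # Assumed accumulating-trace form: z_t = gamma * lambda * z_{t-1} + r(s_t, a_t)
    return GAMMA * LAMBDA * z_prev + r_sa

def priority(z_t, delta_t):
    return A * z_t + B * delta_t         # Y = A*z_t + B*delta_t

z_t = eligibility(z_prev=0.0, r_sa=1.0)
Y = priority(z_t, td_error(R_t=1.0, q_next=0.8, q_prev=0.6))
```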
Further, step S36 is specifically:

B1, calculating the trajectory similarity D_s between the trajectories generated by the agents through a dynamic time warping algorithm;

B2, taking the trajectory similarity D_s between any two trajectories as the inter-class distance;

B3, dividing the agents whose inter-class distance is smaller than a set threshold into one group, thereby grouping the agents.
Further, in step S37, performing the intra-group evolution operation on each group of agents to obtain new agent groups is specifically:

C1, regarding all agents as a population pop, and encoding each agent;

C2, calculating the fitness of the agents in the population pop based on their encodings;

C3, initializing an empty population newpop and adding the agent with the maximum fitness into it;

C4, selecting among the remaining agents in the population pop, adding the selected agents into the population newpop, and updating the population newpop;

C5, replacing the population pop with the updated population newpop, and decoding each agent in newpop to obtain the new agent group.
Further, in step C1, the method for encoding each agent in the population pop is:

performing real-number encoding on the weights of the agent's neural network, in the form:

Agent = [w_11 ... w_x'y' ... w_xy, w_11 ... w_y'q' ... w_yq]

where w_x'y' is the weight from input-layer neuron x' to hidden-layer neuron y' in the agent's neural network, and w_y'q' is the weight from hidden-layer neuron y' to output-layer neuron q';

in step C2, the fitness f(b) of agent b is:

f(b) = R_b + q·Σδ_b

where R_b is the total reward of agent b over each round, Σδ_b is the total loss value of agent b, and q is the learning rate;
Step C4 is specifically:

C41, selecting agents by roulette-wheel selection among the remaining agents in the population pop;

C42, generating new agents from the selected agents based on the set crossover probability and mutation probability;

wherein the crossover probability P_c is:

P_c = 1 - 0.5/(1 + e^Δf)

and the mutation probability P_m is:

P_m = 1 - 0.05/(1 + e^Δf)

where Δf is the difference between the maximum fitness value and the average fitness value in the population pop;

C43, adding the generated new agents into the population newpop until the number of agents in newpop reaches the set number, completing the update of the population newpop.
The invention has the beneficial effects that:
(1) the method starts from the perspective of the driving trajectory and comprehensively considers the environment model, road attributes and other aspects, so as to improve the accuracy of flow distribution prediction;
(2) the method first models the road network, extracts the connection relations among road points as the basis for simplifying the road network, and then simplifies the road network, which reduces the workload of the subsequent maximum entropy algorithm and avoids an oversized environment model without damaging the link relations among intersections in the road network;
(3) the method exploits the fact that, in inverse reinforcement learning, the expert strategy set serves as the research object for mining the return function of the environment, and thereby mines the distinctive driving-trajectory characteristics of urban residents;
(4) on the basis of the DDPG algorithm, the invention proposes a multi-experience-discrimination multi-agent reinforcement learning algorithm based on in-group evolution; through mechanisms such as information interaction and intra-group inheritance, multiple agents can quickly find targets in an unknown environment and, compared with traditional learning algorithms, can quickly and effectively find the optimal strategy in the environment.
Drawings
Fig. 1 is a flowchart of a method for predicting traffic distribution in a variable traffic control scheme based on the WMGIRL algorithm according to the present invention.
FIG. 2 is a diagram of a simulation platform framework in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a regional road network in an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating a traffic control scheme according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of experimental comparison results in an example provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art; however, it should be understood that the invention is not limited to the scope of the embodiments. To those skilled in the art, various changes are apparent within the spirit and scope of the invention as defined by the appended claims, and all matter produced using the inventive concept is protected.
Example 1:
as shown in fig. 1, the method for predicting regional traffic distribution in a variable traffic control scheme based on the WMGIRL algorithm includes the following steps:
s1, carrying out urban road network modeling on the area to be predicted, and collecting the driving track data in the area;
s2, extracting flow characteristics in the area to be predicted through a maximum entropy inverse reinforcement learning method based on multiple weights based on the urban road network and the traffic track data corresponding to the area to be predicted;
and S3, processing the urban road network based on the extracted traffic characteristics and the current traffic control scheme by a forward reinforcement learning method to obtain a traffic distribution prediction result of the area to be predicted.
In step S1, when modeling the urban road network, things in the real world and their relations are reasonably abstracted, according to corresponding criteria, into a form composed of nodes and edges, and the corresponding entity network can be expressed as G(V, E). On this basis, the method for performing urban road network modeling on the area to be predicted in step S1 is specifically:
A1, in the area to be predicted, regarding each intersection as a point V and each road between two adjacent intersections as an edge E, obtaining the entity road network G(V, E);

where V denotes a finite set of nodes V = {v_1, v_2, ..., v_m}, and E denotes a finite set of edges E = {e_1, e_2, ..., e_n}, in which e = (v_i, v_j) denotes that edge e is a directed ray determined by the two nodes v_i, v_j. For an unweighted network of n nodes, the adjacency matrix of the graph is A(G) = [a_ij]_{n×n}, where:

a_ij = 1 if there is an edge from v_i to v_j, and a_ij = 0 otherwise.
A2, regarding the intersections in the entity road network G(V, E) as the states in a Markov process, and regarding the roads as the behaviors the agent requires for state transitions, thereby making the entity road network G(V, E) analogous to the environment model;

A3, determining the environment matrix F of the environment model based on the connectivity of the road network in the entity road network G(V, E), and determining the state transition matrix P according to the attributes of the roads in the entity road network G(V, E), thereby completing the modeling of the urban road network;

wherein the environment matrix F is:

F = [(e_1, l_1), (e_2, l_2), ..., (e_f, l_f)]

where (e_f', l_f') ∈ F, e_f' is a road network node, l_f' is the number of motor-vehicle lanes at node e_f', and the subscript f' = 1, 2, 3, ... is the ordinal number of the road network node.

The state transition matrix P is a three-dimensional matrix of size m × n × m built on the basis of the environment matrix, where m is the number of states and n is the number of behaviors; each element P_sas' in the state transition matrix P denotes the probability of taking behavior a in state s and transitioning to the new state s', with P_sas' ∈ [0, 1].
In the above urban road network modeling process, the degree k_i of a node e_i in such a network is usually defined as the number of all nodes interconnected with it; a larger degree means the node is more important. For a directed graph, the degree of a node can generally be divided into two parts: the in-degree k_i^in of node e_i can be taken as the number of nodes pointing to node e_i, and the out-degree k_i^out as the number of all nodes pointed to by node i. It then follows that k_i = k_i^in + k_i^out.
In general, in a complex network an "edge" can represent an association existing between nodes. For an urban traffic network system, it is in practice not comprehensive to express the mutual association between adjacent intersections merely by whether road sections connect and in which direction; at a finer granularity, it should be taken into account that lanes can also be divided into motor lanes and non-motor lanes, and the most important attributes of a lane include the actual number of lanes, the width, the alignment and the gradient of the road, etc.

Because of constraints on the relevant research conditions, acquiring all parameters at the current road-network level is still difficult, so the practical influence of the number of motor-vehicle lanes is considered first, and other elements are gradually brought into empirical analysis as the research work advances; this is also typical practice in road element analysis. Only the lane-count attribute was included in the study. Therefore:

E = {(e_1, l_1), (e_2, l_2), ..., (e_n, l_n)}

where l is the number of motor-vehicle lanes.
Step S2 is specifically:

S21, constructing an expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} based on the urban road network and the collected driving trajectory data;

wherein ξ_i ∈ T, ξ_i is an expert strategy with index i = 1, 2, 3, ..., m, and the expert strategy ξ_i is ξ_i = {(s_0, a_0), (s_1, a_1), ..., (s_n, a_n)}, with (s_t, a_t) ∈ ξ_i; (s_t, a_t) is a duple in which s_t is a state and a_t is a behavior, the subscript t being the t-th moment, t = 1, 2, 3, ..., n.

S22, setting a feature expectation weight for the state s_t of each expert strategy ξ_i in the expert strategy set T, and calculating the feature expectation of each expert strategy;

S23, setting a feature weight coefficient x, given that the expert strategies in the expert strategy set differ in length, and initializing the feature weight coefficient x;

S24, determining the state return rate c from the feature weight coefficient x, and thereby obtaining the reward function r(s) and the state-action value function p(s'|s, a);

S25, calculating the expected state visitation frequency D_s by a forward-backward algorithm;

S26, updating the feature weight coefficient with the gradient vector, based on the expected state visitation frequency D_s;

S27, repeating steps S24-S26 until convergence, then entering step S28; convergence is reached when the feature weight coefficient no longer changes;

S28, taking the current reward function as the optimal reward function, and obtaining the flow features.
In the above process, when the agent learns the reward function in the environment model: if the expert strategy set could visit all states in the environment, the agent could learn directly from it, and mining the latent reward-function information from the expert strategies would be the most intuitive method. However, since the expert strategy set essentially cannot cover all states in the environment, the states must be classified: analyze which states the expert strategies frequently pass through, resolve states with strategy conflicts by voting to find the optimal state, and weight the states the expert strategies frequently pass through, so that the agent is guided to focus on the high-frequency states while learning the reward function. In addition, a given expert strategy may be poor, so the expert strategies need to be analyzed and the information provided by all trajectories synthesized to evaluate each expert strategy, forming a brand-new expert strategy set. This embodiment therefore introduces two weights, z_n for the states in the environment and l for the expert strategies.

From the perspective of a probability model, inverse reinforcement learning assumes that a hidden probability distribution exists which generated the known expert strategies, and solves for the trajectory-distribution probability model. Some states are passed through by most trajectories, yet the importance of states in the practical application environment is not always the same: for instance, vehicles tend to drive onto roads with wider lanes and avoid downtown areas prone to congestion. The weight of each state in the environment therefore differs, and the feature expectation weight z_n of the state s_t of expert strategy ξ_i is defined by a formula presented as an image in the original document (not reproduced here).

The input expert strategy set contains expert strategies of different lengths among its m strategies, so the invention introduces a weight factor so that each expert strategy has the same weight within the expert strategy set. Suppose the expert strategy set has m expert strategies:

T = {ξ_0, ξ_1, ..., ξ_m}
where the length of an expert strategy is q. Given the m expert strategies, in step S22 the feature expectation μ(ξ_i) of expert strategy ξ_i is:

μ(ξ_i) = l · Σ_t γ^t · r(s_t^i)

where l is the length weight of expert strategy ξ_i, γ^t involves the learning rate of the algorithm, r(s_t^i) is the reward function, i.e. the reward given to the i-th trajectory for its state at moment t, and s_t^i is the state of the i-th trajectory at moment t;

wherein the length weight l of expert strategy ξ_i is defined through len(ξ_i) (the exact formula is given as an image in the original; len(·) is the length of the expert strategy);
The state-action value function p(s'|s, a) in step S24 is given as an image formula in the original, where p(s'|s, a) denotes the probability that agent i takes behavior a in state s and reaches state s'. In step S26, the formula for updating the feature weight coefficient with the gradient vector ∇L(θ) is:

∇_θ L(θ) = f̃ - Σ_{s,a} D_{s,a} · f_{s,a}

where L(θ) is the maximum-likelihood function, f̃ is the visitation frequency actually computed from the expert trajectories, D_{s,a} is the expected frequency of the duple (s_t, a_t), and f_{s,a} is the corresponding feature vector.
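As an illustration of the S24-S28 loop, a minimal runnable sketch follows (the forward-backward computation of D_{s,a} is replaced by a uniform placeholder, and the feature table and learning rate are toy assumptions):

```python
import numpy as np

# Sketch of gradient ascent on the feature weights theta (steps S24-S27).
def irl_update(theta, features, f_expert, expected_visits, lr=0.1):
    # gradient = f_expert - sum_{s,a} D_{s,a} * f_{s,a}
    grad = f_expert - expected_visits @ features
    return theta + lr * grad

n_sa, n_feat = 6, 3                        # 6 (s, a) duples, 3 features
features = np.random.rand(n_sa, n_feat)    # f_{s,a}: feature vector per duple
f_expert = features.mean(axis=0)           # frequency from expert trajectories
theta = np.zeros(n_feat)

for _ in range(100):                       # repeat until convergence (S27)
    visits = np.full(n_sa, 1.0 / n_sa)     # placeholder for forward-backward D_{s,a}
    new_theta = irl_update(theta, features, f_expert, visits)
    if np.allclose(new_theta, theta, atol=1e-6):
        break                              # weights no longer change: converged
    theta = new_theta
```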
Step S3 of this embodiment is specifically:

S31, constructing an environment model M based on the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} corresponding to the urban road network under the current traffic control scheme;

S32, arranging a plurality of agents in the constructed environment model, and arranging three experience buffer pools in each agent;

wherein the three experience buffer pools in each agent are a priority buffer pool R_priority, a profit buffer pool R_success, and a loss buffer pool R_failure;

S33, each agent in turn selects a behavior a_t, which modifies the current agent's state s_t and yields a corresponding return value r_t, thereby determining the corresponding experience (s_t, a_t, r_t, s_{t+1});

S34, based on the return value r_t, storing the experience (s_t, a_t, r_t, s_{t+1}) and its corresponding priority Y into the matching experience buffer pool of the agent;

S35, extracting experiences in a set proportion from the agent's own buffer pools and from the buffer pool of the agent whose state is nearest to its state s_t, forming a training set, and training the agent's DDPG network structure;

S36, clustering the trained agents by trajectory similarity D_s to group the agents;

S37, performing an intra-group evolution operation on each group of agents obtained by the grouping, yielding several new agent subgroups and forming the trajectory set of the current urban road network;

S38, counting the flow of each road section in the urban road network based on the trajectory set of the current urban road network, thereby realizing the flow distribution prediction.
In step S32, the invention introduces an experience-buffer-pool mechanism: the original priority buffer pool R_priority is retained, while two new experience buffer pools are created, namely the profit buffer pool R_success and the loss buffer pool R_failure. The profit buffer pool stores experiences that obtained gains, and the loss buffer pool stores experiences that received punishments. While an experience's priority is calculated and it is placed into the priority buffer pool, the experience is also judged: if it is a profit experience or a punishment experience, it is additionally stored into the corresponding experience buffer pool.
Step S34 is specifically:

R1, storing the experience (s_t, a_t, r_t, s_{t+1}) into the priority buffer pool R_priority: the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is calculated and stored together with it in R_priority;

wherein the formula for calculating the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is:

Y = A·z_t + B·δ_t

where A and B are error coefficients, z_t is the eligibility factor, and δ_t is the TD error;

the eligibility factor z_t is given by a formula presented as an image in the original, in which γ is the discount coefficient, λ is the trace-decay parameter, and r(s_t, a_t) is the return value of taking behavior a_t in state s_t;

the TD error δ_t is:

δ_t = R_t + γ·Q'(s_t, argmax_a Q(s_t, a_t)) - Q(s_{t-1}, a_{t-1})

where R_t is the reward obtained at moment t and Q'(·) is the behavior-selection function that, for a given state, selects the behavior maximizing the reward.

R2, when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is positive, storing the experience (s_t, a_t, r_t, s_{t+1}) into the profit buffer pool R_success according to its priority Y;

when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is negative, storing the experience (s_t, a_t, r_t, s_{t+1}) into the loss buffer pool R_failure according to its priority Y.
In step S35, when an agent learns, a certain amount of experience is extracted from the three experience pools in a specific proportion, which removes the correlation between data, improves the stability of neural-network training and reduces training oscillation. The extraction proportion α drawn from the priority buffer pool, the profit buffer pool and the loss buffer pool is given by a formula presented as an image in the original, in which episode_n is the current round number and episode_max is the total number of rounds.
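A runnable sketch of one plausible mixed-sampling scheme for step S35 follows (the exact ratio formula appears only as an image in the original; the linear decay of α and the 0.7/0.3 split between the profit and loss pools are assumptions):

```python
import random

# Sketch of proportional sampling from the three experience pools.
def extraction_ratios(episode_n, episode_max):
    alpha = 1.0 - episode_n / episode_max        # assumed decay over rounds
    # more weight on the priority pool early in training
    return {"priority": alpha,
            "success": (1 - alpha) * 0.7,
            "failure": (1 - alpha) * 0.3}

def sample_batch(pools, episode_n, episode_max, batch_size=32):
    ratios = extraction_ratios(episode_n, episode_max)
    batch = []
    for name, ratio in ratios.items():
        k = min(int(batch_size * ratio), len(pools[name]))
        batch.extend(random.sample(pools[name], k))
    return batch

pools = {"priority": list(range(100)), "success": list(range(50)),
         "failure": list(range(50))}
print(len(sample_batch(pools, episode_n=10, episode_max=100)))
```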
α shrinks as the number of rounds increases, so that the agent focuses on the priority buffer pool early in training, while unguided exploratory draws from the profit buffer diminish. When extracting experience from the buffer pools of other agents, an agent collects, according to its own state, the experiences whose states are closest to it, so that the exploration other agents have performed near state s is fully exploited and repeated exploration of the same environment is effectively avoided. Therefore, step S36 is specifically:
B1, calculating the trajectory similarity D_s between the trajectories generated by the agents through a dynamic time warping algorithm;

B2, taking the trajectory similarity D_s between any two trajectories as the inter-class distance;

specifically, when comparing two trajectories, the distance is computed point by point along the trajectories until the end point, and all computed distances are accumulated to obtain the trajectory similarity used in the invention. Using the idea of dynamic programming, the warping with the minimum accumulated distance is found, and this accumulated distance is the path similarity between the two trajectories; it corresponds to the standard DTW recurrence:

D(i, j) = d(p_i, q_j) + min{D(i-1, j), D(i, j-1), D(i-1, j-1)}

B3, dividing the agents whose inter-class distance is smaller than a set threshold into one group, thereby grouping the agents.
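A runnable sketch of the DTW trajectory similarity D_s of step B1 follows, assuming trajectories are sequences of intersection coordinates:

```python
import math

# Standard cumulative-distance DTW between two trajectories p and q.
def dtw_similarity(p, q):
    n, m = len(p), len(q)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])   # pointwise distance
            # D(i,j) = d + min(D(i-1,j), D(i,j-1), D(i-1,j-1))
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]                               # accumulated distance

traj_a = [(0, 0), (1, 0), (2, 1)]
traj_b = [(0, 0), (1, 1), (2, 1)]
print(dtw_similarity(traj_a, traj_b))
```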
In step S37, the method for performing the intra-group evolution operation within each group of agents to obtain a new agent group is specifically:

C1, regarding all agents as a population pop, and encoding each agent;

C2, calculating the fitness of the agents in the population pop based on their encodings;

C3, initializing an empty population newpop and adding the agent with the maximum fitness into it;

C4, selecting among the remaining agents in the population pop, adding the selected agents into the population newpop, and updating the population newpop;

C5, replacing the population pop with the updated population newpop, and decoding each agent in newpop to obtain the new agent group.
In this process, a number of agents are put into the environment for training at once. At the end of each round, the DDPG models trained in the environment vary in quality; an elite-retention strategy keeps the models that performed well, a genetic-evolution strategy applies crossover and mutation to the poorer models, and the result is placed back into the environment as a new population for the next round of training.
First, real-number encoding is applied to each agent's neural-network weights. In step C1, the method for encoding each agent in the population pop is:

performing real-number encoding on the weights of the agent's neural network, in the form:

Agent = [w_11 ... w_x'y' ... w_xy, w_11 ... w_y'q' ... w_yq]

where w_x'y' is the weight from input-layer neuron x' to hidden-layer neuron y' in the agent's neural network, and w_y'q' is the weight from hidden-layer neuron y' to output-layer neuron q';

in step C2, the genetic algorithm takes over the optimization of the agent's DDPG neural network and evaluates the quality of each individual by a fitness function, so as to retain the good and eliminate the poor; the fitness f(b) of agent b is:

f(b) = R_b + q·Σδ_b

where R_b is the total reward of agent b over each round, Σδ_b is the total loss value of agent b, and q is the learning rate;
When selecting agents, in order to prevent the optimal solutions produced during evolution from being damaged by crossover and mutation, the invention adopts an elite-retention strategy: the individual with the maximum fitness in each group is copied unchanged to the next generation. Therefore, step C4 is specifically:

C41, selecting agents by roulette-wheel selection among the remaining agents in the population pop;

the roulette-wheel selection probability is given by a formula presented as an image in the original (roulette-wheel selection is, by definition, fitness-proportional);
C42, generating new agents from the selected agents based on the set crossover probability and mutation probability;

since the settings of the crossover probability P_c and the mutation probability P_m can affect the convergence of the genetic algorithm and greatly increase the error between the optimal solution and the true solution, the invention uses adaptively adjusted crossover and mutation probabilities to guarantee population diversity:

wherein the crossover probability P_c is:

P_c = 1 - 0.5/(1 + e^Δf)

and the mutation probability P_m is:

P_m = 1 - 0.05/(1 + e^Δf)

where Δf is the difference between the maximum fitness value and the average fitness value in the population pop;

C43, adding the generated new agents into the population newpop until the number of agents in newpop reaches the set number, completing the update of the population newpop.
Since the encoding scheme adopted in this embodiment is real-number encoding, the crossover and mutation operations use arithmetic crossover and uniform mutation, respectively. Arithmetic crossover means linearly combining two agents to produce two new agents. Suppose Agent_a and Agent_b undergo arithmetic crossover; the two new individuals generated by the crossover operation are:

Agent'_a = P_c · Agent_a + (1 - P_c) · Agent_b
Agent'_b = P_c · Agent_b + (1 - P_c) · Agent_a

This embodiment adopts uniform mutation: given an agent encoded in the above form, if w_x'y' is a mutation point with w_x'y' ∈ [ω_min, ω_max], a random number is generated for the mutation point w_x'y' and compared with the mutation probability P_m; if it is smaller, a random value w'_x'y' is selected in [ω_min, ω_max] to replace w_x'y'.
Example 2:
This embodiment provides a flow-simulation platform under variable traffic control schemes, built on the above flow-distribution prediction method.
The platform can predict regional flow distribution under different traffic control schemes on the basis of the real urban road network and checkpoint data. The framework of the platform is shown in Fig. 2; it has three sub-modules: a road network module, a flow module and an algorithm module. The road network module models the urban road network within a specified range of a specified city; the flow module mines driving-trajectory data from the checkpoint data of a certain time period within that range; the algorithm module is divided into a flow-feature-extraction module and a flow-simulation module, where the former extracts the flow features of the area at a specific time with the improved maximum-entropy-based inverse reinforcement learning algorithm, on the basis of the road network module and the flow module, and the latter performs flow simulation by forward reinforcement learning, taking the computed flow features and the urban road network under the traffic control scheme as input; finally the flow is displayed visually.
This embodiment simulates the traffic control scheme of the Mianyang Science and Technology Expo in September 2019 to predict the regional flow under a traffic control scheme involving road-section closures, and verifies the accuracy of regional flow prediction for a turn-prohibition traffic control scheme by simulating the turn-prohibition-based scheme used during the traffic exercise in the Mianyang CBD area in December 2019.
(1) Road network selection explanation:
For the regional traffic prediction under the road-closure traffic control scheme, the main urban area of Mianyang was selected after consulting the Mianyang traffic police and Mianyang smart-city project staff. The selected range is the rectangle enclosed by four points: the Dragon Stream logistics park (104.717637, 31.513032), the Mianyang suburban airport (104.749689, 31.435669), the Miller hub (104.603876, 31.450148) and Youth Square (104.774194, 31.46093). As shown in Fig. 3 (a: road network range; b: Mianyang road network), the regional road network was crawled and the Mianyang road network was simplified, leaving 3200 road sections and 1016 intersections.
(2) Preparing data:
The experiment predicts the flow distribution of the morning peak, 8:00 to 9:00, on 6 September 2019 during the Expo. From the checkpoint data of 1-5 September 2019, 302 expert strategy sets were combed for the morning peak; the minimum number of trajectories in one expert strategy set is 43, the maximum is 809, and the average trajectory length is 46.
(3) Traffic control scheme description:
During the Mianyang Science and Technology Expo, the Mianyang traffic police department imposed road closures on the following roads: (a) one-way traffic from the Feiyun Avenue winery intersection along the mountain road to the Rubdown intersection; (b) one-way traffic from the Liaoning Avenue Li Hospital intersection to the old Yong'an Road Li Hospital intersection; (c) one-way traffic from the old Yong'an Road Xin automobile intersection to the Bubo garden intersection, as shown in Fig. 4.
(4) Simulation experiment process:
To verify the accuracy and reliability of the proposed algorithm, multiple flow simulations were carried out for the Mianyang urban area after the traffic control scheme was applied during the Expo; the experimental results are shown in Fig. 5. The no-closure prediction for 9 a.m. of the first day of the Expo shows congestion in the core circle and the control circle during the morning peak; according to the thermodynamic diagrams, red and orange congestion levels appear over a large range around the Expo exhibition center at that moment. After the one-way closures are applied, orange congestion appears on only a small number of road sections in the core and control areas, and most areas show light green, basically smooth traffic. In the diversion area, different road sections show different traffic states in the no-closure case, and after the closures, orange congestion appears on the road sections close to the control area.
As can be seen from the comparison of Fig. 5(a) and Fig. 5(b), the predicted flow distribution is similar to the actual flow distribution. Fig. 5(c) compares the predicted data with the actual data for all one-way road sections in the road network, and it can be seen that the two have the same distribution trend over the road network. Taking the 20 road sections with the most vehicles passing in the road network, Fig. 5(d) shows that the data predicted by the algorithm are slightly larger than the actual data. Let f̂(s) denote the actual flow and f(s) the predicted flow. In summary, the accuracy of the platform in the regional traffic prediction experiment for the Mianyang urban area under the Expo traffic control scheme is given by a formula presented as an image in the original document.

Claims (10)

1. the method for predicting regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm is characterized by comprising the following steps of:
s1, carrying out urban road network modeling on the area to be predicted, and collecting the driving track data in the area;
s2, extracting flow characteristics in the area to be predicted through a maximum entropy inverse reinforcement learning method based on multiple weights based on the urban road network and the traffic track data corresponding to the area to be predicted;
and S3, processing the urban road network based on the extracted traffic characteristics and the current traffic control scheme by a forward reinforcement learning method to obtain a traffic distribution prediction result of the area to be predicted.
2. The method for predicting the regional traffic distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 1, wherein in the step S1, the method for modeling the urban road network of the region to be predicted specifically comprises:
a1, in the area to be predicted, regarding the intersection as a point V and regarding the road between two adjacent intersections as a side E to obtain an entity road network G (V, E);
a2, regarding the crossing in the entity road network G (V, E) as the state in the Markov process, regarding the road as the behavior required by the intelligent agent in the state transition, and further analogizing the entity road network G (V, E) to the environment model;
a3, determining an environment matrix F of the environment model based on the connectivity of the road network in the entity road network G (V, E), and determining a state transition matrix P according to the attributes of the roads in the entity road network G (V, E), thereby completing the modeling of the city road network;
wherein the environment matrix F is:

F = [(e_1, l_1), (e_2, l_2), ..., (e_f, l_f)]

where (e_f', l_f') ∈ F, e_f' is a road network node, l_f' is the number of motor-vehicle lanes at node e_f', and the subscript f' = 1, 2, 3, ... is the ordinal number of the road network node.

The state transition matrix P is a three-dimensional matrix of size m × n × m built on the basis of the environment matrix, where m is the number of states and n is the number of behaviors; each element P_sas' in the state transition matrix P denotes the probability of taking behavior a in state s and transitioning to the new state s', with P_sas' ∈ [0, 1].
3. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 1, wherein the step S2 specifically comprises:
S21, constructing an expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} based on the urban road network and the collected driving trajectory data;

wherein ξ_i ∈ T, ξ_i is an expert strategy with index i = 1, 2, 3, ..., m, and the expert strategy ξ_i is ξ_i = {(s_0, a_0), (s_1, a_1), ..., (s_n, a_n)}, with (s_t, a_t) ∈ ξ_i; (s_t, a_t) is a duple in which s_t is a state and a_t is a behavior, the subscript t being the t-th moment, t = 1, 2, 3, ..., n.

S22, setting a feature expectation weight for the state s_t of each expert strategy ξ_i in the expert strategy set T, and calculating the feature expectation of each expert strategy;

S23, setting a feature weight coefficient x, given that the expert strategies in the expert strategy set differ in length, and initializing the feature weight coefficient x;

S24, determining the state return rate c from the feature weight coefficient x, and thereby obtaining the reward function r(s) and the state-action value function p(s'|s, a);

S25, calculating the expected state visitation frequency D_s by a forward-backward algorithm;

S26, updating the feature weight coefficient with the gradient vector, based on the expected state visitation frequency D_s;

S27, repeating steps S24-S26 until convergence, then entering step S28;

S28, taking the current reward function as the optimal reward function, and obtaining the flow features.
4. The method for predicting regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm of claim 3, wherein in step S22, the feature expectation weight z_n of the state s_t of expert strategy ξ_i is given by a formula presented as an image in the original document;

in step S22, the feature expectation μ(ξ_i) of the expert strategy ξ_i is:

μ(ξ_i) = l · Σ_t γ^t · r(s_t^i)

where l is the length weight of expert strategy ξ_i, γ^t involves the learning rate of the algorithm, r(s_t^i) is the reward function, i.e. the reward given to the i-th trajectory for its state at moment t, and s_t^i is the state of the i-th trajectory at moment t;

wherein the length weight l of expert strategy ξ_i is defined through len(ξ_i) (the exact formula is given as an image in the original; len(·) is the length of the expert strategy);

the state-action value function p(s'|s, a) in step S24 is given as an image formula in the original, where p(s'|s, a) denotes the probability that agent i takes behavior a in state s and reaches state s'.

In step S26, the formula for updating the feature weight coefficient with the gradient vector ∇L(θ) is:

∇_θ L(θ) = f̃ - Σ_{s,a} D_{s,a} · f_{s,a}

where L(θ) is the maximum-likelihood function, f̃ is the visitation frequency actually computed from the expert trajectories, D_{s,a} is the expected frequency of the duple (s_t, a_t), and f_{s,a} is the corresponding feature vector.
5. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 3, wherein the step S3 specifically comprises:
S31, constructing an environment model M based on the expert strategy set T = {ξ_0, ξ_1, ..., ξ_m} corresponding to the urban road network under the current traffic control scheme;

S32, arranging a plurality of agents in the constructed environment model, and arranging three experience buffer pools in each agent;

wherein the three experience buffer pools in each agent are a priority buffer pool R_priority, a profit buffer pool R_success, and a loss buffer pool R_failure;

S33, each agent in turn selects a behavior a_t, which modifies the current agent's state s_t and yields a corresponding return value r_t, thereby determining the corresponding experience (s_t, a_t, r_t, s_{t+1});

S34, based on the return value r_t, storing the experience (s_t, a_t, r_t, s_{t+1}) and its corresponding priority Y into the matching experience buffer pool of the agent;

S35, extracting experiences in a set proportion from the agent's own buffer pools and from the buffer pool of the agent whose state is nearest to its state s_t, forming a training set, and training the agent's DDPG network structure;

S36, clustering the trained agents by trajectory similarity D_s to group the agents;

S37, performing an intra-group evolution operation on each group of agents obtained by the grouping, yielding several new agent subgroups and forming the trajectory set of the current urban road network;

S38, counting the flow of each road section in the urban road network based on the trajectory set of the current urban road network, thereby realizing the flow distribution prediction.
6. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 5, wherein the step S34 specifically comprises:
R1, storing the experience (s_t, a_t, r_t, s_{t+1}) into the priority buffer pool R_priority: the priority Y of the experience (s_t, a_t, r_t, s_{t+1}) is calculated and stored together with it in R_priority;

R2, when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is positive, storing the experience (s_t, a_t, r_t, s_{t+1}) into the profit buffer pool R_success according to its priority Y;

when the return value r_t of the experience (s_t, a_t, r_t, s_{t+1}) is negative, storing the experience (s_t, a_t, r_t, s_{t+1}) into the loss buffer pool R_failure according to its priority Y.
7. The method for predicting the regional flow distribution in the variable traffic management scheme based on the WMGIRL algorithm according to claim 6, wherein in the step A1, the experience(s)t,at,rt,st+1) The formula for calculating the priority Y is as follows:
Y = A·z_t + B·δ_t
where A and B are error coefficients, z_t is the eligibility trace, and δ_t is the TD error,
wherein the eligibility trace z_t is:
z_t = γλ·z_{t-1} + R(s_t, a_t)
where γ is the discount coefficient, λ is the trace-decay parameter, and R(s_t, a_t) is the return value obtained by taking action a_t in state s_t;
the TD error δ_t is:
δ_t = R_t + γ·Q′(s_t, argmax_a Q(s_t, a)) − Q(s_{t-1}, a_{t-1})
where R_t is the reward obtained at time t, and Q′(·) is the behavior selection function that, for the given state, selects the behavior maximizing the reward.
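A minimal sketch of the priority calculation of claim 7, assuming a tabular Q for illustration and the recursive eligibility-trace form given above; the constants A, B, γ, λ are placeholders:

```python
def priority_y(q, q_target, s_prev, a_prev, s_t, actions, r_t, z_prev,
               A=0.5, B=0.5, gamma=0.9, lam=0.8):
    """Priority Y = A*z_t + B*delta_t (illustrative sketch).

    q, q_target : dict (s, a) -> value, online and target estimates
    actions     : candidate actions available in state s_t
    """
    # Eligibility trace (assumed recursive form): z_t = gamma*lam*z_{t-1} + R(s_t, a_t)
    z_t = gamma * lam * z_prev + r_t
    # TD error: delta_t = R_t + gamma*Q'(s_t, argmax_a Q(s_t, a)) - Q(s_{t-1}, a_{t-1})
    best_a = max(actions, key=lambda a: q.get((s_t, a), 0.0))
    delta_t = (r_t + gamma * q_target.get((s_t, best_a), 0.0)
               - q.get((s_prev, a_prev), 0.0))
    return A * z_t + B * delta_t, z_t  # return z_t for the next recursion step
```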
8. The method for predicting the regional flow distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 6, wherein the step S36 specifically comprises:
B1, calculating the trajectory similarity D_s between the trajectories generated by the plurality of agents through a dynamic time warping algorithm;
B2, taking the trajectory similarity D_s of any two trajectories as the inter-class distance;
B3, dividing the agents whose inter-class distance is smaller than a set threshold into one group, realizing the grouping of the agents.
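A minimal sketch of steps B1–B3, using a standard dynamic-time-warping distance and a greedy threshold grouping as one possible reading of the inter-class criterion; names are illustrative:

```python
import numpy as np

def dtw_distance(t1, t2):
    """B1: dynamic time warping distance between two trajectories of points."""
    n, m = len(t1), len(t2)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(t1[i - 1]) - np.asarray(t2[j - 1]))
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def group_agents(trajectories, threshold):
    """B2-B3: merge agents whose pairwise DTW distance is below the threshold."""
    groups = []
    for i, traj in enumerate(trajectories):
        for group in groups:
            if all(dtw_distance(traj, trajectories[j]) < threshold for j in group):
                group.append(i)
                break
        else:
            groups.append([i])  # no close group found: start a new one
    return groups
```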
9. The method for predicting regional traffic distribution in a variable traffic control scheme based on the WMGIRL algorithm according to claim 5, wherein in the step S37, performing an intra-group evolution operation on each group of agents to obtain a new group of agents specifically comprises:
C1, regarding all agents as a population pop, and encoding each agent;
C2, calculating the fitness of the agents in the population pop based on the encoding of each agent;
C3, initializing an empty population newpop, and adding the agent with the maximum fitness into the population newpop;
C4, selecting agents from the remaining agents in the population pop, adding the selected agents into the population newpop, and updating the population newpop;
C5, replacing the population pop with the updated population newpop, and decoding each agent in the population newpop to obtain a new agent subgroup.
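A minimal sketch of the intra-group evolution of steps C1–C5 (elitist retention plus roulette selection), assuming non-negative fitness values and an externally supplied crossover/mutation operator make_children; all names are illustrative:

```python
import random

def evolve_group(pop, fitness, make_children, target_size=None):
    """Steps C1-C5: elitism plus roulette-wheel selection (simplified sketch).

    pop           : list of encoded agents (real-valued weight vectors)
    fitness       : callable agent -> fitness value (assumed non-negative)
    make_children : callable (parent_a, parent_b) -> new agent
    """
    target_size = target_size or len(pop)
    scores = [fitness(ind) for ind in pop]
    newpop = [pop[scores.index(max(scores))]]          # C3: keep the fittest agent
    total = sum(scores)
    while len(newpop) < target_size:                   # C4: roulette selection
        if total > 0:
            pa, pb = random.choices(pop, weights=scores, k=2)
        else:
            pa, pb = random.sample(pop, 2)
        newpop.append(make_children(pa, pb))           # C42: crossover + mutation
    return newpop                                      # C5: replaces pop
```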
10. The method for predicting the regional traffic distribution in the variable traffic control scheme based on the WMGIRL algorithm according to claim 9, wherein in the step C1, the method for encoding each agent in the population pop comprises:
carrying out real-number coding on the weights of the agent's neural network, the coding being:
Agent = [w_11 ... w_{x′y′} ... w_{xy}, w_11 ... w_{y′q′} ... w_{yq}]
where w_{x′y′} is the weight from input layer neuron x′ to hidden layer neuron y′ in the agent's neural network, and w_{y′q′} is the weight from hidden layer neuron y′ to output layer neuron q′ in the agent's neural network;
in step C2, the fitness f(b) of agent b is:
f(b) = R_b + q·Σδ_i
where R_b is the total reward of agent b over each round, Σδ_i is the total loss value of agent b, and q is the learning rate;
the step C4 specifically comprises:
C41, selecting agents by roulette selection from the remaining agents in the population pop;
C42, generating new agents from the selected agents based on the set crossover probability and mutation probability;
wherein the crossover probability P_c is:
P_c = 1 − 0.5/(1 + e^{Δf})
and the mutation probability P_m is:
P_m = 1 − 0.05/(1 + e^{Δf})
where Δf is the difference between the maximum fitness value and the average fitness value in the population pop;
C43, adding the generated new agents into the population newpop until the number of agents in the population newpop reaches a set number, completing the update of the population newpop.
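A minimal sketch of the real-number encoding of step C1 and the adaptive probabilities of step C42; the matrix shapes and names are assumptions for illustration:

```python
import numpy as np

def encode_agent(w_in_hidden, w_hidden_out):
    """Step C1: flatten the two weight matrices into one real-valued vector."""
    return np.concatenate([w_in_hidden.ravel(), w_hidden_out.ravel()])

def decode_agent(vec, in_dim, hidden_dim, out_dim):
    """Inverse of encode_agent, recovering the two weight matrices."""
    split = in_dim * hidden_dim
    return (vec[:split].reshape(in_dim, hidden_dim),
            vec[split:].reshape(hidden_dim, out_dim))

def adaptive_probabilities(max_fitness, mean_fitness):
    """Step C42: crossover/mutation probabilities from the fitness spread."""
    delta_f = max_fitness - mean_fitness
    p_c = 1.0 - 0.5 / (1.0 + np.exp(delta_f))    # crossover probability P_c
    p_m = 1.0 - 0.05 / (1.0 + np.exp(delta_f))   # mutation probability P_m
    return p_c, p_m
```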
CN202110305552.3A 2021-03-23 2021-03-23 WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme Active CN113053122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110305552.3A CN113053122B (en) 2021-03-23 2021-03-23 WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme

Publications (2)

Publication Number Publication Date
CN113053122A true CN113053122A (en) 2021-06-29
CN113053122B CN113053122B (en) 2022-02-18

Family

ID=76514332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110305552.3A Active CN113053122B (en) 2021-03-23 2021-03-23 WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme

Country Status (1)

Country Link
CN (1) CN113053122B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190329772A1 (en) * 2018-04-27 2019-10-31 Daniel Mark Graves Method and system for adaptively controlling object spacing
US10627823B1 (en) * 2019-01-30 2020-04-21 StradVision, Inc. Method and device for performing multiple agent sensor fusion in cooperative driving based on reinforcement learning
CN110164128A (en) * 2019-04-23 2019-08-23 银江股份有限公司 A kind of City-level intelligent transportation analogue system
CN110570672A (en) * 2019-09-18 2019-12-13 浙江大学 regional traffic signal lamp control method based on graph neural network
CN111243297A (en) * 2020-01-17 2020-06-05 苏州科达科技股份有限公司 Traffic light phase control method, system, device and medium
CN111582469A (en) * 2020-03-23 2020-08-25 成都信息工程大学 Multi-agent cooperation information processing method and system, storage medium and intelligent terminal
CN112241814A (en) * 2020-10-20 2021-01-19 河南大学 Traffic prediction method based on reinforced space-time diagram neural network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113759902A (en) * 2021-08-17 2021-12-07 中南民族大学 Multi-agent local interaction path planning method, device, equipment and storage medium
CN113791612A (en) * 2021-08-17 2021-12-14 中南民族大学 Intelligent agent real-time path planning method, device, equipment and storage medium
CN113791612B (en) * 2021-08-17 2023-10-24 中南民族大学 Method, device, equipment and storage medium for planning real-time path of intelligent agent
CN113759902B (en) * 2021-08-17 2023-10-27 中南民族大学 Multi-agent local interaction path planning method, device, equipment and storage medium
CN114627648A (en) * 2022-03-16 2022-06-14 中山大学·深圳 Federal learning-based urban traffic flow induction method and system

Also Published As

Publication number Publication date
CN113053122B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN113053122B (en) WMGIRL algorithm-based regional flow distribution prediction method in variable traffic control scheme
CN112216108B (en) Traffic prediction method based on attribute-enhanced space-time graph convolution model
Wang et al. Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning
CN109754605B (en) Traffic prediction method based on attention temporal graph convolution network
Silva et al. Complexity, emergence and cellular urban models: lessons learned from applying SLEUTH to two Portuguese metropolitan areas
Yeh et al. Simulation of development alternatives using neural networks, cellular automata, and GIS for urban planning
Lam et al. Decision support system for contractor pre‐qualification—artificial neural network model
CN113762595B (en) Traffic time prediction model training method, traffic time prediction method and equipment
CN111160622A (en) Scenic spot passenger flow prediction method and device based on hybrid neural network model
Li et al. International roughness index prediction based on multigranularity fuzzy time series and particle swarm optimization
CN115796007A (en) Traffic flow prediction method based on space-time diagram network
CN115270506B (en) Method and system for predicting passing time of crowd ascending along stairs
Sarnataro et al. A portfolio approach for the selection and the timing of urban planning projects
Zhang et al. Direction-decision learning based pedestrian flow behavior investigation
Cui et al. An interpretation framework for autonomous vehicles decision-making via SHAP and RF
Furtlehner et al. Spatial and temporal analysis of traffic states on large scale networks
Zhao et al. Short-term traffic flow prediction based on VMD and IDBO-LSTM
CN111507499B (en) Method, device and system for constructing model for prediction and testing method
Xing et al. RL-GCN: Traffic flow prediction based on graph convolution and reinforcement learning for smart cities
Liu et al. EvoTSC: An evolutionary computation-based traffic signal controller for large-scale urban transportation networks
CN108108554A (en) A kind of more material vehicle body assembly sequence plan optimization methods
CN115762128B (en) Deep reinforcement learning traffic signal control method based on self-attention mechanism
CN114861368B (en) Construction method of railway longitudinal section design learning model based on near-end strategy
CN116612633A (en) Self-adaptive dynamic path planning method based on vehicle-road cooperative sensing
Davidson A new approach to transport modelling-the Stochastic Segmented Slice Simulation (4S) model and its recent applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant