CN117173914A

CN117173914A - Road network signal control unit decoupling method, device and medium for simplifying complex model

Info

Publication number: CN117173914A
Application number: CN202311456928.6A
Authority: CN
Inventors: 陈敬龙; 魏婧; 徐文轩; 李振; 唐涛; 谭墍元; 郭伟伟
Original assignee: Shandong Moshe Network Technology Co ltd; Zhongtai Xinhe Intelligent Technology Co ltd; North China University of Technology
Current assignee: Shandong Moshe Network Technology Co ltd; Zhongtai Xinhe Intelligent Technology Co ltd; North China University of Technology
Priority date: 2023-11-03
Filing date: 2023-11-03
Publication date: 2023-12-05
Anticipated expiration: 2043-11-03
Also published as: CN117173914B

Abstract

The application belongs to the field of intelligent control, and particularly relates to a road network signal control unit decoupling method, device and medium for simplifying a complex model. Compared with the MADRL algorithm based on the inter-agent joint game, test results show that the control effect of the proposed model reaches an ideal level under decoupling conditions.

Description

Road network signal control unit decoupling method, device and medium for simplifying complex model

Technical Field

The application belongs to the field of intelligent control, and particularly relates to a method, a device and a medium for decoupling a road network signal control unit for simplifying complex model training.

Background

Because urban road space is limited, a series of traffic problems arise, such as traffic jams and accidents. These problems not only cause serious economic losses but also limit the sustainable development of cities. Traffic congestion has therefore become a common problem facing cities. One possible approach is to build intelligent traffic systems using intelligent technology. In such a system, traffic signal control plays a central role and is an important means of solving traffic problems.

The prior-art MADRL (multi-agent deep reinforcement learning) approach still faces some challenges and limitations. One is increased computational complexity: as the number of agents and the scale of the environment grow, more computing resources are needed for training and inference. Existing methods emphasize complex neural networks, cooperative games among multiple agents, value decomposition strategies of a central control network and the like, so that training and deploying a model in a real signal control scene is not easy. In addition, the robustness and stability of the MADRL method need further improvement to cope with uncertainty and noise in real-world scenarios.

Disclosure of Invention

The application provides a signal optimization model based on a multi-agent deep Q network. For signal control of trunk lines and large-scale road networks, it uses an independent learning mode with a simpler structure to replace, under decoupled traffic conditions, the current MADRL algorithm based on the joint game between agents, thereby reducing model complexity; the feasibility and effectiveness of the model in multi-agent signal optimization are verified by comparison with different methods. The technical solution is as follows:

A decoupling method for a road network signal control unit for simplifying a complex model comprises the following steps:

S1, designing a road network environment, including designing the topological structure of the road intersections, arranging entrance lane detectors at all intersections, setting up signal lamps at all intersections, setting the green light time of the initial period, collecting static data of the road network, and performing static decoupling;

S2, designing different spacing and flow combinations as a plurality of traffic scenes, and constructing a reward function of the deep reinforcement learning model by using traffic states and action decisions;

S3, training the deep reinforcement learning model by using a neural network;

S4, applying the trained deep reinforcement learning model to different test scenes for decoupling, and determining a decoupling range.

Preferably, in step S1,

s11, collecting static data of a road network, wherein the static data comprise road network connection relations, historical traffic flow and road type information;

s12, road network segmentation: constructing a road network topological structure model according to the collected data, wherein the road is represented as a node, the running path of the vehicle is represented as an edge, and the weight of the edge represents the length of the road connected with the node; clustering is carried out according to the relative positions of nodes of the topological structure model of the road network, the whole road network is divided into a plurality of road sections or areas, and each road section or area is called a signal control unit;

S13, distributing the traffic flow data collected in step S11 to each signal control unit, and further dividing the road network division result of step S12 by a decoupling method according to the distributed traffic flow data, so that the traffic flow within each signal control unit is relatively balanced while the traffic flow between different signal control units is small. The division result comprises single-point signal control intersections, trunk line signal control road sections and area signal control road networks. The static data decoupling method adopts a clustering method: on the basis of the division by relative node position, it re-clusters according to flow and separates the parts of the road network whose traffic flows are unrelated, thereby realizing decoupling of the road network signal control units.

Preferably, in step S2,

S21, deploying an agent at each intersection, $N$ agents in total; the intersection states observed by the agents at time $t$ are recorded as $(o_t^1, o_t^2, \dots, o_t^N)$, where $o_t^i$ denotes the state of its own intersection observed by agent $i$ at time $t$, and the union of $(o_t^1, o_t^2, \dots, o_t^N)$ is the overall traffic state $s_t$ at time $t$;

S22, each agent makes a corresponding decision action according to the traffic state; the decision action is defined as the selection of a signal phase, and the set of selectable phases, i.e. the action set, is recorded as $A$, which is of finite dimension. The action selected from the action set $A$ by agent $i$ at time $t$ is recorded as $a_t^i$, so that $a_t^i \in A$. The signal lamp executes the decision action $a_t$ of time $t$; the duration of action execution is variable and is recorded as $\Delta t$. At time $t + \Delta t$ the environment feeds back a reward or punishment for the action, and the average queuing length is used as the evaluation index to construct a reward function $r_{t+\Delta t}$ that quantifies the influence of the decision action:

where the number of lanes at an intersection is recorded as $n$, $l_i$ is the $i$-th lane, and $q_{l_i}$ is the queuing length in the $i$-th lane.

Preferably, in step S2, it is assumed that the entire intersection includes $c$ flow directions; the queuing length of vehicles on each lane is acquired as the traffic state of that lane, so that the traffic state input of the whole intersection consists of $c$ queuing length values, i.e. $state = \{q_1, q_2, q_3, \dots, q_c\}$;

The action decision is set as phase selection: according to a predefined phase sequence scheme, the agent chooses, according to running conditions, either to keep the current phase or to switch to any other phase. The selectable phases in the phase scheme set include north-south left, north-south right, east-west left and east-west right. Control logic for phase selection: if the selected action coincides with the current phase, the green light of this phase runs for the preset time $g_t$; otherwise, after the yellow light transition time $y_t$ has been executed, the green light of the selected phase runs for the preset time $g_t$ before the next decision is made.
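The phase-selection control logic can be sketched as follows. This is a minimal illustration, assuming the green and yellow durations listed in the hyperparameter table (6 and 3) and integer phase indices; the function name `plan_signal_steps` and the tuple layout are hypothetical and not taken from the application.

```python
from typing import List, Tuple

GREEN_TIME = 6   # preset green time g_t (value from the hyperparameter table)
YELLOW_TIME = 3  # yellow transition time y_t (value from the hyperparameter table)


def plan_signal_steps(current_phase: int, selected_phase: int,
                      green_time: int = GREEN_TIME,
                      yellow_time: int = YELLOW_TIME) -> List[Tuple[int, str, int]]:
    """Return the (phase, light, duration) segments to run before the next decision."""
    if selected_phase == current_phase:
        # Same phase chosen: run its green for the preset time.
        return [(current_phase, "green", green_time)]
    # Different phase chosen: yellow on the current phase, then green on the new one.
    return [(current_phase, "yellow", yellow_time),
            (selected_phase, "green", green_time)]


keep = plan_signal_steps(0, 0)
switch = plan_signal_steps(0, 2)
```

Keeping a phase therefore occupies one green interval, while switching occupies a yellow interval plus a green interval, which is why the duration of an action is variable.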

Preferably, in step S3, the training process of the deep reinforcement learning model is as follows:

S31, initializing the model hyperparameters, and acquiring from simulation data the current traffic state $s_t$ and the maximum queuing length $O_t$ of the different flow directions of each entrance lane. The hyperparameters include the discount coefficient $\gamma$, the green light preset time $g_t$ and the yellow light transition time $y_t$;

S32, inputting the traffic state into a neural network and, according to the action Q values output by the neural network, selecting an action $a$ from the action set according to the action selection policy and issuing action $a$ to the signal controller for execution. If the action still selects the current phase, the green light of the current phase runs for the preset time $g_t$ and an action decision is then made again; if the selected action differs from the current phase, the yellow light transition time $y_t$ of the current phase is executed and the signal then switches to the phase corresponding to the action;

S33, taking the current traffic state, decision action, reward value and traffic state of the next time step as a four-tuple $(s_t, a_t, r_t, s_{t+1})$, storing it in an experience replay unit, randomly extracting a mini-batch of sample data from the experience pool, and updating the weights of the neural network with this mini-batch;

The neural network outputs the actual (estimated) value at each step, and the target value is approximated by the maximum action value output by the Q-value function in the next state, so the update formula of the Q-value function is expressed as:

In this way, the output value of the neural network is driven toward the target Q value, and an optimal signal control strategy is then obtained by selecting the action with the maximum Q value.
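The target described above (reward plus the discounted maximum action value in the next state) matches the standard Q-learning update. The sketch below assumes that standard form and uses a small lookup table in place of the neural network; the step size `alpha` and the toy states are hypothetical, and the application's own update formula may be written differently.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def td_target(reward: float, next_q_values: List[float], gamma: float = 0.9) -> float:
    """Target value: reward plus discounted maximum action value in the next state."""
    return reward + gamma * max(next_q_values)


def q_update(q: Dict[State, List[float]], s: State, a: int, r: float,
             s_next: State, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Move Q(s, a) toward the target and return the new value."""
    target = td_target(r, q[s_next], gamma)
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Toy table standing in for the network: two states, two actions each.
q_table = {(0,): [0.0, 0.0], (1,): [1.0, 2.0]}
new_value = q_update(q_table, (0,), 1, r=-1.0, s_next=(1,))
```

In a deep Q network the same target is used as the regression label for the network output of the chosen action, with the step size replaced by the optimizer's learning rate.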

Preferably, in step S4,

S41, defining the association degree of the intersections, which relates to the road spacing of the urban road network and the traffic flow input to the road network:

For road spacing, the settings are: 1500-2500 meters for expressways, 700-1200 meters for main roads, 350-500 meters for secondary main roads and 150-250 meters for branch roads. Expanding the upper and lower bounds, the road spacing range is preliminarily determined as [100 m, 2500 m], and the spacing length is recorded as $L$;

For the traffic flow input to the road network, the number of inlets through which traffic flow can be input is determined first; this number is related to the number of intersections in the road network. The number of agents is defined as $N$, so the road network contains $N$ intersections; the number of rows of the road network is recorded as $m$ and the number of columns as $n$. The number of inlets through which flow can be input is:

$$N_{in} = 2(m + n)$$

If the flow at the $k$-th inlet is set to $f_k$, the total input flow is:

$$F_t = \sum_{k=1}^{N_{in}} f_k$$

$F_t$ is the total traffic flow input at time $t$. It is in direct proportion to the road spacing: the larger the road spacing, the more vehicles the road section can accommodate, and the more total traffic flow can be input;
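As a worked check of the inlet count, the sketch below assumes a rectangular grid of intersections in which only the outer boundary approaches receive input flow, which gives twice the sum of rows and columns. This reading is an inference: it agrees with the six entrances (east, west, south 1, south 2, north 1, north 2) of the two-intersection environment in the embodiment. The function names and the flow values are hypothetical.

```python
from typing import Sequence


def boundary_inlet_count(rows: int, cols: int) -> int:
    """Inlets on the outer boundary of a rows x cols grid of intersections."""
    return 2 * (rows + cols)


def total_input_flow(inlet_flows: Sequence[float]) -> float:
    """Total flow fed into the network: the sum over all boundary inlets."""
    return sum(inlet_flows)


two_intersections = boundary_inlet_count(1, 2)   # east, west, south 1-2, north 1-2
three_by_three = boundary_inlet_count(3, 3)
total = total_input_flow([100.0] * two_intersections)   # illustrative flow per inlet
```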

The association degree of adjacent intersections is defined by combining the road spacing and the input flow. The road segment between adjacent intersections is divided discretely: the road segment of known length $L$ is divided into sections with boundaries $q_1, q_2, \dots$, and each section is divided further with its own unit length $b_j$, giving $K$ discretized intervals in total; the left boundary of the $k$-th interval is recorded as $x_k^{l}$ and the right boundary as $x_k^{r}$. The queuing length of the straight lanes of the road segment between adjacent intersections is acquired at each moment of the simulation and discretized into these intervals, and the frequency of each interval is counted as $f_k$. The association degree is defined as the mean of the left and right boundaries of the discretized interval with the highest frequency:

$$k^* = \arg\max_{k} f_k, \qquad \rho = \frac{x_{k^*}^{l} + x_{k^*}^{r}}{2}$$

The association degree reflects how strongly adjacent intersections are associated: the smaller its value, the lower the association between the intersections and the stronger the decoupling of the road network.
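The association degree computation can be sketched as a histogram over the discretized intervals followed by a midpoint lookup. This is a minimal sketch: the interval edges and queue samples are hypothetical, and ties between equally frequent intervals are resolved in favour of the first one, which the application does not specify.

```python
from bisect import bisect_right
from typing import List, Sequence


def association_degree(queue_lengths: Sequence[float], edges: Sequence[float]) -> float:
    """Midpoint of the discretization interval that holds the most samples."""
    counts: List[int] = [0] * (len(edges) - 1)
    for q in queue_lengths:
        # Clamp so that values at or beyond the last edge fall in the last interval.
        k = min(max(bisect_right(edges, q) - 1, 0), len(counts) - 1)
        counts[k] += 1
    k_max = counts.index(max(counts))          # first interval with the highest frequency
    return (edges[k_max] + edges[k_max + 1]) / 2


edges = [0, 3, 6, 9, 12]                        # hypothetical 3 m intervals
samples = [1.0, 2.5, 4.0, 4.5, 5.9, 7.0, 12.0]  # hypothetical queue lengths in metres
degree = association_degree(samples, edges)
```

Here the interval from 3 m to 6 m holds the most samples, so the association degree is its midpoint.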

Preferably, in step S4, it is determined whether decoupling is possible:

S42, testing the deep reinforcement learning model trained in step S3 and the joint game model in the same traffic scene, and recording evaluation indexes, which may include average vehicle speed, average vehicle queuing length and average vehicle delay. The two groups of evaluation indexes are compared using the Jaccard similarity coefficient; the sets of the two groups of evaluation indexes are recorded as $A$ and $B$ respectively, and the similarity coefficient is calculated as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where $|A|$ denotes the number of elements of set $A$, $|A \cap B|$ the number of elements of the intersection of $A$ and $B$, and $|A \cup B|$ the number of elements of their union;

The similarity coefficient takes values between 0 and 1: a coefficient close to 1 means the two sets are very similar, and a coefficient close to 0 means they are not. When the similarity coefficient reaches the set upper threshold, the evaluation indexes of the two algorithms are considered relatively close, and the model can replace the joint game model, so the road network can be decoupled for model training; when it falls to the set lower threshold, the difference between the evaluation indexes of the two algorithms is considered large, and the joint game model is adopted; when it lies between the two thresholds, either model can be selected for training according to specific conditions.
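The similarity test can be sketched as follows. Two assumptions are made for illustration only: indicator values are rounded to one decimal place before being placed in sets, so that nearly equal measurements count as the same element, and the decision thresholds `lower` and `upper` are caller-supplied parameters rather than values taken from the application. The sample data are hypothetical.

```python
from typing import Iterable, Set


def jaccard(a: Iterable[float], b: Iterable[float], ndigits: int = 1) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets of rounded indicator values."""
    set_a: Set[float] = {round(x, ndigits) for x in a}
    set_b: Set[float] = {round(x, ndigits) for x in b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def choose_model(j: float, lower: float, upper: float) -> str:
    """Map a similarity coefficient to a model choice; thresholds are caller-supplied."""
    if j >= upper:
        return "independent"   # indexes are close: decoupled training is acceptable
    if j <= lower:
        return "joint"         # indexes differ widely: keep the joint game model
    return "either"            # in between: decide case by case


independent_run = [3.72, 9.80, 15.95, 0.32]   # hypothetical indicator samples
joint_run = [3.70, 9.81, 14.10, 0.30]         # hypothetical indicator samples
j = jaccard(independent_run, joint_run)
```

With these samples three of the five distinct rounded values are shared, so the coefficient is 0.6.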

Preferably, in step S4, a decoupling range is determined:

S43, from the above, the road network can be decoupled for model training when the similarity coefficient is high enough for the two models to be considered close; combining this with the road spacing and input flow combinations that correspond to such similarity coefficients, the upper limit of the input flow at a given spacing is determined, and this upper limit is the maximum boundary value of the decoupling range.

A decoupling device, comprising: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the computer program realizes the steps of the road network signal control unit decoupling method when being executed by the processor.

A computer-readable storage medium having stored thereon a program for implementing information transfer, which when executed by a processor implements the steps of the road network signal control unit decoupling method of the present application.

Compared with the prior art, the application has the following advantages:

a) According to the scheme, different traffic scenes are designed from the distance between adjacent intersections and the input traffic flow of an entrance way, and researches show that the larger the physical distance between the adjacent intersections is, the higher the independence between the intersections is, and the stronger the model decoupling property is;

b) Compared with the MADRL algorithm based on the inter-agent joint game, the MADRL algorithm based on the independent learning mode achieves an ideal control effect in the application range, and the algorithm training difficulty is low;

c) MADRL algorithm based on independent learning mode can be applied to traffic demand scenes of different scales in multi-intersection coordination control, and has the capability of adapting to new scenes.

d) Static decoupling of the road network signal control units means dividing the units according to static data, so that the signal control strategies of the different divided areas are designed and optimized independently of each other. The method allows more targeted optimization for different intersections and fully considers their historical traffic flow and relative positions, thereby reducing traffic congestion and improving road capacity. In addition, the decoupling operation can be optimized and decided from static data acquired in advance, so the static decoupling of the signal control units can run independently of real-time data, which provides greater flexibility and reliability. Static decoupling thus has comprehensive advantages of its own and also provides basic data support for subsequent dynamic decoupling.

Drawings

Fig. 1 is a structural framework of a model.

Fig. 2 is a set of phase selections.

Fig. 3 is a schematic illustration of a simulation environment.

Fig. 4 shows a maximum queuing length frequency distribution: where (a) in fig. 4 is a maximum queuing length frequency distribution of model 1 in the low density state, (b) in fig. 4 is a maximum queuing length frequency distribution of model 1 in the medium density state, and (c) in fig. 4 is a maximum queuing length frequency distribution of model 1 in the high density state.

Fig. 5 is a comparison of the evaluation index (average cumulative vehicle delay) under different methods.

Fig. 6 is a flow chart of the present application.

Detailed Description

The following detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application.

Fig. 1, fig. 2 and fig. 6 show a network connection signal coordination control method based on multiple intelligent agents under a decoupling condition, which comprises the following steps:

In a conventional road network signal control system, traffic signal control units are often designed per intersection or per road section; this approach easily leads to high coupling between signal control units and degrades the control effect. The application therefore provides a decoupling method for road network signal control units for simplifying a complex model. By analyzing the road network topology, the method divides the road network into a plurality of sub-areas, designs and controls each sub-area as an independent signal control unit, and adjusts and optimizes the division according to actual conditions to achieve a better control effect. The decoupling method reduces the coupling between signal control units and improves the control effect of the road network signal control system.

S11, data preparation: static data of the road network is collected, wherein the static data comprise information such as road network connection relation, historical traffic flow, road types and the like. In the application, the data can be obtained through traffic simulation software SUMO or field investigation.

S12, road network segmentation: a road network topology model is constructed from the collected data, which model is typically a graph in which roads are represented as nodes and paths traveled by the vehicle are represented as edges, the weights of which represent the length of the roads connecting the nodes. Clustering is carried out according to the relative positions of nodes of the topological structure model of the road network, the whole road network is divided into a plurality of road sections or areas, and each road section or area is called a signal control unit.

S13, decoupling and dividing: the traffic flow data collected in step S11 are distributed to each signal control unit, and the road network division result of step S12 is further divided by a decoupling method according to the distributed traffic flow data, so that the traffic flow within each signal control unit is relatively balanced while the traffic flow between different signal control units is small. The division result comprises single-point signal control intersections, trunk line signal control road sections and area signal control road networks. The static-data decoupling method in the application mainly adopts a clustering method: on the basis of the division by relative node position, it clusters again according to flow and separates the parts of the road network whose traffic flows are unrelated, thereby realizing decoupling of the road network signal control units.
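The second, flow-based stage of the division can be illustrated with a simple stand-in. The application does not name the clustering algorithm, so the sketch below swaps in a connected-components grouping: two nodes stay in the same signal control unit only if they share a position cluster and the flow on the link between them reaches a threshold. The function name, the `min_flow` threshold and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_units_by_flow(position_cluster: Dict[str, int],
                        link_flow: Dict[Tuple[str, str], float],
                        min_flow: float) -> List[List[str]]:
    """Group nodes that share a position cluster and a sufficiently busy link."""
    parent = {node: node for node in position_cluster}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (u, v), flow in link_flow.items():
        same_cluster = position_cluster[u] == position_cluster[v]
        if same_cluster and flow >= min_flow:
            parent[find(u)] = find(v)       # merge the two groups

    groups: Dict[str, List[str]] = {}
    for node in position_cluster:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())


clusters = {"A": 0, "B": 0, "C": 0, "D": 1}                        # hypothetical position clusters
flows = {("A", "B"): 800.0, ("B", "C"): 50.0, ("C", "D"): 900.0}   # hypothetical link flows
units = split_units_by_flow(clusters, flows, min_flow=300.0)
```

In this example A and B form a trunk-line unit, while C and D become single-point units: C because its link to B carries little flow, and D because it lies in a different position cluster.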

Assume that the entire intersection includes $c$ flow directions. The queuing length of vehicles on each lane is acquired as the traffic state of that lane, so that the traffic state input of the whole intersection consists of $c$ queuing length values, i.e. $state = \{q_1, q_2, q_3, \dots, q_c\}$;

S21, deploying an agent at each intersection, $N$ agents in total; the intersection states observed by the agents at time $t$ are recorded as $(o_t^1, o_t^2, \dots, o_t^N)$, where $o_t^i$ denotes the state of its own intersection observed by agent $i$ at time $t$, and the union of $(o_t^1, o_t^2, \dots, o_t^N)$ is the overall traffic state $s_t$ at time $t$;

S22, each agent makes a corresponding decision action according to the traffic state; the decision action is defined as the selection of a signal phase, and the set of selectable phases, i.e. the action set, is recorded as $A$, which is of finite dimension. The action selected from the action set $A$ by agent $i$ at time $t$ is recorded as $a_t^i$, so that $a_t^i \in A$. The signal lamp executes the decision action $a_t$ of time $t$; the duration of action execution is variable and is recorded as $\Delta t$. At time $t + \Delta t$ the environment feeds back a reward or punishment for the action, and the average queuing length is used as the evaluation index to construct a reward function $r_{t+\Delta t}$ that quantifies the influence of the decision action:

where the number of lanes at an intersection is recorded as $n$, $l_i$ is the $i$-th lane, and $q_{l_i}$ is the queuing length in the $i$-th lane.
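A reward built from the average queuing length can be sketched as below. The sign is an assumption: the sketch uses the negative mean so that shorter queues give a larger reward, which is a common convention but is not confirmed by the text here. The function name and sample values are hypothetical.

```python
from typing import Sequence


def reward_from_queues(queue_lengths: Sequence[float]) -> float:
    """Negative average queue length over the lanes of one intersection."""
    if not queue_lengths:
        return 0.0   # no lanes observed: neutral reward
    return -sum(queue_lengths) / len(queue_lengths)


# Queue lengths on four lanes of one intersection (hypothetical values).
r = reward_from_queues([12.0, 0.0, 6.0, 6.0])
```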

S3, constructing a deep Q network model of the signal control unit, and training the model;

S31, initializing the model hyperparameters, and acquiring from simulation data the current traffic state $s_t$ and the maximum queuing length $O_t$ of the different flow directions of each entrance lane. The model hyperparameters include the discount coefficient $\gamma$, the green light preset time $g_t$ and the yellow light transition time $y_t$;

S33, taking the current traffic state, decision action, reward value and traffic state of the next time step as a four-tuple $(s_t, a_t, r_t, s_{t+1})$, storing it in an experience replay unit, randomly extracting a mini-batch of sample data from the experience pool, and updating the weights of the neural network with this mini-batch;
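The experience replay unit can be sketched as a bounded pool with uniform random sampling. The default capacity and batch size below follow the hyperparameter table (50000 and 600); the class and method names are hypothetical, and uniform sampling without replacement is an assumption.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

Transition = Tuple[tuple, int, float, tuple]   # (state, action, reward, next_state)


class ReplayBuffer:
    def __init__(self, capacity: int = 50000) -> None:
        # A deque with maxlen drops the oldest transition once the pool is full.
        self.pool: Deque[Transition] = deque(maxlen=capacity)

    def store(self, s: tuple, a: int, r: float, s_next: tuple) -> None:
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size: int = 600) -> List[Transition]:
        # Sample without replacement; use the whole pool until it holds enough data.
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.store((step,), 0, -1.0, (step + 1,))
batch = buffer.sample(batch_size=2)
```

Random sampling breaks the correlation between consecutive simulation steps, which is the usual reason for training from a replay pool rather than from the latest transition.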

S4, applying the trained model to different test scenes to determine a decoupling range.

S42, testing the model of step S3 and the joint game model in the same traffic scene, and recording evaluation indexes, which may include average vehicle speed, average vehicle queuing length and average vehicle delay. The two groups of evaluation indexes are compared using the Jaccard similarity coefficient; the sets of the two groups of evaluation indexes are recorded as $A$ and $B$ respectively, and the similarity coefficient is calculated as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

S43, determining a decoupling range: from the above, the road network can be decoupled for model training when the similarity coefficient is high enough for the two models to be considered close; combining this with the road spacing and input flow combinations that correspond to such similarity coefficients, the upper limit of the input flow at a given spacing is determined, and this upper limit is the maximum boundary value of the decoupling range.
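Determining the decoupling range for one spacing can be sketched as a scan over the tested flows in ascending order, stopping at the first flow whose similarity coefficient falls below the threshold. This is a sketch under assumptions: the threshold value, the test data and the function name are hypothetical, and treating the range as contiguous from the lowest flow is a modelling choice of the sketch.

```python
from typing import Iterable, Optional, Tuple


def decoupling_upper_limit(results: Iterable[Tuple[float, float]],
                           threshold: float) -> Optional[float]:
    """Largest input flow reachable from the lowest one with similarity >= threshold."""
    limit: Optional[float] = None
    for flow, similarity in sorted(results):
        if similarity < threshold:
            break              # first failing flow ends the decoupling range
        limit = flow
    return limit


# (input flow, similarity coefficient) pairs for one spacing; hypothetical values.
tested = [(500.0, 0.40), (100.0, 0.90), (300.0, 0.85)]
upper = decoupling_upper_limit(tested, threshold=0.8)
```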

S5, real-time control of the intersection is achieved by using the model under the decoupling condition.

Description of the preferred embodiments

1) Simulation environment construction

Simulation experiments are carried out on the microscopic traffic simulation platform SUMO (Simulation of Urban Mobility), whose main functions include road network construction, traffic demand generation and acquisition of various evaluation indexes during simulation. The study connects SUMO's TraCI interface to Python to realize real-time interaction between intersection information and the multi-agent deep Q network algorithm. The signal control model is built in an environment with three intersections, and the decoupling conditions of the model are analyzed in an environment with two intersections; the roads in both environments are two-way six-lane roads, each direction comprising a left-turn lane, a through lane and a shared through/right-turn lane, as shown in fig. 3.

After the model is built, the three-intersection environment is expanded horizontally to 6 intersections, and horizontally and vertically to 9 intersections, forming two new scenes; the same model is trained and tested on the new scenes to verify its scalability.
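Reading the queue state through TraCI can be sketched as below. The application does not say which TraCI calls it uses; this sketch assumes lane-area (E2) detectors on the entrance lanes and reads them with `getJamLengthMeters`, a value retrieval offered by TraCI's lane-area domain. The detector IDs are hypothetical, and a small stand-in object replaces `traci.lanearea` so the example runs without SUMO.

```python
from typing import List, Sequence


def read_queue_state(lanearea, detector_ids: Sequence[str]) -> List[float]:
    """Queue length in metres for each entrance detector, in a fixed order."""
    return [lanearea.getJamLengthMeters(det_id) for det_id in detector_ids]


class FakeLaneArea:
    """Stand-in for traci.lanearea so the sketch runs without SUMO."""

    def __init__(self, jams):
        self._jams = jams

    def getJamLengthMeters(self, det_id):
        return self._jams[det_id]


detectors = ["n_left", "n_through", "n_through_right"]   # hypothetical detector IDs
fake = FakeLaneArea({"n_left": 7.5, "n_through": 15.0, "n_through_right": 0.0})
state = read_queue_state(fake, detectors)
```

With SUMO installed, the same function would be called as `read_queue_state(traci.lanearea, detectors)` after the simulation has been started and stepped, provided the detectors are defined in the network's additional files.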

The initial hyperparameter settings of the model are shown in table 1.

Table 1 Hyperparameter settings of the model

Hyperparameter	Initial value
Simulation time steps (horizon)	3600
Number of simulation rounds (episode)	200
Number of iterations (iteration)	4
Experience pool capacity	50000
Batch size	600
Learning rate	0.0003
Discount coefficient γ	0.9
Green light preset time g_t	6
Yellow light transition time y_t	3

2) Traffic demand scene setting

The generally recommended spacing of urban roads is: 1500-2500 meters for expressways; 700-1200 meters for main roads; 350-500 meters for secondary main roads; 150-250 meters for branch roads. The road spacing range is therefore preliminarily determined as [100 m, 2500 m]. Five spacings are set and three flow levels are set for each spacing, giving 15 spacing and flow combinations in total as training scenes for exploring the decoupling range of the model.

Model decoupling experiments are carried out at 5 intersection spacings, with a minimum spacing of 100 m. Low, medium and high flow inputs are set at each spacing, and the simulation duration is 3600 s. The road network has 6 entrances, namely east, west, south 1, south 2, north 1 and north 2. To distinguish traffic states of different levels, the distribution ratio of traffic flow in each direction of the intersection is shown in table 2.

Table 2 trained traffic scene settings

The model is trained on the five demand scenes, and the traffic state of the road section between adjacent signal lamps is observed in discretized form during training. First, the road section between adjacent signal lamps is divided into three sections, whose boundary values are recorded as $q_1$, $q_2$, $q_3$, $q_4$; the division ratios of the sections are recorded as $p_1$ and $p_2$; the upper limit of the queuing length is the lane length $L$; the boundary values of the sections are then $q_1 = 0$, $q_2 = p_1 L$, $q_3 = p_2 L$, $q_4 = L$.

Taking model 5 as an example, the upper limit of the queuing length is the lane length of 500 m. The queuing length of each flow direction is divided into several sections, and each section is further divided into unit queuing length intervals, so that each flow direction contains 40 traffic states after discretization. The queuing length interval divisions of the five models are shown in tables 3-4.

TABLE 3 queuing length interval partitioning

Model	q1	q2	q3	q4
Model 1	0m	30m	60m	100m
Model 2	0m	60m	120m	200m
Model 3	0m	90m	180m	300m
Model 4	0m	120m	240m	400m
Model 5	0m	150m	300m	500m

TABLE 4 arrangement of unit queuing length interval

Model	b1	b2	b3
Model 1	3m	3m	2m
Model 2	6m	6m	4m
Model 3	9m	9m	6m
Model 4	12m	12m	8m
Model 5	15m	15m	10m

The number of traffic state spaces at different intersection spacings was counted as shown in table 5.

TABLE 5 State space division quantity case

Intersection spacing	100m	200m	300m	400m	500m
Number of state spaces	40	40	40	40	40
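The state counts in table 5 can be checked from tables 3 and 4: each section between consecutive boundaries is cut into steps of its unit length, and the step counts are summed. The sketch below does this for the five models and also maps a queue length to a state index; the function names and the clamping of out-of-range values are illustrative choices, not taken from the application.

```python
from typing import Sequence


def state_count(bounds: Sequence[float], units: Sequence[float]) -> int:
    """Number of unit intervals obtained by cutting each section into unit-length steps."""
    return sum(round((hi - lo) / unit)
               for lo, hi, unit in zip(bounds, bounds[1:], units))


def state_index(queue: float, bounds: Sequence[float], units: Sequence[float]) -> int:
    """Index of the unit interval containing `queue` (clamped to the valid range)."""
    index = 0
    for lo, hi, unit in zip(bounds, bounds[1:], units):
        if queue < hi:
            return index + int(max(queue - lo, 0.0) // unit)
        index += round((hi - lo) / unit)
    return index - 1   # queue at or beyond the lane length: last interval


# Section boundaries (table 3) and unit interval lengths (table 4), in metres.
TABLE_3 = {1: [0, 30, 60, 100], 2: [0, 60, 120, 200], 3: [0, 90, 180, 300],
           4: [0, 120, 240, 400], 5: [0, 150, 300, 500]}
TABLE_4 = {1: [3, 3, 2], 2: [6, 6, 4], 3: [9, 9, 6], 4: [12, 12, 8], 5: [15, 15, 10]}
counts = {m: state_count(TABLE_3[m], TABLE_4[m]) for m in TABLE_3}
```

For every model the three sections contribute 10, 10 and 20 intervals, which reproduces the 40 states per flow direction listed in table 5.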

To further analyze the degree of association between intersections, the maximum queuing length frequency distribution of model 1 under different traffic densities was counted, as shown in fig. 4 (a), fig. 4 (b) and fig. 4 (c). As the input flow increases, the distribution of the maximum queuing length on the road section between adjacent signal lamps gradually shifts to the right and the association degree value gradually increases: the association degree values for low and medium density are concentrated in the first two queuing length sections, while the value for high density starts to move toward the last section. This means that, at the same spacing, the larger the input flow, the higher the association between adjacent intersections. The difference in control effect between the two trained models is examined next from the overall evaluation indexes of the intersection; in particular, since the intersection association is higher at high density, a critical range between decoupling and not decoupling may exist there.

3) Convergence analysis of models

The loss function value reflects the training quality of the multi-agent deep Q network signal control model: the smaller the loss, the better the model is trained. When training follows the algorithm flow above, the loss value of the model decreases with fluctuations and then stabilizes, which indicates that the model converges.

4) Performance and substitutability analysis of the model

Several indexes, namely vehicle running speed, maximum queuing length of the entrance lanes and average queuing length, are selected to evaluate the control effect of the multi-agent deep Q network model. The vehicle running speed is the average of the instantaneous speeds of all detected vehicles at a given moment; the maximum queuing length of the entrance lanes is the largest queuing length detected on the four entrance lanes at a given moment; the average queuing length is the average of the queuing lengths detected on all entrance lanes at a given moment.

First, 5 rounds of simulation tests are performed on the converged model using the demand scenes from training, and the results of the 5 rounds are averaged; the resulting evaluation index data are shown in table 6.

Table 6 test evaluation index data of each model

Evaluation index	Model 1-Low Density	Model 1-Medium Density	Model 1-High Density
Vehicle travel speed	8.05	14.4	12.34
Maximum queuing length	3.72	9.8	15.95
Average queuing length	0.32	1.42	3.03

According to the results of the test experiments, under the low density and the medium density, the maximum queuing length and the average queuing length of the entrance road are both at stable and lower levels, the running speed of the vehicle rises along with the increase of detected vehicles, the model performance is stable, and the feasibility of the MADRL algorithm based on the independent learning mode in the field of multi-intersection signal coordination control is verified. In a high-density scene, the running speed of the vehicle is reduced to a certain extent compared with the medium density due to congestion, and the maximum queuing length and the average queuing length have certain fluctuation. In order to further test the control effect of the model, MADRL algorithm based on the inter-agent joint game under the same scene is trained, the trained model is tested, evaluation index data of the two algorithms are compared, and the application range of the research model is tested.

Five rounds of simulation tests are likewise run on the MADRL algorithm based on the inter-agent joint game, accumulating 18,000 groups of data. Similarity analysis is then performed on the evaluation index data obtained by the two algorithms, covering the mean, the variance, and the similarity coefficient, so that the difference between the two data sets is observed from several dimensions. The mean values of the evaluation indexes are shown in Table 7.

Table 7 test evaluation index data of each model
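The similarity analysis described above can be sketched as follows. The text names the mean, the variance, and a similarity coefficient (the claims specify the Jaccard coefficient), but it does not state how continuous index values are turned into sets. The binning step `bin_width` and the function names below are therefore assumptions made only for illustration.

```python
from statistics import mean, pvariance


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (a convention of this sketch).
    return len(a & b) / len(union) if union else 1.0


def to_bins(values, bin_width):
    """Discretise continuous values into integer bin indexes (assumed step)."""
    return {int(v // bin_width) for v in values}


def compare_series(independent, joint, bin_width=1.0):
    """Compare one evaluation index recorded under the two algorithms."""
    return {
        "mean": (mean(independent), mean(joint)),
        "variance": (pvariance(independent), pvariance(joint)),
        "jaccard": jaccard(to_bins(independent, bin_width),
                           to_bins(joint, bin_width)),
    }


# Hypothetical sample values, not data from the application.
result = compare_series([1.2, 2.5, 3.1], [1.8, 2.9, 4.4])
print(result["jaccard"])
```

A coefficient near 1 indicates that the two algorithms produce similar index values. A finer `bin_width` makes the comparison stricter.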

The experimental results show that under low and medium density the evaluation indexes of the two models are comparable: relative to the joint model in the corresponding scenario, the evaluation indexes of Model 1 change by within ±5%, which is an acceptable range, so the model trained in the independent learning mode can replace the joint game model. In addition, training in the independent learning mode takes 1.5 hours, which is 1 hour less than training the joint game model, so the training time is shortened by 40%. Under high density, the performance of Model 1 is clearly lower than that of the joint game model, which shows that as the flow gradually increases, the association degree between intersections increases and joint control outperforms the independent mode. For Model 1 at the current intersection spacing, flow levels in [0, 300] are the application range of the model; [300, 500] is the transition stage between the two models, where the control model can be chosen by weighing the control effect against the training duration; and [500, +∞) is the range in which the model cannot be decoupled, where control by the joint model should be considered while bearing in mind its long training time.
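The flow ranges and the training-time figure above can be illustrated with a short sketch. The function names are hypothetical, the unit of the flow level is not stated in the text, and because the stated ranges share their endpoints (300 and 500), assigning 300 to the independent mode and 500 to the joint mode is an assumption of this sketch.

```python
def select_control_mode(flow):
    """Map a flow level to a control mode using the ranges reported for Model 1.

    Endpoint handling (300 -> independent, 500 -> joint) is an assumption;
    the text lists the ranges as [0, 300], [300, 500] and [500, +inf).
    """
    if flow < 0:
        raise ValueError("flow level must be non-negative")
    if flow <= 300:
        return "independent"  # decoupled training is applicable
    if flow < 500:
        return "transition"   # weigh control effect against training time
    return "joint"            # decoupling is not applicable


def training_time_reduction(independent_hours, saved_hours):
    """Fraction of the joint model's training time saved by independent training."""
    joint_hours = independent_hours + saved_hours
    return saved_hours / joint_hours


print(select_control_mode(250), select_control_mode(400), select_control_mode(650))
print(f"{training_time_reduction(1.5, 1.0):.0%}")  # 1.0 / 2.5 -> 40%
```

The second function reproduces the stated 40% reduction: the joint model takes 1.5 + 1 = 2.5 hours, and 1 / 2.5 = 0.4.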

5) Performance analysis of models in different scale scenarios

The experimental environment is expanded from two intersections to three, and traffic flow within the application range of the independent mode is input to test the performance of the model with different numbers of intersections. After training converges, the model is compared with the traditional arterial green wave method and with the SAC algorithm based on the independent mode. The accumulated vehicle delay time is observed to evaluate and analyze the control optimization effect of the model. A comparison of the evaluation index for the three signal control methods is shown in Fig. 5.

For the accumulated vehicle delay time, the mean value is 19.59 s for the arterial green wave, 7.38 s for the SAC algorithm based on the independent mode, and 8.65 s for the deep Q-network model based on the independent mode. The delay of the proposed model is therefore lower than that of the traditional method and slightly higher than that of the SAC algorithm in the independent mode. In conclusion, the proposed model outperforms the traditional method and differs little from other deep reinforcement learning training algorithms, which verifies the effectiveness of the training method proposed in this study.
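To put the reported mean delays on a common scale, the relative differences can be computed as below. The three delay values are taken from the text; the dictionary keys and the helper `relative_change` are names chosen for this sketch.

```python
def relative_change(value, baseline):
    """Relative change of value with respect to baseline (negative = lower delay)."""
    return (value - baseline) / baseline


# Mean accumulated vehicle delay in seconds, as reported in the text.
delays = {"green_wave": 19.59, "independent_sac": 7.38, "independent_dqn": 8.65}

vs_green_wave = relative_change(delays["independent_dqn"], delays["green_wave"])
vs_sac = relative_change(delays["independent_dqn"], delays["independent_sac"])

print(f"{vs_green_wave:+.1%}, {vs_sac:+.1%}")  # -55.8%, +17.2%
```

On these figures the deep Q-network model's delay is about 56% lower than the arterial green wave and about 17% higher than the independent-mode SAC algorithm.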

The scenario is further expanded to six and nine intersections, and the observed evaluation indexes are shown in Table 8. The model still controls traffic effectively as the number of intersections increases, which shows that it has a certain scalability.

Table 8 evaluation index data in different traffic scenarios

Scenario	Vehicle travel speed	Maximum queuing length	Average queuing length
Six intersections	30.95	27.61	4.59
Nine intersections	29.65	32.46	5.11

A computer readable storage medium, on which a program for implementing information transfer is stored, which program, when being executed by a processor, implements the steps of the road network signal control unit decoupling method according to the application.

The above describes only the preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and variations to it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A road network signal control unit decoupling method for simplifying a complex model, characterized by comprising the following steps:

S3, training the deep reinforcement learning model by using a neural network;

S4, applying the trained deep reinforcement learning model to different test scenes to determine a decoupling range.

2. The method for decoupling road network signaling control units with simplified complex model as claimed in claim 1, wherein, in step S1,

3. The method for decoupling road network signaling control units with simplified complex model as claimed in claim 1,

characterized in that in step S2,

S21, deploying an agent at each intersection, the agents being shared, and recording the set of intersection states observed by all agents at a given moment, wherein each element of the set represents the state of its own intersection observed by one agent at that moment, and the union of these states is the overall traffic state at that moment;

wherein the number of lanes at the intersection, each individual lane, the queuing length of each lane at the given moment, and the state observed by the agent at that moment are each denoted by a corresponding symbol.

4. A method for decoupling a road network signal control unit with simplified complex model as claimed in claim 3,

wherein in step S2, it is assumed that the entire intersection includes $c$ flow directions, and the queuing length of vehicles on each lane is acquired as the traffic state of the current lane, so that the traffic state input of the whole intersection consists of $c$ queuing length values, i.e. $state = \{q_1, q_2, q_3, \ldots, q_c\}$;

5. The method for decoupling road network signal control units for simplifying complex models according to claim 1, wherein in step S3, the training process of the deep reinforcement learning model is as follows:

S31, initializing the model hyperparameters, and acquiring from the simulation data the current traffic state and the maximum queuing length of the different flow directions of each entrance lane; the hyperparameters include the discount coefficient, the preset green light time, and the yellow light transition time;

6. The method for decoupling road network signaling control units with simplified complex model as claimed in claim 1,

characterized in that in step S4,

for the traffic flow input to the road network, first determining the number of inlets through which traffic flow can be input, the number of inlets being related to the number of intersections included in the road network; given the number of agents, the number of intersections in the road network, and the numbers of rows and columns of the road network, the number of inlets capable of inputting flow is as follows:

the total traffic flow input at a given moment is in direct proportion to the road network spacing: the larger the spacing, the more vehicles the road sections can accommodate, and the greater the total traffic flow that can be input;

the association degree of adjacent intersections is defined by combining the road network spacing and the input flow, wherein the road section of known length between adjacent intersections is divided into discrete intervals, each interval having a left boundary, a right boundary, and a unit length; the queuing length of the straight lanes of the road section between adjacent intersections is acquired at each moment in the simulation and discretized into the intervals, the frequency of each interval is counted, and the association degree is defined as the average of the left and right boundaries of the interval with the maximum frequency, with the formula as follows:

7. The method for decoupling a road network signal control unit with simplified complex model as claimed in claim 6, wherein in step S4, it is determined whether decoupling is possible:

S42, testing the deep reinforcement learning model trained in step S4 and the joint game model in the same traffic scene, and recording evaluation indexes including the average vehicle speed, the average vehicle queuing length, and the average vehicle delay, wherein the two groups of evaluation indexes are compared by the Jaccard similarity coefficient; the sets of the two groups of evaluation indexes are denoted $A$ and $B$ respectively, and the similarity coefficient is calculated as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

the similarity coefficient takes values between 0 and 1; when the coefficient is close to 1 the two sets are very similar, and when it is close to 0 the two sets are dissimilar; when the similarity coefficient satisfies the set condition, the evaluation indexes of the two algorithms are relatively close and the road network can be controlled in a decoupled manner.

8. The method for decoupling road network signaling control units with simplified complex model as claimed in claim 7,

the method is characterized in that in step S4, a decoupling range is determined:

S43, given that the road network can be decoupled for model training when the similarity coefficient satisfies the set condition, determining, from the road network spacing and the input flow at which the similarity coefficient meets that condition, the upper limit of the input flow at a given spacing, namely the maximum boundary value of the decoupling range.

9. A decoupling device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the steps of the road network signal control unit decoupling method according to any one of claims 1 to 8.

10. A computer-readable storage medium, wherein a program for implementing information transfer is stored on the computer-readable storage medium, and the program when executed by a processor implements the steps of the road network signal control unit decoupling method according to any one of claims 1 to 8.