CN113096418B

CN113096418B - Traffic network traffic light control method, system and computer readable storage medium

Info

Publication number: CN113096418B
Application number: CN202110366053.5A
Authority: CN
Inventors: 李才博; 王迅
Original assignee: Zhaotong Liangfengtai Information Technology Co ltd
Current assignee: Zhaotong Liangfengtai Information Technology Co ltd
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2022-04-22
Anticipated expiration: 2041-04-06
Also published as: CN113096418A

Abstract

The invention provides a traffic network traffic light control method, a traffic network traffic light control system and a computer readable storage medium based on edge computing. Each edge node sends the real-time traffic running state to a cloud server, and the cloud server optimizes the timing scheme of the whole traffic network according to a specific optimization algorithm to obtain the optimal set time of the current traffic light of each intersection; the cloud server then sends the corresponding time adjustment scheme to each edge node. Meanwhile, each edge node performs regional timing optimization based on a local traffic network formed by nodes connected with the edge node to obtain a regional traffic optimization-based local timing scheme; the two schemes compete, and the scheme which improves the optimization target most is adopted; each edge terminal control module then makes real-time adjustments to the traffic signal equipment controlled by it.

Description

Traffic network traffic light control method, system and computer readable storage medium

Technical Field

The invention relates to the technical field of traffic control, in particular to a traffic network traffic light control method, a traffic network traffic light control system and a computer readable storage medium.

Background

Existing traffic light control strategies are relatively outdated: the waiting and passing times of the traffic lights at an intersection are basically set from past experience of the traffic volume there, ignoring the morning and evening peaks on working days and holidays and sudden surges in traffic during emergencies. This puts traffic flow pressure on the intersection and on the road sections leading to adjacent intersections, reduces traffic efficiency, and inconveniences urban residents. In addition, travelers often cannot obtain traffic information for the relevant road sections in time and so cannot choose a more suitable travel plan; this information lag allows congestion on a road section to persist for a long time.

Edge computing is a distributed computing architecture that divides the large services handled by a central node into smaller, easier-to-manage units and distributes them to edge nodes for processing. Because an edge node is closer to the terminal equipment, data can be processed and transmitted faster, latency is reduced, and real-time performance is improved. Under this architecture, data analysis and knowledge generation take place closer to the source of the data, which makes the approach better suited to processing big data.

The internet of things is 'the internet connected with everything', is an extended and expanded network on the basis of the internet, connects any terminal with the internet through various information sensing devices according to an agreed protocol, and performs information exchange and communication so as to realize intelligent detection, identification, control and management on objects.

The road system of a city is a huge network in which each intersection can serve as an edge node, and the traffic flow information in the network is a huge volume of data. Optimizing the traffic system over the whole network can greatly improve traffic operating efficiency and make more rational use of traffic resources. However, if road information such as the video monitoring of every edge node is transmitted as data to the server for processing, it places great processing pressure on the central node of the network and causes a certain delay. Therefore, the intelligent traffic light control system for an urban traffic network based on edge computing integrates the traffic resources of the whole traffic network while performing local timing optimization in the area of each edge node. By combining global and local optimization, it helps allocate traffic resources and improve traffic efficiency across the urban traffic system in a global, intelligent and real-time manner.

In the prior art, some schemes use a Q-learning reinforcement learning algorithm or a deep Q-network algorithm for timing optimization of a traffic model, but a manually set Q function or a manually designed action selection strategy is often severely limited and cannot make good use of the most valuable features carried in the state information. Moreover, the action space of traffic signal control is often continuous, so reinforcement learning algorithms based on a discrete action space are not suitable for real scenes.

Traditional signal control systems generally use multi-period fixed-time signal controllers, actuated signal controllers, and centrally coordinated signal controllers. The control scheme of the traffic signal mostly either uses fixed timing based on past experience of vehicle throughput in each direction of the intersection, or builds a mathematical model for a particular intersection and then optimizes the timing adaptively. However, as the number of intersections grows and urban traffic expands, a centralized control system can no longer handle the communication of large volumes of traffic data and the real-time optimization of the traffic control strategy; the traffic model also grows larger and becomes harder to maintain. Faced with such complex data, traditional traffic signal control schemes and traffic data processing methods cannot meet the requirements of current traffic control optimization.

In some places, traffic signal devices whose timing can be adjusted manually are placed at the intersection and operated by traffic police when an adjustment is needed. Such a subjective adjustment method is unlikely to solve the problem on the current road section well, and when traffic flow is heavy and traffic conditions are complex, the personal safety of the traffic police is easily threatened. Using multiple sensors to acquire the traffic flow state, on the other hand, incurs road modification costs and sensor installation and maintenance costs.

Disclosure of Invention

In order to overcome the technical defects, the present invention provides a traffic network traffic light control method, system and computer readable storage medium, which utilize traffic information resources to relieve traffic pressure and provide more real-time road condition information for travel users.

The invention discloses a traffic network traffic light control method based on edge computing, which comprises the following steps: acquiring real-time videos of each intersection, and detecting and acquiring current traffic flow data of each intersection based on a YOLO model; the traffic flow data comprises the number of vehicles in each direction and the number of vehicles in different lanes in each direction;

setting edge nodes by taking an intersection as a unit, and carrying out timing optimization processing by combining the traffic flow data of the edge node and its adjacent edge nodes, the current traffic light state information and the average vehicle speeds of lanes in different directions to obtain a local timing scheme of the edge node, wherein the local timing scheme comprises traffic signal equipment adjustment information; integrating the traffic flow data of all edge nodes, the current traffic light state information and the average speed of each lane in different directions to perform timing optimization processing to obtain a global timing scheme, wherein the global timing scheme comprises traffic signal equipment adjustment information;

respectively calculating the benefits of the local timing scheme and the global timing scheme, and selecting whichever of the two has the higher benefit as the implementation scheme to control the traffic signal equipment of each intersection;

receiving and storing traffic states of each intersection in real time to form a traffic state database, wherein the traffic states comprise traffic flow data, current traffic light state information and average speeds of all lanes in different directions; the user acquires the traffic state information of any intersection in real time through the wireless network.

Preferably, the acquiring the real-time video of each intersection and detecting and acquiring the current traffic flow data of each intersection includes: selecting a region of interest in the real-time video through an algorithm based on a channel attention mechanism, wherein the region of interest is a region containing a road; setting a threshold n for the number of prior frames and a confidence b, and obtaining the prior frames of the feature map; counting the prior frames in the region of interest whose confidence is greater than b to obtain a_n; if a_n is less than n, deleting all the prior frames of the region; if a_n is greater than n, detecting the region through a non-maximum suppression algorithm to obtain the current traffic flow data of each intersection.

Preferably, the integrating the traffic flow data of all edge nodes, the current traffic light state information and the average speed of each lane in different directions to perform timing optimization processing to obtain the global timing scheme includes: performing iterative optimization on the traffic network composed of all edge nodes, and setting a weight value for each edge node

wherein n_x is the number of vehicles at the current intersection, W_x is the weight of the current intersection in the model to be optimized, R is the set of edge nodes incorporated into the whole traffic network, and the sum of the weight values of all edge nodes is 1;

the current traffic running state of each direction at the intersection is Q, which includes: num_L represents the number of vehicles in the current left-turn lane, num_R represents the number of vehicles in the right-turn lane, num_S represents the number of vehicles in the straight-through lane, v_L represents the average speed of the left-turn lane, v_R represents the average speed of the right-turn lane, v_S represents the average speed of the straight-through lane, num_A represents the number of vehicles about to enter the intersection, t_L represents the state of the left-turn signal lamp, and t_S represents the state of the straight-through signal lamp; when a signal lamp state value is positive, it represents the remaining green-light time; when a signal lamp state value is negative, it represents the remaining red-light time; the traffic running state of each edge node is S = {Q_x | x ∈ (1, 2, ..., k)}, where k = 3 or k = 4 represents a three-way intersection and a four-way intersection, respectively; the current overall traffic running state is S = {S₁, S₂, ..., S_n | n ∈ R}, and the weight configuration is W = {W₁, W₂, ..., W_n | n ∈ R};

And constructing a second deep reinforcement learning model by using the sum of the vehicle passing numbers of each edge node in unit time as a reward, and calculating and acquiring the global timing scheme through the second deep reinforcement learning model.

Preferably, the step of setting edge nodes by taking the intersection as a unit, and performing timing optimization processing by combining the traffic flow data of the edge nodes and the adjacent edge nodes, the current traffic light state information and the average speed of each lane in different directions to obtain the local timing scheme of the edge nodes comprises the following steps: setting a weight value for each edge node

describing the strength of the association between an adjacent edge node and the target edge node; wherein n_x is the number of vehicles coming into the target edge node from the current edge node, d_x is the length of the road section between the current edge node and the target edge node, Z is the set of edge nodes adjacent to the target edge node being optimized, and the sum of the weight values of all edge nodes is 1;

the current traffic running state of each direction at the intersection is Q, which includes: num_L represents the number of vehicles in the current left-turn lane, num_R represents the number of vehicles in the right-turn lane, num_S represents the number of vehicles in the straight-through lane, v_L represents the average speed of the left-turn lane, v_R represents the average speed of the right-turn lane, v_S represents the average speed of the straight-through lane, num_A represents the number of vehicles about to enter the intersection, t_L represents the state of the left-turn signal lamp, and t_S represents the state of the straight-through signal lamp; when a signal lamp state value is positive, it represents the remaining green-light time; when a signal lamp state value is negative, it represents the remaining red-light time;

the traffic running state of each edge node is S = {Q_x | x ∈ (1, 2, ..., k)}, where k = 3 or k = 4 represents a three-way intersection and a four-way intersection, respectively; the current state of the adjacent edge nodes is S_L = {(num_i, t_i) | i ∈ (1, 2, ..., k)}, wherein num_i is the number of vehicles coming into the target edge node from the adjacent edge node, a negative t_i represents the remaining red-light time of the corresponding lane of the adjacent edge node, and a positive t_i represents the remaining green-light time of that lane; the weight configuration is W = {W_n | n ∈ (1, 2, ..., k)}, where k is 3 or 4 and represents the total number of edge nodes adjacent to the target edge node being optimized;

and constructing a first deep reinforcement learning model by using the vehicle passing number of the target edge node in unit time as a reward, and calculating and acquiring the local timing scheme through the first deep reinforcement learning model.

Preferably, the combining the traffic flow data of the edge node and the adjacent edge nodes, the current traffic light state information and the average speed of each lane in different directions to perform timing optimization processing to obtain the local timing scheme of the edge node includes: sharing information between the edge node and its adjacent edge nodes, wherein the shared information comprises the number of vehicles about to arrive and the current traffic light information.

Preferably, the selecting the scheme with the higher benefit of the two as the implementation scheme to control the traffic signal equipment of each intersection comprises: fitting the action-value function and the action selection policy through a DDPG deep neural network model to optimize the implementation scheme, and controlling the traffic signal equipment by adopting the optimized implementation scheme.

Preferably, the respectively calculating and obtaining the benefits of the local timing scheme and the global timing scheme, and selecting the scheme with the higher benefit as the implementation scheme includes: by the formula

calculating the benefits of the local timing scheme and the global timing scheme, wherein P represents the global timing scheme or the local timing scheme, S is the current traffic state of the edge node, V is the state and weight configuration set of the adjacent edge nodes,

to perform action P in the current traffic state;

selecting the scheme with the higher benefit as the implementation scheme by evaluating max(λL_j, (1-λ)L_q), wherein L_j represents the benefit brought by the local timing scheme, L_q represents the benefit brought by the global timing scheme, the parameter λ ∈ (0.4, 0.6), and λ = 0.8x² - 0.8x + 0.6, where x is the distance of the target edge node from the urban traffic center and x ∈ (0, 1), the distance of the farthest edge node being 1 and the distance of the edge node at the traffic center being 0.
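The comparison above can be illustrated with a minimal Python sketch. The function names `lam` and `choose_scheme` are hypothetical, the benefits L_j and L_q are taken as already computed numbers, and resolving a tie in favor of the local scheme is an assumption that the text does not specify.

```python
def lam(x: float) -> float:
    """Weighting parameter from the text: lambda = 0.8x^2 - 0.8x + 0.6.

    x is the normalised distance of the edge node from the traffic centre
    (0 = centre, 1 = farthest node).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a normalised distance in [0, 1]")
    return 0.8 * x * x - 0.8 * x + 0.6


def choose_scheme(l_local: float, l_global: float, x: float) -> str:
    """Pick the scheme that wins max(lambda * L_j, (1 - lambda) * L_q).

    Assumption: ties go to the local scheme.
    """
    weight = lam(x)
    if weight * l_local >= (1.0 - weight) * l_global:
        return "local"
    return "global"
```

With equal raw benefits, a node at the traffic center or at the farthest edge (λ = 0.6) would prefer its local scheme under this sketch, while a node midway (λ = 0.4) would prefer the global scheme.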

Preferably, the integrating the traffic flow data of all edge nodes, the current traffic light state information, and the average speed of each lane in different directions to perform timing optimization processing, and obtaining the global timing scheme further includes: respectively setting vehicle number thresholds of three levels of optimization-free, optimization-required and urgent optimization-required according to the road width and the road length of the edge node; grading the traffic flow state of each edge node in the traffic network according to the three vehicle number thresholds;

when a first preset number of nodes in the traffic network are at the optimization-free level and no edge node is at the urgent-optimization level, stopping acquiring the global timing scheme; when a second preset number of nodes in the traffic network are at the optimization-free level and no edge node is at the urgent-optimization level, placing the second preset number of optimization-free edge nodes in a suspended state and excluding them from the global timing scheme; and when an edge node at the urgent-optimization level appears in the traffic network, or a first preset number of edge nodes are at the optimization-required level, incorporating all the edge nodes into the global timing scheme.
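One possible reading of this grading logic is sketched below in Python. Everything named here (`Level`, `grade`, `plan_global`, the two cut-off arguments) is hypothetical. The sketch assumes that two vehicle-count cut-offs separate the three levels, that the first preset number is larger than the second, that the "include all nodes" rule takes precedence, and that every optimization-free node is suspended in the partial case; the text does not fix any of these details.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Level(Enum):
    NO_OPT = 0      # optimization-free
    NEEDS_OPT = 1   # optimization required
    URGENT = 2      # urgent optimization required


def grade(vehicles: int, need_at: int, urgent_at: int) -> Level:
    """Grade one edge node from its vehicle count (cut-offs are assumptions)."""
    if vehicles >= urgent_at:
        return Level.URGENT
    if vehicles >= need_at:
        return Level.NEEDS_OPT
    return Level.NO_OPT


def plan_global(levels: Dict[str, Level], first_preset: int,
                second_preset: int) -> Tuple[str, List[str]]:
    """Return (mode, nodes included in global optimization)."""
    free = [n for n, lv in levels.items() if lv is Level.NO_OPT]
    needing = [n for n, lv in levels.items() if lv is Level.NEEDS_OPT]
    urgent = any(lv is Level.URGENT for lv in levels.values())
    if urgent or len(needing) >= first_preset:
        return "all", sorted(levels)          # include every edge node
    if len(free) >= first_preset:
        return "stop", []                     # cloud pauses global optimization
    if len(free) >= second_preset:
        # suspend the optimization-free nodes, keep the rest
        return "partial", sorted(n for n in levels if n not in free)
    return "all", sorted(levels)
```

In practice the cut-offs passed to `grade` would be derived per node from road width and road length, as the text describes.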

A traffic network traffic light control system based on edge computing comprises road monitoring equipment, edge nodes, a cloud server, traffic signal equipment and a user center; the road monitoring equipment acquires real-time videos of each intersection and transmits the real-time videos to the edge nodes, which are set by taking the intersections as units;

the edge nodes analyze the real-time videos to detect and obtain current traffic flow data of each intersection, wherein the traffic flow data comprise the number of vehicles in each direction and the number of vehicles in different lanes in each direction; the edge node combines the traffic flow data of a target edge node and of the edge nodes adjacent to the target edge node, the current traffic light state information and the average speed of each lane in different directions to carry out timing optimization processing, and a local timing scheme of the edge node is obtained, wherein the local timing scheme comprises traffic signal equipment adjustment information;

the cloud server integrates the traffic flow data of all edge nodes, current traffic light state information and average speed of each lane in different directions to perform timing optimization processing, and a global timing scheme is obtained, wherein the global timing scheme comprises traffic signal equipment adjustment information;

the edge node acquires the global timing scheme, respectively calculates the benefits of the local timing scheme and the global timing scheme, and selects whichever of the two has the higher benefit as the implementation scheme to control the traffic signal equipment of each intersection;

the cloud server receives and stores traffic states of all intersections in real time to form a traffic state database, wherein the traffic states comprise the traffic flow data, current traffic light state information and average speeds of all lanes in different directions; and the user center acquires the traffic state information of any intersection in real time through a wireless network.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the traffic network traffic light control methods described above.

After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:

1. based on the edge computing architecture, no complex model needs to be built for a large traffic network, more traffic nodes can be accommodated, a large amount of information can be processed, and the results are accurate;

2. the method extracts the basic traffic state information from video analysis alone, which saves the extra cost of various sensors and requires only very low equipment maintenance cost;

3. selecting the region of interest in the real-time video removes noise from the video and reduces the computation required for detection, and the secondary judgment on the prior frames reduces the probability of false detections or inaccurate result boxes, so that vehicle detection is more accurate;

4. by performing both local and global timing optimization and comparing the local timing scheme with the global timing scheme, the scheme with the higher benefit is selected as the implementation scheme, which meets the needs of local optimization while also taking the association between local areas into account, so that the resulting implementation scheme is balanced;

5. the cloud can decide whether to include each intersection in global optimization according to its current state and whether it needs cloud optimization, so that the cloud works and rests intelligently, which saves energy and extends the service life of the whole system; unlike the prior art, in which the cloud stays in a computing state at all times, this avoids the large increase in system power consumption and the reduced lifetime of the computing units caused by prolonged computation;

6. travelers can obtain, at any time over a wireless network, real-time and accurate road traffic conditions for the relevant intersections along their route, making travel more convenient.

Drawings

FIG. 1 is a flow chart of the traffic network traffic light control method based on edge computing provided by the present invention;

FIG. 2 is a schematic architecture diagram of the traffic network traffic light control method based on edge computing provided by the present invention;

FIG. 3 is a structure diagram of the DDPG algorithm of the traffic network traffic light control method based on edge computing provided by the present invention;

FIG. 4 is the single-node timing optimization workflow of the traffic network traffic light control method based on edge computing provided by the present invention.

Detailed Description

The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings.

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.

In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "coupled," and "connected" are to be interpreted broadly; they may denote, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and the specific meanings of these terms can be understood by those skilled in the art according to the specific situation.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.

Referring to FIGS. 1-2, the invention discloses a traffic network traffic light control method based on edge computing, which comprises the following steps:

S1, collecting real-time videos of each intersection, and detecting and acquiring current traffic flow data of each intersection based on a YOLO model; the traffic flow data comprises the number of vehicles in each direction and the number of vehicles in different lanes in each direction;

S2, setting edge nodes by taking the intersection as a unit, and carrying out timing optimization processing by combining the traffic flow data of the edge nodes and adjacent edge nodes, the current traffic light state information and the average speed of each lane in different directions to obtain a local timing scheme of the edge nodes, wherein the local timing scheme comprises traffic signal equipment adjustment information;

S3, integrating traffic flow data of all edge nodes, current traffic light state information and average speed of each lane in different directions to perform timing optimization processing to obtain a global timing scheme, wherein the global timing scheme comprises traffic signal equipment adjustment information;

S4, respectively calculating and obtaining the benefits of the local timing scheme and the global timing scheme, and selecting the scheme with higher benefit from the local timing scheme and the global timing scheme as an implementation scheme to control traffic signal equipment of each intersection;

S5, receiving and storing the traffic states of each intersection in real time to form a traffic state database, wherein the traffic states comprise traffic flow data, current traffic light state information and average speeds of lanes in different directions; the user acquires the traffic state information of any intersection in real time through the wireless network.

The method uses a target detection algorithm based on a YOLO model. For each intersection, a corresponding data set of road photos containing vehicles is collected, and the model is fine-tuned on this data set so that the detection algorithm is better suited to the specific scene and detects more accurately.

First, before the real-time video is analyzed, it undergoes denoising preprocessing: the area to be detected is selected, and the invalid areas on both sides of the road, i.e. areas that contribute nothing to the detection result, are blurred using a specific algorithm so that only the area containing the road is retained in the video. The region of interest in the real-time video, i.e. the region containing the road, is selected by an algorithm based on a channel attention mechanism, which reduces the amount of computation and the influence of environmental noise.

In actual detection, a falsely detected region often contains only a few prior frames that exceed the threshold set in the non-maximum suppression algorithm, even though their confidence is high. The invention therefore performs a secondary judgment on the prior frames: a threshold n for the number of prior frames and a confidence b are set, and the prior frames of the feature map are obtained; the prior frames in the region of interest whose confidence is greater than b are counted to obtain a_n; if a_n is less than n, the target object is considered not actually present in the region, and all prior frames in the region are deleted; if a_n is greater than n, the target object is considered very likely to be present in the region, and the region is processed by the non-maximum suppression algorithm to obtain the current traffic flow data of each intersection. This greatly reduces the probability of false detections or inaccurate result boxes and makes the vehicle statistics for the intersection more accurate. The threshold n must be determined by testing in the actual scene and differs between scenes.
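The secondary judgment can be sketched in plain Python as follows. This is an illustrative sketch rather than the patented detector: the box layout `(x1, y1, x2, y2, confidence)`, the helper names `iou`, `nms` and `detect_region`, and the IoU threshold of 0.5 are assumptions. The sketch also assumes that non-maximum suppression is run only on the confident boxes, and it treats the unspecified case a_n = n the same as a_n > n.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy non-maximum suppression, highest confidence first."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda bx: bx[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


def detect_region(boxes: List[Box], n: int, b: float) -> List[Box]:
    """Secondary judgment on the prior boxes of one region, then NMS."""
    confident = [box for box in boxes if box[4] > b]
    a_n = len(confident)
    if a_n < n:
        return []  # too few confident prior boxes: treat as a false detection
    return nms(confident)
```

The number of boxes returned for each lane region would then serve as the vehicle count for that lane.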

For global timing optimization, when the whole traffic network is iteratively optimized, intersections with more vehicles have a larger influence on the optimization target and intersections with fewer vehicles have a smaller influence, so intersections with more vehicles have a greater need for an optimized timing scheme than intersections with fewer vehicles. Therefore, the invention sets a weight value for each edge node

wherein n_x is the number of vehicles at the current intersection, W_x is the weight of the current intersection in the model to be optimized, R is the set of edge nodes included in the whole traffic network, and the sum of the weight values of all edge nodes is 1.
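As an illustration only, the Python sketch below computes weights that satisfy the two properties stated in the text: they grow with the vehicle count n_x and sum to 1 over the set R. It assumes the simplest such form, W_x = n_x divided by the total vehicle count over R; the function name `global_weights` and the uniform fallback for an empty network are likewise assumptions.

```python
from typing import Dict


def global_weights(vehicle_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each edge node by its share of all vehicles in the network.

    Assumed form: W_x = n_x / sum of n_i over all nodes i in R.
    """
    if not vehicle_counts:
        return {}
    total = sum(vehicle_counts.values())
    if total == 0:
        # No vehicles anywhere: fall back to uniform weights (assumption).
        uniform = 1.0 / len(vehicle_counts)
        return {node: uniform for node in vehicle_counts}
    return {node: count / total for node, count in vehicle_counts.items()}
```

For example, two intersections holding 30 and 10 vehicles would receive weights 0.75 and 0.25 under this assumption.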

An intersection comprises at least three directions. The current traffic running state of each direction of the intersection is Q, which includes: num_L represents the number of vehicles in the current left-turn lane, num_R represents the number of vehicles in the right-turn lane, num_S represents the number of vehicles in the straight-through lane, v_L represents the average speed of the left-turn lane, v_R represents the average speed of the right-turn lane, v_S represents the average speed of the straight-through lane, num_A represents the number of vehicles about to enter the intersection, t_L represents the state of the left-turn signal lamp, and t_S represents the state of the straight-through signal lamp; when a signal lamp state value is positive, it represents the remaining green-light time; when a signal lamp state value is negative, it represents the remaining red-light time. When the intersection is a three-way intersection, the parameters that do not apply are removed for each direction according to the specific conditions of the intersection.

The traffic running state of each edge node is $S = \{Q \mid Q_x, x \in (1, 2, \ldots, k)\}$, with k = 3 or k = 4 representing a three-way intersection and a four-way intersection respectively; the current whole traffic running state is $S = \{S \mid S_1, S_2, \ldots, S_n, n \in R\}$, and the weight configuration is $W = \{W \mid W_1, W_2, \ldots, W_x, x \in R\}$.

Finally, a second deep reinforcement learning model is constructed using the sum of the numbers of vehicles passing each edge node per unit time as the reward, and the global timing scheme is calculated with the second deep reinforcement learning model.

For local timing optimization, edge nodes are correlated with one another and can influence each other at future moments. For example, if two intersections A and B are currently congested and most of their vehicles are heading toward intersection C, the current state of intersections A and B will affect intersection C at a future time and, in turn, its timing optimization.

The information flow between edge nodes contains the number of oncoming vehicles and the current traffic light information. For example, if A shares its information with B, the information A passes to B is: (1) the number of vehicles about to leave node A and enter the road between A and B; (2) the current traffic light state information of intersection A, that is, the remaining red-light waiting time or the remaining green-light passing time. The sharing of information between adjacent nodes is bidirectional.

A timing optimization scheme based on an independent node considers only the current intersection and its current state; it neither predicts the future nor makes the related timing adjustments. The local timing optimization scheme therefore needs to consider the current states of all nodes directly connected to the node. However, different adjacent nodes influence the node to different degrees and at different moments, so a relevance adjustment is needed: a weight is set for each adjacent node pair to describe the strength of the association between the node and each directly adjacent node.

Different from the weight in the global optimization, the weight in the local optimization method is determined by the physical distance between adjacent nodes and the current state of the adjacent nodes, that is, the nodes which are close in physical distance and have a large number of vehicles reaching the target optimization node have a stronger association degree with the target optimization node, and the nodes which have a longer physical distance and have a small number of vehicles reaching the target optimization node have a weaker association degree with the target optimization node.

Information is shared between an edge node and its adjacent edge nodes; the shared information comprises the number of vehicles about to arrive and the current traffic light information, and two adjacent nodes that share information are called a node pair. A weight value is set for each edge node pair

to describe the strength of the association between the adjacent edge node and the target edge node; where $n_x$ is the number of vehicles about to enter the target edge node from the current edge node, $d_x$ is the length of the road section between the current edge node and the target edge node, $d_i$ is a constant, $n_i$ is a variable, Z is the set of edge nodes adjacent to the target optimization edge node, and the weight values of all edge nodes in Z sum to 1.
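The pair-weight formula is likewise not reproduced in this text. The sketch below only illustrates the stated behaviour (closer nodes sending more vehicles get larger weights, and the weights over Z sum to 1) by normalizing the ratio of incoming vehicles to road length. The ratio form, the function name `pair_weights`, and the uniform fallback are assumptions, not the formula of the invention.

```python
from typing import Dict, Tuple


def pair_weights(neighbors: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """neighbors maps node id -> (n_x incoming vehicles, d_x road length > 0)."""
    if not neighbors:
        return {}
    # Assumed score: more incoming vehicles and a shorter road both raise it.
    scores = {node: n / d for node, (n, d) in neighbors.items()}
    total = sum(scores.values())
    if total == 0:
        # No incoming vehicles from any neighbor: uniform weights (assumption).
        return {node: 1.0 / len(neighbors) for node in neighbors}
    return {node: score / total for node, score in scores.items()}
```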

The current traffic running state of each direction at the intersection is Q, which comprises: num_L, the number of vehicles in the current left-turn lane; num_R, the number of vehicles in the right-turn lane; num_S, the number of vehicles in the straight-through lane; v_L, the average speed of the left-turn lane; v_R, the average speed of the right-turn lane; v_S, the average speed of the straight-through lane; num_A, the number of vehicles about to enter the intersection; t_L, the state of the left-turn signal lamp; and t_S, the state of the straight-through signal lamp. A positive signal lamp state value represents the remaining green time; a negative value represents the remaining red time. For a three-way intersection, the parameters of the missing direction are set to 0 so that the input vectors keep the same shape.
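The zero-padding of a three-way intersection can be sketched as below. The field names follow the parameters listed above; the class name `DirectionState`, the function `node_state`, and the flat ordering of the vector are assumptions made for illustration.

```python
from dataclasses import astuple, dataclass
from typing import List

MAX_DIRECTIONS = 4  # a four-way intersection


@dataclass
class DirectionState:
    """State Q of one direction; all fields default to 0 for padding."""
    num_L: float = 0.0  # vehicles in the left-turn lane
    num_R: float = 0.0  # vehicles in the right-turn lane
    num_S: float = 0.0  # vehicles in the straight-through lane
    v_L: float = 0.0    # average speed, left-turn lane
    v_R: float = 0.0    # average speed, right-turn lane
    v_S: float = 0.0    # average speed, straight-through lane
    num_A: float = 0.0  # vehicles about to enter the intersection
    t_L: float = 0.0    # left-turn signal: >0 green left, <0 red left
    t_S: float = 0.0    # straight signal: >0 green left, <0 red left


def node_state(directions: List[DirectionState]) -> List[float]:
    """Flatten up to four directions into one fixed-length vector."""
    if len(directions) > MAX_DIRECTIONS:
        raise ValueError("an intersection has at most four directions here")
    padding = [DirectionState() for _ in range(MAX_DIRECTIONS - len(directions))]
    vector: List[float] = []
    for direction in list(directions) + padding:
        vector.extend(astuple(direction))
    return vector
```

A three-way intersection then yields the same 36-element vector shape as a four-way one, with the last nine entries equal to 0.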

The traffic running state of each edge node is $S = \{Q \mid Q_x, x \in (1, 2, \ldots, k)\}$, with k = 3 or k = 4 representing a three-way intersection and a four-way intersection respectively; the current state of the adjacent edge nodes is $S_L = \{(num, t) \mid (num_i, t_i), i \in (1, 2, \ldots, k)\}$, where $num_i$ is the number of vehicles about to enter the target edge node from adjacent edge node i, and $t_i$ is the signal state of the corresponding lane at that adjacent edge node (a negative value is the remaining red time, a positive value is the remaining green time); the weight configuration is $W = \{W \mid W_n, n \in (1, 2, \ldots, k)\}$, where k = 3 or 4 is the total number of edge nodes adjacent to the target optimization edge node.

Finally, a first deep reinforcement learning model is constructed using the number of vehicles passing the target edge node per unit time as the reward, and the local timing scheme is calculated with the first deep reinforcement learning model.

The invention combines a global and a local optimization scheme and so balances global and local efficiency; the weighted configuration balances fairness and benefit, so that the nodes that most urgently need adjustment are adjusted most effectively. When a local area meets a short, sudden traffic jam, the local timing scheme wins and relieves the local congestion in time, while the global scheme acts on the other intersections and guides the flow. When several areas of the network become congested, or when global traffic flow is stable, the global optimization scheme wins, dredging traffic in time and preventing flows from converging into larger congestion. In addition, because the local optimization scheme accounts for future relevance, traffic does not accumulate, which avoids long waits at several consecutive intersections caused by heavy traffic.

Because the state space and the action space of global or regional traffic are continuous and the policy is deterministic, the DDPG algorithm is adopted to optimize the control of the traffic signal equipment, and the behavior value function (the action-value function Q) and the action selection policy are fitted by deep neural networks.

Specifically, referring to fig. 3, S is the current state, A is the most valuable action taken in state S, Q is the behavior value function value of action A, and TD-error is the difference between the estimated Q value and the actual Q value, which measures the accuracy of the Critic network's estimate and is used to update the parameters of the Critic network. S' is a new state sampled from the experience replay pool, A' is the most valuable action in state S' according to the Actor network, Q' is the behavior value function value of taking action A' in state S', and R is the reward value obtained immediately from the environment after taking action A'.

The task of the Actor network is to find, from state S, the action A that maximizes the Q value, and it is responsible for the iterative parameter updates of the policy network; the task of the Actor target network is to select the next action A' that maximizes the Q' value from the next state S' sampled from the experience replay pool. The task of the Critic network is to predict the Q value of action A, and it is responsible for the iterative parameter updates of the behavior value network; the Critic target network is responsible for predicting the target Q' value. To help the model converge and to reduce oscillation, the parameters of the Critic network and the Actor network are copied to the Critic target network and the Actor target network gradually by soft updating. With w' and θ' the parameters of the Critic target network and the Actor target network, and w and θ the parameters of the Critic network and the Actor network, the update formulas are: $w' \leftarrow \tau w + (1-\tau) w'$; $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$; where τ is the update coefficient, which generally takes a small value such as 0.1 or 0.2.
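The soft update can be illustrated on plain parameter lists. This is a minimal sketch in which network parameters are represented as flat lists of floats; the function name `soft_update` is hypothetical, and a real model would apply the same rule to each tensor of a deep learning framework.

```python
from typing import List


def soft_update(target: List[float], current: List[float], tau: float) -> List[float]:
    """Return tau * current + (1 - tau) * target, element by element."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target) != len(current):
        raise ValueError("parameter lists must have the same length")
    return [tau * c + (1.0 - tau) * t for t, c in zip(target, current)]
```

With τ = 0.1, each call moves the target parameters one tenth of the way toward the current parameters, which is why the target networks change slowly and damp oscillation.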

The complete algorithm flow of DDPG is as follows:

S1, randomly initialize the parameter values of the four networks (each target network is initialized to the parameter values of its corresponding current network) and initialize the experience replay pool;

S2, set the total number of iterations and iterate:

S201, initialize S as the first state of the current state sequence;

S202, obtain action A from state S using the Actor current network;

S203, execute action A to obtain the new state S', the reward R, and a flag indicating whether the state is terminal;

S204, store the quintuple {S, A, R, S', terminal flag} in the experience replay pool;

S205, set S = S' to enter the new state;

S206, sample a batch from the experience replay pool, select A' with the Actor target network, and input S' and A' into the Critic target network to predict the current target Q value $y_j$:

γ is the discount factor, a real value in [0, 1]. The value of the agent in the current state is the sum of all possible future rewards discounted to the present, so γ expresses the balance between long-term benefits and immediate benefits. If only short-term returns are of interest, that is, γ tends toward 0, the agent is likely to fall into a local optimum; if the future is weighted too heavily, that is, γ tends toward 1, the agent loses its grounding in the present, training easily fails to converge, and the model loses its control effect. γ is generally about 0.9, but it still needs to be tuned in the actual scene.

S207, using the mean square error loss function

update all parameters of the Critic current network through gradient back propagation of the neural network;

S208, update all parameters of the Actor current network through gradient back propagation of the neural network;

S209, update the Critic target network and Actor target network parameters at a set update frequency by soft updating: $w' \leftarrow \tau w + (1-\tau) w'$; $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$;

S2010, if S' is a terminal state, end the current iteration; otherwise go to step S202.
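Steps S204 and S206 can be sketched as below. The target formula is not reproduced in this text, so the sketch uses the form commonly used in DDPG, y = R for a terminal transition and y = R + γ·Q'(S', A') otherwise; that form, the class name `ReplayPool`, the function `td_targets`, and the callables standing in for the Actor target network and Critic target network are assumptions.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# (S, A, R, S', terminal flag)
Transition = Tuple[Sequence[float], Sequence[float], float, Sequence[float], bool]


class ReplayPool:
    """Fixed-capacity experience replay pool; oldest transitions drop out first."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        size = min(batch_size, len(self._items))
        return rng.sample(list(self._items), size)

    def __len__(self) -> int:
        return len(self._items)


def td_targets(
    batch: Sequence[Transition],
    actor_target: Callable[[Sequence[float]], Sequence[float]],
    critic_target: Callable[[Sequence[float], Sequence[float]], float],
    gamma: float,
) -> List[float]:
    """Target Q value y_j for each sampled transition (assumed DDPG form)."""
    targets: List[float] = []
    for _state, _action, reward, next_state, terminal in batch:
        if terminal:
            targets.append(reward)
        else:
            next_action = actor_target(next_state)  # A' from the Actor target network
            targets.append(reward + gamma * critic_target(next_state, next_action))
    return targets
```

The Critic current network would then be trained to reduce the mean square error between its predictions and these targets, as in step S207.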

Under the global optimization scheme, the state space is the current whole traffic running state S, which is a continuous state space; before it is input into the Actor network and the Critic network, a Hadamard operation must be performed between the state space and the weight configuration W. The action space is $A = \{(L_t, S_t) \mid L_t \in [-90, -15] \cup [15, 90],\ S_t \in [-90, -15] \cup [15, 90]\}$, where $L_t$ and $S_t$ represent the left-turn signal lamp adjustment scheme and the straight-through signal lamp adjustment scheme respectively. A negative sign means the lamp is set to red, and the magnitude is the duration of the red light; a positive sign means the lamp is set to green, and the magnitude is the duration of the green light. The action space is also continuous. When the intersection is a three-way intersection, the parameters of the missing direction are set to 0.
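The sign convention of the action space and the weighting of the state can be sketched as follows. The function names `decode_signal` and `weight_states` are hypothetical, durations are assumed to be in seconds, and broadcasting each node's single weight over that node's whole state vector is one possible reading of the Hadamard operation.

```python
from typing import List, Tuple


def decode_signal(value: float) -> Tuple[str, float]:
    """Map a signed action value to (color, duration); the unit is assumed to be seconds."""
    magnitude = abs(value)
    if not 15.0 <= magnitude <= 90.0:
        raise ValueError("action must lie in [-90, -15] or [15, 90]")
    return ("green" if value > 0 else "red", magnitude)


def weight_states(states: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Scale each node's state vector by that node's weight (assumed broadcasting)."""
    if len(states) != len(weights):
        raise ValueError("one weight per node is required")
    return [[w * x for x in state] for state, w in zip(states, weights)]
```

For example, an action of -30 sets the lamp to red for a duration of 30, and an action of 45 sets it to green for a duration of 45.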

Under the local optimization scheme, the state space is $S = \{(S_Q, W \circ S_L)\}$, which is a continuous space, where $\circ$ denotes the Hadamard operation. The action space is $A = \{(L_t, S_t) \mid L_t \in [-90, -15] \cup [15, 90],\ S_t \in [-90, -15] \cup [15, 90]\}$; the parameters have the same meaning as in the global optimization scheme, and when the intersection is a three-way intersection, the parameters of the missing direction are set to 0.

The global timing scheme is obtained by optimizing a global target, so it is not necessarily optimal for the local target when applied locally. The edge node therefore hypothetically applies the scheme from the cloud locally and evaluates the benefit the scheme would bring to the node.

The benefit is calculated in the following way:

where P represents the global timing scheme from the cloud, S is the current traffic state of the node, and V is the set of states and weight configurations of the adjacent nodes;

the evaluated term corresponds to applying the timing scheme from the cloud locally, that is, performing action P in the current traffic state.

To keep the competition between the two schemes from being ineffective, the comparison is set to $\max(\lambda L_j, (1-\lambda) L_q)$, where $L_j$ is the benefit brought by the local timing scheme and $L_q$ is the benefit brought by the global timing scheme. The parameter $\lambda \in [0.4, 0.6]$ is given by $\lambda = 0.8x^2 - 0.8x + 0.6$, where x is the distance of the target edge node from the urban traffic center and $x \in [0, 1]$; the farthest edge node has distance 1 and the edge node at the traffic center has distance 0.

From the farthest node to the traffic center, the distance goes from 1 to 0 and λ goes from 0.6 down to 0.4 and back up to 0.6. This reflects three phenomena: 1) intersections at the city edge have little influence on traffic efficiency in the city as a whole; their traffic flow is usually light and only occasionally bursts, so they do not need much intervention from global optimization; 2) the urban traffic center is under heavy traffic pressure and most traffic flows converge there, so the global optimization method is less suitable and the strong, fast local timing optimization method is more suitable; 3) as the radiation range of the traffic network grows, the effect of global optimization and flow diversion begins to appear, and local optimization is used only to quickly reduce occasional peak flows. With this comparison method, the scheme with the highest benefit can be selected.
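The comparison rule can be written out directly. The sketch below uses the formula for λ given above; the function names `competition_lambda` and `choose_scheme`, and the choice to break an exact tie in favour of the local scheme, are assumptions.

```python
def competition_lambda(x: float) -> float:
    """lambda = 0.8 x^2 - 0.8 x + 0.6 for a normalized distance x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a normalized distance in [0, 1]")
    return 0.8 * x * x - 0.8 * x + 0.6


def choose_scheme(local_benefit: float, global_benefit: float, x: float) -> str:
    """Pick the scheme with the larger weighted benefit (tie goes to local, an assumption)."""
    lam = competition_lambda(x)
    weighted_local = lam * local_benefit            # lambda * L_j
    weighted_global = (1.0 - lam) * global_benefit  # (1 - lambda) * L_q
    return "local" if weighted_local >= weighted_global else "global"
```

With equal raw benefits, a node at the traffic center (x = 0, λ = 0.6) adopts the local scheme, while a node halfway out (x = 0.5, λ = 0.4) adopts the global scheme.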

For most of the year, traffic flow is light at night and at intersections far from the city center; if the edge-cloud parallel optimization scheme were still used at such times, a great deal of energy would be wasted.

Therefore, the traffic flow state of each node in the traffic network is analyzed at the cloud, and three levels are set: cloud optimization not needed, cloud optimization needed, and cloud optimization urgently needed, which correspond physically to low, medium and high traffic flow at the node. The vehicle number thresholds of the three levels are set according to the road width and road length of the edge node.

When most nodes of the traffic network need no optimization and none urgently needs optimization, or even when all nodes need no optimization, the cloud computing function is stopped and only a data interface is provided externally. When some nodes need no optimization, some need optimization, and none urgently needs optimization, the nodes that need no optimization are set to a suspended state and are excluded from global optimization. When a node urgently needs optimization or most nodes need optimization, the cloud incorporates all nodes into global optimization.
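The grading and the three cloud operating modes can be sketched as below. The thresholds, the reading of "most" as more than half of the nodes, and the names `Level`, `grade` and `cloud_mode` are assumptions; the source derives the thresholds from road width and road length without giving values.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Level(Enum):
    NOT_NEEDED = 0  # low traffic flow
    NEEDED = 1      # medium traffic flow
    URGENT = 2      # high traffic flow


def grade(vehicle_count: int, needed_threshold: int, urgent_threshold: int) -> Level:
    """Grade one node against its two per-node thresholds (hypothetical values)."""
    if vehicle_count >= urgent_threshold:
        return Level.URGENT
    if vehicle_count >= needed_threshold:
        return Level.NEEDED
    return Level.NOT_NEEDED


def cloud_mode(levels: Dict[str, Level], majority: float = 0.5) -> Tuple[str, List[str]]:
    """Return the cloud mode and the nodes included in global optimization."""
    total = len(levels)
    idle = [node for node, lvl in levels.items() if lvl is Level.NOT_NEEDED]
    needed = [node for node, lvl in levels.items() if lvl is Level.NEEDED]
    urgent = any(lvl is Level.URGENT for lvl in levels.values())
    if urgent or len(needed) > majority * total:
        return "all", sorted(levels)      # every node joins global optimization
    if len(idle) > majority * total:
        return "stopped", []              # cloud only serves the data interface
    # Mixed case: suspend idle nodes, optimize the rest globally.
    return "partial", sorted(node for node in levels if node not in idle)
```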

The invention also discloses a traffic network traffic light control system based on edge computing, which comprises the following components:

- road monitoring equipment: acquires real-time video of the intersection and transmits it to the edge node;

- edge nodes: arranged with the intersection as the unit, each edge node is responsible for processing the traffic information of its intersection, performing local optimization, and communicating and exchanging data with the cloud server and the adjacent nodes. Its main tasks are: (1) receiving the video stream of the road monitoring equipment and processing the video input of each direction synchronously in real time to obtain the number of vehicles in each direction and the number of vehicles in the different lanes of each direction; (2) receiving the information shared by adjacent edge nodes and performing local timing optimization in combination with the traffic state of the current intersection to obtain a local timing scheme based on the local traffic state; (3) uploading the node's data to the cloud server, including the number of vehicles in each direction, the number of vehicles in the different lanes of each direction, the current traffic light state information and the average speed of each lane in the different directions, receiving the global timing scheme from the cloud server, letting it compete with the local timing scheme obtained by the node, and controlling the traffic signal equipment with the scheme that improves the target node's optimization most, that is, the scheme with the greatest benefit;

- a cloud server: responsible for receiving traffic state data from all edge nodes, performing timing optimization for the traffic network as a whole to obtain a global timing scheme for each node, and then issuing the global timing scheme of each node to the corresponding edge node; in addition, the cloud server receives and stores the traffic states of all intersections to form a traffic state database, with the edge node of each intersection serving as a data interface through which external programs access and acquire information;

- user center information flow: the travel app associated with the system can be any travel app, and it can obtain in real time the most accurate traffic condition of each intersection of the current whole traffic system. When a user program accesses the data interface of a relevant intersection, it pulls the traffic condition information of the corresponding node, which includes the number of vehicles at the intersection and the average speed of the corresponding road sections; when the intersection is congested, a suggestion on whether to head to the intersection can also be obtained;

- traffic signal equipment: directly controlled by the edge node; once the edge node has determined the optimal timing scheme, the signal is adjusted according to that scheme, changing the state of the traffic lights at the intersection.

Information flow between the edge node and the cloud server: it has an uplink direction and a downlink direction; uplink means transmission from the edge to the cloud, and downlink means transmission from the cloud to the edge. The uplink information is: the number of vehicles about to arrive at the node, the number of vehicles currently at the node, the traffic light state information of the current node, and the average speed of each road section of the node.
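The uplink payload can be pictured as a small structured message. The sketch below is illustrative only: the class name `UplinkMessage`, its field names, the use of JSON, and the keying of light states and speeds by lane or section name are assumptions; the source specifies only which quantities are sent.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class UplinkMessage:
    """Hypothetical edge-to-cloud message carrying the four uplink quantities."""
    node_id: str
    incoming_vehicles: int           # vehicles about to arrive at the node
    current_vehicles: int            # vehicles currently at the node
    light_state: Dict[str, float]    # signed remaining time per signal lamp
    average_speed: Dict[str, float]  # average speed per road section


def encode(message: UplinkMessage) -> str:
    """Serialize the message for transmission to the cloud server."""
    return json.dumps(asdict(message), sort_keys=True)


def decode(payload: str) -> UplinkMessage:
    """Rebuild the message on the cloud side."""
    return UplinkMessage(**json.loads(payload))
```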

Referring to fig. 4, each edge node obtains its data source from the traffic equipment module it controls and performs dynamic real-time video analysis; the edge nodes share information with one another; and all edge nodes upload their traffic state data to the cloud. The cloud server and the edge nodes perform global and local timing optimization respectively, yielding two timing schemes; the cloud server sends the optimized timing scheme to each node, each edge node selects between the schemes and chooses the timing scheme with the highest benefit; and the edge node adjusts the traffic signal equipment according to the selected timing scheme.

The invention collects the road conditions of the road sections related to each intersection in real time using edge-side camera equipment; every traffic light intersection in the city serves as an edge node, and the intersections and the road sections connecting them form a real-time, dynamic urban traffic network. Edge nodes connected by road sections share information with each other, and the central cloud server holds the information of all nodes. Each edge node sends its real-time traffic running state to the cloud server, and the cloud server optimizes the timing scheme of the whole traffic network according to a specific optimization algorithm to obtain the optimal set time of the current traffic light of each intersection; the cloud server then sends the corresponding time adjustment scheme to each edge node. Meanwhile, each edge node performs regional timing optimization based on the local traffic network formed by the nodes connected to it, obtaining a local timing scheme based on regional traffic optimization. After the global timing scheme issued by the cloud is received, the two schemes compete, and the one that improves the optimization target most is adopted; each edge control module then adjusts the traffic signal equipment it controls in real time, so that traffic resources are used reasonably and traffic efficiency is improved. In addition, the cloud server can issue the traffic flow and speed of each intersection, together with a suggestion on whether to head to the intersection, to the user center, providing users with the most up-to-date traffic information and suggestions.

The invention is based on an edge computing framework: data extraction and local optimization run locally, while large-scale computation and global optimization run in the cloud, so that road condition monitoring is closer to real time and the traffic signal control system is more intelligent.

The invention also discloses a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the traffic network traffic light control method described above.

It should be noted that the embodiments of the present invention have been described in terms of preferred embodiments, and not by way of limitation, and that those skilled in the art can make modifications and variations of the embodiments described above without departing from the spirit of the invention.

Claims

1. A traffic network traffic light control method based on edge computing, characterized by comprising the following steps:

acquiring real-time videos of each intersection, and detecting and acquiring current traffic flow data of each intersection; the traffic flow data comprises the number of vehicles in each direction and the number of vehicles in different lanes in each direction;

setting edge nodes by taking an intersection as a unit, and carrying out timing optimization processing by combining the traffic flow data of the edge nodes and adjacent edge nodes, current traffic light state information and average vehicle speeds of lanes in different directions to obtain a local timing scheme of the edge node pairs, wherein the local timing scheme comprises traffic signal equipment adjustment information;

integrating the traffic flow data of all edge nodes, current traffic light state information and average speed of each lane in different directions to perform timing optimization processing to obtain a global timing scheme, wherein the global timing scheme comprises traffic signal equipment adjustment information;

2. The traffic network traffic light control method according to claim 1, wherein the acquiring real-time video of each intersection and detecting and acquiring current traffic flow data of each intersection comprises:

selecting an interested area in the real-time video through an algorithm based on a channel attention mechanism, wherein the interested area is an area containing a road;

setting a prior-frame count threshold n and a confidence b, and obtaining the prior frames of the feature map; counting the prior frames in the region of interest whose confidence is greater than b to obtain $a_n$; if $a_n$ is less than n, deleting all the prior frames of the region; if $a_n$ is greater than n, detecting the region through a non-maximum suppression algorithm to obtain the current traffic flow data of each intersection.

3. The traffic network traffic light control method according to claim 1, wherein the time-distribution optimization processing is performed by integrating the traffic flow data of all edge nodes, the current traffic light state information and the average speed of each lane in different directions, and the obtaining of the global time-distribution scheme comprises:

performing iterative optimization on the traffic network composed of all edge nodes, and setting a weight value for each edge node

the current traffic running state of each direction at the intersection is Q, which comprises: num_L, the number of vehicles in the current left-turn lane; num_R, the number of vehicles in the right-turn lane; num_S, the number of vehicles in the straight-through lane; v_L, the average speed of the left-turn lane; v_R, the average speed of the right-turn lane; v_S, the average speed of the straight-through lane; num_A, the number of vehicles about to enter the intersection; t_L, the state of the left-turn signal lamp; and t_S, the state of the straight-through signal lamp;

when the output of the signal lamp state value is positive, representing the green lamp remaining time; when the output of the signal lamp state value is negative, representing the red lamp remaining time;

the traffic running state of each edge node is $S = \{Q \mid Q_x, x \in (1, 2, \ldots, k)\}$, with k = 3 or k = 4 representing a three-way intersection and a four-way intersection respectively; the current whole traffic running state is $S = \{S \mid S_1, S_2, \ldots, S_n, n \in R\}$, and the weight configuration is $W = \{W \mid W_1, W_2, \ldots, W_n, n \in R\}$;

4. The traffic network traffic light control method according to claim 1, wherein the edge nodes are set by taking the intersection as a unit, the time-distribution optimization processing is performed by combining the traffic flow data of the edge nodes and the adjacent edge nodes, the current traffic light state information and the average speed of each lane in different directions, and the local time-distribution scheme for obtaining the edge nodes comprises the following steps:

setting a weight value for each edge node

5. The traffic network traffic light control method according to claim 1, wherein the time-distribution optimization processing is performed by combining the traffic flow data of the edge node and the adjacent edge nodes, the current traffic light state information and the average speed of each lane in different directions, and the obtaining of the local time-distribution scheme of the edge node comprises:

and sharing information between the edge nodes and the adjacent edge nodes, wherein the shared information comprises the number of vehicles to arrive and the current traffic light information.

6. The traffic network traffic light control method of claim 1, wherein the selecting of the traffic signal devices with higher effectiveness as the implementation scheme for controlling each intersection comprises:

and fitting the behavior value function and the action selection strategy through the DDPG deep neural network model to optimize the implementation scheme, and controlling the traffic signal equipment by adopting the optimized implementation scheme.

7. The traffic network traffic light control method according to claim 1, wherein the respectively calculating and obtaining the benefits of the local timing scheme and the global timing scheme, and selecting the higher benefit of the local timing scheme and the global timing scheme as an implementation scheme comprises:

by the formula

to perform action P in the current traffic state;

8. The traffic network traffic light control method according to claim 1, wherein the time-distribution optimization processing is performed by integrating the traffic flow data of all edge nodes, the current traffic light state information and the average speed of each lane in different directions, and the obtaining of the global time-distribution scheme further comprises:

respectively setting vehicle number thresholds of three levels of optimization-free, optimization-required and urgent optimization-required according to the road width and the road length of the edge node;

grading the traffic flow state of each edge node in the traffic network according to the three vehicle number thresholds;

when a first preset number of nodes in the traffic network are at the optimization-free level and no edge node is at the urgent optimization-required level, stopping acquiring the global timing scheme;

when a second preset number of nodes in the traffic network are at the optimization-free level and no edge node is at the urgent optimization-required level, setting the second preset number of optimization-free edge nodes to a suspended state and excluding them from the global timing scheme;

and when an edge node at the urgent optimization-required level appears in the traffic network, or a first preset number of edge nodes are at the optimization-required level, incorporating all the edge nodes into the global timing scheme.

9. A traffic network traffic light control system based on edge computing, characterized by comprising road monitoring equipment, edge nodes, a cloud server, traffic signal equipment and a user center;

the road monitoring equipment acquires real-time videos of each intersection and transmits the real-time videos to the edge nodes which are arranged by taking the intersections as units;

the edge nodes analyze the real-time videos to detect and obtain current traffic flow data of each intersection, wherein the traffic flow data comprise the number of vehicles in each direction and the number of vehicles in different lanes in each direction;

the edge node combines a target edge node and the traffic flow data of the edge node adjacent to the target edge node, the current traffic light state information and the average speed of each lane in different directions to carry out time distribution optimization processing, and a local time distribution scheme of the edge node pair is obtained, wherein the local time distribution scheme comprises traffic signal equipment adjustment information;

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the traffic network traffic light control method according to any one of claims 1 to 8.