CN114038218A

CN114038218A - Chained feedback multi-intersection signal lamp decision system and method based on road condition information

Info

Publication number: CN114038218A
Application number: CN202111621771.9A
Authority: CN
Inventors: 郑龙; 凃浩; 张雅婷; 杜丛晋
Original assignee: Jiangsu Titan Intelligent Technology Co ltd
Current assignee: Jiangsu Titan Intelligent Technology Co ltd
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-02-11

Abstract

The invention discloses a chained-feedback multi-intersection signal lamp decision system and method based on entry and exit road condition information. The system comprises intersection signal lamp decision devices, each based on entry and exit road condition information, applied to a plurality of intersections in an area to be decided; each device is trained for the intersection it is to decide. In the method, an intersection signal lamp decision device based on entry and exit road condition information is set up for each of a plurality of intersections to be decided; each device is trained on sand table simulation data of its intersection and then decides the phase and the timing of the phase. With the invention, global coordinated control of the signal lamps at multiple intersections can be achieved on top of intelligent single-intersection control merely by additionally collecting entry and exit road condition information locally, and regional traffic congestion is alleviated without complex and poorly performing global training.

Description

Chained feedback multi-intersection signal lamp decision system and method based on road condition information

Technical Field

The invention belongs to the field of traffic control, and particularly relates to a chained feedback multi-intersection signal lamp decision system and method based on traffic information.

Background

Traffic signal control is the most widely deployed and most frequently used traffic regulation measure, and intelligently controlled signal lamps are one of the effective means of solving traffic problems. However, a traffic problem is rarely confined to a single intersection; it is usually caused jointly by a continuous stretch of road or a region containing multiple intersections. Global coordinated control of the signal lamps at multiple intersections must therefore be considered in order to improve regional traffic conditions.

Many attempts have been made at intelligent control of the signal lamps at a single intersection, and progress has been notable since deep learning and reinforcement learning were introduced: a deep network extracts features from the collected road conditions of vehicles waiting to pass the intersection, and reinforcement learning yields optimized signal phases and timing, which reduces travel time and the number of stops and improves the throughput of the intersection. Multi-intersection coordinated control, however, still lacks an effective and practical method. Existing multi-intersection approaches fall into three groups. The first simply trains each intersection independently with deep reinforcement learning on the road conditions of its entering vehicles; coordinated control cannot be achieved because no intersection knows the state of the others or can learn how its decisions affect them. The second treats the problem as multi-agent reinforcement learning (MARL) and uses a single global agent that takes the road condition information of all intersections as input and outputs the phases of all intersections; because the global state and action spaces grow rapidly with the number of intersections, the algorithm is hard to train and coordinates poorly. The third adds communication between the signal lamp decision devices of different intersections so that each can obtain information about the others; this enlarges the state space and makes training harder, and it also introduces extra system and communication overhead that makes deployment harder.

Therefore, how to achieve global coordinated control of multi-intersection signal lamps in practice without increasing training and deployment complexity, and thereby effectively relieve regional congestion, is a problem in urgent need of a solution.

Disclosure of Invention

In view of the above defects of, or needs for improvement in, the prior art, the invention provides a chained-feedback multi-intersection signal lamp decision system and method based on entry and exit road condition information. Exit-direction information is additionally collected at each intersection and used in training the single-intersection signal lamp decision device, and the state and reward of the algorithm are adjusted to take overflow of exiting vehicles into account, so that training is completed more effectively. The trained agent can thus observe the influence of its own intersection's control strategy on adjacent intersections, which achieves coordinated control between adjacent intersections; continuous adaptive adjustment between each pair of adjacent intersections in turn achieves global coordinated control adaptively. This solves the technical problems of complex structure, high training difficulty and difficult deployment in multi-intersection decision systems.

In order to achieve the above object, according to one aspect of the present invention, there is provided an intersection signal light decision device based on information of an access road condition, comprising an intersection access road condition acquisition module and an intelligent decision module;

the intersection entrance and exit road condition acquisition module is used for acquiring road condition information of an intersection to be decided and submitting the road condition information to the intelligent decision module, wherein the road condition information comprises: the traffic information of the direction of entering the intersection and the traffic information of leaving the intersection;

and the intelligent decision module is used for applying, in each decision period, a reinforcement learning model to the road condition information of the intersection to be decided submitted by the intersection entrance and exit road condition acquisition module, so as to decide the phase of the next period and the timing of that phase.

Preferably, the road condition information of the intersection signal lamp decision device based on entry and exit road condition information includes macroscopic information and microscopic information within a preset observation road section range; the macroscopic information is statistical information on the running conditions of the vehicles within the observation road section range; the microscopic information is the set of running condition information of each individual vehicle within the observation road section range.

Preferably, the macroscopic information of the intersection signal lamp decision device based on the traffic information comprises average waiting time, queuing length, number of coming vehicles, and/or average passing speed.

Preferably, the average waiting time of the intersection signal lamp decision device based on entry and exit road condition information is obtained as follows: if the displacement of a vehicle within a preset time interval is smaller than a preset threshold, the vehicle is judged to be in a waiting state; the durations for which all vehicles remain in the waiting state are observed and their average is taken as the average waiting time; a vehicle tracking algorithm is preferably used to obtain the displacement of a vehicle within the preset time interval. The queuing length is the difference between a preset value and one of the following: the number of vehicles within the observation range at the observation time, the number of preset-length units occupied by vehicles, or the number of distance units from the intersection to the nearest vehicle observed in the departure direction. Because the queue may span from the observable area to the downstream intersection, or exceed the observation area, the difference between the distance from the intersection to the nearest vehicle observed in the departure direction and the preset value is preferably used as a substitute. The number of arriving vehicles is the number of vehicles entering a specific lane during the observation period. The passing speed of a vehicle is the ratio of the length of the specific lane within the observation range to the time the vehicle spends in that lane, and the arithmetic mean of the passing speeds of all vehicles is the average passing speed.
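As a rough illustration of the macroscopic statistics described above, the following Python sketch derives the average waiting time and the average passing speed from per-vehicle samples. The track format, the function names and the default interval and threshold values are assumptions made for this example; the disclosure does not specify them.

```python
from statistics import mean


def waiting_time(track, interval=1.0, threshold=0.5):
    """Total time one vehicle spends in the waiting state.

    Assumption: `track` holds positions (metres along the lane) sampled
    every `interval` seconds, e.g. produced by a vehicle tracking algorithm.
    A vehicle counts as waiting during a sampling interval if its
    displacement over that interval is below `threshold` metres.
    """
    waited = 0.0
    for prev, cur in zip(track, track[1:]):
        if abs(cur - prev) < threshold:
            waited += interval
    return waited


def average_waiting_time(tracks, interval=1.0, threshold=0.5):
    """Mean waiting duration over the vehicles that waited at all."""
    durations = [waiting_time(t, interval, threshold) for t in tracks]
    waiting = [d for d in durations if d > 0]
    return mean(waiting) if waiting else 0.0


def average_passing_speed(lane_length, dwell_times):
    """Arithmetic mean of lane_length / time-in-lane over all vehicles."""
    return mean(lane_length / t for t in dwell_times)
```

Vehicles that never enter the waiting state are left out of the average here, which is one possible reading of "all vehicles in the waiting state".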

Preferably, the microscopic information of the intersection signal lamp decision device based on entry and exit road condition information is the set of running condition information of each individual vehicle within the observation road section range, and can be represented as a vehicle position matrix within a preset road section range.

Preferably, the vehicle position matrix of the intersection signal lamp decision device based on entry and exit road condition information stores vehicle positions and vehicle attribute information and can be represented as $W_v \times L_v \times C_v$, where $W_v$ indexes the lanes, $L_v$ indexes the position units within a lane, and $C_v$ is the vehicle attribute vector, which comprises the vehicle speed and the like and preferably also comprises a historical position information sequence; for example, the vehicle attributes at N consecutive moments are used to represent the history of vehicle position changes.
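The vehicle position matrix can be sketched in plain Python as a nested list with one cell per lane and position unit. The cell length, the two-element attribute vector (occupancy flag and speed) and the input tuple layout are assumptions chosen for illustration; the disclosure only states that the attribute vector contains the vehicle speed "and the like".

```python
def build_position_matrix(vehicles, num_lanes, num_cells, cell_length=5.0):
    """Return a num_lanes x num_cells x 2 nested list (W_v x L_v x C_v).

    Assumptions for this sketch:
      - `vehicles` is an iterable of (lane_index, position_m, speed_mps);
      - channel 0 is an occupancy flag (1.0 if a vehicle is in the cell);
      - channel 1 is the speed of that vehicle in m/s.
    Vehicles outside the observed lanes or road section are ignored.
    """
    matrix = [[[0.0, 0.0] for _ in range(num_cells)] for _ in range(num_lanes)]
    for lane, position, speed in vehicles:
        cell = int(position // cell_length)
        if 0 <= lane < num_lanes and 0 <= cell < num_cells:
            matrix[lane][cell] = [1.0, speed]
    return matrix
```

For example, a vehicle at 12 m in lane 0 travelling at 8.5 m/s lands in cell 2 of lane 0 when the cell length is 5 m.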

Preferably, the reinforcement learning model of the intersection signal lamp decision device based on entry and exit road condition information is a DQN neural network or an asynchronous advantage actor-critic (A3C) model.

Preferably, the reinforcement learning model of the intersection signal light decision device based on the traffic information is trained according to the following method:

S1, simulating traffic data of the intersection to be decided using an urban traffic sand table with static environment simulation and dynamic traffic simulation functions, and acquiring, over a period of time, the road condition information of the intersection to be decided together with the corresponding phases and the timing of the phases;

S2, taking the road condition information obtained in step S1 and the corresponding phases and timing of the phases as training data, and training the reinforcement learning model, preferably using the reward function:

Reward = -(w1*avg_speed + w2*avg_wait + w3*queue_length)

wherein w1, w2 and w3 are weights; avg_speed is the average passing speed; avg_wait is the average waiting time; queue_length is the queuing length;

and S3, performing iterative training until the reinforcement learning model converges.
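The reward used in step S2 can be written as a one-line function. The sketch below follows the formula as stated; the default weights are assumptions, since the disclosure gives no values. In particular, the negative default for `w1` is an assumption made so that a higher average passing speed increases the reward under the stated sign convention.

```python
def reward(avg_speed, avg_wait, queue_length, w1=-1.0, w2=1.0, w3=1.0):
    """Reward = -(w1*avg_speed + w2*avg_wait + w3*queue_length).

    Assumption: the weight values are illustrative only. With w1 < 0 a higher
    average passing speed raises the reward, while longer waits and longer
    queues (w2, w3 > 0) lower it.
    """
    return -(w1 * avg_speed + w2 * avg_wait + w3 * queue_length)
```

With `avg_speed=10.0`, `avg_wait=20.0`, `queue_length=5.0` and weights `w1=-0.5`, `w2=1.0`, `w3=2.0`, the function returns `-25.0`.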

According to another aspect of the present invention, a chained-feedback signal lamp decision system based on entry and exit road condition information is provided, which comprises the intersection signal lamp decision devices based on entry and exit road condition information provided by the present invention, applied to a plurality of intersections in an area to be decided; each of the intersection signal lamp decision devices is trained for its own intersection to be decided.

Preferably, the intersection signal lamp decision devices of the chained-feedback signal lamp decision system are deployed in the area to be decided one after another, in order of decreasing influence of signal lamp timing on road traffic efficiency and decreasing relevance to the intersection signal lamp decision devices already deployed.

According to another aspect of the present invention, a method for global coordination control of multiple intersection signal lamps is provided, which applies the chained feedback signal lamp decision system based on traffic information, comprising the following steps:

setting up an intersection signal lamp decision device based on entry and exit road condition information for each of a plurality of intersections to be decided, in order of decreasing influence of signal lamp timing on road traffic efficiency and decreasing relevance to the intersection signal lamp decision devices already deployed;

the intersection signal lamp decision device of the intersection to be decided based on the information of the road conditions of the intersection in and out is used for deciding the phase and the timing of the phase after training aiming at the sand table simulation data of the intersection to be decided;

and the signal lamp of the intersection adjusts the signal lamp of the intersection according to the phase and the timing of the phase determined by the intersection signal lamp decision device based on the road condition information of the intersection.

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

1. Only the collection of entry and exit road condition information is added locally, and the single-intersection training algorithm is adapted accordingly. Global coordinated control of the signal lamps at multiple intersections can then be achieved on the basis of intelligent single-intersection control, without complex and poorly performing global training and without extra communication or system overhead. Road network resources are fully used, traffic control becomes more efficient, and regional traffic congestion is alleviated.

2. The model is trained toward two goals. First, according to the road conditions in the entering direction, vehicles should pass through the intersection as soon as possible, so as to raise the passing speed and reduce the waiting time at stops. Second, as soon as queuing may be forming in the leaving direction, the number of vehicles passing through the intersection is promptly limited, and the traffic flow entering the downstream road is reduced by shortening the timing, adjusting the phase and similar means, so as to avoid overflow.

3. Although no single intersection can see the global information, each intersection can judge from the road conditions in its leaving direction whether it is putting so much pressure on the adjacent intersection that the latter becomes congested, and all intersections apply control according to the same principle. Under this mechanism congestion information spreads and is fed back step by step: starting from the most congested intersection, the adjacent intersections along the feedback route reduce their input flow one after another until the congestion eases. This is a dynamic self-balancing process. When road resources are sufficient, outflow is maximized by raising the average passing speed; when road resources are insufficient, step-by-step back pressure reduces the load on the road sections that lack resources, thereby achieving global coordinated scheduling across multiple intersections.
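To make the step-by-step back-pressure mechanism concrete, the toy simulation below replaces the trained reinforcement learning policy with a hand-written threshold rule. This rule, the one-way chain topology, and all names and numeric values (`release_rate`, `step`, `limit`, the rates) are assumptions for illustration only and do not appear in the disclosure.

```python
def release_rate(exit_queue, full_rate=4, reduced_rate=1, limit=6):
    """Assumed stand-in for a trained policy.

    The intersection throttles its release when the queue observed on its
    own exit road reaches `limit`; otherwise it releases at the full rate.
    """
    return reduced_rate if exit_queue >= limit else full_rate


def step(queues, arrivals, sink_rate):
    """Advance a one-way chain of intersections by one decision period.

    queues[i] is the number of vehicles waiting on the approach to
    intersection i; vehicles released by intersection i join queues[i + 1].
    The last intersection discharges at most `sink_rate` vehicles, which
    models a bottleneck. Each intersection looks only at its own exit road
    (the next approach), never at the whole chain.
    """
    n = len(queues)
    released = []
    for i in range(n):
        rate = sink_rate if i == n - 1 else release_rate(queues[i + 1])
        released.append(min(queues[i], rate))
    new_queues = list(queues)
    new_queues[0] += arrivals
    for i in range(n):
        new_queues[i] -= released[i]
        if i + 1 < n:
            new_queues[i + 1] += released[i]
    return new_queues
```

With queues `[5, 5, 8]`, three arrivals per period and a bottleneck rate of 1, the first period throttles only the intersection directly upstream of the congested approach and yields `[4, 8, 8]`; in the second period the throttling reaches the next intersection upstream as well and yields `[6, 8, 8]`.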

Drawings

Fig. 1 is a schematic diagram illustrating deployment of a chained feedback signal lamp decision system based on traffic information access provided by an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The invention provides an intersection signal lamp decision device based on traffic information, which comprises an intersection road condition acquisition module and an intelligent decision module;

the intersection road condition acquisition module is used for acquiring road condition information of the intersection to be decided and submitting it to the intelligent decision module, wherein the road condition information comprises road condition information for the direction entering the intersection and road condition information for the direction leaving the intersection. The road condition information consists of macroscopic information and microscopic information within a preset observation road section range.

The macroscopic information is statistical information on the running conditions of the vehicles within the observation road section range and comprises the average waiting time, queuing length, number of arriving vehicles and average passing speed within the preset road section range. The average waiting time is preferably obtained as follows: if the displacement of a vehicle within a preset time interval is smaller than a preset threshold, the vehicle is judged to be in a waiting state; the durations for which all vehicles remain in the waiting state are observed and their average is taken as the average waiting time; a vehicle tracking algorithm is preferably used to obtain the displacement of a vehicle within the preset time interval. The queuing length is the difference between a preset value and one of the following: the number of vehicles within the observation range at the observation time, the number of preset-length units occupied by vehicles, or the number of distance units from the intersection to the nearest vehicle observed in the departure direction. Because the queue may span from the observable area to the downstream intersection, or exceed the observation area, the difference between the distance from the intersection to the nearest vehicle observed in the departure direction and the preset value is preferably used as a substitute. The number of arriving vehicles is the number of vehicles entering a specific lane during the observation period. The passing speed of a vehicle is the ratio of the length of the specific lane within the observation range to the time the vehicle spends in that lane, and the arithmetic mean of the passing speeds of all vehicles is the average passing speed.

The microscopic information is the set of running condition information of each individual vehicle within the observation road section range and can be represented as a vehicle position matrix within the preset road section range. The vehicle position matrix stores vehicle positions and vehicle attribute information and can be represented as $W_v \times L_v \times C_v$, where $W_v$ indexes the lanes, $L_v$ indexes the position units within a lane, and $C_v$ is the vehicle attribute vector, which comprises a vehicle identifier that uniquely marks the vehicle and preferably also comprises a historical position information sequence; for example, the vehicle attributes at a position at N consecutive moments represent the history over N time periods.

The intelligent decision module is used for applying, in each decision period, a reinforcement learning model to the road condition information of the intersection to be decided submitted by the intersection road condition acquisition module, so as to decide the phase of the next period and the timing of that phase;

the reinforcement learning model is preferably a DQN neural network or an asynchronous advantage actor-critic (A3C) model.

The reinforcement learning model is trained according to the following method:

S1, simulating traffic data of the intersection to be decided using an urban traffic sand table with static environment simulation and dynamic traffic simulation functions, and acquiring, over a period of time, the road condition information of the intersection to be decided together with the corresponding phases and the timing of the phases; the road condition information is preferably continuous; the urban traffic sand table includes, but is not limited to, SUMO, AIMSUN, VISSIM and TRANSIMS.

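S2, taking the road condition information obtained in step S1 and the corresponding phases and timing of the phases as training data, and training the reinforcement learning model with the reward function:

Reward = -(w1*avg_speed + w2*avg_wait + w3*queue_length)

wherein w1, w2 and w3 are weights; avg_speed is the average passing speed, i.e. the arithmetic mean of the passing speeds of all vehicles; avg_wait is the average waiting time; queue_length is the queuing length.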

S3, performing iterative training until the reinforcement learning model converges, using exploration methods such as epsilon-greedy.
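Epsilon-greedy action selection, mentioned in step S3, can be sketched as follows. Treating each action index as one candidate phase and timing combination, and the function and parameter names, are assumptions for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Assumption: q_values[i] is the model's current value estimate for the
    i-th candidate action (for instance, one phase and timing combination).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)
```

A common design choice is to start training with a large `epsilon` and decay it as the model converges, so that early episodes explore many phase choices and later episodes mostly exploit the learned values.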

The invention provides a chained-feedback signal lamp decision system based on entry and exit road condition information, which comprises the intersection signal lamp decision devices based on entry and exit road condition information provided by the invention, applied to a plurality of intersections in an area to be decided; each of the intersection signal lamp decision devices is trained for its own intersection to be decided.

The intersection signal lamp decision devices of the decision system are linked into a congestion information processing chain according to the structural association between intersections, and the devices on the chain feed congestion information back along the chain in order to decide the phase of each intersection and the timing of the phase. The structural association between intersections means that the same road condition information is contained both in the road condition information of one intersection for a specific leaving direction and in the entering-direction road condition information of the associated intersection.

The intersection signal lamp decision devices are deployed in the area to be decided one after another, in order of decreasing influence of signal lamp timing on road traffic efficiency and decreasing relevance to the intersection signal lamp decision devices already deployed.
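One way to picture the congestion information processing chain is as a walk along the structural associations between intersections. In the sketch below, the `links` mapping, the intersection identifiers and the function name are hypothetical; the disclosure does not prescribe a data structure.

```python
def build_chain(links, start):
    """Follow exit-to-entrance associations from `start` to form a chain.

    Assumption: `links` maps an upstream intersection id to the downstream
    intersection whose entering-direction observations cover the same road
    section as the upstream intersection's leaving-direction observations.
    The walk stops at a dead end or when it would revisit an intersection.
    """
    chain, seen = [start], {start}
    while chain[-1] in links and links[chain[-1]] not in seen:
        nxt = links[chain[-1]]
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

Congestion observed at the end of such a chain is fed back in the opposite direction, from the last element toward the first, with no explicit messages exchanged between devices.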

The invention provides a method for global coordination control of multi-intersection signal lamps, which applies a chained feedback signal lamp decision system based on traffic information access provided by the invention, and comprises the following steps:

setting up an intersection signal lamp decision device based on entry and exit road condition information for each of a plurality of intersections to be decided, in order of decreasing influence of signal lamp timing on road traffic efficiency and decreasing relevance to the intersection signal lamp decision devices already deployed;

the crossing signal lamp decision device of the crossing to be decided based on the road condition information determines the phase and the timing of the phase after training aiming at the sand table simulation data of the crossing to be decided;

and the signal lamp of the intersection adjusts the signal lamp of the intersection according to the phase and the timing of the phase determined by the intersection signal lamp decision device based on the traffic information.

The application provides a method for achieving global coordinated control of multi-intersection signal lamps using enhanced local information. Exit-direction information is additionally collected at each intersection and used in training the single-intersection signal lamp decision device, and the state and reward of the algorithm are adjusted so that training is completed more effectively. The trained intersection signal lamp decision device can thus observe the influence of its own control strategy on adjacent intersections, which achieves coordinated control between adjacent intersections; continuous adaptive adjustment between each pair of adjacent intersections in turn achieves global coordinated control. The method uses the local information of each intersection directly, requires no explicit communication between intersection signal lamp decision devices, and keeps the demands on the system and on communication to a minimum.

The following are examples:

Example 1

A crossing signal lamp decision device based on the information of the road conditions of entering and exiting comprises a crossing road condition acquisition module and an intelligent decision module;

the intersection road condition acquisition module is used for acquiring road condition information of the intersection to be decided and submitting it to the intelligent decision module, wherein the road condition information comprises road condition information for the direction entering the intersection and road condition information for the direction leaving the intersection. The road condition information consists of macroscopic information and microscopic information within a preset observation road section range.

The macroscopic information is statistical information on the running conditions of the vehicles within the observation road section range and comprises the average waiting time, queuing length, number of arriving vehicles and average passing speed. The average waiting time is preferably obtained as follows: if the displacement of a vehicle within a preset time interval is smaller than a preset threshold, the vehicle is judged to be in a waiting state; the durations for which all vehicles remain in the waiting state are observed and their average is taken as the average waiting time; a vehicle tracking algorithm is preferably used to obtain the displacement of a vehicle within the preset time interval. The queuing length is the negative of one of the following: the number of vehicles within the observation range at the observation time, the number of preset-length units occupied by vehicles, or the number of distance units from the intersection to the nearest vehicle observed in the departure direction. Because the queue may span from the observable region to the downstream intersection, or exceed the observable region, the negative of the distance from the intersection to the nearest vehicle observed in the departure direction is preferably used as a substitute. The number of arriving vehicles is the number of vehicles entering a specific lane during the observation period. The passing speed of a vehicle is the ratio of the length of the specific lane within the observation range to the time the vehicle spends in that lane, and the arithmetic mean of the passing speeds of all vehicles is the average passing speed.

The microscopic information is the set of running condition information of each individual vehicle within the observation road section range and can be represented as a vehicle position matrix within the preset road section range. The vehicle position matrix stores vehicle positions and vehicle attribute information and can be represented as $W_v \times L_v \times C_v$, where $W_v$ indexes the lanes, $L_v$ indexes the position units within a lane, and $C_v$ is the vehicle attribute vector, which comprises a vehicle identifier that uniquely marks the vehicle and also comprises a historical position information sequence; for example, the vehicle attributes at a position at N consecutive moments represent the vehicle information over N time periods. The road matrix adopted in this embodiment is:

a length-by-width matrix, i.e. $W_v \times L_v \times C_v$, where $W_v$ indexes the lanes and $L_v$ indexes the position units within the preset observation road section; a position at which a vehicle is present is set to 1, and $C_v$ carries the vehicle speed attribute and the history information; for example, the road condition information of 8 consecutive observation periods is used to reflect the road conditions over a period of time.
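Keeping the road condition information of the last 8 observation periods can be sketched with a fixed-length buffer. The class name, the padding behaviour before 8 periods have been observed, and the oldest-first ordering are assumptions for this example.

```python
from collections import deque


class ObservationHistory:
    """Keeps the most recent `periods` position matrices as one state."""

    def __init__(self, periods=8):
        # A bounded deque drops the oldest frame automatically when full.
        self.frames = deque(maxlen=periods)

    def push(self, matrix):
        """Record the matrix observed in the latest period."""
        self.frames.append(matrix)

    def state(self):
        """Return frames oldest-first.

        Assumption: until `periods` frames exist, the oldest frame is
        repeated so that the state always has the same length.
        """
        frames = list(self.frames)
        if not frames:
            raise ValueError("no observations yet")
        padding = [frames[0]] * (self.frames.maxlen - len(frames))
        return padding + frames
```

The list returned by `state()` would then be fed to the reinforcement learning model as the observation for the current decision period.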

Selecting a suitable intersection to regulate: when selecting a demonstration intersection, it should first be determined whether road traffic efficiency there is determined, or at least mainly influenced, by signal lamp control. In some extreme cases road traffic efficiency is disturbed by various other factors, for example vehicles being obstructed while driving on the road and congestion resulting, so that traffic efficiency cannot be changed no matter how signal control is optimized. To make the demonstration more meaningful, an intersection where signal lamp timing is the dominant factor in road traffic efficiency should be selected wherever possible.

Existing intelligent signal lamp control is based on the road condition information of vehicles in the entering direction, approaching the stop line of the intersection. Early work used features such as the queuing length and waiting time in the entering direction; later work maps the entering-direction roads and the vehicle positions on the entering-direction road network into a matrix. A reinforcement learning algorithm is then trained to obtain an intelligent intersection signal lamp decision device, which outputs recommended actions such as phase and timing according to the road condition information of vehicles in the entering direction. The basic idea behind this design is that the signal lamp mainly regulates vehicles in the entering direction; once vehicles have passed through the intersection the signal lamp can no longer regulate them, so the road conditions of vehicles leaving the intersection do not need to be known.

However, for adjacent intersections, the road condition of vehicles in the exit direction of the upstream intersection is the road condition of vehicles entering the downstream intersection, so the latter can be obtained by observing the former. For example, if vehicles in the exit direction of the upstream intersection begin to slow down and queue, the traffic capacity of the downstream intersection cannot absorb the vehicles released by the upstream intersection.

Therefore, it is considered to simultaneously acquire the traffic information of the vehicles in the entering and leaving directions of the intersection. The road condition of the vehicles in the entering direction is mainly used for evaluating the condition of the vehicles to be passed at the intersection, and the road condition of the vehicles in the leaving direction is mainly used for evaluating the influence on the adjacent intersections. Theoretically, the larger the range of the collected road condition information is, the earlier the intelligent intersection signal lamp decision device can make a response and adjust in time.

Therefore, the invention collects the road condition information of the intersection to be decided, and simultaneously collects the road condition information of vehicles in the entering and leaving directions.

However, the larger the acquisition range, the higher the acquisition and deployment costs. Therefore, in this embodiment, the observed road section for the entering direction is about 200-250 meters before the stop line; for the leaving direction, the observed road section extends 15-200 meters beyond the intersection. In addition, the road condition within the intersection itself also needs to be collected to support vehicle flow direction tracking, as shown in fig. 1.

Among commonly used traffic collection equipment, the available collection methods include but are not limited to intersection cameras, millimeter wave radars, geomagnetic sensors and floating cars, and the data they collect have different strengths and weaknesses. Cameras can collect information on motor vehicles, non-motor vehicles and pedestrians fairly accurately from visual features, but their observation range is relatively small and mainly covers the intersection area. Millimeter wave radars can track vehicles over long distances along road sections, but they lack visual information, have relatively low precision, and generally have difficulty detecting non-motor vehicles and pedestrians accurately. Geomagnetic sensors have the smallest coverage: each covers a single point in a lane and provides vehicle counting and speed measurement as basic road traffic data, although several geomagnetic sensors can be placed at intervals to enlarge the collection range. Floating cars can track a vehicle continuously over its whole trip, but their number and sampling frequency are often insufficient. The specific implementation can therefore be chosen according to the actual situation, and different collection methods and ranges can be accommodated by adjusting the details of the algorithm.

The intelligent decision module applies a reinforcement learning model, once per decision period, to the road condition information of the intersection to be decided submitted by the intersection road condition acquisition module, and decides the phase of the next period and the timing of that phase. A phase is a set of traffic flows permitted at the intersection at the same time, for example: a straight bidirectional traffic flow in the southeast direction; a three-way traffic flow of east left-turn, right-turn and straight movements; or a pedestrian-only state with no vehicle traffic.

The reinforcement learning model includes but is not limited to a DQN neural network and the asynchronous advantage actor-critic (A3C) model. For each intersection, a signal lamp decision device adapted to that intersection is trained independently on that intersection's road condition data. Because the training data includes the leaving-direction road condition information, each intersection can adapt to the road conditions of its adjacent intersections, so the independently trained decision devices of multiple intersections together form a global, dynamic, adaptive adjustment. The input is preferably the complete road condition information of a continuous time period, including the state of individual vehicles on the road over that period, which emulates observing the road condition for a period of time; the output is the phase and timing suited to the current road condition, used to decide whether to maintain the current phase or switch to the next phase.

In this embodiment, a DQN that combines a deep convolutional neural network with reinforcement learning is used as the training algorithm. The input data may be historical road condition data, and the output is a signal lamp phase and timing. The trained intersection signal lamp decision device infers a recommended signal lamp phase and timing from the current road condition, compares them with the current intersection phase and phase duration, and determines whether to maintain the current phase or switch phases. The convolutional neural network in the DQN can retain the basic structure of the original network, with the network structure adjusted as needed to the size of the input data. The reinforcement learning part uses the initialized urban traffic sand table as a simulator: step by step, it obtains the different subsequent road conditions produced by different signal lamp phase and timing changes, which provide the basic data for calculating the reward, and thereby completes the reinforcement learning training.
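The comparison between the recommended phase and timing and the phase currently shown can be sketched as follows. This is a minimal illustration only: the `PhaseState` type, the function name and the `min_green_s` safety floor are assumptions introduced here and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class PhaseState:
    phase: int        # index of the phase currently shown (hypothetical encoding)
    elapsed_s: float  # seconds the current phase has been active


def keep_or_switch(current, rec_phase, rec_duration_s, min_green_s=5.0):
    """Return 'keep' or 'switch' by comparing the recommendation with the current phase."""
    # Assumed safety floor: never switch before a minimum green time has elapsed.
    if current.elapsed_s < min_green_s:
        return "keep"
    # A different phase is recommended: switch.
    if rec_phase != current.phase:
        return "switch"
    # Same phase recommended: hold it until the recommended duration is used up.
    return "keep" if current.elapsed_s < rec_duration_s else "switch"
```

For instance, a phase that has run 10 s with a 30 s recommendation for the same phase is kept, while the same phase after 35 s is switched.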

In this embodiment, the entering direction uses radar to collect data, which returns information such as vehicle position and speed on the road network. The road network information obtained in the data preparation stage is matched against the collected data, and the l meters upstream of the stop line of each lane at the intersection are discretized every c meters into a series of adjacent cells. If a vehicle is present in a cell, the corresponding position value is 1; otherwise it is 0. This yields the position information of all vehicles on the entering-direction road network. Each piece of vehicle position information can also be extended with further information such as the vehicle's average running speed, acceleration, deceleration and following distance. The matrix has dimensions width × length × info. The traffic information matrices obtained at T consecutive moments form the current traffic state s, and T_windows consecutive states are taken each time in a sliding-window manner, so that the input contains not only the static vehicle positions but also the dynamic changes in traffic, which describes the traffic state more accurately. The exit direction is processed in the same way and is input as part of the state together with the entering-direction data.
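The discretization described above can be sketched in plain Python. The function names, the tuple layout of a vehicle record and the choice of speed as the only extra attribute are assumptions for illustration; the source only fixes the idea of c-meter cells within l meters of the stop line and a sliding window of consecutive frames.

```python
import math


def occupancy_frame(vehicles, n_lanes, l_m, c_m):
    """Build one width x length x info frame as nested lists.

    vehicles: iterable of (lane_index, distance_to_stop_line_m, speed_mps),
    a hypothetical record layout. Each cell holds [occupied, speed].
    """
    n_cells = math.ceil(l_m / c_m)
    frame = [[[0.0, 0.0] for _ in range(n_cells)] for _ in range(n_lanes)]
    for lane, dist_m, speed_mps in vehicles:
        # Ignore vehicles outside the observed lanes or beyond l meters.
        if 0 <= lane < n_lanes and 0 <= dist_m < l_m:
            cell = int(dist_m // c_m)
            frame[lane][cell] = [1.0, speed_mps]
    return frame


def stack_state(frames, t_windows):
    """Return the most recent t_windows frames as one sliding-window state."""
    if len(frames) < t_windows:
        raise ValueError("not enough frames for the requested window")
    return list(frames[-t_windows:])
```

Calling `stack_state` again after appending a new frame slides the window forward by one moment, so consecutive states overlap and carry the dynamic change in vehicle positions.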

Reinforcement learning requires a simulation environment. Here an urban traffic sand table is used; it provides static environment simulation and dynamic traffic simulation, reproduces real-world road networks, traffic lights and the like, simulates the behavior of motor vehicles, non-motor vehicles and pedestrians in the road network, restores real-world traffic conditions, and serves as the environment for training and analysis. The traffic simulator may be self-developed, or an existing traffic simulator may be used. Specifically:

The road network structures of the intersection to be scheduled and of all its associated intersections are prepared. The road network structure can be obtained from an existing map, construction engineering drawings, or field measurement. For example, the required area is selected on the map through OpenStreetMap, the map data of that area is exported to a JOSM-format file, and the map data is adjusted in an OpenStreetMap map editor according to the construction engineering drawings or field measurement results. The adjusted map data file is then converted into road network data usable by SUMO with a tool provided by SUMO, after which the intersection signal lamp phases and traffic control information such as the speed limit of each road are set.
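The conversion step can be sketched as a small wrapper around SUMO's `netconvert` command-line tool. The source only says "a tool provided by SUMO", so the choice of `netconvert` is an assumption, and the file names are placeholders.

```python
import shutil
import subprocess


def build_netconvert_cmd(osm_file, net_file):
    """Assemble the netconvert call that imports an OSM file into a SUMO network."""
    return ["netconvert", "--osm-files", osm_file, "--output-file", net_file]


def convert_osm_to_sumo(osm_file, net_file):
    """Run netconvert; assumes SUMO is installed and netconvert is on PATH."""
    if shutil.which("netconvert") is None:
        raise RuntimeError("netconvert not found on PATH")
    subprocess.run(build_netconvert_cmd(osm_file, net_file), check=True)
```

Signal lamp phases and per-road speed limits would still have to be reviewed in the generated network afterwards, as the paragraph above describes.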

Data from multiple different sources are associated based on time and space information and fused, combined with the static road network data in the sand table, and output as structured data usable by the sand table system. The vehicle information in the structured data is set into the simulator through a SUMO configuration file or a user programming interface, which completes the initialization of vehicles in the sand table.

The reinforcement learning model is trained according to the following method:

S1, simulate the traffic data of the intersection to be decided with an urban traffic sand table that has static environment simulation and dynamic traffic simulation functions, and acquire the road condition information of the intersection to be decided, together with the corresponding phases and phase timings, over a period of time, in particular as continuous road condition information.

S2, train the reinforcement learning model using the road condition information obtained in step S1 and the corresponding phases and phase timings as training data, with the following reward function:

Reward = -(w1*avg_speed + w2*avg_wait + w3*queue_length)

wherein w1, w2 and w3 are weights; avg_speed is the average passing speed, that is, the arithmetic mean of the passing speeds of all vehicles; avg_wait is the average waiting time; and queue_length is the queue length. The weights need to be set according to the acquisition method and range. If the acquisition method is an opposite-facing camera, only a very small exit area can be covered; any queuing observed there indicates that overflow has already occurred, so the weight is doubled or multiplied further to impose a higher penalty and avoid overflow. If the acquisition method is geomagnetic sensing or radar with a wider acquisition range, the weight can be graded according to the geomagnetic sensor position or the radar return position, on the principle that the closer the queue is to the intersection, the higher the penalty, so as to avoid overflow.
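The reward and the graded exit-side weighting can be sketched as follows. The reward is implemented exactly as written in the formula; the source does not state the sign or scale of the weights, so those are left to the caller. The `graded_exit_weight` helper and its distance bands are hypothetical and only illustrate the "closer to the intersection, higher penalty" principle.

```python
def reward(avg_speed, avg_wait, queue_length, w1, w2, w3):
    """Reward = -(w1*avg_speed + w2*avg_wait + w3*queue_length), as in the text."""
    return -(w1 * avg_speed + w2 * avg_wait + w3 * queue_length)


def graded_exit_weight(base_weight, distance_to_intersection_m, bands):
    """Scale a weight by how close the observed exit queue is to the intersection.

    bands: list of (max_distance_m, multiplier) sorted by ascending distance;
    the values are assumptions, not taken from the source.
    """
    for max_distance_m, multiplier in bands:
        if distance_to_intersection_m <= max_distance_m:
            return base_weight * multiplier
    # Beyond the last band: no extra penalty.
    return base_weight
```

With bands `[(50, 4.0), (150, 2.0)]`, a queue detected 30 m past the intersection is penalized four times as heavily as one detected 180 m away.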

The key to using reinforcement learning is to define appropriate states, actions and rewards. The action is generally defined as the phase and timing; the state is defined as the road condition, including vehicle information, signal lamp state, road network condition and the like; and the reward embodies the optimization target, for example the smaller the average passing time or the number of stops, the better. The traffic statistics for each simulation step can be obtained from the SUMO simulator.

S3, train iteratively until the reinforcement learning model converges, using exploration methods such as epsilon-greedy. Once the state, reward, action and network are determined, training of the DQN algorithm can begin, and the intersection signal lamp decision device of a single intersection is obtained through repeated iterative optimization.
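Epsilon-greedy exploration, as named in step S3, can be sketched as follows. The linear decay schedule and its default values are assumptions added for illustration; the source names only the epsilon-greedy method itself.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10000):
    """Linearly anneal epsilon from eps_start to eps_end (assumed schedule)."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Here `q_values` would be the DQN outputs for the candidate phase and timing actions in the current state; early in training most actions are random, and the share of greedy choices grows as epsilon decays.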

Example 2

A chain feedback signal lamp decision system based on access road condition information comprises an intersection signal lamp decision device based on access road condition information, which is applied to all intersections in a to-be-decided area and provided by embodiment 1; the multiple intersection signal lamp decision-making devices based on the road condition information are used for training intersections to be decided.

The multiple intersection signal lamp decision devices are deployed in the area to be decided one after another, following the principle that the influence of signal lamp timing on road traffic efficiency is small and that the relevance to the already deployed intersection signal lamp decision devices goes from strong to weak.

Each regulated intersection is optimized individually using the single-intersection optimization method described in embodiment 1, which yields a plurality of traffic decision agents.

Example 3

A method for global coordination control of multi-intersection signal lamps applies a chained feedback signal lamp decision system based on traffic information access provided by embodiment 2, and comprises the following steps:

With this method, the intersection signal lamp decision device of each intersection can learn the following two behaviors. First, according to the road condition in the entering direction, it lets vehicles pass through the intersection as soon as possible, which raises the passing speed and reduces the stopped waiting time. Second, as soon as queuing may occur in the leaving direction, it quickly limits the number of vehicles passing through the intersection and reduces the traffic flow entering the downstream road by shortening the timing, adjusting the phase and the like, so as to avoid overflow. Although no single intersection can see global information, each intersection can judge from the road condition in its leaving direction whether it is putting so much pressure on the adjacent intersection that the adjacent intersection becomes congested, and all intersections regulate according to a unified principle. Under this mechanism, congestion information therefore spreads and is fed back step by step: starting from the intersection that becomes congested first, the adjacent intersections along the feedback route reduce their input flow one after another until the congestion eases. This is a dynamic, self-balancing process. When road resources are sufficient, outflow is maximized by raising the average passing speed; when road resources are insufficient, the pressure on the under-resourced road sections is reduced through step-by-step back pressure, which achieves overall coordinated scheduling of multiple intersections.

The process can acquire multi-source road condition data in real time, make real-time decisions directly through artificial intelligence, provide an accurate signal lamp timing scheme, and control the signal controller directly in real time. The real-time control subsystem closes the loop from real-time road condition to signal lamp setting: the input is the real-time road condition, and the output is the corresponding signal lamp timing. Only sand table data is used for training, while real-time road condition data is used for inference. The data acquisition and analysis subsystem obtains the real-time road condition from multiple sensors such as cameras, radars and geomagnetic sensors; the traffic decision agent trained by the training subsystem performs inference on the real-time road condition to obtain the phase and duration that the signal lamp should be set to; and the signal lamp is controlled accordingly through the interface with the signal controller, which completes the intelligent real-time control of the signal lamp.

Training uses data in the sand table, while inference is based on real-time road condition data. The real-time control subsystem closes the loop from real-time road condition to signal lamp setting: the data acquisition and analysis subsystem obtains the real-time road condition from multiple sensors such as cameras, radars and geomagnetic sensors, the traffic decision agent trained by the training subsystem performs inference on it to obtain the phase and duration that the signal lamp should be set to, and the signal lamp is controlled accordingly through the interface with the signal controller, completing the intelligent real-time control of the signal lamp.

The signal lamp decision devices of the multiple intersections regulate according to a unified principle. For example, when an agent finds that its downstream intersection is close to an overflow state, it reduces the traffic flow entering the downstream road by shortening the timing, adjusting the phase and the like; if the upstream intersections regulate according to the same principle, the reduction propagates step by step to the boundary intersections, forming flow-control back pressure. Conversely, as long as the passing speed does not cause excessively long queues at the downstream intersections, vehicles are let through as soon as possible, and with each downstream intersection regulating according to the same principle, step-by-step pressurized drainage is formed.
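The step-by-step back pressure can be illustrated with a toy model of intersections in a line. This is only a sketch: a hand-written local rule stands in for the trained decision device, and every name and number (green time bounds, step size, vehicles released per second of green, overflow threshold) is an assumption rather than a value from the source.

```python
def adjust_green(green_s, exit_queue, threshold,
                 min_green_s=10.0, max_green_s=60.0, step_s=5.0):
    """Local rule: shorten green when the exit link nears overflow, else lengthen it."""
    if exit_queue >= threshold:
        return max(min_green_s, green_s - step_s)
    return min(max_green_s, green_s + step_s)


def simulate_chain(n, steps, inflow, exit_outflow, threshold,
                   green_s=30.0, veh_per_green_s=0.5):
    """Simulate n intersections in a row; traffic flows from link 0 to link n."""
    greens = [green_s] * n     # green time of intersection i
    links = [0.0] * (n + 1)    # links[i] feeds intersection i; links[i + 1] is its exit
    for _ in range(steps):
        links[0] += inflow
        # Each agent observes only its own exit link, never the whole network.
        greens = [adjust_green(greens[i], links[i + 1], threshold) for i in range(n)]
        # Release vehicles, processing the most downstream intersection first.
        for i in reversed(range(n)):
            moved = min(links[i], veh_per_green_s * greens[i])
            links[i] -= moved
            links[i + 1] += moved
        # The final link drains at a fixed rate, which may act as a bottleneck.
        links[n] = max(0.0, links[n] - exit_outflow)
    return greens, links
```

With a tight bottleneck at the last link, the green times fall to their minimum first at the most downstream intersection and then at each upstream one in turn; with ample outflow, all green times rise to their maximum, which corresponds to the drainage case.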

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A crossing signal lamp decision device based on the information of the road conditions of the entrance and the exit is characterized by comprising a crossing entrance and exit road condition acquisition module and an intelligent decision module;

2. The intersection signal light decision device based on entrance and exit traffic information as claimed in claim 1, wherein the traffic information comprises macroscopic information and microscopic information within a preset observation road section range; the macroscopic information is statistical information on the running condition of vehicles within the observation road section range; the microscopic information is the set of running condition information of each vehicle within the observation road section range.

3. The intersection signal light decision device based on traffic information as claimed in claim 2, wherein the macro information comprises average waiting time, queue length, number of coming vehicles, and/or average traffic speed.

4. The intersection signal lamp decision device based on traffic information as claimed in claim 3, wherein the average waiting time is preferably obtained as follows: if the displacement of a vehicle within a preset time interval is smaller than a preset threshold, the vehicle is judged to be in a waiting state; the durations for which all vehicles remain in the waiting state are observed and their mean is taken as the average waiting time; a vehicle tracking algorithm is preferably used to obtain the displacement of the vehicle within the preset time interval; the queuing length is the number of vehicles in the observation range at the observation time, or the number of preset-length units occupied by vehicles, or the difference between the number of distance units from the intersection to the nearest vehicle position observed in the departure direction and a preset value; because the queue in the departure direction may lie between the observable area and the downstream intersection, or extend beyond the observation area, the difference between the distance from the intersection to the nearest vehicle position observed in the departure direction and the preset value is preferably used as a substitute value; the number of coming vehicles is the number of vehicles entering a specific lane during the observation period; the passing speed is the ratio of the length of a specific lane within the observation range to the time the vehicle spends in that lane, and the arithmetic mean of the passing speeds of all vehicles is the average passing speed.

5. The intersection signal lamp decision device based on traffic information as claimed in claim 2, wherein the microscopic information is the set of running condition information of each vehicle within the observation road section range and can be represented as a vehicle position matrix over a preset road section range; preferably, the vehicle position matrix stores vehicle positions and vehicle attribute information and can be expressed as W_v × L_v × C_v, where W_v indicates the lane, L_v indicates the position unit within the lane, and C_v is the vehicle attribute vector, which includes the vehicle speed and the like and preferably further includes a historical position information sequence, for example the vehicle attributes at N consecutive time points, which represent the history of changes in the vehicle position.

6. The intersection signal light decision device based on traffic information as claimed in claim 1, wherein the reinforcement learning model is a DQN neural network or an asynchronous advantage actor-critic (A3C) model.

7. The intersection signal light decision device based on traffic information ingress and egress according to claim 1, wherein the reinforcement learning model is trained according to the following method:

Reward = -(w1*avg_speed + w2*avg_wait + w3*queue_length)

8. A chain feedback signal lamp decision system based on traffic information, which is characterized by comprising the intersection signal lamp decision device based on traffic information as claimed in any one of claims 1 to 7, applied to a plurality of intersections in an area to be decided; the multiple intersection signal lamp decision-making devices based on the road condition information are used for training intersections to be decided.

9. The system according to claim 8, wherein the signal light decision devices are sequentially disposed in the area to be decided according to a rule that signal light timing has a small influence on road traffic efficiency and a strong correlation with the signal light decision devices of the existing intersections.

10. A method for global coordination control of multiple intersection signal lamps, wherein the system for deciding the chained feedback signal lamps based on the traffic information according to claim 8 or 9 is applied, comprising the following steps: