WO2023123906A1

WO2023123906A1 - Traffic light control method and related device

Info

Publication number: WO2023123906A1
Application number: PCT/CN2022/099853
Authority: WO
Inventors: 蚁韩羚; 李圆法; 余晓填; 王孝宇; 陈宁
Original assignee: 深圳云天励飞技术股份有限公司
Priority date: 2021-12-31
Filing date: 2022-06-20
Publication date: 2023-07-06
Also published as: CN114399909A; CN114399909B

Abstract

Embodiments of the present invention provide a traffic light control method, comprising: obtaining state information of a current intersection and an adjacent intersection within a first preset time period, and obtaining a graph structure of the current intersection and the adjacent intersection, the state information comprising a stay position and a stay duration of a vehicle in each lane, and the graph structure comprising a connection relationship between the current intersection and the adjacent intersection; inputting the state information and the graph structure into a pre-trained intelligent agent, and predicting a traffic light action at the current intersection at a preset point in time, the intelligent agent being trained by means of reinforcement learning; and controlling a traffic light at the current intersection to execute the traffic light action at the preset point in time according to the traffic light action at the current intersection at the preset point in time. The traffic light action at the current intersection at the preset point in time is accurately predicted from a space-time dimension by means of the intelligent agent, and then the traffic light at the intersection is controlled to execute the traffic light action at the preset point in time, thereby avoiding vehicle congestion or an idle passing time window and thus improving the passing efficiency of a whole road network.

Description

Traffic signal light control method and related equipment

This application claims priority to Chinese patent application No. 202111674229.X, titled "Traffic signal light control method and related equipment" and filed with the China Patent Office on December 31, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The invention relates to the field of traffic signal light control, and in particular to a traffic signal light control method and related equipment.

Background

Traffic signal light control is an essential part of smart city construction, and effective control of traffic signal lights is of great significance for alleviating urban traffic congestion. At present, traffic lights generally use single-point timing control: within a fixed cycle, the traffic flow in each direction is released in turn according to a preset phase order and preset phase durations. Vehicles that fail to pass within their fixed period must wait for the same phase in the next cycle before they can continue, while lanes with few vehicles are given more passing time than they need, which produces idle passing time windows (windows in which no vehicle passes). The existing traffic signal light control method therefore suffers from low traffic efficiency.

Summary of the invention

An embodiment of the present invention provides a traffic signal light control method that takes the state information of the current intersection and adjacent intersections within a first preset time period, together with the graph structure of the current intersection and adjacent intersections, as the input of an agent, and outputs a signal light action through the agent. Because the state information includes timing information as well as the stay position and stay duration of the vehicles in each lane, it reflects lane congestion; because the graph structure includes the spatial dependence between intersections, it reflects their spatial distribution. The agent can therefore accurately predict, in both the temporal and the spatial dimension, the signal light action of the current intersection at a preset time. Controlling the signal light at the intersection to execute that action at the preset time avoids vehicle congestion and idle passing time windows, thereby improving the traffic efficiency of vehicles across the entire road network.

In a first aspect, an embodiment of the present invention provides a traffic signal light control method, the traffic signal light control method comprising:

Obtaining the state information of the current intersection and adjacent intersections within a first preset time period, and obtaining the graph structure of the current intersection and the adjacent intersections, where the state information includes the stay position and stay duration of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;

Inputting the state information and the graph structure into a pre-trained agent, predicting the signal light action at the preset time at the current intersection, and the agent is obtained through reinforcement learning training;

According to the signal light action at the current intersection at the preset time, the signal light at the current intersection is controlled to execute the signal light action at the preset time.

Optionally, the acquiring state information of the current intersection and adjacent intersections within the first preset time period includes:

At the current moment, the image information of each lane at the current intersection is obtained, and the parking position and duration of the vehicle in each lane are extracted according to the image information of each lane;

According to the parking position and the staying time of the vehicle in each lane, calculate the lane queue length corresponding to each lane;

calculating the status information of the current intersection at the current moment according to the signal light action information and the lane queue lengths corresponding to the respective lanes;

Obtain state information corresponding to each moment within a first preset time period, where the first preset time period includes the current moment.

Optionally, the calculating the state information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane includes:

Obtaining the signal light action at the current intersection at the current moment;

Calculating the sum of the lane queue lengths corresponding to the lanes that are allowed to pass under the signal light action at the current moment at the current intersection;

The state information of the current intersection at the current moment is obtained according to the sum of the signal light action at the current intersection and the lane queue length corresponding to the allowed lane.

Optionally, before inputting the state information and the graph structure into the pre-trained agent to predict the signal light action at the current intersection at a preset time, the method further includes:

Constructing an agent configured to output a signal light action according to state information;

Taking the traffic volume at the current intersection within the second preset time period as a reward, the agent is trained in reinforcement learning, and after the training is completed, the trained agent is obtained as the preset agent.

Optionally, constructing the agent includes:

Constructing a signal light action network based on the spatio-temporal graph convolutional network and the first output network, the signal light action network outputs the signal light action through the first output network;

Constructing an evaluation network based on the spatio-temporal graph convolutional network and a second output network, where the evaluation network outputs a state value through the second output network, the state value is used to evaluate the performance of the signal light action network, and the evaluation network and the signal light action network share the parameters of one spatio-temporal graph convolutional network;

According to the signal light action network and the evaluation network, an agent is constructed.

Optionally, training the agent by reinforcement learning with the traffic volume at the current intersection within the second preset time period as a reward, and obtaining the trained agent as the preset agent after the training is completed, includes:

Constructing a road network simulation environment according to a preset number of simulated intersections, the simulated roads, the connection relationships between the simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, where a constructed agent is set at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;

At every preset time interval, taking the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and outputting a signal light action through the constructed agent;

After the signal light action is executed, training the constructed agent by reinforcement learning with the traffic volume of each simulated intersection within the second preset time period as a reward;

After the training is completed, taking the signal light action network in the trained agent as the preset agent.

Optionally, after inputting the state information and the graph structure into the pre-trained agent to predict the signal light action at the current intersection at a preset time, the method further includes:

performing post-processing on the signal light action at the preset time at the current intersection according to the preset post-processing rule, to obtain the signal light action at the preset time at the current intersection after post-processing;

According to the post-processed signal light action at the current intersection at the preset time, the current intersection is controlled to execute the signal light action at the preset time.

In a second aspect, an embodiment of the present invention provides a traffic signal light control device, the device comprising:

An acquisition module, configured to acquire state information of the current intersection and adjacent intersections within a first preset time period, and to acquire a graph structure of the current intersection and the adjacent intersections, where the state information includes the stay position and stay duration of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;

A prediction module, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action at the current intersection at a preset moment, and the agent is obtained through reinforcement learning training;

The first control module is configured to control the signal light at the current intersection to perform the signal light action at the preset time according to the signal light action at the current intersection at the preset time.

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps in the traffic signal light control method provided by the embodiment of the present invention are implemented.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps in the traffic signal light control method provided by the embodiment of the present invention are implemented.

In the embodiment of the present invention, the state information of the current intersection and adjacent intersections within the first preset time period is obtained, and the graph structure of the current intersection and the adjacent intersections is obtained, where the state information includes the stay position and stay duration of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections; the state information and the graph structure are input into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training; and, according to the signal light action of the current intersection at the preset time, the signal light at the current intersection is controlled to execute that action at the preset time. The agent takes as input the state information of the current intersection and adjacent intersections within the first preset time period, together with the graph structure of the current intersection and adjacent intersections, and outputs a signal light action. Because the state information includes timing information as well as the stay position and stay duration of the vehicles in each lane, it reflects the recent congestion of the lanes; because the graph structure includes the spatial dependence between intersections, it reflects their spatial distribution. The agent can therefore accurately predict, in both the temporal and the spatial dimension, the signal light action of the current intersection at the preset time, and controlling the signal light at the intersection to execute that action at the preset time avoids vehicle congestion and idle passing time windows, thereby improving the traffic efficiency of the overall road network.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative work.

FIG. 1 is a schematic flowchart of a traffic signal light control method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of phases of a signal light at an intersection provided by an embodiment of the present invention;

FIG. 3 is a network architecture diagram of an agent provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a traffic signal light control device provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

Please refer to FIG. 1. FIG. 1 is a flowchart of a traffic signal light control method provided by an embodiment of the present invention. As shown in FIG. 1, the traffic signal light control method includes:

101. Obtain status information of the current intersection and adjacent intersections within a first preset time period, and acquire graph structures of the current intersection and adjacent intersections.

In the embodiment of the present invention, the state information includes the state information of the current intersection and the state information of the adjacent intersections. The state information of the current intersection includes the stay position and stay duration of the vehicles in each lane of the current intersection, and the state information of each adjacent intersection includes the stay position and stay duration of the vehicles in each lane of that adjacent intersection.

The graph structure includes the connection relationship between the current intersection and the adjacent intersections, that is, which intersections are connected to the current intersection so that vehicles can drive from them to the current intersection or from the current intersection to them. In the graph structure, the current intersection and the adjacent intersections are nodes, and each connection between the current intersection and an adjacent intersection is a weighted edge. The shorter the distance between the current intersection and an adjacent intersection, the larger the weight of the edge; the longer the distance, the smaller the weight.

The graph structure can be built in advance from the connection relationships and distances between each intersection and the other intersections. It is a fixed structure of the road network and does not change unless the road network changes. The graph structure encodes the spatial dependencies between different traffic intersections. Each node represents a traffic intersection, and the edges between nodes can be defined in many ways. For example, each traffic intersection has edges to its K=4 adjacent traffic intersections, each node also has an edge pointing to itself, and each edge has a weight value.
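As an illustration of the graph structure described above, the following minimal Python sketch builds a weighted adjacency map. The function name `build_graph`, the input format, the inverse-distance weighting `1 / (1 + d)`, and the self-loop weight of 1.0 are assumptions made for this sketch only; the description requires merely that closer intersections receive larger edge weights and that each node has an edge to itself.

```python
from typing import Dict, List, Tuple


def build_graph(
    distances: Dict[Tuple[str, str], float], k: int = 4
) -> Dict[str, Dict[str, float]]:
    """Return a weighted adjacency map: node -> {neighbor: edge weight}.

    `distances` maps a pair of connected intersections to the road
    distance between them (hypothetical input format).
    """
    nodes = sorted({name for pair in distances for name in pair})
    # Every node gets an edge pointing to itself (weight 1.0 is an assumption).
    graph: Dict[str, Dict[str, float]] = {node: {node: 1.0} for node in nodes}
    for node in nodes:
        neighbors: List[Tuple[float, str]] = []
        for (a, b), dist in distances.items():
            if a == node:
                neighbors.append((dist, b))
            elif b == node:
                neighbors.append((dist, a))
        # Keep at most k connected intersections, nearest first.
        for dist, other in sorted(neighbors)[:k]:
            # Assumed weighting: the closer the intersection, the larger the weight.
            graph[node][other] = 1.0 / (1.0 + dist)
    return graph
```

Because the graph depends only on the road network, a map of this kind would be built once and reused until the road network changes.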

The status information of the above-mentioned current intersection and adjacent intersections within the first preset time period can be obtained from image information captured by cameras installed at the current intersection and adjacent intersections. The camera will collect real-time images of each lane in the intersection where it is located, and obtain the image information of each lane in the intersection where it is located.

Specifically, taking the current intersection as an example: at the current moment, the image information of each lane at the current intersection is obtained, and the stay position and stay duration of the vehicles in each lane are extracted from the image information of each lane; the lane queue length of each lane is calculated from the stay position and stay duration of the vehicles in that lane; the state information of the current intersection at the current moment is calculated from the signal light action information and the lane queue lengths of the lanes; and the state information corresponding to each moment within the first preset time period is obtained, where the first preset time period includes the current moment.

After the camera collects the image information of each lane at the current intersection at the current moment, the vehicle detection can be performed on the image information to obtain the vehicle information of each lane. The vehicle information includes the vehicle's parking position and duration.

Specifically, the stay position of a vehicle may be the preset area of the lane in which the vehicle stays. For example, for the current intersection, the vehicles on each lane within 50 meters of the current intersection may be recorded. The stay duration of a vehicle is the length of time that the vehicle has stayed on the corresponding lane, counted from the moment the vehicle enters the preset area of the lane, for example from the moment the vehicle comes within 50 meters of the current intersection.

The lane queue length of each lane is calculated from the stay position and stay duration of the vehicles in that lane. Taking lane l as an example, the vehicle set V_l of lane l is determined from the stay positions of the vehicles in lane l, where each vehicle in V_l is a vehicle within the preset area of lane l. The lane queue length of lane l can then be expressed by the following formula:

Here, V_l is the set of vehicles on lane l within the preset area from the intersection, t_v is the length of time that vehicle v has stayed on lane l,

and w are hyperparameters. The lane queue length in the embodiment of the present invention therefore takes into account how long vehicles have stayed in the lane. Generally speaking, if a lane holds many vehicles and their stay durations are long (that is, t_v is large), the corresponding lane queue length is also long.

It should be noted that the calculation of the lane queue length described above for lane l also applies to the other lanes at the current intersection and to the lanes at other intersections.

In the embodiment of the present invention, the signal light action information can be determined from the phase of the signal light. For the phases of the signal light, refer to FIG. 2. The intersection is a four-way intersection with a total of 24 lanes, numbered 1 to 24. Under the traffic rules that right turns need not wait for the signal light, that straight-ahead and left-turn movements must wait for the signal light, that the left-turn and straight-ahead movements of the same approach may be released at the same time, and that movements released at the same time must not cross each other, a four-way intersection has a total of 8 signal light phases, numbered 1 to 8. In FIG. 2, the four-way intersection has four approaches: east, south, west, and north. Each approach includes a left-turn lane, a forward lane, a right-turn lane, and 3 oncoming lanes. The north approach includes left-turn lane 1, forward lane 2, right-turn lane 3, and oncoming lanes 13, 14, and 15; the east approach includes left-turn lane 4, forward lane 5, right-turn lane 6, and oncoming lanes 16, 17, and 18; the south approach includes left-turn lane 7, forward lane 8, right-turn lane 9, and oncoming lanes 19, 20, and 21; the west approach includes left-turn lane 10, forward lane 11, right-turn lane 12, and oncoming lanes 22, 23, and 24. Phase 1 corresponds to releasing left-turn lane 1 and left-turn lane 7; phase 2 corresponds to releasing forward lane 2 and forward lane 8; phase 3 corresponds to releasing left-turn lane 4 and left-turn lane 10; phase 4 corresponds to releasing forward lane 5 and forward lane 11; phase 5 corresponds to releasing forward lane 2 and left-turn lane 1; phase 6 corresponds to releasing forward lane 5 and left-turn lane 4; phase 7 corresponds to releasing forward lane 8 and left-turn lane 7; and phase 8 corresponds to releasing forward lane 11 and left-turn lane 10. The 8 phases correspond to 8 release actions of the signal light.
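The phase definitions above can be captured as a small lookup table. The Python sketch below transcribes the phase-to-lane assignments given for FIG. 2; the names `PHASE_LANES`, `released_lanes`, and `phases_releasing` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, Tuple

# Phase number -> the two lanes released in that phase, as listed for FIG. 2.
PHASE_LANES: Dict[int, Tuple[int, int]] = {
    1: (1, 7),    # left-turn lanes, north and south approaches
    2: (2, 8),    # forward lanes, north and south approaches
    3: (4, 10),   # left-turn lanes, east and west approaches
    4: (5, 11),   # forward lanes, east and west approaches
    5: (2, 1),    # forward and left-turn lanes, north approach
    6: (5, 4),    # forward and left-turn lanes, east approach
    7: (8, 7),    # forward and left-turn lanes, south approach
    8: (11, 10),  # forward and left-turn lanes, west approach
}


def released_lanes(phase: int) -> Tuple[int, int]:
    """Return the lanes released by a signal light phase (1 to 8)."""
    if phase not in PHASE_LANES:
        raise ValueError(f"unknown phase: {phase}")
    return PHASE_LANES[phase]


def phases_releasing(lane: int) -> List[int]:
    """Return every phase that releases the given lane, in phase order."""
    return [phase for phase, lanes in PHASE_LANES.items() if lane in lanes]
```

Right-turn lanes (3, 6, 9, 12) appear in no phase, which matches the rule that right turns do not wait for the signal light.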

In the embodiment of the present invention, the state information of the current intersection can be understood as the state information of the signal light at the current intersection. The number of dimensions of the state information of an intersection equals the total number of phases of that intersection. When the current intersection is a four-way intersection, the signal light has 8 phases in total, so the state information of the current intersection has 8 dimensions.

Certainly, the embodiment of the present invention only takes a four-way intersection as an example. For other intersections, the number of dimensions of the state information likewise equals the total number of phases of the intersection.

Optionally, the traffic intersection can also be a three-way intersection, whose signal light has only three phases. In the embodiment of the present invention, the three phases of a three-way intersection are selected from the phases of the four-way intersection. For example, on the basis of FIG. 2, a three-way intersection without the north approach (that is, without lanes 1-3 and 13-15) has phases 1, 4, and 6 to choose from. Specifically, Table 1 lists the selectable signal light phases of the three-way intersection:

Table 1

Missing approach	Selectable phases
North	1, 4, 6
East	2, 3, 7
South	1, 4, 8
West	2, 3, 5

Further, in the embodiment of the present invention, the state information of a three-way intersection is set to -1 in each unselectable dimension, which is equivalent to masking the unselectable signal light phases. For example, if the state information of a four-way intersection is (1, 2, 3, 4, 5, 6, 7, 8), then the state information of a three-way intersection with the north approach missing is (1, -1, -1, 4, -1, 6, -1, -1), so that the state information of every intersection in the road network is an 8-dimensional vector. Reducing the action space of the three-way intersection in this way makes the actions of the agent more efficient and reasonable, which speeds up the agent's learning.
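The masking rule can be sketched in a few lines of Python. The table of selectable phases is taken from Table 1 and the mask value -1 from the text; the names `SELECTABLE_PHASES` and `mask_state` are hypothetical and chosen only for this sketch.

```python
from typing import Dict, List, Sequence, Tuple

# Missing approach -> phases that remain selectable (Table 1).
SELECTABLE_PHASES: Dict[str, Tuple[int, ...]] = {
    "north": (1, 4, 6),
    "east": (2, 3, 7),
    "south": (1, 4, 8),
    "west": (2, 3, 5),
}
MASK_VALUE = -1  # value written into unselectable dimensions


def mask_state(state: Sequence[float], missing_approach: str) -> List[float]:
    """Replace the unselectable dimensions of an 8-dimensional state by -1."""
    selectable = SELECTABLE_PHASES[missing_approach]
    # Dimensions are numbered from 1 so that they line up with phase numbers.
    return [
        value if phase in selectable else MASK_VALUE
        for phase, value in enumerate(state, start=1)
    ]
```

Applied to the example in the text, `mask_state([1, 2, 3, 4, 5, 6, 7, 8], "north")` yields `[1, -1, -1, 4, -1, 6, -1, -1]`.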

Optionally, the signal light action of the current intersection at the current moment is obtained; the sum of the lane queue lengths of the lanes that are allowed to pass under that signal light action is calculated; and the state information of the current intersection at the current moment is obtained from the signal light action of the current intersection at the current moment and the sum of the lane queue lengths of the lanes that are allowed to pass. Specifically, the state information of the current intersection at the current moment consists of two parts: the signal light action of the current intersection at the current moment, and the sum of the lane queue lengths of the lanes that are allowed to pass.

There is a one-to-one correspondence between signal light actions and signal light phases. Taking a four-way intersection as an example, there are 8 signal light phases at a four-way intersection, so there are also 8 signal light actions, and each signal light action corresponds to one signal light phase. In the embodiment of the present invention, each single phase is used as one signal light action, which improves the flexibility of phase selection.

Specifically, the state information of the current intersection consists of two parts. The first part is the signal light state at the current intersection: assuming that the signal light phase at the current moment is 2, it is encoded using one-hot encoding. The second part is, for each phase, the sum of the lane queue lengths of the lanes corresponding to that phase. Taking a four-way intersection as an example and referring to FIG. 2, the value for phase 1 is the sum of the lane queue length of left-turn lane 1 and the lane queue length of left-turn lane 7. More specifically, let L_i denote the set of lanes that are allowed to pass in phase i; the i-th dimension of the state information s can then be defined by the following formula:

s_i represents the i-th dimension of the state information of the current intersection.
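Read literally, the description above suggests that s_i is the sum of the lane queue lengths over the lanes in L_i. The following Python sketch computes both parts of the state under that reading. How the two parts are combined into one vector is not stated, so the sketch returns them separately; the names `one_hot`, `phase_queue_sums`, and `intersection_state`, and the dictionary of per-lane queue lengths, are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Phase number -> lanes allowed to pass in that phase (the sets L_i of FIG. 2).
PHASE_LANES: Dict[int, Tuple[int, int]] = {
    1: (1, 7), 2: (2, 8), 3: (4, 10), 4: (5, 11),
    5: (2, 1), 6: (5, 4), 7: (8, 7), 8: (11, 10),
}


def one_hot(phase: int, num_phases: int = 8) -> List[int]:
    """One-hot encode the current signal light phase (numbered from 1)."""
    if not 1 <= phase <= num_phases:
        raise ValueError(f"phase must be in 1..{num_phases}")
    return [1 if i == phase else 0 for i in range(1, num_phases + 1)]


def phase_queue_sums(queue_length: Dict[int, float]) -> List[float]:
    """For each phase i, sum the queue lengths of the lanes in L_i.

    Lanes missing from `queue_length` are treated as empty (an assumption).
    """
    return [
        sum(queue_length.get(lane, 0.0) for lane in PHASE_LANES[phase])
        for phase in sorted(PHASE_LANES)
    ]


def intersection_state(
    current_phase: int, queue_length: Dict[int, float]
) -> Tuple[List[int], List[float]]:
    """Return (one-hot current phase, per-phase queue-length sums)."""
    return one_hot(current_phase), phase_queue_sums(queue_length)
```

For example, with queue lengths of 2.0 on lane 1 and 3.0 on lane 7, the entry for phase 1 is 5.0, in line with the FIG. 2 example in the text.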

Obtain state information corresponding to each moment within a first preset time period, where the first preset time period includes the current moment. Specifically, the first preset time period is the latest H moments, and the H moments include the current moment, for example, the latest 5 moments including the current moment.

The state information of the current intersection and adjacent intersections within the first preset time period can be expressed as a global state, which is a tensor of size H*N*F. H is a hyperparameter representing the number of moments in the first preset time period whose state information is used; for example, H=5 means that the state information of the latest 5 moments is used. N represents the number of signal lights in the road network, where each intersection controls the release of its lanes through one signal light (or one signal light system). F represents the dimension of the state information.
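One way to maintain the latest H moments is a fixed-length buffer of per-moment snapshots, each of shape N*F. The Python sketch below uses nested lists in place of a tensor library. The class name `StateHistory` is hypothetical, and padding an incomplete history by repeating the oldest snapshot is an assumption of this sketch; the source does not say how the first few moments are handled.

```python
from collections import deque
from typing import Deque, List

Snapshot = List[List[float]]  # N intersections x F state dimensions


class StateHistory:
    """Keep the state snapshots of the latest H moments."""

    def __init__(self, horizon: int = 5) -> None:
        if horizon < 1:
            raise ValueError("horizon must be at least 1")
        self.horizon = horizon
        # A deque with maxlen drops the oldest snapshot automatically.
        self._frames: Deque[Snapshot] = deque(maxlen=horizon)

    def push(self, snapshot: Snapshot) -> None:
        """Record the state of all N intersections at the current moment."""
        self._frames.append([row[:] for row in snapshot])

    def tensor(self) -> List[Snapshot]:
        """Return the global state as nested lists of shape H x N x F."""
        if not self._frames:
            raise ValueError("no snapshots recorded yet")
        frames = list(self._frames)
        # Assumption: pad an incomplete history with the oldest snapshot.
        padding = [frames[0]] * (self.horizon - len(frames))
        return padding + frames
```

In a deployment along the lines described here, `push` would be called once per moment and `tensor` would supply the state input of the agent.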

102. Input the state information and graph structure into the pre-trained agent, and predict the signal light action at the preset time at the current intersection.

In the embodiment of the present invention, the agent is obtained through reinforcement learning training. The input of the agent is the state information H*N*F and the graph structure G, where the state information H*N*F is the state information of the current intersection and adjacent intersections within the first preset time period, and the graph structure G is the graph structure of the current intersection and adjacent intersections. The current intersection and the adjacent intersections can be regarded as one target road network, so the state information H*N*F can also be called the global state of the target road network, and the graph structure G the graph structure of the target road network. The output of the agent is a probability distribution over the signal light actions at the preset time, and the signal light action with the highest probability is selected as the signal light action at the preset time. For example, when F=8, the agent outputs a probability for each of the 8 candidate signal light actions at the preset time, and the action with the highest probability is taken as the final signal light action at the preset time. The preset time may be a moment after the current moment, for example the next moment after the current moment.
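Selecting the action with the highest probability is an argmax over the output distribution. The Python sketch below shows this step; the function name `select_action` is hypothetical, and the optional `selectable` argument, which restricts the choice to the phases available at a three-way intersection, is an assumption of this sketch and is not described in the source as part of the selection step.

```python
from typing import Optional, Sequence


def select_action(
    probabilities: Sequence[float],
    selectable: Optional[Sequence[int]] = None,
) -> int:
    """Return the phase number (from 1) with the highest probability.

    `probabilities[i - 1]` is the probability of the action for phase i.
    If `selectable` is given, only those phases are considered.
    """
    if selectable:
        candidates = list(selectable)
    else:
        candidates = list(range(1, len(probabilities) + 1))
    if not candidates:
        raise ValueError("no candidate phases")
    # On ties, max() keeps the first candidate encountered.
    return max(candidates, key=lambda phase: probabilities[phase - 1])
```

Training-time exploration (sampling from the distribution instead of taking the maximum) is outside the scope of this sketch.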

The preset agent is obtained through reinforcement learning training. In reinforcement learning, the agent is rewarded so that the agent can learn and train with the goal of getting more rewards.

Optionally, an agent is constructed that outputs signal light actions according to the state information; the agent is trained by reinforcement learning with the traffic volume of the current intersection within the second preset time period as a reward, and after the training is completed the trained agent is obtained as the preset agent. In the road network, an agent is set at each intersection to predict the signal light action of that intersection at the preset time.

The second preset time period may be the time period for which a signal light action lasts, during which vehicles in the corresponding lanes can pass. The agent corresponding to the current intersection is rewarded with the traffic volume within the second preset time period: the higher the traffic volume, the higher the reward and the stronger the positive incentive. Specifically, let V_t be the set of vehicles passing through the current intersection at time t; the reward at the current intersection can then be expressed by the following formula:

It can be seen that the reward at the current intersection takes into account the vehicle's stay duration t_v, which makes the agent pay more attention to congested lanes, thereby improving the overall traffic efficiency of the road network.

Optionally, a signal light action network is constructed based on the spatio-temporal graph convolutional network and the first output network, and the signal light action network outputs the signal light action through the first output network; an evaluation network is constructed based on the spatio-temporal graph convolutional network and the second output network, and the evaluation network outputs the state value through the second output network. The state value is used to evaluate the performance of the signal light action network. The evaluation network and the signal light action network share the parameters of one spatio-temporal graph convolutional network; an agent is constructed according to the signal light action network and the evaluation network.

Further, the spatio-temporal graph convolutional network may include a graph convolutional network, a recurrent neural network, and a fully connected network. The graph convolutional network is used to extract the spatial dependencies of the current intersection and adjacent intersections in the graph structure, and the recurrent neural network is used to extract the temporal dependencies of the states of the current intersection and adjacent intersections. The spatial dependencies and the temporal dependencies are fused through the fully connected network to obtain the spatio-temporal information of the current intersection and adjacent intersections.

Furthermore, the above-mentioned graph convolutional network may be a graph convolutional network based on the GAT layer, and the above-mentioned recurrent neural network may be a recurrent neural network based on the GRU layer. As a graph convolutional neural network, the GAT layer can capture the spatial correlation of adjacent intersections well, so that the agent can take the state of adjacent intersections into consideration when making decisions. As a kind of recurrent neural network, the GRU layer can capture the time correlation of the intersection state very well, so that the agent can take the historical state into consideration when making decisions. By combining GAT layer, GRU layer and multiple fully connected layers, a spatiotemporal graph convolutional network can be obtained, which can well capture the spatiotemporal characteristics of road network traffic flow.
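To make the spatial part concrete, the following is a minimal single-head graph attention sketch in NumPy. It is an illustration only and not the network used in the embodiment: the function name `graph_attention`, the random weights, the three-intersection adjacency matrix, and the use of self-loops are all assumptions. The GRU and fully connected layers are omitted.

```python
import numpy as np

def graph_attention(h, adj, W, a, slope=0.2):
    """Single-head graph attention over an intersection graph.

    h:   (N, F_in) features, one row per intersection
    adj: (N, N) 0/1 adjacency with self-loops, so each node also attends to itself
    W:   (F_in, F_out) shared linear map
    a:   (2 * F_out,) attention vector
    """
    z = h @ W                                    # (N, F_out) transformed features
    f_out = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), computed as a source part plus a target part
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj > 0, e, -np.inf)            # ignore non-adjacent intersections
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                      # aggregated features, attention weights

# Hypothetical road network: three intersections in a line (0 - 1 - 2).
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 8))
out, alpha = graph_attention(h, adj, rng.normal(size=(8, 4)), rng.normal(size=8))
```

Each row of `alpha` sums to 1 and is zero for intersections that are not adjacent, which is how the state of neighbouring intersections enters the decision of each agent.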

The first output network may include a linear layer, a mask layer, and a classification layer. The linear layer applies a linear transformation to the spatio-temporal features extracted by the spatio-temporal graph convolutional network, and the classification layer classifies the linearly transformed feature vector; the classification layer can use Softmax to obtain the probability distribution over the signal light actions. The mask layer masks the probability distribution of signal light actions so that the probability of each non-selectable signal light action is 0; it is mainly intended for agents at three-way intersections.
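One common way to realize such a mask is to set the scores of non-selectable actions to negative infinity before the Softmax, so that their probability is exactly 0. The sketch below assumes this approach; the function name `masked_action_probs` and the toy weights are hypothetical and not taken from the embodiment.

```python
import numpy as np

def masked_action_probs(features, weight, bias, mask):
    """Linear layer, then mask, then Softmax over signal light actions."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be selectable")
    logits = features @ weight + bias            # linear layer
    logits = np.where(mask, logits, -np.inf)     # mask layer: block non-selectable actions
    logits = logits - logits.max()               # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()                       # classification layer (Softmax)

# Toy example: four actions, the last one unavailable at a three-way intersection.
features = np.ones(4)
weight = np.zeros((4, 4))
bias = np.zeros(4)
probs = masked_action_probs(features, weight, bias, [True, True, True, False])
```

With all scores equal, the three selectable actions each receive probability 1/3 and the masked action receives 0.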

The second output network may include a linear layer. The linear layer applies a linear transformation to the spatio-temporal features extracted by the spatio-temporal graph convolutional network and outputs the state value, and the state value is used to evaluate the performance of the signal light action network. The performance of the signal light action network is derived from an evaluation of the process from the state information to the probability distribution of signal light actions. During training, both the signal light action network and the evaluation network are adjusted according to the state value, so that the performance of the signal light action network keeps improving and the state value output by the evaluation network keeps increasing.

It should be noted that when the trained agent is deployed to the corresponding intersection, the evaluation network does not need to be deployed with it; only the signal light action network needs to be deployed. The training of the agent includes the training of the signal light action network and the evaluation network. It should also be noted that the constructed agent includes a state function, an action function, a reward function, a signal light action network, and an evaluation network, whereas the trained agent may include only the signal light action network. The state function is used to describe the state information, the action function is used to describe the signal light action, and the reward function is used to motivate the agent to choose the signal light action with the higher traffic volume.

In a possible embodiment, please refer to FIG. 3. FIG. 3 is an architecture diagram of an agent provided by an embodiment of the present invention. As shown in FIG. 3, the signal light action network and the evaluation network can be constructed based on the Actor-Critic framework. In this case, the agent includes an Actor network and a Critic network; in FIG. 3 the upper part is the Critic network and the lower part is the Actor network. The two networks share the parameters of the first four layers (the parameters of the spatio-temporal graph convolutional network), which helps reduce the learning difficulty of the model and speeds up the convergence of agent training. During training, the output of the agent is divided into two parts: one part is the output of the Critic network, which is the state value of each agent; the other part is the output of the Actor network, which is the probability distribution of signal light actions predicted by the agent. Because agents set at different intersections may have different selectable signal light actions (for example, the agent at a three-way intersection can only choose among three phases), a Mask (that is, masking) operation can be added to the output layer of the Actor network: a mask is applied to the action distribution output by the agent at a three-way intersection so that the probability of each unselectable action is 0.
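The parameter sharing between the two networks can be illustrated with a small NumPy sketch. All class names here (`SharedTrunk`, `Actor`, `Critic`) are hypothetical, and a single tanh layer stands in for the four shared layers of the spatio-temporal graph convolutional network; the sketch shows only the forward pass, not the training update.

```python
import numpy as np

class SharedTrunk:
    """Stand-in for the shared spatio-temporal graph convolutional network."""
    def __init__(self, in_dim, hidden_dim, rng):
        self.W = rng.normal(scale=0.1, size=(in_dim, hidden_dim))

    def __call__(self, x):
        return np.tanh(x @ self.W)

class Actor:
    """Signal light action network: shared trunk plus a masked Softmax head."""
    def __init__(self, trunk, n_actions, rng):
        self.trunk = trunk
        self.W = rng.normal(scale=0.1, size=(trunk.W.shape[1], n_actions))

    def __call__(self, x, mask):
        logits = self.trunk(x) @ self.W
        logits = np.where(mask, logits, -np.inf)   # unselectable actions get probability 0
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

class Critic:
    """Evaluation network: shared trunk plus a linear head that outputs a state value."""
    def __init__(self, trunk, rng):
        self.trunk = trunk
        self.w = rng.normal(scale=0.1, size=trunk.W.shape[1])

    def __call__(self, x):
        return float(self.trunk(x) @ self.w)

rng = np.random.default_rng(0)
trunk = SharedTrunk(in_dim=8, hidden_dim=16, rng=rng)
actor, critic = Actor(trunk, 4, rng), Critic(trunk, rng)

x = rng.normal(size=8)
mask = np.array([True, True, True, False])  # three-way intersection: three selectable phases
probs = actor(x, mask)                       # used for training and for deployment
value = critic(x)                            # used during training only
```

Because both heads hold a reference to the same `trunk` object, any update to the trunk parameters affects both networks, and only `actor` needs to be kept for deployment.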

Optionally, during the training process, a road network simulation environment can be constructed according to a preset number of simulated intersections, the simulated roads, the connection relationships between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road. A constructed agent is set at each simulated intersection, and the road network simulation environment randomly generates simulated traffic flow in each simulated lane. At every preset interval, the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment are used as the input of the constructed agent, and the signal light action is output through the constructed agent. After the signal light action is executed, the traffic volume of each simulated intersection within the second preset time period is used as the reward, and reinforcement learning training is performed on the constructed agent. After the training is completed, the signal light action network in the trained agent is used as the preset agent.

During training, the traffic flow in the road network simulation environment can be randomly regenerated every M iterations to increase the adaptability of the agent to different traffic environments, where M is greater than or equal to 1. In addition, the road network simulation environment can be randomly reconstructed every Z iterations, that is, the simulated intersections, the simulated roads, the connection relationships between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road are randomly reconstructed, where Z is greater than or equal to 1, so as to further increase the adaptability of the agent to different traffic environments.
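The randomization schedule can be sketched as a small generator. The function name `training_schedule` is hypothetical, and counting iterations from 1 is an assumption of the example; the actual regeneration of traffic and reconstruction of the road network are left to the simulator.

```python
def training_schedule(num_iterations, M, Z):
    """Yield (iteration, regenerate_traffic, rebuild_network) for each training iteration."""
    if M < 1 or Z < 1:
        raise ValueError("M and Z must be greater than or equal to 1")
    for it in range(1, num_iterations + 1):
        # Traffic is regenerated every M iterations, the road network rebuilt every Z.
        yield it, it % M == 0, it % Z == 0

schedule = list(training_schedule(6, M=2, Z=3))
```

With M=2 and Z=3, traffic is regenerated at iterations 2, 4 and 6 and the road network is rebuilt at iterations 3 and 6.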

In the road network simulation environment, the currently observed state S is calculated at every preset interval. The states S of all simulated intersections and the graph structure G of the road network simulation environment are used as the input of the corresponding agent, which outputs the probability distribution of its actions. Each agent chooses the action with the highest probability to execute, and the reward r obtained after the action is executed is stored for the reinforcement learning training of the agent.

After training in the simulation environment is completed, the agent can be deployed and used in the actual road network. Specifically, a camera is installed at each traffic intersection, and a vehicle detection algorithm is run on the end side (that is, on the camera) to obtain real-time vehicle information for each lane at the intersection (such as vehicle position and length of stay). After obtaining the vehicle information, each agent calculates its current state and exchanges state information with its adjacent agents, and the signal light action at the preset time is finally output by the signal light action network in the agent. It should be noted that before an agent makes a decision, it needs to obtain not only the state of the current intersection but also the states of the adjacent intersections, because the graph convolutional network uses the state information of adjacent intersections in its calculation. In this way, multiple agents can cooperate fully when making decisions and effectively take the state information of adjacent intersections into account.

103. According to the signal light action at the current intersection at the preset time, control the signal light at the current intersection to execute the signal light action at the preset time.

Optionally, after the agent predicts the signal light action at the current intersection at the preset time, the signal light at the current intersection can be controlled to perform that signal light action at the preset time, so that at the preset time the vehicles in the corresponding lanes can proceed according to the signal light action.

Optionally, the signal light action at the current intersection at the preset time can be post-processed according to preset post-processing rules to obtain the post-processed signal light action at the current intersection at the preset time; the signal light at the current intersection is then controlled to execute the post-processed signal light action at the preset time.

It can be understood that post-processing is used to correct the final signal light action, and it can be composed of various rules. For example, the final phase can be corrected by restricting the phase (equivalent to restricting the signal light action) according to the length of time that vehicles have stayed in the corresponding lanes. This is needed because the defined signal light actions are independent of one another, so some phase may go unselected for a long time and the vehicles in the corresponding lanes may wait too long.

A post-processing rule may be that if a certain signal light action is not selected and the dwell time of vehicles in the lane corresponding to the signal light action exceeds a preset threshold, then the signal light action is selected to allow vehicles in the corresponding lane to pass. The addition of post-processing can make the final action more reasonable.
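The rule above can be sketched as follows. The function name `post_process`, the dictionary layout, and the threshold value are hypothetical; choosing the longest-waiting action when several exceed the threshold is an assumption of this example, since the text does not specify a tie-breaking policy.

```python
def post_process(predicted_action, max_dwell_by_action, threshold):
    """Override the predicted action if the lanes of another action have waited too long.

    max_dwell_by_action maps each signal light action to the longest dwell time
    (in seconds) among vehicles in the lanes that the action would release.
    """
    starved = {
        action: dwell
        for action, dwell in max_dwell_by_action.items()
        if action != predicted_action and dwell > threshold
    }
    if not starved:
        return predicted_action
    # Assumption: when several actions qualify, serve the one that has waited longest.
    return max(starved, key=starved.get)

overridden = post_process(0, {0: 10, 1: 130, 2: 95}, threshold=90)
unchanged = post_process(0, {0: 10, 1: 30, 2: 45}, threshold=90)
```

In the first call action 1 replaces the predicted action 0 because its lanes have waited 130 seconds; in the second call no dwell time exceeds the threshold, so the predicted action is kept.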

It should be noted that the traffic signal light control method provided by the embodiment of the present invention can be applied to smart phones, computers, servers and other devices capable of controlling traffic signal lights.

Please refer to FIG. 4. FIG. 4 is a structural diagram of a traffic signal light control device provided by an embodiment of the present invention. As shown in FIG. 4, the traffic signal light control device includes:

An acquisition module 401, configured to acquire state information of the current intersection and adjacent intersections within a first preset time period, and acquire a graph structure of the current intersection and the adjacent intersections, the state information including the parking position and length of stay of vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;

A prediction module 402, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action at the preset time at the current intersection, and the agent is obtained through reinforcement learning training;

The first control module 403 is configured to control the signal light at the current intersection to perform the signal light action at the preset time according to the signal light action at the current intersection at the preset time.

Optionally, the acquisition module 401 includes:

The first acquisition sub-module is used to acquire the image information of each lane of the current intersection at the current moment, and extract the parking position and duration of the vehicle in each lane according to the image information of each lane;

The first calculation sub-module is used to calculate the lane queue length corresponding to each lane according to the parking position and the staying time of the vehicle in each lane;

The second calculation sub-module is used to calculate the state information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane;

The second obtaining sub-module is used to obtain state information corresponding to each moment within a first preset time period, where the first preset time period includes the current moment.

Optionally, the second calculation sub-module includes:

an acquisition unit, configured to acquire the signal light action at the current intersection at the current moment;

A calculation unit, configured to calculate the sum of the lane queue lengths corresponding to the lanes that are allowed to pass under the signal light action at the current intersection at the current moment;

The processing unit is configured to obtain the state information of the current intersection at the current moment according to the signal light action at the current intersection and the sum of the lane queue lengths corresponding to the lanes that are allowed to pass.
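The units above can be illustrated with a short sketch. The function name `intersection_state`, the lane names, the representation of the state as an (action, total queue length) pair, and the use of a fixed-length buffer for the first preset time period are all assumptions made for the example.

```python
from collections import deque

def intersection_state(action, queue_length_by_lane, lanes_by_action):
    """Combine the current action with the total queue length of the lanes it releases."""
    allowed_lanes = lanes_by_action[action]
    total_queue = sum(queue_length_by_lane[lane] for lane in allowed_lanes)
    return (action, total_queue)

# Hypothetical mapping from signal light actions to the lanes they allow to pass.
lanes_by_action = {
    0: ["north_straight", "south_straight"],
    1: ["east_straight", "west_straight"],
}
queues = {"north_straight": 5, "south_straight": 3, "east_straight": 7, "west_straight": 2}

# Keep only the states of the most recent H moments (the first preset time period).
H = 3
history = deque(maxlen=H)
for action in (0, 1, 0, 1):
    history.append(intersection_state(action, queues, lanes_by_action))
```

A `deque` with `maxlen=H` drops the oldest state automatically, so `history` always covers a window that includes the current moment.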

Optionally, the device also includes:

A construction module, configured to construct an agent, the agent being constructed to output a signal light action according to the state information;

The training module is used to perform reinforcement learning training on the agent with the traffic volume of the current intersection within the second preset time period as a reward, and to obtain the trained agent as the preset agent after the training is completed.

Optionally, the construction module includes:

A first construction submodule, configured to construct a signal light action network based on a spatio-temporal graph convolutional network and a first output network, and the signal light action network outputs a signal light action through the first output network;

The second construction sub-module is used to construct an evaluation network based on the spatio-temporal graph convolutional network and a second output network, the evaluation network outputs a state value through the second output network, the state value is used to evaluate the performance of the signal light action network, and the evaluation network shares the parameters of the spatio-temporal graph convolutional network with the signal light action network;

The third construction sub-module is used to construct an agent according to the signal light action network and the evaluation network.

Optionally, the training module includes:

The fourth construction sub-module is used to build a road network simulation environment according to the preset number of simulated intersections, the simulated roads, the connection relationships between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, where a constructed agent is set at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;

The first processing sub-module is used to take the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent every preset time , outputting a signal light action through the constructed agent;

The reward sub-module is used to take the traffic volume of each simulated intersection within the second preset time period as a reward after the signal light action is executed, and to perform reinforcement learning training on the constructed agent;

The second processing sub-module is used to take the signal light action network in the trained agent as the preset agent after the training is completed.

Optionally, the device also includes:

The post-processing module is used to perform post-processing on the signal light action of the current intersection at the preset time according to the preset post-processing rules, so as to obtain the signal light action of the current intersection at the preset time after post-processing;

The second control module is configured to control the current intersection to execute the signal light action at the preset time according to the post-processed signal light action at the current intersection at the preset time.

It should be noted that the traffic signal light control device provided by the embodiment of the present invention can be applied to smart phones, computers, servers and other devices capable of controlling traffic signal lights.

The traffic signal light control device provided by the embodiments of the present invention can implement the processes implemented by the traffic signal light control method in the above method embodiments, and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 5, the electronic device includes: a memory 502, a processor 501, and a computer program for the traffic signal light control method that is stored in the memory 502 and can run on the processor 501, wherein:

The processor 501 is used to call the computer program stored in the memory 502, and perform the following steps:

Optionally, the acquiring state information of the current intersection and adjacent intersections within the first preset time period performed by the processor 501 includes:

Optionally, the calculation performed by the processor 501 on the status information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane includes:

The state information of the current intersection at the current moment is calculated according to the signal light action at the current intersection and the sum of the lane queue lengths corresponding to the lanes that are allowed to pass.

Optionally, before the state information and the graph structure are input into the pre-trained agent to predict the signal light action at the current intersection at the preset time, the method executed by the processor 501 further includes:

Optionally, the constructing of the agent performed by the processor 501 includes:

Optionally, the performing, by the processor 501, of reinforcement learning training on the agent with the traffic volume of the current intersection within the second preset time period as a reward, and the obtaining of the trained agent as the preset agent after the training is completed, includes:

Optionally, after the state information and the graph structure are input into the pre-trained agent to predict the signal light action at the current intersection at the preset time, the method executed by the processor 501 further includes:

It should be noted that the electronic device provided by the embodiment of the present invention can be applied to smart phones, computers, servers and other devices capable of controlling traffic lights.

The electronic device provided by the embodiment of the present invention can realize each process realized by the traffic signal light control method in the above method embodiment, and can achieve the same beneficial effect. To avoid repetition, details are not repeated here.

The embodiment of the present invention also provides a computer-readable storage medium. A computer program is stored on the computer-readable storage medium. When the computer program is executed by a processor, each process of the traffic signal light control method provided in the embodiment of the present invention is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

The above disclosures are only preferred embodiments of the present invention, and certainly cannot limit the scope of rights of the present invention. Therefore, equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims

A traffic light control method, characterized in that, comprising the following steps:

Obtaining the state information of the current intersection and the adjacent intersection within the first preset time period, and obtaining the graph structure of the current intersection and the adjacent intersection, the state information including the parking position and the length of stay of the vehicle in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersection;

Inputting the state information and the graph structure into a pre-trained agent, predicting the signal light action at the preset time at the current intersection, and the agent is obtained through reinforcement learning training;

According to the signal light action at the current intersection at the preset time, the signal light at the current intersection is controlled to execute the signal light action at the preset time.
The traffic signal light control method according to claim 1, wherein said acquiring state information of the current intersection and adjacent intersections within a first preset time period comprises:

At the current moment, the image information of each lane at the current intersection is obtained, and the parking position and duration of the vehicle in each lane are extracted according to the image information of each lane;

According to the parking position and the staying time of the vehicle in each lane, calculate the lane queue length corresponding to each lane;

calculating the status information of the current intersection at the current moment according to the signal light action information and the lane queue lengths corresponding to the respective lanes;

Obtain state information corresponding to each moment within a first preset time period, where the first preset time period includes the current moment.
The traffic signal light control method according to claim 2, wherein the calculating the state information of the current intersection at the current moment according to the signal light action information and the lane queue lengths corresponding to the respective lanes includes:

Obtaining the signal light action at the current intersection at the current moment;

Calculating the sum of the lane queue lengths corresponding to the lanes that are allowed to pass under the signal light action at the current moment at the current intersection;

The state information of the current intersection at the current moment is obtained according to the signal light action at the current intersection and the sum of the lane queue lengths corresponding to the lanes that are allowed to pass.
The traffic signal light control method according to claim 1, characterized in that, before the state information and the graph structure are input into the pre-trained agent to predict the signal light action at the current intersection at the preset time, the method further comprises:

Constructing an agent configured to output a signal light action according to state information;

Taking the traffic volume at the current intersection within the second preset time period as a reward, the agent is trained in reinforcement learning, and after the training is completed, the trained agent is obtained as the preset agent.
The traffic signal light control method according to claim 4, wherein said building an intelligent body includes:

Constructing a signal light action network based on the spatio-temporal graph convolutional network and the first output network, the signal light action network outputs the signal light action through the first output network;

An evaluation network is constructed based on a spatio-temporal graph convolutional network and a second output network, the evaluation network outputs a state value through the second output network, and the state value is used to evaluate the performance of the signal light action network, and the evaluation network and The signal light action network shares the parameters of a spatio-temporal graph convolutional network;

According to the signal light action network and the evaluation network, an agent is constructed.
The traffic signal light control method according to claim 5, characterized in that performing reinforcement learning training on the agent with the traffic volume of the current intersection within the second preset time period as a reward, and obtaining the trained agent as the preset agent after the training is completed, comprises:

Constructing a road network simulation environment according to the preset number of simulated intersections, the simulated roads, the connection relationships between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, wherein a constructed agent is set at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;

At every preset interval, using the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and outputting the signal light action through the constructed agent;

After the signal light action is executed, using the traffic volume of each simulated intersection within the second preset time period as a reward, and performing reinforcement learning training on the constructed agent;

After the training is completed, using the signal light action network in the trained agent as the preset agent.
The traffic signal light control method according to claim 1, characterized in that, after the state information and the graph structure are input into the pre-trained agent to predict the signal light action at the current intersection at the preset time, the method further comprises:

performing post-processing on the signal light action at the preset time at the current intersection according to the preset post-processing rule, to obtain the signal light action at the preset time at the current intersection after post-processing;

Controlling the current intersection to execute the signal light action at the preset time after post-processing according to the post-processed signal light action at the preset time.
A traffic signal light control device, characterized in that the device comprises:

An acquisition module, configured to acquire state information of the current intersection and adjacent intersections within a first preset time period, and acquire a graph structure of the current intersection and the adjacent intersections, the state information including the parking position and the length of stay of vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;

A prediction module, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action at the current intersection at a preset moment, and the agent is obtained through reinforcement learning training;

The first control module is configured to control the signal light at the current intersection to perform the signal light action at the preset time according to the signal light action at the current intersection at the preset time.
An electronic device, characterized in that it comprises: a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein when the processor executes the computer program, the steps in the traffic signal light control method according to any one of claims 1 to 7 are implemented.
A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the traffic signal light control method according to any one of claims 1 to 7 are implemented.