WO2023123906A1 - Traffic light control method and related device - Google Patents


Info

Publication number
WO2023123906A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal light
intersection
preset time
lane
current
Prior art date
Application number
PCT/CN2022/099853
Other languages
French (fr)
Chinese (zh)
Inventor
蚁韩羚
李圆法
余晓填
王孝宇
陈宁
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2023123906A1

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/07: Controlling traffic signals
    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/01: Detecting movement of traffic to be counted or controlled
    • G08G1/0104: Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125: Traffic data processing
    • G08G1/0133: Traffic data processing for classifying traffic situation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B20/00: Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B20/40: Control techniques providing energy savings, e.g. smart controller or presence detection

Definitions

  • The invention relates to the field of traffic signal light control, and in particular to a traffic signal light control method and related device.
  • Traffic signal light control is an essential part of smart city construction. Effective control of traffic signal lights is of great significance for alleviating urban traffic congestion.
  • Traffic lights generally use single-point timing control: within a fixed cycle, the traffic flows in each direction are released in turn according to a preset phase order and phase duration. Vehicles that fail to pass within their fixed phase must wait for the corresponding phase of the next cycle. Meanwhile, lanes with few vehicles are given more passing time than they need, which leaves idle time windows in which no vehicle passes. The existing traffic signal light control method therefore suffers from low traffic efficiency.
  • An embodiment of the present invention provides a traffic signal light control method. The state information of the current intersection and adjacent intersections within a first preset time period, together with the graph structure of the current intersection and adjacent intersections, is used as the input of an agent, and the agent outputs the signal light action. Because the state information includes timing information and the parking position and dwell time of vehicles in each lane, it reflects lane congestion; because the graph structure includes the spatial dependence between intersections, it reflects their spatial distribution. The agent can therefore accurately predict the signal light action of the current intersection at a preset time from both the temporal and the spatial dimension. Controlling the signal light at the intersection to execute that action at the preset time avoids vehicle congestion and idle time windows, thereby improving the traffic efficiency of vehicles across the entire road network.
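  • An embodiment of the present invention provides a traffic signal light control method, the traffic signal light control method comprising:
  • acquiring state information of the current intersection and adjacent intersections within a first preset time period, and acquiring a graph structure of the current intersection and the adjacent intersections, where the state information includes the parking position and dwell time of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;
  • inputting the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
  • controlling, according to the predicted signal light action, the signal light at the current intersection to execute the signal light action at the preset time.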
  • Acquiring the state information of the current intersection and adjacent intersections within the first preset time period includes:
  • obtaining the image information of each lane at the current intersection, and extracting the parking position and dwell time of the vehicles in each lane from the image information of each lane;
  • Calculating the state information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane includes:
  • obtaining the state information of the current intersection at the current moment according to the signal light action at the current intersection and the sum of the lane queue lengths corresponding to the lanes that are allowed to pass.
  • Before inputting the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method further includes:
  • training the agent by reinforcement learning and, after the training is completed, taking the trained agent as the preset agent.
  • Constructing the agent includes:
  • constructing a signal light action network based on a spatio-temporal graph convolutional network and a first output network, where the signal light action network outputs the signal light action through the first output network;
  • constructing an evaluation network based on the spatio-temporal graph convolutional network and a second output network, where the evaluation network outputs a state value through the second output network, the state value is used to evaluate the performance of the signal light action network, and the evaluation network and the signal light action network share the parameters of the spatio-temporal graph convolutional network;
  • constructing the agent from the signal light action network and the evaluation network.
  • Training the agent by reinforcement learning and, after the training is completed, taking the trained agent as the preset agent includes:
  • constructing a road network simulation environment;
  • setting a constructed agent at each simulated intersection;
  • randomly generating, by the road network simulation environment, simulated traffic flow in each simulated lane;
  • using the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and outputting the signal light action through the constructed agent;
  • using the traffic volume of each simulated intersection within the second preset time period as a reward to train the constructed agent by reinforcement learning;
  • using the signal light action network in the trained agent as the preset agent.
  • The method further includes:
  • controlling the current intersection to execute the signal light action at the preset time.
  • An embodiment of the present invention provides a traffic signal light control device, the device comprising:
  • an acquisition module configured to acquire state information of the current intersection and adjacent intersections within a first preset time period, and to acquire a graph structure of the current intersection and the adjacent intersections, where the state information includes the parking position and dwell time of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;
  • a prediction module configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
  • a first control module configured to control, according to the predicted signal light action of the current intersection at the preset time, the signal light at the current intersection to execute the signal light action at the preset time.
  • An embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the traffic signal light control method provided by the embodiment of the present invention are implemented.
  • An embodiment of the present invention provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps of the traffic signal light control method provided by the embodiment of the present invention are implemented.
  • The state information of the current intersection and the adjacent intersections within the first preset time period is acquired, together with the graph structure of the current intersection and the adjacent intersections; the state information includes the parking position and dwell time of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections. The state information and the graph structure are input into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the agent being obtained through reinforcement learning training. According to the predicted action, the signal light at the current intersection is controlled to execute the signal light action at the preset time.
  • The state information of the current intersection and adjacent intersections within the first preset time period, together with the graph structure of the current intersection and adjacent intersections, is used as the input of the agent, and the agent outputs the signal light action. Because the state information includes timing information and the parking position and dwell time of the vehicles in each lane, it reflects the recent congestion of the lanes; because the graph structure includes the spatial dependence between intersections, it reflects their spatial distribution. The agent can therefore accurately predict the signal light action of the current intersection at the preset time from both the temporal and the spatial dimension. Controlling the signal light at the intersection to execute that action at the preset time avoids vehicle congestion and idle time windows, thereby improving the traffic efficiency of the overall road network.
  • FIG. 1 is a schematic flowchart of a traffic signal light control method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the phases of a signal light at an intersection provided by an embodiment of the present invention.
  • FIG. 3 is a network architecture diagram of an agent provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a traffic signal light control device provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 1 is a flowchart of a traffic signal light control method provided by an embodiment of the present invention. As shown in FIG. 1, the traffic signal light control method includes:
  • The above-mentioned state information includes the state information of the current intersection and the state information of the adjacent intersections. The state information of the current intersection includes the parking position and dwell time of the vehicles in each lane of the current intersection; the state information of an adjacent intersection includes the parking position and dwell time of the vehicles in each lane of that adjacent intersection.
  • The above-mentioned graph structure includes the connection relationship between the current intersection and the adjacent intersections.
  • The connection relationship between the current intersection and the adjacent intersections can be understood as which intersections are connected to the current intersection: vehicles can drive from these intersections to the current intersection, or from the current intersection to these intersections.
  • In the graph structure, the current intersection and the adjacent intersections are nodes, and the connection relationships between them are weighted edges. The closer the current intersection is to an adjacent intersection, the larger the edge weight; the farther apart they are, the smaller the edge weight.
  • The graph structure can be built in advance from the connection relationships and distances between each intersection and the other intersections.
  • The graph structure is a fixed property of the road network: as long as the road network does not change, the graph structure does not change.
  • the graph structure encodes the spatial dependencies between different traffic intersections.
  • each node represents a traffic intersection
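  • The graph construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text only states that closer intersections get larger edge weights, so the inverse-distance weighting, the function name `build_weighted_graph`, and the intersection labels are assumptions made for the example.

```python
def build_weighted_graph(distances_m):
    """Build an undirected weighted graph of intersections.

    distances_m maps a pair of connected intersections (a, b) to the road
    distance between them in metres. Returns {node: {neighbor: weight}}.
    """
    graph = {}
    for (a, b), dist in distances_m.items():
        if dist <= 0:
            raise ValueError("distance must be positive")
        # Assumption: weight = 1 / distance, so closer intersections get
        # a larger weight. The source does not give the exact function.
        weight = 1.0 / dist
        # Vehicles can drive in both directions, so store both edges.
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


# Hypothetical road network: A is the current intersection.
roads = {("A", "B"): 200.0, ("A", "C"): 500.0}
graph = build_weighted_graph(roads)
print(graph["A"])
```

  • Because the road network is fixed, such a structure would only need to be built once and could be reused at every decision step.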
  • The state information of the current intersection and adjacent intersections within the first preset time period can be obtained from image information captured by cameras installed at those intersections.
  • Each camera collects real-time images of each lane at the intersection where it is installed, yielding the image information of each lane at that intersection.
  • Specifically, at the current moment, the image information of each lane at the current intersection is obtained, and the parking position and dwell time of the vehicles in each lane are extracted from it; the lane queue length of each lane is calculated from the parking positions and dwell times; the state information of the current intersection at the current moment is calculated from the signal light action information and the lane queue length of each lane; and the state information corresponding to the first preset time period, which includes the current moment, is thereby obtained.
  • Vehicle detection can be performed on the image information to obtain the vehicle information of each lane.
  • The vehicle information includes the vehicle's parking position and dwell time.
  • The parking position of a vehicle may be the preset area of the lane in which the vehicle stays.
  • For example, the vehicles on each lane within 50 meters of the current intersection may be recorded.
  • The dwell time of a vehicle is the length of time it stays on the corresponding lane, counted from the moment the vehicle enters the preset area of the lane, for example from the moment it comes within 50 meters of the current intersection.
  • The lane queue length of each lane is then calculated.
  • The lane queue length corresponding to lane l can be expressed by the following formula:
  • V_l is the set of vehicles on lane l within the preset area of the intersection;
  • t_v is the length of time that vehicle v has stayed on lane l;
  • w is a hyperparameter. The lane queue length in the embodiment of the present invention thus takes the vehicles' dwell time on the lane into account. In general, if a lane holds many vehicles and their dwell times are long (that is, t_v is large), the corresponding lane queue length is also large.
  • The above calculation of the lane queue length for lane l also applies to the other lanes at the current intersection and to the lanes at other intersections.
  • The action information of the signal light can be determined according to the phase of the signal light.
  • The intersection is a four-way intersection with 24 lanes in total, numbered 1 to 24. Under the current traffic rules, right turns do not wait for the signal, whereas forward and left-turn movements do; movements released at the same time must not cross one another within the intersection, and the left-turn and forward movements of the same approach may be released together. Under these rules a four-way intersection has 8 signal light phases in total, numbered 1 to 8. In FIG. 2, the four-way intersection has four approaches: east, south, west, and north.
  • Each approach includes a left-turn lane, a forward lane, a right-turn lane, and 3 oncoming lanes.
  • The north approach includes left-turn lane 1, forward lane 2, right-turn lane 3, and oncoming lanes 13, 14, and 15; the east approach includes left-turn lane 4, forward lane 5, right-turn lane 6, and oncoming lanes 16, 17, and 18; the south approach includes left-turn lane 7, forward lane 8, right-turn lane 9, and oncoming lanes 19, 20, and 21; the west approach includes left-turn lane 10, forward lane 11, right-turn lane 12, and oncoming lanes 22, 23, and 24. Phase 1 corresponds to the release of left-turn lane 1 and left-turn lane 7, phase 2 to the release of forward lane 2 and forward lane 8, phase 3 to the release of left-turn lane 4 and left-turn lane 10, phase 4 to the release of forward lane 5 and forward lane 11, and phase 5 to the release of forward lane 2 and left-turn lane 1. Phase 6 corresponds to the release of the forward
  • The state information of the current intersection can be understood as the state information of the signal light at the current intersection.
  • The state information of the current intersection has as many dimensions as the intersection has phases.
  • For example, when the total number of signal light phases is 8, the state information of the current intersection has 8 dimensions.
  • The embodiment of the present invention takes a four-way intersection only as an example.
  • The state information of an intersection has as many dimensions as that intersection has phases.
  • The traffic intersection can also be a three-way intersection, whose signal light has only three phases.
  • The three corresponding signal light phases can be selected as the phases of the signal light at the three-way intersection.
  • Table 1 is used to represent the phases of the signal lights at the three-way intersection:
  • In the state information of a three-way intersection, each unselectable dimension can be replaced by -1, which is equivalent to masking the unselectable signal light phases.
  • For example, if the state information of a four-way intersection is (1, 2, 3, 4, 5, 6, 7, 8),
  • the state information of the three-way intersection north of the fork intersection is (1, -1, -1, 4, -1, 6, -1, -1).
  • In this way, the state information of every intersection in the road network is an 8-dimensional vector.
  • Specifically, the signal light action at the current intersection at the current moment is obtained; the sum of the lane queue lengths of the lanes allowed to pass under that signal light action is calculated; and the state information of the current intersection at the current moment is obtained from the signal light action and that sum.
  • The state information of the current intersection at the current moment thus includes two parts: the signal light action at the current intersection at the current moment, and the sum of the lane queue lengths of the lanes allowed to pass.
  • There is a one-to-one correspondence between signal light actions and signal light phases. Taking a four-way intersection as an example, there are 8 signal light phases and therefore 8 signal light actions, each corresponding to one phase. In the embodiment of the present invention, each single phase serves as one signal light action, which improves the flexibility of phase selection.
  • The state information of the current intersection includes two parts.
  • The first part is the signal light state at the current intersection. For example, if the signal light at the current moment is in phase 2, it is encoded with one-hot encoding.
  • The second part is the sum of the lane queue lengths on the lanes corresponding to each phase. Taking a four-fork intersection as an example and referring to Fig.
  • s_i represents the i-th dimension of the state information in the current path.
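  • The two-part state described above can be illustrated with a short sketch. It rests on several assumptions: the function name `intersection_state` is hypothetical, the lane queue lengths are taken as given inputs, and the two parts are returned separately because the text does not say how they are combined. The phase-to-lane mapping uses only phases 1 to 5 as listed in the description; phases 6 to 8 are absent from this demo mapping and are therefore marked -1, the same convention the text uses for unselectable phases.

```python
def intersection_state(current_phase, queue_len, phase_lanes, num_phases=8):
    """Return (one_hot, queue_sums) for one intersection.

    current_phase: active phase number, 1-based.
    queue_len:     {lane_number: lane queue length}.
    phase_lanes:   {phase_number: lanes released in that phase}.
    """
    phases = range(1, num_phases + 1)
    # Part 1: one-hot encoding of the active signal light phase.
    one_hot = [1 if p == current_phase else 0 for p in phases]
    # Part 2: per-phase sum of the queue lengths of the released lanes.
    queue_sums = []
    for p in phases:
        lanes = phase_lanes.get(p)
        if lanes is None:
            queue_sums.append(-1)  # phase not available in this mapping
        else:
            queue_sums.append(sum(queue_len.get(lane, 0.0) for lane in lanes))
    return one_hot, queue_sums


# Phases 1-5 as listed in the description; the rest are omitted here.
PHASE_LANES = {1: (1, 7), 2: (2, 8), 3: (4, 10), 4: (5, 11), 5: (2, 1)}
# Hypothetical queue lengths for a few lanes.
queues = {1: 3.0, 2: 5.0, 7: 1.0, 8: 2.0, 4: 0.5}

one_hot, queue_sums = intersection_state(2, queues, PHASE_LANES)
print(one_hot)
print(queue_sums)
```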
  • The first preset time period includes the current moment.
  • The first preset time period is the latest H moments, which include the current moment, for example the latest 5 moments including the current moment.
  • N represents the number of signal lights in the road network. Each intersection controls the release of its lanes through one signal light (or one signal light system), and F represents the dimension of the state information.
  • The above-mentioned agent is obtained through reinforcement learning training.
  • The input of the agent is the state information H*N*F and the graph structure G. The state information H*N*F is the state information of the current intersection and adjacent intersections within the first preset time period, and the graph structure G is the graph structure of the current intersection and adjacent intersections.
  • The current intersection and adjacent intersections can be grouped into a target road network. The state information H*N*F can therefore also be called the global state of the target road network, and the graph structure G the graph structure of the target road network.
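  • One way to maintain the H*N*F input is a sliding window over the latest H moments. The sketch below is an illustration only: the class name `StateHistory` is hypothetical, and padding the window with zero frames before H moments have been observed is an assumption that the text does not address.

```python
from collections import deque


class StateHistory:
    """Sliding window holding the latest H frames of N x F state information."""

    def __init__(self, horizon, num_intersections, feature_dim):
        self.shape = (horizon, num_intersections, feature_dim)
        # Assumption: start from all-zero frames until H moments are seen.
        self._frames = deque(
            ([[0.0] * feature_dim for _ in range(num_intersections)]
             for _ in range(horizon)),
            maxlen=horizon,
        )

    def push(self, frame):
        """Append the N x F state of the current moment, dropping the oldest."""
        _, n, f = self.shape
        if len(frame) != n or any(len(row) != f for row in frame):
            raise ValueError("frame must have shape N x F")
        self._frames.append([list(row) for row in frame])

    def tensor(self):
        """Return the H x N x F window, oldest moment first."""
        return [[list(row) for row in frame] for frame in self._frames]


# Hypothetical sizes: H = 3 moments, N = 2 intersections, F = 4 features.
history = StateHistory(horizon=3, num_intersections=2, feature_dim=4)
for k in range(4):
    history.push([[float(k)] * 4 for _ in range(2)])
window = history.tensor()
print(len(window), len(window[0]), len(window[0][0]))
```

  • After four pushes with H = 3, the window holds moments 1, 2, and 3, so the current moment is always the last frame.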
  • The output of the agent is the probability distribution over signal light actions at the preset time, and the signal light action with the highest probability is selected as the signal light action at the preset time.
  • The preset time may be the A-th moment after the current moment; for example, it may be the moment immediately following the current moment.
  • The preset agent is obtained through reinforcement learning training.
  • In reinforcement learning, the agent is rewarded so that it learns with the goal of obtaining more reward.
  • An agent can be constructed that outputs signal light actions according to the state information. The agent is trained by reinforcement learning with the traffic volume of the current intersection within the second preset time period as the reward, and after the training is completed the trained agent is taken as the preset agent.
  • An agent is set at each intersection to predict the signal light action of that intersection at the preset time.
  • The second preset time period may be the period during which a signal light action lasts; during this period, vehicles in the corresponding lanes can pass. The agent of the current intersection is rewarded with the traffic volume within the second preset time period: the higher the traffic volume, the higher the reward and the stronger the positive incentive. Specifically, let V_t be the set of vehicles passing through the current intersection at time t; then the reward at the current intersection can be expressed by the following formula:
  • The reward at the current intersection takes the vehicles' dwell time t_v into account, which makes the agent pay more attention to congested lanes and thereby improves the overall traffic efficiency of the road network.
  • A signal light action network is constructed based on a spatio-temporal graph convolutional network and a first output network, and outputs the signal light action through the first output network. An evaluation network is constructed based on the spatio-temporal graph convolutional network and a second output network, and outputs a state value through the second output network. The state value is used to evaluate the performance of the signal light action network. The evaluation network and the signal light action network share the parameters of the spatio-temporal graph convolutional network, and the agent is constructed from the signal light action network and the evaluation network.
  • The spatio-temporal graph convolutional network may include a graph convolutional network, a recurrent neural network, and a fully connected network. The graph convolutional network extracts the spatial dependencies of the current intersection and adjacent intersections from the graph structure; the recurrent neural network extracts the temporal dependencies of the states of the current intersection and adjacent intersections; and the fully connected network fuses the spatial and temporal dependencies to obtain the spatio-temporal information of the current intersection and adjacent intersections.
  • The graph convolutional network may be based on GAT layers, and the recurrent neural network may be based on GRU layers.
  • The GAT layer captures the spatial correlation between adjacent intersections, so that the agent can take the state of adjacent intersections into account when making decisions.
  • The GRU layer captures the temporal correlation of the intersection state, so that the agent can take the historical state into account when making decisions.
  • The first output network may include a linear layer, a mask layer, and a classification layer. The linear layer applies a linear transformation to the spatio-temporal features extracted by the spatio-temporal graph convolutional network, and the classification layer classifies the linearly transformed feature vector.
  • The classification layer can use Softmax to obtain the probability distribution over the signal light actions.
  • The mask layer masks the probability distribution of signal light actions so that the probability of each unselectable signal light action is 0; it is mainly intended for agents at three-way intersections.
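  • The effect of the mask layer and the Softmax classification layer can be sketched in plain Python. This is one common way to obtain zero probability for unselectable actions (excluding them from the Softmax normalisation), offered as an assumption rather than as the patent's exact layer; the function names and the example scores are hypothetical. The selectable phases 1, 4, and 6 follow the three-way example (1, -1, -1, 4, -1, 6, -1, -1) given in the description.

```python
import math


def masked_action_probs(scores, selectable):
    """Softmax over the selectable actions; unselectable ones get probability 0."""
    kept = [s for s, ok in zip(scores, selectable) if ok]
    if not kept:
        raise ValueError("at least one action must be selectable")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0
            for s, ok in zip(scores, selectable)]
    total = sum(exps)
    return [e / total for e in exps]


def pick_action(probs):
    """Return the 1-based phase number with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Hypothetical linear-layer outputs for the 8 phases of one intersection.
scores = [0.2, 3.0, 1.0, 0.5, 0.1, 1.5, 2.0, 0.0]
# Three-way intersection where only phases 1, 4 and 6 can be selected.
mask = [True, False, False, True, False, True, False, False]

probs = masked_action_probs(scores, mask)
print(pick_action(probs))
```

  • Without the mask, phase 2 would win because it has the highest raw score; with the mask, the choice is restricted to the phases that exist at the three-way intersection.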
  • The second output network can include a linear layer.
  • The linear layer applies a linear transformation to the spatio-temporal features extracted by the spatio-temporal graph convolutional network and outputs the state value, which is used to evaluate the performance of the signal light action network.
  • The performance of the signal light action network is an evaluation of the process that maps state information to the probability distribution of signal light actions.
  • Both the signal light action network and the evaluation network are adjusted according to the state value, so that the performance of the signal light action network improves steadily and the state value output by the evaluation network rises accordingly.
  • The training of the agent includes the training of the signal light action network and the evaluation network. It should be noted that the constructed agent includes a state function, an action function, a reward function, a signal light action network, and an evaluation network, whereas the trained agent may include only the signal light action network.
  • The state function describes the state information, the action function describes the signal light action, and the reward function motivates the agent to choose signal light actions that yield higher traffic volume.
  • FIG. 3 is an architecture diagram of an agent provided by an embodiment of the present invention.
  • The signal light action network and the evaluation network can be constructed based on the Actor-Critic framework.
  • The agent includes an Actor network and a Critic network.
  • The Actor network and the Critic network share some network parameters (the parameters of the spatio-temporal graph convolutional network). In FIG. 3, the upper part is the Critic network and the lower part is the Actor network.
  • The two networks share the parameters of the first four layers (the parameters of the spatio-temporal graph convolutional network), which reduces the learning difficulty of the model and speeds up the convergence of agent training.
  • The output of the agent has two parts: the output of the Critic network, which is the state value of each agent, and the output of the Actor network, which is the probability distribution over signal light actions predicted by the agent. Because agents at different intersections may have different selectable signal light actions (for example, the agent at a three-way intersection can choose among only three phases), a mask operation can be added to the output layer of the Actor network: a mask is applied to the output action distribution of the agent at a three-way intersection so that the probability of each unselectable action is 0.
  • A road network simulation environment can be constructed from a preset number of simulated intersections and simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road.
  • A constructed agent is set at each simulated intersection, and the road network simulation environment randomly generates simulated traffic flow in each simulated lane. At every preset interval, the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment are used as the input of the constructed agent, which outputs the signal light action. After the signal light action is executed, the traffic volume of each simulated intersection within the second preset time period is used as the reward to train the constructed agent by reinforcement learning. After the training is completed, the signal light action network in the trained agent is used as the preset agent.
  • The traffic flow in the road network simulation environment can be randomly regenerated every M iterations, where M is greater than or equal to 1, to increase the adaptability of the agent to different traffic environments.
  • Z is greater than or equal to 1, so as to further increase the adaptability of the agent to different traffic environments.
  • The currently observed state S is calculated at every preset interval. The state S of all simulated intersections and the graph structure G of the road network simulation environment are used as the input of the corresponding agent, which outputs the action probability distribution of each agent. Each agent executes the action with the highest probability, and the reward r obtained after the action is stored for the reinforcement learning training of the agent.
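  • The step above (pick the highest-probability action, then store the reward r for training) can be sketched as follows. The `Transition` and `RolloutBuffer` names, the example state, and the reward value are assumptions for illustration; the source does not specify how transitions are stored.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    state: list      # observed state S at the decision time
    action: int      # index of the executed signal light action
    reward: float    # reward r observed after the action


@dataclass
class RolloutBuffer:
    transitions: list = field(default_factory=list)

    def store(self, state, action, reward):
        self.transitions.append(Transition(state, action, reward))


def select_action(probabilities):
    """Return the index of the action with the highest probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical single decision step.
buffer = RolloutBuffer()
state = [3.0, 1.0, 0.0]
probs = [0.1, 0.7, 0.2]      # action distribution output by the agent
action = select_action(probs)
reward = 12.0                # assumed traffic volume after the action
buffer.store(state, action, reward)
```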
  • After training in the simulation environment is completed, each agent can be deployed in the actual road network. Specifically, a camera is installed at each traffic intersection, and a vehicle detection algorithm runs on the end side (that is, on the camera) to obtain real-time vehicle information for each lane at the intersection, such as vehicle position and stay duration. After obtaining the vehicle information, each agent calculates its current state, exchanges state information with its adjacent agents, and outputs the signal light action for the preset time once the signal light action network in the agent has finished its calculation. Note that before an agent makes a decision, it needs the state of the adjacent intersections in addition to the state of the current intersection, because the graph convolutional neural network uses the state information of adjacent intersections in its calculation. In this way, multiple agents cooperate when making decisions and take the state information of adjacent intersections into account.
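  • The exchange of state information between adjacent agents can be pictured as gathering the states of an intersection's neighbors from the graph structure. The sketch below assumes the graph is held as an adjacency dictionary and that each state is a list of (action, queue length sum) pairs; the function name `build_agent_input` and the intersection labels are hypothetical.

```python
def build_agent_input(intersection, adjacency, state_by_intersection):
    """Collect the intersection's own state followed by its neighbors' states."""
    neighbors = sorted(adjacency[intersection])
    nodes = [intersection] + neighbors
    return {
        "nodes": nodes,
        "states": [state_by_intersection[n] for n in nodes],
        # Connection relationships between the current and adjacent intersections.
        "edges": [(intersection, n) for n in neighbors],
    }


# Hypothetical road network: A is connected to B and C.
adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
states = {"A": [(0, 10)], "B": [(1, 3)], "C": [(0, 7)]}
agent_input = build_agent_input("A", adjacency, states)
```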
  • The signal light at the current intersection can be controlled to perform the signal light action at the preset time, so that at the preset time the vehicles in the corresponding lanes can pass according to the signal light action.
  • Post-processing can be performed on the signal light action of the current intersection at the preset time according to preset post-processing rules, to obtain the post-processed signal light action of the current intersection at the preset time. According to the post-processed signal light action, the current intersection is controlled to execute the signal light action at the preset time.
  • The post-processing is used to correct the final signal light action, and it can be composed of various rules.
  • The final phase can be corrected by restricting the phase (equivalent to restricting the signal light action) according to the stay duration of vehicles in the corresponding lanes. This correction is needed because the defined signal light actions are independent of one another, so some phases may go unselected and vehicles in the corresponding lanes may wait too long.
  • A post-processing rule may be: if a certain signal light action is not selected and the stay duration of vehicles in the lanes corresponding to that action exceeds a preset threshold, then that signal light action is selected so that vehicles in the corresponding lanes can pass.
  • Adding post-processing makes the final action more reasonable.
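  • The example rule above can be sketched as a small override function. This is an illustration under stated assumptions: stay durations are summarized as the longest wait per phase, the threshold is in seconds, and when several unselected phases are overdue the longest-waiting one is served first (the source does not specify a tie-breaking policy). The name `post_process` is hypothetical.

```python
def post_process(selected_phase, max_dwell_by_phase, dwell_threshold):
    """Override the predicted phase if another phase's vehicles waited too long.

    selected_phase: phase index predicted by the agent.
    max_dwell_by_phase: phase index -> longest stay duration (seconds) among
        vehicles in the lanes served by that phase.
    dwell_threshold: preset threshold (seconds).
    """
    overdue = {
        phase: dwell
        for phase, dwell in max_dwell_by_phase.items()
        if phase != selected_phase and dwell > dwell_threshold
    }
    if not overdue:
        return selected_phase
    # Assumed policy: serve the phase whose vehicles have waited longest.
    return max(overdue, key=overdue.get)


# Hypothetical values: phase 2 has waited 130 s against a 120 s threshold.
corrected = post_process(0, {0: 10.0, 1: 95.0, 2: 130.0}, 120.0)
unchanged = post_process(0, {0: 10.0, 1: 95.0, 2: 100.0}, 120.0)
```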
  • The state information of the current intersection and the adjacent intersections within the first preset time period is obtained, and the graph structure of the current intersection and the adjacent intersections is obtained. The state information includes the stay position and stay duration of the vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections.
  • The state information and the graph structure are input into the pre-trained agent to predict the signal light action of the current intersection at the preset time; the agent is obtained through reinforcement learning training. According to the signal light action of the current intersection at the preset time, the signal light at the current intersection is controlled to execute the signal light action at the preset time.
  • The state information of the current intersection and adjacent intersections within the first preset time period, together with the graph structure of the current intersection and adjacent intersections, is used as the input of the agent, and the agent outputs the signal light action. Because the state information includes timing information and the stay position and stay duration of vehicles in each lane, the recent congestion of each lane is taken into account; because the graph structure includes the spatial dependence between intersections, the spatial distribution of the intersections is taken into account. The agent can therefore accurately predict the signal light action of the current intersection at the preset time from both the temporal and spatial dimensions. The signal light at the intersection is then controlled to execute that action at the preset time, which avoids vehicle congestion or idle passing time windows and improves the traffic efficiency of the overall road network.
  • the traffic signal light control method provided by the embodiment of the present invention can be applied to smart phones, computers, servers and other devices capable of controlling traffic signal lights.
  • FIG. 4 is a structural diagram of a traffic signal light control device provided by an embodiment of the present invention. As shown in FIG. 4, the traffic signal light control device includes:
  • an acquisition module 401, configured to acquire state information of the current intersection and adjacent intersections within a first preset time period, and acquire a graph structure of the current intersection and the adjacent intersections, the state information including the stay position and stay duration of vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;
  • a prediction module 402, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at the preset time, the agent being obtained through reinforcement learning training;
  • a first control module 403, configured to control the signal light at the current intersection to perform the signal light action at the preset time according to the signal light action of the current intersection at the preset time.
  • The acquisition module 401 includes:
  • a first acquisition sub-module, used to acquire the image information of each lane of the current intersection at the current moment, and extract the stay position and stay duration of vehicles in each lane from the image information of each lane;
  • a first calculation sub-module, used to calculate the lane queue length corresponding to each lane according to the stay position and stay duration of vehicles in each lane;
  • a second calculation sub-module, used to calculate the state information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane;
  • a second acquisition sub-module, used to acquire the state information corresponding to each moment within a first preset time period, where the first preset time period includes the current moment.
  • The second calculation sub-module includes:
  • an acquisition unit, configured to acquire the signal light action of the current intersection at the current moment;
  • a calculation unit, configured to calculate the sum of the lane queue lengths corresponding to the lanes that are allowed to pass under the signal light action of the current intersection at the current moment;
  • a processing unit, configured to obtain the state information of the current intersection at the current moment according to the signal light action of the current intersection at the current moment and the sum of the lane queue lengths corresponding to the lanes allowed to pass.
  • The device also includes:
  • a construction module, used to construct an agent, the agent being constructed to output a signal light action according to state information;
  • a training module, used to perform reinforcement learning training on the agent with the traffic volume of the current intersection within the second preset time period as the reward, and to obtain the trained agent as the preset agent after training is completed.
  • The construction module includes:
  • a first construction submodule configured to construct a signal light action network based on a spatio-temporal graph convolutional network and a first output network, and the signal light action network outputs a signal light action through the first output network;
  • the second construction sub-module is used to construct an evaluation network based on the spatio-temporal graph convolutional network and a second output network, the evaluation network outputs a state value through the second output network, and the state value is used to evaluate the signal light action network performance, the evaluation network shares the parameters of a spatio-temporal graph convolutional network with the signal light action network;
  • the third construction sub-module is used to construct an agent according to the signal light action network and the evaluation network.
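  • The parameter sharing between the signal light action network and the evaluation network can be pictured with a toy two-headed model. In this sketch a single dense layer stands in for the spatio-temporal graph convolutional network, which is a deliberate simplification; the class name `ActorCritic`, the helper functions, and all weights are made-up values for illustration only.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [v / total for v in exps]


class ActorCritic:
    """Two output networks on top of one shared trunk."""

    def __init__(self, trunk, actor_head, critic_head):
        self.trunk = trunk              # shared parameters (weights, bias)
        self.actor_head = actor_head    # first output network
        self.critic_head = critic_head  # second output network

    def features(self, state):
        return relu(linear(state, *self.trunk))

    def act(self, state):
        """Probability distribution over signal light actions."""
        return softmax(linear(self.features(state), *self.actor_head))

    def value(self, state):
        """Scalar state value used to evaluate the action network."""
        return linear(self.features(state), *self.critic_head)[0]


# Hypothetical fixed weights, two state features and three actions.
model = ActorCritic(
    trunk=([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    actor_head=([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    critic_head=([[0.5, 0.5]], [0.0]),
)
action_probs = model.act([2.0, 1.0])
state_value = model.value([2.0, 1.0])
```

Both heads read the same `features`, so any update to the trunk affects the action distribution and the state value together, which is the sharing arrangement the sub-modules describe.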
  • The training module includes:
  • a fourth construction sub-module, used to construct a road network simulation environment according to a preset number of simulated intersections, simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the simulated road lengths, where a constructed agent is placed at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;
  • a first processing sub-module, used to take, at every preset interval, the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and output a signal light action through the constructed agent;
  • a reward sub-module, used to take the traffic volume of each simulated intersection within the second preset time period as the reward after the signal light action is executed, and perform reinforcement learning training on the constructed agent;
  • a second processing sub-module, used to take the signal light action network in the trained agent as the preset agent after training is completed.
  • the device also includes:
  • the post-processing module is used to perform post-processing on the signal light action of the current intersection at the preset time according to the preset post-processing rules, so as to obtain the signal light action of the current intersection at the preset time after post-processing;
  • the second control module is configured to control the current intersection to execute the signal light action at the preset time according to the post-processed signal light action at the current intersection at the preset time.
  • the traffic signal light control device provided by the embodiment of the present invention can be applied to smart phones, computers, servers and other devices capable of controlling traffic signal lights.
  • the data center equipment provided by the embodiments of the present invention can implement various processes implemented by the traffic signal light control method in the above method embodiments, and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.
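  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 5, the electronic device includes a memory 502, a processor 501, and a computer program for the traffic signal light control method that is stored in the memory 502 and can run on the processor 501, wherein:
  • the processor 501 is used to call the computer program stored in the memory 502 and perform the following steps:
  • acquiring state information of the current intersection and adjacent intersections within a first preset time period, and acquiring a graph structure of the current intersection and the adjacent intersections, the state information including the stay position and stay duration of vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;
  • inputting the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
  • controlling, according to the signal light action of the current intersection at the preset time, the signal light at the current intersection to execute the signal light action at the preset time.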
  • The acquiring of state information of the current intersection and adjacent intersections within the first preset time period, performed by the processor 501, includes:
  • the image information of each lane at the current intersection is acquired, and the stay position and stay duration of vehicles in each lane are extracted from the image information of each lane;
  • The calculating, performed by the processor 501, of the state information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane includes:
  • the state information of the current intersection at the current moment is obtained according to the signal light action of the current intersection at the current moment and the sum of the lane queue lengths corresponding to the lanes allowed to pass.
  • The method executed by the processor 501 further includes:
  • the agent is trained by reinforcement learning, and after training is completed the trained agent is obtained as the preset agent.
  • The constructing of the agent, performed by the processor 501, includes:
  • a signal light action network is constructed based on the spatio-temporal graph convolutional network and the first output network, and the signal light action network outputs the signal light action through the first output network;
  • an evaluation network is constructed based on the spatio-temporal graph convolutional network and a second output network, the evaluation network outputs a state value through the second output network, the state value is used to evaluate the performance of the signal light action network, and the evaluation network and the signal light action network share the parameters of one spatio-temporal graph convolutional network;
  • an agent is constructed according to the signal light action network and the evaluation network.
  • The reinforcement learning training performed by the processor 501 on the agent, with the traffic volume of the current intersection within the second preset time period as the reward, to obtain the trained agent as the preset agent after training is completed, includes:
  • a road network simulation environment is constructed, a constructed agent is placed at each simulated intersection, and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;
  • the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment are used as the input of the constructed agent, and the constructed agent outputs a signal light action;
  • the traffic volume of each simulated intersection within the second preset time period is used as the reward, and the constructed agent is trained by reinforcement learning;
  • the signal light action network in the trained agent is used as the preset agent.
  • The method executed by the processor 501 further includes:
  • the current intersection is controlled to execute the signal light action at the preset time.
  • the electronic device provided by the embodiment of the present invention can be applied to smart phones, computers, servers and other devices capable of controlling traffic lights.
  • the electronic device provided by the embodiment of the present invention can realize each process realized by the traffic signal light control method in the above method embodiment, and can achieve the same beneficial effect. To avoid repetition, details are not repeated here.
  • An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the traffic signal light control method provided in the embodiments of the present invention is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

Embodiments of the present invention provide a traffic light control method, comprising: obtaining state information of a current intersection and an adjacent intersection within a first preset time period, and obtaining a graph structure of the current intersection and the adjacent intersection, the state information comprising a stay position and a stay duration of a vehicle in each lane, and the graph structure comprising a connection relationship between the current intersection and the adjacent intersection; inputting the state information and the graph structure into a pre-trained intelligent agent, and predicting a traffic light action at the current intersection at a preset point in time, the intelligent agent being trained by means of reinforcement learning; and controlling a traffic light at the current intersection to execute the traffic light action at the preset point in time according to the traffic light action at the current intersection at the preset point in time. The traffic light action at the current intersection at the preset point in time is accurately predicted from a space-time dimension by means of the intelligent agent, and then the traffic light at the intersection is controlled to execute the traffic light action at the preset point in time, thereby avoiding vehicle congestion or an idle passing time window and thus improving the passing efficiency of a whole road network.

Description

Traffic signal light control method and related equipment
This application claims priority to Chinese patent application No. 202111674229.X, titled "Traffic signal light control method and related equipment", filed with the China Patent Office on December 31, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of traffic signal light control, and in particular to a traffic signal light control method and related equipment.
Background
Traffic signal light control is an essential part of smart city construction, and controlling traffic signal lights effectively is of great significance for alleviating urban traffic congestion. At present, traffic signal lights generally use single-point timing control: within a fixed period, traffic in each direction is released in turn according to a preset phase order and duration. As a result, a lane with many vehicles cannot be fully cleared within the fixed period, and the vehicles that did not pass must wait for the phase of the next cycle before they can continue. A lane with few vehicles, by contrast, has redundant passing time, which leaves the passing time window idle (no vehicles passing). Existing traffic signal light control therefore suffers from low traffic efficiency.
Summary of the invention
An embodiment of the present invention provides a traffic signal light control method. The state information of the current intersection and adjacent intersections within a first preset time period, together with the graph structure of the current intersection and adjacent intersections, is used as the input of an agent, and the agent outputs a signal light action. Because the state information includes timing information and the stay position and stay duration of vehicles in each lane, lane congestion is taken into account; because the graph structure includes the spatial dependence between intersections, the spatial distribution of the intersections is taken into account. The agent can therefore accurately predict the signal light action of the current intersection at a preset time from the spatio-temporal dimension, and the intersection signal light is controlled to execute that action at the preset time, which avoids vehicle congestion or idle passing time windows and improves the traffic efficiency of vehicles across the entire road network.
In a first aspect, an embodiment of the present invention provides a traffic signal light control method, the traffic signal light control method comprising:
acquiring state information of a current intersection and adjacent intersections within a first preset time period, and acquiring a graph structure of the current intersection and the adjacent intersections, the state information including the stay position and stay duration of vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;
inputting the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
controlling, according to the signal light action of the current intersection at the preset time, the signal light at the current intersection to execute the signal light action at the preset time.
Optionally, the acquiring of state information of the current intersection and adjacent intersections within the first preset time period includes:
at the current moment, acquiring image information of each lane at the current intersection, and extracting the stay position and stay duration of vehicles in each lane from the image information of each lane;
calculating the lane queue length corresponding to each lane according to the stay position and stay duration of vehicles in each lane;
calculating the state information of the current intersection at the current moment according to signal light action information and the lane queue length corresponding to each lane;
acquiring the state information corresponding to each moment within the first preset time period, the first preset time period including the current moment.
Optionally, the calculating of the state information of the current intersection at the current moment according to the signal light action information and the lane queue length corresponding to each lane includes:
acquiring the signal light action of the current intersection at the current moment;
calculating the sum of the lane queue lengths corresponding to the lanes that are allowed to pass under the signal light action of the current intersection at the current moment;
obtaining the state information of the current intersection at the current moment according to the signal light action of the current intersection at the current moment and the sum of the lane queue lengths corresponding to the lanes allowed to pass.
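The state calculation in the steps above can be sketched as follows, assuming the state at one moment is the pair (signal light action, sum of queue lengths of the lanes that action allows to pass) and that the first preset time period is kept as a fixed-size sliding window. The lane names, the two-phase mapping, the queue lengths, and the window size of 3 are hypothetical values chosen for the example.

```python
from collections import deque


def intersection_state(action, allowed_lanes_by_action, queue_length_by_lane):
    """Return (action, summed queue length of the lanes permitted under that action)."""
    permitted = allowed_lanes_by_action[action]
    return (action, sum(queue_length_by_lane[lane] for lane in permitted))


# Hypothetical two-phase intersection and per-lane queue lengths.
allowed = {0: ["N_through", "S_through"], 1: ["E_through", "W_through"]}
queues = {"N_through": 4, "S_through": 6, "E_through": 2, "W_through": 1}

# Sliding window standing in for the first preset time period (3 moments assumed).
window = deque(maxlen=3)
for action in (0, 1, 0):
    window.append(intersection_state(action, allowed, queues))
```

With `maxlen` set, appending a new state automatically drops the oldest one, so the window always ends at the current moment.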
Optionally, before the inputting of the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method further includes:
constructing an agent, the agent being constructed to output a signal light action according to state information;
performing reinforcement learning training on the agent with the traffic volume of the current intersection within a second preset time period as the reward, and obtaining the trained agent as the preset agent after training is completed.
Optionally, the constructing of the agent includes:
constructing a signal light action network based on a spatio-temporal graph convolutional network and a first output network, the signal light action network outputting a signal light action through the first output network;
constructing an evaluation network based on the spatio-temporal graph convolutional network and a second output network, the evaluation network outputting a state value through the second output network, the state value being used to evaluate the performance of the signal light action network, and the evaluation network sharing the parameters of one spatio-temporal graph convolutional network with the signal light action network;
constructing the agent according to the signal light action network and the evaluation network.
Optionally, the performing of reinforcement learning training on the agent with the traffic volume of the current intersection within the second preset time period as the reward, and obtaining the trained agent as the preset agent after training is completed, includes:
constructing a road network simulation environment according to a preset number of simulated intersections, simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the simulated road lengths, where a constructed agent is placed at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;
at every preset interval, using the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and outputting a signal light action through the constructed agent;
after the signal light action is executed, using the traffic volume of each simulated intersection within the second preset time period as the reward to perform reinforcement learning training on the constructed agent;
after training is completed, using the signal light action network in the trained agent as the preset agent.
Optionally, after the inputting of the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method further includes:
performing post-processing on the signal light action of the current intersection at the preset time according to preset post-processing rules, to obtain the post-processed signal light action of the current intersection at the preset time;
controlling, according to the post-processed signal light action of the current intersection at the preset time, the current intersection to execute the signal light action at the preset time.
In a second aspect, an embodiment of the present invention provides a traffic signal light control device, the device comprising:
an acquisition module, configured to acquire state information of a current intersection and adjacent intersections within a first preset time period, and acquire a graph structure of the current intersection and the adjacent intersections, the state information including the stay position and stay duration of vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;
a prediction module, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
a first control module, configured to control the signal light at the current intersection to execute the signal light action at the preset time according to the signal light action of the current intersection at the preset time.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps in the traffic signal light control method provided by the embodiments of the present invention when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the traffic signal light control method provided by the embodiments of the present invention.
In the embodiments of the present invention, the state information of the current intersection and adjacent intersections within a first preset time period is acquired, and the graph structure of the current intersection and the adjacent intersections is acquired; the state information includes the stay position and stay duration of vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections. The state information and the graph structure are input into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training. According to the signal light action of the current intersection at the preset time, the signal light at the current intersection is controlled to execute the signal light action at the preset time. Because the state information includes timing information and the stay position and stay duration of vehicles in each lane, the recent congestion of each lane is taken into account; because the graph structure includes the spatial dependence between intersections, the spatial distribution of the intersections is taken into account. The agent can therefore accurately predict the signal light action of the current intersection at the preset time from the spatio-temporal dimension, and the intersection signal light is controlled to execute that action at the preset time, which avoids vehicle congestion or idle passing time windows and improves the traffic efficiency of the overall road network.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a traffic signal light control method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the signal light phases of an intersection provided by an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an agent provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a traffic signal light control device provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a traffic signal light control method provided by an embodiment of the present invention. As shown in FIG. 1, the traffic signal light control method includes:
101. Acquire state information of the current intersection and the adjacent intersections within a first preset time period, and acquire a graph structure of the current intersection and the adjacent intersections.
In the embodiment of the present invention, the state information includes the state information of the current intersection and the state information of the adjacent intersections. The state information of the current intersection includes the stopping positions and dwell times of vehicles in each lane of the current intersection, and the state information of an adjacent intersection includes the stopping positions and dwell times of vehicles in each lane of that adjacent intersection.
The graph structure includes the connection relationship between the current intersection and the adjacent intersections. This connection relationship can be understood as which intersections the current intersection is connected to: vehicles can travel from those intersections to the current intersection, and from the current intersection to those intersections. In the graph structure, the current intersection and the adjacent intersections are nodes, and the connection relationships between them are weighted edges. The shorter the distance between the current intersection and an adjacent intersection, the larger the edge weight; the longer the distance, the smaller the edge weight.
The graph structure may be constructed in advance from the connection relationships and distances between each intersection and the other intersections. It is a fixed structure of the road network and does not change as long as the road network does not change. The graph structure encodes the spatial dependencies between different traffic intersections. Each node represents a traffic intersection, and the edges between nodes can be defined in various ways; for example, each traffic intersection may have an edge to each of its K=4 adjacent traffic intersections, each node may additionally have an edge pointing to itself, and each edge may carry a weight value.
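The graph construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_graph`, the use of planar coordinates, and the weight 1 / (1 + distance) are all assumptions, chosen only so that closer intersections receive larger edge weights and every node has a self-loop.

```python
import math


def build_graph(coords, k=4):
    """Return {node: {neighbor: weight}} for a road network.

    coords maps an intersection id to hypothetical planar (x, y) coordinates.
    Each intersection is linked to its k nearest intersections and to itself.
    The weight 1 / (1 + distance) is an illustrative assumption: it only
    guarantees that a shorter distance yields a larger weight.
    """
    graph = {}
    for i, (xi, yi) in coords.items():
        # Distances from intersection i to every other intersection, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in coords.items()
            if j != i
        )
        edges = {j: 1.0 / (1.0 + d) for d, j in dists[:k]}
        edges[i] = 1.0  # self-loop (distance 0)
        graph[i] = edges
    return graph


coords = {"A": (0, 0), "B": (100, 0), "C": (0, 300), "D": (500, 500)}
graph = build_graph(coords, k=2)
```

Because the road network is fixed, such a structure would only need to be rebuilt when the network itself changes.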
The state information of the current intersection and the adjacent intersections within the first preset time period can be obtained from images captured by cameras installed at the current intersection and the adjacent intersections. Each camera captures images of every lane of the intersection where it is located in real time, producing image information for each lane of that intersection.
Specifically, taking the current intersection as an example: at the current moment, image information of each lane of the current intersection is acquired, and the stopping positions and dwell times of the vehicles in each lane are extracted from it; the lane queue length of each lane is calculated from the stopping positions and dwell times of the vehicles in that lane; the state information of the current intersection at the current moment is calculated from the signal light action information and the lane queue lengths; and the state information corresponding to each moment within the first preset time period, which includes the current moment, is acquired.
After the camera captures the image information of each lane of the current intersection at the current moment, vehicle detection can be performed on the image information to obtain the vehicle information of each lane, which includes each vehicle's stopping position and dwell time.
Specifically, the stopping position of a vehicle may be the preset area of the lane in which the vehicle is located; for example, for the current intersection, the vehicles on each lane within 50 meters of the current intersection may be recorded. The dwell time of a vehicle may be the length of time the vehicle has stayed on the corresponding lane, counted from the moment the vehicle enters the preset area of the lane, for example from the moment it comes within 50 meters of the current intersection.
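The dwell-time bookkeeping described above can be sketched as below. The class name `DwellTracker`, the detection tuple layout, and the rule that a vehicle is forgotten once it is no longer detected inside the zone are assumptions for illustration; the source only states that dwell time is counted from the moment a vehicle enters the preset area (for example, within 50 meters of the intersection).

```python
class DwellTracker:
    """Tracks how long each detected vehicle has been inside the preset zone."""

    def __init__(self, zone_m=50.0):
        self.zone_m = zone_m
        self.entered_at = {}  # vehicle_id -> (lane, time of entry into the zone)

    def update(self, now, detections):
        """detections: iterable of (vehicle_id, lane, distance_to_intersection_m)."""
        seen = set()
        for vehicle_id, lane, dist in detections:
            if dist <= self.zone_m:
                seen.add(vehicle_id)
                # Keep the first entry time; do not reset it on later frames.
                self.entered_at.setdefault(vehicle_id, (lane, now))
        # Assumption: vehicles no longer detected in the zone have left it.
        for vehicle_id in list(self.entered_at):
            if vehicle_id not in seen:
                del self.entered_at[vehicle_id]

    def dwell_times(self, now):
        """Return {lane: {vehicle_id: dwell time}} for vehicles in the zone."""
        result = {}
        for vehicle_id, (lane, t0) in self.entered_at.items():
            result.setdefault(lane, {})[vehicle_id] = now - t0
        return result


tracker = DwellTracker()
tracker.update(0, [("v1", 2, 40.0), ("v2", 2, 80.0)])   # v2 is still outside the zone
tracker.update(10, [("v1", 2, 5.0), ("v2", 2, 45.0)])   # v2 enters the zone at t=10
dwell_at_12 = tracker.dwell_times(12)
```

The per-lane dictionaries returned by `dwell_times` correspond to the vehicle set of each lane and the dwell time of each vehicle in it.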
The lane queue length of each lane is calculated from the stopping positions and dwell times of the vehicles in that lane. Taking lane l as an example, let V_l be the vehicle set of lane l determined from the stopping positions and dwell times of the vehicles in lane l, where every vehicle in V_l is within the preset area of lane l. The lane queue length of lane l can then be expressed by the following formula:
[Formula image PCTCN2022099853-appb-000001: lane queue length of lane l]
where V_l is the set of vehicles on lane l within the preset area from the intersection, t_v is the dwell time of vehicle v on lane l, and [symbol image PCTCN2022099853-appb-000002] and w are hyperparameters. As can be seen, the lane queue length in the embodiment of the present invention takes into account how long vehicles have stayed on the lane. In general, if many vehicles on a lane have long dwell times (that is, large t_v), the corresponding lane queue length is also large.
It should be noted that the lane queue length calculation described above for lane l also applies to the other lanes of the current intersection and to the lanes of other intersections.
In the embodiment of the present invention, the signal light action information may be determined according to the signal light phases. For the phases, refer to FIG. 2, which is a schematic diagram of the signal light phases of an intersection provided by an embodiment of the present invention. In FIG. 2, the traffic intersection is a four-way intersection with 24 lanes in total, numbered 1 to 24. Under the prevailing traffic rules, right turns do not wait for a signal, through and left-turn movements must wait for a signal, the left turn and the through movement of the same approach may proceed simultaneously, and movements from opposing approaches do not cross each other; a four-way intersection therefore has 8 signal light phases in total, numbered 1 to 8. In FIG. 2, the four-way intersection has four approaches (east, south, west and north), and each approach includes a left-turn lane, a through lane, a right-turn lane and 3 oncoming lanes. The north approach includes left-turn lane 1, through lane 2, right-turn lane 3 and oncoming lanes 13, 14 and 15. The east approach includes left-turn lane 4, through lane 5, right-turn lane 6 and oncoming lanes 16, 17 and 18. The south approach includes left-turn lane 7, through lane 8, right-turn lane 9 and oncoming lanes 19, 20 and 21. The west approach includes left-turn lane 10, through lane 11, right-turn lane 12 and oncoming lanes 22, 23 and 24. Phase 1 corresponds to releasing left-turn lane 1 and left-turn lane 7; phase 2 to releasing through lane 2 and through lane 8; phase 3 to releasing left-turn lane 4 and left-turn lane 10; phase 4 to releasing through lane 5 and through lane 11; phase 5 to releasing through lane 2 and left-turn lane 1; phase 6 to releasing through lane 5 and left-turn lane 4; phase 7 to releasing through lane 8 and left-turn lane 7; and phase 8 to releasing through lane 11 and left-turn lane 10. The 8 phases correspond to 8 release actions of the signal light.
In the embodiment of the present invention, the state information of the current intersection can be understood as the state information of the signal light of the current intersection. Its number of dimensions equals the total number of phases of the intersection: when the current intersection is a four-way intersection, the signal light has 8 phases in total, so the state information of the current intersection has 8 dimensions.
Of course, the four-way intersection is used in the embodiment of the present invention only as an example. For traffic intersections with a different number of approaches, the number of dimensions of the state information likewise equals the total number of phases of the intersection.
Optionally, the traffic intersection may also be a three-way intersection, whose signal light has only 3 phases. In the embodiment of the present invention, the 3 corresponding phases may be selected from the phases of the four-way intersection signal light. For example, on the basis of FIG. 2, for a three-way intersection without the north approach (without lanes 1-3 and 13-15), phases 1, 4 and 6 are selectable. Specifically, the phases of the signal light at a three-way intersection are given in Table 1 below:
Table 1
Missing approach    Selectable phases
North               1, 4, 6
East                2, 3, 7
South               1, 4, 8
West                2, 3, 5
Further, in the embodiment of the present invention, the state information of a three-way intersection may use -1 in the dimensions that cannot be selected, which is equivalent to masking the signal light phases that cannot be selected. For example, if the state information of a four-way intersection is (1, 2, 3, 4, 5, 6, 7, 8), the state information of a three-way intersection without the north approach is (1, -1, -1, 4, -1, 6, -1, -1). In this way the state information of every intersection in the road network is an 8-dimensional vector. Reducing the action space of three-way intersections in this way makes the agent's actions more efficient and reasonable, which speeds up the agent's learning.
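The masking of unselectable dimensions can be sketched as follows. The dictionary `SELECTABLE_PHASES` transcribes Table 1; the function name `mask_state` and the string keys for the missing approach are assumptions for illustration.

```python
# Selectable phases per missing approach, transcribed from Table 1.
SELECTABLE_PHASES = {
    "north": (1, 4, 6),
    "east": (2, 3, 7),
    "south": (1, 4, 8),
    "west": (2, 3, 5),
}


def mask_state(state, missing_approach):
    """Replace the dimensions of unselectable phases with -1.

    state is an 8-dimensional sequence indexed by phase (phase 1 at index 0).
    """
    allowed = SELECTABLE_PHASES[missing_approach]
    return [
        value if phase in allowed else -1
        for phase, value in enumerate(state, start=1)
    ]


masked = mask_state([1, 2, 3, 4, 5, 6, 7, 8], "north")
```

With the example state from the text, `masked` is the 8-dimensional vector given there for a three-way intersection without the north approach.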
Optionally, the signal light action of the current intersection at the current moment may be acquired; the sum of the lane queue lengths of the lanes allowed to pass under that signal light action is calculated; and the state information of the current intersection at the current moment is obtained from the signal light action at the current moment and that sum. Specifically, the state information of the current intersection at the current moment consists of two parts: the signal light action of the current intersection at the current moment, and the sum of the lane queue lengths of the lanes allowed to pass.
Signal light actions correspond one-to-one to signal light phases. Taking a four-way intersection as an example, its signal light has 8 phases, so there are also 8 signal light actions, each corresponding to one phase. In the embodiment of the present invention, treating a single phase as one signal light action improves the flexibility of phase selection.
Specifically, the state information of the current intersection contains two parts. The first part is the signal light state of the current intersection at the current moment, encoded with one-hot encoding; for example, the current signal light phase may be 2. The second part is, for each phase, the sum of the lane queue lengths of the lanes corresponding to that phase. Taking a four-way intersection as an example and referring to FIG. 2, phase 1 corresponds to releasing left-turn lane 1 and left-turn lane 7, so its value is the sum of the lane queue lengths of left-turn lane 1 and left-turn lane 7. More specifically, let L_i denote the set of lanes allowed to pass in phase i; the i-th dimension of the state information s can then be defined by the following formula:
[Formula image PCTCN2022099853-appb-000003: i-th dimension of the state information s]
where s_i denotes the i-th dimension of the state information of the current intersection.
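The two parts of the state can be sketched as follows. The phase-to-lane mapping `PHASE_LANES` follows the description of FIG. 2; the per-lane queue lengths are taken as given inputs because the queue length formula itself is only available as an image. The function names are hypothetical, and the sketch deliberately returns the two parts separately, since the text does not spell out how they are combined into the final feature vector.

```python
# Lanes released by each phase, as described for FIG. 2.
PHASE_LANES = {
    1: (1, 7), 2: (2, 8), 3: (4, 10), 4: (5, 11),
    5: (2, 1), 6: (5, 4), 7: (8, 7), 8: (11, 10),
}


def one_hot_phase(current_phase, n_phases=8):
    """One-hot encoding of the phase currently shown by the signal light."""
    return [1 if p == current_phase else 0 for p in range(1, n_phases + 1)]


def phase_queue_sums(queue_len):
    """For each phase i, sum the queue lengths of the lanes in L_i.

    queue_len maps a lane number to its lane queue length; lanes that are
    absent from the mapping are treated as having queue length 0 (assumption).
    """
    return [
        sum(queue_len.get(lane, 0.0) for lane in PHASE_LANES[p])
        for p in range(1, 9)
    ]


queue_len = {1: 3.0, 7: 2.0, 2: 4.0, 5: 1.5}
phase_part = one_hot_phase(2)
queue_part = phase_queue_sums(queue_len)
```

For instance, the first entry of `queue_part` is the queue length of left-turn lane 1 plus that of left-turn lane 7, matching the phase 1 example in the text.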
The state information corresponding to each moment within the first preset time period is acquired, where the first preset time period includes the current moment. Specifically, the first preset time period consists of the most recent H moments, including the current moment; for example, the most recent 5 moments including the current moment.
The state information of the current intersection and the adjacent intersections within the first preset time period can then be expressed as a global state, which is a tensor of shape H*N*F. H is a hyperparameter giving the number of moments in the first preset time period whose state information is used; for example, H=5 means the state information of the most recent 5 moments is used. N is the number of traffic signal lights in the road network, where each intersection uses one signal light (or one signal light system) to control the release of its lanes. F is the dimension of the state information.
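A rolling H*N*F global state can be kept with a fixed-length buffer, as in the sketch below. The class name `StateHistory` and the choice to zero-fill the buffer before H real observations exist are assumptions for illustration; nested lists are used instead of an array library to keep the example dependency-free.

```python
from collections import deque


class StateHistory:
    """Keeps the most recent H frames, each an N x F list of state vectors."""

    def __init__(self, horizon, n_intersections, n_features):
        self.n = n_intersections
        self.f = n_features
        # Assumption: start from all-zero frames until real observations arrive.
        self.frames = deque(
            ([[0.0] * n_features for _ in range(n_intersections)]
             for _ in range(horizon)),
            maxlen=horizon,
        )

    def push(self, frame):
        """Append the N x F state of the current moment, dropping the oldest."""
        if len(frame) != self.n or any(len(row) != self.f for row in frame):
            raise ValueError("frame must have shape N x F")
        self.frames.append([list(row) for row in frame])

    def tensor(self):
        """Return the H x N x F global state, oldest moment first."""
        return [[row[:] for row in frame] for frame in self.frames]


history = StateHistory(horizon=5, n_intersections=3, n_features=8)
for t in range(7):
    history.push([[float(t)] * 8 for _ in range(3)])
global_state = history.tensor()
```

After seven pushes with H=5, the buffer holds the frames for moments 2 to 6, the last of which is the current moment.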
102. Input the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at a preset moment.
In the embodiment of the present invention, the agent is obtained through reinforcement learning training. The input of the agent is the state information H*N*F and the graph structure G, where the state information H*N*F is the state information of the current intersection and the adjacent intersections within the first preset time period, and the graph structure G is the graph structure of the current intersection and the adjacent intersections. The current intersection and the adjacent intersections can form a target road network, so the state information H*N*F may also be called the global state of the target road network, and the graph structure G may also be called the graph structure of the target road network. The output of the agent is a probability distribution over signal light actions at the preset moment, and the action with the highest probability is selected as the signal light action at the preset moment. For example, when F=8, the agent obtains a probability distribution over 8 signal light actions at the preset moment; each signal light action has a probability, and the action with the highest probability is taken as the final signal light action at the preset moment. The preset moment may be the A-th moment after the current moment, for example the moment immediately following the current moment.
The pre-trained agent is obtained through reinforcement learning training. In reinforcement learning, the agent is given rewards, so that it learns with the goal of obtaining more reward.
Optionally, an agent may be constructed so as to output signal light actions according to state information. The agent is then trained by reinforcement learning with the traffic throughput of the current intersection within a second preset time period as the reward, and the trained agent is used as the pre-trained agent. In the road network, one agent is provided at each intersection to predict that intersection's signal light action at the preset moment.
The second preset time period may be the time period during which a signal light action lasts; during this period, vehicles in the corresponding lanes can pass. The agent of the current intersection is rewarded with the traffic throughput within the second preset time period: the higher the throughput, the higher the reward and the stronger the positive incentive. Specifically, let V_t be the set of vehicles passing through the current intersection at moment t; the reward of the current intersection can then be expressed by the following formula:
[Formula image PCTCN2022099853-appb-000004: reward of the current intersection]
As can be seen, the reward of the current intersection takes the vehicle dwell time t_v into account, which makes the agent pay more attention to congested lanes and thereby improves the overall traffic efficiency of the road network.
Optionally, a signal light action network is constructed from a spatio-temporal graph convolutional network and a first output network, and outputs signal light actions through the first output network. An evaluation network is constructed from the spatio-temporal graph convolutional network and a second output network, and outputs a state value through the second output network; the state value is used to evaluate the performance of the signal light action network. The evaluation network and the signal light action network share the parameters of a single spatio-temporal graph convolutional network. The agent is constructed from the signal light action network and the evaluation network.
Further, the spatio-temporal graph convolutional network may include a graph convolutional network, a recurrent neural network and a fully connected network. The graph convolutional network extracts the spatial dependencies between the current intersection and the adjacent intersections from the graph structure. The recurrent neural network extracts the temporal dependencies of the states of the current intersection and the adjacent intersections. The fully connected network fuses the spatial dependencies with the temporal dependencies to obtain spatio-temporal information about the traffic flow at the current intersection and the adjacent intersections.
Still further, the graph convolutional network may be based on GAT layers, and the recurrent neural network may be based on GRU layers. As a type of graph convolutional neural network, the GAT layer captures the spatial correlation between adjacent intersections well, so that the agent can take the states of adjacent intersections into account when making decisions. As a type of recurrent neural network, the GRU layer captures the temporal correlation of intersection states well, so that the agent can take historical states into account when making decisions. Combining GAT layers, GRU layers and several fully connected layers yields a spatio-temporal graph convolutional network that captures the spatio-temporal characteristics of road network traffic flow well.
The first output network may include a linear layer, a mask layer and a classification layer. The linear layer applies a linear transformation to the spatio-temporal features extracted by the spatio-temporal graph convolutional network. The classification layer classifies the linearly transformed feature vector, for example with Softmax, to obtain the probability distribution over the signal light actions. The mask layer masks this probability distribution so that the probability of every unselectable signal light action is 0; it is mainly intended for agents at three-way intersections.
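The effect of the mask and classification layers can be sketched as a masked softmax followed by picking the most probable phase. This is a minimal sketch under assumptions: the function names are hypothetical, and the source does not state whether the remaining probabilities are renormalized after masking; this version renormalizes them so that the selectable phases sum to 1.

```python
import math


def masked_softmax(logits, selectable):
    """Softmax over the selectable phases; unselectable phases get probability 0.

    logits holds one score per phase (phase 1 at index 0), and selectable
    lists the 1-based phase numbers the intersection may choose.
    """
    allowed = {p - 1 for p in selectable}
    peak = max(logits[i] for i in allowed)  # subtract the max for numerical stability
    exps = [
        math.exp(x - peak) if i in allowed else 0.0
        for i, x in enumerate(logits)
    ]
    total = sum(exps)
    return [e / total for e in exps]


def choose_phase(probs):
    """Return the 1-based phase with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1


logits = [2.0, 5.0, 1.0, 0.5, 3.0, 1.0, 4.0, 0.0]
probs = masked_softmax(logits, selectable=(1, 4, 6))  # three-way, north missing
phase = choose_phase(probs)
```

In this example phase 2 has the largest raw score but is not selectable at the three-way intersection, so the choice falls on the best selectable phase instead.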
The second output network may include a linear layer, which applies a linear transformation to the spatio-temporal features extracted by the spatio-temporal graph convolutional network and outputs the state value. The state value is used to evaluate the performance of the signal light action network, that is, the process of going from state information to the probability distribution over signal light actions. During training, both the signal light action network and the evaluation network are adjusted according to the state value, so that the signal light action network performs better and better and the state value output by the evaluation network becomes higher and higher.
It should be noted that when a trained agent is deployed to its intersection, the evaluation network does not need to be deployed with it; only the signal light action network needs to be deployed. Training the agent includes training both the signal light action network and the evaluation network. It should also be noted that a constructed agent includes a state function, an action function, a reward function, the signal light action network and the evaluation network, whereas a trained agent may include only the signal light action network. The state function describes the state information, the action function describes the signal light actions, and the reward function motivates the agent to choose signal light actions with higher traffic throughput.
In a possible embodiment, referring to FIG. 3, FIG. 3 is an architecture diagram of an agent provided by an embodiment of the present invention. As shown in FIG. 3, the signal light action network and the evaluation network may be built on the Actor-Critic framework, in which case the agent includes an Actor network and a Critic network. The upper part of FIG. 3 is the Critic network and the lower part is the Actor network; the two networks share the parameters of the first four layers (the parameters of the spatio-temporal graph convolutional network). This sharing reduces the learning difficulty of the model and speeds up the convergence of agent training. During training, the output of the agent has two parts. One part is the output of the Critic network, which is the state value of each agent. The other part is the output of the Actor network, which is the probability distribution over the signal light actions predicted by the agent. Because different agents (placed at different intersections) may have different selectable signal light actions (for example, an agent at a three-way intersection can select only three phases), a mask operation may be added at the output layer of the Actor network: a mask is applied to the output action distribution of an agent at a three-way intersection so that the probability of every unselectable action is 0.
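The parameter sharing between the Actor and Critic networks can be illustrated with the toy sketch below. It is not the architecture of FIG. 3: the shared trunk here is a single ReLU layer standing in for the GAT, GRU and fully connected layers, the weights are random and untrained, and all names (`ActorCritic`, `init_linear`, `linear`) are hypothetical. The point is only that both heads read the same shared features and that the actor path can be used on its own at deployment.

```python
import random


def init_linear(n_out, n_in, rng):
    """Random weights and zero biases for a dense layer (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer: one weighted sum of x per output unit, plus its bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ActorCritic:
    def __init__(self, n_features, n_hidden=16, n_actions=8, seed=0):
        rng = random.Random(seed)
        # Stand-in for the shared spatio-temporal graph convolutional network.
        self.trunk = init_linear(n_hidden, n_features, rng)
        self.actor_head = init_linear(n_actions, n_hidden, rng)  # first output network
        self.critic_head = init_linear(1, n_hidden, rng)         # second output network

    def features(self, x):
        """Shared features used by both heads (ReLU activation)."""
        return [max(0.0, h) for h in linear(*self.trunk, x)]

    def actor(self, x):
        """Action scores, one per signal light phase."""
        return linear(*self.actor_head, self.features(x))

    def critic(self, x):
        """Scalar state value; needed during training only."""
        return linear(*self.critic_head, self.features(x))[0]


model = ActorCritic(n_features=8)
state = [1.0] * 8
scores = model.actor(state)
value = model.critic(state)
```

Because `actor` depends only on the trunk and the actor head, a deployed agent could drop `critic_head` entirely, in line with the note that only the signal light action network needs to be deployed.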
Optionally, during training, a road network simulation environment may be constructed from a preset number of simulated intersections, the simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road. A constructed agent is placed at each simulated intersection, and the simulation environment randomly generates simulated traffic flow in each simulated lane. At every preset interval, the state information of all simulated intersections within the first preset time period and the graph structure of the simulation environment are input to the constructed agents, which output signal light actions. After the signal light actions are executed, the traffic throughput of each simulated intersection within the second preset time period is used as the reward for reinforcement learning training of the constructed agents. After training is completed, the signal light action network of the trained agent is used as the pre-trained agent.
During training, the traffic flow in the road network simulation environment may be randomly regenerated every M iterations, where M is greater than or equal to 1, to increase the adaptability of the agent to different traffic environments. Likewise, the road network simulation environment itself may be randomly reconstructed every Z iterations, where Z is greater than or equal to 1. The reconstruction randomizes the simulated intersections, the simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, which further increases the adaptability of the agent to different traffic environments.
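As a rough sketch of this schedule, the generator below reports, for each training iteration, whether the traffic flow should be regenerated and whether the road network should be rebuilt. The name `training_schedule` and the convention of counting iterations from 1 are assumptions; the embodiment only fixes the periods M and Z.

```python
def training_schedule(num_iterations, m, z):
    """Yield (iteration, regenerate_traffic, rebuild_network) for each iteration.

    Hypothetical helper: traffic is regenerated every `m` iterations and the
    simulated road network is rebuilt every `z` iterations (both >= 1).
    """
    if m < 1 or z < 1:
        raise ValueError("m and z must be greater than or equal to 1")
    for iteration in range(1, num_iterations + 1):
        yield iteration, iteration % m == 0, iteration % z == 0

# With M = 2 and Z = 3, iteration 6 triggers both randomizations.
schedule = list(training_schedule(6, m=2, z=3))
```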
In the road network simulation environment, the currently observed state S is computed at every preset interval. The states S of all simulated intersections and the graph structure G of the road network simulation environment are taken as the input of the corresponding agent, which outputs the probability distribution over that agent's actions. The agent selects and executes the action with the highest probability, and the reward r obtained after executing the action is stored for use in reinforcement learning training of the agent.
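The selection and storage step might look like the following sketch. `select_action` and `RolloutBuffer` are hypothetical names, and the tuple layout of a stored transition is an assumption; the embodiment states only that the highest-probability action is executed and that the reward r is stored.

```python
def select_action(probabilities):
    """Return the index of the action with the highest probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

class RolloutBuffer:
    """Hypothetical store for (state, action, reward) tuples used in training."""

    def __init__(self):
        self.transitions = []

    def store(self, state, action, reward):
        self.transitions.append((state, action, reward))

# One simulated step: pick the most probable phase, then record the reward r.
buffer = RolloutBuffer()
action = select_action([0.1, 0.6, 0.3])
buffer.store(state=(0, 12), action=action, reward=7.0)
```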
After training in the simulation environment is complete, the agent can be deployed in a real road network. Specifically, a camera is installed at each traffic intersection, and a vehicle detection algorithm running on the edge side (that is, on the camera) obtains, in real time, vehicle information for each lane of the intersection (such as vehicle position and dwell duration). After obtaining the vehicle information, each agent computes its current state and exchanges state information with its adjacent agents; after computation by the signal light action network within the agent, the signal light action for the preset time is output. Note that before making a decision, an agent needs the state of the adjacent intersections in addition to the state of the current intersection, because the graph convolutional neural network uses the state information of adjacent intersections in its computation. This allows multiple agents to cooperate when making decisions and to take the state of adjacent intersections into account.
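A minimal sketch of how an agent could assemble its input from its own state and the states exchanged with adjacent agents is shown below. The function `gather_inputs`, the adjacency dictionary, and the list-valued states are illustrative assumptions, not the data format of the embodiment.

```python
def gather_inputs(intersection_id, adjacency, states):
    """Collect the state of an intersection together with its neighbours' states.

    Hypothetical helper: `adjacency` maps an intersection id to the ids of its
    adjacent intersections, `states` maps an id to that intersection's state.
    """
    neighbours = sorted(adjacency.get(intersection_id, ()))
    ordered_ids = [intersection_id] + neighbours
    return {node: states[node] for node in ordered_ids}

# Assumed three-intersection network in which A is adjacent to B and C.
adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
states = {"A": [1], "B": [2], "C": [3]}
inputs = gather_inputs("A", adjacency, states)
```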
103. According to the signal light action of the current intersection at the preset time, control the signal light of the current intersection to execute the signal light action at the preset time.
Optionally, after the agent predicts the signal light action of the current intersection at the preset time, the signal light of the current intersection may be controlled to execute that action at the preset time, so that at the preset time vehicles in the corresponding lanes can pass according to the signal light action.
Optionally, the signal light action of the current intersection at the preset time may be post-processed according to a preset post-processing rule to obtain a post-processed signal light action of the current intersection at the preset time; according to the post-processed signal light action, the current intersection is controlled to execute the signal light action at the preset time.
It can be understood that post-processing is used to correct the final signal light action and may consist of several rules. For example, the final phase may be corrected by limiting the dwell duration of vehicles in the lanes corresponding to a phase (a phase being equivalent to a signal light action). This is needed because the defined signal light actions are independent of one another, so some phases may go unselected for a long time, leaving vehicles in the corresponding lanes waiting too long.
One post-processing rule may be: if a signal light action has not been selected and the dwell duration of vehicles in the lanes corresponding to that action exceeds a preset threshold, that signal light action is selected so that vehicles in the corresponding lanes can pass. Adding post-processing makes the final action more reasonable.
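The rule above can be sketched as follows. The function `post_process`, the dictionary of per-action dwell durations, and the tie-breaking choice of serving the longest-waiting overdue action are assumptions; the embodiment only requires that an unselected action whose lanes exceed the threshold be selected.

```python
def post_process(predicted_action, dwell_by_action, threshold):
    """Override the predicted action when another action's lanes waited too long.

    Hypothetical helper: `dwell_by_action` maps each action to the longest dwell
    duration (in seconds) of vehicles in the lanes served by that action.
    """
    overdue = {
        action: dwell
        for action, dwell in dwell_by_action.items()
        if action != predicted_action and dwell > threshold
    }
    if not overdue:
        return predicted_action
    # Assumption: when several actions are overdue, serve the longest wait first.
    return max(overdue, key=overdue.get)

# Actions 2 and 3 exceed the assumed 90 s threshold; action 3 waited longest.
final_action = post_process(0, {0: 5, 1: 40, 2: 95, 3: 120}, threshold=90)
```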
In the embodiment of the present invention, state information of the current intersection and of adjacent intersections within a first preset time period is acquired, and a graph structure of the current intersection and the adjacent intersections is acquired, where the state information includes the stop position and dwell duration of vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections; the state information and the graph structure are input into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training; and, according to the signal light action of the current intersection at the preset time, the signal light of the current intersection is controlled to execute the signal light action at the preset time. The state information of the current intersection and the adjacent intersections within the first preset time period, together with the graph structure of the current intersection and the adjacent intersections, is the input of the agent, and the agent outputs the signal light action. Because the state information contains temporal information as well as the stop position and dwell duration of vehicles in each lane, it reflects the recent congestion of each lane; because the graph structure contains the spatial dependencies between intersections, it reflects their spatial distribution. The agent can therefore predict the signal light action of the current intersection at the preset time accurately along both the temporal and the spatial dimension. Controlling the signal light of the intersection to execute that action at the preset time avoids vehicle congestion and idle passage time windows, thereby improving the traffic efficiency of the overall road network.
Note that the traffic light control method provided by the embodiment of the present invention may be applied to devices capable of traffic light control, such as smartphones, computers, and servers.
Referring to FIG. 4, which is a structural diagram of a traffic light control device provided by an embodiment of the present invention, the traffic light control device includes:
an acquisition module 401, configured to acquire state information of the current intersection and of adjacent intersections within a first preset time period, and to acquire a graph structure of the current intersection and the adjacent intersections, where the state information includes the stop position and dwell duration of vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;
a prediction module 402, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
a first control module 403, configured to control, according to the signal light action of the current intersection at the preset time, the signal light of the current intersection to execute the signal light action at the preset time.
Optionally, the acquisition module 401 includes:
a first acquisition submodule, configured to acquire, at the current moment, image information of each lane of the current intersection, and to extract the stop position and dwell duration of vehicles in each lane from the image information of each lane;
a first calculation submodule, configured to calculate the lane queue length of each lane according to the stop position and dwell duration of vehicles in each lane;
a second calculation submodule, configured to calculate the state information of the current intersection at the current moment according to signal light action information and the lane queue length of each lane;
a second acquisition submodule, configured to acquire the state information corresponding to each moment within the first preset time period, where the first preset time period includes the current moment.
Optionally, the second calculation submodule includes:
an acquisition unit, configured to acquire the signal light action of the current intersection at the current moment;
a calculation unit, configured to calculate the sum of the lane queue lengths of the lanes that are permitted to pass under the signal light action of the current intersection at the current moment;
a processing unit, configured to obtain the state information of the current intersection at the current moment according to the signal light action of the current intersection at the current moment and the sum of the lane queue lengths of the lanes permitted to pass.
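The three units of the second calculation submodule can be sketched as below. The function `intersection_state`, the phase and lane names, and the encoding of the state as an (action, queue sum) pair are assumptions for illustration; the embodiment does not fix how the action and the sum are combined into the state.

```python
def intersection_state(current_action, queue_lengths, lanes_by_action):
    """Combine the current action with the queue sum of the lanes it permits.

    Hypothetical helper: `queue_lengths` maps a lane id to its queue length and
    `lanes_by_action` maps an action to the lanes that may pass under it.
    """
    permitted_lanes = lanes_by_action[current_action]
    permitted_queue_sum = sum(queue_lengths[lane] for lane in permitted_lanes)
    # Assumption: the state is encoded as the pair (action, permitted queue sum).
    return (current_action, permitted_queue_sum)

# Assumed two-phase intersection with one lane per approach.
lanes_by_action = {"NS_through": ["N1", "S1"], "EW_through": ["E1", "W1"]}
queue_lengths = {"N1": 4, "S1": 2, "E1": 7, "W1": 1}
state = intersection_state("NS_through", queue_lengths, lanes_by_action)
```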
Optionally, the device further includes:
a construction module, configured to construct an agent, the agent being constructed to output a signal light action according to state information;
a training module, configured to perform reinforcement learning training on the agent with the traffic volume of the current intersection within a second preset time period as the reward, the trained agent obtained after training being used as the preset agent.
Optionally, the construction module includes:
a first construction submodule, configured to construct a signal light action network based on a spatio-temporal graph convolutional network and a first output network, the signal light action network outputting the signal light action through the first output network;
a second construction submodule, configured to construct an evaluation network based on a spatio-temporal graph convolutional network and a second output network, the evaluation network outputting a state value through the second output network, the state value being used to evaluate the performance of the signal light action network, and the evaluation network sharing the parameters of one spatio-temporal graph convolutional network with the signal light action network;
a third construction submodule, configured to construct the agent from the signal light action network and the evaluation network.
Optionally, the training module includes:
a fourth construction submodule, configured to construct a road network simulation environment from a preset number of simulated intersections, the simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, where one constructed agent is placed at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;
a first processing submodule, configured to take, at every preset interval, the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and to output a signal light action through the constructed agent;
a reward submodule, configured to use, after the signal light action is executed, the traffic volume of each simulated intersection within the second preset time period as the reward for reinforcement learning training of the constructed agent;
a second processing submodule, configured to use, after training is complete, the signal light action network of the trained agent as the preset agent.
Optionally, the device further includes:
a post-processing module, configured to post-process the signal light action of the current intersection at the preset time according to a preset post-processing rule, to obtain a post-processed signal light action of the current intersection at the preset time;
a second control module, configured to control, according to the post-processed signal light action of the current intersection at the preset time, the current intersection to execute the signal light action at the preset time.
Note that the traffic light control device provided by the embodiment of the present invention may be applied to devices capable of traffic light control, such as smartphones, computers, and servers.
The traffic light control device provided by the embodiment of the present invention can implement each process of the traffic light control method in the above method embodiment and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.
Referring to FIG. 5, which is a schematic structural diagram of an electronic device provided by an embodiment of the present invention, the electronic device includes a memory 502, a processor 501, and a computer program of the traffic light control method that is stored in the memory 502 and executable on the processor 501, where:
the processor 501 is configured to call the computer program stored in the memory 502 and perform the following steps:
acquiring state information of the current intersection and of adjacent intersections within a first preset time period, and acquiring a graph structure of the current intersection and the adjacent intersections, where the state information includes the stop position and dwell duration of vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;
inputting the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
controlling, according to the signal light action of the current intersection at the preset time, the signal light of the current intersection to execute the signal light action at the preset time.
Optionally, the acquiring, performed by the processor 501, of the state information of the current intersection and of adjacent intersections within the first preset time period includes:
acquiring, at the current moment, image information of each lane of the current intersection, and extracting the stop position and dwell duration of vehicles in each lane from the image information of each lane;
calculating the lane queue length of each lane according to the stop position and dwell duration of vehicles in each lane;
calculating the state information of the current intersection at the current moment according to signal light action information and the lane queue length of each lane;
acquiring the state information corresponding to each moment within the first preset time period, where the first preset time period includes the current moment.
Optionally, the calculating, performed by the processor 501, of the state information of the current intersection at the current moment according to the signal light action information and the lane queue length of each lane includes:
acquiring the signal light action of the current intersection at the current moment;
calculating the sum of the lane queue lengths of the lanes that are permitted to pass under the signal light action of the current intersection at the current moment;
calculating the state information of the current intersection at the current moment according to the signal light action of the current intersection at the current moment and the sum of the lane queue lengths of the lanes permitted to pass.
Optionally, before the inputting of the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method performed by the processor 501 further includes:
constructing an agent, the agent being constructed to output a signal light action according to state information;
performing reinforcement learning training on the agent with the traffic volume of the current intersection within a second preset time period as the reward, the trained agent obtained after training being used as the preset agent.
Optionally, the constructing of the agent performed by the processor 501 includes:
constructing a signal light action network based on a spatio-temporal graph convolutional network and a first output network, the signal light action network outputting the signal light action through the first output network;
constructing an evaluation network based on a spatio-temporal graph convolutional network and a second output network, the evaluation network outputting a state value through the second output network, the state value being used to evaluate the performance of the signal light action network, and the evaluation network sharing the parameters of one spatio-temporal graph convolutional network with the signal light action network;
constructing the agent from the signal light action network and the evaluation network.
Optionally, the performing, by the processor 501, of reinforcement learning training on the agent with the traffic volume of the current intersection within the second preset time period as the reward, the trained agent obtained after training being used as the preset agent, includes:
constructing a road network simulation environment from a preset number of simulated intersections, the simulated roads, the connectivity between simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, where one constructed agent is placed at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;
taking, at every preset interval, the state information of all simulated intersections within the first preset time period and the graph structure corresponding to the road network simulation environment as the input of the constructed agent, and outputting a signal light action through the constructed agent;
using, after the signal light action is executed, the traffic volume of each simulated intersection within the second preset time period as the reward for reinforcement learning training of the constructed agent;
using, after training is complete, the signal light action network of the trained agent as the preset agent.
Optionally, after the inputting of the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method performed by the processor 501 further includes:
post-processing the signal light action of the current intersection at the preset time according to a preset post-processing rule, to obtain a post-processed signal light action of the current intersection at the preset time;
controlling, according to the post-processed signal light action of the current intersection at the preset time, the current intersection to execute the signal light action at the preset time.
Note that the electronic device provided by the embodiment of the present invention may be applied to devices capable of traffic light control, such as smartphones, computers, and servers.
The electronic device provided by the embodiment of the present invention can implement each process of the traffic light control method in the above method embodiment and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the traffic light control method provided by the embodiment of the present invention and achieves the same technical effects. To avoid repetition, details are not repeated here.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above discloses only preferred embodiments of the present invention and does not limit the scope of the rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore remain within the scope covered by the present invention.

Claims (10)

  1. A traffic light control method, characterized by comprising the following steps:
    acquiring state information of a current intersection and of adjacent intersections within a first preset time period, and acquiring a graph structure of the current intersection and the adjacent intersections, wherein the state information includes the stop position and dwell duration of vehicles in each lane, and the graph structure includes the connection relationship between the current intersection and the adjacent intersections;
    inputting the state information and the graph structure into a pre-trained agent to predict a signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
    controlling, according to the signal light action of the current intersection at the preset time, the signal light of the current intersection to execute the signal light action at the preset time.
  2. The traffic light control method according to claim 1, characterized in that the acquiring of the state information of the current intersection and of adjacent intersections within the first preset time period comprises:
    acquiring, at the current moment, image information of each lane of the current intersection, and extracting the stop position and dwell duration of vehicles in each lane from the image information of each lane;
    calculating the lane queue length of each lane according to the stop position and dwell duration of vehicles in each lane;
    calculating the state information of the current intersection at the current moment according to signal light action information and the lane queue length of each lane;
    acquiring the state information corresponding to each moment within the first preset time period, wherein the first preset time period includes the current moment.
  3. The traffic light control method according to claim 2, characterized in that the calculating of the state information of the current intersection at the current moment according to the signal light action information and the lane queue length of each lane comprises:
    acquiring the signal light action of the current intersection at the current moment;
    calculating the sum of the lane queue lengths of the lanes that are permitted to pass under the signal light action of the current intersection at the current moment;
    obtaining the state information of the current intersection at the current moment according to the signal light action of the current intersection at the current moment and the sum of the lane queue lengths of the lanes permitted to pass.
  4. The traffic light control method according to claim 1, characterized in that, before the inputting of the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method further comprises:
    constructing an agent, the agent being constructed to output a signal light action according to state information;
    performing reinforcement learning training on the agent with the traffic volume of the current intersection within a second preset time period as the reward, the trained agent obtained after training being used as the preset agent.
  5. The traffic light control method according to claim 4, characterized in that the constructing of the agent comprises:
    constructing a signal light action network based on a spatio-temporal graph convolutional network and a first output network, the signal light action network outputting the signal light action through the first output network;
    constructing an evaluation network based on a spatio-temporal graph convolutional network and a second output network, the evaluation network outputting a state value through the second output network, the state value being used to evaluate the performance of the signal light action network, and the evaluation network sharing the parameters of one spatio-temporal graph convolutional network with the signal light action network;
    constructing the agent from the signal light action network and the evaluation network.
  6. The traffic signal light control method according to claim 5, characterized in that training the agent by reinforcement learning, using the traffic volume of the current intersection within the second preset time period as the reward, and, once training is complete, taking the trained agent as the preset agent, comprises:
    constructing a road network simulation environment from a preset number of simulated intersections, the simulated roads, the connectivity between the simulated intersections, the maximum speed limit of each simulated road, and the length of each simulated road, wherein one constructed agent is placed at each simulated intersection and the road network simulation environment randomly generates simulated traffic flow in each simulated lane;
    at every preset interval, feeding the state information of all simulated intersections within the first preset time period, together with the graph structure corresponding to the road network simulation environment, to the constructed agent as input, and outputting a signal light action through the constructed agent;
    after the signal light action is executed, training the constructed agent by reinforcement learning, using the traffic volume of each simulated intersection within the second preset time period as the reward;
    after training is complete, taking the signal light action network of the trained agent as the preset agent.
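The training procedure in claim 6 can be sketched, under heavy simplification, as a tabular actor-critic loop on a toy simulator. Everything here is hypothetical: `ToyIntersection` models a single two-phase intersection instead of a road network, the state is reduced to "which queue is longer", and the learning rates and arrival rates are arbitrary. The only element taken from the claim is that the reward is the number of vehicles that pass after the action is executed.

```python
import math
import random


def softmax(prefs):
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyIntersection:
    """Hypothetical simulator: phase 0 serves north-south, phase 1 serves east-west."""

    def __init__(self, rng, capacity=3):
        self.rng = rng
        self.capacity = capacity
        self.queues = [0, 0]

    def observe(self):
        # Coarse state: index of the longer queue (ties resolve to 0).
        return 0 if self.queues[0] >= self.queues[1] else 1

    def step(self, phase):
        # Reward = throughput: vehicles released during this control interval.
        passed = min(self.queues[phase], self.capacity)
        self.queues[phase] -= passed
        for lane in range(2):
            self.queues[lane] += self.rng.randint(0, 2)  # random simulated arrivals
        return passed


def train(episodes=200, steps=50, lr_actor=0.1, lr_critic=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0], [0.0, 0.0]]  # actor: action preferences per state
    values = [0.0, 0.0]               # critic: estimated value per state
    throughput_per_episode = []
    for _ in range(episodes):
        env = ToyIntersection(rng)
        state = env.observe()
        total = 0
        for _ in range(steps):
            probs = softmax(prefs[state])
            action = 0 if rng.random() < probs[0] else 1
            reward = env.step(action)
            next_state = env.observe()
            # One-step temporal-difference error computed by the critic.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += lr_critic * td_error
            # Policy-gradient update of the softmax preferences.
            for a in range(2):
                indicator = 1.0 if a == action else 0.0
                prefs[state][a] += lr_actor * td_error * (indicator - probs[a])
            state = next_state
            total += reward
        throughput_per_episode.append(total)
    return prefs, values, throughput_per_episode
```

After training, only the learned preferences (the actor) would be needed to choose actions, which mirrors the last step of the claim, where the signal light action network alone is kept as the preset agent.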
  7. The traffic signal light control method according to claim 1, characterized in that, after inputting the state information and the graph structure into the pre-trained agent to predict the signal light action of the current intersection at the preset time, the method further comprises:
    post-processing the signal light action of the current intersection at the preset time according to a preset post-processing rule, to obtain a post-processed signal light action of the current intersection at the preset time;
    controlling, according to the post-processed signal light action of the current intersection at the preset time, the current intersection to execute the signal light action at the post-processed preset time.
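Claim 7 does not say what the preset post-processing rule is. Purely as an example of what such a rule could look like, the sketch below enforces a minimum and a maximum green time on the agent's prediction. The function name, the phase encoding as integers, and the thresholds are all assumptions and do not come from the claims.

```python
def post_process(predicted_phase, current_phase, elapsed_green_s,
                 min_green_s=10, max_green_s=60, num_phases=4):
    """Return the phase to apply after hypothetical safety rules.

    Rule 1 (assumed): do not leave the current phase before min_green_s.
    Rule 2 (assumed): do not hold the current phase beyond max_green_s.
    """
    if predicted_phase != current_phase and elapsed_green_s < min_green_s:
        return current_phase                      # too early to switch
    if predicted_phase == current_phase and elapsed_green_s >= max_green_s:
        return (current_phase + 1) % num_phases   # force a change to the next phase
    return predicted_phase                        # prediction passes unchanged
```

For instance, `post_process(1, 0, 5)` keeps phase 0 because only 5 seconds of green have elapsed, while `post_process(1, 0, 15)` lets the switch to phase 1 go through.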
  8. A traffic signal light control device, characterized in that the device comprises:
    an acquisition module, configured to acquire state information of a current intersection and its adjacent intersections within a first preset time period, and to acquire a graph structure of the current intersection and the adjacent intersections, the state information including the stopping position and stopping duration of the vehicles in each lane, and the graph structure including the connection relationship between the current intersection and the adjacent intersections;
    a prediction module, configured to input the state information and the graph structure into a pre-trained agent to predict the signal light action of the current intersection at a preset time, the agent being obtained through reinforcement learning training;
    a first control module, configured to control, according to the signal light action of the current intersection at the preset time, the signal light of the current intersection to execute the signal light action at the preset time.
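The three modules of claim 8 can be pictured as a small pipeline. The sketch below is an assumption-laden illustration: `LaneState`, `build_graph_structure`, and `TrafficLightController` are hypothetical names, the graph structure is represented as a plain adjacency matrix, and the acquisition, prediction, and control steps are injected as callables so that any sensor feed, agent, or signal interface could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class LaneState:
    """State of one lane: where vehicles stopped and for how long (units assumed)."""
    stop_positions_m: List[float]
    stop_durations_s: List[float]


def build_graph_structure(current: str, road_links: Sequence[Tuple[str, str]]):
    """Return (nodes, adjacency) for the current intersection and its direct neighbours."""
    neighbours = sorted({b if a == current else a
                         for a, b in road_links
                         if current in (a, b) and a != b})
    nodes = [current] + neighbours
    index = {name: i for i, name in enumerate(nodes)}
    adjacency = [[0] * len(nodes) for _ in nodes]
    for a, b in road_links:
        if a in index and b in index and a != b:
            adjacency[index[a]][index[b]] = 1
            adjacency[index[b]][index[a]] = 1
    return nodes, adjacency


class TrafficLightController:
    def __init__(self,
                 acquire: Callable[[str], Dict[str, LaneState]],
                 predict: Callable[[Dict[str, Dict[str, LaneState]], List[List[int]]], int],
                 apply: Callable[[str, int, float], None]):
        self.acquire = acquire   # acquisition module: per-intersection lane states
        self.predict = predict   # prediction module: wraps the trained agent
        self.apply = apply       # control module: sends the action to the signal

    def run_once(self, current: str, road_links, preset_time: float) -> int:
        nodes, adjacency = build_graph_structure(current, road_links)
        states = {name: self.acquire(name) for name in nodes}
        action = self.predict(states, adjacency)
        self.apply(current, action, preset_time)
        return action
```

A caller would construct the controller with its own three callables and invoke `run_once` at each control interval.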
  9. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the traffic signal light control method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the traffic signal light control method according to any one of claims 1 to 7.
PCT/CN2022/099853 2021-12-31 2022-06-20 Traffic light control method and related device WO2023123906A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111674229.X 2021-12-31
CN202111674229.XA CN114399909B (en) 2021-12-31 2021-12-31 Traffic signal lamp control method and related equipment

Publications (1)

Publication Number Publication Date
WO2023123906A1 true WO2023123906A1 (en) 2023-07-06

Family

ID=81229791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099853 WO2023123906A1 (en) 2021-12-31 2022-06-20 Traffic light control method and related device

Country Status (2)

Country Link
CN (1) CN114399909B (en)
WO (1) WO2023123906A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399909B (en) * 2021-12-31 2023-05-12 深圳云天励飞技术股份有限公司 Traffic signal lamp control method and related equipment
CN114898576B (en) * 2022-05-10 2023-12-19 阿波罗智联(北京)科技有限公司 Traffic control signal generation method and target network model training method
CN114822037B (en) * 2022-06-01 2023-09-08 浙江大华技术股份有限公司 Traffic signal control method and device, storage medium and electronic device
CN116071939B (en) * 2023-03-24 2023-06-16 华东交通大学 Traffic signal control model building method and control method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243297A (en) * 2020-01-17 2020-06-05 苏州科达科技股份有限公司 Traffic light phase control method, system, device and medium
CN113223305A (en) * 2021-03-26 2021-08-06 中南大学 Multi-intersection traffic light control method and system based on reinforcement learning and storage medium
CN113380054A (en) * 2021-06-09 2021-09-10 湖南大学 Traffic signal lamp control method and system based on reinforcement learning
CN113643528A (en) * 2021-07-01 2021-11-12 腾讯科技(深圳)有限公司 Signal lamp control method, model training method, system, device and storage medium
CN114399909A (en) * 2021-12-31 2022-04-26 深圳云天励飞技术股份有限公司 Traffic signal lamp control method and related equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444028A (en) * 2019-09-06 2019-11-12 科大讯飞股份有限公司 Multiple Intersections Signalized control method, device and equipment
CN111785045B (en) * 2020-06-17 2022-07-05 南京理工大学 Distributed traffic signal lamp combined control method based on actor-critic algorithm
CN112289045B (en) * 2020-10-19 2021-12-21 智邮开源通信研究院(北京)有限公司 Traffic signal control method and device, electronic equipment and readable storage medium
CN113299085A (en) * 2021-06-11 2021-08-24 昭通亮风台信息科技有限公司 Traffic signal lamp control method, equipment and storage medium
CN113627596A (en) * 2021-08-10 2021-11-09 中国科学院自动化研究所 Multi-agent confrontation method and system based on dynamic graph neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117062280A (en) * 2023-08-17 2023-11-14 北京美中爱瑞肿瘤医院有限责任公司 Automatic following system of neurosurgery self-service operating lamp
CN117062280B (en) * 2023-08-17 2024-03-08 北京美中爱瑞肿瘤医院有限责任公司 Automatic following system of neurosurgery self-service operating lamp

Also Published As

Publication number Publication date
CN114399909A (en) 2022-04-26
CN114399909B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
WO2023123906A1 (en) Traffic light control method and related device
CN111260937B (en) Cross traffic signal lamp control method based on reinforcement learning
Jin et al. A group-based traffic signal control with adaptive learning ability
CN109215355A (en) A kind of single-point intersection signal timing optimization method based on deeply study
CN116235229A (en) Method and system for controlling self-adaptive periodic level traffic signals
CN114925836B (en) Urban traffic flow reasoning method based on dynamic multi-view graph neural network
Egea et al. Assessment of reward functions for reinforcement learning traffic signal control under real-world limitations
Liao et al. Time difference penalized traffic signal timing by LSTM Q-network to balance safety and capacity at intersections
WO2021073716A1 (en) Traffic reasoner
Li et al. Deep imitation learning for traffic signal control and operations based on graph convolutional neural networks
CN114038216A (en) Signal lamp control method based on road network division and boundary flow control
Soon et al. Extended pheromone-based short-term traffic forecasting models for vehicular systems
CN114120670B (en) Method and system for traffic signal control
Du et al. Multi-agent deep reinforcement learning with spatio-temporal feature fusion for traffic signal control
Fang et al. Multi-objective traffic signal control using network-wide agent coordinated reinforcement learning
CN113838296B (en) Traffic signal control method, device, equipment and storage medium
Yang et al. Multi-granularity scenarios understanding network for trajectory prediction
Rasheed et al. Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control.
Han et al. Extensible prototype learning for real‐time traffic signal control
CN115331460B (en) Large-scale traffic signal control method and device based on deep reinforcement learning
CN111552294A (en) Outdoor robot path-finding simulation system and method based on time dependence
Samson et al. Smart traffic signal control system for two inter-dependent intersections in Akure, Nigeria
Tian et al. Multi-mode spatial-temporal convolution network for traffic flow forecasting
Paul et al. Intelligent traffic signal management using DRL for a real-time road network in ITS
Clara Fang et al. An intelligent traffic signal control system based on fuzzy theory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913195

Country of ref document: EP

Kind code of ref document: A1