CN112309138A - Traffic signal control method and device, electronic equipment and readable storage medium - Google Patents

Traffic signal control method and device, electronic equipment and readable storage medium

Info

Publication number
CN112309138A
Authority
CN
China
Prior art keywords
intersection
current
stage
state information
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011120565.5A
Other languages
Chinese (zh)
Inventor
王鲁晗
胡天风
胡智群
王刚
傅彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhiyou Open Source Communication Research Institute Beijing Co ltd
Original Assignee
Zhiyou Open Source Communication Research Institute Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhiyou Open Source Communication Research Institute Beijing Co ltd
Priority to CN202011120565.5A
Publication of CN112309138A
Legal status: Pending

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0137 Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • G08G1/0145 Measuring and analyzing of parameters relative to traffic conditions for specific applications for active traffic flow control

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiments of the present disclosure disclose a traffic signal control method, a traffic signal control device, electronic equipment and a readable storage medium. The method comprises: obtaining state information of a current stage of a current intersection; receiving control actions of a previous stage of adjacent intersections of the current intersection; determining an average value of the codes of the control actions of the previous stage of the adjacent intersections; and determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes. The interaction between all agents and the traffic flow environment is thereby treated as a stochastic game, and the dimensionality of the action space is reduced through mean field approximation: the interaction between an agent and all other agents is converted into the interaction between the agent and an average effect of its adjacent agents, which avoids the "curse of dimensionality".

Description

Traffic signal control method and device, electronic equipment and readable storage medium
Technical Field
The disclosure relates to the technical field of intelligent traffic, in particular to a traffic signal control method, a traffic signal control device, electronic equipment and a readable storage medium.
Background
With the rapid development of China's society and economy, car ownership among urban residents keeps rising and road congestion occurs more and more frequently, which directly affects residents' quality of life and travel experience. As a key part of traffic flow management, traffic signal timing plays an important role in controlling traffic flow, relieving road congestion and coordinating regional traffic.
Research on intelligent traffic light regulation based on deep reinforcement learning introduces a deep reinforcement learning algorithm into signal timing: an agent is built at each intersection, traffic flow information of the current intersection is taken as the state, an action output by a deep reinforcement learning network controls the timing scheme of the traffic light, and traffic performance indexes such as queue length and waiting time are set as rewards to guide the agent's learning. This research mainly exploits the fact that deep reinforcement learning can be applied to dynamic and uncertain scenes without deriving a complex mathematical model.
However, the present inventors have found that the prior art has at least the following problems. If the agent at each intersection is treated as an independent individual, each agent collects local state information and makes action decisions on its own; state and action information is not exchanged among agents, so local traffic jams arise easily and good regional coordination cannot be achieved. If the interaction between all agents and the environment is treated as a stochastic game, the action space that each agent must handle includes the action choices of all agents; when handling multiple intersections or even a large-scale road network, the dimensionality of the action space grows exponentially with the number of intersections, causing the curse of dimensionality.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a traffic signal control method, a traffic signal control apparatus, an electronic device, and a readable storage medium.
In a first aspect, a traffic signal control method is provided in an embodiment of the present disclosure.
Specifically, the traffic signal control method includes:
acquiring state information of a current stage of a current intersection;
receiving a control action of a previous stage of an adjacent intersection of the current intersection;
determining an average value of codes of control actions of a previous stage of the adjacent intersection;
and determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection.
With reference to the first aspect, the present disclosure provides in a first implementation manner of the first aspect, where the state information includes vehicle states of a plurality of sub-areas within a range of the current intersection, and the vehicle states include whether vehicles exist in the sub-areas and speeds of the vehicles in the sub-areas.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the obtaining the state information of the current stage of the current intersection includes:
determining attributes of lanes to which each of the plurality of sub-regions belongs;
collecting vehicle states of a plurality of sub-areas within the range of the current intersection;
and according to the attribute of the lane to which the sub-regions belong, mapping the vehicle states of the sub-regions into a state matrix as the state information of the current stage of the current intersection.
With reference to the first aspect, in a third implementation manner of the first aspect, the adjacent intersection includes at least one of:
an intersection separated from the current intersection by a number of intersections smaller than a first threshold;
an intersection whose straight-line distance from the current intersection is smaller than a second threshold;
and an intersection whose road distance to the current intersection is smaller than a third threshold.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the determining a control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection includes:
and inputting the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning so as to obtain the control actions of the current stage of the current intersection.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the method further includes:
acquiring state information of a next stage of the current intersection after executing the control action of the current stage of the current intersection;
determining the vehicle queuing length based on the state information of the next stage of the current intersection;
determining a reward value based on the vehicle queue length;
updating parameters of the predictive model based on the reward value.
With reference to the second implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the method further includes processing the state matrix through a convolutional neural network to extract a state feature.
In a second aspect, a traffic signal control apparatus is provided in an embodiment of the present disclosure.
Specifically, the traffic signal control device includes:
an acquisition module configured to acquire the state information of the current stage of the current intersection;
a receiving module configured to receive a control action of a previous stage of an adjacent intersection of the current intersection;
a first determination module configured to determine an average of encodings of control actions of a previous stage of the adjacent intersection;
a second determination module configured to determine a control action of a current stage of the current intersection based on the state information of the current stage of the current intersection and an average of the codes of the control actions of a previous stage of the adjacent intersection.
In a third aspect, the present disclosure provides an electronic device, including a memory and a processor, where the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to the first aspect or any one of the first to sixth implementation manners of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method according to the first aspect or any one of the first to sixth implementation manners of the first aspect.
According to the technical scheme provided by the embodiment of the disclosure, the state information of the current stage of the current intersection is obtained; the control actions of the previous stage of the adjacent intersections of the current intersection are received; the average value of the codes of those control actions is determined; and the control action of the current stage of the current intersection is determined based on the state information and the average value. The interaction between all agents and the traffic flow environment is thereby treated as a stochastic game, and the dimensionality of the action space is reduced through mean field approximation: the interaction between an agent and all other agents is converted into the interaction between the agent and an average effect of its adjacent agents, which avoids the "curse of dimensionality".
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1A and FIG. 1B show schematic diagrams of application scenarios according to embodiments of the present disclosure;
FIG. 2 illustrates a flow chart of a traffic signal control method according to an embodiment of the disclosure;
FIG. 3 illustrates a flow diagram for obtaining status information according to an embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram for updating model parameters according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a traffic signal control device according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure;
FIG. 7 illustrates a schematic block diagram of a computer system suitable for implementing traffic signal control of embodiments of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The prior art has two problems when applying reinforcement learning to traffic signal control at multiple intersections. If the agent at each intersection is treated as an independent individual, each agent collects local state information and makes action decisions on its own; state and action information is not exchanged among agents, so local traffic jams arise easily and good regional coordination cannot be achieved. If the interaction between all agents and the environment is treated as a stochastic game, the action space that each agent must handle includes the action choices of all agents; when handling multiple intersections or even a large-scale road network, the dimensionality of the action space grows exponentially with the number of intersections, causing the curse of dimensionality.
According to the technical scheme provided by the embodiment of the disclosure, the state information of the current stage of the current intersection is obtained; the control actions of the previous stage of the adjacent intersections of the current intersection are received; the average value of the codes of those control actions is determined; and the control action of the current stage of the current intersection is determined based on the state information and the average value. The interaction between all agents and the traffic flow environment is thereby treated as a stochastic game, and the dimensionality of the action space is reduced through mean field approximation: the interaction between an agent and all other agents is converted into the interaction between the agent and an average effect of its adjacent agents, which avoids the "curse of dimensionality".
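To make the exponential growth concrete, the following worked count is a small illustration under stated assumptions: every intersection chooses among the same number of actions |A|, and the values |A| = 4 and N = 9 are borrowed from the four-phase action example and the nine-intersection network of Fig. 1A given elsewhere in this description. The symbols are introduced here for illustration only.

```latex
\[
  \lvert \mathcal{A}_{\text{joint}} \rvert \;=\; \lvert \mathcal{A} \rvert^{N},
  \qquad
  \lvert \mathcal{A} \rvert = 4,\; N = 9
  \;\Longrightarrow\;
  4^{9} = 262{,}144 .
\]
```

Each additional intersection multiplies the number of joint actions by |A|.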
Fig. 1A and 1B show schematic diagrams of application scenarios according to embodiments of the present disclosure.
A road network structure generally includes a plurality of roads and a plurality of intersections formed where those roads cross. Fig. 1A shows an exemplary road network structure including three lateral roads, three longitudinal roads and the nine intersections they form. Fig. 1B shows a schematic diagram of an exemplary intersection, where lanes entering the intersection are defined as entering lanes and lanes leaving the intersection are defined as exiting lanes. The intersection includes 12 entering lanes and 12 exiting lanes, with three entering lanes and three exiting lanes in each direction. The three entering lanes are a left-turn lane, a straight lane and a right-turn lane, respectively.
It should be noted that Fig. 1A and Fig. 1B above show only exemplary road network structures, and the application of the traffic signal control method and apparatus of the embodiment of the present disclosure is not limited to such structures. For example, there may be more or fewer intersections, there may be other kinds of intersections such as T-intersections, and each intersection may have a lane arrangement different from that of Fig. 1B.
Fig. 2 illustrates a flow chart of a traffic signal control method according to an embodiment of the present disclosure.
As shown in fig. 2, the traffic signal control method includes operations S210 to S240.
In operation S210, state information of a current stage of the current intersection is acquired;
in operation S220, a control action of a previous stage of an adjacent intersection of the current intersection is received;
in operation S230, an average value of the codes of the control actions of the previous stage of the adjacent intersection is determined;
in operation S240, a control action of the current stage of the current intersection is determined based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection.
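The data flow of operations S210 to S240 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `control_step` and its three callable parameters are hypothetical names, and the policy is left as a placeholder for the prediction model described later.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def control_step(
    get_state: Callable[[], Matrix],
    receive_neighbor_actions: Callable[[], Sequence[Sequence[float]]],
    policy: Callable[[Matrix, Vector], int],
) -> int:
    """Run one decision at the current intersection.

    All three callables are hypothetical stand-ins for the detector,
    the communication unit and the prediction model.
    """
    state = get_state()                       # S210: current-stage state
    encodings = receive_neighbor_actions()    # S220: neighbours' previous actions
    count = len(encodings)
    # S230: element-wise average of the encoded actions
    average = [sum(column) / count for column in zip(*encodings)]
    return policy(state, average)             # S240: choose the current action
```

The callables keep the sketch independent of any particular detector, communication link or model.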
According to the embodiment of the present disclosure, in operation S210, state information of a current stage of a current intersection is acquired, where the state information includes vehicle states of a plurality of sub-areas within a range of the current intersection. For example, a detector may be arranged in each direction of each intersection, and the detection range may cover the entire road segment for acquiring the state information of the intersection.
According to the embodiment of the present disclosure, in addition to the above-described detector, a communication unit, a processing unit, and a control unit may be provided for each intersection. The communication unit is used for sending the information of the current intersection to other intersections or receiving the information sent by other intersections. The processing unit is used for executing the calculation task and generating the control action according to the obtained information. The control unit is used for controlling signal change of the signal lamp according to the control action generated by the processing unit.
According to the embodiment of the disclosure, the state information comprises vehicle states of a plurality of sub-areas within a range of a current intersection, and the vehicle states comprise whether vehicles exist in the sub-areas and speeds of the vehicles in the sub-areas.
According to the technical scheme provided by the embodiment of the disclosure, the vehicle state can be simply and effectively described through the existence of the vehicle in the sub-area and the speed of the vehicle in the sub-area, so that the prediction efficiency and effectiveness of the control action are improved.
For example, with a segment length L (e.g., about 6.5 meters) determined from the vehicle length and the inter-vehicle distance, each of the K entering lanes of an intersection can be divided into w small road segments of length L, i.e., sub-areas, so that the traffic state of one intersection can be converted into a K × w matrix (w depends on the length of the lane). A position in the matrix represents the lane position of a vehicle, and each element of the matrix is the vehicle speed at that position. Taking the intersection with 12 entering lanes shown in Fig. 1B as an example, the state information of the intersection can be represented as a 12 × w matrix, and each element in the matrix represents the vehicle state in the corresponding sub-area.
According to the embodiment of the present disclosure, in order to distinguish a stationary vehicle from a sub-area with no vehicle, the matrix element may be defined as 1 when the vehicle speed is 0, and as 0 at a position without a vehicle.
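A minimal sketch of building the K × w state matrix is given below. The function name, the vehicle tuple layout and the use of distance to the stop line as the position measure are assumptions made for illustration. The text does not say how non-zero speeds are scaled, so the sketch simply stores the raw speed for moving vehicles; under that assumption a vehicle moving at exactly 1 would be indistinguishable from a stopped one.

```python
from typing import List, Tuple

SEGMENT_LENGTH_M = 6.5  # example value of L from the text


def build_state_matrix(
    vehicles: List[Tuple[int, float, float]],
    num_lanes: int,
    lane_length_m: float,
) -> List[List[float]]:
    """Return a K x w matrix of vehicle states.

    Each vehicle is (lane_index, distance_to_stop_line_m, speed);
    this tuple layout is hypothetical.
    """
    w = int(lane_length_m // SEGMENT_LENGTH_M)
    matrix = [[0.0] * w for _ in range(num_lanes)]  # 0 means no vehicle
    for lane, distance, speed in vehicles:
        col = int(distance // SEGMENT_LENGTH_M)
        if 0 <= lane < num_lanes and 0 <= col < w:
            # A stopped vehicle is stored as 1 so it differs from an empty cell.
            matrix[lane][col] = 1.0 if speed == 0 else speed
    return matrix
```

With 12 entering lanes of 65 m each, this yields a 12 × 10 matrix.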
Fig. 3 shows a flow chart for obtaining status information according to an embodiment of the present disclosure.
As shown in fig. 3, the operation S210 may include operations S310 to S330.
In operation S310, determining an attribute of a lane to which each of a plurality of the sub-regions belongs;
in operation S320, vehicle states of a plurality of sub-areas within the range of the current intersection are collected;
in operation S330, vehicle states of the plurality of sub-regions are mapped to a state matrix according to attributes of lanes to which the sub-regions belong, as state information of a current stage of the current intersection.
According to an embodiment of the present disclosure, the attribute of a lane may be, for example, left turn, straight, or right turn. Where the lane arrangement differs from that of the intersection shown in Fig. 1B, the original state information may be preprocessed according to a predetermined rule, that is, the vehicle states of the plurality of sub-areas are mapped to the state matrix according to the attributes of the lanes to which the sub-areas belong through operation S330 above. For example, where there are four lanes in the north-south direction, namely one left-turn lane, two straight lanes and one right-turn lane, the vehicle speeds at corresponding (parallel) positions on the two straight lanes are arithmetically averaged, and the result is filled into the corresponding position of the matrix as the vehicle speed at that position of the straight lane; for the left-turn lane and the right-turn lane, the speeds of the vehicles on the lane are filled directly into the corresponding positions of the matrix. As another example, where there are two lanes in the north-south direction, one for left turns and the other shared by straight and right-turn traffic, the elements of the right-turn lane in the matrix are set to 0, and the vehicle speed information of the shared lane is filled into the matrix as the straight lane.
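The two mapping rules above can be sketched as a single helper. This is an illustrative sketch under assumptions: `map_approach` is a hypothetical name, each physical lane is supplied with the matrix row it should feed, and a lane shared by straight and right-turn traffic is labelled "straight" by the caller, following the second example.

```python
from typing import Dict, List, Sequence, Tuple

ROW_ORDER = ("left", "straight", "right")


def map_approach(
    lanes: Sequence[Tuple[str, Sequence[float]]], w: int
) -> List[List[float]]:
    """Map the physical lanes of one approach onto three matrix rows.

    Each lane is (target_row, per-segment speeds of length w).
    Lanes that feed the same row are averaged element-wise;
    a row with no lane of its own is filled with zeros.
    """
    grouped: Dict[str, List[Sequence[float]]] = {name: [] for name in ROW_ORDER}
    for row_name, speeds in lanes:
        grouped[row_name].append(speeds)
    rows = []
    for name in ROW_ORDER:
        group = grouped[name]
        if not group:
            rows.append([0.0] * w)
        else:
            rows.append([sum(cell) / len(group) for cell in zip(*group)])
    return rows
```

The output always has three rows per approach, so intersections with different lane layouts produce matrices of the same shape.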
According to the technical scheme provided by the embodiment of the disclosure, the attribute of the lane to which each sub-area in a plurality of sub-areas belongs is determined; collecting vehicle states of a plurality of sub-areas within the range of the current intersection; according to the attribute of the lanes to which the sub-regions belong, the vehicle states of the sub-regions are mapped into a state matrix as the state information of the current stage of the current intersection, the state information has better expansibility, can adapt to different road network structures, has lower dimensionality, can simply and effectively describe the conditions of the intersection, and improves the efficiency and effectiveness of prediction of control actions.
According to the embodiment of the present disclosure, in operation S220, a control action of a previous stage of an adjacent intersection of the current intersection is received, wherein the adjacent intersection includes at least one of:
an intersection separated from the current intersection by a number of intersections smaller than a first threshold;
an intersection whose straight-line distance from the current intersection is smaller than a second threshold;
and an intersection whose road distance to the current intersection is smaller than a third threshold.
According to the embodiment of the present disclosure, the first threshold may be an integer greater than or equal to 1. When the first threshold is equal to 1, the intersections separated from the current intersection by fewer intersections than the first threshold are the intersections directly adjacent to the current intersection. For example, in the embodiment illustrated in FIG. 1A, the intersections adjacent to intersection 5 are intersections 2, 4, 6 and 8. As another example, when the first threshold is equal to 2, the adjacent intersections of intersection 1 are intersections 2, 3, 4, 5 and 7. According to an embodiment of the present disclosure, the second threshold and the third threshold are both positive numbers, e.g., 5 km.
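The first criterion can be sketched as a breadth-first search over the road network. This is an illustration under assumptions: the 3 × 3 grid is assumed to be numbered 1 to 9 row by row, which is consistent with the two examples above, and both function names are hypothetical. Having fewer than `first_threshold` intersections in between is the same as being at most `first_threshold` hops away.

```python
from collections import deque
from typing import Dict, List


def grid_3x3() -> Dict[int, List[int]]:
    """Adjacency of a 3 x 3 grid numbered 1..9 row by row (assumed layout)."""
    adjacency = {}
    for node in range(1, 10):
        row, col = divmod(node - 1, 3)
        neighbors = []
        if row > 0:
            neighbors.append(node - 3)
        if row < 2:
            neighbors.append(node + 3)
        if col > 0:
            neighbors.append(node - 1)
        if col < 2:
            neighbors.append(node + 1)
        adjacency[node] = neighbors
    return adjacency


def adjacent_intersections(
    adjacency: Dict[int, List[int]], current: int, first_threshold: int
) -> List[int]:
    """Intersections with fewer than first_threshold intersections in between."""
    hops = {current: 0}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        if hops[node] == first_threshold:
            continue  # expanding further would exceed the threshold
        for nxt in adjacency[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return sorted(n for n in hops if n != current)
```

Because the search stops at the threshold, the cost depends on the neighbourhood size rather than on the size of the whole road network.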
According to the technical scheme provided by the embodiment of the disclosure, the adjacent intersections are determined through one or more ways, only the data of the adjacent intersections are received, and the data of all the intersections are prevented from being collected, so that the method can be suitable for road network structures of any scale.
According to the embodiment of the present disclosure, in operation S230, an average value of the codes of the control actions of the previous stage of the adjacent intersections is determined. In each signal light cycle (e.g., 15 s), each agent selects a control action, which may include, for example: north-south straight, north-south left turn, east-west straight, east-west left turn. The action of an agent can thus be represented as a 4-dimensional vector, i.e., the action space of the agent can be represented with one-hot codes {[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]}; for example, the east-west straight control action can be represented by the code [0,0,1,0].
According to embodiments of the present disclosure, an average value of the codes of the control actions of the adjacent intersections can be determined. For example, when the adjacent intersections of intersection 5 shown in Fig. 1A are intersections 2, 4, 6 and 8, and the codes of the control actions of intersections 2, 4, 6 and 8 are [1,0,0,0], [0,1,0,0], [0,0,1,0] and [1,0,0,0], respectively, the average value is [0.5, 0.25, 0.25, 0], which is still a vector of the same dimension whose elements sum to 1.
It should be noted that the arithmetic sum of the codes of the adjacent intersections' control actions has an effect equivalent to that of the average value and may be used instead. For example, the arithmetic sum of [1,0,0,0], [0,1,0,0], [0,0,1,0], [1,0,0,0] is [2,1,1,0]; the only difference is that its elements do not sum to 1, and normalization may be performed as necessary.
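The encoding, the average and the sum-then-normalize alternative can be reproduced with a few lines of Python. The function names are illustrative; the action order follows the list given above.

```python
from typing import List, Sequence

ACTIONS = (
    "north-south straight",
    "north-south left",
    "east-west straight",
    "east-west left",
)


def one_hot(action_index: int, size: int = len(ACTIONS)) -> List[float]:
    """Encode an action index as a one-hot vector."""
    return [1.0 if i == action_index else 0.0 for i in range(size)]


def mean_action(encodings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of the neighbours' encoded actions."""
    count = len(encodings)
    return [sum(column) / count for column in zip(*encodings)]


def action_sum(encodings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum, the alternative mentioned in the text."""
    return [sum(column) for column in zip(*encodings)]


def normalize(vector: Sequence[float]) -> List[float]:
    """Rescale a non-negative vector so that its elements sum to 1."""
    total = sum(vector)
    return [v / total for v in vector]
```

Normalizing the sum gives back the average, which is why the two forms carry the same information once the number of neighbours is fixed.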
According to the embodiment of the present disclosure, in operation S240, the determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection includes inputting the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning to obtain the control action of the current stage of the current intersection.
Reinforcement learning is one of the paradigms and methodologies of machine learning to describe and solve the problem of agents (agents) learning strategies to maximize returns or achieve specific goals during interactions with the environment. In the embodiment of the present disclosure, the prediction model run by the processing unit of each intersection can be abstracted into an agent, and the state information of each intersection is the state information of the environment faced by the agent.
According to the embodiment of the disclosure, the prediction model may be built mainly on, for example, a DQN (Deep Q-Network). Because the state input of the model has a large dimension, a convolutional neural network can be used to extract state features. The traffic flow state information passes through two layers of convolution and pooling and is then flattened in preparation for input to a fully connected layer. The average value of the codes of the control actions of the adjacent intersections described above is introduced at this point. The arithmetic mean of the actions of adjacent agents is still a vector of the same dimension as a single action code, and can be understood as the empirical distribution of the action choices of the agents around the intersection. Feeding the average action information into the deep neural network as part of the state enables information interaction between intersections while adding only a few dimensions, which avoids the curse of dimensionality frequently encountered in multi-agent reinforcement learning. In an actual traffic scene, information can be exchanged instantly through real-time communication, remote cloud control or similar means, and existing 5G communication technology can meet the technical requirements of the model. After the average action of the adjacent agents is collected and calculated, it is concatenated with the traffic flow state vector that has been convolved, pooled and flattened, and the result is input to the fully connected layer of the neural network.
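The concatenation step can be sketched in plain Python. This is a simplified illustration, not the disclosed network: the convolution and pooling layers are omitted and their output is represented by ready-made feature maps, a single linear layer stands in for the fully connected part, and all names and weights are hypothetical.

```python
from typing import List, Sequence


def flatten(feature_maps: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Flatten a stack of 2-D feature maps into one vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def build_input(
    feature_maps: Sequence[Sequence[Sequence[float]]],
    mean_action: Sequence[float],
) -> List[float]:
    """Concatenate flattened state features with the neighbours' mean action."""
    return flatten(feature_maps) + list(mean_action)


def dense(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """One fully connected layer without activation: y = W x + b."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
```

With four possible actions, the input grows by four elements regardless of how many adjacent intersections contribute to the mean.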
According to the embodiment of the disclosure, the action policy is a Boltzmann exploration policy whose learning rate decays over time, so as to balance exploration and exploitation during the agent's learning process and to avoid the situation in which a greedy policy, combined with the dynamic instability of the environment, prevents the algorithm from eventually converging.
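A Boltzmann (softmax) exploration policy can be sketched as follows. The disclosure gives no formula, so this sketch assumes that the quantity decaying over time acts as the softmax temperature; the decay schedule, its constants and all function names are hypothetical.

```python
import numpy as np

def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a lower temperature gives a greedier choice."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decayed_temperature(step, t0=1.0, decay=0.995, t_min=0.05):
    """Assumed exponential decay with a floor (hypothetical constants)."""
    return max(t_min, t0 * decay ** step)

def select_action(q_values, step, rng):
    """Sample an action index from the Boltzmann distribution at this step."""
    p = boltzmann_probs(q_values, decayed_temperature(step))
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
action = select_action([1.0, 2.0, 3.0], step=100, rng=rng)
```

Early in training the high temperature spreads probability over all phases; as it decays, the policy concentrates on the action with the highest Q-value without ever becoming strictly greedy.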
According to embodiments of the present disclosure, a model may first be trained in a simulation environment. While the agent interacts with the simulated traffic environment, it selects an action according to the current state of the environment, and the environment feeds back a reward value and a new state. During this process, the parameters are updated by gradient descent on the objective function, and a mature deep neural network is finally obtained.
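The disclosure does not write out the objective function. As a sketch under the usual DQN assumptions, with the previous-stage mean action of the adjacent intersections treated as part of the state, the target and squared loss for one transition could take the following form. Every symbol is an assumption introduced for illustration: $s_t$ is the state at stage $t$, $a_t$ the chosen action, $\bar{a}_{t-1}$ the mean encoded action of the adjacent agents from the previous stage, $r_t$ the reward, $\gamma$ a discount factor, $\theta$ the network parameters, and $\theta^{-}$ the parameters of a target network, which is standard in DQN but not stated in the text.

```latex
y_t = r_t + \gamma \max_{a'} Q\left(s_{t+1}, \bar{a}_{t}, a'; \theta^{-}\right),
\qquad
L(\theta) = \left( y_t - Q\left(s_t, \bar{a}_{t-1}, a_t; \theta\right) \right)^{2}
```

Gradient descent on $L(\theta)$ then moves the predicted value of the chosen action toward the target $y_t$.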
After training of the agent is completed, the traffic state information collected by the traffic detector and the average action information of the agents are input into the deep neural network, and the agent can provide an optimal signal timing scheme for the current traffic flow state. The waiting time, queue length and average speed of the traffic flow at each intersection can also be output. Through dynamic action decisions, the signal cycle can be fully utilized, phase loss is reduced, and the effectiveness of the traffic signal control system is improved.
According to the technical scheme provided by the embodiment of the disclosure, the average action is treated as part of the state: it is concatenated with the state information collected by the detector and input into the prediction model based on deep reinforcement learning to obtain the control action of the current stage of the current intersection. The prediction model can thus generate the control action for optimal signal timing in the current traffic flow state, and it can keep learning during application to continuously improve the timing strategy.
FIG. 4 shows a flow chart for updating model parameters according to an embodiment of the disclosure.
As shown in fig. 4, the method may further include operations S410 to S440.
In operation S410, state information of a next stage of the current intersection, after the control action of the current stage of the current intersection is performed, is acquired;
in operation S420, a vehicle queue length is determined based on the state information of the next stage of the current intersection;
in operation S430, a reward value is determined based on the vehicle queue length. For example, the negative of the total number of vehicles in the 12 entering lanes of the intersection, or the negative of the total queue length of the traffic flow at the intersection, can be used as the reward value, so that the longer the queue or the greater the total number of vehicles, the lower the reward value;
in operation S440, parameters of the prediction model are updated based on the reward value.
According to the technical scheme provided by the embodiment of the disclosure, the state information of the next stage of the current intersection after the control action of the current stage of the current intersection is executed is obtained; determining the vehicle queuing length based on the state information of the next stage of the current intersection; determining a reward value based on the vehicle queue length; and updating the parameters of the prediction model based on the reward value, so that the aim of minimizing the queuing length can be fulfilled, and the timing strategy can be continuously improved.
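The queue-length and reward steps can be sketched in Python as below. The disclosure does not say how a queue is detected from the state information; this sketch assumes that an occupied sub-area whose vehicle speed is at or below a small threshold counts as one queued vehicle. The threshold `stop_speed` and both function names are hypothetical.

```python
import numpy as np

def queue_length(presence, speed, stop_speed=0.1):
    """Count occupied sub-areas whose vehicle is (nearly) stationary."""
    presence = np.asarray(presence, dtype=bool)
    speed = np.asarray(speed, dtype=float)
    return int(np.sum(presence & (speed <= stop_speed)))

def reward_from_queue(presence, speed):
    """Reward is the negative of the queue length: longer queue, lower reward."""
    return -queue_length(presence, speed)

# Two lanes with four sub-areas each (rows are lanes, columns are sub-areas).
presence = [[1, 1, 0, 1],
            [0, 1, 1, 0]]
speed = [[0.0, 0.0, 0.0, 8.3],
         [0.0, 0.05, 5.0, 0.0]]
reward = reward_from_queue(presence, speed)   # three stopped vehicles -> -3
```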
According to the embodiment of the disclosure, the method may further include sending the control action of the current stage of the current intersection to the control device of the adjacent intersection, so that the adjacent intersection can generate the timing control action by adopting a similar method.
Fig. 5 illustrates a block diagram of a traffic signal control device 500 according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both.
As shown in fig. 5, the traffic signal control apparatus 500 includes an obtaining module 510, a receiving module 520, a first determining module 530, and a second determining module 540.
An obtaining module 510 configured to obtain state information of a current stage of a current intersection;
a receiving module 520 configured to receive a control action of a previous stage of an adjacent intersection of the current intersection;
a first determination module 530 configured to determine an average value of the codes of the control actions of the previous stage of the adjacent intersection;
a second determining module 540 configured to determine the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the encoded average value of the control actions of the previous stage of the adjacent intersection.
According to the technical scheme provided by the embodiment of the disclosure, the state information of the current stage of the current intersection is obtained; a control action of a previous stage of an adjacent intersection of the current intersection is received; an average value of the codes of the control actions of the previous stage of the adjacent intersection is determined; and the control action of the current stage of the current intersection is determined based on the state information of the current stage of the current intersection and that average value. The interaction between all agents and the traffic flow environment is thereby treated as a stochastic game, the dimensionality of the action space is reduced through a mean field approximation, and the interaction between an agent and all other agents is converted into the interaction between the agent and an average effect of its adjacent agents, which avoids the curse of dimensionality.
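The averaging performed by the first determination module can be sketched as follows. This passage does not specify the encoding of a control action, so the sketch assumes a one-hot code per action index; returning a zero vector when no neighbour has reported is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def one_hot(action, n_actions):
    """Encode an action index as a one-hot vector (assumed encoding)."""
    v = np.zeros(n_actions)
    v[action] = 1.0
    return v

def mean_action(neighbor_actions, n_actions):
    """Arithmetic mean of the neighbours' encoded previous-stage actions."""
    if len(neighbor_actions) == 0:
        return np.zeros(n_actions)    # assumption: no neighbours, zero vector
    return np.mean([one_hot(a, n_actions) for a in neighbor_actions], axis=0)

# Four adjacent intersections chose phases 0, 2, 2 and 3 in the previous stage.
avg = mean_action([0, 2, 2, 3], n_actions=4)   # [0.25, 0.0, 0.5, 0.25]
```

The result has the same length as a single action code however many neighbours contribute, which is why the mean field approximation keeps the input dimension fixed.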
According to the embodiment of the disclosure, the state information comprises vehicle states of a plurality of sub-areas within a range of a current intersection, and the vehicle states comprise whether vehicles exist in the sub-areas and speeds of the vehicles in the sub-areas.
According to the technical scheme provided by the embodiment of the disclosure, the vehicle state can be simply and effectively described through the existence of the vehicle in the sub-area and the speed of the vehicle in the sub-area, so that the prediction efficiency and effectiveness of the control action are improved.
According to the embodiment of the present disclosure, the obtaining module 510 includes a first determining sub-module, a collecting sub-module, and a mapping sub-module.
A first determining sub-module configured to determine an attribute of the lane to which each of the plurality of sub-areas belongs;
a collecting sub-module configured to collect vehicle states of the plurality of sub-areas within the range of the current intersection;
a mapping sub-module configured to map the vehicle states of the plurality of sub-areas into a state matrix according to the attributes of the lanes to which the sub-areas belong, the state matrix serving as the state information of the current stage of the current intersection.
According to the technical scheme provided by the embodiment of the disclosure, the attribute of the lane to which each of the plurality of sub-areas belongs is determined; vehicle states of the plurality of sub-areas within the range of the current intersection are collected; and, according to the attributes of the lanes to which the sub-areas belong, the vehicle states of the sub-areas are mapped into a state matrix that serves as the state information of the current stage of the current intersection. State information in this form scales well, adapts to different road network structures, has a low dimension, and describes the conditions at the intersection simply and effectively, which improves the efficiency and effectiveness of predicting control actions.
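One possible layout of such a state matrix is sketched below. The disclosure does not fix the layout, so the sketch assumes one row per lane, ordered by a lane attribute such as approach direction and turn type, one column per sub-area, and two channels holding presence and normalised speed. The attribute tuples, `max_speed` and the function name are hypothetical.

```python
import numpy as np

def build_state_matrix(cells, lane_order, cells_per_lane, max_speed=15.0):
    """Map occupied sub-areas to an array of shape (2, n_lanes, cells_per_lane).

    cells: iterable of (lane_attribute, cell_index, speed) for occupied sub-areas.
    Channel 0 holds presence (0 or 1); channel 1 holds speed scaled to [0, 1].
    """
    row = {lane: i for i, lane in enumerate(lane_order)}
    state = np.zeros((2, len(lane_order), cells_per_lane))
    for lane, cell_index, speed in cells:
        state[0, row[lane], cell_index] = 1.0
        state[1, row[lane], cell_index] = min(speed, max_speed) / max_speed
    return state

# Rows follow the lane attribute (approach, movement), not the physical lane ID,
# so intersections with different geometry can share one matrix layout.
lane_order = [("N", "left"), ("N", "through"), ("S", "left")]
cells = [(("N", "through"), 0, 0.0),   # stopped vehicle at the stop line
         (("S", "left"), 2, 7.5)]      # moving vehicle further upstream
state = build_state_matrix(cells, lane_order, cells_per_lane=4)
```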
According to an embodiment of the present disclosure, the adjacent intersection includes at least one of:
an intersection separated from the current intersection by a number of intersections smaller than a first threshold;
an intersection whose straight-line distance from the current intersection is smaller than a second threshold;
an intersection whose road distance from the current intersection is smaller than a third threshold.
According to the technical scheme provided by the embodiment of the disclosure, the adjacent intersections are determined through one or more ways, only the data of the adjacent intersections are received, and the data of all the intersections are prevented from being collected, so that the device can be suitable for road network structures of any scale.
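The three criteria for selecting adjacent intersections can be sketched as a simple filter. The sketch assumes that an intersection qualifies when it satisfies any one of the enabled criteria; the `Candidate` fields, the threshold values and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str
    hops: int               # intersections between it and the current one
    straight_dist: float    # straight-line distance in metres
    road_dist: float        # distance along the road in metres

def is_adjacent(c: Candidate,
                max_hops: Optional[int] = None,
                max_straight: Optional[float] = None,
                max_road: Optional[float] = None) -> bool:
    """True if the candidate meets at least one enabled threshold criterion."""
    checks = []
    if max_hops is not None:
        checks.append(c.hops < max_hops)
    if max_straight is not None:
        checks.append(c.straight_dist < max_straight)
    if max_road is not None:
        checks.append(c.road_dist < max_road)
    return any(checks)

candidates = [
    Candidate("A", hops=0, straight_dist=300.0, road_dist=320.0),
    Candidate("B", hops=2, straight_dist=450.0, road_dist=900.0),
    Candidate("C", hops=5, straight_dist=2000.0, road_dist=2600.0),
]
adjacent = [c.name for c in candidates
            if is_adjacent(c, max_hops=1, max_straight=500.0)]
```

Because only the selected neighbours report their actions, the amount of data each controller receives does not grow with the size of the road network.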
According to the embodiment of the present disclosure, the second determining module 540 is configured to input the state information of the current stage of the current intersection and the encoded average value of the control actions of the previous stage of the adjacent intersection to the prediction model based on the deep reinforcement learning to obtain the control actions of the current stage of the current intersection.
According to the technical scheme provided by the embodiment of the disclosure, the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection are input to the prediction model based on the deep reinforcement learning to obtain the control actions of the current stage of the current intersection, the control actions can be generated by using the prediction model based on the deep reinforcement learning, and the control actions can be self-learned continuously in the application process to continuously improve the timing strategy.
According to an embodiment of the present disclosure, the apparatus may further include a parameter update module configured to perform the following operations:
acquiring state information of a next stage of the current intersection after executing the control action of the current stage of the current intersection;
determining the vehicle queuing length based on the state information of the next stage of the current intersection;
determining a reward value based on the vehicle queue length;
updating parameters of the predictive model based on the reward value.
According to the technical scheme provided by the embodiment of the disclosure, the state information of the next stage of the current intersection after the control action of the current stage of the current intersection is executed is obtained; determining the vehicle queuing length based on the state information of the next stage of the current intersection; determining a reward value based on the vehicle queue length; and updating the parameters of the prediction model based on the reward value, so that the aim of minimizing the queuing length can be fulfilled, and the timing strategy can be continuously improved.
According to the embodiment of the disclosure, the apparatus may further include a sending module configured to send the control action of the current stage of the current intersection to the control device of the adjacent intersection.
According to the technical scheme provided by the embodiment of the disclosure, the control action of the current stage of the current intersection is sent to the control equipment of the adjacent intersection, so that the adjacent intersection can generate the timing control action by adopting a similar method.
The present disclosure also discloses an electronic device, and fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a memory 601 and a processor 602, wherein the memory 601 is configured to store one or more computer instructions, and wherein the one or more computer instructions are executed by the processor 602 to implement the following operations:
acquiring state information of a current stage of a current intersection;
receiving a control action of a previous stage of an adjacent intersection of the current intersection;
determining an average value of codes of control actions of a previous stage of the adjacent intersection;
and determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection.
According to the embodiment of the disclosure, the state information comprises vehicle states of a plurality of sub-areas within a range of a current intersection, and the vehicle states comprise whether vehicles exist in the sub-areas and speeds of the vehicles in the sub-areas.
According to the embodiment of the present disclosure, the acquiring the state information of the current stage of the current intersection includes:
determining attributes of lanes to which each of the plurality of sub-regions belongs;
collecting vehicle states of a plurality of sub-areas within the range of the current intersection;
and according to the attribute of the lane to which the sub-regions belong, mapping the vehicle states of the sub-regions into a state matrix as the state information of the current stage of the current intersection.
According to an embodiment of the present disclosure, the adjacent intersection includes at least one of:
an intersection separated from the current intersection by a number of intersections smaller than a first threshold;
an intersection whose straight-line distance from the current intersection is smaller than a second threshold;
an intersection whose road distance from the current intersection is smaller than a third threshold.
According to an embodiment of the present disclosure, the determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection includes:
and inputting the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning so as to obtain the control actions of the current stage of the current intersection.
According to an embodiment of the present disclosure, the processor 602 is further configured to perform:
acquiring state information of a next stage of the current intersection after executing the control action of the current stage of the current intersection;
determining the vehicle queuing length based on the state information of the next stage of the current intersection;
determining a reward value based on the vehicle queue length;
updating parameters of the predictive model based on the reward value.
According to an embodiment of the present disclosure, the processor 602 is further configured to process the state matrix through a convolutional neural network to extract state features.
FIG. 7 illustrates a schematic block diagram of a computer system suitable for implementing traffic signal control of embodiments of the present disclosure.
As shown in fig. 7, the computer system 700 includes a processing unit 701 that can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The processing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary. The processing unit 701 may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or other processing units.
In particular, the methods described above may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described method. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features disclosed in (but not limited to) the present disclosure that have similar functions.

Claims (10)

1. A traffic signal control method, comprising:
acquiring state information of a current stage of a current intersection;
receiving a control action of a previous stage of an adjacent intersection of the current intersection;
determining an average value of codes of control actions of a previous stage of the adjacent intersection;
and determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection.
2. The method of claim 1, wherein the status information comprises vehicle status for a plurality of sub-areas within range of a current intersection, the vehicle status comprising:
whether a vehicle is present within the sub-region;
the speed of the vehicle within the sub-region.
3. The method of claim 2, wherein the obtaining of the state information of the current stage of the current intersection comprises:
determining attributes of lanes to which each of the plurality of sub-regions belongs;
collecting vehicle states of a plurality of sub-areas within the range of the current intersection;
and according to the attribute of the lane to which the sub-regions belong, mapping the vehicle states of the sub-regions into a state matrix as the state information of the current stage of the current intersection.
4. The method of claim 1, wherein the adjacent intersection comprises at least one of:
an intersection separated from the current intersection by a number of intersections smaller than a first threshold;
an intersection whose straight-line distance from the current intersection is smaller than a second threshold;
an intersection whose road distance from the current intersection is smaller than a third threshold.
5. The method of claim 1, wherein the determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection comprises:
and inputting the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning so as to obtain the control actions of the current stage of the current intersection.
6. The method of claim 5, further comprising:
acquiring state information of a next stage of the current intersection after executing the control action of the current stage of the current intersection;
determining the vehicle queuing length based on the state information of the next stage of the current intersection;
determining a reward value based on the vehicle queue length;
updating parameters of the predictive model based on the reward value.
7. The method of claim 3, further comprising:
the state matrix is processed through a convolutional neural network to extract state features.
8. A traffic signal control apparatus comprising:
an acquisition module configured to acquire the state information of the current stage of the current intersection;
a receiving module configured to receive a control action of a previous stage of an adjacent intersection of the current intersection;
a first determination module configured to determine an average of encodings of control actions of a previous stage of the adjacent intersection;
a second determination module configured to determine a control action of a current stage of the current intersection based on the state information of the current stage of the current intersection and an average of the codes of the control actions of a previous stage of the adjacent intersection.
9. An electronic device comprising a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-7.
10. A readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1 to 7.
CN202011120565.5A 2020-10-19 2020-10-19 Traffic signal control method and device, electronic equipment and readable storage medium Pending CN112309138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011120565.5A CN112309138A (en) 2020-10-19 2020-10-19 Traffic signal control method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011120565.5A CN112309138A (en) 2020-10-19 2020-10-19 Traffic signal control method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112309138A (en) 2021-02-02

Family

ID=74328322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011120565.5A Pending CN112309138A (en) 2020-10-19 2020-10-19 Traffic signal control method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112309138A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705557A (en) * 2017-09-04 2018-02-16 清华大学 Road network signal control method and device based on depth enhancing network
US20190347933A1 (en) * 2018-05-11 2019-11-14 Virtual Traffic Lights, LLC Method of implementing an intelligent traffic control apparatus having a reinforcement learning based partial traffic detection control system, and an intelligent traffic control apparatus implemented thereby
CN109559530A (en) * 2019-01-07 2019-04-02 大连理工大学 A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning
CN110264750A (en) * 2019-06-14 2019-09-20 大连理工大学 A kind of multi-intersection signal lamp cooperative control method of the Q value migration based on multitask depth Q network
CN110428615A (en) * 2019-07-12 2019-11-08 中国科学院自动化研究所 Learn isolated intersection traffic signal control method, system, device based on deeply
CN111127910A (en) * 2019-12-18 2020-05-08 上海天壤智能科技有限公司 Traffic signal adjusting method, system and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
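杨文臣 (Yang Wenchen): "多智能体强化学习在城市交通网络信号控制方法中的应用综述" (A survey of the application of multi-agent reinforcement learning in urban traffic network signal control methods), 《计算机应用研究》 (Application Research of Computers) *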

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436447A (en) * 2021-06-22 2021-09-24 佳都科技集团股份有限公司 Traffic signal management and control system and equipment for grid-shaped road network
CN114049760A (en) * 2021-10-22 2022-02-15 北京经纬恒润科技股份有限公司 Traffic control method, device and system based on intersection
CN116628520A (en) * 2023-07-24 2023-08-22 中国船舶集团有限公司第七〇七研究所 Multi-scholars simulation training method and system based on average field theory algorithm
CN116628520B (en) * 2023-07-24 2023-09-29 中国船舶集团有限公司第七〇七研究所 Multi-scholars simulation training method and system based on average field theory algorithm

Similar Documents

Publication Publication Date Title
CN108197739B (en) Urban rail transit passenger flow prediction method
CN110796856B (en) Vehicle lane change intention prediction method and training method of lane change intention prediction network
CN112309138A (en) Traffic signal control method and device, electronic equipment and readable storage medium
CN110562258B (en) Method for vehicle automatic lane change decision, vehicle-mounted equipment and storage medium
US20230124864A1 (en) Graph Representation Querying of Machine Learning Models for Traffic or Safety Rules
CN109272157A (en) A kind of freeway traffic flow parameter prediction method and system based on gate neural network
CN103593535A (en) Urban traffic complex self-adaptive network parallel simulation system and method based on multi-scale integration
CN111667693B (en) Method, apparatus, device and medium for determining estimated time of arrival
CN112289045B (en) Traffic signal control method and device, electronic equipment and readable storage medium
CN114360239A (en) Traffic prediction method and system for multilayer space-time traffic knowledge map reconstruction
CN114495060A (en) Road traffic marking identification method and device
CN114822019A (en) Traffic information processing method and device
Zheng et al. A deep learning–based approach for moving vehicle counting and short-term traffic prediction from video images
CN115691140B (en) Analysis and prediction method for space-time distribution of automobile charging demand
CN115493610A (en) Lane-level navigation method and device, electronic equipment and storage medium
CN115062202A (en) Method, device, equipment and storage medium for predicting driving behavior intention and track
CN113276860B (en) Vehicle control method, device, electronic device, and storage medium
CN115366919A (en) Trajectory prediction method, system, electronic device and storage medium
CN112686457B (en) Route arrival time estimation method and device, electronic equipment and storage medium
CN110853346B (en) Traffic flow control method and system for intersection
CN115169239A (en) Convolution, attention and MLP integrated travel destination prediction method
CN114428889A (en) Trajectory path binding method, model training method, device, equipment and storage medium
CN114202272A (en) Vehicle and goods matching method and device based on electronic fence, storage medium and terminal
Płaczek Fuzzy cellular model for on-line traffic simulation
CN115081186B (en) Driving behavior simulation system supporting data driving and simulation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210202