CN112309138A

CN112309138A - Traffic signal control method and device, electronic equipment and readable storage medium

Info

Publication number: CN112309138A
Application number: CN202011120565.5A
Authority: CN
Inventors: 王鲁晗; 胡天风; 胡智群; 王刚; 傅彬
Original assignee: Zhiyou Open Source Communication Research Institute Beijing Co ltd
Current assignee: Zhiyou Open Source Communication Research Institute Beijing Co ltd
Priority date: 2020-10-19
Filing date: 2020-10-19
Publication date: 2021-02-02

Abstract

The embodiments of the disclosure provide a traffic signal control method, a traffic signal control device, electronic equipment and a readable storage medium. The method comprises: obtaining state information of the current stage of the current intersection; receiving the control actions of the previous stage of the intersections adjacent to the current intersection; determining the average of the codes of those control actions; and determining the control action of the current stage of the current intersection based on the state information of the current stage and that average. The interaction between all agents and the traffic flow environment is thus treated as a stochastic game, the dimensionality of the action space is reduced through mean field approximation, and the interaction between one agent and all other agents is converted into the interaction between the agent and the average effect of its adjacent agents, which avoids the "curse of dimensionality".

Description

Traffic signal control method and device, electronic equipment and readable storage medium

Technical Field

The disclosure relates to the technical field of intelligent traffic, in particular to a traffic signal control method, a traffic signal control device, electronic equipment and a readable storage medium.

Background

With the rapid development of China's economy, car ownership among urban residents keeps rising and road congestion occurs more and more frequently, directly affecting residents' quality of life and travel experience. As a key part of traffic flow management, traffic signal timing plays an important role in controlling traffic flow, relieving congestion and coordinating regional traffic.

Research on intelligent traffic light control based on deep reinforcement learning introduces deep reinforcement learning algorithms into signal timing: an agent is built at each intersection, the traffic flow information of that intersection is taken as the state, a deep reinforcement learning network outputs actions that control the signal timing scheme, and traffic performance indicators such as queue length and waiting time are used as rewards to guide the agent's learning. This line of research mainly exploits the fact that deep reinforcement learning can be applied to dynamic, uncertain scenarios without deriving a complex mathematical model.

However, the inventors have found that the prior art has at least the following problems. If the agent at each intersection is treated as independent, each agent collects only local state information and makes action decisions on its own; without exchanging state and action information among agents, local congestion easily arises and good regional coordination cannot be achieved. If instead the interaction between all agents and the environment is treated as a stochastic game, the action space each agent must handle includes the action choices of all agents; for multiple intersections, let alone a large-scale road network, the dimensionality of this action space grows exponentially with the number of intersections, causing the "curse of dimensionality".

Disclosure of Invention

In order to solve the problems in the related art, embodiments of the present disclosure provide a traffic signal control method, a traffic signal control apparatus, an electronic device, and a readable storage medium.

In a first aspect, a traffic signal control method is provided in an embodiment of the present disclosure.

Specifically, the traffic signal control method includes:

acquiring state information of a current stage of a current intersection;

receiving a control action of a previous stage of an adjacent intersection of the current intersection;

determining an average value of codes of control actions of a previous stage of the adjacent intersection;

and determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection.

With reference to the first aspect, in a first implementation manner of the first aspect, the state information includes vehicle states of a plurality of sub-areas within the range of the current intersection, and the vehicle states include whether a vehicle is present in each sub-area and the speed of the vehicle in that sub-area.

With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the obtaining the state information of the current stage of the current intersection includes:

determining attributes of lanes to which each of the plurality of sub-regions belongs;

collecting vehicle states of a plurality of sub-areas within the range of the current intersection;

and according to the attribute of the lane to which the sub-regions belong, mapping the vehicle states of the sub-regions into a state matrix as the state information of the current stage of the current intersection.

With reference to the first aspect, in a third implementation manner of the first aspect, the adjacent intersection includes at least one of:

an intersection that can be reached from the current intersection by passing through fewer intersections than a first threshold;

an intersection whose straight-line distance from the current intersection is less than a second threshold;

an intersection whose distance from the current intersection along the road is less than a third threshold.

With reference to the first aspect, in a fourth implementation manner of the first aspect, the determining a control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection includes:

inputting the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning to obtain the control action of the current stage of the current intersection.

With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the method further includes:

acquiring state information of a next stage of the current intersection after executing the control action of the current stage of the current intersection;

determining the vehicle queuing length based on the state information of the next stage of the current intersection;

determining a reward value based on the vehicle queue length;

updating parameters of the predictive model based on the reward value.

With reference to the second implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the method further includes processing the state matrix through a convolutional neural network to extract a state feature.

In a second aspect, a traffic signal control apparatus is provided in an embodiment of the present disclosure.

Specifically, the traffic signal control device includes:

an acquisition module configured to acquire the state information of the current stage of the current intersection;

a receiving module configured to receive a control action of a previous stage of an adjacent intersection of the current intersection;

a first determination module configured to determine an average of encodings of control actions of a previous stage of the adjacent intersection;

a second determination module configured to determine a control action of a current stage of the current intersection based on the state information of the current stage of the current intersection and an average of the codes of the control actions of a previous stage of the adjacent intersection.

In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is configured to store one or more computer instructions that are executed by the processor to implement the method according to the first aspect or any one of the first to sixth implementation manners of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method according to the first aspect or any one of the first to sixth implementation manners of the first aspect.

According to the technical solution provided by the embodiments of the disclosure, the state information of the current stage of the current intersection is obtained; the control actions of the previous stage of the adjacent intersections are received; the average of their codes is determined; and the control action of the current stage of the current intersection is determined based on the state information and that average. The interaction between all agents and the traffic flow environment is thus treated as a stochastic game, mean field approximation reduces the dimensionality of the action space, and the interaction between an agent and all other agents becomes the interaction between the agent and the average effect of its adjacent agents, avoiding the "curse of dimensionality".

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1A and FIG. 1B show schematic diagrams of application scenarios according to embodiments of the present disclosure;

FIG. 2 illustrates a flow chart of a traffic signal control method according to an embodiment of the disclosure;

FIG. 3 illustrates a flow diagram for obtaining status information according to an embodiment of the present disclosure;

FIG. 4 illustrates a flow diagram for updating model parameters according to an embodiment of the present disclosure;

FIG. 5 shows a block diagram of a traffic signal control device according to an embodiment of the present disclosure;

FIG. 6 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure;

FIG. 7 illustrates a schematic block diagram of a computer system suitable for implementing traffic signal control of embodiments of the present disclosure.

Detailed Description

Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.

In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.

It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features in them may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

The prior art has two problems when applying reinforcement learning to traffic signal control at multiple intersections. If the agent at each intersection is treated as independent, each agent collects only local state information and makes action decisions on its own; without exchanging state and action information among agents, local congestion easily arises and good regional coordination cannot be achieved. If instead the interaction between all agents and the environment is treated as a stochastic game, the action space each agent must handle includes the action choices of all agents; for multiple intersections, let alone a large-scale road network, the dimensionality of this action space grows exponentially with the number of intersections, causing the "curse of dimensionality".

Fig. 1A and 1B show schematic diagrams of application scenarios according to embodiments of the present disclosure.

A road network generally includes a plurality of roads and a plurality of intersections formed where those roads cross. FIG. 1A shows an exemplary road network with three horizontal roads and three vertical roads that form nine intersections. FIG. 1B shows an exemplary intersection. Lanes entering the intersection are defined as entering lanes and lanes leaving it as exiting lanes; the intersection has 12 entering lanes and 12 exiting lanes, three of each in every direction. The three entering lanes in each direction are a left-turn lane, a straight lane and a right-turn lane.

It should be noted that FIG. 1A and FIG. 1B show only exemplary road network structures, and the traffic signal control method and apparatus of the embodiments of the present disclosure are not limited to them. For example, there may be more or fewer intersections, other intersection types such as T-intersections, and lane arrangements different from that of FIG. 1B.

Fig. 2 illustrates a flow chart of a traffic signal control method according to an embodiment of the present disclosure.

As shown in fig. 2, the traffic signal control method includes operations S210 to S240.

In operation S210, state information of a current stage of the current intersection is acquired;

receiving a control action of a previous stage of an adjacent intersection of the current intersection in operation S220;

determining an average value of codes of control actions of a previous stage of the adjacent intersection in operation S230;

in operation S240, a control action of the current stage of the current intersection is determined based on the state information of the current stage of the current intersection and the encoded average value of the control actions of the previous stage of the adjacent intersection.

According to the embodiment of the present disclosure, in operation S210, state information of a current stage of a current intersection is acquired, where the state information includes vehicle states of a plurality of sub-areas within a range of the current intersection. For example, a detector may be arranged in each direction of each intersection, and the detection range may cover the entire road segment for acquiring the state information of the intersection.

According to the embodiment of the present disclosure, in addition to the above-described detector, a communication unit, a processing unit, and a control unit may be provided for each intersection. The communication unit is used for sending the information of the current intersection to other intersections or receiving the information sent by other intersections. The processing unit is used for executing the calculation task and generating the control action according to the obtained information. The control unit is used for controlling signal change of the signal lamp according to the control action generated by the processing unit.

According to the embodiment of the disclosure, the state information comprises vehicle states of a plurality of sub-areas within a range of a current intersection, and the vehicle states comprise whether vehicles exist in the sub-areas and speeds of the vehicles in the sub-areas.

According to the technical scheme provided by the embodiment of the disclosure, the vehicle state can be simply and effectively described through the existence of the vehicle in the sub-area and the speed of the vehicle in the sub-area, so that the prediction efficiency and effectiveness of the control action are improved.

For example, a length L equal to one vehicle length plus the gap between vehicles (e.g. about 6.5 meters) can be used to divide each of the K entering lanes of an intersection into w short segments of length L, i.e. sub-areas, so that the traffic state of one intersection is converted into a K × w matrix (w depends on the lane length). The position of an element in the matrix gives the lane and the position of a vehicle, and the value of each element is the vehicle speed at that position. Taking the intersection with 12 entering lanes in FIG. 1B as an example, its state information can be represented as a 12 × w matrix in which each element represents the vehicle state of one sub-area.

According to the embodiment of the present disclosure, to distinguish a stopped vehicle from an empty sub-area, the matrix element may be set to 1 where a vehicle's speed is 0, and to 0 where there is no vehicle.
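The discretization above can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the detector output format (a list of `(lane_index, distance_to_stop_line_m, speed_mps)` tuples), the function name `build_state_matrix`, and the convention that column 0 is the sub-area nearest the stop line are all hypothetical. Only the cell length L ≈ 6.5 m, the K × w layout, and the 1/0 encoding come from the text.

```python
import math


def build_state_matrix(vehicles, lane_length, n_lanes=12, cell_len=6.5):
    """Map detected vehicles to a K x w matrix of per-sub-area vehicle states.

    vehicles: iterable of (lane_index, distance_to_stop_line_m, speed_mps)
    tuples; this input format is a hypothetical detector output.
    """
    w = math.ceil(lane_length / cell_len)          # number of sub-areas per lane
    matrix = [[0.0] * w for _ in range(n_lanes)]   # 0 means "no vehicle here"
    for lane, distance, speed in vehicles:
        # Column 0 is assumed to be the sub-area nearest the stop line.
        cell = min(int(distance // cell_len), w - 1)
        # A stopped vehicle is encoded as 1 so it differs from an empty cell.
        matrix[lane][cell] = speed if speed > 0 else 1.0
    return matrix


# Usage: a 130 m approach gives w = 20 sub-areas per lane.
state = build_state_matrix([(0, 3.0, 0.0), (0, 10.0, 5.2), (11, 129.9, 12.0)], 130.0)
```

Because L is roughly one vehicle length plus a gap, each sub-area is assumed to hold at most one vehicle; if two detections ever fall in the same cell, this sketch simply keeps the last one.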

Fig. 3 shows a flow chart for obtaining status information according to an embodiment of the present disclosure.

As shown in fig. 3, the operation S210 may include operations S310 to S330.

In operation S310, determining an attribute of a lane to which each of a plurality of the sub-regions belongs;

in operation S320, vehicle states of a plurality of sub-areas within the range of the current intersection are collected;

in operation S330, vehicle states of the plurality of sub-regions are mapped to a state matrix according to attributes of lanes to which the sub-regions belong, as state information of a current stage of the current intersection.

According to an embodiment of the present disclosure, the attribute of a lane may be, for example, left turn, straight or right turn. When the lane layout of an intersection differs from that in FIG. 1B, the raw state information may be preprocessed according to a predetermined rule; that is, in operation S330 the vehicle states of the sub-areas are mapped into the state matrix according to the attributes of the lanes to which they belong. For example, if there are four lanes in the north-south direction, namely one left-turn lane, two straight lanes and one right-turn lane, the vehicle speeds at corresponding (parallel) positions on the two straight lanes are arithmetically averaged and the result is filled into the corresponding position of the straight-lane row of the matrix, while the speeds of all vehicles on the left-turn and right-turn lanes are filled directly into their corresponding positions. As another example, if there are two lanes in the north-south direction, one for left turns and one shared by straight and right-turn traffic, the elements of the right-turn row are set to 0 and the vehicle speeds on the shared lane are filled into the straight-lane row of the feature matrix.
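The two lane-mapping examples above can be sketched for a single approach direction as follows. This is an illustrative sketch under stated assumptions: the attribute labels (`"left"`, `"straight"`, `"right"`, and `"straight_right"` for a shared lane) and the function name `map_approach` are hypothetical. Lanes with the same attribute are averaged element-wise, as the text describes for two straight lanes, and a missing attribute yields a zero row.

```python
from collections import defaultdict

# Row order of one approach in the state matrix (assumed convention).
ROW_ORDER = ("left", "straight", "right")


def map_approach(lanes, w):
    """Map the lanes of one approach onto fixed left/straight/right rows.

    lanes: list of (attribute, cell_values) pairs, each cell_values of length w.
    """
    grouped = defaultdict(list)
    for attribute, cells in lanes:
        # A shared straight/right lane is filled into the straight row.
        key = "straight" if attribute == "straight_right" else attribute
        grouped[key].append(cells)
    rows = []
    for key in ROW_ORDER:
        group = grouped.get(key)
        if not group:
            rows.append([0.0] * w)  # no lane with this attribute: zero row
        else:
            # Arithmetic mean of parallel positions across lanes in the group.
            rows.append([sum(col) / len(group) for col in zip(*group)])
    return rows


# Four-lane approach: one left, two straight, one right (w = 2 for brevity).
print(map_approach([("left", [1, 0]), ("straight", [4, 0]),
                    ("straight", [2, 6]), ("right", [0, 3])], 2))
```

Stacking the three rows of each of the four approaches would give the fixed 12-row matrix of FIG. 1B regardless of the physical lane layout, which is what lets the same model serve intersections with different lane arrangements.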

According to the technical solution provided by the embodiment of the disclosure, the attribute of the lane to which each sub-area belongs is determined, the vehicle states of the sub-areas within the range of the current intersection are collected, and these states are mapped into a state matrix according to the lane attributes as the state information of the current stage of the current intersection. This state representation extends well to different road network structures, has low dimensionality, and describes the intersection simply and effectively, which improves the efficiency and effectiveness of control action prediction.

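According to the embodiment of the present disclosure, in operation S220, the control action of the previous stage of each intersection adjacent to the current intersection is received, wherein the adjacent intersection includes at least one of:

an intersection that can be reached from the current intersection by passing through fewer intersections than a first threshold;

an intersection whose straight-line distance from the current intersection is less than a second threshold;

an intersection whose distance from the current intersection along the road is less than a third threshold.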

According to the embodiment of the present disclosure, the first threshold may be an integer greater than or equal to 1. When the first threshold is 1, the adjacent intersections are those reached without passing through any other intersection, that is, the intersections directly adjacent to the current intersection. For example, in the embodiment shown in FIG. 1A, the intersections adjacent to intersection 5 are intersections 2, 4, 6 and 8. As another example, when the first threshold is 2, the adjacent intersections of intersection 1 are intersections 2, 3, 4, 5 and 7. According to an embodiment of the present disclosure, the second threshold and the third threshold are both positive numbers, e.g. 5 km.
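The first-threshold rule can be sketched as a breadth-first search over the road graph. The sketch assumes that the nine intersections of FIG. 1A are numbered 1 to 9 row by row (this numbering reproduces both examples above but is an assumption), and the helper names are hypothetical. Passing through fewer than `first_threshold` intermediate intersections is equivalent to being at most `first_threshold` hops away.

```python
from collections import deque


def grid_road_network(rows=3, cols=3):
    """Adjacency list of a rows x cols grid, intersections numbered 1.. row by row."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c + 1
            neighbors = []
            if r > 0:
                neighbors.append(node - cols)
            if r < rows - 1:
                neighbors.append(node + cols)
            if c > 0:
                neighbors.append(node - 1)
            if c < cols - 1:
                neighbors.append(node + 1)
            adj[node] = neighbors
    return adj


def adjacent_intersections(adj, current, first_threshold):
    """Intersections reachable while passing through fewer than first_threshold others."""
    hops = {current: 0}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        if hops[node] == first_threshold:
            continue  # going further would pass through too many intersections
        for nxt in adj[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return sorted(n for n, h in hops.items() if 0 < h <= first_threshold)


grid = grid_road_network()
print(adjacent_intersections(grid, 5, 1))  # [2, 4, 6, 8]
print(adjacent_intersections(grid, 1, 2))  # [2, 3, 4, 5, 7]
```

The straight-line and along-road distance criteria would replace the hop count with a geometric or shortest-path distance compared against the second or third threshold.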

According to the technical solution provided by the embodiment of the disclosure, the adjacent intersections are determined in one or more of these ways and only their data are received, rather than collecting data from all intersections, so that the method is applicable to road networks of any scale.

According to the embodiment of the present disclosure, in operation S230, the average of the codes of the control actions of the previous stage of the adjacent intersections is determined. In each signal cycle (e.g. 15 s), each agent selects a control action, for example: north-south straight, north-south left turn, east-west straight, or east-west left turn. An action can thus be represented as a 4-dimensional one-hot vector, i.e. the action space of the agent is {[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]}; for example, the east-west straight action is encoded as [0,0,1,0].
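The one-hot action encoding described above can be sketched as follows. The action order follows the text; the identifier strings such as `"EW_straight"` and the function names are hypothetical labels chosen for illustration.

```python
# Order follows the text: NS straight, NS left, EW straight, EW left.
ACTIONS = ("NS_straight", "NS_left", "EW_straight", "EW_left")


def encode_action(action):
    """Return the one-hot code of a control action."""
    code = [0] * len(ACTIONS)
    code[ACTIONS.index(action)] = 1
    return code


def decode_action(code):
    """Inverse of encode_action for a one-hot code."""
    return ACTIONS[code.index(1)]


print(encode_action("EW_straight"))  # [0, 0, 1, 0]
```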

According to embodiments of the present disclosure, the average of the codes of the control actions of the adjacent intersections can be determined. For example, the adjacent intersections of intersection 5 in FIG. 1A are intersections 2, 4, 6 and 8. If the codes of their control actions are [1,0,0,0], [0,1,0,0], [0,0,1,0] and [1,0,0,0] respectively, the average is [0.5, 0.25, 0.25, 0], which is still a vector of the same dimension whose elements sum to 1.

It should be noted that the arithmetic sum of the codes of the adjacent intersections' control actions can be used instead of the average with an equivalent effect. For example, the sum of [1,0,0,0], [0,1,0,0], [0,0,1,0] and [1,0,0,0] is [2,1,1,0]; the only difference is that its elements do not sum to 1, so it may be normalized if necessary.
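The mean-field aggregation and its arithmetic-sum alternative can be sketched as follows, using the four neighbour codes from the example for intersection 5. The function names are hypothetical; the arithmetic matches the worked example in the text.

```python
def mean_action(codes):
    """Element-wise average of the neighbours' one-hot action codes."""
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]


def sum_action(codes, normalize=False):
    """Element-wise sum of the codes, optionally normalized to sum to 1."""
    total = [sum(col) for col in zip(*codes)]
    if normalize:
        s = sum(total)
        return [t / s for t in total]
    return total


# Codes of intersections 2, 4, 6 and 8 from the example above.
neighbour_codes = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
print(mean_action(neighbour_codes))  # [0.5, 0.25, 0.25, 0.0]
print(sum_action(neighbour_codes))   # [2, 1, 1, 0]
```

Normalizing the sum recovers the average exactly, which is why the two are interchangeable; either way the input grows by only one 4-dimensional vector, however many neighbours there are.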

According to the embodiment of the present disclosure, in operation S240, the determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection includes inputting the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning to obtain the control action of the current stage of the current intersection.

Reinforcement learning is a paradigm and methodology of machine learning that describes and solves the problem of an agent learning, through interaction with its environment, a policy that maximizes return or achieves a specific goal. In the embodiments of the present disclosure, the prediction model run by the processing unit of each intersection can be abstracted as an agent, and the state information of each intersection is the state of the environment that the agent faces.

According to the embodiment of the disclosure, the prediction model may be built mainly on a DQN (Deep Q-Network), for example. Because the state input of the model has a large dimension, a convolutional neural network can be used to extract state features: the traffic flow state passes through two convolution-and-pooling layers and is then flattened in preparation for input to a fully connected layer. At this point the average of the codes of the adjacent intersections' control actions described above is introduced. The arithmetic mean of the adjacent agents' actions is still a vector of the same dimension and can be understood as the empirical distribution of the action choices of the agents around the intersection. Feeding this average action into the deep neural network as part of the state enables information exchange between intersections while adding only a few input dimensions, which avoids the curse of dimensionality frequently encountered in multi-agent reinforcement learning. In a real traffic scene, information can be exchanged instantly through real-time communication, remote cloud control or similar means, and existing 5G communication technology can meet the model's requirements. Once the average action of the adjacent agents has been collected and computed, it is concatenated with the convolved, pooled and flattened traffic flow state vector and input into the fully connected layer of the neural network.
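The network layout described above (two convolution-and-pooling stages, flattening, concatenation with the average action, then fully connected layers producing one Q-value per action) can be sketched as a NumPy forward pass. This is a minimal illustration with random, untrained weights and no training code; the 3 × 3 kernels, 4 channels, ReLU activations, 2 × 2 max pooling, hidden width 32 and w = 20 are all assumptions, not values from the disclosure. NumPy is used instead of a deep learning framework only to keep the sketch self-contained.

```python
import numpy as np


def conv2d(x, kernels, bias):
    """Valid 2-D cross-correlation. x: (C_in, H, W); kernels: (C_out, C_in, kh, kw)."""
    c_out, _, kh, kw = kernels.shape
    h_out, w_out = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out


def max_pool2x2(x):
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :2 * h2, :2 * w2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))


def relu(x):
    return np.maximum(x, 0.0)


class MeanFieldQNet:
    """Hypothetical Q-network: CNN state features concatenated with the mean action."""

    def __init__(self, n_lanes=12, w=20, n_actions=4, channels=4, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.k1 = rng.normal(0.0, 0.1, (channels, 1, 3, 3))
        self.b1 = np.zeros(channels)
        self.k2 = rng.normal(0.0, 0.1, (channels, channels, 3, 3))
        self.b2 = np.zeros(channels)
        flat = self._features(np.zeros((n_lanes, w))).size  # dry run to size the FC layer
        self.w3 = rng.normal(0.0, 0.1, (flat + n_actions, hidden))
        self.b3 = np.zeros(hidden)
        self.w4 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b4 = np.zeros(n_actions)

    def _features(self, state):
        x = state[np.newaxis]                                # (1, K, w): one input channel
        x = max_pool2x2(relu(conv2d(x, self.k1, self.b1)))  # convolution + pooling, layer 1
        x = max_pool2x2(relu(conv2d(x, self.k2, self.b2)))  # convolution + pooling, layer 2
        return x.ravel()                                     # flatten

    def q_values(self, state, mean_action):
        z = np.concatenate([self._features(state), mean_action])  # splice in the mean action
        h = relu(z @ self.w3 + self.b3)
        return h @ self.w4 + self.b4                         # one Q-value per control action


net = MeanFieldQNet()
q = net.q_values(np.zeros((12, 20)), np.array([0.5, 0.25, 0.25, 0.0]))
action = int(np.argmax(q))  # greedy choice; exploration is handled separately
```

With a 12 × 20 input, the feature map shrinks to 4 × 1 × 3 = 12 values, so the mean action adds only 4 inputs to a 16-dimensional fully connected input, illustrating the "few extra dimensions" point in the text.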

According to the embodiment of the disclosure, the action policy uses Boltzmann exploration with a learning rate that decays over time, balancing exploration and exploitation during the agent's learning and avoiding the failure to converge that a greedy policy may cause when the environment is dynamically unstable.
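Boltzmann (softmax) exploration can be sketched as follows. The text says only that a parameter decays over time; this sketch assumes, as in a common variant, that the decaying quantity is the softmax temperature, and the schedule constants `t0`, `decay` and `t_min` are hypothetical. A high temperature spreads probability across actions (exploration), and a low one concentrates it on the highest Q-value (exploitation).

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax of Q-values / temperature (max subtracted for numerical stability)."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_at(step, t0=1.0, decay=0.995, t_min=0.05):
    """Hypothetical exponential decay schedule with a floor."""
    return max(t_min, t0 * decay ** step)


def select_action(q_values, step, rng=random):
    """Sample an action index with Boltzmann probabilities at the current temperature."""
    probs = boltzmann_probs(q_values, temperature_at(step))
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


print(boltzmann_probs([1.0, 1.0, 1.0, 1.0], 1.0))  # [0.25, 0.25, 0.25, 0.25]
```

Unlike a purely greedy policy, every action keeps a nonzero probability while the temperature is above its floor, so the agent keeps sampling alternatives as its neighbours' behaviour changes.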

According to embodiments of the present disclosure, the model may first be trained in a simulation environment. As the agent interacts with the simulated traffic environment, it selects an action based on the current state of the environment, the environment returns a reward value and a new state, and the network parameters are updated by gradient descent on the objective function, until a mature deep neural network has been trained.

After the agent has been trained, the traffic state information collected by the traffic detectors and the average action information of the adjacent agents are input into the deep neural network, and the agent provides the signal timing scheme it has learned to be optimal for the current traffic flow state. The waiting time, queue length and average speed of the traffic flow at each intersection can also be output. Through dynamic action decisions, the signal cycle can be fully utilized, phase loss is reduced, and the effectiveness of the traffic signal control system is improved.

According to the technical solution provided by the embodiment of the disclosure, the average action is treated as part of the state: it is concatenated with the state information collected by the detectors and input into the prediction model based on deep reinforcement learning to obtain the control action of the current stage of the current intersection. The prediction model can therefore generate control actions for optimal signal timing in the current traffic flow state, and it can keep learning during operation to continuously improve the timing strategy.

FIG. 4 shows a flow chart for updating model parameters according to an embodiment of the disclosure.

As shown in fig. 4, the method may further include operations S410 to S440.

In operation S410, state information of a next stage of the current intersection after the control action of the current stage of the current intersection is performed is acquired;

determining a vehicle queuing length based on the state information of the next stage of the current intersection in operation S420;

in operation S430, a reward value is determined based on the vehicle queue length. For example, the negative of the total number of vehicles on the 12 entering lanes of the intersection, or the negative of the total queue length of the traffic flow at the intersection, can be used as the reward value, so that the longer the queue or the more vehicles there are, the lower the reward;

in operation S440, parameters of the prediction model are updated based on the reward value.
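Operations S420 to S440 can be sketched as follows, operating on the next-stage state matrix. Several details are assumptions rather than part of the disclosure: each occupied sub-area is counted as one vehicle; a sub-area is counted as queued when its value satisfies 0 < v ≤ `halt_threshold` (under the 1/0 encoding a stopped vehicle is stored as 1); and the parameter update is assumed to regress Q(s, a) toward the standard DQN target r + γ · max Q(s′, a′) with a hypothetical discount γ = 0.9.

```python
def vehicle_count_reward(state):
    """Negative number of vehicles on the entering lanes (one per occupied cell)."""
    return -sum(1 for row in state for v in row if v != 0)


def queue_reward(state, halt_threshold=1.0):
    """Negative queue length; cells with 0 < v <= halt_threshold count as queued (assumption)."""
    return -sum(1 for row in state for v in row if 0 < v <= halt_threshold)


def dqn_target(reward, next_q_values, gamma=0.9, done=False):
    """Standard DQN regression target r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Next-stage state: two stopped vehicles and one moving at 8.3 m/s.
next_state = [[0.0, 1.0, 8.3], [0.0, 0.0, 1.0]]
r = queue_reward(next_state)                 # -2
y = dqn_target(r, [0.5, 1.5, -1.0, 0.0])     # -2 + 0.9 * 1.5
```

Gradient descent on the squared difference between the predicted Q(s, a) and this target is one conventional way to realize operation S440; the disclosure does not fix the loss function.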

According to the technical scheme provided by the embodiment of the disclosure, the state information of the next stage of the current intersection after the control action of the current stage of the current intersection is executed is obtained; determining the vehicle queuing length based on the state information of the next stage of the current intersection; determining a reward value based on the vehicle queue length; and updating the parameters of the prediction model based on the reward value, so that the aim of minimizing the queuing length can be fulfilled, and the timing strategy can be continuously improved.

According to the embodiment of the disclosure, the method may further include sending the control action of the current stage of the current intersection to the control device of the adjacent intersection, so that the adjacent intersection can generate the timing control action by adopting a similar method.

Fig. 5 illustrates a block diagram of a traffic signal control device 500 according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both.

As shown in fig. 5, the traffic signal control apparatus 500 includes an obtaining module 510, a receiving module 520, a first determining module 530, and a second determining module 540.

An obtaining module 510 configured to obtain state information of a current stage of a current intersection;

a receiving module 520 configured to receive a control action of a previous stage of an adjacent intersection of the current intersection;

a first determining module 530 configured to determine the average value of the codes of the control actions of the previous stage of the adjacent intersection;

a second determining module 540 configured to determine the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection.
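The averaging performed by the first determining module 530 can be sketched as follows. The disclosure does not state how a control action is encoded; this sketch assumes a one-hot code over a hypothetical number of signal phases `num_phases`, and returns an all-zero vector for an intersection without neighbours (also an assumption). Because the mean has the same length regardless of how many neighbours there are, it is what lets the mean field approximation keep the input dimension fixed.

```python
from typing import List, Sequence


def one_hot(action: int, num_phases: int) -> List[float]:
    """Encode a phase index as a one-hot vector (assumed encoding)."""
    if not 0 <= action < num_phases:
        raise ValueError(f"action {action} out of range for {num_phases} phases")
    code = [0.0] * num_phases
    code[action] = 1.0
    return code


def mean_neighbor_action(prev_actions: Sequence[int], num_phases: int) -> List[float]:
    """Average of the one-hot codes of the neighbours' previous-stage actions.

    With no neighbours an all-zero vector is returned (an assumption)."""
    mean = [0.0] * num_phases
    if not prev_actions:
        return mean
    for a in prev_actions:
        for i, v in enumerate(one_hot(a, num_phases)):
            mean[i] += v / len(prev_actions)
    return mean
```

For example, if four neighbours chose phases `[0, 2, 2, 1]` out of four phases in the previous stage, the average code is `[0.25, 0.25, 0.5, 0.0]`. Using previous-stage actions means every intersection can compute this vector before any intersection acts in the current stage.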

According to the embodiment of the present disclosure, the obtaining module 510 includes a first determining sub-module, a collecting sub-module, and a mapping sub-module.

A first determining sub-module configured to determine the attribute of the lane to which each of a plurality of sub-areas within the range of the current intersection belongs;

a collecting sub-module configured to collect the vehicle states of the plurality of sub-areas within the range of the current intersection;

and a mapping sub-module configured to map the vehicle states of the plurality of sub-areas into a state matrix according to the attributes of the lanes to which the sub-areas belong, the state matrix serving as the state information of the current stage of the current intersection.
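The mapping performed by these sub-modules can be sketched as below. The concrete layout is an assumption, not taken from the disclosure: each lane attribute (hypothetical labels such as `"N_left"`) selects a row via `lane_rows`, each sub-area's position along the lane selects a column, and the matrix has two channels, occupancy (0/1) and speed normalised by a hypothetical `max_speed`, matching the two vehicle-state items listed in claim 2.

```python
from typing import Dict, List, Tuple

# Hypothetical observation of one sub-area:
# (lane attribute, cell index along the lane, vehicle present?, speed in m/s)
Observation = Tuple[str, int, bool, float]


def build_state_matrix(
    observations: List[Observation],
    lane_rows: Dict[str, int],
    cells_per_lane: int,
    max_speed: float,
) -> List[List[List[float]]]:
    """Map sub-area vehicle states into a 2-channel matrix.

    Channel 0 = occupancy (0/1), channel 1 = speed / max_speed capped at 1.
    Rows are chosen by lane attribute, columns by cell index."""
    rows = len(lane_rows)
    matrix = [[[0.0] * cells_per_lane for _ in range(rows)] for _ in range(2)]
    for lane, cell, present, speed in observations:
        r = lane_rows[lane]
        if present:
            matrix[0][r][cell] = 1.0
            matrix[1][r][cell] = min(speed / max_speed, 1.0)
    return matrix
```

With `lane_rows = {"N_straight": 0, "N_left": 1}`, three cells per lane and `max_speed = 10.0`, a moving vehicle at 5 m/s in the first left-turn cell and a stopped vehicle in the last straight cell give an occupancy channel `[[0, 0, 1], [1, 0, 0]]` and a speed channel `[[0, 0, 0], [0.5, 0, 0]]`. Grouping rows by lane attribute keeps movements such as left turns and straight traffic in fixed positions of the matrix, which suits the convolutional processing described later.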

According to an embodiment of the present disclosure, the adjacent intersection includes at least one of:

According to the technical scheme provided by the embodiment of the disclosure, the adjacent intersections are determined in one or more of the above ways and only their data are received, so there is no need to collect data from all intersections, and the device is therefore applicable to road networks of any scale.

According to the embodiment of the present disclosure, the second determining module 540 is configured to input the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection into a prediction model based on deep reinforcement learning, to obtain the control action of the current stage of the current intersection.

According to the technical scheme provided by the embodiment of the disclosure, the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection are input into the prediction model based on deep reinforcement learning to obtain the control action of the current stage of the current intersection. Because the control action is generated by a deep reinforcement learning model, the model can keep learning during operation and continuously improve the timing strategy.
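The inference step can be illustrated with a deliberately simplified sketch. The disclosure does not describe the network architecture, so a linear function stands in for the deep network here; what the sketch does show is the input (state features concatenated with the neighbours' average action code), one output value per candidate phase, and selection of the highest-valued phase. The epsilon-greedy exploration, the weights, and all names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def q_values(features: Sequence[float], mean_action: Sequence[float],
             weights: List[List[float]], bias: List[float]) -> List[float]:
    """Hypothetical linear stand-in for the deep Q-network: one output per phase.

    The input is the state features concatenated with the neighbours' mean action."""
    x = list(features) + list(mean_action)
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def select_action(q: Sequence[float], epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy choice: explore with probability epsilon, otherwise take argmax Q."""
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])
```

For instance, with features `[1.0, 0.0]`, mean action `[0.5, 0.5]`, weights `[[1, 0, 0, 0], [0, 0, 2, 1]]` and zero bias, the values are `[1.0, 1.5]` and phase 1 is chosen when `epsilon` is 0. In deployment `epsilon` would typically be small or zero; during learning a larger value lets the controller try other phases.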

According to an embodiment of the present disclosure, the apparatus may further include a parameter update module configured to perform the following operations:

determining a reward value based on the vehicle queue length;

updating parameters of the predictive model based on the reward value.
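One way to realise the parameter update is a Q-learning temporal-difference step, sketched below on a hypothetical linear Q-function so that the update can be written out by hand. This is a substitution for illustration only: the disclosure does not state the learning rule, and a deep model would instead backpropagate the same TD error through the network. `gamma` (discount factor) and `lr` (learning rate) are assumed hyperparameters.

```python
from typing import List, Sequence


def td_update(weights: List[List[float]], bias: List[float], x: Sequence[float],
              action: int, reward: float, next_q: Sequence[float],
              gamma: float = 0.9, lr: float = 0.01) -> float:
    """One semi-gradient Q-learning step on a hypothetical linear Q-function.

    x: input at which `action` was taken (state features + neighbours' mean action).
    next_q: Q-values evaluated at the next stage.
    Updates weights/bias of the taken action in place and returns the TD error."""
    q_sa = sum(w * xi for w, xi in zip(weights[action], x)) + bias[action]
    target = reward + gamma * max(next_q)   # bootstrapped Q-learning target
    td_error = target - q_sa
    # The gradient of a linear Q w.r.t. its weights is the input x itself.
    weights[action] = [w + lr * td_error * xi for w, xi in zip(weights[action], x)]
    bias[action] += lr * td_error
    return td_error
```

With a negative reward derived from the queue length, a phase that left long queues receives a negative TD error, and its value estimate is pushed down, so that phase becomes less likely to be selected in similar situations.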

According to the embodiment of the disclosure, the apparatus may further include a sending module configured to send the control action of the current stage of the current intersection to the control device of the adjacent intersection.

According to the technical scheme provided by the embodiment of the disclosure, sending the control action of the current stage of the current intersection to the control device of the adjacent intersection allows the adjacent intersection to generate its own timing control action by a similar method.

The present disclosure also discloses an electronic device, and fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

As shown in fig. 6, the electronic device 600 includes a memory 601 and a processor 602, wherein the memory 601 is configured to store one or more computer instructions, and wherein the one or more computer instructions are executed by the processor 602 to implement the following operations:

acquiring state information of a current stage of a current intersection;

According to the embodiment of the present disclosure, the acquiring the state information of the current stage of the current intersection includes:

According to an embodiment of the present disclosure, the determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection includes:

According to an embodiment of the present disclosure, the processor 602 is further configured to perform:

determining a reward value based on the vehicle queue length;

updating parameters of the predictive model based on the reward value.

According to an embodiment of the present disclosure, the processor 602 is further configured to process the state matrix through a convolutional neural network to extract state features.
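The core operation of the convolutional feature extraction can be sketched in plain Python. The disclosure gives no layer count, filter sizes or activation; the sketch assumes a single filter applied with stride 1 and no padding ("valid" cross-correlation, as most CNN libraries implement convolution) across all channels of the state matrix, followed by ReLU. A practical model would stack several such layers with many filters in a deep learning framework.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_relu(channels: List[Matrix], kernels: List[Matrix], bias: float = 0.0) -> Matrix:
    """Single-filter 'valid' cross-correlation over a multi-channel state matrix, then ReLU.

    kernels[c] is applied to channels[c]; the per-channel results are summed."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    h, w = len(channels[0]), len(channels[0][0])
    out: Matrix = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = bias
            for ch, k in zip(channels, kernels):
                for di in range(kh):
                    for dj in range(kw):
                        s += ch[i + di][j + dj] * k[di][dj]
            row.append(max(s, 0.0))  # ReLU
        out.append(row)
    return out
```

Applied to an occupancy channel `[[0, 0, 1], [1, 0, 0]]` and a speed channel `[[0, 0, 0], [0.5, 0, 0]]` with 1x2 kernels `[[1, 1]]` (occupancy) and `[[-1, -1]]` (speed), the output is `[[0.0, 1.0], [0.5, 0.0]]`: the filter responds most strongly where cells are occupied by slow or stopped vehicles, which is the kind of local queue pattern the extracted state features are meant to capture.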

As shown in fig. 7, the computer system 700 includes a processing unit 701 that can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The processing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed. The processing unit 701 may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or another processing unit.

In particular, the above-described methods may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described method. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.

As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned features, but also encompasses other technical solutions formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Claims

1. A traffic signal control method, comprising:

acquiring state information of a current stage of a current intersection;

2. The method of claim 1, wherein the state information comprises vehicle states of a plurality of sub-areas within the range of the current intersection, the vehicle state comprising:

whether a vehicle is present within the sub-area;

the speed of the vehicle within the sub-area.

3. The method of claim 2, wherein the obtaining of the state information of the current stage of the current intersection comprises:

4. The method of claim 1, wherein the adjacent intersection comprises at least one of:

5. The method of claim 1, wherein the determining the control action of the current stage of the current intersection based on the state information of the current stage of the current intersection and the average value of the codes of the control actions of the previous stage of the adjacent intersection comprises:

6. The method of claim 5, further comprising:

determining a reward value based on the vehicle queue length;

updating parameters of the predictive model based on the reward value.

7. The method of claim 3, further comprising:

processing the state matrix through a convolutional neural network to extract state features.

8. A traffic signal control apparatus comprising:

9. An electronic device comprising a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-7.

10. A readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1 to 7.