WO2021042401A1

WO2021042401A1 - Method and device for traffic light control

Info

Publication number: WO2021042401A1
Application number: PCT/CN2019/104816
Authority: WO
Inventors: Yan JIAO; Tao Huang; Zhiwei QIN
Original assignee: Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date: 2019-09-07
Filing date: 2019-09-07
Publication date: 2021-03-11

Abstract

A method (400) and a device (100) for controlling traffic lights are provided. The method (400) includes: obtaining traffic information along a plurality of road segments adjacent an intersection (405); generating a state representation using a neural network model based on the traffic information (410); and generating a traffic light control scheme using a machine learning algorithm based on the state representation (415).

Description

METHOD AND DEVICE FOR TRAFFIC LIGHT CONTROL

TECHNICAL FIELD

The present disclosure relates to the technology field of traffic light control and more particularly, to a method and a device for traffic light control based on neural networks and machine learning.

BACKGROUND

Cities around the world are experiencing ever increasing traffic jams. Traditionally, traffic lights have been controlled based on a fixed scheme for a given time of the day. For example, the fixed scheme may define that the green light will be on for ten seconds and the red light will be on for five seconds for a certain direction at an intersection. In some special circumstances, when the time periods for the green light and the red light need to be changed, the change is often made through manual adjustments.

Smart traffic light control has become a potential solution for easing heavy city traffic. As the traffic flow state changes at different times, adaptive control of traffic light signals has been demonstrated in certain cities as a promising solution. For example, smart traffic lights implemented in the city of Jinan, China have shown a reduction of about 17% in traffic delay. However, such smart traffic lights are still not fully automated, and cannot automatically learn from the real time traffic and adapt the traffic light control scheme based on the real time traffic.

Therefore, there is a need to develop a fully automated traffic light control method and device that can learn from real time traffic and adapt the traffic light control scheme based on the state of the traffic.

SUMMARY

An embodiment of the present disclosure provides a method of controlling traffic lights. The method includes obtaining traffic information along a plurality of road segments adjacent an intersection. The method also includes generating a state representation using a neural network model based on the traffic information. The method further includes generating a traffic light control scheme using a machine learning algorithm based on the state representation.

Another embodiment of the present disclosure provides a device for controlling traffic lights. The device includes a memory configured to store instructions. The device also includes a processor configured to execute the instructions to obtain traffic information along a plurality of road segments adjacent an intersection. The processor is also configured to generate a state representation using a neural network model based on the traffic information. The processor is further configured to generate a traffic light control scheme using a machine learning algorithm based on the state representation.

A further embodiment of the present disclosure provides a non-transitory computer readable medium encoded with instructions, which when executed by a processor, cause the processor to perform a method for controlling traffic lights. The method includes obtaining traffic information along a plurality of road segments adjacent an intersection. The method also includes generating a state representation using a neural network model based on the traffic information. The method further includes generating a traffic light control scheme using a machine learning algorithm based on the state representation.

BRIEF DESCRIPTION OF THE DRAWINGS

To better describe the technical solutions of the various embodiments of the present disclosure, the accompanying drawings showing the various embodiments will be briefly described. As a person of ordinary skill in the art would appreciate, the drawings show only some embodiments of the present disclosure. Without departing from the scope of the present disclosure, those having ordinary skills in the art could derive other embodiments and drawings based on the disclosed drawings without inventive efforts.

FIG. 1 is a schematic diagram of a system for traffic light control, in accordance with an embodiment of the present disclosure.

FIG. 2 is a schematic diagram of an intersection and traffic information acquisition, in accordance with an embodiment of the present disclosure.

FIG. 3 shows example density data from four road segments at an intersection, in accordance with an embodiment of the present disclosure.

FIG. 4 is a flow chart illustrating a method for controlling traffic lights, in accordance with an embodiment of the present disclosure.

FIG. 5 is an example simulator for generating traffic data and training the models, in accordance with an embodiment of the present disclosure.

FIG. 6 is an example plot of reward versus steps showing convergence of the machine learning algorithm, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION

Technical solutions of the present disclosure will be described in detail with reference to the drawings. It will be appreciated that the described embodiments represent some, rather than all, of the embodiments of the present disclosure. Other embodiments conceived or derived by those having ordinary skills in the art based on the described embodiments without inventive efforts should fall within the scope of the present disclosure. Example embodiments will be described with reference to the accompanying drawings, in which the same numbers refer to the same or similar elements unless otherwise specified.

As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context indicates otherwise. And, the terms “comprise,” “comprising,” “include,” and the like specify the presence of stated features, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups. The term “and/or” used herein includes any suitable combination of one or more related items listed. The term “at least one of” A, B, or C encompasses all possible combinations of A, B, and C, including A only, B only, C only, and any combination of two or more of these items.

Unless otherwise defined, all the technical and scientific terms used herein have the same or similar meanings as generally understood by one of ordinary skill in the art. As described herein, the terms used in the specification of the present disclosure are intended to describe example embodiments, instead of limiting the present disclosure.

Further, when an embodiment illustrated in a drawing shows a single element, it is understood that the embodiment may include a plurality of such elements. Likewise, when an embodiment illustrated in a drawing shows a plurality of such elements, it is understood that the embodiment may include only one such element. The number of elements illustrated in the drawing is for illustration purposes only, and should not be construed as limiting the scope of the embodiment.

Moreover, unless otherwise noted, the embodiments shown in the drawings are not mutually exclusive, and may be combined in any suitable manner. For example, elements shown in one embodiment but not in another embodiment may nevertheless be included in the other embodiment.

The present disclosure does not limit the sequence of execution of steps included in disclosed methods. The sequence of the steps may be any suitable sequence, and certain steps may be repeated, omitted, or added.

The present disclosure provides a method and a device for traffic light control based on neural networks and machine learning. The disclosed method and device for traffic light control can learn from the real time traffic at an intersection and adapt the traffic light control scheme based on the learning. As a result, better traffic light control can be achieved, and traffic delay can be reduced.

FIG. 1 is a schematic diagram of a system for traffic light control. The system may include a device 100 configured for traffic light control. The device 100 may be located at a central control site that controls multiple intersections, or may be located at each individual intersection. The device 100 may include a transceiver 105 configured to communicate with external devices, such as, for example, a traffic light controller 120 located at each intersection where traffic lights are installed. In some embodiments, the device 100 may be included in the traffic light controller 120. In some embodiments, the transceiver 105 may receive data or signals from the traffic light controller 120 or transmit data or signals to the traffic light controller 120. For example, the transceiver 105 may receive information regarding the current operational state of the traffic light controller 120 from the traffic light controller 120. The transceiver 105 may transmit a traffic light control scheme to the traffic light controller 120. The traffic light control scheme may include various time periods, such as a traffic lights changing cycle (e.g., 100 seconds, 150 seconds, 200 seconds), a green light time ratio, that is, (a time period for a green light) / (a cycle length), a red-green ratio between a time period for the red light and a time period for the green light (e.g., 0.5, 0.7, 0.8, etc.), or the time period for the red light (e.g., 10 seconds, 15 seconds, etc.) and the time period for the green light (e.g., 15 seconds, 20 seconds, etc.), and/or a time period for the yellow light (e.g., 1 second, 2 seconds, 3 seconds, etc.). In some embodiments, the transceiver 105 may periodically (e.g., every 10 minutes, 20 minutes) transmit the traffic light control scheme to the traffic light controller 120. Upon receiving the traffic light control scheme, the traffic light controller 120 may execute the traffic light control scheme to control the traffic lights.
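By way of a non-limiting illustration, the relationship between the time periods listed above can be sketched in Python. The class name `ControlScheme`, its field names, and the assumption that the red period fills whatever remains of the cycle after the green and yellow periods are hypothetical choices made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlScheme:
    """Hypothetical container for one direction's traffic light timing."""
    cycle_s: float         # traffic lights changing cycle, in seconds
    green_ratio: float     # (green period) / (cycle length), in [0, 1]
    yellow_s: float = 3.0  # yellow period, in seconds

    def durations(self) -> dict:
        if not 0.0 <= self.green_ratio <= 1.0:
            raise ValueError("green_ratio must lie in [0, 1]")
        green = self.green_ratio * self.cycle_s
        # Assumption: red takes whatever remains after green and yellow.
        red = self.cycle_s - green - self.yellow_s
        if red < 0:
            raise ValueError("green and yellow periods exceed the cycle")
        return {"green": green, "yellow": self.yellow_s, "red": red}


scheme = ControlScheme(cycle_s=100.0, green_ratio=0.6, yellow_s=3.0)
timing = scheme.durations()
```

With a 100-second cycle and a green light time ratio of 0.6, the sketch yields roughly 60 seconds of green, 3 seconds of yellow, and 37 seconds of red.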

In some embodiments, the system may include a plurality of sensors disposed along the road segments adjacent an intersection. The sensors may be configured to obtain traffic information. Various types of sensors may be used, such as cameras, infrared sensors, laser sensors, radar sensors, piezoelectric sensors, strain sensors, magnet based sensors, etc. The sensors may be disposed at suitable locations, such as on the road surfaces, on the sides of the roads, on dedicated poles, on traffic sign poles, or on traffic lights fixtures, etc. Although FIG. 1 shows a first sensor 125 and a second sensor 126 for illustration purposes, it is understood that more than two sensors may be disposed on each road segment.

In some embodiments, the sensors may transmit sensed or measured data relating to the traffic to the transceiver 105 of the device 100. The sensors may measure the traffic in real time, and may transmit the data relating to the traffic to the transceiver 105 in real time. Alternatively, the sensors may transmit the data relating to the traffic to the transceiver 105 periodically at a predetermined time interval (e.g., every 2 minutes, 5 minutes, etc.). In some embodiments, at least one sensor may transmit the data relating to the traffic in real time to the transceiver 105.

The device 100 for traffic light control may include a memory 110 and a processor 115. The memory 110 may include any suitable memories, such as a volatile memory or a non-volatile memory. In some embodiments, the memory 110 may include a flash memory, a read-only memory (“ROM”), a random-access memory (“RAM”), a programmable read-only memory (“PROM”), an erasable programmable read-only memory (“EPROM”), a dynamic random-access memory (“DRAM”), a static random-access memory (“SRAM”), etc. The memory 110 may be configured to store data, such as computer codes or instructions that may be executed by the processor 115 to perform various methods or processes. In some embodiments, the memory 110 may also be configured to store the data relating to the traffic as acquired by the sensors and received by the transceiver 105. In some embodiments, the device 100 may further include a storage device configured for storing the data relating to the traffic. The storage device may include a hard disk, a compact disc, a solid state disk, a magnetic tape, etc.

The processor 115 may include any suitable processor, such as a central processing unit (“CPU”), a network processor (“NP”), a graphics processing unit (“GPU”), or any combination thereof. In some embodiments, the processor 115 may include a hardware chip. The hardware chip may be an application-specific integrated circuit (“ASIC”), a programmable logic device (“PLD”), or a combination thereof. The processor 115 may be configured to execute the instructions stored in the memory 110 to perform various functions, methods, or processes disclosed herein. The processor 115 may also be configured to read data stored in the memory 110 and/or the storage device and may analyze the data relating to the traffic to determine a traffic light control scheme. In some embodiments, the processor 115 may execute instructions configured for a neural network model and a machine learning algorithm to generate a traffic light control scheme based on real time traffic information.

As shown in FIG. 1, in some embodiments, the device 100 may also receive traffic information from one or more vehicles 130 and 135 travelling on the roads. For example, each vehicle may be installed with a device having a positioning capability, such as a smart phone or a tablet having a global positioning system (“GPS”) sensor. The vehicles may report their locations to the device 100 by transmitting their GPS location information to the transceiver 105. The processor 115 may collect and analyze the location information provided by the vehicles to determine traffic status on each road.

FIG. 2 is a schematic diagram of an intersection and traffic information acquisition at the intersection. An intersection may include at least two road segments joining each other. For illustrative purposes, a four-way intersection is used. Thus, the intersection has four road segments joining each other. Each road segment may be divided into a plurality of cell units for traffic data measurement. Traffic information is acquired within the cell units by the sensors distributed along the road segments. For example, if a road segment has five cell units, the five cell units are treated as five traffic data measurement points or locations, and the traffic information is acquired at these five cell units. In each cell unit, one or more traffic parameters may be measured. In some embodiments, the traffic parameters may include at least one of a density of the traffic, a speed of the traffic, or an influx of the traffic. In some embodiments, all of the three traffic parameters may be measured in each cell unit. The density of the traffic may indicate the number of vehicles per unit length of the cell unit. The speed of the traffic may indicate the average speed of the vehicles within the cell unit. The influx of the traffic may indicate an amount of the traffic within the cell unit. For simplicity of illustrating the disclosed methods, only traffic on the right hand side on each road segment is measured, although traffic in both directions may be measured and considered in the disclosed methods.
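As a minimal sketch of the per-cell measurement described above, the following Python function aggregates vehicle observations on one road segment into a density, an average speed, and a vehicle count for each cell unit. The function name `cell_stats`, the 50-meter segment split into five cell units, and the use of the vehicle count as the "amount" of traffic are assumptions made for illustration only.

```python
def cell_stats(vehicles, segment_len_m=50.0, n_cells=5):
    """Aggregate (position_m, speed_mps) pairs into per-cell statistics.

    Returns one dict per cell unit with density (vehicles per metre),
    mean speed (m/s), and count (used here as the "amount" of traffic).
    """
    cell_len = segment_len_m / n_cells
    counts = [0] * n_cells
    speed_sums = [0.0] * n_cells
    for position, speed in vehicles:
        if not 0.0 <= position < segment_len_m:
            continue  # vehicle is outside the measured stretch
        idx = min(int(position // cell_len), n_cells - 1)
        counts[idx] += 1
        speed_sums[idx] += speed
    stats = []
    for count, speed_sum in zip(counts, speed_sums):
        stats.append({
            "density": count / cell_len,
            "speed": speed_sum / count if count else 0.0,
            "count": count,
        })
    return stats


# Hypothetical observations: two vehicles in the first cell, one in the fourth.
observed = [(2.0, 8.0), (7.5, 6.0), (33.0, 12.0)]
stats = cell_stats(observed)
```

Reporting a speed of zero for an empty cell unit is a simplification of this sketch; a deployed system might instead mark such cells as having no measurement.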

The disclosed methods use a neural network model and a machine learning algorithm to generate a traffic light control scheme based on the measured traffic data. For illustrative purposes, a convolutional neural network is used in this disclosure, although other types of neural network may also be used.

FIG. 3 shows example density data from four road segments at an intersection. FIG. 3 shows four density plots (images) corresponding to the four road segments of the intersection (for simplicity, only traffic on the right hand side is measured on each road segment). Each plot has an x-axis, which is time (in seconds). The time interval on the x-axis is 0-100 seconds, although any other suitable time interval or time period, such as 90 seconds, 120 seconds, or 150 seconds, may also be used. The y-axis is distance (in meters). In this example, the y-axis is 0-50 meters. The measurement distance may be any other suitable distance, such as 100 meters, 30 meters, etc. Each plot is shown in gray scale (or color scale). The gray scale (or color scale) at each point indicates the density of the traffic. At each time instance and each distance, there is a density value. Thus, the density data may be represented by a two-dimensional image (or matrix), as shown in FIG. 3.
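One possible way to assemble such a two-dimensional density matrix is sketched below in Python. The function name `density_image`, the layout (rows for distance cells, columns for time steps), and the sample vehicle positions are hypothetical and serve only to illustrate the time-by-distance representation.

```python
def density_image(snapshots, segment_len_m=50.0, n_cells=5):
    """Build a 2-D density matrix: rows are distance cells, columns are time steps.

    `snapshots` holds, for each time step, a list of vehicle positions in metres.
    """
    cell_len = segment_len_m / n_cells
    image = [[0.0] * len(snapshots) for _ in range(n_cells)]
    for t, positions in enumerate(snapshots):
        for position in positions:
            if 0.0 <= position < segment_len_m:
                row = min(int(position // cell_len), n_cells - 1)
                image[row][t] += 1.0 / cell_len  # vehicles per metre
    return image


# Hypothetical data: one vehicle moving along the segment over 4 time steps,
# joined by a second vehicle from the third time step onward.
snapshots = [[45.0], [32.0], [21.0, 48.0], [8.0, 35.0]]
img = density_image(snapshots)
```

Each entry `img[row][t]` is the density in one cell unit at one time instance, which corresponds to one gray-scale point in a plot such as those of FIG. 3.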

In a similar fashion, data measured by the sensors for other traffic parameters, such as speed and influx (e.g., amount of the traffic), for the four road segments within the 100-second time period may be represented by two-dimensional images. The combination of the measured data (represented as two-dimensional images) for the three parameters, density, speed, and influx, in four road segments (or in four directions), may be stacked up to create a twelve-channel image (e.g., twelve two-dimensional images stacked up to form a twelve-channel image). It is understood that when the number of traffic parameters and/or the number of road segments is different, the image input to the neural network may have another suitable number of channels, such as 9 channels, 16 channels, etc. The twelve-channel image may be input to a neural network model, such as a convolutional neural network (“CNN”) model. The CNN model is used as an example of the neural network model that may be used to implement the disclosed methods, although other neural network models may also be used.
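The stacking of three parameters over four road segments into twelve channels can be sketched as follows. The segment labels, the channel ordering (segment-major, parameter-minor), and the function name `stack_channels` are assumptions of this sketch; the disclosure does not prescribe a particular ordering.

```python
PARAMETERS = ("density", "speed", "influx")
SEGMENTS = ("north", "east", "south", "west")  # assumed labels


def stack_channels(images):
    """Stack per-segment, per-parameter 2-D images into one multi-channel image.

    `images[segment][parameter]` is a 2-D list; all must share one shape.
    Returns a list of channels ordered segment-major, parameter-minor.
    """
    channels = []
    shape = None
    for segment in SEGMENTS:
        for parameter in PARAMETERS:
            img = images[segment][parameter]
            this_shape = (len(img), len(img[0]))
            if shape is None:
                shape = this_shape
            elif this_shape != shape:
                raise ValueError("all images must have the same shape")
            channels.append(img)
    return channels


# Placeholder 50 x 100 images (distance x time) filled with zeros.
blank = [[0.0] * 100 for _ in range(50)]
images = {s: {p: blank for p in PARAMETERS} for s in SEGMENTS}
state_image = stack_channels(images)
```

Changing the length of `PARAMETERS` or `SEGMENTS` changes the channel count accordingly, for example three parameters over three segments would give nine channels.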

A CNN model is made up of neurons with weights and biases. Each neuron calculates a weighted sum of a plurality of inputs, processes it with a predetermined function, and provides an output. A CNN model includes an input layer and an output layer, with one or more hidden layers in the middle. The hidden layers may include at least one of a convolutional layer, a pooling layer, a fully connected layer, or a normalization layer. The input layer may use the twelve-channel two-dimensional image as the input. A predetermined number of filters of a predetermined size (or predetermined sizes) may be applied to the twelve-channel input image to generate a plurality of feature maps, which may be combined as a new image for further convolutional processing, if additional convolutional layers are included in the CNN model. In other words, an output from the previous convolutional layer may be an input to the next convolutional layer. Different levels of features may be extracted in each convolutional layer. In some embodiments, the CNN model may include three convolutional layers. Other suitable numbers of convolutional layers may also be used. In some embodiments, a pooling layer may be included. Each pooling layer may implement a non-linear down-sampling, and may combine the outputs of neuron clusters from the previous layer into a single neuron in the next layer. Different methods may be used for the pooling, such as maximum pooling, average pooling, etc. A fully connected layer may connect each neuron from a previous layer to each neuron in the next layer. The CNN model may output a state representation, which may be a vector or in any other suitable form. The state representation includes features extracted from the input images.
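To make the layer sequence concrete, the following pure-Python sketch passes a small twelve-channel image through one convolutional layer with a rectified linear activation, one 2 x 2 maximum pooling layer, and one fully connected layer to produce a state vector. All sizes, the random untrained weights, the choice of activation, and the function names are assumptions for illustration; a practical implementation would likely use several convolutional layers and a deep learning library.

```python
import random


def conv2d(image, filters, bias):
    """Valid convolution (cross-correlation) followed by ReLU.

    image: [C][H][W], filters: [F][C][k][k], bias: [F]
    returns: [F][H-k+1][W-k+1]
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0][0])
    out = []
    for f, filt in enumerate(filters):
        fmap = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                total = bias[f]
                for c in range(channels):
                    for di in range(k):
                        for dj in range(k):
                            total += image[c][i + di][j + dj] * filt[c][di][dj]
                row.append(max(0.0, total))  # ReLU activation
            fmap.append(row)
        out.append(fmap)
    return out


def max_pool2(fmaps):
    """2 x 2 maximum pooling with stride 2 on each feature map."""
    pooled = []
    for fmap in fmaps:
        h, w = len(fmap) // 2, len(fmap[0]) // 2
        pooled.append([[max(fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                            fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
                        for j in range(w)] for i in range(h)])
    return pooled


def dense(vector, weights, bias):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


rng = random.Random(0)
C, H, W, F, K, STATE_DIM = 12, 6, 10, 4, 3, 8  # toy sizes, chosen arbitrarily
image = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
filters = [[[[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(K)]
            for _ in range(C)] for _ in range(F)]
features = max_pool2(conv2d(image, filters, [0.0] * F))
flat = [v for fmap in features for row in fmap for v in row]
fc_w = [[rng.uniform(-0.1, 0.1) for _ in flat] for _ in range(STATE_DIM)]
state = dense(flat, fc_w, [0.0] * STATE_DIM)
```

Here the 6 x 10 input shrinks to 4 x 8 after the 3 x 3 convolution and to 2 x 4 after pooling, so four feature maps flatten to 32 values, which the fully connected layer maps to an 8-element state representation.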

The state representation output from the CNN model may be input into a machine learning algorithm configured to generate a traffic light control scheme or policy. Any suitable machine learning algorithm may be used. For example, in some embodiments, a reinforcement learning algorithm may be used. For illustrative purposes, a proximal policy optimization (“PPO”) algorithm (John Schulman et al., “Proximal Policy Optimization Algorithms,” August 28, 2017, published at https://arxiv.org/abs/1707.06347) is used. Other suitable machine learning algorithms may also be used, such as trust region policy optimization algorithms.
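For orientation, the clipped surrogate objective described in the cited PPO paper can be sketched as below. The disclosure does not state which PPO variant is used, so the clipped form, the clipping parameter of 0.2, and the function name `ppo_clip_objective` are assumptions of this sketch; advantage estimation and the gradient update are omitted.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    new_logp / old_logp are log-probabilities of the taken actions under
    the current and the previous policy; advantages are advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two hypothetical samples with probability ratios 1.5 and 0.5.
objective = ppo_clip_objective(
    new_logp=[math.log(1.5), math.log(0.5)],
    old_logp=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
```

In this example both ratios fall outside the interval [0.8, 1.2], so the clipped terms 1.2 and -0.8 are used and the objective evaluates to about 0.2. The clipping limits how far a single update can move the policy away from the previous one.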

A reinforcement learning problem may be modeled as a Markov Decision Process. The Markov Decision Process defines an environment, a state, an agent, an action, and a reward. For a given state, the agent observes a reward and interacts with the environment (by taking actions). The environment moves to a new state and assigns a new reward, which is observed by the agent. The process is repeated for a large number of steps. The goal for the agent is to collect as much reward as possible by taking various actions. A policy is a map for guiding the agent’s selection of an action, which may provide a probability of taking a certain action in a certain state. A policy may also be a non-probabilistic policy.

Applying the PPO reinforcement learning algorithm to the traffic light control, the parameters for the PPO algorithm may be defined as follows. For example, a “reward” may be defined as -(10 × delay + number of stops) / 1000, which is a negative value. The “delay” may be quantified as a time difference between a first time period for a vehicle to pass through an intersection when there is no traffic jam and a second time period for the vehicle to pass through the intersection when there is a traffic jam. A “stop” is defined as a stop of a vehicle in a cell unit detected by a sensor, and the number of stops refers to the total number of stops detected in all of the cell units of all of the road segments. Thus, minimizing the delay and the number of stops would maximize the reward. The “delay” and the “number of stops” collectively represent a level of traffic delay or jam. Other measurable parameters that can represent traffic delay or jam can also be used. An “action” may be defined as a green light time ratio: (a time period for the green light) / (a cycle length), where the cycle length is the total time period for a cycle of traffic lights change. For example, the cycle length may be 100 seconds, or any other number of seconds. It is understood that the action may be in other forms, such as a red-green ratio: (a time period for the red light) / (a time period for the green light). The policy output may be two-dimensional and continuous, consisting of the mean and the standard deviation of the green light time ratio. For example, the green light time ratio may be any value in a continuous range, e.g., [0, 1].
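The reward and action definitions above can be illustrated with a short Python sketch. The reward formula follows the text; measuring the delay in seconds, drawing the action from a normal distribution parameterized by the policy's mean and standard deviation, and clamping the sample to [0, 1] are assumptions of this sketch, as are the function names.

```python
import random


def reward(delay_s, num_stops):
    """Reward as given in the text: -(10 * delay + number of stops) / 1000."""
    return -(10.0 * delay_s + num_stops) / 1000.0


def sample_green_ratio(mean, std, rng):
    """Draw a green light time ratio from a normal distribution, clamped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mean, std)))


rng = random.Random(42)
r = reward(delay_s=12.0, num_stops=30)  # -(120 + 30) / 1000
action = sample_green_ratio(mean=0.6, std=0.05, rng=rng)
```

A delay of 12 seconds with 30 stops gives a reward of -0.15, and any reduction in either quantity moves the reward toward zero.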

FIG. 4 is a flow chart illustrating a method for traffic light control. Method 400 may be performed by the processor 115 of the device 100 shown in FIG. 1. Method 400 may include obtaining traffic information acquired by at least one sensor disposed along a plurality of road segments adjacent an intersection (step 405). The traffic information may include at least one of a density, a speed, or an influx (e.g., an amount) of the traffic on each road segment of an intersection. The traffic information may be obtained in real time by sensors disposed along the road segment, such as the radar sensors, laser sensors, infrared sensors, piezoelectric sensors, cameras, etc. In some embodiments, the traffic information may be obtained at predetermined locations on the road segments and at predetermined time instances within the predetermined time period or interval. For example, the traffic information may be obtained from a plurality of cell units of a road segment in a predetermined distance (e.g., 50 meters) within a predetermined time period or interval, such as 100 seconds.

In some embodiments, the traffic information may be derived from positional data and/or speed data received from a plurality of vehicles travelling on the road segment. For example, a vehicle may include a device (e.g., a smart phone, a tablet, a GPS unit) that may transmit its location information to the transceiver 105 of the device 100. The device 100 may be located at a control center or a central control site that controls the traffic lights of a plurality of intersections. In some embodiments, the device 100 may be a dedicated traffic light controller (e.g., the device 100 may be part of the traffic light controller 120) located at a specific intersection. In some embodiments, the transceiver 105 may not receive traffic information directly from the sensors disposed along the road segments or from the vehicles. Instead, in some embodiments, the transceiver 105 may receive traffic information from a traffic control center, which receives the traffic information from the sensors and/or vehicles. The processor 115 may obtain the traffic information from the transceiver 105 or from a storage device that stores the traffic information received by the transceiver 105.

Method 400 may also include generating a state representation using a neural network model based on the traffic information (step 410). For example, the processor 115 may be configured to execute the neural network model discussed above to process the traffic information that is input into the neural network model. In some embodiments, the neural network model may be a convolutional neural network model. In some embodiments, the processor 115 may process the traffic information by generating two-dimensional images for the measured parameters, such as density, speed, and influx of the traffic. In some embodiments, the processor 115 may combine the two-dimensional images of the traffic parameters measured at multiple road segments to form a multi-channel two-dimensional image (e.g., a 12-channel image), which may be input into the CNN model to generate a state representation for inputting into a machine learning (e.g., a reinforcement learning) algorithm.

Method 400 may further include generating a traffic light control scheme using a machine learning algorithm based on the state representation (step 415). For example, the processor 115 may be configured to execute a PPO machine learning algorithm to generate a traffic light control scheme based on the state representation generated by the CNN model. In some embodiments, the processor 115 may be configured to generate the traffic light control scheme periodically at a predetermined time interval (e.g., 10 minutes, 15 minutes, 20 minutes), which may or may not be the same as the predetermined time interval in which the traffic information is obtained.

Method 400 may include other processes and steps not shown in FIG. 4. For example, the transceiver 105 may be configured to transmit the traffic light control scheme to a traffic light controller, which may be located at the intersection. Method 400 may also include training the neural network model and the machine learning model. The training process can determine the model parameters. Training may be performed using a digital simulator. FIG. 5 shows an example simulator or a simulation environment. The simulation environment may generate traffic data or information mimicking real traffic on the roads of a region, including the traffic at one or more intersections. The simulated traffic information may be input to the CNN model and the PPO machine learning algorithm to train the CNN model. When the training process converges on the reward, as shown in FIG. 6, the model parameters are determined. After the model parameters are determined using the training with the simulated traffic, the model may be further calibrated using actual traffic data. The actual data calibration fine-tunes the environmental parameters to be as close to real-world applications as possible. For example, actual traffic data obtained from sensors or provided by traffic control authorities may be input into the simulation environment to fine-tune the model parameters.

After the model parameters are fine-tuned, the model (including the CNN model and the PPO algorithm) is programmed in the device 100 for controlling the traffic lights. In implementation, real time traffic data is fed into the model. The device 100 processes the real time traffic data and generates a traffic light control scheme. The traffic light control scheme may include a green light time ratio, i.e., (a time period for the green light) / (a cycle length). The green light time ratio may be any suitable value in the range [0, 1]. For example, in some traffic conditions, the green light time ratio may be 0.8 for minimal traffic delay. In some traffic conditions, the green light time ratio may be 0.6 for minimal traffic delay.

The disclosed methods and devices for traffic light control use neural network models and machine learning (e.g., reinforcement learning) algorithms to adapt the traffic light control parameter (e.g., green light time ratio) to the changing traffic at an intersection. Although an intersection is used as an example implementation environment, the disclosed methods may also be applied to a region with multiple intersections. Each intersection may have an associated green light time ratio at a certain time for minimal traffic delay. The green light time ratio may be updated periodically at a predetermined time interval, such as 100 seconds, 200 seconds, 500 seconds, 10 minutes, 20 minutes, 30 minutes, etc. In some embodiments, the predetermined time interval may be fixed for all of the intersections. In some embodiments, the predetermined time interval may be different for different intersections. In addition, the time interval for an intersection may be different for different times of the day. For example, the time interval for updating the green light time ratio may be 5 minutes in morning rush hours and/or afternoon rush hours, and 30 minutes during the non-rush hours of the day and/or evening. The time interval for updating the green light time ratio may be dependent on the traffic status. For example, when the traffic is heavy, updating the green light time ratio may be more frequent than when the traffic is light. That is, the predetermined time interval for updating the traffic light control policy or scheme may be shorter when the traffic is heavier.
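The time-of-day dependence of the update interval can be sketched as a simple lookup. The 5-minute and 30-minute values come from the example above; the rush-hour windows (7 to 9 and 17 to 19) and the function name `update_interval_minutes` are assumptions made for this sketch.

```python
def update_interval_minutes(hour, rush_hours=((7, 9), (17, 19))):
    """Return the interval, in minutes, between control-scheme updates.

    `hour` is the hour of the day (0-23). Each rush-hour window is a
    half-open range [start, end) of hours; the windows are assumed values.
    """
    for start, end in rush_hours:
        if start <= hour < end:
            return 5   # update more often when traffic is heavy
    return 30          # update less often outside rush hours


morning = update_interval_minutes(8)
midday = update_interval_minutes(13)
```

A traffic-status-dependent variant could replace the hour-based test with a threshold on measured density, which the text also contemplates.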

While embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided to explain the technical solutions of the present disclosure, and do not limit the scope of the present disclosure. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the present disclosure. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the present disclosure. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method of controlling traffic lights, comprising:

obtaining traffic information along a plurality of road segments adjacent an intersection;

generating a state representation using a neural network model based on the traffic information; and

generating a traffic light control scheme using a machine learning algorithm based on the state representation.
2. The method of claim 1, further comprising transmitting the traffic light control scheme to a traffic light controller.
3. The method of claim 1, wherein obtaining traffic information acquired by the at least one sensor comprises obtaining at least one of a speed, a density, or an amount of traffic on the road segments within a predetermined time interval.
4. The method of claim 3, wherein obtaining at least one of the speed, the density, and the amount of the traffic on the road segments within the predetermined time interval comprises:

obtaining the speed, the density, and the amount of the traffic on the road segments at predetermined locations and predetermined time instances within the predetermined time interval.
5. The method of claim 1, wherein generating the traffic light control scheme comprises generating the traffic light control scheme periodically at the predetermined time interval.
6. The method of claim 1, further comprising training the neural network model using simulated traffic data and actual traffic data to determine model parameters of the neural network.
7. The method of claim 1, wherein the traffic light control scheme comprises a ratio between a green light time period and a cycle length.
8. The method of claim 1, wherein the neural network model is a convolutional neural network model.
9. The method of claim 1, wherein the machine learning algorithm is configured to maximize a reward defined based on delay of the traffic.
10. The method of claim 1, wherein the obtaining the traffic information comprises obtaining the traffic information using a plurality of sensors disposed along the plurality of road segments.
11. A device for controlling traffic lights, comprising:

a memory configured to store instructions; and

a processor configured to execute the instructions to:

obtain traffic information along a plurality of road segments adjacent an intersection;

generate a state representation using a neural network model based on the traffic information; and

generate a traffic light control scheme using a machine learning algorithm based on the state representation.
12. The device of claim 11, further comprising a transceiver configured to transmit the traffic light control scheme to a traffic light controller.
13. The device of claim 12, wherein the transceiver is configured to receive the traffic information acquired by the one or more sensors disposed along the plurality of road segments.
The device of claim 11, wherein the traffic information comprises at least one of a speed, a density, or an amount of the traffic on the road segments within a predetermined time interval.
The device of claim 14, wherein the at least one of the speed, the density, or the amount of the traffic on the road segments within the predetermined time interval comprises the speed, the density, and the amount of the traffic on the road segments at predetermined locations and predetermined time instances within the predetermined time interval.
The device of claim 11, wherein the processor is configured to generate the traffic light control scheme periodically at the predetermined time interval.
The device of claim 11, wherein the processor is configured to train the neural network model using simulated traffic data and actual traffic data to determine model parameters of the neural network.
The device of claim 11, wherein the traffic light control scheme comprises a ratio between a green light time period and a cycle length.
The device of claim 11, wherein the neural network model is a convolutional neural network model, and the machine learning algorithm is configured to maximize a reward defined based on delay of the traffic.
A non-transitory computer readable medium encoded with instructions, which when executed by a processor, cause the processor to perform a method for controlling traffic lights, the method comprising:

obtaining traffic information along a plurality of road segments adjacent an intersection;

generating a state representation using a neural network model based on the traffic information; and

generating a traffic light control scheme using a machine learning algorithm based on the state representation.