WO2021102213A1 - Data-driven determination of cascading effects of congestion in a network - Google Patents


Info

Publication number
WO2021102213A1
Authority
WO
WIPO (PCT)
Prior art keywords
traffic
congestion
network
machine
road segments
Prior art date
Application number
PCT/US2020/061418
Other languages
French (fr)
Inventor
Sanchita BASAK
Abhishek Dubey
Bruno Paes Leao
Original Assignee
Siemens Aktiengesellschaft
Vanderbilt University
Priority date
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft, Vanderbilt University filed Critical Siemens Aktiengesellschaft
Publication of WO2021102213A1 publication Critical patent/WO2021102213A1/en

Classifications

    • G08G1/0145: Measuring and analyzing of parameters relative to traffic conditions for specific applications for active traffic flow control
    • G06N3/042: Knowledge-based neural networks; logical representations of neural networks
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06N5/022: Knowledge engineering; knowledge acquisition
    • G08G1/0116: Measuring and analyzing of parameters relative to traffic conditions based on data from roadside infrastructure, e.g. beacons
    • G08G1/052: Detecting movement of traffic to be counted or controlled with provision for determining speed or overspeed
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G08G1/0112: Measuring and analyzing of parameters relative to traffic conditions based on data from the vehicle, e.g. floating car data [FCD]

Definitions

  • the present embodiments relate to data-driven determination of congestion in a network.
  • in a large-scale interconnected system, such as a traffic network, cascading failures occur where failure in one part of the system eventually triggers failure in other parts of the system.
  • One example traffic network is a road network.
  • primary road congestion created at a source can trigger secondary and tertiary road congestion due to physical connectivity, resulting in traffic delays and wasted time and energy.
  • congestion is forecast in advance to predict when road segments will be affected in the near future.
  • Model-driven approaches are based upon mathematical modelling to capture traffic congestion dynamics, such as models based on shockwave theory or a bathtub model.
  • Accurate modeling of the dynamic behavior of a complex system such as a traffic network using standard mathematical or statistical methods is challenging: the speed distributions in a large-scale dynamic system such as a traffic network cannot always be modeled by predetermined distributions, and not all the modalities of such a dynamic and complex system can be captured.
  • the preferred embodiments described below include methods, systems, instructions, and computer readable media for determining congestion and/or a cascading effect of congestion in a network, such as a traffic network.
  • Separate machine-learned models are provided for different segments of the network. By using data localized to connected segments by a limited number of hops, segment-specific predictions of traffic flow by the respective machine-learned models are used to determine congestion. The cascade effect is determined from the localized predictions.
  • a method is provided for determining a cascading effect of congestion in a traffic network.
  • the traffic network is defined as a directed connected graph where each edge is a road segment. Congestion on one or more of the road segments is identified. Traffic for a future time at the road segments in the directed connected graph is predicted by separate machine-learned models predicting the traffic for respective road segments. The cascading effect of the congestion is determined from the predicted traffic.
  • the directed connected graph is defined for at least ten road segments.
  • Each of the machine-learned models was trained for the prediction of the traffic for a single direction of the respective road segment.
  • traffic metrics may be predicted, such as the speed, volume flow, count, stop frequency, and/or jam factor.
  • sensor data for the road segments is received.
  • the sensor data is speed.
  • the traffic is predicted as speed.
  • congestion is identified as a ratio of (1) a current speed on the one or more road segments to (2) an average speed unlimited by congestion on the one or more road segments being below a threshold.
  • the traffic is predicted for each of the road segments by input of only data of road segments downstream by one or two hops on the directed connected graph to the respective machine-learned model for the road segment.
  • the prediction is of the traffic, such as speed of traffic, at a future time.
  • Various machine-learned models may be used.
  • recurrent neural networks with long short-term memory predict the traffic for respective road segments or graph edges.
  • the cascading effect is determined as a start of congestion upstream on the directed connected graph of the congestion. Different measures of congestion may be used for the cascading effect. For example, the upstream congestion is detected as a ratio of predicted traffic to a reference traffic is below a threshold on one of the road segments upstream from the congestion.
  • the predicted cascade of congestion may be used for further prediction.
  • the predicted cascade of congestion is used in navigation, such as to route a mobile device on the traffic network based on the determined cascading effect (e.g., route to avoid expected congestion in the future due to the cascade).
  • a system for predicting congestion in a traffic network.
  • Sensors are configured to sense traffic flow in different road segments of the traffic network.
  • a processor is configured to group the sensed traffic flow separately for different ones of the road segments and forecast the congestion by application of the groups of the sensed traffic flow to respective machine-learned networks, wherein each machine-learned network was trained for a different one of the road segments.
  • the sensors may sense various aspects of traffic, such as being configured to sense speed of traffic as the traffic flow.
  • the processor may use the sensed traffic flow to forecast directly or may be configured to normalize the sensed traffic flow of each group to a same scale regardless of differences in maximum speeds and average speeds for the road segments.
  • the traffic network is modeled as a directed connected graph.
  • the processor is configured to group so that the sensed traffic flow for each machine-learned network includes the sensed traffic flow for the road segment for the machine-learned network and the sensed traffic flow for downstream road segments less than two or three hops away from the road segment for the machine-learned network.
  • the processor is configured to forecast the congestion by an amount of difference of a measure of traffic output by the machine-learned network to a measure of the traffic.
  • the machine-learned networks are long short-term memory neural networks.
  • an interface is configured to transmit the forecast congestion for one or more of the road segments to a traffic routing system.
  • a method for machine training for congestion detection.
  • Flow data from different edges of a directed connected graph is separated into sets for the edge and any outgoing edges within one or two hops.
  • Models are machine trained to predict flow for the different edges.
  • One of the models is machine trained for each one of the different edges based on the flow data for the set for the respective one of the different edges.
  • long short-term memory neural networks are separately machine trained for each of the different edges.
  • the separation and machine training form distributed spatiotemporal modeling of flow of traffic on a transportation network as the directed connected graph.
  • Figure 1 is a flow chart diagram of an embodiment of a method for determining a cascade effect of congestion in a network and routing traffic according to those predictions;
  • Figure 2 is a flow chart diagram of one embodiment of methods for training and forecasting of congestion
  • Figure 3 illustrates an example data-driven forecasting arrangement for a road network
  • Figure 4 shows example grouping of traffic data and a distributed processing approach for forecasting congestion
  • Figure 5 illustrates an example framework for congestion forecasting using fine-tuned timesteps
  • Figure 6 illustrates another example framework for congestion forecasting
  • Figure 7 shows example precision and recall for congestion forecasting
  • Figure 8 shows an example portion of a road network
  • Figure 9 shows actual versus predicted congestion for cascade for three example neighbor groups
  • Figure 10 is a graph comparing example predicted versus actual speed
  • Figure 11 is a block diagram of one embodiment of a system for predicting congestion.
  • Figure 12 is a flow chart diagram of one embodiment of a method for machine training networks to predict localized congestion in an overall traffic network.
  • the cascading effect of traffic congestion is analyzed using a distribution of machine-learned networks, such as long short-term memory (LSTM) networks.
  • Traffic congestion forecasting using data driven approaches may use a single network approach where the network state at any point of time is inputted and flattened as a vector.
  • the specific neighborhood information obtained from the network graph is lost because the flattened vector does not incorporate the spatial closeness information along with the traffic data.
  • the networks may capture the specific dynamic relationships of any traffic channel and its neighbors, but capturing the entire traffic network in a single neural network architecture is costly to update and presents difficulty for distributed processing.
  • a citywide congestion forecasting framework works at a higher granularity tailored towards capturing the specificity of each traffic intersection or segment of the network.
  • the traffic network is modeled as a directed connected graph encapsulating the spatial interconnections where each neighbor of a road segment is a function of spatial connection to other segments as well as traffic flow directions.
  • the temporal aspect of the traffic flow is captured by the recurrent neural network architectures. Time varying behavior, for instance, by considering information such as time of day and day of week, may be included as additional inputs to the model.
  • a data-driven approach predicts the propagation of traffic congestion at road segments as a function of the congestion in their neighboring segments.
  • the city-wide ensemble of intersection level connected LSTM models may be used to identify congestion events.
  • the likelihood of congestion propagation in neighboring road segments of a congestion source is learned from the past historical data.
  • One example resulting forecasting framework for Nashville, Tennessee identifies the onset of congestion in each of the neighboring segments of any congestion source with an average precision of 0.9269 and an average recall of 0.9118 tested over ten congestion events.
  • the congestion forecasting framework incorporates intersection-specific information.
  • Spatiotemporal modelling of the transportation network is provided by expressing the network as a directed connected graph and uses LSTM networks to learn the distribution of the traffic speed of a target road in the future as a function of the past sequences of observed speed of the target road and its immediate outgoing neighbors.
  • Congestion events at any part of the network are identified based on spatial and temporal correlations of the traffic speed at any road segment and its associated neighborhood.
  • the likelihood of congestion propagation to any road segment is determined from its outgoing neighbors to form an overall cascaded congestion forecasting framework with the connected fabric of multiple machine learning models.
  • the search space of the real time congestion forecasting algorithm is reduced by focusing on intersections with a higher likelihood of congestion progression as learned from the historical data.
  • a road network is used as the example traffic network.
  • the congestion forecasting framework is applied to urban road traffic.
  • Other networked systems applications may be provided, such as a network of an electrical grid or a pipeline and corresponding flow or traffic of such networks.
  • Figure 1 shows one embodiment of a method for determining a cascading effect of congestion in a traffic network.
  • a framework or model uses localized machine-learned models for predicting traffic in different localized portions of the traffic network.
  • Figure 2 shows the overall framework including the training phase for machine learning the localized models and the forecasting phase applying the previously learned or machine-learned localized models.
  • Figure 1 is directed to the forecasting phase, providing a different embodiment than shown in Figure 2.
  • Figure 12 is directed to the training phase, providing a different embodiment than shown in Figure 2.
  • Figures 1 and 12 are described below with reference also to Figure 2.
  • the congestion forecasting method of Figures 1 and 2 is implemented by the system of Figure 11 or another system.
  • sensors acquire data and a computer or server forecasts congestion from the sensor data.
  • a network of different computers performs the forecasting, different computers forecasting for different parts of the traffic network.
  • Additional, different, or fewer acts may be provided.
  • acts 100 and 102 are not performed where the network is previously defined and the sensor data previously acquired.
  • act 110 is not performed.
  • acts 104 and 108 are the same act performed in a loop with the prediction of act 106.
  • the prediction of act 106 directly predicts future congestion level so the identifying of act 104 and/or the determining of act 108 are not performed.
  • act 204 may not be performed in some embodiments. Act 204 may be performed in the method of Figure 1.
  • act 206 is not performed.
  • acts 100 and 102 may be performed simultaneously or in reverse order. Acts 104-108 may be repeated. In Figure 2, act 204 may be performed earlier, such as prior to act 104.
  • the traffic network is defined as a directed connected graph.
  • Each edge is a road segment.
  • a traffic network of at least ten road segments (e.g., tens, hundreds, or thousands of road segments)
  • the road segment is treated as two segments, one for flow in one direction and the other for flow in the opposite direction.
  • Other graphs may be used, such as non-directed connected graph.
  • the traffic network is formed as connected segments without graphing.
  • Each segment is a continuous road segment with two end points. The end points correspond to neighboring intersections separated and connected by the road segment. Other road segment structures may be used.
  • Each road segment is to have a given machine-learned model trained to predict traffic in a single or both directions for that respective road segment based on the traffic data for that road segment and other local road segments.
  • a sub-set of connected segments (e.g., two or more)
  • Figure 3 illustrates a representation of a sample road network 300 with directions of traffic flow and the corresponding framework of the connected fabric of neural architectures 302.
  • The neural module associated with each edge A-K of the network takes into account information from itself and its outgoing neighbors over past sequences up to the current time to determine the future traffic state of the target edge.
  • the congestion for road segment D uses the traffic from road segment D and the traffic from road segment A.
  • the flow of traffic is from road segment D to road segment A, so A receives the outgoing traffic of (i.e., is downstream from) road segment D.
  • the machine-learned module 304 for road segment D is localized to receive traffic data from different times, t, in the past and current time from road segments A and D.
  • Figure 4 shows a deployment diagram where each computing processor 402 associated with each road segment in the network collects the traffic of the neighboring segments from the associated sensors 400 according to the graphical model of the network (e.g., see Figure 3).
  • processor 1 (402) receives traffic data from the sensors 400 for the respective road segment and one outgoing connected road segment.
  • the processors 402 process the sensed traffic data from the sensors 400 to forecast the traffic of the respective target road segment.
  • different processors 402 receive traffic data measured by different combinations of sensors 400. Different numbers of sensors 400 may provide measurements to different processors 402, depending on the interconnection of the traffic network.
  • Figure 3 shows that the processor 402 for segment H would receive sensed traffic from sensors 400 for segments H, I, and D while the processor 402 for segment J would receive sensed traffic from sensors 400 for segments J and K.
  • the predicted traffic is sent to a central cloud server 404, which can be used for taking traffic routing decisions.
  • a same processor implements all or some of the processors 402; one of the processors 402 makes routing decisions; and/or the processors 402 communicate to different remote computers or workstations in addition to or as an alternative to the cloud server 404.
  • the traffic network is represented as a directed graph G = (V, E), with V a set of nodes representing intersections and E the set of edges (road segments) connecting the nodes.
  • in and out operators are applied such that the operator in(v) gives all the edges for which node v is the destination and the operator out(v) gives all the edges whose source is node v.
  • the indegree of a node v is the number of road segments incoming to the node and can be calculated as |in(v)|; the outdegree of a node v is the number of road segments outgoing from the node and is calculated as |out(v)|.
  • the source node src(e) and destination node dst(e) of an edge e can be accessed.
  • Each node is associated with some static information, such as the location of the node. For example, an attribute of a node contains longitude and latitude and provides the location of node v as a tuple of two reals (latitude, longitude).
  • Each edge contains the information of the geographical shape of the road segment as a sequence of latitude longitude tuples.
  • An edge defines a traffic message channel (TMC) if the edge has timestamped traffic data associated with the edge.
  • a sensor provides the traffic readings of each TMC at times T.
  • certain operations are defined to provide access to neighbors of a TMC based on the number of hops.
  • Access may be to the immediate neighbors and/or all the incoming and/or outgoing neighbors of an edge up to a certain number of hops, providing k- hop incoming neighbors and k- hop outgoing neighbors.
  • the congestion wave travels in a reverse direction as compared to the flow of traffic, so the prediction network is created in the opposite direction as flow (see Figure 3 comparing arrows of 300 and 302).
  • the k-hop incoming neighbors are the k-nearest hops of the incoming edges feeding traffic into an edge.
  • the set of k-hop incoming neighbors may be defined recursively: the 1-hop incoming neighbors of an edge e are the edges in(src(e)), where src gives the source node of an edge; given the set of k-hop incoming neighbors, the (k+1)-hop set adds in(src(e')) for every edge e' in the k-hop set.
  • the k-hop outgoing neighbors are the k-nearest hops of edges taking traffic away from an edge via the edges that are going outwards.
  • the set of k-hop outgoing neighbors may be defined recursively: the 1-hop outgoing neighbors of an edge e are the edges out(dst(e)), where dst gives the destination node of an edge; given the set of k-hop outgoing neighbors, the (k+1)-hop set adds out(dst(e')) for every edge e' in the k-hop set.
  • k-hop incoming and outgoing neighbors are defined.
  • D is the 1st-hop incoming neighbor, and H and G are the 2nd-hop incoming neighbors.
  • B and C are the 1st-hop outgoing neighbors, and K is the 2nd-hop outgoing neighbor.
  • This hop structure for the directed connected graph may be used to access the sensor data for localized prediction.
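As an illustrative sketch of the recursive k-hop definitions above (the edge names and toy topology here are made up, not the patent's Figure 3 network), a directed road graph can be stored as a mapping from edge name to a (source node, destination node) pair:

```python
# Toy directed road graph: edge -> (source node, destination node).
# Topology: H -> D -> A -> B, i.e., H feeds D, D feeds A, A feeds B.
edges = {"H": (0, 1), "D": (1, 2), "A": (2, 3), "B": (3, 4)}

def k_hop_incoming(edges, e, k):
    """Edges whose traffic feeds into edge e within k hops.

    The 1-hop incoming neighbors of e are the edges whose destination
    node is the source node of e; deeper hops expand recursively.
    """
    src, _ = edges[e]
    frontier = {name for name, (s, d) in edges.items() if d == src}
    result = set(frontier)
    for _ in range(k - 1):
        nxt = set()
        for f in frontier:
            fs, _ = edges[f]
            nxt |= {name for name, (s, d) in edges.items() if d == fs}
        frontier = nxt - result
        result |= frontier
    return result

def k_hop_outgoing(edges, e, k):
    """Edges that take traffic away from edge e within k hops."""
    _, dst = edges[e]
    frontier = {name for name, (s, d) in edges.items() if s == dst}
    result = set(frontier)
    for _ in range(k - 1):
        nxt = set()
        for f in frontier:
            _, fd = edges[f]
            nxt |= {name for name, (s, d) in edges.items() if s == fd}
        frontier = nxt - result
        result |= frontier
    return result
```

For example, the 1-hop incoming neighbor of A in this toy graph is D, and its 2-hop incoming neighbors add H; in the opposite direction, D's 1-hop outgoing neighbor is A.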
  • the processor or processors receive sensor data for the road segments.
  • the sensor data is received for the entire traffic network.
  • sensor data for localized sensors is received by localized processors or processes.
  • the grouping 200 by k-hop for incoming and/or outgoing neighbors is performed.
  • the sensor data for that segment and the sensor data for outgoing neighbors based on the graph for k number of hops (e.g., k is 1 or 2) is gathered 202. Data for incoming neighbors may be gathered 202.
  • the sensor data is any measure of traffic.
  • the measure is speed.
  • the speed may be an average speed of all vehicles or vehicles of a particular class over a given interval (e.g., 10 minutes). Other measures of speed may be used, such as a median or other statistic over the interval. Other measures of traffic flow may be used, such as a count of vehicles, volume flow, stack up (i.e., difference between number in and number out), stop frequency, and/or a jam factor.
  • the machine-learned models are to output the same characteristic as input. For example, the speed is measured, and the machine-learned models output speed. In alternative embodiments, the machine-learned models output a different characteristic (e.g., jam factor or congestion) in response to input of the speed.
  • the processor or processors identify congestion. Congestion is identified by analysis for each or some of the road segments. Congestion may be identified in any number or no road segments.
  • the congestion for a current time is based on the current traffic for that road segment as compared to a reference. For example, a current speed is compared to a reference speed. If different enough (e.g., beyond a threshold), then congestion is identified. In one embodiment, the congestion is identified as a ratio of (1) a current speed on the road segment to (2) an average speed unlimited by congestion on the road segment being below a threshold. Any threshold may be used, such as 0.70 or 0.60. Other comparisons than a ratio may be used.
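A minimal sketch of this ratio test, with the 0.6 default taken from the example threshold values above (the function name is illustrative, not from the patent):

```python
def is_congested(current_speed, reference_speed, threshold=0.6):
    """Flag congestion when the ratio of the current speed to the
    segment's congestion-free reference speed falls below the threshold."""
    return current_speed / reference_speed < threshold
```

So a segment with a free-flow reference of 50 currently moving at 20 (ratio 0.4) is flagged, while one moving at 45 (ratio 0.9) is not.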
  • the congestion is identified by the prediction of act 106.
  • the prediction is used to identify congestion at the current time or a future time.
  • the prediction of 106 may then additionally be used to predict any congestion for a later time.
  • the reference traffic may be a generalized reference, such as one reference being used for all road segments.
  • the reference traffic for a non-congestion or free flow condition is specific to each road segment.
  • a speed limit or an average speed in a free flow condition is used as the reference.
  • the reference flow is a reference level for initial or onset of congestion. Where the current traffic is at or below the reference, then congestion is identified.
  • the congestion may be expressed as a jam factor JF, where JF is an indicator of the number of cars over capacity on the road.
  • JF is part of the HERE API. Other congestion metrics may be used.
  • the free flow speed FF is used as the reference. Free flow speed is the average of all maximum speeds observed when the jam factor observed on the TMC is 0.
  • the goal is to determine the cascade effect of the congestion onto other road segments.
  • a congestion event is observed at a certain road segment at any point of time in the transportation network. The question is when the congestion effect propagates to its k-hop incoming neighbors.
  • the Δ-cascade (delta cascade) event is defined as a congestion event where more than 50% of first-hop neighbors show a 60% speed reduction within Δ time steps; edge e is the source of the cascade event.
  • the goal is to find the Δ-cascade events across the traffic network.
  • the predicted local traffic may be used to identify the time of propagation of congestion up to the k-hop incoming neighboring segments, where k varies from one to three.
  • k may have larger values (e.g., four or more).
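A hedged sketch of the Δ-cascade test as defined above: more than 50% of first-hop neighbors show a 60% speed reduction (speed at or below 40% of free flow) within Δ time steps of the source congestion. Function and parameter names are illustrative:

```python
def delta_cascade(neighbor_speeds, free_flow, delta, drop=0.6, share=0.5):
    """Detect a Delta-cascade event at a congestion source.

    neighbor_speeds: dict edge -> list of speeds per time step after onset.
    free_flow: dict edge -> congestion-free reference speed for that edge.
    Returns True if, within the first `delta` steps, more than `share`
    of the neighbors fall to (1 - drop) of their free-flow speed or lower.
    """
    affected = 0
    for e, speeds in neighbor_speeds.items():
        limit = (1.0 - drop) * free_flow[e]  # e.g., 40% of free flow
        if any(s <= limit for s in speeds[:delta]):
            affected += 1
    return affected > share * len(neighbor_speeds)
```

With three first-hop neighbors at a free-flow speed of 50, two of them dropping to 15 and 18 within one step trips the event (2 of 3 > 50%), while a single affected neighbor does not.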
  • the traffic data is normalized. For each road segment, the traffic data is normalized to the same scale or dynamic range, such as [0, 1].
  • the free-flow speed for the segment may be used to normalize the traffic data for the segment.
  • the free-flow speed is mapped to the maximum value of 1. Higher speeds are clipped or limited to the maximum value of 1. Since different road segments may have different reference traffic, different speed limits, and/or different speeds at which congestion occurs, the traffic is normalized to the same range. In this way, the variance across the network due to reference speed is removed, allowing for data from road segments with different reference speeds to be used comparatively by the machine-learned models. In alternative embodiments, the traffic data is not normalized as the machine-learned models are separately trained.
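This per-segment normalization can be sketched in a few lines: divide by the segment's own free-flow speed and clip so the result lies in [0, 1] (the function name is illustrative):

```python
import numpy as np

def normalize_speeds(speeds, free_flow_speed):
    """Scale one segment's raw speed observations by its free-flow speed,
    clipping so that free flow and anything faster maps to 1.0."""
    return np.clip(np.asarray(speeds, dtype=float) / free_flow_speed, 0.0, 1.0)
```

For a segment with a free-flow speed of 60, readings of 30, 60, and 75 normalize to 0.5, 1.0, and 1.0 (the last clipped), so segments with different reference speeds become directly comparable.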
  • the processor or processors predict traffic for a future time.
  • the current traffic data from the segment and outgoing (downstream) segments are used to predict the traffic. Past traffic data, such as from a last one, two, three or more intervals may be used with the current traffic to predict the traffic.
  • the prediction for each or different segments uses separately trained machine-learned models. Any type of machine learning architecture and resulting machine-learned model may be used.
  • a neural network such as a convolutional neural network or fully connected neural network is used.
  • a recurrent neural network is used. Any machine learning operating on a sequence of measurements may be used.
  • a recurrent neural network with one or more long short-term memory (LSTM) architectures is used. Any number of layers, nodes, and/or connections may be used, such as a 2-layer LSTM network architecture with dropout regularization.
  • the learnable parameters of the network are learned through training. The learned values are used in application of the machine-learned model.
  • Figure 3 shows an example.
  • the machine-learned model for segment D is an LSTM network 304 that receives a temporal sequence of measured or sensed traffic over three intervals: the current time and the two most immediate past times (e.g., the current 10 minutes plus the past 20 minutes, for a total of 30 minutes of traffic). Since there is only one outgoing segment A for segment D, the traffic over this range of time for A and D is input to the LSTM network 304. No other inputs are used, but other inputs may be provided.
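Building one such model input can be sketched as stacking the last j+1 normalized readings of the target segment and its outgoing neighbor(s) into a (timesteps, features) window, the shape an LSTM layer expects. The series values below are made-up illustrations, not measured data:

```python
import numpy as np

def make_window(series_by_segment, segments, t, j=2):
    """Stack readings t-j .. t of each listed segment into a
    (j+1, len(segments)) array: rows are timesteps, columns segments."""
    return np.stack(
        [[series_by_segment[s][i] for s in segments] for i in range(t - j, t + 1)]
    )

# Illustrative normalized speed series for target D and its outgoing neighbor A.
speed = {"D": [0.9, 0.8, 0.5, 0.4], "A": [1.0, 0.9, 0.7, 0.6]}
x = make_window(speed, ["D", "A"], t=3)  # 3-step window ending at current time
```

Here `x` has shape (3, 2): three intervals of traffic for the two segments feeding the segment-D predictor.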
  • LSTM is a form of recurrent neural network with the capability of processing sequences of data. LSTM prevents the vanishing and exploding gradient problems encountered in a recurrent neural network, so the network is capable of capturing long temporal dependencies using backpropagation through time. LSTM models the temporal dependencies of the traffic speed that will affect the speed in the future. A connected LSTM-based architecture is separately trained to be intersection-specific. To model the future speed of a particular TMC, the information from the relevant neighboring segments is used.
  • in a transportation network, traffic flows to a road from its incoming neighbors, but congestion flows in the reverse direction of traffic flow, i.e., from an outgoing neighbor to a target road.
  • the speed forecasting detector for a target road is trained on the traffic data of the target road segment and its immediate outgoing neighbor(s), since congestion flows from an outgoing neighbor to a target road.
  • For predicting future traffic speed under the influence of congestion, only information from the outgoing neighbors (not from incoming neighbors) and the target segment is used. In alternative embodiments, information from both the incoming and outgoing neighbors may be used to model real-time traffic and/or future speed.
  • TMC is one definition specifying a road segment (see HERE API) but other definitions (e.g., XD from Inrix or OSM ways from Open street maps) may be used.
  • Future traffic states of the TMC s(e), evaluated at current timestep c_t, are modeled as a function f of the traffic states of its own and its immediate outgoing neighbors' speeds from timestep (c_t - j) to c_t using the machine-learned model.
  • the traffic predictors take into account the normalized speed data of each TMC, normalized w.r.t. the free flow speed.
  • Each TMC in the network has such a LSTM-based traffic predictor associated with the segment.
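The per-segment input described above — the target's speeds plus those of its outgoing neighbors, each normalized by the segment's free-flow speed — might be assembled as in this sketch; the graph, speeds, and free-flow values are made-up illustration data:

```python
outgoing = {"D": ["A"], "A": []}    # segment -> outgoing neighbors
free_flow = {"D": 60.0, "A": 70.0}  # free-flow speeds used for normalization
speeds = {                          # raw speeds, newest last (10-min steps)
    "D": [42.0, 36.0, 30.0],
    "A": [63.0, 56.0, 49.0],
}

def model_input(target, lookback=3):
    # Gather the lookback window for the target and its outgoing neighbors,
    # normalizing each speed by that segment's free-flow speed.
    segments = [target] + outgoing[target]
    return {s: [v / free_flow[s] for v in speeds[s][-lookback:]]
            for s in segments}

features = model_input("D")
```

Here `features["D"]` and `features["A"]` hold the normalized windows that would be fed to segment D's predictor.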
  • Figure 3 shows an example of a sample road network 300 and its corresponding connected fabric 302 of LSTMs, one LSTM network 304 provided for each of segments A-K.
  • the machine-learned model for each segment has the same architecture and corresponding learnable or learned parameters.
  • the learned values may be different due to the different inputs or groupings of data.
  • different segments have machine-learned models with different architectures and/or learnable parameters. For example, the size of the road, number of incoming neighbors, number of outgoing neighbors, and/or another characteristic is used to select the recurrent neural network to use for the model of that segment.
  • the same two-layered deep LSTM network with 100 units in each layer and a dense output layer is used for each traffic predictor.
  • Other architectures may be used.
  • mean squared error (MSE) may be used as the loss function, with the Adam optimizer used in training.
  • Other optimization and/or loss functions may be used in training.
  • the prediction is performed for the different road segments. For example, the traffic at each road segment A-K in the road network 300 of Figure 3 is predicted. The predictions may be performed for fewer than all of the road segments A-K, such as based on a likelihood of congestion. Each prediction is localized or independent of the other predictions. Alternatively, the prediction for one segment may use prediction for other segments, such as from outgoing segments. Separate machine-learned models for the separate road segments predict the traffic for the respective road segments. The prediction is localized based on the directed connected graph or interconnections of road segments.
  • Only information from neighbors and the target are used to forecast the traffic speed of a target road.
  • the traffic data measured or sensed for the target and only the k-hop nearest outgoing segments or neighbors is used in the prediction without using traffic from other segments in the prediction.
  • two feed-forward networks with the same architecture, optimizer and loss functions may be trained. The first network is trained to forecast traffic speed using the information from the neighboring road segments, and the second network is trained to forecast the traffic speed without using any information from neighbors.
  • the mean squared errors (MSE) in forecasting the traffic speed over five randomly chosen TMC IDs indicate that, given the same architectural constraints, the forecasts using the neighbors' information have far lower MSE than the forecasts without, indicating a benefit to using neighborhood information in traffic forecasting.
  • the prediction is for a future time. For example, the prediction is of the traffic in a next one or multiple future time increments.
  • the timestep is the interval at which the traffic data is discretely sampled.
  • the timestep is a hyperparameter for the prediction. Over 500 minutes of data collected at an interval of 1 minute, 5 minutes and 10 minutes respectively, the error in prediction increases gradually with the size of the timestep.
  • prediction is performed 10 times to get a prediction after 10 minutes, which includes the error accumulated at each level of prediction. Instead, by sampling at or averaging over a 10-minute time interval, then just one prediction is needed to get a 10-minute ahead prediction. Not much information is lost by sampling the data at 10-minute intervals.
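The resampling step above can be sketched as a simple block average over the finer-grained samples (the speed values are illustrative):

```python
# 30 minutes of 1-minute speed samples (invented illustration data).
one_minute_speeds = [50 + (i % 10) for i in range(30)]

def resample(samples, factor=10):
    # Average consecutive groups of `factor` samples into one value, so a
    # single one-step prediction covers 10 minutes instead of chaining ten
    # 1-minute predictions and accumulating error at each step.
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples), factor)]

ten_minute_speeds = resample(one_minute_speeds)
```

Three 10-minute averages replace the thirty 1-minute samples while retaining most of the information.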
  • the speed of traffic at a future time is predicted. This prediction is based on speed from any number of past observations or increments.
  • the number of past observations is a hyperparameter to tune the LSTM models. For example, two past sequences (i.e., look back into the past 20 minutes) of the traffic speed are used to predict the future traffic speed. Any time increment may be used, such as 10 minutes, 5 minutes, 20 minutes, 30 minutes or an hour. Any number of past sequences for a prediction may be used, such as 2 (e.g., 20 minutes with 10-minute increments) or 3 (e.g., 30 minutes with 10-minute increments). Choosing longer time sequences may not improve performance because the future speed may be more closely approximated with speeds in recent history.
  • the mean square error (MSE) may not decrease as a greater number of past data samples (e.g., greater than 20 minutes) is taken into account and may be least when looking back for two ten-minute timesteps.
  • the hyper-parameter representing the number of past observations is two.
  • the prediction may be for any number of future timesteps or intervals. Using the connected LSTM fabric 302, multiple timesteps ahead in the future may be predicted. As the prediction moves ahead of the current time, information from up to the k-hop neighbors of a target road is used to predict the traffic speed 'k' timesteps in advance.
  • a one-step ahead prediction uses the past and current traffic speed of the 1st hop neighbors
  • a two-step ahead prediction uses the one-step ahead predictions of the target road segment as well as those of the 1st hop neighbors as input.
  • the one-step ahead predictions of the 1st hop neighbors use the traffic information from their neighbors, i.e., the 2nd hop neighbors of the target road. So, for a two-step ahead prediction, information up to the 2nd hop neighbors is used. For predictions up to three timesteps ahead in the future, information up to the 3rd hop neighbors is incorporated.
  • the 0-th timestamp is the current time, and traffic is predicted at one, two and three timesteps ahead from the current time (i.e., predicting traffic at 10, 20, and 30 minutes from the current time). Other arrangements of number of hops per number of future increments being predicted from a current time may be used.
  • the connected fabric 302 of LSTM architectures can inter- dependently produce multi-timestep ahead predictions. The predictions from an earlier timestep (e.g., 10 minutes in the future) for k-hop neighbors are used as inputs to predict for later occurring timesteps (e.g., 20 and 30 minutes).
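The feedback of earlier predictions into later ones can be sketched as below; a simple weighted average stands in for the trained per-segment LSTMs, and the graph and speeds are invented for illustration:

```python
outgoing = {"D": ["A"], "A": ["B"], "B": []}  # segment -> outgoing neighbors
state = {"D": 0.5, "A": 0.8, "B": 0.9}        # current normalized speeds

def predict_next(seg, cur):
    # Stand-in one-step predictor: average the segment's own value with its
    # outgoing neighbors' values (the real system uses a trained LSTM here).
    neighbors = outgoing[seg]
    if not neighbors:
        return cur[seg]
    neighbor_mean = sum(cur[n] for n in neighbors) / len(neighbors)
    return 0.5 * cur[seg] + 0.5 * neighbor_mean

def predict_k_steps(k):
    # Roll the whole fabric forward k steps, feeding each round of
    # predictions back in as the next round's inputs. After k rounds, the
    # target's value has absorbed information from its k-hop neighbors.
    cur = dict(state)
    for _ in range(k):
        cur = {seg: predict_next(seg, cur) for seg in cur}
    return cur

two_ahead = predict_k_steps(2)  # D now reflects information up to 2 hops away
```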
  • the difference between actual and predicted speed while predicting three timesteps ahead may be 1.3414 times more than that of two timesteps ahead and 2.6857 times more than that of one timestep ahead. As the predictions move further away in the future, the difference between the actual and predicted speed may increase.
  • the processor or processors determines the cascading effect of the congestion from the predicted traffic.
  • the cascading effect is determined as congestion upstream (i.e., following the backup or congestion wave in the reverse direction of traffic flow) of identified congestion.
  • the start of congestion upstream on the directed connected graph of the identified congestion is determined.
  • the determination is the same or different approach as identifying the congestion in act 104. For example, a ratio of predicted traffic to a reference traffic is below a threshold on one of the road segments upstream from the congestion. This ratio being below the threshold for an upstream segment is the detection of congestion from the cascade.
  • the predicted traffic (e.g., speed) is used to detect congestion.
  • Alternatively, the machine-learned model for the segment directly outputs an indication of congestion or jam factor rather than the indirect prediction of traffic as speed.
  • the cascade of any congestion on any downstream segments may be predicted.
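The ratio test described above can be sketched as follows; the 0.6 threshold mirrors the 60%-of-free-flow criterion used elsewhere in this description, and the speeds are illustrative:

```python
def is_congested(predicted_speed, reference_speed, threshold=0.6):
    # Flag congestion when predicted speed falls below the threshold
    # fraction of the reference (e.g., free-flow) speed.
    return predicted_speed / reference_speed < threshold

slow = is_congested(30.0, 60.0)  # ratio 0.5, below threshold
free = is_congested(45.0, 60.0)  # ratio 0.75, above threshold
```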
  • Algorithm 1 illustrates one embodiment of the overall congestion forecasting architecture. Once congestion is identified at a target road segment e, the algorithm starts by gathering the 1st hop incoming neighbors. For each of those 1st hop neighbors, the 2nd hop incoming neighbors are found. The process repeats for the 3rd hop incoming neighbors. These subsets of 1st, 2nd, and 3rd hop neighbors constitute the set of total neighbors, denoted N.
  • the function predict_next(e, timestep) calls the pre-trained LSTM forecasting module to predict the speed for a certain TMC edge based on the values of its neighboring segments.
  • the speed of edge e one time-step ahead in the future, such as 10 minutes, is predicted.
  • the algorithm turns on the corresponding flag for the TMC e and forecasts a congestion to start at that TMC from the next timestep.
  • the accuracy of this algorithm depends on the detection threshold d, indicating the percentage dip in forecasted speed from the previous timestep that triggers the initiation of congestion.
  • Example detection thresholds are between 0.1 and 0.15, but other values may be used.
  • the algorithm checks for the TMCs whose flags have been turned on and eliminates those from the list of N.
  • the algorithm starts by checking whether congestion is forecasted to start within the next 10 minutes for all the relevant 1st, 2nd, and 3rd hop neighbors, and then eliminates the neighbors where congestion has started.
  • since the congestion in the 1st hop neighbors starts earlier, the 1st hop neighbors are eliminated from the list first.
  • the computation is then carried out only for their corresponding 2nd and 3rd hop neighbors to output the corresponding time of onset of congestion for them.
  • Other algorithms may be used, such as one detecting congestion without specific references to earlier congested segments. The congestion may be predicted for all segments.
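A simplified sketch of the Algorithm 1 flow — gather incoming neighbors up to 3 hops of a congested target, then flag each neighbor whose forecasted speed dips by more than the detection threshold d — might look like the following; the graph, speeds, and one-step forecast are illustrative stand-ins for the trained LSTM modules:

```python
incoming = {"e": ["n1"], "n1": ["n2"], "n2": ["n3"], "n3": []}
current = {"n1": 0.9, "n2": 0.9, "n3": 0.9}      # current normalized speeds
forecast = {"n1": 0.7, "n2": 0.85, "n3": 0.88}   # one timestep ahead

def hop_neighbors(target, hops=3):
    # Collect incoming neighbors of the target up to `hops` hops away.
    found, frontier = [], [target]
    for _ in range(hops):
        frontier = [m for n in frontier for m in incoming[n]]
        found.extend(frontier)
    return found

def onset_flags(target, d=0.12):
    # Flag a neighbor when its forecasted speed dips by more than d
    # (fractionally) from the previous timestep, i.e., congestion is
    # forecast to start there at the next timestep.
    flags = {}
    for n in hop_neighbors(target):
        dip = (current[n] - forecast[n]) / current[n]
        flags[n] = dip > d
    return flags

flags = onset_flags("e")
```

With d = 0.12 (inside the 0.1 to 0.15 range mentioned above), only n1's roughly 22% dip crosses the threshold.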
  • a refined temporal prediction is provided.
  • Two sets of LSTM networks 304 are provided, one trained for 10-minute timesteps and another trained for shorter (e.g., 5-minute) timesteps.
  • Figure 6 shows an example of predicted congestion showing the cascade effect.
  • the road network 600 includes segments or edges A-G.
  • congestion is identified at segment D and not the other segments.
  • the LSTM networks 304 for segments A-G or just upstream segments B and E (1-hop) or B, E, G (2-hop) predict traffic for the respective segments.
  • congestion is predicted based on the predicted traffic (e.g., speed) for upstream segments B, C, and E and not segments A, D, F, and G.
  • the congestion at time t+1 to t+1+k (i.e., 0-10 minutes, where each increment is 10 minutes and k is the length of an increment) is predicted.
  • segment C is not upstream from segment D in the road network 600
  • the congestion predicted for segment C at time t+1 to t+1+k is due to other causes.
  • the congestion at segment C may be identified as an initial congestion.
  • the prediction of time n+1 to n+1+k for segment G due to cascade from segment E is based on the predicted traffic for all segments at time t+1 to t+1+k.
  • Algorithm 2 shows one embodiment for estimating likelihood of congestion propagation from a source road to a destination road.
  • the execution time of Algorithm 1 may be reduced by testing for onset of congestion only for those neighbors at each hop where the likelihood of congestion propagation is higher given historical records, instead of testing for congestion for all the incoming neighbors at each hop.
  • Algorithm 2 keeps track of two kinds of events.
  • Event ev1 corresponds to the phenomenon where a significant speed decrease is observed at any target road.
  • Event ev2 corresponds to the phenomenon where a significant speed decrease is observed at any of its incoming neighbors within the time range of start time of congestion in target road, up to D timesteps from that time.
  • D is a heuristic and is chosen as 4 in this case with the assumption that a congestion, if progressing from source to neighbor, should take place within 4 timesteps. The choice of D may vary according to the problem.
  • the algorithm counts the number of times the events ev1 and ev2 occurred and saves the ratio ev2/ev1 as the likelihood, which signifies the proportion of times the congestion created at the source propagated to the corresponding neighbor.
  • this likelihood of congestion propagation for each source-destination pair can be found and updated in real time as more such cases are encountered. If the likelihood is more than 50% (i.e., more than half of the time the congestion from the source propagated to a particular neighbor given historical records) or another threshold, then this particular neighbor is appended to the set of most likely neighbors to be affected by congestion at source road segment 'e'.
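The ev2/ev1 bookkeeping might be sketched as follows; the historical event records, D = 4, and the 50% threshold follow the description above, while the specific data are invented for illustration:

```python
D = 4            # max timesteps for propagation, chosen heuristically
THRESHOLD = 0.5  # keep neighbors affected more than half of the time

# Historical records: (source onset timestep, {neighbor: onset timestep or
# None if no significant speed decrease was observed there}).
history = [
    (10, {"n1": 12, "n2": None}),
    (40, {"n1": 43, "n2": 55}),   # n2's drop is more than D steps later
    (70, {"n1": None, "n2": 71}),
]

def propagation_likelihood(neighbor):
    # ev1: source drop observed; ev2: neighbor also drops within D timesteps.
    ev1 = len(history)
    ev2 = sum(1 for t0, onsets in history
              if onsets[neighbor] is not None and onsets[neighbor] - t0 <= D)
    return ev2 / ev1

likely = [n for n in ("n1", "n2")
          if propagation_likelihood(n) > THRESHOLD]
```

Here n1 (likelihood 2/3) is kept in the most-likely set while n2 (1/3) is dropped, so Algorithm 1 would only test n1.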
  • the processor or processors output a time for onset of congestion.
  • the onset of congestion is output as any congestion or from a cascade of previous congestion or congestion on another road segment.
  • the congestion detected in acts 104 and/or 108 are output.
  • the output is by an interface, such as through a memory buffer or computer network interface.
  • the output is to a routing computer, such as a server providing route information to a mobile device (e.g., phone or vehicle).
  • the predicted congestion may be used for routing decisions by the routing computer.
  • the segments expected to be congested may be avoided.
  • the routing may use historical data with or without current traffic data to determine a fastest route through a road network or part of the road network.
  • the predicted congestion may be included in the navigation routing decisions to further penalize road segments expected to be slower.
  • the congestion and/or cascading effect may be considered in routing.
  • congestion forecasting is based on a traffic data of Nashville (USA).
  • each road segment is expressed as a Traffic Message Channel (TMC) having a TMC ID.
  • each TMC ID has timestamped information consisting of traffic speed, jam factor, and free flow speed collected over several months.
  • the traffic data over a period (e.g., from January 1, 2018 to February 12, 2018) is used.
  • Each TMC ID signifies a specific road segment in the network and contains the sensor information for that particular segment.
  • Algorithm 1 is validated on the Nashville dataset. The data from January 28, 2018 to February 12, 2018 is used for validation purposes. The set of congestion events are identified in the dataset for this period.
  • One example procedure for finding the cascade events from validation dataset is provided in Algorithm 3.
  • Algorithm 3 starts with checking for TMC IDs whose current speeds are less than 60% of the free flow speed (FF) for two consecutive timesteps n and n + 1.
  • the incoming neighbors for TMC e and their corresponding normalized speeds are checked starting from that timestep. The hypothesis is that, if there is a congestion event that affects a neighborhood, then the congestion propagation between any two consecutive hops occurs within D timesteps.
  • the parameter D is a heuristic and may vary depending on the problem. If congestion is detected in any of the incoming neighbors within this specified time range, the flag temp is turned on for that road as specified in Algorithm 3. Then, the number of times the flag temp turned on is summed (e.g., counted). This count indicates how many incoming neighbors showed the sign of congestion within that time range.
  • if enough incoming neighbors show congestion within the time range, a congestion event occurs, and the traffic network edge 'e' is output as having congestion at timestep n. Not all of the neighbors necessarily need to be congested in a dynamic real-world traffic scenario.
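A sketch of this event finder, with invented normalized speed traces and a minimal "enough neighbors" count of one, might look like:

```python
FF = 1.0  # speeds below are already normalized to free flow
speeds = {
    "e":  [0.9, 0.5, 0.4, 0.4, 0.4],
    "n1": [0.9, 0.9, 0.5, 0.4, 0.4],
    "n2": [0.9, 0.9, 0.9, 0.5, 0.4],
}
incoming = {"e": ["n1", "n2"]}
D = 4  # heuristic propagation window in timesteps

def congested(seg, t):
    # Speed below 60% of free flow counts as congested.
    return speeds[seg][t] < 0.6 * FF

def is_cascade_event(e, n, min_neighbors=1):
    # Target must be congested for two consecutive timesteps n and n+1.
    if not (congested(e, n) and congested(e, n + 1)):
        return False
    # Count incoming neighbors whose "temp" flag turns on within D timesteps.
    count = sum(
        1 for nb in incoming[e]
        if any(congested(nb, t)
               for t in range(n, min(n + D, len(speeds[nb]))))
    )
    return count >= min_neighbors

event = is_cascade_event("e", 1)
```

Segment e dips below 60% of free flow at timesteps 1 and 2, and both neighbors follow within D timesteps, so a cascade event is recorded at timestep 1.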
  • a validation set is created to verify the proposed congestion forecasting algorithm 1. In this example, ten such events are identified from the Nashville dataset.
  • Algorithm 1 is validated on the ten congestion events identified across Nashville using 10-minute increments or timesteps. To give a more precise idea of the efficacy of Algorithm 1, the corresponding precision and recall values in identifying the onset of congestion in each of the neighboring road segments are calculated. For each road segment, an experiment is carried out over three consecutive timesteps: the actual time of onset of congestion plus one timestep before and one after. Whether the proposed algorithm output the presence of congestion for those timesteps is compared with the true conditions. When the onset of congestion is correctly identified, a true positive occurs.
  • When the onset of congestion is predicted before the actual onset, a false positive occurs for the timesteps during which congestion was forecasted but not actually present. When the onset of congestion is predicted after the actual onset, a false negative occurs for the timesteps during which congestion was present but not forecasted.
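This scoring scheme can be sketched as below; the exact accounting (e.g., that a prediction still yields a true positive once it overlaps the actual congestion) is an assumption for illustration:

```python
def score(actual_onset, predicted_onset):
    # Early prediction: false positives for the timesteps before the actual
    # onset. Late prediction: false negatives for the timesteps congestion
    # was present but not yet forecast. Overlap counts as a true positive.
    tp = fp = fn = 0
    if predicted_onset < actual_onset:
        fp = actual_onset - predicted_onset
    elif predicted_onset > actual_onset:
        fn = predicted_onset - actual_onset
    tp = 1
    return tp, fp, fn

tp, fp, fn = score(actual_onset=3, predicted_onset=2)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Predicting one timestep early yields one false positive, so precision drops to 0.5 while recall stays at 1.0.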
  • Figure 8 shows the transportation network for one congestion event.
  • the initial congestion occurred at segment A and cascaded to other segments.
  • the source of the road congestion is road segment ‘A’.
  • the congestion propagates to the 1st ('B'), 2nd ('C', 'G'), and 3rd hop ('D', 'E', 'F', 'H', 'I', 'J') incoming neighbors, respectively.
  • Figure 9 shows the effectiveness of identifying the onset of congestion in each of the neighbors through three different radar charts.
  • the chart in the middle shows the results for one 1st and two 2nd hop neighbors.
  • the radar charts on the left and the right show the results for the 3rd hop neighbors corresponding to each of the 2nd hop neighbors. It is seen that the onset of congestion can be identified accurately most of the time, where the actual onset of congestion (shown as a solid line) overlaps with the predicted onset (shown as a dotted line).
  • the solution may be refined to identify whether the congestion will take place within next 0 to 5 minutes or next 5 to 10 minutes.
  • the average precision and recall values for identifying the onset of congestion in one of the two possible higher resolution time-slots are calculated as 0.75 and 0.92 respectively.
  • the congestion forecasting framework is formed with an overall connected fabric 302 of LSTM architectures 304.
  • Figure 10 shows an example of the traffic speed forecasting performance on a road segment having five neighbors. The forecasted speed after ten minutes and the actual speed after ten minutes overlap each other with a mean squared error of 0.0046.
  • the mean squared error (MSE) in predicting one vs. multiple timesteps ahead in the future is determined for multiple TMCs.
  • the MSEs in forecasting one, two, three and four timesteps ahead are calculated for 45 TMCs out of 3724 TMCs in Nashville.
  • the MSE increases for prediction further in the future.
  • Figure 11 shows one embodiment of a system for predicting congestion in a traffic network, such as a road network.
  • the system applies a plurality of localized, machine-learned predictors for separate prediction of traffic for different segments of the network. For example, different LSTM networks predict future traffic for different road segments based on local traffic information from a range of previous times. The predicted traffic is used to determine whether and/or when congestion occurs.
  • the system implements the method of Figure 1 and/or the method of Figure 2.
  • Other methods or acts may be implemented, such as acts for using spatiotemporal modeling of the traffic network as a directed connected graph and machine-learned models to predict localized traffic from current and past traffic for the local segment and outgoing segments.
  • the system includes a processor 10 for applying the spatiotemporal learned networks 12 in a memory 11 to predict congestion in a network of road segments 15.
  • the system may include sensors 16 for measuring current traffic and/or providing past traffic measures.
  • the system may include an interface 13 for reporting the prediction to a routing or navigation server 14, which guides or directs devices 17 (e.g., smart phones in vehicles and/or navigation systems of vehicles) along the road segments 15. Additional, different, or fewer components may be provided.
  • the interface 13, routing server 14, and/or devices 17 are not provided.
  • the measures of traffic are stored in the memory 11, and the sensors 16 are not provided.
  • the sensors 16 are traffic sensors, such as car counters, cameras with vehicle detection, and/or speed sensors (e.g., radar and/or time between detection at different locations). In one embodiment, the sensors 16 are speed or velocity sensors. Other traffic sensors measuring a characteristic of the flow of traffic on the road segments 15 may be used.
  • The sensors 16 are configured by placement and/or operation to sense traffic flow in different roads. Different sensors 16 sense for different road segments 15 of the traffic network. The sensors 16 output to the processor 10. Alternatively, the processor 10 reads the measurements.
  • The traffic network is modeled as a directed connected graph. The road segments 15 are edges in the directed connected graph. Other modeling may be used. The model of the traffic network is stored in the memory 11 and/or otherwise accessible by the processor 10.
  • the processor 10 is a programmable logic controller, application specific integrated circuit, artificial intelligence processor, graphics processing unit, digital signal processor, field programmable gate array, workstation, computer, and/or server.
  • the processor 10 is a single device but may be a network of processing devices, such as different processors applying different ones of the learned networks 12.
  • the processor 10 includes the interface 13 and the memory 11 storing the machine-learned networks 12.
  • the memory 11 and/or interface 13 may be external to the processor 10.
  • the processor 10 is configured by software, firmware, and/or hardware.
  • the processor 10 is configured to process the measurements from the sensors 16. For example, the speed is sampled in time increments. As another example, the speeds during each time period are averaged. In yet another example, the speed before or after sampling or averaging is normalized. The sensed traffic flow is normalized to a same scale regardless of differences in maximum, reference, and/or average speeds for the road segments.
  • the processor 10 is configured to group the sensed traffic flow by the road segments 15.
  • the grouping localizes the measurements to each road segment.
  • the grouping is so that the sensed traffic flow for each machine-learned network includes the sensed traffic flow for the road segment for the machine-learned network and the sensed traffic flow for downstream road segments of less than two, three, or four hops of the directed connected graph.
  • the normalization of or for the grouped traffic flow measurements may occur prior to or after grouping.
  • the processor 10 is configured to forecast congestion road segment-by-road segment 15. By application of the groups of the sensed traffic flow to respective machine-learned networks 12 for respective road segments 15, the machine-learned networks 12 output predicted traffic for the different road segments 15. The machine-learned networks 12 were each trained for different ones of the road segments with input of current and past measurements to output a prediction of future traffic. In one embodiment, the learned networks 12 are long short-term memory neural networks.
  • The processor 10 is configured to forecast congestion based on the predicted traffic. The forecast uses the output of the learned networks 12 to determine whether congestion has occurred or not. For example, congestion is forecast, for each road segment 15, based on an amount of difference of a measure of traffic output by the respective machine-learned network 12 to a reference measure of the traffic.
  • the memory 11 stores the learned networks 12, such as the architecture and values of the learned parameters.
  • the memory 11 may store measurements from the sensors 16, groupings, the directed connected graph, predicted traffic, congestion determination, and/or other information.
  • the memory 11 or other memory is alternatively or additionally a non-transitory computer readable storage medium storing data representing instructions executable by the processor 10 for applying the machine-learned networks 12.
  • the instructions for implementing the processes, methods, and/or techniques discussed herein are provided on non-transitory computer- readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive, or other computer readable storage media.
  • Non-transitory computer readable storage media include various types of volatile and nonvolatile storage media.
  • the functions, acts or tasks illustrated in the figures or described herein are executed in response to one or more sets of instructions stored in or on computer readable storage media.
  • the functions, acts or tasks are independent of the particular type of instructions set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone, or in combination.
  • processing strategies may include multiprocessing, multitasking, parallel processing, and the like.
  • the instructions are stored on a removable media device for reading by local or remote systems.
  • the instructions are stored in a remote location for transfer through a computer network or over telephone lines.
  • the instructions are stored within a given computer, CPU, GPU, or system.
  • the interface 13 is a computer network interface, transceiver, communications buffer, and/or other device for Internet or computer network communications.
  • the interface 13 is configured to transmit the forecast congestion for one or more of the road segments to a traffic routing system.
  • the predicted traffic may or may not also be transmitted.
  • the routing server 14 is a server for providing a route or navigation to devices 17. Based on a selected end point and a current or past starting point, the routing server 14 determines a route for the device 17 to the end point. The route determination uses one or more criteria, such as fastest route. The routing server 14 may use the predicted congestion to establish the route and/or to alter the route to avoid congestion.
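One way the forecast congestion could enter the routing server's route computation is sketched below with a small shortest-path search; the graph, travel times, and penalty factor are illustrative assumptions, not part of the described embodiment:

```python
import heapq

# Road graph: node -> [(next node, nominal travel time)].
graph = {"S": [("A", 4), ("B", 5)], "A": [("T", 4)], "B": [("T", 2)], "T": []}
congested = {"B"}  # segments forecast to be congested
PENALTY = 3.0      # multiply travel time on congested segments

def fastest_route(src, dst):
    # Dijkstra search over penalized edge costs.
    heap = [(0.0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, t in graph[node]:
            w = t * (PENALTY if nxt in congested else 1.0)
            heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

cost, path = fastest_route("S", "T")
```

Without the penalty, the route through B would win (total 7 vs. 8); with B forecast as congested, the route through A is chosen instead.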
  • Figures 2 and 12 show embodiments for machine training for congestion detection.
  • the machine training is for a plurality of neural networks or other models.
  • the networks or models are trained for different traffic segments, such as one network trained for each segment.
  • a machine such as a processor, computer, workstation, and/or server, performs the machine training.
  • Training data stored in a memory includes many samples of inputs of sensor data 210 and the ground truth outputs.
  • the training data includes historical sensor data 210 for the traffic network.
  • the machine uses the training data in an optimization, such as ADAM optimization, to learn the values of learnable parameters of the network.
  • the traffic network such as a road network
  • the traffic network is defined as a directed connected graph 212.
  • Other organizations of the traffic network may be used.
  • the graphical model 212 is used to group the training data.
  • the flow data 210 from different edges of the directed connected graph 212 are separated or copied into sets for the edge and any outgoing edges within one or two hops.
  • the data is grouped into sets for each segment or edge of the traffic network.
  • the grouped sensor data 210 may include the sensor data 210 of that edge and other edges, such as 1, 2, and/or 3 hop outgoing edges.
  • the flow data 210 is normalized.
  • the flow data 210 may be normalized before or after grouping in act 214.
  • the groups of flow data 210 are sent and/or used for the respective network or model to be trained.
  • the flow data 210 grouped for each road segment is sent to or assigned for training the respective network or model, such as an LSTM network 304.
  • hyperparameters to be used in the training and/or for the trained network are tuned.
  • the number of hops of outgoing connected segments to include in the grouping of the sensor data 210, the number of past increments to use in prediction, the period of each timestep, and/or other characteristic of the input or output are defined. For example, time increments of 10 minutes are used with three increments including the current time and one hop is used for input of the sensor data 210 with output of a prediction for the next future (one) time increment.
  • the tuning may be manual selection.
  • the training of act 222 is repeated with different settings and the performance is tested to find the optimum settings of the hyperparameters.
  • the processor performs machine training. For example, a traffic predictor is trained for each TMC with deep learning. An LSTM network or other neural network is trained using the training data as grouped. The models (e.g., LSTM networks) are machine trained to predict flow for the different edges. One of the models is machine trained for each one of the different edges based on the flow data 210 for the set for the respective one of the different edges. For example, the same architecture and learnable parameters are machine trained separately for each of the different edges.
  • the traffic flow is used as the ground truth.
  • the models are trained to predict traffic flow for respective segments.
  • the training is based on all possible traffic conditions observed over the entire traffic network (e.g., city) for historical flow data 210 over any period (e.g., almost one and a half months).
  • a separate algorithm may be used to identify congestion and/or cascade effect from the predicted traffic, so congestion and/or cascade effect is not part of the training data.
  • the congestion and/or cascade effect are used as the ground truth to machine learn to predict future congestion and/or cascade effect.
  • one or more traffic predictors are machine trained for each TMC.
  • the LSTM fabric 302 is trained to provide different predictors for different segments of the network.
  • the grouping of act 214 and the machine training of act 222 forms a distributed spatiotemporal modeling of flow of traffic on a transportation network as the directed connected graph.
  • the training data includes many samples.
  • the deep learning learns features to be extracted from the input sensor data 210.
  • Deep learning (e.g., deep structured learning, hierarchical learning, or deep machine learning) may be used.
  • a deep neural network processes the input via multiple layers of feature extraction to produce the features used to form the output.
  • the deep learning provides the features used to form the output rather than relying on manually designed features.
  • Other deep learned models may be trained and applied.
  • By training as a connected fabric 302 see Figure 3), localized information in the network is represented. Only the portions of the network that are affected when a data shift occurs are retrained. This avoids having to retrain the entire single network where a local change in data occurs.
  • the trained model for each TMC is saved for use in the congestion forecasting phase (see Figures 1 or 2).
  • the trained networks are stored, such as in a memory of the system for application of the machine-learned networks.
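The per-TMC grouping and training described above can be sketched as the construction of a separate supervised training set for each edge: the inputs are the past speeds of the edge and its outgoing neighbors, and the target is the edge's speed a few timesteps ahead. The function name, the dictionary layout, and the example values below are illustrative assumptions, not from the patent.

```python
# Sketch: build a per-edge (per-TMC) training set from historical speed
# readings, mirroring the grouping and per-edge training described above.
# Names (build_training_set, j, p) are illustrative.

def build_training_set(speeds, edge, out_neighbors, j=2, p=1):
    """speeds: dict edge -> list of speed readings per timestep.
    Each sample input holds the last j+1 readings of the target edge and
    its outgoing neighbors; the target is the edge's speed p steps ahead."""
    group = [edge] + out_neighbors           # localized group for this edge
    T = len(speeds[edge])
    X, y = [], []
    for ct in range(j, T - p):
        window = [speeds[g][ct - j:ct + 1] for g in group]
        X.append(window)                     # shape: (len(group), j+1)
        y.append(speeds[edge][ct + p])       # ground-truth future speed
    return X, y

# Example: edge D with its single outgoing neighbor A (as in Figure 3)
speeds = {"D": [60, 58, 55, 40, 30, 28], "A": [50, 49, 30, 25, 24, 26]}
X, y = build_training_set(speeds, "D", ["A"], j=2, p=1)
```

Each edge's model then fits its own (X, y) pairs, so a data shift on one segment only requires retraining that segment's predictor.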

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

For determining congestion and/or a cascading effect of congestion in a network, such as a traffic network, separate machine-learned models are provided for different segments of the network. By using data localized to connected segments by a limited number of hops, segment-specific predictions of flow by the respective machine-learned models are used to determine congestion. The cascade effect is determined from the localized predictions.

Description

DATA-DRIVEN DETERMINATION OF CASCADING EFFECTS OF CONGESTION IN A NETWORK
RELATED APPLICATIONS
[0001] The present patent document claims the benefit of the filing date under 35 U.S.C. §119(e) of Provisional U.S. Patent Application Serial No. 62/937,951, filed November 20, 2019, which is hereby incorporated by reference.
BACKGROUND
[0002] The present embodiments relate to data-driven determination of congestion in a network. In a large-scale interconnected system, such as a traffic network, cascading failures occur where failure in one part of the system eventually triggers failure in other parts of the system. One example traffic network is a road network. A primary road congestion created at a source can trigger secondary and tertiary road congestion due to physical connectivity. Traffic delays and waste of time and energy result. To mitigate such effects and build effective route guidance systems, congestion is forecast in advance to predict when road segments will be affected in the near future.
[0003] Traffic congestion prediction has been carried out in both model-driven and data-driven approaches. Model-driven approaches are based upon mathematical modelling to capture traffic congestion dynamics, such as models based on shockwave theory or a bathtub model. Accurate modeling of the dynamic behavior of a complex system such as a traffic network using standard mathematical or statistical methods is challenging: the speed distributions in a large-scale dynamic system like a traffic network cannot always be modeled by predetermined distributions, and not all the modalities of such a dynamic and complex system can be captured.
[0004] In known data-driven approaches, the complex functional relationships among several influencing factors can be learned by studying large amounts of data without relying on any standard and fixed statistical relation. The traffic network is treated as a homogeneous system, applying a generalized architecture for the entire network. This approach may not capture the dynamically changing influence of each neighbor on a certain target road segment.
SUMMARY
[0005] By way of introduction, the preferred embodiments described below include methods, systems, instructions, and computer readable media for determining congestion and/or a cascading effect of congestion in a network, such as a traffic network. Separate machine-learned models are provided for different segments of the network. By using data localized to connected segments by a limited number of hops, segment-specific predictions of traffic flow by the respective machine-learned models are used to determine congestion. The cascade effect is determined from the localized predictions. [0006] In a first aspect, a method is provided for determining a cascading effect of congestion in a traffic network. The traffic network is defined as a directed connected graph where each edge is a road segment. Congestion on one or more of the road segments is identified. Traffic for a future time at the road segments in the directed connected graph is predicted by separate machine-learned models predicting the traffic for respective road segments. The cascading effect of the congestion is determined from the predicted traffic.
[0007] In one embodiment, the directed connected graph is defined for at least ten road segments. Each of the machine-learned models was trained for the prediction of the traffic for a single direction of the respective road segment. Various types of traffic metrics may be predicted, such as the speed, volume flow, count, stop frequency, and/or jam factor. In one embodiment, sensor data for the road segments is received. The sensor data is speed. The traffic is predicted as speed.
[0008] Various measures of congestion may be used. In one embodiment, congestion is identified as a ratio of (1) a current speed on the one or more road segments to (2) an average speed unlimited by congestion on the one or more road segments being below a threshold. Any threshold may be used, such as 0.70 or 0.600. Other comparisons than a ratio may be used.
[0009] In one embodiment, the traffic is predicted for each of the road segments by input of only data of downstream road segments by only one or two hops on the directed connected graph to the respective machine-learned model for the road segment. The prediction is of the traffic, such as speed of traffic, at a future time.
[0010] Various machine-learned models may be used. In one embodiment, recurrent neural networks with long short-term memory predict the traffic for respective road segments or graph edges.
[0011] In one embodiment, the cascading effect is determined as a start of congestion upstream of the identified congestion on the directed connected graph. Different measures of congestion may be used for the cascading effect. For example, the upstream congestion is detected when a ratio of predicted traffic to reference traffic is below a threshold on one of the road segments upstream from the congestion.
[0012] The predicted cascade of congestion may be used for further prediction. In one embodiment, the predicted cascade of congestion is used in navigation, such as to route a mobile device on the traffic network based on the determined cascading effect (e.g., route to avoid expected congestion in the future due to the cascade).
[0013] In a second aspect, a system is provided for predicting congestion in a traffic network. Sensors are configured to sense traffic flow in different road segments of the traffic network. A processor is configured to group the sensed traffic flow separately for different ones of the road segments and forecast the congestion by application of the groups of the sensed traffic flow to respective machine-learned networks, wherein each machine-learned network was trained for different ones of the road segments.
[0014] The sensors may sense various aspects of traffic, such as being configured to sense speed of traffic as the traffic flow. The processor may use the sensed traffic flow to forecast or may be configured to normalize the sensed traffic flow of each group to a same scale regardless of differences in maximum speeds and average speeds for the road segments.
[0015] In one embodiment, the traffic network is modeled as a directed connected graph. The processor is configured to group so that the sensed traffic flow for each machine-learned network includes the sensed traffic flow for the road segment for the machine-learned network and the sensed traffic flow for downstream road segments less than two or three hops away from the road segment for the machine-learned network.
[0016] In an embodiment, the processor is configured to forecast the congestion by an amount of difference of a measure of traffic output by the machine-learned network to a measure of the traffic.
[0017] In another embodiment, the machine-learned networks are long short-term memory neural networks. In a further embodiment, an interface is configured to transmit the forecast congestion for one or more of the road segments to a traffic routing system.
[0018] In a third aspect, a method is provided for machine training for congestion detection. Flow data from different edges of a directed connected graph is separated into sets for the edge and any outgoing edges within one or two hops. Models are machine trained to predict flow for the different edges. One of the models is machine trained for each one of the different edges based on the flow data for the set for the respective one of the different edges.
[0019] In one embodiment, long short-term memory neural networks are separately machine trained for each of the different edges.
[0020] The separation and machine training form distributed spatiotemporal modeling of flow of traffic on a transportation network as the directed connected graph.
[0021] The present invention is defined by the following claims, and nothing in this section should be taken as a limitation on those claims.
Further aspects and advantages of the invention are discussed below in conjunction with the preferred embodiments and may be later claimed independently or in combination.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The components and the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views. [0023] Figure 1 is a flow chart diagram of an embodiment of a method for determining a cascade effect of congestion in a network and routing traffic according to those predictions;
[0024] Figure 2 is a flow chart diagram of one embodiment of methods for training and forecasting of congestion;
[0025] Figure 3 illustrates an example data-driven forecasting arrangement for a road network;
[0026] Figure 4 shows example grouping of traffic data and a distributed processing approach for forecasting congestion;
[0027] Figure 5 illustrates an example framework for congestion forecasting using fine-tuned timesteps;
[0028] Figure 6 illustrates another example framework for congestion forecasting;
[0029] Figure 7 shows example precision and recall for congestion forecasting;
[0030] Figure 8 shows an example portion of a road network;
[0031] Figure 9 shows actual versus predicted congestion for cascade for three example neighbor groups;
[0032] Figure 10 is a graph comparing example predicted verses actual speed;
[0033] Figure 11 is a block diagram of one embodiment of a system for predicting congestion; and
[0034] Figure 12 is a flow chart diagram of one embodiment of a method for machine training networks to predict localized congestion in an overall traffic network.
DETAILED DESCRIPTION OF THE DRAWINGS AND PRESENTLY PREFERRED EMBODIMENTS
[0035] The cascading effect of traffic congestion is analyzed using a distribution of machine-learned networks, such as long short-term memory (LSTM) networks. Traffic congestion forecasting using data-driven approaches may use a single-network approach, where the network state at any point of time is input as a flattened vector. As a result, the specific neighborhood information obtained from the network graph is lost because the flattened vector does not incorporate the spatial closeness information along with the traffic data. By using a distribution of machine-learned networks, such as multiple recurrent architectures with specific attention to each of the traffic channels in the network, the networks may capture the specific dynamic relationships of any traffic channel and its neighbors. In contrast, capturing the entire traffic network in a single neural network architecture is costly to update and presents difficulty for distributed processing.
[0036] For example, a citywide congestion forecasting framework works at a higher granularity tailored towards capturing the specificity of each traffic intersection or segment of the network. To develop such an integrated architecture, the traffic network is modeled to a directed connected graph encapsulating the spatial interconnections where each neighbor of a road segment is a function of spatial connection to other segments as well as traffic flow directions. Along with modelling spatial dependencies, the temporal aspect of the traffic flow is captured by the recurrent neural network architectures. Time varying behavior, for instance, by considering information such as time of day and day of week, may be included as additional inputs to the model.
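The time-of-day and day-of-week inputs mentioned above can be encoded in several ways; a cyclical sine/cosine encoding is one common choice, assumed here for illustration rather than taken from the patent.

```python
# Sketch: encoding time of day and day of week as smooth cyclical features
# to supply as additional model inputs. The encoding choice is an assumption.
import math
from datetime import datetime

def temporal_features(ts):
    """Encode time of day and day of week as sine/cosine pairs in [-1, 1]."""
    minute_of_day = ts.hour * 60 + ts.minute
    return [
        math.sin(2 * math.pi * minute_of_day / 1440),  # 1440 minutes per day
        math.cos(2 * math.pi * minute_of_day / 1440),
        math.sin(2 * math.pi * ts.weekday() / 7),      # 7 days per week
        math.cos(2 * math.pi * ts.weekday() / 7),
    ]

feats = temporal_features(datetime(2019, 11, 20, 8, 30))  # a Wednesday, 8:30
```

The cyclical encoding avoids the artificial discontinuity between 23:59 and 00:00 that a raw hour-of-day input would introduce.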
[0037] A data-driven approach predicts the propagation of traffic congestion at road segments as a function of the congestion in their neighboring segments. The city-wide ensemble of intersection level connected LSTM models may be used to identify congestion events. To reduce the search space of likely congestion sinks, the likelihood of congestion propagation in neighboring road segments of a congestion source is learned from the past historical data. One example resulting forecasting framework for Nashville, Tennessee, identifies the onset of congestion in each of the neighboring segments of any congestion source with an average precision of 0.9269 and an average recall of 0.9118 tested over ten congestion events.
[0038] In one embodiment, the congestion forecasting framework incorporates intersection-specific information. Spatiotemporal modelling of the transportation network is provided by expressing the network as a directed connected graph and uses LSTM networks to learn the distribution of the traffic speed of a target road in the future as a function of the past sequences of observed speed of the target road and its immediate outgoing neighbors. Congestion events at any part of the network are identified based on spatial and temporal correlations of the traffic speed at any road segment and its associated neighborhood. The likelihood of congestion propagation to any road segment is determined from its outgoing neighbors to form an overall cascaded congestion forecasting framework with the connected fabric of multiple machine learning models. The search space of the real time congestion forecasting algorithm is reduced by focusing on intersections with a higher likelihood of congestion progression as learned from the historical data.
[0039] A road network is used as the example traffic network. For example, the congestion forecasting framework is applied to urban road traffic. Other networked systems applications may be provided, such as a network of an electrical grid or a pipeline and corresponding flow or traffic of such networks.
[0040] Figure 1 shows one embodiment of a method for determining a cascading effect of congestion in a traffic network. A framework or model uses localized machine-learned models for predicting traffic in different localized portions of the traffic network.
[0041] Figure 2 shows the overall framework including the training phase for machine learning the localized models and the forecasting phase applying the previously learned or machine-learned localized models. Figure 1 is directed to the forecasting phase, providing a different embodiment than shown in Figure 2. Figure 12 is directed to the training phase, providing a different embodiment than shown in Figure 2. Figures 1 and 12 are described below with reference also to Figure 2.
[0042] The congestion forecasting method of Figures 1 and 2 is implemented by the system of Figure 11 or another system. For example, sensors acquire data and a computer or server forecasts congestion from the sensor data. As another example, a network of different computers performs the forecasting, different computers forecasting for different parts of the traffic network. [0043] Additional, different, or fewer acts may be provided. For example, acts 100 and 102 are not performed where the network is previously defined and the sensor data previously acquired. As another example, act 110 is not performed. In yet another example, acts 104 and 108 are the same act performed in a loop with the prediction of act 106. According to another example, the prediction of act 106 directly predicts future congestion level so the identifying of act 104 and/or the determining of act 108 are not performed. [0044] In Figure 2, act 204 may not be performed in some embodiments. Act 204 may be performed in the method of Figure 1. As another example, act 206 is not performed.
[0045] The acts are performed in the order shown (e.g., top to bottom or numerical) or other orders. In Figure 1, acts 100 and 102 may be performed simultaneously or in reverse order. Acts 104-108 may be repeated. In Figure 2, act 204 may be performed earlier, such as prior to act 104.
[0046] In act 100, the traffic network is defined as a directed connected graph. Each edge is a road segment. For example, a traffic network of at least ten road segments (e.g., tens, hundreds, or thousands of road segments) are linked to each other based on direct physical connection and/or flow direction. For a road segment with flow in two directions (two-way traffic), the road segment is treated as two segments, one for flow in one direction and the other for flow in the opposite direction. Other graphs may be used, such as a non-directed connected graph. Alternatively, the traffic network is formed as connected segments without graphing.
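The construction in act 100 can be sketched with a simple edge list, where a two-way road contributes one directed edge per flow direction. The `Edge` record and the helper name are illustrative assumptions.

```python
# Sketch: representing the traffic network as a directed connected graph
# (act 100). A two-way road is modeled as two directed edges, one per
# flow direction. Names are illustrative.
from collections import namedtuple

Edge = namedtuple("Edge", ["name", "src", "dst"])

def add_road(edges, name, a, b, two_way=False):
    """Add a road segment between intersections a and b as directed edge(s)."""
    edges.append(Edge(name, a, b))
    if two_way:  # the opposite direction is treated as a separate segment
        edges.append(Edge(name + "_rev", b, a))
    return edges

edges = []
add_road(edges, "D", "n0", "n1")                  # one-way segment
add_road(edges, "Main", "n1", "n2", two_way=True)  # yields two directed edges
```
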
[0047] Each segment is a continuous road segment with two end points. The end points correspond to neighboring intersections separated and connected by the road segment. Other road segment structures may be used. Each road segment is to have a given machine-learned model trained to predict traffic in a single or both directions for that respective road segment based on the traffic data for that road segment and other local road segments. In alternative embodiments, a sub-set of connected segments (e.g., two or more) are modeled together (e.g., one machine-learned model trained to predict traffic for the sub-set). [0048] Figure 3 illustrates a representation of a sample road network 300 with directions of traffic flow and the corresponding framework of the connected fabric of neural architectures 302. The neural modules associated with each edge A-K of the network take into account the information from the edge itself and its outgoing neighbors for certain past sequences up to the current time to determine the future traffic state of the target edge. For example, the congestion for road segment D uses the traffic from road segment D and the traffic from road segment A. The flow of traffic is from road segment D to road segment A, so A receives the outgoing traffic of (i.e., is downstream from) road segment D. Thus, the machine-learned module 304 for road segment D is localized to receive traffic data from different times, t, in the past and the current time from road segments A and D. [0049] Figure 4 shows a deployment diagram where each computing processor 402 associated with each road segment in the network collects the traffic of the neighboring segments from the associated sensors 400 according to the graphical model of the network (e.g., see Figure 3). For example, processor 1 (402) receives traffic data from the sensors 400 for the respective road segment and one outgoing connected road segment. 
The processors 402 process the sensed traffic data from the sensors 400 to forecast the traffic of the respective target road segment. Depending on the traffic network, different processors 402 receive traffic data measured by different combinations of sensors 400. Different numbers of sensors 400 may provide measurements to different processors 402, depending on the interconnection of the traffic network. For example, Figure 3 shows that the processor 402 for segment H would receive sensed traffic from sensors 400 for segments H, I, and D while the processor 402 for segment J would receive sensed traffic from sensors 400 for segments J and K.
[0050] In the example of Figure 4, each computing processor 402 collects the traffic of the neighboring segments according to the graphical model of the network 300, processes the collected data to forecast the traffic, and sends the predicted traffic to the central cloud server 404, which can be used for making traffic routing decisions. In alternative embodiments, a same processor implements all or some of the processors 402; one of the processors 402 makes routing decisions; and/or the processors 402 communicate with different remote computers or workstations in addition to or as an alternative to the cloud server 404.
[0051] In one embodiment, the transportation network 300 is modeled as a directed connected graph defined as G = (V, E), where V is a set of nodes representing intersections and E is the set of road segments connecting the nodes. In the graph, let v ∈ V denote a node and e ∈ E represent an edge. In the graph, in and out operators are applied such that the operator in: V → 2^E gives all the edges for which node v is the destination, and the operator out: V → 2^E gives all the edges whose source is node v. The indegree of a node v is the number of road segments incoming to the node and can be calculated as |in(v)|, whereas the outdegree of a node v is the number of road segments outgoing from the node and is calculated as |out(v)|. Similarly, the source and destination node of an edge can be accessed as src(e) and dst(e). Each node is associated with some static information, such as the location of the node. For example, the attribute loc(v) contains longitude and latitude: loc(v) provides the location of node v as a tuple of two reals (latitude, longitude). Each edge contains the information of the geographical shape of the road segment as a sequence of latitude-longitude tuples.
[0052] An edge defines a traffic message channel (TMC) if the edge has timestamped traffic data associated with the edge. The set of TMCs is denoted as TMC ⊆ E. Each sensor reading s(e, t) represents the traffic reading of TMC e at time t ∈ T.
[0053] In addition to the TMCs, certain operations are defined to provide access to neighbors of a TMC based on the number of hops. Access may be to the immediate neighbors and/or all the incoming and/or outgoing neighbors of an edge up to a certain number of hops, providing k-hop incoming neighbors and k-hop outgoing neighbors. The congestion wave travels in a reverse direction as compared to the flow of traffic, so the prediction network is created in the opposite direction as flow (see Figure 3 comparing arrows of 300 and 302). The k-hop incoming neighbors are the k nearest hops of the incoming edges feeding traffic into an edge. The set of k-hop incoming neighbors N_in^k(e) may be defined recursively as:

N_in^1(e) = in(src(e)).

Given the set N_in^(k-1)(e), the set N_in^k(e) can be defined as

N_in^k(e) = ∪_{e' ∈ N_in^(k-1)(e)} in(src(e')),

where src(e') denotes the source node of edge e'.
[0054] The k-hop outgoing neighbors are the k nearest hops of the edges taking traffic away from an edge via the edges that are going outwards. The set of k-hop outgoing neighbors N_out^k(e) may be defined recursively as:

N_out^1(e) = out(dst(e)).

Given the set N_out^(k-1)(e), the set N_out^k(e) can be defined as

N_out^k(e) = ∪_{e' ∈ N_out^(k-1)(e)} out(dst(e')),

where dst(e') denotes the destination node of edge e'.
[0055] In the example of Figure 3, k-hop incoming and outgoing neighbors are defined. For road segment or edge A as the target road, D is the 1st hop incoming neighbor and H and G are the 2nd hop incoming neighbors. B and C are the 1st hop outgoing neighbors and K is the 2nd hop outgoing neighbor. This hop structure for the directed connected graph may be used to access the sensor data for localized prediction.
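The recursive neighbor definitions above can be sketched directly over an edge list, reproducing the Figure 3 example for target edge A (D as the 1st-hop incoming neighbor, H and G as 2nd-hop incoming, B and C as 1st-hop outgoing, K as 2nd-hop outgoing). The node names n0-n7 are illustrative assumptions chosen to realize that topology.

```python
# Sketch: k-hop incoming/outgoing neighbor sets over the directed graph,
# following the recursive definitions in paragraphs [0053]-[0054].
# Node names are illustrative; the edge names match the Figure 3 example.
edges = {  # name -> (src node, dst node)
    "H": ("n5", "n0"), "G": ("n6", "n0"), "D": ("n0", "n1"),
    "A": ("n1", "n2"), "B": ("n2", "n3"), "C": ("n2", "n4"),
    "K": ("n3", "n7"),
}

def in_edges(node):   # in(v): edges whose destination is v
    return {e for e, (s, d) in edges.items() if d == node}

def out_edges(node):  # out(v): edges whose source is v
    return {e for e, (s, d) in edges.items() if s == node}

def k_hop(edge, k, incoming=True):
    """k-th hop incoming (or outgoing) neighbor set of an edge."""
    frontier = {edge}
    for _ in range(k):
        if incoming:  # edges feeding into the sources of the frontier edges
            frontier = set().union(*(in_edges(edges[e][0]) for e in frontier))
        else:         # edges leaving the destinations of the frontier edges
            frontier = set().union(*(out_edges(edges[e][1]) for e in frontier))
    return frontier

print(k_hop("A", 1))            # {'D'}
print(k_hop("A", 2))            # {'H', 'G'}
print(k_hop("A", 1, False))     # {'B', 'C'}
print(k_hop("A", 2, False))     # {'K'}
```
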
[0056] In act 102 of Figure 1, the processor or processors receive sensor data for the road segments. The sensor data is received for the entire traffic network. Alternatively, sensor data for localized sensors is received by localized processors or processes. For example, the grouping 200 by k-hop for incoming and/or outgoing neighbors is performed. For each segment, the sensor data for that segment and the sensor data for outgoing neighbors based on the graph for k number of hops (e.g., k-hop is 1 or 2) is gathered 202. Data for incoming neighbors may be gathered 202.
[0057] The sensor data is any measure of traffic. In the example used herein, the measure is speed. The speed may be an average speed of all vehicles or vehicles of a particular class over a given interval (e.g., 10 minutes). Other measures of speed may be used, such as a median or other statistic over the interval. Other measures of traffic flow may be used, such as a count of vehicles, volume flow, stack up (i.e., difference between number in and number out), stop frequency, and/or a jam factor. [0058] The machine-learned models are to output the same characteristic as input. For example, the speed is measured, and the machine-learned models output speed. In alternative embodiments, the machine-learned models output a different characteristic (e.g., jam factor or congestion) in response to input of the speed.
[0059] In act 104, the processor or processors identify congestion. Congestion is identified by analysis for each or some of the road segments. Congestion may be identified in any number or no road segments.
[0060] The congestion for a current time is based on the current traffic for that road segment as compared to a reference. For example, a current speed is compared to a reference speed. If different enough (e.g., beyond a threshold), then congestion is identified. In one embodiment, the congestion is identified as a ratio of (1) a current speed on the road segment to (2) an average speed unlimited by congestion on the road segment being below a threshold. Any threshold may be used, such as 0.70 or 0.60. Other comparisons than a ratio may be used.
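The ratio test described above reduces to a single comparison. The function name, threshold default, and example speeds below are illustrative.

```python
# Sketch: identifying congestion (act 104) as the ratio of current speed to
# the congestion-free reference speed falling below a threshold.
def is_congested(current_speed, free_flow_speed, threshold=0.6):
    """Flag congestion when current/reference speed ratio drops below threshold."""
    return current_speed / free_flow_speed < threshold

print(is_congested(25.0, 55.0))   # ratio ~0.45 -> congested
print(is_congested(50.0, 55.0))   # ratio ~0.91 -> not congested
```
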
[0061] In alternative embodiments, the congestion is identified by the prediction of act 106. The prediction is used to identify congestion at the current time or a future time. The prediction of 106 may then additionally be used to predict any congestion for a later time.
[0062] The reference traffic may be a generalized reference, such as one reference being used for all road segments. In other embodiments, the reference traffic for a non-congestion or free flow condition is specific to each road segment. A speed limit or an average speed in a free flow condition is used as the reference. In alternative embodiments, the reference flow is a reference level for initial or onset of congestion. Where the current traffic is at or below the reference, then congestion is identified.
[0063] The congestion may be expressed as a jam factor JF, an indicator of the number of cars over capacity on the road. JF is part of the HERE API. Other congestion metrics may be used. Generally, JF is from 0 to 10, but its value is normalized to be within 0 to 1, where JF = 1 indicates non-moving traffic and JF = 0 indicates free-flowing traffic. The free-flow speed FF is used as the reference. The free-flow speed FF(e) is the average of all maximum speeds observed when the jam factor observed on the TMC is 0.
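The jam-factor rescaling and the free-flow reference can be sketched as below. The per-day grouping used to collect "maximum speeds observed" is one possible reading of the definition above, labeled as an assumption, as are the helper names.

```python
# Sketch: normalize the jam factor JF from its 0-10 range into [0, 1], and
# estimate the free-flow reference as the average of daily maximum speeds
# observed while JF == 0. The per-day grouping is an assumption.
def normalize_jf(jf):
    return max(0.0, min(jf / 10.0, 1.0))  # JF=10 -> 1 (jammed), JF=0 -> 0 (free)

def free_flow_speed(daily_speeds, daily_jfs):
    """Average the daily maximum speeds observed while the jam factor is 0."""
    maxima = []
    for speeds, jfs in zip(daily_speeds, daily_jfs):
        free = [s for s, j in zip(speeds, jfs) if j == 0]
        if free:
            maxima.append(max(free))
    return sum(maxima) / len(maxima)

ff = free_flow_speed([[50, 62, 30], [58, 20]], [[0, 0, 4], [0, 7]])
print(ff)  # (62 + 58) / 2 = 60.0
```
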
[0064] Once congestion is identified on one or more road segments, the goal is to determine the cascade effect of the congestion onto other road segments. A congestion event is observed at a certain road segment at any point of time in the transportation network. The question is when the congestion effect propagates to its k-hop incoming neighbors. For example, the congestion event (CE) at an edge e is a tuple CE(e) = (t, s(e, t)) where s(e, t) ≤ 0.6 · FF(e). A speed falling to 60% (0.6) of the free-flow speed may be used as the indicator of congestion, but other levels such as 50% may be used. Other metrics may be used.
[0065] The Δ-cascade event is defined as a congestion event where more than 50% of the first-hop incoming neighbors show the 60% speed reduction within Δ time steps; e is the source of the cascade event. Given a city network and the data collected from TMC segments, the goal is to find the Δ-cascade events across the traffic network. Without training specifically on the cascade or congestion events, the predicted local traffic may be used to identify the time of propagation of congestion up to the k-hop incoming neighboring segments, where k varies from one to three. k may have larger values (e.g., four or more).
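The Δ-cascade test of paragraph [0065] can be sketched as a share count over the first-hop incoming neighbors. The input structures, function name, and example values are illustrative assumptions.

```python
# Sketch: flag a cascade event when more than 50% of the first-hop incoming
# neighbors drop to 60% of their free-flow speed within delta timesteps.
def is_cascade(neighbor_speeds, free_flow, delta, ratio=0.6, share=0.5):
    """neighbor_speeds: dict edge -> predicted speeds for upcoming timesteps.
    free_flow: dict edge -> free-flow reference speed of that edge."""
    congested = 0
    for e, speeds in neighbor_speeds.items():
        if any(s <= ratio * free_flow[e] for s in speeds[:delta]):
            congested += 1
    return congested > share * len(neighbor_speeds)

preds = {"H": [40, 25, 22], "G": [45, 44, 20], "I": [50, 49, 48]}
ff = {"H": 55.0, "G": 50.0, "I": 52.0}
print(is_cascade(preds, ff, delta=3))  # 2 of 3 neighbors congest -> True
```

Each neighbor is compared against its own free-flow reference, consistent with the per-segment normalization described below.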
[0066] In act 204, the traffic data is normalized. For each road segment, the traffic data is normalized to the same scale or dynamic range, such as [0, 1]. The free-flow speed for the segment may be used to normalize the traffic data for the segment. The free-flow speed is mapped to the maximum value of 1. Higher speeds are clipped or limited to the maximum value of 1. Since different road segments may have different reference traffic, different speed limits, and/or different speeds at which congestion occurs, the traffic is normalized to the same range. In this way, the variance across the network due to reference speed is removed, allowing for data from road segments with different reference speeds to be used comparatively by the machine-learned models. In alternative embodiments, the traffic data is not normalized, as the machine-learned models are separately trained. [0067] In act 106, the processor or processors predict traffic for a future time. The current traffic data from the segment and outgoing (downstream) segments are used to predict the traffic. Past traffic data, such as from the last one, two, three, or more intervals, may be used with the current traffic to predict the traffic.
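The per-segment normalization of act 204 above amounts to scaling each segment's speeds by its own free-flow reference and clipping at 1. The function name is illustrative.

```python
# Sketch: normalize a segment's speeds to [0, 1] by its free-flow speed
# (act 204). The free-flow speed maps to 1.0; higher speeds are clipped.
def normalize_speeds(speeds, free_flow):
    """Scale by the segment's free-flow speed and clip to the [0, 1] range."""
    return [min(s / free_flow, 1.0) for s in speeds]

print(normalize_speeds([30.0, 60.0, 75.0], free_flow=60.0))  # [0.5, 1.0, 1.0]
```
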
[0068] The prediction for each or different segments uses separately trained machine-learned models. Any type of machine learning architecture and resulting machine-learned model may be used. For example, a neural network, such as a convolutional neural network or fully connected neural network, is used. In one embodiment, a recurrent neural network is used. Any machine learning operating on a sequence of measurements may be used. In one embodiment, a recurrent neural network with one or more long short-term memory (LSTM) architectures is used. Any number of layers, nodes, and/or connections may be used, such as a 2-layer LSTM network architecture with dropout regularization. The learnable parameters of the network are learned through training. The learned values are used in application of the machine-learned model.
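One possible per-segment predictor matching the 2-layer LSTM with dropout described above can be sketched in PyTorch. The hidden size, dropout rate, and single-speed output head are assumptions for illustration, not values from the patent.

```python
# Sketch (illustrative, PyTorch): a 2-layer LSTM with dropout regularization
# as one possible per-segment traffic predictor. Hyperparameters are assumed.
import torch
import torch.nn as nn

class SegmentPredictor(nn.Module):
    def __init__(self, n_inputs, hidden=32, dropout=0.2):
        super().__init__()
        # n_inputs = target segment plus its outgoing neighbors, one speed each
        self.lstm = nn.LSTM(n_inputs, hidden, num_layers=2,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden, 1)  # predicted normalized speed

    def forward(self, x):                 # x: (batch, timesteps, n_inputs)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict from the last timestep

model = SegmentPredictor(n_inputs=2)      # e.g., segment D plus neighbor A
pred = model(torch.rand(4, 3, 2))         # 4 samples, 3 past timesteps each
```

One such model would be instantiated and trained per TMC, forming the connected fabric 302.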
[0069] Figure 3 shows an example. The machine-learned model for segment D is a LSTM network 304 that receives a temporal sequence of measured or sensed traffic over three intervals of the current time and the two most immediate past times (e.g., over the current 10 minutes plus the past 20 minutes for a total of traffic for the past 30 minutes). Since there is only one outgoing segment A for segment D, the traffic over this range of time for A and D are input to the LSTM network 304. No other inputs are used, but other inputs may be provided.
[0070] LSTM is a form of recurrent neural network with the capability of processing sequences of data. LSTM mitigates the vanishing and exploding gradient problems encountered in a recurrent neural network, so the network is capable of capturing long temporal dependencies using backpropagation through time. LSTM models the temporal dependencies of the traffic speed that will affect the speed in the future. A connected LSTM-based architecture is separately trained to be intersection specific. To model the future speed of a particular TMC, the information from the relevant neighboring segments is used.
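For illustration only, a single LSTM cell step for scalar inputs can be sketched as follows (Python; the scalar formulation and the weight dictionary `w` are simplifications for exposition, not the disclosed multi-unit networks). The additive cell-state update c = f·c_prev + i·g is the mechanism that lets gradients flow over long temporal dependencies.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM cell step for scalar input/state.

    w maps each gate name ('i', 'f', 'g', 'o') to (w_x, w_h, b).
    The gated, additive cell-state update is what mitigates the
    vanishing/exploding gradients of a plain recurrent unit.
    """
    i = sigmoid(w['i'][0] * x + w['i'][1] * h_prev + w['i'][2])    # input gate
    f = sigmoid(w['f'][0] * x + w['f'][1] * h_prev + w['f'][2])    # forget gate
    g = math.tanh(w['g'][0] * x + w['g'][1] * h_prev + w['g'][2])  # candidate state
    o = sigmoid(w['o'][0] * x + w['o'][1] * h_prev + w['o'][2])    # output gate
    c = f * c_prev + i * g   # additive cell-state update
    h = o * math.tanh(c)     # hidden state / output
    return h, c
```

A forget gate driven toward 1 carries the previous cell state forward nearly unchanged, which is how long-range temporal structure in the speed sequence is preserved.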
[0071] In a transportation network, traffic flows to a road from its incoming neighbor, but congestion flows in the reverse direction of traffic flow, i.e., from an outgoing neighbor to a target road. Because the congestion moves in this sequence, the speed forecasting detector for a target road is trained on the traffic data of the target road segment and its immediate outgoing neighbor(s). For predicting future traffic speed under the influence of congestion, only information from the outgoing neighbors (not from incoming neighbors) and the target segment is used. In alternative embodiments, information from both the incoming and outgoing neighbors may be used to model real-time traffic and/or future speed.
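The input grouping implied by this paragraph, current plus past readings for the target and its outgoing neighbors only, can be sketched as follows (Python; the dictionary-of-lists layout and the function name are assumptions for illustration):

```python
def build_input_window(history, target, outgoing, j):
    """Collect the last j+1 normalized speeds (current plus j past steps)
    for the target segment and its outgoing neighbors only; incoming
    neighbors are deliberately excluded.

    history: segment id -> list of normalized speeds, oldest first.
    Returns one row per timestep, ordered [target, *outgoing].
    """
    segments = [target] + list(outgoing)
    n = len(history[target])
    return [[history[s][t] for s in segments] for t in range(n - j - 1, n)]
```

For segment D of Figure 3 with single outgoing neighbor A and j = 2, this yields a three-timestep window of (D, A) speed pairs, matching the input of the LSTM network 304.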
[0072] The function of the traffic predictors for speed forecasting can be expressed as:

s(e)_{ct+p} = f( s(e)_{ct-j}, ..., s(e)_{ct}, s(e')_{ct-j}, ..., s(e')_{ct} ), for each immediate outgoing neighbor e' of e,

where s(e) denotes the speed of any TMC e, ct denotes the current timestep, p is the number of timesteps to predict ahead in the future, and j is the number of past timesteps to look back. TMC is one definition specifying a road segment (see the HERE API), but other definitions (e.g., XD from Inrix or ways from OpenStreetMap) may be used. The future traffic state s(e) of the TMC, evaluated at the current timestep ct, is modeled as a function f of its own traffic states and its immediate outgoing neighbors' speeds from timestep (ct - j) to ct using the machine-learned model. The traffic predictors take into account the normalized speed data of each TMC, normalized with respect to the free-flow speed. Each TMC in the network has such an LSTM-based traffic predictor associated with the segment. Figure 3 shows an example of a sample road network 300 and its corresponding connected fabric 302 of LSTMs, one LSTM network 304 provided for each of segments A-K.
[0073] The machine-learned model for each segment has the same architecture and corresponding learnable or learned parameters. The learned values may be different due to the different inputs or groupings of data. In alternative embodiments, different segments have machine-learned models with different architectures and/or learnable parameters. For example, the size of the road, number of incoming neighbors, number of outgoing neighbors, and/or another characteristic is used to select the recurrent neural network to use for the model of that segment.
[0074] In one embodiment, the same two-layered deep LSTM network with 100 units in each layer and a dense output layer is used for each traffic predictor. Other architectures may be used. For training, the mean squared error (MSE) between the predicted and actual speed is used as the loss function, and the ‘Adam’ optimizer is used for optimizing the loss function. Other optimization and/or loss functions may be used in training.
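The MSE loss and one Adam update can be written out directly (Python; a from-scratch sketch of the standard formulas for exposition, not the framework implementation that would be used to train the disclosed networks):

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted speeds."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def adam_step(params, grads, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update, as used to minimize the MSE training loss.

    state holds (m, v, t): first/second moment estimates and step count.
    """
    m, v, t = state
    t += 1
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment estimate
        vi = b2 * vi + (1 - b2) * g * g      # second-moment estimate
        m_hat = mi / (1 - b1 ** t)           # bias correction
        v_hat = vi / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, (new_m, new_v, t)
```

In practice the gradients would come from backpropagation through time over the LSTM; here they are supplied directly to show the update rule.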
[0075] The prediction is performed for the different road segments. For example, the traffic at each road segment A-K in the road network 300 of Figure 3 is predicted. The predictions may be performed for fewer than all of the road segments A-K, such as based on a likelihood of congestion. Each prediction is localized or independent of the other predictions. Alternatively, the prediction for one segment may use prediction for other segments, such as from outgoing segments. Separate machine-learned models for the separate road segments predict the traffic for the respective road segments. The prediction is localized based on the directed connected graph or interconnections of road segments.
[0076] Only information from neighbors and the target is used to forecast the traffic speed of a target road. For example, the traffic data measured or sensed for the target and only the k-hop nearest outgoing segments or neighbors is used in the prediction, without using traffic from other segments. To isolate the importance and influence of neighboring road segments in determining the future traffic speed of a target road segment, two feed-forward networks with the same architecture, optimizer, and loss function may be trained. The first network is trained to forecast traffic speed using the information from the neighboring road segments, and the second network is trained to forecast the traffic speed without using any information from neighbors. The mean squared errors (MSE) in forecasting the traffic speed over five randomly chosen TMC IDs indicate that, given the same architectural constraints, the forecasts using the neighbors' information have far less MSE than the forecasts without that information, indicating a benefit to using neighborhood information in traffic forecasting. [0077] The prediction is for a future time. For example, the prediction is of the traffic in a next one or multiple future time increments. The timestep (i.e., the interval at which the traffic data is discretely sampled) is a hyperparameter for the prediction. Over 500 minutes of data collected at intervals of 1 minute, 5 minutes, and 10 minutes respectively, the error in prediction increases gradually with the size of the timestep. If data is collected at one-minute time intervals, then prediction is performed 10 times to get a prediction after 10 minutes, which accumulates error at each level of prediction. Instead, by sampling at or averaging over a 10-minute time interval, just one prediction is needed to get a 10-minute ahead prediction. Not much information is lost by sampling the data at 10-minute intervals.
The predicted results using LSTMs with timestep = 10 are in multiples of 10-minute time intervals. The prediction may be fine-tuned as needed using prediction of congestion times in multiples of 5-minute time intervals using LSTMs with timestep = 5 or other timesteps.
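The trade-off above, one 10-minute prediction versus ten accumulated 1-minute predictions, relies on downsampling by block averaging, which can be sketched as follows (Python; illustrative only):

```python
def resample(speeds, factor):
    """Average consecutive blocks of readings, e.g. factor=10 turns
    1-minute samples into 10-minute timesteps; a trailing partial
    block is dropped."""
    return [sum(speeds[i:i + factor]) / factor
            for i in range(0, len(speeds) - factor + 1, factor)]
```

A single model step then covers the full 10-minute horizon, avoiding the error that would accumulate over ten chained 1-minute predictions.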
[0078] The speed of traffic at a future time is predicted. This prediction is based on speed from any number of past observations or increments. The number of past observations is a hyperparameter used to tune the LSTM models. For example, two past sequences (i.e., looking back into the past 20 minutes) of the traffic speed are used to predict the future traffic speed. Any time increment may be used, such as 10 minutes, 5 minutes, 20 minutes, 30 minutes, or an hour. Any number of past sequences may be used for a prediction, such as 2 (e.g., 20 minutes with 10-minute increments) or 3 (e.g., 30 minutes with 10-minute increments). Choosing longer time sequences may not improve performance because the future speed may be more closely approximated by speeds in recent history. The mean squared error (MSE) may not decrease as a greater number of past data samples (e.g., greater than 20 minutes) is taken into account and may be least when looking back for two ten-minute timesteps. In one embodiment, the hyperparameter representing the number of past observations is two. [0079] The prediction may be for any number of future timesteps or intervals. Using the connected LSTM fabric 302, multiple timesteps ahead in the future may be predicted. As the prediction moves ahead from the current time, the information up to the k-hop neighbors of a target road is used to predict the traffic speed for k timesteps in advance. For example, a one-step ahead prediction uses the past and current traffic speed of the 1st hop neighbors, whereas a two-step ahead prediction uses the one-step ahead predictions of the target road segment as well as those of the 1st hop neighbors as input. The one-step ahead predictions of the 1st hop neighbors use the traffic information from their neighbors, i.e., the 2nd hop neighbors of the target road. So, for a two-step ahead prediction, information up to the 2nd hop neighbors is used.
For predictions up to three timesteps ahead in the future, information up to the 3rd hop neighbors is incorporated. The 0th timestep is the current time, and traffic is predicted at one, two, and three timesteps ahead from the current time (i.e., predicting traffic at 10, 20, and 30 minutes from the current time). Other arrangements of the number of hops per number of future increments predicted from a current time may be used. [0080] The connected fabric 302 of LSTM architectures can interdependently produce multi-timestep ahead predictions. The predictions from an earlier timestep (e.g., 10 minutes in the future) for k-hop neighbors are used as inputs to predict for later occurring timesteps (e.g., 20 and 30 minutes). The difference between actual and predicted speed while predicting three timesteps ahead may be 1.3414 times more than that of two timesteps ahead and 2.6857 times more than that of one timestep ahead. As the predictions move further away in the future, the difference between the actual and predicted speed may increase.
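This k-step scheme, in which each further-ahead forecast consumes the previous round of forecasts across the connected fabric, can be sketched as a rollout (Python; `predict_next` stands in for the per-segment LSTM predictors and is a hypothetical callable):

```python
def predict_k_ahead(history, predict_next, target, k):
    """Roll the one-step predictors forward k timesteps.

    predict_next(segment, history) returns the next-step speed for
    `segment` from its own and its outgoing neighbors' recorded speeds.
    Each round appends a prediction for every segment before the next
    round runs, so the k-th forecast of the target implicitly draws on
    information up to its k-hop neighbors.
    """
    hist = {s: list(v) for s, v in history.items()}
    for _ in range(k):
        step = {s: predict_next(s, hist) for s in hist}  # one round, all segments
        for s, v in step.items():
            hist[s].append(v)
    return hist[target][-1]
```

With 10-minute timesteps, k = 1, 2, 3 correspond to the 10-, 20-, and 30-minute forecasts described above.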
[0081] In act 108, the processor or processors determine the cascading effect of the congestion from the predicted traffic. The cascading effect is determined as congestion upstream (i.e., following the backup or congestion wave in the reverse direction of traffic flow) of identified congestion. The start of congestion upstream on the directed connected graph of the identified congestion is determined. [0082] The determination uses the same or a different approach as identifying the congestion in act 104. For example, a ratio of predicted traffic to a reference traffic is below a threshold on one of the road segments upstream from the congestion. This ratio being below the threshold for an upstream segment is the detection of congestion from the cascade. The predicted traffic (e.g., speed) for the future time is used to detect congestion. Alternatively, the machine-learned model for the segment directly outputs an indication of congestion or jam factor rather than the indirect prediction of traffic as speed.
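The ratio test described above can be sketched as follows (Python; the 0.6 default mirrors the 60%-of-free-flow threshold used elsewhere in this disclosure, but the exact value is configurable):

```python
def is_congested(predicted_speed, free_flow_speed, threshold=0.6):
    """A segment is flagged as congested when the ratio of predicted
    traffic speed to the free-flow reference falls below the threshold."""
    return predicted_speed / free_flow_speed < threshold
```

Applied to predicted speeds on segments upstream of an identified congestion, a True result marks detection of congestion from the cascade.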
[0083] By using traffic data from outgoing segments of one, two, or other numbers of k-hops, the cascade of any congestion on any downstream segments may be predicted. Algorithm 1 below illustrates one embodiment of the overall congestion forecasting architecture.
[Algorithm 1: congestion forecasting pseudocode figure]
Once congestion is identified at a target road segment e, the algorithm starts by gathering the 1st hop incoming neighbors of e. For each of those 1st hop neighbors, the 2nd hop incoming neighbors are found. The process repeats for the 3rd hop incoming neighbors. These subsets of 1st, 2nd, and 3rd hop neighbors constitute the set of total neighbors, denoted as N.
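The neighbor-gathering step can be sketched as a breadth-first traversal over the incoming edges of the directed connected graph (Python; the adjacency-dict representation is an assumption for illustration):

```python
def incoming_hops(incoming, e, max_hop=3):
    """Gather the 1st, 2nd, and 3rd hop incoming (upstream) neighbors
    of segment e; their union corresponds to the total neighbor set N.

    incoming: segment id -> list of incoming neighbor ids.
    """
    hops, frontier, seen = {}, {e}, {e}
    for k in range(1, max_hop + 1):
        # Neighbors one more hop upstream, skipping already-visited segments.
        nxt = {n for s in frontier for n in incoming.get(s, []) if n not in seen}
        hops[k] = nxt
        seen |= nxt
        frontier = nxt
    return hops
```

For the road network of Figure 6 with congestion at D, this would return B and E at hop 1 and their upstream neighbors at hop 2.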
[0084] The function predict_next(e, timestep) calls the pre-trained LSTM forecasting module to predict the speed for a certain TMC edge e based on the values of its neighboring segments. The speed of edge e one timestep ahead in the future, such as 10 minutes, is predicted. When the decrease in speed between two consecutive forecasts for a given TMC e is more than a detection threshold (d), indicating a drop in forecasted speed for the specified TMC, and the forecasted speed is less than or equal to 60% of the free-flow speed, the algorithm turns on the corresponding flag for the TMC e and forecasts a congestion to start at that TMC from the next timestep. The accuracy of this algorithm depends on the detection threshold d, which indicates what percentage dip in forecasted speed from that of the previous timestep triggers the initiation of congestion. Example detection thresholds are between 0.1 and 0.15, but other values may be used.
[0085] At each timestep, the algorithm checks for the TMCs whose flags have been turned on and eliminates those from the list N. In general, the algorithm starts by checking whether a congestion is forecasted to start within the next 10 minutes for all the relevant 1st, 2nd, and 3rd hop neighbors, and then eliminates the neighbors where congestion has started. As the congestion in the 1st hop neighbors starts earlier, the first hop neighbors are eliminated from the list first. In the next timestep, the computation is carried out only for their corresponding 2nd and 3rd hop neighbors to output the corresponding time for onset of congestion for them. Other algorithms may be used, such as one detecting congestion without specific references to earlier congested segments. The congestion may be predicted for all segments.
[0086] In one embodiment, shown in Figure 5, a refined temporal prediction is provided. Two sets of LSTM networks 304 are provided: one trained for 10-minute timesteps and another trained for shorter (e.g., 5-minute) timesteps. The LSTM networks 304 with timestep = 10 use the data collected at 10-minute intervals and predict the time of onset of congestion at multiples of 10 minutes. Once congestion is forecasted within the next ten minutes, the solution can be fine-tuned by predicting whether the congestion will start within the next 0 to 5 minutes or within the next 5 to 10 minutes by using LSTM networks 304 with timestep = 5. The same algorithm is applied using the data sampled at 5-minute intervals once congestion is detected.
[0087] Figure 6 shows an example of predicted congestion showing the cascade effect. The road network 600 includes segments or edges A-G. At the current time t, congestion is identified at segment D and not at the other segments. For the prediction, the LSTM networks 304 for segments A-G, or just upstream segments B and E (1-hop) or B, E, G (2-hop), predict traffic for the respective segments. In the example of Figure 6, congestion is predicted based on the predicted traffic (e.g., speed) for upstream segments B, C, and E and not segments A, D, F, and G. Based on the directed connected graph, the congestion at time t+1 to t+1+k (i.e., 0-10 minutes where each increment is 10 minutes and k is the length of an increment) for segments B and E (upstream from segment D) is from the cascade. Since segment C is not upstream from segment D in the road network 600, the congestion predicted for segment C at time t+1 to t+1+k is due to other causes. The congestion at segment C may be identified as an initial congestion. The prediction of time n+1 to n+1+k for segment G due to cascade from segment E is based on the predicted traffic for all segments at time t+1 to t+1+k.
[0088] The likelihood of congestion propagation may be identified in other embodiments. Algorithm 2 shows one embodiment for estimating likelihood of congestion propagation from a source road to a destination road.
[Algorithm 2: congestion propagation likelihood pseudocode figure]
Which of the incoming neighbors of a target road segment have a higher likelihood of congestion propagation is identified. By doing so, the execution time of Algorithm 1 may be reduced by testing for onset of congestion only for those neighbors at each hop where the likelihood of congestion propagation is higher given historical records, instead of testing for congestion for all the incoming neighbors at each hop.
[0089] Algorithm 2 keeps track of two kinds of events. Event ev1 corresponds to the phenomenon where a significant speed decrease is observed at any target road. Event ev2 corresponds to the phenomenon where a significant speed decrease is observed at any of its incoming neighbors within the time range from the start time of congestion in the target road up to D timesteps from that time. D is a heuristic and is chosen as 4 in this case, with the assumption that a congestion, if propagating from source to neighbor, should take place within 4 timesteps. The choice of D may vary according to the problem. For each neighbor, the algorithm checks the number of times the events ev1 and ev2 occurred and saves the ratio ev2/ev1 as the likelihood, which signifies the proportion of times the congestion created at the source propagated to the corresponding neighbor.
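The ev2/ev1 likelihood estimate can be sketched as follows (Python; event lists of onset timesteps are an assumed representation for illustration):

```python
def propagation_likelihood(target_events, neighbor_events, D=4):
    """Estimate the likelihood that congestion at a source propagates
    to one neighbor.

    target_events: timesteps when a significant speed drop started at
    the source (occurrences of event ev1). neighbor_events: timesteps
    of significant drops at the neighbor. ev2 counts source events
    followed by a neighbor drop within D timesteps.
    """
    ev1 = len(target_events)
    if ev1 == 0:
        return 0.0
    ev2 = sum(1 for t in target_events
              if any(t <= u <= t + D for u in neighbor_events))
    return ev2 / ev1
```

A returned likelihood above 0.5 (or another chosen threshold) would place the neighbor in the set of most likely affected neighbors described in the next paragraph.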
[0090] From the historical observations, this likelihood of congestion propagation for each source-destination pair can be found and can be updated in real time as more and more such cases are encountered. If the likelihood is more than 50% (i.e., more than half of the times the congestion from the source propagated to a particular neighbor given historical records) or another threshold, then this particular neighbor is appended to the set of most likely neighbors to be affected by congestion at source road segment e. The k-hop subset of this set indicates the subset of the neighbors of e at the k-th hop that have a higher likelihood of being affected by the congestion at e, where k = 1, 2, 3. [0091] Each k-hop likely-affected subset is contained in the full set of k-hop neighbors, such that when Algorithm 1 is run, the congestion forecasts are computed for the likely-affected subset only, instead of the whole set of k-hop neighbors. Thus, the execution time of the overall congestion forecasting Algorithm 1 is reduced for each of the road segments. The likelihood of congestion propagation reduces the time complexity of the overall congestion forecasting algorithm. Algorithm 2 may additionally be applied to edges that are left out (e.g., used to construct propagation graphs from the probabilities). [0092] In act 206 of Figure 2, the processor or processors output a time for onset of congestion. The onset of congestion is output as any congestion or from a cascade of previous congestion or congestion on another road segment. The congestion detected in acts 104 and/or 108 is output.
[0093] In act 110 of Figure 1 , the output is by an interface, such as through a memory buffer or computer network interface. The output is to a routing computer, such as a server providing route information to a mobile device (e.g., phone or vehicle). The predicted congestion may be used for routing decisions by the routing computer. The segments expected to be congested may be avoided. For example, the routing may use historical data with or without current traffic data to determine a fastest route through a road network or part of the road network. The predicted congestion may be included in the navigation routing decisions to further penalize road segments expected to be slower. The congestion and/or cascading effect may be considered in routing. [0094] In one example embodiment, congestion forecasting is based on a traffic data of Nashville (USA). In the dataset, each road segment is expressed as a Traffic Message Channel (TMC) having a TMC ID. Each TMC ID has timestamped information consisting of traffic speed, jam factor, and free flow speed collected over several months. For machine training and/or testing, the traffic data over a period (e.g., from January 1 , 2018 to February 12, 2018) is used. Each TMC ID signifies a specific road segment in the network and contains the sensor information for that particular segment.
In Nashville, there are a total of 3724 TMCs.
[0095] Algorithm 1 is validated on the Nashville dataset. The data from January 28, 2018 to February 12, 2018 is used for validation purposes. The set of congestion events are identified in the dataset for this period. One example procedure for finding the cascade events from validation dataset is provided in Algorithm 3. Algorithm 3 starts with checking for TMC IDs whose current speeds are less than 60% of the free flow speed (FF) for two consecutive timesteps n and n + 1.
[Algorithm 3: cascade event identification pseudocode figure]
Then, for each of the incoming neighbors of TMC e, the corresponding normalized speeds are checked from timestep n up to D timesteps later, where D is used as the hypothesis. If there is a congestion event that affects a neighborhood, then the congestion propagation between any two consecutive hops is within this D number of timesteps. The parameter D is a heuristic and may vary depending on the problem. If congestion is detected in any of the incoming neighbors within this specified time range, the flag temp is turned on for that road, as specified in Algorithm 3. Then, the number of times the flag temp turned on is summed (e.g., counted). This count indicates how many incoming neighbors showed the sign of congestion within that time range. If more than or equal to 50% of the incoming neighbors showed the effect of congestion, then a congestion event occurs, and the traffic network edge e is output as having congestion at timestep n. Not all of the neighbors necessarily need to be congested in a dynamic real-world traffic scenario. By identifying these cascaded congestion events, a validation set is created to verify the proposed congestion forecasting Algorithm 1. In this example, ten such events are identified from the Nashville dataset.
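The event-identification check of Algorithm 3 can be sketched as follows (Python; list indexing by timestep and the variable names are assumptions for illustration):

```python
def cascade_event(target_speeds, neighbor_speeds, n, D=4, thresh=0.6):
    """Return True when a cascaded congestion event is identified at
    timestep n: the target's normalized speed is below the threshold
    at timesteps n and n+1, and at least 50% of its incoming neighbors
    also drop below the threshold within [n, n+D]."""
    if not (target_speeds[n] < thresh and target_speeds[n + 1] < thresh):
        return False
    # Count incoming neighbors whose flag 'temp' would turn on.
    hits = sum(1 for sp in neighbor_speeds
               if any(v < thresh for v in sp[n:n + D + 1]))
    return 2 * hits >= len(neighbor_speeds)
```

Events found this way over the validation period form the ground truth against which Algorithm 1's forecasts are scored.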
[0096] Algorithm 1 is validated on the ten congestion events identified across Nashville using 10-minute increments or timesteps. To give a more precise idea of the efficacy of Algorithm 1, the corresponding precision and recall values in identifying the onset of congestion in each of the neighboring road segments are calculated. For each road segment, an experiment is carried out for three consecutive timesteps: the actual time of onset of congestion and one timestep before and after that. Whether the proposed algorithm outputs the presence of congestion for those timesteps is classified and compared with the true conditions. When the onset of congestion is correctly identified, a true positive occurs. When the onset of congestion is predicted before the actual onset, a false positive occurs for the timesteps during which congestion was forecasted but was not actually present. When the onset of congestion is predicted after the actual onset, the prediction is considered a false negative for the timesteps during which congestion was not forecasted but was actually present.
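The per-segment scoring just described reduces to the standard precision and recall definitions over these counts (Python sketch):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from onset-of-congestion classification counts:
    tp = onsets correctly identified, fp = timesteps with congestion
    forecasted but absent, fn = timesteps with congestion present but
    not forecasted."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Averaging these values over the ten validation events yields the summary statistics reported below.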
[0097] The test is only for neighbors that had a higher likelihood of congestion propagation as outlined in Algorithm 2. Various scenarios are considered where the congestion is confined within the 1st hop neighbors itself or affects a larger number of neighbors ranging up to the 3rd hop. Figure 7 shows the corresponding results of the precision and recall values in identifying the onset of congestion 10 minutes in advance in the neighboring segments and the corresponding number of neighbors affected by the congestion for ten different congestion events. The average precision and recall are obtained as 0.9269 and 0.9118, respectively, tested on these ten events. The variances of these precision and recall values are 0.02 and 0.0131, respectively.
[0098] Figure 8 shows the transportation network for one congestion event. The initial congestion occurred at segment A and cascaded to other segments. The source of the road congestion is road segment 'A'. Following the congestion at the source road segment, the congestion propagates to the 1st ('B'), 2nd ('C', 'G'), and 3rd hop ('D', 'E', 'F', 'H', 'I', 'J') incoming neighbors, respectively.
[0099] For better understanding of the result of the cascaded congestion prediction using LSTM with timestep = 10, Figure 9 shows the effectiveness of identifying the onset of congestion in each of the neighbors through three different radar charts. The chart in the middle shows the results for one 1st and two 2nd hop neighbors. The radar charts on the left and the right show the results for the 3rd hop neighbors corresponding to each of the 2nd hop neighbors. The onset of congestion can be identified accurately most of the time, where the actual onset of congestion, shown as a solid line, overlaps with the predicted onset, shown as a dotted line.
[00100] When congestion is predicted for a neighboring segment within the next ten minutes, the solution may be refined to identify whether the congestion will take place within next 0 to 5 minutes or next 5 to 10 minutes. The actual and predicted time of onset of congestion for all the neighbors using LSTM with timestep = 5 is provided as a fine tuning. The average precision and recall values for identifying the onset of congestion in one of the two possible higher resolution time-slots are calculated as 0.75 and 0.92 respectively.
[00101] The congestion forecasting framework is formed with an overall connected fabric 302 of LSTM architectures 304. Figure 10 shows an example of the traffic speed forecasting performance on a road segment having five neighbors. The forecasted speed after ten minutes and the actual speed after ten minutes overlap each other with a mean squared error of 0.0046.
[00102] The mean squared error (MSE) in predicting one vs. multiple timesteps ahead in the future is determined for multiple TMCs. The MSEs in forecasting one, two, three and four timesteps ahead are calculated for 45 TMCs out of 3724 TMCs in Nashville. The MSE increases for prediction further in the future.
[00103] Figure 11 shows one embodiment of a system for predicting congestion in a traffic network, such as a road network. The system applies a plurality of localized, machine-learned predictors for separate prediction of traffic for different segments of the network. For example, different LSTM networks predict future traffic for different road segments based on local traffic information from a range of previous times. The predicted traffic is used to determine whether and/or when congestion occurs.
[00104] The system implements the method of Figure 1 and/or the method of Figure 2. Other methods or acts may be implemented, such as acts for using spatiotemporal modeling of the traffic network as a directed connected graph and machine-learned models to predict localized traffic from current and past traffic for the local segment and outgoing segments.
[00105] The system includes a processor 10 for applying the spatiotemporal learned networks 12 in a memory 11 to predict congestion in a network of road segments 15. The system may include sensors 16 for measuring current traffic and/or providing past traffic measures. The system may include an interface 13 for reporting the prediction to a routing or navigation server 14, which guides or directs devices 17 (e.g., smart phones in vehicles and/or navigation systems of vehicles) along the road segments 15. Additional, different, or fewer components may be provided. For example, the interface 13, routing server 14, and/or devices 17 are not provided. As another example, the measures of traffic are stored in the memory 11 , and the sensors 16 are not provided.
[00106] The sensors 16 are traffic sensors, such as car counters, cameras with vehicle detection, and/or speed sensors (e.g., radar and/or time between detection at different locations). In one embodiment, the sensors 16 are speed or velocity sensors. Other traffic sensors measuring a characteristic of the flow of traffic on the road segments 15 may be used. [00107] The sensors 16 are configured by placement and/or operation to sense traffic flow in different roads. Different sensors 16 sense for different road segments 15 of the traffic network. The sensors 16 output to the processor 10. Alternatively, the processor 10 reads the measurements. [00108] The traffic network is modeled as a directed connected graph. The road segments 15 are edges in the directed connected graph. Other modeling may be used. The model of the traffic network is stored in the memory 11 and/or otherwise accessible by the processor 10. [00109] The processor 10 is a programmable logic controller, application specific integrated circuit, artificial intelligence processor, graphics processing unit, digital signal processor, field programmable gate array, workstation, computer, and/or server. The processor 10 is a single device but may be a network of processing devices, such as different processors applying different ones of the learned networks 12.
[00110] The processor 10 includes the interface 13 and the memory 11 storing the machine-learned networks 12. The memory 11 and/or interface 13 may be external to the processor 10. The processor 10 is configured by software, firmware, and/or hardware.
[00111] The processor 10 is configured to process the measurements from the sensors 16. For example, the speed is sampled in time increments. As another example, the speeds during each time period are averaged. In yet another example, the speed before or after sampling or averaging is normalized. The sensed traffic flow is normalized to a same scale regardless of differences in maximum, reference, and/or average speeds for the road segments.
[00112] The processor 10 is configured to group the sensed traffic flow by the road segments 15. The grouping localizes the measurements to each road segment. The grouping is so that the sensed traffic flow for each machine-learned network includes the sensed traffic flow for the road segment for the machine-learned network and the sensed traffic flow for downstream road segments of less than two, three, or four hops of the directed connected graph. The normalization of or for the grouped traffic flow measurements may occur prior to or after grouping.
[00113] The processor 10 is configured to forecast congestion road segment-by-road segment 15. By application of the groups of the sensed traffic flow to respective machine-learned networks 12 for respective road segments 15, the machine-learned networks 12 output predicted traffic for the different road segments 15. The machine-learned networks 12 were each trained for different ones of the road segments with input of current and past measurements to output a prediction of future traffic. In one embodiment, the learned networks 12 are long-short term memory neural networks. [00114] The processor 10 is configured to forecast congestion based on the predicted traffic. The forecast uses the output of the learned networks 12 to determine whether congestion has occurred or not. For example, congestion is forecast, for each road segment 15, based on an amount of difference of a measure of traffic output by the respective machine-learned network 12 to a reference measure of the traffic.
[00115] The memory 11 stores the learned networks 12, such as the architecture and values of the learned parameters. The memory 11 may store measurements from the sensors 16, groupings, the directed connected graph, predicted traffic, congestion determination, and/or other information. The memory 11 or other memory is alternatively or additionally a non-transitory computer readable storage medium storing data representing instructions executable by the processor 10 for applying the machine-learned networks 12. The instructions for implementing the processes, methods, and/or techniques discussed herein are provided on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive, or other computer readable storage media. Non-transitory computer readable storage media include various types of volatile and nonvolatile storage media. The functions, acts, or tasks illustrated in the figures or described herein are executed in response to one or more sets of instructions stored in or on computer readable storage media. The functions, acts, or tasks are independent of the particular type of instruction set, storage media, processor, or processing strategy and may be performed by software, hardware, integrated circuits, firmware, microcode, and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, and the like.
[00116] In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the instructions are stored within a given computer, CPU, GPU, or system.
[00117] The interface 13 is a computer network interface, transceiver, communications buffer, and/or other device for Internet or computer network communications. The interface 13 is configured to transmit the forecast congestion for one or more of the road segments to a traffic routing system. The predicted traffic may or may not also be transmitted.
[00118] The routing server 14 is a server for providing a route or navigation to devices 17. Based on a selected end point and a current or past starting point, the routing server 14 determines a route for the device 17 to the end point. The route determination uses one or more criteria, such as fastest route. The routing server 14 may use the predicted congestion to establish the route and/or to alter the route to avoid congestion.
[00119] Figures 2 and 12 show embodiments for machine training for congestion detection. The machine training is for a plurality of neural networks or other models. The networks or models are trained for different traffic segments, such as one network trained for each segment.
[00120] A machine, such as a processor, computer, workstation, and/or server, performs the machine training. Training data stored in a memory includes many samples of inputs of sensor data 210 and the ground truth outputs. The training data includes historical sensor data 210 for the traffic network. The machine uses the training data in an optimization, such as ADAM optimization, to learn the values of learnable parameters of the network.
[00121] In act 100, the traffic network, such as a road network, is defined as a directed connected graph 212. Other organizations of the traffic network may be used.
[00122] In act 214, the graphical model 212 is used to group the training data. The flow data 210 from different edges of the directed connected graph 212 is separated or copied into sets for the edge and any outgoing edges within one or two hops. The data is grouped into sets for each segment or edge of the traffic network. For each edge, the grouped sensor data 210 may include the sensor data 210 of that edge and of other edges, such as 1-, 2-, and/or 3-hop outgoing edges.
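The grouping of act 214 can be sketched as a bounded breadth-first traversal over edge-to-edge transitions of the directed graph: for each road segment (edge), collect that edge plus its outgoing edges within a hop limit. The representation of edges as (tail, head) pairs and the function names are assumptions for illustration.

```python
from collections import deque

def outgoing_edges_within(edges, start, hops):
    """Return the set containing `start` plus every edge reachable by
    following at most `hops` outgoing transitions on the directed graph.
    `edges` is a list of (tail, head) road segments."""
    by_tail = {}
    for e in edges:
        by_tail.setdefault(e[0], []).append(e)
    group = {start}
    frontier = deque([(start, 0)])
    while frontier:
        edge, depth = frontier.popleft()
        if depth == hops:
            continue
        # Outgoing edges of (u, v) are the edges whose tail is v.
        for nxt in by_tail.get(edge[1], []):
            if nxt not in group:
                group.add(nxt)
                frontier.append((nxt, depth + 1))
    return group

def group_flow_data(flow, edges, hops=1):
    """Build, for each edge, the subset of flow series used for its model."""
    return {e: {g: flow[g] for g in outgoing_edges_within(edges, e, hops)}
            for e in edges}
```

With `hops=1` an edge's group contains only itself and its immediate outgoing neighbors, matching the one-hop configuration described in act 220.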
[00123] In act 216, the flow data 210 is normalized. The flow data 210 may be normalized before or after the grouping of act 214.
[00124] In act 218, the groups of flow data 210 are sent and/or used for the respective network or model to be trained. The flow data 210 grouped for each road segment is sent to or assigned for training the respective network or model, such as an LSTM network 304.
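The normalization of act 216 can be sketched with per-segment min-max scaling, which places segments with different maximum and average speeds on a common [0, 1] scale (as claim 13 describes). Min-max scaling is one common choice assumed here; the patent does not commit to a specific normalization.

```python
def normalize_series(values):
    """Min-max normalize one segment's flow series to [0, 1] so that
    segments with different speed ranges share a common scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no variation; map it to zeros.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```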
[00125] In act 220, hyperparameters to be used in the training and/or for the trained network are tuned. The number of hops of outgoing connected segments to include in the grouping of the sensor data 210, the number of past increments to use in prediction, the period of each timestep, and/or other characteristics of the input or output are defined. For example, time increments of 10 minutes are used with three increments including the current time, and one hop is used for input of the sensor data 210, with output of a prediction for the next (one) future time increment. The tuning may be manual selection. In other embodiments, the training of act 222 is repeated with different settings, and the performance is tested to find the optimum settings of the hyperparameters.
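The windowing implied by the example hyperparameters (three 10-minute increments in, one increment out) can be sketched as a sliding-window transform of each flow series. The function name and signature are assumptions for the sketch.

```python
def make_windows(series, past=3, horizon=1):
    """Turn one flow series into (input, target) pairs: `past`
    consecutive readings predict the reading `horizon` steps beyond
    the window. Defaults match the example in the text: three
    increments in, the next increment out."""
    pairs = []
    for i in range(len(series) - past - horizon + 1):
        pairs.append((series[i:i + past], series[i + past + horizon - 1]))
    return pairs
```

Repeating training with different `past` and `horizon` values and comparing validation error is one way to realize the hyperparameter search described above.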
[00126] In act 222, the processor performs machine training. For example, a traffic predictor is trained for each TMC with deep learning. An LSTM network or other neural network is trained using the training data as grouped. The models (e.g., LSTM networks) are machine trained to predict flow for the different edges. One of the models is machine trained for each one of the different edges based on the flow data 210 for the set for the respective one of the different edges. For example, the same architecture and learnable parameters are machine trained separately for each of the different edges.
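The one-model-per-edge structure of act 222 can be sketched as a loop over the grouped training sets. A trivial last-value predictor stands in for the LSTM here, purely so the sketch is self-contained and runnable; the class and function names are hypothetical.

```python
class LastValuePredictor:
    """Hypothetical stand-in for the per-segment LSTM: it predicts the
    next reading as the last reading of the input window. Only the
    one-model-per-edge training structure is illustrated, not the LSTM."""
    def fit(self, windows):
        # A real LSTM would run gradient descent here (e.g., Adam).
        return self

    def predict(self, window):
        return window[-1]

def train_per_edge(grouped_windows, model_factory=LastValuePredictor):
    """Train one independent model per road segment (edge), as described
    in act 222, using that edge's grouped training windows."""
    return {edge: model_factory().fit(windows)
            for edge, windows in grouped_windows.items()}
```

Because every edge gets its own trained instance, a local change in one segment's data only requires refitting that segment's model, which is the property exploited in paragraph [00130].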
[00127] Rather than training on specific instances of congestion as the ground truth, the traffic flow is used as the ground truth. The models are trained to predict traffic flow for respective segments. The training is based on all possible traffic conditions observed over the entire traffic network (e.g., city) for historical flow data 210 over any period (e.g., almost one and a half months). A separate algorithm may be used to identify congestion and/or cascade effect from the predicted traffic, so congestion and/or cascade effect is not part of the training data. In alternative embodiments, the congestion and/or cascade effect are used as the ground truth to machine learn to predict future congestion and/or cascade effect.
[00128] In one embodiment, one or more traffic predictors are machine trained for each TMC. The LSTM fabric 302 is trained to provide different predictors for different segments of the network. The grouping of act 214 and the machine training of act 222 form a distributed spatiotemporal modeling of flow of traffic on a transportation network as the directed connected graph.
[00129] For training, the training data includes many samples. The deep learning learns features to be extracted from the input sensor data 210.
These learned features are to be used to estimate the traffic flow for a future time. The features that may be used to best or sufficiently distinguish between traffic states are learned from the training data. For example, deep learning (e.g., deep structured learning, hierarchical learning, or deep machine learning) models high-level abstractions in data by using multiple processing layers with structures composed of multiple non-linear transformations, where the input data features are not engineered explicitly.
A deep neural network processes the input via multiple layers of feature extraction to produce the features used for the prediction. Other deep-learned models may be trained and applied.
[00130] By training as a connected fabric 302 (see Figure 3), localized information in the network is represented. Only the portions of the network that are affected when a data shift occurs are retrained. This avoids retraining the entire single network when a local change in the data occurs.
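The selective retraining of paragraph [00130] can be sketched as a per-segment drift check: only segments whose recent prediction error has grown relative to a baseline are queued for retraining. The error metric, tolerance factor, and function name are assumptions for the sketch.

```python
def segments_to_retrain(recent_errors, baseline_errors, tolerance=1.5):
    """Return only the segments whose recent prediction error exceeds
    `tolerance` times their baseline error; all other segments keep
    their existing trained models untouched."""
    return sorted(seg for seg, err in recent_errors.items()
                  if err > tolerance * baseline_errors[seg])
```

This keeps retraining local to the affected portion of the fabric rather than refitting one monolithic network for the whole city.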
[00131] In act 224, the trained model for each TMC is saved for use in the congestion forecasting phase (see Figures 1 or 2). The trained networks are stored, such as in a memory of the system for application of the machine-learned networks.
[00132] While the invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention.
It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention.

Claims

I (WE) CLAIM:
1. A method for determining a cascading effect of congestion in a traffic network, the method comprising:
defining the traffic network as a directed connected graph where each edge is a road segment;
identifying the congestion on one or more of the road segments;
predicting traffic for a future time at the road segments in the directed connected graph, separate machine-learned models predicting the traffic for respective road segments; and
determining the cascading effect of the congestion from the predicted traffic.
2. The method of claim 1 wherein defining the traffic network comprises defining the directed connected graph for at least ten road segments, each of the machine-learned models having been trained for the prediction of the traffic for a single direction of the respective road segment.
3. The method of claim 1 further comprising receiving sensor data for the road segments, the sensor data comprising speed and wherein predicting comprises predicting the traffic as speed.
4. The method of claim 1 wherein identifying comprises identifying the congestion as a ratio of (1) a current speed on the one or more road segments to (2) an average speed unlimited by congestion on the one or more road segments being below a threshold.
5. The method of claim 1 wherein predicting comprises predicting for each of the road segments by input of only data of the road segment and data of downstream road segments by only one or two hops on the directed connected graph to the respective machine-learned model for the road segment.
6. The method of claim 1 wherein predicting the traffic comprises predicting the speed of traffic at the future time.
7. The method of claim 1 wherein predicting by the machine-learned models comprises predicting by recurrent neural networks with long short term memory.
8. The method of claim 1 wherein determining the cascading effect comprises determining a start of congestion upstream from the one or more of the road segments with the identified congestion.
9. The method of claim 1 wherein determining comprises detecting that a ratio of predicted traffic to a reference traffic is below a threshold on one of the road segments upstream from the congestion.
10. The method of claim 1 further comprising routing a mobile device on the traffic network based on the determined cascading effect.
11. A system for predicting congestion in a traffic network, the system comprising:
sensors configured to sense traffic flow in different road segments of the traffic network; and
a processor configured to group the sensed traffic flow separately for different ones of the road segments and forecast the congestion by application of the groups of the sensed traffic flow to respective machine-learned networks, where each machine-learned network was trained for different ones of the road segments.
12. The system of claim 11 wherein the sensors are configured to sense speed of traffic as the traffic flow.
13. The system of claim 11 wherein the processor is configured to normalize the sensed traffic flow of each group to a same scale regardless of differences in maximum speeds and average speeds for the road segments.
14. The system of claim 11 wherein the traffic network is modeled as a directed connected graph, and wherein the processor is configured to group so that the sensed traffic flow for each machine-learned network includes the sensed traffic flow for the road segment for the machine-learned network and the sensed traffic flow for downstream road segments of less than two or three hops of the directed connected graph.
15. The system of claim 11 wherein the machine-learned networks comprise long-short term memory neural networks.
16. The system of claim 11 further comprising an interface configured to transmit the forecast congestion for one or more of the road segments to a traffic routing system.
17. The system of claim 11 wherein the processor is configured to forecast the congestion by an amount of difference of a measure of traffic output by the machine-learned network to a measure of the traffic.
18. A method for machine training for congestion detection, the method comprising:
separating flow data from different edges of a directed connected graph into sets for the edge and any outgoing edges within one or two hops;
machine training models to predict flow for the different edges, one of the models being machine trained for each one or subset of the different edges based on the flow data for the set for the respective one or subset of the different edges.
19. The method of claim 18 wherein machine training the models comprises machine training long short-term memory networks separately for each of the different edges and retraining less than all of the long short-term memory networks due to a localized change.
20. The method of claim 18 wherein separating and machine training comprises forming distributed spatiotemporal modeling of flow of traffic on a transportation network as the directed connected graph.
PCT/US2020/061418 2019-11-20 2020-11-20 Data-driven determination of cascading effects of congestion in a network WO2021102213A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962937951P 2019-11-20 2019-11-20
US62/937,951 2019-11-20

Publications (1)

Publication Number Publication Date
WO2021102213A1 true WO2021102213A1 (en) 2021-05-27

Family

ID=73835821

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/061418 WO2021102213A1 (en) 2019-11-20 2020-11-20 Data-driven determination of cascading effects of congestion in a network

Country Status (1)

Country Link
WO (1) WO2021102213A1 (en)


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BASAK SANCHITA ET AL: "Analyzing the Cascading Effect of Traffic Congestion Using LSTM Networks", 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), IEEE, 9 December 2019 (2019-12-09), pages 2144 - 2153, XP033721165, DOI: 10.1109/BIGDATA47090.2019.9005995 *
DI XIAOLEI ET AL: "Traffic Congestion Prediction by Spatiotemporal Propagation Patterns", 2019 20TH IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM), IEEE, 10 June 2019 (2019-06-10), pages 298 - 303, XP033591655, DOI: 10.1109/MDM.2019.00-45 *
LEE CHUNGGI ET AL: "A Visual Analytics System for Exploring, Monitoring, and Forecasting Road Traffic Congestion", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 26, no. 11, 11 June 2019 (2019-06-11), pages 3133 - 3146, XP011811565, ISSN: 1077-2626, [retrieved on 20200930], DOI: 10.1109/TVCG.2019.2922597 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537580A (en) * 2021-06-28 2021-10-22 中科领航智能科技(苏州)有限公司 Public transport passenger flow prediction method and system based on adaptive graph learning
CN113537580B (en) * 2021-06-28 2024-04-09 中科领航智能科技(苏州)有限公司 Public transportation passenger flow prediction method and system based on self-adaptive graph learning
CN113625352A (en) * 2021-08-11 2021-11-09 王佳馨 Urban underground space resistivity sensing system and data acquisition method based on cloud edge-end cooperation
WO2023191974A1 (en) * 2022-03-29 2023-10-05 Nissan North America, Inc. Long-term shared world model of roadway


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20824846

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20824846

Country of ref document: EP

Kind code of ref document: A1