
The authors have declared that no competing interests exist.

Betweenness Centrality (BC) has proven to be a fundamental metric in many domains to identify the components (nodes) of a system modelled as a graph that are most traversed by information flows, and are thus critical to the proper functioning of the system itself. In the transportation domain, the metric has mainly been adopted to discover topological bottlenecks of the physical infrastructure composed of roads or railways. The adoption of this metric to study the evolution of transportation networks that also take into account the dynamic conditions of traffic is in its infancy, mainly due to the high computation time needed to compute BC in large dynamic graphs. This paper explores the adoption of dynamic BC,

In the context of smart transportation, it is reasonable to consider road networks as weighted and directed graphs, to better capture road diversity (

The dynamic nature of transportation networks depends on multiple factors, such as travel demand, passenger behaviors, road conditions, weather-related phenomena and accidents. Similarly, recommendations from trip planners might have a significant impact on traffic dynamics, by generating an unbalanced traffic distribution and thus easily saturating critical areas of the network. In fact, even in the presence of smart trip planners that take into account global and real-time traffic conditions, traffic imbalance still exists due to the sudden appearance of disruptions and accidents. Likewise, travel information can steer the network state towards inefficient equilibria, due to the presence of unequipped users as well as selfish or bounded-rational behaviors of equipped ones [

In this context, it appears beneficial to have global information about traffic conditions in order to improve the quality of local decisions while keeping their impact on the whole transportation system under control. This information can be inferred from local data about traffic conditions that, today, can be easily collected via ubiquitous sensors (such as loop detectors, travel assistants, mobile phone apps, etc.) that help monitor large urban areas. However, processing these data to provide predictive estimations of short-term traffic states and of their impact on the whole network is a challenge.

Several domain-specific approaches, often leveraging simulators based on physical models of traffic propagation and people mobility, have been proposed by researchers to estimate short-term traffic conditions from monitored data [

Current lines of research exploiting graphs in the transportation domain are mainly focused on static topological information related to nodes and intersections [

BC has been effectively exploited in many domains to identify the components (nodes) that are most traversed by information flows and are therefore potential critical spots. These spots are particularly relevant in transportation networks, since they are subject to hard-to-predict hazardous and cascading effects. The latter may have tremendous impacts on the operation of the urban infrastructure, at large scales and with extreme rapidity.

Recent literature [

Several solutions have been proposed in recent years to reduce the computation time of this metric [

This paper proposes an approach to compute BC values on periodic snapshots of dynamic graphs representing transportation networks. The proposed approach can thus be leveraged to promptly detect anomalies or abrupt changes of network properties and traffic dynamics, by monitoring the sudden variations of the weighted shortest paths that traverse the links of the network.

The contributions of the paper are the following:

the adoption of dynamic BC for ahead monitoring of transportation network conditions.

the analysis of static and dynamic BC in different scenarios and with various graphs by exploiting a large dataset of traffic-related real observations (GPS traces of vehicles). Interestingly, and originally with respect to [

an extensive evaluation of our algorithm in terms of performance, scalability and accuracy on large dynamic road-network graphs, showing that it outperforms other state-of-the-art algorithms for computing BC of dynamic graphs.

Our paper, via the proposed approach, lays the foundations for a novel data-driven, complex network-based control system for supporting resilience enhancement of large-scale road networks.

The rest of the paper is organized as follows. First, we present related work. Then, we describe our model and metrics to characterize road-network vulnerability. A case study, related to the analysis of both static and dynamic BC atop a large-scale road network is presented. Finally, we conclude the work by also highlighting future directions.

Graph models, network theory and BC, a metric originally proposed in [

Over the last decade, BC has been particularly exploited in the context of transportation networks [

Other works have studied the effect of high-load on roads to evaluate resilience and efficiency in transportation networks for cities in US [

As an important limitation, most of the existing works consider networks and their attributes (

BC for dynamic analysis has been proposed in [

The main limitation of BC when used as an indicator of vulnerability over large-scale networks is its extremely high computation time, even when computed by the fastest general solution for exact BC computation proposed by Brandes in [

Starting from the considerations above, in previous work we began exploring a different approach, based on the observation that the border nodes of clusters obtained through modularity-based clustering techniques have high BC values, since they are crossed by the shortest paths from all the nodes inside the cluster to all the other nodes of the graph. Therefore, we focused on the identification of pivots, as explained later in this paper, which avoid BC errors on border nodes and on the nodes outside the cluster of the pivot, for each cluster obtained by clustering the initial graph.
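Modularity-based clustering scores each candidate partition of the graph with a quality function, and the Louvain method greedily searches for partitions that maximize it. As a minimal sketch, assuming the directed weighted modularity in the Leicht-Newman form (the exact variant used by W2C-Fast-BC is not stated here), the following Python function scores a given partition. It computes the objective only and is not an implementation of the Louvain method; the function name and the arc-list input format are our own choices.

```python
from collections import defaultdict


def directed_modularity(edges, community):
    """Modularity of a partition of a weighted, directed graph.

    edges: iterable of (u, v, w) arcs with positive weights w.
    community: dict mapping each node to a community id.
    Computes Q = sum_c [W_c / m - (K_c_out * K_c_in) / m**2], where m is the
    total arc weight, W_c the weight of arcs with both ends in community c,
    and K_c_out / K_c_in the summed out-/in-strength of the nodes of c
    (Leicht-Newman form, assumed here).
    """
    m = 0.0
    internal = defaultdict(float)      # weight of arcs inside each community
    out_strength = defaultdict(float)  # total outgoing weight per community
    in_strength = defaultdict(float)   # total incoming weight per community
    for u, v, w in edges:
        m += w
        out_strength[community[u]] += w
        in_strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / m - out_strength[c] * in_strength[c] / m ** 2
        for c in set(community.values())
    )
```

On two directed triangles joined by a single arc, splitting the nodes along the triangles gives Q = 18/49 (about 0.37), whereas a single community containing every node gives Q = 0.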

A first result on unweighted graphs has been presented in [

In this section, we introduce the main assumptions we consider to model a dynamic transportation network, the related graph model, the metric we use to analyze it, and the algorithm proposed for efficient computation of this metric (BC) over large-scale, dynamic, weighted and directed networks.

In our study, we take into account three fundamental real-world, well-known properties of large-scale urban traffic networks:

The choice of travel time as edge weight stems from the importance of this variable as a proxy to identify the appearance of congestion on specific road segments. However, it is worth remarking that link travel time per se is not sufficient to identify critical network traffic conditions in the near future. In fact, road links with high travel time at a given moment do not necessarily represent critical links of the network at that specific instant or in the following ones, as they might not appear on a significant number of shortest paths connecting different origin/destination (OD) pairs. Conversely, a link characterized by a low travel time, close to free-flow conditions, could represent a critical link, as it might belong to multiple shortest routes connecting a large number of OD pairs.

The considerations above derive from the global behavior of the network. Relying on the assumption that users tend to choose shortest paths to reach their destinations, when congestion appears or hard-to-predict accidents occur on a given path, the network is not at equilibrium and people may look for alternative shortest paths to reach their destinations, possibly following the indications provided by real-time travel planners. These assumptions are realistic if we consider that modern vehicles are equipped with smart navigation systems and that planners may significantly affect users’ route choice. This phenomenon can easily lead to gridlocks and large-scale propagation of congestion. In fact, travelers might easily saturate areas of the network (possibly still in free-flow conditions) that are central to the global functioning of the urban system (

Under the assumptions above, by continually computing node BC of large urban-scale road networks, weighted by travel times collected by sensors, control strategies could be designed to smartly re-distribute traffic flow for balancing node BC values over time with the aim of keeping the BC distribution close to the one observed in free-flow conditions. Therefore, we can conclude that:

By relying on the continuous quick computation of node BC over a time-varying weighted graph, such a system could be used to promptly advise vehicles about the availability of reasonable path alternatives to the shortest one, whose choice can contribute to the global improvement of network performance. Such alternatives should be computed by taking into account the current distribution of BC values in a given, possibly congested, area and by identifying the paths that allow for a more homogeneous distribution of BC values in the given area during the next time step.

We assume the following definitions throughout the paper. Let _{ij} ∈

A path $p(v_i, v_j)$ between two nodes $v_i$ and $v_j$ of the graph is a sequence of edges connecting them. The length of the path between $v_i$ and $v_j$, represented by $l(p(v_i, v_j))$, is the sum of the weights of the edges (or hops) to reach $v_j$ from $v_i$. If nodes $v_i$ and $v_j$ are directly connected, then the path length is the weight of the link, or 1 for unweighted graphs. A shortest path between any two nodes $v_i$ and $v_j$, denoted as $sp(v_i, v_j)$, is the path with the minimum length among all the paths connecting the two nodes. Multiple shortest paths may exist between the same pair of nodes, all with the same length $d(v_i, v_j) = l(sp(v_i, v_j))$, which is the length of the shortest path between nodes $v_i$ and $v_j$. We denote by $\sigma_{ij}$ the number of shortest paths between $v_i$ and $v_j$, and by $\sigma_{ij}(v_k)$ the number of shortest paths from $v_i$ to $v_j$ that cross node $v_k$.

Node betweenness centrality (BC) [

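We define betweenness centrality for weighted networks as in […]. Let $\sigma_{ij}$ be the number of shortest paths from $v_i$ to $v_j$ and $\sigma_{ij}(v_k)$ the number of them traversing vertex $v_k$; then, the weighted betweenness centrality of vertex $v_k$ is defined as:

$$BC(v_k) = \sum_{v_i, v_j \in V,\; v_i \neq v_k \neq v_j} \frac{\sigma_{ij}(v_k)}{\sigma_{ij}}$$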
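Brandes' algorithm computes the weighted BC of every vertex exactly, with one Dijkstra-based single-source shortest-path (SSSP) exploration per source node followed by a backward accumulation of dependency scores; this per-source cost is what the pivots described below aim to reduce. The following is a minimal sequential Python sketch for directed graphs with positive weights, assuming an adjacency-dict representation. It is not the authors' Scala/Spark implementation, and the name `brandes_weighted_bc` is hypothetical.

```python
import heapq
from collections import defaultdict


def brandes_weighted_bc(graph):
    """Exact BC for a directed graph with positive edge weights.

    graph: dict node -> list of (neighbor, weight) pairs.
    Returns dict node -> sum over ordered pairs (s, t), s != v != t,
    of sigma_st(v) / sigma_st.
    """
    nodes = set(graph)
    for u in graph:
        nodes.update(v for v, _ in graph[u])
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Dijkstra from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0.0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []  # nodes in non-decreasing distance from s
        heap = [(0.0, s)]
        settled = set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in settled:
                continue  # stale heap entry
            settled.add(u)
            order.append(u)
            for v, w in graph.get(u, []):
                nd = d + w
                if v not in dist or nd < dist[v]:
                    dist[v], sigma[v], preds[v] = nd, sigma[u], [u]
                    heapq.heappush(heap, (nd, v))
                elif nd == dist[v]:
                    sigma[v] += sigma[u]  # another shortest path to v
                    preds[v].append(u)
        # Backward accumulation of dependency scores delta_s(.).
        delta = defaultdict(float)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

With arcs s→a (1), a→t (1) and s→t (5), vertex a lies on the only shortest s→t path and gets BC 1; lowering the weight of s→t to 1 drops it to 0, which illustrates how strongly weights affect BC.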

We exploit modularity for clustering weighted directed graphs with the Louvain method [

Given a graph

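[Algorithm listing: W2C-Fast-BC. Line 9: the partial dependency score of each pivot is multiplied by the cardinality of its class; line 10: partial contributions are combined by a reduce operation; line 12: the final BC of each node is obtained by adding the partial contributions to its local BC value.]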

The main result of clustering is the identification of

Then, a parallel execution of the Brandes algorithm (based on Dijkstra) is performed inside each cluster to compute the

The information above is also used to identify the nodes inside each cluster $C_i$ that contribute equally to the dependency score of each node of the graph (equivalence class, see […]). An equivalence class of cluster $C_i$ is defined with reference to the values of the dependency score (BC contribution) computed on the nodes outside $C_i$ by an SSSP exploration started from a node of the class: all nodes in the class contribute equally to the BC of the nodes outside $C_i$. Since nodes belonging to the same class produce the same dependency score on each node of the graph outside cluster $C_i$, one representative node (called class pivot) can be selected as the source of an SSSP exploration for each class of $C_i$, in order to compute the contribution of all the nodes of the class by multiplying the dependency score due to the pivot by the cardinality of the pivot’s class.
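The pivot mechanism can be illustrated with a minimal Python sketch. It assumes that each node of a cluster already carries a hashable signature such that nodes with equal signatures produce equal dependency scores outside the cluster; how such signatures are obtained is not detailed in this passage, and the names `group_into_classes`, `external_contribution` and `pivot_dependency` are hypothetical. Only one SSSP exploration per class is performed, and its scores are scaled by the class cardinality.

```python
from collections import defaultdict


def group_into_classes(signatures):
    """Group the nodes of one cluster whose (hypothetical) signatures match.

    signatures: dict node -> hashable signature.
    Returns a list of classes, each a sorted list of nodes; the first
    node of each class is used as its pivot.
    """
    classes = defaultdict(list)
    for node, sig in signatures.items():
        classes[sig].append(node)
    return [sorted(members) for members in classes.values()]


def external_contribution(classes, pivot_dependency):
    """Accumulate dependency scores on outside nodes, one SSSP per class.

    pivot_dependency: callable pivot -> dict outside_node -> dependency
    score, e.g. the delta values of one Dijkstra/Brandes exploration.
    Each pivot's scores are multiplied by the cardinality of its class.
    """
    total = defaultdict(float)
    for members in classes:
        pivot = members[0]
        for node, delta in pivot_dependency(pivot).items():
            total[node] += len(members) * delta
    return dict(total)
```

With three nodes forming two classes, only two dependency computations are needed, and their scores are weighted by 2 and 1 respectively.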

The partial dependency score calculated for the pivot is then multiplied by the cardinality of the pivot class (line 9). This method avoids re-applying Dijkstra's algorithm to another node of the same class, thus ensuring fast calculation of BC if _{i}, we introduce an approximation error, since the simplification applies only to the nodes outside _{i}. We are also working to remove this source of error in the exact version of the algorithm [

To further reduce the computation time, we have extended the concept of

To perform class grouping, we exploit a parallel implementation of the

Differently from the first two sources of error, this last one cannot be removed, since it is deliberately introduced to relax the constraints of class identification, so as to obtain larger classes and consequently a lower number of pivots. However, it is important to highlight that removing errors has an impact on performance; therefore, approximation and computation time should be considered as a whole, and when approximate results are acceptable, as in the domain discussed in this paper, the current version of W2C-Fast-BC is preferable. The final value of BC is obtained for each node by summing up all partial contributions (produced by the reduce operation of line 10) with the local BC values (line 12).
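Relaxing exact class identification can be sketched with K-means over numeric signature vectors: nodes whose vectors fall in the same K-means cluster share one pivot, at the cost of the approximation error discussed above. In this Python sketch the signature vectors are hypothetical inputs, the number of clusters is assumed to be the K-fraction times the number of distinct signatures, and the pivot is taken as the member closest to its centroid. These are illustrative choices using plain Lloyd iterations, not the exact (parallel) W2C-Fast-BC procedure.

```python
import math
import random


def kmeans_pivots(vectors, k_fraction, iterations=20, seed=0):
    """Approximate equivalence classes via K-means (Lloyd iterations).

    vectors: dict node -> tuple of floats (hypothetical signatures).
    k_fraction: assumed fraction of distinct signatures kept as clusters.
    Returns dict pivot -> sorted list of class members.
    """
    distinct = sorted(set(vectors.values()))
    k = max(1, min(len(distinct), round(k_fraction * len(distinct))))
    centroids = random.Random(seed).sample(distinct, k)
    for _ in range(iterations):
        # Assign every node to its nearest centroid.
        groups = [[] for _ in range(k)]
        for node, vec in vectors.items():
            idx = min(range(k), key=lambda c: math.dist(vec, centroids[c]))
            groups[idx].append(node)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = []
        for c, members in enumerate(groups):
            if not members:
                new_centroids.append(centroids[c])
                continue
            dim = len(centroids[c])
            new_centroids.append(tuple(
                sum(vectors[n][d] for n in members) / len(members)
                for d in range(dim)))
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    classes = {}
    for c, members in enumerate(groups):
        if members:
            pivot = min(members,
                        key=lambda n: (math.dist(vectors[n], centroids[c]), n))
            classes[pivot] = sorted(members)
    return classes
```

With a K-fraction of 1, every distinct signature becomes its own cluster, so the sketch degenerates to exact grouping; smaller fractions trade accuracy for fewer SSSP explorations, as in the trade-off described above.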

In this section, we discuss how dynamic betweenness centrality can help in understanding transportation traffic dynamics and providing insights for predicting traffic flows.

For our analysis, we consider a very-large directed graph, namely ^{2}. This dataset was created using digital maps supplied by the French National Institute of Geographic Information (IGN). The network consists of 117,605 nodes and 248,337 edges. By extracting the largest connected component of the resulting network, the final undirected and unweighted graph (see

Name | Description | Size | Dataset Main Attributes | Source |
---|---|---|---|---|
Rhone-ROADS | Multi-attribute, directed, static graph of the Rhone department road network. | 117,605 nodes and 248,337 edges. | Edge attributes: road segment length, width, number of lanes, speed limit, importance of the road segment (from 1 = max to 5 = min), post-code of the road segment area (INSEE), link geometry. | IGN |
Rhone-TAXIS | Map-matched, time-stamped, geo-referenced trips of floating taxis on working days of March, April and May 2011. | 5,662,844 GPS geo-referenced elementary trips related to 103,639 unique taxi trips. | Elementary-trip attributes: unique taxi’s trip identifier, GPS-logged coordinates of the segment starting point, GPS-logged coordinates of the segment arrival point, segment travel time. | Radio Taxi |
Rhone-OBS | Multi-attribute, directed sub-graph derived by joining the Rhone-TAXIS dataset and the Rhone-ROADS network: each edge of the subnetwork has at least one elementary trip associated to it. | 35,940 nodes and 59,132 edges. | Edge attributes: static attributes from Rhone-ROADS; per-edge median speeds derived from the elementary trips (24 median speed values for each edge, corresponding to the median of all speeds observed over the edge during the twenty-four 1h-time-slots from 00:00 to 23:00), as derived from Rhone-TAXIS. | Derived dataset |
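The per-edge hourly median speeds of the derived dataset, and the corresponding travel-time weights for one time slot, can be computed as in the following Python sketch. It assumes that each map-matched elementary trip has been reduced to an (edge, hour, travel time) record and that its speed is the edge length divided by that travel time; the record layout and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def hourly_median_speeds(elementary_trips, edge_length_m):
    """Median speed (m/s) per (edge, hour) over all observations.

    elementary_trips: iterable of (edge_id, hour, travel_time_s) records
    (hypothetical layout); edge_length_m: dict edge_id -> length in meters.
    """
    speeds = defaultdict(list)
    for edge, hour, travel_time_s in elementary_trips:
        if travel_time_s > 0:  # skip degenerate records
            speeds[(edge, hour)].append(edge_length_m[edge] / travel_time_s)
    return {key: median(values) for key, values in speeds.items()}


def travel_time_weights(median_speeds, edge_length_m, hour):
    """Travel-time weight (s) of every edge observed in the given hour."""
    return {
        edge: edge_length_m[edge] / speed
        for (edge, h), speed in median_speeds.items()
        if h == hour and speed > 0
    }
```

Edges with no observation in a given hour get no weight in that slot, which is exactly the coverage gap discussed later for non-peak hours.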

Maps throughout this research article were created using open-source data from OpenStreetMap and OpenStreetMap Foundation, which is made available under the Open Database Licence. Map tiles are from OpenStreetMap cartography, which is licensed as CC BY-SA (see

By considering the graph only as undirected and unweighted, many relevant properties of the road network and its dynamics could be missed. Therefore, in the first part of our evaluation, we consider multiple weighted, directed and static graphs that have been derived from the Rhone-ROADS network by selecting some of the available topological attributes (

Our second dataset, namely

A preliminary analysis of our dataset has shown a relatively low number of observations especially during evening and night-time (

In

(a) Hourly number of elementary trips for the typical working day (b) Number of edges from the Rhone-ROADS with median speed (c) Evolution of the network median edge-speed over time.

Regarding the spatial dimension,

From visual inspection of

As a preliminary step for the evaluation of BC on

The three static configurations of the graph are graphically represented in

(a) Undirected, unweighted (circle size proportional to node’s BC) (b) Directed, length-weighted (circle size proportional to node’s BC) (c) Directed, free-flow-travel-time-weighted (circle size proportional to node’s BC).

Visual inspection of the different BC figures makes evident the effect of using different weights for BC computation. In particular, it can be noticed that on the undirected, unweighted version of the Rhone-OBS graph (

In order to produce a dynamic graph, the Rhone-OBS graph has been leveraged to extract multiple weighted graph instances, depending on the specific time slot we consider in our analysis. Several instances of the Rhone-OBS graph, related to different hours of the typical day, are graphically shown in

Edge color (from black/red to yellow) indicates higher speed-ratio,

To extract the graph associated to a specific time slot

In

In

The size of each circle in the subplots is proportional to node’s BC. (a) 06:00 (b) 07:00 (c) 08:00 (d) 09:00 (e) 10:00 (f) 11:00.

(a) Node with the highest BC at 06:00: evolution of its BC over time (b) Node with the highest BC at 08:00: evolution of its BC value over time (c) Node with the highest BC at 10:00: evolution of its BC value over time.

These results provide first hints on the importance of using dynamic, weighted graphs in the computation of BC, as well as on the need for a rapid algorithm to compute up-to-date BC values. In that sense, the figures reveal interesting dynamics of people’s mobility in the city of Lyon: during morning peak hours (

In order to dig deeper into the interactions between TTBC and traffic dynamics, we performed a more specific analysis of the

(a) Per-edge temporal correlation (only edges equipped with loop detectors have a non-null value) (b) Zoom on an area with two roads: Quai Dr. Gailleton and Quai Claude Bernard (c) Evolution of the TTBC and flow on the roads Quai Dr. Gailleton and Quai Claude Bernard.

We analyze the detail of these dynamics (related to dynamic BC and flows) by focusing on a specific region of the analyzed graph including two roads with a mirror behavior in terms of temporal correlation,

To summarize, BC computed on static weighted graphs (

Finally, an intermediate situation exists, with “neutral” areas (mildly positively or negatively correlated in terms of TTBC and flow) that are characterized by either low or medium/high traffic demand and are usually capable of dispatching such flow without becoming congested. It is worth remarking that these considerations further confirm the need for an efficient solution to rapidly compute BC on large-scale and highly dynamic weighted graphs.
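The per-edge temporal correlation between TTBC and flow can be computed from two hourly series per edge. The Python sketch below assumes the Pearson coefficient (the coefficient used in the analysis is not specified here) and labels an edge as positive, negative or neutral using a hypothetical threshold of 0.5 on the absolute correlation; `pearson` and `classify_edge` are illustrative names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long hourly series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / math.sqrt(vx * vy)


def classify_edge(ttbc_series, flow_series, threshold=0.5):
    """Label an edge by the sign and strength of its TTBC-flow correlation.

    The threshold separating 'neutral' edges is a hypothetical parameter.
    """
    r = pearson(ttbc_series, flow_series)
    if math.isnan(r) or abs(r) < threshold:
        return "neutral", r
    return ("positive" if r > 0 else "negative"), r
```

An edge whose TTBC rises and falls with its flow gets a correlation close to +1, a mirror behavior gives a value close to -1, and mildly correlated edges fall into the neutral band.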

We exploited our W2C-Fast-BC algorithm [

As a preliminary step to evaluate the performance of W2C-Fast-BC, we have compared our approach to both Brandes [

The reported analysis has been performed in sequential mode for two reasons. First, a sequential execution allows us to clearly quantify the benefit of our technique with respect to Brandes as a consequence only of the reduced number of SSSP explorations, which corresponds to the cardinality of the identified set of pivot nodes. Second, the available implementation of BADIOS does not support parallelism. As another important limitation, the available implementation of BADIOS does not support weighted graphs. Thus, we have considered an unweighted version of the Rhone-ROADS graph from

W2C-Fast-BC configuration K | # pivots | W2C-Fast-BC avg. perc. error (top-1000 nodes) [%] | Speedup of W2C-Fast-BC to (Scala) Brandes | Speedup of BADIOS to (C++) Brandes |
---|---|---|---|---|
0.01 | 972 | 0.93 | 3.00 | |
0.1 | 9,684 | 0.175 | | |
0.2 | 19,368 | 0.089 | | |
0.3 | 29,047 | 0.050 | | |
0.4 | 38,732 | 0.043 | 2.45 | |
0.5 | 48,445 | 0.038 | 1.96 | |
0.6 | 57,905 | 0.023 | 1.70 | |
0.7 | 67,518 | 0.020 | 1.51 | |
0.8 | 77,192 | 0.015 | 1.26 | |
0.9 | 86,439 | 0.009 | 1.26 | |
1.0 | 94,354 | 0.004 | 1.05 | |

The following sub-sections aim at generalizing these preliminary results related to the performance of our approach, by specifically focusing on two aspects:

Spark was configured to work in the standalone cluster mode on two Intel Xeon E5 2640 2.4 GHz multi-core machines, each equipped with 56 virtual cores and 128 GB of DDR4 RAM.

As a first static graph, we exploit the original directed Rhone-ROADS graph, where edges are both directed and weighted according to the lengths (in meters) of the road segments. Thus, the computation of shortest paths through the Dijkstra algorithm reduces the BC of the nodes traversed by longer paths between pairs of nodes. We remark that road length does not account for traffic dynamics.

The values of nodes’ weighted BC computed via our novel W2C-Fast-BC algorithm are reported in

(a) Nodes’ BC values (circle size proportional to node’s BC) (b) Execution times (W2C-Fast-BC and Brandes BC) with 10 cores (c) Percentage error of W2C-Fast-BC with K-fraction = 0.2 (top-1000).

As a second static graph, we weight edges by free-flow travel time (FFTT), a quantity easily derivable for all edges of the Rhone-ROADS network by dividing the road length by the road segment speed limit. FFTT-weighted BC values computed via the W2C-Fast-BC algorithm are reported in

(a) Nodes’ BC values (circle size proportional to node’s BC) (b) Execution times (W2C-Fast-BC and Brandes BC) with 10 cores (c) Percentage error of W2C-Fast-BC with K-fraction = 0.2 (top-1000).
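FFTT weights follow directly from the road length and the speed limit. The following is a minimal Python sketch, assuming lengths in meters and speed limits in km/h (the units are our assumption); `free_flow_travel_time_s` and `fftt_weights` are hypothetical helper names.

```python
def free_flow_travel_time_s(length_m, speed_limit_kmh):
    """FFTT in seconds: road length divided by the speed limit.

    Assumes meters and km/h; 1 km/h = 1 / 3.6 m/s.
    """
    if speed_limit_kmh <= 0:
        raise ValueError("speed limit must be positive")
    return length_m / (speed_limit_kmh / 3.6)


def fftt_weights(edges):
    """edges: dict edge_id -> (length_m, speed_limit_kmh)."""
    return {e: free_flow_travel_time_s(l, v) for e, (l, v) in edges.items()}
```

For example, a 1,000 m segment with a 90 km/h limit gets an FFTT of 40 s, less than the 60 s of a 500 m segment limited to 30 km/h, so FFTT-weighted shortest paths may favor the longer road, unlike length-weighted ones.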

The considerations on the performance of road-length-weighted BC also apply to the FFTT-weighted BC, with a slightly higher speedup, approximately equal to 5.2, and a percentage error on the BC values of the top-1000 nodes bounded by 0.6% in absolute value. The number of SSSP explorations in this case corresponds to 12,727,

To obtain a dynamic, weighted network, larger than the one observed from taxi trips (

(a) KNR-interpolated graph at 08:00 (b) Top-1000 nodes’ BC values at 08:00 (c) Execution time of W2C-Fast-BC vs Brandes-BC at 08:00 (d) KNR-interpolated top-1000 BC percentage error at 08:00.
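Assuming that KNR stands for K-nearest-neighbours regression, a non-parametric regression technique, the interpolation of missing edge speeds can be sketched as follows: each unobserved edge receives the mean speed of the k observed edges closest to it in feature space. The features (e.g. edge midpoint coordinates and speed limit), the value of k and the uniform weighting are hypothetical choices, not the configuration behind the reported results; `knr_predict` and `interpolate_speeds` are illustrative names.

```python
import math


def knr_predict(train, query, k=3):
    """K-nearest-neighbours regression with uniform weights.

    train: list of (feature_vector, value) pairs from observed edges.
    query: feature vector of an unobserved edge.
    Returns the mean value of the k training points closest to query.
    """
    if not train:
        raise ValueError("no training observations")
    k = min(k, len(train))
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / k


def interpolate_speeds(observed, unobserved_features, k=3):
    """observed: dict edge -> (features, speed) for one time slot;
    unobserved_features: dict edge -> features.
    Returns a speed for every edge, keeping observed values as they are."""
    train = list(observed.values())
    speeds = {e: s for e, (_, s) in observed.items()}
    for edge, feats in unobserved_features.items():
        speeds[edge] = knr_predict(train, feats, k)
    return speeds
```

Running the interpolation once per hourly slot yields a fully weighted graph for each time step, on which BC can then be computed.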

The top-1000 values of BC associated with this graph are reported in

The W2C-Fast-BC solution proposed in this paper is subject to different sources of randomness, noise and measurement inaccuracies. In this section, we discuss the most important aspects that could impact the validity of BC-based estimation, and the way such threats to validity have been addressed in this paper or will be investigated as future work.

First of all, our solution is impacted by the randomness of the Louvain method and that of the K-means clustering, which are used, respectively:

Concerning data availability, in our empirical evaluation we use consecutive GPS position observations from taxis’ on-board GPS sensors and the derived instantaneous speed measurements, which have been post-processed and associated with specific road segments of the underlying network via a map-matching solution. It is important to recall that we derive hourly typical travel-time graph weights by exploiting road length information and multiple speed observations related to a multitude of vehicles traversing each road segment, collected over different days. Nonetheless, samples can be unavailable or very limited due to low traffic flow, thus hampering the quality of the aggregate travel-time information and, therefore, the possibility of retrieving realistic hourly snapshots of the road network traffic dynamics, especially during non-peak hours and nighttime periods. Therefore, it is worth studying the accuracy of our W2C-Fast-BC solution over time and evaluating its robustness with respect to a varying number of available observations, as presented in

The shaded portion of the graph corresponds to measured values of standard deviation at each time step.

Regarding noise, sensors are naturally subject to faults and inaccuracies that might generate imprecise, biased or anomalous measurements. In our context, position and speed measurements are acquired via civil-use GPS navigation systems, which are notoriously subject to several sources of error, largely studied in many papers from the related literature [_{t}, we modify the travel-time weight _{l}, associated to the generic network link _{t}, by drawing a random sample from a 0-centered Gaussian distribution _{l}_{t} and that such error depends on the free flow travel time of the road segment, _{l}, by capping possibly negative weights to the minimum observed travel time from the edge weights of _{t}. By repeating this operation for all the links of _{t}, we finally obtain a noisy instance
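The noise-injection procedure can be sketched in Python as follows. We assume that the standard deviation of the zero-mean Gaussian noise on link l equals Ψ/100 times its free-flow travel time; this is consistent with the description above, but the exact scaling is an assumption. Noisy weights are capped from below at the minimum weight of the original snapshot, and distinct seeds yield independent noisy instances, such as the 10 used in the evaluation. The function name `noisy_instance` is hypothetical.

```python
import random


def noisy_instance(weights, fftt, psi, seed=None):
    """Noisy copy of a travel-time-weighted snapshot.

    weights: dict edge -> travel time (s) in the snapshot.
    fftt: dict edge -> free-flow travel time (s) of the same edge.
    psi: noise level; std of edge l assumed to be psi / 100 * fftt[l].
    Weights falling below the snapshot's minimum weight are capped to it.
    """
    rng = random.Random(seed)  # seeded generator for reproducible instances
    floor = min(weights.values())
    noisy = {}
    for edge, w in weights.items():
        sigma = psi / 100.0 * fftt[edge]
        noisy[edge] = max(floor, w + rng.gauss(0.0, sigma))
    return noisy
```

For instance, `[noisy_instance(w, f, 10, seed=s) for s in range(10)]` produces ten independent instances at noise level Ψ = 10.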

In _{08:00}, to compute the noisy instances considered in our evaluation. The following statistical indicators are considered to evaluate the accuracy of BC estimation in the different noise scenarios:

Mean and standard deviation are reported for each accuracy indicator as obtained by aggregating over the 10 instances of the noisy graph

Gaussian noise Ψ | Top-1000 mean abs. perc. err. [%]: mean | std | # of retained Top-1000 nodes: mean | std | Normalized inversion count [%]: mean | std |
---|---|---|---|---|---|---|
2.5 | 0.14 | 0.01 | 997.80 | 1.69 | 0.28 | 0.03 |
5 | 0.13 | 0.02 | 998.10 | 1.73 | 0.28 | 0.03 |
10 | 0.14 | 0.02 | 998.20 | 1.14 | 0.31 | 0.04 |
20 | 0.14 | 0.02 | 998.50 | 0.85 | 0.30 | 0.03 |
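The three accuracy indicators reported in these tables can be computed as in the following Python sketch. The exact definitions are our assumptions: the mean absolute percentage error is taken over the top-k nodes of the reference BC, the retained nodes are those appearing in both top-k lists, and the normalized inversion count is the percentage of pairs of reference top-k nodes whose relative order is reversed in the estimate. `top_k` and `accuracy_indicators` are illustrative names.

```python
from itertools import combinations


def top_k(bc, k):
    """Nodes with the k highest BC values (ties broken by node id)."""
    return sorted(bc, key=lambda n: (-bc[n], n))[:k]


def accuracy_indicators(reference, estimate, k=1000):
    """Return (mean abs. perc. error [%], retained nodes, inversions [%])."""
    ref_top = top_k(reference, k)
    est_top = set(top_k(estimate, k))
    errors = [abs(estimate[n] - reference[n]) / reference[n] * 100.0
              for n in ref_top if reference[n] > 0]
    mape = sum(errors) / len(errors) if errors else 0.0
    retained = sum(1 for n in ref_top if n in est_top)
    # A pair is inverted when the two rankings order it in opposite ways.
    inversions = sum(
        1 for a, b in combinations(ref_top, 2)
        if (reference[a] - reference[b]) * (estimate[a] - estimate[b]) < 0)
    pairs = len(ref_top) * (len(ref_top) - 1) / 2
    normalized = 100.0 * inversions / pairs if pairs else 0.0
    return mape, retained, normalized
```

For k = 1000 the pairwise check covers about 500,000 pairs, which is cheap; mean and standard deviation over the noisy instances can then be obtained with the `statistics` module.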

To further investigate this aspect, we apply both W2C-Fast-BC and Brandes again on the ten random instances of _{08:00}. To that end, all the indicators in _{08:00} as reference values. We underline that, in this context, the absolute percentage error does not represent a real error, but rather the absolute percentage difference with respect to the baseline BC values computed on the graph without noise. Similarly, the errors related to BC ranking only indicate that the most BC-critical nodes are different in the presence of noise (_{08:00}). Statistics for W2C-Fast-BC and Brandes are reported in Tables

Gaussian noise Ψ | Top-1000 mean abs. perc. err. [%]: mean | std | # of retained Top-1000 nodes: mean | std | Normalized inversion count [%]: mean | std |
---|---|---|---|---|---|---|
2.5 | 1.19 | 0.59 | 982.80 | 7.94 | 1.88 | 1.03 |
5 | 6.08 | 2.68 | 933.50 | 26.88 | 8.66 | 3.82 |
10 | 18.37 | 6.72 | 782.80 | 81.63 | 19.03 | 6.14 |
20 | 27.60 | 6.77 | 655.50 | 58.17 | 24.31 | 4.39 |

Gaussian noise Ψ | Top-1000 mean abs. perc. err. [%]: mean | std | # of retained Top-1000 nodes: mean | std | Normalized inversion count [%]: mean | std |
---|---|---|---|---|---|---|
2.5 | 1.15 | 0.60 | 983.00 | 8.14 | 1.80 | 1.02 |
5 | 6.07 | 2.68 | 933.60 | 27.29 | 8.58 | 3.83 |
10 | 18.35 | 6.71 | 782.60 | 81.91 | 18.96 | 6.14 |
20 | 27.57 | 6.79 | 655.50 | 58.71 | 24.23 | 4.39 |

The reported sensitivity analyses make it possible to conclude that our W2C-Fast-BC solution with KNR-based interpolation is extremely accurate, as well as robust to both the limited availability of observations and the presence of noise in the observations, when used to compute the BC of the most critical nodes of the network (

In this paper, we have shown, through an in-depth analysis performed on a large real network, that betweenness centrality is a useful indicator of both structural bottlenecks (in static, unweighted and weighted graphs) and traffic conditions (in dynamic, weighted graphs). At the same time, we have pointed out that, in a dynamic context, the estimation of traffic flows requires fast computation of BC, a requirement that we satisfy with an algorithm able to compute good approximations of BC values in short times.

For the study, we have used two datasets: one related to the whole road network of Lyon, whose weights have been computed using free-flow travel times and road lengths, and another one containing GPS traces of taxi trips, with a partial coverage of the whole road network, for estimating dynamic weights from average travel times.

The results of BC computation over the network representing the Lyon metropolitan area (in both static and dynamic scenarios) confirm that our algorithm is very fast and at the same time able to keep the error below a desired threshold, thus laying the basis for its exploitation as the core component of an ahead monitoring system.

With the integration of the new version of the algorithm, aimed at computing the exact value of BC, we will evaluate the possible reduction of approximation errors obtained by removing the sources of error at the first clustering level, against the resulting performance degradation.

Our W2C-Fast-BC algorithm is planned to be plugged into a dynamic and adaptive distributed control system aimed at performing resilience enhancement by keeping, over space and time, the values of BC close to the ones observed in free-flow conditions so as to achieve a more uniform distribution of traffic flows and, consequently, guarantee more efficient network states at equilibrium.

In the future, we intend to extend our analysis by exploiting a dataset with a larger coverage for dynamic weights since, due to the limited coverage of GPS data, we have been forced to apply an interpolation technique based on non-parametric regression to realistically scale the dynamic analysis to a larger network. Moreover, we plan to evaluate and possibly integrate BC with other information related to prior or statistical knowledge of traffic distribution, which could contribute to the definition of a more effective traffic predictor based on betweenness centrality.