Statistical and Clustering-Based Assessment of Variable Speed Limits Effects on Motorway Performance from Real-World Observations
1. Introduction
The contribution of this study is a comprehensive analysis of the operational impact of VSL, based on a large dataset collected in the field on the Padua–Mestre motorway, to assess the effectiveness of VSL in reducing traffic congestion. The impact is analyzed for different traffic conditions, including episodes of recurrent traffic congestion, using two complementary approaches: consolidated statistical methods and clustering.
2. State of the Art
The research activity on Variable Speed Limit (VSL) systems primarily falls into two main categories: implementation and testing of control algorithms by simulation models and assessment of the performances of the control systems operating in the real world.
In the following years, many countries implemented VSL control on highways. The objectives pursued by these implementations concerned the general harmonization of the traffic state as well as improvements in road safety and emissions.
This paper aims to add quantitative evaluation research to the framework of VSL impact assessment in the real world. The effectiveness of the system is analyzed not only with consolidated statistical analysis techniques but also with two specific clustering methods. To the authors' knowledge, clustering techniques have not previously been employed to assess the impact of a VSL system.
3. Methodology
This study introduces an analysis framework that integrates data analysis, statistical testing methods, and clustering techniques to evaluate the effect of the VSL system on traffic flow performance on individual lanes.
Table 1.
State of the art.
Location | Motorway Length (km) | Data | Study | Assessment Measure | Evaluation Results | Ref |
---|---|---|---|---|---|---|
Barcelona | 14.5 | 2 D 1 | FE 2 | Delay, emission, and safety cost function | Decrease in free flow, longer travel time, significant changes in speed distribution, general failure of VSL | [3] |
Munich | 18 | 1 D | FE | FD 3 | Identification of several bottlenecks in the network and smoother flow during traffic congestion | [17] |
Munich | 16.3 | 25 DW 4, 6 DWO 4 | FE | Flow Profiles | A slight decrease in capacity, congestion improvements, and notable homogeneity | [29] |
Michigan | NF 5 | 46000 V 6 | FE | Quantile regression | Significant improvement in driver compliance | [31] |
Europe | NF | 27 D | FE | FD | 15 to 20% increase in critical density and no clear results about capacity | [1] |
Seattle | 11 | 1 D | FE | Full Bayesian analysis | Reduction of total crashes | [32] |
UK | NF | 1 YW 7, 2 YWO 7 | FE | Travel Time, Flow | Reduction in travel time and collisions | [27] |
Stockholm | NF | 1 DWO MW 8 | FE | F-Test, FD | Not any significant impact on traffic conditions | [23] |
Rotterdam | 4.2 | 1 Y 1 | FE | Traffic performance, emission, and traffic safety. | With a 3% reduction in travel time and 20% in the number of lost vehicle hours, air quality and noise levels slightly deteriorated. | [20] |
Virginia | NF | 23 M 1 | FE | Linear regression, Z-test | Improvement in driver compliance and average speed reduction | [30] |
Texas | NF | 11 D | FE | Standard deviation | Improvement in flow consistency and reduction of crash severity | [22] |
Missouri | 61 | 5 D | FE | Kolmogorov–Smirnov test, FD | Change in flow–occupancy relation and decrease in average daily congestion in most locations | [21] |
Virginia | 20 | 9 M | FE | Chi-square test, Z-test | There was no increase in drivers’ compliance and a 30% reduction in travel time | [18] |
Netherlands | 2.2 | NF | FE | FD | Improvement of traffic distribution | [28] |
Barcelona | 13 | 3 D | FE | FD | Stabilizing occupancy and preventing capacity drop | [33] |
Rotterdam | 3.3 | 2 D | FE | FD | A flow reduction of 15% | [24] |
Europe | NF | 14 D | FE | Mean speed | Reduction of average speed and rear-end crashes | [26] |
Seattle | 16 | 5 D | FE | t-test, TT 10, Speed | Improvement of traffic performance | [25] |
Netherlands | 15 | 149 D | FE | Shock wave speed and demand | VSL efficiency and strategy failure’s motivations | [19] |
Not real | 12 | NF | S 9 | METANET | Reduction in total travel time | [8] |
Orlando | 32 | NF | S | PARAMICS | Safety improvement in medium-to-high-speed regimes and no benefit in low-speed situations. | [13] |
Calgary | 8 | NF | S | PARAMICS | Safety, delay, and traffic conditions improvements | [11] |
Netherlands | 14 | NF | S | METANET | Reduction in capacity and driver’s compliance | [10] |
Edmonton | 9 | NF | S | VISSIM | Improved safety and mobility | [9] |
Netherlands | 12 | 5Y | S | VISSIM | Improvement in safety | [7] |
Not real | 2.5 | NF | S | PARAMICS | Crash potential reduction and increase in travel time | [12] |
Naples | 6.3 | NF | S | VISSIM | Improvement in mobility of network and fuel consumption | [6] |
Toronto | 8 | 6M | S | PARAMICS | Up to 50% improvement in safety and increase in travel time | [34] |
Not real | 21.5 | NF | S | METANET | Capacity reduction | [37] |
3.1. Clustering of Speed and Flow Profiles
The K-means algorithm is an iterative algorithm that partitions the data into K clusters, where K is a user-defined parameter. This method aims to partition data based on their similarities. In this case, the algorithm groups the traffic speed profiles based on the Euclidean distance. The algorithm starts by randomly selecting K data points (here, speed or flow profiles) to be the initial centroids. Then, each data point is assigned to the nearest centroid based on the Euclidean distance between the data point and the centroid.
After each assignment step, every centroid is recomputed as the mean of the data points assigned to it. This process of assigning data points to centroids and updating the centroids is repeated until the centroids no longer change or a maximum number of iterations is reached. After convergence, each cluster contains data points that are more similar to each other than to the points of other clusters.
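The procedure above can be illustrated with a minimal sketch based on scikit-learn. The synthetic profiles, the hourly resolution, and the choice of K = 2 are assumptions made only for illustration; they are not the data or the settings of the study.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_intervals = 24  # hypothetical resolution: one mean speed value per hour

# Synthetic daily speed profiles (km/h): free-flow days and days with an
# evening slowdown. These values are illustrative, not observed data.
free_flow = 90 + rng.normal(0, 2, size=(20, n_intervals))
congested = 90 + rng.normal(0, 2, size=(20, n_intervals))
congested[:, 17:19] -= 40  # speed drop between 17:00 and 19:00

profiles = np.vstack([free_flow, congested])  # shape: (days, intervals)

# K is user-defined; each day is assigned to the nearest centroid
# according to the Euclidean distance between whole profiles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(profiles)

# Each centroid is the mean profile of the days assigned to its cluster.
for k, centroid in enumerate(kmeans.cluster_centers_):
    n_days = int(np.sum(labels == k))
    print(f"cluster {k}: {n_days} days, lowest mean speed {centroid.min():.1f} km/h")
```

Each row of `profiles` is one day, so the clustering compares entire daily profiles rather than individual measurements.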
The DBSCAN algorithm starts by selecting an unvisited point and finding all the points within a specified radius of that point, known as epsilon, measured using the Euclidean distance. If the number of points within the radius is greater than or equal to the specified minimum number of points, a cluster is formed; otherwise, the point is considered noise or an outlier. If epsilon is set too high, many data points that should be part of different clusters may be lumped together into one big cluster. On the other hand, if epsilon is set too low, some data points may be classified as noise even though they should be part of a cluster.
Once a cluster is formed, the algorithm recursively expands it by finding all the points within the epsilon radius of the points in the cluster. This process continues until all the points in the cluster have been identified. The algorithm then moves to the next unvisited point and repeats the process until all points have been visited. DBSCAN can be sensitive to the choice of parameters, especially when the density of the data varies widely across the dataset. In such cases, parameter tuning may be necessary to achieve good results. Choosing a good value for epsilon often involves some trial and error or optimization, and in many cases it depends on the specific dataset and the desired clustering results. Additionally, the algorithm can be computationally expensive for large datasets because, in the worst case, it requires calculating distances between all pairs of data points.
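A minimal sketch of density-based clustering with scikit-learn is given below. The synthetic profiles and the values `eps=12.0` and `min_samples=5` are assumptions chosen for this toy example, not the parameters used in the study.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
n_intervals = 24  # hypothetical resolution: one mean speed value per hour

# 30 similar days plus 3 isolated, unusual days (illustrative values, km/h).
typical = 90 + rng.normal(0, 1, size=(30, n_intervals))
unusual = np.vstack([
    np.full(n_intervals, 30.0),
    np.full(n_intervals, 50.0),
    np.full(n_intervals, 140.0),
])
profiles = np.vstack([typical, unusual])

# eps is the neighbourhood radius (Euclidean distance between profiles);
# min_samples is the minimum number of points needed to form a cluster.
labels = DBSCAN(eps=12.0, min_samples=5).fit_predict(profiles)

# scikit-learn marks noise points (outliers) with the label -1.
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels.tolist()) - {-1})
print(f"{n_clusters} cluster(s), {n_noise} day(s) labelled as noise")
```

Unlike K-means, the number of clusters is not specified in advance; it follows from the chosen radius and minimum number of points.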
3.2. Statistical Tests for Traffic Performance Variables
Statistical tests are used to determine whether there is evidence to support a hypothesis, compare groups or samples, and assess the significance of differences or relationships between variables of interest. In this study, two types of statistical tests are performed:
- Student’s t-test is a statistical test used to compare the means of two independent samples or paired observations. The test is based on the t-distribution and is used when the sample size is small and the population standard deviation is unknown. The test statistic, denoted by t, is the ratio of the difference between the means to the standard error of the difference. The null hypothesis of the test is that the means of the two samples are equal. If the null hypothesis is rejected, there is sufficient evidence to suggest that the means of the two samples are different.
- The Kolmogorov–Smirnov test is a non-parametric statistical method, implying no assumptions about the underlying distribution of the data, that can be used to compare the distributions of two independent samples. This test is often used when the normality assumption is not satisfied or when the shape of the distribution is unknown. The Kolmogorov–Smirnov test works by comparing the cumulative distribution functions (CDFs) of the two samples, X1 and X2. The test statistic, denoted by D, is the maximum absolute difference between the two CDFs. If the underlying distribution of X1 is shifted towards greater values than the distribution of X2, then CDF(X1) is less than CDF(X2). The p-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming that the null hypothesis is true. If the p-value is less than the significance level (set to 0.05 in this study), the null hypothesis is rejected, indicating that there is sufficient evidence to suggest that the samples come from two different distributions and that the distribution of X1 has higher values than the distribution of X2. It is important to note that the Kolmogorov–Smirnov test is sensitive to differences in both location and shape between the two distributions. Therefore, it can detect differences that may be missed by tests that focus only on location (e.g., the t-test).
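Both tests are available in SciPy, and a minimal sketch on synthetic data is shown below. The sample sizes, means, and standard deviations are assumptions for illustration only. In `scipy.stats.ks_2samp`, `alternative="less"` corresponds to the alternative hypothesis that the CDF of the first sample lies below the CDF of the second, that is, that the first sample tends to take higher values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic spot speeds (km/h); illustrative values, not observed data.
speeds_no_vsl = rng.normal(100, 10, size=500)  # X1: system not active
speeds_vsl = rng.normal(96, 10, size=500)      # X2: system active

alpha = 0.05  # significance level stated in the text

# Student's t-test for two independent samples (H0: equal means).
t_res = stats.ttest_ind(speeds_no_vsl, speeds_vsl)

# One-sided two-sample Kolmogorov-Smirnov test.
# H1: CDF(X1) < CDF(X2), i.e. X1 is shifted towards greater values.
ks_res = stats.ks_2samp(speeds_no_vsl, speeds_vsl, alternative="less")

print(f"t = {t_res.statistic:.2f}, reject H0: {t_res.pvalue < alpha}")
print(f"D = {ks_res.statistic:.3f}, reject H0: {ks_res.pvalue < alpha}")
```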
4. Experimental Analysis
4.1. Case Study and Available Data
The available data used in this study included:
The sub-dataset used for this research was limited to situations with no lane closures on the network. The analyzed traffic situations include both periods when the VSL was active and displaying a message for the three lanes and when it was not active at all.
The data collected by the count locations refer to the period from January 2021 to November 2021. The data are differentiated by light and heavy vehicles for each lane and include the traffic counts, the average arithmetic speed, the average harmonic speed, and the headway measures collected every minute. For each lane, the occupancy measure is provided together with the accuracy index for the collected data.
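The difference between the two speed averages can be shown with a short sketch. The three spot speeds are made-up values. The arithmetic mean of spot speeds measured at a detector is the time-mean speed, while the harmonic mean is commonly used as an estimate of the space-mean speed and is never larger than the arithmetic mean.

```python
from statistics import fmean, harmonic_mean

# Hypothetical spot speeds (km/h) of three vehicles passing a detector
# within one aggregation interval.
spot_speeds_kmh = [60.0, 90.0, 120.0]

arithmetic_speed = fmean(spot_speeds_kmh)        # time-mean speed
harmonic_speed = harmonic_mean(spot_speeds_kmh)  # space-mean speed estimate

print(f"arithmetic mean: {arithmetic_speed:.2f} km/h")
print(f"harmonic mean:   {harmonic_speed:.2f} km/h")
```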
4.2. VSL Control Algorithm
The implemented VSL system is an advisory system without enforcement measures applied to guarantee drivers’ compliance. The control algorithm used by the Variable Speed Limit system selects a speed limit of 40, 50, 60, 70, 80, or 90 km/h for each lane based on the traffic conditions detected upstream.
Additional steps of the control algorithm concern restraining the difference between the speed limit and the speed observed near the VMS (observed speed control), obtaining a smooth speed limit profile for sequential VMS locations (speed limits transversal control), obtaining the same values for at least two lanes (longitudinal control), avoiding higher speed limits for slower lanes (hierarchical control), and ensuring consistency with ordinary speed limits (ordinary control).
The control algorithm operates iteratively, aiming to ensure that the rules are satisfied and that the differences stay within given thresholds.
When the system is not active (No VSL), the ordinary speed limit of the urban expressway under study applies: 90 km/h for the left and center lanes and 60 km/h for the right lane, which is narrower than the standard.
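Two of the rules described above, hierarchical control and ordinary control, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the rounding down to the nearest allowed value, and the processing order from the fastest to the slowest lane are hypothetical and do not reproduce the operator's actual algorithm, whose thresholds are not given in the text.

```python
ALLOWED_LIMITS = (40, 50, 60, 70, 80, 90)  # km/h, values listed in the text
ORDINARY_LIMITS = {"left": 90, "center": 90, "right": 60}  # km/h, from the text
LANES_FAST_TO_SLOW = ("left", "center", "right")


def snap_down(value_kmh):
    """Return the largest allowed limit not exceeding value_kmh.

    Falls back to the lowest allowed limit (assumption of this sketch).
    """
    candidates = [v for v in ALLOWED_LIMITS if v <= value_kmh]
    return max(candidates) if candidates else min(ALLOWED_LIMITS)


def apply_rules(proposed_kmh):
    """Apply ordinary and hierarchical control to proposed per-lane limits."""
    limits = {}
    cap = max(ALLOWED_LIMITS)
    for lane in LANES_FAST_TO_SLOW:
        # Ordinary control: never exceed the ordinary limit of the lane.
        limit = snap_down(min(proposed_kmh[lane], ORDINARY_LIMITS[lane]))
        # Hierarchical control: a slower lane never gets a higher limit
        # than the faster lane next to it.
        limit = min(limit, cap)
        limits[lane] = limit
        cap = limit
    return limits


print(apply_rules({"left": 75, "center": 85, "right": 95}))
# prints {'left': 70, 'center': 70, 'right': 60}
```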
4.3. Motorway Performance with and without Activation of the VSL System
4.3.1. Activated Messages and Speed Limits
Generally, the analyzed cases concern the speed distributions when the VMS displayed speed limits from 40 km/h to 90 km/h. In the case of the application of a 90 km/h speed limit by the VSL system for the left lane and the center lane and the application of a speed limit of 60 km/h for the right lane, formally, there was no change imposed on the drivers with respect to the conditions when the system was not operating. However, in the case of the active VSL system, the message signs displayed the speed limits, while in the case of the inactive system, no message sign appeared on the display.
4.3.2. Analysis of the Observed Speeds with Different Speed Limits Displayed
Observed speeds are always highest on the left lane, followed by the center lane, and lowest on the right lane. In the No VSL case and in the cases of the 60 km/h and 90 km/h limits, the speed distributions on the left lane and on the right lane are bi-modal, indicating situations of congestion reported for some of the count locations.
In the case of the 80 km/h speed limit, the distribution of the observed speeds does not preserve such a bi-modal shape. This is because this speed limit was applied rarely and only while congestion was forming, while the limits of 90 km/h and 60 km/h were applied by the system more frequently and covered different observations of the traffic state. The speed limits of 40 km/h, 50 km/h, and 70 km/h were applied in only a few cases; for this reason, their frequency distributions do not show a smooth shape.
As some locations report congested traffic states and the differences between observed values are moderate, the Kolmogorov–Smirnov two-sample test was conducted individually for each count location to verify whether the observed speed distributions are significantly different and are shifted towards smaller values in the case of the active VSL system.
It is also worth noting that in the case of the ordinary speed limit shown by the system, the observed speeds are significantly lower than in the case when the VSL system was not showing any message. Showing the ordinary limit by the VSL system led to a moderate reduction of the speed between 2 km/h and 4 km/h for different lanes.
As for compliance, the results show that the application of the VSL system generally increased the percentage of observed speeds below the posted speed limit. Although the levels of compliance were moderate, the activated VSL system increased the percentage of drivers that follow the speed limit with respect to the days when the system was not activated, even when the system displayed the ordinary speed limit of 90 km/h for the fast lanes or 60 km/h for the right lane. For the left lane, the compliance ranged between 12% and 18% for different speed limits, except for lower compliance values of 4% and 3% that were observed, respectively, for the speed limits of 80 km/h and 40 km/h. For the center lane, the compliance ranged between 16% and 53% for different speed limits, with an exception for the speed limit of 60 km/h, when the compliance was 6%, while for the right lane, the compliance ranged between 13% and 30%. Lower compliance levels can be attributed to more fluid traffic states, where the drivers did not perceive the necessity to adjust the speed.
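The compliance measure discussed above can be computed as the share of observed speeds that do not exceed the posted limit. The sketch below assumes that a speed equal to the limit counts as compliant; the function name and the sample values are hypothetical.

```python
def compliance_rate(speeds_kmh, limit_kmh):
    """Percentage of observed speeds at or below the posted limit."""
    if not speeds_kmh:
        raise ValueError("at least one speed observation is required")
    compliant = sum(1 for v in speeds_kmh if v <= limit_kmh)
    return 100.0 * compliant / len(speeds_kmh)


# Made-up observations (km/h) on a lane with a displayed limit of 90 km/h.
observed = [85, 95, 100, 110, 88]
print(f"compliance: {compliance_rate(observed, 90):.0f}%")  # prints compliance: 40%
```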
4.4. Motorway Performance during Recurring Congestion
One of the main objectives of the VSL system is the prevention of congestion, so further analysis focuses on recurrent congestion conditions. A visual analysis of the observed speed values on the motorway indicated that recurrent congestion took place in the same area (from km 11.0 to km 13.4 in the direction from Mestre to Padua) on 40 days: 19 days on which the VSL system was operating and was triggered by high-density values observed upstream, and 21 days on which the VSL system was not operating. Statistical t-tests performed on each count location section showed that the traffic flows could be considered similar at a 5% significance level for the two observation periods.
On both days, the congestion started forming at around 17:10 and dissolved at around 18:00. The top figure refers to 5 October 2021, a day when the variable speed limit system was not active, while the bottom figure refers to 3 March 2021, when the system was active and changed the displayed speed limit based on the detected conditions.
The performance indicators are computed from the density and the speed observed on segment j during time interval i, the length of segment j, and the time interval of data aggregation. N is the number of count sections, and T is the duration of the observation interval.
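As an illustration of how such quantities are typically aggregated, the sketch below uses the standard definitions of total travel distance and total travel time over all segments and time intervals, with the average speed obtained as their ratio. These are textbook definitions adopted here as an assumption; they are not necessarily the exact formulas of the study, and the function name and input values are hypothetical.

```python
import numpy as np


def aggregate_indicators(density, speed, seg_len_km, dt_h):
    """Total travel distance, total travel time, and their ratio.

    density:    veh/km, shape (T intervals, N segments)
    speed:      km/h,   shape (T intervals, N segments)
    seg_len_km: km,     shape (N segments,)
    dt_h:       aggregation interval in hours
    """
    density = np.asarray(density, dtype=float)
    speed = np.asarray(speed, dtype=float)
    seg_len_km = np.asarray(seg_len_km, dtype=float)

    vehicles = density * seg_len_km  # vehicles present on each segment
    total_time = vehicles.sum() * dt_h                 # veh*h
    total_distance = (vehicles * speed).sum() * dt_h   # veh*km
    return total_distance, total_time, total_distance / total_time


# Made-up example: 2 intervals of 1 min, 2 segments of 1.0 km and 0.5 km.
ttd, ttt, v_avg = aggregate_indicators(
    density=[[20, 40], [30, 60]],
    speed=[[100, 50], [90, 30]],
    seg_len_km=[1.0, 0.5],
    dt_h=1 / 60,
)
print(f"distance {ttd:.1f} veh*km, time {ttt:.3f} veh*h, speed {v_avg:.1f} km/h")
```

With these definitions, the average speed is weighted by the number of vehicles present on each segment, so congested segments carry more weight than lightly loaded ones.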
Statistically significant results were observed for all three lanes for the average speed and the mean travel time. On the days with the VSL system activated, higher speeds were observed for the three lanes, and the differences were 3.1 km/h (3%), 3.5 km/h (4%), and 3.4 km/h (5%), respectively, for the left, central, and right lanes.
With reference to a 1-km long segment, the MTT obviously equals the inverse of the average speed; however, it is useful to report it with reference to the average distance traveled by the drivers on the motorway stretch under analysis. In fact, with VSL activated, the average travel time on the carriageway was 6.02 min, with a 4% reduction with respect to the average travel time without VSL (6.24 min). The mean travel time was reduced on the days when the VSL was active; the reduction was 0.19 min (−4%), 0.27 min (−4%), and 0.31 min (−4%), respectively, for the left, central, and right lanes.
The results for the mean distance traveled are similar for the two observation periods. For the left lane, the mean distance traveled was slightly reduced by 0.04 km (−1%) on the days when the system was operating, and the difference is statistically significant, while for the remaining two lanes, the mean traveled distance was slightly higher when the system was operating, and the differences are not statistically significant. Since the flows on the ramps are not directly observed, the slight difference observed in the mean traveled distance can be attributed to variation in demand patterns. The mean flows were slightly lower on the days when the system was operating (reduction between 1% and 2% for the different lanes) and showed significant similarity in the two observation periods.
4.5. Clustering Analysis Results
The K-Means and DBSCAN clustering algorithms were applied to both the flow and the speed profiles, for each count location separately, to derive the traffic behavior patterns. The results are analyzed by considering the distributions of the number of days assigned to different clusters, with a further focus on the recurrent congestion and the state of the VSL system. The scikit-learn Python library was used to implement the clustering analysis.
The K-Means algorithm performed clustering by dividing the data into distinct clusters based on the weekdays, Sundays, and Saturdays for both the school period and summer, while the DBSCAN provided three distinct yet similar clusters.
The DBSCAN algorithm identified two clusters for the count location situated at km 11.0 and four clusters for the count location situated at km 11.7; cluster −1 contains the days classified as outliers. For both count locations, most of the days when the system was active are assigned to a single cluster, while the days when the system was not operating are distributed among different clusters.
Similar results are provided by the K-Means algorithm, which categorized the data into six clusters, as specified earlier. Also, in this case, almost all the days when the system was operating are attributed to a distinct cluster, namely cluster 1 for km 11.0 and cluster 3 for km 11.7, while the days when the VSL was not active are assigned to different clusters.
These results are in line with the statistical analysis and suggest that the observations of the recurring congestion on the days when the VSL system was active were similar in terms of the observed speed. On the days when the VSL system was not operating, greater variances were observed among the detected speeds, and therefore these days are distributed among different clusters.
It is also worth noting that labeling most of the days with the activated VSL system in a unique cluster was observed mostly for the congested sections, while upstream and downstream of the congested area, where the traffic conditions remained stable, both clustering algorithms assigned different days to clusters regardless of the state of the VSL system.
As for the observed values of the flows, the results showed that days were distributed to different clusters regardless of the state of the activation of the VSL system, which is also in line with the statistical analysis that did not reveal statistically significant differences for the total distance traveled, affected by the flow values.
4.6. Speed Variance Analysis
In general, the variances were higher when the VSL system was not operating. For the left lane and the center lane, it is evident that the variances increase closer to the point where the recurrent congestion is formed, located between km 11.7 and km 12.0. The highest values of the standard deviation are observed for the center lane, followed by the left lane, while the right lane reports the lowest values.
The average standard deviation was 8 km/h on the days when the VSL system was not operating and reduced to 7 km/h on the days when the system was active. The reduction of the variance is statistically significant for the left lane at the count locations situated at km 11.7 and km 12.0; for the center lane, statistical significance is obtained at km 11.0, km 11.7, and km 12.0, while for the right lane, statistical significance is obtained at the count location situated at km 12.7, downstream from the formed congestion.
A statistical analysis of the variance of the observed flows was also conducted. This analysis did not provide evidence of statistical differences between the days with and without the system in operation.
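A comparison of speed variability between the two cases can be sketched as below. The text does not state which test was used for the variances, so Levene's test from SciPy is adopted here purely as an assumption; the synthetic samples reuse the 8 km/h and 7 km/h standard deviations only as illustrative parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic speeds (km/h) with the same mean and different spread.
speeds_no_vsl = rng.normal(80, 8, size=2000)  # system not active
speeds_vsl = rng.normal(80, 7, size=2000)     # system active

# Levene's test (H0: equal variances); robust to non-normal data.
res = stats.levene(speeds_no_vsl, speeds_vsl)

print(f"std without VSL: {speeds_no_vsl.std(ddof=1):.2f} km/h")
print(f"std with VSL:    {speeds_vsl.std(ddof=1):.2f} km/h")
print(f"W = {res.statistic:.2f}, reject H0 at 5%: {res.pvalue < 0.05}")
```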
5. Discussion
The Kolmogorov–Smirnov test was applied to assess the differences in the observed speed distributions with and without VSL: on average, around 74% of count locations reported statistically significant results. For the left lane, the 85th percentile of the observed speeds was 119 km/h in the No VSL case and decreased on average by 15 km/h (12%) in the VSL case as the speed limit decreased from 90 km/h to 60 km/h. For the center lane, the 85th percentile passed from 103 km/h in the No VSL case to an average of 93 km/h in the VSL case, with a decrease of 10 km/h (10%). Despite generally low values of compliance with the speed limit, ranging between 3% and 53% for different cases, an increase in compliance between 2% and 8% for different lanes was observed when the VSL was active, even when the VSL displayed the ordinary speed limits, i.e., 90 km/h for the left and center lanes and 60 km/h for the right lane.
Similar cases of recurrent congestion were analyzed statistically with and without the VSL activation, using data relative to 40 days with recurrent congestion occurring between km 11.0 and km 14.0 of the westbound carriageway. The distributions of the average speed (V avg), the mean travel time per vehicle (MTT), and the mean travel distance per vehicle (MTD) were compared by t-test to verify differences in the observations.
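The 85th percentile of a speed sample can be obtained with NumPy, as in the sketch below. The function name and the sample are hypothetical, and the default linear interpolation of `numpy.percentile` is an assumption, since the text does not state how the percentile was computed.

```python
import numpy as np


def v85(speeds_kmh):
    """85th percentile of the observed speeds (linear interpolation)."""
    return float(np.percentile(speeds_kmh, 85))


# Made-up sample: one observation for each integer speed from 80 to 130 km/h.
sample = np.arange(80, 131)
print(f"V85 = {v85(sample):.1f} km/h")  # prints V85 = 122.5 km/h
```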
Also, a clustering analysis confirmed the results obtained by the statistical tests. For the locations situated within the recurring congestion area, most days when the VSL system was active were in the same cluster, while the days when the system was not active were distributed to different clusters. These results were obtained independently by the two tested clustering approaches, namely K-Means and DBSCAN. For the locations situated out of the congested area, the state of the VSL system was not relevant, as different days were distributed to different clusters regardless of the state of the VSL system.
The average profiles of the intravehicular variances relative to different count locations were examined, and the results showed significant differences for the count sections situated within the congested area. At the affected count locations, on the days when the VSL system was activated, the standard deviation decreased by around 1 km/h (12%) for different lanes, a difference that is statistically significant at the 95% confidence level.
6. Conclusions
The paper presented a field study of an Italian expressway with the implementation of an advisory Variable Speed Limit system, carried out on the data collected by the count locations during almost one year of observations covering cases when the system was active (VSL) and cases when the VSL system was not operating (No VSL).
Both the statistical tests and the clustering approach indicated different traffic patterns in the two cases, and improvements in terms of the average speed and the mean travel time were observed in the case of the active VSL system. The distributions of the observed speeds were significantly different in the two cases, and on most data, different speed limits were applied. The analyzed system was advisory, without enforcement measures to guarantee compliance; thus, the observed differences were moderate, and the compliance level was low.
The obtained results suggest that the VSL system can potentially improve the performance of traffic flow; however, the introduction of enforcement measures aimed at increasing compliance levels should be considered to strengthen traffic flow management.
The provided analysis was concerned mostly with the distributions of the observed speed, while the observed flows were analyzed only on a macroscopic level in terms of average values observed by different count locations. However, to make further considerations about the evolution of the traffic propagation process with the application of the VSL system, future research needs to consider aspects of traffic deterioration on the fundamental diagram that relates speed and density.