Quantifying Uncertainty in Runoff Simulation According to Multiple Evaluation Metrics and Varying Calibration Data Length

1. Introduction

Understanding and predicting runoff behavior is essential in hydrological studies, with far-reaching implications for water resource management, flood control, irrigation, and environmental protection. Hydrological models that use climatic data to simulate and predict runoff are invaluable tools for this purpose. Uncertainties in hydrological models arise from factors such as model parameters, model structure, calibration (observation), and input data [1,2,3,4]. The reliability and accuracy of hydrological models are often affected by uncertainties, making it imperative to effectively discern and quantify them [5,6,7].
Uncertainty is an intrinsic element in all aspects of scientific research that imposes constraints on our ability to interpret and predict outcomes, and ultimately, it influences decision-making processes. Among these types of uncertainties, aleatory uncertainty, inherent in natural processes, is a major source that has received increasing attention in recent years [8,9]. Aleatory uncertainty, also known as inherent or statistical uncertainty, originates from the intrinsic randomness and variability in natural processes and systems [10]. This type of uncertainty exists irrespective of the amount of data or knowledge available, and contrasts with epistemic uncertainty, which is associated with a lack of knowledge or information [11]. Aleatory uncertainty in runoff analysis may stem from natural variations in climatic variables, such as precipitation, temperature, and evapotranspiration.
Numerous studies have been conducted to minimize uncertainty in hydrological models and enhance the prediction accuracy [12,13]. Another crucial factor that influences the accuracy and reliability of hydrological models is the calibration data period and length [6,14]. Calibration is a vital step in hydrological analysis in which the parameters of a model are adjusted such that the model’s outputs match the observed data to a certain extent. The choice of the calibration period, which refers to the temporal span of the data and the data length used for calibration, can significantly affect the model’s performance and simulation results [15,16]. In the context of aleatory uncertainty, the period and length of the calibration data may affect how well the model captures the inherent variability in the system.
While numerous studies have sought to minimize uncertainties in hydrological models, the optimal length of the calibration data period remains a contentious issue. Studies are divided on whether a shorter or longer calibration period leads to more accurate model predictions. Some assert that data quality matters more than quantity, and that even a short calibration period of one to a few years can produce reliable results [15,17,18,19]; under this view, high-quality data from a short period can be just as effective as data from a more extended range. Conversely, most studies advocate for longer calibration periods to enhance a model’s reliability and predictive capabilities [16,20,21,22,23], suggesting that extended periods are particularly beneficial for models with more complex structures or larger numbers of parameters, as well as for studies dealing with climate change impacts. Given this divergence of viewpoints, a “one-size-fits-all” notion of an optimal calibration data length appears inadequate, and the question warrants a nuanced exploration. Quantifying the uncertainties arising from varying lengths of calibration data in runoff simulations is crucial for resolving this gap and enhancing our understanding of hydrological modeling.
Previous studies have been conducted to understand the uncertainty caused by the selection of the calibration period [24,25]. However, most of these studies do not consider the variety of evaluation metrics used for quantification. Evaluation metrics such as Nash–Sutcliffe efficiency (NSE), Kling–Gupta efficiency (KGE), Percent Bias (Pbias), and normalized root-mean-squared error (NRMSE), which have been popularly used in hydrological modeling, may lead to contradictory conclusions; therefore, performance evaluation based on a single statistical evaluation metric may be questionable [26,27].

Therefore, this study quantified the uncertainty inherent in the use of different evaluation metrics and different calibration data lengths. The evaluation metrics used in this study are NSE, KGE, Pbias, NRMSE, and the Jensen–Shannon divergence, and the calibration data lengths are 1, 2, …, 11 years. Using all calibrated parameter sets, the uncertainty was quantified from the simulated and observed runoff data for the validation period. The Soil and Water Assessment Tool (SWAT; QSWAT3 v1.6.5) was used, and the parameters were calibrated using R-SWAT.
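As context for the analysis that follows, the four statistical metrics can be expressed compactly. Below is a minimal Python/NumPy sketch; the function names, the percent-scale NRMSE normalization by the observed mean, and the sign convention for Pbias are assumptions of this illustration, since conventions vary across the literature:

```python
import numpy as np

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 is a perfect fit, 0 means the model
    performs no better than the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling–Gupta efficiency (Gupta et al., 2009 form): 1 is a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def pbias(obs, sim):
    """Percent bias: 0 means no systematic error. Positive values indicate
    underestimation under this sign convention (conventions differ)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def nrmse(obs, sim):
    """Root-mean-squared error normalized by the observed mean, in percent."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sqrt(np.mean((obs - sim) ** 2)) / obs.mean()
```

Because NSE and KGE approach 1 and Pbias and NRMSE approach 0 for a good fit, each metric must be interpreted on its own scale when comparing uncertainty.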

3. Results

3.1. Model Performance over the Calibration Period

SWAT model parameters reflecting the groundwater, hydrologic response unit, watershed, and soil characteristics were selected. The selected parameters, including their boundary conditions, are listed in Table S1. Parameter optimization was performed using R-SWAT with NSE as the objective function. Box plots of the model’s performance before and after calibration are shown in Figure 4 and Table 1; each box plot was generated from the calibrated periods corresponding to a given calibration data length. As a result of the parameter optimization, the NSE exceeded 0.65 in all cases, indicating that the SWAT model results can be considered reasonable. The performance of the optimized parameter sets differed between periods, even when the calibration data length was the same. In this context, a higher evaluation metric value and a smaller interquartile range (IQR) both indicate lower uncertainty.

The IQR, which represents the model uncertainty in the calibration period, was largest for P1 before and after calibration, at 0.122 and 0.100, respectively, and smallest for P11, at 0.030 and 0.024. The shorter the calibration data length, the higher the model performance during calibration, but the higher the inherent uncertainty across the independent periods. The average IQR of the NSE over all calibration data lengths decreased from 0.085 to 0.069 after calibration, indicating a decrease in uncertainty.
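For reference, the IQR used here as an uncertainty measure is simply the spread between the 25th and 75th percentiles of a metric’s values across the independent calibration periods of the same length. A minimal sketch (the sample NSE values are hypothetical, for illustration only):

```python
import numpy as np

def iqr(values):
    """Interquartile range (Q3 - Q1): a smaller spread across calibration
    periods of the same length indicates lower model uncertainty."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Hypothetical NSE values from parameter sets calibrated on five
# different one-year windows:
nse_by_period = [0.68, 0.74, 0.66, 0.78, 0.71]
```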

3.2. Evaluation of Performance over Validation Period

The overall hydrograph for the validation period is shown in Figure S2. With the exception of an extreme runoff event in August 2020 caused by heavy rains on the Korean Peninsula, the simulated daily runoff followed the trend of the observed runoff, with similar seasonal variability. The evaluation metric results for the validation period under different calibration data lengths are shown in Figure 5 and Table S2. For each metric, uncertainty was summarized as the average value for each calibration data length. As a general rule, the closer the NSE and KGE are to 1, and the closer the Pbias and NRMSE are to 0, the lower the uncertainty; for the IQR, larger values indicate higher uncertainty. The average NSE in the validation period was 0.71, higher than in the calibration period. The average was highest (0.74) in P5 and lowest (0.72) in P1, meaning that the uncertainty based on the average value was highest in P1 and lowest in P5. The IQR was highest (0.05) in P1 and lowest (0.01) in P7 and P11, meaning that NSE-based uncertainty was highest in P1. The average KGE was highest (0.61) in P5 and lowest (0.59) in P11, indicating the highest uncertainty in P11. The IQR of KGE was highest (0.09) in P1 and lowest (0.03) in P7, P9, and P11.

The average absolute Pbias was lowest in P6 (2.84) and highest in P8 (4.36), indicating the highest uncertainty in P8. The IQR of Pbias was highest in P1 (4.03) and lowest in P2 and P5 (2.00). The NRMSE was lowest for P5 (51.08), and its IQR was highest for P1 (3.73); based on the average, the NRMSE uncertainty was largest for P1 (52.52).

Overall, the uncertainty based on the average value of each evaluation metric varied depending on the metric. NSE and NRMSE showed higher uncertainties for shorter calibration data lengths, whereas KGE and Pbias showed higher uncertainties for longer ones. Consistently, P5 to P7 showed lower uncertainties. The uncertainty based on the IQR was highest for P1 across all evaluation metrics, while P7 showed relatively low uncertainty.

3.3. Uncertainty Index

The uncertainty index was calculated from the evaluation metrics, as presented in Figure 6 and Table 2. It was obtained by min–max normalization of each evaluation metric, with values closer to 1 indicating greater uncertainty in the runoff simulation. The calculated uncertainty index was highest for P10 (0.454), while P3 had a low uncertainty, with an average of 0.311. The spread across the individual periods was largest for P1, with an IQR of 0.181, and smallest for P11, with an IQR of 0.11. The median uncertainty index likewise indicated that P3 had the lowest uncertainty, at 0.305, and P10 the highest, at 0.458. The maximum uncertainty index for P5–P7 was lower than the average maximum of 0.552 (P5: 0.425, P6: 0.519, P7: 0.448). In particular, the maximum values for P5 and P7 show markedly lower uncertainty than the other calibration data lengths.
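The paper’s exact aggregation rule is not reproduced in this excerpt, so the following is only a plausible sketch of a min–max-based index: each metric is first oriented so that larger means more uncertain, normalized to [0, 1] across the candidate parameter sets, and then averaged. The orientation of each metric and the equal-weight average are assumptions of this illustration:

```python
import numpy as np

def minmax(x):
    """Scale values to [0, 1]; assumes the inputs are not all identical."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())

def uncertainty_index(nse, kge, pbias, nrmse):
    """Combine metrics into a [0, 1] index where 1 = most uncertain.
    Each metric is oriented so that larger means worse before normalizing;
    equal-weight averaging is an assumption of this sketch."""
    parts = np.vstack([
        minmax(-np.asarray(nse, float)),           # higher NSE -> lower uncertainty
        minmax(-np.asarray(kge, float)),           # higher KGE -> lower uncertainty
        minmax(np.abs(np.asarray(pbias, float))),  # |Pbias| near 0 -> lower uncertainty
        minmax(np.asarray(nrmse, float)),          # lower NRMSE -> lower uncertainty
    ])
    return parts.mean(axis=0)
```

A parameter set that is best on every metric thus receives an index of 0, and one that is worst on every metric receives 1.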

3.4. Evaluation of the Extreme Runoff

To analyze the uncertainty of the extreme runoff simulations, the 98th–100th percentiles of the observed and simulated runoff were compared. Overall, the simulated runoff in the 98th–100th percentiles was similar to the observed extreme runoff, as shown in Figure S3. The Jensen–Shannon divergence (JS-D) was then used to quantify the similarity between the observed and simulated extreme runoff distributions, as shown in Figure 7 and Table S3. The higher the JS-D value, the greater the difference between the two distributions and, hence, the higher the uncertainty in the simulated extreme runoff. The JS-D for P1 indicated the greatest uncertainty, with an average of 0.0143, whereas that of P7 was the smallest, with an average of 0.0131. The median JS-D values also showed that P6 and P7 had the lowest uncertainty, at 0.0132, whereas P2 had the highest, at 0.0145.
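The JS-D itself can be sketched as follows; this is a minimal illustration assuming discrete distributions (e.g. normalized histograms of the extreme flows), and both the natural-log base, which bounds the value by ln 2 ≈ 0.693, and the small epsilon guarding against empty bins are assumptions of this sketch:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen–Shannon divergence between two discrete distributions
    (e.g. histograms of observed vs. simulated extreme runoff).
    0 means identical distributions; larger values mean a greater
    mismatch, i.e. higher uncertainty in the simulated extremes."""
    p = np.asarray(p, float) + eps   # guard against empty bins
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()  # normalize to probability distributions
    m = 0.5 * (p + q)                # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence (natural log)
        return np.sum(a * np.log(a / b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike the KL divergence, the JS-D is symmetric and always finite, which makes it convenient for comparing observed and simulated distributions.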

3.5. Overall Uncertainty Assessment

The overall uncertainty rankings are shown in Figure 8. In this matrix chart, a higher ranking (with 1 being the highest) indicates greater uncertainty. Overall, the uncertainty was lower for runoff simulations with calibration data lengths of five to seven years, with P7 being the lowest. In particular, P7 had the lowest uncertainty among the calibration data lengths for the extreme runoff simulations. For periods longer than P7, the uncertainty was higher, although the IQR-based uncertainty remained relatively low. This suggests that there is an optimal calibration data length and that an excessive amount of data may heighten the uncertainty in runoff simulations.

4. Discussion

Hydrological models are widely used in water management because runoff can be simulated from climate data where observed runoff data are lacking. In addition, hydrological models that incorporate terrain data reflecting the characteristics of a region provide more reliable simulations. To use the simulated data from a hydrological model, however, the model parameters must be optimized against observed runoff data, and this calibration process carries some degree of uncertainty. This uncertainty can have significant implications for water resource management and planning [39,40,41].
As shown in Figure 5, the selection of evaluation metrics has a significant impact on the degree of uncertainty. Accordingly, the uncertainty in the runoff simulations was quantified using different evaluation metrics in this study. The results for NSE and NRMSE were consistent with existing studies, in that short calibration data lengths do not reflect the variety of hydrological cycle conditions, resulting in large uncertainties in the runoff simulations. In contrast, KGE and Pbias showed large uncertainties when parameters calibrated with relatively long data lengths were used. This is consistent with previous studies that characterize NSE as giving more weight to correlation than to bias, whereas bias and variance are more easily distinguished with KGE [42]. This confirms that relying on only one or two metrics introduces some degree of uncertainty, and thus a variety of metrics should be considered during parameter calibration in hydrologic modeling [43].
Contrary to the common assumption that the longer the calibration data length, the lower the uncertainty of runoff simulations when optimizing hydrologic model parameters, the results of this study show that the optimal calibration data length is five to seven years, with the lowest uncertainty obtained with a data length of seven years (P7). This result is similar to previous research in the same region of Korea [25], which found that a calibration data length of six to eight years provides reliable runoff simulation, but differs from studies that generally recommend using a calibration data length of eight years or more [20,23].
To test the common assumption that longer calibration data lengths are advantageous for optimizing the parameters of hydrological models, this study conducted a comparative analysis using a parameter set calibrated with the maximum observed length of 20 years (P20) in the Yeongsan River basin, covering all periods of observed runoff data. For contrast, the highest- and lowest-uncertainty cases of the shorter calibration data lengths, P1 (one year) and P2 (two years), were also considered. The results of the comparison with P7, which has the lowest overall uncertainty in this study, are shown in Figure 9. P20 has higher uncertainty than P7 in all cases: NSE, KGE, Pbias, NRMSE, the uncertainty index, and JS-D. This supports the idea that a data length longer than the optimal calibration data length found in this study actually increases uncertainty. For P1 and P2, the lower-uncertainty cases showed lower uncertainty than P7, but the higher-uncertainty cases all exceeded P7. This means that shorter calibration data lengths may offer performance and uncertainty benefits for runoff simulation in some cases, but the individual uncertainties for each period are high, resulting in large deviations. This confirms the need to use different combinations of observed runoff data in runoff simulations with hydrological models when the observed record is short or lacks continuity.

This study has immediate applications in policy decisions and water management practices. Water resource managers and policymakers can employ the insights gained to optimize calibration lengths and evaluation metrics, thus enhancing model reliability. The methodological approach of using multiple evaluation metrics to quantify uncertainty represents a significant advancement in hydrological studies. Moreover, the results are particularly useful for locations where data may be scarce or incomplete, as demonstrated by the model’s performance despite missing data for 2011. However, there are some limitations that should be highlighted. While this study provides specific insights into the Yeongsan River basin, the methodology and findings offer broader implications for hydrological modeling. The approach to determining the optimal calibration data length, based on a balance between reducing uncertainty and the practicality of data availability, can be applied to other river basins. However, it is important to note that the specific optimal calibration period may vary depending on several factors, including the hydrological characteristics of the basin, the variability of meteorological conditions, and the quality and quantity of available data. Therefore, while our study findings suggest a general approach to identifying an optimal calibration data length, this study recommends that hydrologists and modelers conduct similar analyses tailored to their specific river basins. Such analyses should consider local hydrological dynamics and data characteristics to determine the most appropriate calibration period for their models.

5. Conclusions

The uncertainty of runoff simulations using climate data and a hydrological model in the Yeongsan River basin, located in southwestern South Korea, was quantified. The uncertainty of the runoff simulations was considered in terms of the calibration data length and the selection of evaluation metrics. To quantify the uncertainty of the runoff simulations, including the extreme runoff (the 98th–100th percentile flows), the differences in performance according to the calibration data length and the validation period were quantified. Extreme runoff was evaluated using JS-D to measure the difference in distribution from the observed data, and NSE, KGE, Pbias, and NRMSE were applied as the evaluation metrics. Based on the results, the following conclusions can be drawn:

  • Different evaluation metrics showed different levels of uncertainty, meaning that multiple metrics should be considered rather than relying on any single one;

  • Runoff simulations using a hydrological model had the least uncertainty owing to the calibration data length when using a parameter set of seven years, and the uncertainty increased for calibration data lengths longer than seven years;

  • Parameter sets with the same calibration length showed period-dependent uncertainty, which led to uncertainty differences within the same length;

  • For extreme runoff simulations, employing long calibration data lengths (of more than seven years) achieved lower uncertainty than shorter calibration data lengths.

In the end, this study contributes to the broader knowledge base by providing a framework for assessing the optimal calibration data length in hydrological modeling. This framework can be adapted and applied to other river basins, with the understanding that local conditions and data availability will influence the specific outcomes.
