Linked Open Government Data: Still a Viable Option for Sharing and Integrating Public Data?

1. Introduction

Open Government Data (OGD), traditionally sourced from governments, encompass public records in areas like transportation, infrastructure, education, health, and the environment [1]. These data are intended to be reused and redistributed, either for free or at marginal cost, to create new business opportunities, increase government transparency and accountability, foster citizen engagement, promote economic growth, reduce costs and inefficiencies, and support innovation [2,3,4,5]. Since the launch in 2009 of the Data.gov portal in the United States, one of the pioneering and most influential OGD platforms globally [4,6], several countries have followed suit and launched their own OGD portals, resulting in an enormous amount and variety of datasets available worldwide [7] (p. 151). Thanks to these endeavors, a study conducted by Deloitte and commissioned by the European Commission foresees a surge in the overall direct economic value of public sector information, from EUR 52 billion in 2018 to EUR 215 billion in 2028 [8]. OGD has applications in diverse sectors, including urban planning, environmental protection, security, mobility, and agriculture [9]. Numerous success stories highlight promising aspects, encouraging the dissemination of best practices that can be applied in similar contexts [10,11,12].
Despite the consistent growth in the publication of datasets and the increasing acknowledgment of the potential advantages of open and accessible government data among both governments and citizens, there are evident indications that the utilization of OGD remains constrained [13,14] and presents challenges [15], with stakeholders expressing concern that only a small proportion of their datasets are actively used [16]. This concern is supported by our recent research [17,18], which confirms that the vast majority of datasets published by government portals, whether international, national, or regional, are largely ignored by users who focus their attention on a limited number of government datasets.
Following Edwards’ metaphor of “data friction”, which expresses “the costs in terms of time, energy and attention required simply to collect, control, store, move, receive and access data” [19] (p. 84), we can speak of “OGD friction” to refer to the obstacles, difficulties, or resistance encountered in the access, use, and sharing of OGD. These frictions can “have both physical and social aspects” [19]. Regarding the first, OGD quality is essential for effective utilization by users, and it is considered a potential technical factor that can hinder or facilitate its exploitation [20,21,22]. Metadata quality is particularly vital because it enables users to search and access dataset descriptions, ultimately enhancing the speed and ease of OGD use [18,23,24]. Regarding the second, low levels of open data literacy [25] are a significant social friction, indicating citizens’ limited familiarity with the design and use of technology [26], difficulties in accepting technology [27], and a misalignment between user needs and the capabilities of available datasets [26,28].
To address these challenges and promote the reuse of OGD, several authors have suggested applying Linked Data (LD) principles to OGD [2,29,30,31,32]. Linked Open Government datasets offer the potential to “replace isolated data silos with larger, interconnected datasets built on top of the Web architecture” [33,34], fostering synergy among disparate data sources [35]. This approach allows users to gain a better understanding of the data’s context by exploring related datasets [36,37]. As proposed by Tim Berners-Lee (https://www.w3.org/DesignIssues/LinkedData.html (accessed on 1 February 2024)) in 2006, the principles of LD offer a modular and scalable solution to counteract fragmentation in government data. This method not only enhances citizen awareness about government functions but also improves administrative efficiency. Furthermore, Linked Open Government Data (LOGD) enable access to and retrieval of data from authoritative and reliable sources. This approach can significantly reduce the inconsistencies found in legacy datasets, leading to more accurate and reliable results when querying open databases [38].
Under these auspices, in the early 2010s, a few government agencies began adopting LOGD principles and technologies [6,39], a movement that was bolstered by academic and research studies [40], prototypes [41,42,43], and methodological proposals [32,44,45].
However, despite the initially promising expectations, our recent research activities on OGD and LD, involving the analysis of various government portals in different capacities [18,25,46], have occasionally identified what seems to be a significant decline in interest in LOGD. This impression is corroborated by Penteado et al. [33], who recently observed, “Although the open government data movement is still producing large amounts of data worldwide, linked data still represents a tiny portion”. Attard et al. also noted, “Yet, the use of Linked Data in open government initiatives is still quite low” [34]. In 2020, Hogan also observed that despite the LD community’s success in “convincing various stakeholders to publish data with the implicit promise that applications would justify the cost, these applications did not emerge, leading to the removal of many datasets and related services” [47]. In the adjacent field of Open Science, considering its role in supporting and improving “the discoverability, accessibility, shareability, reusability, reproducibility, and monitoring of data-driven research results on a global scale”, the decision by OpenAIRE to abandon Linked Open Data (LOD) technologies is indicative of a similar trend. As reported on their website (https://www.openaire.eu/pausing-our-lod-services (accessed on 30 January 2024)), “starting from Monday, 8 May 2023, the SPARQL endpoint will shut down and that no new OpenAIRE LOD Dump versions will be released”.

To illuminate the current state of LD practices adoption in OGD, our objective is to determine whether such adoption is unequivocally declining or if emerging practices and trends, even after nearly 15 years since its inception, still substantiate its foundational assumptions. Within the evolving landscape of the OGD movement, this paper seeks to address the titular question: Are LOGD still a viable option for sharing and integrating public data? To tackle this inquiry, we will examine three crucial snapshots: first, an analysis of LOGD adoption practices by national portals and the scientific community; second, a systematic literature survey probing potential factors impeding LOGD development; and third, an overview of current adoption practices—ranging from foundational to advanced—that align with the expectations set forth in the early 2010s. This review aims to redirect the focus on the use of LOGD, assessing how they align with the current requirements of the OGD movement and potentially reshaping certain initial assumptions that may have proven challenging to implement.

3. Methodology

Given the introductory assumptions, to answer our question “Linked Open Government Data: Still a viable option for sharing and integrating public data?” we defined the following research questions (RQs):

RQ1: What is the current state of Linked Open Government Data?

This RQ sets a broad foundation for our research by seeking to understand the overall current landscape of LOGD. The answer to this question is provided by the combination of two sub-questions.

RQ1.1: What is the prevalence of RDF and SPARQL endpoint distributions in national OGD portals?—This sub-question narrows down the focus to specific technical aspects (RDF formats and SPARQL endpoints), which are crucial for understanding the implementation and accessibility of LOGD.

RQ1.2: What are the relations between OGD and Linked Open Data found in the literature?—This sub-question aims to explore the relationship between OGD and LOD, based on the researchers’ publication practices.

RQ2: What factors are holding back the spread of LOGD?

This question addresses potential challenges and barriers in the adoption of LOGD, which is essential for understanding what might be impeding its broader utilization.

RQ3: What valuable examples of LOGD adoption can be found today?

This RQ seeks to identify successful case studies or instances where LOGD has been effectively implemented, which can provide insights into best practices and the benefits of LOGD.

After the subsequent Section 4, the paper endeavors to address the three research questions in the following sections, as outlined in Figure 1.

4. Related Works

In addressing our first research question, “What is the current state of Linked Open Government Data (LOGD)?”, it is important to recognize the concerns raised by several scholars about the suboptimal adoption of LD and LOD dissemination practices in the government sector, often centered around the limited use of RDF distributions [57,58,59,60]. Despite these concerns, there is a notable scarcity of studies that provide a quantitative analysis of these issues in specific contexts or scenarios [33,61,62,63]. Among these, Ibanez et al. [62], in their analysis of a diverse sample of regional and local European institutional websites, observed that RDF is not widely used. They found that “RDF is still a minority format”, not only when compared to CSV formats (constituting less than 5% of the data formats used) but also significantly “less common than non-tabular structured formats like XML and JSON”, being approximately five to six times less prevalent. In 2018, Pawełoszek et al. [61] focused on exploring the potential for creating business models based on open data and also briefly examined four national portals: the US, UK, Germany, and Poland. This study yielded percentages of RDF distributions that were similar to the findings in our own research. Penteado et al. [33], drawing on various studies of national portals from countries like the US, Brazil, Italy, Colombia, and Greece, also highlight the extremely limited proliferation of RDF datasets. In their analysis of LOD challenges and opportunities for data-driven government initiatives in Russia, Aitkin et al. [63], as of May 2020, noted that the Russian Open Data Portal showed minimal growth in open data, with only 23,775 datasets. Over 60% of these data were in CSV format, indicating compliance with the third level of the five-star open data model, but only five had an RDF distribution, highlighting a significant gap in LOD adoption in Russia.

Compared to previous research, our study provides two significant contributions. Firstly, it presents a comprehensive and current overview of national portals worldwide, including a detailed quantitative analysis of RDF distribution publication and the availability of national SPARQL endpoints. Secondly, in addressing RQ1.2, our study concurrently sheds light on the scientific community’s interest concerning the intersection of OGD and LOD.

Concerning the second research question “What factors are holding back the spread of LOGD?”, the literature review we carried out shows a clear lack of systematic investigations into the obstacles that impede the development of LD within OGD. This situation contrasts sharply with the extensive body of research that examines, from various angles, the challenges and barriers faced in the widespread implementation of OGD. One of the earlier studies focusing on OGD issues is by Zuiderwijk et al. [64] in 2012. They specifically explored socio-technical barriers to the use of open data. Their approach combined a literature review, four workshops, and six interviews to categorize these barriers into ten distinct categories. Notably, they found that the impediments identified through empirical research differed from those documented in the existing literature, providing a unique and comprehensive perspective on the challenges faced in the realm of OGD. Attard et al. [65], in 2015, conducted a thorough analysis of the existing literature, focusing specifically on the processes of data publishing and consuming within OGD initiatives. They also aimed to identify key challenges and issues that prevent these initiatives from achieving their full potential. Based on their findings, they categorized the challenges into five distinct groups. We adopted their categorization as a framework to guide and structure our specific investigation of literature in the LOGD field. In the same year, Verma et al. [66] carried out a comprehensive survey across various government agencies in India. By employing statistical methods to analyze the responses to the questionnaires, they were able to identify five key factors that influence the implementation and effectiveness of government initiatives. These factors, as determined through principal component factor analysis, include governance, resource constraints, capacity building, technology, and lack of awareness. 
These elements were found to be significant in shaping the outcomes and effectiveness of government policies and initiatives within the surveyed agencies. Roa et al.’s 2018 study [67] presented a systematic analysis of OGD literature, identifying six dimensions of data friction. Their categorization of data frictions overlaps significantly with the findings of our study. However, they differentiate between data quality and technical aspects, whereas, in our research, following the framework established by Attard et al. [65], these two dimensions are merged into one.
Building on the foundations laid by the mentioned studies, our survey highlights the opportunity to systematically examine the challenges associated with LOGD. This area, which has not been fully explored in existing research, is where our survey aims to fill the gap. Through a thorough analysis of obstacles in LOGD, we aim to contribute to a more effective approach for the dissemination and implementation of LOGD practices. This is essential for overcoming the current stagnation we have highlighted in the field. Indeed, although many previous studies have addressed barriers in the LOGD space, they often carry this out superficially, identifying a problem only to immediately suggest a technological solution. This approach tends to overlook the complexity and interconnectedness of the challenges. However, a few of the studies we reviewed offer somewhat more elaborate insights into several LOGD barriers. Portisch et al. [68] delve into the intricate task of connecting organizational information from public datasets to established entities in knowledge graphs such as Wikidata and DBpedia. They undertook this by manually establishing links between datasets from the Open Data Portal Watch [24] and their respective publishing organizations in these knowledge graphs. Through this meticulous process, they uncovered a series of interconnected challenges. These ranged from the dynamic nature of organizations, which frequently change, to the complexities arising from the lack of a uniform base ontology that would facilitate standardized link creation. Furthermore, they grappled with the variable quality of metadata, which is crucial for establishing accurate links, and the complexities introduced by multilingual datasets. Another notable challenge was distinguishing between similar public sector organizations, a task requiring precise disambiguation. 
To navigate these obstacles, the authors not only proposed targeted solutions for each identified challenge but also advocated for a community-driven approach. They suggested a hand-search service that would enable the collaborative efforts of the data science community and dataset publishers in annotating and refining dataset-level links. This collaborative method highlights the importance of human input and collective effort in enhancing the accuracy and reliability of data linking in the public sector. In their 2020 study, Geci et al. [69] investigated the use of LOGD to improve budget transparency in Kosovo. Their research, which included desk reviews and interviews with government and NGO officials, revealed key challenges: poor metadata quality (e.g., temporality, formats, provenance), difficulties in data linking by government employees, and a lack of educational programs for effective data management. The main conclusion drawn from the interview responses indicates that the application of LOGD in Kosovo remains limited. Despite this, there is a general belief in the potential for implementing LOGD in managing Kosovo’s budgetary data. This highlights the need for focused efforts to increase public understanding and engagement with open data initiatives. Additionally, there is a crucial requirement for targeted training of staff.
As highlighted in Section 6.2, several barriers identified in the context of LOGD are also found in general discussions about OGD, such as those mentioned above, albeit with varying nuances and interpretations. However, our review of the literature on the nature of LOGD issues shows that the implementation of LOGD presents a unique set of complex challenges that significantly differ from general data practices [70]. Collectively, these challenges underscore the specialized nature and complexity of LOGD, hinting at why its deployment faces significant obstacles.

5. What Is the Current State of Linked Open Government Data?

To address research question RQ1, we implemented two complementary methods of investigation. Initially, we surveyed the portals of the countries best ranked by OGDI. The objective of this survey was to gauge the extent of the diffusion of LOD technology within OGD initiatives. Specifically, we examined how many of these portals publish their datasets in RDF format, including the proportion of RDF datasets relative to their total datasets. Additionally, we assessed the availability of SPARQL endpoints in these portals.

To enhance our understanding and provide a more nuanced view, we complemented this empirical analysis with a systematic literature review. This review focused on exploring the scientific community’s interest in the application of LD technology within the realm of OGD. This dual approach enabled us not only to assess the current state of LOD technology in OGD initiatives but also to comprehend the academic perspective and interest in this technology.

5.1. What Is the Prevalence of RDF and SPARQL Endpoint Distributions in National OGD Portals?

To verify the prevalence of LOD practices, we examined two key indicators, i.e., the number of datasets with RDF distribution(s) and the availability of SPARQL endpoints, in the national portals of a large number of countries, selected according to the OGDI adopted by the UN [7]. The decision to concentrate on national portals, as opposed to other governmental portals such as geo-portals or statistical portals, as well as regional or transnational platforms, stems from the pivotal role these national portals play. As highlighted by the United Nations report [7], users prioritize identifying the “official” national government site among the multitude of potentially available government sites, recognizing it as the gateway or starting point for national users. Bulazel et al. [49] emphasize that a government agency’s official web presence serves as the authoritative source of information about its activities, distinguishing itself from unofficial sources like Wikipedia, the news media, or social networks, while the UN report [7] notes that national portals tend to be more advanced than those operating at the local level. According to Cyganiak et al. [43], national portals function as central one-stop platforms, offering the interested public access to data published by government bodies. These portals play a crucial role in providing “visibility to the process of translating policy into reality”.
Considering the extensive coverage of the OGDI (Open Government Data Index), which assesses 193 countries, we utilized its ranking system that categorizes countries into four tiers. For our analysis of the adoption of LOD practices, we focused on the 78 countries rated as “Very High” and “High” by the OGDI [7] (see Table A1 in Appendix A). These countries’ portals were selected as they are expected to be the trendsetters, being the highest scored by the index. For the selected portals, we checked the presence of RDF distributions and the presence of SPARQL endpoints. The selection of these indicators is directly influenced by the third principle of LD, as proposed by Tim Berners-Lee in 2006, which states: “When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)” (https://www.w3.org/DesignIssues/LinkedData.html (accessed on 1 February 2024)). Berners-Lee outlined this principle as part of a broader framework to guide the effective use of the web for linking data, and later restated it, slightly revised, as the fourth star in the five-star scheme for LOD: “use open standards from W3C (RDF and SPARQL) to identify things so that people can point at your stuff”. Given this context, it is reasonable to infer that the presence or absence of these two parameters (RDF and SPARQL) in a national OGD portal is indicative of the degree to which the portal adheres to LOD principles. This adherence is crucial for ensuring that, when users access a URI (Uniform Resource Identifier), they are met with information that is both useful and standardized, thereby fulfilling a key aspect of LOD. The use of RDF distributions as a gauge to measure the prevalence of LOD practices is a concept endorsed by multiple researchers. These authors have highlighted, in recent though occasionally unsystematic studies, the rather limited adoption of these practices [33,57,58,59,61,62,63]. Kumar et al. [58] underscore that “RDFs are central to Linked Data and LOD”, encapsulating their crucial role. Meanwhile, Penteado et al. [33] emphasize the significance of RDF as a benchmark: “Although RDF is not the exclusive format for serializing linked data, its widespread recognition as the most popular format makes it a valuable proxy for gauging the implementation of LOGD”.
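As a minimal sketch of how the RDF-distribution indicator could be computed from portal metadata, the Python snippet below counts the datasets that offer at least one distribution in an RDF serialization. It is not the procedure used in the survey: the record layout (`distributions`, `format`), the sample records, and the set of format labels are all illustrative assumptions, and real portals label formats inconsistently, so portal-specific normalization would be needed.

```python
# Hypothetical sketch: count datasets exposing at least one RDF distribution.
# The format labels and the record layout are assumptions, not a portal API.

RDF_FORMATS = {
    "rdf", "rdf/xml", "ttl", "turtle", "n3", "nt", "n-triples", "json-ld", "jsonld",
}


def is_rdf(format_label: str) -> bool:
    """Return True if a distribution's format label names an RDF serialization."""
    return format_label.strip().lower() in RDF_FORMATS


def rdf_share(datasets: list[dict]) -> tuple[int, float]:
    """Return (datasets with at least one RDF distribution, share in percent)."""
    with_rdf = sum(
        1
        for d in datasets
        if any(is_rdf(dist.get("format", "")) for dist in d.get("distributions", []))
    )
    share = 100.0 * with_rdf / len(datasets) if datasets else 0.0
    return with_rdf, share


# Invented sample records, for illustration only.
sample = [
    {"title": "Bus stops", "distributions": [{"format": "CSV"}, {"format": "RDF/XML"}]},
    {"title": "School list", "distributions": [{"format": "CSV"}]},
    {"title": "Air quality", "distributions": [{"format": "JSON"}, {"format": "Turtle"}]},
    {"title": "Budget", "distributions": [{"format": "XLSX"}]},
]

count, share = rdf_share(sample)
print(f"{count} of {len(sample)} datasets have an RDF distribution ({share:.1f}%)")
```

Counting datasets rather than individual distributions keeps the measure comparable across portals that publish the same dataset in several RDF serializations.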
The systematic analysis of the 78 countries that ranked highest in the OGDI, conducted between June and September 2023, serves both as an update and a complement to the results of previously cited studies. This analysis confirms the continued very low adoption of RDF in the distribution of OGD datasets. Out of these 78 portals, only 26 feature at least one RDF distribution and only 21 of them have more than one. As illustrated in Figure 2, only 10 of these 26 portals have more than a marginal one percent of RDF distributions. Even among these, the percentage of RDF usage remains low, with only four portals exceeding a 4 percent usage rate. Among the portals analyzed, the Italian portal notably stands out with a modest yet comparatively higher adoption rate of RDF, at 6.5%.
Additionally, a closer examination in absolute terms further highlights the limited extent of RDF adoption in OGD. The 21 portals that feature multiple RDF distributions collectively account for just 19,255 RDF datasets out of an extensive total of 801,765 datasets, which equates to a mere 2.4 percent. This figure, however, is significantly skewed by the contributions of a few countries. Notably, the United States, Italy, and Spain have a disproportionate impact on this percentage, with their respective counts of 10,297 RDF datasets out of 250,717, 3854 out of 59,516, and 2530 out of 69,879. In contrast, countries like Australia, Germany, the UK, and France display a markedly lower number of RDF representations, despite their substantial volumes of published datasets. For instance, Australia and Germany, with 105,647 and 82,845 datasets, respectively, maintain only 331 and 265 RDF distributions. The situation is even more pronounced in the UK and France, where, despite having 51,502 and 44,486 datasets, respectively, each country has only 13 RDF distributions. This discrepancy highlights the uneven adoption of RDF across different national portals, and underscores the need for a broader and more balanced advancement in the use of RDF in OGD initiatives. Interestingly, the 78 countries are evenly split between the “Very High” and “High” OGDI categories, with each category comprising 39 countries. However, a closer examination of the 21 countries that maintain more than one RDF dataset, as shown in Figure 2 and in Table A1, reveals that 16 of them, accounting for over three-quarters, are rated as “Very High” by the OGDI. This trend suggests that a strong commitment to good OGD practices may be somewhat conducive to the adoption of LOD practices. However, considering the overall limited adoption of LOD, as evidenced by our analysis, this potential facilitation appears modest.
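The percentages discussed above can be reproduced with a few lines of arithmetic. The following Python sketch uses only the dataset counts quoted in the text; the variable and function names are illustrative, and the share held by the three largest contributors is a derived figure computed here to quantify the skew, not a value reported in the survey.

```python
# Worked arithmetic on the dataset counts quoted in the text.
# Each entry is (RDF datasets, total datasets) as reported for that portal.
counts = {
    "United States": (10_297, 250_717),
    "Italy": (3_854, 59_516),
    "Spain": (2_530, 69_879),
    "Australia": (331, 105_647),
    "Germany": (265, 82_845),
    "United Kingdom": (13, 51_502),
    "France": (13, 44_486),
}
ALL_RDF, ALL_DATASETS = 19_255, 801_765  # totals quoted for the 21 portals


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


for country, (rdf, total) in counts.items():
    print(f"{country:15s} {percent(rdf, total):5.2f}% of datasets in RDF")

overall = percent(ALL_RDF, ALL_DATASETS)
top3_rdf = sum(counts[c][0] for c in ("United States", "Italy", "Spain"))
print(f"Overall share across the 21 portals: {overall:.1f}%")
print(f"US, Italy, and Spain hold {percent(top3_rdf, ALL_RDF):.1f}% of the RDF datasets")
```

Run on these counts, the script makes the skew explicit: three portals supply the large majority of the RDF datasets, while several portals with tens of thousands of datasets contribute well under one percent each.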

The examination of the second key indicator, the presence of national SPARQL endpoints, reveals an even more challenging scenario in the realm of LOD practices. Among the 78 countries evaluated, a mere five—Italy, the Czech Republic, Spain, Germany, and Croatia—have established a SPARQL endpoint. Notably, Italy, Spain, and the Czech Republic are also among those countries that demonstrated a higher presence of RDF distributions.

This limited deployment of SPARQL endpoints, essential for querying RDF data, further underscores the global disparity in the adoption of advanced LOD technologies within government data initiatives. The scarcity of such endpoints reflects a significant hurdle in realizing the full potential of LOD, especially in facilitating efficient data interoperability and access across diverse government platforms. This result resonates with the literature, which on several occasions reports that most SPARQL endpoints were unavailable or almost permanently down [47,71]. Meanwhile, Mouzakitis et al. [70] note, “Unfortunately, in recent years, a significant number of data providers have ceased supporting and maintaining public SPARQL endpoints, thereby damaging the trust between consumers and open Linked Data providers”.
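To illustrate what a SPARQL endpoint makes possible, the sketch below builds, without sending it, an HTTP GET request that follows the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter. The endpoint URL is a placeholder rather than any portal's real address, and the query assumes the catalog is described with the DCAT vocabulary, which is an assumption about the portal's data model.

```python
# Sketch: build (but do not send) a SPARQL 1.1 Protocol query request.
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL

# Count the datasets in a catalog, assuming it is described with DCAT.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT (COUNT(DISTINCT ?dataset) AS ?n)
WHERE { ?dataset a dcat:Dataset . }
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request carrying the query in the `query` URL parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
print(req.full_url)

# Round-trip check: the query survives URL encoding unchanged.
decoded = parse_qs(urlsplit(req.full_url).query)["query"][0]
print(decoded == QUERY)
```

Actually sending the request (for example with `urllib.request.urlopen(req)`) requires a live endpoint, which, as the literature cited above reports, cannot be taken for granted.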

5.2. What Are the Relations between OGD and LOD Found in the Literature?

To address Research Question 1.2, we conducted a comprehensive exploration of Open Government Data-related topics by querying the Scopus and Web of Science digital libraries. Keywords such as “open government data” and “linked data” were employed in the search (see Table 1). The search, limited to English-language papers spanning the period from 2010 to June 2023, aimed to discern trends and select recent studies for analysis. Notably, the initial date retrieved by the digital library search engine aligns with Tim Berners-Lee’s promotion of the “Linked Open Data 5 Star” concept, which specifically targets “especially government data owners”. To determine the extent to which LOD research is integrated with OGD research, we used two search queries, namely Q1 and Q2 (see Table 1). The first included only “open government data” and the second combined “open government data” with terms related to LD. Both libraries were searched using title content, abstract content, and keywords.
In total, the first search returned 1290 papers, while the second returned 193 papers. These figures and Figure 3 highlight two facts. First, approximately 15% of the research studies on the OGD phenomenon (193 of 1290) also cover LOD technology. Second, although the number of articles on OGD grew during the first decade and then stabilized in the last three years (probably due to the COVID-19 pandemic, as suggested by Wirtz et al. [48]), the number of studies that simultaneously address LOD has, except for the initial four-year period (2010–2013), remained relatively constant throughout the considered timeframe.

Based on these figures and trends, one could reasonably conclude that research on OGD has gradually shifted its focus away from LOD. In other words, researchers may consider LOD less compelling for the advancement of OGD than it appeared to be in the early years of the past decade. On the other hand, a less unfavorable interpretation is that LOD has consolidated within the OGD research field: having reached a plateau in the number of studies after the initial four years, it now attracts researchers’ curiosity to a lesser extent.

8. Discussion and Conclusions

To address the paper’s title question, Are LOGD still a viable option for sharing and integrating public data? we explored the three distinct yet complementary research questions outlined in Figure 1. This approach provided insights into the challenges and opportunities of adopting LD practices within the OGD context.

RQ1 is divided into RQ1.1 and RQ1.2, which together provide an overall snapshot of the current landscape of LOGD. RQ1.1 reveals that the number of datasets served natively as RDF is low compared to the vast amount of government data produced. Only 2.4% of data is available in RDF across the 78 national portals analyzed and only five of them provide an active SPARQL endpoint. RQ1.2 suggests a deceleration of LOGD momentum. However, this finding admits two readings. On the one hand, it suggests a possible decrease in the importance scholars attribute to LOD for OGD progress compared to the early years of the last decade. On the other hand, a more favorable interpretation is that the consolidation of LOD within OGD research has reached a plateau, potentially leading researchers to explore other areas.

In the opposite direction, in RQ3, we have shown evidence that LD practices have a noteworthy impact, especially on how governmental data is aggregated, and made searchable and accessible by catalogs, as primarily witnessed by the aggregation served by the European Data Portal (more than 1,600,000 datasets from 36 nations and 183 catalogs, made available in a harmonized access point). Apart from the impact on data catalogs, LD practices have affected national guidelines and have improved data interoperability, consolidating practices for sharing controlled vocabulary (i.e., via publication of SKOS terminology and thesauri) and fostering the adoption of transversal data models such as in the case of the Italian National Catalog of Data Semantics and the European Core Vocabularies. Notably, the influence of guidelines varies across nations. In the instances of Italy and Spain, the impact is discernible through the availability of datasets adhering to LD principles, as illustrated in Figure 2. Conversely, in the case of the German Federal Government, the impact is characterized more as a visionary pursuit rather than an immediate, tangible outcome, as evidenced by the low number of RDF distributions in the German data portal. The LD approach enables the retention of the original semantics and carries forward the choices made in data modeling. This makes data consolidation and interpretation more reliable, as information can be handled more consciously and coherently.

However, the contrast emerging from RQ1 and RQ3 finds an explanation in the frictions still present and analyzed in RQ2, and in tensions that need to be balanced. The first tension is between voluntary efforts and well-funded, coordinated efforts. Data activism, as advocated by some LD enthusiasts, must be encouraged, but the generation of high-quality data cannot rely solely on activism. Some public administrations already allocate resources for producing and processing quality data; the key is to adopt technologies that efficiently leverage existing funding, unlocking the data’s full potential. The second tension revolves around the effort required for publishing versus consuming data. The potential of LD to break down data silos, integrate information from diverse sources, and establish value chains that enhance the access and reuse of public data is caught between two conflicting interests. On one hand, publishing data with explicit LD semantics has the potential to simplify user searches and the comprehension of government datasets, thereby encouraging their reuse without the need for creating more complex mashup applications. On the other hand, if the complexities and constraints (both cultural and economic) associated with data publication are deemed excessive or burdensome, administrations may either refrain from publishing or release data in a suboptimal manner (e.g., using flat formats, ad hoc terminology, or exhibiting inconsistency). Consequently, users stand to lose most, if not all, of the benefits offered by OGD. Simplicity in publishing should not compromise the provision of easily interpretable data. For instance, sidestepping the responsibility of offering precise semantics through shared vocabularies imposes additional challenges on consumers when it comes to data integration, who may resort to guesswork regarding the original data semantics and end up misusing the information. Striking the right balance between data publishers and consumers is crucial.
Linked data, in this sense, provides the tool for preserving the original semantics of data, fostering the adoption of standard vocabularies and explicit links between datasets from distinct providers. It makes it easier to split some of the effort required for ETL between publishers and consumers, but how that effort is divided among the different players needs to be balanced differently depending on the nature and origin of the data.

Although the analysis in response to RQ2 (Table 2) indicates that the majority of LOGD works primarily address technical and organizational issues, it is crucial not to underestimate the influence of social, legislative, and economic factors. The challenges identified are frequently addressed either through specific, ad hoc solutions or, as noted by some authors, through solutions deemed too general. These solutions primarily pertain to technical and methodological aspects, with occasional considerations for legislative or social factors. However, they have only partially succeeded in advancing LOGD practices. In our assessment, both research and government agencies have often lacked the holistic perspective necessary to unlock the potential of LD for OGD. Despite its challenges, adopting an integrated approach that comprehensively addresses the various dimensions of OGD is indispensable. Any integrated approach in the OGD domain, however, should keep in mind the tensions discussed and decide the appropriate trade-off for the target open data initiative. Our analysis of LOGD attrition and success stories, coupled with broader insights derived from OGD best practices, suggests the following recommendations as key components of such an approach.

Establish robust data governance to foster a culture of openness and to elaborate clear policies, championing organizational structures that streamline LOGD workflows and help navigate the bureaucratic and cultural challenges prevalent in government contexts.

Identify High Value Datasets. As recognized by the EU Commission, certain subsets of government data are more strategic than others. In this context, identifying priority data and establishing a suitable budget is paramount to ensure cohesive approaches and avoid financial inefficiencies. Given their relevance, it is reasonable to expect that the quality of these datasets would be particularly well cared for, facilitating their seamless integration into LOGD. Moreover, the capability to integrate High Value Datasets from various thematic fields through LD can be a powerful driver for maximizing their value.

Cultivate stakeholder engagement, also by anchoring the development of targeted guidelines in real-world use cases. Guidelines should be specifically crafted for LOGD. This includes the provision of tangible examples and a careful consideration of non-functional requirements, so that the guidelines are directly applicable to day-to-day operations. Tailoring guidelines to address specific user needs and challenges ensures practical relevance and effectiveness, and aligns organizational efforts with the practicalities of user interactions.

Longer-term maintenance activities must be explicitly factored into the equation. Keeping data up to date and keeping servers in continuous operation, including those supporting SPARQL endpoints, incurs costs that must be carefully considered. This financial aspect must be accounted for to avoid squandering the effort already invested.

Coordinated approaches should also consider a pay-as-you-go perspective, as not every dataset requires the same extent of LD. For example, not every dataset needs to be interlinked with the others to be reused: the adoption of URIs and dereferenceable standard data terms provided by shared LD vocabularies is a step forward in balancing the effort between publisher and consumer.
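
As an illustration of this pay-as-you-go idea, the following minimal sketch describes a single dataset with terms from shared vocabularies (DCAT and Dublin Core Terms) and URIs, without linking it to any other dataset. It is only a sketch under stated assumptions: the `describe_dataset` helper and all `example.org` URIs are hypothetical, and the theme URI is taken from the EU data-theme authority table purely as an example of a shared controlled term.

```python
import json

# Namespaces of two widely used shared vocabularies.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def describe_dataset(dataset_uri, title, publisher_uri, theme_uri, csv_url):
    """Build a JSON-LD description of one dataset (hypothetical helper).

    The dataset gets its own URI and is described with standard terms,
    but no links to other datasets are asserted.
    """
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:publisher": {"@id": publisher_uri},
        "dcat:theme": {"@id": theme_uri},
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": csv_url},
            "dcat:mediaType": {
                "@id": "https://www.iana.org/assignments/media-types/text/csv"
            },
        },
    }


# All example.org URIs below are placeholders, not real resources.
record = describe_dataset(
    dataset_uri="https://example.org/dataset/bus-stops",
    title="Bus stops",
    publisher_uri="https://example.org/org/city-council",
    theme_uri="http://publications.europa.eu/resource/authority/data-theme/TRAN",
    csv_url="https://example.org/files/bus-stops.csv",
)
print(json.dumps(record, indent=2))
```

Even at this level, a consumer can interpret the title, publisher, theme, and distribution without guessing at ad hoc column names, while the publisher has not had to interlink anything; links to other datasets could be added later if a concrete use case justifies the effort.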

While an inherently decentralized nature aligns with the spirit of the Web of Data, the heterogeneity and complexity within public administrations and their often intricate organizational practices suggest a need for centralized, domain-focused solutions. These solutions can effectively handle the intricate processes of data transformation and integration, from raw to structured and interlinked data, thus enhancing the efficiency of data publication. When recurrent issues are addressed in a centralized hub, organizations can implement a more cohesive and systematic approach to problem-solving. This centralization fosters a comprehensive understanding of recurring issues, paving the way for the implementation of more effective and standardized solutions. Instead of mandating an LD expert in each administration, a coordinated approach should prioritize shared and domain-targeted tools accessible even to smaller administrations, allowing them to participate without requiring specialized expertise. This strategy empowers public administrations to contribute effectively, emphasizing the integration of user-friendly tools to address this essential need.

The tensions and frictions under discussion serve as a catalyst for re-evaluating certain assumptions commonly held in the LD framework, which may not seamlessly align with LOGD. Unlike the broader LD perspective, where data is often disseminated in an entirely unregulated manner without an explicit public mandate for data production, the governmental context demands a distinct approach. In the realm of Government Data, the combined governance of LD and OGD practices within the collaborative network of stakeholders and initiatives becomes crucial and, in our view, inevitable, in order to best benefit from the potential offered by LD. The LOGD framework necessitates a deliberate and explicit commitment to a public mandate in data production. This shift underscores the significance of orchestrating effective governance mechanisms among the diverse actors involved in LOGD initiatives, acknowledging their pivotal role in fulfilling the public mandate for data accessibility and transparency in governmental operations. Establishing a seamlessly interconnected global data space for OGD might be deemed unattainable and could potentially lead to disillusionment. Nevertheless, the endeavor to align with LD principles and technologies should be viewed as a strategic investment. This investment is aimed at maximizing the efficiency of data production efforts, streamlining data production processes, and unlocking future multiplication factors within the data value chain. In essence, the commitment to LD principles becomes a proactive approach to optimizing the generation and utilization of data, paving the way for greater efficiency and value realization in the broader data ecosystem.
From this perspective alone, we posit that the question presented in the title of this paper can be answered in the affirmative: yes, Linked Open Government Data remains a viable option for sharing and integrating public data.
