On the challenges of open access monitoring

Monitoring systems are essential for tracking the progress in open access (OA) and particularly the goal of transitioning from paywalled to OA publications in many European countries. In this work, we express our opinion about the challenges faced by monitoring dashboards in providing a complete view of the OA status, ensuring accuracy in measuring OA production and achieving efficiency in the entire process. We analyze the characteristics of various monitoring systems from European countries, including the sources of data, formats, visualization methods, update frequencies, granularity and types of access recorded. We conclude by underlining the importance of monitoring systems in showcasing policy implementation, aiding decision-making, ensuring compliance and measuring impact in the pursuit of a more open scholarly landscape.

bringing this transition about, including academic and research institutions, funders, libraries and councils. Further, the need to make monitoring reports accessible and exchangeable, so that as much information as possible about where OA is heading is widely available, was quickly identified.4 Over the last decade, several monitoring dashboards have appeared, while in some countries, like Ireland and Slovenia, they have been embedded in national open science plans.5 According to Science Europe, 'Open Access monitoring enables deeper insight into publishing trends, can inform future strategies at institutional and national levels, provides guidance for policy development and review, helps to assess the effects of funding mechanisms and is crucial to negotiate transformative agreements with traditional subscription publishers.'6 In other cases, reliable recording of research productivity, with emphasis on the visibility of OA publications, makes the collection of such data imperative for securing funding.7 Reporting norms require that their operation is simple, iterative and as automated and illustrative as possible. In this article, we express our opinion about the future of these systems, based on an analysis of the monitoring dashboards as they have been recorded in Europe and beyond.

Purpose
We have identified two main purposes that monitoring systems can serve:
1. Providing status reports: This refers to any system that gathers and processes information related to the progress of OA for scientific information, including types of publications, publication venues, licences and more. This information is essential for research-performing organizations, managing bodies, libraries and policymakers to implement national, institutional and funders' policies.
2. Providing cost reports: This refers to any system that records the costs of OA and most specifically the pricing of article processing charges. This kind of system is useful for libraries, associations and policymakers to understand the cost-benefit ratio of their decisions and to provide transparency.
These two kinds of services can work separately or together, but their explicit division is recommended: 'it is not only an appropriate strategy to separate the monitoring of pervasion of Open publication strategies from Open Access cost monitoring, but that a distinct separation of these two aspects can be instrumental for the overall success of monitoring exercises'.8 Perhaps limiting where these different types of reports are published can provide dashboard managers with an objective view of both. Because of its sensitive nature, cost information seems to be governed by the principle 'as open as possible, as closed as necessary'. Nevertheless, both kinds are included in the analysis, since both are considered OA monitoring tools.

Characteristics
We collected data from monitoring systems in ten European countries, along with five international ones, with the goal of thoroughly understanding their characteristics and distinctions. The sample includes publicly available systems we encountered in our professional and research activities and those we actively sought for this research. We acknowledge that other systems are operating in closed mode, but since we could not perform our own inspection, these are not listed. We have not included institutional dashboards, but we have included organizational ones, as defined below. This sample reflects diversity in terms of geography, economic development and culture and we believe that this ensures a comprehensive view of monitoring systems in Europe, encompassing various contexts and practices.
We have identified and documented the specific characteristics of each monitoring system, along with a rationale detailing why each one is deemed important and relevant within the context of our study:
1. Source: the origin of the data that they are analyzing and presenting. The source is important in terms of data accuracy and comprehensiveness, and it directly influences the credibility of any report. It encompasses data that the organizations collect and organize themselves, such as from repositories or from local sources (e.g. agreement data), and data from external sources, such as third-party services (see Unpaywall) and bibliographic databases (see for instance Scopus, Dimensions, Web of Science, etc.).
2. Format: the nature of the systems, which can be either dynamic or static. Dynamic systems are characterized by interactivity and adaptability. The dynamic format provides flexibility for users to retrieve specific, targeted information, thus enhancing the comprehensibility and the accuracy of the reports. Static systems, by contrast, often an organized set of still images, present information in the limited manner that each system rapporteur has decided and do not allow the user to interact.
3. Graphs and visuals: the visualization characteristics of these dashboards, which often take the form of tabular data and/or graphic representations, such as bar and line charts, pie and doughnut charts, etc. The choice of visual elements significantly impacts the comprehensibility and interpretation of the data. Table 1 presents each graph and visual in the study and explains its purpose.
4. Update frequency: the time interval between the updates of information. The choice of update frequency influences the system's ability to reflect evolving situations, making it an essential consideration for users relying on accurate and up-to-date data for informed decision-making.
5. Granularity: the level they address, such as a nationwide, a regional, an organizational and/or an institutional viewpoint of this information. A nationwide viewpoint provides a very broad overview of a country's OA growth, which can be broken down further to a regional one. Organizational viewpoints are those of intermediate aggregations, such as library consortia, funding agencies, etc. All of the above can be refined to a focused, institutional perspective that facilitates the deeper examination of a single institution's OA contribution and progress.
6. Types of access: the types of open access that they record and report on. The diversity in the types of access may reflect the varying priorities and objectives of each system. As mentioned later, different systems may employ the same terms but interpret them differently. This characteristic not only informs about system priorities, but also underscores the need for clarity and consistency in the OA terminology.
These characteristics were chosen because they constitute defining qualities for the accuracy and comprehensiveness (see Source), analysis and exploration (see Graphs and visuals, Format), and representation of the information (see Types of access, Update frequency and Granularity).

Type | Description | Occurrence
Line chart | A line chart displays a series of data points connected by lines. Line charts are useful for showing changes over time. In the frame of OA monitoring, they are used to easily visualize progress over time and to indicate patterns and trends. | 8

Tables 2 and 3 offer a comparison between the initiatives, grouped into European and international cases respectively. They provide insights into how various countries and initiatives are actively engaged in monitoring and exploring OA. These systems are operated by institutions and agencies of varied scope and responsibilities, including library consortia that want to monitor the progress of their transformative agreements and research funding organizations that wish to check the compliance of their institutions with their policies.
The graphical representations employed, including diverse chart types and tables, not only facilitate data comprehension, but also provide a means for stakeholders to grasp the multifaceted dimensions of OA. Line charts seem to be popular visualizations that serve as visual aids for tracking the progress of the various access types, whereas area charts, which use filled areas to visually represent the achievement of specific milestones in the process of reaching a goal, are rarely used. Bar charts are a common choice when it comes to displaying rates or making comparisons among different organizations or subjects. Stacked bar charts, meanwhile, are particularly useful for illustrating the components that make up a whole and are often used to showcase the breakdown of data in terms of access types or subjects. Equally popular are visual cues of a numerical nature, such as flash cards and tables, which users are already familiar with, as they are commonly found in everyday documents.

Type | Description | Occurrence
Pie/doughnut chart | A pie/doughnut chart is a circular chart divided into slices, each representing a portion of a whole. It is useful for illustrating the composition of a dimension in a way that also reveals the proportions. | 5
Flash card | A flash card represents a numerical value of a dimension. Flash cards are easily accessible indicators that inform about the status of a dimension. | 2
Tree map chart | A tree map chart organizes sets of data in hierarchical form. This hierarchy is displayed as nested rectangles that are proportional to the size of each dataset. In OA monitoring, they are used to show in a structured way the proportional properties of entities. | 2
Area chart | An area chart displays data as a cumulative series of filled-in areas. In OA monitoring, they are used for visualizing simultaneously progress over time and comparison of data sets. | 1

A notable observation is the range of data sources harnessed for this purpose. Few initiatives are based on one source only, namely the Austrian system and the three consortium cases, FinELib, HEAL-Link and Couperin, which use data from their agreements with publishers. The other cases combine sources to complement, validate or enrich their core data. Except for a few cases, these systems allow users to dynamically navigate through the data, providing an interactive experience. However, the appearance and functionality of these dynamic systems may vary between initiatives. While some systems offer more refined data presentations, which allow in-depth analysis and filtering, others present the data in a more straightforward manner, showing values without extensive refinement or analysis. Update frequencies, which range from weekly to annual, reflect the dynamic nature of the data involved and the diverse goals of these initiatives, from institutional to national and even international levels.
While most initiatives primarily centre their efforts on journal publications, some, such as the Austrian, French and Swiss initiatives, broaden their scope to include data collection on book chapters and conference proceedings. The French initiative distinguishes itself as the sole case that systematically compiles and presents information regarding the languages of the articles. Conversely, the German initiative adopts a unique approach, relying on bibliometric databases to generate impact indicators, particularly focusing on citation-based metrics. The German initiative is also the only one that integrates cost data in the same dashboard.
Naturally, the main indicators are further analyzed to gauge the growth of OA practices in academia and research. Widely developing a culture of openness in these fields requires insights into the penetration of OA practices in specific domains, which can inform decision-making. Stakeholders can tailor their efforts to address the unique needs and challenges of different academic and research areas. These data allow academic institutions, researchers and policymakers to assess the impact of OA initiatives, allocate resources effectively and devise strategies for promoting openness in scholarly communication. The Danish OA dashboard is an example of subject filtering providing such information. This information can guide funding allocation, research support and academic publishing strategies in ways that benefit individual subject areas. Publishers can also provide information, especially when transformative agreement data are used for monitoring. This is often the responsibility of library consortia, such as FinELib in Finland and HEAL-Link in Greece. Only a few cases provide further analyses at a specialized level. For instance, the Austrian case, operated by FWF, provides filters based on the funding programme and whether the publications are peer-reviewed, which can be extremely valuable when it comes to the impact of funding policies and the quality of OA publications.
Although there are exporting functionalities available for these data, both in raw form and aggregations, it is important to note that they are not always licensed with an open licence. This limitation poses a challenge for those who seek to reuse these data, as many systems do not provide clear information on whether it is allowed to do so. As OA monitoring is an area of research in itself, unclear legal data status can deter researchers from using or building upon data, therefore stalling reproducibility. In a global research environment, constrained data sharing can impede collaboration among researchers and policymakers from different regions. Finally, accessible, open data are vital for transparency and accountability in policy-making, preventing suspicions of bias.

Challenges for the current monitoring dashboards
Based on our comparison of the characteristics of the monitoring dashboards, we pose three questions to understand the key challenges, as well as to highlight the key differences between the monitoring dashboards.

Attaining a complete view of OA status
The first question is whether we can have a complete view of the OA outputs of a country. Given the current situation, we can have approximate, but not exhaustive, views of the scientific production of a country. Some of these initiatives address the national level, by collecting metadata for all of the publications of researchers based in that country, while others are limited to the organizational one. Depending on the range of responsibility, for instance whether monitoring is performed by a research-funding organization, a research-performing organization or a national agency, it is expected that the smaller the scale, the higher the accuracy of the numbers will be. Even if a funding organization gathers data about all nationally funded projects, as in the case of the Austrian FWF, this is still a partial view of OA growth. Naturally, precision in measurement is challenged as the project scales up and attempts to gather evidence at the national level. The collection of all publication metadata, which vary by source, is challenging work, because not all publications are indexed in large bibliographic services or findable on interoperable and linked repositories. Therefore, bias is certain, and the choice of each system's source may affect the representation and the terminology of the data. Additionally, self-archiving rates vary among disciplines, which can be influenced by national practices that affect language representation and recognition. Figure 1 shows that as the perimeter and the size of data expand, the density is expected to dilute. The international efforts are expected to be diluted and are represented in the figure below as a dome.

Figure 1. A simplified view of the width and depth of OA monitoring. Indicatively, the example of institutional monitoring shows that the growth can be further analyzed

What we consider important is for these initiatives to explain their goals and the reasons behind their current methodologies. By understanding the underlying objectives, we can also have a clearer image of the motivation behind their approaches and thus clarify the scope of the monitoring. Does 'measuring everything' encompass all authors involved in research publications or is it limited to certain key contributors? Does it extend to all affiliations, including academic institutions? Understanding the extent of the data collection is pivotal in comprehending their impact. The extent of data is clearly visible in the case of cost information, as each publication is uniquely and explicitly associated with an invoiced amount of money.

Obtaining an accurate picture of OA production
A second question is how accurate and consistent the data on OA production are. For this, we need clear definitions of criteria and metrics, which are essential for any assessment process, such as those monitoring the progress of OA. Cost dashboards, aligned with well-defined fields (e.g. publisher and publication cost), are easily aggregated, as demonstrated by the OpenAPC project.10 However, alignment becomes more challenging for status dashboards, where disagreements on the definition of the access types result in external misalignments (what is counted) and internal incoherence (double counting a work). Given the numerous definitions, often originating from third parties, several aspects of the analytical process, such as deduplication of versions, are not guaranteed. In addition to the issues of visibility and recognition of some access types, such as diamond, inaccuracies are to the detriment of publishing inclusivity.
Therefore, one of the key improvements needed is the establishment of clear and concise terminology. When there are variations in terminology, there should be an effort to explain these differences. For example, why does a particular publication fall under the 'bronze' type rather than 'gold'? The term may be interpreted differently; for some, it refers to any OA publication that does not have a reuse licence, regardless of the type of the journal, while for others it signifies only hybrid journals and gratis publications. An implicit alignment between initiatives seems to happen, as the systems that rely on Web of Science data also use the typology of access by Unpaywall. Therefore, these descriptions coincide with those that use Unpaywall to enrich data from other sources. However, clarity is crucial for avoiding confusion and for ensuring that stakeholders can make informed decisions about the OA models they wish to adopt or support.
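To make the effect of terminology concrete, the following is a minimal Python sketch, not the method of any dashboard discussed here. It assumes a deliberately simplified typology (closed, gold, hybrid, bronze) and hypothetical record fields (`doi`, `journal_fully_oa`, `free_to_read`, `open_licence`); the DOIs are invented. It first removes duplicate versions of the same work and then counts access types under the two readings of 'bronze' described above.

```python
from collections import Counter

# Hypothetical records; field names and DOIs are illustrative only and do not
# reproduce the schema of any real monitoring system.
RECORDS = [
    {"doi": "10.1234/Example-A", "journal_fully_oa": True, "free_to_read": True, "open_licence": True},
    # Second version of the same work (e.g. a repository copy), DOI in lower case.
    {"doi": "10.1234/example-a", "journal_fully_oa": True, "free_to_read": True, "open_licence": True},
    {"doi": "10.1234/example-b", "journal_fully_oa": True, "free_to_read": True, "open_licence": False},
    {"doi": "10.1234/example-c", "journal_fully_oa": False, "free_to_read": True, "open_licence": False},
    {"doi": "10.1234/example-d", "journal_fully_oa": False, "free_to_read": False, "open_licence": False},
]


def deduplicate(records):
    """Keep the first record per DOI, comparing DOIs case-insensitively."""
    unique = {}
    for record in records:
        unique.setdefault(record["doi"].strip().lower(), record)
    return list(unique.values())


def classify(record, bronze_ignores_journal_type):
    """Assign a simplified access type to one record.

    bronze_ignores_journal_type=True: any free-to-read work without a reuse
    licence is bronze, whatever the journal.
    bronze_ignores_journal_type=False: bronze is reserved for such works in
    journals that are not fully OA; in fully OA journals they count as gold.
    """
    if not record["free_to_read"]:
        return "closed"
    if record["open_licence"]:
        return "gold" if record["journal_fully_oa"] else "hybrid"
    if bronze_ignores_journal_type or not record["journal_fully_oa"]:
        return "bronze"
    return "gold"


def count_by_type(records, bronze_ignores_journal_type):
    """Count access types after removing duplicate versions."""
    return Counter(
        classify(record, bronze_ignores_journal_type)
        for record in deduplicate(records)
    )


if __name__ == "__main__":
    print(count_by_type(RECORDS, bronze_ignores_journal_type=True))
    print(count_by_type(RECORDS, bronze_ignores_journal_type=False))
```

With these invented records, the same four works yield two bronze and one gold publication under the first reading, and one bronze and two gold under the second, while skipping deduplication would count the first work twice. The point of the sketch is only that identical input can produce different headline figures unless the definitions are stated.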

Maintaining efficiency and iteration in the OA monitoring process
The third question is whether we can make the OA publishing data collection processes efficient, reliable and consistent in the long term. Looking at the practical implementation of OA monitoring systems, we conclude that even with the best intentions, it is an exigent task to create a lightweight, simple, automated and transparent system, so that it is both effective and aligned with the ethos of openness. There are technical, cultural, conceptual and often legal constraints, under which one cannot ensure that the collection, analysis and presentation of accurate and comprehensive data can be easily iterated. Additionally, adherence to principles of openness may guide organizations to follow specific routes and to gather data from specific sources, as recently announced by NWO and ZonMw in the Netherlands.11 Further, it is not realistic to expect all initiatives to deliver the same services, some highly automated, or to be perfectly aligned with each other. Achieving regularity in reporting intervals is difficult, as the gathering, processing, analyzing and presenting of information depends on the resources that each organization has available. In addition to these complex aspects, one should consider the organizational setting in which some initiatives operate. There are certain factors that make OA monitoring an arduous task, from lack of strong mandates to limited resources, with an administrative burden that perhaps cannot be borne by all stakeholders.12
It seems that a natural step for the OA community is to upgrade the publishing of monitored information to the exchanging of information. To this end, there could be two proposals. Firstly, agree on a basic set of indicators in a standardized format to allow comparison of progress. Secondly, the initiative (national or organizational) that produces them should contextualize these indicators via a rubric that answers key questions of data origin and analytical process. A better understanding of the goals, measurement scope and inclusion criteria not only informs us about the initiatives themselves but also facilitates interpretation and comparison between different countries and systems. Both proposals rely on access to data under open licences and in non-proprietary formats. Ideally, access to these could be provided by an API (Application Programming Interface; an automated communication mechanism between two systems), as implemented by Germany's Open Access Monitor. However, since, as mentioned, not all countries can materialize such proposals, periodically providing timely information via data repositories could accelerate the process. A process like this would resemble the OpenAPC project, which gathers and processes cost information centrally.
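As an illustration of the two proposals, the sketch below shows, in Python, one possible shape for an exchangeable indicator record that carries its own contextual rubric and is serialized to JSON, a non-proprietary format. It is an assumption-laden example rather than an agreed standard: the class names `Indicator` and `Rubric`, all field names, the initiative name and the figures are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Rubric:
    """Context that answers key questions of data origin and analytical process."""
    data_sources: List[str]
    scope: str
    inclusion_criteria: str
    access_type_definitions: Dict[str, str]
    data_licence: str


@dataclass
class Indicator:
    """One basic indicator for a given initiative and year (hypothetical fields)."""
    initiative: str
    year: int
    total_publications: int
    oa_publications: int
    rubric: Rubric

    @property
    def oa_share(self) -> float:
        """Share of OA publications; 0.0 when no publications are recorded."""
        if self.total_publications == 0:
            return 0.0
        return self.oa_publications / self.total_publications


def to_json(indicator: Indicator) -> str:
    """Serialize the indicator and its rubric to a JSON string."""
    payload = asdict(indicator)
    payload["oa_share"] = round(indicator.oa_share, 4)
    return json.dumps(payload, indent=2, sort_keys=True)


# Invented example values, for illustration only.
EXAMPLE = Indicator(
    initiative="Example national monitor",
    year=2023,
    total_publications=200,
    oa_publications=120,
    rubric=Rubric(
        data_sources=["institutional repositories", "publisher agreement data"],
        scope="journal articles with at least one nationally affiliated author",
        inclusion_criteria="peer-reviewed articles with a DOI",
        access_type_definitions={"bronze": "free to read, no reuse licence"},
        data_licence="CC0-1.0",
    ),
)

if __name__ == "__main__":
    print(to_json(EXAMPLE))
```

Keeping the rubric inside the same record means that a figure such as the OA share never travels without the definitions and sources needed to interpret it. The resulting file could equally be deposited periodically in a data repository or served through an API.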

Concluding remarks
These monitoring systems serve different purposes, including showcasing the progress of policy implementation, aiding decision-making, ensuring compliance and measuring impact. Predominantly dynamic in nature, they underline the importance of real-

Type | Description | Occurrence
Bar chart | A bar chart represents values as bars, where the length of each one varies to show the frequency or size of different categories. They are a very common format that is useful for comparing data based on the perception of length. | 7
Stacked bar chart | A stacked bar chart represents data as bars, with each bar divided into segments. The segments represent parts of the dimension, and their length corresponds to the values within. In OA they are used for comparing compositions within categories. | 7
Table chart | A table chart is a common organization of data on grids. Tables are used for organizing and displaying numerical data, allowing the comparison of values. | 7
Gauge chart | A gauge chart displays a single value within a predefined range. Gauge charts often resemble a speedometer and in the frame of OA monitoring they are useful for indicating performance or progress briefly. | 1
Progress chart | A progress chart visualizes the advancement of data towards a target point. Progress charts show the completion stage of a task at a given point and in OA monitoring are used to show the level of achievement of a goal, e.g. waiver counting. | 1
Scatter plot | A scatter plot displays individual data points on a two-dimensional plane. Scatter plots show the relationship and/or distribution between variables. In the frame of OA, they are used for visualizing correlations, progress and patterns in data. | 1

Table 1. Types of graphs and visuals ranked by frequency of appearance in the sample.9