Adopt, Adapt, and Share! FAIR Archeological Data for Studying Roman Rural Landscapes in Northern Noricum

This paper offers a detailed overview of the archeological data from the “Roman Rural Landscapes in Noricum” (RRLN) project, which focuses on the less-explored northern and northeastern rural regions of Roman-period Noricum (c. 16/15 BC to 488 AD). The University of Vienna’s PHAIDRA system was used for the long-term archiving of selected new archeological data, adhering to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The project adopted an innovative digital archeology approach, combining open geodata with various unstructured datasets within a Geographic Information System (GIS) framework to deepen our understanding of Roman rural landscapes in a specific Area of Interest (AoI). The paper highlights the selective preservation of crucial archeological data in a specialized repository and promotes open science to improve the discoverability and usability of data on Roman-period objects.


Figure 1
The RRLN project concentrated on a region in the hinterland of the Danube Limes, spanning the area between the Erlauf and Traisen river valleys in the Mostviertel region of Lower Austria. Here, it investigated several Roman-period sites (map after Hagmann, 2020c).
The rural settlement's geospatial and qualitative archeological data were consolidated into a GeoPackage database (Open Geospatial Consortium, 2021), forming the RRLN-DB. This open-source SQLite container (SQLite, n.d.) integrates all datasets within a GIS framework. The database design simplifies data handling and supports sophisticated spatial archeological analyses. In line with an Open Spatial Archeology approach, the Long Term Release (LTR) versions of QGIS (written in C++) were used (see footnote 2). QGIS is Free and Open Source Software (FOSS) and, owing to its versatility, was applied for nearly all archeological work steps (Conolly & Lake, 2006; Dell'Unto & Landeschi, 2022; Ducke, 2015; Gillings et al., 2020). In most cases, the current regional Austrian survey system for the eastern parts of Austria, "Militärgeographisches Institut (MGI) Gauß-Krüger (GK) East" (European Petroleum Survey Group [EPSG] code 31256, MGI / Austria GK East), served as the coordinate reference system (Otter, 2015). To facilitate reuse on a global scale, the data have also been reprojected to the World Geodetic System 1984 (WGS 84; EPSG:4326) (Department of Defense, 2014; International Association of Oil and Gas Producers, 2023; ISO, 2019).
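A GeoPackage is an ordinary SQLite file, so any SQLite client can read its layer register without a server. The following is a minimal sketch that uses Python's built-in sqlite3 module. It builds, in memory, a deliberately simplified subset of the GeoPackage core tables (gpkg_spatial_ref_sys and gpkg_contents) plus a hypothetical attribute layer named rrln_sites. Required columns and the GeoPackage geometry encoding are omitted, so the result is not a valid GeoPackage and does not reproduce the actual RRLN-DB schema.

```python
import sqlite3

# ASCII "GPKG" (0x47504B47): the SQLite application_id the GeoPackage standard prescribes.
GPKG_APPLICATION_ID = 0x47504B47


def create_demo_geopackage(conn: sqlite3.Connection) -> None:
    """Create a simplified subset of the GeoPackage core tables (illustration only).

    The real standard requires more columns and rows, and feature layers store
    geometries as GeoPackage binary blobs; both are omitted in this sketch.
    """
    conn.execute(f"PRAGMA application_id = {GPKG_APPLICATION_ID}")
    conn.executescript(
        """
        CREATE TABLE gpkg_spatial_ref_sys (
            srs_name TEXT NOT NULL,
            srs_id INTEGER PRIMARY KEY,
            organization TEXT NOT NULL,
            organization_coordsys_id INTEGER NOT NULL,
            definition TEXT NOT NULL            -- WKT definition, left out here
        );
        CREATE TABLE gpkg_contents (
            table_name TEXT PRIMARY KEY,
            data_type TEXT NOT NULL,            -- e.g. 'features' or 'attributes'
            identifier TEXT,
            srs_id INTEGER REFERENCES gpkg_spatial_ref_sys (srs_id)
        );
        -- Hypothetical layer with plain x/y columns instead of a geometry blob.
        CREATE TABLE rrln_sites (fid INTEGER PRIMARY KEY, site_id TEXT, x REAL, y REAL);
        INSERT INTO gpkg_spatial_ref_sys
            VALUES ('MGI / Austria GK East', 31256, 'EPSG', 31256, 'undefined');
        INSERT INTO gpkg_contents
            VALUES ('rrln_sites', 'attributes', 'RRLN sites (demo)', 31256);
        """
    )


def list_layers(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Return (table_name, data_type, srs_id) for every table registered in gpkg_contents."""
    return conn.execute(
        "SELECT table_name, data_type, srs_id FROM gpkg_contents ORDER BY table_name"
    ).fetchall()
```

With an existing file, `list_layers(sqlite3.connect("RRLN-DB.gpkg"))` (file name hypothetical) would list the registered tables. Decoding the geometries of spatial layers still requires QGIS or another GeoPackage-aware library.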
All archeological data, often received in Environmental Systems Research Institute (ESRI) Shapefile format (ESRI, 1998), were exported into the RRLN database. The GeoPackage allowed spatial and non-spatial information to be managed in a simple, platform-independent database that requires no server. The database was therefore entirely maintenance-free and stored in a single file, which could also be opened on mobile devices. QGIS served as the interface for managing the data as a relational database.

The database is subject to the limitations of Austrian copyright law (Urheberrechtsgesetz, 2021); requests were made directly to the BDA in 2018 and 2020, leading to the provision of the data.

2
The latest version used was 3.28 "Firenze" (QGIS Development Team [QGIS], 2022).

Hagmann Journal of Open Humanities Data DOI: 10.5334/johd.129
Before being added to the GeoPackage, tabular datasets were processed in Microsoft Excel to facilitate their import (Microsoft Corporation, 2016). Where the research process required it, queries run within the GIS were exported as tabular datasets for further analysis in spreadsheet software. The choice of Microsoft Excel, a proprietary product, over FOSS alternatives was guided by administrative and practical considerations (Hagmann, 2020b). Notably, only those dataset components that offer unique insights into the area are preserved for long-term archiving (Hagmann, 2021g).
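Exporting a GIS query for spreadsheet work amounts to writing a result set as CSV. Below is a minimal sketch using Python's standard sqlite3 and csv modules. It assumes a hypothetical features table with category and period columns, which is not the actual RRLN-DB structure.

```python
import csv
import io
import sqlite3
from typing import TextIO


def export_query_to_csv(conn: sqlite3.Connection, sql: str, out: TextIO, params=()) -> int:
    """Run an SQL query and write its result set, with a header row, as CSV.

    Returns the number of data rows written.
    """
    cursor = conn.execute(sql, params)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # column names of the query
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count
```

When writing to a file for Excel, open it with `newline=""` and `encoding="utf-8-sig"`. The first keeps the line endings produced by csv intact. The second adds a byte order mark, which Excel typically uses to recognize UTF-8; this matters for umlauts in German place names.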

(2) DATASET DESCRIPTION
All project-related data are stored in a dedicated top-level collection on PHAIDRA, entitled "Roman Rural Landscapes in Noricum: Archaeological investigations of the Roman settlement in the hinterland of Northern Noricum" (Hagmann, 2019b). It contains project-related material such as selected research publications (e.g., Hagmann, 2020a), supplementary files (e.g., maps), and RRLN-DB queries intended for long-term archiving (Table 1).
To enhance clarity, the long-term archived data described in this paper have been organized into a dedicated sub-collection titled "Roman Rural Landscapes in Noricum (RRLN) - Findspots and Sites: Open archaeological data" (Hagmann, 2021g). This collection comprises the original archeological data (spatial data tables locating archeological objects) and the controlled vocabularies employed (Hagmann, 2021a, 2021b). The data are stored in XLSX and CSV files, which contain 2D point coordinates and attributes describing the sites and their respective archeological objects (…; Trognitz, 2017; Unicode, 2023). Each digital object within this collection is assigned a Digital Object Identifier (DOI). This unique alphanumeric string serves as a persistent link to the object's online location, supporting reliable and consistent access to and citation of digital content (ISO, 2022).
At the core of the collection, data identifying Roman sites are stored in a freely accessible sub-sub-collection. For ease of access, reuse, and sustainable availability, it is presented as CSV tables and titled "Roman Rural Landscapes in Noricum - Sites: Roman settlement places - open dataset (CSV)" (Hagmann, 2021c). This collection again comprises two objects. "Roman Rural Landscapes in Noricum - Sites (CSV/MGI)" provides the coordinates of the relevant sites in the local coordinate system MGI GK East (Hagmann, 2021d), while the other object uses WGS 84 for global-scale operations (Hagmann, 2021e). Alternatively, an XLSX table in MGI GK East can be accessed via the "Roman Rural Landscapes in Noricum - Sites" object (Hagmann, 2021i).
Because of administrative and technical limitations related to possible copyright issues in the source material, two datasets are available for personal scientific research only upon formal request. One table links all find spots within the AoI to their corresponding features and forms a comprehensive query table ("Roman Rural Landscapes in Noricum: Findspots"). The other table comprises all Roman features, linked to all Roman find spots, in a separate dedicated table ("Roman Rural Landscapes in Noricum: Roman Findspots"); both use MGI GK East. The metadata for these datasets, however, remain openly and freely accessible (Hagmann, 2021f, 2021h). The field "Fundstellen-ID" serves as the primary key and, respectively, the foreign key linking Roman sites to Roman find spots joined with their features. The comprehensive (Roman) find spot tables are designed to function as stand-alone datasets. The key metadata fields of the selected RRLN-DB queries can be described as follows.

LICENSE
Creative Commons Attribution 4.0 International (where applicable)
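Linking the sites table to the find spot tables via "Fundstellen-ID" is an ordinary key join. The sketch below assumes comma-separated files. "Fundstellen-ID" is the only column taken from the dataset description; the coordinate columns x and y and all sample values are hypothetical, because the find spot tables are available only on request.

```python
import csv
import io
from collections import defaultdict

KEY = "Fundstellen-ID"  # linking field named in the dataset description


def read_table(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse CSV text into row dictionaries (the delimiter is an assumption)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def join_sites_findspots(sites, findspots, key=KEY):
    """Inner join of sites and find spots on the shared key.

    Returns the joined rows and the keys of find spots with no matching site.
    """
    findspots_by_key = defaultdict(list)
    for row in findspots:
        findspots_by_key[row[key]].append(row)
    joined = []
    for site in sites:
        for fs in findspots_by_key.get(site[key], []):
            # prefix find spot columns so equally named site columns are not overwritten
            joined.append({**site, **{f"fs_{k}": v for k, v in fs.items() if k != key}})
    site_keys = {s[key] for s in sites}
    orphans = sorted(k for k in findspots_by_key if k not in site_keys)
    return joined, orphans
```

Reporting orphaned keys separately makes it easy to check that every find spot in the joined tables belongs to a site.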

(3.1) DATABASE MODEL: SITES - FINDSPOTS - FEATURES
The RRLN database employs a data model that organizes archeological data using "features" as the primary unit. Inspired by the BDA-FSDB model, it adopts a three-tier system to classify and locate archeological items within the AoI, summarized as: "An archeological 'feature' is documented at a specific 'find spot' within a 'site,' a cluster of find spots" (Figure 2):
- 1 "Site" groups n "Findspots" (clusters).
- 1 "Findspot" contains n "Features" (includes).

3
The first title identifies the digital object as a virtual entity, while the second names the content-related, abstract information entity represented by the former.

The smallest unit, the archeological "feature," represents a distinct entity, such as a Samian fragment (i.e., a find) or a kiln (i.e., a structure), identified by the presence of any archeological object at a "findspot." A "feature" is therefore an abstract archeological container that captures information using a controlled vocabulary, based on the (generalized) information provided by the BDA-FSDB; it does not represent detailed components such as an ash fill within a kiln. The feature forms the first level of qualitative information, while the find spot, with coordinates in the EPSG:31256 system, forms the second level.
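The three-tier model (1 site : n find spots, 1 find spot : n features) maps directly onto relational tables with foreign keys. A minimal SQLite sketch follows. Table and column names are hypothetical and chosen for readability; they are not taken from the RRLN-DB.

```python
import sqlite3

# Hypothetical names; only the site -> find spot -> feature structure follows the text.
SCHEMA = """
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY
);
CREATE TABLE findspot (
    findspot_id INTEGER PRIMARY KEY,
    site_id INTEGER NOT NULL REFERENCES site (site_id),              -- 1 site : n find spots
    x REAL NOT NULL,                                                  -- EPSG:31256 coordinates
    y REAL NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    findspot_id INTEGER NOT NULL REFERENCES findspot (findspot_id),  -- 1 find spot : n features
    category TEXT NOT NULL,                                           -- controlled vocabulary term
    period TEXT NOT NULL                                              -- chronological term
);
"""


def open_rrln_model(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def features_per_site(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Count features per site by walking site -> find spot -> feature."""
    return conn.execute(
        """
        SELECT s.site_id, COUNT(f.feature_id)
        FROM site AS s
        JOIN findspot AS fs ON fs.site_id = s.site_id
        LEFT JOIN feature AS f ON f.findspot_id = fs.findspot_id
        GROUP BY s.site_id
        ORDER BY s.site_id
        """
    ).fetchall()
```

With foreign keys enabled, SQLite rejects a feature that points to a non-existent find spot. This mirrors the rule that every feature is documented at a find spot.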

The find spot, linked to local information such as plot name or political municipality, indicates the spatial location of an archeological object without pinpointing its exact position. A find spot can therefore contain one or several features, establishing a 1:n relationship. The model thus records not the exact location of an object but the coordinates of its initial place of discovery. Features are assigned to a find spot on the basis of the archeological activities recorded in the BDA-FSDB, with point coordinates marking the approximate center of the parcel(s) where the object was found. Both the BDA and RRLN databases therefore employ pseudo-geomasking, avoiding exact locations for preservation purposes. Despite this relative inaccuracy, the approach still allows a find spot's area of interest to be determined geographically (Smith, 2020).
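The text does not state how the parcel center is computed. Assuming it is an area-weighted centroid of the parcel polygon(s), the calculation can be sketched with the shoelace formula. Coordinates are taken to be planar, projected values such as EPSG:31256.

```python
from typing import Sequence

Point = tuple[float, float]


def ring_area_centroid(ring: Sequence[Point]) -> tuple[float, float, float]:
    """Signed area and centroid of a simple polygon ring (shoelace formula).

    The ring lists each vertex once, without repeating the first vertex at the end.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate parcel polygon")
    # centroid = sum / (6 * A) = sum / (3 * area2); the sign cancels for clockwise rings
    return area2 / 2, cx / (3 * area2), cy / (3 * area2)


def findspot_point(parcels: Sequence[Sequence[Point]]) -> Point:
    """Area-weighted centre of one or more parcels (assumed find spot coordinate)."""
    total = sx = sy = 0.0
    for ring in parcels:
        area, x, y = ring_area_centroid(ring)
        weight = abs(area)
        total += weight
        sx += weight * x
        sy += weight * y
    return sx / total, sy / total
```

For strongly concave parcels the centroid can fall outside the parcel itself. This is one more reason why such points give only an approximate, pseudo-geomasked location.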
The third level combines one or more find spots into a "site," a superordinate conceptual entity envisioned as a cluster of spatially connected find spots with a unique ID per cluster (Doneus, 2013, pp. 122-125). Such sites are the "features of interest" stored in the respective "Roman Rural Landscapes in Noricum - Sites" objects on PHAIDRA.
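Grouping find spots into clusters of spatially connected find spots can be approximated by single-linkage clustering. Two find spots belong to the same site if a chain of find spots, each within a distance threshold of the next, connects them. The sketch below uses a union-find structure. The threshold is a hypothetical parameter, since the clustering criterion actually applied in the GIS is not specified here.

```python
import math
from typing import Sequence


def cluster_findspots(points: Sequence[tuple[float, float]], max_dist: float) -> list[list[int]]:
    """Single-linkage clustering: find spots closer than max_dist are linked, transitively.

    Returns clusters as sorted lists of point indices.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # O(n^2) pair scan; a spatial grid index would be needed for very large datasets
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
    clusters: dict[int, list[int]] = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The length of each returned cluster corresponds to the kind of value stored in the Cluster_size column listed in Table 2.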

(3.2) BASE DATASET
The data originates from 217 independent BDA-FSDB queries, distributed across 604 cadastral municipalities (Figure 3) included in 73 political municipalities (Figure 4) within seven political districts (Figure 5), representing the state of archeological knowledge in 2016.
Each BDA-FSDB query, geographically based on current municipal boundaries, included all BDA-registered sites within that area. In general, three BDA-FSDB queries per political municipality were provided, as at least two XLS files and one TXT file named after the respective municipality (see footnote 5). Consequently, there are three distinct types of queries, each containing partially identical yet structurally different information for every site within each municipality, so that the queries within a set complement one another. Copies of the queries were modified for GIS-based analysis in a spreadsheet program, leaving the original BDA-provided dataset unaltered as a backup.
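Combining the three complementary queries per municipality can be sketched as a field-wise merge keyed on the BDA site number. This is a minimal illustration that assumes each query has already been read into dictionaries with a BDA_FO_Nummer column. Empty fields are filled from the other queries. Differing non-empty values are collected for manual review rather than resolved automatically.

```python
def merge_queries(*queries: list[dict[str, str]], key: str = "BDA_FO_Nummer"):
    """Merge complementary query rows that describe the same site.

    Returns (merged, conflicts): merged maps each key to one combined record;
    conflicts lists (key, field, kept_value, other_value) for manual checking.
    """
    merged: dict[str, dict[str, str]] = {}
    conflicts: list[tuple[str, str, str, str]] = []
    for query in queries:
        for row in query:
            target = merged.setdefault(row[key], {})
            for field, value in row.items():
                if value in ("", None):
                    continue  # an empty field never overwrites information from another query
                old = target.get(field)
                if old is None:
                    target[field] = value
                elif old != value:
                    conflicts.append((row[key], field, old, value))
    return merged, conflicts
```

Conflicting values are kept in a separate list because they may reflect anything from genuine disagreements to truncated text. These cases call for an archeologist's judgment.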
Data in the fields are typically integers for numeric data such as coordinates, or strings for text-based information such as descriptions of archeological objects (Gumm & Sommer, 2013, pp. 114-115). The data cover administrative and archeological information, from parcel locations to monument protection status. Importantly, they include categorical descriptions of archeological features and their periods.

(3.3) AGGREGATION, NORMALIZATION, AND GIS INTEGRATION
The 217 individual BDA-FSDB queries were aggregated into a single table with 5,010 BDA-FSDB entries, representing all archeological find spots from the BDA-FSDB within the AoI. The aggregated data were then normalized, with emphasis on location, qualitative classification of the archeological objects, and chronological characteristics, given the importance of normalizing heterogeneous data (Dziwis, 2018, p. 2) (see footnote 6). The resulting data tables were thus assigned 46 unique headers detailing various attributes, including coordinates, labels, categorizations, and related administrative aspects derived from the BDA-FSDB (Table 2).

4
The ER diagram was generated using ChatGPT (GPT-4, Code Interpreter beta; August 3, 2023 version). ChatGPT, as generative AI software, was employed for diagram generation because it produced the desired output simply and efficiently from a few prompts; the choice was also informed by a methodological preference for exploring contemporary AI-integrated software solutions.

5
There was only one query available for the municipality of Golling an der Erlauf.
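Aggregating the per-municipality queries into one table with unified headers can be sketched as follows. The raw header variants in HEADER_MAP and the added municipality column are hypothetical. Only the target names follow the headers listed in Table 2.

```python
# Hypothetical mapping from raw header variants in the query exports to normalized headers.
HEADER_MAP = {
    "FO-Nr.": "BDA_FO_Nummer",
    "Fundortnummer": "BDA_FO_Nummer",
    "Datierung": "BDA_Datierung",
    "Kategorie": "BDA_Kategorie",
}


def aggregate(queries: dict[str, list[dict[str, str]]], header_map: dict[str, str] = HEADER_MAP):
    """Stack per-municipality query rows into one table with unified column names.

    Returns the ordered column list and the rows, each padded with "" so that
    every row carries the same set of columns.
    """
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    for municipality, query_rows in queries.items():
        for raw in query_rows:
            row = {"municipality": municipality}  # hypothetical provenance column
            for header, value in raw.items():
                name = header.strip()
                row[header_map.get(name, name)] = value  # unknown headers pass through
            rows.append(row)
            for column in row:
                if column not in columns:
                    columns.append(column)
    return columns, [{c: row.get(c, "") for c in columns} for row in rows]
```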
Tailored controlled vocabularies were used for the standardized qualitative (Hagmann, 2021b) and chronological (Hagmann, 2021a) attribution of the archeological objects. After the BDA-FSDB data had been reworked, each entry in the revised data table no longer represents a "BDA-FSDB entry" but corresponds to a newly defined "RRLN-feature." After GIS verification, "RRLN-findspots" were identified using revised coordinates from the BDA-FSDB. These were then clustered in the GIS to create the "RRLN-sites" for the RRLN-DB. All data were stored in the GeoPackage geodatabase.

6
The method has been adapted from the practices of other studies, such as a Dutch study using the local ARCHIS system (Verhagen et al., 2016, p. 310).

The data reveal that in the AoI and the surrounding political municipalities, there are no administrative units without archeological objects (Figure 8). This is further confirmed at the smaller cadastral municipality level, where only a small portion shows no finds (Figure 9). Despite some "background noise" visible almost everywhere in the AoI, features are concentrated mainly in sections associated with intensive construction work.

The RRLN-DB's crucial data source, the BDA-FSDB, merits detailed discussion. Its data trace back to the mid-19th century, as archeological objects have been systematically recorded in Austria since the 1850s. The BDA-FSDB originated from the analog Central Finds File ("Zentrale Fundstellenkartei"; BDA-ZFSK), created by H. Adler in 1965 as a card index system. In 1995, C. Mayer replaced the BDA-ZFSK with the BDA-FSDB, initially designed as a relational database. Although GIS had been considered early on, its integration was delayed for various reasons, and a GIS client-server application was later added in parallel (Mayer et al., 2004). The core of the BDA-FSDB, archeological knowledge, is captured through standardized categories and free text. This includes both entire structures and individual objects, so data aggregation is required for a uniform evaluation at site level. Non-archeological objects, such as geofacts, were also recorded. The BDA-FSDB provides structured information on archeological objects, including dating, detailed find history, literature, and current storage locations, elements initially derived from the BDA-ZFSK. C. Mayer's categorization adds to the system's complexity: it defines "findspots" ("Fundstellen") as landscape segments of human use and "find locations" ("Fundplätze") as specific areas within findspots that evidence past human activities. By 2016, the BDA-FSDB recorded 18,860 findspots and 52,083 find locations across 85% of all Austrian cadastral municipalities. Data quality varies with the collection methods and research standards of different periods. Notable increases in recorded findspots were influenced by the 1923 Monument Protection Law, post-WWII archeological research, and environmental impact assessments since the 1990s (Mayer, 1997, 2002, 2017).
Chronology is crucial in the BDA-FSDB, with varying periods and epochs reflecting the lifespan of an archeological record rather than precise moments. These datings, often provisional and sometimes contributed by "citizen scientists," indicate the state of research at the time of the last data edit (Verhagen et al., 2016, p. 311; Mayer, 2017, p. 27).
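BDA-FSDB datings describe spans rather than moments. One way to relate a feature to the Roman period is therefore to compute how much of its dating span overlaps that period, in the spirit of aoristic weighting. The sketch below illustrates this under stated assumptions and is not the method used in the RRLN project. The Roman period bounds follow the dates given for Noricum (c. 16/15 BC to 488 AD). BC years are negative integers, with no correction for the missing year zero.

```python
Interval = tuple[int, int]

# Roman period in Noricum, c. 15 BC to AD 488; BC years as negatives (simplified).
ROMAN_PERIOD: Interval = (-15, 488)


def overlap_years(a: Interval, b: Interval) -> int:
    """Length of the overlap of two year intervals (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def share_in_period(dating: Interval, period: Interval = ROMAN_PERIOD) -> float:
    """Fraction of a feature's dating span that falls within the period.

    A dating span reflects the possible lifespan of a record, not a precise moment,
    so each feature is weighted by how much of its span lies in the period.
    """
    start, end = dating
    if end < start:
        raise ValueError("dating ends before it starts")
    if end == start:  # point dating: either inside the period or not
        return 1.0 if period[0] <= start <= period[1] else 0.0
    return overlap_years(dating, period) / (end - start)
```

Summing these shares over all features yields a weighted count for the Roman period. Broadly dated features then count only partially instead of being forced into a single period.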
Notably, C. Mayer published significant research on the BDA-FSDB during the 2000s, defining its general purpose: to record archeological objects (Mayer, 2004, 2008, 2009, 2017; Mayer et al., 2004; Pollak, 2008). In recent years, publications have focused primarily on the further development of the BDA-FSDB. Examples include projects for the GIS-based cartographic recording of its contents and the online dissemination of archeological information (Steigberger, 2017) within the framework of various focal-point projects (Hinterwallner & Krenn, 2020; Langendorf & Hagmann, 2020; Steigberger, 2019, 2020; Trognitz, 2021). Efforts have also focused on the successor project, the Heritage Information System (HERIS), which has been gradually replacing the BDA-FSDB since 2020. While the BDA-FSDB was designed exclusively for archeological data, HERIS is intended to handle all cultural assets in Austria, including art-historical ones (Bundesdenkmalamt, 2021; Steigberger, 2022a, 2022b).
It was initially assumed that the 5,010 archeological sites from the BDA-FSDB queries corresponded to actual locations with coordinates. However, many sites either lacked coordinates or had erroneous ones. Attempts to geocode these sites were only partly successful, so 771 sites without coordinates were excluded from the geodatabase. An alternative method, assigning sites to the geometric centers of their respective cadastral municipalities, was considered but rejected because it could introduce bias and distort spatial analyses. Instead, data without coordinates were selectively included for their qualitative information in a separate table in the RRLN-DB and used for manual assessments where necessary. In any case, entries in the BDA-FSDB that could not be spatially located often represented unclear, uncertain, or speculative data, owing more to imprecise source content than to database inaccuracies (Figure 10; see footnote 8).

Examining the spatial locations and thematic content of the 5,010 BDA find spots in the BDA-FSDB dataset revealed challenges in handling the assigned BDA attributes. These attributes describe crucial archeological properties of each object. However, character limits in the corresponding query fields meant that attributes were displayed only partially or incompletely, which complicated the classification of archeological objects. To address this, the partially included BDA attributes were compared across all queries, completed, and mapped to the aforementioned controlled vocabulary for standardization. In this way, 7,694 attribute entries, corresponding to the 7,694 RRLN-features, were extracted for the 5,010 BDA find spots and finally consolidated into 187 attribute entries.
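The completion of truncated attributes can be sketched as prefix matching. A truncated string is replaced by the single longest observed string that begins with it, and the result is then looked up in a controlled vocabulary. This is a simplified illustration that assumes truncation only cuts off the end of a string. The German terms and English vocabulary entries are hypothetical examples, not the RRLN vocabulary.

```python
def complete_truncated(values: list[str]):
    """Map each attribute string to the single maximal observed string that extends it.

    Assumes truncated entries are prefixes of complete ones. Prefixes with several
    possible completions stay unchanged and are reported for manual review.
    """
    terms = {v.strip() for v in values if v and v.strip()}
    completed: dict[str, str] = {}
    ambiguous: dict[str, list[str]] = {}
    for term in sorted(terms):
        longer = [u for u in terms if u != term and u.startswith(term)]
        # keep only maximal candidates, i.e. those not extended by another candidate
        maximal = sorted(u for u in longer if not any(w != u and w.startswith(u) for w in longer))
        if len(maximal) == 1:
            completed[term] = maximal[0]
        else:
            completed[term] = term
            if maximal:
                ambiguous[term] = maximal
    return completed, ambiguous


def map_to_vocabulary(values: list[str], vocabulary: dict[str, str]):
    """Complete truncated values, then look them up in a controlled vocabulary (None if unmapped)."""
    completed, ambiguous = complete_truncated(values)
    mapped = {v: vocabulary.get(completed[v.strip()]) for v in values if v and v.strip()}
    return mapped, ambiguous
```

A complete short term that also happens to be the prefix of a longer term would be wrongly extended by this rule. Its output therefore still needs manual checking, as in the comparison across queries described above.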
Besides classification-based information, the BDA-FSDB also contains "cultural" attributions (e.g., "Roman" or "Germanic"), reflecting its role as a tool for social-archeological interpretation (Pollak, 2017, p. 19). The concept of "cultural affiliation" is complex and raises numerous challenges, particularly in controversial topics such as the ongoing debate on the "Roman Way of Life." To avoid these, the categories were not considered further (see recently, e.g., Pitts, 2021; Versluys, 2021; Woolf, 2021).
(4.2.2) Selecting the "right" repository: opportunities and challenges

Given the qualitative and quantitative framework of the data, the approach chosen for long-term archiving needs to be discussed. Specialized repositories such as PHAIDRA are crucial for meeting regional scholarly needs, as they provide tailored services and foster local academic engagement. They preserve cultural and scholarly output effectively and offer personalized support. However, Austria has no dedicated archeological repository comparable to the United Kingdom's Archaeology Data Service (ADS) (University of York, 2023) for archiving specialized research data (see footnote 9). Instead, several institutional repositories with a broad thematic scope are hosted in Austria alongside PHAIDRA.

In archeology, datasets from projects like RRLN are often stored in local repositories. This reflects the strong intrinsic link between archeological data and their geographical context, a practice that differs from other scientific disciplines. "Regional projects" that produce "regional data" therefore often align with local digital infrastructures, and local storage can improve the findability of data, particularly for local research. Broad dissemination of these datasets can nevertheless be achieved through publication in international, peer-reviewed journals or via scientific social networking sites. Local repositories, however, face challenges in global discoverability and accessibility, contributing to a fragmented information landscape. Smaller repositories in particular struggle with interoperability and sustainability. Standardized practices and collaboration are therefore crucial for integrating these repositories into the global research community (Bibby, 2021; Bisták et al., 2021; Calandra et al., 2021; Correia & Silva, 2023; Faniel et al., 2018; Geser et al., 2022; Göldner et al., 2023; Huvila, 2020; Jantos & Sommer, 2021; Juty et al., 2020; Kansa et al., 2020; Kreiter, 2021; Nicholson et al., 2023; Novák et al., 2021; Oniszczuk & Makowska, 2021; Richards, 2021, 2023a, 2023b; Richards et al., 2021; Seaton et al., 2023; Štular, 2021; Trognitz, 2021; Wallis et al., 2013; Wilkinson et al., 2016).
The chosen repository, PHAIDRA, enables the archiving and dissemination of scholarly work across disciplines and formats. Its sustained operation for over a decade demonstrates long-term viability and stability. Its technical framework enhances online discoverability, addressing the challenges of regionalized information, and illustrates how local infrastructures can contribute to international academic research. The RRLN project, funded like PHAIDRA by the University of Vienna, uses this infrastructure to make efficient use of funding and to secure data preservation, thereby strengthening research integrity and reliability. Project-internal considerations, arising from various factors, also shaped the conceptual design and execution of RRLN's approach. Furthermore, PHAIDRA's influence extends beyond regional limits: by supporting the FAIR principles, it promotes local and global collaboration through open science (Faniel et al., 2018; Hagmann, 2018; Huvila, 2020; Juty et al., 2020; S. W. Kansa et al., 2020; Nicholson et al., 2023; Novák et al., 2023; Ross et al., 2022; Seaton et al., 2023; Trognitz, 2021; Wallis et al., 2013; Wilkinson et al., 2016).

(5) IMPLICATIONS AND APPLICATIONS
The project's careful data selection ensures the preservation and availability of new insights into rural settlement in Northern Noricum while avoiding redundancy with widely accessible data. Unlike a printed catalog, the RRLN-DB is a dataset that can be dynamically updated with new data. It allows controlled modifications, such as error corrections, with changes documented via PHAIDRA's version history. In the repository, each object is stored permanently, and new versions are added as separate items linked to the original, so no data are deleted or overwritten. This method ensures maximum transparency and traceability, provided that PHAIDRA operates as intended.
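The versioning behavior described here stores corrections as new objects linked to their predecessors. It can be illustrated with a small append-only structure. This is a toy model, not PHAIDRA's implementation or API, and the identifier format is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    object_id: str
    content: bytes
    previous: Optional[str]  # identifier of the version this one supersedes, if any


class AppendOnlyRepository:
    """Toy store in which objects are never overwritten or deleted.

    A correction is stored as a new object linked to its predecessor. Identifiers
    of the form "o:N" are hypothetical.
    """

    def __init__(self) -> None:
        self._objects: dict[str, Version] = {}

    def add(self, content: bytes, previous: Optional[str] = None) -> str:
        if previous is not None and previous not in self._objects:
            raise KeyError(f"unknown predecessor {previous}")
        object_id = f"o:{len(self._objects) + 1}"  # safe because nothing is ever removed
        self._objects[object_id] = Version(object_id, content, previous)
        return object_id

    def get(self, object_id: str) -> Version:
        return self._objects[object_id]

    def history(self, object_id: str) -> list[Version]:
        """Follow predecessor links from a version back to the original."""
        chain = []
        current: Optional[str] = object_id
        while current is not None:
            version = self._objects[current]
            chain.append(version)
            current = version.previous
        return chain
```

The class offers no update or delete method. Traceability therefore follows from the data structure itself: every earlier state stays retrievable through the predecessor chain.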
Where legally permissible, the data are freely and openly reusable in the long term (Hagmann, 2018). Thus, for the first time in the study area, the entire long-term archived dataset has been made sustainably and freely available online, as far as possible under the CC BY 4.0 license. This robust, lightweight collection of interrelated tables, archived "FAIRly," applies the principle "as little as possible, as much as necessary," preserving only what is essential for further research while avoiding unnecessary redundancy (Nicholson et al., 2023). By encouraging meaningful follow-up work based on a reuse concept, it also aims to maximize the value of this unique dataset and to foster collaborative, progressive scholarship on rural settlement in ancient Noricum. The implications of this data curation strategy may also extend beyond the immediate research context. The approach could inform similar initiatives in other disciplines, highlighting the potential for more efficient data management and for measured advances in historical and archeological research.

Figure 6
Number and temporal distribution of the features for all affected municipal areas (n = 7,694) and the AoI (n = 5,030).

7
The figure includes a total of 6,924 features: (a) AoI; (b) municipalities overlapping the AoI; (c) features from all periods; (d) Stone Age; (e) Bronze Age; (f) Iron Age; (g) Roman period; (h) Middle Ages; (i) Modern Era.

Figure 7
GIS-based visualization of the corresponding features per period across the AoI and municipal territories (see footnote 7; map: D. Hagmann 2023; data: BDA; BEV; Land NÖ).

Figure 8
Localized features (n = 6,924) per political municipality, revealing that there are no administrative units in the AoI and surrounding municipalities without archeological objects (map: D. Hagmann 2023; data: BDA; BEV; Land NÖ).

Figure 9
Localized features (n = 6,924) per cadastral municipality, indicating that despite some "background noise," archeological objects are mainly concentrated in settlement centers and further sections associated with intensive construction activity, especially freeways (map: D. Hagmann 2023; data: BDA; BEV; Land NÖ).
The FAIR principles ensure that archeological data are discoverable and usable. Data must have unique identifiers such as DOIs, comprehensive metadata, standard retrieval protocols, and clear licenses. Interoperability requires standardized formats and semantic annotations, while reusability involves detailed documentation and adherence to community standards.

The RRLN project integrates with the University of Vienna's repository PHAIDRA (Permanent Hosting, Archiving and Indexing of Digital Resources and Assets) (University of Vienna, 2008). PHAIDRA is listed in repository indices such as the Open Directory of Open Access Repositories (OpenDOAR) (Jisc, 2023) and re3data.org (German Research Centre for Geosciences et al., 2013). It is open to all academic disciplines and offers a robust system, based on the Fedora Commons framework, for storing and managing diverse file types, including texts, images, and audio files. The system employs an object-oriented data structure and a customized University of Vienna metadata schema (UWmetadata). This schema is inspired by the Dublin Core standard as initially defined by ISO (2003b) and augmented by the Learning Object Metadata (LOM) scheme as defined by the Institute of Electrical and Electronics Engineers ([IEEE], 2002). It requires several mandatory metadata fields, such as "object type," "title," "description," "keywords," and "topic terms," which use controlled vocabularies like the Österreichische Systematik der Wissenschaftszweige (ÖFOS) (Statistik Austria, 2023) or the Getty Art & Architecture Thesaurus (AAT) (Getty Research Institute, 2021). Further mandatory fields cover essential elements such as "contributor" and "license." In addition, an individually adjustable number of optional metadata fields allows a comprehensive description of the data and enhances its accessibility. PHAIDRA provides interoperability through protocols such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (2002) and offers differentiated access rights. Open access objects in PHAIDRA are easily discoverable through search engines such as the Bielefeld Academic Search Engine (BASE) (University of Bielefeld, 2004) and the Open Access Infrastructure for Research in Europe (OpenAIRE) EXPLORE infrastructure, fostering greater visibility and accessibility.
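OAI-PMH harvesting uses plain HTTP GET requests with a verb parameter and returns XML, typically carrying Dublin Core records under the oai_dc metadata prefix. The following sketch builds a ListRecords request and parses identifiers and titles from a response, using only the Python standard library. The base URL and the sample response are hypothetical placeholders, not PHAIDRA's actual endpoint or records.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text: str) -> list[dict]:
    """Extract the OAI identifier and Dublin Core titles of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        titles = [t.text or "" for t in dc.iterfind("dc:title", NS)] if dc is not None else []
        records.append({"identifier": identifier, "titles": titles})
    return records


def next_token(xml_text: str) -> Optional[str]:
    """Return the resumptionToken for the next page, or None when the list is complete."""
    token = ET.fromstring(xml_text).findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    return token or None


# Illustrative response in the OAI-PMH 2.0 structure; content is invented.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:demo-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Demo record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>demo-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch `list_records_url(base)` and parse the records. It would then keep requesting `list_records_url(base, resumption_token=token)` until `next_token` returns None.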

Table 1
Overview of the RRLN-DB collection's inner structure, long-term archived on PHAIDRA.

Table 2
List of all column header names used in the data tables.
HEADER | DESCRIPTION
BDA_Bezeichnung | Designation of the site by the Federal Monuments Office
BDA_Datierung | Dating of the site by the Federal Monuments Office
BDA_FO_Nummer | Federal Monuments Office site number
BDA_Kategorie | Category assigned to the site by the Federal Monuments Office
Bescheid | Existence of a decision by the Austrian Federal Monuments Office under monument protection law
Cluster_size | Size of the site cluster (number of find spots in the cluster)
The most prominent of the broad-scope institutional repositories hosted in Austria alongside PHAIDRA is A Resource Centre for the HumanitiEs (ARCHE), run by the Austrian Academy of Sciences (2023), which caters to the broad field of Digital Humanities.