Open-Access Archaeological Predictive Modeling Using Zonal Statistics: A Case Study from Zanzibar, Tanzania

This paper presents a case study using zonal statistical analysis for archaeological predictive modeling with open-access software and free geospatial datasets. The method is applied to the rural landscapes of Zanzibar, Tanzania on the Swahili Coast. This study used QGIS (version 3.28) to perform zonal statistical analyses of environmental datasets weighted by settlement classes digitized from a 1907 historical map, to create predictive models for settlement across the island. These models were compared against the locations of major precolonial archaeological sites on the island and site data from a random stratified archaeological survey in an environmentally diverse region of northern Zanzibar. The results show strong correspondences between larger permanent site locations and areas of high likelihood for site detection in the predictive model. Additionally, there were correspondences between areas of lower likelihood for site detection and smaller, ephemeral sites related to agricultural production in swidden field plots. These results attest to Swahili rural complexity and enable an understanding of the specific environmental affordances that structured settlement and land use over the last millennium, in ways that shaped colonial contact in rural areas and altered the sociopolitical development of Zanzibar and the East African coast. The methods described here may be applicable for researchers and heritage managers in Africa and the Global South, where funding for large-scale field projects, expensive satellite imagery, or software licensing is limited.


INTRODUCTION
The African continent is a key site for the expansion of geospatial archaeological methods. Geospatial and remote sensing approaches to archaeological research in Africa have increased in visibility and importance in the last decade, and this has coincided with calls to increase the accessibility of these methods for researchers with limited funding (Davis and Douglass 2020; Klehm and Gokee 2020). One avenue for increased geospatial accessibility has been the development of free, low-cost, and open-access tools for archaeological remote sensing and geospatial research (Casana 2020; Cerasoni et al. 2022; Davis and Sanger 2021; Fisher et al. 2021; Khalaf and Insoll 2019; Sadr 2016; Rayne et al. 2020). This paper contributes to this effort by developing an archaeological predictive model for Zanzibar, Tanzania, using free geospatial datasets and open-access software. The model was created by digitizing a historical map and performing zonal statistical analyses of the digitized settlements across weighted environmental raster images in QGIS 3.28. Summing these weighted zonal raster images produced two predictive models showing zones of higher and lower likelihood for future site detection. These models were ground-truthed with archaeological field survey data from an inland region of Zanzibar, Tanzania, and were also compared against the locations of known major precolonial sites.
Zanzibar is an island region in Tanzania that was centrally important to the sociopolitical development of the Swahili Coast and the western Indian Ocean social system over the last two millennia (Crowther et al. 2016; Fitton 2018). The island of Zanzibar is environmentally diverse, with two major ecological zones: an agriculturally fertile northwestern region with deep soils and above-ground streams, and a rocky, agriculturally marginal karstic limestone landscape in the south and east, where water does not persist above ground (Alders 2023). Archaeologists have increasingly investigated ecological relationships between Swahili people and their landscapes on Zanzibar (Faulkner et al. 2022; Fitton et al. 2023; Kotarba-Morley et al. 2022; Prendergast et al. 2017; Quintana Morales et al. 2022). This study builds on this recent research, contributing to an understanding of the environmental affordances that structured long-term settlement and social change. Modeling human-environment relationships may enable future archaeological prospection on the island, where settlement patterns are poorly understood, especially with regard to sites that do not possess standing stone architecture (Alders 2023; Fitton 2018; Horton and Clark 1985). In doing so, this research contributes to a long-standing orientation toward uncovering "hidden majorities" (Fleisher and LaViolette 1999) of Swahili non-elites, who created complex and independent rural societies beyond the boundaries of monumental stone-built towns (Kusimba et al. 2013; LaViolette and Fleisher 2018; LaViolette et al. 2023). Given its dynamic precolonial and colonial history and diverse environmental conditions, Zanzibar is a well-suited context for investigating the relationships between processes of urbanism, colonialism, and environmental factors, and as a case study for testing the suitability of an open-access method for predictive modeling.

PREDICTIVE MODELING, ZONAL STATISTICS, AND ARCHAEOLOGICAL PROSPECTION IN AFRICA AND BEYOND
Archaeological predictive modeling has continued to develop in relevance and sophistication since its inception, incorporating post-processual critiques related to environmental determinism, agency, and the interplay between data-driven and theory-laden approaches (Castiello 2022; Magnini and Bettineschi 2021; Verhagen and Whitley 2020). An assumption shared by all archaeological predictive models is that archaeological features were not randomly produced by humans in the past, but that a confluence of social and environmental factors conditioned their spatial location. Modeling the relationship between known archaeological features and their material and spatial environments can give insights into the locations of presently unknown features, aiding in archaeological prospection and survey and informing an understanding of human-environment relationships. Recent studies have evaluated the predictive power of different statistical approaches and sampling strategies (e.g., Castiello and Tonini 2021; Comer et al. 2023; Kelly et al. 2023; Yaworsky et al. 2020) and developed techniques for raster imagery analysis using machine learning and object-based imagery analysis (e.g., Magnini and Bettineschi 2021).
In African archaeology, recent studies have used a combination of remote sensing and spatial analysis for site detection, predictive modeling, and understanding archaeological landscapes (Biagetti 2017; Davis and Douglass 2021; Fitton et al. 2023; Harrower et al. 2020; Klehm et al. 2019; Ochungo et al. 2022; Pawlowicz et al. 2020; Reid 2016, 2020; Thabeng et al. 2020). Creating models for site detection with multispectral imagery is increasingly accessible, as resources like Landsat, Planet, and Sentinel-2 imagery become available at ever higher spatial, spectral, and temporal resolutions. While multispectral remote sensing is useful for site detection, its limitations in rainy tropical regions like Zanzibar include the low availability of consistently cloud-free imagery and the academic licensing required to access some commercial multispectral imagery.
To work around this problem, this paper presents a case study of an alternative method: predictive modeling through zonal statistical analyses of environmental raster images in comparison to training data, which are used to weight a final summative model (see Fitton et al. 2023 for another regional study integrating multiple geospatial and legacy datasets on Zanzibar). This method is based on identifying the most suitable spatial zones for the occurrence of specific phenomena by summing weighted raster datasets using raster calculations (e.g., Behr et al. 2017; Kuria et al. 2011). It is a predictive model that relies on environmental zonal raster images (for instance, those derived from published or archived maps) and on factors for weighting these raster images; in this case, the primary factor is the known locations and sizes of settlements on Zanzibar during the late colonial period, digitized from a historical map.
Zonal statistical analysis of training features with the Majority statistic is the method chosen to weight environmental raster images, because it is a simple, accessible, and powerful tool that is built into the functionality of open-access geospatial software like QGIS. In keeping with the theme of accessibility, zonal statistical analysis can be carried out by researchers with limited resources, little experience in computer programming languages, and limited experience with complex statistical modeling. An innovation of this paper is the use of the coefficient of variation statistic to further weight favored zones.
Sources used for zonal statistical analysis in this paper include a digital elevation model derived from free SRTM satellite imagery (D'Andrea 2008; Harrower 2010; Harrower et al. 2012; Hritz 2010), spatial-environmental datasets from published sources (Colbert et al. 1987; Hardy et al. 2015; Khamis et al. 2017), and a historical map of Zanzibar published in 1907. The historical map was georeferenced using methods developed for 19th-century Survey of India maps (Garcia et al. 2019; Green et al. 2019; Petrie et al. 2019). It contains a wealth of information about pre-modern rural Zanzibar, showcasing landscapes and features that have disappeared due to urban and agricultural development. This is the first time this map has been considered in detail, and beyond the conclusions of this paper, it is hoped that its data will be a valuable resource for archaeologists and heritage managers concerned with Zanzibar's colonial history. Heritage management and conservation on Zanzibar are constrained by limited resources, and many archaeological sites may be in danger of destruction as urban growth and agricultural development continue (Mansab 2021, interview with Mariam Mansab, director of Zanzibar's Department of Museums and Antiquities). The digitization of this map and the creation of archaeological predictive models may aid in site conservation and stewardship in areas where development is proceeding.
Workflows for historical map digitization, zonal statistical analyses, and raster calculation are modeled in QGIS 3.28, a free and open-source GIS. The archaeological predictive model was ground-truthed using a random stratified survey across an environmentally and sociopolitically diverse zone in rural northern Zanzibar (Alders 2023), and was also compared to major known precolonial sites on the island (Horton and Clark 1985; Fitton 2018).
This paper advances the development of open-source geospatial applications for archaeological prospection, especially in the tropical, forested environments of sub-Saharan Africa. In line with other recent examples, it draws on the availability of free geospatial datasets to help understand archaeological landscapes. The sections below outline the methods and results of this case study in Zanzibar, Tanzania. Results inform a discussion of the environmental affordances which structured Swahili social development over the last millennium.
Care must be taken to avoid choosing raster images whose zones co-vary significantly because of dependencies between them, since summing dependent environmental factors as raster images would overweight certain zones. For instance, two other zonal raster images were considered for use in this model: a map depicting areas inside and outside of the historical clove plantation zone (Sheriff et al. 2016: 20), and a map of soil infiltration zones (Hardy et al. 2015). These images were discarded because they co-vary with elevation and soil types in a dependent way, meaning that their inclusion would bias the model more heavily toward specific zones. Geology and soil type raster images co-vary to some extent, but both were included because they describe distinct categories: the former describes geological units, while the latter refers to indigenous Swahili topsoil categories that relate to organic composition, soil color, and soil depth.
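The dependence between candidate raster images was judged qualitatively in this study. One way to make that screen quantitative, offered here only as a hypothetical aid and not as the method used in this paper, is Cramér's V computed from the cross-tabulation of two zonal rasters sampled on the same grid. The sketch assumes both rasters have already been aligned and flattened into equal-length lists of zone codes; `cramers_v` is a hypothetical helper name.

```python
from collections import Counter
from math import sqrt


def cramers_v(zones_a, zones_b):
    """Cramér's V between two categorical rasters flattened to equal-length lists.

    Hypothetical screening helper: 0 means the zones are independent,
    1 means one raster's zones fully determine the other's.
    """
    if len(zones_a) != len(zones_b):
        raise ValueError("rasters must cover the same pixels")
    n = len(zones_a)
    joint = Counter(zip(zones_a, zones_b))  # observed cross-tabulation
    count_a = Counter(zones_a)
    count_b = Counter(zones_b)
    # Pearson chi-square statistic over every cell of the contingency table.
    chi2 = 0.0
    for a in count_a:
        for b in count_b:
            expected = count_a[a] * count_b[b] / n
            observed = joint[(a, b)]
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(count_a), len(count_b))
    if k < 2:
        return 0.0  # a raster with a single zone carries no information
    return sqrt(chi2 / (n * (k - 1)))
```

Values near 1 would flag pairs like the clove plantation zone and elevation, where one raster's zones largely predict the other's; where to draw the line between acceptable and excessive dependence remains a judgment call.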

STEP 3: PREPARING THE TRAINING FEATURES
Training features for the predictive model were derived from settlement classes digitized from a historical map of Zanzibar, since these settlement locations likely reflect environmental affordances that may have conditioned the spatial patterns of archaeological sites over the last millennium in Zanzibar. Stanford's Geographical Establishment in London published a map of Zanzibar showing villages, landforms, and other features recorded on the island during the 1890s (A Map of Zanzibar Island, 1907). Figure 2 shows this map, referred to from here on as the 1907 Zanzibar map. The inset is shown in higher resolution in Figure I in the Supplementary Materials section.
The map is not an official British Survey of India map, but the legend names the mapmaker as Imam Sherif Khan Bahadur, a surveyor of the British Survey of India. The survey that produced the map likely occurred between 1892 and 1894, when Imam Sherif Khan Bahadur was stationed in Zanzibar (National Archives of India, 1894). However, some details on the map complicate this date. Marahubi Palace (built by Sultan Barghash in 1880) is listed on the map as a ruin, suggesting that Imam Sherif Khan Bahadur or someone else surveyed that region after Marahubi Palace was destroyed by fire in 1899 (Rhodes et al. 2015: 350). Other aspects of the map point to earlier dates. Frazer's Sugar Mill and Frazer's House are not listed as ruins, even though Fitzgerald (1898) described them as such during his travels through Zanzibar in 1898.
Fitzgerald wrote that the house and sugar mill had been active around 25 years earlier, in 1873 (Fitzgerald 1898: 521-523). The depiction of Frazer's sugar mill and house as intact would seem to contradict the depiction of Marahubi as a ruin if the map represented a single snapshot in time. Imam Sherif Khan Bahadur likely did not carry out surveys prior to 1892, but these details suggest that the map was compiled from a composite of surveys that were finished in 1899 at the earliest. The map was the main cartographic source for Zanzibar prior to a more recent map created in 1984-85 (Horton, pers. comm.). However, the map has been overlooked by both historians and archaeologists of the colonial period in Zanzibar, despite the wealth of information it contains regarding settlement, land use, and geography on the island during the late 19th and early 20th centuries. Because the map was produced as a tool of British imperial dominion in Zanzibar, it records invaluable data for understanding the composition of rural areas on the island. Preserved in the map is a settlement system that was surveyed while the plantation system was still fully developed, around the same time that slavery was being abolished, from 1896 to the early 1900s (Cooper 1977: 122). It contains detailed information regarding Zanzibar's villages, wells, hydrology, roads, and other features at the end of the 19th century. As such, the map is a unique source for understanding the spatial patterns of the 19th-century plantation system, preserving many features that have since changed through urban and agricultural development in the 20th and 21st centuries.

Georeferencing and Digitization
Petrie et al. (2019), Garcia et al. (2019), and Green et al. (2019) developed methods for georeferencing and interpreting 1 inch to 1 mile Survey of India maps to glean data related to ancient settlement in northwest India, in the form of anthropogenic mounds that surveyors recorded in the late 19th and early 20th centuries. This study draws on their methods because the 1907 Zanzibar map was made in the style of British Survey of India maps. The first step was to georeference the 1907 Zanzibar map to features on a modern basemap of the island. This can be done using the Georeferencer in QGIS, with base maps loaded from the QuickMapServices plugin.
No datum or coordinate system is specified on the map itself, but it was likely created using the Everest 1830, Clarke 1866, or Clarke 1880 datum (Mugnier 2021; Petrie et al. 2019). The map has longitude and latitude graticules with specified coordinate points, which were converted into a point vector file. Although the unreferenced map aligns nearly perfectly with these points when projected in the Clarke 1880 datum, features on the map can be a kilometer or more from their actual locations when the map is georeferenced in this way. This inaccuracy may be due to mapmaker error, or to changes introduced when the map was printed in 1907 or digitized in recent years. Other Survey of India maps have also been found to be internally inconsistent in this way (Petrie et al. 2019). Given this problem, the solution, following Petrie et al. (2019), was to georeference features on the map by hand using the WGS 1984 datum and to allow the graticules to distort. Petrie et al. (2019) suggest that the "Adjust" transformation in ESRI ArcGIS may give the best possible results. A total of 111 control points were used to georeference this map with the Adjust transformation in ArcGIS; this was done before an all-open-access study was conceived. However, georeferencing in QGIS is equally capable: while the Adjust transformation is not available in QGIS 3.28, the Thin Plate Spline transformation may give comparable results, since it similarly allows rubber-sheeting with a large number of control points. After georeferencing, map features were digitized by hand as point, line, and polygon features. The following sections describe these features.

Settlements and roads
The map depicts settlements and a road network, digitized in Figure 3. There are 489 settlements, with squares drawn in varying numbers to represent village size. Imam Sherif Khan Bahadur's survey methodology is not documented, but the map suggests that two methods were used to indicate settlement size and importance. The first method was the drawing of squares, which represent settlement areas. The squares are not drawn to the scale of a Zanzibari house: at the scale of the map, the average square covers 600 to 1000 square meters, and a house this size would be a mansion. Rather than indicating the actual sizes of houses, the number of squares drawn likely reflects a surveyor's estimate of overall settlement size. The legend itself depicts a scatter of squares and labels it as a village, further suggesting that surveyors drew settlement squares to capture settlement size rather than individual households. The legend also states: "N.B.-Very few of the Miji (settlements) in Zanzibar Island are compact villages, the houses are rather scattered over each district". This remains the case today in many areas.
The second method for distinguishing settlement types is typographical. Most settlements, as well as other place names, are labeled in italic Latin letters with a capitalized first letter followed by lowercase letters. Larger towns are labeled in bold, non-italic letters, also with a capitalized first letter followed by lowercase letters. Finally, bold, non-italic, all-capital letters are used for Zanzibar Stone Town, the largest settlement on the map. The inset in Figure 2 shows an example of all three types: Zanzibar Stone Town is a city, Mtoni is a town, and Gulioni, Mianzini, and Miwaleni are three of many villages.
While the map legend describes the squares as representing villages, their placement and count, in conjunction with typographical differences, can be interpreted as a settlement hierarchy. The number of squares drawn for each village appears to be significant and to distinguish three sizes of village: 1) hamlets or very small villages, 2) small villages, and 3) large villages. Combining these square counts with the typographical distinctions for towns and for Zanzibar Stone Town, settlements were divided into five size classes. A sample of each size class was measured in area to convert square counts into estimated average settlement sizes in hectares. Table 3 shows these estimated size classes.
This method of distinguishing settlements by size is an imposition for the sake of regional analysis. A contrasting perspective comes from mid-20th-century ethnographies of the Swahili, which divided permanent settlements into "stone-towns" and "country-towns", irrespective of size. These towns, though differentiated by their degree of political and economic specialization, functioned similarly as places that were the basis of social rights for their residents. They were also characterized by different forms of production and trade, with stone-towns emphasizing mercantile activity and country-towns emphasizing agricultural production (Horton and Middleton 2000: 55-58). Since the legend of the 1907 map specifically refers to villages as miji, the surveyors likely had some familiarity with an idealized Swahili system of land tenure. Nevertheless, the size-based and typographical differences visible on the map attest to the material differences between settlements that the mapmakers encountered and adapted to as they produced their survey.
In addition to the settlements, the map depicts a network of roads, paths, tracks, and other ambiguous dotted lines. The longest paved or "metalled" road at this time ran from Zanzibar Stone Town to Chwaka, connecting the west and east of the island. This road was under construction in the early 1890s (Owens 2007), and its presence on the map may indicate that it had just been finished when the map was completed. Surveyors recorded two other paved roads on the map: one ran north from Stone Town to Bububu, and the other ran south from Stone Town to Mbweni.
Other roads are those which the legend calls "village roads"; these are likely dirt paths for foot traffic that may also have been accessible to mules, camels, horses, and carts. A third category is called "Other Roads and Tracks". In practice, these are likely not qualitatively distinguishable from village roads, in that they were also dirt footpaths, though possibly smaller and less frequently used.

Streams, Wells, and Other Miscellaneous features
[...] letters by their Swahili names. This stream map likely represents the oldest detailed record of hydrology on Zanzibar, predating many landscape transformations that occurred during the latter half of the 20th century. The map shows that major streams did not flow in the south, east, and far north of the island, where porous limestone bedrock draws water underground. It also shows that the courses of the larger streams of the early 20th century differed slightly from those of comparable streams today. This may be due to variations in local geology, urban development, or changes in landforms that have altered stream courses since the early 20th century.
Wells are indicated on the map by circles. Their concentration in karstic limestone areas far from the streams of the northwest region is the inverse of the stream network: wells are most common where streams are not shown above ground. The lack of wells in places with above-ground streams suggests that people in the early 20th century relied considerably on stream water for daily use where it could be found, and dug wells where stream water was not available. Six wells in the south are marked as either "Cave Wells" or "C.W.", which likely also stands for cave well. One of these cave wells is Kuumbi Cave, a well-known site with occupation spanning the late Pleistocene to the late Holocene (Shipton et al. 2016). The other cave wells recorded on this map may point to additional cave site locations.
Other miscellaneous features on the 1907 Zanzibar map include lighthouses, "poor houses", a leper colony, a sanatorium, ferries, sugar mills, ruins, and buildings with steepled roofs that may represent mosques or large houses. Two other features are places labeled the Mwana Msa Shrine and the Kuani House. Finally, dotted lines forming small circles are not described in the legend but appear to correspond to some labeled settlements in the south and east of the island. It is unclear what these circles represent; further research might investigate whether they correspond to abandoned settlements, boundary markers, or raised areas. Comparisons of these circles with satellite imagery are inconclusive: some circles fall over modern field plots or settlements, but others fall in areas that are today covered in brush. Horton and Middleton (2000: 56) describe historical areas of built-up soil in the south and east where village communities repeatedly constructed and demolished earth-and-thatch houses; these circular features could represent the mounds created by this practice. Further research and ground-truthing in the southern region of Zanzibar might clarify this question. As with Survey of India maps in India (Green et al. 2019; Petrie et al. 2019), surveyors may have unknowingly mapped archaeological sites by recording mounds as landscape features.
Other archaeologically significant features are the five places listed as ruins on the map. Two of these are the known Portuguese-period sites of Fukuchani and Mvuleni, which LaViolette and Norman (2023) have recently investigated. The other three are the Chimani Ruin, the Kizimbani Ruin, and the Marseilles Ruin, all located just northeast of Stone Town near Mwera. The Marseilles Ruin may be the site of the Marseilles plantation, the site of a battle in which Barghash bin Said surrendered to his brother, Sultan Sayyid Majid, in 1859 after an abortive attempt to seize the throne of Zanzibar (Ruete 1888: 107). Further investigation is needed to determine whether any of these ruins still exist today.

STEPS 4-7: CONVERTING SETTLEMENT CLASSES TO TRAINING FEATURES, CALCULATING ZONAL STATISTICS, AND DETERMINING WEIGHT CLASSES
The digitization process outlined above produced five settlement classes from the 1907 map: 1) hamlets/very small villages, 2) small villages, 3) large villages, 4) towns, and 5) the main urban center, Zanzibar Stone Town. To analyze these settlements in relation to raster data, they were converted into polygons that reflect their area. Buffer polygons for each class were created, encompassing the average area of each class. To simplify analyses, these buffer polygons were then merged into two settlement class groups: small settlements (n = 379) and large settlements (n = 110). Small settlements comprised hamlets/very small villages and small villages. Large settlements comprised large villages, towns, and Zanzibar Stone Town. These two polygon vector files constituted the training data for two distinct models.
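If each settlement is digitized as a point and buffered with a circle (an assumption; the buffer geometry is not specified here), the buffer radius that reproduces a class's average area follows from A = πr². The sketch below converts hectares to square meters first; `buffer_radius_m` is a hypothetical name, and the class areas themselves are those reported in Table 3.

```python
from math import pi, sqrt


def buffer_radius_m(area_ha):
    """Radius in meters of a circle whose area equals area_ha hectares.

    Assumes circular buffers around digitized settlement points.
    """
    area_m2 = area_ha * 10_000  # 1 ha = 10,000 square meters
    return sqrt(area_m2 / pi)
```

For example, a class averaging 1 ha would need a radius of about 56.4 m, which could be entered as the distance in the QGIS Buffer tool when the layer is in a projected coordinate system measured in meters.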
The Majority statistic was calculated across these two training feature classes for each zonal raster image in QGIS using the Zonal Statistics process.This statistic reports which unique pixel value (reflecting a zone) is most numerous within the space of a settlement polygon; summarizing this statistic using the Statistics by Categories process gives a count of settlement polygons per zone for each raster image.Table 4 through Table 11 display this data for each raster image and include statistics that were used to weight zones with the densest count of settlement polygons.
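The logic of the Majority statistic and the subsequent per-zone tally can be sketched in plain Python, assuming the zone codes of the pixels falling inside each settlement polygon have already been extracted. This illustrates the idea rather than QGIS's implementation; in particular, the tie-breaking rule (the smallest zone code wins) and the helper names are assumptions.

```python
from collections import Counter


def majority(pixel_values):
    """Most frequent zone code among the pixels inside one polygon.

    Ties are broken by the smallest zone code (an assumption that may
    not match the rule used by QGIS).
    """
    counts = Counter(pixel_values)
    top = max(counts.values())
    return min(zone for zone, c in counts.items() if c == top)


def polygons_per_zone(polygon_pixels):
    """Count settlement polygons by their majority zone.

    polygon_pixels maps a polygon id to the zone codes of its pixels;
    the result is analogous to summarizing with Statistics by Categories.
    """
    return Counter(majority(pixels) for pixels in polygon_pixels.values())
```

For instance, `polygons_per_zone({"a": [1, 1, 2], "b": [2, 2, 3]})` assigns one polygon each to zones 1 and 2.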
Density was calculated by dividing the count of settlement polygons per zone by the area of each zone, to derive a count of settlement polygons per square kilometer. The zone with the highest density of settlements per raster image was selected for weighting in the final predictive model. Density is a better measure of zone favorability than a simple count of settlement polygons per zone, since zones within a raster image can vary significantly in size. To compare the evenness or unevenness of the distribution of settlements across zones, the coefficient of variation (CV) was also calculated by dividing the standard deviation of settlement polygons per zone by the mean of settlement polygons per zone.
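A minimal sketch of the density and CV calculations follows, assuming polygon counts per zone and zone areas in square kilometers are already tabulated. Two choices are assumptions rather than details stated in the text: zones containing no settlement polygons enter the CV as zero counts, and the population (rather than sample) standard deviation is used. The counts and areas in the usage lines are hypothetical.

```python
from statistics import mean, pstdev


def zone_density(counts, areas_km2):
    """Settlement polygons per square kilometer for every zone."""
    return {zone: counts.get(zone, 0) / areas_km2[zone] for zone in areas_km2}


def most_dense_zone(counts, areas_km2):
    """Zone with the highest settlement density (the favored zone)."""
    density = zone_density(counts, areas_km2)
    return max(density, key=density.get)


def coefficient_of_variation(counts, zones):
    """Standard deviation / mean of polygon counts across zones.

    Empty zones count as 0; the population standard deviation is an assumption.
    """
    values = [counts.get(zone, 0) for zone in zones]
    return pstdev(values) / mean(values)


counts = {"A": 30, "B": 10}      # hypothetical polygon counts per zone
areas = {"A": 100.0, "B": 10.0}  # hypothetical zone areas (km²)
print(most_dense_zone(counts, areas))           # B
print(coefficient_of_variation(counts, areas))  # 0.5
```

In this toy example zone A holds more polygons, but zone B is more densely settled, which is why density rather than raw count picks the favored zone.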
The following sections show settlement class distributions across the eight zonal raster images. Each table is divided into two groups, showing the distribution of small settlements (n = 379) and large settlements (n = 110) across each zonal raster image. Each table also lists the zone with the highest density of settlements and the coefficient of variation for each settlement group. Maps of each zonal raster image are available as supplementary materials.

Zonal statistics across aspect zones
A zonal raster image for aspect, derived from free 30 m SRTM imagery from USGS, shows hillslope orientation. Table 4 shows settlement classes from the 1907 map across these aspect zones. East- and west-facing slopes have the highest site density as well as the highest site counts for both classes. No other patterns are apparent. See Figure A in the Supplementary Materials section for a map of this raster image.
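The number and boundaries of the aspect zones are not stated here. Assuming eight 45° compass sectors, with north spanning 337.5° to 22.5°, and assuming flat cells are coded as negative values, an aspect value in degrees could be binned as follows; `aspect_zone` and the sector labels are hypothetical.

```python
# Hypothetical eight-sector scheme, clockwise from north.
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def aspect_zone(degrees):
    """Map an aspect value (0-360°, clockwise from north) to a compass sector.

    Negative values are treated as flat cells (an assumed convention).
    """
    if degrees < 0:
        return "flat"
    # Shift by half a sector so that N covers 337.5° to 22.5°.
    index = int(((degrees + 22.5) % 360) // 45)
    return SECTORS[index]
```

With this scheme, `aspect_zone(90)` returns `"E"` and `aspect_zone(350)` returns `"N"`.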

Zonal statistics across elevation zones
Table 5 shows settlement classes from the 1907 map distributed across five elevation zones, categorized using a Natural Breaks (Jenks) algorithm on a digital elevation model from SRTM 30 m imagery. Higher elevation zones are favored. The 69-135 m zone is most densely settled for small settlements, while the 48-69 m zone is most densely settled for large settlements. See Figure B in the Supplementary Materials section for a map of this raster image.
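The elevation zones were classified with a Natural Breaks (Jenks) algorithm. The sketch below is an independent illustration of the optimization behind Jenks-style classification, not QGIS's implementation: an exact dynamic program that splits sorted values into k contiguous classes with the smallest total within-class sum of squared deviations. Because it runs in O(k·n²) time, it suits a sample of DEM values rather than every pixel; `jenks_breaks` is a hypothetical name.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class in an optimal natural-breaks partition.

    Minimizes the total within-class sum of squared deviations (SSD)
    over all ways of splitting the sorted values into contiguous classes.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the SSD of any slice data[i:j] in constant time.
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest SSD for splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i
    # Walk back through the stored split points to recover class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]
```

For instance, `jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3)` returns `[3, 12, 52]`, one upper bound per cluster.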

Zonal statistics across geology zones
Seven geology zones exist on Zanzibar (Colbert et al. 1987; see also Hardy et al. 2015). These are catenas of M3 sandy clay marl, Q2 coralline limestone, M1 Miocene limestone, a Q2/M1 mixture, Q1 recent deposits, a Q2/Q3/M1 mixture, and mangrove zones with no data. Table 6 shows settlement classes from the 1907 map across these zones. Settlements of both classes are most numerous and most dense in areas of M3 sandy clay marl. See Figure C in the Supplementary Materials section for a map of this raster image.

Zonal statistics across rainfall zones
Rainfall zones on Zanzibar (Colbert et al. 1987) are divided into three zones: 1000-1500 mm, 1500-2000 mm, and 2000-2500 mm of rainfall per year.Table 7 shows the zonal statistics for settlement classes across these zones.Large settlements are most dense in the

Zonal statistics across reef distance buffer zones
Reefs (Khamis et al. 2017: 120) were buffered by distance to create a zonal raster image for the island. Table 8 shows the zonal statistics for settlement classes across these zones. Small settlements were densest within 500 meters of reefs, while large settlements were densest within 3 kilometers. See Figure E in the Supplementary Materials section for a map of this raster image.

Zonal statistics across slope degree zones
Slope degree zones, also derived from 30 m SRTM imagery from USGS, fall into three categories: 0-3-degree slope, 3-10-degree slope, and >10-degree slope. Table 9 shows settlement classes across these zones. No settlement falls on a slope of 10 degrees or more, and both settlement classes are most dense in 0-3-degree slope areas. See Figure F in the Supplementary Materials section for a map of this raster image.
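Classifying slope, and likewise the reef distance buffers, amounts to reclassifying a continuous value by ascending thresholds. The sketch below uses Python's standard `bisect` module; which class receives values falling exactly on a threshold (here, the lower class) is an assumption, and `reclassify` is a hypothetical helper.

```python
from bisect import bisect_left


def reclassify(value, upper_bounds, labels):
    """Label of the first class whose upper bound is >= value.

    upper_bounds must be ascending; the last label catches larger values.
    Values exactly on a bound go to the lower class (an assumption).
    """
    if len(labels) != len(upper_bounds) + 1:
        raise ValueError("need one more label than upper bounds")
    return labels[bisect_left(upper_bounds, value)]


SLOPE_BOUNDS = [3, 10]                 # degrees
SLOPE_LABELS = ["0-3", "3-10", ">10"]
```

For the reef zones, the same helper could be called with upper bounds in meters such as 500 and 3000; any further breaks would have to come from the actual buffer raster.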

Weighting Zonal Raster Images
The next step was to weight zonal raster images based on the training features. For each zonal raster image, the zone in which the settlement class training data is most dense was selected as a factor, and each factor was then weighted based on the coefficient of variation (CV). For each settlement class, the standard deviation of the CV values was calculated, and a CV range was created by adding and subtracting this standard deviation from the mean CV. This range was then divided into equal intervals, and each interval threshold was assigned a weight class. This created an internally consistent set of weight class thresholds for each training feature dataset. The CV for each zonal raster image was then assigned a score based on these weight class thresholds.
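The thresholding procedure can be sketched as follows. The number of weight classes, the use of the population standard deviation, and the rule that higher CVs receive higher weights are assumptions (the last is consistent with the aspect example, in which a CV of 0.56 fell into class 3); the function names are hypothetical.

```python
from statistics import mean, pstdev


def weight_thresholds(cvs, n_classes):
    """Equal-interval thresholds spanning mean(CV) plus or minus one SD.

    Returns the lower edges of classes 2..n_classes.
    """
    m, sd = mean(cvs), pstdev(cvs)
    low, high = m - sd, m + sd
    step = (high - low) / n_classes
    return [low + step * i for i in range(1, n_classes)]


def weight_class(cv, thresholds):
    """1-based weight class; CVs outside the range fall into the end classes."""
    return 1 + sum(cv >= t for t in thresholds)
```

Because assignment is by thresholds, a CV that clears a threshold by a small margin still jumps to the next class; this is the edge effect that motivated the knowledge-based adjustment for the aspect raster.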
One knowledge-based adjustment was made: the weight class for large settlements across aspect zones was reduced from 3 to 2, for three reasons. First, the CV value is 0.56, only 0.01 points into weight class 3, producing an edge effect. Second, visual inspection of both settlement classes across aspect zones confirms that the distribution mostly reflects the fact that east- and west-facing slopes are the most numerous on the island, because Zanzibar's hill system runs like a spine from south to north. Settlement choices in the past were therefore likely not strongly influenced by aspect, but rather fell in a relatively random distribution with regard to hillslope orientation. This is because hillslopes on Zanzibar are mild, and the equatorial sun makes slope orientation less of a factor for agricultural production than in climates closer to the poles. Third, at the 30 m resolution of the aspect raster image, nearly all large settlement training features contain multiple aspect zones, so the zonal statistics for each training feature can vary considerably with very slight adjustments in settlement placement by the manual digitizer. This third problem did not arise for any other zonal raster image and was less pronounced for the small settlement class, since those settlements tended to encompass far fewer aspect zones. To compensate for these factors and to minimize the impact of the aspect raster image on weighting the large settlement zones, the aspect weight class for large settlements was reduced from 3 to 2, while the most favored zone chosen by the Majority statistic was retained.
Tables 12 and 13 depict the weight class thresholds for each model. Tables 14 and 15 depict the weighted scores for each zonal raster image for both settlement classes, based on the zonal statistics above, and also summarize the most favored zones for both settlement classes across all zonal raster images. For small settlement classes, this produced Model A; for large settlement classes, this produced Model B.
The eight zonal raster images were reclassified to reflect weight classes using the Reclassify by Table process. Pixels in each favored zone were assigned a value equal to that zone's weight class, and all other pixels were assigned a value of zero. This was done twice, once for each model.
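Reclassifying a zonal raster so that only its favored zone carries a weight can be sketched with `numpy.where`. The function name, the integer output type, and the example raster are illustrative choices. In QGIS the same result comes from the Reclassify by Table process described above.

```python
import numpy as np

def weight_favored_zone(zones, favored_zone, weight):
    """Favored-zone pixels take the weight class value; all others become 0."""
    return np.where(zones == favored_zone, weight, 0).astype(np.int16)

# Hypothetical slope raster (zones 1-3) where zone 1 is favored with weight 3.
slope = np.array([[1, 1, 2],
                  [1, 2, 3]])
print(weight_favored_zone(slope, favored_zone=1, weight=3))
```

Setting non-favored pixels to 0 rather than no-data matters: it keeps every pixel defined, so the later cell-by-cell sum does not drop pixels that are favored in some rasters but not others.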

STEPS 8-9: PRODUCING THE PREDICTIVE MODEL
The two sets of weighted raster images were summed using the Raster Calculator process in QGIS to produce two archaeological predictive models, Model A (Figure 5) and Model B (Figure 6). The summed raster images contained values from 1-22 for the small settlement model and 1-25 for the large settlement model; both were reclassified into five zones of site detection probability using a Natural Breaks (Jenks) algorithm: Very Low, Low, Medium, High, and Very High (see Diwan 2020: 152). Since QGIS 3.28 does not yet support this reclassification algorithm for raster images, the workaround was to convert the raster images into polygons, re-symbolize the polygon files with graduated symbols using Natural Breaks (Jenks), and then manually reclassify the original raster images with the Reclassify by Table tool, using the break values generated from the polygon symbology.
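Steps 8-9 can be sketched outside QGIS as a cell-by-cell sum followed by a natural-breaks classification of the summed values. The `jenks_breaks` function below is an exact dynamic-programming formulation that minimizes total within-class squared deviation, the criterion behind Jenks natural breaks. It is an illustrative substitute for the polygon-symbology workaround, not the tool used in this study. Because it weights each distinct value by its pixel count, while the QGIS workaround classified polygon values, its break values may differ from those used for Models A and B. Working on distinct values keeps it fast for integer scores such as 1-25.

```python
import numpy as np

ZONE_LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]

def sum_weighted_rasters(weighted):
    """Cell-by-cell sum of equally shaped weighted zonal rasters."""
    return np.sum(np.stack(weighted), axis=0)

def jenks_breaks(values, n_classes):
    """Upper bounds of n_classes classes minimizing total within-class
    squared deviation, computed over distinct values weighted by count."""
    vals, counts = np.unique(np.asarray(values, dtype=float).ravel(), return_counts=True)
    n = len(vals)
    if n_classes > n:
        raise ValueError("more classes than distinct values")
    # Prefix sums give each candidate class's squared deviation in O(1).
    w = np.concatenate([[0.0], np.cumsum(counts)])
    s = np.concatenate([[0.0], np.cumsum(counts * vals)])
    s2 = np.concatenate([[0.0], np.cumsum(counts * vals ** 2)])

    def sdcm(i, j):  # squared deviation of the class holding vals[i:j]
        return (s2[j] - s2[i]) - (s[j] - s[i]) ** 2 / (w[j] - w[i])

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):          # number of classes used
        for j in range(k, n + 1):              # first j distinct values covered
            for i in range(k - 1, j):          # last class holds vals[i:j]
                c = cost[k - 1][i] + sdcm(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    # Walk the back-pointers to recover each class's upper bound.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(float(vals[j - 1]))
        j = back[k][j]
    return bounds[::-1]

def classify(summed, bounds):
    """Zone 1..len(bounds): first class whose upper bound is >= the value."""
    return np.searchsorted(bounds, summed, side="left") + 1

# Toy example: two weighted rasters summed, then split into five zones.
a = np.array([[1, 2, 4, 7, 10], [0, 3, 5, 8, 10]])
b = np.array([[0, 3, 6, 8, 10], [1, 2, 5, 7, 10]])
summed = sum_weighted_rasters([a, b])
bounds = jenks_breaks(summed, n_classes=5)
zones = classify(summed, bounds)
print(bounds)
print([[ZONE_LABELS[z - 1] for z in row] for row in zones.tolist()])
```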

RESULTS
Summing weighted zonal raster images from the two settlement classes produced two raster images that reflect site detection probability zones: Model A (Figure 5, from small settlement classes) and Model B (Figure 6, from large settlement classes). The models predict site locations based on the density of training features within specific environmental zones. Both training feature classes were strongly associated with flat, level ground (0-3-degree slope), areas within 500 meters of above-ground streams, and M3 Sandy Clay Marl geology zones. Both were also associated with east and west aspect zones, but as discussed in the prior section, this was likely not a strong factor influencing settlement location. However, smaller settlement classes were more strongly associated with zones related to agricultural production and subsistence: higher elevation zones (preferential for clove plantations, see Sheriff et al. 2016), high rainfall areas (2000-2500 mm), kinongo soils (deep soils favorable for subsistence agriculture and for earth and thatch house construction), and areas within 500 meters of offshore reefs. In contrast, larger settlement classes were associated with lower rainfall areas, lower elevation zones, sandy mchanga soils, and areas farther from offshore reefs. The contrast in reef distance is noteworthy: Swahili people in small rural settlements may have preferred to live near reefs because they provided opportunities for subsistence fishing, while people in larger settlements may have preferred open seas without reefs. Reefs would have impeded the movement of larger ships and boats that brought trade to port towns like Mkokotoni, Tumbatu, and Zanzibar Stone Town, with their deep-water ports and good anchorage (Fitton 2018). The lack of easily accessible reefs for small-scale fishing around Mkokotoni, Tumbatu, and Zanzibar Stone Town may not only have facilitated the arrival of larger boats bringing trade, but may also have stimulated communities in these places to develop larger-scale fishing operations, necessitating greater social cooperation and coordination.
Having considered the environmental factors that structure Models A and B, the next section ground-truths the models by comparing them to the locations of major precolonial sites of the late first and early second millennium on Zanzibar (Fitton 2018; Horton and Clark 1985), and to a dataset of sites recovered during a systematic survey across multiple environmental zones in northern Zanzibar in 2019 (Alders 2023). These surveys identified and recorded 44 new archaeological sites, 31 of which were found in the course of a systematic random stratified sample of a 32 km² region.

COMPARING THE MODELS TO MAJOR KNOWN PRECOLONIAL SWAHILI SITES
Tables 16 and 17 show zonal statistics for known major precolonial sites in relation to the site detection probability zones of the two predictive models, and Figures 7 and 8 show the spatial distribution of these sites in relation to the two models. Six out of eight precolonial sites fall within the High or Very High site detection probability zones of Model B. That Model B predicts these precolonial site locations well attests to similarities in environmental favorability between precolonial site locations and the larger settlements that persisted and grew on Zanzibar during the 19th century, which became the training data for this model. The two outliers are Kuumbi Cave and Tumbatu, which lie in Low and Medium probability zones, respectively. Kuumbi Cave is a famous precolonial site on the island, but it was not significantly inhabited by Swahili people. Rather, it is best known as one of a handful of late Pleistocene hunter-gatherer settlements on the East African coast (Shipton et al. 2016). For its hunter-gatherer occupants, agricultural suitability was not a concern, and the site was likely favored for the natural shelter that the cave provided. Similarly, the 11th-15th-century town of Tumbatu was not founded as an agricultural center but was established by nascent Swahili elites around the 11th century, who may have sought seclusion, security, and access to shipping routes rather than agricultural suitability. The town's residents relied on support from the residents of Mkokotoni across the channel, who may have continually ferried food and water to the site (Rødland 2021: 254). Given this interdependence, Rødland (2021) has argued that Tumbatu and Mkokotoni formed a single urban landscape. That Model B does not predict Tumbatu's location well was therefore to be expected, given the specific history of the town.
Model A does not predict the locations of major precolonial sites as well, with four sites falling into Very Low or Low zones; however, the Very High zone still has the highest density of sites because it is the smallest zone. The poorer performance of Model A indicates that the environmental affordances that structured the small-scale 19th-century settlements used as training data for this model differed slightly from those that structured major precolonial sites.
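How a zone can hold few sites yet have the highest site density follows from dividing site counts by zone area. A minimal sketch, assuming sites have already been converted to (row, column) pixel indices of the probability raster. The raster, site positions, and one-square-kilometre cell area below are toy values, not the study data.

```python
import numpy as np

LABELS = {1: "Very Low", 2: "Low", 3: "Medium", 4: "High", 5: "Very High"}

def sites_per_zone(prob_zones, site_rc, cell_area_km2):
    """Return {label: (site_count, zone_area_km2, sites_per_km2)}.

    prob_zones: 2-D array of probability zone codes 1-5.
    site_rc: iterable of (row, col) pixel indices of site locations.
    cell_area_km2: area of one pixel (0.0009 for a 30 m cell).
    """
    counts = {z: 0 for z in LABELS}
    for r, c in site_rc:
        counts[int(prob_zones[r, c])] += 1
    result = {}
    for z, label in LABELS.items():
        area = (prob_zones == z).sum() * cell_area_km2
        density = counts[z] / area if area > 0 else float("nan")
        result[label] = (counts[z], area, density)
    return result

# Toy raster: a large Very Low zone and a one-cell Very High zone.
prob = np.array([[1, 1, 1, 1],
                 [1, 1, 1, 1],
                 [2, 2, 5, 3],
                 [4, 4, 4, 4]])
sites = [(0, 0), (0, 1), (2, 2)]
res = sites_per_zone(prob, sites, cell_area_km2=1.0)
print(res["Very Low"], res["Very High"])
```

In the toy case the Very Low zone holds twice as many sites, but the Very High zone's small area gives it four times the density.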

COMPARING THE MODELS TO 2019 FIELD SURVEY DATA
Tables 18 and 19 show zonal statistics for the sites recorded during field surveys (see Alders 2022; 2023) in relation to the two predictive models. The tables are stratified by site type, with artifact scatters in fields in the top rows and permanent, village-sized sites in the bottom rows. Figures 9 and 10 show the spatial distribution of these sites across both models.
Model A predicted the locations of larger, permanently occupied sites in rural inland Zanzibar, with seven out of nine sites falling within High and Very High site detection probability zones. This reflects similarities between the environmental affordances that structured the small settlement classes on the 1907 map of Zanzibar and those of the larger, village-sized sites recovered archaeologically during survey. For the 19th-century sites recovered, this was expected; however, the model's ability to predict the location of a precolonial village site also suggests that it reflects environmental factors that conditioned small-scale settlement for many centuries in rural Zanzibar.
Model A failed to predict the locations of smaller artifact scatters in fields. These smaller sites represent ephemeral camps or field houses that were occupied during seasonal agricultural labor (see also Walshaw 2015), especially in the eastern region, where stony landscapes and a lack of fresh surface water prohibit larger settlements in many areas. In these regions, farmers today bring food and water to swidden field plots and camp for several days during clearing and planting. Ceramic scatters and shell piles in these same fields, dating to the 11th century at the earliest, likely attest to similar land use patterns in the past (Alders 2022: 118-126). In contrast, larger sites in other parts of the survey region likely reflect more permanent occupations, ranging from small hamlets to plantation estates to the large, dispersed village or town of Chaani, which spanned at least 60 hectares by the 19th century (Alders 2023). The failure of this model to predict these smaller ephemeral camp sites reflects the fact that it was trained with permanent settlement classes, the smallest of which (Hamlets/Very Small Villages on the 1907 map) was still larger than the ephemeral camp sites that surveys recorded.
Model B was less successful at predicting the locations of small-scale sites recovered through field survey: Low zones had the highest density of small artifact scatters, and Medium zones the highest density of larger permanent sites. This result suggests that Model B, which was trained using large settlement classes from the 1907 map, reflects slightly different environmental affordances that did not apply to small-scale settlement in rural inland Zanzibar. Environmental conditions that influenced the locations of larger settlements were less constraining to small-scale communities in rural areas.

ASSESSING FALSE POSITIVES
A consideration for both models is the extent to which zones of High and Very High probability for site detection return false positives. For known major precolonial sites this was not possible to assess, since these site locations come from disparate sources and were not the result of a survey sample. For the 2019 field survey sites, Figures 9 and 10 show survey transects across and within the site detection zones of Models A and B, which give an indication of false positive results for each model. Though Model A's Very High zone was a better predictor of permanent sites recovered during survey than Model B's, it also created High and Very High site detection zones across three transects that did not produce any permanent sites, one of which did not produce any sites at all. Model B's High and Very High site detection zones were less successful at predicting all site locations, but the model also has fewer false positives in Very High site detection zones. Model B's Very High site detection zone was smaller, but still included the precolonial village site and the largest site in the survey region, the ~60 ha dispersed village of Chaani. The false positives in Model A in particular show the limitations of this model at the scale of transect survey, but do not detract from the larger regional implications of the study.
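The false-positive check described above can be expressed as a simple filter over transects. This is a sketch with a hypothetical `Transect` record that summarizes, for each transect, how much High or Very High zone it crosses and what it yielded. The transect names and counts below are invented for illustration and do not reproduce the 2019 survey results.

```python
from dataclasses import dataclass

@dataclass
class Transect:
    name: str              # hypothetical transect identifier
    high_zone_cells: int   # cells of High or Very High zone the transect crosses
    permanent_sites: int   # permanent (village-sized) sites recorded
    all_sites: int         # all sites recorded, including artifact scatters

def flag_false_positives(transects):
    """Among transects crossing High/Very High zones, list those with no
    permanent site and those with no site of any kind."""
    crossing = [t for t in transects if t.high_zone_cells > 0]
    no_permanent = [t.name for t in crossing if t.permanent_sites == 0]
    no_sites = [t.name for t in crossing if t.all_sites == 0]
    return no_permanent, no_sites

# Invented example data.
transects = [
    Transect("T1", high_zone_cells=12, permanent_sites=0, all_sites=2),
    Transect("T2", high_zone_cells=5, permanent_sites=0, all_sites=0),
    Transect("T3", high_zone_cells=20, permanent_sites=1, all_sites=3),
    Transect("T4", high_zone_cells=0, permanent_sites=0, all_sites=0),
]
print(flag_false_positives(transects))
```

Transects that never enter a High or Very High zone (T4 here) are excluded, because an empty transect in a low-probability zone is consistent with the model rather than a false positive.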

DISCUSSION
Model A was successful at predicting the locations of smaller, permanent village sites in rural inland Zanzibar, while Model B was more successful at predicting the locations of major precolonial Swahili sites, especially in coastal areas. This section considers the utility of these models for future site detection and reflects on the environmental affordances that might have conditioned Swahili settlement over time at different scales.
Model A can be used to predict other permanent precolonial and colonial period village sites in rural inland areas. Another precolonial village site, Mwanakombo, was also discovered in 2019 during field surveys but was not included in this analysis because it was not recorded during systematic surveys (Alders 2023); nevertheless, this site also falls within the High and Very High site detection zones of Model A. Precolonial village sites like these took advantage of kinongo soils for farming and making earth and thatch houses, proximity to streams, and high rainfall. On the other hand, the small ephemeral sites outside the predictive zones in Model A attest to the creative forms of land use that Swahili communities have employed for centuries in environmentally marginal landscapes. Though Swahili people did not settle permanently in these zones, they nevertheless transformed and occupied these landscapes through seasonal, incremental processes, digging into coralline limestone bedrock to plant and building field walls from limestone cobbles (Alders 2022: 123-125). Although Swahili communities favored specific environmental zones for permanent settlement, they were not constrained from using and moving through less favorable zones on the island. Ecological affordances structured, but did not determine, long-term land use in rural inland areas.
Model B fits the known locations of major precolonial sites better than Model A does. In addition to predicting the locations of Stone Town, Mkokotoni, Shangani, and Fukuchani in the northwest, the model identifies small strips of coastline in the south of the island as areas of High and Very High probability for site detection, and these locations line up well with the large precolonial port of Unguja Ukuu and the precolonial town of Kizimkazi, which hosts the oldest mosque in East Africa (Kleppe 2001). Further surveys in the High and Very High zones of Model B would likely reveal other important precolonial sites on the island. Areas for future surveys might include the southeastern coast near Paje, the southwest coast across the bay from Unguja Ukuu, the western peninsula south of Zanzibar Stone Town, the northwest coast, and many inland areas north of Zanzibar Stone Town. The inland region north of this urban center in particular likely contains a number of precolonial village sites that would help clarify urban-rural interactions. These sites may be under threat of destruction from growing agricultural and urban development.
The predictive models produced here are useful tools for archaeological prospection, but they also inform a long-term understanding of urban and rural settlement development on the East African Swahili Coast over the last millennium. Research on the Swahili Coast has definitively revealed the scale, complexity, and interconnectedness of non-elite, rural settlement (Kusimba et al. 2013; LaViolette and Fleisher 2018). Increasingly, archaeologists have sought to investigate the environmental dynamics of Swahili settlement landscapes (Faulkner et al. 2022; Fitton et al. 2023; Kotarba-Morley et al. 2022; Pawlowicz et al. 2014; Prendergast et al. 2017; Quintana Morales et al. 2022; Walshaw and Stoetzel 2018). This paper contributes to this growing body of research by modeling and testing the environmental affordances that influenced regional settlement trends. That Unguja Ukuu and Kizimkazi developed within the small parts of southern Zanzibar identified by Model B attests to the significance of local environmental conditions for the development of Swahili settlements. Even the largest and wealthiest Swahili towns on Zanzibar developed in places where early Swahili communities capitalized on environmentally suitable zones for farming, fishing, house building, and procuring water. These zones continued to influence settlement trends into the colonial era, when Omani planters settled rural inland landscapes with enslaved retinues and sought to produce cloves and other products for international markets.
Environmental factors were important, but Swahili people on Zanzibar were not constrained by them. In the case of Tumbatu, they settled on a rocky, agriculturally marginal offshore island with little water and may have relied on social networks to provision the town from the more agriculturally suitable territories around Mkokotoni. Also, as demonstrated by field surveys, small-scale Swahili communities farmed and camped in environmentally marginal zones in the rocky eastern region, though they did not settle there permanently. Nevertheless, ecological factors certainly influenced Swahili settlement trends over time, and the predictive models produced here help contextualize the material affordances that Swahili people dealt with, mobilized, and capitalized on over the last millennium.

CONCLUSION
A comparison with ground-truthed archaeological sites shows the effectiveness of archaeological predictive modeling through zonal statistics on Zanzibar, Tanzania. The results may help plan future surveys and inform emergent understandings of human-environment dynamics on the Swahili Coast. The development of this methodology using open-access software increases geospatial accessibility and affordability, a consideration that will be especially impactful for researchers in the Global South, where funding for fieldwork and software licensing is limited. Like recent studies that emphasize low-cost, open-access remote sensing methods for archaeological prospection in Africa, this method takes advantage of a growing suite of freely available geospatial datasets. The methodology described here can be applied across a wide variety of contexts in Africa and globally. This method does not rely on high-resolution multispectral imagery, LiDAR, or paywalled software. The only prerequisite is a representative and theoretically justified way to weight zonal raster images; in this case, settlement classes from a digitized historical map were the basis for weighting. The quality and representativeness of training features across all zones under consideration are important factors for producing a model that is useful for archaeological prospection and for understanding regional environmental and spatial factors.

ADDITIONAL FILE
The additional file for this article can be found as follows:
• Supplementary Materials. Supplementary files contain all zonal raster images used in the analysis, a digitized close-up of the 1907 map, and a detailed workflow in QGIS. DOI: https://doi.org/10.5334/jcaa.107.s1

Figure 2
1907 map of Zanzibar, with inset showing detail. The inset is shown in high resolution in the Supplementary Materials section in Figure I.

Figure 3
Settlements and road network on the 1907 map.

Figure 4
Map of streams, wells, and other miscellaneous features on the 1907 map. Streams are drawn as black lines on the 1907 map and labeled in italics.

Figure 5
Model A. Site detection probability zones derived from small settlements on the 1907 map, in relation to environmental datasets.

Figure 6
Model B. Site detection probability zones derived from large settlements on the 1907 map, in relation to environmental datasets.

Figure 7
Known major precolonial sites on Zanzibar in relation to Model A, which is based on small settlement size classes in the 1907 map.

Figure 8
Known major precolonial sites on Zanzibar in relation to Model B, which is based on large settlement size classes in the 1907 map.

Figure 9
Sites from the 2019 systematic survey in relation to Model A, which is based on small settlement size classes in the 1907 map.

Figure 10
Sites from the 2019 systematic survey in relation to Model B, which is based on large settlement size classes in the 1907 map.

Table 1 below presents the workflow for the methods used in this paper, outlining basic steps for producing an archaeological predictive model.
1 Prepare the Map and Raster Datasets: Assemble and normalize raster datasets, define the Area of Interest (AOI), and set the map coordinate reference system.
2 Create Zonal Raster Images: Create raster images with unique values for each zone, which can be uniformly queried in relation to training features.
3 Prepare the Training Features: Import or digitize training set datapoints, and buffer them as polygons to reflect their real-world area.
Find zones with the highest density of training points for each zonal raster. This will determine which zone will be weighted as part of the predictive model for each zonal raster image.
7 Create and Assign Weight Classes: Determine coefficient of variation (CV) threshold values and assign these thresholds to weight classes, which will be used to weight and add zonal raster images.
8 Create Weighted Zonal Raster Images: Create zonal raster images with a value for the most favored zone (weighted by CV class), and a value of 0 for all other zones.
9 Sum Weighted Zonal Raster Images into a Predictive Model: Create a raster image that reflects site detection probability for each pixel, and reclassify it into a zonal raster image with categories for site detection probability.

Table 1
Workflow of methods for this study.

Table 2
List of geospatial datasets used in this study.

Table 4
Zonal statistics for map settlement classes and aspect zones.

Table 5
Zonal statistics for map settlement classes and elevation.

Table 6
Zonal statistics for map settlement classes across the geology zones.

Table 7
Zonal statistics for settlement classes across rainfall zones.

Table 8
Zonal statistics for settlement classes across reef distance buffer zones.

Table 9
Zonal statistics for settlement classes across slope degree zones.

Table 10
Zonal statistics for settlement classes and soil types.

Table 11
Zonal statistics for settlement classes across the 1907 stream buffer raster image.

Table 12
CV thresholds and associated weight class values, based on the mean CV and CV standard deviation for small settlement classes.

Table 13
CV thresholds and associated weight class values, based on the mean CV and CV standard deviation for large settlement classes.
Alders, Journal of Computer Applications in Archaeology, DOI: 10.5334/jcaa.107

Table 14
Model A, favored zones, CVs, and weight classes for small settlement class training features across eight zonal raster images.

Table 15
Model B, favored zones, CVs, and weight classes for large settlement class training features across eight zonal raster images.

Table 16
Known precolonial sites in relation to Model A, based on small settlement classes from the 1907 map.

Table 17
Known precolonial sites in relation to Model B, based on large settlement classes from the 1907 map.

Table 18
2019 field survey sites in relation to Model A, which is based on small settlement classes from the 1907 map.

Table 19
2019 field survey sites in relation to Model B, which is based on large settlement classes from the 1907 map.