The authors have declared that no competing interests exist.

To date all public records of ^{14}. The profiles themselves are found to have very few occurrences of common values between the 15 loci and thus according to Fisher’s theory of epistatic variance no correlation to phenotype attributes is expected–a result verified by the original investigators. Therefore further discovery of appropriate markers is needed to fully capture geno- and pheno-type characteristics in

Identifying plant varieties is an age-old human endeavor. Historically morphological traits were used to categorize specimens [

Genetic distance measures can be roughly classified into 4 categories: dynamic, statistical, geometric, and biostatistical. Dynamic methods use knowledge of linkage locations of loci along a genetic sequence to produce simulations of genetic crossovers in breeding, then analyze allele values to compute probabilities of relationships. Centimorgans are an example measure produced by dynamic simulation [

Of interest in the present study are SSR profiles taken circa 2009 of the

C22F1 | C24H1 | C26N1 | C31F1 | C35H1 | C37N1 | LM12H1 | LM14H1 | LM30N1 | LM36N1 | M1F1 | M2H1 | M3N1 | M4F1 | M8N1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
283 | 272 | 234 | 224 | 254 | 204 | 214 | 200 | 243 | 248 | 172 | 153 | 120 | 194 | 171 |
283 | 272 | 234 | 239 | 254 | 208 | 243 | 200 | 245 | 248 | 189 | 167 | 132 | 218 | 175 |

Locus names are across the top. Values are the total number of repeats of the dominant type per allele (nomenclature: BP).

In the original published analyses of the NCGR profiles [^{−9}.

δ | Type | Domain | Property Failed | Failures per Domain | Tests per Domain | Maximum \|error\| | Minimum \|δ(a,b)\| |
---|---|---|---|---|---|---|---|
Alleles Mask | tensor, biostatistical | spatial | none | n/a | n/a | n/a | n/a |
Spectral 2 | tensor, geometric | all | none | n/a | n/a | n/a | n/a |
Nei | vector, geometric | spatial | 4 | 10569 | 317750 | 10^{−4} | 10^{−7} |
Nei | vector, geometric | alleles frequencies | 4 | 9145 | 317750 | 10^{−1} | 10^{−6} |
Nei | vector, geometric | loci frequencies | 4 | 17428 | 317750 | 10^{−1} | 10^{−5} |
Nei | vector, geometric | population frequencies | 4 | 10277 | 317750 | 10^{−1} | 10^{−5} |
Spectral Radius Angle | tensor, geometric | spatial | none | n/a | n/a | n/a | n/a |
Spectral Radius Angle | tensor, geometric | alleles frequencies | 4 | 2 | 317750 | 10^{1} | 10^{0} |
Spectral Radius Angle | tensor, geometric | loci frequencies | 4 | 6 | 317750 | 10^{1} | 10^{0} |
Spectral Radius Angle | tensor, geometric | population frequencies | 4 | 3 | 317750 | 10^{1} | 10^{0} |

The tensor measures Alleles Mask, Spectral 2, and Spectral Radius Angle were further analyzed for applicability to the profiles. The first is biostatistical and the others geometric. No static statistical tensor metric could be located. The computed distances were compared with measurements of allele similarities obtained from manual evaluation of the profiles. Results in the spatial domain edged out those in the alleles frequencies followed in turn by loci and population frequencies. However, neither geometric measure could completely resolve observed relations between all profiles–instead producing a few anomalies each due to their reliance on normative computations between numerical allele values (Tables

δ | Domain | Units | closest #1 | closer #2 | average #3 | farther #4 | farthest #5 |
---|---|---|---|---|---|---|---|
Alleles Mask | spatial | Loci mismatches | [0., 2.) | [2., 3.5) | [3.5, 7.) | [7., 9.) | [9., 13.] |
Spectral 2 | alleles frequencies | BP frequencies | [0.008, 0.2) | [0.2, 0.54) | [0.54, 1.3) | [1.3, 1.5) | [1.5, 2.4] |
Spectral Radius Angle | spatial | μradians | [0.594, 11.06) | [11.06, 21.514) | [21.514, 31.957) | [31.957, 42.387) | [42.387, 52.807] |
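The class intervals above can be applied mechanically. The following is a minimal sketch, not the authors' code, that maps an Alleles Mask distance to its distance class using the boundaries in the table. The function and constant names are hypothetical.

```python
from bisect import bisect_right

# Class boundaries for the Alleles Mask measure (units: Loci mismatches),
# copied from the table. Every class is half-open except the last,
# which is closed at 13.
ALLELES_MASK_EDGES = [0.0, 2.0, 3.5, 7.0, 9.0, 13.0]
CLASS_LABELS = ["closest #1", "closer #2", "average #3", "farther #4", "farthest #5"]


def distance_class(d, edges=ALLELES_MASK_EDGES, labels=CLASS_LABELS):
    """Return the distance class label for a distance d (hypothetical helper)."""
    if not edges[0] <= d <= edges[-1]:
        raise ValueError(f"distance {d} lies outside [{edges[0]}, {edges[-1]}]")
    # bisect_right counts the boundaries <= d, so subtracting 1 gives the
    # index of the interval whose lower boundary is the largest one <= d.
    idx = bisect_right(edges, d) - 1
    # The top boundary itself belongs to the last (closed) class.
    return labels[min(idx, len(labels) - 1)]
```

The same lookup should work for the other two measures if their boundaries from the table are passed as `edges`.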

C1 DFIC, Label | C2 DFIC, Label | μr | BP |
Loci mismatches | Profile Analysis |
---|---|---|---|---|---|

7. Archipel | 261. Encanto Brown Turkey | #1 | #2 | #1 | 14 ~ 0 ~ 0 |

102. Gulbun | 126. Capri Q | #2 | #3 | #2 | 10 ~ 5 ~ 0 |

66. Kadota | 20. Excel | #3 | #3 | #3 | 8 ~ 6 ~ 0 |

10. not Saleeb | 205. LSU Hollier | #1 | #4 | #4 | 5 ~ 5 ~ 1 |

155. not California Brown Turkey | 218. Fico Nero | #3 | #4 | #5 | 2 ~ 5 ~ 0 |

DFIC = accession #. Profile Analysis Key: (# exact loci matches) ~ (# single allele matches) ~ (# likely intra-loci crossovers).

The Alleles Mask distances were then compared with breeding records documented by NCGR Davis (GRIN pedigree data), information from accession donor sites (GRIN passport data), and historical accounts [

Labels are only provided for verified accessions. Known/discovered descendants (if any) are denoted by arrows, not vertical hierarchy.

The linear density of profiles within their esoteric space was estimated at approximately 0.733, computed as the ratio of the mean displacement đ to the profiles' perimeter radius r_{p}, both measured from the central feature DFIC 32 Adriatic. For comparison, a “cannonball” packing of identical spheres in 3 dimensions has a linear density of approximately 0.61. This demonstrates how dense SSR packings can be. In fact, any specific spatial profile in this set is at distance 0 from the others (identical allele value) in an average of 54.6% of alleles. Together these statistics demonstrate that clustering techniques based on distance radii are inappropriate for this dataset. Nearest-neighbor relations are necessary to overcome the high packing density. This is accomplished here by using Least Bridges Graphs as structural representations of profile relations.

The maximal Laplacian eigenvalue [_{max} ≈ 11.63 was computed for the connected Least Bridges Graph of distances. The maximal Laplacian is an upper bound on the number of edge frequencies and hence varieties of substructures within the graph. Organization of the SSR profiles into distance classes (hierarchies of nearest neighbor distances) demonstrates the infeasibility of large-scale biological clades that would span the collection. Specifically the lack of cohesion in the shortest distance classes prohibits larger scale aggregations of close relations. The result is that when the graph of connected components of profiles is restricted to using edge lengths with distance measure less than 3.5 Loci mismatches, no more than half of the profiles are used and the remaining are essentially cladeless. Also, most components constructed in this manner have the poor quality of containing 1 to 2 edges (^{14} (see

Numbers refer to DFIC accessions.

An examination of frequencies of spatial values revealed only a few that occur in multiple loci (

C22F1 | C24H1 | C26N1 | C31F1 | C35H1 | C37N1 | LM12H1 | LM14H1 | LM30N1 | LM36N1 | M1F1 | M2H1 | M3N1 | M4F1 | M8N1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

200 |
214 |
200 |
233 |
155 | 155 | 214 |

Few purely tensor genetic distance measures exist in the literature. As such, it is a common but dubious practice to “flatten” tensors (string them out into a single vector) for use in vector measures. Consider for a moment, though, the non-trivial p×q tensors A, B, A ≠ B, and C = B − A, which in our Euclidean minds we would like to think of as edges of a "triangle" A, B, C. Now compute the angle

To make matters worse, we also discover that with few exceptions:

Hence tensors are different from vectors and “flattening” tensors into vectors changes the problem under study. Further: any values computed by nontrivial δ in the vector space are useless because an inverse to translate them back to corresponding δ values in the original tensor space is infeasible due to the nature of the projection. Consequently the practice of flattening tensors for the purpose of vector computation should be avoided.
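One way to see the effect of flattening numerically is to compare a matrix norm with the Euclidean norm of the same entries strung out as a vector. The sketch below assumes the tensor norm of interest is the spectral 2-norm (largest singular value), which may not be exactly the measure used in this study. It is written in plain Python with a simple power iteration, and the function names are hypothetical. The flattened Euclidean norm is the Frobenius norm, which is never smaller than the spectral norm and equals it only for matrices of rank at most 1.

```python
import math


def flattened_norm(a):
    """Euclidean norm of the matrix entries strung out as one vector."""
    return math.sqrt(sum(x * x for row in a for x in row))


def spectral_norm(a, iterations=500):
    """Largest singular value of a, by power iteration on a^T a.

    Assumption: the all-ones start vector is not orthogonal to the dominant
    right singular vector; otherwise the iteration can miss it.
    """
    p, q = len(a), len(a[0])
    v = [1.0] * q
    lam = 0.0
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(q)) for i in range(p)]    # a v
        w = [sum(a[i][j] * av[i] for i in range(p)) for j in range(q)]    # a^T (a v)
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
        lam = norm_w  # converges to the largest eigenvalue of a^T a
    return math.sqrt(lam)


# Toy 2x2 matrices (illustrative values, not SSR data).
A = [[3.0, 0.0], [0.0, 4.0]]
B = [[1.0, 0.0], [0.0, 0.0]]
C = [[B[i][j] - A[i][j] for j in range(2)] for i in range(2)]  # C = B - A

for name, m in (("A", A), ("B", B), ("C", C)):
    print(name, round(spectral_norm(m), 6), round(flattened_norm(m), 6))
```

For the diagonal matrix A the spectral norm is 4 while the flattened norm is 5, so the two views assign different side lengths to the same "triangle".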

The values produced by a distance measure δ are not considered valid for comparison unless δ is a qualified metric [

δ(a, a) = 0 for every profile a

δ(a, b) > 0 for all profiles a with single edge path to b and a ≠ b

δ(a, b) = δ(b, a) for all single undirected edges between profiles a, b

δ(a, c) ≤ δ(a, b) + δ(b, c) for all profiles a, b, c having single edge path a to b, b to c, and a to c.
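The four properties can be tested by brute force on a finite set of profiles. The sketch below is an illustration under stated assumptions rather than the procedure used in this study: it treats every pair of profiles as joined by a single edge (a complete graph), counts one triangle test per unordered triple, and uses hypothetical names (`metric_failures`, `tol`). With 125 profiles there are C(125, 3) = 317750 unordered triples, which matches the "Tests per Domain" figure in the earlier table and suggests a similar count there.

```python
from itertools import combinations


def metric_failures(profiles, delta, tol=0.0):
    """Count failures of the four metric properties for delta on profiles.

    Assumes a complete graph: every pair of profiles is a single edge.
    tol absorbs floating-point round-off in the equality and inequality tests.
    """
    n = len(profiles)
    d = [[delta(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    failures = {1: 0, 2: 0, 3: 0, 4: 0}

    # Property 1: delta(a, a) = 0
    for i in range(n):
        if abs(d[i][i]) > tol:
            failures[1] += 1

    for i, j in combinations(range(n), 2):
        # Property 2: delta(a, b) > 0 for distinct profiles
        if profiles[i] != profiles[j] and (d[i][j] <= 0 or d[j][i] <= 0):
            failures[2] += 1
        # Property 3: delta(a, b) = delta(b, a)
        if abs(d[i][j] - d[j][i]) > tol:
            failures[3] += 1

    # Property 4: triangle inequality, one test per unordered triple;
    # the triple fails if any side exceeds the sum of the other two.
    for i, j, k in combinations(range(n), 3):
        if (d[i][k] > d[i][j] + d[j][k] + tol
                or d[i][j] > d[i][k] + d[j][k] + tol
                or d[j][k] > d[i][j] + d[i][k] + tol):
            failures[4] += 1
    return failures


# Toy check on points of the real line (not SSR profiles):
points = [0.0, 1.0, 2.0, 5.0]
print(metric_failures(points, lambda a, b: abs(a - b)))   # absolute difference
print(metric_failures(points, lambda a, b: (a - b) ** 2)) # squared difference
```

The squared difference fails only the fourth property on these points, which is the same pattern as the "Property Failed: 4" entries reported for the measures that did not qualify.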

Some measures come “pre-proven” for undirected graphs, e.g. Euclidean. Having a pre-proven measure does not mean that numerical instability or ill-conditioning [

A measure with units of Loci mismatches. Denote F_{i}(A_{n}, A_{m}) = a full Loci match between profiles A_{n} and A_{m} at locus i. Likewise denote S_{i,j}(A_{n}, A_{m}) = a single allele match at allele j of locus i in A_{n} and A_{m}, but not “double counting” those in full Loci matches. And finally denote C_{i,j}(A_{n}, A_{m}) = an intra-loci crossover match from allele j to allele ~j of locus i in A_{n} and A_{m}−but not double counting those from full Loci matches of identical values, and also not counting those where the target allele is one of the high frequency (e.g. ≥ 84%) values in the sample population (_{i,j}(A_{n}, A_{m})≠ C_{i,−j}(A_{m}, A_{n}), thus producing a directed graph.
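The three kinds of match defined above can be tallied per pair of profiles, giving a triple in the style of the Profile Analysis key (exact loci matches ~ single allele matches ~ intra-loci crossovers). The sketch below is an interpretation, not the authors' implementation, and it makes several assumptions: a profile is a sequence of (front, back) allele pairs, one per locus; a full match means both alleles agree position by position; and `common` is a hypothetical set of (locus index, allele index, value) triples standing in for the high-frequency values. It does not convert the tally into a distance, since that step is not recoverable from the text here.

```python
def profile_analysis(a, b, common=frozenset()):
    """Tally (full, single, crossover) matches from profile a to profile b.

    a, b   : sequences of (front, back) allele pairs, one pair per locus.
    common : hypothetical set of (locus, allele index, value) triples that are
             too frequent in the population to count as crossover targets.
    """
    full = single = cross = 0
    for i, (src, dst) in enumerate(zip(a, b)):
        if tuple(src) == tuple(dst):
            full += 1          # F_i: both alleles agree
            continue           # no double counting within a full match
        for j in (0, 1):
            k = 1 - j          # the other allele of the same locus
            if src[j] == dst[j]:
                single += 1    # S_{i,j}: same value at the same allele
            elif (src[j] == dst[k]
                  and src[k] != dst[k]              # target not already matched
                  and (i, k, dst[k]) not in common):
                cross += 1     # C_{i,j}: value appears at the other allele
    return full, single, cross


# Toy three-locus profiles (illustrative values only).
p1 = [(283, 283), (272, 272), (224, 239)]
p2 = [(283, 283), (260, 272), (239, 250)]
print(profile_analysis(p1, p2))  # tally from p1 to p2
```

Because the high-frequency filter applies to the target allele, the tally from a to b need not equal the tally from b to a, consistent with the directed graph noted above.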

The back allele of C24H1 commonly has the value 272.

To compute, begin with a profile mask containing all 1’s:

For example, consider the profile of DFIC 66 Kadota:

The mask produced for the distance Kadota to Excel is

Here, the spectral radius [

In this application of the spectral radius norm a spatial tensor metric with units of radians is obtained

Also available at the USDA GRIN-Global site are breeding records for

Arrows point to progeny. Shaded names are NCGR accessions with matching labels and genetic data. Some labels were found to be inaccurate after genetic profile analysis. Color indicates breeding location. Gold = UC Riverside, Green = Kearney Ag Center, Purple = Louisiana State University, Maroon = Texas A&M University.

Ira Condit’s voluminous monograph of fig varieties was published by the UC research periodical Hilgardia in 1955 [

Least Bridges Graphs are a method of visualizing nearest-neighbor relationships in abstract spaces. They are constructed by first considering the vertices (e.g. genetic profiles) as disconnected components, then incrementally adding the shortest available edge connection (i.e. edge representing distance between the two components). Edges are only added between disconnected components and thus termed "bridges" [^{®} v12.1 NearestNeighborGraph [
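The construction described here (start from isolated vertices, repeatedly add the shortest edge that joins two still-disconnected components) reads like Kruskal's minimum spanning tree procedure. The sketch below follows that reading with a small union-find. It is an assumption-laden illustration, not the implementation used in this study: the function name is hypothetical, vertices are hashable labels, and ties between equal-length edges are broken by input order, which the original method may handle differently.

```python
from itertools import combinations


def least_bridges(vertices, delta):
    """Return the bridges (u, v, distance) in the order they are added."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the component representative,
        # shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # All candidate edges, shortest first (stable sort keeps input order on ties).
    edges = sorted(
        ((delta(u, v), u, v) for u, v in combinations(vertices, 2)),
        key=lambda e: e[0],
    )

    bridges = []
    for d, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:               # only join components that are still separate
            parent[ru] = rv
            bridges.append((u, v, d))
    return bridges


# Toy example: labelled points on a line with absolute difference as distance.
position = {"a": 0, "b": 1, "c": 3, "d": 7}
print(least_bridges(list(position), lambda u, v: abs(position[u] - position[v])))
```

Stopping the loop once edge lengths exceed a chosen threshold would leave several components instead of one connected graph.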

Distance class hues = {

The fig collection from NCGR Davis is not a random sample of individuals from the worldwide population, but mostly a selection of preferred cultivars from commercial production and private collections [

As a first estimate, consider the product of the number of unique allele values per locus
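The product of unique allele counts can be computed directly from a set of profiles. The sketch below is illustrative only: it assumes each profile is a sequence of (front, back) allele pairs and pools both alleles of a locus when counting unique values, which may differ from the counting used for the estimate in this study. The function name and the toy data are hypothetical.

```python
import math


def allele_value_product(profiles):
    """Return (unique allele counts per locus, their product).

    profiles: list of profiles, each a sequence of (front, back) allele
    pairs with one pair per locus, in the same locus order.
    """
    n_loci = len(profiles[0])
    counts = []
    for i in range(n_loci):
        # Pool front and back alleles of locus i over all profiles.
        values = {allele for profile in profiles for allele in profile[i]}
        counts.append(len(values))
    return counts, math.prod(counts)


# Toy two-locus data (not taken from the NCGR profiles).
toy = [
    [(200, 214), (155, 155)],
    [(200, 233), (155, 160)],
]
print(allele_value_product(toy))
```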

This assumes the frequencies are accurate with no multiplicities. To include multiplicity and uncertainty in the estimate, introduce a 2% frequency variation in numerator values that conserves probability, so the frequencies per locus still sum to 1. In particular, the numerators of the allele frequencies of each locus are permitted to vary by {−2, −1, 0, +1, +2}, provided the frequencies per locus still sum to 1. (The denominator is held constant at N = 125.) Now check the numerators per locus and count the greatest common divisors. Selecting the min, median, and max produces

The purpose of going to this trouble is to demonstrate that sample sizes of 1000, 2000, or even 200000 are insignificant [^{10} (more likely 10^{18}) profiles will be needed for a statistically significant sample. This is an intractable situation unless someone can express it as a satisfiability problem for quantum computing.

Special thanks to Mark Steele for many interesting discussions about fruit trees and to Tracy Frost for proof-reading the manuscript.