Feedback Loops as Loops

Topological Data Analysis of Genetic Regulatory Circuits

Gary Welz

CopernicusAI / CUNY Graduate Center (PoI)

February 27, 2026

From papers to flowcharts

My first attempt at the β-galactosidase flowchart, in 1995, appeared in an article in The X Advisor, an online magazine for Unix developers. The article, entitled “Is the Genome Like a Computer Program?”, contained excerpts from my conversations with biologists on the bionet.genome.chromosome newsgroup. It is archived at the Wayback Machine (Internet Archive), and the newsgroup discussions are archived by Google.

The 1995 chart was created from text alone, which is also the input LLMs work from today. The source was Berg & Singer (1992, pp. 71–73). Diagrams are therefore only as detailed and reliable as their source material, and different sources for the same process can yield different charts.

1995 article (Internet Archive) | Source: Berg, P. & Singer, M. (1992). Dealing With Genes. University Science Books, pp. 71–73.

bionet.genome.chromosome thread: first posting (flowchart with “and”/“or”) · Robbins: “care must be taken in interpreting that flow chart”; “computer-science insights… potentially huge payoffs” · Dellaire: genome structure (not just sequence) encodes how the code is read, depending on spatial and temporal context · Robbins.

Beta-galactosidase / Lac flowchart (GLMP, 1995)

Same chart, 30 years later

This is the same Lac operon / β-galactosidase idea — but generated with LLMs and Mermaid Markdown. The original chart was so time-consuming that the idea sat dormant for decades. Now we can produce any of these flowcharts from a single prompt in seconds.

Lac Operon (GLMP viewer)

First LLM-generated Mermaid Markdown flowchart: β-galactosidase / Lac operon

The Innovation: Text to Visual Data

Traditional TDA starts from numerical data. Here we start from text — paper descriptions — and turn them into visual flowcharts first. That shift is what makes the rest possible.

The Question

We’re asking whether the shape of these circuits — as captured by topology — lines up with what biologists already know: feedback loops, cascades, and regulatory motifs.

The GLMP Database

The Genome Logic Modeling Project gives us 108 processes — each one a Mermaid flowchart with nodes, conditionals (edges), and OR/AND logic. We extract five features per process: nodes, conditionals, OR gates, AND gates, and loops (back-edges).

GLMP: References in JSON & Feedback

Each process in GLMP is grounded in the literature: the JSON records PubMed IDs and DOIs, and the viewer lets anyone suggest improvements, so the flowcharts are both citable and correctable.

Scroll down in the viewer to see Sources & Citations, Metadata, and the Improve-this-process form.

From Flowcharts to Features

We don’t use the full graph structure for TDA — we summarize each flowchart into five features per process.
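A minimal sketch of that summarization step follows. It assumes a simplified Mermaid syntax, with one edge per line in the form `A[label] -->|condition| B{label}`. It also assumes a hypothetical convention that gate nodes are labeled exactly `OR` or `AND`. The real GLMP JSON and its parsing rules may differ. Loops are counted as DFS back-edges. Because that count can depend on traversal order, the sketch starts from nodes in the order they are first declared.

```python
import re

# Simplified Mermaid node: an id, optionally followed by a bracketed label,
# e.g. A[Lactose present?], G{AND}, C((x)). Real Mermaid syntax is richer.
NODE = r"(\w+)(?:\s*[\[\{\(]+([^\]\}\)]*)[\]\}\)]+)?"
EDGE = re.compile(rf"^\s*{NODE}\s*-->\s*(?:\|([^|]*)\|\s*)?{NODE}\s*$")


def count_back_edges(adj, order):
    """Count DFS back-edges (edges into a node still on the DFS stack)."""
    state = {}  # absent = unvisited, 1 = on stack, 2 = finished
    back = 0

    def visit(u):
        nonlocal back
        state[u] = 1
        for v in adj.get(u, []):
            if state.get(v) == 1:
                back += 1  # edge closes a cycle: a feedback loop
            elif v not in state:
                visit(v)
        state[u] = 2

    for u in order:
        if u not in state:
            visit(u)
    return back


def flowchart_features(mermaid_text):
    """Return the five per-process features from a simplified Mermaid chart."""
    labels, order, adj, n_edges = {}, [], {}, 0
    for line in mermaid_text.splitlines():
        m = EDGE.match(line)
        if not m:
            continue  # skips 'graph TD', comments, style lines, etc.
        src, src_label, _condition, dst, dst_label = m.groups()
        for node, label in ((src, src_label), (dst, dst_label)):
            if node not in labels:
                order.append(node)
                labels[node] = None
            if label is not None:
                labels[node] = label.strip()
        adj.setdefault(src, []).append(dst)
        n_edges += 1
    upper_labels = [lab.upper() for lab in labels.values() if lab]
    return {
        "nodes": len(order),
        "conditionals": n_edges,
        "or_gates": upper_labels.count("OR"),
        "and_gates": upper_labels.count("AND"),
        "loops": count_back_edges(adj, order),
    }


# Toy lac-style chart (illustrative only, not the GLMP chart).
LAC_TOY = """
graph TD
    A[Lactose present?] -->|yes| B[Allolactose binds LacI]
    B --> C[LacI releases operator]
    C --> G{AND}
    D[Glucose low: CAP-cAMP bound] --> G
    G --> E[lacZYA transcribed]
    E --> F[beta-galactosidase made]
    F -->|lactose used up| A
"""
print(flowchart_features(LAC_TOY))
```

In the toy chart, the edge from F back to A is the single back-edge, so the chart counts as having one loop.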

TDA Pipeline

From the feature matrix we compute a distance between every pair of processes, build a Vietoris–Rips filtration on those distances, and use Ripser to compute its persistence diagrams. Representative cocycles then indicate which processes are associated with which topological loop.
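A sketch of the first step is below. It assumes the five features are z-scored per column and then compared with Euclidean distance. The deck does not say which normalization or metric was actually used, and the feature values here are made up for illustration. The Ripser step appears only as a comment, using ripser.py's `ripser(..., distance_matrix=True, maxdim=1, do_cocycles=True)`, so the block runs with the standard library alone.

```python
import math


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and (population) std 1."""
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col))
        stats.append((mu, sd or 1.0))  # a constant column maps to all zeros
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in rows]


def pairwise_distances(rows):
    """Full symmetric Euclidean distance matrix (list of lists)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical feature rows: [nodes, conditionals, OR gates, AND gates, loops]
features = [
    [12, 14, 1, 2, 1],
    [9, 10, 0, 1, 2],
    [20, 25, 3, 2, 0],
    [11, 13, 1, 1, 1],
]
D = pairwise_distances(zscore_columns(features))

# With ripser.py and NumPy installed, the persistence step would be:
#   import numpy as np
#   from ripser import ripser
#   out = ripser(np.array(D), distance_matrix=True, maxdim=1, do_cocycles=True)
#   dgms = out["dgms"]          # dgms[0] is H0, dgms[1] is H1 (birth, death)
#   cocycles = out["cocycles"]  # cocycles[1][k]: rows [i, j, value] for H1 bar k
```

Z-scoring keeps a feature with a large range, such as node count, from dominating the distances. Whether that matches the GLMP pipeline is an assumption.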

What Are We Counting? H₀, H₁, H₂

Before the persistence diagram, a quick intuitive ladder:

Mathematical Note (1): Betti Numbers — History & Geometry

Betti numbers β₀, β₁, β₂ count connected components, loops, and voids. They are named for Enrico Betti (1823–1892) and were formalized by Poincaré in the 1890s. Geometrically: β₀ = pieces; β₁ = independent loops that don’t bound a filled region; β₂ = enclosed voids. Euler’s formula χ = V − E + F = 2 holds for connected planar graphs, where F includes the outer face.
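The Euler–Poincaré formula links these counts to the Betti numbers. Here F counts 2-simplices; in a planar drawing, these are the faces including the outer one, which closes the plane into a sphere. Three worked cases follow. A hollow triangle has one loop. Filling the triangle removes that loop. The tetrahedron (the planar graph K₄ with its outer face) encloses one void.

```latex
\begin{aligned}
\chi &= V - E + F = \beta_0 - \beta_1 + \beta_2 \\
\text{hollow triangle } (V,E,F)=(3,3,0) &:\quad \chi = 0 = 1 - 1 + 0 \\
\text{filled triangle } (V,E,F)=(3,3,1) &:\quad \chi = 1 = 1 - 0 + 0 \\
\text{tetrahedron surface } (V,E,F)=(4,6,4) &:\quad \chi = 2 = 1 - 0 + 1
\end{aligned}
```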

Mathematical Note (2): Faces, 2-Simplices, and H₁

In a planar graph, a face is a region bounded by edges, including the outer, unbounded region. In a simplicial complex, 2-simplices (filled triangles) play the role of faces. In the Vietoris–Rips complex, three vertices whose pairwise distances all fall within the threshold span a triangle whose interior “fills in” the loop. A loop that is not the boundary of any collection of 2-simplices survives as an H₁ feature. Each of our 33 H₁ loops is a cycle that stays unfilled over a range of thresholds, from its birth until triangles fill it in at its death. At any single threshold, β₁ counts the loops alive there, and it enters the Euler characteristic as χ = β₀ − β₁ + β₂.
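A small check of this idea computes β₁ over ℤ/2 as (number of edges − rank ∂₁) − rank ∂₂ for a square. The bare 4-cycle has one loop, and adding a diagonal edge creates a second. Filling in the two triangles (2-simplices) kills both loops. This is a from-scratch illustration on a hand-built complex, not how Ripser computes persistence.

```python
def gf2_rank(columns):
    """Rank over Z/2 of boundary columns stored as integer bitmasks."""
    basis = {}  # leading bit -> reduced column
    for v in columns:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)


def betti_1(vertices, edges, triangles):
    """beta_1 = dim ker(d1) - rank(d2), computed over Z/2."""
    vidx = {v: i for i, v in enumerate(vertices)}
    eidx = {tuple(sorted(e)): i for i, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints.
    d1 = [(1 << vidx[a]) | (1 << vidx[b]) for a, b in edges]
    # Boundary of a triangle: its three edges.
    d2 = []
    for a, b, c in triangles:
        mask = 0
        for e in ((a, b), (b, c), (a, c)):
            mask |= 1 << eidx[tuple(sorted(e))]
        d2.append(mask)
    return (len(edges) - gf2_rank(d1)) - gf2_rank(d2)


V = [0, 1, 2, 3]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
with_diagonal = square + [(0, 2)]
print(betti_1(V, square, []))                              # bare 4-cycle
print(betti_1(V, with_diagonal, []))                       # two independent loops
print(betti_1(V, with_diagonal, [(0, 1, 2), (0, 2, 3)]))   # both loops filled
```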

Persistence Diagram

Here’s the persistence diagram. In H₀ there is one point per process, because every process starts as its own component and components merge as the threshold grows. In H₁ there are 33 loops. The question is whether those H₁ loops line up with known biology: feedback circuits, stress responses, and so on.
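A short sketch shows why H₀ contributes one bar per process. In a Vietoris–Rips filtration, every point is born at threshold 0. Two components merge at the length of the shortest edge joining them. The finite H₀ deaths are therefore the edge lengths of a minimum spanning tree (Kruskal's algorithm with union-find), plus one component that never dies. The four 1-D points are toy data.

```python
import math
from itertools import combinations


def h0_persistence(D):
    """H0 bars of a Vietoris-Rips filtration from a distance matrix D."""
    n = len(D)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bars = []
    for d, i, j in sorted((D[i][j], i, j) for i, j in combinations(range(n), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: one of them dies at d
            parent[ri] = rj
            bars.append((0.0, d))
    bars.append((0.0, math.inf))  # the last component never dies
    return bars


points = [0.0, 1.0, 3.0, 7.0]  # toy 1-D "processes"
D = [[abs(a - b) for b in points] for a in points]
print(h0_persistence(D))
```

Four points give four H₀ bars, three finite and one infinite, which is the pattern the diagram shows for the 108 processes.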

Persistence diagram

What Do the Loops Look Like? (1) PCA + Cocycle Edges

The persistence diagram tells us H₁ has 33 loops, but not where they sit in the data. To make homology visible, we project the 5D feature space to 2D with PCA (principal component analysis). PCA keeps the two directions of greatest variance, so distances in the plot are only approximately preserved. We then draw the edges of each representative cocycle; their endpoints are the processes associated with that loop. Each colored loop is one H₁ cycle: red (#1), blue (#2), green (#3), purple (#4), orange (#5). Lac operon, two-component, and SOS are labeled.
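A dependency-free sketch of the projection step follows. It runs PCA by power iteration with deflation on the covariance matrix, then turns H₁ cocycle rows into line segments for the 2D plot. It assumes cocycle rows of the form `[i, j, value]`, which matches how ripser.py documents H₁ cocycles; the example cocycle is hypothetical. In practice, NumPy or scikit-learn's `PCA(n_components=2)` would handle the projection. Power iteration is used here only to keep the block self-contained, and it converges slowly when the top eigenvalues are close.

```python
import math


def pca_2d(rows, iters=500):
    """Project rows onto their top two principal components (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    X = [[r[k] - means[k] for k in range(d)] for r in rows]
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    components = []
    for _ in range(2):
        v = [1.0] * d  # start vector; assumed not orthogonal to the target
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds PC2.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[k] * c[k] for k in range(d)) for c in components] for x in X]


def cocycle_segments(points_2d, cocycle_rows):
    """Map H1 cocycle rows [i, j, value] to 2D segments for plotting."""
    return [(tuple(points_2d[int(i)]), tuple(points_2d[int(j)]))
            for i, j, _value in cocycle_rows]


# Toy data: spread mostly along x, less along y, none along z.
rows = [[-2, 0, 0], [2, 0, 0], [0, -1, 0], [0, 1, 0]]
P = pca_2d(rows)
segments = cocycle_segments(P, [[0, 2, 1], [1, 3, 1]])  # hypothetical cocycle
print(P)
print(segments)
```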

→ Interactive PCA + cocycle (hover for process names)

H1 loops in PCA space

What Do the Loops Look Like? (2) Mapper Graph

The Mapper algorithm builds a simplicial complex from the data. It maps each process through a filter (lens) function, covers the filter's range with overlapping intervals, clusters the processes that fall in each interval, and connects clusters that share processes. Each node is a cluster of similar processes (node size = number of processes), and edges connect overlapping clusters. Cycles in this graph reflect topological loops, so the Mapper graph shows the H₁ structure in a different way. It complements the persistence diagram and the cocycle-in-PCA view.
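A minimal Mapper sketch is below. It assumes a one-dimensional lens, a cover of equal-width intervals padded by an overlap fraction, and single-linkage clustering at a fixed radius `eps` inside each interval. These choices and all parameter values are illustrative assumptions, not the settings behind the GLMP Mapper graph. The example uses twelve points around a circle, with the x-coordinate as the lens. The sketch returns four clusters joined in a single cycle, which is the Mapper counterpart of one H₁ loop.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, idx, eps):
    """Connected components of the eps-neighbourhood graph restricted to idx."""
    idx = list(idx)
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, n_intervals=3, overlap=0.3, eps=0.6):
    """Return (clusters, edges); clusters sharing a point are joined."""
    lo, hi = min(lens), max(lens)
    step = (hi - lo) / n_intervals
    pad = overlap * step  # widen each interval so neighbours overlap
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * step - pad, lo + (k + 1) * step + pad
        members = [i for i, f in enumerate(lens) if a <= f <= b]
        nodes.extend(single_linkage_clusters(points, members, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


points = [(math.cos(math.pi * k / 6), math.sin(math.pi * k / 6)) for k in range(12)]
lens = [x for x, _ in points]  # filter: x-coordinate
nodes, edges = mapper(points, lens)
print(len(nodes), "clusters,", len(edges), "edges")
```

The middle interval contains two separate arcs of the circle, the top and the bottom. Those two clusters are what close the cycle.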

→ Open interactive Mapper (click nodes to see processes, search by name)

Mapper graph

Top H₁ Loop #1 (Persistence = 0.563)

The most persistent loop aggregates stress response, protein quality control, and DNA repair: SOS response, quorum sensing, biofilm formation, base excision repair (BER), BAM complex assembly, ribosome assembly, RNA pol recycling, Type III secretion, ubiquitin-proteasome, and unfolded protein response (UPR). E. coli and yeast; shared “stress + quality control + feedback” character.

Example: SOS Response (Loop #1)

The SOS response is E. coli’s emergency DNA repair system: DNA damage activates RecA, which promotes self-cleavage of the LexA repressor and so induces the repair genes. This is classic feedback. lexA is itself an SOS gene, and once the damage is repaired, RecA activation subsides, LexA reaccumulates, and the genes are switched off again. SOS sits in the top H₁ loop alongside quorum sensing, biofilm, UPR, and protein quality control, processes that share stress-response and feedback structure.
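A toy synchronous Boolean model of the SOS circuit shows how a feedback loop switches the response off again. The four variables and their update rules are simplifying assumptions for illustration, not the GLMP flowchart:

- Damage persists until repair genes are on.
- Active RecA tracks damage.
- LexA is present unless RecA is active.
- Repair genes are expressed when LexA is absent.

Starting from fresh damage, the genes switch on, and then switch back off once the damage is cleared.

```python
def sos_step(state):
    """One synchronous update of a toy Boolean SOS model (assumed rules)."""
    damage, reca_active, lexa, repair_genes = state
    return (
        damage and not repair_genes,  # damage persists until repaired
        damage,                       # RecA is activated by damage
        not reca_active,              # active RecA drives LexA cleavage
        not lexa,                     # LexA represses the SOS genes
    )


def simulate(state, steps=10):
    """Return the trajectory of states, starting with the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = sos_step(state)
        trajectory.append(state)
    return trajectory


# Initial state: fresh damage, RecA inactive, LexA present, genes off.
for t, s in enumerate(simulate((True, False, True, False))):
    print(t, [int(x) for x in s])
```

The run settles back to the repressed state (no damage, LexA present, genes off). The back-edge from repair to damage is what makes this a loop rather than a one-way cascade.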

SOS Response flowchart (GLMP viewer)

Top H₁ Loop #2 (Persistence = 0.443)

Six processes: antibiotic efflux pumps, arginine biosynthesis, osmotic stress response, tryptophan biosynthesis, peroxisome biogenesis, vacuolar protein sorting. Metabolic regulation and organelle biogenesis — E. coli and yeast. Topology groups by circuit structure: feedback in biosynthesis and stress-induced transport.

Top H₁ Loop #3 (Persistence = 0.306)

Six processes: biofilm formation, DNA replication elongation, flagellar assembly, osmotic stress, sigma factor competition, peroxisome biogenesis. Gene regulation, replication, motility, stress — shared circuit logic across E. coli and yeast.

Top H₁ Loop #4 (Persistence = 0.279)

Six processes: phosphate regulation, translation elongation, translation termination, tryptophan biosynthesis, osmotic stress response, sporulation initiation. Gene regulation, translation, stress, and development; E. coli, yeast, Bacillus.

Top H₁ Loop #5 (Persistence = 0.198)

Five processes: ara operon, maltose regulon, Pho regulon, nitrogen catabolite repression (NCR/TORC1), competence development. Nutrient and developmental regulation — ara and Pho are classic feedback circuits; topology groups by nutrient-sensing regulatory logic. Cross-organism: E. coli, yeast, Bacillus.

Example: Ara Operon (Loop #5)

AraC acts as a repressor or an activator depending on arabinose. Without arabinose, it forms a DNA loop that represses araBAD; with arabinose, it activates transcription. The promoter also integrates the CRP–cAMP signal. Ara sits in Loop #5 with Pho regulon, maltose regulon, nitrogen catabolite repression, and competence, all nutrient-sensing or developmental decisions with shared regulatory logic.
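The AND logic at the araBAD promoter can be written as a truth table. This is a deliberately coarse Boolean sketch. It assumes glucose simply determines whether CRP–cAMP is available, and it ignores the DNA-looping mechanics and graded expression levels.

```python
from itertools import product


def arac_mode(arabinose: bool) -> str:
    """AraC switches role with arabinose (coarse two-state view)."""
    return "activator" if arabinose else "repressor (DNA loop)"


def arabad_expressed(arabinose: bool, glucose: bool) -> bool:
    """araBAD is on only with arabinose AND CRP-cAMP (i.e. glucose scarce)."""
    crp_camp = not glucose  # assumption: cAMP is high only without glucose
    return arabinose and crp_camp


for arabinose, glucose in product([False, True], repeat=2):
    state = "ON" if arabad_expressed(arabinose, glucose) else "off"
    print(f"arabinose={arabinose!s:5} glucose={glucose!s:5} "
          f"AraC={arac_mode(arabinose):20} araBAD={state}")
```

Only the arabinose-present, glucose-absent row turns the operon on. This is the kind of AND connective the GLMP flowcharts encode as a gate node.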

Ara Operon flowchart (GLMP viewer)

Biological Coherence Check

With the new loop-based feature set, known feedback circuits cluster in coherent loops: SOS, quorum sensing, and biofilm in Loop #1 (stress + feedback); ara and Pho in Loop #5 (nutrient-sensing feedback); trp biosynthesis in Loops #2 and #4. Topology recovers regulatory structure (stress, protein quality control, nutrient regulation) from structural features alone.

Organism Patterns

All top five loops mix organisms. Loops #1–#3: E. coli and yeast. Loops #4 and #5: E. coli, yeast, and Bacillus. Topology groups by circuit structure, not by species, suggesting that regulatory logic is shared across organism boundaries.

Why These Features Work

Using loop (back-edge) features instead of NOT gates yields richer persistence values and clearer biological groupings. The coherence check asks whether topology recovers known biology — not the reverse.

Known feedback circuits cluster coherently: stress in Loop #1, nutrient-sensing in Loop #5, metabolic feedback in Loops #2 and #4.

Stress circuits (SOS, quorum sensing, biofilm) → Loop #1. Nutrient-sensing (ara, Pho) → Loop #5. Metabolic feedback (trp) → Loops #2 and #4. Structure matches regulatory architecture biologists recognize.

Conclusion: loop-based features capture coarse regulatory topology. The new feature set is a distinct experiment, and its results are richer and more interpretable than those from the earlier NOT-gate features.

Limitations and Caveats

Next Steps

We’re extending the Mapper analysis, adding ablation and null-model validation, and developing richer features. A longer-term goal is to use flowcharts and TDA as a kind of Rosetta Stone, linking topological structure to the genetic “machine code” on the chromosome: the sequence motifs that implement AND/OR connectives.

References

Acknowledgments and Questions
