GLMP Working Paper 2026 · Sequel

The Genome as Computer: Logical Primitives, Runtime States, and the Computational Limits of Biological Prediction

Sequel to: Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class

Gary Welz
gwelz@gc.cuny.edu
CUNY Graduate Center / New Media Lab
Genome Logic Modeling Project (GLMP)
Abstract

The companion paper established that the choice of primitive relations determines the logical character of a formal system, and conjectured that gene regulatory circuits can be classified by computational complexity class. This paper develops the computational hypothesis to its full extent. We argue that the genome implements a genuine two-layer computational system: a data layer (the codon table, encoding protein sequences) and a control layer (promoter architecture, encoding regulatory logic). The logical primitives of the control layer — binding, NOT, AND, OR, CONDITIONAL, NAND, NOR, XOR, biconditional, and their temporal, modal, and predicate extensions — have specific molecular implementations at the level of promoter sequence architecture and transcription factor interaction geometry. The CONDITIONAL (IF-THEN) is identified as the master primitive, of which all feedback relationships are special cases. The transcriptome is the runtime state of this computational system: a snapshot of which instructions are currently executing, readable as a logical state vector. From this framework we derive nine predictions about cell fate, cancer, drug resistance, the limits of virtual cell models, and the computational irreducibility of complex biological behavior. The most consequential prediction follows from Rice's theorem: if Class V genomic circuits are Turing-complete, then perfect prediction of cellular behavior from genomic sequence is provably impossible for any algorithm. We propose that grammar-aware AI models, informed by the logical structure of the control layer, will outperform grammar-blind statistical models in interpretability, sample efficiency, and formal verifiability.

Scope and Relationship to Companion Paper
This paper is the direct sequel to Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class (Welz, 2026). Readers are assumed familiar with the companion paper's framework: the foundational dependency DAG, the five-class complexity ladder, the epistemic rung table, and the mathematical instruments of Reverse Mathematics, ordinal analysis, forcing, and computability theory. Where the companion paper established the theoretical framework and stated the central conjecture, this paper takes the conjecture as a working hypothesis and develops its consequences to the level of specific, falsifiable predictions. All predictions are explicitly labeled by confidence level. The companion paper's seven-rung epistemic ladder applies here — the predictions of this paper sit at Rungs 3 through 7.
Part I The Two-Layer Genome

1. Introduction: Beyond the Codon Table

The decoding of the genetic code between 1961 and 1966 — the mapping of 64 codons to 20 amino acids and three stop signals — is one of the great intellectual achievements of the 20th century. It revealed that the genome encodes protein sequences in a systematic, universal, and readable language. But it decoded only one layer of the genome's computational architecture. The codon table is the data layer: it specifies what proteins are made. A second layer — the control layer — specifies when, where, under what conditions, and in response to what signals each gene is expressed. This control layer is the regulatory program of the cell, and it remains only partially decoded.

The distinction between data layer and control layer corresponds to a fundamental distinction in computer science between a program's data structures and its control flow. A program that stores numbers in memory but has no conditional branching, no loops, and no subroutine calls is not a useful program — it is just a data store. The control flow — the IF-THEN statements, the loops, the function calls — is what makes a program a computation. The codon table, read in isolation, is the genome's data store. The regulatory architecture — the promoters, operators, enhancers, silencers, and the transcription factor networks that read them — is the genome's control flow.

This paper argues that the control layer is written in a language whose primitives are logical rather than chemical, whose grammar encodes computational operations, and whose runtime state is the transcriptome. The logical primitives of this language have specific molecular implementations that are in principle readable from genomic sequence and transcriptomic data.

[Figure 1 schematic. Control layer (program), promoter / control region: operator (NOT gate, ¬P, bound by the repressor); dual binding site (AND gate, TF-A ∧ TF-B); signal site (CONDITIONAL, P→Q, bound by the signal molecule); then the TSS. Regulatory logic: ¬Rep ∧ (A∧B) ∧ Signal → Gene ON. Data layer (sequences), gene body: codon sequence AUG (START), GCU (Ala), AAA (Lys), UGG (Trp), CAU (His), ..., UAA (STOP), translated to the protein product H₂N-Ala-Lys-Trp-His-...-COOH (transcription factor, enzyme, or structural protein).]
Key insight: The codon table (data layer) has been fully decoded since 1966. The promoter grammar (control layer) remains only partially decoded. Reading both layers is the project of which GLMP is a part.
Figure 1. The genome as a two-layer computational system. The control layer (top, red) is written in promoter and regulatory regions: binding sites encode NOT, AND, and CONDITIONAL gates. The DNA backbone (middle) carries both layers. The data layer (bottom, blue) is written in coding regions: codons specify amino acid sequences, decoded since 1961–1966. The codon table maps 64 triplets to 20 amino acids, START, and STOP — a complete dictionary. The regulatory grammar of the control layer maps ~1,600 TF binding motifs to logical operations — a vocabulary partially mapped in databases (JASPAR, RegulonDB, ENCODE) but not yet understood as a complete formal grammar.

2. The Logical Primitives and Their Molecular Implementations

2.1 Binding: The True Primitive

As established in the companion paper, binding is the foundational primitive — the molecular analog of Tarski's betweenness relation. All other logical operations are derived from binding in specific geometric and contextual arrangements: sequence-specific protein-DNA binding (a transcription factor recognizes and binds a specific sequence motif), protein-protein binding (TFs interact with co-activators, co-repressors, and mediator complexes), and RNA-protein binding (regulatory RNAs bind target mRNAs or proteins, implementing post-transcriptional logic). The binding relation B(X, Y) is dyadic, binary in the logical abstraction, and the ground-level primitive — the RCA₀-level operation of the genomic computational system.

2.2 NOT, AND, OR — The Boolean Foundation

NOT (Repression) is implemented by the repressor-operator system. A repressor protein binds an operator sequence within the promoter; when bound, RNA polymerase cannot access the promoter and transcription is blocked. The operator sequence is the physical encoding of the NOT gate. In single-cell RNA-seq data, NOT relationships appear as anti-correlated expression pairs.

AND (Cooperativity) requires two conditions simultaneously. It is implemented by promoter architectures requiring multiple transcription factors: dual binding sites where both must be occupied, or cooperative assembly of multi-protein complexes. The interferon-β enhanceosome — requiring eight distinct proteins to assemble simultaneously on a 55 bp enhancer — is an eight-input AND gate. AND gates appear in transcriptomic data as genes expressed only when both upstream inputs are simultaneously high.

OR (Alternative Activation) requires at least one of multiple conditions. It is implemented by multiple independent promoters or alternative upstream activating sequences. OR gates appear as genes expressed in the union of upstream input domains.

| Gate | Symbol | Truth Table | Molecular Implementation | scRNA-seq Signature | Class |
|---|---|---|---|---|---|
| ¬ NOT (Repression) | ¬P | P=0 → 1; P=1 → 0 | Repressor binds operator, blocks RNAP. lac operon: LacI binds lacO. Operator sequence is the NOT gate. | Anti-correlated pairs: TF↑ → target↓ | I |
| ∧ AND (Cooperativity) | P∧Q | 0,0→0; 1,0→0; 0,1→0; 1,1→1 | Dual binding sites; both must be occupied. IFN-β enhanceosome: 8-protein AND gate. | Intersection domain: gene ON only when both inputs high | I |
| ∨ OR (Alt. Activation) | P∨Q | 0,0→0; 1,0→1; 0,1→1; 1,1→1 | Multiple independent promoters; either sufficient. Stress response genes, tissue-specific promoters. | Union domain: gene ON when any input high | I |
| → CONDITIONAL (IF-THEN · Master) | P→Q | P=0 → Q=0; P=1 → Q=1. Introduces time, context, threshold. | Signal transduction cascade; threshold = Kd. lac: allolactose → LacI release → lacZ ON. All feedback = CONDITIONAL applied recursively. | Correlated response with sharp threshold at Kd; time-delayed | I–V |
| ⊼ NAND (Co-repressor · COMPLETE) | ¬(P∧Q) | 0,0→1; 1,0→1; 0,1→1; 1,1→0 | Repressor requires co-repressor for active conformation. trp operon: TrpR + 2× tryptophan. NAND alone is Boolean-complete. | Gene ON except when both repressor and co-repressor present | I |
| ⊽ NOR (Dual Repression · COMPLETE) | ¬(P∨Q) | 0,0→1; 1,0→0; 0,1→0; 1,1→0 | Either of two repressors alone blocks transcription. Developmental genes: TF repressor + Polycomb. NOR alone is also Boolean-complete. | Very narrow expression: gene ON only when both repressors absent | I |

Figure 2. All six gates derive from the single ground primitive of binding. NAND and NOR are each individually functionally complete — any Boolean regulatory logic can be built from NAND gates alone. The CONDITIONAL is the master primitive; all feedback structures are CONDITIONAL applied recursively.

2.3 The CONDITIONAL: Master Primitive

The CONDITIONAL (IF P THEN Q) is the most biologically fundamental logical operation. It is not merely one gate among others. Unlike the Boolean gates (NOT, AND, OR), which are truth-functional (output depends only on current input values), the CONDITIONAL introduces a temporal dimension (P is detected before Q is executed), a contextual dimension (the same P can trigger different Q in different cell types), and a threshold dimension (the response fires only when the signal concentration exceeds a threshold set by the dissociation constant Kd). The molecular implementation is a signal transduction cascade: ligand binds receptor → conformational change → TF activation → target gene transcription.
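The threshold dimension can be made concrete with a minimal sketch. It assumes a Hill-type binding curve, with the response counted as "fired" once fractional occupancy passes one half, which happens exactly when the signal exceeds Kd. The function names, the Hill coefficient, and the one-half cutoff are illustrative assumptions, not part of the GLMP framework.

```python
def occupancy(signal: float, kd: float, hill: float = 1.0) -> float:
    """Fraction of binding sites occupied at a given signal concentration.

    Assumed Hill form: signal^n / (Kd^n + signal^n). With hill=1 this is the
    simple binding isotherm; larger values give a sharper threshold.
    """
    if signal < 0 or kd <= 0:
        raise ValueError("signal must be >= 0 and kd must be > 0")
    return signal**hill / (kd**hill + signal**hill)


def conditional_fires(signal: float, kd: float, hill: float = 1.0) -> bool:
    """IF signal exceeds the Kd threshold THEN respond (occupancy above 1/2)."""
    return occupancy(signal, kd, hill) > 0.5


# Sweep the signal across the threshold (units are arbitrary, Kd = 1).
for s in (0.1, 1.0, 10.0):
    print(s, conditional_fires(s, kd=1.0, hill=4.0))
```

A higher Hill coefficient steepens the curve around Kd without moving the threshold, which is one way to read the "sharp threshold at Kd" signature in the gate table.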

Feedback as a special case of the CONDITIONAL. Feedback is the CONDITIONAL applied recursively: IF output Q exceeds threshold T THEN modify input P. Negative feedback (Q → ¬P) implements homeostasis. Positive feedback (Q → P) implements bistability. Delayed negative feedback (Q →[D] ¬Q) implements oscillation. Self-modifying feedback (Q → modify(P → Q)) implements Class V epigenetic reprogramming. The CONDITIONAL without feedback is Class I (decidable); the CONDITIONAL with feedback generates all higher complexity classes. This makes the CONDITIONAL the gateway to the entire complexity ladder.

2.4 NAND, NOR, XOR, and Functional Completeness

NAND is implemented by the co-repressor system: a repressor requires a co-repressor molecule to achieve active conformation. The trp operon repressor (TrpR) requires two tryptophan molecules — neither alone represses. NOR is implemented by dual alternative repression: either of two repressors alone is sufficient to block transcription, so the gene is expressed only when neither is present. XOR (exclusive OR) appears in competitive binding, where two TFs compete for the same site. The key theoretical consequence: NAND alone is functionally complete — any Boolean regulatory logic is in principle implementable. But Boolean completeness is only the floor of the system's expressive power; the feedback primitives of Class II-V circuits provide additional expressive power beyond it.
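The functional-completeness claim can be checked directly. The sketch below builds NOT, AND, OR, NOR, and XOR from a single `nand` function and compares each against Python's own Boolean operators on every input pair. The helper names are illustrative; the constructions are the standard textbook ones, not anything specific to promoter biology.

```python
from itertools import product


def nand(p: bool, q: bool) -> bool:
    """The only primitive used below."""
    return not (p and q)


def not_(p: bool) -> bool:
    return nand(p, p)


def and_(p: bool, q: bool) -> bool:
    return nand(nand(p, q), nand(p, q))


def or_(p: bool, q: bool) -> bool:
    return nand(nand(p, p), nand(q, q))


def nor_(p: bool, q: bool) -> bool:
    return not_(or_(p, q))


def xor_(p: bool, q: bool) -> bool:
    n = nand(p, q)  # shared intermediate: the classic four-NAND XOR
    return nand(nand(p, n), nand(q, n))


def verify_nand_completeness() -> bool:
    """Compare every NAND-built gate with the built-in operator on all inputs."""
    for p, q in product([False, True], repeat=2):
        if not_(p) != (not p):
            return False
        if and_(p, q) != (p and q) or or_(p, q) != (p or q):
            return False
        if nor_(p, q) != (not (p or q)) or xor_(p, q) != (p != q):
            return False
    return True


print(verify_nand_completeness())
```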

2.5 The Biconditional: A Derived but Fundamental Structure

The biconditional P ↔ Q — P if and only if Q — is derived from two CONDITIONALs running in opposite directions: (P → Q) ∧ (Q → P). It is not a new primitive, but it underlies two of the most important regulatory architectures in biology.

Mutual activation (biconditional without negation): Gene A activates Gene B and Gene B activates Gene A: (A → B) ∧ (B → A). Once either gene is activated, the loop sustains both. This is the logical structure of commitment. MyoD and myogenin in skeletal muscle differentiation, and Oct4 and Sox2 in pluripotency, exhibit mutual activation.

Mutual repression (biconditional with negation): Gene A represses Gene B and Gene B represses Gene A: (A → ¬B) ∧ (B → ¬A). Read as material implications, these two conjuncts are contrapositives of each other and state only that A and B are never both high; the biconditional A ↔ ¬B additionally needs ¬A → B, which holds when each gene is expressed by default once its repressor is absent. Under that condition the circuit has exactly two stable states: A high/B low, and A low/B high. This is the toggle switch (Gardner et al. 2000), bistability in its purest logical form. GATA1 and PU.1 in hematopoietic lineage decisions are a natural example, since the two factors antagonize each other. The biconditional reinforces the primacy of the CONDITIONAL: two of the most important regulatory architectures in biology emerge from combinations of primitives already identified, requiring no new molecular machinery.
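The two-stable-state claim can be illustrated with a minimal Boolean sketch. It assumes synchronous updates and that each gene is ON by default when its repressor is OFF; both are modeling assumptions made here for illustration, and the function names are hypothetical. Stable states are the fixed points of the update rule.

```python
from itertools import product


def toggle_step(state):
    """Mutual repression: each gene is ON next step only if the other is OFF now."""
    a, b = state
    return (not b, not a)


def mutual_activation_step(state):
    """Mutual activation: each gene is ON next step only if the other is ON now."""
    a, b = state
    return (b, a)


def fixed_points(step, n_genes):
    """Enumerate all Boolean states and keep those the update leaves unchanged."""
    return [s for s in product([False, True], repeat=n_genes) if step(s) == s]


print(fixed_points(toggle_step, 2))             # A low/B high and A high/B low
print(fixed_points(mutual_activation_step, 2))  # both OFF and both ON
```

Under these assumptions the toggle switch has exactly the two fixed points described in the text, while mutual activation has the all-OFF and all-ON states as its fixed points.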

2.6 Beyond Classical Logic: Temporal, Modal, and Predicate Extensions

Temporal Logic

The four fundamental temporal operators each have biological implementations. ALWAYS (□P) — constitutive expression; housekeeping genes. EVENTUALLY (◇P) — inducible expression; the lac operon implements EVENTUALLY(lacZ expressed). UNTIL (P U Q) — transient developmental expression; Hox genes expressed until positional identity is established. NEXT (○P) — the delay cascade; gene A activates gene B after one transcription-translation cycle. The repressilator implements NEXT recursively, producing sustained oscillation.

Temporal logic operators map onto the complexity ladder: ALWAYS, EVENTUALLY, UNTIL, and NEXT without recursion are Class I; NEXT applied recursively generates Class IV oscillation; UNTIL with a self-modifying condition generates Class V epigenetic silencing. The full logical grammar of the control layer is better described by temporal logic than by classical propositional logic. This has consequences for the LEAN formalization path, which would have to rest on a formal treatment of temporal operators and not on propositional logic alone.
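Recursive NEXT can be sketched as a three-gene ring of repressors updated synchronously, a Boolean caricature of the repressilator in which each update is one transcription-translation cycle. The synchronous update scheme and the function names are assumptions made for illustration; the real circuit is continuous and stochastic.

```python
from itertools import product


def repressilator_step(state):
    """Ring of repressors: C represses A, A represses B, B represses C."""
    a, b, c = state
    return (not c, not a, not b)


def cycle_length(step, start, max_steps=64):
    """Length of the cycle eventually reached from `start`, or None if not found."""
    seen = {}
    state = start
    for t in range(max_steps):
        if state in seen:
            return t - seen[state]
        seen[state] = t
        state = step(state)
    return None


def has_fixed_point(step, n_genes=3):
    """True if any Boolean state is left unchanged by the update rule."""
    return any(step(s) == s for s in product([False, True], repeat=n_genes))


print(cycle_length(repressilator_step, (True, False, False)))
print(has_fixed_point(repressilator_step))
```

In this toy model the ring has no fixed point at all, so every trajectory keeps moving: the Boolean counterpart of sustained oscillation.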

Modal Logic

Modal operators distinguish between current state and possible states. Necessity (□P) maps to constitutive expression (housekeeping genes). Possibility (◇P) maps to cell-type-specific expression. The clinically significant operator is Impossibility (¬◇P) — epigenetic silencing, where a promoter is permanently inaccessible via heterochromatin or DNA methylation. The distinction between "gene G is not currently expressed" (propositional) and "gene G cannot be expressed in this context" (modal) matters: oncogene silencing by methylation (modal ¬◇) is fundamentally different from oncogene repression by a TF (propositional). Modal impossibility is a Class V operation.

Predicate Logic and Quantifiers

Predicate quantifiers distinguish population-level from cell-level expression: ∀x: expressed(G, x) (universal, housekeeping), ∃x: expressed(G, x) (existential, detected somewhere), ∃!x: expressed(G, x) (unique cell-type marker). This framework clarifies the difference between bulk RNA-seq (existential queries over cell mixtures) and single-cell RNA-seq (individual queries per cell). Single-cell heterogeneity — ∃x: expressed(G, x) ∧ ∃y: ¬expressed(G, y) — is the population-level signature of a bistable (Class III) circuit.
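The quantified statements above translate directly into queries over a binarized cell-by-gene table. The sketch below assumes each cell is a mapping from gene name to an expressed/not-expressed flag; the gene names and the toy data are hypothetical placeholders, not measurements.

```python
from typing import Mapping, Sequence

Cell = Mapping[str, bool]  # gene name -> expressed in this cell?


def forall_expressed(cells: Sequence[Cell], gene: str) -> bool:
    """∀x: expressed(G, x)"""
    return all(cell[gene] for cell in cells)


def exists_expressed(cells: Sequence[Cell], gene: str) -> bool:
    """∃x: expressed(G, x)"""
    return any(cell[gene] for cell in cells)


def exists_unique(cells: Sequence[Cell], gene: str) -> bool:
    """∃!x: expressed(G, x)"""
    return sum(1 for cell in cells if cell[gene]) == 1


def heterogeneous(cells: Sequence[Cell], gene: str) -> bool:
    """∃x: expressed(G, x) ∧ ∃y: ¬expressed(G, y)"""
    return exists_expressed(cells, gene) and any(not cell[gene] for cell in cells)


# Hypothetical three-cell population.
cells = [
    {"G_house": True, "G_marker": True, "G_switch": True},
    {"G_house": True, "G_marker": False, "G_switch": False},
    {"G_house": True, "G_marker": False, "G_switch": True},
]
print(forall_expressed(cells, "G_house"), exists_unique(cells, "G_marker"),
      heterogeneous(cells, "G_switch"))
```

A bulk measurement can only answer the existential query over the pooled population; the universal, unique, and heterogeneity queries need the per-cell rows.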

2.7 The Complete Primitive Vocabulary

| Primitive | Symbol | Molecular Implementation | Complexity Class | Type |
|---|---|---|---|---|
| Binding | B(X,Y) | Sequence-specific molecular contact | RCA₀ ground | Foundational |
| NOT | ¬P | Repressor-operator system | Class I | Boolean |
| AND | P∧Q | Cooperative dual binding site | Class I | Boolean |
| OR | P∨Q | Multiple independent promoters | Class I | Boolean |
| NAND | ¬(P∧Q) | Co-repressor dual requirement | Class I | Boolean (complete) |
| NOR | ¬(P∨Q) | Dual alternative repressors | Class I | Boolean (complete) |
| XOR | (P∨Q)∧¬(P∧Q) | Competitive binding at shared site | Class I | Boolean |
| CONDITIONAL | P→Q | Signal transduction cascade | Class I (no feedback) | Response (master primitive) |
| BICONDITIONAL | P↔Q | Mutual activation loop | Class III (derived) | Derived |
| BICONDITIONAL+NOT | P↔¬Q | Toggle switch (mutual repression) | Class III (derived) | Derived |
| Negative feedback | Q→¬P | Autorepressor; product inhibition | Class II | Recursive |
| Positive feedback | Q→P | Autoactivator; bistable switch | Class III | Recursive |
| Delayed feedback | Q→[D]¬Q | Repressilator architecture | Class IV | Recursive |
| Self-modifying feedback | Q→modify(P→Q) | Epigenetic architecture modification | Class V | Recursive |
| ALWAYS | □P | Constitutive expression | Class I | Temporal |
| EVENTUALLY | ◇P | Inducible expression | Class I | Temporal |
| UNTIL | P U Q | Transient developmental expression | Class I–II | Temporal |
| NEXT (recursive) | ○P recursively | Oscillatory delay cascade | Class IV | Temporal |
| Necessity | □P (modal) | Housekeeping expression | Class I | Modal |
| Possibility | ◇P (modal) | Cell-type-specific expression | Class I | Modal |
| Impossibility | ¬◇P | Epigenetic silencing | Class V | Modal |
| Universal | ∀x: expr(G,x) | Universal population expression | Class I | Predicate |
| Existential | ∃x: expr(G,x) | Expression in some cells (bulk RNA-seq) | Class I | Predicate |
| Bistable population | ∃x: expr(G,x) ∧ ∃y: ¬expr(G,y) | Single-cell bistable heterogeneity | Class III | Predicate |

The most striking feature of this complete vocabulary is that all of its richness — temporal, modal, predicate, Boolean, recursive — derives from the single ground-level primitive of binding. Every entry is binding in a specific geometric, temporal, and contextual arrangement.

3. The Transcriptome as Runtime State

At any moment, each gene in a cell's genome is either expressed or not, and if expressed, at some level. The collection of all mRNA levels across all genes is the transcriptome. In the computational interpretation, this is the state vector of the genomic program. The dimensionality of the transcriptome (~20,000 dimensions for a human cell) is the dimensionality of the state space of the genomic computer. A single-cell RNA-seq dataset is a sample from the state space of the genomic program — not merely a collection of expression profiles.
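Reading a transcriptome as a logical state vector requires some rule for turning expression levels into ON/OFF values. The sketch below uses a single fixed cutoff, which is an assumption made for illustration; the gene names, levels, and threshold are hypothetical, and in practice a per-gene or data-driven threshold may be more appropriate.

```python
def state_vector(expression: dict, threshold: float = 1.0) -> dict:
    """Binarize expression levels: a gene is ON if its level exceeds the threshold."""
    return {gene: level > threshold for gene, level in expression.items()}


def as_bits(state: dict) -> str:
    """Render a state vector as a bit string, genes in sorted-name order."""
    return "".join("1" if state[gene] else "0" for gene in sorted(state))


# Hypothetical expression levels for one cell (arbitrary units).
cell = {"G1": 12.4, "G2": 0.0, "G3": 3.1, "G4": 0.6}
state = state_vector(cell, threshold=1.0)
print(state)
print(as_bits(state))
```

With roughly 20,000 genes the bit string has roughly 20,000 positions, which is the state-space dimensionality referred to above.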

The dimensionality reduction techniques standard in single-cell analysis — PCA, UMAP, t-SNE, diffusion maps — are techniques for finding the attractor structure of the genomic program's state space. The clusters that appear in UMAP plots of single-cell data are not merely statistical clusters — they are the attractors of the genomic computation. Cell types are attractors. The geometry of the UMAP plot reflects the computational topology of the regulatory circuits generating the data: Class III bistable circuits generate two-cluster UMAP plots; Class IV oscillatory circuits generate trajectory structure; Class V circuits generate complex attractor structure.

4. The Grammar of the Control Layer

The promoter architecture — the arrangement of binding sites, operator sequences, and regulatory elements upstream of each gene — is the instruction encoding: the physical medium in which the control layer program is written. Binding site motifs encode the identity of the TF; site position relative to the TSS encodes effect type; site spacing and orientation encode combinatorial logic; clustering encodes complexity; distance from TSS encodes temporal character.

The strongest claim of this paper is that the control layer of all organisms shares a universal grammar — the same logical primitives implemented in organism-specific molecular machinery. (Note: "universal grammar" here is used in the computational sense — a shared set of logical operations — not in Chomsky's linguistic sense of an innate language faculty. The analogy is structural, not cognitive.) Evidence: the same circuit motifs (autoregulation, feedforward loops, feedback oscillators) appear across all domains of life; the same computational functions (bistability, oscillation, adaptation) are implemented by different molecular machines in different organisms; synthetic biology circuits transplanted across organisms function correctly. The universal grammar conjecture predicts that logical structure is more conserved across evolution than molecular identity — testable by comparing GLMP flowcharts across organisms at the topological level.

Part II Computational Consequences
The molecular and logical framework of Part I generates specific, falsifiable predictions. Each prediction is labeled by confidence level: High confidence follows directly from the framework; Medium confidence requires the full computational hypothesis; Speculative predictions are offered explicitly as research directions.

5. The Five-Class Complexity Ladder

| Class | Name | Description | Computability | Rev. Math | Ordinal | Analog | Example |
|---|---|---|---|---|---|---|---|
| V | Self-modifying / Epigenetic Feedback | Circuit rewrites its own regulatory architecture. Rice's theorem applies: perfect prediction provably impossible if Turing-complete. | Σ⁰₁ or above | ATR₀ / Π¹₁-CA₀ | ε₀ or beyond | Peano Arithmetic | Epigenetic reprogramming circuits |
| IV | Mixed Feedback: Oscillators | Sustained oscillation. Circadian rhythms and developmental clocks. Period determined by delay structure. | Primitive recursive | ACA₀ | Approaching ε₀ | Pushdown automata | Repressilator (Elowitz & Leibler 2000) |
| III | Positive Feedback: Bistable Switches | Two stable attractors: cell fate decisions. State persists after signal removal. Toggle switch is the canonical case. | Δ⁰₂ (limit computable) | WKL₀ | ω^ω | Finite automata with memory | Toggle switch (Gardner et al. 2000) |
| II | Negative Feedback: Damped Regulation | Graded responses and homeostasis. Negative feedback suppresses noise. No sustained oscillation. | Δ⁰₁ (bounded recursion) | RCA₀ | ω^ω | Bounded arithmetic | Homeostatic gene regulation |
| I | Feed-Forward Only: No Loops | Decidable, complete, bounded expressive power. Output always determinable from input. No memory, no oscillation. The Tarski-like logical floor. | Δ⁰₁ (decidable) | Below RCA₀ | ω | Tarski's geometry | Simple inducible promoters |

Figure 3. The five-class genomic computational complexity ladder, from most expressive (Class V, Peano-like ceiling) to most constrained (Class I, Tarski-like floor). Each class is calibrated against three formal measures: Computability (Δ⁰₁ = decidable; Σ⁰₁ = recursively enumerable), Reverse Mathematics (Big Five subsystems), and proof-theoretic ordinal (ω = Tarski-level; ε₀ = Peano-level). Classes I and II are in principle fully predictable. Class V circuits are subject to Rice's theorem if Turing-complete.

6. Nine Predictions

PREDICTION 1 · Transcriptomic Noise Distribution Diagnoses Circuit Class · High Confidence

Different circuit classes generate different noise distributions in single-cell expression data. Class I: unimodal, low-variance. Class II (negative feedback): unimodal, very low-variance — feedback suppresses noise. Class III (bistable): bimodal. Class IV (oscillatory): time-structured, periodic. Class V (self-modifying): heavy-tailed, non-stationary.

Testability: For genes regulated by circuits of characterized class (toggle switch genes, repressilator genes, simple inducible promoters), compare observed single-cell expression distributions against predicted patterns.
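One simple way to screen a gene's single-cell distribution for the bimodality expected of Class III circuits is Sarle's bimodality coefficient, sketched below in its plain moment form without the finite-sample correction. The choice of this statistic is an assumption made here for illustration, not a method taken from the framework. Values above 5/9 (the value for a uniform distribution) are conventionally read as suggestive of bimodality, though heavy skew can also raise the coefficient.

```python
def bimodality_coefficient(values):
    """Sarle's coefficient, moment form: (skewness^2 + 1) / kurtosis.

    Uses population moments and non-excess kurtosis; no small-sample correction.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skewness ** 2 + 1) / kurtosis


# Hypothetical expression values: two separated modes vs. one central peak.
two_state = [0.0] * 50 + [10.0] * 50
single_peak = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
print(bimodality_coefficient(two_state) > 5 / 9)
print(bimodality_coefficient(single_peak) > 5 / 9)
```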
PREDICTION 2 · Cell Fate Decisions Are Minimum-Energy State Transitions · High Confidence

If cell types are computational attractors, then cell fate decisions are transitions between attractors requiring passage through a region of low probability between two basins. The minimum number of transcription factor perturbations required to convert cell type A to cell type B is determined by the logical distance between the two attractors. Yamanaka's four reprogramming factors are the minimum perturbation set required to cross the energy barrier between the somatic and pluripotent attractors.

Testability: For any pair of cell types, the minimum reprogramming factor set can in principle be predicted from GLMP-style flowcharts of the relevant regulatory circuits.
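The text does not fix a metric for "logical distance". One crude candidate, sketched below purely as an assumption, is the Hamming distance between the binarized transcription-factor states of the two attractors: the set of factors whose ON/OFF value differs. The factor names and states are hypothetical placeholders. The count is only a rough proxy, since flipping one upstream factor may flip others through the circuit.

```python
def differing_factors(state_a: dict, state_b: dict) -> list:
    """Factors whose ON/OFF value differs between two attractor states."""
    if state_a.keys() != state_b.keys():
        raise ValueError("both states must cover the same factors")
    return sorted(tf for tf in state_a if state_a[tf] != state_b[tf])


def hamming_distance(state_a: dict, state_b: dict) -> int:
    """Number of differing factors: a rough proxy for logical distance."""
    return len(differing_factors(state_a, state_b))


# Hypothetical attractor states over five factors.
source = {"TF1": True, "TF2": False, "TF3": False, "TF4": True, "TF5": False}
target = {"TF1": False, "TF2": True, "TF3": False, "TF4": True, "TF5": True}
print(differing_factors(source, target))
print(hamming_distance(source, target))
```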
PREDICTION 3 · Drug Resistance Is Attractor Escape · High Confidence

When a cancer cell population develops drug resistance, it transitions from a drug-sensitive attractor to a drug-resistant attractor. Circuit mutation resistance (permanent: the attractor structure changes) differs fundamentally from state transition resistance (potentially reversible: the cell moves to a pre-existing resistant attractor). Cancers that develop resistance through state transitions should be re-sensitizable by forcing the cell back to the sensitive attractor. Cancers with circuit mutation resistance cannot be re-sensitized because the sensitive attractor no longer exists.

Testability: The fraction of reversible vs. irreversible resistance correlates with circuit class: Class III generates reversible resistance; Class V generates irreversible resistance.
PREDICTION 4 · The Complexity Gradient Across Organisms · High Confidence

The modal computational class of regulatory circuits correlates with organismal complexity, measurable by GLMP-style topological classification across species. Prokaryotes: predominantly Class I-II. Unicellular eukaryotes: Class II-III. Simple multicellular organisms: Class III-IV. Complex vertebrates: Class IV-V.

PREDICTION 5 · Virtual Cell Model Accuracy Correlates with Circuit Class · Medium Confidence

Virtual cell models should have accuracy correlating with the circuit class of target genes. Highest accuracy for Class I-II circuits (decidable, uniquely determined). Lower accuracy for Class III (bistable — response depends on which attractor the cell is currently in). Lowest for Class IV-V (oscillatory and self-modifying — response depends on phase or current epigenetic state).

Testability: Reanalyze existing virtual cell model benchmarks, stratifying predictions by the circuit class of target genes.
PREDICTION 6 · The Reprogramming Factor Minimum Is a Circuit Depth Measure · Medium Confidence

The number of transcription factors required for cellular reprogramming should relate to the logical depth of the circuit separating source and target cell type attractors. Topologically close cell type pairs require fewer factors; topologically distant pairs require more. A GLMP-style analysis should produce predicted reprogramming factor counts correlating with experimentally observed minimums.

PREDICTION 7 · Rice's Theorem Sets a Hard Ceiling on Cancer Prediction · Medium Confidence

Rice's theorem (1953) states that any non-trivial semantic property of programs is undecidable. If Class V genomic circuits are Turing-complete — a conjecture, not a proven theorem — then by Rice's theorem, no algorithm can determine for an arbitrary Class V circuit whether it will produce a given gene expression pattern. The predictive accuracy ceiling for AI models of cancer driven by Class V circuits is less than 100% and cannot be reached by scaling data or compute. This is a mathematical theorem about what any algorithm can achieve, not a technological limitation. The practical implication is constructive: identify which cancers are driven by Class I-III circuits (potentially fully predictable) versus Class IV-V (subject to principled limits).
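For reference, the theorem can be stated in standard notation. Here φ_e is the partial computable function computed by program e. In the paper's reading, e would index a Class V circuit and the property would be "produces a given expression pattern"; that mapping is the conjecture under discussion, not part of the theorem.

```latex
Let $\varphi_e$ denote the partial computable function computed by program $e$,
and let $\mathcal{P}$ be a set of partial computable functions that is
non-trivial: some $\varphi_e$ belongs to $\mathcal{P}$ and some does not. Then the index set
\[
  I_{\mathcal{P}} \;=\; \{\, e \in \mathbb{N} \;:\; \varphi_e \in \mathcal{P} \,\}
\]
is undecidable: no algorithm decides, for every $e$, whether $e \in I_{\mathcal{P}}$.
```

The statement concerns exact decision over the whole class of programs. It does not rule out correct answers for particular circuits or restricted subclasses, which is why the classification into Classes I-III versus IV-V carries the practical weight.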

[Figure 4 schematic, not empirical data. Title: "The Predictability Ceiling for AI Models of Biological Regulation." x-axis: genomic circuit complexity class (Class I, feed-forward, Δ⁰₁; Class II, negative feedback, RCA₀; Class III, bistable, WKL₀; Class IV, oscillatory, ACA₀; Class V, self-modifying, Σ⁰₁). y-axis: maximum prediction accuracy, 0% to 100%. Labels recovered from the plot: 100%, 100%, ~80%, ~55%, ~25%, 0%. Annotations: Class I is in principle fully decidable (100%); Rice's theorem gives a ceiling below 100% if Class V is Turing-complete. Legend: theoretical accuracy ceiling; current AI models (schematic); grammar-aware models (projected). The Class V ceiling is conditional on the unproven Turing-completeness conjecture.]
Figure 4. Schematic diagram of the maximum theoretical prediction accuracy for AI models of biological regulation as a function of genomic circuit class. The colored ceiling curve is the theoretical maximum achievable by any algorithm. Class I circuits are in principle fully predictable. Accuracy declines through Classes II-IV. For Class V circuits, Rice's theorem establishes the ceiling is strictly less than 100% for any algorithm — not a technological limitation but a mathematical theorem about computability, if the Turing-completeness conjecture holds. The green dashed line projects performance of grammar-aware models (not yet built) predicted to approach the ceiling more closely than grammar-blind statistical models (gray dashed). All values are schematic; no empirical data is presented.
PREDICTION 8 · Grammar-Aware AI Models Will Outperform Grammar-Blind Models · Medium Confidence

Grammar-aware models explicitly representing the logical primitives and using them as inductive biases should require less training data for equivalent accuracy on Class I-III circuits; be more interpretable (predictions expressible in terms of logical primitives, auditable by biologists); generalize better across organisms (because the logical grammar is universal); and be formally verifiable against the LEAN formalization path described in the companion paper.

PREDICTION 9 · The Control Layer Has a Finite Vocabulary · Medium Confidence

If the control layer is a language with a universal grammar, it has a finite vocabulary — approximately 1,600 known human TF binding motifs constituting the alphabet of regulatory instructions. Completing the vocabulary is a finite project, analogous to completing the codon table. Once complete, any promoter sequence can in principle be read as a logical formula in the regulatory grammar — a program fragment specifying which conditions activate the gene, which repress it, and which combination is required for each outcome.

Part III The Grammar-Aware AI Research Program

7. From Grammar-Blind to Grammar-Aware Models

Current AI models for biology — ESM2 (protein language model), Enformer (genomic sequence to gene expression), the Arc Institute's STATE model (perturbation response prediction) — learn statistical regularities in biological data without explicit knowledge of the logical grammar of gene regulation. A grammar-aware model would explicitly represent the logical primitives of the control layer as inductive biases. Rather than treating a promoter sequence as a string of nucleotides, it would parse the promoter as a logical formula: binding site X (TF-A) AND binding site Y (TF-B), with NOT site Z (repressor-C), CONDITIONAL on signal S.
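What "parsing the promoter as a logical formula" might look like as a data structure is sketched below, assuming a small expression tree over binding-site occupancy. The class names and the encoding of the signal as one more conjunct are illustrative assumptions (following the regulatory logic shown in Figure 1), not an existing GLMP or library API.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Site:
    """A binding site; true when its factor is bound."""
    tf: str

    def eval(self, bound: Mapping[str, bool]) -> bool:
        return bound.get(self.tf, False)


@dataclass(frozen=True)
class Not:
    arg: object

    def eval(self, bound: Mapping[str, bool]) -> bool:
        return not self.arg.eval(bound)


@dataclass(frozen=True)
class And:
    args: tuple

    def eval(self, bound: Mapping[str, bool]) -> bool:
        return all(a.eval(bound) for a in self.args)


# Site X (TF-A) AND site Y (TF-B), with NOT site Z (repressor-C),
# conditional on signal S.
promoter = And((Site("TF-A"), Site("TF-B"), Not(Site("repressor-C")), Site("signal-S")))

print(promoter.eval({"TF-A": True, "TF-B": True, "signal-S": True}))
print(promoter.eval({"TF-A": True, "TF-B": True, "signal-S": True, "repressor-C": True}))
```

An explicit tree like this is what makes a prediction auditable: the reason a gene is predicted OFF can be traced to a specific unsatisfied sub-formula.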

The GLMP hybridization strategy — using RegulonDB as a primary regulatory backbone combined with LLM-generated logical interpretation — is a practical implementation of grammar decoding. Databases contribute entity completeness (which TFs, which binding sites, which genes); LLMs contribute logical interpretation (AND vs. OR, conditional vs. constitutive, feedback vs. feed-forward). Scaling this approach across the GLMP sample would produce a corpus of logical specifications for regulatory circuits — the training data for grammar-aware AI models.

8. The LEAN Formalization Path Revisited

The companion paper identified LEAN 4 and Mathlib as the long-term formal verification path for the genomic conjecture. In the context of this sequel, the LEAN formalization path takes on additional significance: it is the path toward formally verified grammar-aware AI models. A grammar-aware model whose logical specifications are formalized in LEAN would have a property no current biological AI model possesses — its predictions could be formally verified against circuit specifications.

graph TD
    FR["Formally specify genomic primitive relations<br/>in LEAN's type theory<br/>(binding as a typed dyadic relation)"]
    CT["Define circuit topology classes in LEAN<br/>(DAG vs. cyclic graph; feedback types;<br/>temporal and modal operators)"]
    CS["Prove decidability of Class I circuits within LEAN<br/>(analogous to Tarski's completeness proof)"]
    GR["Formalize the regulatory grammar<br/>(promoter logic as typed formulas in LEAN)"]
    RM["Establish Reverse Mathematics equivalences<br/>for each complexity class"]
    GA["Train grammar-aware models<br/>on LEAN-verified circuit specifications"]
    TH["Full formalization of the five-class ladder<br/>as a theorem in LEAN/Mathlib"]
    FR --> CT --> CS --> GR --> RM --> TH
    GR --> GA
    classDef lean fill:#2E75B6,color:#fff,stroke:#1a4f8a
    classDef model fill:#27ae60,color:#fff,stroke:#1e8449
    classDef theorem fill:#c0392b,color:#fff,stroke:#96281b
    class FR,CT,CS,GR,RM lean
    class GA model
    class TH theorem
Figure 5. The LEAN formalization path for grammar-aware biological AI. Blue: LEAN specification and proof steps. Green: grammar-aware model training. Red: the long-term theorem target.
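The first steps of this path (binding as a typed relation, promoter logic as typed formulas) might begin along the following lines in Lean 4. This is a toy sketch under stated assumptions, not the formalization the path calls for: every name (`Binding`, `PromoterFormula`, `eval`, `Site`, `demo`, `demoOccupancy`) is hypothetical, occupancy is reduced to a Boolean, and the decidability statement covers only evaluation of a feedback-free formula under a fixed occupancy, which is far weaker than a decidability theorem for Class I circuits.

```lean
/-- Binding as a typed dyadic relation between factors and sites (hypothetical name). -/
def Binding (TF S : Type) : Type := TF → S → Prop

/-- Promoter logic as a typed formula over binding sites (hypothetical name). -/
inductive PromoterFormula (S : Type) where
  | occupied : S → PromoterFormula S
  | neg  : PromoterFormula S → PromoterFormula S
  | conj : PromoterFormula S → PromoterFormula S → PromoterFormula S
  | disj : PromoterFormula S → PromoterFormula S → PromoterFormula S

namespace PromoterFormula

/-- Evaluate a formula under a Boolean occupancy assignment (a simplifying assumption). -/
def eval {S : Type} (occ : S → Bool) : PromoterFormula S → Bool
  | .occupied s => occ s
  | .neg p      => !(eval occ p)
  | .conj p q   => eval occ p && eval occ q
  | .disj p q   => eval occ p || eval occ q

/-- For a feedback-free formula, "the promoter fires" is a decidable proposition. -/
example {S : Type} (occ : S → Bool) (p : PromoterFormula S) :
    Decidable (eval occ p = true) :=
  inferInstance

end PromoterFormula

/-- Three illustrative sites. -/
inductive Site where
  | x | y | z

/-- Site x AND site y AND NOT site z. -/
def demo : PromoterFormula Site :=
  .conj (.conj (.occupied Site.x) (.occupied Site.y)) (.neg (.occupied Site.z))

/-- Activator sites occupied, repressor site empty. -/
def demoOccupancy : Site → Bool
  | .z => false
  | _  => true

#eval PromoterFormula.eval demoOccupancy demo
```

Cyclic topologies, temporal operators, and the Reverse Mathematics steps lie outside what a definition of this shape can express, since `eval` is structurally recursive and therefore always terminates.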
Part IV Future Directions

9. The Empirical Sequel

This paper is the theoretical version of an argument that has an empirical sequel. The empirical tests include:

10. Open Questions

11. Conclusion

We have argued that the genome is a computer in a precise and non-metaphorical sense: its control layer implements a logical language whose primitives are binding, NOT, AND, OR, CONDITIONAL, and their temporal, modal, and predicate extensions, each with specific molecular implementations readable from genomic sequence and promoter architecture. The CONDITIONAL is the master primitive — the operation that introduces temporal response, contextual adaptation, and threshold sensitivity — of which all feedback relationships are special cases. The biconditional, NAND, NOR, XOR, and the full temporal-modal-predicate vocabulary are derived from these foundational operations, all grounded ultimately in the single primitive of binding.

The transcriptome is the runtime state of this program: a high-dimensional snapshot sampleable by single-cell RNA-seq and analyzable as an attractor landscape. Cell types are attractors; cell fate decisions are state transitions; transcriptomic noise distributions are diagnostic signatures of circuit class.

From this framework we derived nine predictions. The most consequential — that Rice's theorem sets a hard ceiling on cancer prediction for Class V circuits — is a mathematical claim about the limits of any algorithm. The most constructive — that grammar-aware AI models will outperform grammar-blind models — is a research program that GLMP's hybridization methodology is designed to support.

The genome has been partially read for sixty years, since the cracking of the codon table. What remains unread is the control layer — the regulatory program that determines when, where, and under what conditions each gene's instruction is executed. Reading that program is the next great project of molecular biology. The logical framework developed in this paper and its companion is one approach to that reading. It may not be the right approach. But it is a precise approach, with falsifiable predictions, a clear epistemic ladder, and a long-term formalization path.

Whether those predictions are confirmed or falsified, the outcome advances the field.


Key References

This paper builds on references 1–33 of the companion paper. New references for this sequel:

Companion paper: Welz, G. Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class. GLMP Working Paper, 2026.

  1. Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3(3), 1961. Founding paper of molecular regulatory logic; establishes the repressor-operator NOT gate.
  2. Ptashne, M. A Genetic Switch: Phage Lambda Revisited. 3rd ed. Cold Spring Harbor Laboratory Press, 2004. Bistable Class III circuit as biconditional with negation.
  3. Thanos, D. & Maniatis, T. Virus induction of human IFN-β gene expression requires the assembly of an enhanceosome. Cell, 83(7), 1995. Multi-input AND gate.
  4. Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC, 2006. Network motifs as computational units.
  5. Milo, R. et al. Network motifs: simple building blocks of complex networks. Science, 298(5594), 2002.
  6. Gardner, T. et al. Construction of a genetic toggle switch in Escherichia coli. Nature, 403, 2000. Toggle switch as bistable Class III circuit; biconditional with negation.
  7. Elowitz, M. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 403, 2000. Repressilator as Class IV oscillatory circuit.
  8. Ferrell, J.E. & Xiong, W. Bistability in cell signaling. Chaos, 11(1), 2001. Mathematical basis of bistability.
  9. Waddington, C.H. The Strategy of the Genes. Allen & Unwin, 1957. The epigenetic landscape; reinterpreted here as attractor landscape.
  10. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 126(4), 2006. Reprogramming as forced attractor transition.
  11. Rice, H.G. Classes of Recursively Enumerable Sets and Their Decision Problems. Transactions of the American Mathematical Society, 74(2), 1953. Rice's theorem.
  12. Pnueli, A. The temporal logic of programs. In Proc. 18th Annual Symposium on Foundations of Computer Science, 1977. Founding paper of temporal logic for program verification.
  13. Clarke, E.M., Grumberg, O. & Peled, D. Model Checking. MIT Press, 1999. Temporal logic model checking; applicable to gene regulatory circuit verification.
  14. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 2023. ESM2: grammar-blind protein language model.
  15. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18, 2021. Enformer: sequence to expression model.
  16. Stormo, G.D. DNA binding sites: representation and discovery. Bioinformatics, 16(1), 2000. The JASPAR/motif database approach.
  17. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 2012. Systematic mapping of regulatory elements.
  18. Ptashne, M. & Gann, A. Genes and Signals. Cold Spring Harbor Laboratory Press, 2002. Molecular basis of transcriptional activation and repression as logical operations.
  19. Bintu, L. et al. Transcriptional regulation by the numbers: models. Current Opinion in Genetics & Development, 15(2), 2005. Quantitative treatment of promoter logic as combinatorial input-output functions.