The Genome as Computer: Logical Primitives, Runtime States, and the Computational Limits of Biological Prediction

Abstract

The companion paper established that the choice of primitive relations determines the logical character of a formal system, and conjectured that gene regulatory circuits can be classified by computational complexity class. This paper develops the computational hypothesis to its full extent. We argue that the genome implements a genuine two-layer computational system: a data layer (the codon table, encoding protein sequences) and a control layer (promoter architecture, encoding regulatory logic). The logical primitives of the control layer — binding, NOT, AND, OR, CONDITIONAL, NAND, NOR, XOR, biconditional, and their temporal, modal, and predicate extensions — have specific molecular implementations at the level of promoter sequence architecture and transcription factor interaction geometry. The CONDITIONAL (IF-THEN) is identified as the master primitive, of which all feedback relationships are special cases. The transcriptome is the runtime state of this computational system: a snapshot of which instructions are currently executing, readable as a logical state vector. From this framework we derive nine predictions about cell fate, cancer, drug resistance, the limits of virtual cell models, and the computational irreducibility of complex biological behavior. The most consequential prediction follows from Rice's theorem: if Class V genomic circuits are Turing-complete, then perfect prediction of cellular behavior from genomic sequence is provably impossible for any algorithm. We propose that grammar-aware AI models, informed by the logical structure of the control layer, will outperform grammar-blind statistical models in interpretability, sample efficiency, and formal verifiability.

Scope and Relationship to Companion Paper

This paper is the direct sequel to Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class (Welz, 2026). Readers are assumed familiar with the companion paper's framework: the foundational dependency DAG, the five-class complexity ladder, the epistemic rung table, and the mathematical instruments of Reverse Mathematics, ordinal analysis, forcing, and computability theory. Where the companion paper established the theoretical framework and stated the central conjecture, this paper takes the conjecture as a working hypothesis and develops its consequences to the level of specific, falsifiable predictions. All predictions are explicitly labeled by confidence level. The companion paper's seven-rung epistemic ladder applies here — the predictions of this paper sit at Rungs 3 through 7.

1. Introduction: Beyond the Codon Table

The decoding of the genetic code between 1961 and 1966 — the mapping of 64 codons to 20 amino acids and three stop signals — is one of the great intellectual achievements of the 20th century. It revealed that the genome encodes protein sequences in a systematic, universal, and readable language. But it decoded only one layer of the genome's computational architecture. The codon table is the data layer: it specifies what proteins are made. A second layer — the control layer — specifies when, where, under what conditions, and in response to what signals each gene is expressed. This control layer is the regulatory program of the cell, and it remains only partially decoded.

The distinction between data layer and control layer corresponds to a fundamental distinction in computer science between a program's data structures and its control flow. A program that stores numbers in memory but has no conditional branching, no loops, and no subroutine calls is not a useful program — it is just a data store. The control flow — the IF-THEN statements, the loops, the function calls — is what makes a program a computation. The codon table, read in isolation, is the genome's data store. The regulatory architecture — the promoters, operators, enhancers, silencers, and the transcription factor networks that read them — is the genome's control flow.

This paper argues that the control layer is written in a language whose primitives are logical rather than chemical, whose grammar encodes computational operations, and whose runtime state is the transcriptome. The logical primitives of this language have specific molecular implementations that are in principle readable from genomic sequence and transcriptomic data.

2. The Logical Primitives and Their Molecular Implementations

2.1 Binding: The True Primitive

As established in the companion paper, binding is the foundational primitive — the molecular analog of Tarski's betweenness relation. All other logical operations are derived from binding in specific geometric and contextual arrangements: sequence-specific protein-DNA binding (a transcription factor recognizes and binds a specific sequence motif), protein-protein binding (TFs interact with co-activators, co-repressors, and mediator complexes), and RNA-protein binding (regulatory RNAs bind target mRNAs or proteins, implementing post-transcriptional logic). The binding relation B(X, Y) is dyadic, binary in the logical abstraction, and the ground-level primitive — the RCA₀-level operation of the genomic computational system.

2.2 NOT, AND, OR — The Boolean Foundation

NOT (Repression) is implemented by the repressor-operator system. A repressor protein binds an operator sequence within the promoter; when bound, RNA polymerase cannot access the promoter and transcription is blocked. The operator sequence is the physical encoding of the NOT gate. In single-cell RNA-seq data, NOT relationships appear as anti-correlated expression pairs.

AND (Cooperativity) requires two conditions simultaneously. It is implemented by promoter architectures requiring multiple transcription factors: dual binding sites where both must be occupied, or cooperative assembly of multi-protein complexes. The interferon-β enhanceosome — requiring eight distinct proteins to assemble simultaneously on a 55 bp enhancer — is an eight-input AND gate. AND gates appear in transcriptomic data as genes expressed only when both upstream inputs are simultaneously high.

OR (Alternative Activation) requires at least one of multiple conditions. It is implemented by multiple independent promoters or alternative upstream activating sequences. OR gates appear as genes expressed in the union of upstream input domains.

Gate	Symbol	Truth Table	Molecular Implementation	scRNA-seq Signature	Class
¬ NOT Repression	`¬P`	P=0 → 1 P=1 → 0	Repressor binds operator, blocks RNAP. lac operon: LacI binds lacO. Operator sequence is the NOT gate.	Anti-correlated pairs: TF↑ → target↓	I
∧ AND Cooperativity	`P∧Q`	0,0→0 1,0→0 0,1→0 1,1→1	Dual binding sites; both must be occupied. IFN-β enhanceosome: 8-protein AND gate.	Intersection domain: gene ON only when both inputs high	I
∨ OR Alt. Activation	`P∨Q`	0,0→0 1,0→1 0,1→1 1,1→1	Multiple independent promoters; either sufficient. Stress response genes, tissue-specific promoters.	Union domain: gene ON when any input high	I
→ CONDITIONAL IF-THEN · Master	`P→Q`	P=0 → Q=0 P=1 → Q=1 Introduces time, context, threshold	Signal transduction cascade; threshold = K_d. lac: allolactose → LacI release → lacZ ON. All feedback = CONDITIONAL applied recursively.	Correlated response with sharp threshold at K_d; time-delayed	I–V
⊼ NAND Co-repressor COMPLETE	`¬(P∧Q)`	0,0→1 1,0→1 0,1→1 1,1→0	Repressor requires co-repressor for active conformation. trp operon: TrpR + 2× tryptophan. NAND alone is Boolean-complete.	Gene ON except when both repressor and co-repressor present	I
⊽ NOR Dual Repression COMPLETE	`¬(P∨Q)`	0,0→1 1,0→0 0,1→0 1,1→0	Either of two repressors alone blocks transcription. Developmental genes: TF repressor + Polycomb. NOR alone is also Boolean-complete.	Very narrow expression: gene ON only when both repressors absent	I

Figure 2. All six gates derive from the single ground primitive of binding. NAND and NOR are each individually functionally complete — any Boolean regulatory logic can be built from NAND gates alone. The CONDITIONAL is the master primitive; all feedback structures are CONDITIONAL applied recursively.

2.3 The CONDITIONAL: Master Primitive

The CONDITIONAL — IF P THEN Q — is the most biologically fundamental logical operation. It is not merely one gate among others. Unlike the Boolean gates (NOT, AND, OR), which are truth-functional (output depends only on current input values), the CONDITIONAL introduces a temporal dimension (P is detected before Q is executed), a contextual dimension (the same P can trigger different Q in different cell types), and a threshold dimension (the response fires only when signal exceeds the binding affinity Kd). The molecular implementation is a signal transduction cascade: ligand binds receptor → conformational change → TF activation → target gene transcription.

Feedback as a special case of the CONDITIONAL. Feedback is the CONDITIONAL applied recursively: IF output Q exceeds threshold T THEN modify input P. Negative feedback (Q → ¬P) implements homeostasis. Positive feedback (Q → P) implements bistability. Delayed negative feedback (Q →[D] ¬Q) implements oscillation. Self-modifying feedback (Q → modify(P → Q)) implements Class V epigenetic reprogramming. The CONDITIONAL without feedback is Class I (decidable); the CONDITIONAL with feedback generates all higher complexity classes. This makes the CONDITIONAL the gateway to the entire complexity ladder.

2.4 NAND, NOR, XOR, and Functional Completeness

NAND is implemented by the co-repressor system: a repressor requires a co-repressor molecule to achieve active conformation. The trp operon repressor (TrpR) requires two tryptophan molecules — neither alone represses. NOR is implemented by dual alternative repression: either of two repressors alone is sufficient to block transcription, so the gene is expressed only when neither is present. XOR (exclusive OR) appears in competitive binding, where two TFs compete for the same site. The key theoretical consequence: NAND alone is functionally complete — any Boolean regulatory logic is in principle implementable. But Boolean completeness is only the floor of the system's expressive power; the feedback primitives of Class II-V circuits provide additional expressive power beyond it.

2.5 The Biconditional: A Derived but Fundamental Structure

The biconditional P ↔ Q — P if and only if Q — is derived from two CONDITIONALs running in opposite directions: (P → Q) ∧ (Q → P). It is not a new primitive, but it underlies two of the most important regulatory architectures in biology.

Mutual activation (biconditional without negation): Gene A activates Gene B and Gene B activates Gene A — (A → B) ∧ (B → A). Once either gene is activated, the loop sustains both. This is the logical structure of commitment. MyoD and myogenin in skeletal muscle differentiation, Oct4 and Sox2 in pluripotency, and GATA1 and PU.1 in hematopoietic lineage decisions all exhibit mutual activation.

Mutual repression (biconditional with negation): Gene A represses Gene B and Gene B represses Gene A — (A → ¬B) ∧ (B → ¬A). The circuit has exactly two stable states: A high/B low, and A low/B high. This is the toggle switch (Gardner et al. 2000) — bistability in its purest logical form. The biconditional reinforces the primacy of the CONDITIONAL: two of the most important regulatory architectures in biology emerge from combinations of primitives already identified, requiring no new molecular machinery.

2.6 Beyond Classical Logic: Temporal, Modal, and Predicate Extensions

Temporal Logic

The four fundamental temporal operators each have biological implementations. ALWAYS (□P) — constitutive expression; housekeeping genes. EVENTUALLY (◇P) — inducible expression; the lac operon implements EVENTUALLY(lacZ expressed). UNTIL (P U Q) — transient developmental expression; Hox genes expressed until positional identity is established. NEXT (○P) — the delay cascade; gene A activates gene B after one transcription-translation cycle. The repressilator implements NEXT recursively, producing sustained oscillation.

Temporal logic operators map onto the complexity ladder: ALWAYS, EVENTUALLY, UNTIL, and NEXT without recursion are Class I; NEXT applied recursively generates Class IV oscillation; UNTIL with a self-modifying condition generates Class V epigenetic silencing. The full logical grammar of the control layer is better described by temporal logic than classical propositional logic — a claim with consequences for the LEAN formalization path, since Mathlib contains substantial formal treatments of temporal logic.

Modal Logic

Modal operators distinguish between current state and possible states. Necessity (□P) maps to constitutive expression (housekeeping genes). Possibility (◇P) maps to cell-type-specific expression. The clinically significant operator is Impossibility (¬◇P) — epigenetic silencing, where a promoter is permanently inaccessible via heterochromatin or DNA methylation. The distinction between "gene G is not currently expressed" (propositional) and "gene G cannot be expressed in this context" (modal) matters: oncogene silencing by methylation (modal ¬◇) is fundamentally different from oncogene repression by a TF (propositional). Modal impossibility is a Class V operation.

Predicate Logic and Quantifiers

Predicate quantifiers distinguish population-level from cell-level expression: ∀x: expressed(G, x) (universal, housekeeping), ∃x: expressed(G, x) (existential, detected somewhere), ∃!x: expressed(G, x) (unique cell-type marker). This framework clarifies the difference between bulk RNA-seq (existential queries over cell mixtures) and single-cell RNA-seq (individual queries per cell). Single-cell heterogeneity — ∃x: expressed(G, x) ∧ ∃y: ¬expressed(G, y) — is the population-level signature of a bistable (Class III) circuit.

2.7 The Complete Primitive Vocabulary

Primitive	Symbol	Molecular Implementation	Complexity Class	Type
Binding	B(X,Y)	Sequence-specific molecular contact	RCA₀ ground	Foundational
NOT	¬P	Repressor-operator system	Class I	Boolean
AND	P∧Q	Cooperative dual binding site	Class I	Boolean
OR	P∨Q	Multiple independent promoters	Class I	Boolean
NAND	¬(P∧Q)	Co-repressor dual requirement	Class I	Boolean (complete)
NOR	¬(P∨Q)	Dual alternative repressors	Class I	Boolean (complete)
XOR	(P∨Q)∧¬(P∧Q)	Competitive binding at shared site	Class I	Boolean
CONDITIONAL	P→Q	Signal transduction cascade	Class I (no feedback)	Response — master primitive
BICONDITIONAL	P↔Q	Mutual activation loop	Class III (derived)	Derived
BICONDITIONAL+NOT	P↔¬Q	Toggle switch (mutual repression)	Class III (derived)	Derived
Negative feedback	Q→¬P	Autorepressor; product inhibition	Class II	Recursive
Positive feedback	Q→P	Autoactivator; bistable switch	Class III	Recursive
Delayed feedback	Q→[D]¬Q	Repressilator architecture	Class IV	Recursive
Self-modifying feedback	Q→modify(P→Q)	Epigenetic architecture modification	Class V	Recursive
ALWAYS	□P	Constitutive expression	Class I	Temporal
EVENTUALLY	◇P	Inducible expression	Class I	Temporal
UNTIL	P U Q	Transient developmental expression	Class I–II	Temporal
NEXT (recursive)	○P recursively	Oscillatory delay cascade	Class IV	Temporal
Necessity	□P (modal)	Housekeeping expression	Class I	Modal
Possibility	◇P (modal)	Cell-type-specific expression	Class I	Modal
Impossibility	¬◇P	Epigenetic silencing	Class V	Modal
Universal	∀x: expr(G,x)	Universal population expression	Class I	Predicate
Existential	∃x: expr(G,x)	Expression in some cells (bulk RNA-seq)	Class I	Predicate
Bistable population	∃x∧∃y¬: expr(G)	Single-cell bistable heterogeneity	Class III	Predicate

The most striking feature of this complete vocabulary is that all of its richness — temporal, modal, predicate, Boolean, recursive — derives from the single ground-level primitive of binding. Every entry is binding in a specific geometric, temporal, and contextual arrangement.

3. The Transcriptome as Runtime State

At any moment, each gene in a cell's genome is either expressed or not, and if expressed, at some level. The collection of all mRNA levels across all genes is the transcriptome. In the computational interpretation, this is the state vector of the genomic program. The dimensionality of the transcriptome (~20,000 dimensions for a human cell) is the dimensionality of the state space of the genomic computer. A single-cell RNA-seq dataset is a sample from the state space of the genomic program — not merely a collection of expression profiles.

The dimensionality reduction techniques standard in single-cell analysis — PCA, UMAP, t-SNE, diffusion maps — are techniques for finding the attractor structure of the genomic program's state space. The clusters that appear in UMAP plots of single-cell data are not merely statistical clusters — they are the attractors of the genomic computation. Cell types are attractors. The geometry of the UMAP plot reflects the computational topology of the regulatory circuits generating the data: Class III bistable circuits generate two-cluster UMAP plots; Class IV oscillatory circuits generate trajectory structure; Class V circuits generate complex attractor structure.

4. The Grammar of the Control Layer

The promoter architecture — the arrangement of binding sites, operator sequences, and regulatory elements upstream of each gene — is the instruction encoding: the physical medium in which the control layer program is written. Binding site motifs encode the identity of the TF; site position relative to the TSS encodes effect type; site spacing and orientation encode combinatorial logic; clustering encodes complexity; distance from TSS encodes temporal character.

The strongest claim of this paper is that the control layer of all organisms shares a universal grammar — the same logical primitives implemented in organism-specific molecular machinery. (Note: "universal grammar" here is used in the computational sense — a shared set of logical operations — not in Chomsky's linguistic sense of an innate language faculty. The analogy is structural, not cognitive.) Evidence: the same circuit motifs (autoregulation, feedforward loops, feedback oscillators) appear across all domains of life; the same computational functions (bistability, oscillation, adaptation) are implemented by different molecular machines in different organisms; synthetic biology circuits transplanted across organisms function correctly. The universal grammar conjecture predicts that logical structure is more conserved across evolution than molecular identity — testable by comparing GLMP flowcharts across organisms at the topological level.

5. The Five-Class Complexity Ladder

Class	Name	Description	Computability	Rev. Math	Ordinal	Analog	Example
V	Self-modifying / Epigenetic Feedback	Circuit rewrites its own regulatory architecture. Rice's theorem applies: perfect prediction provably impossible if Turing-complete.	Σ⁰₁ or above	ATR₀ / Π¹₁-CA₀	ε₀ or beyond	Peano Arithmetic	Epigenetic reprogramming circuits
IV	Mixed Feedback — Oscillators	Sustained oscillation. Circadian rhythms and developmental clocks. Period determined by delay structure.	Primitive recursive	ACA₀	Approaching ε₀	Pushdown automata	Repressilator (Elowitz & Leibler 2000)
III	Positive Feedback — Bistable Switches	Two stable attractors: cell fate decisions. State persists after signal removal. Toggle switch is the canonical case.	Δ⁰₂ (limit computable)	WKL₀	ω^ω	Finite automata with memory	Toggle switch (Gardner et al. 2000)
II	Negative Feedback — Damped Regulation	Graded responses and homeostasis. Negative feedback suppresses noise. No sustained oscillation.	Δ⁰₁ (bounded recursion)	RCA₀	ω^ω	Bounded arithmetic	Homeostatic gene regulation
I	Feed-Forward Only — No Loops	Decidable, complete, bounded expressive power. Output always determinable from input. No memory, no oscillation. The Tarski-like logical floor.	Δ⁰₁ (decidable)	Below RCA₀	ω	Tarski's geometry	Simple inducible promoters

Figure 3. The five-class genomic computational complexity ladder, from most expressive (Class V, Peano-like ceiling) to most constrained (Class I, Tarski-like floor). Each class is calibrated against three formal measures: Computability (Δ⁰₁ = decidable; Σ⁰₁ = recursively enumerable), Reverse Mathematics (Big Five subsystems), and proof-theoretic ordinal (ω = Tarski-level; ε₀ = Peano-level). Classes I and II are in principle fully predictable. Class V circuits are subject to Rice's theorem if Turing-complete.

6. Nine Predictions

PREDICTION 1Transcriptomic Noise Distribution Diagnoses Circuit ClassHigh Confidence

Different circuit classes generate different noise distributions in single-cell expression data. Class I: unimodal, low-variance. Class II (negative feedback): unimodal, very low-variance — feedback suppresses noise. Class III (bistable): bimodal. Class IV (oscillatory): time-structured, periodic. Class V (self-modifying): heavy-tailed, non-stationary.

Testability: For genes regulated by circuits of characterized class (toggle switch genes, repressilator genes, simple inducible promoters), compare observed single-cell expression distributions against predicted patterns.

PREDICTION 2Cell Fate Decisions Are Minimum-Energy State TransitionsHigh Confidence

If cell types are computational attractors, then cell fate decisions are transitions between attractors requiring passage through a region of low probability between two basins. The minimum number of transcription factor perturbations required to convert cell type A to cell type B is determined by the logical distance between the two attractors. Yamanaka's four reprogramming factors are the minimum perturbation set required to cross the energy barrier between the somatic and pluripotent attractors.

Testability: For any pair of cell types, the minimum reprogramming factor set can in principle be predicted from GLMP-style flowcharts of the relevant regulatory circuits.

PREDICTION 3Drug Resistance Is Attractor EscapeHigh Confidence

When a cancer cell population develops drug resistance, it transitions from a drug-sensitive attractor to a drug-resistant attractor. Circuit mutation resistance (permanent: the attractor structure changes) differs fundamentally from state transition resistance (potentially reversible: the cell moves to a pre-existing resistant attractor). Cancers that develop resistance through state transitions should be re-sensitizable by forcing the cell back to the sensitive attractor. Cancers with circuit mutation resistance cannot be re-sensitized because the sensitive attractor no longer exists.

Testability: The fraction of reversible vs. irreversible resistance correlates with circuit class: Class III generates reversible resistance; Class V generates irreversible resistance.

PREDICTION 7Rice's Theorem Sets a Hard Ceiling on Cancer PredictionMedium Confidence

Rice's theorem (1953) states that any non-trivial semantic property of programs is undecidable. If Class V genomic circuits are Turing-complete — a conjecture, not a proven theorem — then by Rice's theorem, no algorithm can determine for an arbitrary Class V circuit whether it will produce a given gene expression pattern. The predictive accuracy ceiling for AI models of cancer driven by Class V circuits is less than 100% and cannot be reached by scaling data or compute. This is a mathematical theorem about what any algorithm can achieve, not a technological limitation. The practical implication is constructive: identify which cancers are driven by Class I-III circuits (potentially fully predictable) versus Class IV-V (subject to principled limits).

Figure 4. Schematic diagram of the maximum theoretical prediction accuracy for AI models of biological regulation as a function of genomic circuit class. The colored ceiling curve is the theoretical maximum achievable by any algorithm. Class I circuits are in principle fully predictable. Accuracy declines through Classes II-IV. For Class V circuits, Rice's theorem establishes the ceiling is strictly less than 100% for any algorithm — not a technological limitation but a mathematical theorem about computability, if the Turing-completeness conjecture holds. The green dashed line projects performance of grammar-aware models (not yet built) predicted to approach the ceiling more closely than grammar-blind statistical models (gray dashed). All values are schematic; no empirical data is presented.

7. From Grammar-Blind to Grammar-Aware Models

Current AI models for biology — ESM2 (protein language model), Enformer (genomic sequence to gene expression), the Arc Institute's STATE model (perturbation response prediction) — learn statistical regularities in biological data without explicit knowledge of the logical grammar of gene regulation. A grammar-aware model would explicitly represent the logical primitives of the control layer as inductive biases. Rather than treating a promoter sequence as a string of nucleotides, it would parse the promoter as a logical formula: binding site X (TF-A) AND binding site Y (TF-B), with NOT site Z (repressor-C), CONDITIONAL on signal S.

The GLMP hybridization strategy — using RegulonDB as a primary regulatory backbone combined with LLM-generated logical interpretation — is a practical implementation of grammar decoding. Databases contribute entity completeness (which TFs, which binding sites, which genes); LLMs contribute logical interpretation (AND vs. OR, conditional vs. constitutive, feedback vs. feed-forward). Scaling this approach across the GLMP sample would produce a corpus of logical specifications for regulatory circuits — the training data for grammar-aware AI models.

8. The LEAN Formalization Path Revisited

The companion paper identified LEAN 4 and Mathlib as the long-term formal verification path for the genomic conjecture. In the context of this sequel, the LEAN formalization path takes on additional significance: it is the path toward formally verified grammar-aware AI models. A grammar-aware model whose logical specifications are formalized in LEAN would have a property no current biological AI model possesses — its predictions could be formally verified against circuit specifications.

graph TD FR["Formally specify genomic primitive relations
in LEAN's type theory
(binding as a typed dyadic relation)"] CT["Define circuit topology classes in LEAN
(DAG vs. cyclic graph; feedback types;
temporal and modal operators)"] CS["Prove decidability of Class I circuits within LEAN
(analogous to Tarski's completeness proof)"] GR["Formalize the regulatory grammar
(promoter logic as typed formulas in LEAN)"] RM["Establish Reverse Mathematics equivalences
for each complexity class"] GA["Train grammar-aware models
on LEAN-verified circuit specifications"] TH["Full formalization of the five-class ladder
as a theorem in LEAN/Mathlib"] FR --> CT --> CS --> GR --> RM --> TH GR --> GA classDef lean fill:#2E75B6,color:#fff,stroke:#1a4f8a classDef model fill:#27ae60,color:#fff,stroke:#1e8449 classDef theorem fill:#c0392b,color:#fff,stroke:#96281b class FR,CT,CS,GR,RM lean class GA model class TH theorem

Figure 5. The LEAN formalization path for grammar-aware biological AI. Blue: LEAN specification and proof steps. Green: grammar-aware model training. Red: the long-term theorem target.

9. The Empirical Sequel

This paper is the theoretical version of an argument that has an empirical sequel. The empirical tests include:

10. Open Questions

11. Conclusion

We have argued that the genome is a computer in a precise and non-metaphorical sense: its control layer implements a logical language whose primitives are binding, NOT, AND, OR, CONDITIONAL, and their temporal, modal, and predicate extensions, each with specific molecular implementations readable from genomic sequence and promoter architecture. The CONDITIONAL is the master primitive — the operation that introduces temporal response, contextual adaptation, and threshold sensitivity — of which all feedback relationships are special cases. The biconditional, NAND, NOR, XOR, and the full temporal-modal-predicate vocabulary are derived from these foundational operations, all grounded ultimately in the single primitive of binding.

The transcriptome is the runtime state of this program: a high-dimensional snapshot sampleable by single-cell RNA-seq and analyzable as an attractor landscape. Cell types are attractors; cell fate decisions are state transitions; transcriptomic noise distributions are diagnostic signatures of circuit class.

From this framework we derived nine predictions. The most consequential — that Rice's theorem sets a hard ceiling on cancer prediction for Class V circuits — is a mathematical claim about the limits of any algorithm. The most constructive — that grammar-aware AI models will outperform grammar-blind models — is a research program that GLMP's hybridization methodology is designed to support.

The genome has been partially read for sixty years, since the cracking of the codon table. What remains unread is the control layer — the regulatory program that determines when, where, and under what conditions each gene's instruction is executed. Reading that program is the next great project of molecular biology. The logical framework developed in this paper and its companion is one approach to that reading. It may not be the right approach. But it is a precise approach, with falsifiable predictions, a clear epistemic ladder, and a long-term formalization path.

Key References

This paper builds on references 1–33 of the companion paper. New references for this sequel:

Companion paper: Welz, G. Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class. GLMP Working Paper, 2026. Full text.

Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3(3), 1961. DOI. Founding paper of molecular regulatory logic; establishes the repressor-operator NOT gate.
Ptashne, M. A Genetic Switch: Phage Lambda Revisited. 3rd ed. Cold Spring Harbor Laboratory Press, 2004. Bistable Class III circuit as biconditional with negation.
Thanos, D. & Maniatis, T. Virus induction of human IFN-β gene expression requires the assembly of an enhanceosome. Cell, 83(7), 1995. DOI. Multi-input AND gate.
Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC, 2006. Network motifs as computational units.
Milo, R. et al. Network motifs: simple building blocks of complex networks. Science, 298(5594), 2002. DOI.
Gardner, T. et al. Construction of a genetic toggle switch in Escherichia coli. Nature, 2000. DOI. Toggle switch as bistable Class III circuit; biconditional with negation.
Elowitz, M. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 2000. DOI. Repressilator as Class IV oscillatory circuit.
Ferrell, J.E. & Xiong, W. Bistability in cell signaling. Chaos, 11(1), 2001. DOI. Mathematical basis of bistability.
Waddington, C.H. The Strategy of the Genes. Allen & Unwin, 1957. The epigenetic landscape; reinterpreted here as attractor landscape.
Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 126(4), 2006. DOI. Reprogramming as forced attractor transition.
Rice, H.G. Classes of Recursively Enumerable Sets and Their Decision Problems. Transactions of the American Mathematical Society, 74(2), 1953. JSTOR. Rice's theorem.
Pnueli, A. The temporal logic of programs. In Proc. 18th Annual Symposium on Foundations of Computer Science, 1977. DOI. Founding paper of temporal logic for program verification.
Clarke, E.M., Grumberg, O. & Peled, D. Model Checking. MIT Press, 1999. Temporal logic model checking; applicable to gene regulatory circuit verification.
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 2023. DOI. ESM2: grammar-blind protein language model.
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18, 2021. DOI. Enformer: sequence to expression model.
Stormo, G.D. DNA binding sites: representation and discovery. Bioinformatics, 16(1), 2000. DOI. The JASPAR/motif database approach.
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 2012. DOI. Systematic mapping of regulatory elements.
Ptashne, M. & Gann, A. Genes and Signals. Cold Spring Harbor Laboratory Press, 2002. Molecular basis of transcriptional activation and repression as logical operations.
Bintu, L. et al. Transcriptional regulation by the numbers: models. Current Opinion in Genetics & Development, 15(2), 2005. DOI. Quantitative treatment of promoter logic as combinatorial input-output functions.