Sequel to: Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class
The companion paper established that the choice of primitive relations determines the logical character of a formal system, and conjectured that gene regulatory circuits can be classified by computational complexity class. This paper develops the computational hypothesis to its full extent. We argue that the genome implements a genuine two-layer computational system: a data layer (the codon table, encoding protein sequences) and a control layer (promoter architecture, encoding regulatory logic). The logical primitives of the control layer — binding, NOT, AND, OR, CONDITIONAL, NAND, NOR, XOR, biconditional, and their temporal, modal, and predicate extensions — have specific molecular implementations at the level of promoter sequence architecture and transcription factor interaction geometry. The CONDITIONAL (IF-THEN) is identified as the master primitive, of which all feedback relationships are special cases. The transcriptome is the runtime state of this computational system: a snapshot of which instructions are currently executing, readable as a logical state vector. From this framework we derive nine predictions about cell fate, cancer, drug resistance, the limits of virtual cell models, and the computational irreducibility of complex biological behavior. The most consequential prediction follows from Rice's theorem: if Class V genomic circuits are Turing-complete, then perfect prediction of cellular behavior from genomic sequence is provably impossible for any algorithm. We propose that grammar-aware AI models, informed by the logical structure of the control layer, will outperform grammar-blind statistical models in interpretability, sample efficiency, and formal verifiability.
The decoding of the genetic code between 1961 and 1966 — the mapping of 64 codons to 20 amino acids and three stop signals — is one of the great intellectual achievements of the 20th century. It revealed that the genome encodes protein sequences in a systematic, universal, and readable language. But it decoded only one layer of the genome's computational architecture. The codon table is the data layer: it specifies what proteins are made. A second layer — the control layer — specifies when, where, under what conditions, and in response to what signals each gene is expressed. This control layer is the regulatory program of the cell, and it remains only partially decoded.
The distinction between data layer and control layer corresponds to a fundamental distinction in computer science between a program's data structures and its control flow. A program that stores numbers in memory but has no conditional branching, no loops, and no subroutine calls is not a useful program — it is just a data store. The control flow — the IF-THEN statements, the loops, the function calls — is what makes a program a computation. The codon table, read in isolation, is the genome's data store. The regulatory architecture — the promoters, operators, enhancers, silencers, and the transcription factor networks that read them — is the genome's control flow.
This paper argues that the control layer is written in a language whose primitives are logical rather than chemical, whose grammar encodes computational operations, and whose runtime state is the transcriptome. The logical primitives of this language have specific molecular implementations that are in principle readable from genomic sequence and transcriptomic data.
As established in the companion paper, binding is the foundational primitive — the molecular analog of Tarski's betweenness relation. All other logical operations are derived from binding in specific geometric and contextual arrangements: sequence-specific protein-DNA binding (a transcription factor recognizes and binds a specific sequence motif), protein-protein binding (TFs interact with co-activators, co-repressors, and mediator complexes), and RNA-protein binding (regulatory RNAs bind target mRNAs or proteins, implementing post-transcriptional logic). The binding relation B(X, Y) is dyadic, binary in the logical abstraction, and the ground-level primitive — the RCA₀-level operation of the genomic computational system.
NOT (Repression) is implemented by the repressor-operator system. A repressor protein binds an operator sequence within the promoter; when bound, RNA polymerase cannot access the promoter and transcription is blocked. The operator sequence is the physical encoding of the NOT gate. In single-cell RNA-seq data, NOT relationships appear as anti-correlated expression pairs.
AND (Cooperativity) requires two conditions simultaneously. It is implemented by promoter architectures requiring multiple transcription factors: dual binding sites where both must be occupied, or cooperative assembly of multi-protein complexes. The interferon-β enhanceosome — requiring eight distinct proteins to assemble simultaneously on a 55 bp enhancer — is an eight-input AND gate. AND gates appear in transcriptomic data as genes expressed only when both upstream inputs are simultaneously high.
OR (Alternative Activation) requires at least one of multiple conditions. It is implemented by multiple independent promoters or alternative upstream activating sequences. OR gates appear as genes expressed in the union of upstream input domains.
| Gate | Symbol | Truth Table | Molecular Implementation | scRNA-seq Signature | Class |
|---|---|---|---|---|---|
| ¬ NOT Repression |
¬P |
P=0 → 1 P=1 → 0 |
Repressor binds operator, blocks RNAP. lac operon: LacI binds lacO. Operator sequence is the NOT gate. |
Anti-correlated pairs: TF↑ → target↓ | I |
| ∧ AND Cooperativity |
P∧Q |
0,0→0 1,0→0 0,1→0 1,1→1 |
Dual binding sites; both must be occupied. IFN-β enhanceosome: 8-protein AND gate. |
Intersection domain: gene ON only when both inputs high | I |
| ∨ OR Alt. Activation |
P∨Q |
0,0→0 1,0→1 0,1→1 1,1→1 |
Multiple independent promoters; either sufficient. Stress response genes, tissue-specific promoters. |
Union domain: gene ON when any input high | I |
| → CONDITIONAL IF-THEN · Master |
P→Q |
P=0 → Q=0 P=1 → Q=1 Introduces time, context, threshold |
Signal transduction cascade; threshold = Kd. lac: allolactose → LacI release → lacZ ON. All feedback = CONDITIONAL applied recursively. |
Correlated response with sharp threshold at Kd; time-delayed | I–V |
| ⊼ NAND Co-repressor COMPLETE |
¬(P∧Q) |
0,0→1 1,0→1 0,1→1 1,1→0 |
Repressor requires co-repressor for active conformation. trp operon: TrpR + 2× tryptophan. NAND alone is Boolean-complete. |
Gene ON except when both repressor and co-repressor present | I |
| ⊽ NOR Dual Repression COMPLETE |
¬(P∨Q) |
0,0→1 1,0→0 0,1→0 1,1→0 |
Either of two repressors alone blocks transcription. Developmental genes: TF repressor + Polycomb. NOR alone is also Boolean-complete. |
Very narrow expression: gene ON only when both repressors absent | I |
Figure 2. All six gates derive from the single ground primitive of binding. NAND and NOR are each individually functionally complete — any Boolean regulatory logic can be built from NAND gates alone. The CONDITIONAL is the master primitive; all feedback structures are CONDITIONAL applied recursively.
The CONDITIONAL — IF P THEN Q — is the most biologically fundamental logical operation. It is not merely one gate among others. Unlike the Boolean gates (NOT, AND, OR), which are truth-functional (output depends only on current input values), the CONDITIONAL introduces a temporal dimension (P is detected before Q is executed), a contextual dimension (the same P can trigger different Q in different cell types), and a threshold dimension (the response fires only when signal exceeds the binding affinity Kd). The molecular implementation is a signal transduction cascade: ligand binds receptor → conformational change → TF activation → target gene transcription.
Feedback as a special case of the CONDITIONAL. Feedback is the CONDITIONAL applied recursively: IF output Q exceeds threshold T THEN modify input P. Negative feedback (Q → ¬P) implements homeostasis. Positive feedback (Q → P) implements bistability. Delayed negative feedback (Q →[D] ¬Q) implements oscillation. Self-modifying feedback (Q → modify(P → Q)) implements Class V epigenetic reprogramming. The CONDITIONAL without feedback is Class I (decidable); the CONDITIONAL with feedback generates all higher complexity classes. This makes the CONDITIONAL the gateway to the entire complexity ladder.
NAND is implemented by the co-repressor system: a repressor requires a co-repressor molecule to achieve active conformation. The trp operon repressor (TrpR) requires two tryptophan molecules — neither alone represses. NOR is implemented by dual alternative repression: either of two repressors alone is sufficient to block transcription, so the gene is expressed only when neither is present. XOR (exclusive OR) appears in competitive binding, where two TFs compete for the same site. The key theoretical consequence: NAND alone is functionally complete — any Boolean regulatory logic is in principle implementable. But Boolean completeness is only the floor of the system's expressive power; the feedback primitives of Class II-V circuits provide additional expressive power beyond it.
The biconditional P ↔ Q — P if and only if Q — is derived from two CONDITIONALs running in opposite directions: (P → Q) ∧ (Q → P). It is not a new primitive, but it underlies two of the most important regulatory architectures in biology.
Mutual activation (biconditional without negation): Gene A activates Gene B and Gene B activates Gene A — (A → B) ∧ (B → A). Once either gene is activated, the loop sustains both. This is the logical structure of commitment. MyoD and myogenin in skeletal muscle differentiation, Oct4 and Sox2 in pluripotency, and GATA1 and PU.1 in hematopoietic lineage decisions all exhibit mutual activation.
Mutual repression (biconditional with negation): Gene A represses Gene B and Gene B represses Gene A — (A → ¬B) ∧ (B → ¬A). The circuit has exactly two stable states: A high/B low, and A low/B high. This is the toggle switch (Gardner et al. 2000) — bistability in its purest logical form. The biconditional reinforces the primacy of the CONDITIONAL: two of the most important regulatory architectures in biology emerge from combinations of primitives already identified, requiring no new molecular machinery.
The four fundamental temporal operators each have biological implementations. ALWAYS (□P) — constitutive expression; housekeeping genes. EVENTUALLY (◇P) — inducible expression; the lac operon implements EVENTUALLY(lacZ expressed). UNTIL (P U Q) — transient developmental expression; Hox genes expressed until positional identity is established. NEXT (○P) — the delay cascade; gene A activates gene B after one transcription-translation cycle. The repressilator implements NEXT recursively, producing sustained oscillation.
Temporal logic operators map onto the complexity ladder: ALWAYS, EVENTUALLY, UNTIL, and NEXT without recursion are Class I; NEXT applied recursively generates Class IV oscillation; UNTIL with a self-modifying condition generates Class V epigenetic silencing. The full logical grammar of the control layer is better described by temporal logic than classical propositional logic — a claim with consequences for the LEAN formalization path, since Mathlib contains substantial formal treatments of temporal logic.
Modal operators distinguish between current state and possible states. Necessity (□P) maps to constitutive expression (housekeeping genes). Possibility (◇P) maps to cell-type-specific expression. The clinically significant operator is Impossibility (¬◇P) — epigenetic silencing, where a promoter is permanently inaccessible via heterochromatin or DNA methylation. The distinction between "gene G is not currently expressed" (propositional) and "gene G cannot be expressed in this context" (modal) matters: oncogene silencing by methylation (modal ¬◇) is fundamentally different from oncogene repression by a TF (propositional). Modal impossibility is a Class V operation.
Predicate quantifiers distinguish population-level from cell-level expression: ∀x: expressed(G, x) (universal, housekeeping), ∃x: expressed(G, x) (existential, detected somewhere), ∃!x: expressed(G, x) (unique cell-type marker). This framework clarifies the difference between bulk RNA-seq (existential queries over cell mixtures) and single-cell RNA-seq (individual queries per cell). Single-cell heterogeneity — ∃x: expressed(G, x) ∧ ∃y: ¬expressed(G, y) — is the population-level signature of a bistable (Class III) circuit.
| Primitive | Symbol | Molecular Implementation | Complexity Class | Type |
|---|---|---|---|---|
| Binding | B(X,Y) | Sequence-specific molecular contact | RCA₀ ground | Foundational |
| NOT | ¬P | Repressor-operator system | Class I | Boolean |
| AND | P∧Q | Cooperative dual binding site | Class I | Boolean |
| OR | P∨Q | Multiple independent promoters | Class I | Boolean |
| NAND | ¬(P∧Q) | Co-repressor dual requirement | Class I | Boolean (complete) |
| NOR | ¬(P∨Q) | Dual alternative repressors | Class I | Boolean (complete) |
| XOR | (P∨Q)∧¬(P∧Q) | Competitive binding at shared site | Class I | Boolean |
| CONDITIONAL | P→Q | Signal transduction cascade | Class I (no feedback) | Response — master primitive |
| BICONDITIONAL | P↔Q | Mutual activation loop | Class III (derived) | Derived |
| BICONDITIONAL+NOT | P↔¬Q | Toggle switch (mutual repression) | Class III (derived) | Derived |
| Negative feedback | Q→¬P | Autorepressor; product inhibition | Class II | Recursive |
| Positive feedback | Q→P | Autoactivator; bistable switch | Class III | Recursive |
| Delayed feedback | Q→[D]¬Q | Repressilator architecture | Class IV | Recursive |
| Self-modifying feedback | Q→modify(P→Q) | Epigenetic architecture modification | Class V | Recursive |
| ALWAYS | □P | Constitutive expression | Class I | Temporal |
| EVENTUALLY | ◇P | Inducible expression | Class I | Temporal |
| UNTIL | P U Q | Transient developmental expression | Class I–II | Temporal |
| NEXT (recursive) | ○P recursively | Oscillatory delay cascade | Class IV | Temporal |
| Necessity | □P (modal) | Housekeeping expression | Class I | Modal |
| Possibility | ◇P (modal) | Cell-type-specific expression | Class I | Modal |
| Impossibility | ¬◇P | Epigenetic silencing | Class V | Modal |
| Universal | ∀x: expr(G,x) | Universal population expression | Class I | Predicate |
| Existential | ∃x: expr(G,x) | Expression in some cells (bulk RNA-seq) | Class I | Predicate |
| Bistable population | ∃x∧∃y¬: expr(G) | Single-cell bistable heterogeneity | Class III | Predicate |
The most striking feature of this complete vocabulary is that all of its richness — temporal, modal, predicate, Boolean, recursive — derives from the single ground-level primitive of binding. Every entry is binding in a specific geometric, temporal, and contextual arrangement.
At any moment, each gene in a cell's genome is either expressed or not, and if expressed, at some level. The collection of all mRNA levels across all genes is the transcriptome. In the computational interpretation, this is the state vector of the genomic program. The dimensionality of the transcriptome (~20,000 dimensions for a human cell) is the dimensionality of the state space of the genomic computer. A single-cell RNA-seq dataset is a sample from the state space of the genomic program — not merely a collection of expression profiles.
The dimensionality reduction techniques standard in single-cell analysis — PCA, UMAP, t-SNE, diffusion maps — are techniques for finding the attractor structure of the genomic program's state space. The clusters that appear in UMAP plots of single-cell data are not merely statistical clusters — they are the attractors of the genomic computation. Cell types are attractors. The geometry of the UMAP plot reflects the computational topology of the regulatory circuits generating the data: Class III bistable circuits generate two-cluster UMAP plots; Class IV oscillatory circuits generate trajectory structure; Class V circuits generate complex attractor structure.
The promoter architecture — the arrangement of binding sites, operator sequences, and regulatory elements upstream of each gene — is the instruction encoding: the physical medium in which the control layer program is written. Binding site motifs encode the identity of the TF; site position relative to the TSS encodes effect type; site spacing and orientation encode combinatorial logic; clustering encodes complexity; distance from TSS encodes temporal character.
The strongest claim of this paper is that the control layer of all organisms shares a universal grammar — the same logical primitives implemented in organism-specific molecular machinery. (Note: "universal grammar" here is used in the computational sense — a shared set of logical operations — not in Chomsky's linguistic sense of an innate language faculty. The analogy is structural, not cognitive.) Evidence: the same circuit motifs (autoregulation, feedforward loops, feedback oscillators) appear across all domains of life; the same computational functions (bistability, oscillation, adaptation) are implemented by different molecular machines in different organisms; synthetic biology circuits transplanted across organisms function correctly. The universal grammar conjecture predicts that logical structure is more conserved across evolution than molecular identity — testable by comparing GLMP flowcharts across organisms at the topological level.
| Class | Name | Description | Computability | Rev. Math | Ordinal | Analog | Example |
|---|---|---|---|---|---|---|---|
| V | Self-modifying / Epigenetic Feedback | Circuit rewrites its own regulatory architecture. Rice's theorem applies: perfect prediction provably impossible if Turing-complete. | Σ⁰₁ or above | ATR₀ / Π¹₁-CA₀ | ε₀ or beyond | Peano Arithmetic | Epigenetic reprogramming circuits |
| IV | Mixed Feedback — Oscillators | Sustained oscillation. Circadian rhythms and developmental clocks. Period determined by delay structure. | Primitive recursive | ACA₀ | Approaching ε₀ | Pushdown automata | Repressilator (Elowitz & Leibler 2000) |
| III | Positive Feedback — Bistable Switches | Two stable attractors: cell fate decisions. State persists after signal removal. Toggle switch is the canonical case. | Δ⁰₂ (limit computable) | WKL₀ | ωω | Finite automata with memory | Toggle switch (Gardner et al. 2000) |
| II | Negative Feedback — Damped Regulation | Graded responses and homeostasis. Negative feedback suppresses noise. No sustained oscillation. | Δ⁰₁ (bounded recursion) | RCA₀ | ωω | Bounded arithmetic | Homeostatic gene regulation |
| I | Feed-Forward Only — No Loops | Decidable, complete, bounded expressive power. Output always determinable from input. No memory, no oscillation. The Tarski-like logical floor. | Δ⁰₁ (decidable) | Below RCA₀ | ω | Tarski's geometry | Simple inducible promoters |
Figure 3. The five-class genomic computational complexity ladder, from most expressive (Class V, Peano-like ceiling) to most constrained (Class I, Tarski-like floor). Each class is calibrated against three formal measures: Computability (Δ⁰₁ = decidable; Σ⁰₁ = recursively enumerable), Reverse Mathematics (Big Five subsystems), and proof-theoretic ordinal (ω = Tarski-level; ε₀ = Peano-level). Classes I and II are in principle fully predictable. Class V circuits are subject to Rice's theorem if Turing-complete.
Different circuit classes generate different noise distributions in single-cell expression data. Class I: unimodal, low-variance. Class II (negative feedback): unimodal, very low-variance — feedback suppresses noise. Class III (bistable): bimodal. Class IV (oscillatory): time-structured, periodic. Class V (self-modifying): heavy-tailed, non-stationary.
If cell types are computational attractors, then cell fate decisions are transitions between attractors requiring passage through a region of low probability between two basins. The minimum number of transcription factor perturbations required to convert cell type A to cell type B is determined by the logical distance between the two attractors. Yamanaka's four reprogramming factors are the minimum perturbation set required to cross the energy barrier between the somatic and pluripotent attractors.
When a cancer cell population develops drug resistance, it transitions from a drug-sensitive attractor to a drug-resistant attractor. Circuit mutation resistance (permanent: the attractor structure changes) differs fundamentally from state transition resistance (potentially reversible: the cell moves to a pre-existing resistant attractor). Cancers that develop resistance through state transitions should be re-sensitizable by forcing the cell back to the sensitive attractor. Cancers with circuit mutation resistance cannot be re-sensitized because the sensitive attractor no longer exists.
The modal computational class of regulatory circuits correlates with organismal complexity, measurable by GLMP-style topological classification across species. Prokaryotes: predominantly Class I-II. Unicellular eukaryotes: Class II-III. Simple multicellular organisms: Class III-IV. Complex vertebrates: Class IV-V.
Virtual cell models should have accuracy correlating with the circuit class of target genes. Highest accuracy for Class I-II circuits (decidable, uniquely determined). Lower accuracy for Class III (bistable — response depends on which attractor the cell is currently in). Lowest for Class IV-V (oscillatory and self-modifying — response depends on phase or current epigenetic state).
The number of transcription factors required for cellular reprogramming should relate to the logical depth of the circuit separating source and target cell type attractors. Topologically close cell type pairs require fewer factors; topologically distant pairs require more. A GLMP-style analysis should produce predicted reprogramming factor counts correlating with experimentally observed minimums.
Rice's theorem (1953) states that any non-trivial semantic property of programs is undecidable. If Class V genomic circuits are Turing-complete — a conjecture, not a proven theorem — then by Rice's theorem, no algorithm can determine for an arbitrary Class V circuit whether it will produce a given gene expression pattern. The predictive accuracy ceiling for AI models of cancer driven by Class V circuits is less than 100% and cannot be reached by scaling data or compute. This is a mathematical theorem about what any algorithm can achieve, not a technological limitation. The practical implication is constructive: identify which cancers are driven by Class I-III circuits (potentially fully predictable) versus Class IV-V (subject to principled limits).
Grammar-aware models explicitly representing the logical primitives and using them as inductive biases should require less training data for equivalent accuracy on Class I-III circuits; be more interpretable (predictions expressible in terms of logical primitives, auditable by biologists); generalize better across organisms (because the logical grammar is universal); and be formally verifiable against the LEAN formalization path described in the companion paper.
If the control layer is a language with a universal grammar, it has a finite vocabulary — approximately 1,600 known human TF binding motifs constituting the alphabet of regulatory instructions. Completing the vocabulary is a finite project, analogous to completing the codon table. Once complete, any promoter sequence can in principle be read as a logical formula in the regulatory grammar — a program fragment specifying which conditions activate the gene, which repress it, and which combination is required for each outcome.
Current AI models for biology — ESM2 (protein language model), Enformer (genomic sequence to gene expression), the Arc Institute's STATE model (perturbation response prediction) — learn statistical regularities in biological data without explicit knowledge of the logical grammar of gene regulation. A grammar-aware model would explicitly represent the logical primitives of the control layer as inductive biases. Rather than treating a promoter sequence as a string of nucleotides, it would parse the promoter as a logical formula: binding site X (TF-A) AND binding site Y (TF-B), with NOT site Z (repressor-C), CONDITIONAL on signal S.
The GLMP hybridization strategy — using RegulonDB as a primary regulatory backbone combined with LLM-generated logical interpretation — is a practical implementation of grammar decoding. Databases contribute entity completeness (which TFs, which binding sites, which genes); LLMs contribute logical interpretation (AND vs. OR, conditional vs. constitutive, feedback vs. feed-forward). Scaling this approach across the GLMP sample would produce a corpus of logical specifications for regulatory circuits — the training data for grammar-aware AI models.
The companion paper identified LEAN 4 and Mathlib as the long-term formal verification path for the genomic conjecture. In the context of this sequel, the LEAN formalization path takes on additional significance: it is the path toward formally verified grammar-aware AI models. A grammar-aware model whose logical specifications are formalized in LEAN would have a property no current biological AI model possesses — its predictions could be formally verified against circuit specifications.
This paper is the theoretical version of an argument that has an empirical sequel. The empirical tests include:
We have argued that the genome is a computer in a precise and non-metaphorical sense: its control layer implements a logical language whose primitives are binding, NOT, AND, OR, CONDITIONAL, and their temporal, modal, and predicate extensions, each with specific molecular implementations readable from genomic sequence and promoter architecture. The CONDITIONAL is the master primitive — the operation that introduces temporal response, contextual adaptation, and threshold sensitivity — of which all feedback relationships are special cases. The biconditional, NAND, NOR, XOR, and the full temporal-modal-predicate vocabulary are derived from these foundational operations, all grounded ultimately in the single primitive of binding.
The transcriptome is the runtime state of this program: a high-dimensional snapshot sampleable by single-cell RNA-seq and analyzable as an attractor landscape. Cell types are attractors; cell fate decisions are state transitions; transcriptomic noise distributions are diagnostic signatures of circuit class.
From this framework we derived nine predictions. The most consequential — that Rice's theorem sets a hard ceiling on cancer prediction for Class V circuits — is a mathematical claim about the limits of any algorithm. The most constructive — that grammar-aware AI models will outperform grammar-blind models — is a research program that GLMP's hybridization methodology is designed to support.
The genome has been partially read for sixty years, since the cracking of the codon table. What remains unread is the control layer — the regulatory program that determines when, where, and under what conditions each gene's instruction is executed. Reading that program is the next great project of molecular biology. The logical framework developed in this paper and its companion is one approach to that reading. It may not be the right approach. But it is a precise approach, with falsifiable predictions, a clear epistemic ladder, and a long-term formalization path.
Either outcome advances the field.
This paper builds on references 1–33 of the companion paper. New references for this sequel:
Companion paper: Welz, G. Primitive Relations, Computational Complexity, and a Conjecture on the Genomic Computational Class. GLMP Working Paper, 2026. Full text.