ROUND R19 · MODULUM × MTP + INFERENCE STACK
2026-05-11 · 6 streams · 4/5 R1 + 3/5 R2 substantive · Codex max-budget · ~$25-40 spend
R18 closed the architecture (Modulum as control plane). R19 resolved the central technical question — how to compound 3× MTP and 3× Modulum on the same decode pass. Codex's R2 Substrate Delta Masks synthesis (M_shared + ΔM_k) reconciles the shared-mask vs per-head debate cleanly. Modulum API live; Forge healthy; Google JV becomes the R20 push.
Codex R2 collapsed the shared-mask vs per-head debate into one architecture. This is the R19 technical breakthrough.
R1 surfaced three candidate paths: shared-Modulum-mask (Claude) · per-head Modulum draft (Stream 1 path 2) · substrate-aware draft training with JV partner (Stream 1 path 3). Grok's R1 critique: "MTP changes the geometry of the attention mask Modulum must optimize, forcing architectural re-derivation rather than simple multiplication of speedups." Codex's R2 delta-mask synthesis reconciles both: shared mask for proximate positions; deltas for distal / uncertain positions.
Compound math under delta masks: MTP's ~3× draft speedup and Modulum's ~3× sparse-attention speedup multiply only if draft acceptance survives mask sparsification. M_shared + ΔM_k preserves acceptance on shared (proximate) positions, which is what makes the ~9× compound decode target structurally achievable.
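A back-of-envelope sketch of how the speedups compound, assuming 3× from MTP and 3× from Modulum (the round's headline numbers); the acceptance-penalty form here is an illustrative model, not a measured result:

```python
# Illustrative compound-speedup model. The 3x figures come from this round's
# summary; the linear acceptance penalty is an assumption for sketching only.

def compound_speedup(s_mtp: float, s_modulum: float, acceptance: float) -> float:
    """Effective decode speedup when MTP drafting and Modulum sparse
    attention share one decode pass.

    s_mtp:      ideal multi-token-prediction speedup at 100% acceptance
    s_modulum:  attention-sparsification speedup
    acceptance: fraction of drafted tokens the verifier accepts
    """
    # Rejected draft tokens cost verification work without advancing decode,
    # so the MTP factor degrades toward 1x as acceptance falls.
    effective_mtp = 1.0 + (s_mtp - 1.0) * acceptance
    return effective_mtp * s_modulum

# Full acceptance keeps the compound at 3 x 3 = 9x; an acceptance collapse
# under an over-sparsified mask (Grok's R1 concern) erodes the MTP factor.
print(compound_speedup(3.0, 3.0, 1.0))
print(compound_speedup(3.0, 3.0, 0.5))
```

This is why the delta masks matter: they protect acceptance on the positions the draft heads actually rely on, instead of letting sparsification eat the MTP factor.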
R20 prototype proves M_shared + ΔM_k on Gemma 4 Q4_K_M with ≥8× decode at 128k, ≤2% BABILong qa1 regression. Production rollout follows. JV with Google for Gemma 5 ships substrate-policy training defaults.
Cross-pollinated R2 verdicts. Codex's reframes resolved Stream 1 architecturally; Grok's pick won Stream 6 with Claude conceding.
| Stream | R2 Verdict | Key resolution |
|---|---|---|
| 1 — MTP × Modulum compound | sound · substrate delta masks | Codex M_shared + ΔM_k synthesizes shared + per-head. Compound 3N× achievable. |
| 2 — Hypernym Router | sound · inference-improving | Stop saying cheaper; say better. Retention SLA marketplace as Y2 evolution. |
| 3 — Reasoning-state arch | sound · obligation integrity | Codex tightening: obligation tracking is canonical Day-1 use case. Cursor as partner. |
| 4 — Forge deployment toolkit | sound · Y1 benchmark+integrate; Y1.5 deploy | Phased scope (Codex partial → sound with phases). |
| 5 — Cross-model · M5 Compiler | sound · substrate policy + reverse-compiler moat | Codex substrate-policy reframe + Grok M5-in-reverse for training-data moat. |
| 6 — World model | pivot-resolved · materials Y1 · econ+gov Y2 | Materials wins Y1 on revenue + Modulum fit; econ + governance are Y2 fast-validation. |
M_shared + ΔM_k as the MTP × Modulum compound architecture. Codex R2 origin; Claude R2 adopted; Grok R1 critique resolved. R19 produced the architecture spec. R20 productizes — JV pitch, prototype, materials science pilot, Router launch, Cursor partnership.
Hypernym brings: substrate policies + BABILong validation + retrosynthesis benchmark planning. Google brings: training pipeline + Gemma 5 weights + revenue share on Hypernym-verified calls. Year-1 LOI.
Prove M_shared + ΔM_k architecture achieves ≥8× decode at 128k on Gemma 4 Q4_K_M with ≤2% BABILong qa1 regression. Acceptance-aware shared-mask MTP.
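The R20 acceptance criteria above reduce to a simple gate; a sketch (the function name and the reading of "≤2%" as absolute percentage points on qa1 accuracy are assumptions):

```python
# Hypothetical R20 acceptance gate: pass only if decode speedup at 128k
# context is >= 8x AND BABILong qa1 accuracy drops by <= 2 points
# (interpreting the "<=2% regression" criterion as absolute points).

def r20_gate(decode_speedup_128k: float,
             babilong_qa1_baseline: float,
             babilong_qa1_candidate: float) -> bool:
    regression = babilong_qa1_baseline - babilong_qa1_candidate
    return decode_speedup_128k >= 8.0 and regression <= 2.0
```

Both conditions are hard requirements: a 9× decode win that costs 3 qa1 points fails the gate just as surely as a 7× run with zero regression.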
Pfizer-AI / Eli Lilly / Recursion / Insilico for retrosynthesis-planning using Modulum-augmented Gemma. 7-figure annual + per-hit fee. Retrosynthesis-Bench-Hypernym v1 publication.
Inference-improving router + Retention SLA marketplace. Free tier + Pro $20/mo + Enterprise wholesale. Distribution: GitHub + pip + base_url swap.
Reasoning State SDK into Cursor's Composer/Agent mode. $0.05-0.20 per traced action; Cursor markets as Premium tier. ~100k paying developers in Cursor base.
Google 50-65% · xAI 25-35% · Meta 20-30% · DeepSeek 10-20% · Anthropic 5-15%. Pursue Google first; substrate-policy training run starts ~Q3 2026.
The architectural answer to Stream 1. M_shared + ΔM_k · most positions use shared mask; deeper/uncertain heads receive horizon-specific delta additions. R20 must prove ≥8× decode with ≤2% regression.
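A minimal sketch of the mask assembly, assuming boolean attention masks and treating each per-head delta as an additive override on the shared mask (shapes and names are illustrative, not the spec):

```python
import numpy as np

def head_mask(m_shared: np.ndarray, delta_k: np.ndarray) -> np.ndarray:
    """Attention mask for draft head k: M_k = M_shared OR ΔM_k.

    m_shared: (T, T) boolean mask reused by every head (proximate positions)
    delta_k:  (T, T) boolean additions for this head's horizon
              (distal / uncertain positions the shared mask pruned)
    """
    return m_shared | delta_k

# Toy example: 4-position causal mask; the shared mask prunes one distal key,
# and the deeper head's delta re-admits it for its horizon.
T = 4
m_shared = np.tril(np.ones((T, T), dtype=bool))
m_shared[3, 0] = False                 # shared mask drops a distal position
delta_3 = np.zeros((T, T), dtype=bool)
delta_3[3, 0] = True                   # head 3's delta restores it
mask_3 = head_mask(m_shared, delta_3)
```

The design point: most heads pay only for M_shared (one mask to optimize and cache), while deltas stay sparse, so per-head cost grows with what each horizon actually needs rather than with T².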
Expose per-head acceptance rates (head 1: 99% · head 4: 28%). Internal architecture tool + enterprise diagnostic. "Your workload loses long-context coherence at future-token horizon 3 because entity anchors are under-retained."
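The per-head acceptance readout could be computed from verifier logs roughly like this (a sketch; the `(head, accepted)` event format is an assumption about the log shape):

```python
from collections import defaultdict

def per_head_acceptance(events):
    """events: iterable of (head_index, accepted: bool) pairs emitted by the
    verifier. Returns {head_index: acceptance_rate}."""
    totals = defaultdict(int)
    accepts = defaultdict(int)
    for head, accepted in events:
        totals[head] += 1
        accepts[head] += int(accepted)
    return {h: accepts[h] / totals[h] for h in totals}

# Head 1 accepting 99/100 drafts and head 4 accepting 28/100 reproduces the
# "head 1: 99% · head 4: 28%" readout quoted above.
```

Binning the same events by rejection *position* instead of head index gives the diagnostic sentence in the pitch ("loses coherence at future-token horizon 3").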
When HR falls back to Claude / GPT, preprocess context around obligations (constraints, decisions, test duties, unresolved assumptions) rather than semantic summaries. Hypernym extracts value even when routing to frontier APIs.
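One minimal way to preprocess context around obligations rather than semantic summaries; the keyword heuristic below is an illustrative stand-in for whatever extractor Hypernym actually ships:

```python
import re

# Hypothetical obligation markers covering the four categories named above
# (constraints, decisions, test duties, unresolved assumptions). A real
# extractor would be learned, not keyword-based.
OBLIGATION_MARKERS = re.compile(
    r"\b(must|shall|constraint|decided|TODO|assume[sd]?|unresolved|test)\b",
    re.IGNORECASE,
)

def obligation_filter(context: str) -> str:
    """Keep only sentences carrying an obligation signal, preserving order."""
    sentences = re.split(r"(?<=[.!?])\s+", context)
    return " ".join(s for s in sentences if OBLIGATION_MARKERS.search(s))
```

The point of the design: the filtered context is still verbatim source text, so the downstream frontier model sees the actual constraints rather than a lossy paraphrase of them.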
Honorable mentions: Substrate Policy Marketplace (Claude R2 — per-domain licensed substrate policies) · Retrosynthesis-Bench-Hypernym (Claude R2 — first public multi-step retrosynthesis benchmark) · M5 Compiler in REVERSE (Grok R1 — generates synthetic long-context training data Y2 moat) · MTP Acceptance Insurance v2 (Codex+Claude — adaptive decode depth degrades gracefully when uncertainty rises) · Modulum as differentiable attention curriculum for continued pre-training (Grok R1).
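The MTP Acceptance Insurance v2 idea above (adaptive decode depth that degrades gracefully) can be sketched as a controller that shrinks the draft horizon when rolling acceptance drops; the thresholds and bounds here are made up for illustration:

```python
def adapt_draft_n(current_n: int, rolling_acceptance: float,
                  min_n: int = 1, max_n: int = 4,
                  low: float = 0.5, high: float = 0.9) -> int:
    """Draft fewer tokens when the verifier rejects more; re-expand when
    acceptance recovers. Hysteresis band (low, high) avoids oscillation."""
    if rolling_acceptance < low:
        return max(min_n, current_n - 1)
    if rolling_acceptance > high:
        return min(max_n, current_n + 1)
    return current_n
```

At min_n=1 the system degrades to plain single-token decode, so the worst case under rising uncertainty is losing the MTP speedup, never losing correctness.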
M_shared + ΔM_k (mathematical validation) · mlx_vlm CLI fix · 200 OK / 1.2s smoke test. Model: gemma-4-31B-it-Q4_K_M.gguf. Active draft_n=2 speculative drafting with 100% acceptance on short prompts. The production architecture is the live test bed for Substrate Delta Mask validation.
All services up. Zero drift. Zero orphans. Memory router 10/12 healthy. Ready to deploy Modulum benchmarks + customer integrations through forge benchmark / integrate.
Per memory rule: Codex catches blockers + surfaces unifying frames others miss.
M_shared + ΔM_k + Rejection-Position Profiler + Obligation-Preserving Context Compiler. Synthesis adopts every Codex anchor each round. The compound-9× target Chris flagged is structurally achievable because Codex resolved the architectural debate cleanly.