All numbers reproducible at git commit 54b089f against corpus.lock.json.
KANAJA_DISABLE_FEEDBACK=1 python3 tests/stats_audit_v2.py
Approximately 7 minutes runtime on M1 16 GB.
| Configuration | nDCG@5 | nDCG@10 | MRR | R@1 | R@3 | R@5 | R@10 |
|---|---|---|---|---|---|---|---|
| RRF baseline (cosine + BM25 + prosodic) | 0.7620 | 0.7795 | 0.7480 | 0.7143 | 0.7619 | 0.8095 | 0.8571 |
| + fractal (w=0.090) | 0.7620 | 0.7795 | 0.7480 | 0.7143 | 0.7619 | 0.8095 | 0.8571 |
| + Poincaré (w=0.001) | 0.7686 | 0.7837 | 0.7520 | 0.7143 | 0.8095 | 0.8095 | 0.8571 |
| + Topo (w=0.001) | 0.7694 | 0.7863 | 0.7619 | 0.7143 | 0.8095 | 0.8095 | 0.8571 |
| + Poincaré + Topo | 0.7873 | 0.8023 | 0.7917 | 0.7619 | 0.8095 | 0.8095 | 0.8571 |
| + all three | 0.7873 | 0.8023 | 0.7917 | 0.7619 | 0.8095 | 0.8095 | 0.8571 |
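The metrics in the table above can be sketched as follows. This is a minimal illustration, not the bench's own code: it assumes binary relevance labels, the standard log2 discount for nDCG, and that R@k denotes a per-query hit rate (at least one relevant chunk in the top k). The function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(rels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (rank 1 has discount 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(rels: Sequence[int], n_relevant: int, k: int) -> float:
    """nDCG@k for binary relevance; n_relevant is the total number of relevant chunks."""
    ideal = [1] * min(n_relevant, k)  # best case: all relevant chunks ranked first
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for i, r in enumerate(rels):
        if r:
            return 1.0 / (i + 1)
    return 0.0


def hit_at_k(rels: Sequence[int], k: int) -> float:
    """ASSUMPTION: R@k read as a hit rate, 1.0 if any relevant chunk is in the top k."""
    return 1.0 if any(rels[:k]) else 0.0


# One relevant chunk retrieved at rank 4
print(round(ndcg_at_k([0, 0, 0, 1, 0], n_relevant=1, k=5), 4))
# Two relevant chunks retrieved at ranks 1 and 3
print(round(ndcg_at_k([1, 0, 1, 0, 0], n_relevant=2, k=5), 4))
```

Averaging each per-query value over the 21 bench queries would give one cell of the table, under the assumptions stated above.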
Verdicts (per definition 1):
Robustness observation: the full-stack lift appears in every metric that has headroom on this bench (nDCG@5, nDCG@10, MRR, R@1, R@3), so the result is not an artifact of cherry-picking nDCG@5. R@5 and R@10 are identical across all configurations: they are saturated for the baseline, so reranker contributions cannot move them.
| Comparison | Δ mean | 95% CI on Δ | Paired-perm p (B=10 000) | Verdict |
|---|---|---|---|---|
| fractal-only | 0.0000 | [+0.0000, +0.0000] | 1.000 | MISS |
| Poincaré-only | +0.0067 | [+0.0000, +0.0200] | 1.000 | UNDER-NS |
| Topo-only | +0.0074 | [−0.0076, +0.0261] | 0.504 | UNDER-NS |
| Poincaré + Topo | +0.0253 | [−0.0076, +0.0777] | 0.499 | UNDER-NS |
| All three | +0.0253 | [−0.0076, +0.0777] | 0.496 | UNDER-NS |
The Poincaré-only CI has a lower bound of exactly zero, so it does not exclude a null effect. The full-stack CI is asymmetric (much more upside than downside), which is consistent with a real effect that this bench is too small to detect, but does not establish one. Power analysis (S4) quantifies the gap.
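The two procedures behind the table above, a paired permutation test and a bootstrap CI on Δ, can be sketched as follows. This is a minimal sketch, not the code in stats_audit_v2.py: it assumes a sign-flip permutation scheme and a percentile bootstrap, and the function names are hypothetical. The difference vector is the full-stack per-query nDCG@5 Δ column reported in this section (three nonzero entries out of 21).

```python
import numpy as np

# Per-query differences (full stack minus baseline); all other queries are 0.
diffs = np.zeros(21)
diffs[5] = 0.4890
diffs[10] = 0.1228
diffs[14] = -0.0803


def paired_permutation_p(d: np.ndarray, n_perm: int = 10_000, seed: int = 2026) -> float:
    """Two-sided sign-flip permutation p-value for the mean paired difference."""
    rng = np.random.default_rng(seed)
    observed = abs(d.mean())
    # Under the null, each per-query difference is equally likely to be +d or -d.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    permuted = np.abs((signs * d).mean(axis=1))
    # Small tolerance so exact ties with the observed statistic count as >=.
    return float(np.mean(permuted >= observed - 1e-12))


def bootstrap_ci(d: np.ndarray, n_boot: int = 10_000, seed: int = 2026,
                 level: float = 0.95) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean paired difference (resampling queries)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, d.size, size=(n_boot, d.size))
    means = d[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return float(lo), float(hi)


p_value = paired_permutation_p(diffs)
ci_low, ci_high = bootstrap_ci(diffs)
print(f"mean delta = {diffs.mean():+.4f}")
print(f"permutation p = {p_value:.3f}")
print(f"95% CI = [{ci_low:+.4f}, {ci_high:+.4f}]")
```

With only three nonzero differences, the permutation distribution has just eight distinct sign patterns, which is why the p-value cannot drop much below 0.5 on this bench regardless of the number of permutations drawn.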
| # | Δ | baseline | full | text-id | query (truncated) |
|---|---|---|---|---|---|
| 0 | +0.0000 | 1.0000 | 1.0000 | yogasutra | What is yoga and the cessation of mental fluctuations |
| 1 | +0.0000 | 0.0000 | 0.0000 | yogasutra | How does samadhi lead to liberation |
| 2 | +0.0000 | 1.0000 | 1.0000 | yogasutra | What are the eight limbs of yoga |
| 3 | +0.0000 | 0.0000 | 0.0000 | yogasutra | What is the relationship between purusha and prakriti |
| 4 | +0.0000 | 1.0000 | 1.0000 | nyayasutra | What are the valid means of knowledge according to Nyaya |
| 5 | +0.4890 | 0.4307 | 0.9197 | nyayasutra | How does inference work as a pramana |
| 6 | +0.0000 | 1.0000 | 1.0000 | nyayasutra | What is the definition of doubt in logic |
| 7 | +0.0000 | 1.0000 | 1.0000 | nyayasutra | How does Nyaya define perception |
| 8 | +0.0000 | 1.0000 | 1.0000 | panini_ashtadhyayi | What is the root of the verb to be in Sanskrit grammar |
| 9 | +0.0000 | 1.0000 | 1.0000 | panini_ashtadhyayi | How are nominal compounds formed in Sanskrit |
| 10 | +0.1228 | 0.5706 | 0.6934 | yaska_nirukta | What is the etymology of the word dharma |
| 11 | +0.0000 | 1.0000 | 1.0000 | yaska_nirukta | What does Nirukta say about the origin of words |
| 12 | +0.0000 | 1.0000 | 1.0000 | brahmasutra | What is Brahman and its relationship to Atman |
| 13 | +0.0000 | 1.0000 | 1.0000 | brahmasutra | How does Badarayana define the nature of ultimate reality |
| 14 | −0.0803 | 1.0000 | 0.9197 | chandogya_upanishad | What is the teaching of tat tvam asi |
| 15 | +0.0000 | 1.0000 | 1.0000 | mandukya_upanishad | What does Mandukya say about the four states |
| 16 | +0.0000 | 0.0000 | 0.0000 | katha_upanishad | What is the teaching on the self in Katha Upanishad |
| 17 | +0.0000 | 0.0000 | 0.0000 | samkhyakarika | What are the 25 tattvas of Samkhya philosophy |
| 18 | +0.0000 | 1.0000 | 1.0000 | samkhyakarika | How does Samkhya describe cosmic evolution |
| 19 | +0.0000 | 1.0000 | 1.0000 | arthashastra | What does Kautilya say about the duties of a king |
| 20 | +0.0000 | 1.0000 | 1.0000 | arthashastra | How should a king manage his treasury |
Summary: 18/21 unchanged · 2/21 improved (one substantially: +0.49) · 1/21 degraded (−0.08).
The per-query data makes the bench-mismatch story precise. Per the table above, 15 queries are already at nDCG@5 = 1.0 under the baseline (no headroom; one of them, #14, is slightly degraded by the full stack), 4 queries are at 0.0 (the baseline misses entirely; rerankers cannot recover these, because they reorder retrieved candidates but cannot pull a missing chunk into the candidate set), and only 2 queries (#5 and #10) sit in the reachable middle ground; the full stack improves both. The bench has structural ceiling and floor effects that are independent of reranker quality.
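The ceiling and floor structure can be checked directly against the per-query table. The sketch below copies the baseline and full-stack columns from that table and bins each query by its baseline score; the bin names are illustrative only.

```python
from collections import Counter

# Baseline and full-stack per-query scores, copied from the per-query table (queries 0..20).
baseline = [1, 0, 1, 0, 1, 0.4307, 1, 1, 1, 1, 0.5706,
            1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
full = [1, 0, 1, 0, 1, 0.9197, 1, 1, 1, 1, 0.6934,
        1, 1, 1, 0.9197, 1, 0, 0, 1, 1, 1]


def bin_of(score: float) -> str:
    """Classify a baseline score as ceiling (1.0), floor (0.0), or middle."""
    if score == 1.0:
        return "ceiling"
    if score == 0.0:
        return "floor"
    return "middle"


bins = Counter(bin_of(b) for b in baseline)
# Queries whose score changes when the rerankers are enabled.
changed = [i for i, (b, f) in enumerate(zip(baseline, full)) if b != f]

print(dict(bins))
print("changed queries:", changed)
print("mean baseline:", round(sum(baseline) / len(baseline), 4))
print("mean full:", round(sum(full) / len(full), 4))
```

The two means reproduce the headline nDCG@5 values for the RRF baseline and the full stack, which confirms that the per-query table and the summary table describe the same run.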
Computed via N ≈ ((z_{α/2} + z_β) σ / Δ)² with z_{α/2} = 1.96 (two-sided α = 0.05), z_β = 0.84 (power 0.80), and σ the standard deviation of the observed per-query difference vector.
| Comparison | Observed Δ | σ(per-query diff) | N required | We have | Shortfall factor |
|---|---|---|---|---|---|
| Poincaré-only vs baseline | +0.0067 | 0.023 | ~96 | 21 | 4.6× |
| Topo-only vs baseline | +0.0074 | 0.040 | ~226 | 21 | 10.8× |
| Poincaré + Topo vs baseline | +0.0253 | 0.100 | ~124 | 21 | 5.9× |
To detect the observed effects at conventional significance and power, the bench needs to grow by roughly a factor of 5–11. v2 will target N ≥ 200 with independent annotation.
Note on per-query σ being larger for full stack than for individual rerankers: this is the interaction effect (S5) — when both rerankers fire on the same query they amplify each other’s per-query swings, which inflates the diff vector’s standard deviation even as the mean lift grows.
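The sample-size formula above can be evaluated as in the following sketch. It uses the rounded Δ and σ values from the table, so the results land within a few percent of the reported N rather than matching them exactly; the audit presumably used unrounded values. The function name is hypothetical.

```python
Z_ALPHA_2 = 1.96  # two-sided alpha = 0.05
Z_BETA = 0.84     # power = 0.80
N_CURRENT = 21    # queries in the current bench


def required_n(delta: float, sigma: float,
               z_alpha_2: float = Z_ALPHA_2, z_beta: float = Z_BETA) -> float:
    """N ~ ((z_{alpha/2} + z_beta) * sigma / delta)^2 for a paired comparison."""
    return ((z_alpha_2 + z_beta) * sigma / delta) ** 2


# (observed delta, sigma of per-query differences), rounded as in the table
comparisons = {
    "Poincare-only": (0.0067, 0.023),
    "Topo-only": (0.0074, 0.040),
    "Poincare + Topo": (0.0253, 0.100),
}

for name, (delta, sigma) in comparisons.items():
    n = required_n(delta, sigma)
    print(f"{name}: N ~ {n:.0f}, shortfall {n / N_CURRENT:.1f}x")
```

Because N scales with (σ / Δ)², halving the noise or doubling the effect would each cut the required bench size by a factor of four.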
| Singleton | Δ |
|---|---|
| Poincaré-only | +0.0067 |
| Topo-only | +0.0074 |
| Sum if additive | +0.0141 |
| Full stack (both on) | +0.0253 |
| Excess over additive | +0.0112 (~80% interaction term) |
The full-stack Δ exceeds the sum of singleton Δs by ~80% (0.0112 / 0.0141). A plausible mechanism: Poincaré reorders within connected components in hyperbolic space, while Topo reorders by topological coherence, so their combined reordering produces ranking permutations that neither achieves alone. We do not have the power to test whether this interaction is statistically significant; we report it as a quantitative observation only.
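The additivity check is plain arithmetic on the Δ values in the table above; a short sketch makes the two possible denominators explicit, since the excess can be read relative to the additive sum or relative to the full-stack lift.

```python
# Mean deltas from the table above.
poincare_only = 0.0067
topo_only = 0.0074
full_stack = 0.0253

additive = poincare_only + topo_only  # expected lift if the effects simply added
excess = full_stack - additive        # lift not explained by additivity

print(f"sum if additive:      {additive:+.4f}")
print(f"excess over additive: {excess:+.4f}")
print(f"excess / additive:    {excess / additive:.1%}")
print(f"excess / full stack:  {excess / full_stack:.1%}")
```

The ~80% figure is the excess relative to the additive sum; relative to the full-stack lift itself, the same excess is a bit under half.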
| Null | Mean (50 trials, seed=2026) | 95% CI of null mean | Δ vs full stack | Gate (τ=0.05) |
|---|---|---|---|---|
| G_A. Tradition-permuted | 0.1348 | [0.030, 0.266] | +0.6524 | ✓ |
| G_B. Tradition-random | 0.1225 | [0.044, 0.208] | +0.6648 | ✓ |
| G_C. Random-retrieval | 0.0931 | [0.019, 0.194] | +0.6942 | ✓ |
By Proposition 1 with Bonferroni correction at 3 tests, the gate PASSes with confidence ≥ 0.95 in each direction. The full-stack score (0.7873) is well below the leakage-suspicion ceiling (0.88).
git commit: 54b089f
corpus.lock: 9 artifacts, hashes documented in corpus.lock.json
random seed: 2026 (np.random.default_rng)
trial counts:
  bootstrap CI: B = 10 000
  paired permutation: B = 10 000
  null distribution: N = 50 per null × 3 nulls
  random text-id baseline: 100 trials
runtimes (M1 16 GB):
  full audit (stats_audit_v2.py): ~7 min
  null harness (null_corpus.py): ~4 min
  single bench (sanskrit_bench.py): ~80 s
env:
  KANAJA_DISABLE_FEEDBACK=1 (set automatically by sanskrit_bench.py)
pre-registration:
  Vak-Kanaja-Unified-Fractal-Engine.pdf
  SHA-256: 1eccc2e10762cc2e90b39e0490fb46a80a3b440670d1a1491171c04988b0d8d8
  mtime: 2026-04-30 01:00:34 UTC
  (predates all measurements in this paper)
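A reader who wants to confirm the pre-registration artifact can compare its SHA-256 digest against the value listed above. The sketch below uses only the standard library; it assumes the PDF sits in the current working directory, which the manifest does not state, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Digest copied from the manifest above.
EXPECTED_SHA256 = "1eccc2e10762cc2e90b39e0490fb46a80a3b440670d1a1491171c04988b0d8d8"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected: str) -> bool:
    """True if the file's digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # ASSUMPTION: the PDF is in the current directory; adjust the path as needed.
    pdf = Path("Vak-Kanaja-Unified-Fractal-Engine.pdf")
    if pdf.exists():
        print("match" if verify(pdf, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{pdf} not found")
```

A matching digest shows the file is byte-identical to the one that was hashed; the mtime claim has to be checked separately, since a hash carries no timestamp.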