Sparsh Sharma
Falsification methodology for retrieval; cross-cultural Sanskrit–Dravidian engines.
What I work on
Most evaluation harnesses for retrieval and ranking systems will quietly accept a constant predictor that exploits the gold label distribution. The standard permutation null does not catch this; the standard random-retrieval null does not catch it either. I work on the small piece of statistical machinery that does.
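A minimal sketch of the idea, assuming a skewed binary task scored by accuracy. This is my toy reconstruction, not falsify-eval's implementation; the bootstrap bar and the 0.95 quantile are illustrative choices.

import numpy as np

# Toy reconstruction (not the falsify-eval implementation): compare a
# predictor against a bar built from the gold marginal, not uniform chance.
rng = np.random.default_rng(0)
N = 10_000
gold = (rng.random(N) < 0.10).astype(int)   # skewed gold: ~90% zeros

def accuracy(pred, gold):
    return float(np.mean(pred == gold))

constant = np.zeros(N, dtype=int)           # knows only the marginal
informed = np.where(rng.random(N) < 0.95, gold, 1 - gold)  # sees the queries

# The best score available from the marginal alone is the mode predictor's;
# bootstrap the gold labels to get its null spread, take an upper quantile.
mode = int(np.bincount(gold).argmax())
boot = [accuracy(np.full(N, mode), gold[rng.integers(0, N, N)])
        for _ in range(1000)]
bar = float(np.quantile(boot, 0.95))

for name, pred in [("constant", constant), ("informed", informed)]:
    verdict = "pass" if accuracy(pred, gold) > bar else "FAIL"
    print(f"{name}: acc={accuracy(pred, gold):.3f}, bar={bar:.3f} -> {verdict}")

The constant predictor clears a uniform-chance bar of 0.5 at roughly 0.90, which is exactly how it gets quietly accepted; against the marginal-matched bar it sits at the null's own center and fails.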
The applied side is a retrieval engine for cross-cultural Sanskrit and Dravidian text alignment — an under-served corner of computational philology where the hard parts (long compounds, oral-prosodic features, sparse parallel data) push retrieval methodology in useful directions.
Open artifacts
falsify-eval v0.1.0
Four-null falsification gate, including a gold marginal-matched random null (the contribution that catches constant predictors). Bootstrap CIs, paired permutation tests, and a cryptographic state lock for pre-registration. Apache-2.0. No dependencies beyond the Python standard library and numpy.
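The state lock is simpler than it sounds. A sketch of the pattern, with illustrative field names rather than the library's actual config schema: hash the frozen evaluation choices, seed included, and publish the digest before computing any scores.

import hashlib, json

# Illustrative pre-registration lock (field names are placeholders,
# not the falsify-eval schema): freeze every evaluation choice, hash
# the frozen blob, and commit the digest before any scores exist.
config = {
    "seed": 20240101,          # RNG seed for every resampling step
    "metric": "recall@10",     # illustrative metric choice
    "alpha": 0.05,             # significance level per null
    "gate": "four_null_gate",  # the verdict function being frozen
}
blob = json.dumps(config, sort_keys=True).encode()
print(hashlib.sha256(blob).hexdigest())  # publish this up front

Anyone can later re-derive the digest from the config you claim to have pre-registered; a mismatch means the choices moved after the results were in.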
Methodology preprint
A four-null falsification gate for retrieval evaluation, with a positive-control protocol against constant and marginal-exploiting predictors. Validated on N = 10,000 synthetic queries; broken-predictor suite; sensitivity grid; bench-size calibration curve. Submission to arXiv pending.
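The positive-control protocol is the part most harnesses skip, so it is worth stating as code: before trusting any pass verdict, feed the gate predictors whose verdicts are known in advance. A sketch, with a deliberately toy stand-in gate rather than the preprint's:

import numpy as np

def positive_controls(gate, gold, seed=0):
    # Protocol sketch: a verdict function must fail known-broken
    # predictors and pass an oracle before its passes mean anything.
    rng = np.random.default_rng(seed)
    constant = np.full_like(gold, np.bincount(gold).argmax())
    marginal = rng.permutation(gold)   # exploits the marginal only
    oracle = gold.copy()               # perfect predictor
    assert not gate(constant, gold), "must fail the constant predictor"
    assert not gate(marginal, gold), "must fail the marginal sampler"
    assert gate(oracle, gold), "must pass the oracle"
    return "positive controls passed"

# Toy stand-in gate: pass only if accuracy beats the best constant
# score by a fixed margin (illustrative, not the preprint's gate).
def toy_gate(pred, gold):
    best_const = np.bincount(gold).max() / len(gold)
    return float(np.mean(pred == gold)) > best_const + 0.02

gold = (np.random.default_rng(1).random(5000) < 0.10).astype(int)
print(positive_controls(toy_gate, gold))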
Quickstart
pip install -e git+https://github.com/spalsh-spec/falsify-eval.git#egg=falsify-eval
python -c "from falsify_eval import four_null_gate; help(four_null_gate)"
Or clone and run the 50-query synthetic demo: python examples/synthetic_demo.py.
Expected verdict: constant_predictor fails (Null D rejects); oracle and mock_engine pass.
Writing
Notes on falsification methodology, retrieval evaluation, and computational philology. RSS.
Background
Undergraduate at RMIT University. Interested in any work that pushes retrieval evaluation toward the standards held by adjacent fields (clinical trials, psychometrics, computational philology). Reachable by email or LinkedIn for collaboration on falsification methodology, low-resource philological corpora, or both.