Bhardwaj & Sons

A house of standards.

सत्यमेव जयते

We build the methodologies, the libraries,
and the audits by which the next century of
intelligent systems will be measured.

Founded May 2026  ·  Melbourne  ·  Trust-owned

Heritage

The lineage is older than the work. The work earns the lineage.

The name carries a weight we did not invent. The Bharadvāja gotra is one of the seven brahmanical lineages of the Vedic tradition, traced to the rishi who composed many of the hymns of the sixth maṇḍala of the Ṛgveda.

We claim no inherited authority from this. The house was founded in 2026 by the modern Bhardwaj family in Australia, owned through a private trust, and operates under the standards we set ourselves. The lineage is on the wall to remind us what the work has to live up to — not to substitute for it.

~1500 BCE

Bharadvāja the rishi

Composed many hymns of Ṛgveda Maṇḍala VI. The lineage from which the family takes its name.

~400 BCE

Bharadvāja Smṛti

Dharmaśāstra text in the lineage, codifying conduct that rests not on assertion but on evidence.

2026

Bhardwaj & Sons established

Founded as a private trust by the Bhardwaj family. First work: a public falsification methodology for retrieval and ranking systems, released open-source.

First Work

falsify-eval. The missing test in retrieval evaluation.

Retrieval and ranking systems today are graded by a method that cannot distinguish a real ranker from a predictor that merely exploits the gold-label distribution. We built the patch.

The library is small. The discipline behind it is not. It implements a four-null gate where the standard pipeline uses three tests, and the fourth catches a class of false positives the field has been silently accepting for years. It is open-source under the Apache 2.0 licence so it can become a standard rather than a product.
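To make the failure mode concrete, here is a minimal Python sketch of one check of this kind. It is an illustration under stated assumptions, not the falsify-eval API: the function names, the choice of MRR as the metric, and the 0.05 margin are all hypothetical. It scores a label-prior null, a ranker that ignores the query and always returns the most frequent gold documents, and asks whether the system under test clears it.

```python
from collections import Counter


def mrr(rankings, gold):
    """Mean reciprocal rank. rankings: {qid: [doc ids]}, gold: {qid: doc id}."""
    total = 0.0
    for qid, target in gold.items():
        ranked = rankings.get(qid, [])
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(gold)


def label_prior_rankings(gold, k=10):
    """Null ranker: ignores every query and returns the k most frequent gold docs."""
    top = [doc for doc, _ in Counter(gold.values()).most_common(k)]
    return {qid: top for qid in gold}


def beats_label_prior(system_rankings, gold, margin=0.05, k=10):
    """Pass only if the system beats the query-blind null by `margin` (assumed value)."""
    null_score = mrr(label_prior_rankings(gold, k), gold)
    system_score = mrr(system_rankings, gold)
    return system_score - null_score >= margin, system_score, null_score


# Toy data, invented for illustration only.
gold = {"q1": "a", "q2": "a", "q3": "b"}
real_ranker = {"q1": ["a", "c"], "q2": ["a", "b"], "q3": ["b", "a"]}
prior_exploiter = {qid: ["a", "b"] for qid in gold}

passed_real, _, _ = beats_label_prior(real_ranker, gold)
passed_exploit, _, _ = beats_label_prior(prior_exploiter, gold)
```

On this toy data the query-blind predictor scores well on MRR by itself, which is exactly why an aggregate metric alone cannot separate it from a real ranker. Estimating the prior from the same labels it is scored on makes the null as strong as a query-independent ranking can be, so the check errs on the conservative side.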

"Most retrieval-system papers report a single aggregate metric and call it a contribution. Three failure modes make this unsafe at any benchmark size, and dangerous on small ones. This library closes one of them." — from the methodology preprint

Validated at ten thousand queries on an independent corpus. The complete library, the benchmark suite, and the preprint are public:

github.com/spalsh-spec/falsify-eval

House Standards

What we will not compromise on.

Four rules of discipline govern every piece of work the house ships. They are not aspirations. They are the conditions of the brand.

I.

Open by default.

The methodology is free, public, and citable. The implementation may be private; the standard never is. Free is the moat.

II.

Calibrated, never inflated.

Every claim survives the four-null gate before it leaves the house. If a result will not survive verification, it is not shipped.

III.

Verifiable in thirty seconds.

Every public artifact has a one-command demonstration. Trust is built on what the audience can prove, not on what we assert.

IV.

Earned, never claimed.

Heritage is a debt to the past, not a credential. The house signs no work it did not build.

Method of the House

Old principles, exactly applied. Nothing original here, except the discipline.

Four texts shape how we work. The youngest is roughly two thousand years older than us; the oldest, more than three thousand. We did not choose them to look ancient. We chose them because the principles inside them are what good engineering already does, named correctly.

कर्मण्येवाधिकारस्ते
मा फलेषु कदाचन
Bhagavad Gītā · 2.47

"You have the right to action only — never to its fruits."

We ship the work and let time decide its outcome. Forecasts and valuations are not the measure; whether the work survives verification is. This is why every artifact we publish has a thirty-second demonstration anyone can run, and why we never ship a claim before the proof.

साम · दान
भेद · दण्ड
Kauṭilya · Arthaśāstra · Bk II

Sāma, dāna, bheda, daṇḍa: escalate in this order, never in reverse.

Begin every relationship, whether with customer, collaborator, or critic, at sāma: shared interest. Escalate to dāna (concession) only when shared interest is exhausted. Bheda (division) and daṇḍa (force) are the last instruments because they are the most expensive. The house has never opened with force, and never will.

आ नो भद्राः क्रतवो
यन्तु विश्वतः
Ṛgveda · I.89.1

"Let noble thoughts come to us from every direction."

The work is open-source by design. We accept correction from any source — graduate students, regulators, hostile reviewers, anonymous internet commenters. The point is for the methodology to become better than its author. Authorship is a debt to the work, not an entitlement to it.

सत्यमेव जयते
नानृतम्
Muṇḍaka Upaniṣad · 3.1.6

"Truth alone triumphs — not falsehood."

We commit to publishing only claims we believe will survive falsification. When our own results fail to survive, we publish that too. The first work shipped by the house publicly retracted its own pre-fix numbers; that retraction is what earned the validated numbers their weight.

Engagements

Seven ways the house works. Seven only.

We do not bill by the hour. We do not negotiate scope mid-engagement. We accept a small number of clients per quarter so the work that leaves the house carries the standard the brand requires. The figures below are the figures.

I — The Audit

Methodology audit.

From AUD $120,000 · 4 weeks · fixed

A four-week falsification audit of your existing evaluation pipeline. We apply the four-null gate, isolated-baseline panel, and bench-bias diagnostic to your live system and tell you which of your published numbers will survive third-party verification — and which will not.

  • Written report with reproducible artifacts
  • Per-claim survival verdict against the four-null gate
  • Hardening recommendations, prioritised
  • One closing session with your research lead

Payable 50% on signature, 50% on delivery.

II — The Pipeline

Bespoke evaluation build.

From AUD $280,000 · 8–12 weeks · fixed

We design and ship the evaluation infrastructure your team will run for the next two years. Custom corpora, custom null hypotheses calibrated to your domain, custom dashboards. Built to your stack, owned by you, audited by us against the same standards we hold ourselves to.

  • Production-grade pipeline in your repo
  • Domain-calibrated null suite (Nulls A–D plus a fifth bespoke)
  • CI integration so regressions are caught at PR time
  • Two-week handover with your engineering lead

Payable in three milestones: signature, mid-build, delivery.

III — The Standing Retainer

Annual standards seat.

From AUD $240,000 · per year

One named partner of the house on retainer for the year. Quarterly review of every public claim before it leaves your team. First-call access for any claim under contestation. Renewable annually, never auto-renewing.

  • Quarterly methodology review of all public-facing results
  • On-call review for contested claims (48-hour SLA)
  • Two annual on-site days with your team
  • Right to cite the house as your standards reviewer

Annual, paid in advance. We accept a maximum of six retainers.

IV — The Retrieval Engine

Domain retrieval build.

From AUD $320,000 · 10–14 weeks · fixed

We design and ship a production-grade retrieval engine for your specialist corpus — legal, medical, financial, scientific, or any proprietary domain. Built on the Vāk-Kaṇaja architecture: three-channel φ-RRF (semantic + lexical + domain-specific), Docker-deployed, OpenAI-compatible API. Benchmarked against BEIR baselines using the four-null gate before delivery. The engine belongs to you; the methodology stays open.

  • Production pipeline in your infrastructure (Docker, arm64/amd64)
  • Three-channel retrieval: semantic FAISS, BM25, domain-specific channel
  • falsify-eval gate applied at every benchmark milestone
  • OpenAI-compatible REST API with nine endpoints
  • Two-week engineering handover and corpus ingestion guide

Payable in three milestones: signature, corpus-build, delivery.
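For readers who want to see what fusing three channels looks like, here is a minimal sketch. It uses plain weighted reciprocal rank fusion with the conventional constant k = 60; the specifics of the φ-RRF variant are not described on this page, so nothing below should be read as the Vāk-Kaṇaja implementation. The function name, the weights parameter, and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(channels, k=60, weights=None):
    """Fuse several ranked lists of doc ids into one ranking.

    Each document scores sum(weight / (k + rank)) over the channels that
    returned it, with ranks starting at 1. Standard RRF, not phi-RRF.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    scores = defaultdict(float)
    for ranked, weight in zip(channels, weights):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += weight / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Toy channel outputs, invented for illustration.
semantic = ["d1", "d2", "d3"]   # e.g. a vector index
lexical = ["d2", "d1", "d4"]    # e.g. BM25
domain = ["d2", "d3", "d1"]     # a domain-specific channel

fused = reciprocal_rank_fusion([semantic, lexical, domain])
```

Rank-based fusion needs no score normalisation across channels, which is the usual reason to prefer it when the channels produce scores on incompatible scales.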

V — The Output Architecture

Structured-output system design.

From AUD $180,000 · 6–8 weeks · fixed

We design the prompting architecture that forces your LLM stack into verifiable, schema-compliant outputs — and prove it holds across model scales. Built from the Ākāśa Pantheon framework: schema definition, system-prompt hardening, parse-time verification, and an empirical adherence benchmark across your model fleet. The finding that format adherence is non-monotonic in model scale is not a curiosity — it is a production risk most teams discover too late.

  • Bespoke schema definition for your output domain
  • System-prompt harness with parse-time verification
  • Adherence benchmark across your model fleet (N ≥ 100 runs)
  • Citation / reference auto-verification pipeline (where applicable)
  • Written report: per-model adherence rates, failure mode catalogue

Payable 50% on signature, 50% on delivery.
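As a rough illustration of parse-time verification and an adherence rate, consider the sketch below. It is not the Ākāśa Pantheon framework: the schema, the field names, and both function names are hypothetical, and a production harness would more likely use a full schema validator than a type check.

```python
import json

# Hypothetical output schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "citations": list}


def conforms(raw, schema=REQUIRED_FIELDS):
    """True if `raw` parses as a JSON object carrying every required field."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(isinstance(obj.get(name), kind) for name, kind in schema.items())


def adherence_rate(outputs, schema=REQUIRED_FIELDS):
    """Fraction of raw model outputs that pass parse-time verification."""
    if not outputs:
        raise ValueError("need at least one output")
    return sum(conforms(raw, schema) for raw in outputs) / len(outputs)


# Invented sample outputs: one valid, three failing in different ways.
samples = [
    '{"answer": "x", "citations": []}',   # valid
    "Sure! Here is the JSON you asked for",  # not JSON at all
    '{"answer": 1, "citations": []}',     # wrong type
    "[]",                                  # valid JSON, wrong shape
]
rate = adherence_rate(samples)
```

Running the same check over each model in a fleet gives per-model adherence rates, and keeping the failing outputs gives the raw material for a failure-mode catalogue.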

VI — The Corpus Infrastructure

Specialist knowledge corpus.

From AUD $220,000 · 6–10 weeks · fixed

We ingest, clean, embed, index, and serve your proprietary knowledge base at production grade. Any format, any language, any domain. The universal corpus adapter handles YAML-declared ingestion pipelines; FAISS indexing with int8 ONNX encoders targets sub-100ms warm query latency; corpus state is locked reproducibly by SHA-256 hash and git-commit integrity. The result is a queryable artifact store your engineering team owns and can extend independently.

  • Full ingestion pipeline from your source format to FAISS + SQLite
  • ONNX int8 semantic embeddings (multilingual, sub-100ms warm latency)
  • Reproducible corpus state lock (SHA-256 + git-commit integrity)
  • Query API with epistemic-status labelling per result
  • Complete corpus build documentation and re-ingestion runbook

Payable in three milestones: signature, ingestion-complete, delivery.
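One way a SHA-256 corpus state lock can work is sketched below, using only the Python standard library. This is an assumption about the general technique, not the house's implementation: the function name and the choice to hash relative paths together with per-file digests are hypothetical.

```python
import hashlib
from pathlib import Path


def corpus_digest(root):
    """Return one SHA-256 hex digest covering every file under `root`.

    Files are visited in sorted relative-path order so the digest is
    reproducible; renaming, adding, removing, or editing a file changes it.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    outer = hashlib.sha256()
    for path in files:
        rel = path.relative_to(root).as_posix()
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        # Bind each path to its content hash; NUL and newline delimit fields.
        outer.update(f"{rel}\0{file_hash}\n".encode("utf-8"))
    return outer.hexdigest()
```

Recording the digest next to the git commit that produced the corpus lets anyone rebuild the corpus and confirm they hold the same state. For very large files a chunked read would be kinder to memory than `read_bytes`.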

VII — The Scale Audit

Model-scale behaviour audit.

From AUD $140,000 · 4–6 weeks · fixed

We empirically audit how your AI system's behaviour changes as you move across model scales. Schema adherence, citation accuracy, structured-output compliance, and failure-mode distribution are measured across your model fleet under controlled, reproducible conditions. The finding — that smaller models are more obedient and larger models more accurate, but neither is uniformly safe — is consistent enough to treat as a production constraint, not a research footnote.

  • Cross-scale benchmark (minimum three model sizes, N ≥ 36 per model)
  • Format-adherence, citation-accuracy, and failure-mode analysis
  • Stochastic variance run to separate structural failure from noise
  • Per-model deployment recommendation with risk classification
  • Reproducible benchmark harness, owned by your team post-engagement

Payable 50% on signature, 50% on delivery.
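To show why sample size matters when separating structural failure from noise, here is a small sketch using the Wilson score interval for a pass rate. It is one common way to quantify sampling noise and is offered as an assumption, not as the audit's actual procedure; the function names and the example counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented counts: two models at N = 36, 30 and 33 schema-compliant runs.
small_model = wilson_interval(33, 36)
large_model = wilson_interval(30, 36)
distinguishable = not intervals_overlap(small_model, large_model)
```

With these invented counts the two intervals overlap, so a three-run gap at N = 36 cannot by itself be called a structural difference. Overlapping intervals are a rough screen rather than a formal test, but they show why a variance run is needed before a per-model recommendation.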

Commitment

A standing share to those who cannot pay for the work.

The house commits a defined share of every commercial contract to deploying AI and agentic tooling at no cost to non-profits, public-sector institutions, and underserved communities — beginning with India and Australia.

The figure, the recipients, and the audited use of the funds will be published annually. We treat the commitment the way the methodology treats a benchmark claim: nothing said that cannot be verified.

This is not a marketing line. It is a constraint we accept on every deal we close. The brand exists in part to protect this commitment from the pressure of growth.