A full write-up of the methods and measurements behind Tīrtha is in progress. Below is what it will cover, published as we finish each section, with every number shown honestly as the directional, pre-beta result it is.
We timed it on the live gateway: send a coding task, then send the very same task again. The first time, the team solves it end-to-end; the second, the answer is already remembered. We measured both by wall-clock time across a dozen tasks, of which eight gave clean fresh-solve timings (the n=8 in the method note below).
| Answer length | Solved fresh | On repeat | Faster |
|---|---|---|---|
| Short | 4.0s | 0.17s | 24× |
| Medium | 12.0s | 0.17s | 71× |
| Long | 27.8s | 0.15s | 185× |
Retrieval is rock-steady at ~0.16s; the gap is set by how much there is to generate the first time, so a short answer is ~24× faster on repeat, a long one ~185×. Method: live api.tirtha.ai, the identical request sent twice, end-to-end wall-clock, n=8 clean fresh-solve cases. Directional · pre-beta · to be re-run on our own hardware.
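The "Faster" column is just the fresh-solve time divided by the repeat time. As a quick check, here is a minimal sketch that recomputes it from the three rows above; the variable and function names are ours, chosen for illustration, and are not part of any Tīrtha tooling.

```python
# Timings copied from the table above: (solved fresh, on repeat), in seconds.
timings = {
    "Short": (4.0, 0.17),
    "Medium": (12.0, 0.17),
    "Long": (27.8, 0.15),
}


def speedup(fresh_s: float, repeat_s: float) -> int:
    """Fresh-solve time divided by repeat time, rounded to a whole multiple."""
    return round(fresh_s / repeat_s)


speedups = {label: speedup(f, r) for label, (f, r) in timings.items()}

# Average repeat latency across the three rows of the table.
mean_repeat = sum(r for _, r in timings.values()) / len(timings)

for label, factor in speedups.items():
    print(f"{label}: {factor}x faster on repeat")
print(f"mean repeat latency: {mean_repeat:.2f}s")
```

Because the repeat time barely moves, the ratio is driven almost entirely by the fresh-solve time in the numerator.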
The whole thesis is that you don't have to send every problem to a frontier model to get frontier-quality answers. We tested it head-to-head on the full HumanEval+ benchmark: 164 coding problems, scored by actually running each solution against its tests (pass@1, a single attempt, the same harness for every system).
| System | Solved correctly · pass@1 · 164 problems |
|---|---|
| Tīrtha | 95% |
| Claude | 96% |
| Codex | 95% |
| One model, alone | 82% |
The number that matters isn't just the 95%. It's that Tīrtha reached it while sending only 57% of problems to a frontier model. The lightweight tier drafts an answer; an automated check decides whether it's trustworthy or needs escalating. The result is frontier-level accuracy, with frontier prices paid only where they're earned. Method: full HumanEval+ (164 problems), pass@1, each solution scored by execution against its tests, identical harness per system, run 2026-06-19. Gemini is excluded because it was quota-blocked mid-run, which made its results invalid. The lightweight tier in this run was a 7B model; today's lightweight tier is substantially stronger, so we expect to hold this accuracy while handling an even larger share without a frontier model. These are pre-beta numbers and we're still testing. We'll re-run the full suite on our own hardware as we move out of pre-beta, and we expect the numbers to get better, not worse. Directional · pre-beta.
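The draft, check, escalate flow described above can be pictured roughly as follows. This is a sketch under stated assumptions, not Tīrtha's implementation: `route`, `Routed`, `frontier_share`, and the three callables are hypothetical names, and the stand-in functions in the usage section are toys that only exist to make the example run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Routed:
    answer: str
    tier: str  # "lightweight" or "frontier"


def route(
    task: str,
    draft: Callable[[str], str],
    is_trustworthy: Callable[[str, str], bool],
    escalate: Callable[[str], str],
) -> Routed:
    """Draft on the lightweight tier; escalate only if the check rejects the draft."""
    candidate = draft(task)
    if is_trustworthy(task, candidate):
        return Routed(candidate, "lightweight")
    return Routed(escalate(task), "frontier")


def frontier_share(results: list[Routed]) -> float:
    """Fraction of tasks that ended up on the frontier tier."""
    return sum(r.tier == "frontier" for r in results) / len(results)


# Toy stand-ins (assumptions): the "check" here just looks at the task name.
tasks = ["easy-1", "hard-1", "easy-2", "hard-2"]
results = [
    route(
        t,
        draft=lambda task: f"draft for {task}",
        is_trustworthy=lambda task, answer: task.startswith("easy"),
        escalate=lambda task: f"frontier answer for {task}",
    )
    for t in tasks
]
print(frontier_share(results))  # 0.5
```

In this picture, the 57% reported above is the value `frontier_share` would take over the 164 benchmark problems; what the real automated check looks at is not described here.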
The savings aren't a discount. They're structural. Send everyday work to a lightweight tier and only the genuinely hard problems to a frontier model, and the bill is lower from the very first request. Then, as the system recognizes work it has seen before, it drops further.
This is a cost model, not a measured production bill. It is projected from per-request routing economics and a conservative reuse assumption (it counts only exact repeats). It has not been tested on the full production system yet, and it is not final. The day-one routing saving is the firmest part; the compounding tail depends on how repetitive a given workload is. Directional · pre-beta.
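To make the shape of such a model concrete, here is a minimal sketch. It is our own simplification, not the model behind the projection: it assumes an exact repeat costs nothing, that every other request pays for a lightweight draft, and that a fraction of those also pays for a frontier call. All function names are hypothetical, and the input values are placeholders in arbitrary cost units, not Tīrtha's prices or rates.

```python
def expected_cost(
    frontier_cost: float,
    light_cost: float,
    frontier_share: float,
    repeat_rate: float = 0.0,
) -> float:
    """Expected cost per request under this sketch's assumptions.

    Exact repeats are treated as free. Every other request pays for a
    lightweight draft, and a `frontier_share` fraction of them also pays
    for a frontier call.
    """
    if not (0.0 <= frontier_share <= 1.0 and 0.0 <= repeat_rate <= 1.0):
        raise ValueError("shares and rates must be between 0 and 1")
    fresh = light_cost + frontier_share * frontier_cost
    return (1.0 - repeat_rate) * fresh


def saving_factor(
    frontier_cost: float,
    light_cost: float,
    frontier_share: float,
    repeat_rate: float = 0.0,
) -> float:
    """How many times cheaper than sending every request to the frontier model."""
    cost = expected_cost(frontier_cost, light_cost, frontier_share, repeat_rate)
    return float("inf") if cost == 0 else frontier_cost / cost


# Placeholder inputs only (assumptions), in arbitrary cost units.
day_one = saving_factor(frontier_cost=1.0, light_cost=0.05, frontier_share=0.25)
with_reuse = saving_factor(
    frontier_cost=1.0, light_cost=0.05, frontier_share=0.25, repeat_rate=0.5
)
print(f"day one: {day_one:.2f}x, with half the requests repeated: {with_reuse:.2f}x")
```

The two terms mirror the two parts of the claim: routing sets the day-one factor, and the repeat rate multiplies it, which is why the compounding tail depends so heavily on how repetitive the workload is.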
We're pre-beta, and we'd rather you trust the numbers than be dazzled by them. So here's the honest accounting of every claim on this site.
| Claim | Status |
|---|---|
| Speed on repeat work: ~0.16s, 24–185× | Measured (live gateway, n=8) |
| Accuracy: 95% on HumanEval+, frontier parity | Measured (full 164, pass@1; 2026-06-19, on a 7B tier; current tier is stronger) |
| Cost: ~8× saving on day one, compounding further | Projected (a model, not a production bill; not final) |
| Lightweight-tier coding (aider-polyglot) | Early (directional; Python subset, single run; not the full leaderboard metric) |
| Retrieval-quality improvements | Measured internally. The results are real; the method stays private |
What we haven't done yet: re-run the full suite on our own hardware (we're on borrowed compute), measure across every benchmark language, and validate the cost model against a real production bill at scale. We'll publish each as we finish it, and we expect the numbers to get better, not worse. If a claim here isn't marked "measured," treat it as a direction of travel, not a guarantee.
We're publishing section by section. Email [email protected] and we'll send the write-up as each part goes live.