Jurisdiction Matrix
The engine's confidence varies by jurisdiction. Gurugram is where the verified-input cohort lives today. Greater Noida, Mumbai, Pune and Thane are observed but not yet validated — the engine knows what it doesn't know.
Active validation jurisdiction. Signature Global DRHP/RHP projects anchor headline revenue MAPE (n=4). HRERA Form REP-I Part C provides cost ground-truth: HRERA Bulk Gurugram (n=74, MAPE 10.4% / median 9.3%) + HRERA NCR-B Gurugram/Faridabad/Sonipat/Rohtak/Karnal/Panchkula (n=209, MAPE 10.8% / median 8.6%). Part of the 5-portal RERA coverage cohort (n=714, declared-agreement 11.3% / 8.5% — true-cost accuracy is measured separately vs independent QS). Every input adjustment cites Tier-1 primary URL + page + quoted text — 350+ cited HRERA brochure spec overrides.
Cost cohort (n=2, HRERA) — EXCLUDED from group-housing headline. Both projects (Godrej Retreat Vista, Park Arena / BPTP) are plotted-colony developments. Engine cost model assumes GFA × rate (group-housing); HRERA REP-I for plotted developments is land + common infra only. Scope mismatch MAPE 177.9% — documented transparently, not averaged into group-housing figures.
Observed cohort (n=2). No SEBI-verified plot anchors. MAPE pending.
MahaRERA cohort (n=5). Revenue not publicly disclosed for any project. Outside active MAPE.
MahaRERA cohort (n=2). Revenue not publicly disclosed. Outside active validation surface.
Expanded cohort (n=2, MahaRERA). Revenue not publicly disclosed. Area + timeline validation only.
Scope definition
The engine's cost model, GFA × construction_rate_per_sqm, is calibrated for multi-unit residential group-housing (apartments, floors, AHP affordable). Plotted-colony and commercial developments have fundamentally different cost structures: HRERA's Form REP-I Part C for those project types reports land + common infrastructure only, not per-unit superstructure cost. Comparing the engine's output against that yardstick produces a scope mismatch, not an accuracy failure. We measure plotted and commercial cohorts separately and will publish them once a cohort reaches n ≥ 10.
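The scope rule above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual code: the `Project` class, the project-type labels, and the example rate are all hypothetical. Only the GFA × rate formula and the group-housing scope come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Project types the text says the cost model is calibrated for.
# The string labels themselves are hypothetical.
IN_SCOPE_TYPES = {"apartment", "floor", "ahp_affordable"}


@dataclass
class Project:
    name: str
    project_type: str
    gfa_sqm: float  # gross floor area in square metres


def predict_cost(project: Project, construction_rate_per_sqm: float) -> Optional[float]:
    """Return GFA x rate for in-scope types, or None when out of scope."""
    if project.project_type not in IN_SCOPE_TYPES:
        # Plotted-colony and commercial projects are not modelled,
        # so no prediction is produced rather than a misleading one.
        return None
    return project.gfa_sqm * construction_rate_per_sqm
```

Returning `None` for out-of-scope types keeps those projects from being averaged into a group-housing error figure by accident.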
Faridabad cost cohort — scope mismatch disclosed
Two projects (Godrej Retreat Vista, Park Arena / BPTP) from Faridabad were added to the back-test corpus in the 2026-05-16 expansion and flagged with a 177.9% cost MAPE. Both are plotted-colony developments — the engine does not model that project type. They are excluded from the 5-portal coverage cohort (n=714) and tracked in a separate plotted-colony cohort (n=2). This is the honest thing to do: an engine that knows its scope boundary is more trustworthy than one that averages everything together.
UPRERA NCR Tier-5 sub-cohort — IDC-noise structural mismatch disclosed (19.5% MAPE, n=19)
Nineteen UPRERA NCR apartment projects (Noida + Greater Noida, S21 canonical cohort) declare a construction cost in UPRERA Form-5 Row 3C that embeds Infrastructure & Development Charges (IDC), which the engine does not model. A structural Rs 15-30k/sqm is baked into the declared total, so the engine is biased toward under-prediction on this cohort. Comparing engine output against that yardstick produces a structural category mismatch (MAPE 19.5%, median 19.3%, n=19), not an accuracy failure. An RTI to UPRERA for the Form-5 Row 3C IDC breakdown is queued (draft at back_tests/RTI_*). Five outliers fail the [0.65, 1.55] falsifiability band at all spec levels under the engine's current model. The 5-portal coverage cohort (n=714, RERA-declared agreement 11.3%) includes UPRERA NCR honestly, without spec-tuning to mask the IDC noise.
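A band check of this kind might look like the sketch below. The source gives only the bounds [0.65, 1.55]. It does not say which ratio the band applies to, so the predicted-to-declared ratio used here is an assumption, and the function names are hypothetical.

```python
FALSIFIABILITY_BAND = (0.65, 1.55)  # bounds quoted in the text


def within_band(predicted_cost, declared_cost, band=FALSIFIABILITY_BAND):
    """True when predicted/declared lies inside the band (inclusive).

    ASSUMPTION: the ratio direction (predicted over declared) is not
    stated in the source and is chosen here for illustration only.
    """
    low, high = band
    ratio = predicted_cost / declared_cost
    return low <= ratio <= high


def outliers(projects):
    """Names of projects whose ratio falls outside the band.

    `projects` is an iterable of (name, predicted_cost, declared_cost).
    """
    return [name for name, pred, decl in projects if not within_band(pred, decl)]
```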
NCR-only sub-cohort — same engine, geographic refocus (15.1% MAPE, median 10.2%, n=313)
Master's NCR refocus directive (T461 sprint). Sub-cohort: HRERA Gurugram old Bulk (n=74) + HRERA NCR-B new Gurugram/Faridabad/Sonipat/Rohtak/Karnal/Panchkula (n=209) + UPRERA NCR canonical (n=19) + Delhi RERA (n=6) + HRERA Tier-3 (n=3). Mean APE 15.1%, median APE 10.2%: the <11% gate on the median is met with 0.8pp to spare. n=313 is the honest NCR cap; growing to n=500 requires an RTI for Lodha/M3M/Macrotech SPV mapping or an improved UPRERA Form-5 Row 3C parser (NCR-adjacent districts were probed and found structurally empty of private apartment QPR stock). The 5-portal coverage cohort (RERA-declared agreement 11.3% mean / 8.5% median, n=714) includes NCR plus MahaRERA + KRERA: broader scope, comparable agreement with the declared floor.
What the engine predicts
Not yet validated / out of scope
Per-metric confidence at a glance
Methodology
98 projects assembled from HRERA public register (Gurugram, Sohna, Faridabad), Signature Global DRHP + RHP FY23 filings, and K-RERA / TN-RERA / TS-RERA / MahaRERA inputs-only lists. No project was cherry-picked — every HRERA project passing quality gates (≥ 20 units, non-phased, cost-per-sqm sanity) is included.
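The three quality gates named above could be expressed as a single predicate, as in this sketch. The 20-unit minimum and the non-phased rule come from the text. The cost-per-sqm sanity bounds are not given in the source, so the values below are placeholders, and the `Candidate` class is hypothetical.

```python
from dataclasses import dataclass

MIN_UNITS = 20  # from the text: projects need at least 20 units

# ASSUMPTION: the source does not state the sanity bounds.
# These placeholder values are for illustration only.
COST_PER_SQM_BOUNDS = (10_000.0, 150_000.0)


@dataclass
class Candidate:
    name: str
    units: int
    is_phased: bool
    cost_per_sqm: float


def passes_quality_gates(c: Candidate, bounds=COST_PER_SQM_BOUNDS) -> bool:
    """Apply all three gates; a project must pass every one."""
    low, high = bounds
    return (
        c.units >= MIN_UNITS
        and not c.is_phased
        and low <= c.cost_per_sqm <= high
    )
```

Applying one fixed predicate to every register entry is what makes the "no cherry-picking" claim checkable: inclusion depends only on the gates, not on how well the engine performs on a project.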
Revenue validated only on SEBI-filed projects (Signature Global DRHP/RHP) where project-level GDV is in a table cell — not inferred, not estimated from press releases. This is Tier 1. Projects where revenue is disclosed in RHP prose (acreage + area from tables) are Tier 2. All others: revenue validation pending.
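The tiering rule in this paragraph can be written as a small classifier. This is a sketch under stated assumptions: the enum, the function name, and the three boolean flags are hypothetical ways of encoding where the revenue figure was found.

```python
from enum import Enum


class RevenueTier(Enum):
    TIER_1 = "tier 1"    # project-level GDV in a table cell of a SEBI filing
    TIER_2 = "tier 2"    # revenue disclosed in RHP prose
    PENDING = "pending"  # no usable disclosure yet


def classify_revenue_source(
    sebi_filed: bool, gdv_in_table_cell: bool, revenue_in_rhp_prose: bool
) -> RevenueTier:
    """Map a project's disclosure evidence to a validation tier."""
    if sebi_filed and gdv_in_table_cell:
        return RevenueTier.TIER_1
    if sebi_filed and revenue_in_rhp_prose:
        return RevenueTier.TIER_2
    return RevenueTier.PENDING
```

The table-cell case is checked first, so a project with both kinds of disclosure is classified by its strongest evidence.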
Five RERA portals expose project-level cost: HRERA in static HTML; MahaRERA + KRERA + UPRERA + Delhi RERA via portal-specific extraction. 4,760 raw scrapes were filtered through quality gates to 714 publishable. Eight listed-developer annual reports confirmed that no developer discloses project-level total cost. True-cost accuracy is measured separately against independent professional QS benchmarks (Colliers/JLL, ~8.5% / 7.3%, n=20). The RERA coverage cohort is split into 5 tiers (declared-agreement, evidence-first 2026-05-21): 5-portal coverage cohort n=714 (11.3% / 8.5%), NCR-only sub-cohort n=313 (15.1% / 10.2%, NCR refocus), UPRERA NCR Tier-5 IDC-noise n=19 (19.5%, disclosed not hidden), TNRERA Chennai Tier-5 n=301 (23.8%, disclosed separately), 6-portal full transparency n=1,039 (16.2%, audit completeness). Every JSON correction underpinning these numbers cites a Tier-1 primary URL + page + quoted text.
Project Ledger
All 98 projects. Rows where Actual = "—" have no public revenue disclosure available. Error % is |(predicted − actual) / actual|.
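The error formula, and the mean and median figures quoted throughout this page, can be reproduced with a short sketch like the one below. It assumes ledger rows arrive as (predicted, actual) pairs, with the ledger's missing-value placeholder already mapped to `None`. The function names are hypothetical.

```python
from statistics import mean, median


def ape(predicted: float, actual: float) -> float:
    """Absolute percentage error as a fraction: |(predicted - actual) / actual|."""
    return abs((predicted - actual) / actual)


def summarize(rows):
    """Mean and median APE over rows that have a disclosed actual value.

    `rows` is an iterable of (predicted, actual); actual is None when
    no public disclosure exists, and such rows are skipped.
    """
    errors = [ape(p, a) for p, a in rows if a is not None]
    if not errors:
        return None
    return {"n": len(errors), "mape": mean(errors), "median_ape": median(errors)}
```

Rows without an actual value are skipped instead of being counted as zero error, so they never improve the headline figure.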
Note on rows 21–22 (Faridabad — Godrej Retreat Vista, Park Arena / BPTP): both are plotted-colony developments. The engine's cost model is designed for group-housing, not plotted-colony, so these two projects are excluded from the cost MAPE cohort. Revenue and area metrics are unaffected.
Data Acquisition Pipeline
Today: n=3 verified. Target: n=30+ across NCR, MMR, Bangalore, Chennai, Hyderabad. Here is the live queue of formal RTI applications, partnership conversations, and direct developer disclosures that will widen the verified cohort.
Source ledger: ww_rti_applications.md · ww_data_request_emails.md