We score every answer we give, then publish the numbers.
Most AI vendors ask you to trust them. Instead, we run every production question through seven independent scorers (factual accuracy, routing correctness, citation grounding, budget discipline, and more) and aggregate the rolling 7-day results across every firm on the platform. These numbers are live.
How we measure
Every real customer question runs through two systems in parallel: the copilot your team uses, and a prior version we keep running in the background for comparison. Seven independent scorers grade each answer on the dimensions that matter for accounting work.
Some scorers are deterministic (did the answer cite a source? did it stay within cost limits?). Others use an independent AI model as a judge (was the answer factually correct? was it well written?). No scorer sees which firm asked the question; each sees only the question and the answer.
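To make the two deterministic checks and the firm-blind rule concrete, here is a minimal sketch. It is illustrative only: `ScorerInput`, `cites_a_source`, `within_budget`, and the `0.05` USD default limit are hypothetical names and values, not our production code. The point it shows is structural: the scorer's input type simply has no firm field, so a scorer cannot condition on who asked.

```python
from dataclasses import dataclass


# Hypothetical scorer input. It deliberately carries no firm identifier,
# so no scorer can see (or condition on) which firm asked the question.
@dataclass(frozen=True)
class ScorerInput:
    question: str
    answer: str
    citations: tuple[str, ...] = ()  # IDs of sources the answer cites (assumed shape)
    cost_usd: float = 0.0            # assumed per-answer cost accounting


def cites_a_source(item: ScorerInput) -> bool:
    """Deterministic check: did the answer cite at least one source?"""
    return len(item.citations) > 0


def within_budget(item: ScorerInput, limit_usd: float = 0.05) -> bool:
    """Deterministic check: did the answer stay within the cost limit?

    The 0.05 default is a placeholder for illustration, not a real limit.
    """
    return item.cost_usd <= limit_usd
```

Because both checks are pure functions of the answer record, they give the same verdict every time; the AI-judge scorers would take the same `ScorerInput` but return a graded judgment instead.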
The numbers above aggregate the last 7 days on a rolling basis. At least 10 distinct firms must have data before any aggregate is published, and per-firm numbers are never exposed through this endpoint. The page is cached at the edge for 5 minutes, so no individual firm can use this view to infer its own performance signal in real time.
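The publication rule can be sketched as a small aggregation function. This is a minimal sketch under stated assumptions, not our implementation: `ScoreRecord` and `aggregate_scores` are hypothetical names, scores are assumed to be normalized to [0, 1], the per-scorer summary is assumed to be a plain mean, and the 10-firm threshold is assumed to apply to the whole window rather than per scorer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(days=7)  # rolling window described above
MIN_FIRMS = 10              # publication threshold described above


# Hypothetical record: one score from one scorer for one answer.
@dataclass
class ScoreRecord:
    firm_id: str
    scorer: str          # e.g. "citation_grounding"
    score: float         # assumed normalized to [0, 1]
    scored_at: datetime


def aggregate_scores(records: list[ScoreRecord], now: datetime) -> Optional[dict[str, float]]:
    """Mean score per scorer over the last 7 days, or None if too few firms."""
    recent = [r for r in records if now - WINDOW <= r.scored_at <= now]
    firms = {r.firm_id for r in recent}
    if len(firms) < MIN_FIRMS:
        return None  # suppress the aggregate entirely rather than publish it
    by_scorer: dict[str, list[float]] = {}
    for r in recent:
        by_scorer.setdefault(r.scorer, []).append(r.score)
    # Only pooled means leave this function; firm_id never appears in the output.
    return {scorer: sum(vals) / len(vals) for scorer, vals in by_scorer.items()}
```

Returning `None` instead of a partial number means a thin week produces no published figure at all, which matches the rule that no aggregate appears until 10 distinct firms have data.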
The full methodology and scorer definitions are part of our open product documentation. Email founder@memwamind.com if you'd like to see them before signing.
Why this matters
Accounting firms don't have the luxury of being “sometimes” correct. Every answer your team hands to a client has to be defensible. MemwaMind was built on the premise that if we can't prove how often we're right, we shouldn't ship. This page is our proof.