The Improvement Loop
How AskBiz continuously improves AI accuracy through user feedback, automated benchmarking, and structured review cycles — and what role you play in making it better.
The Four-Stage Improvement Loop
AskBiz operates a continuous improvement loop across four stages:
Stage 1 — Signal collection
We collect three types of signals (see the sketch after this list):
- Explicit flags — thumbs-down feedback from users with error type and notes
- Implicit signals — a user immediately re-asking the same question after receiving an answer, which suggests the first answer was unsatisfactory
- Confidence discrepancy signals — an answer marked High confidence that is subsequently corrected by user data, logged as a potential overconfidence case
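For illustration, here is a minimal sketch of how these three signal types might be represented internally. The type names and fields are assumptions made for this example, not AskBiz's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class SignalType(Enum):
    EXPLICIT_FLAG = "explicit_flag"      # thumbs-down with error type and notes
    IMPLICIT_REPEAT = "implicit_repeat"  # same question re-asked immediately
    OVERCONFIDENCE = "overconfidence"    # High-confidence answer later corrected

@dataclass
class Signal:
    signal_type: SignalType
    question_id: str
    answer_id: str
    error_type: Optional[str] = None         # explicit flags only
    user_notes: Optional[str] = None         # explicit flags only
    stated_confidence: Optional[str] = None  # overconfidence cases only, e.g. "High"
```

Keeping all three signal types in one record shape would make the weekly aggregation in Stage 2 straightforward.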
Stage 2 — Pattern analysis
Signals are aggregated weekly. Our team looks for patterns (a sketch of this aggregation follows the list):
- Which question types generate the most flags?
- Are errors clustered around specific data source types?
- Are there systematic biases (e.g. consistently underestimating a particular metric)?
- Are errors increasing or decreasing after model updates?
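As a rough sketch of what this weekly aggregation could look like, the snippet below tallies signals by question type and by data source to surface clusters. The `question_type` and `data_source` fields, and the example values, are hypothetical:

```python
from collections import Counter

def weekly_patterns(signals):
    """Tally one week of signals by question type and by data source."""
    by_question_type = Counter(s["question_type"] for s in signals)
    by_data_source = Counter(s["data_source"] for s in signals)
    return by_question_type.most_common(), by_data_source.most_common()

signals = [
    {"question_type": "revenue_forecast", "data_source": "invoices"},
    {"question_type": "revenue_forecast", "data_source": "invoices"},
    {"question_type": "churn_estimate", "data_source": "crm"},
]
question_counts, source_counts = weekly_patterns(signals)
print(question_counts)  # [('revenue_forecast', 2), ('churn_estimate', 1)]
```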
Stage 3 — Intervention design
Based on patterns, we design improvements:
- System prompt updates — adjusting the instructions Claude receives for specific question types
- Data retrieval changes — improving which data is pulled for particular query patterns
- Confidence threshold adjustments — recalibrating when High/Medium/Low confidence is assigned
- Benchmark additions — adding new test cases based on error patterns (an example follows this list)
- Model-level feedback — for systematic errors that appear to be model-level issues, we report to Anthropic
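To make the benchmark-additions intervention concrete, here is a hypothetical test case derived from an error pattern. Every field name and value is invented for illustration; the actual benchmark format is not published:

```python
# Hypothetical benchmark case derived from a flagged error pattern:
# answers were underestimating monthly recurring revenue whenever
# invoices contained partial refunds.
benchmark_case = {
    "id": "bench-0137",                    # illustrative identifier
    "question": "What was our MRR last month?",
    "fixture_data": "invoices_with_partial_refunds.json",
    "expected_answer": 48250.00,
    "tolerance": 0.01,                     # 1% relative tolerance
    "error_pattern": "mrr_underestimate_partial_refunds",
}
```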
Stage 4 — Testing and deployment
Every change is tested against our full benchmark suite before deployment. A change is deployed only if it improves accuracy on the targeted error type without degrading accuracy in any other category by more than 0.2%.
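That deployment rule can be summarised as a simple gate. The sketch below assumes per-category accuracy scores between 0 and 1 and reads the 0.2% threshold as 0.2 percentage points; the function and category names are illustrative:

```python
def should_deploy(before: dict, after: dict, targeted: str,
                  max_regression: float = 0.002) -> bool:
    """Gate: the targeted category must improve, and no other category
    may regress by more than 0.2 percentage points (0.002 as a fraction)."""
    if after[targeted] <= before[targeted]:
        return False
    return all(
        before[cat] - after[cat] <= max_regression
        for cat in before if cat != targeted
    )

# Targeted category gains 3 points; worst regression elsewhere is 0.1 points.
before = {"forecasting": 0.91, "lookup": 0.970, "anomaly": 0.88}
after  = {"forecasting": 0.94, "lookup": 0.969, "anomaly": 0.88}
print(should_deploy(before, after, "forecasting"))  # True
```

In this example the targeted category improves by 3 points and the largest regression elsewhere is 0.1 points, so the change would ship.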
Your Role in the Loop
Every flag you submit enters Stage 1 of the improvement loop directly. Flags are not merely logged and forgotten; they are the primary driver of Stage 2 pattern analysis.
The most impactful flags include:
- A clear description of why the answer was wrong
- The correct answer (or what the correct answer should look like)
- The data source the correct answer should have come from
We don't require all three — even a bare thumbs-down helps us identify that an answer was unsatisfactory. But the more detail you provide, the faster we can identify and fix the underlying issue.
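For example, a flag that includes all three elements might look like the following; the field names and figures are invented for illustration:

```python
flag = {
    "verdict": "thumbs_down",
    "why_wrong": "Reported Q3 revenue as $112k; credit-note reversals "
                 "were double-counted.",
    "correct_answer": "Q3 revenue was approximately $98k.",
    "expected_source": "accounting ledger (reconciled journal entries)",
}
```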
What We Do Not Do
To be explicit about the limits of our improvement process:
- We do not use your individual business data to train AI models
- We do not share your specific flagged examples with other users
- We do not use the content of your questions to build user profiles
- We do not deploy model changes without benchmark testing
- We do not make improvement claims we have not measured
All improvement claims in this Transparency Centre are based on measured benchmark results, not subjective assessment.
Timelines
- System prompt updates: deployed within 5–10 business days of identifying a pattern
- Confidence threshold adjustments: deployed monthly as part of scheduled updates
- Data retrieval improvements: deployed within 2–4 weeks of identifying the issue
- Model-level improvements: depend on Anthropic's release cycle — typically 4–12 weeks after reporting
- Methodology updates (Business Pulse, anomaly detection, churn): quarterly, with 7-day advance notice to users