Accuracy & Errors · 6 min read · Updated 1 April 2026

AI Accuracy & Known Error Types

AskBiz's AI accuracy rates across different question categories, the most common error types, what causes them, and how we track and reduce them.

Live metrics

98.2% ↑ Calculation accuracy
94.7% ↑ Pattern accuracy
91.3% → Factual accuracy
87.6% ↑ Recommendation accuracy
1.2% ↓ Overall hallucination rate

Our Accuracy Commitment

We publish accuracy metrics because transparency about AI limitations is more useful than projecting false confidence. These numbers are measured against our internal benchmark suite: a set of test queries with known correct answers, run against each new model version before deployment.

Our benchmark covers four question categories:

  • Calculation questions – precise arithmetic on your data (revenue totals, margin calculations, landed costs)
  • Pattern questions – identifying trends, anomalies, and comparisons
  • Factual questions – explaining business concepts, regulatory frameworks, definitions
  • Recommendation questions – strategic suggestions based on data analysis

Accuracy by Category

Calculation questions: 98.2% accuracy

Math on your data is where AI is most reliable. Given complete, correctly formatted data, arithmetic errors are rare. The 1.8% error rate is almost entirely caused by ambiguous date ranges or currency conversion edge cases.
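To illustrate the ambiguous date-range failure mode, here is a minimal, purely hypothetical sketch (not AskBiz code): the same question "revenue for March" yields different totals depending on whether the end date is treated as inclusive or exclusive.

```python
from datetime import date

# Hypothetical order data: (order date, order value)
orders = [(date(2026, 3, 1), 100.0),
          (date(2026, 3, 31), 250.0),
          (date(2026, 4, 1), 80.0)]

def revenue(start, end, inclusive_end):
    # Sum order values falling in [start, end] or [start, end),
    # depending on the boundary convention chosen.
    return sum(v for d, v in orders
               if start <= d and (d <= end if inclusive_end else d < end))

march_inclusive = revenue(date(2026, 3, 1), date(2026, 3, 31), inclusive_end=True)
march_exclusive = revenue(date(2026, 3, 1), date(2026, 3, 31), inclusive_end=False)
# The two conventions disagree (350.0 vs 100.0) because the 31 March order
# sits exactly on the boundary -- the kind of edge case behind most
# calculation errors in this category.
```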

Pattern questions: 94.7% accuracy

Trend identification, anomaly detection, and comparisons are highly reliable. Errors typically occur when seasonal patterns are misidentified, or when very short time series are compared.
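As an illustrative sketch of the short-time-series failure mode (hypothetical code, not the model's internals): a least-squares slope over three noisy points can suggest an upward trend that flattens once more data arrives.

```python
def slope(ys):
    # Ordinary least-squares slope of ys against x = 0, 1, ..., n-1.
    n = len(ys)
    xs = range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

short = [100, 90, 110]            # 3 noisy points: slope = +5, looks like a trend
longer = short + [95, 105, 100]   # with 6 points the apparent trend mostly vanishes
```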

Factual questions: 91.3% accuracy

Business concept explanations, regulatory summaries, and methodology explanations are generally accurate. Error rates increase for jurisdiction-specific regulatory questions (where rules change frequently) and for niche sector-specific knowledge.

Recommendation questions: 87.6% accuracy

Strategic recommendations are the most variable category. 'Accuracy' here means alignment with what an experienced business analyst would recommend given the same data, assessed by our internal review team on a sample basis. Errors are typically overly conservative recommendations or failure to weight a key data signal appropriately.

Common Error Types

Hallucination (fabrication): The AI states something as fact that is not in your data or Claude's training. Rate: approximately 1.2% of responses. Most common in: factual questions about specific regulations or market data not in our training set.

Data misinterpretation: The AI reads your data correctly but draws an incorrect conclusion. Rate: approximately 2.8% of responses. Most common in: seasonal pattern analysis on short time series.

Overconfidence: The AI gives a High confidence answer that should be Medium or Low, because it did not correctly identify gaps in the data. Rate: approximately 3.1% of responses. We are actively working to reduce this.

Under-specificity: The AI gives a correct but vague answer when a more specific one was possible. Rate: approximately 4.2% of responses. Most common in: recommendation questions where the data supports a clear recommendation but the AI hedges unnecessarily.

Currency/unit errors: Confusion between currencies, units of measure, or time zones. Rate: approximately 0.9% of responses. Mitigated by always specifying your home currency and timezone in account settings.
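A minimal sketch of why the home-currency setting matters (hypothetical rates and data, not AskBiz code): summing mixed-currency values without normalising to one currency silently produces a wrong total.

```python
# Assumed example exchange rates, not live data.
rates_to_gbp = {"GBP": 1.0, "USD": 0.80}

orders = [("GBP", 100.0), ("USD", 100.0)]

naive_total = sum(v for _, v in orders)                  # 200.0 -- mixes currencies
converted = sum(v * rates_to_gbp[c] for c, v in orders)  # 180.0, normalised to GBP
```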

How We Measure and Improve Accuracy

Accuracy is measured through three mechanisms:

Automated benchmarking: Every model update is tested against our benchmark suite of 2,400+ test queries before deployment. A model update is rejected if accuracy drops more than 0.5% in any category.
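The deployment gate described above can be sketched as follows. This is a hypothetical illustration, assuming "0.5%" means 0.5 percentage points per category; the function and category keys are invented for the example, and the baseline values mirror the published metrics.

```python
# Published per-category baselines (accuracy, in percent).
BASELINE = {"calculation": 98.2, "pattern": 94.7,
            "factual": 91.3, "recommendation": 87.6}

def passes_gate(candidate, baseline=BASELINE, max_drop=0.5):
    """Reject a model update if any category regresses by more than max_drop points."""
    return all(candidate[cat] >= baseline[cat] - max_drop for cat in baseline)

ok = passes_gate({"calculation": 98.0, "pattern": 94.9,
                  "factual": 91.1, "recommendation": 87.4})   # small dips: passes
bad = passes_gate({"calculation": 98.3, "pattern": 94.8,
                   "factual": 90.5, "recommendation": 87.7})  # factual drops 0.8: rejected
```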

User feedback loop: Every thumbs-down on an AI response creates a flagged example that our team reviews. High-volume error patterns are used to improve our system prompting and, where appropriate, fed back to Anthropic.

Quarterly manual audit: Our team manually reviews a random sample of 200 AI responses per quarter, graded against expert business analyst standards. Results are published in this Transparency Centre.