How the AI Receipt Scanner Extracts Data
Understand the technology behind the AskBiz receipt scanner — how Claude AI reads receipt images, what data it extracts, and why accuracy varies across receipts.
Key Takeaways
- AskBiz uses the Claude vision API to read receipt images, not traditional OCR, so it understands context as well as text.
- The AI extracts six fields: vendor, date, amount, category, notes, and a confidence score for each.
- Image quality — especially lighting and focus — is the biggest factor controlling extraction accuracy.
The Technology Behind the Scanner
When you photograph or upload a receipt, AskBiz sends the image to Anthropic's Claude AI via a secure API call. Claude is a multimodal large language model, meaning it can understand both text and images simultaneously. Unlike older optical character recognition (OCR) tools that simply convert printed letters to plain text, Claude reads the receipt in context. It understands that the large bold figure at the bottom of a restaurant receipt is the total, that a string like '23/05/25' is a date in day/month/year format, and that 'Costa Coffee' is a vendor in the Meals and Entertainment category. This contextual understanding is why the scanner handles varied receipt layouts — thermal rolls, PDF invoices, handwritten notes — far better than traditional OCR.
What Data Gets Extracted
For every receipt, the AI attempts to identify six pieces of information. The vendor name is the trading name of the business you paid, for example 'Tesco' or 'Slack Technologies'. The date is the transaction date, converted to the standard format used by your account. The amount is the total charged, inclusive of tax. The category is Claude's best guess at which of the 14 AskBiz expense categories applies, based on the vendor name and items listed on the receipt. The notes field may be populated with a brief description if the receipt contains useful context, such as a list of purchased items or a reference number. Finally, a confidence score between 0 and 100 is generated for each field independently, reflecting how certain the model is about that particular value.
Free — no card needed
See this in action for your business
AskBiz tracks these metrics automatically — just connect your data and start asking questions.
Start for free →Why Accuracy Varies
Several factors affect how accurately the AI reads a receipt. Image sharpness is the most critical — a blurry photo caused by camera shake will produce low confidence scores across all fields. Lighting matters too: a receipt photographed under warm dim light may show low contrast between the thermal ink and the paper, making characters harder to distinguish. Receipt format also plays a role: a clean PDF invoice with machine-readable text will always score higher than a faded thermal roll receipt that has been folded and refolded. The AI also performs better on receipts in Latin-script languages (English, French, Spanish, etc.) than on receipts in non-Latin scripts.
Category Inference Logic
When assigning a category, Claude looks at the vendor name, any line items listed, and any category codes that appear on the receipt. For well-known vendors, the mapping is typically very accurate: Slack maps to Software/SaaS, a petrol station maps to Travel, a pharmacy maps to Supplies. For less familiar vendors or ambiguous purchases, the AI falls back on context clues from the line items. If you find the AI consistently miscategorising a particular vendor, you can correct it on the review screen. The corrected category will override the AI suggestion for that submission but does not automatically re-train the model for future receipts.
Data Security and Privacy
The receipt image is transmitted over an encrypted HTTPS connection to Anthropic's API. Anthropic does not use API-submitted images to train future models unless you have opted in under a separate data use agreement. The image itself is not stored permanently by AskBiz — only the extracted field values are written to your Supabase database. If you are handling receipts that contain sensitive personal information (for example, medical expenses), you can redact that information from the image before uploading without affecting the fields the scanner needs.