Document scanning

Turn a supported image or PDF into structured, editable data. This page explains how the OCR pipeline works and how to get the best results from it.

How it works

When you upload a document, ScanLedger runs it through a multi-step pipeline:

Preprocessing — OpenCV and Pillow normalize the image: deskew, denoise, contrast correction.
Table detection — if the document contains tabular data, the table structure is detected separately so rows and columns map cleanly.
OCR extraction — the active engine (GPT-4 Vision or Gemini 2.5 Flash) reads the document and returns structured fields.
Confidence scoring — each field is scored; anything below 0.85 is flagged for review.
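The order of these stages can be sketched as a small Python function. This is only an illustration: `run_pipeline`, the `Field` type, and the three stage callables are hypothetical names, not ScanLedger's actual code. Only the stage order and the 0.85 cut-off come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REVIEW_THRESHOLD = 0.85  # cut-off quoted in the pipeline description


@dataclass
class Field:
    """Hypothetical shape of one extracted field."""
    name: str
    value: str
    confidence: float  # 0.0 to 1.0
    needs_review: bool = False


def run_pipeline(
    image: bytes,
    preprocess: Callable[[bytes], bytes],
    detect_tables: Callable[[bytes], Optional[list]],
    extract: Callable[[bytes, Optional[list]], list[Field]],
) -> list[Field]:
    # 1. Preprocessing: deskew, denoise, contrast correction.
    cleaned = preprocess(image)
    # 2. Table detection: returns None when the page has no tabular data.
    tables = detect_tables(cleaned)
    # 3. OCR extraction: the active engine returns structured fields.
    fields = extract(cleaned, tables)
    # 4. Confidence scoring: flag low-confidence fields for review.
    for field in fields:
        field.needs_review = field.confidence < REVIEW_THRESHOLD
    return fields
```

Passing the stages in as callables keeps the sketch independent of any particular image library or OCR provider.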

OCR engines

Two engines are supported, configurable via the OCR_ENGINE environment variable:

Engine	Key	Model
GPT-4 Vision (default)	`gpt4_vision`	`gpt-4o-mini`
Gemini 2.5 Flash	`gemini_flash`	`gemini-2.5-flash`

When Gemini is the active engine, requests from geo-restricted regions fall back automatically to GPT-4 Vision, so they do not fail just because Gemini is unavailable there.
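A minimal sketch of engine selection and the Gemini fallback, assuming the engine key is read from OCR_ENGINE as described above. `RegionUnavailableError` and the `call_engine` callable are hypothetical stand-ins for whatever the real provider clients raise and expose.

```python
import os

# Engine keys and model names from the table above.
ENGINE_MODELS = {
    "gpt4_vision": "gpt-4o-mini",
    "gemini_flash": "gemini-2.5-flash",
}
DEFAULT_ENGINE = "gpt4_vision"


class RegionUnavailableError(Exception):
    """Hypothetical error for a request rejected in a geo-restricted region."""


def active_engine(env=os.environ) -> str:
    """Return the configured engine key, defaulting to GPT-4 Vision."""
    key = env.get("OCR_ENGINE", DEFAULT_ENGINE)
    if key not in ENGINE_MODELS:
        raise ValueError(f"Unknown OCR_ENGINE: {key!r}")
    return key


def extract_with_fallback(document: bytes, call_engine, env=os.environ):
    """call_engine(engine_key, model, document) is a hypothetical provider client."""
    key = active_engine(env)
    try:
        return call_engine(key, ENGINE_MODELS[key], document)
    except RegionUnavailableError:
        if key != "gemini_flash":
            raise  # only Gemini has a documented fallback
        return call_engine("gpt4_vision", ENGINE_MODELS["gpt4_vision"], document)
```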

Supported formats

Images: JPG, JPEG, PNG, GIF, BMP, TIFF, WebP, HEIC
PDFs: single- and multi-page (each page is processed individually)

Max file size is 20 MB per image and 50 MB per upload.
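If you want to check files before uploading, something like the following would work. It is a sketch under two assumptions that the limits above do not spell out: 1 MB is treated as 1024 × 1024 bytes, and "per upload" is read as the combined size of all files in one upload. `validate_upload` is a hypothetical helper, not part of any ScanLedger SDK.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: binary megabytes
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp", ".heic"}
PDF_EXTENSION = ".pdf"
MAX_IMAGE_BYTES = 20 * MB
MAX_UPLOAD_BYTES = 50 * MB  # assumption: applies to the whole upload


def validate_upload(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs; an empty result means no problems."""
    problems = []
    for name, size in files:
        ext = PurePath(name).suffix.lower()
        if ext not in IMAGE_EXTENSIONS and ext != PDF_EXTENSION:
            problems.append(f"{name}: unsupported format")
        elif ext in IMAGE_EXTENSIONS and size > MAX_IMAGE_BYTES:
            problems.append(f"{name}: image exceeds 20 MB")
    if sum(size for _, size in files) > MAX_UPLOAD_BYTES:
        problems.append("upload exceeds 50 MB in total")
    return problems
```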

Document types

When scanning free-form (not through a template), you pick a document type so the model knows what to expect:

invoice — supplier invoices with line items, totals, and parties.
receipt — point-of-sale receipts, often with line items and taxes.
inventory_log — stock logs with product, quantity, and movement direction.
attendance_sheet — staff attendance or sign-in sheets.
payment_slip — deposit slips or payment acknowledgements.
general_table — any tabular document.
general_note — free-form handwritten or typed notes.
auto_detect — let the AI pick the best category.
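In client code, the type keys above map naturally onto an enum. The sketch below is illustrative: `DocumentType` and `parse_document_type` are hypothetical names, and falling back to `auto_detect` when no type is given is an assumption, not documented behavior.

```python
from enum import Enum
from typing import Optional


class DocumentType(str, Enum):
    INVOICE = "invoice"
    RECEIPT = "receipt"
    INVENTORY_LOG = "inventory_log"
    ATTENDANCE_SHEET = "attendance_sheet"
    PAYMENT_SLIP = "payment_slip"
    GENERAL_TABLE = "general_table"
    GENERAL_NOTE = "general_note"
    AUTO_DETECT = "auto_detect"


def parse_document_type(value: Optional[str]) -> DocumentType:
    """Normalize user input to a DocumentType; unknown keys raise ValueError."""
    if value is None:
        return DocumentType.AUTO_DETECT  # assumption: no choice means auto-detect
    return DocumentType(value.strip().lower())
```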

Confidence scores

Every extracted field gets a confidence score between 0 and 1. The UI highlights any field that scores below the review threshold (0.85 unless you change it). Click a highlighted field to edit it; editing marks the field as verified and updates its score.

Tip: You can lower or raise the threshold per workspace via OCR_CONFIDENCE_THRESHOLD. Raise it for high-stakes documents; lower it for casual notes where manual review would be overkill.
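The flag-and-verify behavior can be modelled roughly as follows. This is a sketch with labeled assumptions: OCR_CONFIDENCE_THRESHOLD is read from a settings mapping (such as the environment), 0.85 is used when it is unset, and an edited field is given a score of 1.0. The text only says the score is updated, so the exact value is a guess. `ExtractedField` is a hypothetical type.

```python
import os
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.85


def confidence_threshold(settings=os.environ) -> float:
    """Read the review threshold, falling back to 0.85 when it is not set."""
    raw = settings.get("OCR_CONFIDENCE_THRESHOLD")
    if raw is None:
        return DEFAULT_THRESHOLD
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError("OCR_CONFIDENCE_THRESHOLD must be between 0 and 1")
    return value


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float
    verified: bool = False

    def flagged(self, threshold: float) -> bool:
        # Verified fields are never flagged again.
        return not self.verified and self.confidence < threshold

    def edit(self, new_value: str) -> None:
        self.value = new_value
        self.verified = True
        self.confidence = 1.0  # assumption: a human-verified value is fully trusted
```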

Tips for better accuracy

Use good, even lighting. Side shadows are the biggest cause of misread digits.
Keep the camera square to the page and avoid perspective distortion.
For handwriting, use dark ink on light paper.
For glossy thermal receipts, tilt slightly to kill the glare before you shoot.
If a document is critical, scan it to PDF on a flatbed scanner rather than photographing it with a phone.

Batch scanning

You can upload multiple files at once, and each file becomes an independent document. Paid plans process batches in parallel through the background worker. The Free trial includes 7 Receipt/Invoice scans in total (see Plan limits).
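To picture what "independent" and "in parallel" mean here, consider this sketch. It uses a local thread pool as a stand-in for the background worker, which is an assumption about nothing more than the general shape. `scan_batch` and `scan_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_batch(files, scan_one, max_workers=4):
    """Scan files in parallel and return (name, result, error) in input order.

    A failure in one file is captured in its own tuple, so it does not
    affect the other documents in the batch.
    """
    def safe_scan(name):
        try:
            return name, scan_one(name), None
        except Exception as exc:
            return name, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_scan, files))
```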

Plan limits

Plan	Receipt/Invoice scans
Free (30-day trial)	7 total
Basic	15 / month
Pro	Unlimited
Enterprise	Unlimited
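The table translates into a simple quota check. In this sketch `scans_remaining` is a hypothetical helper, and `used` means scans over the whole trial for Free or in the current month for Basic.

```python
from typing import Optional

# Receipt/Invoice scan limits from the table above; None means unlimited.
SCAN_LIMITS = {"free": 7, "basic": 15, "pro": None, "enterprise": None}


def scans_remaining(plan: str, used: int) -> Optional[int]:
    """Return how many scans are left, or None when the plan is unlimited."""
    limit = SCAN_LIMITS[plan.lower()]
    if limit is None:
        return None
    return max(limit - used, 0)
```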

Next steps

Document templates — map every scan to a fixed schema.
Datasets — turn scanned data into searchable tables.