
Document scanning

Turn any image or PDF into structured, editable data. This page covers how OCR works under the hood and how to get the best results.

How it works

When you upload a document, ScanLedger runs it through a multi-step pipeline:

  1. Preprocessing — OpenCV and Pillow normalize the image: deskew, denoise, contrast correction.
  2. Table detection — if the document contains tabular data, the table structure is detected separately so rows and columns map cleanly.
  3. OCR extraction — the active engine (GPT-4 Vision or Gemini 2.5 Flash) reads the document and returns structured fields.
  4. Confidence scoring — each field is scored; anything below 0.85 is flagged for review.
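The four stages run one after another, and each stage works on the output of the previous one. The sketch below models that as a list of functions applied in sequence. The names `Document`, `ExtractedField`, and `run_pipeline` are hypothetical. The stage bodies are stand-ins: the real OpenCV/Pillow preprocessing, table detector, and OCR call are not shown, and `extract` returns canned demo fields.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0
    needs_review: bool = False


@dataclass
class Document:
    image: bytes
    fields: list[ExtractedField] = field(default_factory=list)


Stage = Callable[[Document], Document]


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Apply each stage in order; every stage receives the previous stage's output."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Stand-in stages (hypothetical): real ones would call OpenCV/Pillow,
# a table detector, and the active OCR engine.
def preprocess(doc: Document) -> Document:
    return doc  # deskew, denoise, and contrast correction would happen here


def detect_tables(doc: Document) -> Document:
    return doc  # table structure detection would happen here


def extract(doc: Document) -> Document:
    doc.fields = [
        ExtractedField("total", "42.00", 0.97),
        ExtractedField("date", "2024-01-05", 0.61),
    ]
    return doc


def score(doc: Document, threshold: float = 0.85) -> Document:
    # Flag every field whose confidence falls below the review threshold.
    for f in doc.fields:
        f.needs_review = f.confidence < threshold
    return doc


result = run_pipeline(Document(image=b""), [preprocess, detect_tables, extract, score])
flagged = [f.name for f in result.fields if f.needs_review]
print(flagged)  # ['date']
```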

OCR engines

Two engines are supported, configurable via the OCR_ENGINE environment variable:

| Engine | Key (OCR_ENGINE value) | Model |
|---|---|---|
| GPT-4 Vision (default) | gpt4_vision | gpt-4o-mini |
| Gemini 2.5 Flash | gemini_flash | gemini-2.5-flash |
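A minimal sketch of how a deployment might resolve `OCR_ENGINE` into a model name, using the keys and models from the table above and defaulting to GPT-4 Vision. The function `resolve_engine` is hypothetical. Raising an error on an unknown key (rather than silently falling back) is a design choice of this sketch, not documented ScanLedger behavior.

```python
import os
from collections.abc import Mapping

# Engine keys and model names, as listed in the engine table.
ENGINES = {
    "gpt4_vision": "gpt-4o-mini",
    "gemini_flash": "gemini-2.5-flash",
}
DEFAULT_ENGINE = "gpt4_vision"


def resolve_engine(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (engine_key, model) for the configured OCR_ENGINE (hypothetical helper)."""
    key = env.get("OCR_ENGINE", DEFAULT_ENGINE)
    if key not in ENGINES:
        # Assumption: fail loudly on a typo instead of guessing an engine.
        raise ValueError(f"Unknown OCR_ENGINE {key!r}; expected one of {sorted(ENGINES)}")
    return key, ENGINES[key]
```

For example, `resolve_engine({"OCR_ENGINE": "gemini_flash"})` returns `("gemini_flash", "gemini-2.5-flash")`, and an empty environment yields the GPT-4 Vision default.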

When Gemini is the active engine, ScanLedger automatically falls back to GPT-4 Vision in regions where Gemini is geo-restricted, so a scan does not fail just because Gemini is unavailable where you are.
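The fallback amounts to "try Gemini, and on a geo-restriction error retry the same image with GPT-4 Vision". The sketch below assumes a hypothetical `GeoRestrictedError` exception and takes the two engines as plain callables. Only the geo-restriction error triggers the fallback here; any other error propagates unchanged, which is an assumption about how ScanLedger scopes the retry.

```python
from typing import Callable


class GeoRestrictedError(Exception):
    """Hypothetical error: the provider refuses requests from the caller's region."""


OcrCall = Callable[[bytes], dict]


def extract_with_fallback(image: bytes, gemini: OcrCall, gpt4_vision: OcrCall) -> tuple[str, dict]:
    """Try Gemini first; on a geo-restriction error, retry the same image with GPT-4 Vision."""
    try:
        return "gemini_flash", gemini(image)
    except GeoRestrictedError:
        # Only geo-restriction falls back; other failures are not masked.
        return "gpt4_vision", gpt4_vision(image)
```

Returning the engine key alongside the fields lets a caller record which provider actually handled the scan.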

Supported formats

  • Images: JPG, JPEG, PNG, GIF, BMP, TIFF, WebP, HEIC
  • PDFs: single- and multi-page (each page is processed individually)

The maximum size is 20 MB for a single image and 50 MB for an entire upload.
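A client can check these limits before uploading. The sketch below is hypothetical. It assumes that "MB" means 1024 × 1024 bytes, that the 20 MB cap applies only to image files, and that PDFs and images all count toward the 50 MB per-upload total. Formats are recognized by file extension only.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp", ".heic"}
MAX_IMAGE_BYTES = 20 * MB
MAX_UPLOAD_BYTES = 50 * MB


def validate_upload(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs; return a list of problems (empty means OK)."""
    problems = []
    total = 0
    for name, size in files:
        ext = Path(name).suffix.lower()
        total += size
        if ext == ".pdf":
            continue  # PDFs count only toward the per-upload total in this sketch
        if ext not in IMAGE_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        elif size > MAX_IMAGE_BYTES:
            problems.append(f"{name}: image exceeds 20 MB")
    if total > MAX_UPLOAD_BYTES:
        problems.append("upload exceeds 50 MB in total")
    return problems
```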

Document types

When scanning free-form (not through a template), you pick a document type so the model knows what to expect:

  • invoice — supplier invoices with line items, totals, and parties.
  • receipt — point-of-sale receipts, often with line items and taxes.
  • inventory_log — stock logs with product, quantity, and movement direction.
  • attendance_sheet — staff attendance or sign-in sheets.
  • payment_slip — deposit slips or payment acknowledgements.
  • general_table — any tabular document.
  • general_note — free-form handwritten or typed notes.
  • auto_detect — let the AI pick the best category.
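If you build requests in code, a small enum keeps document types from drifting out of sync with the list above. The `DocumentType` enum and `parse_document_type` helper are hypothetical client-side conveniences. The string values are the type keys listed above; how ScanLedger's API accepts them is not shown here.

```python
from enum import Enum


class DocumentType(str, Enum):
    INVOICE = "invoice"
    RECEIPT = "receipt"
    INVENTORY_LOG = "inventory_log"
    ATTENDANCE_SHEET = "attendance_sheet"
    PAYMENT_SLIP = "payment_slip"
    GENERAL_TABLE = "general_table"
    GENERAL_NOTE = "general_note"
    AUTO_DETECT = "auto_detect"


def parse_document_type(value: str) -> DocumentType:
    """Convert a user-supplied string to a DocumentType, with a readable error."""
    try:
        return DocumentType(value)
    except ValueError:
        valid = ", ".join(t.value for t in DocumentType)
        raise ValueError(f"Unknown document type {value!r}; expected one of: {valid}") from None
```

Because the enum subclasses `str`, members compare equal to their plain string keys (for example, `DocumentType.RECEIPT == "receipt"` is true), so they can be dropped into JSON payloads unchanged.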

Confidence scores

Every extracted field gets a confidence score between 0 and 1 indicating how sure the model is about it. The UI highlights any field that scores below the review threshold (0.85 by default). Click a highlighted field to edit it; editing marks the field as verified and updates its score.

Tip: You can lower or raise the threshold per workspace via OCR_CONFIDENCE_THRESHOLD. Raise it for high-stakes documents, lower it for casual notes where manual review would be overkill.
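The review rule can be sketched as: read `OCR_CONFIDENCE_THRESHOLD` (falling back to 0.85), flag unverified fields that score below it, and mark a field verified once someone edits it. `confidence_threshold` and `ReviewField` are hypothetical names. Two behaviors are assumptions, not documented facts: an edited field's score is set to 1.0, and verified fields are never flagged again.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.85


def confidence_threshold(env: Mapping[str, str] = os.environ) -> float:
    """Read OCR_CONFIDENCE_THRESHOLD, falling back to the 0.85 default."""
    raw = env.get("OCR_CONFIDENCE_THRESHOLD")
    if raw is None:
        return DEFAULT_THRESHOLD
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"OCR_CONFIDENCE_THRESHOLD must be between 0 and 1, got {value}")
    return value


@dataclass
class ReviewField:
    name: str
    value: str
    confidence: float
    verified: bool = False

    def is_flagged(self, threshold: float) -> bool:
        # Assumption: a verified field is never flagged, whatever its score.
        return not self.verified and self.confidence < threshold

    def edit(self, new_value: str) -> None:
        # Assumption: a human correction is treated as fully confident.
        self.value = new_value
        self.verified = True
        self.confidence = 1.0


# Usage: a stricter threshold for high-stakes documents.
total = ReviewField("total", "4Z.00", confidence=0.62)
threshold = confidence_threshold({"OCR_CONFIDENCE_THRESHOLD": "0.9"})
print(total.is_flagged(threshold))  # True
total.edit("42.00")
print(total.is_flagged(threshold))  # False
```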

Tips for better accuracy

  • Use good, even lighting. Side shadows are the biggest cause of misread digits.
  • Keep the camera square to the page and avoid perspective distortion.
  • For handwriting, use dark ink on light paper.
  • For glossy thermal receipts, tilt slightly to kill the glare before you shoot.
  • If a document is critical, scan it as a PDF from a flatbed scanner rather than a phone.

Batch scanning

You can upload multiple files at once. Each file becomes an independent document. Pro and Enterprise plans process batches in parallel through the background worker; the Free trial is capped at 5 scans per day.
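Because each file becomes an independent document, a batch can be scanned concurrently with no coordination between files. The sketch below uses a thread pool from Python's standard `concurrent.futures` module as a stand-in for ScanLedger's background worker, which is not shown. `process_batch` and the `scan` callable are hypothetical, and whether a given plan runs in parallel is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_batch(
    files: list[str],
    scan: Callable[[str], dict],
    parallel: bool,
    max_workers: int = 4,
) -> list[tuple[str, dict]]:
    """Scan each file as its own document; results keep the input order."""
    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map yields results in the same order as `files`.
            results = list(pool.map(scan, files))
    else:
        results = [scan(name) for name in files]
    return list(zip(files, results))
```

Returning `(filename, result)` pairs rather than a dict keeps two uploads with the same filename from overwriting each other.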

Plan limits

| Plan | Scans per day | Storage |
|---|---|---|
| Free (trial) | 5 | 50 MB |
| Pro | Unlimited | 1.5 GB |
| Enterprise | Unlimited | 5 GB |
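The plan limits translate into a simple pre-scan check: is the daily scan count still under the cap, and will the new file fit in the remaining storage? The plan keys, `PlanLimits`, and `can_scan` are hypothetical, and treating MB/GB as binary units is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: binary units
GB = 1024 * MB


@dataclass(frozen=True)
class PlanLimits:
    scans_per_day: Optional[int]  # None means unlimited
    storage_bytes: int


PLANS = {
    "free": PlanLimits(scans_per_day=5, storage_bytes=50 * MB),
    "pro": PlanLimits(scans_per_day=None, storage_bytes=int(1.5 * GB)),
    "enterprise": PlanLimits(scans_per_day=None, storage_bytes=5 * GB),
}


def can_scan(plan: str, scans_today: int, storage_used: int, new_file_size: int) -> bool:
    """Return True if one more scan fits within the plan's daily and storage limits."""
    limits = PLANS[plan]
    if limits.scans_per_day is not None and scans_today >= limits.scans_per_day:
        return False
    return storage_used + new_file_size <= limits.storage_bytes
```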

Next steps