Core concepts

Understand how data flows through ScanLedger and how the pieces fit together.

The data model at a glance

Everything in ScanLedger lives inside a workspace. Within a workspace, three primitives power the rest of the product:

  • Documents — anything you scanned or uploaded, with extracted structured fields and a confidence score per field.
  • Datasets — tabular collections with a typed schema. Datasets are populated by scans, CSV/Excel imports, Google Sheets, or manual entry.
  • Files & folders — arbitrary binary storage with hierarchy and semantic search (Pro+).

On top of these sit operational tools — Inventory, Point of Sale, Bank reconciliation, and Expected payments — each of which reads from and writes to the same underlying data model.

Workspaces

A workspace is the boundary for billing, data, and team membership. If you invite someone to collaborate, they see only your workspace's data. When a team member belongs to multiple workspaces (for example, their own account plus the business they work for), they switch between them using the workspace switcher in the sidebar.

The backend enforces workspace scoping on every API call, and the API client sends an X-Workspace-Id header whenever a team member is acting inside another user's workspace.
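As a rough illustration of the header rule above, the sketch below builds request headers for a client. Only the X-Workspace-Id header name comes from this page; the function name, the workspace ID format, and the bearer-token Authorization header are assumptions made for the example.

```python
from typing import Optional


def build_headers(
    token: str,
    own_workspace_id: str,
    active_workspace_id: Optional[str] = None,
) -> dict[str, str]:
    """Build request headers for a hypothetical API client.

    X-Workspace-Id is attached only when the caller is acting inside a
    workspace other than their own, as described in the text.
    """
    # Assumption: bearer-token auth; the page does not specify the scheme.
    headers = {"Authorization": f"Bearer {token}"}
    if active_workspace_id and active_workspace_id != own_workspace_id:
        headers["X-Workspace-Id"] = active_workspace_id
    return headers
```

Calling build_headers("token", "ws_alice") leaves the header out, while build_headers("token", "ws_alice", "ws_acme") adds it with the value "ws_acme".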

Documents and confidence scores

When you scan a document, the OCR engine returns extracted fields along with a confidence score between 0 and 1. ScanLedger flags anything below the configured threshold (default 0.85) for human review. Editing a field flips it to verified and persists the corrected value.
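The review rule can be sketched in a few lines. The 0.85 default and the "edit marks the field verified" behavior come from the text; the class and function names are hypothetical, and the sketch assumes that a verified field is no longer flagged.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # default threshold stated in the text


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # OCR confidence between 0 and 1
    verified: bool = False


def needs_review(field: ExtractedField, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag unverified fields whose confidence is strictly below the threshold."""
    return not field.verified and field.confidence < threshold


def correct(field: ExtractedField, new_value: str) -> None:
    """Persist a human correction and mark the field as verified."""
    field.value = new_value
    field.verified = True
```

A field extracted at confidence 0.62 is flagged; after correct() is called on it, it is verified and drops out of the review queue. A field at exactly 0.85 is not flagged, because the rule is "below the threshold".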

Templates vs. free-form scans

You have two scanning modes:

  • Free-form — pick a generic document type (invoice, receipt, general_table, auto_detect, etc.) and the AI extracts whatever it finds.
  • Template — you define the exact fields (name, type, required) and layout up front. Every scan is mapped to your schema so the output is consistent.

Templates are the right choice when you have a repetitive document type — invoices from the same vendor, standardized exam results, payroll slips. Free-form is the right choice for ad-hoc capture.
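To make the template idea concrete, here is a minimal sketch of a template as a list of (name, type, required) fields and a function that maps raw extraction output onto it. The field names and the map_to_template helper are illustrative assumptions, not ScanLedger's actual template format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateField:
    name: str
    type: type            # Python type used to coerce the extracted text
    required: bool = True


# Hypothetical template for invoices from a single vendor.
INVOICE_TEMPLATE = [
    TemplateField("vendor", str),
    TemplateField("invoice_number", str),
    TemplateField("total", float),
    TemplateField("po_number", str, required=False),
]


def map_to_template(raw: dict, template: list[TemplateField]) -> dict:
    """Keep only template fields, coerce types, and enforce required fields."""
    row = {}
    for field in template:
        if raw.get(field.name) in (None, ""):
            if field.required:
                raise ValueError(f"missing required field: {field.name}")
            row[field.name] = None
        else:
            row[field.name] = field.type(raw[field.name])
    return row
```

Because every scan passes through the same field list, each output row has the same keys and types, which is the consistency a template is meant to provide. Keys that the template does not define are dropped.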

Datasets and schema fingerprinting

Datasets have a typed schema that ScanLedger fingerprints so related data lands in the right table. When you import a CSV with slightly different header names — amount vs total_amount — the schema service matches the columns and appends rows into the existing dataset rather than creating a new one.
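This page does not describe how the schema service computes a fingerprint, so the following is only one plausible approach: normalize each header, map known synonyms to a canonical name, and compare the resulting sets. The alias table and every function name here are assumptions.

```python
import re
from typing import Optional

# Hypothetical synonym table; a real service would likely use a richer matcher.
ALIASES = {"total_amount": "amount", "total": "amount", "amt": "amount"}


def canonical(header: str) -> str:
    """Lowercase, collapse punctuation and spaces to underscores, apply aliases."""
    key = re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")
    return ALIASES.get(key, key)


def fingerprint(headers: list[str]) -> frozenset[str]:
    """Order-independent fingerprint of a set of column headers."""
    return frozenset(canonical(h) for h in headers)


def find_dataset(headers: list[str], datasets: dict[str, list[str]]) -> Optional[str]:
    """Return the name of the dataset whose fingerprint matches, if any."""
    target = fingerprint(headers)
    for name, columns in datasets.items():
        if fingerprint(columns) == target:
            return name
    return None
```

With a dataset "sales" whose columns are date, customer, and amount, an import with headers "Date", "Customer", and "Total Amount" resolves to "sales", so its rows would be appended rather than starting a new dataset.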

Expected payments

Every sale and invoice creates an expected payment — a promise of money owed. When you import a bank statement, reconciliation matches bank transactions against the pool of expected payments so you always know who has paid and who has not.
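A simplified sketch of the matching step follows. The actual matching rules are not documented here, so this example assumes a match means an exact amount plus the payer's name appearing in the bank description; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ExpectedPayment:
    id: str
    payer: str
    amount: Decimal
    paid: bool = False


@dataclass
class BankTransaction:
    id: str
    description: str
    amount: Decimal


def reconcile(
    transactions: list[BankTransaction], expected: list[ExpectedPayment]
) -> list[tuple[str, str]]:
    """Match each transaction to at most one unpaid expected payment."""
    matches = []
    for tx in transactions:
        for ep in expected:
            same_amount = ep.amount == tx.amount
            names_payer = ep.payer.lower() in tx.description.lower()
            if not ep.paid and same_amount and names_payer:
                ep.paid = True  # remove it from the pool of open payments
                matches.append((tx.id, ep.id))
                break
    return matches
```

After a run, the expected payments still marked unpaid are the customers who have not paid yet. Decimal is used instead of float so that amounts compare exactly.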

Roles and plan limits

Roles determine what a team member can see and do inside a workspace. Available roles depend on the plan:

Plan           Team size    Roles available
Free (trial)   1            Owner
Pro            5            Owner, Staff
Enterprise     Unlimited    Owner, Admin, Manager, Staff, + custom roles

See Team & permissions for details on each role, and Billing & plans for the full plan matrix.
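The plan limits above can be expressed as a small lookup, for example when validating an invitation. This is a sketch only: the dictionary keys and the can_invite helper are hypothetical, and it assumes that team size counts every member, including the owner.

```python
# Plan limits copied from the table; None means unlimited.
PLANS = {
    "free": {"team_size": 1, "roles": {"Owner"}},
    "pro": {"team_size": 5, "roles": {"Owner", "Staff"}},
    "enterprise": {"team_size": None, "roles": {"Owner", "Admin", "Manager", "Staff"}},
}


def can_invite(plan: str, current_members: int, role: str, custom_roles=frozenset()) -> bool:
    """Check seat availability and role availability for a plan."""
    limits = PLANS[plan]
    allowed = set(limits["roles"])
    if plan == "enterprise":
        allowed |= set(custom_roles)  # custom roles exist only on Enterprise
    has_seat = limits["team_size"] is None or current_members < limits["team_size"]
    return has_seat and role in allowed
```

Under these assumptions, a Pro workspace with four members can add one more Staff member but cannot assign the Admin role, and a Free workspace cannot invite anyone.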

Background processing

Heavy work — OCR, bulk imports, reconciliation runs — is queued through an ARQ worker backed by Redis. When Redis is available, requests return quickly with a job ID and stream the result back over WebSocket. If Redis is unavailable in your deployment, ScanLedger gracefully falls back to synchronous processing.
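The queue-or-fallback behavior can be sketched without any queue library. In this example the enqueue callable stands in for whatever submits a job to the worker (with ARQ that would typically wrap a pool's enqueue_job call), and run_ocr is a placeholder for the heavy work. Both names, the response shape, and the use of ConnectionError to signal an unreachable Redis are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_ocr(document_id: str) -> dict:
    """Placeholder for the heavy work a worker would normally perform."""
    await asyncio.sleep(0)
    return {"document_id": document_id, "status": "done"}


async def submit(
    document_id: str,
    enqueue: Optional[Callable[[str], Awaitable[str]]] = None,
) -> dict:
    """Queue the job when a queue is reachable; otherwise process inline."""
    if enqueue is not None:
        try:
            job_id = await enqueue(document_id)
            # Fast path: the caller gets a job ID and waits for the result elsewhere.
            return {"mode": "queued", "job_id": job_id}
        except ConnectionError:
            pass  # queue unreachable: fall through to synchronous processing
    result = await run_ocr(document_id)
    return {"mode": "sync", "result": result}
```

The caller sees the same function in both deployments. Only the response differs: a job ID to follow up on when the work is queued, or the finished result when it ran synchronously.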