Datasets
Datasets turn the unstructured world of scanned documents and spreadsheets into tables you can filter, chart, export, and chat with.
What is a dataset?
A dataset is a typed, schema-aware table. Each column has a type (string, number, date, boolean, currency), each row is a record, and the whole thing is searchable, filterable, and exportable.
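As a rough illustration of what "typed, schema-aware" means, the sketch below models a schema as a list of named, typed columns and checks a record against it. The `Column` class, the `COLUMN_TYPES` mapping, and `validate_record` are hypothetical names chosen for this example; they are not part of ScanLedger.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical mapping from the column types named above to Python types.
COLUMN_TYPES = {
    "string": str,
    "number": (int, float),
    "date": date,
    "boolean": bool,
    "currency": Decimal,
}


@dataclass
class Column:
    name: str
    type: str  # one of the keys in COLUMN_TYPES


def validate_record(schema, record):
    """Return the names of columns whose value does not match the declared type."""
    bad = []
    for col in schema:
        value = record.get(col.name)
        if value is None:
            continue  # assumption: a missing value is an empty cell, not an error
        if col.type == "number" and isinstance(value, bool):
            # bool is a subclass of int in Python, so reject it explicitly
            bad.append(col.name)
        elif not isinstance(value, COLUMN_TYPES[col.type]):
            bad.append(col.name)
    return bad


schema = [Column("vendor", "string"), Column("total", "currency"), Column("paid", "boolean")]
print(validate_record(schema, {"vendor": "Acme", "total": "12.50", "paid": True}))
```

Here the string "12.50" is flagged because the `total` column is declared as currency.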
Creating a dataset
From Dashboard → Datasets → Create Dataset, you can source from:
- CSV / Excel upload — schema is auto-detected from headers and sample values.
- Scanned documents — choose one or more documents with tabular data.
- Folders — pick files already uploaded to the file manager.
- Google Sheets — after connecting a Google account, import any sheet by URL.
- Blank — start with a schema you define manually.
Schema fingerprinting and appends
When you import new data, ScanLedger fingerprints the incoming schema against existing datasets. If a match is found, new rows are appended and any novel columns are auto-added. This lets you upload a slightly different CSV next month without creating a new dataset.
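The exact matching rule ScanLedger uses is not described here. One plausible way to picture it is the sketch below, which assumes a fingerprint is the set of normalized column names and that a match means a Jaccard similarity above a threshold. The functions `fingerprint`, `find_match`, and `append_rows`, and the 0.6 threshold, are illustrative assumptions.

```python
def normalize(name):
    """Lower-case a header and replace spaces so 'Total Due' equals 'total_due'."""
    return name.strip().lower().replace(" ", "_")


def fingerprint(columns):
    # Assumption: a fingerprint is just the set of normalized column names.
    return frozenset(normalize(c) for c in columns)


def find_match(incoming_columns, datasets, threshold=0.6):
    """Return the name of the most similar existing dataset, or None."""
    incoming = fingerprint(incoming_columns)
    best_name, best_score = None, 0.0
    for name, columns in datasets.items():
        existing = fingerprint(columns)
        union = incoming | existing
        score = len(incoming & existing) / len(union) if union else 0.0
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name


def append_rows(dataset, incoming_columns, new_rows):
    """Append rows to dataset = {'columns': [...], 'rows': [...]}, adding novel columns."""
    for col in map(normalize, incoming_columns):
        if col not in dataset["columns"]:
            dataset["columns"].append(col)
    for row in new_rows:
        dataset["rows"].append({normalize(k): v for k, v in row.items()})
    # Backfill so every row has a value (possibly None) for every column.
    for row in dataset["rows"]:
        for col in dataset["columns"]:
            row.setdefault(col, None)


datasets = {"invoices": ["Date", "Vendor", "Total"], "contacts": ["Name", "Email"]}
print(find_match(["date", "vendor", "total", "Tax"], datasets))
```

With these inputs, next month's CSV with an extra Tax column still matches the invoices dataset, and appending it adds a `tax` column whose older rows hold empty values.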
Filtering
Each column exposes operators appropriate to its type:
- String: contains, equals, starts_with, ends_with, is_empty.
- Number: equals, greater_than, less_than, between.
- Date: before, after, between, in_range.
- Boolean: is_true, is_false.
Stack multiple filters across columns; each one appears as a removable badge above the table.
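To make the behavior of stacked filters concrete, here is a minimal sketch that evaluates a list of (column, operator, argument) filters against rows held as dictionaries. It assumes stacked filters combine with a logical AND and that between is inclusive at both ends; neither is confirmed above. in_range is left out because this page does not define how it differs from between. The `OPERATORS` table and `apply_filters` are hypothetical names.

```python
from datetime import date

# Operator names follow the list above; the implementations are assumptions.
OPERATORS = {
    "contains": lambda value, arg: arg in value,
    "equals": lambda value, arg: value == arg,
    "starts_with": lambda value, arg: value.startswith(arg),
    "ends_with": lambda value, arg: value.endswith(arg),
    "greater_than": lambda value, arg: value > arg,
    "less_than": lambda value, arg: value < arg,
    "between": lambda value, arg: arg[0] <= value <= arg[1],  # inclusive bounds assumed
    "before": lambda value, arg: value < arg,
    "after": lambda value, arg: value > arg,
    "is_true": lambda value, arg: value is True,
    "is_false": lambda value, arg: value is False,
}


def apply_filters(rows, filters):
    """Keep rows that satisfy every filter (assumed AND semantics)."""
    def matches(row):
        for column, op, arg in filters:
            value = row.get(column)
            if op == "is_empty":
                if value not in (None, ""):
                    return False
            elif value is None or not OPERATORS[op](value, arg):
                return False
        return True

    return [row for row in rows if matches(row)]


rows = [
    {"vendor": "Acme Corp", "total": 120.0, "issued": date(2024, 1, 15), "paid": True},
    {"vendor": "Globex", "total": 80.0, "issued": date(2024, 2, 1), "paid": False},
    {"vendor": "Acme Ltd", "total": 40.0, "issued": date(2024, 3, 10), "paid": True},
]
big_acme = apply_filters(rows, [("vendor", "starts_with", "Acme"), ("total", "greater_than", 50)])
print([row["vendor"] for row in big_acme])
```

Each tuple in the filter list corresponds to one removable badge: dropping a tuple widens the result, adding one narrows it.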
Search and replace
Fix typos or standardize values across many rows with one operation. Choose:
- Scope — a single column or the whole dataset.
- Match mode — exact cell or substring.
- Case sensitivity.
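The three options above interact in a simple way, sketched below for rows stored as dictionaries. The function name `search_replace` and its parameters are hypothetical and only mirror the choices in the list: `column` is the scope, `exact` is the match mode, and `case_sensitive` is the case option.

```python
import re


def search_replace(rows, find, replace, column=None, exact=False, case_sensitive=True):
    """Replace text in string cells in place and return the number of cells changed."""
    if not find:
        return 0  # an empty search term would match everywhere, so do nothing
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(re.escape(find), flags)  # escape so the term is literal text
    changed = 0
    for row in rows:
        keys = [column] if column else list(row)  # one column, or the whole record
        for key in keys:
            value = row.get(key)
            if not isinstance(value, str):
                continue
            if exact:
                new_value = replace if pattern.fullmatch(value) else value
            else:
                # A function replacement keeps backslashes in `replace` literal.
                new_value = pattern.sub(lambda match: replace, value)
            if new_value != value:
                row[key] = new_value
                changed += 1
    return changed


rows = [{"vendor": "ACME corp", "note": "paid to acme"}, {"vendor": "Globex", "note": ""}]
print(search_replace(rows, "acme", "Acme", case_sensitive=False))
```

In exact mode a cell changes only when its entire content equals the search term, so "Acme corp" is left alone when searching for "Acme".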
Inline editing
Click any cell to edit it. Add records manually with New record. Mark records as verified to track data quality, which is useful when you need to separate human-reviewed entries from unverified machine-extracted ones.
Type conversion
Change a column's type at any time. ScanLedger coerces values when it can — a text column of "2024-01-15" strings becomes a real date column; text numbers become numeric. Values that cannot be converted are set to null and reported in the conversion summary.
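The coerce-or-null behavior can be sketched as follows. This is an approximation under stated assumptions, not ScanLedger's converter: it handles only ISO-style dates (YYYY-MM-DD) and plain numbers with optional thousands separators, and the names `convert_column` and `CONVERTERS` and the shape of the summary are hypothetical.

```python
from datetime import date


def to_date(text):
    # Assumption: only ISO dates such as "2024-01-15" are recognized.
    return date.fromisoformat(text.strip())


def to_number(text):
    # Assumption: commas are thousands separators, as in "1,200.50".
    return float(text.replace(",", "").strip())


CONVERTERS = {"date": to_date, "number": to_number}


def convert_column(rows, column, target_type):
    """Convert a column in place; unconvertible values become None and are reported."""
    convert = CONVERTERS[target_type]
    summary = {"converted": 0, "failed": []}
    for index, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue  # empty cells stay empty
        try:
            row[column] = convert(value)
            summary["converted"] += 1
        except (ValueError, TypeError, AttributeError):
            row[column] = None
            summary["failed"].append((index, value))
    return summary


rows = [{"issued": "2024-01-15"}, {"issued": "n/a"}, {"issued": None}]
print(convert_column(rows, "issued", "date"))
```

The returned summary plays the role of the conversion summary: it records how many cells converted and which row indexes held values that were set to null.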
Field statistics
Every numeric column gets automatic sum, average, min, and max. Date columns get oldest / latest. Text columns get unique-count and top-N categories. Grouped aggregations surface breakdowns like “revenue by product.”
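For readers who want to reproduce these figures on exported data, the sketch below computes the same kinds of statistics with the Python standard library. The function names are hypothetical, empty cells are assumed to be skipped, and the grouped aggregation shown is a simple sum.

```python
from collections import Counter, defaultdict


def numeric_stats(values):
    """Sum, average, min, and max over non-empty values; None if there are none."""
    numbers = [v for v in values if v is not None]
    if not numbers:
        return None
    total = sum(numbers)
    return {"sum": total, "average": total / len(numbers), "min": min(numbers), "max": max(numbers)}


def text_stats(values, top_n=3):
    """Unique-value count and the top-N most frequent values."""
    counts = Counter(v for v in values if v)
    return {"unique": len(counts), "top": counts.most_common(top_n)}


def group_sum(rows, group_by, value):
    """Grouped aggregation, for example revenue by product."""
    totals = defaultdict(float)
    for row in rows:
        if row.get(value) is not None:
            totals[row[group_by]] += row[value]
    return dict(totals)


rows = [
    {"product": "A", "revenue": 100.0},
    {"product": "B", "revenue": 50.0},
    {"product": "A", "revenue": 25.0},
]
print(numeric_stats([row["revenue"] for row in rows]))
print(text_stats([row["product"] for row in rows]))
print(group_sum(rows, "product", "revenue"))
```

Date columns work the same way as the numeric min and max: the oldest date is `min(values)` and the latest is `max(values)`.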
Exporting
- CSV — respects active filters so you can export a subset.
- Google Sheets — push directly to a new or existing sheet (requires Google Workspace connection).
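As an illustration of an export that respects active filters, the sketch below writes only the rows that pass a filter to CSV text. The `export_csv` function and its `predicate` parameter are hypothetical stand-ins for whatever filters are active; this is not ScanLedger's export code.

```python
import csv
import io


def export_csv(rows, columns, predicate=None):
    """Return CSV text containing the header and only the rows that pass `predicate`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip keys that are not exported columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        if predicate is None or predicate(row):
            writer.writerow(row)
    return buffer.getvalue()


rows = [{"vendor": "Acme", "total": 120}, {"vendor": "Globex", "total": 80}]
print(export_csv(rows, ["vendor", "total"], lambda row: row["total"] > 100))
```

With no predicate the full table is written, which corresponds to exporting with no filters applied.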
Plan access
Datasets and AI chat are included on the Pro and Enterprise plans. Free-trial users can read and export data that has already been imported, but most advanced operations require an upgrade.
Next steps
- AI chat — ask questions in plain English.
- Integrations — connect Google Sheets.