Datasets

Datasets turn the unstructured world of scanned documents and spreadsheets into tables you can filter, chart, export, and chat with.

What is a dataset?

A dataset is a typed, schema-aware table. Each column has a type (string, number, date, boolean, currency), each row is a record, and the whole thing is searchable, filterable, and exportable.

Creating a dataset

From Dashboard → Datasets → Create Dataset, you can source from:

  • CSV / Excel upload — schema is auto-detected from headers and sample values.
  • Scanned documents — choose one or more documents with tabular data.
  • Folders — pick files already uploaded to the file manager.
  • Google Sheets — after connecting a Google account, import any sheet by URL.
  • Blank — start with a schema you define manually.

Schema fingerprinting and appends

When you import new data, ScanLedger fingerprints the incoming schema against existing datasets. If a match is found, the new rows are appended to the matching dataset and any new columns are added automatically. This lets you upload a slightly different CSV next month without creating a new dataset.

How matching works: column names are normalized (lowercased, stripped of punctuation), fuzzy-matched against existing fields, and reconciled with type compatibility. You can review and override the suggested mapping before the import commits.
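
The three matching steps can be sketched in Python as follows. This is an illustration only, not ScanLedger's actual implementation: the function names, the 0.8 similarity threshold, the COMPATIBLE table, and the use of difflib for fuzzy matching are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical compatibility table: for each existing column type,
# the incoming types that are allowed to map onto it.
COMPATIBLE = {
    "string": {"string"},
    "number": {"number", "currency"},
    "currency": {"currency", "number"},
    "date": {"date"},
    "boolean": {"boolean"},
}


def normalize(name):
    """Lowercase the name and replace runs of punctuation with one space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def match_columns(incoming, existing, threshold=0.8):
    """Suggest a mapping from incoming column names to existing ones.

    Both arguments are dicts of column name -> type name. Returns a
    (mapping, novel) pair; novel lists incoming columns with no match.
    """
    mapping, novel = {}, []
    for name, col_type in incoming.items():
        best, best_score = None, 0.0
        for candidate, cand_type in existing.items():
            # Skip candidates whose type cannot hold the incoming values.
            if col_type not in COMPATIBLE.get(cand_type, set()):
                continue
            score = SequenceMatcher(
                None, normalize(name), normalize(candidate)
            ).ratio()
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            mapping[name] = best
        else:
            novel.append(name)
    return mapping, novel


existing = {"Invoice No.": "string", "Amount": "currency", "Date": "date"}
incoming = {"invoice_no": "string", "amount": "number",
            "Due Date": "date", "Notes": "string"}
mapping, novel = match_columns(incoming, existing)
```

In this sketch, "invoice_no" and "amount" map onto existing columns, while "Due Date" and "Notes" fall below the threshold and are treated as new columns. A real import would still show the suggested mapping for review before committing.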

Filtering

Each column exposes operators appropriate to its type:

  • String: contains, equals, starts_with, ends_with, is_empty.
  • Number: equals, greater_than, less_than, between.
  • Date: before, after, between, in_range.
  • Boolean: is_true, is_false.

Stack multiple filters across columns; each one appears as a removable badge above the table.
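
As a rough mental model, stacked filters behave like a list of (column, operator, argument) conditions. The sketch below uses the operator names listed above, but the function signatures, the inclusive bounds on between, and the assumption that stacked filters combine with AND are illustrative guesses rather than documented behavior. in_range is left out because its exact semantics are not described here.

```python
from datetime import date

# Operator names come from the lists above; the implementations are assumptions.
OPERATORS = {
    "contains": lambda v, arg: v is not None and arg in v,
    "equals": lambda v, arg: v == arg,
    "starts_with": lambda v, arg: v is not None and v.startswith(arg),
    "ends_with": lambda v, arg: v is not None and v.endswith(arg),
    "is_empty": lambda v, arg: v is None or v == "",
    "greater_than": lambda v, arg: v is not None and v > arg,
    "less_than": lambda v, arg: v is not None and v < arg,
    "between": lambda v, arg: v is not None and arg[0] <= v <= arg[1],
    "before": lambda v, arg: v is not None and v < arg,
    "after": lambda v, arg: v is not None and v > arg,
    "is_true": lambda v, arg: v is True,
    "is_false": lambda v, arg: v is False,
}


def apply_filters(rows, filters):
    """Keep the rows that satisfy every (column, operator, argument) filter."""
    return [
        row for row in rows
        if all(OPERATORS[op](row.get(col), arg) for col, op, arg in filters)
    ]


rows = [
    {"vendor": "Acme Corp", "total": 120.0, "paid": True, "issued": date(2024, 1, 15)},
    {"vendor": "Acme Ltd", "total": 80.0, "paid": False, "issued": date(2024, 2, 1)},
    {"vendor": "Globex", "total": 300.0, "paid": True, "issued": date(2024, 3, 10)},
]

# Two stacked filters: vendor starts with "Acme" AND total is above 100.
matches = apply_filters(rows, [
    ("vendor", "starts_with", "Acme"),
    ("total", "greater_than", 100),
])
```

Removing a badge corresponds to dropping one tuple from the filter list and re-running the query.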

Search and replace

Fix typos or standardize values across many rows with one operation. Choose:

  • Scope — a single column or the whole dataset.
  • Match mode — exact cell or substring.
  • Case sensitivity.
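
The way the three options interact can be sketched as below. The function name and parameters are hypothetical; the sketch only illustrates scope, match mode, and case sensitivity on a list of row dictionaries.

```python
import re


def search_replace(rows, find, replace, column=None, exact=False,
                   case_sensitive=True):
    """Replace text in place and return the number of cells changed.

    column=None means the whole dataset; exact=True matches the entire
    cell value, exact=False matches any substring.
    """
    if not find:
        return 0  # An empty search term would match everywhere.
    flags = 0 if case_sensitive else re.IGNORECASE
    changed = 0
    for row in rows:
        columns = [column] if column is not None else list(row)
        for col in columns:
            value = row.get(col)
            if not isinstance(value, str):
                continue  # Only text cells are searched in this sketch.
            if exact:
                if case_sensitive:
                    same = value == find
                else:
                    same = value.lower() == find.lower()
                new_value = replace if same else value
            else:
                # The lambda keeps backslashes in `replace` literal.
                new_value = re.sub(re.escape(find), lambda _m: replace,
                                   value, flags=flags)
            if new_value != value:
                row[col] = new_value
                changed += 1
    return changed


rows = [
    {"vendor": "acme corp", "note": "Paid by ACME"},
    {"vendor": "Acme Corp", "note": ""},
]

# Substring match, case-insensitive, limited to the vendor column.
first_pass = search_replace(rows, "acme", "Acme", column="vendor",
                            case_sensitive=False)
```

Here only the first row's vendor cell changes; the note column is outside the chosen scope and keeps "ACME" as written.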

Inline editing

Click any cell to edit it. Add records manually with New record. Mark records as verified to track data quality, which is useful when you need to separate human-reviewed entries from unverified machine-extracted ones.

Type conversion

Change a column's type at any time. ScanLedger coerces values when it can — a text column of "2024-01-15" strings becomes a real date column; text numbers become numeric. Values that cannot be converted are set to null and reported in the conversion summary.
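
The coerce-or-null behavior can be illustrated with a short sketch. The converter functions and the shape of the summary are assumptions for the example; ScanLedger's own summary may report different fields and may accept more input formats than the ISO dates handled here.

```python
from datetime import date


def to_number(text):
    """Parse a text number; raises ValueError when it is not numeric."""
    return float(text)


def to_date(text):
    """Parse an ISO date such as "2024-01-15"; raises ValueError otherwise."""
    return date.fromisoformat(text)


# Hypothetical registry of target types supported by this sketch.
CONVERTERS = {"number": to_number, "date": to_date}


def convert_column(rows, column, target_type):
    """Convert one column in place and return a conversion summary."""
    convert = CONVERTERS[target_type]
    summary = {"converted": 0, "nulled": 0, "failed_values": []}
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # Already empty, nothing to convert.
        try:
            row[column] = convert(value)
            summary["converted"] += 1
        except (ValueError, TypeError):
            # Values that cannot be converted become null and are reported.
            row[column] = None
            summary["nulled"] += 1
            summary["failed_values"].append(value)
    return summary


rows = [{"issued": "2024-01-15"}, {"issued": "n/a"}, {"issued": None}]
summary = convert_column(rows, "issued", "date")
```

After this call the first row holds a real date, the second row is null, and the summary lists "n/a" as the value that could not be converted.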

Field statistics

Every numeric column gets automatic sum, average, min, and max. Date columns get oldest / latest. Text columns get unique-count and top-N categories. Grouped aggregations surface breakdowns like “revenue by product.”
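
For reference, the statistics described above amount to the following calculations. This is a minimal sketch with hypothetical function names; it assumes null values are skipped, which the text does not state.

```python
from collections import Counter, defaultdict


def numeric_stats(values):
    """Sum, average, min, and max over the non-null values."""
    nums = [v for v in values if v is not None]
    if not nums:
        return None
    return {
        "sum": sum(nums),
        "average": sum(nums) / len(nums),
        "min": min(nums),
        "max": max(nums),
    }


def text_stats(values, top_n=3):
    """Unique-value count and the top-N most frequent categories."""
    counts = Counter(v for v in values if v)
    return {"unique_count": len(counts), "top": counts.most_common(top_n)}


def grouped_sum(rows, group_col, value_col):
    """Total of value_col per distinct value of group_col."""
    totals = defaultdict(float)
    for row in rows:
        if row.get(value_col) is not None:
            totals[row.get(group_col)] += row[value_col]
    return dict(totals)


rows = [
    {"product": "A", "revenue": 100.0},
    {"product": "B", "revenue": 50.0},
    {"product": "A", "revenue": 25.0},
]
revenue = numeric_stats([r["revenue"] for r in rows])
products = text_stats([r["product"] for r in rows])
by_product = grouped_sum(rows, "product", "revenue")
```

The last call is the "revenue by product" breakdown: one total per distinct product value.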

Exporting

  • CSV — respects active filters so you can export a subset.
  • Google Sheets — push directly to a new or existing sheet (requires Google Workspace connection).
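
Exporting a filtered subset is conceptually the same as writing only the rows that pass the active filters. The sketch below shows that idea with Python's standard csv module; export_csv and its keep parameter are hypothetical names, not part of any ScanLedger API.

```python
import csv
import io


def export_csv(rows, columns, keep=lambda row: True):
    """Return CSV text containing only the rows accepted by `keep`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if keep(row):
            writer.writerow(row)
    return buffer.getvalue()


rows = [
    {"vendor": "Acme", "total": 120},
    {"vendor": "Globex", "total": 80},
]

# Stand-in for an active filter: total greater than 100.
csv_text = export_csv(rows, ["vendor", "total"],
                      keep=lambda row: row["total"] > 100)
```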

Plan access

Datasets and AI chat are included on the Pro and Enterprise plans. Free-trial users can read and export data that has already been imported, but most advanced operations require an upgrade.

Next steps