Help & Documentation

Overview

Smart Capture extracts invoice data from email attachments. The web app surfaces three pages:

  • Tasks — the HITL (human-in-the-loop) queue. Reviewers pick up invoices that need a human eye and either submit corrections, release them back to the queue, or reject them.
  • Validation — the per-task review screen. Opens when you pick up a task; lets you correct header fields and line items against the original PDF.
  • Inspector — a read-only investigation tool for support staff. Look at any email, see which invoices the pipeline produced, drill into the workflow and the captured fields.

Access is controlled by Entra groups: smart-capture-<env>-reviewer grants the Tasks + Validation pages, smart-capture-<env>-inspector grants the Inspector page. Users who need both must be added to both groups for the target environment.

The colored strip at the top of every page shows which environment you're in. It is always present so you can never mistake one environment for another — double-check before submitting a task or making corrections.

Live vs. Analytics data on the Home dashboard

The Home dashboard shows two kinds of widgets, each labeled with its source.

  • Live widgets (tagged LIVE) read the operational review queue from DynamoDB and reflect only the environment's retention window (configured per environment), so they are never labeled "all-time."
  • Analytics widgets (tagged ATHENA) query the permanent final_json data lake via Athena, so longer ranges are truthful. Use the range picker in the analytics section header to change the window.

Inspector

The Inspector lets you look at any email that arrived in a Smart Capture mailbox, see what the pipeline did with each attachment, and download the raw artifacts. It does not change any data — it's a read-only window.

Filters

  • Mailbox — required. Type to filter the list; ↓ / Enter to select. Until a mailbox is chosen, the email list shows a "select a mailbox" callout.
  • Date range — UTC datetime pickers. From / To default to yesterday 00:00:00 UTC through today 23:59:59 UTC.
  • Sender contains — substring match on the From address.
  • Subject contains — substring match on the email subject (case-insensitive).

Click Submit to load matching emails (paginated; pick 5 / 10 / 25 / 50 per page; default 5). Reset filters clears every filter including the mailbox.

Email list cards

Each email is a card with a status icon on the left, sender and subject in the middle, timestamps on the right, and pill counters underneath. Sample:

vendor@example.com
Invoices for May
05/17/2026 15:18:33 UTC
05/17/2026 11:18:33 AM EST
3 accepted 1 ignored · 3 invoices 2 processed 1 running

Click any row to open the detail panel on the right.

Status icon legend (email-list cards)

The icon on the left of each card follows a priority cascade based on the email's invoice statuses:

  • Processed — every invoice on the email finished successfully.
  • Running — at least one invoice is still in flight (Textract, splitting, or awaiting HITL).
  • Partial failure — some invoices succeeded and some failed.
  • All failed — every invoice on the email ended in a failure state.
  • Empty — no invoices were produced (no attachments, or every attachment was filtered out as the wrong file type).

Count pill legend (email list + Email Info)

  • N accepted attachments that passed the mailbox's approved_extensions filter and entered the pipeline.
  • N ignored attachments dropped because of an unsupported extension. Lives in the rejected S3 bucket.
  • 0 accepted / 0 ignored — muted variant when the count is zero.
  • N invoices the total count after auto-splitting. Split parents are excluded; only the resulting child invoices count.
  • N processed reached SuccessState in the Step Function.
  • N running still in flight (incl. awaiting HITL).
  • N failed ended in FAILED / TIMED_OUT / ABORTED.

Email Info panel

The detail panel opens when you click an email card. The top-level Email Info section is collapsed by default — operators usually drill straight into Invoice documents below, and the email-list card already shows the headline facts. Expanding it gives you the flat header info (From / Mailbox / Subject / Received / Size / Attachments / Invoices) and a nested collapsed Body section (HTML iframe or text, depending on what the email carried).

For records ingested via direct file upload instead of email (the file_drop_to_attachments ingest path), the Email Info section is replaced entirely with File Upload Info — same outer chrome, but the dl drops the email-only rows (From, Mailbox, Subject) and the nested Body section is omitted. Only Received, Attachments, and Invoices remain.

Invoice document cards — header layout

Each attachment that entered the pipeline (plus any split children) renders as a collapsible card. The header is two rows:

Row 1 (identity): caret · INVOICE #: 120663 · filename · page count.

Row 2 (status): left-anchored context badges (e.g. "Review Time Expired" warning) and right-floating pills (split badge, status, duration).

Sample collapsed card:

INVOICE #: 120663 · email-04_invoice_120663.pdf · 1 page
Processed
⏱ 4m 14s

The INVOICE # badge always renders. When Textract didn't capture an ID it shows INVOICE #: — — this keeps the header layout visually consistent across cards. Independent of whether the field is in the customer's HITL trigger list.

Document status pill legend (Row 2 right)

  • Processed the document's execution reached SuccessState. For a split parent this means splitting succeeded; the actual invoices are its child rows.
  • Awaiting HITL the execution is paused at the human-review state. A reviewer in the Tasks queue needs to pick it up.
  • Running the execution is in flight but not yet at HITL.
  • Failed the execution ended in FAILED / TIMED_OUT / ABORTED. Open the Workflow section to see where it broke.

Context badges (Row 2 left)

The left slot surfaces operational warnings beside the regular status pill:

  • ⚠ Review Time Expired the mailbox required human review but the a2i loop timed out without a reviewer addressing the doc. The pipeline auto-released it without review. A matching amber banner appears at the top of the card body explaining the bypass.

Inside the card body

The body renders three collapsible sub-sections, in this order: Workflow (expanded by default), Captured fields (collapsed), Final JSON (collapsed).

Above all three sits the actions strip:

  • (IT ONLY) — SF execution link ↗ — the AWS Step Function console deep-link for IT / support staff. The (IT ONLY) label is a muted text marker sitting OUTSIDE the link itself so it isn't part of the clickable surface; AP operators should ignore the link entirely.
  • View PDF — opens the converted PDF in a modal preview.

Workflow timeline

A two-row-per-step table: each step's status dot (rail) on the left, the step name across the top row, and the entered UTC / EST timestamps + duration across the bottom row. A vertical rail connects consecutive dots; it trims at the first and last step so the rail starts and ends at a dot.

Step status dot legend:

  • Done — step entered and exited (or is a terminal SucceedStateEntered).
  • Running — step entered, hasn't exited yet (Wait for Human Review Response when the execution is paused).
  • Failed — FailStateEntered, or the last running step when the execution terminated abnormally.
  • Warning — step succeeded technically (Pass state) but represents an operationally-unhappy outcome. Fires on the "Wait for Human Review Response" step when human review was bypassed — timed out, declined by the reviewer, or cancelled out-of-band.

Duration rendering rules (no decimals, always a space before the unit):

  • 320ms — sub-second steps.
  • 0ms — an instant non-terminal step (Choice / Pass that entered and exited in the same SF timestamp). The 0 is explicit so you know the step ran rather than wondering whether data is missing.
  • with hover tooltip "End of execution — no exit event recorded" — only on terminal states (SucceedState / FailState). These emit just an Entered event by design.
  • 17s / 4m 14s / 72h 10m — integer h / m / s scale once you're past 1 second.

HITL queue-wait vs. active-work sub-text

The Wait for Human Review Response step gets an extra muted sub-line under its main duration that splits the wait into two operator-meaningful numbers: how long the task sat in queue before a reviewer accepted it, versus how long the reviewer actively worked on it. Same shape for both HITL paths (AWS A2I and our custom onPhase Team) — the values are derived from the metadata.a2i_metadata block both writers emit into _final.json, anchored on the wait state's Step Function entry time.

  • ⏱ 2m 14s
    Queued 47s · Working 1m 27scompleted: reviewer accepted, edited, and submitted. The active-work number comes from the validation page's own stopwatch (the canonical value), with submission − acceptance as the fallback.
  • in progress…
    Queued 38m · Working (not started)in progress: the wait state is still running. The queue number grows with wallclock; the active-work portion shows "(not started)" in italic blue until a reviewer's submission lands in the final JSON. Note: in-session active-work time isn't surfaced live — it appears only after submit.
  • ⏱ 3m 45s
    Queued 3m 45s · Review expiredtimed out: the heartbeat exceeded the SLA without a reviewer accepting the task. The wait state exited via the HandleHumanReviewTimeout branch; the whole wallclock was queue time. Paired with the amber ⚠ Review Time Expired context pill and the body banner.
  • ⏱ 1m 22s
    Queued 47s · Review declineddeclined: a reviewer opened the task and clicked Reject — typically because the document wasn't a recognizable invoice, was unreadable, or belonged to a different workflow. Same amber treatment as the timed-out case but distinct copy (the task didn't sit in queue; a human refused it). Paired with the ⚠ Review Declined context pill.
  • ⏱ 14s
    Queued 14s · Review cancelledcancelled: the review was stopped out-of-band before any reviewer could address it — typically by an admin using the AWS console / CLI or by a deployment process. Same amber treatment, distinct copy. Paired with the ⚠ Review Cancelled context pill.

All three bypass-family outcomes (timed out, declined, cancelled) also flip the Wait for Human Review Response step-row's bullet from the default green ✓ to an amber ⚠, marking the wait state itself as the source-of-truth for the operationally-unhappy outcome. The invoice keeps the green Processed pill in all three cases — the Step Function workflow completed successfully; the review just didn't.

Captured fields

Collapsed by default. The outer summary surfaces meta pills so you can scan without expanding:

22 fields 2 HITL edits 1 not captured 1 low confidence

Inside, fields split into two groups:

  • Primary fields — the 14 keys the pipeline evaluates against the mailbox's min_confidence_score AND the customer's priority_fields list. Any low-confidence value OR an absent value listed in priority_fields sends the invoice to HITL. Always visible.
  • Additional fields — everything else Textract captured. Confidence scores are shown but never trigger HITL. Collapsible, closed by default.

Captured-fields row variants

  • INVOICE_RECEIPT_ID Primary key — bolded blue background. Always INVOICE_RECEIPT_ID, the canonical row.
  • VENDOR_NAME HITL-edited — purple left rail. Value differs from the original Textract capture; the Original column shows the strike-through and a ✎ marker.
  • PO_NUMBER Low confidence — soft amber background. Score below the mailbox's threshold; triggered HITL.
  • INVOICE_RECEIPT_ID Not captured — red left rail + soft red row. The field was listed in client_config.priority_fields but Textract didn't extract it; the pipeline synthesized a placeholder to force HITL.

Confidence pill legend (Captured fields)

  • 95% high confidence (at or above the mailbox's min_confidence_score).
  • 72% medium confidence (below threshold). Triggers HITL when this row is in the Primary group.
  • 35% low confidence (deep below threshold). Same HITL behavior as medium; just the strongest amber.
  • n/a field naturally has no value and no review is expected (e.g. ACCOUNT_NUMBER on a vendor that doesn't use account numbers).
  • not captured — red em-dash pill. Textract didn't extract this field, and it was in priority_fields. HITL ran (or is pending). Pairs with the red-rail row treatment above. The outer summary's "N not captured" meta pill is where the word-form label appears.

Final JSON

The post-HITL _final.json artifact, rendered as a syntax-light monospace block. Collapsed by default. Synthesized fields show as { "Value": "", "confidence_score": 0, "validation_required": true, "synthetic": true }.

Mobile / landscape behavior

  • Portrait phones (≤ 430 px) — the list and detail stack vertically (list on top, detail below). Tapping an email auto-scrolls the detail into view. Two columns hidden to fit: Original column in Captured fields, EST column in the Workflow timeline.
  • Landscape phones (844 px+) — side-by-side layout returns, with full content density (all columns visible). Both panes scroll together with the page (the desktop sticky right-pane behavior is disabled on short viewports).
  • iPad / desktop (≥ 769 px) — full two-column split. Right pane stays sticky as you scroll the email list on the left.

Tasks

The Tasks page lists every HITL task currently waiting on a human reviewer. Each task corresponds to one invoice that the pipeline can't validate automatically (low confidence, or a configured priority_fields field absent from capture).

Filtering the queue

Use the filter inputs at the top of the table to narrow down by task name, status, assigned user, client, or date range. Column headers are sortable; click to toggle ascending / descending.

Picking up a task

Select a task and click Process (or Validate on the row). The Validation page opens with that task's invoice loaded. Your assignment is locked while you're working on it — other reviewers won't see it in the queue.

Returning a task to the queue

If you can't finish a task you opened, use the Release button on the Validation page (see below). Releasing puts the task back into the queue with no edits saved. It does not reject the invoice — it simply hands it to whoever picks it up next.

Validation

The Validation page is where you review and correct a single invoice. The left panel is the editable form; the right panel is the original PDF (zoom in/out, scroll, fit-to-width). The action buttons live in the sub-bar at the top of the page so they're always within reach.

Reviewing the form

Each header field (INVOICE_RECEIPT_ID, VENDOR_NAME, TOTAL, etc.) and each line item appears in the form pre-populated with whatever the pipeline extracted. Compare against the PDF and correct anything wrong. Fields flagged red are below the confidence threshold or were absent from capture — pay them the most attention.

Three actions

  • Submit — saves your corrections to the final invoice and closes the task. The pipeline picks up from here.
  • Release — sends the task back to the queue with no edits saved. Use this when you opened a task but can't finish it.
  • Reject — terminates the task with a required reason. Use this when the invoice cannot be processed at all (not actually an invoice, blank document, corrupted PDF, etc.). The reason is logged and surfaced to the Inspector.

All three are visible in the page sub-bar regardless of which form field you're focused on. Do not close the tab without using one of them — the task remains locked to you until you submit, release, or reject.

Reviewers can leave a field blank even when it was flagged "not captured"; the pipeline accepts the empty value and continues. The Inspector will surface the still-empty field with the red "not captured" pill so the downstream operator knows it was reviewed but no value was provided.

Terminology

  • Mailbox — a customer-specific email address that feeds the pipeline. Configured per environment.
  • Attachment — a file that arrived on an email. Becomes "accepted" if its extension matches the mailbox's approved_extensions; otherwise it's "ignored" and lives in the rejected bucket.
  • Document — an attachment that entered the pipeline. May spawn split children if auto-split is enabled and the doc holds multiple invoices.
  • Invoice — the final unit of work. One per document for unsplit docs; one per split child for documents that were broken up.
  • Split parent / Split child — when one attachment holds multiple invoices, document_splitting_process emits children named <stem>_split_<a>-<b>.<ext>. The parent finishes early (SuccessState after splitting); the children carry the full pipeline.
  • HITL — Human-In-The-Loop. The review step that fires when the pipeline can't validate an invoice automatically. Two triggers: a Primary field captured below the mailbox's min_confidence_score, OR a priority_fields field absent from capture.
  • priority_fields — a per-mailbox list of field keys that trigger HITL when not captured by Textract. Defaults to ["INVOICE_RECEIPT_ID"] if the customer config omits it. Set to [] to opt out entirely.
  • Synthesized / synthetic — a placeholder Captured-fields entry the pipeline injects when a priority_fields field is absent. Marked with synthetic: true in the final JSON and rendered with the red "not captured" treatment in the Inspector.
  • header_only — a mailbox configuration flag. When true, Textract extracts only header fields; line items are skipped. When false, line items appear in the Captured fields section.
  • Release from Human Review — the Step Function state that fires when the HITL heartbeat times out without a reviewer addressing the document. The pipeline auto-releases the invoice (sets a2i_workforce=bypassed) and continues to post-processing — the workflow itself succeeds. The bypass signal is carried by the preceding Wait for Human Review Response step (amber ⚠ warning dot, "Review expired" sub-text) and by the invoice card header's ⚠ Review Time Expired context badge.