
Receipts Scanner

URL: https://receipts.heezy.info
Internal: http://192.168.1.15:30850
k8s: heezy namespace, deployment receipts
Source: heezy-containers/dockerfiles/receipts/

How It Works

  1. Upload photo of receipt via web UI
  2. EXIF rotation correction applied immediately
  3. AWS Textract runs OCR in background (~30–45s)
  4. Extracted: merchant, total, date, line items
  5. Ollama (llama3.2:3b) categorizes each line item
  6. Results appear in UI, editable
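
Step 5 could look roughly like the sketch below. It is not the service's actual code: it assumes Ollama is reachable at its default local address and uses its /api/generate endpoint in non-streaming mode. The category list, the prompt wording, and the function names are all hypothetical.

```python
import json
import urllib.request

# Assumptions: Ollama's default local address and an illustrative category list.
OLLAMA_URL = "http://localhost:11434/api/generate"
CATEGORIES = ["groceries", "dining", "household", "fuel", "other"]


def build_categorize_request(item_description: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for one line item."""
    prompt = (
        "Classify this receipt line item into exactly one of: "
        + ", ".join(CATEGORIES)
        + f".\nItem: {item_description}\nAnswer with the category only."
    )
    body = json.dumps({"model": "llama3.2:3b", "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_category(raw_response: str) -> str:
    """Map the model's free-text answer onto a known category, defaulting to 'other'."""
    answer = raw_response.strip().lower()
    for category in CATEGORIES:
        if category in answer:
            return category
    return "other"


def categorize(item_description: str) -> str:
    """Send the request and return a category; requires a running Ollama instance."""
    request = build_categorize_request(item_description)
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return parse_category(payload.get("response", ""))
```

Mapping the model's answer back onto a fixed list keeps stray punctuation or extra words from a small model out of the category column.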

Database

Writes to receipts + receipt_items tables in heezy Postgres.

Key columns on receipts:

  • merchant: raw extracted merchant name
  • merchant_normalized: lowercased + alias-mapped
  • date: raw OCR date string (legacy, mixed formats)
  • date_ts: parsed timestamptz (canonical, used by dashboard)
  • category: overall receipt category
  • payment_type: cash / credit / debit / check / other (set manually via edit form)
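
To illustrate how the raw and derived columns relate, here is a minimal sketch of the two derivations: merchant to merchant_normalized and date to date_ts. The alias entries, the list of date formats, and the choice of UTC are assumptions made for the example; the real service may use different rules.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical alias map; real entries would come from the app's own data.
MERCHANT_ALIASES = {
    "wal-mart": "walmart",
    "walmart supercenter": "walmart",
}

# Hypothetical list of formats seen in raw OCR dates, tried in order.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%b %d, %Y"]


def normalize_merchant(raw: str) -> str:
    """Lowercase, collapse whitespace, then apply the alias map."""
    cleaned = " ".join(raw.lower().split())
    return MERCHANT_ALIASES.get(cleaned, cleaned)


def parse_receipt_date(raw: str) -> Optional[datetime]:
    """Return a timezone-aware datetime, or None if no format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Assumption: receipts carry no timezone, so UTC midnight is used.
        return parsed.replace(tzinfo=timezone.utc)
    return None
```

Returning None for unparseable strings lets date_ts stay NULL while the legacy date column keeps the original OCR text.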

OCR Pipeline

  • Primary: AWS Textract (handles rotation natively, structured output)
  • Fallback: Tesseract (if no AWS creds)
  • Textract IAM user: receipts-textract in terraform-heezy
  • Creds via ExternalSecret → OpenBao production/heezy/receipts/aws-credentials
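
The primary/fallback choice can be sketched as a simple credential check. This assumes the ExternalSecret ends up exposed to the pod as the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which the notes above do not confirm. The function name is hypothetical.

```python
import os
from typing import Mapping


def choose_ocr_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return 'textract' when both AWS credential variables are set, else 'tesseract'."""
    has_key_id = bool(env.get("AWS_ACCESS_KEY_ID"))
    has_secret = bool(env.get("AWS_SECRET_ACCESS_KEY"))
    return "textract" if has_key_id and has_secret else "tesseract"
```

Passing the environment in as a mapping makes the fallback path easy to exercise without touching real credentials.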

Storage

  • Images: NFS PVC (192.168.1.200:/mnt/Arr1/SMB/receipts/)
  • DB: Postgres on big-boi

Reprocessing

POST /receipt/<id>/reprocess

Forces OCR re-run on a stuck or failed receipt.
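
A minimal sketch of calling the endpoint from a script, using only the standard library. It assumes receipt IDs are integers, that the endpoint accepts an empty POST body, and that no authentication is required; none of these is confirmed above, so adjust as needed.

```python
import urllib.request

BASE_URL = "https://receipts.heezy.info"  # public URL from the top of this page


def build_reprocess_request(receipt_id: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for one receipt without sending it."""
    url = f"{base_url.rstrip('/')}/receipt/{receipt_id}/reprocess"
    return urllib.request.Request(url, data=b"", method="POST")


def reprocess(receipt_id: int, base_url: str = BASE_URL) -> int:
    """Send the request and return the HTTP status code."""
    request = build_reprocess_request(receipt_id, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.status
```

From inside the LAN, the internal address (http://192.168.1.15:30850) can be passed as base_url instead.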