heezy-finance - Data Sources

All data lives in the heezy Postgres database on big-boi (192.168.1.21:5432).

Three Data Sources

heezy-finance ingests from three independent sources: email orders, physical receipts, and bank transactions.

Orders (Gmail + Ollama)

  • Source: Email order confirmations (Amazon, eBay, Woot, etc.)
  • Ingested by: amazon_orders.py in heezy-finance container
  • Tables: orders, order_items
  • Current count: 161 orders, 202 line items
  • Categorization: Ollama llama3.2:3b classifies each item on ingest
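The categorization step above could look roughly like the following. This is a minimal sketch, not the code in amazon_orders.py: the category list, the endpoint URL (Ollama's default local port), and the function names are assumptions made for illustration. It uses Ollama's /api/generate endpoint in non-streaming mode.

```python
import json
import urllib.request

# Hypothetical category list and endpoint; the real ones are not given in this document.
CATEGORIES = ["Electronics", "Groceries", "Household", "Clothing", "Other"]
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port (assumed host)


def build_request(item_name: str) -> dict:
    """Build a non-streaming /api/generate payload for one line item."""
    prompt = (
        "Classify this purchase into exactly one of: "
        + ", ".join(CATEGORIES)
        + f".\nItem: {item_name}\nAnswer with the category name only."
    )
    return {"model": "llama3.2:3b", "prompt": prompt, "stream": False}


def parse_category(response_text: str) -> str:
    """Map the model's free-text reply onto a known category, else 'Other'."""
    reply = response_text.strip().lower()
    for category in CATEGORIES:
        if category.lower() in reply:
            return category
    return "Other"


def classify_item(item_name: str) -> str:
    """Send one item to Ollama and return its category (needs a running server)."""
    data = json.dumps(build_request(item_name)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return parse_category(body["response"])
```

Keeping the prompt construction and reply parsing separate from the HTTP call makes both easy to check without a running model.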

Receipts (OCR + Categorization)

  • Source: Physical receipts photographed via receipts.heezy.info
  • Ingested by: Receipts service - AWS Textract OCR + Ollama categorization
  • Tables: receipts, receipt_items
  • Current count: 43 receipts, all with date_ts populated
  • Date column: date_ts (timestamptz) - canonical date for filtering and display
  • Merchant column: merchant_normalized - standardized lowercase with aliases
  • Payment method: payment_type - set manually per receipt (e.g., cash, CC, debit)
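As an illustration of how a merchant_normalized value might be derived, here is a minimal sketch. The alias table, the cleanup rules, and the function name are assumptions; the actual normalization used by the receipts service is not shown in this document.

```python
import re

# Hypothetical alias table: maps cleaned variants to one canonical name.
MERCHANT_ALIASES = {
    "wal mart": "walmart",
    "walmart supercenter": "walmart",
    "the home depot": "home depot",
}


def normalize_merchant(raw: str) -> str:
    """Lowercase, strip store numbers and punctuation, then apply aliases."""
    name = raw.lower()
    name = re.sub(r"['’]", "", name)          # drop apostrophes ("joe's" -> "joes")
    name = re.sub(r"#?\d+", " ", name)        # drop store numbers such as "#1234"
    name = re.sub(r"[^a-z& ]+", " ", name)    # replace remaining punctuation with spaces
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    return MERCHANT_ALIASES.get(name, name)
```

With these rules, variants such as "WAL-MART #1234" and "Walmart Supercenter" both collapse to walmart, so grouping by merchant_normalized treats them as one merchant.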

Bank Transactions (Statement Parser)

  • Source: Bank of America + Capital One PDF statements
  • Ingested by: /opt/statement-parser/ingest.py on big-boi via NFS ingest pipeline
  • Ingest frequency: Every 5 minutes
  • Upload method: scp to /nfs/heezy/ingest/raw/statements/new/ on big-boi
  • Tables: bank_statements, bank_transactions
  • Current count: 7 statements (BoA Jan-May 2026, Cap1 Q1 2026), 134 transactions
  • Auto-populated fields:
      • category - rule-based keyword classifier
      • merchant_normalized - standardized merchant names
  • Pending: Credit card statements (Phase 2)
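A rule-based keyword classifier for the category field can be sketched as follows. The rules, the keywords, and the "Uncategorized" fallback are hypothetical; only the Transfer and Income category names appear elsewhere in this document.

```python
# Hypothetical keyword rules, checked in order; the first matching rule wins.
CATEGORY_RULES = [
    ("Transfer", ("transfer", "zelle")),
    ("Income", ("payroll", "direct dep")),
    ("Groceries", ("kroger", "safeway", "trader joe")),
    ("Dining", ("restaurant", "doordash", "starbucks")),
]


def classify_transaction(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorized"  # assumed fallback label
```

Because the rules are checked in order, more specific rules should be listed before broader ones.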

Dashboard Integration

All three sources are unified in get_all_purchases() in app.py:

Source             Date field used                Filter
orders             order_date (RFC 2822 parsed)   all
receipts           date_ts (timestamptz)          date_ts IS NOT NULL
bank_transactions  date (DATE)                    type=debit, category NOT IN (Transfer, Income)
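The date handling and filters listed above can be sketched in plain Python. This is not the actual get_all_purchases() from app.py: the function name unify_purchases, the dict-based rows, and the total and amount keys are assumptions for illustration.

```python
from datetime import date, datetime, timezone
from email.utils import parsedate_to_datetime

EXCLUDED_CATEGORIES = {"Transfer", "Income"}


def unify_purchases(orders, receipts, bank_transactions):
    """Merge rows (as dicts) from the three sources into one date-sorted list."""
    purchases = []
    for row in orders:
        # order_date is an RFC 2822 string, e.g. "Mon, 05 Jan 2026 14:30:00 +0000"
        when = parsedate_to_datetime(row["order_date"])
        purchases.append({"source": "orders", "date": when.date(), "total": row["total"]})
    for row in receipts:
        if row["date_ts"] is None:  # mirrors "date_ts IS NOT NULL"
            continue
        purchases.append({"source": "receipts", "date": row["date_ts"].date(), "total": row["total"]})
    for row in bank_transactions:
        # keep debits only, and skip transfers and income
        if row["type"] != "debit" or row["category"] in EXCLUDED_CATEGORIES:
            continue
        purchases.append({"source": "bank_transactions", "date": row["date"], "total": row["amount"]})
    return sorted(purchases, key=lambda p: p["date"])


# Example with made-up rows:
sample = unify_purchases(
    orders=[{"order_date": "Mon, 05 Jan 2026 14:30:00 +0000", "total": 19.99}],
    receipts=[
        {"date_ts": datetime(2026, 1, 3, 12, 0, tzinfo=timezone.utc), "total": 7.50},
        {"date_ts": None, "total": 1.00},
    ],
    bank_transactions=[
        {"date": date(2026, 1, 4), "type": "debit", "category": "Groceries", "amount": 42.00},
        {"date": date(2026, 1, 6), "type": "debit", "category": "Transfer", "amount": 100.00},
    ],
)
```

Converting every source to a plain date before sorting avoids comparing timezone-aware timestamps with DATE values.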