
heezy-finance

Last Updated: 2026-06-18
Status: Active — data consolidation in progress
Owner: Trent Nielsen


What Is This?

heezy-finance is a household spending analytics platform. It is not an accounting tool — it is a visibility tool. The goal is a single dashboard that shows where money is going, with accurate data from every spending channel.

It consists of several loosely coupled services that all write to the same Postgres database (the heezy database on big-boi at 192.168.1.21:5432), plus a web dashboard that reads from their tables.


Services

Service | Repo Path | URL | Status
heezy-finance-dashboard | heezy-containers/dockerfiles/heezy-finance/ | http://192.168.1.19:30860 | ✅ Running (data issues)
heezy-finance-orders | same container (amazon_orders.py) | n/a (sync job) | ✅ Running
heezy-finance-receipts | heezy-containers/dockerfiles/receipts/ | https://receipts.heezy.info | ✅ Running
heezy-finance-statements | /opt/statement-parser/ on big-boi | n/a (cron job) | ⚠️ Partially working: 7 statements loaded, no CC yet

Data Sources

1. orders / order_items (email-parsed orders)

  • Source: Amazon order confirmation emails, other vendor confirmation emails
  • Populated by: amazon_orders.py (heezy-finance container) — calls Gmail API, extracts via Ollama llama3.2:3b
  • Tables: orders, order_items
  • Current state: 161 orders, 202 line items
  • Vendors present: Amazon (114), Apple, Woot.com, eBay, NordVPN, IONOS, and ~15 others
  • Known issues:
      • grand_total is stored as real (single-precision float), which causes rounding errors on display
      • Some orders have NULL grand_total (JetBrains, Jenson USA, Michigan Registered Agent)
      • order_date is stored as an RFC 2822 string, so it must be parsed on every query and gets no benefit from an index
      • order_date_ts exists as a proper timestamptz column but may not be consistently populated
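Backfilling order_date_ts could be done from the RFC 2822 strings with the standard library's email.utils.parsedate_to_datetime. A minimal sketch, assuming a helper named parse_order_date (hypothetical, not taken from amazon_orders.py) that returns None for missing or unparseable values so they can be counted instead of silently skipped:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_order_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse an orders.order_date RFC 2822 string into an aware datetime.

    Hypothetical helper: returns None for empty or unparseable input so a
    backfill job can report those rows rather than drop them.
    """
    if raw is None or not raw.strip():
        return None
    try:
        parsed = parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):
        # Python < 3.10 raises TypeError on garbage input; 3.10+ raises ValueError.
        return None
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treating it as UTC is an assumption.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


if __name__ == "__main__":
    print(parse_order_date("Tue, 16 Jun 2026 14:03:00 -0400"))
```

Returning an aware datetime lets the value go straight into a timestamptz column without depending on the database session's time zone.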

2. receipts / receipt_items (scanned receipts)

  • Source: Physical receipts photographed and uploaded to receipts.heezy.info
  • Populated by: Receipts service — AWS Textract OCR + Ollama line item categorization
  • Tables: receipts, receipt_items
  • Current state: 43 receipts, 342 line items
  • Top merchants: COSTCO ($1,352), Newegg ($604), Aldi ($408), Meijer ($225)
  • Known issues:
      • date column is a text field with wildly inconsistent formats ("June 9, 2026", "6/6/26", "2026-06-16")
      • No proper date_ts timestamptz column, so all date filtering requires string parsing in the app layer
      • category is sparse: only populated on recent receipts
      • Merchant names are not normalized (COSTCO vs Costco, meijer vs Meijer, etc.)
      • Gas purchases appear as separate line items on multi-category COSTCO receipts, so "Gas & Fuel" shows only $36.34 (one trip) even though there are multiple gas receipts
      • No payment_type field: cash vs card cannot be distinguished
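One way to backfill date_ts is to try each known text format in turn. A sketch, assuming the format list below covers the variants seen so far (the %m/%d/%Y and %b entries are guesses at likely variants, not observed values); unparseable values such as a literal "null" return None instead of disappearing silently:

```python
from datetime import date, datetime
from typing import Optional

# Formats from the examples in receipts.date; extend as new variants appear.
KNOWN_FORMATS = (
    "%Y-%m-%d",   # 2026-06-16
    "%m/%d/%y",   # 6/6/26
    "%m/%d/%Y",   # 6/6/2026 (assumed variant)
    "%B %d, %Y",  # June 9, 2026
    "%b %d, %Y",  # Jun 9, 2026 (assumed variant)
)


def parse_receipt_date(raw: Optional[str]) -> Optional[date]:
    """Return a date for a receipts.date text value, or None if unparseable."""
    if raw is None:
        return None
    text = raw.strip()
    if not text or text.lower() == "null":
        return None
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    return None
```

Returning a plain date leaves the conversion to timestamptz, and the choice of time zone for it, to the backfill step.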

3. bank_statements / bank_transactions (statement parser)

  • Source: Bank of America and Capital One PDF statements, dropped to NFS watch dir
  • Populated by: /opt/statement-parser/ingest.py on big-boi (cron, every 5 minutes; parses PDFs with pdfplumber)
  • Tables: bank_statements, bank_transactions
  • Current state: 7 statements (BoA Jan–May 2026, Cap1 Mar 2026), 134 transactions
  • Known issues:
      • BoA statement coverage ends May 2026; June is not loaded yet
      • Capital One has only 1 statement
      • No credit card statements at all yet
      • bank_transactions has no category or merchant_normalized column (raw descriptions only)
      • Dashboard has zero integration with this data source
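Tagging raw descriptions such as "AMZN Mktp US*1A2B3C4D" can start with an ordered list of keyword rules. A minimal sketch; the rules, the Shopping and Groceries category names, and the categorize function are illustrative assumptions, not the production rule set:

```python
import re
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    pattern: re.Pattern
    merchant: str
    category: str


# Illustrative rules only. First match wins, so specific rules go first.
RULES = [
    Rule(re.compile(r"\bAMZN\b|\bAMAZON\b", re.I), "amazon", "Shopping"),
    Rule(re.compile(r"\bCOSTCO GAS\b", re.I), "costco", "Gas & Fuel"),
    Rule(re.compile(r"\bCOSTCO\b", re.I), "costco", "Groceries"),
    Rule(re.compile(r"\bMORTGAGE\b", re.I), "mortgage", "Household"),
    Rule(re.compile(r"\bTRANSFER\b|\bZELLE\b", re.I), "transfer", "Transfer"),
]


def categorize(description: str) -> Tuple[Optional[str], str]:
    """Return (merchant_normalized, category) for a raw bank description."""
    for rule in RULES:
        if rule.pattern.search(description):
            return rule.merchant, rule.category
    return None, "Uncategorized"
```

Keeping the rules as data makes it easy to add a pattern whenever an "Uncategorized" description shows up on the dashboard.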

Dashboard (heezy-finance-dashboard)

  • Stack: Python Flask + vanilla JS + HTML templates
  • Live URL: http://192.168.1.19:30860 (internal NodePort)
  • k8s: heezy namespace, deployment heezy-finance
  • Source: heezy-containers/dockerfiles/heezy-finance/app.py

What it currently reads

  • orders + order_items (email orders)
  • receipts + receipt_items (scanned receipts — via get_all_purchases())
  • bank_statements / bank_transactions: NOT integrated at all
  • ❌ No date normalization on receipts at DB level — relies on app-layer parsing which can silently drop records with unparseable dates
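Bringing bank_transactions into the dashboard means mapping each row into the same shape that orders and receipts already produce for get_all_purchases(). A sketch under stated assumptions: the Purchase and BankRow types and the bank_purchases helper are hypothetical, and debits are assumed to be stored as negative amounts. It keeps only debits outside the Transfer and Income categories:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, List, Optional

EXCLUDED_CATEGORIES = {"Transfer", "Income"}


@dataclass(frozen=True)
class Purchase:
    source: str               # "order", "receipt", or "bank"
    when: date
    merchant: Optional[str]
    category: Optional[str]
    amount: Decimal           # positive = money spent


@dataclass(frozen=True)
class BankRow:
    # Hypothetical shape of a bank_transactions row; real column names may differ.
    posted_date: date
    description: str
    amount: Decimal           # assumed: negative for debits, positive for credits
    merchant_normalized: Optional[str]
    category: Optional[str]


def bank_purchases(rows: Iterable[BankRow]) -> List[Purchase]:
    """Keep debits that are not transfers or income, flipped to positive spend."""
    out: List[Purchase] = []
    for row in rows:
        if row.amount >= 0 or row.category in EXCLUDED_CATEGORIES:
            continue
        out.append(Purchase("bank", row.posted_date, row.merchant_normalized,
                            row.category, -row.amount))
    return out
```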

API Endpoints

Endpoint | Purpose
GET /api/overview | MTD spend, category breakdown, 12-month sparkline
GET /api/purchases | Paginated list with filters (date, category, source, search)
GET /api/categories | Category totals + trend
GET /api/monthly | Monthly totals by source
GET /api/vendors | Vendor breakdown
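The MTD figure and the 12-month sparkline served by /api/overview can both come from one pass over dated purchases. A minimal sketch, assuming purchases arrive as (date, amount) pairs already merged across sources; it is not the app.py implementation. Decimal is used instead of float to avoid the rounding drift noted for orders.grand_total:

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple

MonthKey = Tuple[int, int]  # (year, month)


def last_12_months(today: date) -> List[MonthKey]:
    """Month keys for the 12 months ending with today's month, oldest first."""
    keys: List[MonthKey] = []
    year, month = today.year, today.month
    for _ in range(12):
        keys.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    keys.reverse()
    return keys


def overview(purchases: Iterable[Tuple[date, Decimal]], today: date) -> Dict[str, object]:
    """Month-to-date total and a 12-month sparkline from (date, amount) pairs."""
    totals: Dict[MonthKey, Decimal] = defaultdict(Decimal)
    for when, amount in purchases:
        if when <= today:  # ignore anything dated in the future
            totals[(when.year, when.month)] += amount
    months = last_12_months(today)
    return {
        "mtd": totals.get((today.year, today.month), Decimal("0")),
        "sparkline": [totals.get(key, Decimal("0")) for key in months],
    }
```

Any purchase whose date could not be parsed never reaches this function, which is exactly how unparseable receipt dates drop out of time-filtered views.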

Known Data Problems (Priority Order)

  1. Bank transactions not in dashboard — biggest gap. 134 transactions in DB, zero on screen.
  2. Receipt dates broken — inconsistent text formats cause silent date parse failures; receipts with no parseable date are invisible in time-filtered views.
  3. Gas spend under-reported — COSTCO gas is a line item on a multi-category receipt; the category filter works but the date filter drops it if the date string fails to parse.
  4. No category coverage on bank transactions — raw descriptions like "AMZN Mktp US*1A2B3C4D" need merchant normalization + category tagging.
  5. Missing June 2026 statements — BoA statement ends May 26. June transactions not loaded.
  6. No CC statements — credit card spend is invisible entirely.
  7. Receipt merchant names inconsistent — COSTCO, Costco, costco all treated as different merchants.
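The Phase 1 recipe for merchant_normalized (lowercase + trim + known-alias map) fits in a few lines and applies equally to receipt merchants and bank descriptions. A sketch; the alias entries are hypothetical examples, and collapsing internal runs of whitespace is an added assumption:

```python
import re
from typing import Dict, Optional

# Hypothetical alias map; keys are already lowercased and trimmed.
KNOWN_ALIASES: Dict[str, str] = {
    "costco wholesale": "costco",
    "costco whse": "costco",
    "meijer inc": "meijer",
}


def normalize_merchant(raw: Optional[str]) -> Optional[str]:
    """Lowercase, trim, collapse internal whitespace, then apply the alias map."""
    if raw is None:
        return None
    key = re.sub(r"\s+", " ", raw).strip().lower()
    if not key:
        return None
    return KNOWN_ALIASES.get(key, key)
```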

Improvement Roadmap

Phase 1 — Fix the data (current focus)

Goal: accurate numbers in every existing view.

  • Add date_ts timestamptz column to receipts: backfill it from the date text column and populate it on every new insert (NOT NULL going forward)
  • Add merchant_normalized column to receipts — lowercase + trim + known-alias map
  • Normalize existing receipt merchant names via one-time migration
  • Integrate bank_transactions into dashboard — new "Bank" source tab in /api/overview and /api/purchases
  • Add category and merchant_normalized to bank_transactions — Ollama classification on description
  • Load June 2026 BoA statement
  • Validate gas spend: query all COSTCO gas line items, confirm total matches physical receipts
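The gas-spend validation can be framed as comparing the total for all COSTCO "Gas & Fuel" line items with the date-filtered total, and listing the receipts a date filter would drop. A sketch, assuming a flattened ReceiptItem view (hypothetical) of receipt_items joined to receipts, and "costco" as the normalized merchant name:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReceiptItem:
    # Hypothetical flattened view of receipt_items joined to receipts.
    receipt_id: int
    merchant_normalized: Optional[str]
    category: Optional[str]
    price: Decimal
    date_ts: Optional[date]


def gas_check(items: List[ReceiptItem], start: date, end: date) -> Tuple[Decimal, Decimal, List[int]]:
    """Return (all gas total, dated gas total within [start, end], undated receipt ids).

    The undated ids are the receipts a date filter would silently drop.
    """
    gas = [i for i in items
           if i.merchant_normalized == "costco" and i.category == "Gas & Fuel"]
    total = sum((i.price for i in gas), Decimal("0"))
    in_range = sum((i.price for i in gas
                    if i.date_ts is not None and start <= i.date_ts <= end), Decimal("0"))
    undated = sorted({i.receipt_id for i in gas if i.date_ts is None})
    return total, in_range, undated
```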

Phase 2 — Complete bank coverage

Goal: every bank/CC account represented.

  • Add Capital One credit card statement parsing
  • Load all available historical statements (backfill)
  • Add payment_type to receipts table (cash/card/unknown) — UI flag in receipts scanner
  • Statement auto-import via Gmail attachment (ingest_statements_from_gmail.py — exists but untested)

Phase 3 — Reconciliation service (future)

Goal: match receipts to bank transactions, surface discrepancies.

  • Service name: heezy-finance-reconcile
  • Match scanned receipt → bank transaction by: date (±2 days), amount (exact or within $0.50), merchant (normalized)
  • Match email order → bank transaction by: vendor, amount, date range
  • Flag: bank charge with no receipt/order
  • Flag: receipt/order with no matching bank charge
  • Flag: amount mismatch between order total and bank charge
  • UI: dedicated reconciliation page — pending matches, confirmed, unmatched
  • Prerequisite: Phase 1 and Phase 2 complete
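The matching rules above translate into a predicate plus a pass that also yields both "unmatched" flag lists. A sketch; the Txn type, the tie-break (smallest amount difference, then nearest date), and the greedy one-to-one assignment are assumptions rather than a settled design. Txn stands in for either a scanned receipt or an email order:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional, Sequence, Tuple

DATE_WINDOW_DAYS = 2
AMOUNT_TOLERANCE = Decimal("0.50")


@dataclass(frozen=True)
class Txn:
    txn_id: str
    when: date
    amount: Decimal          # positive spend on both sides (assumed)
    merchant: Optional[str]  # merchant_normalized


def is_match(doc: Txn, bank: Txn) -> bool:
    """Date within +/-2 days, amount within $0.50, same normalized merchant."""
    if abs((doc.when - bank.when).days) > DATE_WINDOW_DAYS:
        return False
    if abs(doc.amount - bank.amount) > AMOUNT_TOLERANCE:
        return False
    return doc.merchant is not None and doc.merchant == bank.merchant


def best_match(doc: Txn, candidates: Sequence[Txn]) -> Optional[Txn]:
    """Closest candidate: smallest amount difference, then nearest date."""
    matches = [b for b in candidates if is_match(doc, b)]
    if not matches:
        return None
    return min(matches, key=lambda b: (abs(doc.amount - b.amount),
                                       abs((doc.when - b.when).days)))


def reconcile(docs: Sequence[Txn], bank: Sequence[Txn]
              ) -> Tuple[List[Tuple[Txn, Txn]], List[Txn], List[Txn]]:
    """Greedy one-to-one pass: (pairs, docs with no bank charge, bank charges with no doc)."""
    remaining = list(bank)
    pairs: List[Tuple[Txn, Txn]] = []
    unmatched_docs: List[Txn] = []
    for doc in docs:
        match = best_match(doc, remaining)
        if match is None:
            unmatched_docs.append(doc)
        else:
            pairs.append((doc, match))
            remaining.remove(match)  # each bank charge is used at most once
    return pairs, unmatched_docs, remaining
```

A greedy pass depends on input order; if repeated same-merchant, same-amount charges within two days prove common, scoring all candidate pairs before assigning would be more robust.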

Naming Convention

All services under this umbrella follow: heezy-finance-<service>[-<subservice>]

Examples:
  • heezy-finance-dashboard: the web app
  • heezy-finance-orders: email order ingestion
  • heezy-finance-receipts: receipt OCR scanner
  • heezy-finance-statements: bank statement parser
  • heezy-finance-reconcile: future reconciliation service
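The naming convention can be checked mechanically, for example in CI or a deploy script. A sketch that assumes each segment is lowercase alphanumeric (the convention itself does not say so):

```python
import re

# heezy-finance-<service>[-<subservice>], with lowercase alphanumeric segments (assumed).
SERVICE_NAME = re.compile(r"heezy-finance-[a-z0-9]+(?:-[a-z0-9]+)?")


def is_valid_service_name(name: str) -> bool:
    """True if name follows heezy-finance-<service>[-<subservice>]."""
    return SERVICE_NAME.fullmatch(name) is not None
```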


Database

All tables live in the heezy database on big-boi (192.168.1.21:5432).
Schema source of truth: ansible-heezy/roles/heezy-postgres-schema/files/schema.sql

Table | Owner | Purpose
orders | heezy_app | Order-level data from email receipts
order_items | heezy_app | Line items per order
receipts | heezy_app | Scanned receipt metadata
receipt_items | heezy_app | Line items extracted from scanned receipts
bank_statements | n8n | Statement metadata (bank, account, period, balances)
bank_transactions | n8n | Individual transactions from statements
amazon_returns | heezy_app | Return initiations and acceptances

Daily Progress Log

2026-06-18

  • Full audit of all data sources and current app code
  • Identified root cause of gas spend under-reporting (date parse failures + category split on COSTCO receipts)
  • Confirmed bank_transactions was completely disconnected from dashboard
  • Defined project roadmap and service naming convention
  • Created this documentation

Phase 1 progress (same day):
  • Added date_ts (timestamptz), merchant_normalized, payment_type columns to receipts
  • Added category, merchant_normalized columns to bank_transactions
  • Backfilled date_ts on 42/43 receipts (1 had literal "null" string date)
  • Backfilled merchant_normalized on all 43 receipts
  • Built rule-based categorizer for bank transactions: 134 transactions classified without Ollama
  • Top non-transfer categories: Household $23k (mortgage), Health $383, Travel $335, Food $115
  • Ollama was too slow on CPU (timeout on 10-item batches); keyword rules are better here
  • Updated app.py to use date_ts instead of parsing text dates from receipts
  • Integrated bank_transactions as third data source (debits, excluding Transfer/Income)
  • Updated /api/sync to report counts from all three sources
  • Persisted GRANT SELECT on bank tables to heezy_app in schema.sql
  • Deployed to k8s, verified: gas now $140.52 (was $36), June spend $2,226
  • All commits pushed to Gitea + GitHub

Remaining Phase 1:
  • [ ] Update receipts service to populate date_ts + merchant_normalized on new inserts
  • [ ] Load June 2026 BoA statement (coverage ends May 26)
  • [ ] docs.heezy.info setup (MkDocs + Cloudflare Pages)