heezy-finance¶
Last Updated: 2026-06-18
Status: Active — data consolidation in progress
Owner: Trent Nielsen
What Is This?¶
heezy-finance is a household spending analytics platform. It is not an accounting tool — it is a visibility tool. The goal is a single dashboard that shows where money is going, with accurate data from every spending channel.
It consists of several loosely coupled services that all write to the same Postgres database (heezy db on big-boi at 192.168.1.21:5432), and a web dashboard that reads from all of them.
Services¶
| Service | Repo Path | URL | Status |
|---|---|---|---|
| `heezy-finance-dashboard` | `heezy-containers/dockerfiles/heezy-finance/` | http://192.168.1.19:30860 | ✅ Running (data issues) |
| `heezy-finance-orders` | same container (amazon_orders.py) | n/a (sync job) | ✅ Running |
| `heezy-finance-receipts` | `heezy-containers/dockerfiles/receipts/` | https://receipts.heezy.info | ✅ Running |
| `heezy-finance-statements` | `/opt/statement-parser/` on big-boi | n/a (cron job) | ⚠️ Partially working (7 statements loaded, no CC yet) |
Data Sources¶
1. orders / order_items (email-parsed orders)¶
- Source: Amazon order confirmation emails, other vendor confirmation emails
- Populated by: `amazon_orders.py` (heezy-finance container), which calls the Gmail API and extracts fields via Ollama llama3.2:3b
- Tables: `orders`, `order_items`
- Current state: 161 orders, 202 line items
- Vendors present: Amazon (114), Apple, Woot.com, eBay, NordVPN, IONOS, and ~15 others
- Known issues:
- `grand_total` is `real` (float): rounding errors on display
- Some orders have NULL `grand_total` (JetBrains, Jenson USA, Michigan Registered Agent)
- `order_date` stored as RFC 2822 string: requires parsing on every query (no index benefit)
- `order_date_ts` exists as a proper timestamptz column but may not be consistently populated
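The two column problems above (RFC 2822 date strings and float totals) can be handled in the app layer until the schema is fixed. The following is a minimal sketch, not the code in `amazon_orders.py` or `app.py`; the function names `parse_order_date` and `to_cents` are hypothetical.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_order_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 2822 date string; return None if it cannot be parsed."""
    if not raw:
        return None
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def to_cents(value: Optional[float]) -> Optional[Decimal]:
    """Convert a float total to a two-decimal Decimal; None stays None."""
    if value is None:
        return None
    # Going through str() keeps binary float noise out of the Decimal.
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Returning `None` instead of raising keeps orders with a NULL `grand_total` or an unparseable date visible to the caller, which can then count them rather than drop them silently.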
2. receipts / receipt_items (scanned receipts)¶
- Source: Physical receipts photographed and uploaded to receipts.heezy.info
- Populated by: Receipts service — AWS Textract OCR + Ollama line item categorization
- Tables: `receipts`, `receipt_items`
- Current state: 43 receipts, 342 line items
- Top merchants: COSTCO ($1,352), Newegg ($604), Aldi ($408), Meijer ($225)
- Known issues:
- `date` column is a text field with wildly inconsistent formats ("June 9, 2026", "6/6/26", "2026-06-16")
- No proper `date_ts` timestamptz column: all date filtering requires string parsing in app layer
- `category` is sparse: only populated on recent receipts
- Merchant names are not normalized (COSTCO vs Costco, meijer vs Meijer, etc.)
- Gas receipts are split as separate line items on COSTCO receipts, so "Gas & Fuel" shows only $36.34 (one trip) even though there are multiple gas receipts
- No `payment_type` field: cash vs card cannot be distinguished
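The inconsistent date formats listed above can be normalized by trying a short list of known formats in order. This is a sketch under stated assumptions: the `KNOWN_FORMATS` list covers only the three variants quoted in this document plus a four-digit-year slash form, `parse_receipt_date` is a hypothetical name, and `%B` assumes an English locale.

```python
from datetime import date, datetime
from typing import Optional

# Formats seen in the receipts date column. This list is an assumption
# and would need extending as new variants show up.
KNOWN_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%m/%d/%y", "%m/%d/%Y")


def parse_receipt_date(raw: Optional[str]) -> Optional[date]:
    """Return a date for any known text format, or None if unparseable."""
    if raw is None:
        return None
    text = raw.strip()
    # Treat empty strings and a literal "null" string as missing.
    if not text or text.lower() == "null":
        return None
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

A caller should log or count every `None` result so that receipts with unparseable dates are surfaced instead of vanishing from time-filtered views.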
3. bank_statements / bank_transactions (statement parser)¶
- Source: Bank of America and Capital One PDF statements, dropped to NFS watch dir
- Populated by: `/opt/statement-parser/ingest.py` on big-boi (cron every 5 min, uses pdfplumber)
- Tables: `bank_statements`, `bank_transactions`
- Current state: 7 statements (BoA Jan–May 2026, Cap1 Mar 2026), 134 transactions
- Known issues:
- BoA statement coverage ends May 2026 — June not loaded yet
- Capital One has only 1 statement
- No credit card statements at all yet
- `bank_transactions` has no `category` or `merchant_normalized` column: raw descriptions only
- Dashboard has zero integration with this data source
Dashboard (heezy-finance-dashboard)¶
- Stack: Python Flask + vanilla JS + HTML templates
- Live URL: http://192.168.1.19:30860 (internal NodePort)
- k8s: `heezy` namespace, deployment `heezy-finance`
- Source: `heezy-containers/dockerfiles/heezy-finance/app.py`
What it currently reads¶
- ✅ `orders` + `order_items` (email orders)
- ✅ `receipts` + `receipt_items` (scanned receipts, via `get_all_purchases()`)
- ❌ `bank_statements` / `bank_transactions`: NOT integrated at all
- ❌ No date normalization on receipts at DB level: relies on app-layer parsing, which can silently drop records with unparseable dates
API Endpoints¶
| Endpoint | Purpose |
|---|---|
| `GET /api/overview` | MTD spend, category breakdown, 12-month sparkline |
| `GET /api/purchases` | Paginated list with filters (date, category, source, search) |
| `GET /api/categories` | Category totals + trend |
| `GET /api/monthly` | Monthly totals by source |
| `GET /api/vendors` | Vendor breakdown |
Known Data Problems (Priority Order)¶
- Bank transactions not in dashboard — biggest gap. 134 transactions in DB, zero on screen.
- Receipt dates broken — inconsistent text formats cause silent date parse failures; receipts with no parseable date are invisible in time-filtered views.
- Gas spend under-reported — COSTCO gas is a line item on a multi-category receipt; the category filter works but the date filter drops it if the date string fails to parse.
- No category coverage on bank transactions — raw descriptions like "AMZN Mktp US*1A2B3C4D" need merchant normalization + category tagging.
- Missing June 2026 statements — BoA statement ends May 26. June transactions not loaded.
- No CC statements — credit card spend is invisible entirely.
- Receipt merchant names inconsistent — COSTCO, Costco, costco all treated as different merchants.
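Raw bank descriptions such as "AMZN Mktp US*1A2B3C4D" can be tagged with ordered keyword rules, the approach the progress log reports using in place of Ollama. The sketch below is illustrative only: the `RULES` entries and the "Shopping" and "Groceries" labels are assumptions, not the rules actually deployed.

```python
from typing import List, Tuple

# Hypothetical keyword rules, checked in order; the first match wins,
# so more specific keywords must come before broader ones.
RULES: List[Tuple[str, str]] = [
    ("AMZN", "Shopping"),
    ("COSTCO GAS", "Gas & Fuel"),
    ("COSTCO", "Groceries"),
    ("MORTGAGE", "Household"),
]


def categorize(description: str, default: str = "Uncategorized") -> str:
    """Return the category of the first rule whose keyword appears."""
    upper = description.upper()
    for keyword, category in RULES:
        if keyword in upper:
            return category
    return default
```

Keeping an explicit default category makes unmatched descriptions easy to query, which shows where the rule list needs extending.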
Improvement Roadmap¶
Phase 1 — Fix the data (current focus)¶
Goal: accurate numbers in every existing view.
- Add `date_ts timestamptz` column to `receipts`: backfill from `date` text column, set NOT NULL on new inserts
- Add `merchant_normalized` column to `receipts`: lowercase + trim + known-alias map
- Normalize existing receipt merchant names via one-time migration
- Integrate `bank_transactions` into dashboard: new "Bank" source tab in `/api/overview` and `/api/purchases`
- Add `category` and `merchant_normalized` to `bank_transactions`: Ollama classification on description
- Load June 2026 BoA statement
- Validate gas spend: query all COSTCO gas line items, confirm total matches physical receipts
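The "lowercase + trim + known-alias map" rule for `merchant_normalized` could look roughly like this. It is a sketch, not the migration that was run; the `ALIASES` entries are made-up examples and `normalize_merchant` is a hypothetical name.

```python
import re
from typing import Dict

# Hypothetical alias map. Keys must already be lowercased and trimmed,
# because the lookup happens after cleaning.
ALIASES: Dict[str, str] = {
    "costco wholesale": "costco",
    "meijer inc": "meijer",
}


def normalize_merchant(name: str) -> str:
    """Collapse whitespace, trim, lowercase, then apply known aliases."""
    cleaned = re.sub(r"\s+", " ", name).strip().lower()
    return ALIASES.get(cleaned, cleaned)
```

With this rule COSTCO, Costco, and costco all map to the same value, so they group as one merchant in vendor breakdowns.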
Phase 2 — Complete bank coverage¶
Goal: every bank/CC account represented.
- Add Capital One credit card statement parsing
- Load all available historical statements (backfill)
- Add `payment_type` to `receipts` table (cash/card/unknown): UI flag in receipts scanner
- Statement auto-import via Gmail attachment (ingest_statements_from_gmail.py, exists but untested)
Phase 3 — Reconciliation service (future)¶
Goal: match receipts to bank transactions, surface discrepancies.
- Service name: `heezy-finance-reconcile`
- Match scanned receipt → bank transaction by: date (±2 days), amount (exact or within $0.50), merchant (normalized)
- Match email order → bank transaction by: vendor, amount, date range
- Flag: bank charge with no receipt/order
- Flag: receipt/order with no matching bank charge
- Flag: amount mismatch between order total and bank charge
- UI: dedicated reconciliation page — pending matches, confirmed, unmatched
- Prerequisite: Phase 1 and Phase 2 complete
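The receipt-to-transaction matching rule above (date within ±2 days, amount within $0.50, same normalized merchant) can be sketched as follows. Since `heezy-finance-reconcile` does not exist yet, every name here (`Receipt`, `BankTxn`, `is_match`, `find_match`) is hypothetical, and picking the closest candidate is one possible tie-breaking choice.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import List, Optional

DATE_WINDOW = timedelta(days=2)
AMOUNT_TOLERANCE = Decimal("0.50")


@dataclass(frozen=True)
class Receipt:
    merchant: str  # already normalized
    when: date
    total: Decimal


@dataclass(frozen=True)
class BankTxn:
    merchant: str  # already normalized
    when: date
    amount: Decimal


def is_match(receipt: Receipt, txn: BankTxn) -> bool:
    """True when merchant, date window, and amount tolerance all agree."""
    return (
        receipt.merchant == txn.merchant
        and abs(receipt.when - txn.when) <= DATE_WINDOW
        and abs(receipt.total - txn.amount) <= AMOUNT_TOLERANCE
    )


def find_match(receipt: Receipt, txns: List[BankTxn]) -> Optional[BankTxn]:
    """Return the closest matching transaction, or None if nothing matches."""
    candidates = [t for t in txns if is_match(receipt, t)]
    if not candidates:
        return None
    # Prefer the smallest amount difference, then the smallest date gap.
    return min(
        candidates,
        key=lambda t: (abs(t.amount - receipt.total), abs(t.when - receipt.when)),
    )
```

A receipt for which `find_match` returns `None` corresponds to the "receipt/order with no matching bank charge" flag; transactions never returned for any receipt or order correspond to the "bank charge with no receipt/order" flag.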
Naming Convention¶
All services under this umbrella follow: heezy-finance-<service>[-<subservice>]
Examples:
- heezy-finance-dashboard — the web app
- heezy-finance-orders — email order ingestion
- heezy-finance-receipts — receipt OCR scanner
- heezy-finance-statements — bank statement parser
- heezy-finance-reconcile — future reconciliation service
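The naming pattern can be checked mechanically, for example in a CI lint step. This is a small sketch; restricting each segment to lowercase letters and digits is an assumption, since the convention above does not specify the allowed characters.

```python
import re

# heezy-finance-<service>[-<subservice>]
# Assumption: each segment is lowercase alphanumeric.
NAME_PATTERN = re.compile(r"heezy-finance-[a-z0-9]+(?:-[a-z0-9]+)?")


def is_valid_service_name(name: str) -> bool:
    """True if the whole name follows the umbrella naming convention."""
    return NAME_PATTERN.fullmatch(name) is not None
```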
Database¶
All tables in heezy database on big-boi (192.168.1.21:5432).
Schema source of truth: ansible-heezy/roles/heezy-postgres-schema/files/schema.sql
| Table | Owner | Purpose |
|---|---|---|
| `orders` | heezy_app | Order-level data from email receipts |
| `order_items` | heezy_app | Line items per order |
| `receipts` | heezy_app | Scanned receipt metadata |
| `receipt_items` | heezy_app | Line items extracted from scanned receipts |
| `bank_statements` | n8n | Statement metadata (bank, account, period, balances) |
| `bank_transactions` | n8n | Individual transactions from statements |
| `amazon_returns` | heezy_app | Return initiations and acceptances |
Daily Progress Log¶
2026-06-18¶
- Full audit of all data sources and current app code
- Identified root cause of gas spend under-reporting (date parse failures + category split on COSTCO receipts)
- Confirmed bank_transactions was completely disconnected from dashboard
- Defined project roadmap and service naming convention
- Created this documentation
Phase 1 progress (same day):
- Added date_ts (timestamptz), merchant_normalized, payment_type columns to receipts
- Added category, merchant_normalized columns to bank_transactions
- Backfilled date_ts on 42/43 receipts (1 had literal "null" string date)
- Backfilled merchant_normalized on all 43 receipts
- Built rule-based categorizer for bank transactions — 134 txns classified without Ollama
- Top non-transfer categories: Household $23k (mortgage), Health $383, Travel $335, Food $115
- Ollama was too slow on CPU (timeout on 10-item batches) — keyword rules are better here
- Updated app.py to use date_ts instead of parsing text dates from receipts
- Integrated bank_transactions as third data source (debits, excluding Transfer/Income)
- Updated /api/sync to report counts from all three sources
- Persisted GRANT SELECT on bank tables to heezy_app in schema.sql
- Deployed to k8s, verified: gas now $140.52 (was $36), June spend $2,226
- All commits pushed to Gitea + GitHub
Remaining Phase 1:
- [ ] Update receipts service to populate date_ts + merchant_normalized on new inserts
- [ ] Load June 2026 BoA statement (coverage ends May 26)
- [ ] docs.heezy.info setup (MkDocs + Cloudflare Pages)