# DECISIONS.md - Locked Criteria, Scoring Rubric, Decision Log

> **Creator-only document. Do not ship to buyers.**

**Version**: 1.6
**Last updated**: April 28, 2026

This document captures the original locked operating criteria, the scoring framework used to select the product category, the platform-model evaluation, and key decisions with rationale. It exists so future-you (or a recovery rebuild) can reconstruct *why* the project is what it is, not just *what* it is.

---

## 1. Locked Operating Criteria

These are the constraints, targets, and goals the product strategy must satisfy. Established at project start. Any change to these requires an explicit re-lock.

### Constraints

| # | Criterion | Notes |
|---|---|---|
| 1 | Cash budget ≤ $1,200/month | Recurring monthly only; no large one-time capital, no external funding |
| 2 | Time available ≤ 10 hours/week | Strong preference for build-once assets generating revenue for years with minimal maintenance |
| 3 | Skill set: Database Design, Data Pipelines, Data Aggregation, Programming | Every opportunity must directly leverage these |
| 4 | Existing network: none | Zero reliance on personal connections for acquisition, sales, or operations |

### Targets

| # | Target | Notes |
|---|---|---|
| 5 | Time to first revenue: 15 days preferred, 90 days hard stop | |
| 6 | Revenue ceiling: tiered (see BUSINESS.md Section 6) | Revised from original $50k/mo. Realistic 12-month target: $5k/mo |
| 7 | Lifestyle cashflow goal | Sustainable for several years, no saleable-asset exit required |
| 8 | Distribution: fully async, no-touch, automated | Revisit at $5k/mo (see BUSINESS.md Section 8) |
| 9 | Day-to-day work pattern: deep work + recovery periods | No real-time on-call or customer-facing constraints |

### Goals

| # | Goal | Priority |
|---|---|---|
| 10 | Escape 9-5 W2 employment without stability concerns | Primary |
| 11 | Free up time for retirement lifestyle, optional enjoyable work | Secondary |

### No Internal Contradictions

The original criteria were checked for tension. The "fully async + 15-day to first revenue + no network" combination is tight but workable, with the caveat documented in BUSINESS.md Section 8 (revisit async constraint at $5k/mo).

---

## 2. Scoring Rubric

Every business candidate was scored 1-5 on six dimensions. Total /30, then mapped to verdict.

| Dimension | What it measures |
|---|---|
| Fit to locked criteria | Direct match to constraints 1-4 and targets 5-9. **Any 1 is a hard kill.** |
| Demand durability | Structural shift vs. trend peak. Will this still pay in 3 years? |
| Defensibility | What stops the next entrant from copying it. |
| Unit economics realism | CAC, payback period, gross margin, working capital. |
| Operator fit | Skills, capital, time, stomach for the work. |
| Exit / cash-flow optionality | Multiple paths to revenue, optionality on later changes. |

**Verdict mapping**: PURSUE / INVESTIGATE / PASS / KILL based on total score and any hard-kill dimension.

**Calibration note added in v1.1**: The original scoring inflated unit economics for the lead candidate by treating near-100% gross margin as 5/5 without accounting for CAC under the "no network" constraint. A more honest score for the Python Bundles category is 7.0-7.5/10, not 8.7/10. The strategy is still sound; the optimism just needed deflating.

---

## 3. Candidate Evaluation Summary

Five candidates were evaluated against the locked criteria. Top three:

| Rank | Candidate | Score | Verdict |
|---|---|---|---|
| 1 | Niche Python Automation Script Bundles | 8.7/10 (original) / ~7.5/10 (calibrated) | **PURSUE** |
| 2 | Curated Datasets | 8.7/10 | PURSUE (deferred) |
| 3 | Hosted Data Pipeline Micro-Tool | 8.3/10 | INVESTIGATE |

**Why #1 was selected over #2**:
- Faster path to first revenue (digital download vs. ongoing data curation pipeline).
- Lower ongoing maintenance after launch.
- Direct leverage of programming skills, not just data acquisition.
- Better fit for the "build once, sell many times" preference in criterion 2.

**Why others were ranked lower**:
- Notion Templates: weaker leverage of programming skills.
- Query Optimizer (SaaS): introduces hosting, support, and recurring infrastructure costs that conflict with the lifestyle / minimal maintenance constraint.

---

## 4. Platform Model Decision (How to Sell)

Models considered for the lead bundle:

| Model | Pros | Cons | Verdict |
|---|---|---|---|
| **Standalone tools, dual CLI + GUI interface (chosen)** | Build once, sell forever. No hosting. No SaaS support burden. Direct skill match. GUI captures non-technical buyer; CLI captures power users and automation use cases. | Requires installer for non-technical buyers. Some platform friction (signing, etc.). GUI adds build cost vs. CLI-only. | **CHOSEN (revised v1.2)** |
| SaaS web app | Recurring revenue. Easy install. | Ongoing hosting cost, support burden, SaaS scrutiny. Conflicts with "minimal maintenance" criterion. | Rejected |
| CLI-only | Lowest build cost | Wrong fit for non-technical buyer persona. Will produce refunds. | Rejected (revised v1.2) |
| Browser extension | Easy install | Limited by browser sandbox. Wrong tool for data file processing. | Rejected |
| Notion / Airtable templates | Fast to ship | Doesn't leverage programming skills. Low defensibility. | Rejected |

**Decision (revised v1.2)**: Ship as standalone tools with **both** a CLI and a GUI front-end sharing the same core logic. Packaged with cross-platform installers (PyInstaller-based) so the buyer experience approximates a native app. GUI is no longer "deferred"; it is required at v1 launch.

**Rationale for the v1.2 revision**:
- The buyer persona ("hate repetitive Excel work but cannot code") will not learn a CLI. CLI-only at this price point produces refunds.
- The deduplicator specifically requires interactive review of fuzzy-match candidates. That UX is not viable in pure CLI.
- A dual-interface design keeps the CLI for power users and future automation/scheduling use cases without sacrificing the primary buyer experience.

---

## 4a. Functional Scope Principle (added v1.2)

**Decision**: Each script ships with **complete functional coverage of the problem it names**, including features available for free elsewhere (e.g., Excel's built-in exact-match dedup).

**Rationale**: The product is "one-stop shopping" for the buyer's data-cleaning workflow. Forcing a buyer to bounce between this product and Excel/OpenRefine/etc. for parts of a single task defeats the value proposition. A buyer cleaning a customer list expects exact dedup, fuzzy dedup, normalization, and survivor-merge in one tool. Splitting that across products is what they paid to avoid.

**Consequence for design**: Do not omit a feature on the grounds that "Excel does this for free." If it belongs to the workflow, it belongs in the script.

**Anti-rule**: This is not license to scope-creep. The boundary is "the workflow this script names." A deduplicator includes everything dedup-adjacent (normalization, survivor selection, audit). It does not include format conversion, charting, or anything outside the dedup workflow. Those belong to other scripts in the bundle.

---

## 4b. UX Standards for GUI Front-End (added v1.2)

The GUI is the primary buyer surface. These standards are load-bearing.

| Standard | What it means in practice |
|---|---|
| **Works out of the box** | Dropping any reasonable CSV / XLSX onto the GUI must produce a useful result with zero configuration. The buyer should never see a config screen on first run. |
| **Sensible defaults everywhere** | Every option has a default that works for the most common case. Defaults are visible (so the user understands what is being applied) but not blocking. |
| **Progressive disclosure** | Advanced options exist but are tucked behind an "Advanced" or "Settings" pane. The default view shows the minimum needed for a first run. |
| **Plain-English labels** | No technical jargon in primary UI. "Find duplicates" not "Apply Levenshtein matching with token_set_ratio threshold". Tooltips can carry the technical detail for users who want it. |
| **Visible safety** | Dry-run / preview by default. The user sees what *would* change before any file is written. Original input is never modified. |
| **No multi-step setup** | If the GUI requires more than a single window (file picker + go button + results view) to complete a basic task, it has failed this standard. |
| **Errors that name the problem and the fix** | "Column 'email' not found in this file. Available columns: name, phone, address. Did you mean 'phone'?" not "KeyError: 'email'". |
| **Identical core to CLI** | The GUI and CLI are two front-ends over the same library code. Anything the CLI can do, the GUI can do. Anything the GUI can do, the CLI can do (possibly minus interactive review). No drift. |

**Test for "intuitive enough"**: A non-technical person who has never seen the tool can complete the lead use case (dedup a customer list with one or more confidence levels) on first launch with no documentation read. If that test fails on real users, the GUI is not yet shippable.

---

## 4c. GUI Framework Decision: Streamlit (added v1.3)

**Chosen**: Streamlit.

### Frameworks evaluated

| Framework | Verdict |
|---|---|
| **Streamlit** | **CHOSEN** |
| Tkinter + CustomTkinter | Rejected (CustomTkinter maintenance status confirmed inactive: last release Jan 2024, ~28 months old as of decision date; Snyk classifies as Inactive project) |
| Plain Tkinter | Rejected (UX quality below what a $49-79 product justifies in 2026 without significant hand-styling work) |
| Flet | Rejected (ecosystem too young for a build-once-maintain-for-years product) |
| PySide6 / Qt | Rejected (overkill for this product tier; steepest learning curve, largest bundles) |
| NiceGUI | Rejected (similar pattern to Streamlit but smaller community and less mature data-tool ergonomics) |

### Full evaluation matrix (added v1.6)

Promoted from chat-history-only into docs in v1.6 to lock the rejection reasoning against re-litigation. Scored 1-5 where 5 is best for *this specific product*.

| Dimension | Tkinter | Tk+CTk | Streamlit | Flet | PySide6 | NiceGUI |
|---|---|---|---|---|---|---|
| Non-tech UX quality (look + feel) | 1 | 3 | 4 | 4 | 5 | 4 |
| "Native window opens" (no browser) | 5 | 5 | 1 | 5 | 5 | 1 |
| Build speed for v1 | 3 | 3 | 5 | 4 | 2 | 4 |
| Build speed per added feature | 3 | 3 | 5 | 4 | 2 | 4 |
| PyInstaller compatibility (low friction) | 5 | 4 | 2 | 3 | 3 | 2 |
| Bundle size (smaller = better) | 5 | 4 | 1 | 3 | 2 | 1 |
| Maintenance burden over time | 4 | 3 | 4 | 3 | 4 | 3 |
| Ecosystem maturity / longevity bet | 5 | 3 | 4 | 2 | 5 | 3 |
| Solo dev learning curve | 4 | 4 | 5 | 4 | 2 | 4 |
| Suits "drop file, see result" pattern | 3 | 3 | 5 | 4 | 4 | 5 |
| **Total /50** | **38** | **37** | **38** | **36** | **34** | **35** |

**The total is misleading on purpose.** Equal totals hide that these options fail differently. Tkinter ties Streamlit on the sum but loses on look-and-feel and data-app fit (the dimensions that matter most for this product). The verdict is in the per-dimension story, not the sum.

**Per-option summary** (the substance behind the verdicts):

- **Plain Tkinter**: Smallest bundle (~30-50 MB added), most predictable PyInstaller behavior, will work in 10 years. Default widgets look like 1998. A non-technical buyer paying $49-79 and seeing a default Tk UI will feel cheated. Don't ship.
- **Tkinter + CustomTkinter**: Native window, ~50-80 MB added, modern look, mature PyInstaller story. Maintainer absent (last release Jan 2024). Multi-year product cannot bet UI layer on a library classified Inactive. The probable failure mode is a future macOS or Python update breaking the Tk layer with no upstream fix.
- **Streamlit**: Fastest to build for data tools. Tables, file uploads, dataframes are first-class. Mature ecosystem. Browser-launch UX is the real liability, mitigated by in-app messaging and the optional pywebview wrap (v1.1). Bundle size 300-500 MB. PyInstaller packaging fiddly first time, reusable after.
- **Flet**: Modern Flutter-based UI, native windows, looks great. Ecosystem too young for a build-once-maintain-for-years product. Breaking API changes between minor versions still happening. PyInstaller story less battle-tested.
- **PySide6 / Qt**: Industrial-grade, best widget set, native everything. Steepest learning curve, largest bundles, licensing care needed. Overkill for $49-79 product tier and burns the 10 hr/wk time budget on UI scaffolding instead of script features.
- **NiceGUI**: Similar pattern to Streamlit (Python-to-web). Smaller community, less mature data-tool ergonomics. Same browser-launch tradeoff without Streamlit's velocity advantage.

### Why Streamlit won

1. **Fastest build velocity for v1 and every subsequent bundle.** "Drop a CSV, see results" is the native Streamlit interaction pattern. Tables, filters, dataframes display well with minimal code. This compounds across the 9-script lead bundle and the future 5 bundles in the roadmap.
2. **Lowest maintenance burden per added feature.** Active framework, large community, mature ecosystem. Bug fixes happen upstream, not on this project's time.
3. **Hosted browser demo as a marketing asset from day one.** A Streamlit app deploys to Streamlit Community Cloud (free) or a $5/mo VPS. The Gumroad landing page can offer "Try it free in your browser" with a sample dataset. For a $49-79 product where buyers cannot evaluate quality before purchase, a working demo can move conversion meaningfully. Tkinter family options cannot provide this.
4. **Future SaaS optionality** (expanded v1.6). Not a driver of this decision; the locked criteria reject SaaS. But if criteria ever evolve, Streamlit code converts to a hosted multi-user app in hours rather than weeks. Streamlit's session-state model, component patterns, and HTTP-server architecture are SaaS-native by default; the same code that runs the desktop bundle's local browser GUI runs unchanged on a hosted server (modulo authentication and per-user file isolation). Tkinter code, by contrast, would require a complete rewrite to become a hosted product. This is low-cost optionality: zero implementation effort now, meaningful flexibility later if the lifestyle-cashflow constraint ever lifts in favor of recurring revenue.

### Tradeoffs accepted

1. **Browser-launch UX on the desktop install.** When a buyer double-clicks the desktop shortcut, their default browser opens to a localhost URL. This may briefly confuse non-technical buyers. **Mitigation**: a single sentence in the welcome dialog and install email explains that the data tool runs in the browser locally and uses no internet. If support tickets show this is a meaningful confusion driver, evaluate wrapping with pywebview (native window around the local Streamlit server) in v1.1.
2. **Larger bundle size**, ~300-500 MB vs. ~50 MB for Tkinter. Acceptable for marketplace download in 2026 with typical broadband.
3. **PyInstaller packaging is fiddly** the first time. Budget 1-3 days for the one-time setup, then it's reusable across all subsequent bundles via a shared template.
4. **Streamlit's session re-run model is unusual.** Manageable for single-user data tools; would matter more if the SaaS optionality were exercised at scale.

### Why CustomTkinter was rejected (the previously-favored option)

A web check during this decision found that CustomTkinter's last PyPI release was 5.2.2 in January 2024. As of April 2026, that's roughly 28 months without a release, and Snyk classifies the project as Inactive. The library still works and remains popular (~115k weekly downloads, 13k+ GitHub stars), but the maintainer is effectively absent. For a product intended to ship to non-technical buyers and remain functional for years with minimal touch from the operator, betting the UI layer on an unmaintained library is an unacceptable risk: any future Python or macOS update that breaks the Tk underpinnings becomes the operator's problem to fix or fork.

This is the kind of dependency risk that matters most in a "build once, sell forever" product, where every hour spent firefighting a dependency break is an hour stolen from the next bundle.

---

## 5. Distribution Channel Decision

**Chosen primary**: Marketplace listings (Gumroad, Lemon Squeezy).

**Rationale**: Under the "no network + fully async + 90-day hard stop" constraints, marketplaces are the only channel that:
- Has built-in buyer traffic (no audience-building required).
- Handles payments, delivery, refunds asynchronously.
- Allows listing in days, not months.

Own-domain SEO is treated as a long-term compounding asset (6-18 months to traction), not an early-stage channel.

**Added v1.3**: A **hosted browser demo** of each bundle (deployed via Streamlit Community Cloud) becomes a secondary distribution surface and a primary conversion-rate lever on the landing page. Marketing details in BUSINESS.md Section 7.

---

## 6. Pricing Decision

**Chosen**: $49-$79 per bundle, $149 for full suite (when 3+ bundles exist).

**Rationale**:
- Below $99 threshold avoids procurement / approval friction for solo operator buyers.
- Above $99 raises buyer expectations (SaaS, human support) that conflict with the no-touch constraint.
- $49-$79 produces the right unit economics for marketplace fees + Stripe fees while remaining impulse-purchase territory.

---

## 7. Decision Log (Chronological)

| Date | Decision | Rationale |
|---|---|---|
| April 2026 | Lock operating criteria | Project kickoff |
| April 2026 | Select Python Automation Script Bundles as the product category | Highest score against locked criteria |
| April 2026 | Choose CLI standalone over SaaS / GUI | Best fit for minimal maintenance + skill leverage |
| April 2026 | Pick Excel & CSV Data Cleaning Mastery as lead bundle | Highest pain, broadest demand, easiest demonstration |
| April 2026 | Initial install path: Inno Setup (Windows-only) | First-pass design |
| April 2026 (revised v1.1) | **Switch to PyInstaller-based cross-platform pipeline** | Eliminates "install Python first" friction; expands TAM to Mac and Linux users |
| April 2026 (revised v1.1) | **Enroll in Apple Developer Program ($99/yr)** | Required for clean macOS install experience for non-technical buyers |
| April 2026 (revised v1.1) | **Replace $50k/mo target with tiered realistic targets** | Original target was unsupported by evidence base; tiered targets hit $5k at 12mo, $10k at 24mo |
| April 2026 (revised v1.1) | **Tag "fully async no-touch" for revisit at $5k/mo** | Strict adherence pre-PMF may cost more revenue than it saves time |
| April 28, 2026 (v1.2) | **Functional scope: include all workflow-relevant features even if available free elsewhere** | One-stop shopping is the value proposition. Forcing buyers to bounce between products defeats the purpose. See Section 4a. |
| April 28, 2026 (v1.2) | **Promote GUI from "deferred" to required at v1 launch; ship dual CLI + GUI interface** | Buyer persona will not use CLI. Deduplicator specifically requires interactive review UX that CLI cannot deliver well. See Section 4. |
| April 28, 2026 (v1.2) | **Lock UX standards for GUI: works out of the box, sensible defaults, progressive disclosure, plain-English labels, dry-run by default** | These are load-bearing for the non-technical buyer. Without them the GUI may exist but won't justify the price. See Section 4b. |
| April 28, 2026 (v1.3) | **Lock GUI framework as Streamlit; reject CustomTkinter (maintenance inactive), plain Tkinter (UX gap), Flet/PySide6/NiceGUI (each fails on a dimension that matters)** | Fastest build velocity, lowest maintenance burden, hosted browser demo as marketing asset, future SaaS optionality. Browser-launch UX accepted as a tradeoff with documented mitigation. See Section 4c. |
| April 28, 2026 (v1.3) | **Add hosted browser demo as secondary distribution surface and conversion lever** | Direct consequence of Streamlit choice. See Section 5 and BUSINESS.md Section 7. |
| April 28, 2026 (v1.4) | **Re-apply 03/05 script boundary work dropped during v1.3 merge (silent drift recovery)** | Stream B v1.2 content (sharpened 03/05 descriptions in USER-GUIDE, run-order rule, TECHNICAL.md Section 9 boundary spec, RECOVERY.md pointer) was overwritten when Stream A's parallel v1.3 Streamlit work was saved to project. Restoring per the doc's own no-silent-drift rule. 03 owns "what's not there" (missing values, sentinel codes, imputation), 05 owns "what shouldn't be there" (statistical outliers, domain rules, winsorization). 03 runs before 05 because outlier statistics on data containing NaN or sentinel codes are mathematically poisoned. See TECHNICAL.md Section 9. |
| April 28, 2026 (v1.5) | **Add `02_text_cleaner.py` as new script; renumber 02-08 → 03-09** | Audit revealed character-level hygiene (whitespace trimming, multi-space collapse, Unicode normalization, BOM handling, line-ending normalization, special-character handling) had no clear owner. Was implicitly scattered: `01_deduplicator` normalizes internally for matching only (doesn't write back), `02_format_standardizer` (now 03) implies it but its named scope is dates/currencies/names/phones/addresses, `03_missing_value_handler` (now 04) only handles whitespace-only as disguised null. A buyer with trailing-space pollution had no obvious script to run. Per Section 4a (functional scope principle: one-stop shopping for the workflow), this was a real gap. Added as 02 because text cleaning is a pre-processing step that should run before format standardization, missing-value handling, and outlier detection. Kept 01 (deduplicator) at position 1 as the lead/working/marketing-flagship script; numbering does not strictly equal pipeline order, the orchestrator manages execution order. Renumber consequence: TECHNICAL.md Section 9 boundary references updated 03→04, 05→06; orchestrator references updated 08→09. New contested case documented in Section 9.3: whitespace-only cells (02 trims first, leaving empty string; 04 then detects empty strings as disguised null). Master orchestrator now 09. |
| April 28, 2026 (v1.6) | **Fold conversation-history content into docs: deduplicator functional spec, lead bundle use cases, competitive landscape, full GUI framework comparison matrix, concrete 04/06 boundary examples, expanded Streamlit-to-SaaS reasoning** | None of this represents new decisions; all of it represents prior analysis that lived only in chat history and was at risk of evaporating. Per the doc's own no-silent-drift rule (Section 8) and the v1.4 recovery story, valuable analysis must be promoted to docs to survive. Specifically: TECHNICAL.md gains Section 10 (per-script functional specs, starting with the deduplicator's 36-item tiered spec) which is the buildable target for the v1 launch GUI port; this also makes the gap between "currently working" (exact + basic fuzzy) and "v1 launch best-of-class" (Tier 1) explicit so the docs don't quietly overstate where the code is. Section 9.3 gains three concrete distinguishing examples (bank-export blank fees / $1M outlier / "999=refused") that prove 04 and 06 are distinct concerns. BUSINESS.md gains Section 4a (Lead Bundle Deep Dive: 15 use cases by persona, 6-row competitive landscape table, market gap statement) which feeds landing page copy and demo design. Section 4c gains a 10-dimension scored framework matrix and per-option summaries (locks the rejection reasoning against re-litigation), plus expanded point 4 on Streamlit-to-SaaS migration cost. RECOVERY.md updated to reference Section 10 in rebuild and priority steps. No structural decisions changed; this is pure capture work. |

---

## 8. What Would Trigger Re-Locking the Criteria

These criteria are load-bearing and not casually changed. Triggers for explicit re-evaluation:

- Hitting the $5k/mo revenue tier (revisit async constraint).
- Hitting the $10k/mo revenue tier (revisit time-budget allocation).
- A platform shutting down (Gumroad / Lemon Squeezy policy change forcing channel migration).
- A new skill acquired that opens a higher-leverage product category.
- A burnout signal indicating the time / recovery balance is broken.
- Streamlit project taking a hard direction change that breaks the desktop-packaging path (low probability, but worth flagging).

Any re-lock requires writing the new criteria here with a date and rationale. No silent drift.
