# RECOVERY.md - Full Project Recovery Guide

> **Creator-only document. Do not ship to buyers.**

**Version**: 1.6
**Last updated**: April 28, 2026

If the project is ever lost, this guide plus the source ZIP is enough to rebuild it in full.

---

## 1. What's in the Project

```
project-root/
├── README.md
├── BUSINESS.md                  # Creator only
├── TECHNICAL.md                 # Creator only
├── DECISIONS.md                 # Creator only - locked criteria, rationale, GUI framework decision
├── USER-GUIDE.md                # Ships to buyers
├── RECOVERY.md                  # Creator only (this file)
│
├── scripts/                     # The 9 .py source files (CLI entry points)
│   ├── 01_deduplicator.py       # Working
│   ├── 02_text_cleaner.py
│   ├── 03_format_standardizer.py
│   ├── 04_missing_value_handler.py
│   ├── 05_column_mapper_enforcer.py
│   ├── 06_outlier_detector.py
│   ├── 07_multi_file_merger.py
│   ├── 08_validator_reporter.py
│   └── 09_master_orchestrator.py
│
├── src/
│   ├── core/                    # Shared business logic - both CLI and GUI call into this
│   ├── cli.py                   # Typer CLI front-end
│   └── gui/                     # Streamlit GUI front-end
│       ├── app.py               # Streamlit entry point
│       ├── pages/               # One Streamlit page per script in the bundle
│       └── components.py        # Shared widgets
│
├── samples/
│   ├── messy_sales.csv
│   └── bank_export.xlsx
│
├── demo/
│   └── streamlit_app.py         # Constrained version for Streamlit Community Cloud
│
├── build/
│   ├── pyinstaller.spec         # Cross-platform build spec (handles GUI launcher + CLI binaries)
│   ├── launcher.py              # Starts local Streamlit server, opens default browser
│   ├── windows/
│   │   └── installer.iss        # Inno Setup wrapper
│   ├── macos/
│   │   ├── entitlements.plist
│   │   └── dmg_settings.py
│   └── linux/
│       └── AppImage/            # AppImage build assets
│
├── ci/
│   └── build.yml                # GitHub Actions cross-platform build
│
├── tests/
│
└── requirements.txt
```
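
After restoring a backup, it can help to confirm that the key files from the tree above are present before going further. The following is a minimal sketch, not part of the project: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the list covers only a representative subset of the tree.

```python
from pathlib import Path

# Paths taken from the tree above; trim or extend to match the backup being verified.
EXPECTED_PATHS = [
    "README.md",
    "DECISIONS.md",
    "TECHNICAL.md",
    "RECOVERY.md",
    "requirements.txt",
    "scripts/01_deduplicator.py",
    "scripts/09_master_orchestrator.py",
    "src/core",
    "src/cli.py",
    "src/gui/app.py",
    "demo/streamlit_app.py",
    "build/pyinstaller.spec",
    "build/launcher.py",
    "ci/build.yml",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_paths("path/to/restored/project")` suggests the restore is structurally complete; it says nothing about whether the file contents are intact.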

---

## 2. Rebuild Steps

### From a complete ZIP backup
1. Unzip into a clean directory.
2. Push to a GitHub repository.
3. The CI pipeline (`ci/build.yml`) builds Windows, macOS, and Linux artifacts on tagged releases.
4. Connect the repo to Streamlit Community Cloud and point it at `demo/streamlit_app.py` to redeploy the hosted demo.
5. For local builds: see Section 3.
6. Done.
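
Step 1 can be scripted so that the restore never lands on top of stale files. This is a sketch using only the standard library; `restore_backup` is a hypothetical helper, and refusing a non-empty target is an assumed safety rule rather than anything the project mandates.

```python
import zipfile
from pathlib import Path


def restore_backup(zip_path, target_dir):
    """Extract a project ZIP into target_dir, which must be empty or absent."""
    target = Path(target_dir)
    if target.exists() and any(target.iterdir()):
        raise FileExistsError(f"{target} is not empty; restore into a clean directory")
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        corrupt = archive.testzip()  # name of the first member with a bad CRC, or None
        if corrupt is not None:
            raise zipfile.BadZipFile(f"corrupt member in backup: {corrupt}")
        archive.extractall(target)
    # Return the restored files as sorted POSIX-style relative paths.
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

The returned list gives a quick inventory to compare against the tree in Section 1.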

### From documentation only (worst case)
1. Read `DECISIONS.md` to understand *why* the project is what it is. Section 4c locks the GUI framework as Streamlit; Section 4b locks the UX standards. These are non-negotiable.
2. Read `TECHNICAL.md` Sections 2-3 for the build pipeline architecture, including the Streamlit launcher pattern in Section 3.4.
3. Read `BUSINESS.md` for product strategy, which bundles to build, and the hosted demo as a marketing asset.
4. Recreate the scripts from these specs:
   - `USER-GUIDE.md` Section 2 (script table)
   - `TECHNICAL.md` Section 7 (per-bundle technical notes)
   - `TECHNICAL.md` Section 9 (boundary between scripts 04 and 06; do not relitigate this)
   - `TECHNICAL.md` Section 10 (per-script functional requirements; Section 10.1 is the v1 launch target for the deduplicator)
5. Set up the cross-platform build pipeline (Section 3 below).
6. Recreate installer configs per `TECHNICAL.md` Section 3.
7. Build the constrained `demo/streamlit_app.py` for hosted deployment. Constraints: row limit, watermark, sample data only or strict file-size cap.
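
The demo constraints in step 7 (size cap, row limit, watermark) could be enforced along these lines. This is a sketch under stated assumptions: the limit values, the watermark wording, and the `constrain_demo_csv` name are all hypothetical, and it works on raw CSV bytes with the standard library rather than on whatever data structures `src/core/` actually uses.

```python
import csv
import io

# Hypothetical limits; the real values are a product decision, not part of this guide.
MAX_UPLOAD_BYTES = 200_000
MAX_ROWS = 100
WATERMARK = "DEMO OUTPUT - limited to first {n} rows"


def constrain_demo_csv(raw_bytes, max_bytes=MAX_UPLOAD_BYTES, max_rows=MAX_ROWS):
    """Apply the demo limits to an uploaded CSV and return watermarked CSV text."""
    if len(raw_bytes) > max_bytes:
        raise ValueError(f"file exceeds demo size cap of {max_bytes} bytes")
    rows = list(csv.reader(io.StringIO(raw_bytes.decode("utf-8"))))
    if not rows:
        return ""
    # Keep the header plus at most max_rows data rows.
    header, body = rows[0], rows[1 : max_rows + 1]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([WATERMARK.format(n=max_rows)])  # watermark as the first line
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()
```

In the Streamlit demo, a function like this would sit between the file uploader and the processing call, so the hosted version never handles more data than the cap allows.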

---

## 3. Local Build Setup (per platform)

### All platforms (common)
- Install Python 3.11+.
- `pip install -r requirements.txt pyinstaller`
- Verify Streamlit app runs locally: `streamlit run src/gui/app.py`
- Verify CLI runs locally: `python -m src.cli --help`
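
Before running the verification commands above, a quick preflight check can catch a wrong interpreter or a missing install. A minimal sketch; `environment_problems` and `REQUIRED_MODULES` are hypothetical names, and the module list is an assumed minimum rather than the full contents of `requirements.txt`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)
# Import names, not pip names: the pyinstaller package imports as "PyInstaller".
REQUIRED_MODULES = ["streamlit", "PyInstaller"]


def environment_problems(version=None, modules=REQUIRED_MODULES):
    """Return human-readable problems; an empty list means the basics are in place."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.11+ required")
    for name in modules:
        # find_spec returns None for a top-level module that cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems
```

Calling `environment_problems()` with no arguments checks the running interpreter; the `version` parameter exists only to make the check easy to exercise.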

### Windows
- Install Inno Setup: https://jrsoftware.org/isinfo.php
- Build: `pyinstaller build/pyinstaller.spec`
- Wrap in installer: open `build/windows/installer.iss` in Inno Setup, compile.

### macOS
- Install Xcode command line tools: `xcode-select --install`
- Enroll in Apple Developer Program ($99/yr). Allow 1-2 weeks for first-time enrollment.
- Generate Developer ID Application certificate, install in Keychain.
- Generate app-specific password for `notarytool`.
- Build: `pyinstaller build/pyinstaller.spec`
- Sign: `codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/BundleName.app`
- Package as DMG.
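- Notarize: `xcrun notarytool submit BundleName.dmg --wait` (add credentials: either `--keychain-profile <profile>` after a one-time `xcrun notarytool store-credentials`, or `--apple-id`, `--team-id`, and `--password` with the app-specific password)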
- Staple: `xcrun stapler staple BundleName.dmg`
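
The sign, notarize, and staple steps above are order-sensitive, so it may be worth keeping them in one scripted sequence. The sketch below only builds the argument lists; `macos_release_commands` is a hypothetical helper, and it assumes notarization credentials were saved beforehand under a keychain profile via `xcrun notarytool store-credentials`. DMG packaging is left out because this guide does not pin down the tool used for it.

```python
def macos_release_commands(app_path, dmg_path, identity, keychain_profile):
    """Return the signing, notarization, and stapling commands as argv lists, in order."""
    sign = [
        "codesign", "--deep", "--force", "--options", "runtime",
        "--sign", identity, app_path,
    ]
    # The DMG must be packaged from the signed .app between these two steps.
    notarize = [
        "xcrun", "notarytool", "submit", dmg_path,
        "--keychain-profile", keychain_profile, "--wait",
    ]
    staple = ["xcrun", "stapler", "staple", dmg_path]
    return [sign, notarize, staple]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a Mac, with the DMG packaging step run after the first command.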

### Linux
- Install AppImage tooling: download `appimagetool` from https://appimage.github.io
- Build: `pyinstaller build/pyinstaller.spec`
- Wrap as AppImage using `appimagetool` per the assets in `build/linux/AppImage/`.

### Streamlit + PyInstaller specific notes
- A custom PyInstaller hook (`hook-streamlit.py`) is required to bundle Streamlit's data files correctly.
- Hidden imports must include `streamlit`, `altair`, `pyarrow` (and their submodules where PyInstaller fails to detect them).
- The launcher script (`build/launcher.py`) is the actual PyInstaller entry point, not the Streamlit script directly.
- Budget 1-3 days the first time getting the Streamlit-PyInstaller spec right; it's reusable across all subsequent bundles.
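
If `build/launcher.py` has to be rewritten from scratch, the launcher pattern described above might look roughly like this. It is a sketch, not the original file: the helper names are hypothetical, the location of `app.py` inside the bundle depends on how the spec lists its data files, and the two-second browser delay is an arbitrary guess. It runs Streamlit in-process through `streamlit.web.cli`, since a frozen bundle has no separate `streamlit` executable to call.

```python
import socket
import sys
import threading
import webbrowser
from pathlib import Path


def bundle_root(source_root):
    """Return PyInstaller's unpack directory when frozen, else the given source root."""
    return Path(getattr(sys, "_MEIPASS", source_root))


def find_free_port():
    """Ask the OS for an unused localhost TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def streamlit_argv(app_path, port):
    """Build the argv that Streamlit's CLI would receive for a local, headless run."""
    return [
        "streamlit", "run", str(app_path),
        f"--server.port={port}",
        "--server.headless=true",          # the launcher opens the browser itself
        "--global.developmentMode=false",  # commonly needed inside frozen bundles
    ]


def launch(source_root="."):
    """Start Streamlit in-process and open the default browser shortly afterwards."""
    from streamlit.web import cli as stcli  # imported lazily; requires streamlit

    port = find_free_port()
    # Assumed bundle layout: app.py kept at the same relative path as in the repo.
    app = bundle_root(source_root) / "src" / "gui" / "app.py"
    sys.argv = streamlit_argv(app, port)
    threading.Timer(2.0, webbrowser.open, args=[f"http://localhost:{port}"]).start()
    sys.exit(stcli.main())
```

Calling `launch()` from the entry script is all the PyInstaller spec would need to point at; the hook and hidden imports listed above are still required for the bundle to find Streamlit's data files.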

### CI build (recommended)
- Push the repo to GitHub.
- Tag a release: `git tag v1.0.0 && git push --tags`
- GitHub Actions runs the matrix build, produces all three artifacts.
- Manual step: download artifacts from the Releases page, upload to Gumroad / Lemon Squeezy.

### Hosted demo deployment (separate from desktop build)
- Connect GitHub repo to Streamlit Community Cloud (one-time, free).
- Configure the deployment to point at `demo/streamlit_app.py`.
- The demo updates automatically on git push to the configured branch.
- Custom domain optional via CNAME (verify Streamlit Community Cloud current policy at recovery time).

---

## 4. External Dependencies (re-acquire if lost)

| Item | Source | Cost |
|---|---|---|
| Python | https://python.org/downloads | Free |
| PyInstaller | `pip install pyinstaller` | Free |
| Streamlit | `pip install streamlit` | Free |
| Inno Setup (Windows) | https://jrsoftware.org/isinfo.php | Free |
| Apple Developer Program (macOS signing) | https://developer.apple.com | $99/yr |
| Xcode command line tools (macOS) | `xcode-select --install` | Free |
| appimagetool (Linux) | https://appimage.github.io | Free |
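| GitHub Actions (CI) | github.com | Free tier includes Windows, macOS, and Linux runners (private repos draw on a monthly minutes quota; verify current limits at recovery time) |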
| Streamlit Community Cloud (demo hosting) | streamlit.io/cloud | Free |
| Python libraries | See `requirements.txt`, `pip install -r requirements.txt` | Free |

---

## 5. Backup Recommendation

- **Primary backup**: GitHub repository (private). Source is the source of truth.
- **Secondary backup**: ZIP of the full project tree on cloud storage (Google Drive / Dropbox / S3).
- **Apple Developer credentials**: store certificate + app-specific password in a password manager. Losing these means regenerating them, which is inconvenient but not catastrophic.
- **Streamlit Community Cloud connection**: stored in Streamlit's UI as a GitHub OAuth link. Re-authorize from a new Streamlit account if lost.
- Back up after every meaningful code or doc change.
- Include this `RECOVERY.md` and `DECISIONS.md` in every backup. They contain the irreplaceable context.
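
The secondary ZIP backup can be produced by a small script that refuses to run if the two irreplaceable documents are missing. A sketch under assumptions: `create_backup` is a hypothetical helper, the exclusion list is a guess at what is safe to drop, and uploading the archive to cloud storage is left as a manual step. Note that `build/` is deliberately not excluded, because in this project it holds the spec and launcher sources.

```python
import zipfile
from datetime import datetime
from pathlib import Path

# Assumed exclusions: VCS data, caches, virtualenvs, and PyInstaller's dist/ output.
EXCLUDED_DIRS = {".git", "__pycache__", "dist", ".venv"}
REQUIRED_DOCS = {"RECOVERY.md", "DECISIONS.md"}


def create_backup(project_root, dest_dir):
    """Zip the project tree into dest_dir and return the archive path."""
    root = Path(project_root).resolve()
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive_path = dest / f"project-backup-{stamp}.zip"
    # Collect files first so the new archive can never include itself.
    files = [
        p for p in sorted(root.rglob("*"))
        if p.is_file()
        and not EXCLUDED_DIRS.intersection(p.relative_to(root).parts)
        and dest not in p.parents  # skip older backups stored inside the tree
    ]
    names = {p.relative_to(root).as_posix() for p in files}
    missing = REQUIRED_DOCS - names
    if missing:
        raise FileNotFoundError(f"refusing to back up without: {sorted(missing)}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, path.relative_to(root).as_posix())
    return archive_path
```

Running this after each meaningful change, then copying the resulting file to cloud storage, covers the secondary-backup bullet above.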

---

## 6. Recovery Priorities (if rebuilding under time pressure)

If you only have time to rebuild part of the project, this is the order:

1. **Source: `src/core/` and `scripts/`**. Without these there is no product.
2. **DECISIONS.md**. Without this you will re-litigate every settled decision (especially GUI framework, dual interface, UX standards) and probably get it wrong differently.
3. **TECHNICAL.md**, especially Sections 9 (04/06 boundary) and 10 (per-script functional requirements). Without these you will rebuild the deduplicator with weaker fuzzy matching than the v1 launch spec demands and ship something that loses to free Excel.
4. **Streamlit GUI source (`src/gui/`)**. The primary buyer surface; without it the product reverts to CLI-only and the buyer persona will refund.
5. **PyInstaller spec + launcher + per-OS build configs** (`build/`). Reproducing the Streamlit-PyInstaller integration from scratch is 1-3 days of work.
6. **Apple Developer Program enrollment**. 1-2 week lead time. If Mac distribution matters, submit the enrollment first and work through items 1-5 while it is pending.
7. **Hosted demo (`demo/streamlit_app.py`)**. Important marketing asset but not blocking for desktop sales.
8. Documentation files (USER-GUIDE, BUSINESS, README). Recoverable from memory + this guide.
9. CI config (`ci/build.yml`). Nice to have, not blocking.
