docs/goals.md

Active goals

For the operator taking over: these are the explicit work directives in play. Each goal lists where it came from, what done looks like, and how to instruct Claude to continue.

Macro-framing

Per Thor 2026-05-30: "the overall goal of the labs is to figure out which model works best for a given task. Right now, its how to generate the best variants from an uploaded portrait. After figuring things out, we need to transfer this approach over to the webapp. These are the main tasks and challenges".

Every active goal below is a task in this sense — a model-selection problem to solve, with an evidence-backed winner as the deliverable. After a winner lands, the next step is the webapp port (out of scope on this channel; tracked separately).

Candidate vendor space (per Thor 2026-05-30)

For every probe, the candidate model space includes BOTH cloud APIs (Replicate, fal, OpenAI, Anthropic, etc.) AND HuggingFace / open-weight models. The Mac mini is the exploratory probe surface for open-weight models. Bigger models won't fit locally, but that doesn't disqualify them: probe them via a cheap cloud GPU instead. Once a winner is identified, the production deployment path is AWS (SageMaker endpoint or a directly-hosted EC2 instance). Mac mini hosting is NEVER the production path. See [[local-models-in-scope]] for the constraints and the decision tree.

Concrete past example: a "lighting" task where commercial vendors didn't fit and a HuggingFace model was the answer — that's the kind of probe where this matters.

1. Realistic AI-based identity variants of an uploaded portrait

Source: Thor 2026-05-30 on C0B75KPL50U threads 1780128019.836289 / 1780131789.364859 — "goal is to ensure that after an uploaded portrait/face we can generate AI based variants of this face/portrait... very realistic AI generated pic variant so that comparing orig with variant, an amateur like me does not identify the variant as AI generated... models we generate only wear bare minimum, we dress them in another step".

Status: In flight. The v1 LoRA, trained on a 14-photo CB fixture, landed at Q1=83%, Q4=83%, but it used catalog-outfit prompts, which is the wrong baseline for downstream use. v2 is in training with stronger hyperparameters (4000 steps, rank 64) but still uses the wrong outfit prompts. v3 will fix the prompt baseline.

Pipeline framing (Thor 2026-05-30):
1. This labs probe = identity LoRA → generates body variants of an identity in bare-minimum clothing (fitted black bodysuit / athletic basics), a clean replaceable baseline.
2. Downstream FASHN tryon (existing webapp) → puts garments on the LoRA output.

The LoRA's job is identity + body + framing; the dressing pipeline's job is clothing. See [[identity-lora-vs-dressing-pipeline]] for prompt-design rules.

Done looks like (customer bar, Thor 2026-05-30): Amateur side-by-side test — comparing the original portrait and an AI-generated variant of the same person, a non-expert cannot identify which is AI-generated. Turing-test gestalt, tighter than per-axis thresholds.

Operational bar (used by the judge):
- Per-cell: Q1 ≥ 0.7 AND Q4 ≥ 0.7.
- Aggregate: Q1 pass rate ≥ 95% AND Q4 pass rate ≥ 90%.
- No cell with Q1 < 0.3 (catastrophic identity loss is a deal-breaker even at low frequency).
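One way to read the operational bar as code, as a minimal sketch and not the judge's actual implementation. It assumes Q1 and Q4 are per-cell scores in [0, 1], that a cell "passes" an axis when its score is ≥ 0.7, and that the aggregate pass rate is the fraction of cells passing. The names `Cell` and `meets_operational_bar` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cell:
    """One cell of the evaluation matrix (hypothetical container)."""
    q1: float  # Q1 score in [0, 1]; assumed to be the identity axis
    q4: float  # Q4 score in [0, 1]


PER_CELL_MIN = 0.7      # per-cell pass threshold for both Q1 and Q4
Q1_RATE_MIN = 0.95      # required fraction of cells passing Q1
Q4_RATE_MIN = 0.90      # required fraction of cells passing Q4
Q1_CATASTROPHIC = 0.3   # any cell below this on Q1 fails the whole run


def meets_operational_bar(cells: Sequence[Cell]) -> bool:
    """Return True if the run clears the operational bar (assumed reading)."""
    if not cells:
        return False  # nothing judged, nothing passed
    n = len(cells)
    q1_rate = sum(c.q1 >= PER_CELL_MIN for c in cells) / n
    q4_rate = sum(c.q4 >= PER_CELL_MIN for c in cells) / n
    no_catastrophe = all(c.q1 >= Q1_CATASTROPHIC for c in cells)
    return q1_rate >= Q1_RATE_MIN and q4_rate >= Q4_RATE_MIN and no_catastrophe
```

Under this reading, a 6-cell matrix with a single failing cell cannot clear the bar (5/6 is below 95%), and a single cell with Q1 < 0.3 fails the run regardless of the pass rates.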

To extend:
- See the run page for 20260529-cb-lora-v1; it has copy-pasteable next-step prompts.
- General pattern: "drop N photos into inputs/<identity>/, train LoRA via scripts/train_<identity>_lora.py, run 6-cell matrix, judge it."

2. Labs web surface (this thing)

Source: Thor 2026-05-30 on C0B75KPL50U thread 1780129450.301019 — "lightweight server that has the ability to display whatever you finish... key is to organise the labs web setup in a way that the one taken over my work knows where to find what and how to best instruct you to work on stuff".

Status: v0 shipped (you're looking at it).

Done looks like: Any takeover operator can land on this URL and within 30 seconds know what's in flight, where to find context, and how to ask Claude to continue. Verdict: see for yourself.

To extend: Add new run dirs under runs/. Drop a SUMMARY.md in each (template in /docs/handover/SUMMARY_TEMPLATE). The server picks them up automatically.
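For orientation, automatic pickup of this kind can be as simple as scanning runs/ for directories that contain a SUMMARY.md. The sketch below is an assumption about how such discovery might work, not the server's actual code; `discover_runs` is a hypothetical name.

```python
from pathlib import Path
from typing import List, Union


def discover_runs(root: Union[str, Path]) -> List[str]:
    """List run directory names under <root>/runs that contain a SUMMARY.md.

    Hypothetical helper: directories without a SUMMARY.md are skipped,
    and names are returned in sorted order.
    """
    runs_dir = Path(root) / "runs"
    if not runs_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in runs_dir.iterdir()
        if p.is_dir() and (p / "SUMMARY.md").is_file()
    )
```

Under this assumption, a run dir without a SUMMARY.md would simply not show up, so a missing summary is the first thing to check if a new run is not listed.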