notshot.ai labs
docs/handover/TAKEOVER.md

Labs takeover — pick up the in-flight LoRA validation

You (the Claude session continuing this work) are walking into a Replicate LoRA training that was launched 2026-05-29 18:27 UTC and a polling loop that was killed at 18:37 UTC so the local Mac could be packed up. The training is still running on Replicate's side. This doc tells you exactly how to resume + what to do when it completes.

What's running on Replicate

Training ID: ey5dzs2z9nrmw0cyej88fasczr
Trainer: ostris/flux-dev-lora-trainer:26dce37af90b9d997eeb970d92e47de3064d46c300504ae376c75bef6a9022d2
Destination model: thor-reef/ikatomi-cb-fixture-lora-v1 (private)
Hyperparams: steps=2000 · lora_rank=32 · resolution=1024 · learning_rate=4e-4 · trigger_word=TOKR · autocaption=true
Status when polling paused: processing at t+546s (~9 min in; training is ~25-30 min total → expect ~16-20 more min on Replicate)
Spend so far: ~$5 (training)
Spend remaining: ~$0.30 (6-cell inference + judge)

State file (load-bearing — don't lose this)

ikatomi-labs/runs/20260529-cb-lora-v1/training_state.json

Contains training_id, zip_url (uploaded fixture on Replicate), and will accumulate training_result and inference blocks as the script progresses. The script is idempotent against this file — if training_id is set it resumes polling instead of starting a new training. If this file is missing, do NOT run the script unmodified — it will start a fresh $5 training. Ask Thor instead.
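A minimal sketch of a pre-flight guard for the rule above, in case you want to check the file programmatically before launching. The function name resume_is_safe is hypothetical and not part of the script; the only assumption about the file is that it is a JSON object with a training_id key, as described above.

```python
import json
from pathlib import Path


def resume_is_safe(state_path: Path) -> bool:
    """Return True only if the state file exists, parses, and records a training_id."""
    if not state_path.is_file():
        return False
    try:
        state = json.loads(state_path.read_text())
    except json.JSONDecodeError:
        # A corrupt state file is as dangerous as a missing one.
        return False
    return isinstance(state, dict) and bool(state.get("training_id"))
```

If resume_is_safe(Path("runs/20260529-cb-lora-v1/training_state.json")) returns False, stop and ask Thor rather than running the script.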

The Mini gets this file via bundle-secrets.sh / restore-secrets.sh. Verify it's there before resuming:

cat ~/Projects/reef-digital/ikatomi-workspace/ikatomi-labs/runs/20260529-cb-lora-v1/training_state.json
# Should show {"zip_url": "...", "training_id": "ey5dzs2z9nrmw0cyej88fasczr"}

Resume in one command

cd ~/Projects/reef-digital/ikatomi-workspace/ikatomi-labs
.venv/bin/python -m scripts.train_cb_lora_v1

Script flow with the existing state:
1. Reads training_state.json → sees training_id → skips upload + start-training.
2. Polls Replicate every 30s until terminal state.
3. On succeeded: pulls the trained version ref from tr["output"]["version"], persists it to state.
4. Runs the 6-cell inference matrix (3 prompts × 2 seeds) at 50 inference steps / guidance 3.5 / lora_scale 1.0 / 2:3 aspect / 1024px PNG.
5. Writes each PNG to runs/20260529-cb-lora-v1/CB-LORA-{prompt}-S{seed}.png + appends to state.
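The polling step can be sketched roughly as below. This is not the script's actual code: fetch_status is a hypothetical callable standing in for whatever Replicate lookup the script performs, and the terminal set assumes the succeeded / failed / canceled statuses named in this doc.

```python
import time
from typing import Callable

# Statuses after which the training will not change state again.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() every interval_s seconds until it returns a terminal state."""
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)
```

The sleep parameter is injectable only so the loop can be exercised without waiting 30 seconds per iteration.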

Run it in the foreground OR nohup ... & disown if you want to background-poll. It writes to stdout — keep an eye on the log.

The 6-cell matrix

Prompts (defined in scripts/train_cb_lora_v1.py:PROMPTS):
- front-full: full-body fashion catalog, white tank + grey shorts, clean studio backdrop, neutral expression
- upper-front: upper-body fashion catalog, white tank, clean studio backdrop, neutral expression
- 3q-side: three-quarter side, neutral grey dress, soft natural lighting

Seeds: 42, 7301

All prompts include the trigger word TOKR (the LoRA's identity anchor).
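For orientation, the 3 × 2 matrix and its output filenames can be enumerated as in this sketch. It assumes the {prompt} slot in the filename pattern is filled with the prompt key (front-full and so on); check PROMPTS in the script before relying on the exact names. matrix_cells is a hypothetical helper, not a function in the repo.

```python
from itertools import product

PROMPT_KEYS = ["front-full", "upper-front", "3q-side"]
SEEDS = [42, 7301]


def matrix_cells(prompt_keys, seeds):
    """Enumerate every prompt/seed pair with the PNG filename it should produce."""
    return [
        {"prompt": key, "seed": seed, "filename": f"CB-LORA-{key}-S{seed}.png"}
        for key, seed in product(prompt_keys, seeds)
    ]
```

Comparing this list against the files actually present in runs/20260529-cb-lora-v1/ is a quick way to spot a cell that did not render.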

When inference completes — the dev page (mandatory before handback)

Per feedback_dev_view_per_milestone.md, no milestone counts as done without a /dev/<page> Thor can open. Build it at /dev/v1.9/lora-identity-cb.

Page should surface:
- Verdict banner (PROCEED / REJECT)
- Stat cards: identity-pass count / photographic-pass count / full-pass count (per the same axes used in identity-body page)
- Per-cell tiles with source ↔ render and click-to-expand judge reasoning
- A "training context" panel: what hyperparams, what fixture, what cost

After building: restart dev + worker per .claude/rules/always-restart-both-before-handback.md, confirm that http://localhost:3001/dev/v1.9/lora-identity-cb returns 200, and hand back with that URL as the final line.

PROCEED bar

At least 80% pass rate on BOTH Q1 (face_is_source_identity) AND Q4 (looks_photographic) across the 6 cells. With 6 cells that means at least 5 of 6 passing on each axis, since 4 of 6 is only 67%.

The judge is in pipeline/compare.py (mirrors the shape used in runs/20260529-replicate-body-multiface-v1/). Use a claude-sonnet-4-6 vision call with the 4-axis prompt.
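A sketch of how the PROCEED bar could be evaluated once the judge has scored all cells. The per-cell dict shape (boolean face_is_source_identity and looks_photographic keys) is an assumption for illustration; the real judge output in pipeline/compare.py may be shaped differently. proceed_verdict is a hypothetical name.

```python
def proceed_verdict(cells, threshold=0.8):
    """Return "PROCEED" only if both Q1 and Q4 pass rates meet the threshold."""
    if not cells:
        # No judged cells means no evidence, so never proceed.
        return "REJECT"
    n = len(cells)
    q1_rate = sum(1 for c in cells if c["face_is_source_identity"]) / n
    q4_rate = sum(1 for c in cells if c["looks_photographic"]) / n
    return "PROCEED" if q1_rate >= threshold and q4_rate >= threshold else "REJECT"
```

Each axis is checked independently, so a run with 6/6 photographic cells but only 4/6 identity cells is still a REJECT.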

Hard nos (don't do these without Thor's ack)

Why this work exists (one-paragraph background)

The v1.9 wizard's "upload 1 portrait → render full-body model" arc hit a hard architectural ceiling. 6 vendor probes (FASHN model-create ×4 cycles, fal PhotoMaker, fal InstantID, fal FLUX-PuLID, fal IPAFID, Replicate PuLID) all topped out at 8–56% identity preservation. Root cause: embedding-conditioned models give identity-themed not identity-preserving output — ArcFace embeddings encode similarity-class, FLUX picks the fashion-prior-likely face that maximizes similarity, distinctive features get lost. The pivot is per-customer LoRA fine-tune (Botika / CreatorKit / Vmodel / Synthesia model). 10–15 photos at signup → train LoRA ~20 min → identity baked into the weights → photographic + identity-locked output thereafter. The Cate Blanchett fixture validates this architecture before scoping the webapp port. Full visual evidence: /dev/v1.9/identity-body on a running webapp.

Fixture origin (for context if you need to re-train)

14 CC-licensed Wikimedia Commons photos of Cate Blanchett curated for varied events (Cannes 2026, Berlinale 2023, Camerimage 2024, TIFF 2024, Venice 2024). Dropped: group shots, sunglasses photos, near-duplicates. Lives at ikatomi-labs/inputs/lora-cb-fixture/cb_*.jpg + zipped as cb-fixture.zip (83 MB). The zip URL on Replicate is already in training_state.json — no need to re-upload. Labs validation only, never reaches customers.

If training failed

Read training_state.json:training_result.status. If failed or canceled, surface to Thor with the error message from training_result.error. Don't auto-retry.
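The check above can be sketched as a small helper that turns the state dict into a message for Thor. It assumes training_result is a JSON object carrying status and error keys, as this section implies; failure_summary is a hypothetical name and deliberately does nothing that could trigger a retry.

```python
def failure_summary(state: dict):
    """Return a one-line failure message, or None if the training did not fail."""
    result = state.get("training_result") or {}
    status = result.get("status")
    if status in ("failed", "canceled"):
        error = result.get("error") or "no error message recorded"
        return f"Training {status}: {error}"
    return None
```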

Auto-memory equivalent

This doc has a memory mirror at ~/.claude/projects/-Users-thor-Projects-reef-digital-ikatomi-workspace/memory/project_cb_lora_in_flight.md — same content, indexed in MEMORY.md. Auto-loads on relevance in any session. The repo doc is the persistent + human-readable copy; the memory is the auto-loaded copy.