docs/handover/TAKEOVER.md

Labs takeover — pick up the in-flight LoRA validation

You (the Claude session continuing this work) are walking into a Replicate LoRA training that was launched 2026-05-29 18:27 UTC and a polling loop that was killed at 18:37 UTC so the local Mac could be packed up. The training is still running on Replicate's side. This doc tells you exactly how to resume and what to do when it completes.

What's running on Replicate


| Field | Value |
| --- | --- |
| Training ID | `ey5dzs2z9nrmw0cyej88fasczr` |
| Trainer | `ostris/flux-dev-lora-trainer:26dce37af90b9d997eeb970d92e47de3064d46c300504ae376c75bef6a9022d2` |
| Destination model | `thor-reef/ikatomi-cb-fixture-lora-v1` (private) |
| Hyperparams | steps=2000 · lora_rank=32 · resolution=1024 · learning_rate=4e-4 · trigger_word=`TOKR` · autocaption=true |
| Status when polling paused | `processing` at t+546s (~9 min in; training is ~25-30 min total → expect ~16-21 more min on Replicate) |
| Spend so far | ~$5 (training) |
| Spend remaining | ~$0.30 (6-cell inference + judge) |

State file (load-bearing — don't lose this)

ikatomi-labs/runs/20260529-cb-lora-v1/training_state.json

Contains training_id, zip_url (uploaded fixture on Replicate), and will accumulate training_result and inference blocks as the script progresses. The script is idempotent against this file — if training_id is set it resumes polling instead of starting a new training. If this file is missing, do NOT run the script unmodified — it will start a fresh $5 training. Ask Thor instead.

The Mini gets this file via bundle-secrets.sh → restore-secrets.sh. Verify it's there before resuming:

cat ~/Projects/reef-digital/ikatomi-workspace/ikatomi-labs/runs/20260529-cb-lora-v1/training_state.json
# Should show {"zip_url": "...", "training_id": "ey5dzs2z9nrmw0cyej88fasczr"}
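
A minimal pre-flight sketch of the check above, assuming only what this doc states about the file (its path and the expected training_id). `check_state` is a hypothetical helper, not something that exists in the repo, and the script's own resume logic may differ.

```python
import json
from pathlib import Path

# Path and ID are copied from this doc; adjust if the workspace lives elsewhere.
STATE_PATH = (
    Path.home()
    / "Projects/reef-digital/ikatomi-workspace/ikatomi-labs"
    / "runs/20260529-cb-lora-v1/training_state.json"
)
EXPECTED_TRAINING_ID = "ey5dzs2z9nrmw0cyej88fasczr"


def check_state(path: Path, expected_id: str = EXPECTED_TRAINING_ID) -> tuple[bool, str]:
    """Return (safe_to_resume, reason). Hypothetical helper for illustration."""
    if not path.is_file():
        return False, "state file missing: do not run the script, ask Thor"
    try:
        state = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return False, f"state file is not valid JSON: {exc}"
    if not isinstance(state, dict):
        return False, "state file is not a JSON object"
    training_id = state.get("training_id")
    if not training_id:
        return False, "no training_id: the script would start a fresh training"
    if training_id != expected_id:
        return False, f"unexpected training_id {training_id!r}"
    return True, "training_id present: the script should resume polling"


if __name__ == "__main__":
    ok, reason = check_state(STATE_PATH)
    print("OK" if ok else "STOP", "-", reason)
```

Run it before the resume command; anything other than `OK` means stop and ask rather than launch the script.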

Resume in one command

cd ~/Projects/reef-digital/ikatomi-workspace/ikatomi-labs
.venv/bin/python -m scripts.train_cb_lora_v1

Script flow with the existing state:

1. Reads training_state.json → sees training_id → skips upload + start-training.
2. Polls Replicate every 30s until terminal state.
3. On succeeded: pulls the trained version ref from tr["output"]["version"], persists it to state.
4. Runs the 6-cell inference matrix (3 prompts × 2 seeds) at 50 inference steps / guidance 3.5 / lora_scale 1.0 / 2:3 aspect / 1024px PNG.
5. Writes each PNG to runs/20260529-cb-lora-v1/CB-LORA-{prompt}-S{seed}.png + appends to state.
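
The polling and version-extraction steps can be sketched as below. This is an illustration under stated assumptions, not the script's actual code: `poll_until_terminal` and `trained_version` are hypothetical names, and `fetch` stands in for whatever call returns the training record as a dict. The terminal statuses are the ones Replicate documents for trainings.

```python
import time
from typing import Callable

# Terminal statuses for a Replicate training.
TERMINAL = {"succeeded", "failed", "canceled"}


def poll_until_terminal(
    fetch: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the returned record has a terminal status."""
    for _ in range(max_polls):
        record = fetch()
        if record.get("status") in TERMINAL:
            return record
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_polls} polls")


def trained_version(record: dict) -> str:
    """Extract the trained version ref, mirroring tr["output"]["version"]."""
    if record.get("status") != "succeeded":
        raise RuntimeError(
            f"training ended as {record.get('status')!r}: {record.get('error')}"
        )
    return record["output"]["version"]
```

Injecting `sleep` keeps the loop testable without waiting 30 seconds per poll; the `max_polls` cap is an added safety bound, not something the doc specifies.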

Run it in the foreground, or wrap it in nohup ... & disown to poll in the background. It writes progress to stdout, so when backgrounding, redirect stdout to a log file and keep an eye on it.

The 6-cell matrix

Prompts (defined in scripts/train_cb_lora_v1.py:PROMPTS):

- front-full: full-body fashion catalog, white tank + grey shorts, clean studio backdrop, neutral expression
- upper-front: upper-body fashion catalog, white tank, clean studio backdrop, neutral expression
- 3q-side: three-quarter side, neutral grey dress, soft natural lighting

Seeds: 42, 7301

All prompts include the trigger word TOKR (the LoRA's identity anchor).
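
For reference, the 3 × 2 matrix and its output filenames can be enumerated as in the sketch below. It assumes the `{prompt}` slot in the filename pattern is the short prompt key (front-full, upper-front, 3q-side); check PROMPTS in the script before relying on that. `matrix_cells` is a hypothetical name.

```python
from itertools import product

# Keys and seeds as listed in this doc; the real prompt text lives in the script.
PROMPT_KEYS = ["front-full", "upper-front", "3q-side"]
SEEDS = [42, 7301]
RUN_DIR = "runs/20260529-cb-lora-v1"


def matrix_cells() -> list[dict]:
    """Enumerate every (prompt, seed) cell with its expected PNG path."""
    return [
        {
            "prompt": key,
            "seed": seed,
            # Assumes {prompt} in the filename pattern is the prompt key.
            "path": f"{RUN_DIR}/CB-LORA-{key}-S{seed}.png",
        }
        for key, seed in product(PROMPT_KEYS, SEEDS)
    ]


if __name__ == "__main__":
    for cell in matrix_cells():
        print(cell["path"])
```

Comparing this list against the files actually present in the run directory is a quick way to spot a cell that failed to render.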

When inference completes — the dev page (mandatory before handback)

Per feedback_dev_view_per_milestone.md, no milestone counts as done without a /dev/<page> Thor can open. Build:

Page: ikatomi-webapp/app/dev/v1.9/lora-identity-cb/page.tsx
Mirror the shape of: ikatomi-webapp/app/dev/v1.9/identity-body/page.tsx (already shipped — read it first)
Asset dir: ikatomi-webapp/public/dev/v1.9/lora-identity-cb-v1/ — copy the 6 PNGs + 14 source fixtures + training_state.json here

Page should surface:

- Verdict banner (PROCEED / REJECT)
- Stat cards: identity-pass count / photographic-pass count / full-pass count (per the same axes used in identity-body page)
- Per-cell tiles with source ↔ render and click-to-expand judge reasoning
- A "training context" panel: what hyperparams, what fixture, what cost

After building: restart dev + worker per .claude/rules/always-restart-both-before-handback.md, probe http://localhost:3001/dev/v1.9/lora-identity-cb returns 200, hand back with that URL as the final line.

PROCEED bar

≥ 80% pass rate on BOTH Q1 (face_is_source_identity) AND Q4 (looks_photographic) across the 6 cells. With 6 cells, that means at least 5 passes on each axis (4/6 is 67%; 5/6 is 83%).

Q1 < 80%: identity drift — the LoRA didn't learn the face strongly enough. Options: more training steps, higher lora_rank, more fixture photos. Surface to Thor before any of those.
Q4 < 80%: AI-plastic skin / softbox lighting — the FLUX prior is winning. Pipe outputs through Magnific photorealistic upscale and re-judge.
Both ≥ 80%: surface to Thor with the dev page URL and ask for the webapp port plan.
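
The bar can be expressed as a small function. This is a sketch that assumes the judge yields one boolean per axis per cell, keyed by the axis names above; the actual output shape of pipeline/compare.py is not specified in this doc, and `pass_rate` and `verdict` are hypothetical names.

```python
Q1 = "face_is_source_identity"
Q4 = "looks_photographic"


def pass_rate(cells: list[dict], axis: str) -> float:
    """Fraction of cells whose judge result for `axis` is truthy."""
    return sum(1 for cell in cells if cell[axis]) / len(cells)


def verdict(cells: list[dict], threshold: float = 0.80) -> dict:
    """PROCEED only if both Q1 and Q4 meet the threshold."""
    if not cells:
        raise ValueError("no judged cells")
    q1 = pass_rate(cells, Q1)
    q4 = pass_rate(cells, Q4)
    ok = q1 >= threshold and q4 >= threshold
    return {"q1_rate": q1, "q4_rate": q4, "verdict": "PROCEED" if ok else "REJECT"}
```

The rates are returned alongside the verdict so the dev page's stat cards can read from the same result.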

The judge is in pipeline/compare.py (mirrors the shape used in runs/20260529-replicate-body-multiface-v1/). Use a claude-sonnet-4-6 vision call with the 4-axis prompt.

Hard nos (don't do these without Thor's ack)

❌ Don't start a second training run. It costs another $5 and burns Thor's trust in cost discipline.
❌ Don't port to webapp before Thor reviews the dev page (feedback_labs_results_need_thor_review_before_port.md).
❌ Don't probe more vendors. The embedding-conditioned vendor family (FASHN, PhotoMaker, InstantID, PuLID, FLUX-PuLID, IPAFID) has been exhaustively probed at the dev page /dev/v1.9/identity-body. All DOES-NOT-WORK. The LoRA path IS the architectural decision.
❌ Don't claim the LoRA looks great from your own eyes. Visual judgment already burned us on 2026-05-29: "outputs look dramatically better" was wrong, and the judge then flagged 11/12 as failing. Thor reviews the dev page, not your assessment.
❌ Don't extend the matrix beyond the 6 planned cells without ack — that's feedback_single_look_personal_ack_before_extending.md.

Why this work exists (one-paragraph background)

The v1.9 wizard's "upload 1 portrait → render full-body model" arc hit a hard architectural ceiling. 6 vendor probes (FASHN model-create ×4 cycles, fal PhotoMaker, fal InstantID, fal FLUX-PuLID, fal IPAFID, Replicate PuLID) all topped out at 8–56% identity preservation. Root cause: embedding-conditioned models give identity-themed not identity-preserving output — ArcFace embeddings encode similarity-class, FLUX picks the fashion-prior-likely face that maximizes similarity, distinctive features get lost. The pivot is per-customer LoRA fine-tune (Botika / CreatorKit / Vmodel / Synthesia model). 10–15 photos at signup → train LoRA ~20 min → identity baked into the weights → photographic + identity-locked output thereafter. The Cate Blanchett fixture validates this architecture before scoping the webapp port. Full visual evidence: /dev/v1.9/identity-body on a running webapp.

Fixture origin (for context if you need to re-train)

14 CC-licensed Wikimedia Commons photos of Cate Blanchett curated for varied events (Cannes 2026, Berlinale 2023, Camerimage 2024, TIFF 2024, Venice 2024). Dropped: group shots, sunglasses photos, near-duplicates. Lives at ikatomi-labs/inputs/lora-cb-fixture/cb_*.jpg + zipped as cb-fixture.zip (83 MB). The zip URL on Replicate is already in training_state.json — no need to re-upload. Labs validation only, never reaches customers.

If training failed

Read training_state.json:training_result.status. If failed or canceled, surface to Thor with the error message from training_result.error. Don't auto-retry.
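
A sketch of that triage, assuming training_result is a JSON object with status and error keys as referenced above. `training_outcome` is a hypothetical helper; it only reads the state file and never retries anything.

```python
import json
from pathlib import Path


def training_outcome(state_path: str) -> str:
    """Summarize training_result from the state file without side effects."""
    state = json.loads(Path(state_path).read_text())
    result = state.get("training_result") or {}
    status = result.get("status")
    if status is None:
        return "pending: no training_result recorded yet"
    if status in {"failed", "canceled"}:
        # Surface the error verbatim; do not auto-retry.
        return f"surface to Thor: {status}: {result.get('error')}"
    return f"status: {status}"
```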

Auto-memory equivalent

This doc has a memory mirror at ~/.claude/projects/-Users-thor-Projects-reef-digital-ikatomi-workspace/memory/project_cb_lora_in_flight.md — same content, indexed in MEMORY.md. Auto-loads on relevance in any session. The repo doc is the persistent + human-readable copy; the memory is the auto-loaded copy.