Superseded by 012-intent-cli.md.
Decision: Safe-by-Construction Reconciliation
Principle: Safe-by-construction operations
Context
braid apply originally mixed two fundamentally different operation classes:
- One-time destructive initialization: cryptsetup luksFormat destroys all data on the target device. It is not idempotent; running it twice destroys a working LUKS volume.
- Repeatable reconciliation: cryptsetup luksOpen, btrfs device add/remove, balance, verify. These are safe to run repeatedly.
The structural hazard is state ambiguity: if apply cannot tell a temporarily absent disk from a truly new one, that ambiguity can route execution toward formatting. A disk that was unplugged and replugged could be misidentified as “new” and reformatted, destroying its data.
Options considered
- Registry-based: track disk identity in a persistent registry to distinguish “new” disks from “returning” ones. This adds hidden state, drift risk, and recovery complexity.
- Config flags: add lifecycle state (new/existing/replace) to the NixOS config. This violates the declarative end-state principle, because the config becomes imperative.
- Structural separation: move destructive operations to a separate command that requires explicit operator intent, so that apply physically cannot format.
Decision
Option 3 (structural separation): a hard boundary between destructive initialization and safe reconciliation.
Architecture
- braid init-disk <by-id>: the only command that may call cryptsetup luksFormat. It requires the disk to be declared in config, not already LUKS-formatted (unless --force is given), and not in an active pool. It enforces the single-passphrase invariant.
- braid plan / braid apply: reconciliation only. They emit OPEN_LUKS (a non-destructive open), never luksFormat. Non-LUKS disks produce an INIT_REQUIRED warning telling the operator to run init-disk first.
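The init-disk guard can be pictured as a pure precondition check that runs before any call to cryptsetup luksFormat. This is a minimal sketch, not braid's actual code: DiskState, InitRefusal, check_init_preconditions and the passphrase_matches_pool input are hypothetical names, and it assumes --force relaxes only the already-LUKS check, which is how the text above reads.

```rust
/// Hypothetical snapshot of a disk as `braid init-disk` might see it.
#[derive(Debug, Clone)]
pub struct DiskState {
    pub by_id: String,            // /dev/disk/by-id name given on the command line
    pub declared_in_config: bool, // listed in the NixOS config
    pub has_luks_header: bool,    // already LUKS-formatted
    pub in_active_pool: bool,     // currently a member of the btrfs pool
}

/// Hypothetical reasons init-disk refuses to format.
#[derive(Debug, PartialEq, Eq)]
pub enum InitRefusal {
    NotDeclared,
    InActivePool,
    AlreadyLuks,
    PassphraseMismatch,
}

/// Returns Ok(()) only when formatting `disk` is allowed.
/// `force` bypasses only the already-LUKS check, never the pool check.
/// `passphrase_matches_pool` is a stand-in for however the
/// single-passphrase invariant is actually verified (assumption).
pub fn check_init_preconditions(
    disk: &DiskState,
    force: bool,
    passphrase_matches_pool: bool,
) -> Result<(), InitRefusal> {
    if !disk.declared_in_config {
        return Err(InitRefusal::NotDeclared);
    }
    // Checked before the LUKS header so --force can never reach a pooled disk.
    if disk.in_active_pool {
        return Err(InitRefusal::InActivePool);
    }
    if disk.has_luks_header && !force {
        return Err(InitRefusal::AlreadyLuks);
    }
    if !passphrase_matches_pool {
        return Err(InitRefusal::PassphraseMismatch);
    }
    Ok(())
}
```

Ordering the pool check ahead of the header check is the design point: even an operator who passes --force cannot reformat a disk that is currently serving data.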
Hard boundary enforcement
cryptsetup luksFormat is forbidden in the plan/apply code path. This is verified by:
- Code inspection: no luksFormat call exists in compute_plan(), executor dispatch, or any function reachable from cmd_apply().
- Test assertion: braid-apply.py includes an explicit test asserting that apply never contains luksFormat.
- Action removal: the ADD_DISK_LUKS_FORMAT_OPEN action type has been removed from the planner and executor entirely.
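One way to make the boundary hold by construction is to give the reconciliation executor an action type with no formatting variant and render every action through one exhaustive match. The sketch below assumes that shape; Action, to_argv and the sample paths are hypothetical, while the rendered command lines (cryptsetup luksOpen, btrfs device add/remove, btrfs balance start) are real CLI forms.

```rust
/// Hypothetical reconciliation actions emitted by plan/apply.
/// There is deliberately no variant that formats a device.
#[derive(Debug, Clone)]
pub enum Action {
    OpenLuks { device: String, mapper_name: String },
    AddDevice { device: String, mountpoint: String },
    RemoveDevice { device: String, mountpoint: String },
    Balance { mountpoint: String },
}

/// Renders an action to the argv the executor would run.
/// The match is exhaustive, so adding a variant forces its command
/// line to be written here, where review and tests can see it.
pub fn to_argv(action: &Action) -> Vec<String> {
    let parts: Vec<&str> = match action {
        Action::OpenLuks { device, mapper_name } => {
            vec!["cryptsetup", "luksOpen", device.as_str(), mapper_name.as_str()]
        }
        Action::AddDevice { device, mountpoint } => {
            vec!["btrfs", "device", "add", device.as_str(), mountpoint.as_str()]
        }
        Action::RemoveDevice { device, mountpoint } => {
            vec!["btrfs", "device", "remove", device.as_str(), mountpoint.as_str()]
        }
        Action::Balance { mountpoint } => vec!["btrfs", "balance", "start", mountpoint.as_str()],
    };
    parts.into_iter().map(String::from).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Mirrors the "apply never contains luksFormat" assertion.
    #[test]
    fn apply_never_emits_luks_format() {
        let samples = [
            Action::OpenLuks { device: "/dev/sdx".into(), mapper_name: "m".into() },
            Action::AddDevice { device: "/dev/mapper/m".into(), mountpoint: "/mnt".into() },
            Action::RemoveDevice { device: "/dev/mapper/m".into(), mountpoint: "/mnt".into() },
            Action::Balance { mountpoint: "/mnt".into() },
        ];
        for action in &samples {
            assert!(to_argv(action).iter().all(|arg| arg != "luksFormat"));
        }
    }
}
```

The type does the real work here; the test is a second line of defence that fails loudly if someone later reintroduces a formatting variant.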
Missing-disk policy
Absent configured disks are skipped with a DISK_ABSENT_SKIPPED warning. The plan remains applicable — other safe operations proceed. This prevents a temporarily disconnected disk from blocking all reconciliation.
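The skip-and-warn policy means each configured disk is planned independently, so one absent disk contributes a warning rather than an error. A minimal sketch under that assumption; Presence, Warning, Action, Plan and plan_disks are hypothetical names, and the warning/action variants simply mirror DISK_ABSENT_SKIPPED, INIT_REQUIRED and OPEN_LUKS from the text.

```rust
/// Hypothetical per-disk observation made by the planner.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Presence {
    Absent,
    PresentNonLuks,
    PresentLuks { opened: bool },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Warning {
    DiskAbsentSkipped(String), // DISK_ABSENT_SKIPPED
    InitRequired(String),      // INIT_REQUIRED
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    OpenLuks(String), // OPEN_LUKS: non-destructive open
}

#[derive(Debug, Default)]
pub struct Plan {
    pub actions: Vec<Action>,
    pub warnings: Vec<Warning>,
}

/// Plans every configured disk on its own: an absent disk only adds a
/// warning, so actions for the disks that are present still go ahead.
pub fn plan_disks(disks: &[(String, Presence)]) -> Plan {
    let mut plan = Plan::default();
    for (id, presence) in disks {
        match presence {
            Presence::Absent => plan.warnings.push(Warning::DiskAbsentSkipped(id.clone())),
            Presence::PresentNonLuks => plan.warnings.push(Warning::InitRequired(id.clone())),
            Presence::PresentLuks { opened: false } => plan.actions.push(Action::OpenLuks(id.clone())),
            Presence::PresentLuks { opened: true } => {} // already open: nothing to do
        }
    }
    plan
}
```

Note that no branch produces a formatting action: a non-LUKS disk only yields INIT_REQUIRED, leaving the decision to init-disk.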
Missing pool devices (devices in btrfs but not in config) require explicit operator intent to evict: the --allow-remove-missing flag plus the environment variable BRAID_CONFIRM='remove missing device from pool'. This prevents a temporarily absent disk from being evicted by accident.
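The two-factor gate (a flag plus a typed confirmation phrase) can be kept as a small pure function so it is easy to test, with a thin wrapper that reads the environment. A sketch assuming an exact, case-sensitive match on the phrase; GateError, authorize_remove_missing and authorize_from_env are hypothetical names, and std::env::var is the standard-library call.

```rust
/// Confirmation phrase quoted in the policy above.
pub const REMOVE_MISSING_PHRASE: &str = "remove missing device from pool";

/// Hypothetical reasons the eviction gate stays closed.
#[derive(Debug, PartialEq, Eq)]
pub enum GateError {
    FlagMissing,     // --allow-remove-missing not passed
    ConfirmMismatch, // BRAID_CONFIRM unset or not the exact phrase
}

/// Pure check: both the flag and the exact phrase are required.
pub fn authorize_remove_missing(flag: bool, confirm: Option<&str>) -> Result<(), GateError> {
    if !flag {
        return Err(GateError::FlagMissing);
    }
    match confirm {
        Some(phrase) if phrase == REMOVE_MISSING_PHRASE => Ok(()),
        _ => Err(GateError::ConfirmMismatch),
    }
}

/// Thin wrapper that reads BRAID_CONFIRM from the process environment.
pub fn authorize_from_env(flag: bool) -> Result<(), GateError> {
    let confirm = std::env::var("BRAID_CONFIRM").ok();
    authorize_remove_missing(flag, confirm.as_deref())
}
```

Requiring a full sentence rather than a yes/no answer makes the intent hard to supply by reflex or by a stale shell alias.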
Device identity is established by LUKS UUID, not by path or mapper name. When a config disk is absent, its UUID is unknowable, creating identity ambiguity for removal decisions. If the planner wants to remove a pool device but cannot verify it doesn’t match an absent config disk, the plan is blocked with IDENTITY_AMBIGUOUS_ABSENT_DISK. The operator can override with --allow-remove-ambiguous plus BRAID_CONFIRM='remove despite ambiguous identity'.
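The ambiguity rule can be sketched as a three-way decision on a pool device's LUKS UUID. This assumes a pool device is kept when it matches a present config disk, and that any absent config disk makes a non-matching device ambiguous; ConfigDisk, RemovalDecision, decide_removal and the override_confirmed input (standing for --allow-remove-ambiguous plus the matching BRAID_CONFIRM phrase) are hypothetical.

```rust
/// Hypothetical view of a configured disk: an absent disk has no readable UUID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfigDisk {
    Present { luks_uuid: String },
    Absent,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RemovalDecision {
    Keep,                  // matches a present config disk
    RemovalCandidate,      // not ambiguous; other gates (e.g. --allow-remove-missing) still apply
    Blocked(&'static str), // identity cannot be ruled out
}

/// Decides whether the pool device with `pool_uuid` may be considered for removal.
/// `override_confirmed` stands for --allow-remove-ambiguous plus the
/// matching BRAID_CONFIRM phrase, already checked by the caller.
pub fn decide_removal(pool_uuid: &str, config: &[ConfigDisk], override_confirmed: bool) -> RemovalDecision {
    let matches_present = config
        .iter()
        .any(|d| matches!(d, ConfigDisk::Present { luks_uuid } if luks_uuid == pool_uuid));
    if matches_present {
        return RemovalDecision::Keep;
    }
    // An absent config disk might be this very device; its UUID is unknowable.
    let any_absent = config.iter().any(|d| matches!(d, ConfigDisk::Absent));
    if any_absent && !override_confirmed {
        return RemovalDecision::Blocked("IDENTITY_AMBIGUOUS_ABSENT_DISK");
    }
    RemovalDecision::RemovalCandidate
}
```

The conservative default follows from the fact that the planner cannot prove a negative about a disk it cannot read.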
Resume strictness
A fresh apply tolerates absent disks (skip and warn). Checkpointed in-flight actions, however, are strict: if a pending action’s target becomes absent during --resume, the apply fails with RESUME_TARGET_MISSING. The checkpoint is preserved so the apply can be retried once the target is restored.
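The asymmetry between fresh and resumed runs can be illustrated with a resume step that verifies every pending target before doing anything and clears the checkpoint only on success. A sketch assuming targets are identified by LUKS UUID and checked up front; Checkpoint, PendingAction, ResumeError and resume are hypothetical names, and actual action execution is elided.

```rust
use std::collections::HashSet;

/// Hypothetical checkpointed action, identified by its target's LUKS UUID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingAction {
    pub target_uuid: String,
    pub description: String,
}

#[derive(Debug)]
pub struct Checkpoint {
    pub pending: Vec<PendingAction>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResumeError {
    ResumeTargetMissing { target_uuid: String }, // RESUME_TARGET_MISSING
}

/// Resumes from `checkpoint`. Unlike a fresh apply, a missing target is an
/// error, and the checkpoint is left untouched so a later retry can succeed.
pub fn resume(checkpoint: &mut Option<Checkpoint>, present: &HashSet<String>) -> Result<usize, ResumeError> {
    let cp = match checkpoint.as_ref() {
        Some(cp) => cp,
        None => return Ok(0), // nothing in flight
    };
    for action in &cp.pending {
        if !present.contains(&action.target_uuid) {
            return Err(ResumeError::ResumeTargetMissing {
                target_uuid: action.target_uuid.clone(),
            });
        }
    }
    let completed = cp.pending.len();
    // (executing the pending actions is elided in this sketch)
    *checkpoint = None; // cleared only after every target was verified
    Ok(completed)
}
```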
Constraint
Two commands (init-disk + apply) instead of one. Operators must explicitly initialize each new disk before reconciliation can include it. This is the minimum viable approach given that LUKS formatting is destructive and non-idempotent.
Revisit trigger
If NixOS or LUKS ever provides an idempotent “ensure formatted” primitive that is safe to run on an already-formatted device, the separation can be revisited.
See
- cli/src/: Rust CLI (init-disk, plan, apply, status)
- 002-config-first-workflow.md: config-first principle
- 008-unified-cli.md: plan/apply architecture and action types