Active – Supersedes `008-unified-cli.md` and `011-two-phase-apply.md`.
Intent CLI
Context
Braid’s plan/apply reconciliation engine was over-engineered for NAS drives, which have ~4 events in their lifetime (create pool, add disk, add another, replace a dead one). The generic reconciler created problems:
- Risk flattening: a routine reboot and adding a disk produced the same output format (a “plan” with “actions”).
- Combinatorial complexity: `--allow-remove-missing`, `--allow-remove-ambiguous`, `BRAID_CONFIRM='phrase1;phrase2'`.
- Ceremony for routine operations: `braid apply` after every reboot.
Decision
Replace plan/apply with five intent commands:
| Command | Purpose | Risk |
|---|---|---|
| `braid add <name=by_id>...` | Format + join pool, or recover an identity-verified LUKS device | Destructive (new disk), safe (returning braid disk with matching FSID), or refused (non-braid LUKS, foreign pool, no pool to verify) |
| `braid remove <name>` | Migrate data off a present disk, detach it from the pool | Long-running |
| `braid remove-missing --missing-id <devid>` | Clean up a stale missing-device entry; restores RAID1 profiles if this clears the last missing device | Long-running |
| `braid replace --old <name> --new <name=by_id>` | Replace a disk (live or dead) using `btrfs replace start`; on the missing-disk path, restores RAID1 profiles when clearing the last missing device | In-place swap (preserves devid) |
| `braid status` | Display pool health and disk info | Read-only |
Disk keys
Disk membership is CLI-owned runtime state in `/var/lib/braid/pool.json` (see `017-runtime-disk-membership.md`). `pool.json` is keyed by LUKS UUID; the disk name is stored as presentation metadata. Disks are added with `name=by_id` syntax:

    braid add toshiba=/dev/disk/by-id/ata-Toshiba_MN07_XXXX \
              ironwolf=/dev/disk/by-id/ata-Ironwolf_ST12_YYYY

Mapper names are `braid-<name>` (e.g., `braid-toshiba`): human-friendly, debuggable in `lsblk` and systemd logs, and deterministic. They are runtime handles, not persistent identity.
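A minimal Rust sketch of how a `name=by_id` argument could be split and turned into a mapper handle. `DiskSpec`, `parse_disk_spec`, and `mapper_name` are hypothetical names. The allowed name characters and the strict `/dev/disk/by-id/` prefix check are assumptions, not braid's documented rules. The mapper string is only a runtime handle; identity stays with the LUKS UUID in `pool.json`.

```rust
/// A parsed `name=by_id` argument for `braid add` (hypothetical type).
#[derive(Debug, PartialEq)]
struct DiskSpec {
    name: String,
    by_id: String,
}

const BY_ID_PREFIX: &str = "/dev/disk/by-id/";

/// Hypothetical parser: splits on the first '=' and requires a non-empty
/// name plus a stable /dev/disk/by-id/ path. The name character set is an
/// assumption for illustration only.
fn parse_disk_spec(arg: &str) -> Result<DiskSpec, String> {
    let (name, path) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected name=by_id, got {:?}", arg))?;
    let name_ok = !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if !name_ok {
        return Err(format!("invalid disk name {:?}", name));
    }
    // Reject kernel names like /dev/sdb, which are not stable across boots.
    if !path.starts_with(BY_ID_PREFIX) || path.len() == BY_ID_PREFIX.len() {
        return Err(format!("{:?} is not a {} path", path, BY_ID_PREFIX));
    }
    Ok(DiskSpec {
        name: name.to_string(),
        by_id: path.to_string(),
    })
}

/// Runtime mapper handle derived from the disk name. Persistent identity is
/// the LUKS UUID recorded in pool.json, not this string.
fn mapper_name(name: &str) -> String {
    format!("braid-{}", name)
}

fn main() {
    for arg in [
        "toshiba=/dev/disk/by-id/ata-Toshiba_MN07_XXXX",
        "ironwolf=/dev/sdb",
    ] {
        match parse_disk_spec(arg) {
            Ok(spec) => println!("{} -> /dev/mapper/{}", spec.by_id, mapper_name(&spec.name)),
            Err(e) => eprintln!("rejected: {}", e),
        }
    }
}
```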
Safety model
The old architecture used a structural code boundary: `luksFormat` was literally unreachable from `apply`. The new architecture replaces this with:
- Explicit operator intent: the user specifies a disk key and confirms.
- Layered identity check for existing LUKS devices:
  - a. The LUKS UUID is the persistent identity. The LUKS label `braid-<key>` is an adoption-safety gate for returning disks; non-braid LUKS is refused outright.
  - b. The pool must be mounted. Bootstrap refuses existing LUKS because there is no pool to verify against.
  - c. The opened mapper's btrfs FSID must match the current pool; foreign-pool disks are refused.
  - d. Braid-labeled LUKS with no btrfs superblock is refused. This state is ambiguous (clean eviction, partial init, manual wipe, stale data) and cannot be distinguished without tombstones.
  - e. A braid-labeled LUKS disk with a btrfs superblock whose FSID matches the mounted pool may be accepted as a returned-disk add target. The add journal records the LUKS UUID before mutation. If the stale btrfs signature would block `btrfs device add`, braid runs only `wipefs --all --types btrfs` on the verified mapper and uses `btrfs device add -f`.
  - f. The superblock guard is defense-in-depth on the FSID-matching path for existing-LUKS adds. The bootstrap path accepts only disks classified as fresh non-LUKS during add planning, and the LUKS open helpers verify that any pre-existing `braid-<key>` mapper is backed by the requested by-id disk before pool creation proceeds. `mkfs.btrfs` itself is invoked without `-f`, so its own signature check is the final fail-closed guard.
- Unified confirmation with device context: all mutating commands (`add`, `remove`, `remove-missing`, `replace`) show a rich device-info block (model, size, and serial via `lsblk`) and confirm with `Type 'yes' to continue:`. Degraded-path warnings are informational text, not special confirmation phrases. `--yes` skips the prompt for scripting.
- Disk name immutability: mutating commands validate names against recorded disk identity and reject renaming a disk or reassigning a name to a different disk. Operators must use explicit `replace` or `remove` + `add` workflows instead of renaming.
- Journal-protected mutations: mutating commands write `pending-op.json` before the first irreversible step. It is cleared only after the full operation (including follow-up work such as a soft balance) succeeds. Existing-pool `add`, `replace`, and `remove-missing` journals are phased. Their `PoolMutation` phases may reconcile whether the primary btrfs membership mutation committed. Their post-maintenance phases may only validate committed membership, repair `pool.json`, and finish owed resize/balance work. On any error exit, the journal persists so that `braid recover` can resume.
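The layered identity check (a–e) can be read as a short decision function over facts already gathered about the disk. The Rust sketch below assumes those facts are collected elsewhere (label read, mapper opened, FSID probed). `ExistingLuks`, `AddDecision`, and `classify_existing_luks` are hypothetical names, not braid's API. One rule is an assumption inferred from the name-immutability rule: a `braid-` label for a *different* key is refused.

```rust
/// Outcome of the layered identity check for an existing-LUKS `braid add`.
#[derive(Debug, PartialEq)]
enum AddDecision {
    /// Returned braid disk: journal the LUKS UUID, then `btrfs device add`
    /// (preceded by `wipefs --all --types btrfs` and using `-f` if a stale
    /// signature would block the add).
    AcceptReturning { luks_uuid: String, wipe_stale_signature: bool },
    Refuse(&'static str),
}

/// Facts gathered about the target disk (hypothetical shape).
struct ExistingLuks<'a> {
    luks_uuid: &'a str,
    label: Option<&'a str>,
    /// btrfs FSID read through the opened mapper; None if no superblock.
    mapper_fsid: Option<&'a str>,
    /// Whether the stale btrfs signature would make a plain `btrfs device add` fail.
    signature_blocks_add: bool,
}

fn classify_existing_luks(
    key: &str,
    disk: &ExistingLuks,
    mounted_pool_fsid: Option<&str>,
) -> AddDecision {
    // (a) Only braid-labeled LUKS is eligible for adoption.
    let label_key = match disk.label.and_then(|l| l.strip_prefix("braid-")) {
        Some(k) => k,
        None => return AddDecision::Refuse("non-braid LUKS is refused outright"),
    };
    // Assumption: a braid label naming another key would be a rename, so refuse.
    if label_key != key {
        return AddDecision::Refuse("label names a different disk; renaming is not allowed");
    }
    // (b) Without a mounted pool there is nothing to verify the FSID against.
    let pool_fsid = match mounted_pool_fsid {
        Some(f) => f,
        None => return AddDecision::Refuse("no mounted pool to verify against"),
    };
    match disk.mapper_fsid {
        // (d) No superblock: ambiguous state, refuse.
        None => AddDecision::Refuse("braid LUKS without a btrfs superblock is ambiguous"),
        // (c) Superblock from some other filesystem: foreign pool.
        Some(fsid) if fsid != pool_fsid => AddDecision::Refuse("btrfs FSID belongs to a foreign pool"),
        // (e) FSID matches the mounted pool: accept as a returning disk.
        Some(_) => AddDecision::AcceptReturning {
            luks_uuid: disk.luks_uuid.to_string(),
            wipe_stale_signature: disk.signature_blocks_add,
        },
    }
}

fn main() {
    let disk = ExistingLuks {
        luks_uuid: "example-luks-uuid",
        label: Some("braid-toshiba"),
        mapper_fsid: Some("pool-fsid"),
        signature_blocks_add: true,
    };
    println!("{:?}", classify_existing_luks("toshiba", &disk, Some("pool-fsid")));
    println!("{:?}", classify_existing_luks("toshiba", &disk, None));
}
```

The refusal branches come before the single accept branch. Any combination of facts the sketch does not recognize therefore ends in a refusal, which matches the fail-closed intent of the safety model.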
`--dry-run` performs only side-effect-free, passphrase-free LUKS probes: LUKS label reads and the keyfile credential test used by `braid enroll` (`cryptsetup open --test-passphrase --key-file`, which evaluates a credential without activating the device). Checks that require a passphrase or an open mapper, such as full identity verification (FSID comparison), are deferred to execution time when the mapper is closed.
The dry-run preview itself stays on stdout. Some side-effect-free probes still do bounded long-running work, specifically the Argon2-bounded `--test-passphrase` evaluation in `braid enroll --dry-run`. These probes emit canonical `[wait]`/`[ok]`/`[skip]` status rows to stderr per Principle 13 (“Announce long-running work”). The previous “successful dry-run leaves stderr empty” contract is intentionally relaxed for this case: an Argon2 derivation runs whether or not the user can see it, and a silent dry-run that takes seconds to minutes looks like a hang. The structured preview output is unchanged.
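A small Rust sketch of the stdout/stderr split described above. `announce` and `skip` are hypothetical helpers. The closure stands in for the Argon2-bounded credential test, and the exact row text after each `[wait]`/`[ok]`/`[skip]` tag is an assumption. Writing to a generic `Write` makes the rows easy to capture in tests.

```rust
use std::io::{self, Write};

/// Hypothetical helper: emit a [wait] row before and an [ok] row after
/// bounded long-running work on an error stream, so stdout stays reserved
/// for the dry-run preview. [ok] marks completion of the step, not its verdict.
fn announce<W: Write, T>(err: &mut W, label: &str, work: impl FnOnce() -> T) -> io::Result<T> {
    writeln!(err, "[wait] {}", label)?;
    err.flush()?; // show the row before the (possibly slow) work starts
    let out = work();
    writeln!(err, "[ok] {}", label)?;
    Ok(out)
}

/// Emit a [skip] row for a check that is deferred to execution time.
fn skip<W: Write>(err: &mut W, label: &str, reason: &str) -> io::Result<()> {
    writeln!(err, "[skip] {}: {}", label, reason)
}

fn main() -> io::Result<()> {
    let mut stderr = io::stderr();
    // Stand-in for `cryptsetup open --test-passphrase --key-file ...`.
    let credential_ok = announce(&mut stderr, "testing keyfile credential", || true)?;
    skip(&mut stderr, "FSID comparison", "requires an open mapper")?;
    // The structured preview goes to stdout only.
    println!("dry-run preview: keyfile credential valid = {}", credential_ok);
    Ok(())
}
```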
Replace safety constraints
- `--old` accepts both live (present in pool) and dead/missing disks.
- Both paths use `btrfs replace start`, the sole replacement primitive. Live disks are replaced in place; missing disks are rebuilt from RAID redundancy by devid.
- `--missing-id` is only valid when `--old` is dead/missing and is rejected with a live `--old`. It is validated against `PoolState::missing_devids` (live btrfs state via `probe::probe_pool`).
- The missing devid is auto-resolved from `--old`'s persisted `pool.json` devid and cross-checked against `PoolState::missing_devids`, independent of how many devices are missing. Because `--old`'s name already identifies the member, no missing-count gate is needed. `--missing-id` is an optional cross-check (it must equal the persisted devid, else `OldDevidMismatch`) and is never required.
- Mixed state (a live `--old` while the pool has missing devices) is rejected. The operator must first repair the missing device with `braid replace --old <missing-name> --new <new-name>=/dev/disk/by-id/<...>`. `braid remove-missing` is only for intentional cleanup (forgetting stale entries without rebuilding data).
- No replacement path uses `btrfs device add`. Missing-path replace may run a post-commit soft RAID1 balance only when it clears the last missing device.
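The `--old`/`--missing-id` rules above reduce to a short resolution function. The Rust sketch below assumes the caller already knows whether `--old` is live, plus its persisted devid and the live missing set. `resolve_replace`, `ReplaceSource`, and every error variant except `OldDevidMismatch` are hypothetical. So is the error returned when the persisted devid is not actually missing.

```rust
/// Hypothetical error set; only `OldDevidMismatch` is named in this ADR.
#[derive(Debug, PartialEq)]
enum ReplaceError {
    MissingIdWithLiveOld,
    MixedState,
    OldDevidMismatch { persisted: u64, given: u64 },
    NotInMissingSet(u64),
}

#[derive(Debug, PartialEq)]
enum ReplaceSource {
    /// `btrfs replace start` from the live device, in place.
    Live { devid: u64 },
    /// `btrfs replace start` by devid, rebuilt from RAID1 redundancy.
    Missing { devid: u64 },
}

fn resolve_replace(
    persisted_devid: u64,   // --old's devid from pool.json
    old_is_live: bool,      // whether --old is currently present in the pool
    missing_devids: &[u64], // live btrfs state (PoolState::missing_devids)
    missing_id: Option<u64>,
) -> Result<ReplaceSource, ReplaceError> {
    if old_is_live {
        if missing_id.is_some() {
            return Err(ReplaceError::MissingIdWithLiveOld);
        }
        if !missing_devids.is_empty() {
            // Mixed state: repair the missing device first.
            return Err(ReplaceError::MixedState);
        }
        return Ok(ReplaceSource::Live { devid: persisted_devid });
    }
    // Optional cross-check; never required because --old already names the member.
    if let Some(given) = missing_id {
        if given != persisted_devid {
            return Err(ReplaceError::OldDevidMismatch { persisted: persisted_devid, given });
        }
    }
    // Cross-check against live state, regardless of how many devices are missing.
    if !missing_devids.contains(&persisted_devid) {
        return Err(ReplaceError::NotInMissingSet(persisted_devid));
    }
    Ok(ReplaceSource::Missing { devid: persisted_devid })
}

fn main() {
    // Two devices missing: --old still resolves unambiguously from pool.json.
    println!("{:?}", resolve_replace(3, false, &[3, 5], None));
    // Live --old while another device is missing: rejected as mixed state.
    println!("{:?}", resolve_replace(2, true, &[3], None));
}
```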
ENOSPC pre-flight check
`remove` and `remove-missing` validate that surviving devices have enough
unallocated space to absorb the target device's allocations before invoking
`btrfs device remove`. Without this check, btrfs either fails instantly with
ENOSPC or forces the filesystem read-only mid-relocation (reproduced in
`tests/repro/`).
The `remove` path with two or more survivors treats relocation-probe uncertainty
as warn-and-proceed: a miss falls through to btrfs's clean, instant ENOSPC.
`remove-missing` and the 2→1 `remove` path are fail-closed on any uncertainty,
because a miss there can force the filesystem read-only with
`pending-op.json` already written.
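The asymmetry between warn-and-proceed and fail-closed fits in a single match. This Rust sketch models only the policy. It assumes the relocation probe has already produced a yes/no/unknown answer, and it does not attempt the RAID1-aware capacity computation itself. `RemovePath`, `Preflight`, and `enospc_gate` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RemovePath {
    /// `remove` leaving two or more survivors.
    MultiSurvivor,
    /// `remove` taking a 2-device pool down to one.
    TwoToOne,
    /// `remove-missing`.
    RemoveMissing,
}

#[derive(Debug, PartialEq)]
enum Preflight {
    Proceed,
    WarnAndProceed,
    Refuse,
}

/// `probe`: Some(true) = survivors can absorb the allocations,
/// Some(false) = they cannot, None = the relocation probe was inconclusive.
fn enospc_gate(path: RemovePath, probe: Option<bool>) -> Preflight {
    match (probe, path) {
        (Some(true), _) => Preflight::Proceed,
        (Some(false), _) => Preflight::Refuse,
        // A miss here surfaces as btrfs's clean, instant ENOSPC.
        (None, RemovePath::MultiSurvivor) => Preflight::WarnAndProceed,
        // A miss here can force the filesystem read-only after the journal is written.
        (None, RemovePath::TwoToOne | RemovePath::RemoveMissing) => Preflight::Refuse,
    }
}

fn main() {
    for path in [RemovePath::MultiSurvivor, RemovePath::TwoToOne, RemovePath::RemoveMissing] {
        println!("{:?} with inconclusive probe -> {:?}", path, enospc_gate(path, None));
    }
}
```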
`remove-missing` also refuses an untrusted missing-device allocation shape
before `btrfs device remove`. Its trust check validates shape, not per-type
completeness: the targeted missing devid must have exactly one usage stanza,
every positive target allocation row must be one of Data/Metadata/System RAID1,
and at least one positive supported row must be present. Missing supported row
types are treated as zero demand because a sparse RAID1 member in a pool of three or more devices may
legitimately hold only a subset of Data, Metadata, and System chunks.
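The three shape rules translate directly into code. In this Rust sketch, the parsing of `btrfs device usage` output into `AllocRow` values is assumed to happen elsewhere. `AllocRow`, `ChunkType`, `Profile`, and `trusted_missing_shape` are hypothetical names. Zero-byte rows are ignored, so only positive allocations must be RAID1 Data, Metadata, or System.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChunkType { Data, Metadata, System, Other }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Profile { Single, Dup, Raid1, Other }

/// One allocation row from the target devid's usage stanza (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct AllocRow { chunk: ChunkType, profile: Profile, bytes: u64 }

/// `stanzas`: every usage stanza that reported the targeted missing devid.
fn trusted_missing_shape(stanzas: &[Vec<AllocRow>]) -> Result<(), &'static str> {
    // Rule 1: exactly one usage stanza for the target devid.
    let rows = match stanzas {
        [only] => only,
        _ => return Err("expected exactly one usage stanza for the missing devid"),
    };
    let mut saw_supported = false;
    // Rule 2: every positive row is Data/Metadata/System with a RAID1 profile.
    for row in rows.iter().filter(|r| r.bytes > 0) {
        let supported_type =
            matches!(row.chunk, ChunkType::Data | ChunkType::Metadata | ChunkType::System);
        if !supported_type || row.profile != Profile::Raid1 {
            return Err("positive allocation row is not Data/Metadata/System RAID1");
        }
        saw_supported = true;
    }
    // Rule 3: at least one positive supported row.
    if !saw_supported {
        return Err("no positive supported allocation row");
    }
    // Absent row types count as zero demand (sparse RAID1 members are legitimate).
    Ok(())
}

fn main() {
    let gib = 1u64 << 30;
    let sparse = vec![vec![AllocRow { chunk: ChunkType::Data, profile: Profile::Raid1, bytes: gib }]];
    let dup = vec![vec![AllocRow { chunk: ChunkType::Metadata, profile: Profile::Dup, bytes: gib }]];
    let odd = vec![vec![AllocRow { chunk: ChunkType::Other, profile: Profile::Single, bytes: gib }]];
    println!("{:?}", trusted_missing_shape(&sparse));
    println!("{:?}", trusted_missing_shape(&dup));
    println!("{:?}", trusted_missing_shape(&odd));
}
```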
Single-survivor cases use a path-specific check:
- `remove` (2→1): the RAID1-aware relocation check does not apply, because only one device remains instead of two. Instead, a single-survivor capacity check derives demand from `btrfs filesystem df` logical usage (`data + 2 * metadata + 2 * system`, reflecting the post-balance single + DUP profile on one device) and compares it to the survivor's `device_size - device_slack`. This check runs at plan time and is re-run as a pre-journal gate in `execute` (above `journal::write_journal`), closing the plan/execute drift window. If writes during the confirmation and inhibitor wait over-commit the survivor, the check catches it before the irreversible `-f` balance. The operation then fails clean, with no `pending-op.json` stranded.
- `remove-missing` on a 2-device RAID1 pool with 1 missing (`pool.total_devices == 2 && pool.devices.len() == 1 && pool.missing_count == 1`): rejected at preflight. `btrfs_rm_device` runs `btrfs_check_raid_min_devices(num_devices - 1)` and returns `BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET` whenever the remaining device count would drop below the RAID1 minimum of 2, so the call is guaranteed to fail at the kernel level. The supported repair paths for that case are `braid replace` (preferred) or `braid add` followed by `braid remove-missing`.
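Both single-survivor rules are plain arithmetic and predicate checks. This Rust sketch mirrors the demand formula and the preflight predicate stated above. `DfUsage`, `PoolShape`, and the function names are hypothetical. The `<=` comparison (demand exactly equal to capacity passes) is an assumption, and saturating arithmetic keeps the sketch free of overflow panics.

```rust
/// Logical usage in bytes, as reported by `btrfs filesystem df` (hypothetical struct).
struct DfUsage { data: u64, metadata: u64, system: u64 }

/// After the 2→1 balance the survivor holds data as single and
/// metadata/system as DUP, so metadata and system count twice.
fn single_survivor_demand(df: &DfUsage) -> u64 {
    df.data
        .saturating_add(df.metadata.saturating_mul(2))
        .saturating_add(df.system.saturating_mul(2))
}

/// Re-run just before the journal is written, not only at plan time.
fn single_survivor_fits(df: &DfUsage, device_size: u64, device_slack: u64) -> bool {
    single_survivor_demand(df) <= device_size.saturating_sub(device_slack)
}

/// Pool shape fields mirroring the preflight predicate (hypothetical struct).
struct PoolShape { total_devices: u64, devices: Vec<String>, missing_count: u64 }

/// True when `remove-missing` would drop below RAID1's two-device minimum,
/// which the kernel rejects, so braid refuses it at preflight.
fn remove_missing_below_raid1_min(pool: &PoolShape) -> bool {
    pool.total_devices == 2 && pool.devices.len() == 1 && pool.missing_count == 1
}

fn main() {
    let gib = 1u64 << 30;
    let df = DfUsage { data: 400 * gib, metadata: 8 * gib, system: 32 << 20 };
    println!("demand = {} bytes", single_survivor_demand(&df));
    println!("fits on 500 GiB survivor: {}", single_survivor_fits(&df, 500 * gib, 0));
    let pool = PoolShape { total_devices: 2, devices: vec!["toshiba".to_string()], missing_count: 1 };
    println!("refuse remove-missing: {}", remove_missing_below_raid1_min(&pool));
}
```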
NixOS-native automation
- systemd `braid-unlock.service` + `braid-pool.target` for post-boot unlock
- `braid-online.service` as lifecycle owner (`ExecStop=braid lock`, `RemainAfterExit=yes`)
Rejected alternatives
- Keep plan/apply with simpler flags: Still risk-flattening. The core problem is that a generic reconciler treats “reboot recovery” and “add a new disk” as the same kind of operation.
- Separate init-disk + apply: The original approach. Created an artificial code boundary that was hard to explain and required ceremony for the common case.
Consequences
- Five commands instead of three (no `init-disk`, no `plan`, no `apply`; `remove` split into `remove` + `remove-missing`).
- Dry-run/confirmation coverage is a command category, not a blanket guarantee. The pool/LUKS-lifecycle mutators (`add`, `remove`, `remove-missing`, `replace`, `unlock`, `lock`, `enroll`, `recover`) support `--dry-run`, while `discover` previews by default and commits with `--write`. `--yes` is scoped to the confirmation-gated mutations (`add`, `remove`, `remove-missing`, `replace`) for scripting. Reactive notification-state maintenance (`ack`) and internal systemd-invoked paths (`scrub-*`) are deliberately excluded: they are reversible/self-correcting or machine-contract commands where a dry-run preview adds no operator value.
- Tab completion returns disk names from `pool.json`.
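The command-category rule can be pictured as a capability table. This Rust sketch encodes only what the bullet above states. `Preview`, `Caps`, and `caps` are hypothetical names. Treating `status` as having no preview is an assumption based on it being read-only, and `scrub-*` is matched by prefix because the individual scrub command names are not listed here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Preview {
    /// Opt-in `--dry-run`.
    DryRunFlag,
    /// Previews by default; `--write` commits (discover).
    DefaultPreviewWriteCommits,
    /// Read-only, reactive, or machine-contract command: no preview.
    NotApplicable,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Caps { preview: Preview, yes_flag: bool }

fn caps(cmd: &str) -> Option<Caps> {
    use Preview::*;
    let found = match cmd {
        // Confirmation-gated pool mutations: --dry-run and --yes.
        "add" | "remove" | "remove-missing" | "replace" => Caps { preview: DryRunFlag, yes_flag: true },
        // Other pool/LUKS-lifecycle mutators: --dry-run only.
        "unlock" | "lock" | "enroll" | "recover" => Caps { preview: DryRunFlag, yes_flag: false },
        "discover" => Caps { preview: DefaultPreviewWriteCommits, yes_flag: false },
        // Read-only or deliberately excluded commands.
        "status" | "ack" => Caps { preview: NotApplicable, yes_flag: false },
        c if c.starts_with("scrub-") => Caps { preview: NotApplicable, yes_flag: false },
        // Anything else (including the retired plan/apply/init-disk) is unknown.
        _ => return None,
    };
    Some(found)
}

fn main() {
    for cmd in ["add", "enroll", "discover", "ack", "apply"] {
        println!("{}: {:?}", cmd, caps(cmd));
    }
}
```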