Principles
Canonical invariants for braid. Each principle is authoritative — if code or config contradicts a principle, the code is wrong.
1. Resilient by default
Data drives never block boot. The pool is unlocked and mounted by explicit CLI invocations (braid unlock, the braid-auto-unlock.service unit, or braid recover during recovery), not by systemd mount units. No LUKS or btrfs units are generated at build time. Degraded mounts require explicit --allow-degraded — braid refuses to silently run with zero redundancy. Why →
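The degraded-mount rule can be read as a small decision function. The sketch below is illustrative only: `MountDecision` and `decide_mount` are hypothetical names, and the real CLI may derive the missing-device count differently.

```rust
/// Outcome of a pre-mount redundancy check (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum MountDecision {
    /// Every expected member is present; mount normally.
    Normal,
    /// Members are missing, but the operator passed --allow-degraded.
    Degraded { missing: usize },
    /// Members are missing and no opt-in was given; refuse to mount.
    Refused { missing: usize },
}

/// Decide whether a mount may proceed, given expected and present member counts.
fn decide_mount(expected: usize, present: usize, allow_degraded: bool) -> MountDecision {
    let missing = expected.saturating_sub(present);
    if missing == 0 {
        MountDecision::Normal
    } else if allow_degraded {
        MountDecision::Degraded { missing }
    } else {
        MountDecision::Refused { missing }
    }
}
```

The point of the three-way result is that a degraded mount is never reached by default: the only path to it goes through the explicit flag.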
2. CLI-owned membership
Disk membership is runtime state owned by the CLI, stored in /var/lib/braid/pool.json. Adding or removing a drive is a single CLI command (for example, braid add name=/dev/disk/by-id/...), with no nixos-rebuild required. The NixOS module provides the mount point, services, and toolchain; the CLI owns which disks are in the pool. unlock requires pool.json to exist and be valid; it never creates or repairs it. Recovery is explicit via braid discover --write. Why →
pool.json is a best-effort operational snapshot — it tells braid which drives to attempt unlocking, not what the pool actually looks like. Any state that can be read from live btrfs (devids, device counts, FSID) must come from btrfs, not pool.json. Commands like status must never surface pool.json-sourced devids; for display authority, devids are authoritative only when read from a mounted filesystem via btrfs device usage or equivalent. Persisted DiskMember.devid carries prior-binding authority only: when live btrfs reports a device by devid alone (the null_underlying mapper case and the btrfs missing_devids case), the persisted devid is the authorized fallback binding for re-attaching that live device to its membership entry. This is not a display-side use of pool.json devid; status output continues to draw devids from live btrfs. Why →
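A minimal sketch of the prior-binding rule, assuming a simplified data model: only `DiskMember.devid` comes from the text above, while `LiveDevice`, the other fields, and `bind_member` are hypothetical. The persisted devid is consulted only when the live LUKS UUID is unobservable.

```rust
/// Persisted membership entry (fields other than `devid` are assumed for illustration).
#[derive(Debug, Clone, PartialEq)]
struct DiskMember {
    name: String,
    luks_uuid: String,
    devid: Option<u64>,
}

/// A device as reported by live btrfs (hypothetical shape).
struct LiveDevice {
    devid: u64,
    /// None models the null_underlying mapper and missing_devids cases.
    luks_uuid: Option<String>,
}

/// Bind a live device to its membership entry.
/// The LUKS UUID is the primary key; the persisted devid is used only as a
/// fallback when live btrfs reports the device by devid alone.
fn bind_member<'a>(live: &LiveDevice, members: &'a [DiskMember]) -> Option<&'a DiskMember> {
    match &live.luks_uuid {
        Some(uuid) => members.iter().find(|m| &m.luks_uuid == uuid),
        None => members.iter().find(|m| m.devid == Some(live.devid)),
    }
}
```

Note that an observable but unknown UUID does not fall through to the devid lookup; the fallback applies only when no UUID can be read at all.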
3. Safe-by-construction operations
- Each intent command (add, remove, remove-missing, replace) does exactly one thing with risk-appropriate confirmation. replace always uses btrfs replace start: for live disks it replaces in-place, for missing disks it rebuilds from RAID redundancy using the missing device’s devid. remove-missing cleans up a stale missing-device entry; it never rebuilds data onto a new device (that is replace). When clearing the last missing device with ≥2 devices remaining, both remove-missing and replace (missing path) run a follow-up soft balance to restore RAID1 profiles for chunks written during degraded operation.
- Post-commit persist with journal: mutating commands write a pending-operation journal (pending-op.json) with pre/target membership snapshots before the first irreversible disk operation. pool.json is written once the btrfs membership change has committed, so it reflects committed live membership, not necessarily completion of follow-up maintenance such as RAID1 rebalance or resize. Phased journals advance to post-maintenance after the committed pool.json write; those post phases must never rerun the primary btrfs membership mutation. The journal is cleared only after the entire lifecycle succeeds, including required post-mutation maintenance like soft balance. While the journal exists, braid recover replays owed maintenance when btrfs balance state is idle and fails closed with the journal preserved when owed RAID1 replay finds a crash-paused, running, or unknown balance state. If braid crashes or fails mid-operation, the journal triggers recovery mode: membership/mount/key-enrollment commands (add, remove, remove-missing, replace, unlock, enroll, discover --write) hard-fail; read-only diagnostic and cleanup surfaces (status, doctor, lock, bare discover) stay available. braid recover rebuilds membership from the live mounted pool (not LUKS label scanning) and is the only command that clears the journal.
- Environment-side resource acquisition (file locks, sleep inhibitors, dbus/logind handshakes, external service availability) must happen before journal::write_journal. The journal write commits the user to recovery mode on any subsequent failure, so a pure environment failure (logind unreachable, flock contention) must not leave a stranded pending-op.json for what was conceptually a “command never started” failure. The journal write is the line of no return; reorder code so any RAII guards or environment probes that can fail are bound above it. The per-command pre-journal excluded scope (which also covers reversible validation and identity checks) is enumerated in ADR 019.
- Disk names are immutable once recorded in pool membership; name rename/reassignment is rejected by mutating commands and must use explicit replace or remove + add workflows.
- mkfs.btrfs is gated on bootstrap only: bootstrap accepts only disks classified as fresh non-LUKS during add planning, and the LUKS open helpers verify that any pre-existing braid-<name> mapper is backed by the requested by-id disk before pool creation proceeds. mkfs.btrfs is invoked without -f so its own libblkid signature check is the final fail-closed guard.
- An existing LUKS device or pool member is never reformatted: a multi-layer identity check (LUKS label match, LUKS UUID cross-check against pool.json, pool-mounted requirement, btrfs FSID comparison) prevents accidental data loss, with the btrfs superblock guard as defense-in-depth.
- Failed unlock and recovery mount paths close only LUKS mappers braid newly opened during that invocation. They never close pre-existing operator-owned mappers, including mappers that become already open between planning and execution.
- Mounts always include skip_balance: btrfs silently resumes interrupted balances on mount by default, which can re-trigger ENOSPC or surprise the user with heavy I/O. braid manages balance lifecycle explicitly; unlock warns if a paused balance is detected.
- The bare pool mountpoint is sealed immutable (chattr +i) while the pool is offline, so a process writing it before mount fails with EPERM instead of silently landing on the root filesystem and being shadowed when the pool mounts. The seal is always-on (no knob), lives only in the boot/activation unit (braid-seal-mountpoint), and persists across lock/unlock. Why →
- Dry-run previews for migrated mutating commands are rendered from the same typed work plans that execution consumes; Step is output-only. Why →

Why →
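The recovery-mode gating described in the journal bullet can be sketched as an exhaustive match over commands. This is an illustration under assumptions: the `Command` enum, `allowed_in_recovery_mode`, `gate`, and the error wording are hypothetical; only the command names and the allow/deny split come from the text above.

```rust
/// CLI commands named in the principles (enum shape is assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Command {
    Add,
    Remove,
    RemoveMissing,
    Replace,
    Unlock,
    Enroll,
    Discover { write: bool },
    Status,
    Doctor,
    Lock,
    Recover,
}

/// True when the command may run while pending-op.json exists.
fn allowed_in_recovery_mode(cmd: Command) -> bool {
    match cmd {
        // Membership, mount, and key-enrollment commands hard-fail.
        Command::Add
        | Command::Remove
        | Command::RemoveMissing
        | Command::Replace
        | Command::Unlock
        | Command::Enroll
        | Command::Discover { write: true } => false,
        // Read-only diagnostics, cleanup, and recover itself stay available.
        Command::Status
        | Command::Doctor
        | Command::Lock
        | Command::Discover { write: false }
        | Command::Recover => true,
    }
}

/// Gate a command on journal presence (the error text is a placeholder).
fn gate(cmd: Command, journal_exists: bool) -> Result<(), String> {
    if journal_exists && !allowed_in_recovery_mode(cmd) {
        Err(format!("{:?} is blocked while pending-op.json exists; run braid recover", cmd))
    } else {
        Ok(())
    }
}
```

Writing the match without a wildcard arm means a newly added command cannot compile until someone decides which side of the gate it belongs on.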
4. Single passphrase
All drives share one LUKS passphrase. braid unlock and braid add depend on this — one passphrase unlocks all drives. Before any irreversible operation, every reachable existing LUKS device that will remain in or enter post-operation pool membership has its slot 0 verified. Fresh-format disks are excluded because they have no existing slot 0. The live-replace source is excluded when other retained members exist, so a divergent slot 0 on the disk being replaced does not block its own replacement. The same all-relevant-disk rule applies to keyfile credentials used by mount, unlock, and recover. Why →
Binary keyfile support is available via braid enroll (slot 1) and braid.autoUnlock (NixOS module). The passphrase (slot 0) is the default interactive-unlock mechanism; the slot-1 keyfile drives braid.autoUnlock for unattended boots and can also be passed directly to braid unlock --key-file.
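One way to read the slot 0 verification rule is as a per-device predicate. The sketch below is an interpretation, not the actual implementation: `Role`, `Device`, and `needs_slot0_check` are hypothetical, and it assumes the live-replace source is still verified when no other retained member exists.

```rust
/// How a device relates to post-operation membership (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    /// Already a member and stays one.
    Retained,
    /// Existing LUKS device entering membership.
    Entering,
    /// The disk being replaced in a live replace.
    LiveReplaceSource,
}

struct Device {
    role: Role,
    reachable: bool,
    /// Fresh-format disks have no existing slot 0 to verify.
    fresh_format: bool,
}

/// Decide whether a device's slot 0 must be verified before an irreversible operation.
fn needs_slot0_check(dev: &Device, other_retained_members_exist: bool) -> bool {
    if !dev.reachable || dev.fresh_format {
        return false;
    }
    match dev.role {
        Role::Retained | Role::Entering => true,
        // A divergent slot 0 on the outgoing disk must not block its own replacement.
        Role::LiveReplaceSource => !other_retained_members_exist,
    }
}
```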
5. Stable identifiers
All persistent storage config uses /dev/disk/by-id/ paths. Never /dev/sdX. Mapper names are braid-<disk-name> (e.g., braid-toshiba) — deterministic, human-friendly, debuggable in lsblk, systemd logs, and error messages. LuksUuid is the primary persistent identity for code; the disk name and the LUKS label are presentation; by_id is for hardware addressing. When the live LUKS UUID is unobservable for a device the kernel/btrfs still reports (null_underlying mapper, btrfs missing_devids), btrfs devid is the only authorized live-fallback binding key. No code path may decide membership, target a device, or correlate live pool state by parsing a name out of a mapper path or LUKS label, except in two narrow cases: discover bootstrapping a UUID-keyed membership from cold disks, and returning-disk adoption safety in add (the PresentLuks path may gate adoption on label match, but identity correlation still uses LuksUuid/devid/FSID). Why →
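As a small illustration of the naming and addressing rules, the sketch below derives a mapper name from a disk name and rejects paths outside /dev/disk/by-id/. Both helper names are hypothetical, and the validation is deliberately simplistic (prefix check only, no existence check).

```rust
const BY_ID_PREFIX: &str = "/dev/disk/by-id/";

/// Deterministic mapper name for a disk, e.g. "toshiba" -> "braid-toshiba".
fn mapper_name(disk_name: &str) -> String {
    format!("braid-{}", disk_name)
}

/// Accept only /dev/disk/by-id/<entry> paths; reject /dev/sdX and anything else.
fn require_by_id(path: &str) -> Result<&str, String> {
    match path.strip_prefix(BY_ID_PREFIX) {
        Some(rest) if !rest.is_empty() && !rest.contains('/') => Ok(path),
        _ => Err(format!("{}: expected a {} path", path, BY_ID_PREFIX)),
    }
}
```

The mapper name flows one way only, from the recorded disk name to the mapper; per the rule above, nothing should parse a name back out of a mapper path to decide membership.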
6. btrfs RAID1
Auto-healing checksums, dynamic drive pooling, in-kernel (no out-of-tree modules). 50% space overhead is accepted. btrfs RAID5/6 is not production-ready. Why →
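The 50% overhead figure holds exactly only for well-balanced pools. As a rough estimate for a two-copy RAID1 profile, usable space is the smaller of half the total and the total minus the largest device, since every chunk needs a second copy on a different device. The function below is a back-of-the-envelope sketch (hypothetical name, ignoring metadata and chunk granularity), not something braid is known to compute.

```rust
/// Approximate usable capacity of a two-copy RAID1 pool, in the same unit as `sizes`.
/// Ignores metadata overhead and chunk allocation granularity.
fn raid1_usable(sizes: &[u64]) -> u64 {
    // A single device cannot hold two copies on distinct devices.
    if sizes.len() < 2 {
        return 0;
    }
    let total: u64 = sizes.iter().sum();
    let largest = *sizes.iter().max().unwrap();
    std::cmp::min(total / 2, total - largest)
}
```

For example, two 4 TB drives yield about 4 TB, while an 8 TB drive paired with two 2 TB drives also yields about 4 TB, because the large drive can only mirror against the 4 TB available elsewhere.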
7. Sane defaults
If a knowledgeable admin would always enable it, braid enables it by default. Use lib.mkDefault for simple pass-through defaults on stable NixOS options. Wrap in a braid.* option when the feature is inside braid’s product boundary and benefits from lifecycle control, discoverability, or a unified config surface – even if the mapping is 1:1. Examples: braid.autoScrub (periodic scrub with lifecycle binding to pool online state), poolAccessGroup for mount root access (root:storage 2770). Why →
8. Test every design decision
NixOS VM tests validate behavior, not just command success. TDD: write failing tests first, confirm they fail for expected reasons, then implement.
9. NixOS-native
Braid only targets NixOS. No portability abstractions, no generic Linux fallbacks. Follow NixOS module conventions — same option types, patterns, and idioms as nixpkgs. When in doubt, nixpkgs is the tiebreaker. Why →
10. Pinned toolchain
Parser-critical tools (btrfs-progs, cryptsetup, util-linux, NUT, smartmontools, ethtool) are pinned to a specific NixOS stable release via the flake input. Wrappers execute with an explicit PATH built from module-controlled packages (braid.packages.*). Parsers assume the output format of the pinned version – upgrading those tools requires updating fixtures and parser tests. These pinned defaults are a compatibility baseline, not a lock; users may override braid.packages.* to pick up newer system versions when needed. Generic helpers (coreutils, systemd) come from the consumer’s package set and are not part of braid’s parser contract, except that Browse parses systemctl list-units --output=json as a tolerant UI-only picker with raw-output fallback. Why →
11. HDD defaults
Mount options, LUKS flags, and scrub scheduling are chosen for HDD NAS deployments. Why →
12. One pool operation at a time
Rust dispatch acquires /run/braid-pool.lock before loading config, loading pool.json, probing pool state, or prompting for command input. The authoritative command-to-lock-discipline mapping lives in lock_policy in cli/src/main.rs; its wildcard-free exhaustive match makes every Commands variant choose a discipline at compile time.
Lock disciplines are policy categories, not prose-maintained command lists. Interactive mutators acquire non-blocking and fail fast with braid: another braid operation is already in progress so the user can retry once the active operation completes. Short-contention maintenance paths may wait for a bounded timeout, such as the 10-second alert acknowledgement window. Timer-driven monitoring may exit 0 silently on contention because a missed cycle is harmless and exit 1 would falsely start alert notification. Read-only paths and dry-run modes do not acquire the lock; bare discover is read-only, while its write mode participates because the scan -> pool.json write window must be serialized against pool-state mutators. Read-only diagnostics status and doctor never acquire the lock so operators retain a working diagnostic surface during contention; tests/module/pool-lock-readonly-bypass.py pins this invariant.
Mutual exclusion is enforced at the critical section itself, not via systemd unit topology. Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if a prior winner mounted it sequentially; other mutators operate on the current locked state rather than stale pre-lock observations.
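The wildcard-free mapping from commands to lock disciplines might look roughly like the sketch below. Only the names `lock_policy` and `Commands` come from the text; the variant lists, the `LockDiscipline` enum, and the choice of fail-fast for discover --write are assumptions made for illustration.

```rust
use std::time::Duration;

/// Lock policy categories (hypothetical names for the disciplines described above).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LockDiscipline {
    /// Non-blocking acquire; report contention and exit.
    FailFast,
    /// Wait up to a bounded timeout for the lock.
    BoundedWait(Duration),
    /// Exit 0 silently on contention (timer-driven monitoring).
    SkipSilently,
    /// Never acquire the lock (read-only paths).
    NoLock,
}

/// A reduced, assumed subset of CLI commands.
#[derive(Debug, Clone, Copy)]
enum Commands {
    Add,
    Remove,
    Unlock,
    AckAlert, // hypothetical name for alert acknowledgement
    Monitor,  // hypothetical name for timer-driven monitoring
    Status,
    Doctor,
    Discover { write: bool },
}

/// Every variant is listed explicitly, so adding a command without choosing
/// a discipline is a compile error rather than a silent default.
fn lock_policy(cmd: Commands) -> LockDiscipline {
    match cmd {
        Commands::Add | Commands::Remove | Commands::Unlock => LockDiscipline::FailFast,
        Commands::AckAlert => LockDiscipline::BoundedWait(Duration::from_secs(10)),
        Commands::Monitor => LockDiscipline::SkipSilently,
        Commands::Status | Commands::Doctor => LockDiscipline::NoLock,
        Commands::Discover { write: false } => LockDiscipline::NoLock,
        // Assumption: write mode is treated as an interactive mutator.
        Commands::Discover { write: true } => LockDiscipline::FailFast,
    }
}
```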
13. Announce long-running work
Every interactive command emits a [wait] row before any subprocess
that can stall the terminal long enough for the user to wonder
whether the CLI has hung. The bound categories:
- cryptsetup Argon2 operations (luksFormat, luksOpen, luksAddKey, --test-passphrase);
- cryptsetup close (single attempt or busy-retry loop);
- btrfs balance, replace, and device remove (potentially hours);
- mount and umount (kernel can drain in-flight I/O / replace workers / inhibitors).
A [wait] row is closed by one of:
- the same command’s paired success row ([ok] {same subject}: ...) on the success path,
- a same-subject [fail] row on a known failure path (e.g. lock.rs’s umount failure),
- a same-subject [warn] row on a non-fatal best-effort failure (e.g. mapper_close::close_mapper_best_effort’s LUKS close, or wait_for_kernel_replace_to_finish’s status-poll error; the command continues despite the failure, and the warn row tells the user the wait window is closed without success),
- a same-subject [skip] row on a successful negative or no-op probe (e.g. braid enroll’s pre-mutation keyfile probe finding the keyfile not yet enrolled: the work the wait announced completed, and the answer is “no work yet”),
- or the command’s normal error output (MountError/LuksError/PoolError propagation) on uncaught error paths.
A [wait] followed by none of these closers (i.e., success, fail,
warn, skip, or non-zero exit) is a documentation bug.
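The closure rule lends itself to a simple check over a command's emitted rows. The sketch below is a hypothetical test helper, not part of braid: `Tag`, `Row`, and `unclosed_waits` are invented names, and rows are matched by exact subject string.

```rust
/// Status tags that can appear on output rows (assumed representation).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Wait,
    Ok,
    Fail,
    Warn,
    Skip,
}

struct Row<'a> {
    tag: Tag,
    subject: &'a str,
}

/// Return the subjects of [wait] rows that nothing closed.
/// A same-subject ok/fail/warn/skip row closes a wait; a non-zero exit
/// (normal error output) closes whatever is still open.
fn unclosed_waits<'a>(rows: &[Row<'a>], exited_nonzero: bool) -> Vec<&'a str> {
    let mut open: Vec<&'a str> = Vec::new();
    for row in rows {
        match row.tag {
            Tag::Wait => open.push(row.subject),
            Tag::Ok | Tag::Fail | Tag::Warn | Tag::Skip => {
                // Close the earliest open wait with the same subject, if any.
                if let Some(pos) = open.iter().position(|s| *s == row.subject) {
                    open.remove(pos);
                }
            }
        }
    }
    if exited_nonzero {
        open.clear();
    }
    open
}
```

A non-empty result on a zero-exit run would flag exactly the case the rule above calls a bug.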
Fast bookkeeping that completes well under a second
(mkfs.btrfs on a fresh disk, btrfs device add,
btrfs filesystem resize, btrfs device scan,
btrfs device scan --forget, cryptsetup luksHeaderBackup,
cryptsetup status, blkid, JSON parses, journal writes,
pool.json saves, sysfs reads) does not warrant a row.
Rendering uses status_tag::status_line(StatusTag::Wait, ...)
against color_enabled_for_stderr() so plain stderr captures
contain unwrapped [wait] bytes and TTY output picks up the gray
ANSI tag. Why →
Implementation workflow and conventions are in AGENTS.md at the repo root.