Decision: Systemd Lifecycle State Machine
Principle: Resilient by default
Context
braid needs systemd integration for three things: interactive unlock, unattended unlock, and clean shutdown (LUKS close before power-off). The module must not generate data-pool fileSystems or boot.initrd.luks.devices entries — those create hard boot dependencies on the data pool (see 003-resilient-boot.md). Instead, the CLI owns LUKS open/close and btrfs mount/unmount at runtime, and a thin systemd layer provides the entry points and shutdown hook.
Units
┌──────────────────────┐
│ braid-pool.target    │  entry point
│ wants + after        │
└──────────┬───────────┘
           │ (soft dep)
┌──────────▼───────────┐
│ braid-unlock.service │  interactive passphrase
│ oneshot              │
└──────────┬───────────┘
           │ (CLI marks online on success)
┌──────────▼───────────┐
│ braid-online.service │  lifecycle owner
│ ExecStart=/bin/true  │
│ ExecStop=braid lock  │  --systemd-stop
│ oneshot, RAE         │
└──────────────────────┘

braid-auto-unlock.service      (alternative unlock path, boot-time)
  wantedBy multi-user.target   activates braid-online via same CLI path

mnt-storage.mount              (auto-generated by systemd from /proc/mounts)

braid-monitor.timer -> braid-monitor.service -> braid-alert.service
  ConditionPathIsMountPoint    (health polling, skipped when pool not mounted)

braid-scrub.timer -> braid-scrub.service
braid-online.service -> braid-scrub-resume-trigger.service -> braid-scrub.service
  BindsTo + After braid-online.service   (lifecycle-bound periodic scrub)
  Persistent=true                        (catch-up on activation)

RAE = RemainAfterExit = true
braid-pool.target — entry point
Public handle for “bring pool online.” User runs systemctl start braid-pool.target.
- wants (not requires) braid-unlock.service – soft dependency. Unlock failure does not fail the target, and the target cannot block boot because nothing requires it.
- after braid-unlock.service – ordering only.
- Does not want or require braid-online.service. The CLI activates that separately after confirming the mount succeeded.
braid-unlock.service — interactive passphrase unlock
Single orchestrator: opens all LUKS devices and mounts the btrfs pool in one shot. Guarantees exactly one passphrase prompt (avoids relying on systemd-ask-password cache behavior across multiple LUKS units).
- Type = oneshot – runs once, returns to inactive on completion. ConditionPathIsMountPoint (below) prevents re-run while mounted; the inactive state allows systemctl start braid-pool.target to re-unlock after a prior braid lock.
- ConditionPathIsMountPoint = !${mountPoint} – skips if pool already mounted.
- Calls systemd-ask-password --timeout=0 --id=braid | braid unlock --passphrase-stdin.
braid-auto-unlock.service — unattended USB keyfile unlock
Optional (only created when braid.autoUnlock.enable = true). Runs at boot, unlocks from a USB keyfile without interactive prompt.
- wantedBy = [ "multi-user.target" ] – starts automatically at boot.
- after = [ "local-fs.target" ] – waits for /run to exist.
- ConditionPathIsMountPoint = !${mountPoint} – skips if pool already mounted.
- No RemainAfterExit – intentional. If USB is absent at boot (service exits 0 on skip), a later systemctl start braid-auto-unlock can re-run when the USB is inserted.
- Mounts USB read-only, validates keyfile path (symlink defense), runs braid unlock --key-file, always unmounts USB after (never leaves keyfile accessible).
- Always exits 0 – failures are logged to the journal but never reported as unit failure, because auto-unlock must not block boot under any circumstance.
braid-online.service — lifecycle owner
State-ownership service. Its only purpose is to mark “pool is online” and run the bounded braid lock stop path on stop.
- ExecStart = /bin/true – no work. Exists for its ExecStop hook.
- ExecStop = braid lock --systemd-stop --deadline-secs <n> – unmounts pool and closes all LUKS on shutdown or manual stop with a bounded stop-coordinator/pool-lock wait below TimeoutStopSec. In this mode, braid permits a running or paused btrfs balance: a running balance is explicitly paused before unmount, an already-paused balance proceeds to unmount, and every other exclusive operation is refused. If the blocking btrfs balance userspace process briefly holds the mount fd after its parent dies, the systemd-stop path uses a longer transient-busy umount retry than plain braid lock.
- RemainAfterExit = true – persists “active” state.
- ConditionPathIsMountPoint = ${mountPoint} – systemd skips activation when the pool is not mounted (systemctl start returns 0 but the unit stays inactive). Defense-in-depth: the CLI’s mountpoint -q check is the primary gate, but this condition prevents direct systemctl start from leaving the unit active while unmounted.
- TimeoutStopSec = 300s – raises the stop timeout from the 90s default so a slow braid lock is not SIGKILL’d mid-operation.
- Not in any dependency chain. Neither the target nor unlock services want/require it. Activated exclusively by the CLI after mountpoint -q confirms the pool is mounted.
mnt-storage.mount — readiness contract
Auto-generated by systemd from /proc/mounts when the btrfs pool is mounted. Consumer services bind to this unit.
braid-monitor.timer + braid-monitor.service — health polling
Periodic oneshot (default: every 5 minutes). Pure detector — checks btrfs device stats for errors.
- ConditionPathIsMountPoint – skipped cleanly when pool is not mounted (no dependency-failure noise from timer).
- No After or BindsTo on mnt-storage.mount – those directives force systemd to load the unit, which doesn’t exist before the first unlock.
- Exit code 1 from braid monitor → starts braid-alert.service.
- braid monitor fails closed: probe/parse/stats/mountinfo failures, acked-stats.json baseline read/parse failures, and alert-latch read/quarantine failures latch AlertCause::ComputationError and exit 1, so the wrapper above starts the beeper. Exit 0 is reserved for healthy, pool-offline, and pool-lock-contended cycles; exit 2 is reserved for pre-cmd_monitor setup failures (e.g. pool-lock I/O, config load failure) and is never emitted by cmd_monitor itself. See ADR 014 fail-closed contract for the cause taxonomy.
- The gate and the fail-closed path are independent mount checks, so the gate cannot mask a real alert. ConditionPathIsMountPoint resolves through statx(STATX_ATTR_MOUNT_ROOT) (then name_to_handle_at(2), then /proc/self/fdinfo) – a kernel VFS query, never a parse of /proc/self/mountinfo text. The fail-closed path above instead parses that text and latches ComputationError on a malformed line, duplicate target, or read error. On a genuinely-mounted pool statx reports a mount root regardless of any text anomaly, so the service runs and the beep fires – the protective beep is never gated away. The gate only short-circuits a statx-confirmed-offline pool; the sole beep it suppresses is braid’s conservative ComputationError on an offline pool with anomalous mountinfo text, which is not a disk-health alert.
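The exit-code contract above can be summarized as a small lookup. This is a minimal sketch, not braid's actual code: the `CycleOutcome` enum and both function names are hypothetical, and mapping detected device errors to exit 1 is an assumption inferred from "exit code 1 starts braid-alert.service".

```rust
/// Outcome of one monitor cycle. Illustrative names only; braid's real
/// types (e.g. AlertCause) are not reproduced here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CycleOutcome {
    Healthy,
    PoolOffline,
    PoolLockContended,
    /// Assumption: detected device errors exit 1 like other alert causes.
    DeviceErrors,
    /// Fail-closed path: probe, parse, stats, or mountinfo failure.
    ComputationError,
    /// Failure before cmd_monitor runs (pool-lock I/O, config load).
    SetupFailure,
}

/// Maps a cycle outcome to the process exit code described in the text.
pub fn exit_code(outcome: CycleOutcome) -> i32 {
    match outcome {
        CycleOutcome::Healthy
        | CycleOutcome::PoolOffline
        | CycleOutcome::PoolLockContended => 0,
        CycleOutcome::DeviceErrors | CycleOutcome::ComputationError => 1,
        CycleOutcome::SetupFailure => 2,
    }
}

/// Only exit code 1 starts braid-alert.service.
pub fn starts_alert(code: i32) -> bool {
    code == 1
}
```

Keeping exit 2 separate from exit 1 means a setup failure is distinguishable in the journal from a latched alert cause.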
braid-scrub.timer + scrub service + resume trigger – lifecycle-bound scrub
Periodic scrub (default: monthly). Uses a timer-lifecycle pattern distinct from the monitor’s ConditionPathIsMountPoint-only approach.
- Timer is wantedBy, BindsTo, and After braid-online.service. Starts when pool comes online, stops when pool goes offline.
- Persistent=true + AccuracySec=1d. When the timer activates (pool unlock), systemd compares the last-trigger stamp against OnCalendar. If a scrub was overdue during the offline period, it fires immediately.
- braid-scrub.service is the only foreground scrub runner. It is Type=simple; its internal braid scrub-resume-or-start --mount <mount> ExecStart resumes saved scrub progress first, then starts a fresh scrub only when btrfs reports nothing resumable.
- braid-scrub.service uses a shared ExecStop cancel script – same pattern as the nixpkgs btrfs scrub service. This cancels in-flight scrub on lock or shutdown through btrfs scrub cancel, leaving btrfs-progs’ /var/lib/btrfs/scrub.status.<fsid> progress file available for the next resume.
- braid-scrub-resume-trigger.service is the pool-online predicate-and-poke path. It is Type=oneshot, wantedBy, BindsTo, and After braid-online.service; it runs internal braid scrub-needs-resume --mount <mount> and starts braid-scrub.service with systemctl start --no-block only when saved progress is resumable.
- The scrub service and resume trigger use BindsTo + After braid-online.service. On shutdown or systemctl stop braid-online.service, systemd stops them before braid lock runs.
- ConditionPathIsMountPoint on the scrub service and trigger is defense-in-depth.
- Serialization via single runner. Only braid-scrub.service ever runs btrfs scrub; both activation paths (timer and trigger) issue systemctl start braid-scrub.service, and systemd coalesces overlapping starts for the same unit. A completed scrub-resume-or-start run satisfies both an overdue timer fire and a pool-online resumable state, with no flock and no /run/braid-scrub.lock.
- Conflicts + Before shutdown.target and sleep.target on the scrub service. The short-lived resume trigger also uses Conflicts + Before sleep.target so suspend setup wins cleanly against pool-online activation.
braid-alert.service — notification
Started by monitor on error detection. Beeps via PC speaker (if enabled) and/or runs a custom alert command. Stopped by braid ack.
Rust dispatch as synchronization layer
The wrapper (braid-wrapper.sh) is a pure exec shim: it sets the module-controlled PATH and execs the Rust binary. Synchronization lives in Rust dispatch (cli/src/main.rs), which owns the pool lock, braid-online.service lifecycle updates, and shutdown stop coordination. See 026-pool-lock-rust-owned.md.
modules/braid/cli.nix emits systemd_lifecycle = true for module-managed
installs. Standalone CLI deployments omit it; those configs still get mount
permission fixups but do not touch braid-online.service.
After every unlock, add, or recover attempt:
1. Rust dispatch acquires /run/braid-pool.lock, loads config and membership, and snapshots braid-online.service ActiveState only when systemd_lifecycle = true.
2. CLI opens LUKS + mounts pool when the command reaches its mount step. (recover self-mounts when recovering from an interrupted operation.)
3. Before dispatch returns, success or failure, Rust runs mark_online while the pool lock is still held. mark_online checks mountpoint -q; pre-mount failures short-circuit here.
4. Rust sets permissions (root:poolAccessGroup 2770) if poolAccessGroup is configured.
5. When systemd_lifecycle = true, Rust starts braid-online.service only when the initial snapshot was inactive or failed.
6. If activation fails: prints WARNING to stderr, then preserves the command’s original exit result. Pool is mounted and usable; only the shutdown hook is missing.
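The post-command sequence above can be sketched as a pure planning function. This is an illustration under stated assumptions rather than the real mark_online: `PostStep` and `mark_online_steps` are hypothetical names, the mount check and the ActiveState snapshot are passed in as plain values, and the function only decides which follow-up steps apply.

```rust
/// Follow-up work that may run after an unlock, add, or recover attempt.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostStep {
    /// Apply root:poolAccessGroup 2770 on the mount point.
    SetPermissions,
    /// systemctl start braid-online.service
    StartOnlineUnit,
}

/// Decides which follow-up steps apply, in order.
///
/// `mounted` stands in for the mountpoint -q result, `snapshot` for the
/// ActiveState captured when the pool lock was acquired.
pub fn mark_online_steps(
    mounted: bool,
    access_group: Option<&str>,
    systemd_lifecycle: bool,
    snapshot: &str,
) -> Vec<PostStep> {
    let mut steps = Vec::new();
    // Pre-mount failures short-circuit: nothing to mark online.
    if !mounted {
        return steps;
    }
    if access_group.is_some() {
        steps.push(PostStep::SetPermissions);
    }
    // Start only from a settled non-active state; never queue behind a stop.
    if systemd_lifecycle && matches!(snapshot, "inactive" | "failed") {
        steps.push(PostStep::StartOnlineUnit);
    }
    steps
}
```

A standalone deployment (systemd_lifecycle unset) still gets the permission fixup but never the unit start, which matches the cli.nix behavior described earlier.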
On lock:
1. Plain braid lock acquires /run/braid-stop-coordinator.lock, then /run/braid-pool.lock.
2. When systemd_lifecycle = true, Rust stops braid-scrub.timer, braid-scrub-resume-trigger.service, then braid-scrub.service (timer first prevents re-trigger; trigger before service prevents the trigger from queuing a fresh start of the service being stopped; service last cancels in-flight scrub).
3. When systemd_lifecycle = true, Rust iterates systemctl show -P BoundBy braid-online.service and stops each remaining bound consumer (samba, nfs, future). The scrub units already handled in step 2 are skipped. This mirrors the cascade systemd performs on shutdown for user-initiated braid lock.
4. CLI unmounts pool + closes LUKS.
5. Plain braid lock writes done\n to /run/braid-stop-coordinator.lock.
6. When systemd_lifecycle = true, Rust checks the mount is gone and runs systemctl stop braid-online.service synchronously so the command returns only after the lifecycle owner is inactive. The synchronous stop runs only when the post-cleanup mountpoint check confirms the mount is gone; if the check itself fails, Rust warns and skips the stop, leaving the unit active for the operator to retry. The recursive ExecStop reentry polls the coordinator, observes done\n, and exits 0.
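The stop ordering in the scrub and BoundBy steps can be sketched as follows. This is a hedged illustration, not the code in run_lock_pre_steps: the function name `stop_order` is hypothetical, and it assumes the BoundBy property value arrives as a whitespace-separated list of unit names.

```rust
/// Scrub units in the order the text prescribes: timer first, then the
/// resume trigger, then the scrub service itself.
pub const SCRUB_UNITS: [&str; 3] = [
    "braid-scrub.timer",
    "braid-scrub-resume-trigger.service",
    "braid-scrub.service",
];

/// Builds the full stop order from the BoundBy property value.
///
/// Assumption: `bound_by_output` is the whitespace-separated unit list
/// printed for the BoundBy property of braid-online.service.
pub fn stop_order(bound_by_output: &str) -> Vec<String> {
    let mut order: Vec<String> = SCRUB_UNITS.iter().map(|u| u.to_string()).collect();
    for unit in bound_by_output.split_whitespace() {
        // Skip scrub units already scheduled and any duplicate entries.
        if !order.iter().any(|u| u.as_str() == unit) {
            order.push(unit.to_string());
        }
    }
    order
}
```

Stopping the scrub units from a fixed list rather than from the BoundBy output keeps their relative order independent of however systemd happens to list them.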
On system shutdown:
1. systemd stops braid-online.service (if active); its BindsTo + After cascade stops the scrub units and any full-triad consumer first. ExecStop then re-runs the same scrub-stop + BoundBy iteration as the “On lock” steps 2-3. For the scrub units and any consumer that follows the documented WantedBy + BindsTo + After triad, the cascade has already stopped them, so these re-issued stops are no-ops. A consumer that declares BindsTo without After has no stop-ordering guarantee and may still be active when ExecStop runs, so the explicit blocking stop here is what frees the mount. Running the pre-steps unconditionally covers both cases, keeping teardown code-owned and independent of cascade ordering.
2. ExecStop = braid lock --systemd-stop --deadline-secs <n> waits for an in-flight plain braid lock to finish through the stop coordinator, or waits for the pool lock up to the configured deadline.
3. Lock dispatch loads membership from pool.json; if pool.json is absent or corrupt, it warns and proceeds with empty membership because mapper cleanup still requires per-candidate LUKS UUID verification.
4. CLI unmounts and closes LUKS. If sysfs reports a running btrfs balance, --systemd-stop first runs btrfs balance pause so the kernel persists the paused balance before LUKS close; if sysfs reports an already-paused balance, teardown proceeds directly to unmount. Next-boot braid recover fails closed on that persisted paused balance and preserves pending-op.json for manual inspection instead of resuming it. Plain braid lock still refuses all active exclusive operations. The systemd-stop path also retries transient umount EBUSY longer than user lock so a surviving btrfs balance process can release its mount fd during shutdown.
5. Drives are safe to power off.
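The balance handling in the teardown step reduces to a small decision table. The sketch below is illustrative only: all three enums and `teardown_decision` are hypothetical names, and treating an already-paused balance as refused under plain braid lock is an assumption drawn from "plain braid lock still refuses all active exclusive operations".

```rust
/// Exclusive operation reported for the pool (illustrative classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExclusiveOp {
    None,
    BalanceRunning,
    BalancePaused,
    Other,
}

/// How braid lock was invoked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockMode {
    Plain,
    SystemdStop,
}

/// What the teardown path does before closing LUKS.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Teardown {
    Unmount,
    PauseBalanceThenUnmount,
    Refuse,
}

pub fn teardown_decision(mode: LockMode, op: ExclusiveOp) -> Teardown {
    match (mode, op) {
        (_, ExclusiveOp::None) => Teardown::Unmount,
        // Running balance: pause so the kernel persists the paused state.
        (LockMode::SystemdStop, ExclusiveOp::BalanceRunning) => Teardown::PauseBalanceThenUnmount,
        // Already paused: nothing left to pause.
        (LockMode::SystemdStop, ExclusiveOp::BalancePaused) => Teardown::Unmount,
        (LockMode::SystemdStop, ExclusiveOp::Other) => Teardown::Refuse,
        // Assumption: plain lock refuses every exclusive op, paused balance included.
        (LockMode::Plain, _) => Teardown::Refuse,
    }
}
```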
Pool lock mutual exclusion
Pool mutators, alert-state mutators, key enrollment, lock, and discover --write (unlock, add, recover, remove, remove-missing, replace, enroll, lock, discover --write, ack, monitor) acquire an exclusive flock on /run/braid-pool.lock in Rust dispatch before reading pool state. unlock, add, recover, remove, remove-missing, replace, enroll, lock, and discover --write are non-blocking fail-fast commands: if the lock is already held by another braid process, the CLI exits 1 immediately with the message “braid: another braid operation is already in progress”, and the user must retry once the active operation completes. Bare discover is read-only and does not acquire the lock. ack waits up to 10 seconds before returning a retry message. monitor exits 0 silently on contention so a skipped timer cycle does not start alert notification. The lock is held through post-processing (permissions, braid-online activation/deactivation). Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if a prior winner mounted it sequentially; other mutators operate on current locked state. See Principle 12.
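The per-command contention behavior above can be written down as a lookup. This is a sketch for orientation, not the dispatch code in cli/src/main.rs: `LockPolicy`, `lock_policy`, and `contended_exit_code` are hypothetical names, and the table covers plain invocations only (braid lock --systemd-stop uses the bounded wait described later).

```rust
use std::time::Duration;

/// How a command treats /run/braid-pool.lock (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockPolicy {
    /// Read-only command; never takes the lock.
    NoLock,
    /// Exit 1 immediately if the lock is held.
    FailFast,
    /// Wait up to the given duration, then ask the user to retry.
    WaitUpTo(Duration),
    /// Exit 0 silently if the lock is held.
    SkipSilently,
}

/// Returns the policy for a subcommand, or None for commands the text
/// does not cover. `write_flag` models discover --write.
pub fn lock_policy(command: &str, write_flag: bool) -> Option<LockPolicy> {
    match command {
        "unlock" | "add" | "recover" | "remove" | "remove-missing" | "replace" | "enroll"
        | "lock" => Some(LockPolicy::FailFast),
        "discover" if write_flag => Some(LockPolicy::FailFast),
        "discover" => Some(LockPolicy::NoLock),
        "ack" => Some(LockPolicy::WaitUpTo(Duration::from_secs(10))),
        "monitor" => Some(LockPolicy::SkipSilently),
        _ => None,
    }
}

/// Exit code on contention where the text states one.
pub fn contended_exit_code(policy: LockPolicy) -> Option<i32> {
    match policy {
        LockPolicy::FailFast => Some(1),
        LockPolicy::SkipSilently => Some(0),
        // The text gives no exit code for ack's retry message.
        LockPolicy::WaitUpTo(_) | LockPolicy::NoLock => None,
    }
}
```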
Lock acquisition site
For non-dry-run pool mutators, alert-state mutators, key enrollment, lock, and discover --write, the operation lock is acquired in cli/src/main.rs dispatch before config load, pool.json load, journal read, identity probes, subprocess health probes, or interactive prompts. The shell wrapper must not acquire /run/braid-pool.lock; it execs the Rust binary and leaves critical-section ownership to dispatch.
A command started during another mutator could otherwise read stale state, then acquire the lock after the first command finishes and act on old inputs. Late acquisition also regresses the fail-fast UX – users see prompts and probes complete before being told the operation is contended.
The pool lock is the first real execution boundary. Do not model it after the sleep inhibitor’s late-acquisition pattern: the inhibitor protects against suspend mid-operation and can wait until the irreversible window; the pool lock protects against state-staleness and must precede any read of pool state.
ExecStop bounded-wait pattern
When a unit’s ExecStop= invokes a CLI that needs a contended resource (e.g. braid-online.service ExecStop=braid lock colliding with an in-flight mutator that holds the pool lock), the ExecStop path gets a distinct bounded-wait variant – not a fail-fast call. “ExecStop fails fast; in-flight work finishes and a later stop attempt succeeds” is not a valid design: during shutdown there is no later stop attempt. systemctl poweroff can leave the resource (mounted btrfs / open LUKS) in an inconsistent state, and the “in-flight mutator finishes before TimeoutStopSec” claim is not guaranteed.
Current pattern: braid-online.service runs braid lock --systemd-stop --deadline-secs ${braid.lockSystemdStopDeadlineSecs}. The module default is 270 seconds and an assertion requires it to be strictly less than braid-online.service TimeoutStopSec (300 seconds). That deadline bounds only stop-coordinator and pool-lock acquisition; once lock cleanup reaches btrfs balance pause or umount, any kernel wait to quiesce btrfs has no userspace timeout and is bounded only by the unit’s TimeoutStopSec (300 seconds). The systemd-stop path also has a longer transient-busy umount retry (60 attempts at 500ms) because btrfs-progs holds the mount fd while blocked in BTRFS_IOC_BALANCE_V2 and can survive the Rust parent briefly during shutdown. Regular braid lock stays fail-fast for user invocations; the bounded-wait path is documented and tested as a distinct mode.
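Two pieces of this pattern lend themselves to a short sketch: the assertion that the deadline stays strictly below TimeoutStopSec, and a bounded retry for a transiently busy umount. Both functions are hypothetical illustrations, not braid's implementation; the closure stands in for one umount attempt, and sleeping only between attempts is an assumption of this sketch.

```rust
use std::thread;
use std::time::Duration;

/// Mirrors the module assertion: the lock-wait deadline must be strictly
/// less than the unit's TimeoutStopSec (both in seconds).
pub fn validate_deadline(deadline_secs: u64, timeout_stop_sec: u64) -> Result<(), String> {
    if deadline_secs < timeout_stop_sec {
        Ok(())
    } else {
        Err(format!(
            "deadline {deadline_secs}s must be strictly less than TimeoutStopSec {timeout_stop_sec}s"
        ))
    }
}

/// Retries `try_umount` until it reports success or `attempts` run out,
/// sleeping `interval` between attempts. Returns true on success.
///
/// With the values from the text (60 attempts at 500 ms) the sleeps add
/// up to roughly 30 seconds, excluding time spent inside each attempt.
pub fn retry_transient_busy<F>(attempts: u32, interval: Duration, mut try_umount: F) -> bool
where
    F: FnMut() -> bool,
{
    for attempt in 1..=attempts {
        if try_umount() {
            return true;
        }
        // No sleep after the final attempt; just report failure.
        if attempt < attempts {
            thread::sleep(interval);
        }
    }
    false
}
```

The retry budget sits well inside the 30-second gap between the 270-second default deadline and the 300-second TimeoutStopSec only when lock acquisition returns early, which is why the text notes that the kernel-side wait is bounded solely by TimeoutStopSec.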
systemctl start/stop inside held-resource windows
systemctl start <unit> on an already-active oneshot+RemainAfterExit unit is a no-op at the work level, but it still queues a job. If a stop job for the same unit is already in flight (because someone else invoked systemctl stop), the start queues behind the stop. If that stop’s ExecStop= is itself blocked on a resource the caller holds, the result is a deadlock.
This is load-bearing for any CLI that both holds a resource and uses systemctl start/stop on a unit whose ExecStart=/ExecStop= touches that resource (e.g. Rust dispatch holding pool.lock while activating braid-online.service whose ExecStop calls braid lock).
These rules govern start/stop of braid-online.service itself. The
systemctl stop calls in run_lock_pre_steps target bound consumers and scrub
units, not the lifecycle owner, so they queue no job against
braid-online.service and the start-behind-stop deadlock above does not apply
to them.
Rules:
- Snapshot full unit state at the start of the held-resource window with systemctl show -P ActiveState <unit>. Do NOT use systemctl is-active – it returns “active” only for active, classifying activating and deactivating as not-active. A deactivating unit (its ExecStop is already running and waiting on the held resource) snapshotted as “not active” leads the caller to issue a start that queues behind the in-flight stop – the exact deadlock the snapshot was supposed to prevent.
- Only emit systemctl start <unit> at the end of the window if the snapshot was inactive or failed. Skip when active, activating, or deactivating. See ADR 026 snapshot rule.
- Only emit systemctl stop <unit> at the end of the window if the snapshot was active or activating. Skip when inactive, failed, or deactivating.
  - Exception: plain braid lock’s post-success mark_offline runs a synchronous systemctl stop braid-online.service without a stop-side snapshot. It is safe because /run/braid-stop-coordinator.lock plus the done\n protocol guarantees the recursive ExecStop reentry exits 0 once plain braid lock has finished cmd_lock, instead of queuing behind the in-flight stop. This coordinator is the mechanism that replaces the stop-side snapshot gate for mark_offline; see ADR 026 stop coordinator. mark_offline skips the synchronous stop when the post-cleanup mountpoint -q check itself fails (e.g. OnlineError::Spawn mid-shutdown): the unit stays active and the operator retries. Treating unknown mount state as still-mounted mirrors mark_online’s start-side fail-safe.
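The snapshot rules can be expressed as a tiny state classifier. This is a minimal sketch assuming the raw string comes from systemctl show -P ActiveState; the enum and function names are hypothetical, and folding any other state (for example reloading) into a skip-both bucket is a conservative assumption, not something the rules above specify.

```rust
/// ActiveState values the rules distinguish, plus a catch-all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActiveState {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
    /// Any other state; this sketch neither starts nor stops.
    Other,
}

/// Parses the snapshot text, tolerating the trailing newline.
pub fn parse_active_state(raw: &str) -> ActiveState {
    match raw.trim() {
        "active" => ActiveState::Active,
        "activating" => ActiveState::Activating,
        "deactivating" => ActiveState::Deactivating,
        "inactive" => ActiveState::Inactive,
        "failed" => ActiveState::Failed,
        _ => ActiveState::Other,
    }
}

/// Start only from a settled non-active state.
pub fn should_start(snapshot: ActiveState) -> bool {
    matches!(snapshot, ActiveState::Inactive | ActiveState::Failed)
}

/// Stop only when the unit is up or coming up; never pile onto a stop
/// that is already in flight.
pub fn should_stop(snapshot: ActiveState) -> bool {
    matches!(snapshot, ActiveState::Active | ActiveState::Activating)
}
```

Note that deactivating yields false for both predicates: the caller leaves the in-flight stop alone instead of queuing a job behind it.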
Consumer dependency contracts
Services that depend on the pool being mounted use one of three patterns:
Frequent periodic services (monitor): ConditionPathIsMountPoint only. Neither After nor BindsTo on mnt-storage.mount – those directives force systemd to load the unit, which doesn’t exist until the CLI mounts the pool at runtime (auto-generated from /proc/mounts). The condition gate silently skips the service when unmounted. Fires every 5 minutes – missed fires are cheap, so lifecycle binding is unnecessary.
Infrequent periodic services (scrub): The timer, scrub service, and resume trigger use BindsTo + After on braid-online.service; the timer and trigger are wantedBy the online unit. The timer’s active lifecycle matches the pool’s online period. Persistent=true handles catch-up for overdue fires. Unlike the monitor timer (which fires every 5 minutes and can afford missed runs), the monthly scrub timer cannot wait until next month if it misses – lifecycle binding ensures it fires on the next unlock. The scrub service and resume trigger also get ConditionPathIsMountPoint as defense-in-depth. For manual lock, Rust dispatch stops the timer, resume trigger, and scrub service before unmount (see above).
Long-running services holding open files (samba, nfs): Use the full WantedBy=braid-online.service + BindsTo=braid-online.service + After=braid-online.service triad (same shape as the scrub timer above), plus ConditionPathIsMountPoint=<pool mount>. BindsTo + After ensures systemd stops them before braid lock runs ExecStop, preventing unmount failures from busy filesystems; WantedBy ensures they restart automatically when braid unlock reactivates braid-online.service. The triad handles the unlock-start and lock-stop lifecycle, but these consumers carry their own boot or direct-start edges – NixOS wants samba-smbd.service from samba.target and nfs-server.service from multi-user.target. For starts not initiated by braid-online.service, ConditionPathIsMountPoint is the load-bearing gate that prevents serving an offline mount directory. Rust dispatch iterates BoundBy braid-online.service and stops these consumers before unmount, mirroring the cascade systemd performs on shutdown for user-initiated lock. See ../../guides/sharing-and-permissions.md#binding-shares-to-the-pool-lifecycle for the user-facing example.
Key design constraints
- No hard boot dependencies. wants everywhere, never requires. Pool failure never blocks boot.
- Rust-synchronized lifecycle. For dispatch-managed operations, Rust keeps braid-online synchronized with pool mount state: it activates the service only after mountpoint -q succeeds, and deactivates it after a successful lock. ConditionPathIsMountPoint on the unit is defense-in-depth against direct systemctl start when unmounted. Out-of-band mount or unmount bypasses dispatch and can leave braid-online stale; braid lock handles already-unmounted pools gracefully.
- One passphrase prompt. braid-unlock.service is the sole interactive prompt source. The CLI opens all LUKS devices from that single passphrase.
- Graceful degradation. If braid-online activation fails, the pool is still mounted and usable – only the shutdown hook is missing (warned to stderr).
- One pool operation at a time. Enforced by a non-blocking flock in Rust dispatch, not wrapper logic or unit topology – concurrent attempts are rejected, not queued. See Principle 12.
See
- modules/braid/storage.nix – unit definitions
- modules/braid/monitor.nix – monitor/alert units
- modules/braid/braid-wrapper.sh – pure exec shim
- 026-pool-lock-rust-owned.md – Rust-owned pool lock and lifecycle synchronization
- 003-resilient-boot.md – why no hard dependencies
- 017-runtime-disk-membership.md – lifecycle model context
- tests/module/systemd-lifecycle.py – state machine test suite