Decision: Systemd Lifecycle State Machine

Principle: Resilient by default

Context

braid needs systemd integration for three things: interactive unlock, unattended unlock, and clean shutdown (LUKS close before power-off). The module must not generate data-pool fileSystems or boot.initrd.luks.devices entries — those create hard boot dependencies on the data pool (see 003-resilient-boot.md). Instead, the CLI owns LUKS open/close and btrfs mount/unmount at runtime, and a thin systemd layer provides the entry points and shutdown hook.

Units

                          ┌───────────────────────┐
                          │  braid-pool.target    │  entry point
                          │  wants + after        │
                          └─────────┬─────────────┘
                                    │ (soft dep)
                          ┌─────────▼─────────────┐
                          │  braid-unlock.service │  interactive passphrase
                          │  oneshot              │
                          └─────────┬─────────────┘
                                    │ (CLI marks online on success)
                          ┌─────────▼─────────────┐
                          │  braid-online.service │  lifecycle owner
                          │  ExecStart=/bin/true  │
                          │  ExecStop=braid lock  │  --systemd-stop
                          │  oneshot, RAE         │
                          └───────────────────────┘

  braid-auto-unlock.service          (alternative unlock path, boot-time)
  wantedBy multi-user.target          activates braid-online via same CLI path

  mnt-storage.mount                   (auto-generated by systemd from /proc/mounts)

  braid-monitor.timer -> braid-monitor.service -> braid-alert.service
  ConditionPathIsMountPoint            (health polling, skipped when pool not mounted)

  braid-scrub.timer -> braid-scrub.service
  braid-online.service -> braid-scrub-resume-trigger.service -> braid-scrub.service
  BindsTo + After braid-online.service    (lifecycle-bound periodic scrub)
  Persistent=true                          (catch-up on activation)

RAE = RemainAfterExit = true

braid-pool.target — entry point

Public handle for “bring pool online.” The user runs systemctl start braid-pool.target to request an unlock.

  • wants (not requires) braid-unlock.service — soft dependency. Unlock failure does not fail the target, and the target cannot block boot because nothing requires it.
  • after braid-unlock.service — ordering only.
  • Does not want or require braid-online.service. The CLI activates that separately after confirming the mount succeeded.

braid-unlock.service — interactive passphrase unlock

Single orchestrator: opens all LUKS devices and mounts the btrfs pool in one shot. Guarantees exactly one passphrase prompt (avoids relying on systemd-ask-password cache behavior across multiple LUKS units).

  • Type = oneshot — runs once, returns to inactive on completion. ConditionPathIsMountPoint (below) prevents re-run while mounted; the inactive state allows systemctl start braid-pool.target to re-unlock after a prior braid lock.
  • ConditionPathIsMountPoint = !${mountPoint} — skips if pool already mounted.
  • Calls systemd-ask-password --timeout=0 --id=braid | braid unlock --passphrase-stdin.

braid-auto-unlock.service — unattended USB keyfile unlock

Optional (only created when braid.autoUnlock.enable = true). Runs at boot, unlocks from a USB keyfile without interactive prompt.

  • wantedBy = [ "multi-user.target" ] — starts automatically at boot.
  • after = [ "local-fs.target" ] — waits for /run to exist.
  • ConditionPathIsMountPoint = !${mountPoint} — skips if pool already mounted.
  • No RemainAfterExit — intentional. If USB is absent at boot (service exits 0 on skip), a later systemctl start braid-auto-unlock can re-run when the USB is inserted.
  • Mounts the USB read-only, validates the keyfile path (symlink defense), runs braid unlock --key-file, and always unmounts the USB afterwards so the keyfile is never left accessible.
  • Always exits 0 — failures are logged to the journal but never reported as unit failure, because auto-unlock must not block boot under any circumstance.

braid-online.service — lifecycle owner

State-ownership service. Its only purpose is to mark “pool is online” and run the bounded braid lock stop path on stop.

  • ExecStart = /bin/true — no work. Exists for its ExecStop hook.
  • ExecStop = braid lock --systemd-stop --deadline-secs <n> – unmounts the pool and closes all LUKS devices on shutdown or manual stop. The wait for the stop coordinator and the pool lock is bounded by a deadline kept below TimeoutStopSec. In this mode, braid permits a running or paused btrfs balance: a running balance is explicitly paused before unmount, an already-paused balance proceeds to unmount, and every other exclusive operation is refused. Because the blocking btrfs balance userspace process can briefly hold the mount fd after its parent dies, the systemd-stop path retries a transient-busy umount for longer than plain braid lock does.
  • RemainAfterExit = true — persists “active” state.
  • ConditionPathIsMountPoint = ${mountPoint} – systemd skips activation when the pool is not mounted (systemctl start returns 0 but the unit stays inactive). Defense-in-depth: the CLI’s mountpoint -q check is the primary gate, but this condition prevents direct systemctl start from leaving the unit active while unmounted.
  • TimeoutStopSec = 300s – raises the stop timeout from the 90s default so a slow braid lock is not SIGKILL’d mid-operation.
  • Not in any dependency chain. Neither the target nor unlock services want/require it. Activated exclusively by the CLI after mountpoint -q confirms the pool is mounted.

mnt-storage.mount — readiness contract

Auto-generated by systemd from /proc/mounts when the btrfs pool is mounted. Consumer services bind to this unit.

braid-monitor.timer + braid-monitor.service — health polling

Periodic oneshot (default: every 5 minutes). Pure detector — checks btrfs device stats for errors.

  • ConditionPathIsMountPoint — skipped cleanly when pool is not mounted (no dependency-failure noise from timer). No After or BindsTo on mnt-storage.mount — those directives force systemd to load the unit, which doesn’t exist before the first unlock.
  • Exit code 1 from braid monitor → starts braid-alert.service.
  • braid monitor fails closed: probe/parse/stats/mountinfo failures, acked-stats.json baseline read/parse failures, and alert-latch read/quarantine failures latch AlertCause::ComputationError and exit 1, so the wrapper above starts the beeper. Exit 0 is reserved for healthy, pool-offline, and pool-lock-contended cycles; exit 2 is reserved for pre-cmd_monitor setup failures (e.g. pool-lock I/O, config load failure) and is never emitted by cmd_monitor itself. See ADR 014 fail-closed contract for the cause taxonomy.
  • The gate and the fail-closed path are independent mount checks, so the gate cannot mask a real alert. ConditionPathIsMountPoint resolves through statx(STATX_ATTR_MOUNT_ROOT) (then name_to_handle_at(2), then /proc/self/fdinfo) – a kernel VFS query, never a parse of /proc/self/mountinfo text. The fail-closed path above instead parses that text and latches ComputationError on a malformed line, duplicate target, or read error. On a genuinely-mounted pool statx reports a mount root regardless of any text anomaly, so the service runs and the beep fires – the protective beep is never gated away. The gate only short-circuits a statx-confirmed-offline pool; the sole beep it suppresses is braid’s conservative ComputationError on an offline pool with anomalous mountinfo text, which is not a disk-health alert.
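
The exit-code contract in the bullets above can be sketched as a small mapping. This is a minimal illustration, not the code in cli/src/main.rs: the CycleOutcome enum, its variant names, and both function names are hypothetical. The sketch also assumes that only exit code 1 starts the alert, since the text does not say what the wrapper does on exit 2.

```rust
/// Outcome of one `cmd_monitor` cycle. The variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CycleOutcome {
    Healthy,
    PoolOffline,
    PoolLockContended,
    /// New device errors were found in the btrfs device stats.
    DeviceErrors,
    /// Probe, parse, stats, mountinfo, baseline, or alert-latch failure.
    ComputationError,
}

/// Exit code for a cycle. Exit 2 is deliberately absent: the text reserves
/// it for setup failures that happen before `cmd_monitor` runs.
pub fn monitor_exit_code(outcome: CycleOutcome) -> i32 {
    match outcome {
        CycleOutcome::Healthy
        | CycleOutcome::PoolOffline
        | CycleOutcome::PoolLockContended => 0,
        // Fail closed: a computation error alerts just like a device error.
        CycleOutcome::DeviceErrors | CycleOutcome::ComputationError => 1,
    }
}

/// Assumption: the wrapper starts braid-alert.service only on exit code 1.
pub fn wrapper_starts_alert(exit_code: i32) -> bool {
    exit_code == 1
}
```

The point of the mapping is that a skipped or contended cycle never beeps, while any failure to compute health does.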

braid-scrub.timer + scrub service + resume trigger – lifecycle-bound scrub

Periodic scrub (default: monthly). Uses a timer-lifecycle pattern distinct from the monitor’s ConditionPathIsMountPoint-only approach.

  • Timer is wantedBy, BindsTo, and After braid-online.service. Starts when pool comes online, stops when pool goes offline.
  • Persistent=true + AccuracySec=1d. When the timer activates (pool unlock), systemd compares the last-trigger stamp against OnCalendar. If a scrub came due while the pool was offline, the timer fires immediately.
  • braid-scrub.service is the only foreground scrub runner. It is Type=simple; its internal braid scrub-resume-or-start --mount <mount> ExecStart resumes saved scrub progress first, then starts a fresh scrub only when btrfs reports nothing resumable.
  • braid-scrub.service uses a shared ExecStop cancel script – same pattern as the nixpkgs btrfs scrub service. This cancels in-flight scrub on lock or shutdown through btrfs scrub cancel, leaving btrfs-progs’ /var/lib/btrfs/scrub.status.<fsid> progress file available for the next resume.
  • braid-scrub-resume-trigger.service is the pool-online predicate-and-poke path. It is Type=oneshot, wantedBy, BindsTo, and After braid-online.service; it runs internal braid scrub-needs-resume --mount <mount> and starts braid-scrub.service with systemctl start --no-block only when saved progress is resumable.
  • The scrub service and resume trigger use BindsTo + After braid-online.service. On shutdown or systemctl stop braid-online.service, systemd stops them before braid lock runs.
  • ConditionPathIsMountPoint on the scrub service and trigger is defense-in-depth.
  • Serialization via single runner. Only braid-scrub.service ever runs btrfs scrub; both activation paths (timer and trigger) issue systemctl start braid-scrub.service, and systemd coalesces overlapping starts for the same unit. A completed scrub-resume-or-start run satisfies both an overdue timer fire and a pool-online resumable state, with no flock and no /run/braid-scrub.lock.
  • Conflicts + Before shutdown.target and sleep.target on the scrub service. The short-lived resume trigger also uses Conflicts + Before sleep.target so suspend setup wins cleanly against pool-online activation.
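
The two activation paths and the single runner described above reduce to two small decisions. The sketch below is illustrative only: the Activation and RunnerAction types and both function names are hypothetical, and the resumable flag stands in for whatever braid scrub-needs-resume and btrfs actually report.

```rust
/// Which path asked for a scrub. The names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Activation {
    Timer,
    ResumeTrigger,
}

/// What the single runner (braid-scrub.service) does once started.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunnerAction {
    Resume,
    StartFresh,
}

/// Whether the path issues `systemctl start braid-scrub.service`.
/// The timer always does; the trigger pokes only when progress is resumable.
pub fn issues_start(path: Activation, resumable: bool) -> bool {
    match path {
        Activation::Timer => true,
        Activation::ResumeTrigger => resumable,
    }
}

/// Resume first; start a fresh scrub only when nothing is resumable.
pub fn runner_action(resumable: bool) -> RunnerAction {
    if resumable {
        RunnerAction::Resume
    } else {
        RunnerAction::StartFresh
    }
}
```

Because both paths target the same unit, an overlapping timer fire and trigger poke end up as one run of the runner.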

braid-alert.service — notification

Started by monitor on error detection. Beeps via PC speaker (if enabled) and/or runs a custom alert command. Stopped by braid ack.

Rust dispatch as synchronization layer

The wrapper (braid-wrapper.sh) is a pure exec shim: it sets the module-controlled PATH and execs the Rust binary. Synchronization lives in Rust dispatch (cli/src/main.rs), which owns the pool lock, braid-online.service lifecycle updates, and shutdown stop coordination. See 026-pool-lock-rust-owned.md.

modules/braid/cli.nix emits systemd_lifecycle = true for module-managed installs. Standalone CLI deployments omit it; those configs still get mount permission fixups but do not touch braid-online.service.

After every unlock, add, or recover attempt:

  1. Rust dispatch acquires /run/braid-pool.lock, loads config and membership, and snapshots braid-online.service ActiveState only when systemd_lifecycle = true.
  2. CLI opens LUKS + mounts pool when the command reaches its mount step. (recover self-mounts when recovering from an interrupted operation.)
  3. Before dispatch returns, success or failure, Rust runs mark_online while the pool lock is still held.
  4. mark_online checks mountpoint -q; pre-mount failures short-circuit here.
  5. Rust sets permissions (root:poolAccessGroup 2770) if poolAccessGroup is configured.
  6. When systemd_lifecycle = true, Rust starts braid-online.service only when the initial snapshot was inactive or failed.
  7. If activation fails: prints WARNING to stderr, then preserves the command’s original exit result. Pool is mounted and usable; only the shutdown hook is missing.
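
Steps 4 to 6 above can be read as a pure planning function. The following is a minimal sketch under stated assumptions, not the real mark_online: the MarkOnlineInput struct, the Step enum, and mark_online_plan are hypothetical names, and the mount check and unit snapshot are passed in as plain values rather than obtained from mountpoint -q or systemctl.

```rust
/// Subset of systemd ActiveState values relevant to the snapshot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActiveState {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
}

/// Inputs to the post-command step. The field names are hypothetical.
pub struct MarkOnlineInput {
    /// Result of the `mountpoint -q` check.
    pub mounted: bool,
    /// Whether poolAccessGroup is configured.
    pub pool_access_group: bool,
    /// `systemd_lifecycle = true` for module-managed installs.
    pub systemd_lifecycle: bool,
    /// Snapshot taken at the start of dispatch (only with systemd_lifecycle).
    pub snapshot: Option<ActiveState>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    SetPermissions,
    StartOnlineUnit,
}

/// Plan the actions taken while the pool lock is still held.
pub fn mark_online_plan(input: &MarkOnlineInput) -> Vec<Step> {
    let mut plan = Vec::new();
    // Step 4: pre-mount failures short-circuit here.
    if !input.mounted {
        return plan;
    }
    // Step 5: permissions only when a group is configured.
    if input.pool_access_group {
        plan.push(Step::SetPermissions);
    }
    // Step 6: start only from an inactive or failed snapshot.
    if input.systemd_lifecycle {
        match input.snapshot {
            Some(ActiveState::Inactive) | Some(ActiveState::Failed) => {
                plan.push(Step::StartOnlineUnit)
            }
            _ => {}
        }
    }
    plan
}
```

A standalone CLI deployment (systemd_lifecycle unset) still gets the permission fixup but never touches braid-online.service, which matches the module behavior described earlier.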

On lock:

  1. Plain braid lock acquires /run/braid-stop-coordinator.lock, then /run/braid-pool.lock.
  2. When systemd_lifecycle = true, Rust stops braid-scrub.timer, braid-scrub-resume-trigger.service, then braid-scrub.service (timer first prevents re-trigger; trigger before service prevents the trigger from queuing a fresh start of the service being stopped; service last cancels in-flight scrub).
  3. When systemd_lifecycle = true, Rust iterates systemctl show -P BoundBy braid-online.service and stops each remaining bound consumer (samba, nfs, future). The scrub units already handled in step 2 are skipped. For a user-initiated braid lock, this mirrors the cascade that systemd performs on shutdown.
  4. CLI unmounts pool + closes LUKS.
  5. Plain braid lock writes done\n to /run/braid-stop-coordinator.lock.
  6. When systemd_lifecycle = true, Rust checks the mount is gone and runs systemctl stop braid-online.service synchronously so the command returns only after the lifecycle owner is inactive. The synchronous stop runs only when the post-cleanup mountpoint check confirms the mount is gone; if the check itself fails, Rust warns and skips the stop, leaving the unit active for the operator to retry. The recursive ExecStop reentry polls the coordinator, observes done\n, and exits 0.
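
The stop ordering in steps 2 and 3 can be sketched as a function that turns the BoundBy property into an ordered stop list. This is an illustration under assumptions: lock_stop_order is a hypothetical name, and the input is assumed to be the space-separated unit list that systemctl show -P BoundBy prints.

```rust
/// Scrub units in the order given in step 2: timer, trigger, then service.
const SCRUB_STOP_ORDER: [&str; 3] = [
    "braid-scrub.timer",
    "braid-scrub-resume-trigger.service",
    "braid-scrub.service",
];

/// Build the full stop order for a lock: scrub units first, then every
/// remaining unit listed in BoundBy, skipping ones already in the list.
/// `bound_by` is assumed to be a whitespace-separated list of unit names.
pub fn lock_stop_order(bound_by: &str) -> Vec<String> {
    let mut order: Vec<String> = SCRUB_STOP_ORDER.iter().map(|u| u.to_string()).collect();
    for unit in bound_by.split_whitespace() {
        if !order.iter().any(|seen| seen == unit) {
            order.push(unit.to_string());
        }
    }
    order
}
```

For example, a BoundBy value of "samba-smbd.service braid-scrub.timer nfs-server.service" yields the three scrub units followed by samba-smbd.service and nfs-server.service.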

On system shutdown:

  1. systemd stops braid-online.service (if active); its BindsTo+After cascade stops the scrub units and any full-triad consumer first. ExecStop then re-runs the same scrub-stop + BoundBy iteration as the “On lock” steps 2-3. For the scrub units and any consumer that follows the documented WantedBy+BindsTo+After triad, the cascade has already stopped them, so these re-issued stops are no-ops. A consumer that declares BindsTo without After has no stop-ordering guarantee and may still be active when ExecStop runs, so the explicit blocking stop here is what frees the mount. Running the pre-steps unconditionally covers both cases, keeping teardown code-owned and independent of cascade ordering.
  2. ExecStop = braid lock --systemd-stop --deadline-secs <n> waits for an in-flight plain braid lock to finish through the stop coordinator, or waits for the pool lock up to the configured deadline.
  3. Lock dispatch loads membership from pool.json; if pool.json is absent or corrupt, it warns and proceeds with empty membership because mapper cleanup still requires per-candidate LUKS UUID verification.
  4. CLI unmounts and closes LUKS. If sysfs reports a running btrfs balance, --systemd-stop first runs btrfs balance pause so the kernel persists the paused balance before LUKS close; if sysfs reports an already-paused balance, teardown proceeds directly to unmount. Next-boot braid recover fails closed on that persisted paused balance and preserves pending-op.json for manual inspection instead of resuming it. Plain braid lock still refuses all active exclusive operations. The systemd-stop path also retries transient umount EBUSY longer than user lock so a surviving btrfs balance process can release its mount fd during shutdown.
  5. Drives are safe to power off.
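
The balance handling in step 4 differs between the two lock modes, which a small decision table makes explicit. The sketch below uses hypothetical names (LockMode, ExclusiveOp, Teardown, teardown_action). It also assumes that plain braid lock refuses a paused balance as well, which the text implies but does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockMode {
    /// User-invoked `braid lock`.
    Plain,
    /// `braid lock --systemd-stop`, run from ExecStop.
    SystemdStop,
}

/// Exclusive-operation state, e.g. as read from sysfs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExclusiveOp {
    None,
    BalanceRunning,
    BalancePaused,
    /// Any other exclusive operation.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Teardown {
    Unmount,
    /// Run `btrfs balance pause` first so the paused state is persisted.
    PauseThenUnmount,
    Refuse,
}

pub fn teardown_action(mode: LockMode, op: ExclusiveOp) -> Teardown {
    match (mode, op) {
        (_, ExclusiveOp::None) => Teardown::Unmount,
        (LockMode::SystemdStop, ExclusiveOp::BalanceRunning) => Teardown::PauseThenUnmount,
        (LockMode::SystemdStop, ExclusiveOp::BalancePaused) => Teardown::Unmount,
        (LockMode::SystemdStop, ExclusiveOp::Other) => Teardown::Refuse,
        // Assumption: plain lock refuses every exclusive operation,
        // including a paused balance.
        (LockMode::Plain, _) => Teardown::Refuse,
    }
}
```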

Pool lock mutual exclusion

The following commands acquire an exclusive flock on /run/braid-pool.lock in Rust dispatch before reading pool state: pool mutators, alert-state mutators, key enrollment, lock, and discover --write (that is, unlock, add, recover, remove, remove-missing, replace, enroll, lock, discover --write, ack, and monitor). Behavior on contention depends on the command. unlock, add, recover, remove, remove-missing, replace, enroll, lock, and discover --write are non-blocking and fail fast: if another braid process already holds the lock, the CLI exits 1 immediately with braid: another braid operation is already in progress, and the user must retry once the active operation completes. ack waits up to 10 seconds before returning a retry message. monitor exits 0 silently on contention so a skipped timer cycle does not start alert notification. Bare discover is read-only and does not acquire the lock. The lock is held through post-processing (permissions, braid-online activation/deactivation). Under the held lock, unlock re-checks whether the pool is already mounted and exits cleanly if an earlier lock holder has already mounted it; other mutators operate on current locked state. See Principle 12.
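
The per-command contention behavior above can be summarized as a lookup. This is a sketch, not the dispatch code: OnContention and contention_policy are hypothetical names, the write parameter stands for the --write flag of discover, and the dry-run exemption mentioned in the next section is not modeled.

```rust
use std::time::Duration;

/// What a command does when /run/braid-pool.lock is already held.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnContention {
    /// Exit 1 immediately with the "already in progress" message.
    FailFast,
    /// Wait up to the given time, then return a retry message.
    Wait(Duration),
    /// Exit 0 silently so no alert notification starts.
    SkipSilently,
    /// The command never takes the lock.
    NoLock,
}

/// Policy for a command name; `None` for commands the text does not cover.
pub fn contention_policy(command: &str, write: bool) -> Option<OnContention> {
    match command {
        "unlock" | "add" | "recover" | "remove" | "remove-missing" | "replace"
        | "enroll" | "lock" => Some(OnContention::FailFast),
        "discover" if write => Some(OnContention::FailFast),
        "discover" => Some(OnContention::NoLock),
        "ack" => Some(OnContention::Wait(Duration::from_secs(10))),
        "monitor" => Some(OnContention::SkipSilently),
        _ => None,
    }
}
```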

Lock acquisition site

For non-dry-run pool mutators, alert-state mutators, key enrollment, lock, and discover --write, the operation lock is acquired in cli/src/main.rs dispatch before config load, pool.json load, journal read, identity probes, subprocess health probes, or interactive prompts. The shell wrapper must not acquire /run/braid-pool.lock; it execs the Rust binary and leaves critical-section ownership to dispatch.

A command started during another mutator could otherwise read stale state, then acquire the lock after the first command finishes and act on old inputs. Late acquisition also regresses the fail-fast UX – users see prompts and probes complete before being told the operation is contended.

The pool lock is the first real execution boundary. Do not model it after the sleep inhibitor’s late-acquisition pattern: the inhibitor protects against suspend mid-operation and can wait until the irreversible window; the pool lock protects against state-staleness and must precede any read of pool state.

ExecStop bounded-wait pattern

When a unit’s ExecStop= invokes a CLI that needs a contended resource (e.g. braid-online.service ExecStop=braid lock colliding with an in-flight mutator that holds the pool lock), the ExecStop path gets a distinct bounded-wait variant – not a fail-fast call. “ExecStop fails fast; in-flight work finishes and a later stop attempt succeeds” is not a valid design: during shutdown there is no later stop attempt. systemctl poweroff can leave the resource (mounted btrfs / open LUKS) in an inconsistent state, and the “in-flight mutator finishes before TimeoutStopSec” claim is not guaranteed.

Current pattern: braid-online.service runs braid lock --systemd-stop --deadline-secs ${braid.lockSystemdStopDeadlineSecs}. The module default is 270 seconds and an assertion requires it to be strictly less than braid-online.service TimeoutStopSec (300 seconds). That deadline bounds only stop-coordinator and pool-lock acquisition; once lock cleanup reaches btrfs balance pause or umount, any kernel wait to quiesce btrfs has no userspace timeout and is bounded only by the unit’s TimeoutStopSec (300 seconds). The systemd-stop path also has a longer transient-busy umount retry (60 attempts at 500ms) because btrfs-progs holds the mount fd while blocked in BTRFS_IOC_BALANCE_V2 and can survive the Rust parent briefly during shutdown. Regular braid lock stays fail-fast for user invocations; the bounded-wait path is documented and tested as a distinct mode.
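
The numbers in this pattern are easy to check with a few lines of arithmetic. The sketch below restates the documented values as constants; the constant and function names are hypothetical, and the retry budget assumes each attempt is followed by one full interval, which the text does not state.

```rust
use std::time::Duration;

/// braid-online.service TimeoutStopSec.
pub const TIMEOUT_STOP: Duration = Duration::from_secs(300);
/// Module default for braid.lockSystemdStopDeadlineSecs.
pub const DEFAULT_DEADLINE: Duration = Duration::from_secs(270);
/// Transient-busy umount retry on the systemd-stop path.
pub const UMOUNT_RETRY_ATTEMPTS: u32 = 60;
pub const UMOUNT_RETRY_INTERVAL: Duration = Duration::from_millis(500);

/// The module assertion: the deadline must be strictly below TimeoutStopSec.
pub fn deadline_is_valid(deadline: Duration, timeout_stop: Duration) -> bool {
    deadline < timeout_stop
}

/// Upper bound on time spent sleeping between umount attempts,
/// assuming one interval per attempt.
pub fn retry_budget(attempts: u32, interval: Duration) -> Duration {
    interval * attempts
}
```

With the defaults, the deadline leaves 30 seconds of headroom under TimeoutStopSec, and the umount retry budget also comes to about 30 seconds. The deadline covers only lock acquisition, so the two figures bound different phases of the stop.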

systemctl start/stop inside held-resource windows

systemctl start <unit> on an already-active oneshot+RemainAfterExit unit is a no-op at the work level, but it still queues a job. If a stop job for the same unit is already in flight (because someone else invoked systemctl stop), the start queues behind the stop. If that stop’s ExecStop= is itself blocked on a resource the caller holds, the result is a deadlock.

This is load-bearing for any CLI that both holds a resource and uses systemctl start/stop on a unit whose ExecStart=/ExecStop= touches that resource (e.g. Rust dispatch holding pool.lock while activating braid-online.service whose ExecStop calls braid lock).

These rules govern start/stop of braid-online.service itself. The systemctl stop calls in run_lock_pre_steps target bound consumers and scrub units, not the lifecycle owner, so they queue no job against braid-online.service and the start-behind-stop deadlock above does not apply to them.

Rules:

  1. Snapshot full unit state at the start of the held-resource window with systemctl show -P ActiveState <unit>. Do NOT use systemctl is-active – it returns “active” only for active, classifying activating and deactivating as not-active. A deactivating unit (its ExecStop is already running and waiting on the held resource) snapshotted as “not active” leads the caller to issue a start that queues behind the in-flight stop – the exact deadlock the snapshot was supposed to prevent.
  2. Only emit systemctl start <unit> at the end of the window if the snapshot was inactive or failed. Skip when active, activating, or deactivating. See ADR 026 snapshot rule.
  3. Only emit systemctl stop <unit> at the end of the window if the snapshot was active or activating. Skip when inactive, failed, or deactivating.
    • Exception: plain braid lock’s post-success mark_offline runs a synchronous systemctl stop braid-online.service without a stop-side snapshot. It is safe because /run/braid-stop-coordinator.lock plus the done\n protocol guarantees the recursive ExecStop reentry exits 0 once plain braid lock has finished cmd_lock, instead of queuing behind the in-flight stop. This coordinator is the mechanism that replaces the stop-side snapshot gate for mark_offline; see ADR 026 stop coordinator. mark_offline skips the synchronous stop when the post-cleanup mountpoint -q check itself fails (e.g. OnlineError::Spawn mid-shutdown): the unit stays active and the operator retries. Treating unknown mount state as still-mounted mirrors mark_online’s start-side fail-safe.
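
Rules 1 to 3 can be expressed as a parse step plus two predicates. The following is a minimal sketch, not the ADR 026 implementation: the function names are hypothetical, the input is assumed to be the string printed by systemctl show -P ActiveState, and any state outside the five listed is treated conservatively as "neither start nor stop", which is an assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActiveState {
    Active,
    Activating,
    Deactivating,
    Inactive,
    Failed,
}

/// Parse the output of `systemctl show -P ActiveState <unit>`.
/// Returns `None` for states this sketch does not model.
pub fn parse_active_state(raw: &str) -> Option<ActiveState> {
    match raw.trim() {
        "active" => Some(ActiveState::Active),
        "activating" => Some(ActiveState::Activating),
        "deactivating" => Some(ActiveState::Deactivating),
        "inactive" => Some(ActiveState::Inactive),
        "failed" => Some(ActiveState::Failed),
        _ => None,
    }
}

/// Rule 2: start only when the snapshot was inactive or failed.
pub fn should_start(snapshot: Option<ActiveState>) -> bool {
    matches!(snapshot, Some(ActiveState::Inactive) | Some(ActiveState::Failed))
}

/// Rule 3: stop only when the snapshot was active or activating.
pub fn should_stop(snapshot: Option<ActiveState>) -> bool {
    matches!(snapshot, Some(ActiveState::Active) | Some(ActiveState::Activating))
}
```

The deactivating case is the one that matters: both predicates return false for it, so the caller never queues a job behind an in-flight stop.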

Consumer dependency contracts

Services that depend on the pool being mounted use one of three patterns:

Frequent periodic services (monitor): ConditionPathIsMountPoint only. Neither After nor BindsTo on mnt-storage.mount – those directives force systemd to load the unit, which doesn’t exist until the CLI mounts the pool at runtime (auto-generated from /proc/mounts). The condition gate silently skips the service when unmounted. The monitor fires every 5 minutes, so a missed run is cheap and lifecycle binding is unnecessary.

Infrequent periodic services (scrub): The timer, scrub service, and resume trigger use BindsTo + After on braid-online.service; the timer and trigger are wantedBy the online unit. The timer’s active lifecycle matches the pool’s online period. Persistent=true handles catch-up for overdue fires. Unlike the monitor timer (which fires every 5 minutes and can afford missed runs), the monthly scrub timer cannot wait until next month if it misses – lifecycle binding ensures it fires on the next unlock. The scrub service and resume trigger also get ConditionPathIsMountPoint as defense-in-depth. For manual lock, Rust dispatch stops the timer, resume trigger, and scrub service before unmount (see above).

Long-running services holding open files (samba, nfs): Use the full WantedBy=braid-online.service + BindsTo=braid-online.service + After=braid-online.service triad (same shape as the scrub timer above), plus ConditionPathIsMountPoint=<pool mount>. BindsTo + After ensures systemd stops them before braid lock runs ExecStop, preventing unmount failures from busy filesystems; WantedBy ensures they restart automatically when braid unlock reactivates braid-online.service. The triad handles the unlock-start and lock-stop lifecycle, but these consumers carry their own boot or direct-start edges – NixOS wants samba-smbd.service from samba.target and nfs-server.service from multi-user.target. For starts not initiated by braid-online.service, ConditionPathIsMountPoint is the load-bearing gate that prevents serving an offline mount directory. Rust dispatch iterates BoundBy braid-online.service and stops these consumers before unmount, so a user-initiated lock mirrors the cascade that systemd performs on shutdown. See ../../guides/sharing-and-permissions.md#binding-shares-to-the-pool-lifecycle for the user-facing example.

Key design constraints

  1. No hard boot dependencies. wants everywhere, never requires. Pool failure never blocks boot.
  2. Rust-synchronized lifecycle. For dispatch-managed operations, Rust keeps braid-online synchronized with pool mount state: it activates the service only after mountpoint -q succeeds, and deactivates it after a successful lock. ConditionPathIsMountPoint on the unit is defense-in-depth against direct systemctl start when unmounted. Out-of-band mount or unmount bypasses dispatch and can leave braid-online stale; braid lock handles already-unmounted pools gracefully.
  3. One passphrase prompt. braid-unlock.service is the sole interactive prompt source. The CLI opens all LUKS devices from that single passphrase.
  4. Graceful degradation. If braid-online activation fails, the pool is still mounted and usable – only the shutdown hook is missing (warned to stderr).
  5. One pool operation at a time. Enforced by a non-blocking flock in Rust dispatch, not wrapper logic or unit topology – concurrent attempts are rejected, not queued. See Principle 12.

See

  • modules/braid/storage.nix — unit definitions
  • modules/braid/monitor.nix — monitor/alert units
  • modules/braid/braid-wrapper.sh — pure exec shim
  • 026-pool-lock-rust-owned.md — Rust-owned pool lock and lifecycle synchronization
  • 003-resilient-boot.md — why no hard dependencies
  • 017-runtime-disk-membership.md — lifecycle model context
  • tests/module/systemd-lifecycle.py — state machine test suite